`_. Uncompress the archive, change into the ``PyCogent`` directory and type::

    $ python setup.py build

This compiles the extension modules. If you have administrator privileges, type::

    $ sudo python setup.py install

This places the entire package into your Python ``site-packages`` directory.

If you do not have administrator privileges on your machine, you can instead build the extensions with::

    $ python setup.py build_ext -if

which compiles the extensions in place (the ``i`` option) and forcibly (the ``f`` option, i.e. even if they have already been compiled). Then move the ``cogent`` directory to where you want it (or leave it in place) and add this location to your Python path, either by calling ``sys.path.insert(0, "/your/path/to/PyCogent")`` in each script or by setting the ``PYTHONPATH`` shell environment variable (e.g. ``$ export PYTHONPATH=/your/path/to/PyCogent:$PYTHONPATH``).

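As a minimal sketch of the ``sys.path`` approach (written for a modern Python 3 interpreter; the helper name ``add_to_python_path`` and the ``~/src/PyCogent`` location are hypothetical), a script can prepend the directory containing the in-place built ``cogent`` package before importing from it:

```python
import os
import sys


def add_to_python_path(path):
    """Prepend path to sys.path so packages under it (e.g. an in-place
    built ``cogent``) become importable. Hypothetical helper."""
    path = os.path.abspath(os.path.expanduser(path))
    # Only insert when absent, so repeated calls do not grow sys.path.
    if path not in sys.path:
        sys.path.insert(0, path)
    return path


# Hypothetical location of the uncompressed PyCogent directory.
added = add_to_python_path("~/src/PyCogent")
print(added, sys.path[0] == added)
# After this, ``import cogent`` would search this directory first.
```

Setting ``PYTHONPATH`` in the shell has the same effect for every script without editing any of them.
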
Testing
-------

``PyCogent/tests`` contains all the tests (currently more than 3100). The easiest way to run them is with the ``PyCogent/run_tests`` shell script, by typing:

.. code-block:: guess

    $ sh run_tests

This automatically builds the extensions in place, sets up ``PYTHONPATH`` and runs ``PyCogent/tests/alltests.py``. If certain optional applications are not installed, this is indicated in the output as "can't find" or "not installed" and the corresponding tests are skipped. A ``.`` is printed to the screen for each test and, if they all pass, you will see output like:

.. code-block:: guess

    Ran 3299 tests in 58.128s

    OK

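``alltests.py`` gathers the test modules into a single test suite and runs it; the ``Ran ... tests`` / ``OK`` summary has the format printed by Python's standard ``unittest`` text runner. The following sketch uses only the Python 3 standard library, and ``ExampleTest`` is a hypothetical stand-in for PyCogent's real test cases. It reproduces that kind of report for a two-test suite:

```python
import io
import unittest


class ExampleTest(unittest.TestCase):
    # Hypothetical stand-in; PyCogent's real tests live under PyCogent/tests.
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

    def test_upper(self):
        self.assertEqual("acgt".upper(), "ACGT")


# Build a suite the way a test collector would, then run it.
suite = unittest.TestSuite()
suite.addTest(unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest))

stream = io.StringIO()
result = unittest.TextTestRunner(stream=stream, verbosity=1).run(suite)
report = stream.getvalue()
# One "." per passing test, then the "Ran N tests in ...s" line and "OK".
print(report)
```
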
Tips for usage
--------------

A good IDE can greatly simplify writing control scripts; features such as code completion and definition look-up are extremely useful. For a complete list of `editors go here`_.

To get help on the attributes of an object in Python, use

.. code-block:: python

    >>> dir(myalign)

to list the attributes of ``myalign``, or

.. code-block:: python

    >>> help(myalign.writeToFile)

to find out how to use the ``myalign.writeToFile`` method. Note also that the directory structure of the package mirrors the import statements required to use a module: the source of ``alignment.py`` and ``sequence.py`` is in the ``cogent/core`` directory, and to use the classes in those files you import from ``cogent.core``.

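The sketch below illustrates both tips in Python 3. ``FakeAlignment`` is a hypothetical stand-in for a real PyCogent alignment such as ``myalign``, and the standard-library ``json.decoder`` module stands in for ``cogent.core.alignment``. The helpers filter ``dir()`` output down to public names, fetch the docstring that ``help()`` would display, and show how a dotted import name maps onto a file path:

```python
import importlib.util
import inspect
import os


class FakeAlignment(object):
    """Hypothetical stand-in for a PyCogent alignment such as ``myalign``."""

    def writeToFile(self, filename, format=None):
        """Write the alignment to filename in the given format."""
        raise NotImplementedError  # illustration only


def public_attributes(obj):
    """Names reported by dir(obj), without _private and __special__ ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


def module_source(dotted_name):
    """Return the file that a dotted import name resolves to."""
    return importlib.util.find_spec(dotted_name).origin


myalign = FakeAlignment()
print(public_attributes(myalign))           # what dir() lists, filtered
print(inspect.getdoc(myalign.writeToFile))  # the text help() would show
# json.decoder is found in the file json/decoder.py, just as
# cogent.core.alignment lives in cogent/core/alignment.py.
print(module_source("json.decoder").endswith(os.path.join("json", "decoder.py")))
```
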
.. _Python: http://www.python.org
.. _Cython: http://www.cython.org/
.. _Numpy: http://numpy.scipy.org/
.. _Matplotlib: http://matplotlib.sourceforge.net
.. _Apple: http://www.apple.com
.. _Pyrex: http://www.cosc.canterbury.ac.nz/~greg/python/Pyrex/
.. _`editors go here`: http://www.python.org/cgi-bin/moinmoin/PythonEditors
.. _mpi4py: http://code.google.com/p/mpi4py
.. _`restructured text`: http://docutils.sourceforge.net/rst.html
.. _gcc: http://gcc.gnu.org/
.. _SQLAlchemy: http://www.sqlalchemy.org
.. _`MySQL-python`: http://sourceforge.net/projects/mysql-python
.. _zlib: http://www.zlib.net/
.. _`compiling matplotlib`: http://sourceforge.net/projects/pycogent/forums/forum/651121/topic/5635916
#!/bin/sh
# make sure we remove .pyc files in case someone has renamed a module ..
find . -name '*.pyc' -delete
# for automated testing - check if using alternate python install
PYTHON_EXE=python
if [ $PYTHON_TEST_EXE ]; then
    PYTHON_EXE=$PYTHON_TEST_EXE;
fi
set -e
$PYTHON_EXE setup.py build_ext --inplace
export PYTHONPATH=`pwd`:$PYTHONPATH
cd tests
nice $PYTHON_EXE alltests.py "$@"
#!/usr/bin/env python
from distutils.core import setup
import sys, os, re, subprocess
__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2011, The Cogent Project"
__contributors__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield",
                    "Greg Caporaso", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"
# Check Python version, no point installing if unsupported version inplace
if sys.version_info < (2, 6):
    py_version = ".".join([str(n) for n in sys.version_info])
    raise RuntimeError("Python-2.6 or greater is required, Python-%s used." % py_version)

# Check Numpy version, no point installing if unsupported version inplace
try:
    import numpy
except ImportError:
    raise RuntimeError("Numpy required but not found.")

numpy_version = re.split("[^\d]", numpy.__version__)
numpy_version_info = tuple([int(i) for i in numpy_version if i.isdigit()])
if numpy_version_info < (1, 3):
    raise RuntimeError("Numpy-1.3 or greater is required, %s found." % numpy.__version__)
doc_imports_failed = False
try:
    import sphinx
except ImportError:
    doc_imports_failed = True

# A new command for predist, ie: pyrexc but no compile.
import distutils.ccompiler
class NullCompiler(distutils.ccompiler.CCompiler):
    # this basically to stop pyrexc building binaries, just the .c files
    executables = ()
    def __init__(self, *args, **kw):
        pass
    def compile(self, *args, **kw):
        return []
    def link(self, *args, **kw):
        pass

# Pyrex makes some messy C code so limit some warnings when we know how.
import distutils.sysconfig
if (distutils.sysconfig.get_config_var('CC') or '').startswith("gcc"):
    pyrex_compile_options = ['-w']
else:
    pyrex_compile_options = []

# On windows with no commandline probably means we want to build an installer.
if sys.platform == "win32" and len(sys.argv) < 2:
    sys.argv[1:] = ["bdist_wininst"]

# Restructured Text -> HTML
def build_html():
    if doc_imports_failed:
        print "Failed to build html due to ImportErrors for sphinx"
        return
    cwd = os.getcwd()
    os.chdir('doc')
    subprocess.call(["make", "html"])
    os.chdir(cwd)
    print "Built index.html"
# Compiling Pyrex modules to .c and .so
include_path = os.path.join(os.getcwd(), 'include')
# find arrayobject.h on every system an alternative would be to put
# arrayobject.h into pycogent/include, but why ..
numpy_include_path = numpy.get_include()
distutils_extras = {"include_dirs": [include_path, numpy_include_path]}
try:
    if 'DONT_USE_PYREX' in os.environ:
        raise ImportError
    from Cython.Compiler.Version import version
    version = tuple([int(v) \
        for v in re.split("[^\d]", version) if v.isdigit()])
    if version < (0, 11, 2):
        print "Your Cython version is too old"
        raise ImportError
except ImportError:
    print "No Cython, will compile from .c files"
    for cmd in ['cython', 'pyrexc', 'predist']:
        if cmd in sys.argv:
            print "'%s' not available without Cython" % cmd
            sys.exit(1)
    from distutils.extension import Extension
    pyrex_suffix = ".c"
else:
    from Cython.Distutils import build_ext
    from Cython.Distutils.extension import Extension
    pyrex_suffix = ".pyx"

    class build_wrappers(build_ext):
        # for predist, make .c files
        def run(self):
            self.compiler = NullCompiler()
            # skip build_ext.run() and thus ccompiler setup
            build_ext.build_extensions(self)

    class build_wrappers_and_html(build_wrappers):
        def run(self):
            build_wrappers.run(self)
            build_html()

    distutils_extras["cmdclass"] = {
        'build_ext': build_ext,
        'pyrexc': build_wrappers,
        'cython': build_wrappers,
        'predist': build_wrappers_and_html}

    # predist python setup.py predist --inplace --force, this is in _darcs/prefs/prefs for instructing darcs predist to execute the subsequent, predist is a darcs word
# Save some repetitive typing.  We have all compiled
# modules in place with their python siblings.
def CogentExtension(module_name, extra_compile_args=[], **kw):
    path = module_name.replace('.', '/')
    kw['extra_compile_args'] = pyrex_compile_options + extra_compile_args
    if pyrex_suffix == '.pyx':
        kw['pyrex_include_dirs'] = [include_path]
    return Extension(module_name, [path + pyrex_suffix], **kw)
short_description = "COmparative GENomics Toolkit"
# This ends up displayed by the installer
long_description = """Cogent
A toolkit for statistical analysis of biological sequences.
Version %s.
""" % __version__
setup(
    name="cogent",
    version=__version__,
    url="http://sourceforge.net/projects/pycogent",
    author="Gavin Huttley, Rob Knight",
    author_email="gavin.huttley@anu.edu.au, rob@spot.colorado.edu",
    description=short_description,
    long_description=long_description,
    platforms=["any"],
    license="GPL",
    keywords=["biology", "genomics", "statistics", "phylogeny", "evolution",
              "bioinformatics"],
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Science/Research",
        "License :: OSI Approved :: GNU General Public License (GPL)",
        "Topic :: Scientific/Engineering :: Bio-Informatics",
        "Topic :: Software Development :: Libraries :: Python Modules",
        "Operating System :: OS Independent",
        ],
    packages=['cogent', 'cogent.align', 'cogent.align.weights', 'cogent.app',
              'cogent.cluster', 'cogent.core', 'cogent.data', 'cogent.db',
              'cogent.db.ensembl', 'cogent.draw',
              'cogent.evolve', 'cogent.format', 'cogent.maths',
              'cogent.maths.matrix', 'cogent.maths.stats',
              'cogent.maths.stats.cai', 'cogent.maths.unifrac',
              'cogent.maths.spatial', 'cogent.motif', 'cogent.parse',
              'cogent.phylo', 'cogent.recalculation', 'cogent.seqsim',
              'cogent.struct', 'cogent.util'],
    ext_modules=[
        CogentExtension("cogent.align._compare"),
        CogentExtension("cogent.align._pairwise_seqs"),
        CogentExtension("cogent.align._pairwise_pogs"),
        CogentExtension("cogent.evolve._solved_models"),
        CogentExtension("cogent.evolve._likelihood_tree"),
        CogentExtension("cogent.evolve._pairwise_distance"),
        CogentExtension("cogent.struct._asa"),
        CogentExtension("cogent.struct._contact"),
        CogentExtension("cogent.maths._period"),
        CogentExtension("cogent.maths.spatial.ckd3"),
        ],
    **distutils_extras
    )
#!/usr/bin/env python
sub_modules = ['alltests',
               'benchmark',
               'benchmark_aligning',
               'test_draw',
               'test_phylo',
               'timetrial']

for sub_module in sub_modules:
    exec ("from %s import %s" % (__name__, sub_module))
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
               "Matthew Wakefield", "Andrew Butterfield", "Edward Lang"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
#!/usr/bin/env python
#
# suite of cogent package unit tests.
# run suite by executing this file
#
import doctest, cogent.util.unit_test as unittest, sys, os
from cogent.util.misc import app_path
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
               "Hau Ying", "Helen Lindsay", "Jeremy Widmann",
               "Sandra Smit", "Greg Caporaso", "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
def my_import(name):
    """Imports a module, possibly qualified with periods. Returns the module.

    __import__ only imports the top-level module.

    Recipe from python documentation at:
    http://www.python.org/doc/2.4/lib/built-in-funcs.html
    """
    mod = __import__(name)
    components = name.split('.')
    for comp in components[1:]:
        mod = getattr(mod, comp)
    return mod

def module_present(modules):
    """returns True if dependencies present"""
    if type(modules) == str:
        modules = [modules]
    try:
        for module in modules:
            mod = __import__(module)
    except ImportError:
        return False
    return True
def suite():
    modules_to_test = [
        'test_recalculation.rst',
        'test_phylo',
        'test_dictarray.rst',
        'test_align.test_align',
        'test_align.test_algorithm',
        'test_align.test_weights.test_methods',
        'test_align.test_weights.test_util',
        'test_app.test_parameters',
        'test_app.test_util',
        'test_cluster.test_goodness_of_fit',
        'test_cluster.test_metric_scaling',
        'test_cluster.test_approximate_mds',
        'test_cluster.test_procrustes',
        'test_cluster.test_UPGMA',
        'test_cluster.test_nmds',
        'test_core.test_alphabet',
        'test_core.test_alignment',
        'test_core.test_annotation',
        'test_core.test_bitvector',
        'test_core.test_core_standalone',
        'test_core.test_features.rst',
        'test_core.test_entity',
        'test_core.test_genetic_code',
        'test_core.test_info',
        'test_core.test_location',
        'test_core.test_maps',
        'test_core.test_moltype',
        'test_core.test_profile',
        'test_core.test_seq_aln_integration',
        'test_core.test_sequence',
        'test_core.test_tree',
        'test_core.test_usage',
        'test_data.test_molecular_weight',
        'test_evolve.test_best_likelihood',
        'test_evolve.test_bootstrap',
        'test_evolve.test_coevolution',
        'test_evolve.test_models',
        'test_evolve.test_motifchange',
        'test_evolve.test_substitution_model',
        'test_evolve.test_scale_rules',
        'test_evolve.test_likelihood_function',
        'test_evolve.test_newq',
        'test_evolve.test_pairwise_distance',
        'test_evolve.test_parameter_controller',
        'test_format.test_bedgraph',
        'test_format.test_fasta',
        'test_format.test_mage',
        'test_format.test_pdb_color',
        'test_format.test_xyzrn',
        'test_maths.test_fit_function',
        'test_maths.test_geometry',
        'test_maths.test_matrix_logarithm',
        'test_maths.test_period',
        'test_maths.test_matrix.test_distance',
        'test_maths.test_spatial.test_ckd3',
        'test_maths.test_stats.test_alpha_diversity',
        'test_maths.test_stats.test_distribution',
        'test_maths.test_stats.test_histogram',
        'test_maths.test_stats.test_information_criteria',
        'test_maths.test_stats.test_period',
        'test_maths.test_stats.test_special',
        'test_maths.test_stats.test_test',
        'test_maths.test_stats.test_ks',
        'test_maths.test_stats.test_rarefaction',
        'test_maths.test_stats.test_util',
        'test_maths.test_stats.test_cai.test_adaptor',
        'test_maths.test_stats.test_cai.test_get_by_cai',
        'test_maths.test_stats.test_cai.test_util',
        'test_maths.test_optimisers',
        'test_maths.test_distance_transform',
        'test_maths.test_unifrac.test_fast_tree',
        'test_maths.test_unifrac.test_fast_unifrac',
        'test_motif.test_util',
        'test_parse.test_aaindex',
        'test_parse.test_agilent_microarray',
        'test_parse.test_binary_sff',
        'test_parse.test_blast',
        'test_parse.test_bowtie',
        'test_parse.test_bpseq',
        'test_parse.test_cigar',
        'test_parse.test_clustal',
        'test_parse.test_column',
        'test_parse.test_comrna',
        'test_parse.test_consan',
        'test_parse.test_cove',
        'test_parse.test_ct',
        'test_parse.test_cut',
        'test_parse.test_cutg',
        'test_parse.test_dialign',
        'test_parse.test_ebi',
        'test_parse.test_fasta',
        'test_parse.test_fastq',
        'test_parse.test_gbseq',
        'test_parse.test_gibbs',
        'test_parse.test_genbank',
        'test_parse.test_gff',
        'test_parse.test_greengenes',
        'test_parse.test_ilm',
        'test_parse.test_illumina_sequence',
        'test_parse.test_locuslink',
        'test_parse.test_mage',
        'test_parse.test_meme',
        'test_parse.test_msms',
        'test_parse.test_ncbi_taxonomy',
        'test_parse.test_nexus',
        'test_parse.test_nupack',
        'test_parse.test_pdb',
        'test_parse.test_psl',
        'test_parse.test_structure',
        'test_parse.test_pamlmatrix',
        'test_parse.test_phylip',
        'test_parse.test_pknotsrg',
        'test_parse.test_rdb',
        'test_parse.test_record',
        'test_parse.test_record_finder',
        'test_parse.test_rfam',
        'test_parse.test_rnaalifold',
        'test_parse.test_rna_fold',
        'test_parse.test_rnaview',
        'test_parse.test_rnaforester',
        'test_parse.test_sprinzl',
        'test_parse.test_tinyseq',
        'test_parse.test_tree',
        'test_parse.test_unigene',
        'test_seqsim.test_analysis',
        'test_seqsim.test_birth_death',
        'test_seqsim.test_markov',
        'test_seqsim.test_microarray',
        'test_seqsim.test_microarray_normalize',
        'test_seqsim.test_randomization',
        'test_seqsim.test_searchpath',
        'test_seqsim.test_sequence_generators',
        'test_seqsim.test_tree',
        'test_seqsim.test_usage',
        'test_struct.test_knots',
        'test_struct.test_pairs_util',
        'test_struct.test_rna2d',
        'test_struct.test_asa',
        'test_struct.test_contact',
        'test_struct.test_annotation',
        'test_struct.test_selection',
        'test_struct.test_manipulation',
        'test_util.test_unit_test',
        'test_util.test_array',
        'test_util.test_dict2d',
        'test_util.test_misc',
        'test_util.test_organizer',
        'test_util.test_recode_alignment',
        'test_util.test_table.rst',
        'test_util.test_transform',
        ]

    try:
        import matplotlib
    except:
        print >> sys.stderr, "No matplotlib so not running test_draw.py"
    else:
        modules_to_test.append('test_draw')
        modules_to_test.append('test_draw.test_distribution_plots')

    #Try importing modules for app controllers
    apps = [('formatdb', 'test_formatdb'),
            ('blastall', 'test_blast'),
            ('blat', 'test_blat'),
            ('bwa', 'test_bwa'),
            ('carnac', 'test_carnac'),
            ('clearcut', 'test_clearcut'),
            ('clustalw', 'test_clustalw'),
            ('cmalign', 'test_infernal'),
            ('cmfinder.pl', 'test_cmfinder'),
            ('comrna', 'test_comrna'),
            ('contrafold', 'test_contrafold'),
            ('covea', 'test_cove'),
            ('dialign2-2', 'test_dialign'),
            ('dynalign', 'test_dynalign'),
            ('FastTree', 'test_fasttree'),
            ('foldalign', 'test_foldalign'),
            ('guppy', 'test_guppy'),
            ('ilm', 'test_ilm'),
            ('knetfold.pl', 'test_knetfold'),
            ('mafft', 'test_mafft'),
            ('mfold', 'test_mfold'),
            ('mothur', 'test_mothur'),
            ('muscle', 'test_muscle_v38'),
            ('msms', 'test_msms'),
            ('ParsInsert', 'test_parsinsert'),
            ('pplacer', 'test_pplacer'),
            ('rdp_classifier-2.2.jar', 'test_rdp_classifier'),
            ('rdp_classifier-2.0.jar', 'test_rdp_classifier20'),
            ('Fold.out', 'test_nupack'),
            ('findphyl', 'test_pfold'),
            ('pknotsRG-1.2-i386-linux-static', 'test_pknotsrg'),
            ('RNAalifold', 'test_rnaalifold'),
            ('rnaview', 'test_rnaview'),
            ('RNAfold', 'test_vienna_package'),
            ('raxmlHPC', 'test_raxml_v730'),
            ('rtax', 'test_rtax'),
            ('sfold.X86_64.LINUX', 'test_sfold'),
            ('stride', 'test_stride'),
            ('hybrid-ss-min', 'test_unafold'),
            ('cd-hit', 'test_cd_hit'),
            ('calculate_likelihood', 'test_gctmpca'),
            ('sfffile', 'test_sfffile'),
            ('sffinfo', 'test_sffinfo'),
            ('uclust','test_uclust'),
            ('usearch','test_usearch')
            ]
    for app, test_name in apps:
        should_run_test = False
        if app_path(app):
            should_run_test = True
        elif app.startswith('rdp_classifier') and os.environ.get('RDP_JAR_PATH'):
            # This is ugly, but because this is a jar file, it won't be in
            # $PATH -- we require users to set an environment variable to
            # point to the location of this jar file, so we test for that.
            # My new version of app_path can be applied to do smarter checks,
            # but will involve some re-write of how we check whether tests can
            # be run. -Greg
            if app == os.path.basename(os.environ.get('RDP_JAR_PATH')):
                should_run_test = True
        if should_run_test:
            modules_to_test.append('test_app.' + test_name)
        else:
            print >> sys.stderr, "Can't find %s executable: skipping test" % app

    if app_path('muscle'):
        modules_to_test.append('test_format.test_pdb_color')

    # we now toggle the db tests, based on an environment flag
    if int(os.environ.get('TEST_DB', 0)):
        db_tests = ['test_db.test_ncbi', 'test_db.test_pdb',
                    'test_db.test_rfam', 'test_db.test_util']

        # we check for an environment flag for ENSEMBL
        # we expect this to have the username and account for a localhost
        # installation of the Ensembl MySQL databases
        if 'ENSEMBL_ACCOUNT' in os.environ:
            # check for cogent.db.ensembl dependencies
            test_ensembl = True
            for module in ['MySQLdb', 'sqlalchemy']:
                if not module_present(module):
                    test_ensembl = False
                    print >> sys.stderr, \
                        "Module '%s' not present: skipping test" % module

            if test_ensembl:
                db_tests += ['test_db.test_ensembl.test_assembly',
                             'test_db.test_ensembl.test_database',
                             'test_db.test_ensembl.test_compara',
                             'test_db.test_ensembl.test_genome',
                             'test_db.test_ensembl.test_host',
                             'test_db.test_ensembl.test_metazoa',
                             'test_db.test_ensembl.test_species',
                             'test_db.test_ensembl.test_feature_level']
        else:
            print >> sys.stderr, "Environment variable ENSEMBL_ACCOUNT not "\
                "set: skipping db.ensembl tests"

        for db_test in db_tests:
            modules_to_test.append(db_test)
    else:
        print >> sys.stderr, \
            "Environment variable TEST_DB=1 not set: skipping db tests"

    assert sys.version_info >= (2, 6)

    alltests = unittest.TestSuite()

    for module in modules_to_test:
        if module.endswith('.rst'):
            module = os.path.join(*module.split(".")[:-1]) + ".rst"
            test = doctest.DocFileSuite(module, optionflags=
                doctest.REPORT_ONLY_FIRST_FAILURE |
                doctest.ELLIPSIS)
        else:
            test = unittest.findTestCases(my_import(module))
        alltests.addTest(test)
    return alltests
class BoobyTrappedStream(object):
    def __init__(self, output):
        self.output = output

    def write(self, text):
        self.output.write(text)
        raise RuntimeError, "Output not allowed in tests"

    def flush(self):
        pass

    def isatty(self):
        return False

if __name__ == '__main__':
    if '--debug' in sys.argv:
        s = suite()
        s.debug()
    else:
        orig = sys.stdout
        if '--output-ok' in sys.argv:
            sys.argv.remove('--output-ok')
        else:
            sys.stdout = BoobyTrappedStream(orig)
        try:
            unittest.main(defaultTest='suite', argv=sys.argv)
        finally:
            sys.stdout = orig
#!/usr/bin/env python
import sys #,hotshot
from cogent.evolve.substitution_model import Nucleotide, Dinucleotide, Codon
from cogent import LoadSeqs, LoadTree
from cogent.maths import optimisers
from cogent.util import parallel
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
ALIGNMENT = LoadSeqs(filename="data/brca1.fasta")
TREE = LoadTree(filename="data/murphy.tree")
def subtree(size):
    names = ALIGNMENT.getSeqNames()[:size]
    assert len(names) == size
    tree = TREE.getSubTree(names) #.balanced()
    return names, tree

def brca_test(subMod, names, tree, length, par_rules, **kw):
    #names = ALIGNMENT.getSeqNames()[:taxa]
    #assert len(names) == taxa
    tree = TREE.getSubTree(names) #.balanced()
    aln = ALIGNMENT.takeSeqs(names).omitGapPositions()[:length]
    assert len(aln) == length, (len(aln), length)
    #the_tree_analysis = LikelihoodFunction(treeobj = tree, submodelobj = subMod, alignobj = aln)
    par_controller = subMod.makeParamController(tree, **kw)
    for par_rule in par_rules:
        par_controller.setParamRule(**par_rule)
    #lf = par_controller.makeCalculator(aln)
    return (par_controller, aln)

def measure_evals_per_sec(pc, aln):
    pc.setAlignment(aln)
    return pc.measureEvalsPerSecond(time_limit=2.0, wall=False)

def makePC(modelClass, parameterisation, length, taxa, tree, opt_mprobs, **kw):
    modelClass = eval(modelClass)
    if parameterisation is not None:
        predicates = {'silly': silly_predicate}
        par_rules = [{'par_name':'silly', 'is_independent':parameterisation}]
    else:
        predicates = {}
        par_rules = []
    subMod = modelClass(equal_motif_probs=True, optimise_motif_probs=opt_mprobs,
            predicates=predicates, recode_gaps=True, mprob_model="conditional")
    (pc, aln) = brca_test(subMod, taxa, tree, length, par_rules, **kw)
    return (pc, aln)

def quiet(f, *args, **kw):
    import sys, cStringIO
    temp = cStringIO.StringIO()
    _stdout = sys.stdout
    try:
        sys.stdout = temp
        result = f(*args, **kw)
    finally:
        #pass
        sys.stdout = _stdout
    return result

def evals_per_sec(*args):
    pc, aln = makePC(*args) #quiet(makeLF, *args)
    speed1 = measure_evals_per_sec(pc, aln)
    speed = str(int(speed1))
    return speed

class CompareImplementations(object):
    def __init__(self, switch):
        self.switch = switch

    def __call__(self, *args):
        self.switch(0)
        (pc,aln) = quiet(makePC, *args)
        speed1 = measure_evals_per_sec(pc,aln)
        self.switch(1)
        (pc,aln) = quiet(makePC, *args)
        speed2 = measure_evals_per_sec(pc,aln)
        if speed1 < speed2:
            speed = '+%2.1f' % (speed2/speed1)
        else:
            speed = '-%2.1f' % (speed1/speed2)
        if speed in ['+1.0', '-1.0']:
            speed = ''
        return speed

def benchmarks(test):
    alphabets = ["Nucleotide", "Dinucleotide", "Codon"]
    sequence_lengths = [18, 2004]
    treesizes = [5, 20]

    for (optimise_motifs, parameterisation) in [
            (False, 'global'), (False, 'local'), (True, 'global')]:
        print parameterisation, ['', 'opt motifs'][optimise_motifs]
        print ' ' * 14,
        wcol = 5*len(sequence_lengths) + 2
        for alphabet in alphabets:
            print str(alphabet).ljust(wcol),
        print
        print '%-15s' % "", # "length"
        for alphabet in alphabets:
            for sequence_length in sequence_lengths:
                print "%4s" % sequence_length,
            print ' ',
        print
        print ' '*12 + (' | '.join(['']+['-'*(len(sequence_lengths)*5) for alphabet in alphabets]+['']))
        for treesize in treesizes:
            print ("%4s taxa | " % treesize),
            (taxa, tree) = subtree(treesize)
            for alphabet in alphabets:
                for sequence_length in sequence_lengths:
                    speed = test(alphabet, parameterisation=='local',
                            sequence_length, taxa, tree, optimise_motifs)
                    print "%4s" % speed,
                print '| ',
            print
        print
    print

def silly_predicate(a,b):
    return a.count('A') > a.count('T') or b.count('A') > b.count('T')
#def asym_predicate((a,b)):
# print a, b, 'a' in a
# return 'a' in a
#mA = Codon()
#mA.setPredicates({'asym': asym_predicate})
def exponentiator_switch(switch):
    import cogent.evolve.substitution_calculation
    cogent.evolve.substitution_calculation.use_new = switch

import sys
if 'relative' in sys.argv:
    test = CompareImplementations(exponentiator_switch)
else:
    test = evals_per_sec

parallel.inefficiency_forgiven = True

if parallel.getCommunicator().Get_rank() > 0:
    #benchmarks(test)
    quiet(benchmarks, test)
else:
    try:
        benchmarks(test)
    except KeyboardInterrupt:
        print ' OK'
#!/usr/bin/env python
import numpy
import time
from cogent import DNA
from cogent.align.align import classic_align_pairwise, make_dna_scoring_dict
__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Peter Maxwell"
__email__ = "pm67nz@gmail.com"
__status__ = "Production"
def _s2i(s):
    return numpy.array(['ATCG'.index(c) for c in s])

def test(r=1, **kw):
    S = make_dna_scoring_dict(10, -1, -8)
    seq2 = DNA.makeSequence('AAAATGCTTA' * r)
    seq1 = DNA.makeSequence('AATTTTGCTG' * r)
    t0 = time.time()
    aln = classic_align_pairwise(seq1, seq2, S, 10, 2, local=False, **kw)
    t = time.time() - t0
    # alignment cells (position pairs) computed per second
    return (len(seq1)*len(seq2))/t
if __name__ == '__main__':
    d = 2
    e = 1
    options = [(False, False), (True, False), (False, True)]
    template = "%10s " * 4
    print " 1000s positions per second"
    print template % ("size", "simple", "logs", "scaled")
    for r in [50, 100, 200, 500]:
        times = [test(r, use_logs=l, use_scaling=s) for (l,s) in options]
        print template % tuple([r*10] + [int(t/1000) for t in times])
>>> import numpy
>>> from cogent import DNA
>>> from cogent.util.dict_array import DictArrayTemplate
>>> a = numpy.identity(3, int)
>>> b = DictArrayTemplate('abc', 'ABC').wrap(a)
>>> b[0]
===========
A B C
-----------
1 0 0
-----------
>>> b['a']
===========
A B C
-----------
1 0 0
-----------
>>> b.keys()
['a', 'b', 'c']
>>> row = b['a']
>>> row.keys()
['A', 'B', 'C']
>>> list(row)
[1, 0, 0]
>>> sum(row)
1
>>> # Dimensions can also be ordinary integers
>>> b = DictArrayTemplate(3, 3).wrap(a)
>>> b.keys()
[0, 1, 2]
>>> b[0].keys()
[0, 1, 2]
>>> sum(b[0])
1
>>> # Or a mix
>>> b = DictArrayTemplate('ABC', 3).wrap(a)
>>> b.keys()
['A', 'B', 'C']
>>> b['A'].keys()
[0, 1, 2]

``DictArray`` should work properly in ``numpy`` operations.

>>> darr = DictArrayTemplate(list(DNA), list(DNA)).wrap([[.7,.1,.1,.1],
... [.1,.7,.1,.1],
... [.1,.1,.7,.1],
... [.1,.1,.1,.7]])
>>> mprobs = numpy.array([0.25, 0.25, 0.25, 0.25])
>>> print mprobs.dot(darr)
[ 0.25 0.25 0.25 0.25]
>>> print numpy.dot(mprobs, darr)
[ 0.25 0.25 0.25 0.25]
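
The ``[ 0.25 0.25 0.25 0.25]`` result follows from the matrix itself: for a row vector of motif probabilities, each output entry is ``sum_i mprobs[i] * P[i, j]``. Every column of this matrix sums to 0.7 + 3 * 0.1 = 1, so the uniform distribution is left unchanged. The sketch below assumes only that NumPy is installed and uses a plain array as a stand-in for the ``DictArray``. It checks this arithmetic explicitly:

```python
import numpy as np

# Plain-array stand-in for the 4x4 DictArray above (rows: from, columns: to).
P = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7]])
mprobs = np.array([0.25, 0.25, 0.25, 0.25])

# Row vector times matrix: result[j] = sum over i of mprobs[i] * P[i, j].
result = mprobs.dot(P)
explicit = np.array([sum(mprobs[i] * P[i, j] for i in range(4))
                     for j in range(4)])

# Each column of P sums to 1, so the uniform vector maps onto itself.
column_sums = P.sum(axis=0)
print(result, column_sums)
```
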
#!/usr/bin/env python
"""This script/module can do any of 5 different actions with the figures it makes.
When run as a script, one of the actions must be specified:
- exercise
The default when used as a module. Drawing code is used to make a
matplotlib figure object but that is all.
- record
Save PNG images of the figures in draw_results/baseline/
To be used before making changes in drawing code.
- check
Compare figures with saved baseline figures. Fail if they don't match.
Failed figures are saved in draw_results/current. Also makes an HTML
page comparing them with the baseline images.
- compare
Save ALL differing figures in draw_results/current and make HTML page
comparing them with the baseline images.
- view
Save all differing figures in draw_results/current and make HTML page
comparing them with the baseline images, along with all the matching
figures too.
"""
import warnings
warnings.filterwarnings('ignore', category=UserWarning, module='matplotlib')
import matplotlib
matplotlib.use('Agg')
import unittest
import sys, os, cStringIO
from cogent import DNA, LoadTree, LoadSeqs
from cogent.core import alignment, alphabet, annotation
from cogent.draw.linear import *
from cogent.draw.dendrogram import *
from cogent.draw.compatibility import partimatrix
__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
               "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
def file_for_test(msg, baseline=False, prefixed=True):
    file_ext = "png"
    dirname = 'baseline' if baseline else 'current'
    if prefixed:
        dirname = os.path.join('draw_results', dirname)
    fname = msg.replace(' ', '_') + '.' + file_ext
    return os.path.join(dirname, fname)

def fig2png(fig):
    f = cStringIO.StringIO()
    fig.savefig(f, format='png')
    return f.getvalue()

def writefile(fname, content):
    dirname = os.path.dirname(fname)
    if not os.path.exists(dirname):
        os.makedirs(dirname)
    f = open(fname, 'wb')
    f.write(content)
    f.close()

def exercise(msg, fig):
    pass

def record(msg, fig):
    png = fig2png(fig)
    fname = file_for_test(msg, True)
    writefile(fname, png)
class CheckOutput(object):
    def __init__(self, failOnDifference=True, showAll=False):
        if not os.path.exists('draw_results/baseline'):
            raise RuntimeError(
                'No baseline found. Run "test_draw.py record" first')
        self.results = []
        self.failOnDifference = failOnDifference
        self.showAll = showAll
        self.anyFailures = False

    def __call__(self, msg, fig):
        fname = file_for_test(msg, True)
        observed = fig2png(fig)
        if os.path.exists(fname):
            expected = open(fname, 'rb').read()
            different = observed != expected
            self.results.append((msg, different))
            if different:
                self.anyFailures = True
                writefile(file_for_test(msg, False), observed)
                if self.failOnDifference:
                    raise AssertionError('See draw_results/comparison.html')
                else:
                    print 'difference from', fname
        else:
            raise RuntimeError('No baseline image at %s' % fname)

    def writeHTML(self):
        html = ['Drawing Test Output ',
                '']
        html.append('%s figures of which %s differ from baseline' % (
                len(self.results), sum(d for (m,d) in self.results)))
        for (msg, different) in self.results:
            fn1 = file_for_test(msg, True, False)
            fn2 = file_for_test(msg, False, False)
            if different:
                html.append('
%s ' % msg)
                html.append('Old ')
                html.append(' ' % fn1)
                html.append('New ')
                html.append(' ' % fn2)
            elif self.showAll:
                html.append('%s ' % msg)
                html.append(' ' % fn1)
            else:
                html.append('%s
' % msg)
        html.append(' ')
        html.append('')
        html = '\n'.join(html)
        f = open('draw_results/comparison.html', 'w')
        f.write(html)
        f.close()

def report(self):
self.writeHTML()
if self.anyFailures or self.showAll:
if sys.platform == 'darwin':
import subprocess
subprocess.call(['open', 'draw_results/comparison.html'])
else:
print "See draw_results/comparison.html"
def do(msg, display, **kw):
fig = display.makeFigure(**kw)
test_figure(msg, fig)
def makeSampleSequence():
seq = 'tgccnwsrygagcgtgttaaacaatggccaactctctaccttcctatgttaaacaagtgagatcgcaggcgcgccaaggc'
seq = DNA.makeSequence(seq)
v = seq.addAnnotation(annotation.Feature, 'exon', 'exon', [(20,35)])
v = seq.addAnnotation(annotation.Feature, 'repeat_unit', 'repeat_unit', [(39,49)])
v = seq.addAnnotation(annotation.Feature, 'repeat_unit', 'rep2', [(49,60)])
return seq
def makeSampleAlignment():
# there must be an easier way to make an alignment of annotated sequences!
from cogent.align.align import global_pairwise, make_dna_scoring_dict
DNA = make_dna_scoring_dict(10, -8, -8)
seq1 = makeSampleSequence()[:-2]
seq2 = makeSampleSequence()[2:]
seq2 = seq2[:30] + seq2[50:]
seq1.Name = 'FAKE01'
seq2.Name = 'FAKE02'
names = (seq1.getName(), seq2.getName())
align = global_pairwise(seq1, seq2, DNA, 2, 1)
align.addAnnotation(annotation.Variable, 'redline', 'align', [((0,15),1),((15,30),2),((30,45),3)])
align.addAnnotation(annotation.Variable, 'blueline', 'align', [((0,15),1.5),((15,30),2.5),((30,45),3.5)])
return align
seq = makeSampleSequence()
a = seq.addAnnotation(annotation.Variable, 'blueline', 'seq', [((0,15),1),((15,30),2),((30,45),3)])
v = seq.addAnnotation(annotation.Feature, 'gene', 'gene', [(0,15),(20,35),(40,55)])
b = v.addAnnotation(annotation.Variable, 'redline', 'feat', [((0,15),1.5),((15,30),2.5),((30,45),3.5)])
align = makeSampleAlignment()
def green_cg(seq):
seq = str(seq)
posn = 0
result = []
while True:
last = posn
posn = seq.find('CG', posn)
if posn < 0: break
result.append('k' * (posn-last)+'gg')
posn += 2
result.append('k' * (len(seq)-last))
return list(''.join(result))
class DrawingTests(unittest.TestCase):
def test_seqs(self):
seqd = Display(seq)
do('sequence wrapped at 50',
seqd, rowlen=50)
small = FontProperties(size=7, stretch='extra-condensed')
do('squashed sequence',
seqd.copy(seq_font=small, colour_sequences=True))
do('seq display slice from 5 to 45 starts %s' % seq[5:8],
seqd[5:45])
def test_alns(self):
alignd = Display(align, colour_sequences=True, min_feature_height=10)
do('coloured text alignment',
alignd)
do('coloured alignment no text',
alignd.copy(show_text=False))
do('no text and no colour',
alignd.copy(show_text=False, colour_sequences=False))
do('no shapes',
alignd.copy(show_text=False, draw_bases=False))
do('no text or colour or shapes',
alignd.copy(show_text=False, colour_sequences=False, draw_bases=False))
do('green seqs',
alignd.copy(seq_color_callback=green_cg))
def test_legend(self):
from cogent.draw.legend import Legend
do('Feature Legend', Legend())
def test_dotplot(self):
from cogent.draw.dotplot import Display2D
do('2d', Display2D(seq, seq[:40], show_text=False, draw_bases=False))
def test_trees(self):
treestring = "((A:.1,B:.22)ab:.3,((C:.4,D:.5)cd:.55,E:.6)cde:.7,F:.2)"
for edge in 'ABCDEF':
treestring = treestring.replace(edge, edge+edge.lower()*10)
t = LoadTree(treestring=treestring)
for klass in [
UnrootedDendrogram,
SquareDendrogram,
ContemporaneousDendrogram,
ShelvedDendrogram,
# StraightDendrogram,
# ContemporaneousStraightDendrogram
]:
dendro = klass(t)
dendro.getConnectingNode('Ccccccccccc', 'Eeeeeeeeeee').setCollapsed(
color="green", label="C, D and E")
do(klass.__name__, dendro, shade_param="length",
show_params=["length"])
def callback(edge):
return ["blue", "red"][edge.Name.startswith("A")]
do("Highlight edge A", UnrootedDendrogram(t), edge_color_callback=callback)
def test_partimatrix(self):
aln = LoadSeqs(filename='data/brca1.fasta', moltype=DNA)
species5 = ['Human','HowlerMon','Mouse','NineBande','DogFaced']
aln = aln.takeSeqs(species5)
aln = aln[:500]
fig = partimatrix(aln, samples=0, display=True, print_stats=False,
s_limit=10, title="brca1")
test_figure('compatibility', fig)
if __name__ != "__main__":
test_figure = exercise
else:
myargs = []
for arg in ['exercise', 'record', 'check', 'compare', 'view']:
if arg in sys.argv:
sys.argv.remove(arg)
myargs.append(arg)
if len(myargs) != 1:
print 'Need one action, got', myargs
print __doc__
sys.exit(1)
action = myargs[0]
if action == 'record':
test_figure = record
elif action == 'check':
test_figure = CheckOutput(True)
elif action == 'compare':
test_figure = CheckOutput(False)
elif action == 'view':
test_figure = CheckOutput(False, True)
elif action == 'exercise':
test_figure = exercise
else:
raise RuntimeError('Unknown action %s' % action)
try:
unittest.main()
finally:
if hasattr(test_figure, 'report'):
test_figure.report()
PyCogent-1.5.3/tests/test_phylo.py
#! /usr/bin/env python
import unittest, os
import warnings
from numpy import log, exp
warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found')
from cogent.phylo.distance import EstimateDistances
from cogent.phylo.nj import nj, gnj
from cogent.phylo.least_squares import wls
from cogent import LoadSeqs, LoadTree
from cogent.phylo.tree_collection import LogLikelihoodScoredTreeCollection,\
WeightedTreeCollection, LoadTrees
from cogent.evolve.models import JC69, HKY85, F81
from cogent.phylo.consensus import majorityRule, weightedMajorityRule
from cogent.util.misc import remove_files
__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield",\
"Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
def Tree(t):
return LoadTree(treestring=t)
class ConsensusTests(unittest.TestCase):
def setUp(self):
self.trees = [
Tree("((a,b),(c,d));"),
Tree("((a,b),(c,d));"),
Tree("((a,c),(b,d));"),
Tree("((a,b),c,d);")]
data = zip(map(log, [0.4,0.4,0.05,0.15]), # emphasizing the a,b clade
self.trees)
data.sort()
data.reverse()
self.scored_trees = data
def test_majorityRule(self):
"""Tests for majority rule consensus trees"""
trees = self.trees
outtrees = majorityRule(trees, strict=False)
self.assertEqual(len(outtrees), 1)
self.assert_(outtrees[0].sameTopology(Tree("((c,d),(a,b));")))
outtrees = majorityRule(trees, strict=True)
self.assertEqual(len(outtrees), 1)
self.assert_(outtrees[0].sameTopology(Tree("(c,d,(a,b));")))
def test_consensus_from_scored_trees_collection(self):
"""tree collection should get same consensus as direct approach"""
sct = LogLikelihoodScoredTreeCollection([(1, t) for t in self.trees])
ct = sct.getConsensusTree()
self.assertTrue(ct.sameTopology(Tree("((c,d),(a,b));")))
def test_weighted_consensus_from_scored_trees_collection(self):
"""weighted consensus from a tree collection should be different"""
sct = LogLikelihoodScoredTreeCollection(self.scored_trees)
ct = sct.getConsensusTree()
self.assertTrue(ct.sameTopology(Tree("((a,b),(c,d));")))
def test_weighted_trees_satisyfing_cutoff(self):
"""build consensus tree from those satisfying cutoff"""
sct = LogLikelihoodScoredTreeCollection(self.scored_trees)
cts = sct.getWeightedTrees(cutoff=0.8)
expected_trees = [Tree(t) for t in "((a,b),(c,d));", "((a,b),(c,d));",
"((a,b),c,d);"]
for i in range(len(cts)):
cts[i][1].sameTopology(expected_trees[i])
ct = cts.getConsensusTree()
self.assertTrue(ct.sameTopology(Tree("((a,b),(c,d));")))
def test_tree_collection_read_write_file(self):
"""should correctly read / write a collection from a file"""
def eval_klass(coll):
coll.writeToFile('sample.trees')
read = LoadTrees('sample.trees')
self.assertTrue(type(read) == type(coll))
eval_klass(LogLikelihoodScoredTreeCollection(self.scored_trees))
# convert lnL into p
eval_klass(WeightedTreeCollection([(exp(s), t)
for s,t in self.scored_trees]))
remove_files(['sample.trees'], error_on_missing=False)
class TreeReconstructionTests(unittest.TestCase):
def setUp(self):
self.tree = LoadTree(treestring='((a:3,b:4):2,(c:6,d:7):30,(e:5,f:5):5)')
self.dists = self.tree.getDistances()
def assertTreeDistancesEqual(self, t1, t2):
d1 = t1.getDistances()
d2 = t2.getDistances()
self.assertEqual(len(d1), len(d2))
for key in d2:
self.assertAlmostEqual(d1[key], d2[key])
def test_nj(self):
"""testing nj"""
reconstructed = nj(self.dists)
self.assertTreeDistancesEqual(self.tree, reconstructed)
def test_gnj(self):
"""testing gnj"""
results = gnj(self.dists, keep=1)
(length, reconstructed) = results[0]
self.assertTreeDistancesEqual(self.tree, reconstructed)
results = gnj(self.dists, keep=10)
(length, reconstructed) = results[0]
self.assertTreeDistancesEqual(self.tree, reconstructed)
# Results should be a TreeCollection
len(results)
results.getConsensusTree()
# From GNJ paper. Pearson, Robins, Zhang 1999.
tied_dists = {
('a', 'b'):3, ('a', 'c'):3, ('a', 'd'):4, ('a', 'e'):3,
('b', 'c'):3, ('b', 'd'):3, ('b', 'e'):4,
('c', 'd'):3, ('c', 'e'):3,
('d', 'e'):3}
results = gnj(tied_dists, keep=3)
scores = [score for (score, tree) in results]
self.assertEqual(scores[:2], [7.75, 7.75])
self.assertNotEqual(scores[2], 7.75)
def test_wls(self):
"""testing wls"""
reconstructed = wls(self.dists, a=4)
self.assertTreeDistancesEqual(self.tree, reconstructed)
def test_truncated_wls(self):
"""testing wls with order option"""
order = ['e', 'b', 'c', 'd']
reconstructed = wls(self.dists, order=order)
self.assertEqual(set(reconstructed.getTipNames()), set(order))
def test_limited_wls(self):
"""testing (well, exercising at least), wls with constrained start"""
init = LoadTree(treestring='((a,c),b,d)')
reconstructed = wls(self.dists, start=init)
self.assertEqual(len(reconstructed.getTipNames()), 6)
init2 = LoadTree(treestring='((a,d),b,c)')
reconstructed = wls(self.dists, start=[init, init2])
self.assertEqual(len(reconstructed.getTipNames()), 6)
init3 = LoadTree(treestring='((a,d),b,z)')
self.assertRaises(Exception, wls, self.dists, start=[init, init3])
# if start tree has all seq names, should raise an error
self.assertRaises(Exception, wls, self.dists,
start=[LoadTree(treestring='((a,c),b,(d,(e,f)))')])
class DistancesTests(unittest.TestCase):
def setUp(self):
self.al = LoadSeqs(data = {'a':'GTACGTACGATC',
'b':'GTACGTACGTAC',
'c':'GTACGTACGTTC',
'e':'GTACGTACTGGT'})
self.collection = LoadSeqs(data = {'a':'GTACGTACGATC',
'b':'GTACGTACGTAC',
'c':'GTACGTACGTTC',
'e':'GTACGTACTGGT'}, aligned=False)
def assertDistsAlmostEqual(self, expected, observed, precision=4):
observed = dict([(frozenset(k),v) for (k,v) in observed.items()])
expected = dict([(frozenset(k),v) for (k,v) in expected.items()])
for key in expected:
self.assertAlmostEqual(expected[key], observed[key], precision)
def test_EstimateDistances(self):
"""testing (well, exercising at least), EstimateDistances"""
d = EstimateDistances(self.al, JC69())
d.run()
canned_result = {('b', 'e'): 0.440840,
('c', 'e'): 0.440840,
('a', 'c'): 0.088337,
('a', 'b'): 0.188486,
('a', 'e'): 0.440840,
('b', 'c'): 0.0883373}
result = d.getPairwiseDistances()
self.assertDistsAlmostEqual(canned_result, result)
# exercise writing to file
d.writeToFile('junk.txt')
try:
os.remove('junk.txt')
except OSError:
pass # probably parallel
def test_EstimateDistancesWithMotifProbs(self):
"""EstimateDistances with supplied motif probs"""
motif_probs= {'A':0.1,'C':0.2,'G':0.2,'T':0.5}
d = EstimateDistances(self.al, HKY85(), motif_probs=motif_probs)
d.run()
canned_result = {('a', 'c'): 0.07537,
('b', 'c'): 0.07537,
('a', 'e'): 0.39921,
('a', 'b'): 0.15096,
('b', 'e'): 0.39921,
('c', 'e'): 0.37243}
result = d.getPairwiseDistances()
self.assertDistsAlmostEqual(canned_result, result)
def test_EstimateDistances_fromThreeway(self):
"""testing (well, exercising at least), EsimateDistances fromThreeway"""
d = EstimateDistances(self.al, JC69(), threeway=True)
d.run()
canned_result = {('b', 'e'): 0.495312,
('c', 'e'): 0.479380,
('a', 'c'): 0.089934,
('a', 'b'): 0.190021,
('a', 'e'): 0.495305,
('b', 'c'): 0.0899339}
result = d.getPairwiseDistances(summary_function="mean")
self.assertDistsAlmostEqual(canned_result, result)
def test_EstimateDistances_fromUnaligned(self):
"""Excercising estimate distances from unaligned sequences"""
d = EstimateDistances(self.collection, JC69(), do_pair_align=True,
rigorous_align=True)
d.run()
canned_result = {('b', 'e'): 0.440840,
('c', 'e'): 0.440840,
('a', 'c'): 0.088337,
('a', 'b'): 0.188486,
('a', 'e'): 0.440840,
('b', 'c'): 0.0883373}
result = d.getPairwiseDistances()
self.assertDistsAlmostEqual(canned_result, result)
d = EstimateDistances(self.collection, JC69(), do_pair_align=True,
rigorous_align=False)
d.run()
canned_result = {('b', 'e'): 0.440840,
('c', 'e'): 0.440840,
('a', 'c'): 0.088337,
('a', 'b'): 0.188486,
('a', 'e'): 0.440840,
('b', 'c'): 0.0883373}
result = d.getPairwiseDistances()
self.assertDistsAlmostEqual(canned_result, result)
def test_EstimateDistances_other_model_params(self):
"""test getting other model params from EstimateDistances"""
d = EstimateDistances(self.al, HKY85(), est_params=['kappa'])
d.run()
# this will be a Number object with Mean, Median etc ..
kappa = d.getParamValues('kappa')
self.assertAlmostEqual(kappa.Mean, 0.8939, 4)
# this will be a dict of pairwise values; getParamValues (above) calls this
# method, so the correctness of its values has already been checked
kappa = d.getPairwiseParam('kappa')
def test_EstimateDistances_modify_lf(self):
"""tests modifying the lf"""
def constrain_fit(lf):
lf.setParamRule('kappa', is_constant=True)
lf.optimise(local=True)
return lf
d = EstimateDistances(self.al, HKY85(), modify_lf=constrain_fit)
d.run()
result = d.getPairwiseDistances()
d = EstimateDistances(self.al, F81())
d.run()
expect = d.getPairwiseDistances()
self.assertDistsAlmostEqual(expect, result)
if __name__ == '__main__':
unittest.main()
PyCogent-1.5.3/tests/test_recalculation.rst
A simple calculator
>>> from cogent.recalculation.definition import *
>>> def add(*args):
... return sum(args)
...
>>> top = CalcDefn(add)(ParamDefn('A'), ParamDefn('B'))
>>> pc = top.makeParamController()
>>> f = pc.makeCalculator()
f.getValueArray() shows the inputs, ie: the optimisable parameters
>>> f.getValueArray()
[1.0, 1.0]
The calculator can be called like a function
>>> f([3.0, 4.25])
7.25
Or just a subset of the inputs can be changed directly
>>> f.change([(1, 4.5)])
7.5
>>> f.getValueArray()
[3.0, 4.5]
Now with scopes. We will set up the calculation
result = (Ax+Bx) + (Ay+By) + (Az+Bz)
A and B will remain distinct parameters, but x,y and z are merely scopes - ie:
it may be the case that Ax = Ay = Az, and that may simplify the calculation, but
we will never even notice if Ax = Bx.
Each scope dimension (here there is just one, 'category') must be collapsed away
at some point towards the end of the calculation if the calculation is to produce
a scalar result. Here this is done with the selectFromDimension method.
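The arithmetic behind this scoped calculation can be sketched in plain Python. This is an illustration only, not how cogent.recalculation is implemented: scoped_total and global_param are hypothetical helpers, and each parameter is assumed to be stored as a dict mapping a category name to its value in that scope.

```python
# Plain-Python illustration of the scoped calculation described above.
# Does NOT use cogent.recalculation; scoped_total and global_param are
# hypothetical helpers chosen only to show the arithmetic.

def scoped_total(a, b, categories=("x", "y", "z")):
    """Return (Ax+Bx) + (Ay+By) + (Az+Bz).

    `a` and `b` map each category name to the parameter's value in that
    scope. The inner sum plays the role of the "mid" step; the outer sum
    collapses the 'category' dimension to a single scalar.
    """
    mid = dict((cat, a[cat] + b[cat]) for cat in categories)
    return sum(mid[cat] for cat in categories)


def global_param(value, categories=("x", "y", "z")):
    """A globally scoped parameter has one value shared by every category."""
    return dict((cat, value) for cat in categories)


# Both parameters global with the default value 1.0: (1 + 1) * 3
print(scoped_total(global_param(1.0), global_param(1.0)))  # 6.0
```

Setting Ax = Ay = Az = 2.0 while B stays at 1.0 gives scoped_total(global_param(2.0), global_param(1.0)) == 9.0, and lowering B to 0.25 gives 6.75, the same totals the calculator reports for [1.0, 2.0, 2.0, 2.0] and [0.25, 2.0, 2.0, 2.0].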
>>> A = ParamDefn('A', dimensions = ['category'])
>>> B = ParamDefn('B', dimensions = ['category'])
>>> mid = CalcDefn(add, name="mid")(A, B)
>>> top = CalcDefn(add)(
... mid.selectFromDimension('category', "x"),
... mid.selectFromDimension('category', "y"),
... mid.selectFromDimension('category', "z"))
...
>>> # or equivalently:
>>> # top = CalcDefn(add)(*mid.acrossDimension('category',
>>> #     ['x', 'y', 'z']))
>>>
>>> pc = top.makeParamController()
>>> f = pc.makeCalculator()
>>> f.getValueArray()
[1.0, 1.0]
There are still only 2 inputs because the default scope
is global, ie: Ax == Ay == Az. If we allow A to be
different in the x,y and z categories and set their
initial values to 2.0:
>>> pc.assignAll('A', value=2.0, independent=True)
>>> f = pc.makeCalculator()
>>> f.getValueArray()
[1.0, 2.0, 2.0, 2.0]
Now we have A local and B still global, so the calculation is
(Ax+B) + (Ay+B) + (Az+B) with the input parameters being
[B, Ax, Ay, Az], so:
>>> f([1.0, 2.0, 2.0, 2.0])
9.0
>>> f([0.25, 2.0, 2.0, 2.0])
6.75
Constants do not appear in the optimisable inputs.
Set one of the 3 A values to be a constant and there
will be one fewer optimisable parameter:
>>> pc.assignAll('A', scope_spec={'category':'z'}, const=True)
>>> f = pc.makeCalculator()
>>> f.getValueArray()
[1.0, 2.0, 2.0]
The parameter controller should catch cases where the specified scope
does not exist:
>>> pc.assignAll('A', scope_spec={'category':'nosuch'})
Traceback (most recent call last):
InvalidScopeError: ...
>>> pc.assignAll('A', scope_spec={'nonsuch':'nosuch'})
Traceback (most recent call last):
InvalidDimensionError: ...
Matching the parameters you expect to positions in the value array is
complicated guesswork, let alone remembering whether or not they are presented
to the optimiser as logs, so .getValueArray(), .change() and .__call__() should
only be used by optimisers. For other purposes there is an alternative,
human-friendly interface:
>>> pc.updateFromCalculator(f)
>>> pc.getParamValue('A', category='x')
2.0
>>> pc.getParamValue('B', category=['x', 'y'])
1.0
Despite the name, .getParamValue can get the value from any step in the
calculation, so long as it has a unique name.
>>> pc.getParamValue('mid', category='x')
3.0
For bulk retrieval of parameter values by parameter name and scope name there is
the .getParamValueDict() method:
>>> pc.getParamValueDict(['category']).keys()
['A', 'B']
>>> pc.getParamValueDict(['category'])['A']['x']
2.0
Here is a function that is more like a likelihood function, in that it has a
maximum:
>>> def curve(x, y):
... return 0 - (x**2 + y**2)
...
>>> top = CalcDefn(curve)(ParamDefn('X'), ParamDefn('Y'))
>>> pc = top.makeParamController()
>>> f = pc.makeCalculator()
Now ask it to find the maximum. It is a simple function with only one local
maximum so local optimisation should be enough:
>>> f.optimise(local=True)
>>> pc.updateFromCalculator(f)
There were two parameters, X and Y, and at the maximum they should both be 0.0:
>>> pc.getParamValue('Y')
0.0
>>> pc.getParamValue('X')
0.0
Because this function has a maximum it is possible to ask it for a confidence
interval around a parameter, ie: how far from 0.0 can we move x before f(x,y)
falls below f(X,Y)-dropoff:
>>> pc.getParamInterval('X', dropoff=4, xtol=0.0)
(-2.0, 0.0, 2.0)
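The interval search can be sketched as a bisection on a one-dimensional slice of the curve, assuming Y is held at its maximising value 0.0 (for this function that gives the same answer as re-maximising Y at each x). find_bound and profile are hypothetical helpers; this is not necessarily the root finder getParamInterval uses internally.

```python
# Sketch: find the dropoff interval around the maximum of
# curve(x, y) = -(x**2 + y**2) by bisection on x, holding y at 0.0.
# find_bound and profile are hypothetical helpers for illustration.

def curve(x, y):
    return 0 - (x**2 + y**2)


def find_bound(f, x_best, step, dropoff, tol=1e-10):
    """Return x on the `step` side of x_best where f(x) == f(x_best) - dropoff.

    Assumes f decreases monotonically away from x_best in that direction.
    """
    target = f(x_best) - dropoff
    inside, outside = x_best, x_best + step
    # Double the bracket width until it contains the crossing point.
    while f(outside) > target:
        inside, outside = outside, outside + (outside - x_best)
    # Bisect, keeping f(inside) >= target >= f(outside).
    while abs(outside - inside) > tol:
        middle = (inside + outside) / 2.0
        if f(middle) >= target:
            inside = middle
        else:
            outside = middle
    return (inside + outside) / 2.0


def profile(x):
    # One-dimensional slice of the curve with y fixed at its maximum.
    return curve(x, 0.0)


lower = find_bound(profile, 0.0, -1.0, dropoff=4)
upper = find_bound(profile, 0.0, 1.0, dropoff=4)
print("%.1f, %.1f, %.1f" % (lower, 0.0, upper))  # -2.0, 0.0, 2.0
```

Because f(X,Y) = 0 at the maximum, the bounds are simply where x**2 = 4, which is why the doctest expects -2.0 and 2.0.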
We also test that xtol can be omitted. The result is then only approximate, so the returned values are formatted to one decimal place before comparison.
>>> '-2.0, 0.0, 2.0' == "%.1f, %.1f, %.1f" % pc.getParamInterval('X', dropoff=4)
True
And finally intervals can be calculated in bulk by passing a dropoff value to
.getParamValueDict():
>>> pc.getParamValueDict([], dropoff=4, xtol=0.0)['X']
(-2.0, 0.0, 2.0)
For likelihood functions it is more convenient to provide 'p' rather than 'dropoff', where dropoff = chdtri(1, p) / 2.0. In general you won't need ultra-precise answers either, so don't use 'xtol=0.0'; it is only there to make the doctest work.
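The p-to-dropoff conversion can be checked with the standard library alone. A chi-square variable with 1 degree of freedom is the square of a standard normal, so chdtri(1, p), the value exceeded with probability p, equals z**2 where z is the (1 - p/2) quantile of the standard normal. dropoff_from_p is a hypothetical helper, and this sketch needs Python 3.8+ for statistics.NormalDist.

```python
# Sketch: dropoff = chdtri(1, p) / 2.0 using only the standard library.
# chdtri(1, p) is the chi-square (1 df) value exceeded with probability p;
# since chi-square(1) = Z**2, that value is z**2 with z = Phi^-1(1 - p/2).
# dropoff_from_p is a hypothetical helper (requires Python 3.8+).
from statistics import NormalDist


def dropoff_from_p(p):
    z = NormalDist().inv_cdf(1.0 - p / 2.0)
    return z * z / 2.0


print(round(dropoff_from_p(0.05), 3))  # 1.921
```

If SciPy is installed, scipy.special.chdtri(1, 0.05) / 2.0 should agree with this value to several decimal places.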
PyCogent-1.5.3/tests/timetrial.py
#!/usr/bin/env python
# Simple script to run another command a certain number of times,
# recording how long each run took, and writing the results out to a file.
import os
import os.path
import re
import string
import sys
import time
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Edward Lang"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
# Values that affect the running of the program.
minimum_accepted_time = 2
iterations = int(sys.argv[1])
args = sys.argv[2:]
script_re = re.compile(r"\.py$")  # match arguments that name a Python script
type = ""
for i in range(len(args)):
if script_re.search(args[i], 1):
script = args[i][0:string.index(args[i], '.')]
if args[i] == "mpirun":
type = "parallel"
if not type:
type = "serial"
output = "timing/" + script + "-" + str(int(time.time())) + "-" + type
def usage():
pass
def standard_dev(numbers = [], mean = 1):
import math
sum = 0
for i in range(len(numbers)):
sum = sum + math.pow(numbers[i] - mean, 2)
sigma = math.sqrt(sum / (len(numbers) - 1))
return sigma
def main():
if args:
command = ' '.join(map(str, args))
else:
usage()
sys.exit()
total_time = 0.0
times = []
print 'Running "%s" %d times...' % (command, iterations)
i = 0
attempt = 0
while i < iterations:
start_time = time.time()
os.system(command + " > " + output + "." + str(i))
end_time = time.time() - start_time
if end_time > minimum_accepted_time:
times.append(end_time)
total_time = total_time + end_time
print "Time for run %d: %.3f seconds" % (i, end_time)
i = i + 1
attempt = 0
else:
print "Discarding probably bogus time: %.3f seconds" % end_time
attempt = attempt + 1
if attempt == 5:
print "Aborting early due to multiple errors"
sys.exit(3)
times.sort()
mean = total_time / len(times)
sd = standard_dev(times, mean)
print ""
print "Fastest time : %.3f" % times[0]
print "Slowest time : %.3f" % times[len(times) - 1]
print "Mean : %.3f" % mean
print "Standard dev : %.3f" % sd
print "Total time : %.3f" % total_time
print ""
corrected_total = 0.0
corrected_times = []
for i in range(len(times)):
if abs(mean - times[i]) < sd:
corrected_times.append(times[i])
corrected_total = corrected_total + times[i]
else:
print "Discarding value '%.3f'" % times[i]
if len(times) != len(corrected_times):
corrected_mean = corrected_total / len(corrected_times)
corrected_sd = standard_dev(corrected_times, corrected_mean)
print ""
print "CORRECTED RESULTS"
print "Fastest time : %.3f" % corrected_times[0]
print "Slowest time : %.3f" % corrected_times[len(corrected_times)-1]
print "Mean : %.3f" % corrected_mean
print "Standard dev : %.3f" % corrected_sd
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_util/__init__.py
#!/usr/bin/env python
__all__ = ['test_unit_test', 'test_misc', 'test_array', 'test_dict2d',
'test_organizer', 'test_transform', 'test_recode_alignment']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Sandra Smit", "Gavin Huttley",
"Rob Knight", "Zongzhi Liu", "Amanda Birmingham",
"Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_util/test_array.py
#!/usr/bin/env python
"""Provides tests for array.py
"""
#SUPPORT2425
#from __future__ import with_statement
from cogent.util.unit_test import main, TestCase#, numpy_err
from cogent.util.array import gapped_to_ungapped, unmasked_to_masked, \
ungapped_to_gapped, masked_to_unmasked, pairs_to_array,\
ln_2, log2, safe_p_log_p, safe_log, row_uncertainty, column_uncertainty,\
row_degeneracy, column_degeneracy, hamming_distance, norm,\
euclidean_distance, \
count_simple, count_alphabet, \
is_complex, is_significantly_complex, \
has_neg_off_diags, has_neg_off_diags_naive, \
sum_neg_off_diags, sum_neg_off_diags_naive, \
scale_row_sum, scale_row_sum_naive, scale_trace, \
abs_diff, sq_diff, norm_diff, \
cartesian_product, with_diag, without_diag, \
only_nonzero, combine_dimensions, split_dimension, \
non_diag, perturb_one_off_diag, perturb_off_diag, \
merge_samples, sort_merged_samples_by_value, classifiers, \
minimize_error_count, minimize_error_rate, mutate_array
import numpy
Float = numpy.core.numerictypes.sctype2char(float)
from numpy import array, zeros, transpose, sqrt, reshape, arange, \
ravel, trace, ones
__author__ = "Rob Knight and Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class arrayTests(TestCase):
"""Tests of top-level functions."""
def setUp(self):
"""set up some standard sequences and masks"""
self.gap_state = array('-', 'c')
self.s1 = array('ACT-G', 'c')
self.s2 = array('--CT', 'c')
self.s3 = array('AC--', 'c')
self.s4 = array('AC', 'c')
self.s5 = array('--', 'c')
self.m1 = array([0,0,0,1,0])
self.m2 = array([1,1,0,0])
self.m3 = array([0,0,1,1])
self.m4 = array([0,0])
self.m5 = array([1,1])
def test_unmasked_to_masked(self):
"""unmasked_to_masked should match hand-calculated results"""
u2m = unmasked_to_masked
self.assertEqual(u2m(self.m1), array([0,1,2,4]))
self.assertEqual(u2m(self.m2), array([2,3]))
self.assertEqual(u2m(self.m3), array([0,1]))
self.assertEqual(u2m(self.m4), array([0,1]))
self.assertEqual(u2m(self.m5), array([]))
def test_ungapped_to_gapped(self):
"""ungapped_to_gapped should match hand-calculated results"""
u2g = ungapped_to_gapped
gap_state = self.gap_state
self.assertEqual(u2g(self.s1, gap_state), array([0,1,2,4]))
self.assertEqual(u2g(self.s2, gap_state), array([2,3]))
self.assertEqual(u2g(self.s3, gap_state), array([0,1]))
self.assertEqual(u2g(self.s4, gap_state), array([0,1]))
self.assertEqual(u2g(self.s5, gap_state), array([]))
def test_masked_to_unmasked(self):
"""masked_to_unmasked should match hand-calculated results"""
m2u = masked_to_unmasked
self.assertEqual(m2u(self.m1), array([0,1,2,2,3]))
self.assertEqual(m2u(self.m1, True), array([0,1,2,-1,3]))
self.assertEqual(m2u(self.m2), array([-1,-1,0,1]))
self.assertEqual(m2u(self.m2, True), array([-1,-1,0,1]))
self.assertEqual(m2u(self.m3), array([0,1,1,1]))
self.assertEqual(m2u(self.m3, True), array([0,1,-1,-1]))
self.assertEqual(m2u(self.m4), array([0,1]))
self.assertEqual(m2u(self.m4, True), array([0,1]))
self.assertEqual(m2u(self.m5), array([-1,-1]))
self.assertEqual(m2u(self.m5, True), array([-1,-1]))
def test_gapped_to_ungapped(self):
"""gapped_to_ungapped should match hand-calculated results"""
g2u = gapped_to_ungapped
gap_state = self.gap_state
self.assertEqual(g2u(self.s1, gap_state), array([0,1,2,2,3]))
self.assertEqual(g2u(self.s1, gap_state, True), array([0,1,2,-1,3]))
self.assertEqual(g2u(self.s2, gap_state), array([-1,-1,0,1]))
self.assertEqual(g2u(self.s2, gap_state, True), array([-1,-1,0,1]))
self.assertEqual(g2u(self.s3, gap_state), array([0,1,1,1]))
self.assertEqual(g2u(self.s3, gap_state, True), array([0,1,-1,-1]))
self.assertEqual(g2u(self.s4, gap_state), array([0,1]))
self.assertEqual(g2u(self.s4, gap_state, True), array([0,1]))
self.assertEqual(g2u(self.s5, gap_state), array([-1,-1]))
self.assertEqual(g2u(self.s5, gap_state, True), array([-1,-1]))
def test_pairs_to_array(self):
"""pairs_to_array should match hand-calculated results"""
p2a = pairs_to_array
p1 = [0, 1, 0.5]
p2 = [2, 3, 0.9]
p3 = [1, 2, 0.6]
pairs = [p1, p2, p3]
self.assertEqual(p2a(pairs), \
array([[0,.5,0,0],[0,0,.6,0],[0,0,0,.9],[0,0,0,0]]))
#try it without weights -- should assign 1
new_pairs = [[0,1],[2,3],[1,2]]
self.assertEqual(p2a(new_pairs), \
array([[0,1,0,0],[0,0,1,0],[0,0,0,1],[0,0,0,0]]))
#try it with explicit array size
self.assertEqual(p2a(pairs, 5), \
array([[0,.5,0,0,0],[0,0,.6,0,0],[0,0,0,.9,0],[0,0,0,0,0],\
[0,0,0,0,0]]))
#try it when we want to map the indices into gapped coords
#we're effectively doing ABCD -> -A--BC-D-
transform = array([1,4,5,7])
result = p2a(pairs, transform=transform)
self.assertEqual(result.shape, (8,8))
exp = zeros((8,8), Float)
exp[1,4] = 0.5
exp[4,5] = 0.6
exp[5,7] = 0.9
self.assertEqual(result, exp)
result = p2a(pairs, num_items=9, transform=transform)
self.assertEqual(result.shape, (9,9))
exp = zeros((9,9), Float)
exp[1,4] = 0.5
exp[4,5] = 0.6
exp[5,7] = 0.9
self.assertEqual(result, exp)
class ArrayMathTests(TestCase):
def test_ln_2(self):
"""ln_2: should be constant"""
self.assertFloatEqual(ln_2, 0.693147)
def test_log2(self):
"""log2: should work fine on positive/negative numbers and zero"""
self.assertEqual(log2(1),0)
self.assertEqual(log2(2),1)
self.assertEqual(log2(4),2)
self.assertEqual(log2(8),3)
#SUPPORT2425
#with numpy_err(divide='ignore'):
ori_err = numpy.geterr()
numpy.seterr(divide='ignore')
try:
try:
self.assertEqual(log2(0),float('-inf'))
except (ValueError, OverflowError): #platform-dependent
pass
finally:
numpy.seterr(**ori_err)
#SUPPORT2425
ori_err = numpy.geterr()
numpy.seterr(divide='raise')
try:
#with numpy_err(divide='raise'):
self.assertRaises(FloatingPointError, log2, 0)
finally:
numpy.seterr(**ori_err)
#nan is the only thing that's not equal to itself
try:
self.assertNotEqual(log2(-1),log2(-1)) #now nan
except ValueError:
pass
def test_safe_p_log_p(self):
"""safe_p_log_p: should handle pos/neg/zero/empty arrays as expected
"""
#normal valid array
a = array([[4,0,8],[2,16,4]])
self.assertEqual(safe_p_log_p(a),array([[-8,0,-24],[-2,-64,-8]]))
#just zeros
a = array([[0,0],[0,0]])
self.assertEqual(safe_p_log_p(a),array([[0,0],[0,0]]))
#negative number -- skip
self.assertEqual(safe_p_log_p(array([-4])), array([0]))
#integer input, float output
self.assertFloatEqual(safe_p_log_p(array([3])),array([-4.75488750]))
#empty array
self.assertEqual(safe_p_log_p(array([])),array([]))
def test_safe_log(self):
"""safe_log: should handle pos/neg/zero/empty arrays as expected
"""
#normal valid array
a = array([[4,0,8],[2,16,4]])
self.assertEqual(safe_log(a),array([[2,0,3],[1,4,2]]))
#input integers, output floats
self.assertFloatEqual(safe_log(array([1,2,3])),array([0,1,1.5849625]))
#just zeros
a = array([[0,0],[0,0]])
self.assertEqual(safe_log(a),array([[0,0],[0,0]]))
#negative number
try:
self.assertFloatEqual(safe_log(array([0,3,-4]))[0:2], \
array([0,1.5849625007]))
except ValueError: #platform-dependent
pass
try:
self.assertNotEqual(safe_log(array([0,3,-4]))[2],\
safe_log(array([0,3,-4]))[2])
except ValueError: #platform-dependent
pass
#empty array
self.assertEqual(safe_log(array([])),array([]))
#double empty array
self.assertEqual(safe_log(array([[]])),array([[]]))
def test_row_uncertainty(self):
"""row_uncertainty: should handle pos/neg/zero/empty arrays as expected
"""
#normal valid array
b = transpose(array([[.25,.2,.45,.25,1],[.25,.2,.45,0,0],\
[.25,.3,.05,.75,0],[.25,.3,.05,0,0]]))
self.assertFloatEqual(row_uncertainty(b),[2,1.97,1.47,0.81,0],1e-3)
#one-dimensional array
self.assertRaises(ValueError, row_uncertainty,\
array([.25,.25,.25,.25]))
#zeros
self.assertEqual(row_uncertainty(array([[0,0]])),array([0]))
#empty 2D array
self.assertEqual(row_uncertainty(array([[]])),array([0]))
self.assertEqual(row_uncertainty(array([[],[]])),array([0,0]))
#negative number -- skip
self.assertEqual(row_uncertainty(array([[-2]])), array([0]))
def test_col_uncertainty(self):
"""column_uncertainty: should handle pos/neg/zero/empty arrays
"""
b = array([[.25,.2,.45,.25,1],[.25,.2,.45,0,0],[.25,.3,.05,.75,0],\
[.25,.3,.05,0,0]])
self.assertFloatEqual(column_uncertainty(b),[2,1.97,1.47,0.81,0],1e-3)
#one-dimensional array
self.assertRaises(ValueError, column_uncertainty,\
array([.25,.25,.25,.25]))
#zeros
self.assertEqual(column_uncertainty(array([[0,0]])),array([0,0]))
#empty 2D array
self.assertEqual(column_uncertainty(array([[]])),array([]))
self.assertEqual(column_uncertainty(array([[],[]])),array([]))
#negative number -- skip
self.assertEqual(column_uncertainty(array([[-2]])), array([0]))
def test_row_degeneracy(self):
"""row_degeneracy: should work with different cutoff values and arrays
"""
a = array([[.1, .3, .4, .2],[.5, .3, 0, .2],[.8, 0, .1, .1]])
self.assertEqual(row_degeneracy(a,cutoff=.75),[3,2,1])
self.assertEqual(row_degeneracy(a,cutoff=.95),[4,3,3])
#one-dimensional array
self.assertRaises(ValueError, row_degeneracy,\
array([.25,.25,.25,.25]))
#if cutoff value is not found, results are clipped to the
#number of columns in the array
self.assertEqual(row_degeneracy(a,cutoff=2), [4,4,4])
#same behavior on empty array
self.assertEqual(row_degeneracy(array([[]])),[])
def test_column_degeneracy(self):
"""column_degeneracy: should work with different cutoff values
"""
a = array([[.1,.8,.3],[.3,.2,.3],[.6,0,.4]])
self.assertEqual(column_degeneracy(a,cutoff=.75),[2,1,3])
self.assertEqual(column_degeneracy(a,cutoff=.45),[1,1,2])
#one-dimensional array
self.assertRaises(ValueError, column_degeneracy,\
array([.25,.25,.25,.25]))
#if cutoff value is not found, results are clipped to the
#number of rows in the array
self.assertEqual(column_degeneracy(a,cutoff=2), [3,3,3])
#same behavior on empty array
self.assertEqual(column_degeneracy(array([[]])),[])
def test_hamming_distance_same_length(self):
"""hamming_distance: should return # of chars different"""
hd = hamming_distance(array('ABC','c'),array('ABB','c'))
self.assertEqual(hd,1)
self.assertEqual(hamming_distance(array('ABC', 'c'),array('ABC', 'c')),0)
self.assertEqual(hamming_distance(array('ABC', 'c'),array('DDD', 'c')),3)
def test_hamming_distance_diff_length(self):
"""hamming_distance: truncates at shortest sequence"""
self.assertEqual(hamming_distance(array('ABC', 'c'),array('ABBDDD', 'c')),1)
self.assertEqual(hamming_distance(array('ABC', 'c'),array('ABCDDD', 'c')),0)
self.assertEqual(hamming_distance(array('ABC', 'c'),array('DDDDDD', 'c')),3)
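# Plain-Python reference sketch (assumed equivalent, not PyCogent's code):
# hamming_distance counts mismatched positions and, as the test above shows,
# stops at the end of the shorter sequence, which is exactly what zip() does.
def test_hamming_distance_reference_sketch(self):
    """sketch: zip-based mismatch count truncates at the shortest sequence"""
    def hamming(s1, s2):
        # zip() stops at the shorter input, giving the truncation behaviour
        return sum(1 for x, y in zip(s1, s2) if x != y)
    assert hamming('ABC', 'ABB') == 1
    assert hamming('ABC', 'ABBDDD') == 1
    assert hamming('ABC', 'DDDDDD') == 3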
def test_norm(self):
"""norm: should return vector or matrix norm"""
self.assertFloatEqual(norm(array([2,3,4,5])),sqrt(54))
self.assertEqual(norm(array([1,1,1,1])),2)
self.assertFloatEqual(norm(array([[2,3],[4,5]])),sqrt(54))
self.assertEqual(norm(array([[1,1],[1,1]])),2)
def test_euclidean_distance(self):
"""euclidean_distance: should return dist between 2 vectors or matrices
"""
a = array([3,4])
b = array([8,5])
c = array([[2,3],[4,5]])
d = array([[1,5],[8,2]])
self.assertFloatEqual(euclidean_distance(a,b),sqrt(26))
self.assertFloatEqual(euclidean_distance(c,d),sqrt(30))
def test_euclidean_distance_unexpected(self):
"""euclidean_distance: broadcasts arrays whose shapes are compatible. UNEXPECTED!
"""
a = array([3,4])
b = array([8,5])
c = array([[2,3],[4,5]])
d = array([[1,5],[8,2]])
e = array([[4,5],[4,5],[4,5]])
f = array([1,1,1,1,1])
self.assertFloatEqual(euclidean_distance(a,c),sqrt(4))
self.assertFloatEqual(euclidean_distance(c,a),sqrt(4))
self.assertFloatEqual(euclidean_distance(a,e),sqrt(6))
#it does raise ValueError when the shapes cannot be broadcast together
self.assertRaises(ValueError,euclidean_distance,c,e)
self.assertRaises(ValueError,euclidean_distance,c,f)
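# Sketch of why the "unexpected" results above occur, assuming that
# euclidean_distance(a, b) computes sqrt(sum((a - b)**2)) with numpy
# broadcasting: a length-2 vector is subtracted from every row of an (n, 2)
# array, while shapes such as (2, 2) and (3, 2) cannot be broadcast at all.
def test_euclidean_distance_broadcast_sketch(self):
    """sketch: numpy broadcasting explains the mixed-shape distances"""
    import numpy
    def dist(x, y):
        return numpy.sqrt(((x - y) ** 2).sum())
    a = numpy.array([3, 4])
    c = numpy.array([[2, 3], [4, 5]])
    e = numpy.array([[4, 5], [4, 5], [4, 5]])
    assert numpy.allclose(dist(a, c), numpy.sqrt(4))
    assert numpy.allclose(dist(a, e), numpy.sqrt(6))
    try:
        dist(c, e)
    except ValueError:
        pass  # shapes (2, 2) and (3, 2) are not broadcast-compatible
    else:
        raise AssertionError('expected ValueError for incompatible shapes')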
def test_count_simple(self):
"""count_simple should return correct counts"""
self.assertEqual(count_simple(array([]), 3), array([0,0,0]))
self.assertEqual(count_simple(array([1,2,2,1,0]), 3), array([1,2,2]))
self.assertEqual(count_simple(array([1,1,1,1,1]), 3), array([0,5,0]))
self.assertEqual(count_simple(array([1,1,1,1,1]), 2), array([0,5]))
#raises index error if alphabet length is 0
self.assertRaises(IndexError, count_simple, array([1]), 0)
def test_count_alphabet(self):
"""count_alphabet should return correct counts"""
self.assertEqual(count_alphabet(array([]), 3), array([0,0,0]))
self.assertEqual(count_alphabet(array([1,2,2,1,0]), 3), array([1,2,2]))
self.assertEqual(count_alphabet(array([1,1,1,1,1]), 3), array([0,5,0]))
self.assertEqual(count_alphabet(array([1,1,1,1,1]), 2), array([0,5]))
#raises index error if alphabet length is 0
self.assertRaises(IndexError, count_alphabet, array([1]), 0)
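# Comparison sketch: numpy.bincount with minlength gives the same counts for
# the inputs used above. It is not a drop-in replacement: bincount grows its
# output for values >= minlength instead of raising IndexError as the tests
# above expect from count_simple and count_alphabet.
def test_count_bincount_sketch(self):
    """sketch: numpy.bincount reproduces the counts used above"""
    import numpy
    # bincount requires integer input, so the empty array gets an int dtype
    empty = numpy.array([], dtype=int)
    assert list(numpy.bincount(empty, minlength=3)) == [0, 0, 0]
    counts = numpy.bincount(numpy.array([1, 2, 2, 1, 0]), minlength=3)
    assert list(counts) == [1, 2, 2]
    counts = numpy.bincount(numpy.array([1, 1, 1, 1, 1]), minlength=2)
    assert list(counts) == [0, 5]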
def test_is_complex(self):
"""is_complex should return True on matrix with complex values"""
self.assertEqual(is_complex(array([[1,2],[3,4]])), False)
self.assertEqual(is_complex(array([[1,2],[3,4.0]])), False)
self.assertEqual(is_complex(array([[1,2+1j],[3,4]])), True)
self.assertEqual(is_complex(array([[1,2.0j],[3,4.0]])), True)
def test_is_significantly_complex(self):
"""is_significantly_complex should return True on complex matrix"""
isc = is_significantly_complex
self.assertEqual(isc(array([[1,2],[3,4]])), False)
self.assertEqual(isc(array([[1,2],[3,4.0]])), False)
self.assertEqual(isc(array([[1,2+1j],[3,4]])), True)
self.assertEqual(isc(array([[1,2.0j],[3,4.0]])), True)
self.assertEqual(isc(array([[1,1e-10j],[3,4.0]])), False)
self.assertEqual(isc(array([[1,1e-10j],[3,4.0]]), 1e-12), True)
def test_has_neg_off_diags_naive(self):
"""has_neg_off_diags_naive should return True if any off-diags negative"""
hnod = has_neg_off_diags_naive
self.assertEqual(hnod(array([[1,2],[3,4]])), False)
self.assertEqual(hnod(array([[-1,2],[3,-4]])), False)
self.assertEqual(hnod(array([[-1,-2],[3,-4]])), True)
self.assertEqual(hnod(array([[1,-2],[3,4]])), True)
def test_has_neg_off_diags(self):
"""has_neg_off_diags should be same as has_neg_off_diags_naive"""
hnod = has_neg_off_diags
self.assertEqual(hnod(array([[1,2],[3,4]])), False)
self.assertEqual(hnod(array([[-1,2],[3,-4]])), False)
self.assertEqual(hnod(array([[-1,-2],[3,-4]])), True)
self.assertEqual(hnod(array([[1,-2],[3,4]])), True)
def test_sum_neg_off_diags_naive(self):
"""sum_neg_off_diags_naive should return the sum of negative off-diags"""
snod = sum_neg_off_diags_naive
self.assertEqual(snod(array([[1,2],[3,4]])), 0)
self.assertEqual(snod(array([[-1,2],[3,-4]])), 0)
self.assertEqual(snod(array([[-1,-2],[3,-4]])), -2)
self.assertEqual(snod(array([[1,-2],[3,4]])), -2)
self.assertEqual(snod(array([[1,-2],[-3,4]])), -5)
def test_sum_neg_off_diags(self):
"""sum_neg_off_diags should return same as sum_neg_off_diags_naive"""
snod = sum_neg_off_diags
self.assertEqual(snod(array([[1,2],[3,4]])), 0)
self.assertEqual(snod(array([[-1,2],[3,-4]])), 0)
self.assertEqual(snod(array([[-1,-2],[3,-4]])), -2)
self.assertEqual(snod(array([[1,-2],[3,4]])), -2)
self.assertEqual(snod(array([[1,-2],[-3,4]])), -5)
def test_scale_row_sum(self):
"""scale_row_sum should give same result as scale_row_sum_naive"""
m = array([[1.0,2,3,4],[2,4,4,0],[1,1,1,1],[0,0,0,100]])
scale_row_sum(m)
self.assertFloatEqual(m, [[0.1,0.2,0.3,0.4],[0.2,0.4,0.4,0],\
[0.25,0.25,0.25,0.25],[0,0,0,1.0]])
scale_row_sum(m,4)
self.assertFloatEqual(m, [[0.4,0.8,1.2,1.6],[0.8,1.6,1.6,0],\
[1,1,1,1],[0,0,0,4.0]])
#if any of the rows sums to zero, an exception will be raised.
#SUPPORT2425
ori_err = numpy.geterr()
numpy.seterr(divide='raise')
try:
#with numpy_err(divide='raise'):
self.assertRaises((ZeroDivisionError, FloatingPointError), \
scale_row_sum, array([[1,0],[0,0]]))
finally:
numpy.seterr(**ori_err)
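# Sketch of the same row scaling in plain numpy (an assumed equivalent, not
# scale_row_sum itself), using numpy.errstate as a context manager instead of
# the geterr()/seterr()/try/finally pattern above; errstate restores the
# previous settings on exit. An all-zero row gives 0/0, which numpy reports as
# 'invalid' rather than 'divide', so both are set to raise here.
def test_scale_rows_errstate_sketch(self):
    """sketch: in-place row scaling, with errstate guarding zero rows"""
    import numpy
    m = numpy.array([[1.0, 2, 3, 4], [2, 4, 4, 0], [1, 1, 1, 1], [0, 0, 0, 100]])
    # divide each row by its own sum; newaxis turns the sums into a column
    m /= m.sum(axis=1)[:, numpy.newaxis]
    assert numpy.allclose(m, [[0.1, 0.2, 0.3, 0.4], [0.2, 0.4, 0.4, 0],
                              [0.25, 0.25, 0.25, 0.25], [0, 0, 0, 1.0]])
    before = numpy.geterr()
    z = numpy.array([[1.0, 0], [0, 0]])
    with numpy.errstate(divide='raise', invalid='raise'):
        try:
            z / z.sum(axis=1)[:, numpy.newaxis]
        except FloatingPointError:
            pass  # the all-zero row produced 0/0
        else:
            raise AssertionError('expected FloatingPointError for a zero row')
    # leaving the with-block restores the previous error settings
    assert numpy.geterr() == before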
def test_scale_row_sum_naive(self):
"""scale_row_sum_naive should scale rows to correct values"""
m = array([[1.0,2,3,4],[2,4,4,0],[1,1,1,1],[0,0,0,100]])
scale_row_sum_naive(m)
self.assertFloatEqual(m, [[0.1,0.2,0.3,0.4],[0.2,0.4,0.4,0],\
[0.25,0.25,0.25,0.25],[0,0,0,1.0]])
scale_row_sum_naive(m,4)
self.assertFloatEqual(m, [[0.4,0.8,1.2,1.6],[0.8,1.6,1.6,0],\
[1,1,1,1],[0,0,0,4.0]])
#if any of the rows sums to zero, an exception will be raised.
#SUPPORT2425
ori_err = numpy.geterr()
numpy.seterr(divide='raise')
try:
#with numpy_err(divide='raise'):
self.assertRaises((ZeroDivisionError, FloatingPointError), \
scale_row_sum_naive, array([[1,0],[0,0]]))
finally:
numpy.seterr(**ori_err)
def test_scale_trace(self):
"""scale_trace should scale trace to correct values"""
#should scale to -1 by default
#WARNING: won't work with integer matrices
m = array([[-2., 0],[0,-2]])
scale_trace(m)
self.assertFloatEqual(m, [[-0.5, 0],[0,-0.5]])
#should work even with zero rows
m = array([
[1.0,2,3,4],
[2,4,4,0],
[1,1,0,1],
[0,0,0,0]
])
m_orig = m.copy()
scale_trace(m)
self.assertFloatEqual(m, m_orig / -5)
#but should fail if trace is zero
m = array([[0,1,1],[1,0,1],[1,1,0]])
#SUPPORT2425
ori_err = numpy.geterr()
numpy.seterr(divide='raise')
try:
#with numpy_err(divide='raise'):
self.assertRaises((ZeroDivisionError, FloatingPointError), \
scale_trace, m)
finally:
numpy.seterr(**ori_err)
def test_abs_diff(self):
"""abs_diff should calculate element-wise sum of abs(first-second)"""
m = array([[1.0,2,3],[4,5,6], [7,8,9]])
m2 = array([[1.0,1,4],[2,6,-1],[8,6,-5]])
#matrix should not be different from itself
self.assertEqual(abs_diff(m,m), 0.0)
self.assertEqual(abs_diff(m2,m2), 0.0)
#difference should be same either direction
self.assertEqual(abs_diff(m,m2), 29.0)
self.assertEqual(abs_diff(m2,m), 29.0)
def test_sq_diff(self):
"""sq_diff should calculate the sum of squared element-wise differences"""
m = array([[1.0,2,3],[4,5,6], [7,8,9]])
m2 = array([[1.0,1,4],[2,6,-1],[8,6,-5]])
#matrix should not be different from itself
self.assertEqual(sq_diff(m,m), 0.0)
self.assertEqual(sq_diff(m2,m2), 0.0)
#difference should be same either direction
self.assertEqual(sq_diff(m,m2), 257.0)
self.assertEqual(sq_diff(m2,m), 257.0)
def test_norm_diff(self):
"""norm_diff should calculate norm of the difference divided by element count"""
m = array([[1.0,2,3],[4,5,6], [7,8,9]])
m2 = array([[1.0,1,4],[2,6,-1],[8,6,-5]])
#matrix should not be different from itself
self.assertEqual(norm_diff(m,m), 0.0)
self.assertEqual(norm_diff(m2,m2), 0.0)
#difference should be same either direction
self.assertEqual(norm_diff(m,m2), sqrt(257.0)/9)
self.assertEqual(norm_diff(m2,m), sqrt(257.0)/9)
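# Worked check of the expected values in the three tests above: m - m2 is
# [[0,1,-1],[2,-1,7],[-1,2,14]], whose absolute values sum to 29 and whose
# squares sum to 257; norm_diff's expected value is that Frobenius norm
# divided by the 9 elements. This only recomputes the numbers with numpy; it
# does not call abs_diff, sq_diff or norm_diff.
def test_diff_values_worked_sketch(self):
    """sketch: recompute 29, 257 and sqrt(257)/9 from m - m2"""
    import numpy
    m = numpy.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9]])
    m2 = numpy.array([[1.0, 1, 4], [2, 6, -1], [8, 6, -5]])
    d = m - m2
    assert d.tolist() == [[0, 1, -1], [2, -1, 7], [-1, 2, 14]]
    assert numpy.abs(d).sum() == 29.0
    assert (d ** 2).sum() == 257.0
    assert numpy.allclose(numpy.sqrt((d ** 2).sum()) / d.size,
                          numpy.sqrt(257.0) / 9)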
def test_cartesian_product(self):
"""cartesian_product should return expected results."""
a = 'abc'
b = [1,2,3]
c = [1.0]
d = [0,1]
#cartesian_product of list of single list should be same list
self.assertEqual(cartesian_product([c]), [(1.0,)])
self.assertEqual(cartesian_product([a]), [('a',),('b',),('c',)])
#should combine two lists correctly
self.assertEqual(cartesian_product([a,b]), \
[('a',1),('a',2),('a',3),('b',1),('b',2),\
('b',3),('c',1),('c',2),('c',3)])
#should combine three lists correctly
self.assertEqual(cartesian_product([d,d,d]), \
[(0,0,0),(0,0,1),(0,1,0),(0,1,1),(1,0,0),(1,0,1),(1,1,0),(1,1,1)])
self.assertEqual(cartesian_product([c,d,d]), \
[(1.0,0,0),(1.0,0,1),(1.0,1,0),(1.0,1,1)])
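# For comparison, the standard library's itertools.product yields the same
# tuples in the same order for these inputs. The sketch only checks that
# against the expected lists above; it does not claim cartesian_product is
# implemented with it.
def test_cartesian_product_itertools_sketch(self):
    """sketch: itertools.product matches the expected cartesian products"""
    from itertools import product
    a = 'abc'
    b = [1, 2, 3]
    c = [1.0]
    d = [0, 1]
    assert list(product(c)) == [(1.0,)]
    assert list(product(a)) == [('a',), ('b',), ('c',)]
    assert list(product(a, b))[:4] == [('a', 1), ('a', 2), ('a', 3), ('b', 1)]
    assert list(product(d, d, d)) == [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
                                      (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
    assert list(product(c, d, d)) == [(1.0, 0, 0), (1.0, 0, 1),
                                      (1.0, 1, 0), (1.0, 1, 1)]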
def test_without_diag(self):
"""without_diag should omit diagonal from matrix"""
a = array([[1,2,3],[4,5,6],[7,8,9]])
b = without_diag(a)
self.assertEqual(b, array([[2,3],[4,6],[7,8]]))
def test_with_diag(self):
"""with_diag should add diagonal to matrix"""
a = array([[2,3],[4,6],[7,8]])
b = with_diag(a, array([1,5,9]))
self.assertEqual(b, array([[1,2,3],[4,5,6],[7,8,9]]))
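# Boolean-mask sketch of both operations (an assumed equivalent, not the
# PyCogent implementation): ~numpy.eye(n, dtype=bool) selects the off-diagonal
# cells in row-major order, so dropping the diagonal is a masked selection
# reshaped to (n, n - 1), and adding it back scatters the values into the mask.
def test_diag_mask_sketch(self):
    """sketch: drop and restore a diagonal with a boolean mask"""
    import numpy
    def drop_diag(a):
        n = a.shape[0]
        return a[~numpy.eye(n, dtype=bool)].reshape(n, n - 1)
    def add_diag(b, diag):
        n = b.shape[0]
        out = numpy.empty((n, n), dtype=b.dtype)
        out[~numpy.eye(n, dtype=bool)] = b.ravel()  # fill off-diagonal cells
        numpy.fill_diagonal(out, diag)
        return out
    a = numpy.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    b = drop_diag(a)
    assert b.tolist() == [[2, 3], [4, 6], [7, 8]]
    assert add_diag(b, numpy.array([1, 5, 9])).tolist() == a.tolist()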
def test_only_nonzero(self):
"""only_nonzero should return only items whose first element is nonzero"""
a = reshape(arange(1,46),(5,3,3))
a[1,0,0] = 0
a[3,0,0] = 0
#expect result to be rows 0, 2 and 4 of a
result = only_nonzero(a)
self.assertEqual(result,
array([[[1,2,3],[4,5,6],[7,8,9]],\
[[19,20,21],[22,23,24],[25,26,27]],
[[37,38,39],[40,41,42],[43,44,45]]]))
def test_combine_dimensions(self):
"""combine_dimensions should aggregate expected dimensions"""
m = reshape(arange(81), (3,3,3,3))
a = combine_dimensions(m, 0)
self.assertEqual(a.shape, (3,3,3,3))
a = combine_dimensions(m, 1)
self.assertEqual(a.shape, (3,3,3,3))
a = combine_dimensions(m, 2)
self.assertEqual(a.shape, (9,3,3))
a = combine_dimensions(m, 3)
self.assertEqual(a.shape, (27,3))
a = combine_dimensions(m, 4)
self.assertEqual(a.shape, (81,))
#should work for negative indices as well, starting at end
a = combine_dimensions(m, -1)
self.assertEqual(a.shape, (3,3,3,3))
a = combine_dimensions(m, -2)
self.assertEqual(a.shape, (3,3,9))
a = combine_dimensions(m, -3)
self.assertEqual(a.shape, (3,27))
a = combine_dimensions(m, -4)
self.assertEqual(a.shape, (81,))
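# Reshape sketch consistent with the shapes expected above (inferred from the
# test, not taken from the implementation): k >= 2 merges the first k axes,
# k <= -2 merges the last |k| axes, and k in (-1, 0, 1) leaves m unchanged.
def test_combine_dimensions_reshape_sketch(self):
    """sketch: reshape reproduces the combine_dimensions shapes"""
    import numpy
    def combine(m, k):
        if -1 <= k <= 1:
            return m
        if k > 0:
            return m.reshape((-1,) + m.shape[k:])
        return m.reshape(m.shape[:k] + (-1,))
    m = numpy.arange(81).reshape(3, 3, 3, 3)
    expected = {0: (3, 3, 3, 3), 1: (3, 3, 3, 3), 2: (9, 3, 3), 3: (27, 3),
                4: (81,), -1: (3, 3, 3, 3), -2: (3, 3, 9), -3: (3, 27),
                -4: (81,)}
    for k, shape in expected.items():
        assert combine(m, k).shape == shape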
def test_split_dimension(self):
"""split_dimension should unpack specified dimension"""
m = reshape(arange(12**3), (12,12,12))
a = split_dimension(m, 0, (4,3))
self.assertEqual(a.shape, (4,3,12,12))
a = split_dimension(m, 0, (2,3,2))
self.assertEqual(a.shape, (2,3,2,12,12))
a = split_dimension(m, 1, (6,2))
self.assertEqual(a.shape, (12, 6, 2, 12))
a = split_dimension(m, 2, (3,4))
self.assertEqual(a.shape, (12,12,3,4))
#should work for negative index
a = split_dimension(m, -1, (3,4))
self.assertEqual(a.shape, (12,12,3,4))
a = split_dimension(m, -2, (3,4))
self.assertEqual(a.shape, (12,3,4,12))
a = split_dimension(m, -3, (3,4))
self.assertEqual(a.shape, (3,4,12,12))
#should fail with IndexError for invalid dimension
self.assertRaises(IndexError, split_dimension, m, 5, (3,4))
#should assume even split if not supplied
m = reshape(arange(16**3), (16,16,16))
a = split_dimension(m, 0)
self.assertEqual(a.shape, (4,4,16,16))
a = split_dimension(m, 1)
self.assertEqual(a.shape, (16,4,4,16))
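# Reshape sketch consistent with the expectations above (inferred, not taken
# from the implementation); the default even split via an integer square root
# is an assumption based on the 16 -> (4, 4) case. Axis dim is replaced by the
# requested sizes, and an out-of-range axis raises IndexError.
def test_split_dimension_reshape_sketch(self):
    """sketch: reshape reproduces the split_dimension shapes"""
    import math
    import numpy
    def split(m, dim, shape=None):
        if not -m.ndim <= dim < m.ndim:
            raise IndexError('invalid dimension %s' % dim)
        dim %= m.ndim  # turn a negative axis into its positive equivalent
        if shape is None:
            root = int(round(math.sqrt(m.shape[dim])))
            shape = (root, root)
        return m.reshape(m.shape[:dim] + tuple(shape) + m.shape[dim + 1:])
    m = numpy.arange(12 ** 3).reshape(12, 12, 12)
    assert split(m, 0, (2, 3, 2)).shape == (2, 3, 2, 12, 12)
    assert split(m, -2, (3, 4)).shape == (12, 3, 4, 12)
    m16 = numpy.arange(16 ** 3).reshape(16, 16, 16)
    assert split(m16, 1).shape == (16, 4, 4, 16)
    try:
        split(m, 5, (3, 4))
    except IndexError:
        pass
    else:
        raise AssertionError('expected IndexError for an invalid axis')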
def test_non_diag(self):
"""non_diag should return non-diag elements from flattened matrices"""
a = reshape(arange(16), (4,4))
m = non_diag(a)
self.assertEqual(m, array([[1,2],[5,6],[9,10],[13,14]]))
a = reshape(arange(27), (3,9))
m = non_diag(a)
self.assertEqual(m, array([[1,2,3,5,6,7],[10,11,12,14,15,16],\
[19,20,21,23,24,25]]))
def test_perturb_one_off_diag(self):
"""perturb_one_off_diag should perturb a random off-diagonal element"""
for i in range(100):
a = zeros((4,4), Float)
p = perturb_one_off_diag(a)
#NOTE: off-diag element and diag element will _both_ change
self.assertEqual(sum(ravel(p != a)), 2)
#check that sum is still 0
self.assertEqual(sum(ravel(p)), 0)
#check that trace is negative
assert trace(p) < 1
#check that we can pick an element to change
a = zeros((4,4), Float)
p = perturb_one_off_diag(a, mean=5, sd=0.1, element_to_change=8)
#check that row still sums to 0
self.assertEqual(sum(ravel(p)), 0)
#set diag in changed row to 0
p[2][2] = 0
#the only nonzero entry left is p[2][3], so its sum should be near 5
col_sums = sum(p)
assert ((4.5 < col_sums) & (col_sums < 5.5)).any()
assert 4.5 < p[2][3] < 5.5
p[2][3] = 0
self.assertEqual(sum(ravel(p)), 0)
def test_perturb_off_diag(self):
"""perturb_off_diag should change all off_diag elements."""
a = zeros((4,4), Float)
d = perturb_off_diag(a)
self.assertFloatEqual(sum(ravel(d)), 0)
#try it with a valid rate matrix
a = ones((4,4), Float)
for i in range(4):
a[i][i] = -3
d = perturb_off_diag(a)
self.assertNotEqual(d, a)
self.assertFloatEqual(sum(ravel(d)), 0)
#check that we didn't change it too much
assert -13 < trace(d) < -11
def test_merge_samples(self):
"""merge_samples should keep the sample label"""
self.assertEqual(merge_samples(array([1,2]),array([3,4]),array([5])),
array([[1,2,3,4,5],[0,0,1,1,2]]))
def test_sort_merged_samples_by_value(self):
"""sort_merged_samples_by_value should keep label associations"""
s = merge_samples(array([3,4]), array([5,6]), array([1,2]))
result = sort_merged_samples_by_value(s)
self.assertEqual(result, array([[1,2,3,4,5,6],[2,2,0,0,1,1]]))
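# Plain-numpy sketch of the merged layout above (an assumed equivalent): row 0
# holds the concatenated values, row 1 the index of the sample each value came
# from; a stable argsort on row 0 then reorders both rows together, which
# keeps every label attached to its value.
def test_merge_and_sort_samples_sketch(self):
    """sketch: concatenate + stable argsort keeps label associations"""
    import numpy
    def merge(*samples):
        values = numpy.concatenate(samples)
        labels = numpy.concatenate(
            [numpy.full(len(s), i, dtype=int) for i, s in enumerate(samples)])
        return numpy.array([values, labels])
    merged = merge(numpy.array([1, 2]), numpy.array([3, 4]), numpy.array([5]))
    assert merged.tolist() == [[1, 2, 3, 4, 5], [0, 0, 1, 1, 2]]
    s = merge(numpy.array([3, 4]), numpy.array([5, 6]), numpy.array([1, 2]))
    order = numpy.argsort(s[0], kind='mergesort')  # stable sort
    assert s[:, order].tolist() == [[1, 2, 3, 4, 5, 6], [2, 2, 0, 0, 1, 1]]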
def test_classifiers(self):
"""classifiers should return all the 1D classifiers of samples"""
first = array([2,1,5,3,5])
second = array([2,5,5,4,6,7])
result = classifiers(first, second)
self.assertEqual(len(result), 6)
exp = [(1,False,0,4,1,6),(3,False,1,3,2,5),(4,False,1,2,3,5),\
(5,False,2,2,3,4),(9,False,4,0,5,2),(10,False,5,0,5,1)]
self.assertEqual(result, exp)
#should work in reverse
result = classifiers(second, first)
exp = [(1,True,0,4,1,6),(3,True,1,3,2,5),(4,True,1,2,3,5),\
(5,True,2,2,3,4),(9,True,4,0,5,2),(10,True,5,0,5,1)]
def test_minimize_error_count(self):
"""minimize_error_count should return correct classifier"""
first = array([2,1,5,3,5])
second = array([2,5,5,4,6,7])
c = classifiers(first, second)
exp = (4,False,1,2,3,5)
self.assertEqual(minimize_error_count(c), exp)
def test_minimize_error_rate(self):
"""minimize_error_rate should return correct classifier"""
#should be same as error count on example used above
first = array([2,1,5,3,5])
second = array([2,5,5,4,6,7])
c = classifiers(first, second)
exp = (4,False,1,2,3,5)
self.assertEqual(minimize_error_rate(c), exp)
#here's a case where they should differ
first = array([2,3,11,5])
second = array([1,4,6,7,8,9,10])
c = classifiers(first, second)
self.assertEqual(minimize_error_count(c), (3,False,1,2,2,6))
self.assertEqual(minimize_error_rate(c), (5,False,2,1,3,5))
def test_mutate_array(self):
"""mutate_array should return mutated copy"""
a = arange(5)
m = mutate_array(a, 1, 2)
assert a is not m
self.assertNotEqual(a, m)
residuals = m - a
assert min(residuals) > -6
assert max(residuals) < 6
if __name__ == '__main__':
main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.util.dict2d import Dict2D, \
average, largest, smallest, swap, nonzero, not_0, upper_to_lower, \
lower_to_upper, Dict2DInitError, Dict2DError, Dict2DSparseError
from cogent.maths.stats.util import Numbers, Freqs
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "caporaso@colorado.edu"
__status__ = "Development"
class Dict2DTests(TestCase):
""" Tests of the Dict2D class """
def setUp(self):
"""Define a few standard matrices"""
self.empty = {}
self.single_same = {'a':{'a':2}}
self.single_diff = {'a':{'b':3}}
self.square = {
'a':{'a':1,'b':2,'c':3},
'b':{'a':2,'b':4,'c':6},
'c':{'a':3,'b':6,'c':9},
}
self.top_triangle = {
'a':{'a':1, 'b':2, 'c':3},
'b':{'b':4, 'c':6},
'c':{'c':9}
}
self.bottom_triangle = {
'b':{'a':2},
'c':{'a':3, 'b':6}
}
self.sparse = {
'a':{'a':1, 'c':3},
'd':{'b':2},
}
self.dense = {
'a':{'a':1,'b':2,'c':3},
'b':{'a':2,'b':4,'c':6},
}
def test_init(self):
"""Dict2D init should work as expected"""
#NOTE: currently only tests init from dict of dicts. Other initializers
#are tested in the test_guess_input* and test_from* methods
#should compare equal to the relevant dict
for d in [self.empty, self.single_same, self.single_diff, self.dense, \
self.sparse]:
d2d = Dict2D(d)
self.assertEqual(d2d, d)
self.assertEqual(d2d.__class__, Dict2D)
#spot-check values
d2d = Dict2D(self.sparse)
self.assertEqual(d2d['a']['c'], 3)
self.assertEqual(d2d['d']['b'], 2)
self.assertEqual(len(d2d), 2)
self.assertEqual(len(d2d['a']), 2)
self.assertRaises(KeyError, d2d.__getitem__, 'c')
#check truth values
assert not Dict2D(self.empty)
assert Dict2D(self.single_same)
def test_fromDicts(self):
"""Dict2D.fromDicts should construct from dict of dicts"""
d2d = Dict2D()
d2d.fromDicts(self.sparse)
self.assertEqual(d2d['a']['c'], 3)
self.assertEqual(d2d['d']['b'], 2)
self.assertEqual(len(d2d), 2)
self.assertEqual(len(d2d['a']), 2)
self.assertRaises(KeyError, d2d.__getitem__, 'c')
self.assertRaises(Dict2DInitError, d2d.fromDicts, [1,2,3])
def test_fromIndices(self):
"""Dict2D.fromIndices should construct from list of indices"""
d2d = Dict2D(self.sparse)
d2d2 = Dict2D()
self.assertNotEqual(d2d, d2d2)
d2d2.fromIndices([('a','a',1),('a','c',3),('d','b',2)])
self.assertEqual(d2d, d2d2)
self.assertRaises(Dict2DInitError, d2d2.fromIndices, [1,2,3])
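# Plain-dict sketch of the structure fromIndices builds from (row, col, value)
# triples, checked against the same sparse data. Illustrative only: Dict2D
# also carries attributes such as RowOrder and Pad that a bare dict lacks.
def test_from_indices_dict_sketch(self):
    """sketch: (row, col, value) triples become a dict of dicts"""
    def from_indices(triples):
        d = {}
        for row, col, value in triples:
            d.setdefault(row, {})[col] = value  # create the row on first use
        return d
    result = from_indices([('a', 'a', 1), ('a', 'c', 3), ('d', 'b', 2)])
    assert result == {'a': {'a': 1, 'c': 3}, 'd': {'b': 2}}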
def test_fromLists(self):
"""Dict2D.fromLists should construct from list of lists"""
#Note that this only works for dense matrices, not sparse ones
orig = Dict2D(self.dense)
new = Dict2D(self.dense) #will overwrite this data
self.assertEqual(orig, new)
assert orig is not new
new.RowOrder = ['b','a']
new.ColOrder = ['c','a','b']
new.fromLists([[3,6,9],[1,3,5]])
self.assertNotEqual(orig, new)
test = Dict2D({'b':{'c':3,'a':6,'b':9},'a':{'c':1,'a':3,'b':5}})
self.assertEqual(new, test)
def test_guess_input_type_fromLists(self):
"""Dict2D init can correctly guess input type: Lists """
# Will fail if an error is raised
d = Dict2D(data=[[1,2,3],[4,5,6]], RowOrder=list('ab'), \
ColOrder=list('def'))
def test_guess_input_type_fromDict(self):
"""Dict2D init can correctly guess input type: Dict """
# Will fail if error is raised
d = Dict2D({})
def test_guess_input_type_fromIndices(self):
"""Dict2D init can correctly guess input type: Indices """
# Will fail if error is raised
d = Dict2D([('a','b',1)])
def test_init_without_data(self):
"""Dict2D init functions correctly without a data parameter """
d = Dict2D(RowOrder=['a'],ColOrder=['b'],Pad=True,Default=42, RowConstructor=Freqs)
self.assertEqual(d.RowOrder,['a'])
self.assertEqual(d.ColOrder,['b'])
self.assertEqual(d.Pad,True)
self.assertEqual(d.Default,42)
self.assertEqual(d.RowConstructor, Freqs)
self.assertEqual(d,{'a':{'b':42.}})
def test_pad(self):
"""Dict2D pad should fill empty slots with default, but not make square"""
d = Dict2D(self.sparse)
d.pad()
self.assertEqual(len(d), 2)
self.assertEqual(len(d['a']), 3)
self.assertEqual(len(d['d']), 3)
self.assertEqual(d['a'].keys(), d['d'].keys())
self.assertEqual(d['a']['b'], None)
#check that it works with a different default value
d = Dict2D(self.sparse, Default='x')
d.pad()
self.assertEqual(d['a']['b'], 'x')
#check that it works with a different constructor
d = Dict2D(self.sparse, Default=0, RowConstructor=Freqs)
d.pad()
self.assertEqual(d['a']['b'], 0)
assert isinstance(d['a'], Freqs)
def test_purge(self):
"""Dict2D purge should delete unwanted keys at both levels"""
d = Dict2D(self.square)
d.RowOrder = 'ab'
d.ColOrder = 'bc'
d.purge()
self.assertEqual(d, Dict2D({'a':{'b':2,'c':3},'b':{'b':4,'c':6}}))
#check that a superset of the keys is OK
d = Dict2D(self.square)
d.RowOrder = dict.fromkeys('abcd')
d.ColOrder = dict.fromkeys('abcd')
d.purge()
self.assertEqual(d, Dict2D(self.square))
#check that everything gets deleted if nothing is valid
d.RowOrder = list('xyz')
d.ColOrder = list('xyz')
d.purge()
self.assertEqual(d, {})
def test_rowKeys(self):
"""Dict2D rowKeys should find all the keys of component rows"""
self.assertEqual(Dict2D(self.empty).rowKeys(), [])
self.assertEqual(Dict2D(self.single_diff).rowKeys(), ['a'])
#note that keys will be returned in arbitrary order
self.assertEqualItems(Dict2D(self.dense).rowKeys(), ['a','b',])
self.assertEqualItems(Dict2D(self.square).rowKeys(), ['a','b','c'])
self.assertEqualItems(Dict2D(self.sparse).rowKeys(), ['a','d'])
def test_colKeys(self):
"""Dict2D colKeys should find all the keys of component cols"""
self.assertEqual(Dict2D(self.empty).colKeys(), [])
self.assertEqual(Dict2D(self.single_diff).colKeys(), ['b'])
#note that keys will be returned in arbitrary order
self.assertEqualItems(Dict2D(self.square).colKeys(), ['a','b','c'])
self.assertEqualItems(Dict2D(self.dense).colKeys(), ['a','b','c'])
self.assertEqualItems(Dict2D(self.sparse).colKeys(), ['a','b','c'])
def test_sharedColKeys(self):
"""Dict2D sharedColKeys should find keys shared by all component cols"""
self.assertEqual(Dict2D(self.empty).sharedColKeys(), [])
self.assertEqual(Dict2D(self.single_diff).sharedColKeys(), ['b'])
#note that keys will be returned in arbitrary order
self.assertEqualItems(Dict2D(self.square).sharedColKeys(),['a','b','c'])
self.assertEqualItems(Dict2D(self.dense).sharedColKeys(), ['a','b','c'])
self.assertEqualItems(Dict2D(self.sparse).sharedColKeys(), [])
self.square['x'] = {'b':3, 'c':5, 'e':7}
self.assertEqualItems(Dict2D(self.square).colKeys(),['a','b','c','e'])
self.assertEqualItems(Dict2D(self.square).sharedColKeys(),['b','c'])
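# Set-based sketch of colKeys and sharedColKeys on plain dicts (an assumed
# equivalent): the union and the intersection of the row dictionaries' keys,
# with an empty matrix having no shared keys.
def test_col_keys_set_sketch(self):
    """sketch: column keys as union / intersection of row key sets"""
    def col_keys(d):
        return set().union(*d.values())
    def shared_col_keys(d):
        rows = [set(r) for r in d.values()]
        return set.intersection(*rows) if rows else set()
    sparse = {'a': {'a': 1, 'c': 3}, 'd': {'b': 2}}
    assert col_keys(sparse) == set('abc')
    assert shared_col_keys(sparse) == set()
    square = {'a': {'a': 1, 'b': 2, 'c': 3}, 'b': {'a': 2, 'b': 4, 'c': 6},
              'c': {'a': 3, 'b': 6, 'c': 9}, 'x': {'b': 3, 'c': 5, 'e': 7}}
    assert col_keys(square) == set('abce')
    assert shared_col_keys(square) == set('bc')
    assert shared_col_keys({}) == set()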
def test_square(self):
"""Dict2D square should ensure that all d[i][j] exist"""
#will raise exception if rows and cols aren't equal...
self.assertRaises(Dict2DError, Dict2D(self.sparse).square)
self.assertRaises(Dict2DError, Dict2D(self.dense).square)
#...unless reset_order is True
d = Dict2D(self.sparse)
d.square(reset_order=True)
self.assertEqual(d, Dict2D({
'a':{'a':1,'b':None,'c':3,'d':None},
'b':{'a':None, 'b':None, 'c':None, 'd':None},
'c':{'a':None, 'b':None, 'c':None, 'd':None},
'd':{'a':None, 'b':2, 'c':None, 'd':None},
}))
#Check that passing in a default works too
d = Dict2D(self.sparse)
d.square(reset_order=True, default='x')
self.assertEqual(d, Dict2D({
'a':{'a':1,'b':'x','c':3,'d':'x'},
'b':{'a':'x', 'b':'x', 'c':'x', 'd':'x'},
'c':{'a':'x', 'b':'x', 'c':'x', 'd':'x'},
'd':{'a':'x', 'b':2, 'c':'x', 'd':'x'},
}))
def test_rows(self):
"""Dict2D Rows property should return list in correct order"""
#should work with no data
self.assertEqual(list(Dict2D(self.empty).Rows), [])
#should work on square matrix
sq = Dict2D(self.square, RowOrder='abc', ColOrder='abc')
self.assertEqual(list(sq.Rows), [[1,2,3],[2,4,6],[3,6,9]])
#check that it works when we change the row and col order
sq.RowOrder = 'ba'
sq.ColOrder = 'ccb'
self.assertEqual(list(sq.Rows), [[6,6,4],[3,3,2]])
#check that it doesn't raise an error on sparse matrices...
sp = Dict2D(self.sparse)
rows = list(sp.Rows)
for r in rows:
r.sort()
rows.sort()
self.assertEqual(rows, [[1,3],[2]])
#...unless self.RowOrder and self.ColOrder are set...
sp.RowOrder = 'ad'
sp.ColOrder = 'abc'
self.assertRaises(Dict2DSparseError, list, sp.Rows)
#...but only if self.Pad is not set
sp.Pad = True
sp.Default = 'xxx'
self.assertEqual(list(sp.Rows), [[1, 'xxx', 3],['xxx',2,'xxx']])
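# Plain-dict sketch of padded Rows and Cols (an assumed equivalent): walk
# RowOrder and ColOrder, substituting the default for missing cells;
# transposing the rows with zip() gives the columns expected in test_cols.
def test_padded_rows_cols_sketch(self):
    """sketch: padded rows from dict.get, columns by transposing"""
    sparse = {'a': {'a': 1, 'c': 3}, 'd': {'b': 2}}
    def padded_rows(d, row_order, col_order, default):
        return [[d.get(r, {}).get(c, default) for c in col_order]
                for r in row_order]
    rows = padded_rows(sparse, 'ad', 'abc', 'xxx')
    assert rows == [[1, 'xxx', 3], ['xxx', 2, 'xxx']]
    cols = [list(col) for col in zip(*rows)]
    assert cols == [[1, 'xxx'], ['xxx', 2], [3, 'xxx']]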
def test_cols(self):
"""Dict2D Cols property should return list in correct order"""
#should work with no data
self.assertEqual(list(Dict2D(self.empty).Cols), [])
#should work with square matrix
sq = Dict2D(self.square, RowOrder='abc', ColOrder='abc')
self.assertEqual(list(sq.Cols), [[1,2,3],[2,4,6],[3,6,9]])
#check that it works when we change the row and col order
sq.RowOrder = 'ba'
sq.ColOrder = 'ccb'
self.assertEqual(list(sq.Cols), [[6,3],[6,3],[4,2]])
#check that it _does_ raise an error on sparse matrices...
sp = Dict2D(self.sparse)
self.assertRaises(Dict2DSparseError, list, sp.Cols)
#...especially if self.RowOrder and self.ColOrder are set...
sp.RowOrder = 'ad'
sp.ColOrder = 'abc'
self.assertRaises(Dict2DSparseError, list, sp.Cols)
#...but only if self.Pad is not set
sp.Pad = True
sp.Default = 'xxx'
self.assertEqual(list(sp.Cols), [[1,'xxx'],['xxx',2],[3,'xxx']])
def test_items(self):
"""Dict2D Items property should return list in correct order"""
#should work with no data
self.assertEqual(list(Dict2D(self.empty).Items), [])
#should work on square matrix
sq = Dict2D(self.square, RowOrder='abc', ColOrder='abc')
self.assertEqual(list(sq.Items), [1,2,3,2,4,6,3,6,9])
#check that it works when we change the row and col order
sq.RowOrder = 'ba'
sq.ColOrder = 'ccb'
self.assertEqual(list(sq.Items), [6,6,4,3,3,2])
#check that it doesn't raise an error on sparse matrices...
sp = Dict2D(self.sparse)
items = list(sp.Items)
items.sort()
self.assertEqual(items, [1,2,3])
#...unless self.RowOrder and self.ColOrder are set...
sp.RowOrder = 'ad'
sp.ColOrder = 'abc'
self.assertRaises(Dict2DSparseError, list, sp.Items)
#...but only if self.Pad is not set
sp.Pad = True
sp.Default = 'xxx'
self.assertEqual(list(sp.Items), [1, 'xxx', 3,'xxx',2,'xxx'])
def test_getRows(self):
"""Dict2D getRows should get specified rows"""
self.assertEqual(Dict2D(self.square).getRows(['a','c']), \
{'a':{'a':1,'b':2,'c':3},'c':{'a':3,'b':6,'c':9}})
#should work on sparse matrix
self.assertEqual(Dict2D(self.sparse).getRows(['d']), {'d':{'b':2}})
#should raise KeyError if row doesn't exist...
d = Dict2D(self.sparse)
self.assertRaises(KeyError, d.getRows, ['c'])
#...unless we're Padding
d.Pad = True
self.assertEqual(d.getRows('c'), {'c':{}})
#should work when we negate it
self.assertEqual(Dict2D(self.square).getRows(['a','c'], negate=True),
{'b':{'a':2,'b':4,'c':6}})
def test_getRowIndices(self):
"""Dict2D getRowIndices should return indices of rows where f(x) True"""
d = Dict2D(self.square)
lt_15 = lambda x: sum(x) < 15
self.assertEqual(d.getRowIndices(lt_15), ['a','b'])
#should be bound by RowOrder and ColOrder
d.RowOrder = d.ColOrder = 'ac'
self.assertEqual(d.getRowIndices(lt_15), ['a','c'])
#negate should work
d.RowOrder = d.ColOrder = None
self.assertEqual(d.getRowIndices(lt_15, negate=True), ['c'])
def test_getRowsIf(self):
"""Dict2D getRowsIf should return object with rows where f(x) is True"""
d = Dict2D(self.square)
lt_15 = lambda x: sum(x) < 15
self.assertEqual(d.getRowsIf(lt_15), \
{'a':{'a':1,'b':2,'c':3},'b':{'a':2,'b':4,'c':6}})
#should do test by RowOrder, but copy the whole row
d.RowOrder = d.ColOrder = 'ac'
self.assertEqual(d.getRowsIf(lt_15), \
{'a':{'a':1,'b':2,'c':3},'c':{'a':3,'b':6,'c':9}})
#negate should work
d.RowOrder = d.ColOrder = None
self.assertEqual(d.getRowsIf(lt_15, negate=True), \
{'c':{'a':3,'b':6,'c':9}})
def test_getCols(self):
"""Dict2D getCols should return object with specified cols only"""
d = Dict2D(self.square)
self.assertEqual(d.getCols('bc'), {
'a':{'b':2, 'c':3},
'b':{'b':4, 'c':6},
'c':{'b':6,'c':9},
})
#check that it works on ragged matrices
d = Dict2D(self.top_triangle)
self.assertEqual(d.getCols('ac'), {
'a':{'a':1, 'c':3}, 'b':{'c':6}, 'c':{'c':9}
})
#check that negate works
d = Dict2D(self.square)
self.assertEqual(d.getCols('bc', negate=True), {
'a':{'a':1}, 'b':{'a':2}, 'c':{'a':3},
})
def test_getColIndices(self):
"""Dict2D getColIndices should return list of indices of matching cols"""
d = Dict2D(self.square)
lt_15 = lambda x: sum(x) < 15
self.assertEqual(d.getColIndices(lt_15), ['a','b'])
#check that negate works
self.assertEqual(d.getColIndices(lt_15, negate=True), ['c'])
def test_getColsIf(self):
"""Dict2D getColsIf should return new Dict2D with matching cols"""
d = Dict2D(self.square)
lt_15 = lambda x: sum(x) < 15
self.assertEqual(d.getColsIf(lt_15), {
'a':{'a':1,'b':2},'b':{'a':2,'b':4},'c':{'a':3,'b':6}
})
#check that negate works
self.assertEqual(d.getColsIf(lt_15, negate=True), \
{'a':{'c':3},'b':{'c':6},'c':{'c':9}})
def test_getItems(self):
"""Dict2D getItems should return list of relevant items"""
d = Dict2D(self.square)
self.assertEqual(d.getItems([('a','a'),('b','c'),('c','a'),('a','a')]),\
[1,6,3,1])
#should work on ragged matrices...
d = Dict2D(self.top_triangle)
self.assertEqual(d.getItems([('a','c'),('c','c')]), [3,9])
#...unless absent items are asked for...
self.assertRaises(KeyError, d.getItems, [('a','a'),('c','a')])
#...unless self.Pad is True
d.Pad = True
self.assertEqual(d.getItems([('a','c'),('c','a')]), [3, None])
#negate should work -- must specify RowOrder and ColOrder to get
#results in predictable order
d.Pad = False
d.RowOrder = d.ColOrder = 'abc'
self.assertEqual(d.getItems([('a','c'),('c','a'),('a','a')], \
negate=True), [2,4,6,9])
def test_getItemIndices(self):
"""Dict2D getItemIndices should return indices when f(item) is True"""
lt_5 = lambda x: x < 5
d = Dict2D(self.square)
d.RowOrder = d.ColOrder = 'abc'
self.assertEqual(d.getItemIndices(lt_5), \
[('a','a'),('a','b'),('a','c'),('b','a'),('b','b'),('c','a')])
self.assertEqual(d.getItemIndices(lt_5, negate=True), \
[('b','c'),('c','b'),('c','c')])
d = Dict2D(self.top_triangle)
d.RowOrder = d.ColOrder = 'abc'
self.assertEqual(d.getItemIndices(lt_5), \
[('a','a'),('a','b'),('a','c'),('b','b')])
def test_getItemsIf(self):
"""Dict2D getItemsIf should return list of items when f(item) is True"""
lt_5 = lambda x: x < 5
d = Dict2D(self.square)
d.RowOrder = d.ColOrder = 'abc'
self.assertEqual(d.getItemsIf(lt_5), [1,2,3,2,4,3])
self.assertEqual(d.getItemsIf(lt_5, negate=True), [6,6,9])
d = Dict2D(self.top_triangle)
d.RowOrder = d.ColOrder = 'abc'
self.assertEqual(d.getItemsIf(lt_5), [1,2,3,4])
self.assertEqual(d.getItemsIf(lt_5, negate=True), [6,9])
def test_toLists(self):
"""Dict2D toLists should convert dict into list of lists"""
d = Dict2D(self.square)
d.RowOrder = 'abc'
d.ColOrder = 'abc'
self.assertEqual(d.toLists(), [[1,2,3],[2,4,6],[3,6,9]])
self.assertEqual(d.toLists(headers=True), \
[['-', 'a', 'b', 'c'],
['a', 1, 2, 3],
['b', 2, 4, 6],
['c', 3, 6, 9],
])
#should raise error if called on sparse matrix...
self.assertRaises(Dict2DSparseError, Dict2D(self.sparse).toLists)
#...unless self.Pad is True
d = Dict2D(self.sparse)
d.RowOrder = 'ad'
d.ColOrder = 'abc'
d.Pad = True
d.Default = 'x'
self.assertEqual(d.toLists(headers=True), \
[['-','a','b','c'],['a',1,'x',3],['d','x',2,'x']])
#works without RowOrder or ColOrder
goal = [[1,2,3],[2,4,6],[3,6,9]]
# headers=False
d = Dict2D(self.square)
l = d.toLists()
for r in l:
r.sort()
l.sort()
self.assertEqual(l,goal)
# headers=True: strip the header row and the row labels before comparing
l = d.toLists(headers=True)
l = [r[1:] for r in l[1:]]
for r in l:
r.sort()
l.sort()
self.assertEqual(l,goal)
def test_copy(self):
"""Dict2D copy should copy data and selected attributes"""
#if it works on sparse matrices, it'll work on dense ones
s = Dict2D(self.sparse)
s.Pad = True
s.RowOrder = 'abc'
s.ColOrder = 'def'
s.xxx = 'yyy' #arbitrary attributes won't be copied
s2 = s.copy()
self.assertEqual(s, s2)
assert s is not s2
assert not hasattr(s2, 'xxx')
self.assertEqual(s2.RowOrder, 'abc')
self.assertEqual(s2.ColOrder, 'def')
self.assertEqual(s2.Pad, True)
assert 'Default' not in s2.__dict__
assert 'RowConstructor' not in s2.__dict__
def test_fill(self):
"""Dict2D fill should fill in specified values"""
#with no parameters, should just fill in elements that exist
d = Dict2D(self.sparse)
d.fill('x')
self.assertEqual(d, {'a':{'a':'x','c':'x'}, 'd':{'b':'x'}})
#if cols is set, makes sure all the relevant cols exist in each row
#doesn't delete extra cols if they are present
d = Dict2D(self.sparse)
d.fill('x', cols='bc')
#note that d[a][a] should not be affected by the fill
self.assertEqual(d, {'a':{'a':1,'b':'x','c':'x'},\
'd':{'b':'x','c':'x'}
})
#if rows but not cols is set, should create but not fill rows
d = Dict2D(self.sparse)
d.fill('y', rows='ab')
self.assertEqual(d, {'a':{'a':'y','c':'y'},
'b':{}, #new row created
'd':{'b':2} #unaffected since not in rows
})
#if both rows and cols are set, should create and fill rows
d = Dict2D(self.sparse)
d.fill('z', rows='abc', cols='abc')
self.assertEqual(d, {'a':{'a':'z','b':'z','c':'z'},
'b':{'a':'z','b':'z','c':'z'},
'c':{'a':'z','b':'z','c':'z'},
'd':{'b':2} #unaffected since col skipped
})
#if set_orders is True, should reset RowOrder and ColOrder
d = Dict2D(self.sparse)
d.fill('z', rows='abc', cols='xyz', set_orders=True)
self.assertEqual(d.RowOrder, 'abc')
self.assertEqual(d.ColOrder, 'xyz')
d.fill('a', set_orders=True)
self.assertEqual(d.RowOrder, None)
self.assertEqual(d.ColOrder, None)
def test_setDiag(self):
"""Dict2D setDiag should set diagonal to specified value"""
#should have no effect on empty dict2d
d = Dict2D(self.empty)
d.setDiag(0)
self.assertEqual(d, {})
#should work on one-element dict
d = Dict2D(self.single_same)
d.setDiag(0)
self.assertEqual(d, {'a':{'a':0}})
d = Dict2D(self.single_diff)
d.setDiag(0)
self.assertEqual(d, {'a':{'a':0,'b':3}})
#should work on dense dict
d = Dict2D(self.square)
d.setDiag(9)
self.assertEqual(d, {
'a':{'a':9,'b':2,'c':3},
'b':{'a':2,'b':9,'c':6},
'c':{'a':3,'b':6,'c':9},
})
#should work on sparse dict, creating cols for rows but not vice versa
d = Dict2D(self.sparse)
d.setDiag(-1)
self.assertEqual(d, {'a':{'a':-1,'c':3},'d':{'b':2,'d':-1}})
def test_scale(self):
"""Dict2D scale should apply f(x) to each d[i][j]"""
doubler = lambda x: x * 2
#should have no effect on empty Dict2D
d = Dict2D(self.empty)
d.scale(doubler)
self.assertEqual(d, {})
#should work on single-element dict
d = Dict2D(self.single_diff)
d.scale(doubler)
self.assertEqual(d, {'a':{'b':6}})
#should work on dense dict
d = Dict2D(self.square)
d.scale(doubler)
self.assertEqual(d, {
'a':{'a':2,'b':4,'c':6},
'b':{'a':4,'b':8,'c':12},
'c':{'a':6,'b':12,'c':18},
})
#should work on sparse dict, not creating any new elements
d = Dict2D(self.sparse)
d.scale(doubler)
self.assertEqual(d, {'a':{'a':2,'c':6},'d':{'b':4}})
def test_transpose(self):
"""Dict2D transpose should work on both dense and sparse matrices, in place"""
#should do nothing to empty matrix
d = Dict2D(self.empty)
d.transpose()
self.assertEqual(d, {})
#should do nothing to single-element square matrix
d = Dict2D(self.single_same)
d.transpose()
self.assertEqual(d, {'a':{'a':2}})
#should reverse single-element non-square matrix
d = Dict2D(self.single_diff)
d.transpose()
self.assertEqual(d, {'b':{'a':3}})
#should work on sparse matrix
d = Dict2D(self.sparse)
d.transpose()
self.assertEqual(d, {'a':{'a':1}, 'c':{'a':3},'b':{'d':2}})
#should reverse row and col order
d = Dict2D(self.dense)
d.RowOrder = 'ab'
d.ColOrder = 'abc'
d.transpose()
self.assertEqual(d, \
{'a':{'a':1,'b':2},'b':{'a':2,'b':4},'c':{'a':3,'b':6}})
self.assertEqual(d.ColOrder, 'ab')
self.assertEqual(d.RowOrder, 'abc')
def test_reflect(self):
"""Dict2D reflect should reflect square matrices across diagonal."""
d = Dict2D(self.top_triangle)
#should fail if RowOrder and/or ColOrder are unspecified
self.assertRaises(Dict2DError, d.reflect)
self.assertRaises(Dict2DError, d.reflect, upper_to_lower)
d.RowOrder = 'abc'
self.assertRaises(Dict2DError, d.reflect)
d.RowOrder = None
d.ColOrder = 'abc'
self.assertRaises(Dict2DError, d.reflect)
#should work if RowOrder and ColOrder are both set
d.RowOrder = 'abc'
d.reflect(upper_to_lower)
self.assertEqual(d, self.square)
#try it on lower triangle as well -- note that the diagonal won't be
#set if it's absent.
d = Dict2D(self.bottom_triangle)
d.ColOrder = 'abc'
d.RowOrder = 'abc'
d.reflect(lower_to_upper)
self.assertEqual(d, {
'a':{'b':2,'c':3},
'b':{'a':2,'c':6},
'c':{'a':3,'b':6},
})
d = Dict2D({
'a':{'a':2,'b':4,'c':6},
'b':{'a':10,'b':20, 'c':30},
'c':{'a':30, 'b':60, 'c':90},
})
d.ColOrder = d.RowOrder = 'abc'
d.reflect(average)
self.assertEqual(d, {
'a':{'a':2,'b':7,'c':18},
'b':{'a':7,'b':20,'c':45},
'c':{'a':18,'b':45,'c':90},
})
def test_toDelimited(self):
"""Dict2D toDelimited should return delimited string for printing"""
d = Dict2D(self.square)
d.RowOrder = d.ColOrder = 'abc'
self.assertEqual(d.toDelimited(), \
'-\ta\tb\tc\na\t1\t2\t3\nb\t2\t4\t6\nc\t3\t6\t9')
self.assertEqual(d.toDelimited(headers=False), \
'1\t2\t3\n2\t4\t6\n3\t6\t9')
#set up a custom formatter...
def my_formatter(x):
try:
return '%1.1f' % x
except TypeError:
return str(x)
#...and use it
self.assertEqual(d.toDelimited(headers=True, item_delimiter='x', \
row_delimiter='y', formatter=my_formatter), \
'-xaxbxcyax1.0x2.0x3.0ybx2.0x4.0x6.0ycx3.0x6.0x9.0')
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Unit tests for utility functions and classes.
"""
from copy import copy, deepcopy
from os import remove, rmdir
from os.path import exists
from cogent.app.util import get_tmp_filename
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import (iterable, max_index, min_index,
flatten, is_iterable, is_char, is_char_or_noniterable,
is_str_or_noniterable, not_list_tuple, list_flatten,
recursive_flatten, unflatten, unzip, select, sort_order, find_all,
find_many, unreserve,
extract_delimited, caps_from_underscores,
add_lowercase, InverseDict, InverseDictMulti, DictFromPos, DictFromFirst,
DictFromLast, DistanceFromMatrix, PairsFromGroups,
ClassChecker, Delegator, FunctionWrapper,
ConstraintError, ConstrainedContainer,
ConstrainedString, ConstrainedList, ConstrainedDict,
MappedString, MappedList, MappedDict,
generateCombinations, makeNonnegInt,
NonnegIntError, reverse_complement, not_none, get_items_except,
NestedSplitter, curry, app_path, remove_files, get_random_directory_name,
revComp, parse_command_line_parameters, safe_md5,
create_dir, handle_error_codes, identity, if_, deep_list, deep_tuple,
combinate,gzip_dump,gzip_load,recursive_flatten_old,getNewId,toString,
timeLimitReached, get_independent_coords, get_merged_by_value_coords,
get_merged_overlapping_coords, get_run_start_indices)
from numpy import array
from time import clock, sleep
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Amanda Birmingham", "Sandra Smit",
"Zongzhi Liu", "Peter Maxwell", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
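# Reference sketch (a hypothetical helper, not part of cogent.util.misc) of
# the row-chunking behavior that test_unflatten expects from unflatten():
# split a 1D sequence into consecutive rows of row_width items, dropping a
# short final row unless keep_extras is True. It illustrates the semantics
# the tests assume; it is not the library's actual implementation.
def _sketch_unflatten(items, row_width, keep_extras=False):
    """Return consecutive slices of items, each row_width long."""
    if row_width < 1:
        raise ValueError("row_width must be at least 1, got %s" % row_width)
    rows = []
    for start in range(0, len(items), row_width):
        row = items[start:start + row_width]
        # a trailing partial row is kept only when keep_extras is set
        if len(row) == row_width or keep_extras:
            rows.append(row)
    return rows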
class UtilsTests(TestCase):
"""Tests of individual functions in utils"""
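def setUp(self):
"""Record temp files and dirs that tearDown should clean up"""
self.files_to_remove = []
self.dirs_to_remove = []
def tearDown(self):
"""Remove temp files and dirs created during a test"""
for fp in self.files_to_remove:
remove(fp)
for dp in self.dirs_to_remove:
rmdir(dp)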
def test_identity(self):
"""should return same object"""
foo = [1,'a',lambda x: x]
exp = id(foo)
self.assertEqual(id(identity(foo)), exp)
def test_if_(self):
"""implementation of c-like ternary operator"""
exp = 'yay'
obs = if_(True, 'yay', 'nay')
self.assertEqual(obs, exp)
exp = 'nay'
obs = if_(False, 'yay', 'nay')
self.assertEqual(obs, exp)
def test_deep_list(self):
"""should convert nested tuple to nested list"""
input = ((1,(2,3)),(4,5),(6,7))
exp = [[1,[2,3]],[4,5],[6,7]]
obs = deep_list(input)
self.assertEqual(obs, exp)
def test_deep_tuple(self):
"""Should convert a nested list to a nested tuple"""
exp = ((1,(2,3)),(4,5),(6,7))
input = [[1,[2,3]],[4,5],[6,7]]
obs = deep_tuple(input)
self.assertEqual(obs, exp)
def test_combinate(self):
"""Should return combinations"""
input = [1,2,3,4]
n = 2
exp = [[1,2],[1,3],[1,4],[2,3],[2,4],[3,4]]
obs = list(combinate(input, n))
self.assertEqual(obs, exp)
def test_recursive_flatten_old(self):
"""Should flatten nested lists"""
input = [[[1,2],[3,[4,5]],[6,7]],8]
exp = [1,2,3,4,5,6,7,8]
obs = recursive_flatten_old(input)
self.assertEqual(obs, exp)
def test_getNewId(self):
"""should return a random 12 digit id"""
rand_f = lambda x: 1
obs = getNewId(rand_f=rand_f)
exp = '111111111111'
self.assertEqual(obs,exp)
def test_toString(self):
"""should stringify an object"""
class foo(object):
def __init__(self):
self.bar = 5
exp = 'bar: 5'
obs = toString(foo())
self.assertEqual(obs, exp)
# this test and the code it tests are architecture dependent, which is
# undesirable.
#def test_timeLimitReached(self):
# """should return true if timelimit has been reached, else return false"""
# start = clock()
# timelimit = .0002
# exp = False
# sleep(1)
# obs = timeLimitReached(start, timelimit)
# self.assertEqual(obs, exp)
# sleep(1)
# exp = True
# obs = timeLimitReached(start, timelimit)
# self.assertEqual(obs, exp)
def test_safe_md5(self):
"""Make sure we have the expected md5"""
exp = 'd3b07384d113edec49eaa6238ad5ff00'
tmp_fp = get_tmp_filename(prefix='test_safe_md5', suffix='txt')
self.files_to_remove.append(tmp_fp)
tmp_f = open(tmp_fp, 'w')
tmp_f.write('foo\n')
tmp_f.close()
obs = safe_md5(open(tmp_fp, 'U'))
self.assertEqual(obs.hexdigest(),exp)
def test_iterable(self):
"""iterable(x) should return x or [x], always an iterable result"""
self.assertEqual(iterable('x'), 'x')
self.assertEqual(iterable(''), '')
self.assertEqual(iterable(3), [3])
self.assertEqual(iterable(None), [None])
self.assertEqual(iterable({'a':1}), {'a':1})
self.assertEqual(iterable(['a','b','c']), ['a', 'b', 'c'])
def test_max_index(self):
"""max_index should return index of largest item, last if tie"""
self.assertEqual(max_index('abcde'), 4)
self.assertEqual(max_index('ebcda'), 0)
self.assertRaises(ValueError, max_index, '')
self.assertEqual(max_index('ebcde'), 4)
self.assertEqual(max_index([0, 0, 1, 0]), 2)
def test_min_index(self):
"""min_index should return index of smallest item, first if tie"""
self.assertEqual(min_index('abcde'), 0)
self.assertEqual(min_index('ebcda'), 4)
self.assertRaises(ValueError, min_index, '')
self.assertEqual(min_index('ebcde'), 1)
self.assertEqual(min_index([0,0,1,0]), 0)
def test_flatten_no_change(self):
"""flatten should not change non-nested sequences (except to list)"""
self.assertEqual(flatten('abcdef'), list('abcdef')) #test identities
self.assertEqual(flatten([]), []) #test empty sequence
self.assertEqual(flatten(''), []) #test empty string
def test_flatten(self):
"""flatten should remove one level of nesting from nested sequences"""
self.assertEqual(flatten(['aa', 'bb', 'cc']), list('aabbcc'))
self.assertEqual(flatten([1,[2,3], [[4, [5]]]]), [1, 2, 3, [4,[5]]])
def test_is_iterable(self):
"""is_iterable should return True for iterables"""
#test str
self.assertEqual(is_iterable('aa'), True)
#test list
self.assertEqual(is_iterable([3,'aa']), True)
#test Number, expect False
self.assertEqual(is_iterable(3), False)
def test_is_char(self):
"""is_char(obj) should return True when obj is a char"""
self.assertEqual(is_char('a'), True)
self.assertEqual(is_char('ab'), False)
self.assertEqual(is_char(''), True)
self.assertEqual(is_char([3]), False)
self.assertEqual(is_char(3), False)
def test_is_char_or_noniterable(self):
"""is_char_or_noniterable should return True or False"""
self.assertEqual(is_char_or_noniterable('a'), True)
self.assertEqual(is_char_or_noniterable('ab'), False)
self.assertEqual(is_char_or_noniterable(3), True)
self.assertEqual(is_char_or_noniterable([3]), False)
def test_is_str_or_noniterable(self):
"""is_str_or_noniterable should return True or False"""
self.assertEqual(is_str_or_noniterable('a'), True)
self.assertEqual(is_str_or_noniterable('ab'), True)
self.assertEqual(is_str_or_noniterable(3), True)
self.assertEqual(is_str_or_noniterable([3]), False)
def test_recursive_flatten(self):
"""recursive_flatten should remove all nesting from nested sequences"""
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]]), [1,2,3,4,5])
#test default behavior on str unpacking
self.assertEqual(recursive_flatten(
['aa',[8,'cc','dd'], ['ee',['ff','gg']]]),
['a', 'a', 8, 'c', 'c', 'd', 'd', 'e', 'e', 'f', 'f', 'g', 'g'])
#test str untouched flattening using is_leaf=is_str_or_noniterable
self.assertEqual(recursive_flatten(
['aa',[8,'cc','dd'], ['ee',['ff','gg']]],
is_leaf=is_str_or_noniterable),
['aa',8,'cc','dd','ee','ff','gg'])
def test_create_dir(self):
"""create_dir creates dir and fails meaningfully."""
tmp_dir_path = get_random_directory_name()
tmp_dir_path2 = get_random_directory_name(suppress_mkdir=True)
tmp_dir_path3 = get_random_directory_name(suppress_mkdir=True)
self.dirs_to_remove.append(tmp_dir_path)
self.dirs_to_remove.append(tmp_dir_path2)
self.dirs_to_remove.append(tmp_dir_path3)
# create on existing dir raises OSError if fail_on_exist=True
self.assertRaises(OSError, create_dir, tmp_dir_path,
fail_on_exist=True)
self.assertEqual(create_dir(tmp_dir_path,
fail_on_exist=True,
handle_errors_externally=True), 1)
# return should be 1 if dir exists and fail_on_exist=False
self.assertEqual(create_dir(tmp_dir_path, fail_on_exist=False), 1)
# if dir is not there, create it and always return 0
self.assertEqual(create_dir(tmp_dir_path2), 0)
self.assertEqual(create_dir(tmp_dir_path3, fail_on_exist=True), 0)
def test_handle_error_codes(self):
"""handle_error_codes raises the right error."""
self.assertRaises(OSError, handle_error_codes, "test", False,1)
self.assertEqual(handle_error_codes("test", True, 1), 1)
self.assertEqual(handle_error_codes("test", False, 0), 0)
self.assertEqual(handle_error_codes("test"), 0)
def test_not_list_tuple(self):
"""not_list_tuple(obj) should return False when obj is list or tuple"""
self.assertEqual(not_list_tuple([8,3]), False)
self.assertEqual(not_list_tuple((8,3)), False)
self.assertEqual(not_list_tuple('34'), True)
def test_list_flatten(self):
"""list_flatten should remove all nesting, str is untouched """
self.assertEqual(list_flatten(
['aa',[8,'cc','dd'], ['ee',['ff','gg']]], ),
['aa',8,'cc','dd','ee','ff','gg'])
def test_recursive_flatten_max_depth(self):
"""recursive_flatten should not remove more than max_depth levels"""
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]]), [1,2,3,4,5])
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 0), \
[1,[2,3], [[4, [5]]]])
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 1), \
[1,2,3, [4, [5]]])
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 2), \
[1,2,3,4, [5]])
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 3), \
[1,2,3,4,5])
self.assertEqual(recursive_flatten([1,[2,3], [[4, [5]]]], 5000), \
[1,2,3,4,5])
def test_unflatten(self):
"""unflatten should turn a 1D sequence into a 2D list"""
self.assertEqual(unflatten("abcdef", 1), list("abcdef"))
self.assertEqual(unflatten("abcdef", 1, True), list("abcdef"))
self.assertEqual(unflatten("abcdef", 2), ['ab','cd','ef'])
self.assertEqual(unflatten("abcdef", 3), ['abc','def'])
self.assertEqual(unflatten("abcdef", 4), ['abcd'])
#should be able to preserve extra items
self.assertEqual(unflatten("abcdef", 4, True), ['abcd', 'ef'])
self.assertEqual(unflatten("abcdef", 10), [])
self.assertEqual(unflatten("abcdef", 10, True), ['abcdef'])
#should succeed on empty sequence
self.assertEqual(unflatten('',10), [])
def test_unflatten_bad_row_width(self):
"""unflatten should raise ValueError with row_width < 1"""
self.assertRaises(ValueError, unflatten, "abcd", 0)
self.assertRaises(ValueError, unflatten, "abcd", -1)
def test_unzip(self):
"""unzip(items) should be the inverse of zip(*items)"""
chars = [list('abcde'), list('ghijk')]
numbers = [[1,2,3,4,5], [0,0,0,0,0]]
strings = [["abcde", "fghij", "klmno"], ['xxxxx'] * 3]
empty = [[]]
lists = [chars, numbers, strings]
zipped = [zip(*i) for i in lists]
unzipped = [unzip(i) for i in zipped]
for u, l in zip(unzipped, lists):
self.assertEqual(u, l)
def test_select_sequence(self):
"""select should work on a sequence with a list of indices"""
chars = 'abcdefghij'
strings = list(chars)
tests = { (0,):['a'],
(-1,):['j'],
(0, 2, 4): ['a', 'c', 'e'],
(9,8,7,6,5,4,3,2,1,0):list('jihgfedcba'),
(-8, 8): ['c', 'i'],
():[],
}
for test, result in tests.items():
self.assertEqual(select(test, chars), result)
self.assertEqual(select(test, strings), result)
def test_select_empty(self):
"""select should raise error if indexing into empty sequence"""
self.assertRaises(IndexError, select, [1], [])
def test_select_mapping(self):
"""select should return the values corresponding to a list of keys"""
values = {'a':5, 'b':2, 'c':4, 'd':6, 'e':7}
self.assertEqual(select('abc', values), [5,2,4])
self.assertEqual(select(['e','e','e'], values), [7,7,7])
self.assertEqual(select(('e', 'b', 'a'), values), [7, 2, 5])
#check that it raises KeyError on anything out of range
self.assertRaises(KeyError, select, 'abx', values)
def test_sort_order(self):
"""sort_order should return the ordered indices of items"""
self.assertEqual(sort_order('abc'), [0, 1, 2])
self.assertEqual(sort_order('cba'), [2,1,0])
self.assertEqual(sort_order('bca'), [2,0,1])
def test_sort_order_cmpfunc(self):
"""sort_order should use cmpfunc if passed"""
self.assertEqual(sort_order([4, 8, 10], lambda x,y:cmp(y,x)), [2, 1, 0])
def test_sort_order_empty(self):
"""sort_order should return empty list on empty sequence"""
self.assertEqual(sort_order([]), [])
def test_find_all(self):
"""find_all should return list of all occurrences"""
self.assertEqual(find_all('abc', 'd'), [])
self.assertEqual(find_all('abc', 'a'), [0])
self.assertEqual(find_all('abcabca', 'a'), [0,3,6])
self.assertEqual(find_all('abcabca', 'c'), [2,5])
self.assertEqual(find_all('abcabca', '3'), [])
self.assertEqual(find_all('abcabca', 'bc'), [1,4])
self.assertRaises(TypeError, find_all,'abcabca', 3)
def test_find_many(self):
"""find_many should return list of all occurrences of all items"""
#should be same as find_all for single chars
self.assertEqual(find_many('abc', 'd'), [])
self.assertEqual(find_many('abc', 'a'), [0])
self.assertEqual(find_many('abcabca', 'a'), [0,3,6])
self.assertEqual(find_many('abcabca', 'c'), [2,5])
self.assertEqual(find_many('abcabca', '3'), [])
#should sort together the items from the two lists
self.assertEqual(find_many('abcabca', 'bc'), [1,2,4,5])
#note difference between 2-char string and 1-string list
self.assertEqual(find_many('abcabca', ['bc']), [1,4])
self.assertRaises(TypeError, find_many,'abcabca', [3])
def test_unreserve(self):
"""unreserve should trim trailing underscore if present."""
for i in (None, [], ['x'], 'xyz', '', 'a', '__abc'):
self.assertEqual(unreserve(i), i)
self.assertEqual(unreserve('_'), '')
self.assertEqual(unreserve('class_'), 'class')
def test_extract_delimited_bad_delimiters(self):
"""extract_delimited should raise error if delimiters identical"""
self.assertRaises(TypeError, extract_delimited, '|acb|acx', '|','|')
def test_extract_delimited_missing_right(self):
"""extract_delimited should raise error if right delimiter missing"""
self.assertRaises(ValueError, extract_delimited, 'ac[acgsd', '[', ']')
def test_extract_delimited_normal(self):
"""extract_delimited should return correct field if present, or None"""
self.assertEqual(extract_delimited('[]', '[', ']'), '')
self.assertEqual(extract_delimited('asdsad', '[', ']'), None)
self.assertEqual(extract_delimited('ac[abc]ac', '[', ']'), 'abc')
self.assertEqual(extract_delimited('[xyz]asd', '[', ']'), 'xyz')
self.assertEqual(extract_delimited('acg[xyz]', '[', ']'), 'xyz')
self.assertEqual(extract_delimited('abcdef', 'a', 'e'), 'bcd')
def test_extract_delimited_indexed(self):
"""extract_delimited should return correct field with starting index"""
self.assertEqual(extract_delimited('[abc][def]', '[',']', 0), 'abc')
self.assertEqual(extract_delimited('[abc][def]','[',']',1), 'def')
self.assertEqual(extract_delimited('[abc][def]', '[',']',5), 'def')
def test_caps_from_underscores(self):
"""caps_from_underscores should become CapsFromUnderscores"""
cfu = caps_from_underscores
#should still convert strings without underscores
self.assertEqual(cfu('ABCDE abcde!$'), 'Abcde Abcde!$')
self.assertEqual(cfu('abc_def'), 'AbcDef')
#should read through multiple underscores
self.assertEqual(cfu('_caps__from_underscores___'),
'CapsFromUnderscores')
def test_add_lowercase(self):
"""add_lowercase should add lowercase version of each key w/ same val"""
d = {'a':1, 'b':'test', 'A':5, 'C':123, 'D':[], 'AbC':'XyZ', \
None:'3', '$':'abc', 145:'5'}
add_lowercase(d)
assert d['d'] is d['D']
d['D'].append(3)
self.assertEqual(d['D'], [3])
self.assertEqual(d['d'], [3])
self.assertNotEqual(d['a'], d['A'])
self.assertEqual(d, {'a':1, 'b':'test', 'A':5, 'C':123, 'c':123, \
'D':[3], 'd':[3], 'AbC':'XyZ', 'abc':'xyz', None:'3', '$':'abc', \
145:'5'})
#should work with strings
d = 'ABC'
self.assertEqual(add_lowercase(d), 'ABCabc')
#should work with tuples
d = tuple('ABC')
self.assertEqual(add_lowercase(d), tuple('ABCabc'))
#should work with lists
d = list('ABC')
self.assertEqual(add_lowercase(d), list('ABCabc'))
#should work with sets
d = set('ABC')
self.assertEqual(add_lowercase(d), set('ABCabc'))
#...even frozensets
d = frozenset('ABC')
self.assertEqual(add_lowercase(d), frozenset('ABCabc'))
def test_add_lowercase_tuple(self):
"""add_lowercase should deal with tuples correctly"""
d = {('A','B'):'C', ('D','e'):'F', ('b','c'):'H'}
add_lowercase(d)
self.assertEqual(d, {
('A','B'):'C',
('a','b'):'c',
('D','e'):'F',
('d','e'):'f',
('b','c'):'H',
})
def test_InverseDict(self):
"""InverseDict should invert dict's keys and values"""
self.assertEqual(InverseDict({}), {})
self.assertEqual(InverseDict({'3':4}), {4:'3'})
self.assertEqual(InverseDict({'a':'x','b':1,'c':None,'d':('a','b')}), \
{'x':'a',1:'b',None:'c',('a','b'):'d'})
self.assertRaises(TypeError, InverseDict, {'a':['a','b','c']})
d = InverseDict({'a':3, 'b':3, 'c':3})
self.assertEqual(len(d), 1)
assert 3 in d
assert d[3] in 'abc'
def test_InverseDictMulti(self):
"""InverseDictMulti should invert keys and values, keeping all keys"""
self.assertEqual(InverseDictMulti({}), {})
self.assertEqual(InverseDictMulti({'3':4}), {4:['3']})
self.assertEqual(InverseDictMulti(\
{'a':'x','b':1,'c':None,'d':('a','b')}), \
{'x':['a'],1:['b'],None:['c'],('a','b'):['d']})
self.assertRaises(TypeError, InverseDictMulti, {'a':['a','b','c']})
d = InverseDictMulti({'a':3, 'b':3, 'c':3, 'd':'3', 'e':'3'})
self.assertEqual(len(d), 2)
assert 3 in d
d3_items = d[3][:]
self.assertEqual(len(d3_items), 3)
d3_items.sort()
self.assertEqual(''.join(d3_items), 'abc')
assert '3' in d
d3_items = d['3'][:]
self.assertEqual(len(d3_items), 2)
d3_items.sort()
self.assertEqual(''.join(d3_items), 'de')
def test_DictFromPos(self):
"""DictFromPos should return correct lists of positions"""
d = DictFromPos
self.assertEqual(d(''), {})
self.assertEqual(d('a'), {'a':[0]})
self.assertEqual(d(['a','a','a']), {'a':[0,1,2]})
self.assertEqual(d('abacdeeee'), {'a':[0,2],'b':[1],'c':[3],'d':[4], \
'e':[5,6,7,8]})
self.assertEqual(d(('abc',None, 'xyz', None, 3)), {'abc':[0],None:[1,3],
'xyz':[2], 3:[4]})
def test_DictFromFirst(self):
"""DictFromFirst should return correct first positions"""
d = DictFromFirst
self.assertEqual(d(''), {})
self.assertEqual(d('a'), {'a':0})
self.assertEqual(d(['a','a','a']), {'a':0})
self.assertEqual(d('abacdeeee'), {'a':0,'b':1,'c':3,'d':4,'e':5})
self.assertEqual(d(('abc',None, 'xyz', None, 3)), {'abc':0,None:1,
'xyz':2, 3:4})
def test_DictFromLast(self):
"""DictFromLast should return correct last positions"""
d = DictFromLast
self.assertEqual(d(''), {})
self.assertEqual(d('a'), {'a':0})
self.assertEqual(d(['a','a','a']), {'a':2})
self.assertEqual(d('abacdeeee'), {'a':2,'b':1,'c':3,'d':4,'e':8})
self.assertEqual(d(('abc',None, 'xyz', None, 3)), {'abc':0,None:3,
'xyz':2, 3:4})
def test_DistanceFromMatrix(self):
"""DistanceFromMatrix should return correct elements"""
m = {'a':{'3':4, 6:1}, 'b':{'3':5,'6':2}}
d = DistanceFromMatrix(m)
self.assertEqual(d('a','3'), 4)
self.assertEqual(d('a',6), 1)
self.assertEqual(d('b','3'), 5)
self.assertEqual(d('b','6'), 2)
self.assertRaises(KeyError, d, 'c', 1)
self.assertRaises(KeyError, d, 'b', 3)
def test_PairsFromGroups(self):
"""PairsFromGroups should return dict with correct pairs"""
empty = []
self.assertEqual(PairsFromGroups(empty), {})
one = ['abc']
self.assertEqual(PairsFromGroups(one), dict.fromkeys([ \
('a','a'), ('a','b'), ('a','c'), \
('b','a'), ('b','b'), ('b','c'), \
('c','a'), ('c','b'), ('c','c'), \
]))
two = ['xy', 'abc']
self.assertEqual(PairsFromGroups(two), dict.fromkeys([ \
('a','a'), ('a','b'), ('a','c'), \
('b','a'), ('b','b'), ('b','c'), \
('c','a'), ('c','b'), ('c','c'), \
('x','x'), ('x','y'), ('y','x'), ('y','y'), \
]))
#if there's overlap, note that the groups should _not_ be expanded
#(e.g. in the following case, 'x' is _not_ similar to 'c', even though
#both 'x' and 'c' are similar to 'a').
overlap = ['ax', 'abc']
self.assertEqual(PairsFromGroups(overlap), dict.fromkeys([ \
('a','a'), ('a','b'), ('a','c'), \
('b','a'), ('b','b'), ('b','c'), \
('c','a'), ('c','b'), ('c','c'), \
('x','x'), ('x','a'), ('a','x'), \
]))
def test_remove_files(self):
"""Remove files functions as expected """
# create list of temp file paths
test_filepaths = \
[get_tmp_filename(prefix='remove_files_test') for i in range(5)]
# try to remove them with remove_files and verify that an OSError is
# raised
self.assertRaises(OSError,remove_files,test_filepaths)
# now get no error when error_on_missing=False
remove_files(test_filepaths,error_on_missing=False)
# touch one of the filepaths so it exists
open(test_filepaths[2],'w').close()
# check that an error is raised on trying to remove the files...
self.assertRaises(OSError,remove_files,test_filepaths)
# ... but that the existing file was still removed
self.assertFalse(exists(test_filepaths[2]))
# touch one of the filepaths so it exists
open(test_filepaths[2],'w').close()
# no error is raised on trying to remove the files
# (although 4 don't exist)...
remove_files(test_filepaths,error_on_missing=False)
# ... and the existing file was removed
self.assertFalse(exists(test_filepaths[2]))
def test_get_random_directory_name(self):
"""get_random_directory_name functions as expected """
# repeated calls yield different directory names
dirs = []
for i in range(100):
d = get_random_directory_name(suppress_mkdir=True)
self.assertTrue(d not in dirs)
dirs.append(d)
actual = get_random_directory_name(suppress_mkdir=True)
self.assertFalse(exists(actual),'Random dir exists: %s' % actual)
self.assertTrue(actual.startswith('/'),\
'Random dir is not a full path: %s' % actual)
# prefix, suffix and output_dir are used as expected
actual = get_random_directory_name(suppress_mkdir=True,prefix='blah',\
output_dir='/tmp/',suffix='stuff')
self.assertTrue(actual.startswith('/tmp/blah2'),\
'Random dir does not begin with output_dir + prefix '+\
'+ 2 (where 2 indicates the millennium in the timestamp): %s' % actual)
self.assertTrue(actual.endswith('stuff'),\
'Random dir does not end with suffix: %s' % actual)
# changing rand_length functions as expected
actual1 = get_random_directory_name(suppress_mkdir=True)
actual2 = get_random_directory_name(suppress_mkdir=True,\
rand_length=10)
actual3 = get_random_directory_name(suppress_mkdir=True,\
rand_length=0)
self.assertTrue(len(actual1) > len(actual2) > len(actual3),\
"rand_length does not affect directory name lengths "+\
"as expected:\n%s\n%s\n%s" % (actual1,actual2,actual3))
# changing the timestamp pattern functions as expected
actual1 = get_random_directory_name(suppress_mkdir=True)
actual2 = get_random_directory_name(suppress_mkdir=True,\
timestamp_pattern='%Y')
self.assertNotEqual(actual1,actual2)
self.assertTrue(len(actual1)>len(actual2),\
'Changing timestamp_pattern does not affect directory name')
# empty string as timestamp works
actual3 = get_random_directory_name(suppress_mkdir=True,\
timestamp_pattern='')
self.assertTrue(len(actual2) > len(actual3))
# creating the directory works as expected
actual = get_random_directory_name(output_dir='/tmp/',\
prefix='get_random_directory_test')
self.assertTrue(exists(actual))
rmdir(actual)
def test_independent_spans(self):
"""get_independent_coords returns truly non-overlapping (decorated) spans"""
# single span is returned
data = [(0, 20, 'a')]
got = get_independent_coords(data)
self.assertEqual(got, data)
# multiple non-overlapping
data = [(20, 30, 'a'), (35, 40, 'b'), (65, 75, 'c')]
got = get_independent_coords(data)
self.assertEqual(got, data)
# over-lapping first/second returns first occurrence by default
data = [(20, 30, 'a'), (25, 40, 'b'), (65, 75, 'c')]
got = get_independent_coords(data)
self.assertEqual(got, [(20, 30, 'a'), (65, 75, 'c')])
# but randomly the first or second if random_tie_breaker is chosen
got = get_independent_coords(data, random_tie_breaker=True)
self.assertTrue(got in ([(20, 30, 'a'), (65, 75, 'c')],
[(25, 40, 'b'), (65, 75, 'c')]))
# over-lapping second/last returns first occurrence by default
data = [(20, 30, 'a'), (30, 60, 'b'), (50, 75, 'c')]
got = get_independent_coords(data)
self.assertEqual(got, [(20, 30, 'a'), (30, 60, 'b')])
# but randomly the second or third if random_tie_breaker is chosen
got = get_independent_coords(data, random_tie_breaker=True)
self.assertTrue(got in ([(20, 30, 'a'), (50, 75, 'c')],
[(20, 30, 'a'), (30, 60, 'b')]))
# over-lapping middle returns first occurrence by default
data = [(20, 24, 'a'), (25, 40, 'b'), (30, 35, 'c'), (65, 75, 'd')]
got = get_independent_coords(data)
self.assertEqual(got, [(20, 24, 'a'), (25, 40, 'b'), (65, 75, 'd')])
# but randomly the second or third if random_tie_breaker is chosen
got = get_independent_coords(data, random_tie_breaker=True)
self.assertTrue(got in ([(20, 24, 'a'), (25, 40, 'b'), (65, 75, 'd')],
[(20, 24, 'a'), (30, 35, 'c'), (65, 75, 'd')]))
def test_get_merged_spans(self):
"""tests merger of overlapping spans"""
sample = [[0, 10], [12, 15], [13, 16], [18, 25], [19, 20]]
result = get_merged_overlapping_coords(sample)
expect = [[0, 10], [12, 16], [18, 25]]
self.assertEqual(result, expect)
sample = [[0, 10], [5, 9], [12, 16], [18, 20], [19, 25]]
result = get_merged_overlapping_coords(sample)
expect = [[0, 10], [12, 16], [18, 25]]
self.assertEqual(result, expect)
def test_get_run_start_indices(self):
"""return indices corresponding to start of a run of identical values"""
# 0 1 2 3 4 5 6 7
data = [1, 2, 3, 3, 3, 4, 4, 5]
expect = [[0, 1], [1, 2], [2, 3], [5, 4], [7, 5]]
got = get_run_start_indices(data)
self.assertEqual(list(got), expect)
# raise an exception if we try to provide both a converter and num digits
def wrap_gen(): # need to wrap generator so we can actually test this
gen = get_run_start_indices(data, digits=1,
converter_func=lambda x: x)
def call():
for v in gen:
pass
return call
self.assertRaises(AssertionError, wrap_gen())
def test_merged_by_value_spans(self):
"""correctly merge adjacent spans with the same value"""
# initial values same
data = [[20, 21, 0], [21, 22, 0], [22, 23, 1], [23, 24, 0]]
self.assertEqual(get_merged_by_value_coords(data),
[[20, 22, 0], [22, 23, 1], [23, 24, 0]])
# middle values same
data = [[20, 21, 0], [21, 22, 1], [22, 23, 1], [23, 24, 0]]
self.assertEqual(get_merged_by_value_coords(data),
[[20, 21, 0], [21, 23, 1], [23, 24, 0]])
# last values same
data = [[20, 21, 0], [21, 22, 1], [22, 23, 0], [23, 24, 0]]
self.assertEqual(get_merged_by_value_coords(data),
[[20, 21, 0], [21, 22, 1], [22, 24, 0]])
# all unique values
data = [[20, 21, 0], [21, 22, 1], [22, 23, 2], [23, 24, 0]]
self.assertEqual(get_merged_by_value_coords(data),
[[20, 21, 0], [21, 22, 1], [22, 23, 2], [23, 24, 0]])
# all values same
data = [[20, 21, 0], [21, 22, 0], [22, 23, 0], [23, 24, 0]]
self.assertEqual(get_merged_by_value_coords(data),
[[20, 24, 0]])
# all unique values to 2nd decimal
data = [[20, 21, 0.11], [21, 22, 0.12], [22, 23, 0.13], [23, 24, 0.14]]
self.assertEqual(get_merged_by_value_coords(data),
[[20, 21, 0.11], [21, 22, 0.12], [22, 23, 0.13], [23, 24, 0.14]])
# all values same at 1st decimal
data = [[20, 21, 0.11], [21, 22, 0.12], [22, 23, 0.13], [23, 24, 0.14]]
self.assertEqual(get_merged_by_value_coords(data, digits=1),
[[20, 24, 0.1]])
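# A minimal sketch of the span merging exercised by test_get_merged_spans.
# It assumes, as in the test data, that spans are [start, end] pairs already
# sorted by start. _sketch_merge_overlapping is a hypothetical helper, not the
# cogent.util.misc implementation. Spans that merely touch (start equal to
# the previous end) are kept separate here, which is also an assumption.
def _sketch_merge_overlapping(spans):
    """Hypothetical: return a new list of [start, end] spans with overlaps merged."""
    merged = []
    for start, end in spans:
        if merged and start < merged[-1][1]:
            # overlaps the previous span: extend its end if this one reaches further
            merged[-1][1] = max(merged[-1][1], end)
        else:
            # disjoint: start a new span (a fresh list, so the input is not mutated)
            merged.append([start, end])
    return merged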
class _my_dict(dict):
"""Used for testing subclass behavior of ClassChecker"""
pass
class ClassCheckerTests(TestCase):
"""Unit tests for the ClassChecker class."""
def setUp(self):
"""define a few standard checkers"""
self.strcheck = ClassChecker(str)
self.intcheck = ClassChecker(int, long)
self.numcheck = ClassChecker(float, int, long)
self.emptycheck = ClassChecker()
self.dictcheck = ClassChecker(dict)
self.mydictcheck = ClassChecker(_my_dict)
def test_init_good(self):
"""ClassChecker should init OK when initialized with classes"""
self.assertEqual(self.strcheck.Classes, [str])
self.assertEqual(self.numcheck.Classes, [float, int, long])
self.assertEqual(self.emptycheck.Classes, [])
def test_init_bad(self):
"""ClassChecker should raise TypeError if initialized with non-class"""
self.assertRaises(TypeError, ClassChecker, 'x')
self.assertRaises(TypeError, ClassChecker, str, None)
def test_contains(self):
"""ClassChecker should return True only if given instance of class"""
self.assertEqual(self.strcheck.__contains__('3'), True)
self.assertEqual(self.strcheck.__contains__('ahsdahisad'), True)
self.assertEqual(self.strcheck.__contains__(3), False)
self.assertEqual(self.strcheck.__contains__({3:'c'}), False)
self.assertEqual(self.intcheck.__contains__('ahsdahisad'), False)
self.assertEqual(self.intcheck.__contains__('3'), False)
self.assertEqual(self.intcheck.__contains__(3.0), False)
self.assertEqual(self.intcheck.__contains__(3), True)
self.assertEqual(self.intcheck.__contains__(4**60), True)
self.assertEqual(self.intcheck.__contains__(4**60 * -1), True)
d = _my_dict()
self.assertEqual(self.dictcheck.__contains__(d), True)
self.assertEqual(self.dictcheck.__contains__({'d':1}), True)
self.assertEqual(self.mydictcheck.__contains__(d), True)
self.assertEqual(self.mydictcheck.__contains__({'d':1}), False)
self.assertEqual(self.emptycheck.__contains__('d'), False)
self.assertEqual(self.numcheck.__contains__(3), True)
self.assertEqual(self.numcheck.__contains__(3.0), True)
self.assertEqual(self.numcheck.__contains__(-3), True)
self.assertEqual(self.numcheck.__contains__(-3.0), True)
self.assertEqual(self.numcheck.__contains__(3e-300), True)
self.assertEqual(self.numcheck.__contains__(0), True)
self.assertEqual(self.numcheck.__contains__(4**1000), True)
self.assertEqual(self.numcheck.__contains__('4**1000'), False)
def test_str(self):
"""ClassChecker str should be the same as str(self.Classes)"""
for c in [self.strcheck, self.intcheck, self.numcheck, self.emptycheck,
self.dictcheck, self.mydictcheck]:
self.assertEqual(str(c), str(c.Classes))
def test_copy(self):
"""copy.copy should work correctly on ClassChecker"""
c = copy(self.strcheck)
assert c is not self.strcheck
assert '3' in c
assert 3 not in c
assert c.Classes is self.strcheck.Classes
def test_deepcopy(self):
"""copy.deepcopy should work correctly on ClassChecker"""
c = deepcopy(self.strcheck)
assert c is not self.strcheck
assert '3' in c
assert 3 not in c
assert c.Classes is not self.strcheck.Classes
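# A minimal sketch of the membership rule ClassCheckerTests exercises:
# "x in checker" is True when x is an instance of any registered class,
# subclasses included (so a _my_dict passes a dict checker), and an empty
# checker contains nothing. _SketchClassChecker is hypothetical and only
# mirrors the tested behavior. It accepts new-style classes only, which is an
# assumption for Python 2.
class _SketchClassChecker(object):
    """Hypothetical container that 'contains' instances of the given classes."""
    def __init__(self, *classes):
        for c in classes:
            if not isinstance(c, type):
                # mirrors test_init_bad: non-classes are rejected
                raise TypeError("%r is not a class" % (c,))
        self.Classes = list(classes)
    def __contains__(self, item):
        # isinstance with an empty tuple is always False
        return isinstance(item, tuple(self.Classes))
    def __str__(self):
        return str(self.Classes)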
class modifiable_string(str):
"""str subclass whose instances accept arbitrary attributes"""
pass
class _list_and_string(list, Delegator):
"""Trivial class to demonstrate Delegator.
"""
def __init__(self, items, string):
Delegator.__init__(self, string)
self.NormalAttribute = 'default'
self._x = None
self._constant = 'c'
for i in items:
self.append(i)
def _get_rand_property(self):
return self._x
def _set_rand_property(self, value):
self._x = value
prop = property(_get_rand_property, _set_rand_property)
def _get_constant_property(self):
return self._constant
constant = property(_get_constant_property)
class DelegatorTests(TestCase):
"""Verify that Delegator works with attributes and properties."""
def test_init(self):
"""Delegator should init OK when data supplied"""
ls = _list_and_string([1,2,3], 'abc')
self.assertRaises(TypeError, _list_and_string, [123])
def test_getattr(self):
"""Delegator should find attributes in correct places"""
ls = _list_and_string([1,2,3], 'abcd')
#behavior as list
self.assertEqual(len(ls), 3)
self.assertEqual(ls[0], 1)
ls.reverse()
self.assertEqual(ls, [3,2,1])
#behavior as string
self.assertEqual(ls.upper(), 'ABCD')
self.assertEqual(len(ls.upper()), 4)
self.assertEqual(ls.replace('a', 'x'), 'xbcd')
#behavior of normal attributes
self.assertEqual(ls.NormalAttribute, 'default')
#behavior of properties
self.assertEqual(ls.prop, None)
self.assertEqual(ls.constant, 'c')
#shouldn't be allowed to get unknown properties
self.assertRaises(AttributeError, getattr, ls, 'xyz')
#if the unknown property can be set in the forwarder, do it there
flex = modifiable_string('abcd')
ls_flex = _list_and_string([1,2,3], flex)
ls_flex.blah = 'zxc'
self.assertEqual(ls_flex.blah, 'zxc')
self.assertEqual(flex.blah, 'zxc')
#should get AttributeError if changing a read-only property
self.assertRaises(AttributeError, setattr, ls, 'constant', 'xyz')
def test_setattr(self):
"""Delegator should set attributes in correct places"""
ls = _list_and_string([1,2,3], 'abcd')
#ability to create a new attribute
ls.xyz = 3
self.assertEqual(ls.xyz, 3)
#modify a normal attribute
ls.NormalAttribute = 'changed'
self.assertEqual(ls.NormalAttribute, 'changed')
#modify a read/write property
ls.prop = 'xyz'
self.assertEqual(ls.prop, 'xyz')
def test_copy(self):
"""copy.copy should work correctly on Delegator"""
l = ['a']
d = Delegator(l)
c = copy(d)
assert c is not d
assert c._handler is d._handler
def test_deepcopy(self):
"""copy.deepcopy should work correctly on Delegator"""
l = ['a']
d = Delegator(l)
c = deepcopy(d)
assert c is not d
assert c._handler is not d._handler
assert c._handler == d._handler
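# A minimal sketch of the attribute forwarding DelegatorTests relies on.
# Python calls __getattr__ only when normal lookup fails, so attributes and
# properties defined on the object itself win, and everything else is looked
# up on the wrapped handler. _SketchDelegator is hypothetical and covers
# attribute reads only; the tested Delegator also forwards some assignments.
class _SketchDelegator(object):
    """Hypothetical: forward unknown attribute reads to self._handler."""
    def __init__(self, handler):
        self._handler = handler
    def __getattr__(self, name):
        if name == '_handler':
            # guard against infinite recursion on instances that have no
            # _handler yet (e.g. objects created by copy.copy before state is set)
            raise AttributeError(name)
        return getattr(self._handler, name)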
class FunctionWrapperTests(TestCase):
"""Tests of the FunctionWrapper class"""
def test_init(self):
"""FunctionWrapper should initialize with any callable"""
f = FunctionWrapper(str)
g = FunctionWrapper(id)
h = FunctionWrapper(iterable)
x = 3
self.assertEqual(f(x), '3')
self.assertEqual(g(x), id(x))
self.assertEqual(h(x), [3])
def test_copy(self):
"""copy should work for FunctionWrapper objects"""
f = FunctionWrapper(str)
c = copy(f)
assert c is not f
assert c.Function is f.Function
#NOTE: deepcopy does not work for FunctionWrapper objects because you
#can't copy a function.
class _simple_container(object):
"""example of a container to constrain"""
def __init__(self, data):
self._data = list(data)
def __getitem__(self, item):
return self._data.__getitem__(item)
class _constrained_simple_container(_simple_container, ConstrainedContainer):
"""constrained version of _simple_container"""
def __init__(self, data):
_simple_container.__init__(self, data)
ConstrainedContainer.__init__(self, None)
class ConstrainedContainerTests(TestCase):
"""Tests of the generic ConstrainedContainer interface."""
def setUp(self):
"""Make a couple of standard containers"""
self.alphabet = _constrained_simple_container('abc')
self.numbers = _constrained_simple_container([1,2,3])
self.alphacontainer = 'abcdef'
self.numbercontainer = ClassChecker(int)
def test_matchesConstraint(self):
"""ConstrainedContainer matchesConstraint should return true if items ok"""
self.assertEqual(self.alphabet.matchesConstraint(self.alphacontainer), \
True)
self.assertEqual(self.alphabet.matchesConstraint(self.numbercontainer),\
False)
self.assertEqual(self.numbers.matchesConstraint(self.alphacontainer), \
False)
self.assertEqual(self.numbers.matchesConstraint(self.numbercontainer),\
True)
def test_otherIsValid(self):
"""ConstrainedContainer should use constraint for checking other"""
self.assertEqual(self.alphabet.otherIsValid('12d8jc'), True)
self.alphabet.Constraint = self.alphacontainer
self.assertEqual(self.alphabet.otherIsValid('12d8jc'), False)
self.alphabet.Constraint = list('abcdefghijkl12345678')
self.assertEqual(self.alphabet.otherIsValid('12d8jc'), True)
self.assertEqual(self.alphabet.otherIsValid('z'), False)
def test_itemIsValid(self):
"""ConstrainedContainer should use constraint for checking item"""
self.assertEqual(self.alphabet.itemIsValid(3), True)
self.alphabet.Constraint = self.alphacontainer
self.assertEqual(self.alphabet.itemIsValid(3), False)
self.assertEqual(self.alphabet.itemIsValid('a'), True)
def test_sequenceIsValid(self):
"""ConstrainedContainer should use constraint for checking sequence"""
self.assertEqual(self.alphabet.sequenceIsValid('12d8jc'), True)
self.alphabet.Constraint = self.alphacontainer
self.assertEqual(self.alphabet.sequenceIsValid('12d8jc'), False)
self.alphabet.Constraint = list('abcdefghijkl12345678')
self.assertEqual(self.alphabet.sequenceIsValid('12d8jc'), True)
self.assertEqual(self.alphabet.sequenceIsValid('z'), False)
def test_Constraint(self):
"""ConstrainedContainer should only allow valid constraints to be set"""
try:
self.alphabet.Constraint = self.numbers
except ConstraintError:
pass
else:
raise AssertionError, \
"Failed to raise ConstraintError with invalid constraint."
self.alphabet.Constraint = 'abcdefghi'
self.alphabet.Constraint = ['a','b', 'c', 1, 2, 3]
self.numbers.Constraint = range(20)
self.numbers.Constraint = xrange(20)
self.numbers.Constraint = [5,1,3,7,2]
self.numbers.Constraint = {1:'a',2:'b',3:'c'}
self.assertRaises(ConstraintError, setattr, self.numbers, \
'Constraint', '1')
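# A minimal sketch of the rule ConstrainedContainerTests checks through
# otherIsValid and sequenceIsValid: a sequence is valid when every item is
# "in" the constraint, and anything is valid when the constraint is None.
# _sketch_sequence_is_valid is a hypothetical stand-alone function, not the
# ConstrainedContainer method itself.
def _sketch_sequence_is_valid(seq, constraint):
    """Hypothetical: True if every item of seq is allowed by constraint."""
    if constraint is None:
        # no constraint set: accept everything
        return True
    # the constraint only needs to support "in" (str, list, dict, ClassChecker, ...)
    return all(item in constraint for item in seq)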
class ConstrainedStringTests(TestCase):
"""Tests that ConstrainedString can only contain allowed items."""
def test_init_good_data(self):
"""ConstrainedString should init OK if string matches constraint"""
self.assertEqual(ConstrainedString('abc', 'abcd'), 'abc')
self.assertEqual(ConstrainedString('', 'abcd'), '')
items = [1,2,3.2234, (['a'], ['b'],), 'xyz']
#should accept anything str() does if no constraint is passed
self.assertEqual(ConstrainedString(items), str(items))
self.assertEqual(ConstrainedString(items, None), str(items))
self.assertEqual(ConstrainedString('12345'), str(12345))
self.assertEqual(ConstrainedString(12345, '1234567890'), str(12345))
#check that list is formatted correctly and chars are all there
test_list = [1,2,3,4,5]
self.assertEqual(ConstrainedString(test_list, '][, 12345'), str(test_list))
def test_init_bad_data(self):
"""ConstrainedString should fail init if unknown chars in string"""
self.assertRaises(ConstraintError, ConstrainedString, 1234, '123')
self.assertRaises(ConstraintError, ConstrainedString, '1234', '123')
self.assertRaises(ConstraintError, ConstrainedString, [1,2,3], '123')
def test_add_prevents_bad_data(self):
"""ConstrainedString should allow addition only of compliant string"""
a = ConstrainedString('123', '12345')
b = ConstrainedString('444', '4')
c = ConstrainedString('45', '12345')
d = ConstrainedString('x')
self.assertEqual(a + b, '123444')
self.assertEqual(a + c, '12345')
self.assertRaises(ConstraintError, b.__add__, c)
self.assertRaises(ConstraintError, c.__add__, d)
#should be OK if constraint removed
b.Constraint = None
self.assertEqual(b + c, '44445')
self.assertEqual(b + d, '444x')
#should fail if we add the constraint back
b.Constraint = '4x'
self.assertEqual(b + d, '444x')
self.assertRaises(ConstraintError, b.__add__, c)
#check that added strings retain constraint
self.assertRaises(ConstraintError, (a+b).__add__, d)
def test_mul(self):
"""ConstrainedString mul and rmul should retain constraint"""
a = ConstrainedString('123', '12345')
b = 3*a
c = b*2
self.assertEqual(b, '123123123')
self.assertEqual(c, '123123123123123123')
self.assertRaises(ConstraintError, b.__add__, 'x')
self.assertRaises(ConstraintError, c.__add__, 'x')
def test_getslice(self):
"""ConstrainedString getslice should remember constraint"""
a = ConstrainedString('123333', '12345')
b = a[2:4]
self.assertEqual(b, '33')
self.assertEqual(b.Constraint, '12345')
def test_getitem(self):
"""ConstrainedString getitem should handle slice objects"""
a = ConstrainedString('7890543', '1234567890')
self.assertEqual(a[0], '7')
self.assertEqual(a[1], '8')
self.assertEqual(a[-1], '3')
self.assertRaises(AttributeError, getattr, a[1], 'Alphabet')
self.assertEqual(a[1:6:2], '804')
self.assertEqual(a[1:6:2].Constraint, '1234567890')
def test_init_masks(self):
"""ConstrainedString should init OK with masks"""
def mask(x):
return str(int(x) + 3)
a = ConstrainedString('12333', '45678', mask)
self.assertEqual(a, '45666')
assert 'x' not in a
self.assertRaises(TypeError, a.__contains__, 1)
class MappedStringTests(TestCase):
"""MappedString should behave like ConstrainedString, but should map items."""
def test_init_masks(self):
"""MappedString should init OK with masks"""
def mask(x):
return str(int(x) + 3)
a = MappedString('12333', '45678', mask)
self.assertEqual(a, '45666')
assert 1 in a
assert 'x' not in a
class ConstrainedListTests(TestCase):
"""Tests that bad data can't sneak into ConstrainedLists."""
def test_init_good_data(self):
"""ConstrainedList should init OK if list matches constraint"""
self.assertEqual(ConstrainedList('abc', 'abcd'), list('abc'))
self.assertEqual(ConstrainedList('', 'abcd'), list(''))
items = [1,2,3.2234, (['a'], ['b'],), list('xyz')]
#should accept anything list() does if no constraint is passed
self.assertEqual(ConstrainedList(items), items)
self.assertEqual(ConstrainedList(items, None), items)
self.assertEqual(ConstrainedList('12345'), list('12345'))
#check that list is formatted correctly and chars are all there
test_list = list('12345')
self.assertEqual(ConstrainedList(test_list, '12345'), test_list)
def test_init_bad_data(self):
"""ConstrainedList should fail init with items not in constraint"""
self.assertRaises(ConstraintError, ConstrainedList, '1234', '123')
self.assertRaises(ConstraintError,ConstrainedList,[1,2,3],['1','2','3'])
def test_add_prevents_bad_data(self):
"""ConstrainedList should allow addition only of compliant data"""
a = ConstrainedList('123', '12345')
b = ConstrainedList('444', '4')
c = ConstrainedList('45', '12345')
d = ConstrainedList('x')
self.assertEqual(a + b, list('123444'))
self.assertEqual(a + c, list('12345'))
self.assertRaises(ConstraintError, b.__add__, c)
self.assertRaises(ConstraintError, c.__add__, d)
#should be OK if constraint removed
b.Constraint = None
self.assertEqual(b + c, list('44445'))
self.assertEqual(b + d, list('444x'))
#should fail if we add the constraint back
b.Constraint = {'4':1, 5:2}
self.assertRaises(ConstraintError, b.__add__, c)
def test_iadd_prevents_bad_data(self):
"""ConstrainedList should allow in-place addition only of compliant data"""
a = ConstrainedList('12', '123')
a += '2'
self.assertEqual(a, list('122'))
self.assertEqual(a.Constraint, '123')
self.assertRaises(ConstraintError, a.__iadd__, '4')
def test_imul(self):
"""ConstrainedList imul should preserve constraint"""
a = ConstrainedList('12', '123')
a *= 3
self.assertEqual(a, list('121212'))
self.assertEqual(a.Constraint, '123')
def test_mul(self):
"""ConstrainedList mul should preserve constraint"""
a = ConstrainedList('12', '123')
b = a * 3
self.assertEqual(b, list('121212'))
self.assertEqual(b.Constraint, '123')
def test_rmul(self):
"""ConstrainedList rmul should preserve constraint"""
a = ConstrainedList('12', '123')
b = 3 * a
self.assertEqual(b, list('121212'))
self.assertEqual(b.Constraint, '123')
def test_setitem(self):
"""ConstrainedList setitem should work only if item in constraint"""
a = ConstrainedList('12', '123')
a[0] = '3'
self.assertEqual(a, list('32'))
self.assertRaises(ConstraintError, a.__setitem__, 0, 3)
a = ConstrainedList('1'*20, '123')
self.assertRaises(ConstraintError, a.__setitem__, slice(0,1,1), [3])
self.assertRaises(ConstraintError, a.__setitem__, slice(0,1,1), ['111'])
a[2:9:2] = '3333'
self.assertEqual(a, list('11313131311111111111'))
def test_append(self):
"""ConstrainedList append should work only if item in constraint"""
a = ConstrainedList('12', '123')
a.append('3')
self.assertEqual(a, list('123'))
self.assertRaises(ConstraintError, a.append, 3)
def test_extend(self):
"""ConstrainedList extend should work only if all items in constraint"""
a = ConstrainedList('12', '123')
a.extend('321')
self.assertEqual(a, list('12321'))
self.assertRaises(ConstraintError, a.extend, ['1','2', 3])
def test_insert(self):
"""ConstrainedList insert should work only if item in constraint"""
a = ConstrainedList('12', '123')
a.insert(0, '2')
self.assertEqual(a, list('212'))
self.assertRaises(ConstraintError, a.insert, 0, [2])
def test_getslice(self):
"""ConstrainedList getslice should remember constraint"""
a = ConstrainedList('123333', '12345')
b = a[2:4]
self.assertEqual(b, list('33'))
self.assertEqual(b.Constraint, '12345')
def test_setslice(self):
"""ConstrainedList setslice should fail if slice has invalid chars"""
a = ConstrainedList('123333', '12345')
a[2:4] = ['2','2']
self.assertEqual(a, list('122233'))
self.assertRaises(ConstraintError, a.__setslice__, 2,4, [2,2])
a[:] = []
self.assertEqual(a, [])
self.assertEqual(a.Constraint, '12345')
def test_setitem_masks(self):
"""ConstrainedList setitem with masks should transform input"""
a = ConstrainedList('12333', range(5), lambda x: int(x) + 1)
self.assertEqual(a, [2,3,4,4,4])
self.assertRaises(ConstraintError, a.append, 4)
b = a[1:3]
assert b.Mask is a.Mask
assert '1' not in a
assert '2' not in a
assert 2 in a
assert 'x' not in a
class MappedListTests(TestCase):
"""MappedList should behave like ConstrainedList, but map items."""
def test_setitem_masks(self):
"""MappedList setitem with masks should transform input"""
a = MappedList('12333', range(5), lambda x: int(x) + 1)
self.assertEqual(a, [2,3,4,4,4])
self.assertRaises(ConstraintError, a.append, 4)
b = a[1:3]
assert b.Mask is a.Mask
assert '1' in a
assert 'x' not in a
class ConstrainedDictTests(TestCase):
"""Tests that bad data can't sneak into ConstrainedDicts."""
def test_init_good_data(self):
"""ConstrainedDict should init OK if dict matches constraint"""
self.assertEqual(ConstrainedDict(dict.fromkeys('abc'), 'abcd'), \
dict.fromkeys('abc'))
self.assertEqual(ConstrainedDict('', 'abcd'), dict(''))
items = [1,2,3.2234, tuple('xyz')]
#should accept anything dict() does if no constraint is passed
self.assertEqual(ConstrainedDict(dict.fromkeys(items)), \
dict.fromkeys(items))
self.assertEqual(ConstrainedDict(dict.fromkeys(items), None), \
dict.fromkeys(items))
self.assertEqual(ConstrainedDict([(x,1) for x in '12345']), \
dict.fromkeys('12345', 1))
#check that list is formatted correctly and chars are all there
test_dict = dict.fromkeys('12345')
self.assertEqual(ConstrainedDict(test_dict, '12345'), test_dict)
def test_init_sequence(self):
"""ConstrainedDict should init from sequence, unlike normal dict"""
self.assertEqual(ConstrainedDict('abcda'), {'a':2,'b':1,'c':1,'d':1})
def test_init_bad_data(self):
"""ConstrainedDict should fail init with items not in constraint"""
self.assertRaises(ConstraintError, ConstrainedDict, \
dict.fromkeys('1234'), '123')
self.assertRaises(ConstraintError,ConstrainedDict, \
dict.fromkeys([1,2,3]),['1','2','3'])
def test_setitem(self):
"""ConstrainedDict setitem should work only if key in constraint"""
a = ConstrainedDict(dict.fromkeys('12'), '123')
a['1'] = '3'
self.assertEqual(a, {'1':'3','2':None})
self.assertRaises(ConstraintError, a.__setitem__, 1, '3')
def test_copy(self):
"""ConstrainedDict copy should retain constraint"""
a = ConstrainedDict(dict.fromkeys('12'), '123')
b = a.copy()
self.assertEqual(a.Constraint, b.Constraint)
self.assertRaises(ConstraintError, a.__setitem__, 1, '3')
self.assertRaises(ConstraintError, b.__setitem__, 1, '3')
def test_fromkeys(self):
"""ConstrainedDict instance fromkeys should retain constraint"""
a = ConstrainedDict(dict.fromkeys('12'), '123')
b = a.fromkeys('23')
self.assertEqual(a.Constraint, b.Constraint)
self.assertRaises(ConstraintError, a.__setitem__, 1, '3')
self.assertRaises(ConstraintError, b.__setitem__, 1, '3')
b['2'] = 5
self.assertEqual(b, {'2':5, '3':None})
def test_setdefault(self):
"""ConstrainedDict setdefault shouldn't allow bad keys"""
a = ConstrainedDict({'1':None, '2': 'xyz'}, '123')
self.assertEqual(a.setdefault('2', None), 'xyz')
self.assertEqual(a.setdefault('1', None), None)
self.assertRaises(ConstraintError, a.setdefault, 'x', 3)
a.setdefault('3', 12345)
self.assertEqual(a, {'1':None, '2':'xyz', '3': 12345})
def test_update(self):
"""ConstrainedDict should allow update only of compliant data"""
a = ConstrainedDict(dict.fromkeys('123'), '12345')
b = ConstrainedDict(dict.fromkeys('444'), '4')
c = ConstrainedDict(dict.fromkeys('45'), '12345')
d = ConstrainedDict([['x','y']])
a.update(b)
self.assertEqual(a, dict.fromkeys('1234'))
a.update(c)
self.assertEqual(a, dict.fromkeys('12345'))
self.assertRaises(ConstraintError, b.update, c)
self.assertRaises(ConstraintError, c.update, d)
#should be OK if constraint removed
b.Constraint = None
b.update(c)
self.assertEqual(b, dict.fromkeys('45'))
b.update(d)
self.assertEqual(b, {'4':None, '5':None, 'x':'y'})
#should fail if we add the constraint back
b.Constraint = {'4':1, 5:2, '5':1, 'x':1}
self.assertRaises(ConstraintError, b.update, {4:1})
b.update({5:1})
self.assertEqual(b, {'4':None, '5':None, 'x':'y', 5:1})
def test_setitem_masks(self):
"""ConstrainedDict setitem should work only if key in constraint"""
key_mask = str
val_mask = lambda x: int(x) + 3
d = ConstrainedDict({1:4, 2:6}, '123', key_mask, val_mask)
d[1] = '456'
self.assertEqual(d, {'1':459,'2':9,})
d['1'] = 234
self.assertEqual(d, {'1':237,'2':9,})
self.assertRaises(ConstraintError, d.__setitem__, 4, '3')
e = d.copy()
assert e.Mask is d.Mask
assert '1' in d
assert not 1 in d
class MappedDictTests(TestCase):
"""MappedDict should work like ConstrainedDict, but map keys."""
def test_setitem_masks(self):
"""MappedDict setitem should work only if key in constraint"""
key_mask = str
val_mask = lambda x: int(x) + 3
d = MappedDict({1:4, 2:6}, '123', key_mask, val_mask)
d[1] = '456'
self.assertEqual(d, {'1':459,'2':9,})
d['1'] = 234
self.assertEqual(d, {'1':237,'2':9,})
self.assertRaises(ConstraintError, d.__setitem__, 4, '3')
e = d.copy()
assert e.Mask is d.Mask
assert '1' in d
assert 1 in d
assert 1 not in d.keys()
assert 'x' not in d.keys()
def test_getitem(self):
"""MappedDict getitem should automatically map key."""
key_mask = str
d = MappedDict({}, '123', key_mask)
self.assertEqual(d, {})
d['1'] = 5
self.assertEqual(d, {'1':5})
self.assertEqual(d[1], 5)
def test_get(self):
"""MappedDict get should automatically map key."""
key_mask = str
d = MappedDict({}, '123', key_mask)
self.assertEqual(d, {})
d['1'] = 5
self.assertEqual(d, {'1':5})
self.assertEqual(d.get(1, 'x'), 5)
self.assertEqual(d.get(5, 'x'), 'x')
def test_has_key(self):
"""MappedDict has_key should automatically map key."""
key_mask = str
d = MappedDict({}, '123', key_mask)
self.assertEqual(d, {})
d['1'] = 5
assert d.has_key('1')
assert d.has_key(1)
assert not d.has_key('5')
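# A minimal sketch of the key mapping MappedDictTests exercises: every key
# passes through a key mask on assignment, lookup, get and "in", so d[1] and
# d['1'] refer to the same entry while keys() shows only the mapped form.
# _SketchKeyMappedDict is hypothetical and deliberately omits the constraint
# check and value mask that the tested MappedDict also applies.
class _SketchKeyMappedDict(dict):
    """Hypothetical dict whose keys are passed through key_mask on access."""
    def __init__(self, data=None, key_mask=str):
        dict.__init__(self)
        self.KeyMask = key_mask
        for key, value in (data or {}).items():
            self[key] = value
    def __setitem__(self, key, value):
        dict.__setitem__(self, self.KeyMask(key), value)
    def __getitem__(self, key):
        return dict.__getitem__(self, self.KeyMask(key))
    def __contains__(self, key):
        return dict.__contains__(self, self.KeyMask(key))
    def get(self, key, default=None):
        return dict.get(self, self.KeyMask(key), default)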
class generateCombinationsTests(TestCase):
"""Tests for public generateCombinations function."""
def test_generateCombinations(self):
"""function should return all combinations of given length"""
#test all 3-position combinations of a 2-digit alphabet, since I can
#work that one out by hand ...
correct_result = [ "AAA", "AAB", "ABA", "ABB", \
"BBB", "BBA", "BAB", "BAA"]
real_result = generateCombinations("AB", 3)
correct_result.sort()
real_result.sort()
self.assertEquals(str(real_result), str(correct_result))
#end test_generateCombinations
def test_generateCombinations_singleAlphabet(self):
"""function should return correct value when alphabet is one char"""
real_result = generateCombinations("A", 4)
self.assertEquals(str(real_result), str(["AAAA"]))
#end test_generateCombinations_singleAlphabet
def test_generateCombinations_singleLength(self):
"""function should return correct values if length is 1"""
real_result = generateCombinations("ABC", 1)
self.assertEquals(str(real_result), str(["A", "B", "C"]))
#end test_generateCombinations_singleLength
def test_generateCombinations_emptyAlphabet(self):
"""function should return empty list if alphabet arg is [], "" """
real_result = generateCombinations("", 4)
self.assertEquals(str(real_result), str([]))
real_result = generateCombinations([], 4)
self.assertEquals(str(real_result), str([]))
#end test_generateCombinations_emptyAlphabet
def test_generateCombinations_zeroLength(self):
"""function should return empty list if length arg is 0 """
real_result = generateCombinations("ABC", 0)
self.assertEquals(str(real_result), str([]))
#end test_generateCombinations_zeroLength
def test_generateCombinations_badArgs(self):
"""function should error if args are not castable to right type."""
self.assertRaises(RuntimeError, generateCombinations, 12, 4)
self.assertRaises(RuntimeError, generateCombinations, [], None)
#end test_generateCombinations_badArgs
#end generateCombinationsTests
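# A minimal sketch of the behavior generateCombinationsTests checks: all
# strings of a given length over an alphabet, i.e. the Cartesian product of
# the alphabet with itself. This sketch swaps in itertools.product. The name
# _sketch_generate_combinations is hypothetical, and unlike the tested
# function it does not raise RuntimeError for bad argument types.
from itertools import product

def _sketch_generate_combinations(alphabet, length):
    """Hypothetical: every length-`length` string over `alphabet`."""
    if not alphabet or length <= 0:
        # empty alphabet or zero length gives no combinations, as the tests expect
        return []
    return [''.join(letters) for letters in product(alphabet, repeat=length)]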
class makeNonnegIntTests(TestCase):
"""Tests of the public makeNonnegInt function"""
def test_makeNonnegInt_unchanged(self):
"""Should return an input nonneg int unchanged"""
self.assertEquals(makeNonnegInt(3), 3)
#end test_makeNonnegInt_unchanged
def test_makeNonnegInt_castable(self):
"""Should return nonneg int version of a castable input"""
self.assertEquals(makeNonnegInt(-4.2), 4)
#end test_makeNonnegInt_castable
def test_makeNonnegInt_noncastable(self):
"""Should raise a special NonnegIntError if input isn't castable"""
self.assertRaises(NonnegIntError, makeNonnegInt, "blue")
#end test_makeNonnegInt_noncastable
#end makeNonnegIntTests
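# A minimal sketch of the behavior reverse_complementTests checks: walk the
# sequence backwards, complement each base (T or U depending on DNA/RNA
# mode), keep each base's case, and raise ValueError for anything that is
# not a nucleotide. _sketch_reverse_complement is hypothetical; its use_DNA
# flag only mimics the positional True/False toggle used in the tests.
def _sketch_reverse_complement(seq, use_DNA=True):
    """Hypothetical: case-preserving reverse complement of a DNA/RNA string."""
    if use_DNA:
        pairs = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    else:
        pairs = {'A': 'U', 'U': 'A', 'C': 'G', 'G': 'C'}
    result = []
    for base in reversed(seq):
        comp = pairs.get(base.upper())
        if comp is None:
            raise ValueError("not a nucleotide: %r" % base)
        # keep the original case of each base
        result.append(comp if base.isupper() else comp.lower())
    return ''.join(result)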
class reverse_complementTests(TestCase):
"""Tests of the public reverse_complement function"""
def test_reverse_complement_DNA(self):
"""reverse_complement should correctly return reverse complement of DNA"""
#input and correct output taken from example at
#http://bioweb.uwlax.edu/GenWeb/Molecular/Seq_Anal/
#Reverse_Comp/reverse_comp.html
user_input = "ATGCAGGGGAAACATGATTCAGGAC"
correct_output = "GTCCTGAATCATGTTTCCCCTGCAT"
real_output = reverse_complement(user_input)
self.assertEquals(real_output, correct_output)
# revComp is an alias for reverse_complement (kept for backward
# compatibility)
real_output = revComp(user_input)
self.assertEquals(real_output, correct_output)
#end test_reverse_complement_DNA
def test_reverse_complement_RNA(self):
"""reverse_complement should correctly return reverse complement of RNA"""
#input and correct output taken from test_reverse_complement_DNA test,
#with all Ts changed to Us
user_input = "AUGCAGGGGAAACAUGAUUCAGGAC"
correct_output = "GUCCUGAAUCAUGUUUCCCCUGCAU"
#remember to use False toggle to get RNA instead of DNA
real_output = reverse_complement(user_input, False)
self.assertEquals(real_output, correct_output)
#end test_reverse_complement_RNA
def test_reverse_complement_caseSensitive(self):
"""reverse_complement should convert bases without changing case"""
user_input = "aCGtAcgT"
correct_output = "AcgTaCGt"
real_output = reverse_complement(user_input)
self.assertEquals(real_output, correct_output)
#end test_reverse_complement_caseSensitive
def test_reverse_complement_nonNucleicSeq(self):
"""reverse_complement should raise ValueError for chars other than ACGT/U"""
user_input = "BDeF"
self.assertRaises(ValueError,reverse_complement,user_input)
#end test_reverse_complement_nonNucleicSeq
def test_reverse_complement_emptySeq(self):
"""reverse_complement should return empty string if given empty sequence"""
#shouldn't matter whether in DNA or RNA mode
real_output = reverse_complement("")
self.assertEquals(real_output, "")
#end test_reverse_complement_emptySeq
def test_reverse_complement_noSeq(self):
"""reverse_complement should raise TypeError if given no sequence argument"""
self.assertRaises(TypeError, reverse_complement)
#end test_reverse_complement_noSeq
#end reverse_complementTests
def test_not_none(self):
"""not_none should return True if none of the items is None"""
assert not_none([1,2,3,4])
assert not not_none([1,2,3,None])
self.assertEqual(filter(not_none,[(1,2),(3,None)]),[(1,2)])
#end test_not_none
def test_get_items_except(self):
"""get_items_except should return all items of seq not in indices"""
self.assertEqual(get_items_except('a-b-c-d',[1,3,5]),'abcd')
self.assertEqual(get_items_except([0,1,2,3,4,5,6],[1,3,5]),[0,2,4,6])
self.assertEqual(get_items_except((0,1,2,3,4,5,6),[1,3,5]),(0,2,4,6))
self.assertEqual(get_items_except('a-b-c-d',[1,3,5],tuple),
('a','b','c','d'))
#end test_get_items_except
def test_NestedSplitter(self):
"""NestedSplitter should make a function which returns the expected list"""
#test delimiters, constructor, filter_
line='ii=0; oo= 9, 6 5; ; xx= 8; '
cmds = [
"NestedSplitter(';=,')(line)",
"NestedSplitter([';', '=', ','])(line)",
"NestedSplitter([(';'), '=', ','], constructor=None)(line)",
"NestedSplitter([(';'), '=', ','], filter_=None)(line)",
"NestedSplitter([(';',1), '=', ','])(line)",
"NestedSplitter([(';',-1), '=', ','])(line)"
]
results=[
[['ii', '0'], ['oo', ['9', '6 5']], '', ['xx', '8'], ''],
[['ii', '0'], ['oo', ['9', '6 5']], '', ['xx', '8'], ''],
[['ii', '0'], [' oo', [' 9', ' 6 5']], ' ', [' xx', ' 8'], ' '],
[['ii', '0'], ['oo', ['9', '6 5']], ['xx', '8']],
[['ii', '0'], ['oo', ['9', '6 5; ; xx'], '8;']],
[['ii', '0; oo', ['9', '6 5; ; xx'], '8'], '']
]
for cmd, result in zip(cmds, results):
self.assertEqual(eval(cmd), result)
#test discontinuous levels of delimiters
test = 'a; b,c; d,e:f; g:h;' #g:h should get [[g,h]] instead of [g,h]
self.assertEqual(NestedSplitter(';,:')(test),
['a', ['b', 'c'], ['d', ['e', 'f']], [['g', 'h']], ''])
#test empty
self.assertEqual(NestedSplitter(';,:')(''), [''])
self.assertEqual(NestedSplitter(';,:')(' '), [''])
self.assertEqual(NestedSplitter(';,:', filter_=None)(' ;, :'), [[[]]])
def test_curry(self):
"""curry should generate a function with the given parameters already set"""
curry_test = curry(cmp, 5)
knowns = ((3, 1),
(9, -1),
(5, 0))
for arg2, result in knowns:
self.assertEqual (curry_test(arg2), result)
def test_app_path(self):
"""app_path should return correct paths"""
self.assertEqual(app_path('ls'), '/bin/ls')
self.assertEqual(app_path('lsxxyyx'), False)
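# A minimal sketch of what test_curry expects: curry(f, *args) returns a
# function that calls f with the stored arguments first, followed by any new
# ones. _sketch_curry is hypothetical; the standard library's
# functools.partial provides the same behavior. _sketch_cmp is a hypothetical
# stand-in for the Python 2 builtin cmp so the sketch also runs on Python 3.
def _sketch_curry(f, *args, **kwargs):
    """Hypothetical: bind leading positional and keyword arguments to f."""
    def curried(*more_args, **more_kwargs):
        merged = dict(kwargs)
        # keyword arguments given at call time override the stored ones
        merged.update(more_kwargs)
        return f(*(args + more_args), **merged)
    return curried

def _sketch_cmp(a, b):
    """Return -1, 0 or 1, like Python 2's cmp."""
    return (a > b) - (a < b)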
class CommandLineParserTests(TestCase):
def test_parse_command_line_parameters(self):
"""parse_command_line_parameters returns without error
There is not a lot of detailed testing that can be done here,
so the basic functionality is tested.
"""
option_parser, opts, args = parse_command_line_parameters(
script_description="My script",
script_usage=[('Print help','%prog -h','')],
version='1.0',help_on_no_arguments=False,
command_line_args=[])
self.assertEqual(len(args),0)
d = {'script_description':"My script",\
'script_usage':[('Print help','%prog -h','')],\
'version':'1.0',
'help_on_no_arguments':False,
'command_line_args':[]}
option_parser, opts, args = parse_command_line_parameters(**d)
self.assertEqual(len(args),0)
# allowing positional arguments functions as expected as does
# passing a positional argument
d = {'script_description':"My script",\
'script_usage':[('Print help','%prog -h','')],\
'version':'1.0',
'help_on_no_arguments':False,
'command_line_args':['hello'],
'disallow_positional_arguments':False}
option_parser, opts, args = parse_command_line_parameters(**d)
self.assertEqual(len(args),1)
#run tests on command-line invocation
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests Filter, Organizer and filterfunctions, classes for filtering
"""
from cogent.util.organizer import Filter, Organizer, GroupList, regroup
from cogent.util.transform import find_any, find_no, find_all,\
keep_if_more, exclude_if_more, keep_if_more_other, exclude_if_more_other
from cogent.util.unit_test import TestCase,main
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class Sequence(object):
"""Simple sequence class for tests."""
def __init__(self, s, info):
self.s = s
self.__dict__.update(info)
if not hasattr(self, 'Gene'):
self.Gene = None
def __contains__(self, s):
return s in self.s
def __repr__(self):
return `self.s`
def __iter__(self):
return iter(self.s)
def __nonzero__(self):
return bool(self.s)
def lower(self):
return self.s.lower()
def __cmp__(self, other):
return cmp(self.s,other.s)
class FilterTests(TestCase):
"""Tests of Filter class"""
def test_init(self):
"""Filter should init as expected"""
empty_filter = Filter('',{})
named_empty_filter = Filter('Archaea',{})
self.assertEqual(empty_filter,{})
self.assertEqual(empty_filter.Name,'')
self.assertEqual(named_empty_filter,{})
self.assertEqual(named_empty_filter.Name,'Archaea')
f = find_all('abcd')
g = keep_if_more_other('ab',7)
fil = Filter('Archaea',{'Arch':[f,g]})
assert fil['Arch'][0] is f
assert fil['Arch'][1] is g
def test_call_empty(self):
"""Empty Filter should return True when called on anything"""
f = Filter('',{})
data = ['aa','bb','cc']
self.assertEqual(f(data),True)
def test_call_full(self):
"""Filter should return True if the object satisfies all criteria"""
seq1 = Sequence('ACGU',{'Gene':'LSU'})
seq2 = Sequence('ACGUACGU',{'Gene':'SSU'})
seq3 = Sequence('ACGUN',{'Gene':'LSU'})
seq4 = Sequence('ACG',{'Gene':'LSU'})
seq5 = Sequence('ACGU',{})
seq6 = Sequence('',{})
f = Filter('valid',{None:[find_all('AGCU'),find_no('N')],\
'Gene':[find_any(['LSU'])]})
self.assertEqual(f(seq1),True)
self.assertEqual(f(seq2),False)
self.assertEqual(f(seq3),False)
self.assertEqual(f(seq4),False)
self.assertEqual(f(seq5),False)
self.assertEqual(f(seq6),False)
class GroupListTests(TestCase):
"""Tests of GroupList class"""
def test_init_empty(self):
"""Empty GroupList should init OK"""
g = GroupList([])
self.assertEqual(len(g),0)
self.assertEqual(g.Groups,[])
def test_init_full(self):
"""GroupList should init OK with data and groups"""
data = ['a','b','c']
groups = [1,2,3]
g = GroupList(data,groups)
self.assertEqual(g,data)
self.assertEqual(g.Groups,groups)
self.assertEqual(len(g),3)
class OrganizerTests(TestCase):
"""Tests of Organizer class"""
def setUp(self):
"""Define some standard Organizers for testing"""
self.Empty = Organizer([])
self.a = Filter('a',{None:[find_any('a')]})
self.b = Filter('b',{None:[find_any('b')]})
self.Ab_org = Organizer([self.a,self.b])
lsu = Filter('LSU',{None:[exclude_if_more('N',5)],\
'Gene':[find_any(['LSU'])]})
ssu = Filter('SSU',{None:[exclude_if_more('N',5)],\
'Gene':[find_any(['SSU'])]})
self.Gene_org = Organizer([lsu,ssu])
self.Ab_seq = ['aa','bb','abab','cc','']
self.seq1 = Sequence('ACGU',{'Gene':'LSU'})
self.seq2 = Sequence('ACGUACGU',{'Gene':'SSU'})
self.seq3 = Sequence('ACGUNNNNNN',{'Gene':'LSU'})
self.seq4 = Sequence('ACGUNNNNNN',{'Gene':'SSU'})
self.seq5 = Sequence('ACGU',{})
self.seq6 = Sequence('',{})
self.seq7 = Sequence('ACGU',{'Gene':'unit'})
self.Gene_seq = [self.seq1,self.seq2,self.seq3,self.seq4,\
self.seq5,self.seq6,self.seq7]
f = Filter('valid',{None:[find_all('AGCU'),find_no('N')],\
'Gene':[find_any(['LSU'])]})
self.Mult_func_org = Organizer([f])
def test_init_empty(self):
"""Empty Organizer should init correctly"""
org = self.Empty
self.assertEqual(len(org),0)
def test_init_full(self):
"""Organizer should init correctly with multiple functions"""
org = Organizer([self.a,self.b])
self.assertEqual(org[0],self.a)
self.assertEqual(org[1],self.b)
self.assertEqual(len(org),2)
def test_empty_org_empty_list(self):
"""Empty Organizer should return [] when applied to []"""
org = self.Empty
l = []
self.assertEqual(org(l),[])
def test_empty_org_full_list(self):
"""Empty organizer, applied to full list, should return the original"""
org = self.Empty
l = self.Ab_seq
obs = org(l)
self.assertEqual(obs,[l])
self.assertEqual(obs[0].Groups,[None])
def test_full_org_empty_list(self):
"""Organizer should return [] when applied to []"""
org = self.Ab_org
l = []
obs = org(l)
self.assertEqual(obs,[])
def test_full_org_full_list(self):
"""Organizer should return correct organization"""
org = self.Ab_org
l = self.Ab_seq
obs = org(l)
obs.sort()
exp = [['aa','abab'],['bb'],['cc','']]
self.assertEqual(obs,exp)
self.assertEqual(obs[0].Groups,['a'])
self.assertEqual(obs[1].Groups,['b'])
self.assertEqual(obs[2].Groups,[None])
def test_double_org_empty_list(self):
"""Organizer should return [] when applied to []"""
org = self.Gene_org
l = []
obs = org(l)
self.assertEqual(obs,[])
def test_double_org_full_list(self):
"""Organizer should handle multiple filters correctly"""
org = self.Gene_org
l = self.Gene_seq
obs = org(l)
obs.sort()
exp = [[self.seq1],[self.seq2],[self.seq3,self.seq4,\
self.seq5,self.seq6,self.seq7]]
self.assertEqual(obs,exp)
self.assertEqual(obs[0].Groups,['LSU'])
self.assertEqual(obs[1].Groups,['SSU'])
self.assertEqual(obs[2].Groups,[None])
def test_multiple_func(self):
"""Organizer should handle filter with multiple functions correctly"""
org = self.Mult_func_org
l = self.Gene_seq
obs = org(l)
obs.sort()
exp = [[self.seq1],[self.seq2,self.seq3,self.seq4,self.seq5,\
self.seq6,self.seq7]]
self.assertEqual(obs,exp)
self.assertEqual(obs[0].Groups,['valid'])
self.assertEqual(obs[1].Groups,[None])
class organizerTests(TestCase):
"""Tests for module-level functions"""
def test_regroup(self):
"""regroup: should group items with identical hierarchy info together"""
g1 = GroupList([1,2,3],['a'])
g2 = GroupList([4,5,6],['b'])
g3 = GroupList([7,7,7],['a','b'])
g4 = GroupList([8,8,8],['a'])
all = [g1, g2, g3, g4]
self.assertEqualItems(regroup(all), [[1,2,3,8,8,8],[7,7,7],[4,5,6]])
if __name__ == "__main__":
main()
#!/usr/bin/env python
# Author: Greg Caporaso (gregcaporaso@gmail.com)
# test_recode_alignment.py
"""Tests of functions in cogent.util.recode_alignment.
File created on 19 Jun 2007.
"""
from __future__ import division
from numpy import array
from cogent import LoadSeqs
from cogent.util.unit_test import TestCase, main
from cogent.core.alignment import DenseAlignment
from cogent.evolve.models import DSO78_matrix, DSO78_freqs
from cogent.evolve.substitution_model import SubstitutionModel
from cogent.core.alphabet import Alphabet
from cogent.app.gctmpca import gctmpca_aa_order,\
default_gctmpca_aa_sub_matrix
from cogent.util.recode_alignment import alphabets, recode_dense_alignment,\
build_alphabet_map, recode_freq_vector, recode_alignment,\
recode_counts_and_freqs, recode_count_matrix
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "gregcaporaso@gmail.com"
__status__ = "Beta"
class RecodeAlignmentTests(TestCase):
""" Tests of functions in recode_alignment.py
These functions will probably move at some point, and the unit tests
will move with them.
"""
def setUp(self):
""" Initialize some variables for the tests """
self.canonical_abbrevs = 'ACDEFGHIKLMNPQRSTVWY'
self.ambiguous_abbrevs = 'BXZ'
self.all_to_a = [('A',self.canonical_abbrevs+\
self.ambiguous_abbrevs)]
self.charge_2 = alphabets['charge_2']
self.hydropathy_3 = alphabets['hydropathy_3']
self.orig = alphabets['orig']
self.aln = DenseAlignment(\
data={'1':'CDDFBXZ', '2':'CDD-BXZ', '3':'AAAASS-'})
self.aln2 = LoadSeqs(\
data={'1':'CDDFBXZ', '2':'CDD-BXZ', '3':'AAAASS-'})
def test_build_alphabet_map_handles_bad_data(self):
"""build_alphabet_map: bad data raises error """
self.assertRaises(ValueError,build_alphabet_map)
self.assertRaises(ValueError,build_alphabet_map,'not_a_valid_id')
self.assertRaises(ValueError,build_alphabet_map,\
alphabet_def=['A','BCD','B','EFG'])
def test_build_alphabet_map_w_alphabet_id(self):
"""build_alphabet_map: returns correct dict when given alphabet_id
"""
expected = dict([\
('G','G'), ('A','G'), ('V','G'), ('L','G'), ('I','G'),\
('S','G'), ('P','G'), ('T','G'), ('C','G'), ('N','G'), ('D','G'),\
('X','G'), ('B','G'), ('M','M'), ('F','M'), ('Y','M'), ('W','M'),\
('Q','M'), ('K','M'), ('H','M'), ('R','M'), ('E','M'), ('Z','M')])
self.assertEqual(build_alphabet_map('size_2'),expected)
self.assertEqual(build_alphabet_map('charge_3')['E'],'D')
self.assertEqual(build_alphabet_map('charge_3')['B'],'A')
self.assertEqual(build_alphabet_map('charge_3')['K'],'K')
def test_build_alphabet_map_w_alphabet_def(self):
"""build_alphabet_map: returns correct dict when given alphabet_def
"""
expected = dict([\
('G','S'), ('A','S'), ('V','S'), ('L','S'), ('I','S'),\
('S','S'), ('P','S'), ('T','S'), ('C','S'), ('N','S'), ('D','S'),\
('X','S'), ('B','S'), ('M','L'), ('F','L'), ('Y','L'), ('W','L'),\
('Q','L'), ('K','L'), ('H','L'), ('R','L'), ('E','L'), ('Z','L')])
self.assertEqual(build_alphabet_map(alphabet_def=\
[('S','GAVLISPTCNDXB'),('L','MFYWQKHREZ')]),expected)
def test_build_alphabet_map_handles_all_ids_and_defs_wo_error(self):
"""build_alphabet_map: handles all pre-defined alphabets w/o error"""
for alphabet_id, alphabet_def in alphabets.items():
try:
build_alphabet_map(alphabet_id=alphabet_id)
except ValueError:
raise AssertionError, "Failed on id: %s" % alphabet_id
try:
build_alphabet_map(alphabet_def=alphabet_def)
except ValueError:
raise AssertionError, "Failed on def: %s" % str(alphabet_def)
def test_recode_dense_alignment_handles_all_ids_and_defs_wo_error(self):
"""recode_dense_alignment: handles pre-defined alphabets w/o error"""
for alphabet_id, alphabet_def in alphabets.items():
try:
recode_dense_alignment(self.aln,alphabet_id=alphabet_id)
except ValueError:
raise AssertionError, "Failed on id: %s" % alphabet_id
try:
recode_dense_alignment(self.aln,alphabet_def=alphabet_def)
except ValueError:
raise AssertionError, "Failed on def: %s" % str(alphabet_def)
def test_recode_dense_alignment_leaves_original_alignment_intact(self):
"""recode_dense_alignment: leaves input alignment intact
"""
# provided with alphabet_id
actual = recode_dense_alignment(self.aln, alphabet_id='charge_2')
self.assertNotEqual(actual,self.aln)
# provided with alphabet_def
actual = recode_dense_alignment(self.aln, alphabet_def=self.charge_2)
self.assertNotEqual(actual,self.aln)
def test_recode_dense_alignment(self):
"""recode_dense_alignment: recode alignment to charge_2 alpha works
"""
expected_c2 = DenseAlignment(data=\
{'1':'AKKAKAK','2':'AKK-KAK','3':'AAAAAA-'})
expected_h3 = DenseAlignment(data=\
{'1':'PRRPRPR','2':'PRR-RPR','3':'PPPPYY-'})
expected_aa = DenseAlignment(data=\
{'1':'AAAAAAA','2':'AAA-AAA','3':'AAAAAA-'})
# provided with alphabet_id
actual = recode_dense_alignment(self.aln, alphabet_id='charge_2')
self.assertEqual(actual,expected_c2)
# provided with alphabet_def
actual = recode_dense_alignment(self.aln, alphabet_def=self.charge_2)
self.assertEqual(actual,expected_c2)
# different alphabet
actual = recode_dense_alignment(self.aln, alphabet_id='hydropathy_3')
self.assertEqual(actual,expected_h3)
actual = recode_dense_alignment(self.aln,\
alphabet_def=self.hydropathy_3)
self.assertEqual(actual,expected_h3)
# different alphabet
actual = recode_dense_alignment(self.aln, alphabet_def=self.all_to_a)
self.assertEqual(actual,expected_aa)
# original characters which aren't remapped are left in their original state
actual = recode_dense_alignment(self.aln, alphabet_def=[('a','b')])
self.assertEqual(actual,self.aln)
# non-alphabetic character mapped same as alphabetic characters
actual = recode_dense_alignment(self.aln, alphabet_def=[('.','-')])
expected = DenseAlignment(\
data={'1':'CDDFBXZ', '2':'CDD.BXZ', '3':'AAAASS.'})
self.assertEqual(actual,expected)
def test_recode_dense_alignment_to_orig(self):
"""recode_dense_alignment: recode aln to orig returns original aln
"""
# provided with alphabet_id
self.assertEqual(recode_dense_alignment(\
self.aln, alphabet_id='orig'), self.aln)
# provided with alphabet_def
self.assertEqual(recode_dense_alignment(\
self.aln, alphabet_def=self.orig), self.aln)
# THE FUNCTION THAT THESE TESTS APPLY TO ONLY EXISTS AS A STUB RIGHT
# NOW -- WILL UNCOMMENT THE TESTS WHEN THE FUNCTION IS READY.
# --GREG C. (11/19/08)
# def test_recode_alignment(self):
# """recode_alignment: recode alignment works as expected
# """
# expected_c2 = LoadSeqs(data=\
# {'1':'AKKAKAK','2':'AKK-KAK','3':'AAAAAA-'})
# expected_h3 = LoadSeqs(data=\
# {'1':'PRRPRPR','2':'PRR-RPR','3':'PPPPYY-'})
# expected_aa = LoadSeqs(data=\
# {'1':'AAAAAAA','2':'AAA-AAA','3':'AAAAAA-'})
#
# # provided with alphabet_id
# actual = recode_alignment(self.aln2, alphabet_id='charge_2')
# self.assertEqual(actual,expected_c2)
# # provided with alphabet_def
# actual = recode_alignment(self.aln2, alphabet_def=self.charge_2)
# self.assertEqual(actual,expected_c2)
#
# # different alphabet
# actual = recode_alignment(self.aln2, alphabet_id='hydropathy_3')
# self.assertEqual(actual,expected_h3)
# actual = recode_alignment(self.aln2,\
# alphabet_def=self.hydropathy_3)
# self.assertEqual(actual,expected_h3)
#
# # different alphabet
# actual = recode_alignment(self.aln2, alphabet_def=self.all_to_a)
# self.assertEqual(actual,expected_aa)
#
# # original characters which aren't remapped are left in their original state
# actual = recode_alignment(self.aln2, alphabet_def=[('a','b')])
# self.assertEqual(actual,self.aln2)
#
# # non-alphabetic character mapped same as alphabetic characters
# actual = recode_alignment(self.aln2, alphabet_def=[('.','-')])
# expected = LoadSeqs(\
# data={'1':'CDDFBXZ', '2':'CDD.BXZ', '3':'AAAASS.'})
# self.assertEqual(actual,expected)
#
# def test_recode_alignment_to_orig(self):
# """recode_alignment: recode aln to orig returns original aln
# """
# # provided with alphabet_id
# self.assertEqual(recode_alignment(\
# self.aln2, alphabet_id='orig'), self.aln2)
# # provided with alphabet_def
# self.assertEqual(recode_alignment(\
# self.aln2, alphabet_def=self.orig), self.aln2)
#
# def test_recode_alignment_leaves_original_alignment_intact(self):
# """recode_alignment: leaves input alignment intact
# """
# # provided with alphabet_id
# actual = recode_alignment(self.aln2, alphabet_id='charge_2')
# self.assertNotEqual(actual,self.aln2)
# # provided with alphabet_def
# actual = recode_alignment(self.aln2, alphabet_def=self.charge_2)
# self.assertNotEqual(actual,self.aln2)
def test_recode_freq_vector(self):
"""recode_freq_vector: bg freqs updated to reflect recoded alphabet
"""
freqs = {'A':0.21, 'E':0.29, 'C':0.05, 'D':0.45}
a_def = [('A','AEC'),('E','D')]
expected = {'A':0.55, 'E':0.45}
self.assertFloatEqual(recode_freq_vector(a_def,freqs),\
expected)
# reversal of alphabet
freqs = {'A':0.21, 'E':0.29, 'C':0.05, 'D':0.45}
a_def = [('A','D'),('E','C'),('C','E'),('D','A')]
expected = {'A':0.45,'E':0.05,'C':0.29,'D':0.21}
self.assertFloatEqual(recode_freq_vector(a_def,freqs),\
expected)
# no change in freqs (old alphabet = new alphabet)
freqs = {'A':0.21, 'E':0.29, 'C':0.05, 'D':0.45}
a_def = [('A','A'),('E','E'),('C','C'),('D','D')]
self.assertFloatEqual(recode_freq_vector(a_def,freqs),\
freqs)
freqs = {'A':0.21, 'E':0.29, 'C':0.05, 'D':0.45}
a_def = [('X','AEC'),('Y','D')]
expected = {'X':0.55, 'Y':0.45}
self.assertFloatEqual(recode_freq_vector(a_def,freqs),\
expected)
def test_recode_freq_vector_ignores(self):
"""recode_freq_vector: ignored chars are ignored
"""
freqs = {'A':0.21, 'B':0.29, 'C':0.05, 'D':0.45,'X':0.22,'Z':0.5}
a_def = [('A','A'),('B','B'),('C','C'),('D','D'),('X','X'),('Z','Z')]
expected = {'A':0.21,'C':0.05, 'D':0.45}
self.assertFloatEqual(recode_freq_vector(a_def,freqs),\
expected)
freqs = {'D':0.21, 'E':0.29, 'N':0.05,\
'Q':0.45,'B':0.26,'Z':0.74,'X':1.0}
a_def = [('D','DEN'),('Q','Q')]
expected = {'D':0.55, 'Q':0.45}
self.assertFloatEqual(recode_freq_vector(a_def,freqs),\
expected)
class RecodeMatrixTests(TestCase):
""" Tests of substitution matrix recoding. """
def setUp(self):
""" Create variables for use in the tests """
self.m1 = [[0,4,1,3,5],[4,0,2,4,6],[1,2,0,7,8],[3,4,7,0,9],[5,6,8,9,0]]
self.recoded_m1 =\
[[0,0,21,0,0],[0,0,0,0,0],[21,0,0,0,0],[0,0,0,0,0],[0,0,0,0,0]]
self.aa_order1 = 'DELIV'
self.input_freqs1 = dict(zip(self.aa_order1,[0.2]*5))
self.alphabet1 = [('D','DE'),('L','LIV')]
#create_recoded_rate_matrix(alphabets['a1_4'])
self.m2 = [[0,8,6,5,1],[8,0,7,3,0],[6,7,0,4,2],[5,3,4,0,0],[1,0,2,0,0]]
self.recoded_m2 =\
[[0,0,21,0,1],[0,0,0,0,0],[21,0,0,0,2],[0,0,0,0,0],[1,0,2,0,0]]
self.aa_order2 = 'DELIC'
self.input_freqs2 = dict(zip(self.aa_order2,[0.2]*5))
self.alphabet2 = [('D','DE'),('L','LI'),('C','C')]
self.alphabet2_w_ambig = [('D','DEX'),('L','LIB'),('C','CZ')]
def test_recode_counts_and_freqs(self):
"""recode_counts_and_freqs: functions as expected
"""
alphabet = alphabets['charge_his_3']
aa_order = 'ACDEFGHIKLMNPQRSTVWY'
actual = recode_counts_and_freqs(alphabet)
expected_matrix = recode_count_matrix(alphabet,\
count_matrix=DSO78_matrix,aa_order=aa_order)
expected_freqs = {}.fromkeys(aa_order,0.0)
expected_freqs.update(recode_freq_vector(alphabet,DSO78_freqs))
expected = (expected_matrix,expected_freqs)
self.assertEqual(actual,expected)
def test_recode_count_matrix_2_states(self):
"""recode_count_matrix: returns correct result with 2-state alphabet
"""
actual = recode_count_matrix(self.alphabet1,self.m1,self.aa_order1)
expected = self.recoded_m1
self.assertEqual(actual,expected)
def test_recode_count_matrix_3_states(self):
"""recode_count_matrix: returns correct result with 3-state alphabet
"""
actual = recode_count_matrix(self.alphabet2,self.m2,self.aa_order2)
expected = self.recoded_m2
self.assertEqual(actual,expected)
def test_recode_count_matrix_3_states_ambig_ignored(self):
"""recode_count_matrix: correct result w 3-state alphabet w ambig chars
"""
actual =\
recode_count_matrix(self.alphabet2_w_ambig,self.m2,self.aa_order2)
expected = self.recoded_m2
self.assertEqual(actual,expected)
def test_recode_count_matrix_no_change(self):
"""recode_count_matrix: no changes applied when they shouldn't be
"""
# recoding recoded matrices
actual =\
recode_count_matrix(self.alphabet1,self.recoded_m1,self.aa_order1)
expected = self.recoded_m1
self.assertEqual(actual,expected)
actual =\
recode_count_matrix(self.alphabet2,self.recoded_m2,self.aa_order2)
expected = self.recoded_m2
self.assertEqual(actual,expected)
if __name__ == "__main__":
main()
Data Manipulation using ``Table``
=================================
.. sectionauthor:: Gavin Huttley
..
Copyright 2007-2009, The Cogent Project
Credits Gavin Huttley, Felix Schill
License, GPL
version, 1.3.0.dev
Maintainer, Gavin Huttley
Email, gavin.huttley@anu.edu.au
Status, Production
The toolkit has a ``Table`` object for manipulating tabular data. A ``Table`` behaves much like an ordered two-dimensional dictionary or tuple, and has flexible output formatting that is useful for exporting data to external applications. Importantly, via the reStructuredText format one can generate HTML or LaTeX formatted tables. The ``table`` module is located within ``cogent.util``. The ``LoadTable`` convenience function is provided as a top-level ``cogent`` import.
Table creation
--------------
Tables can be created directly using the ``Table`` class itself, or with ``LoadTable``, a convenience function that also handles loading from files. We import both here:
.. doctest::
>>> from cogent import LoadTable
>>> from cogent.util.table import Table
First, if you try to create a ``Table`` without any data, it raises a ``RuntimeError``.
.. doctest::
>>> t = Table()
Traceback (most recent call last):
RuntimeError: header and rows must be provided to Table
>>> t = Table(header=[], rows=[])
Traceback (most recent call last):
RuntimeError: header and rows must be provided to Table
Let's first create a very simple, rather nonsensical, table. Creating a table requires a header series and a 2D series (of type ``tuple``, ``list`` or ``dict``).
.. doctest::
>>> column_headings = ['Journal', 'Impact']
The string "Journal" will become the first column heading, "Impact" the second column heading. The data are,
.. doctest::
>>> rows = [['INT J PARASITOL', 2.9],
... ['J MED ENTOMOL', 1.4],
... ['Med Vet Entomol', 1.0],
... ['INSECT MOL BIOL', 2.85],
... ['J AM MOSQUITO CONTR', 0.811],
... ['MOL PHYLOGENET EVOL', 2.8],
... ['HEREDITY', 1.99e+0],
... ['AM J TROP MED HYG', 2.105],
... ['MIL MED', 0.605],
... ['MED J AUSTRALIA', 1.736]]
We create the simplest of tables.
.. doctest::
>>> t = Table(header = column_headings, rows = rows)
>>> print t
=============================
Journal Impact
-----------------------------
INT J PARASITOL 2.9000
J MED ENTOMOL 1.4000
Med Vet Entomol 1.0000
INSECT MOL BIOL 2.8500
J AM MOSQUITO CONTR 0.8110
MOL PHYLOGENET EVOL 2.8000
HEREDITY 1.9900
AM J TROP MED HYG 2.1050
MIL MED 0.6050
MED J AUSTRALIA 1.7360
-----------------------------
The format above is referred to as 'simple' format in the documentation. Notice that the numbers in this table are displayed with 4 decimal places, even though the original data had at most 3 decimal places of precision. By default, ``Table`` formats floating point values to 4 decimal places when you do ``str(table)`` or print the table; values passed in as strings are displayed unchanged.
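As a minimal sketch of that default display, assuming ``Table`` applies ordinary ``%``-style formatting with a precision of 4 digits (which the output above is consistent with), a hypothetical ``format_cell`` helper could look like this. It is an illustration only and is not part of ``cogent.util.table``.

```python
def format_cell(value, digits=4):
    """Hypothetical helper: show floats with a fixed number of decimal
    places and render every other value (including strings) via str()."""
    if isinstance(value, float):
        return "%.*f" % (digits, value)
    return str(value)

impacts = [2.9, 1.4, 0.811, 1.99e+0]
print([format_cell(v) for v in impacts])
# ['2.9000', '1.4000', '0.8110', '1.9900']
print(format_cell('0'))
# 0
```

Because strings pass through untouched, a value such as ``'0'`` is shown as ``0`` rather than ``0.0000``.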
There are several things we might want to specify when creating a table: the precision of floating point numbers (integer argument ``digits``), the spacing between columns (an integer or an actual string of whitespace, ``space``), a title (``title``) and a legend (``legend``). Let's modify some of these and provide a title and legend.
.. doctest::
>>> t = Table(column_headings, rows, title='Journal impact factors', legend='From ISI',
... digits=2, space=' ')
>>> print t
Journal impact factors
=================================
Journal Impact
---------------------------------
INT J PARASITOL 2.90
J MED ENTOMOL 1.40
Med Vet Entomol 1.00
INSECT MOL BIOL 2.85
J AM MOSQUITO CONTR 0.81
MOL PHYLOGENET EVOL 2.80
HEREDITY 1.99
AM J TROP MED HYG 2.10
MIL MED 0.60
MED J AUSTRALIA 1.74
---------------------------------
From ISI
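Two of the values above, 2.105 and 0.605, are displayed as 2.10 and 0.60 rather than 2.11 and 0.61. Assuming ``digits`` is applied with ordinary ``%.2f`` formatting, this is expected: neither decimal literal can be stored exactly as a binary floating point number, and the stored values fall just below the halfway point. The standard library ``decimal`` module shows the exact stored values:

```python
from decimal import Decimal

for value in (2.105, 0.605, 1.736):
    # Decimal(float) reveals the exact binary value behind the literal
    print("%.2f  %s" % (value, Decimal(value)))
```

The same effect applies to any table column formatted with a fixed number of decimal places.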
.. note:: You can also use the representation of a table for a quick summary.
.. doctest::
>>> t
Table(numrows=10, numcols=2, header=['Journal', 'Impact'], rows=[['INT J PARASITOL', 2.9000],..])
The ``Table`` class cannot handle arbitrary Python objects unless they are passed in as strings. Note that in this case we pass the column headings list in directly, and that the handling of missing data can be specified explicitly (here ``None`` is displayed as ``*``).
.. doctest::
>>> t2 = Table(['abcd', 'data'], [[str(range(1,6)), '0'],
... ['x', 5.0], ['y', None]],
... missing_data='*')
>>> print t2
=========================
abcd data
-------------------------
[1, 2, 3, 4, 5] 0
x 5.0000
y *
-------------------------
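The substitution of ``missing_data`` for ``None`` can be pictured with the following sketch. The ``render_rows`` helper is hypothetical and only mirrors the output above. Under Python 3, ``str(range(1, 6))`` gives ``'range(1, 6)'``, so the sketch uses ``list(range(1, 6))`` to reproduce the same cell.

```python
def render_rows(rows, missing_data="*", digits=4):
    """Hypothetical helper: format cells, replacing None with a marker."""
    rendered = []
    for row in rows:
        cells = []
        for value in row:
            if value is None:
                cells.append(missing_data)  # missing value marker
            elif isinstance(value, float):
                cells.append("%.*f" % (digits, value))
            else:
                cells.append(str(value))  # strings are shown as given
        rendered.append(cells)
    return rendered

rows = [[str(list(range(1, 6))), '0'], ['x', 5.0], ['y', None]]
print(render_rows(rows))
# [['[1, 2, 3, 4, 5]', '0'], ['x', '5.0000'], ['y', '*']]
```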
Table column headings can be accessed from the ``table.Header`` property:
.. doctest::
>>> assert t2.Header == ['abcd', 'data']
and this is immutable (cannot be changed).
.. doctest::
>>> t2.Header[1] = 'Data'
Traceback (most recent call last):
RuntimeError: Table Header is immutable, use withNewHeader
If you want to change the Header, use the ``withNewHeader`` method. This can be done one column at a time, or as a batch. The returned Table is identical aside from the modified column labels.
.. doctest::
>>> mod_header = t2.withNewHeader('abcd', 'ABCD')
>>> assert mod_header.Header == ['ABCD', 'data']
>>> mod_header = t2.withNewHeader(['abcd', 'data'], ['ABCD', 'DATA'])
>>> print mod_header
=========================
ABCD DATA
-------------------------
[1, 2, 3, 4, 5] 0
x 5.0000
y *
-------------------------
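One way to get the same copy-on-rename behaviour in plain Python is to store the header as a tuple and return a new object from the renaming method. ``MiniTable`` and ``with_new_header`` below are hypothetical names used for illustration. Unlike ``Table``, which raises ``RuntimeError``, assigning into a tuple raises ``TypeError``.

```python
class MiniTable(object):
    """Hypothetical illustration, not cogent.util.table.Table."""

    def __init__(self, header, rows):
        self._header = tuple(header)  # tuple: item assignment raises TypeError
        self.rows = [list(row) for row in rows]

    @property
    def header(self):
        return self._header

    def with_new_header(self, old, new):
        """Return a new table with columns renamed; self is unchanged.
        Accepts a single name or parallel lists of old and new names."""
        if isinstance(old, str):
            old, new = [old], [new]
        renames = dict(zip(old, new))
        return MiniTable([renames.get(h, h) for h in self._header], self.rows)


t2 = MiniTable(['abcd', 'data'], [['x', 5.0], ['y', None]])
print(t2.with_new_header('abcd', 'ABCD').header)  # ('ABCD', 'data')
print(t2.header)                                  # ('abcd', 'data')
```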
Tables may also be created from 2-dimensional dictionaries. In this case, special capabilities are provided to enforce printing rows in a particular order.
.. doctest::
>>> d2D={'edge.parent': {'NineBande': 'root', 'edge.1': 'root',
... 'DogFaced': 'root', 'Human': 'edge.0', 'edge.0': 'edge.1',
... 'Mouse': 'edge.1', 'HowlerMon': 'edge.0'}, 'x': {'NineBande': 1.0,
... 'edge.1': 1.0, 'DogFaced': 1.0, 'Human': 1.0, 'edge.0': 1.0,
... 'Mouse': 1.0, 'HowlerMon': 1.0}, 'length': {'NineBande': 4.0,
... 'edge.1': 4.0, 'DogFaced': 4.0, 'Human': 4.0, 'edge.0': 4.0,
... 'Mouse': 4.0, 'HowlerMon': 4.0}, 'y': {'NineBande': 3.0, 'edge.1': 3.0,
... 'DogFaced': 3.0, 'Human': 3.0, 'edge.0': 3.0, 'Mouse': 3.0,
... 'HowlerMon': 3.0}, 'z': {'NineBande': 6.0, 'edge.1': 6.0,
... 'DogFaced': 6.0, 'Human': 6.0, 'edge.0': 6.0, 'Mouse': 6.0,
... 'HowlerMon': 6.0},
... 'edge.name': ['Human', 'HowlerMon', 'Mouse', 'NineBande', 'DogFaced',
... 'edge.0', 'edge.1']}
>>> row_order = d2D['edge.name']
>>> d2D['edge.name'] = dict(zip(row_order, row_order))
>>> t3 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D,
... row_order = row_order, missing_data='*', space=8, max_width = 50,
... row_ids = True, title = 'My Title',
... legend = 'Legend: this is a nonsense example.')
>>> print t3
My Title
==========================================
edge.name edge.parent length
------------------------------------------
Human edge.0 4.0000
HowlerMon edge.0 4.0000
Mouse edge.1 4.0000
NineBande root 4.0000
DogFaced root 4.0000
edge.0 edge.1 4.0000
edge.1 root 4.0000
------------------------------------------
continued: My Title
=====================================
edge.name x y
-------------------------------------
Human 1.0000 3.0000
HowlerMon 1.0000 3.0000
Mouse 1.0000 3.0000
NineBande 1.0000 3.0000
DogFaced 1.0000 3.0000
edge.0 1.0000 3.0000
edge.1 1.0000 3.0000
-------------------------------------
continued: My Title
=======================
edge.name z
-----------------------
Human 6.0000
HowlerMon 6.0000
Mouse 6.0000
NineBande 6.0000
DogFaced 6.0000
edge.0 6.0000
edge.1 6.0000
-----------------------
Legend: this is a nonsense example.
In the above we specify a maximum width for the table, and also specify row identifiers using ``row_ids`` (the integer corresponding to the column at which data begin; preceding columns are taken as the identifiers). The maximum width forces the table to wrap when the simple text format is used; wrapping does not occur for any other format. The ``row_ids`` serve as keys for slicing the table by row and, being identifiers, are repeated in each wrapped sub-table.
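The ``d2D`` dictionary above is keyed first by column and then by row, and ``row_order`` fixes the row sequence. A hypothetical helper that flattens such a structure into the row lists used elsewhere in this document might look like this. It is a sketch of the idea, not the code ``Table`` uses internally; absent entries become ``None``, which would then be shown using ``missing_data``.

```python
def dict2d_to_rows(header, columns, row_order):
    """Hypothetical helper: convert {column: {row_key: value}} into a list
    of rows, one per key in row_order, with cells ordered as in header."""
    return [[columns[col].get(key) for col in header] for key in row_order]

columns = {'edge.name': {'Human': 'Human', 'Mouse': 'Mouse'},
           'length': {'Human': 4.0, 'Mouse': 4.0},
           'z': {'Human': 6.0}}
print(dict2d_to_rows(['edge.name', 'length', 'z'], columns, ['Mouse', 'Human']))
# [['Mouse', 4.0, None], ['Human', 4.0, 6.0]]
```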
Wrapping generates neat-looking tables whether or not you index the table rows, as we demonstrate here:
.. doctest::
>>> from cogent import LoadTable
>>> h = ['A/C', 'A/G', 'A/T', 'C/A']
>>> rows = [[0.0425, 0.1424, 0.0226, 0.0391]]
>>> wrap_table = LoadTable(header=h, rows=rows, max_width=30)
>>> print wrap_table
==============================
A/C A/G A/T
------------------------------
0.0425 0.1424 0.0226
------------------------------
continued:
==========
C/A
----------
0.0391
----------
>>> wrap_table = LoadTable(header=h, rows=rows, max_width=30,
... row_ids=True)
>>> print wrap_table
==========================
A/C A/G A/T
--------------------------
0.0425 0.1424 0.0226
--------------------------
continued:
================
A/C C/A
----------------
0.0425 0.0391
----------------
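The wrapping above can be sketched as a greedy grouping of columns. The ``wrap_columns`` helper is hypothetical: it assumes each column has a fixed display width (here 6, the length of ``0.0425``) and that columns are separated by ``space`` characters, and when ``row_ids`` is true it repeats the first column at the start of every sub-table. With four columns and ``max_width=30`` it reproduces both splits shown above, although ``Table``'s own width calculation may differ in detail.

```python
def wrap_columns(widths, max_width, space=4, row_ids=False):
    """Hypothetical sketch: group column indices into sub-tables whose width
    (column widths plus separating spaces) does not exceed max_width.
    With row_ids=True, column 0 is repeated at the start of every group."""
    prefix = [0] if row_ids else []
    start = 1 if row_ids else 0

    def total_width(cols):
        return sum(widths[c] for c in cols) + space * max(len(cols) - 1, 0)

    groups, current = [], list(prefix)
    for col in range(start, len(widths)):
        # start a new sub-table if this column would make it too wide,
        # but always keep at least one data column in each group
        if len(current) > len(prefix) and total_width(current + [col]) > max_width:
            groups.append(current)
            current = list(prefix)
        current.append(col)
    if len(current) > len(prefix) or not groups:
        groups.append(current)
    return groups

print(wrap_columns([6, 6, 6, 6], max_width=30))                # [[0, 1, 2], [3]]
print(wrap_columns([6, 6, 6, 6], max_width=30, row_ids=True))  # [[0, 1, 2], [0, 3]]
```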
We can also customise the formatting of individual columns.
.. doctest::
>>> rows = (('NP_003077_hs_mm_rn_dna', 'Con', 2.5386013224378985),
... ('NP_004893_hs_mm_rn_dna', 'Con', 0.12135142635634111e+06),
... ('NP_005079_hs_mm_rn_dna', 'Con', 0.95165949788861326e+07),
... ('NP_005500_hs_mm_rn_dna', 'Con', 0.73827030202664901e-07),
... ('NP_055852_hs_mm_rn_dna', 'Con', 1.0933217708952725e+07))
We first create a table and show the default formatting behaviour for ``Table``.
.. doctest::
>>> t46 = Table(['Gene', 'Type', 'LR'], rows)
>>> print t46
===============================================
Gene Type LR
-----------------------------------------------
NP_003077_hs_mm_rn_dna Con 2.5386
NP_004893_hs_mm_rn_dna Con 121351.4264
NP_005079_hs_mm_rn_dna Con 9516594.9789
NP_005500_hs_mm_rn_dna Con 0.0000
NP_055852_hs_mm_rn_dna Con 10933217.7090
-----------------------------------------------
We then format the ``LR`` column to use a scientific number format.
.. doctest::
>>> t46 = Table(['Gene', 'Type', 'LR'], rows)
>>> t46.setColumnFormat('LR', "%.4e")
>>> print t46
============================================
Gene Type LR
--------------------------------------------
NP_003077_hs_mm_rn_dna Con 2.5386e+00
NP_004893_hs_mm_rn_dna Con 1.2135e+05
NP_005079_hs_mm_rn_dna Con 9.5166e+06
NP_005500_hs_mm_rn_dna Con 7.3827e-08
NP_055852_hs_mm_rn_dna Con 1.0933e+07
--------------------------------------------
It is safe to directly modify certain attributes, such as the title, legend and white space separating columns, as we do here for ``t46``.
.. doctest::
>>> t46.Title = "A new title"
>>> t46.Legend = "A new legend"
>>> t46.Space = ' '
>>> print t46
A new title
========================================
Gene Type LR
----------------------------------------
NP_003077_hs_mm_rn_dna Con 2.5386e+00
NP_004893_hs_mm_rn_dna Con 1.2135e+05
NP_005079_hs_mm_rn_dna Con 9.5166e+06
NP_005500_hs_mm_rn_dna Con 7.3827e-08
NP_055852_hs_mm_rn_dna Con 1.0933e+07
----------------------------------------
A new legend
We can provide settings for multiple columns.
.. doctest::
>>> t3 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D,
... row_order = row_order)
>>> t3.setColumnFormat('x', "%.1e")
>>> t3.setColumnFormat('y', "%.2f")
>>> print t3
===============================================================
edge.name edge.parent length x y z
---------------------------------------------------------------
Human edge.0 4.0000 1.0e+00 3.00 6.0000
HowlerMon edge.0 4.0000 1.0e+00 3.00 6.0000
Mouse edge.1 4.0000 1.0e+00 3.00 6.0000
NineBande root 4.0000 1.0e+00 3.00 6.0000
DogFaced root 4.0000 1.0e+00 3.00 6.0000
edge.0 edge.1 4.0000 1.0e+00 3.00 6.0000
edge.1 root 4.0000 1.0e+00 3.00 6.0000
---------------------------------------------------------------
In some cases, the contents of a column can be of different types. In this instance, rather than passing a column template we pass a reference to a function that will handle this complexity. To illustrate this we will define a function that formats floating point numbers, but returns everything else as is.
.. doctest::
>>> def formatcol(value):
... if isinstance(value, float):
... val = "%.2f" % value
... else:
... val = str(value)
... return val
We apply this to a table with mixed string, integer and floating point data.
.. doctest::
>>> t6 = Table(['ColHead'], [['a'], [1], [0.3], ['cc']],
... column_templates = dict(ColHead=formatcol))
>>> print t6
=======
ColHead
-------
a
1
0.30
cc
-------
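Both kinds of column template shown in this section, a ``%``-format string such as ``"%.4e"`` and a callable such as ``formatcol``, can be handled by one dispatch on ``callable``. The ``apply_templates`` function below is a hypothetical sketch of that idea; it does not reproduce ``setColumnFormat`` or the ``column_templates`` argument themselves.

```python
def apply_templates(header, rows, templates):
    """Hypothetical sketch: format each cell using its column's template,
    which may be a %-format string such as "%.4e" or a callable."""
    formatted = []
    for row in rows:
        cells = []
        for name, value in zip(header, row):
            template = templates.get(name)
            if template is None:
                cells.append(str(value))       # no template: plain str()
            elif callable(template):
                cells.append(template(value))  # function handles mixed types
            else:
                cells.append(template % value)
        formatted.append(cells)
    return formatted

def formatcol(value):
    # same logic as the formatcol function above
    if isinstance(value, float):
        return "%.2f" % value
    return str(value)

print(apply_templates(['Gene', 'LR'], [['NP_004893', 121351.42635634111]],
                      {'LR': "%.4e"}))
# [['NP_004893', '1.2135e+05']]
print(apply_templates(['ColHead'], [['a'], [1], [0.3], ['cc']],
                      {'ColHead': formatcol}))
# [['a'], ['1'], ['0.30'], ['cc']]
```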
Representation of tables
^^^^^^^^^^^^^^^^^^^^^^^^
The representation formatting provides a quick overview of a table's dimensions and its contents. We show this for a table with 3 columns and multiple rows:
.. doctest::
>>> t46
Table(numrows=5, numcols=3, header=['Gene', 'Type', 'LR'], rows=[['NP_003077_hs_mm_rn_dna', 'Con', 2.5386],..])
and for a larger table:
.. doctest::
>>> t3
Table(numrows=7, numcols=6, header=['edge.name', 'edge.parent', 'length',..], rows=[['Human', 'edge.0', 4.0000,..],..])
.. note:: within a script use ``print repr(t3)`` to get the same representation.
Table output
------------
A ``Table`` can be output in multiple formats, including reStructuredText ('rest') and delimited text. These are obtained using the ``tostring`` method and its ``format`` argument as follows. Using table ``t`` from above,
.. doctest::
>>> print t.tostring(format='rest')
+------------------------------+
| Journal impact factors |
+---------------------+--------+
| Journal | Impact |
+=====================+========+
| INT J PARASITOL | 2.90 |
+---------------------+--------+
| J MED ENTOMOL | 1.40 |
+---------------------+--------+
| Med Vet Entomol | 1.00 |
+---------------------+--------+
| INSECT MOL BIOL | 2.85 |
+---------------------+--------+
| J AM MOSQUITO CONTR | 0.81 |
+---------------------+--------+
| MOL PHYLOGENET EVOL | 2.80 |
+---------------------+--------+
| HEREDITY | 1.99 |
+---------------------+--------+
| AM J TROP MED HYG | 2.10 |
+---------------------+--------+
| MIL MED | 0.60 |
+---------------------+--------+
| MED J AUSTRALIA | 1.74 |
+---------------------+--------+
| From ISI |
+------------------------------+
Arguments such as ``space`` have no effect in this case. The table may also be written to file in any of the available formats (latex, simple text, html, pickle) or using a custom separator (such as a comma or tab). This makes it convenient to get data into other applications (such as R or a spreadsheet program).
Here is the latex format. If a table has a title and legend, these are joined into the latex table caption (``t3`` has neither, so no caption appears here). We also provide optional arguments for the column alignment (first column left aligned, second column right aligned and remaining columns centred) and a label for table referencing.
.. doctest::
>>> print t3.tostring(format='tex', justify="lrcccc", label="table:example")
\begin{longtable}[htp!]{ l r c c c c }
\hline
\bf{edge.name} & \bf{edge.parent} & \bf{length} & \bf{x} & \bf{y} & \bf{z} \\
\hline
\hline
Human & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
HowlerMon & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
Mouse & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
NineBande & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
DogFaced & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
edge.0 & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
edge.1 & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
\hline
\label{table:example}
\end{longtable}
More complex latex table justification is also possible. Specifying the width of individual columns requires passing in a series (list or tuple) of justification commands. In the following example we use the ``p{3cm}`` command to set a specific width for the second column.
.. doctest::
>>> print t3.tostring(format='tex', justify=["l","p{3cm}","c","c","c","c"])
\begin{longtable}[htp!]{ l p{3cm} c c c c }
\hline
\bf{edge.name} & \bf{edge.parent} & \bf{length} & \bf{x} & \bf{y} & \bf{z} \\
\hline
\hline
Human & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
HowlerMon & edge.0 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
Mouse & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
NineBande & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
DogFaced & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
edge.0 & edge.1 & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
edge.1 & root & 4.0000 & 1.0e+00 & 3.00 & 6.0000 \\
\hline
\end{longtable}
>>> print t3.tostring(sep=',')
edge.name,edge.parent,length, x, y, z
Human, edge.0,4.0000,1.0e+00,3.00,6.0000
HowlerMon, edge.0,4.0000,1.0e+00,3.00,6.0000
Mouse, edge.1,4.0000,1.0e+00,3.00,6.0000
NineBande, root,4.0000,1.0e+00,3.00,6.0000
DogFaced, root,4.0000,1.0e+00,3.00,6.0000
edge.0, edge.1,4.0000,1.0e+00,3.00,6.0000
edge.1, root,4.0000,1.0e+00,3.00,6.0000
You can specify any standard text character that will work with your desired target. Useful separators are tabs ('\\t') or pipes ('\|'). If ``Table`` encounters the separator within a cell, it wraps the cell in quotes, a standard approach to facilitate import by other applications. We will illustrate this with ``t2``.
.. doctest::
>>> print t2.tostring(sep=', ')
abcd, data
"[1, 2, 3, 4, 5]", 0
x, 5.0000
y, *
Note that we introduced an extra space after the comma separator just to make the result more readable in this example.
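The quoting rule can be sketched in a few lines. The hypothetical ``join_row`` below wraps a cell in double quotes when it contains the separator; it is an illustration, not PyCogent's writer, and it does not escape quotes already inside a cell. For single-character separators, Python's standard ``csv`` module applies the same minimal-quoting convention and also handles embedded quotes.

```python
import csv
import io


def join_row(row, sep):
    """Join cells with sep, quoting any cell that contains sep."""
    cells = []
    for cell in row:
        text = str(cell)
        if sep in text:
            text = '"%s"' % text  # protect the embedded separator
        cells.append(text)
    return sep.join(cells)


line = join_row([str([1, 2, 3, 4, 5]), 0], ', ')

# The csv module (QUOTE_MINIMAL by default) quotes the same cell.
buf = io.StringIO()
csv.writer(buf).writerow([str([1, 2, 3, 4, 5]), 0])
csv_line = buf.getvalue()
```

``line`` reproduces the second line of the ``t2`` output above.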
Tables can also be written in PHYLIP distance matrix format.
.. doctest::
>>> rows = [['a', '', 0.088337278874079342, 0.18848582712597683,
... 0.44084000179091454], ['c', 0.088337278874079342, '',
... 0.088337278874079342, 0.44083999937417828], ['b', 0.18848582712597683,
... 0.088337278874079342, '', 0.44084000179090932], ['e',
... 0.44084000179091454, 0.44083999937417828, 0.44084000179090932, '']]
>>> header = ['seq1/2', 'a', 'c', 'b', 'e']
>>> dist = Table(rows = rows, header = header,
... row_ids = True)
>>> print dist.tostring(format = 'phylip')
4
a 0.0000 0.0883 0.1885 0.4408
c 0.0883 0.0000 0.0883 0.4408
b 0.1885 0.0883 0.0000 0.4408
e 0.4408 0.4408 0.4408 0.0000
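For reference, the PHYLIP layout shown above (a count line, then one line per taxon with its distances) can be produced with a short stdlib sketch. The function ``to_phylip`` is hypothetical; the 10-character name padding follows the traditional PHYLIP convention and may not match PyCogent's exact spacing. As in ``dist``, an empty string on the diagonal is written as ``0.0000``.

```python
def to_phylip(names, matrix, digits=4):
    """Return a square distance matrix as PHYLIP-style text."""
    lines = ['%d' % len(names)]
    for name, row in zip(names, matrix):
        # treat '' (the diagonal in the table above) as zero distance
        values = [0.0 if v == '' else float(v) for v in row]
        cells = ' '.join('%.*f' % (digits, v) for v in values)
        lines.append('%-10s%s' % (name, cells))
    return '\n'.join(lines)


names = ['a', 'c', 'b', 'e']
matrix = [
    ['', 0.088337278874079342, 0.18848582712597683, 0.44084000179091454],
    [0.088337278874079342, '', 0.088337278874079342, 0.44083999937417828],
    [0.18848582712597683, 0.088337278874079342, '', 0.44084000179090932],
    [0.44084000179091454, 0.44083999937417828, 0.44084000179090932, ''],
]
text = to_phylip(names, matrix)
```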
The ``tostring`` method also provides generic html generation via the reStructuredText format. The ``toRichHtmlTable`` method can be used to generate the html table element by itself, with greater control over formatting. Specifically, users can provide custom callback functions to the ``row_cell_func`` and ``header_cell_func`` arguments to control the formatting of table elements in detail, or use the simpler dictionary-based ``element_formatters`` approach. Using the ``dist`` table from above, we provide a callback that sets the background color of the diagonal cells. We first write a function that takes the cell value and its coordinates and returns the html formatted text.
.. doctest::
>>> def format_cell(value, row_num, col_num):
...     bgcolor=['', ' bgcolor="#0055ff"'][value=='']
...     return '<td%s>%s</td>' % (bgcolor, value)
We then call the method first without this argument and then with it.
.. doctest::
>>> straight_html = dist.toRichHtmlTable()
>>> print straight_html
seq1/2 a...
>>> rich_html = dist.toRichHtmlTable(row_cell_func=format_cell,
... compact=False)
>>> print rich_html
seq1/2
a
c
b
e
a
0.0883 ...
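To show how a ``row_cell_func`` style callback plugs into table generation, here is a minimal stdlib sketch of an html table builder. ``html_table`` is hypothetical and much simpler than ``toRichHtmlTable``; it only assumes, as in the example above, that the callback receives the cell value with its row and column numbers and returns a complete ``<td>`` element. Values are not html-escaped here.

```python
def html_table(header, rows, row_cell_func=None):
    """Build a bare html table, delegating each body cell to row_cell_func."""
    if row_cell_func is None:
        row_cell_func = lambda value, row_num, col_num: '<td>%s</td>' % value
    parts = ['<table>']
    parts.append('<tr>%s</tr>' % ''.join('<th>%s</th>' % h for h in header))
    for row_num, row in enumerate(rows):
        cells = [row_cell_func(value, row_num, col_num)
                 for col_num, value in enumerate(row)]
        parts.append('<tr>%s</tr>' % ''.join(cells))
    parts.append('</table>')
    return '\n'.join(parts)


def format_cell(value, row_num, col_num):
    # highlight empty cells, which lie on the diagonal of a distance matrix
    bgcolor = ['', ' bgcolor="#0055ff"'][value == '']
    return '<td%s>%s</td>' % (bgcolor, value)


html = html_table(['seq1/2', 'a', 'c'],
                  [['a', '', 0.0883], ['c', 0.0883, '']],
                  row_cell_func=format_cell)
```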
Exporting bedGraph format
-------------------------
One available export format is bedGraph_. This format can be used for viewing data as an annotation track in a genome browser. It allows spans of unequal length, and the exporter merges adjacent spans that have the same value. The format has many possible arguments that modify the appearance of the track in the genome browser. For this example we just create a simple data set.
.. doctest::
>>> rows = [['1', 100, 101, 1.123], ['1', 101, 102, 1.123],
... ['1', 102, 103, 1.123], ['1', 103, 104, 1.123],
... ['1', 104, 105, 1.123], ['1', 105, 106, 1.123],
... ['1', 106, 107, 1.123], ['1', 107, 108, 1.123],
... ['1', 108, 109, 1], ['1', 109, 110, 1],
... ['1', 110, 111, 1], ['1', 111, 112, 1],
... ['1', 112, 113, 1], ['1', 113, 114, 1],
... ['1', 114, 115, 1], ['1', 115, 116, 1],
... ['1', 116, 117, 1], ['1', 117, 118, 1],
... ['1', 118, 119, 2], ['1', 119, 120, 2],
... ['1', 120, 121, 2], ['1', 150, 151, 2],
... ['1', 151, 152, 2], ['1', 152, 153, 2],
... ['1', 153, 154, 2], ['1', 154, 155, 2],
... ['1', 155, 156, 2], ['1', 156, 157, 2],
... ['1', 157, 158, 2], ['1', 158, 159, 2],
... ['1', 159, 160, 2], ['1', 160, 161, 2]]
...
>>> bgraph = LoadTable(header=['chrom', 'start', 'end', 'value'],
... rows=rows)
...
>>> print bgraph.tostring(format='bedgraph', name='test track',
... graphType='bar', description='test of bedgraph', color=(255,0,0)) # doctest: +NORMALIZE_WHITESPACE
track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0 graphType=bar
1 100 108 1.12
1 108 118 1.0
1 118 161 2.0
The bedgraph formatter defaults to rounding values to 2 decimal places. You can adjust that precision using the ``digits`` argument.
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> print bgraph.tostring(format='bedgraph', name='test track',
... graphType='bar', description='test of bedgraph', color=(255,0,0),
... digits=0) # doctest: +NORMALIZE_WHITESPACE
track type=bedGraph name="test track" description="test of bedgraph" color=255,0,0 graphType=bar
1 100 118 1.0
1 118 161 2.0
.. note:: Writing files in bedgraph format is done by passing the same arguments, along with a filename, to ``writeToFile``, e.g. ``writeToFile(format='bedgraph', name='test track', description='test of bedgraph', color=(255,0,0))``.
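The merging step can be sketched as follows. This is a minimal stdlib illustration, not PyCogent's exporter: consecutive rows on the same chromosome whose values agree after rounding to ``digits`` places are collapsed into one span running from the first start to the last end. Matching the output above, where the ``2`` rows from 118 to 121 and from 150 to 161 became a single 118-161 span, the sketch does not check for gaps between rows.

```python
def merge_spans(rows, digits=2):
    """Collapse consecutive (chrom, start, end, value) rows with equal rounded values."""
    merged = []
    for chrom, start, end, value in rows:
        value = round(value, digits)
        if merged and merged[-1][0] == chrom and merged[-1][3] == value:
            # extend the previous span; gaps between rows are not checked
            merged[-1] = (chrom, merged[-1][1], end, value)
        else:
            merged.append((chrom, start, end, value))
    return merged


# the same data as the bgraph example above
positions = ([(i, 1.123) for i in range(100, 108)] +
             [(i, 1) for i in range(108, 118)] +
             [(i, 2) for i in list(range(118, 121)) + list(range(150, 161))])
rows = [('1', pos, pos + 1, value) for pos, value in positions]

two_places = merge_spans(rows)
no_places = merge_spans(rows, digits=0)
```

``two_places`` gives the spans 100-108, 108-118 and 118-161, and ``no_places`` gives 100-118 and 118-161, as in the two doctests above.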
.. _bedGraph: https://cgwb.nci.nih.gov/goldenPath/help/bedgraph.html
Saving a table for reloading
----------------------------
Saving a table object to file for later reloading can be done using the standard ``writeToFile`` method, specifying any of the formats supported by ``tostring``. Passing the ``filename`` argument to ``LoadTable`` recreates a table from the raw data located at ``filename``. To illustrate this, we first write out the table ``t3`` in ``pickle`` format and then the table ``t2`` in CSV (comma separated values) format.
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> t3 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D,
... row_order = row_order, missing_data='*', space=8, max_width = 50,
... row_ids = True, title = 'My Title',
... legend = 'Legend: this is a nonsense example.')
>>> t3.writeToFile("t3.pickle")
>>> t3_loaded = LoadTable(filename = "t3.pickle")
>>> print t3_loaded
My Title
==========================================
edge.name edge.parent length
------------------------------------------
Human edge.0 4.0000
HowlerMon edge.0 4.0000
Mouse edge.1 4.0000
NineBande root 4.0000
DogFaced root 4.0000
edge.0 edge.1 4.0000
edge.1 root 4.0000
------------------------------------------
continued: My Title
=====================================
edge.name x y
-------------------------------------
Human 1.0000 3.0000
HowlerMon 1.0000 3.0000
Mouse 1.0000 3.0000
NineBande 1.0000 3.0000
DogFaced 1.0000 3.0000
edge.0 1.0000 3.0000
edge.1 1.0000 3.0000
-------------------------------------
continued: My Title
=======================
edge.name z
-----------------------
Human 6.0000
HowlerMon 6.0000
Mouse 6.0000
NineBande 6.0000
DogFaced 6.0000
edge.0 6.0000
edge.1 6.0000
-----------------------
Legend: this is a nonsense example.
>>> t2 = Table(['abcd', 'data'], [[str(range(1,6)), '0'], ['x', 5.0],
... ['y', None]], missing_data='*', title = 'A \ntitle')
>>> t2.writeToFile('t2.csv', sep=',')
>>> t2_loaded = LoadTable(filename = 't2.csv', header = True, with_title = True,
... sep = ',')
>>> print t2_loaded
A
title
=========================
abcd data
-------------------------
[1, 2, 3, 4, 5] 0
x 5.0000
y
-------------------------
Note that the ``missing_data`` attribute is not saved in the delimited format, but is in the ``pickle`` format. In the next case, we override the ``digits`` format when reloading the table.
.. doctest::
>>> t2 = Table(['abcd', 'data'], [[str(range(1,6)), '0'], ['x', 5.0],
... ['y', None]], missing_data='*', title = 'A \ntitle',
... legend = "And\na legend too")
>>> t2.writeToFile('t2.csv', sep=',')
>>> t2_loaded = LoadTable(filename = 't2.csv', header = True,
... with_title = True, with_legend = True, sep = ',', digits = 2)
>>> print t2_loaded # doctest: +NORMALIZE_WHITESPACE
A
title
=======================
abcd data
-----------------------
[1, 2, 3, 4, 5] 0
x 5.00
y
-----------------------
And
a legend too
A few things to note about the delimited file saving: formatting arguments are lost in saving to a delimited format; the ``header`` argument specifies whether the first line of the file should be treated as the header; the ``with_title`` and ``with_legend`` arguments are necessary if the file contains them, otherwise they become the header or part of the table. Importantly, if you wish to preserve numerical precision use the ``pickle`` format.
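The difference between the two storage routes can be seen with the standard library alone. In this sketch (Python 3, not PyCogent code), a comma-delimited round trip through ``csv`` returns every value as a string, so types must be re-inferred or converted on loading, whereas ``pickle`` returns the original objects. PyCogent's delimited output also writes values using their display formatting (e.g. ``4.0000`` in the ``sep=','`` example above), which is where precision can be lost.

```python
import csv
import io
import pickle

header = ['abcd', 'data']
rows = [['[1, 2, 3, 4, 5]', 0], ['x', 5.0]]

# Delimited round trip: everything comes back as str.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
buf.seek(0)
from_csv = list(csv.reader(buf))

# Pickle round trip: the original Python objects (and types) come back.
from_pickle = pickle.loads(pickle.dumps({'header': header, 'rows': rows}))
```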
``cPickle`` can load a useful object from the pickled ``Table`` by itself, without needing to know anything about the ``Table`` class.
.. doctest::
>>> import cPickle
>>> f = file("t3.pickle")
>>> pickled = cPickle.load(f)
>>> f.close()
>>> pickled.keys()
['digits', 'row_ids', 'rows', 'title', 'space', 'max_width', 'header',...
>>> pickled['rows'][0]
['Human', 'edge.0', 4.0, 1.0, 3.0, 6.0]
We can read in a delimited format using a custom reader. There are two approaches. The first one allows specifying different type conversions for different columns. The second allows specifying a whole line-based parser.
Before turning to these, note that you can also read and write tables in gzip compressed format, either by ending a filename with '.gz' or by specifying ``compress=True``. We write a compressed file in both ways and read it back in.
.. doctest::
>>> t2.writeToFile('t2.csv.gz', sep=',')
>>> t2_gz = LoadTable('t2.csv.gz', sep=',', with_title=True,
... with_legend = True)
>>> t2_gz.Shape == t2.Shape
True
>>> t2.writeToFile('t2.csv', sep=',', compress=True)
>>> t2_gz = LoadTable('t2.csv.gz', sep=',', with_title=True,
... with_legend = True)
>>> t2_gz.Shape == t2.Shape
True
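The two ways of requesting compression can be sketched with the ``gzip`` module. The helper ``open_text`` is hypothetical (not part of PyCogent): it appends ``.gz`` when ``compress=True`` and then chooses ``gzip.open`` whenever the filename ends in ``.gz``, so the writer and the reader agree on the file name. It expects ``mode`` to be ``'r'`` or ``'w'`` and requires Python 3.

```python
import gzip
import os
import tempfile


def open_text(filename, mode='r', compress=False):
    """Open a text file, transparently using gzip for '.gz' names."""
    if compress and not filename.endswith('.gz'):
        filename += '.gz'
    if filename.endswith('.gz'):
        return gzip.open(filename, mode + 't')  # text mode: 'rt' or 'wt'
    return open(filename, mode)


tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 't2.csv')

# write via compress=True; the file actually created is t2.csv.gz
with open_text(path, 'w', compress=True) as out:
    out.write('abcd,data\nx,5.0\n')

# read back by naming the .gz file directly
with open_text(path + '.gz') as infile:
    text = infile.read()
```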
Defining a custom reader with type conversion for each column
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We convert columns 2-5 (zero-based indices, i.e. ``length``, ``x``, ``y`` and ``z``) to floats by specifying a field converter. We then create a reader, specifying the properties of the data (below a list, but it can be a file). Note that if no converter is provided all data are returned as strings. We can also provide this reader to ``LoadTable`` for a more direct way of opening such files. In this case, ``Table`` assumes the file contains a header row followed by data rows, and nothing else.
.. doctest::
>>> from cogent.parse.table import ConvertFields, SeparatorFormatParser
>>> t3.Title = t3.Legend = None
>>> comma_sep = t3.tostring(sep=",").splitlines()
>>> print comma_sep
['edge.name,edge.parent,length, x, y, z', ' Human, ...
>>> converter = ConvertFields([(2,float), (3,float), (4,float), (5, float)])
>>> reader = SeparatorFormatParser(with_header=True,converter=converter,
... sep=",")
>>> comma_sep = [line for line in reader(comma_sep)]
>>> print comma_sep
[['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], ['Human',...
>>> t3.writeToFile("t3.tab", sep="\t")
>>> reader = SeparatorFormatParser(with_header=True,converter=converter,
... sep="\t")
>>> t3a = LoadTable(filename="t3.tab", reader=reader, title="new title",
... space=2)
...
>>> print t3a
new title
======================================================
edge.name edge.parent length x y z
------------------------------------------------------
Human edge.0 4.0000 1.0000 3.0000 6.0000
HowlerMon edge.0 4.0000 1.0000 3.0000 6.0000
Mouse edge.1 4.0000 1.0000 3.0000 6.0000
NineBande root 4.0000 1.0000 3.0000 6.0000
DogFaced root 4.0000 1.0000 3.0000 6.0000
edge.0 edge.1 4.0000 1.0000 3.0000 6.0000
edge.1 root 4.0000 1.0000 3.0000 6.0000
------------------------------------------------------
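A per-column converter is essentially a list of ``(index, function)`` pairs applied to each split line after the header. The minimal stdlib sketch below shows that idea; ``convert_fields`` and ``parse_delimited`` are hypothetical names chosen to mirror ``ConvertFields`` and ``SeparatorFormatParser`` and are not their actual implementations.

```python
def convert_fields(conversions):
    """Return a function applying each (index, func) pair to a list of fields."""
    def convert(fields):
        fields = list(fields)
        for index, func in conversions:
            fields[index] = func(fields[index])
        return fields
    return convert


def parse_delimited(lines, sep, converter=None, with_header=True):
    """Yield stripped fields per line, converting data lines but not the header."""
    for line_num, line in enumerate(lines):
        fields = [field.strip() for field in line.split(sep)]
        is_header = with_header and line_num == 0
        if converter is not None and not is_header:
            fields = converter(fields)
        yield fields


lines = ['edge.name,edge.parent,length, x, y, z',
         ' Human, edge.0,4.0000,1.0e+00,3.00,6.0000']
converter = convert_fields([(2, float), (3, float), (4, float), (5, float)])
parsed = list(parse_delimited(lines, ',', converter=converter))
```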
We can use the ``SeparatorFormatParser`` to ignore reading certain lines by using a callback function. We illustrate this using the above data, skipping any rows with ``edge.name`` starting with ``edge``.
.. doctest::
>>> def ignore_internal_nodes(line):
...     return line[0].startswith('edge')
...
>>> reader = SeparatorFormatParser(with_header=True,converter=converter,
... sep="\t", ignore=ignore_internal_nodes)
...
>>> tips = LoadTable(filename="t3.tab", reader=reader, digits=1, space=2)
>>> print tips
=============================================
edge.name edge.parent length x y z
---------------------------------------------
Human edge.0 4.0 1.0 3.0 6.0
HowlerMon edge.0 4.0 1.0 3.0 6.0
Mouse edge.1 4.0 1.0 3.0 6.0
NineBande root 4.0 1.0 3.0 6.0
DogFaced root 4.0 1.0 3.0 6.0
---------------------------------------------
We can also limit the number of rows read in, which is very handy for checking large files.
.. doctest::
>>> t3a = LoadTable("t3.tab", sep='\t', limit=3)
>>> print t3a
================================================================
edge.name edge.parent length x y z
----------------------------------------------------------------
Human edge.0 4.0000 1.0000 3.0000 6.0000
HowlerMon edge.0 4.0000 1.0000 3.0000 6.0000
Mouse edge.1 4.0000 1.0000 3.0000 6.0000
----------------------------------------------------------------
Limiting also works when ``static_column_types`` is used.
.. doctest::
>>> t3a = LoadTable("t3.tab", sep='\t', limit=3, static_column_types=True)
>>> t3a.Shape[0] == 3
True
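Skipping rows with an ``ignore`` callback and stopping after ``limit`` rows combine naturally with lazy iteration. In this stdlib sketch, the hypothetical ``read_rows`` uses ``itertools.islice`` so that reading stops once ``limit`` rows have been kept, which is what makes limiting cheap on large files. Here the limit counts rows kept after ignoring; whether PyCogent applies ``limit`` before or after ``ignore`` is not covered by the examples above.

```python
from itertools import islice


def read_rows(lines, sep='\t', ignore=None, limit=None):
    """Return (header, rows), skipping rows where ignore(row) is true.

    At most ``limit`` data rows are kept; limit=None keeps them all.
    """
    lines = iter(lines)
    header = next(lines).split(sep)
    rows = (line.split(sep) for line in lines)  # lazy: nothing read yet
    if ignore is not None:
        rows = (row for row in rows if not ignore(row))
    return header, list(islice(rows, limit))


data = ['edge.name\tedge.parent', 'Human\tedge.0', 'Mouse\tedge.1',
        'edge.0\tedge.1', 'edge.1\troot']
ignore_internal_nodes = lambda row: row[0].startswith('edge')

header, tips = read_rows(data, ignore=ignore_internal_nodes)
header, first_three = read_rows(data, limit=3)
```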
In the above example, the data type in a column is static, e.g. all values in ``x`` are floats. Rather than providing a custom reader, you can get the ``Table`` to construct such a reader based on the first data row using the ``static_column_types`` argument.
.. doctest::
>>> t3a = LoadTable(filename="t3.tab", static_column_types=True, digits=1,
... sep='\t')
>>> print t3a
=======================================================
edge.name edge.parent length x y z
-------------------------------------------------------
Human edge.0 4.0 1.0 3.0 6.0
HowlerMon edge.0 4.0 1.0 3.0 6.0
Mouse edge.1 4.0 1.0 3.0 6.0
NineBande root 4.0 1.0 3.0 6.0
DogFaced root 4.0 1.0 3.0 6.0
edge.0 edge.1 4.0 1.0 3.0 6.0
edge.1 root 4.0 1.0 3.0 6.0
-------------------------------------------------------
If you use the ``static_column_types`` argument and the column data are not static, you'll get a ``ValueError``. We show this by first creating a simple table with mixed data types in a column, writing it to file and then trying to load it with ``static_column_types=True``.
.. doctest::
>>> t3b = LoadTable(header=['A', 'B'], rows=[[1,1], ['a', 2]], sep=2)
>>> print t3b
======
A B
------
1 1
a 2
------
>>> t3b.writeToFile('test3b.txt', sep='\t')
>>> t3b = LoadTable('test3b.txt', sep = '\t', static_column_types=True)
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: 'a'
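The behaviour of ``static_column_types`` can be mimicked by inferring one cast per column from the first data row and applying it to every later row, which is why a later ``'a'`` in an ``int`` column raises ``ValueError``. The sketch below (stdlib only; ``infer_cast`` and ``static_reader`` are hypothetical) tries ``int`` then ``float`` and falls back to ``str``.

```python
def infer_cast(text):
    """Choose int, float or str for a field from its first-row value."""
    for cast in (int, float):
        try:
            cast(text)
        except ValueError:
            continue
        return cast
    return str


def static_reader(lines, sep='\t'):
    """Return (header, rows), converting every row with first-row types."""
    lines = iter(lines)
    header = next(lines).split(sep)
    first = next(lines).split(sep)
    casts = [infer_cast(field) for field in first]
    rows = [[cast(field) for cast, field in zip(casts, first)]]
    for line in lines:
        # a value that does not fit its column's type raises ValueError here
        rows.append([cast(field) for cast, field in zip(casts, line.split(sep))])
    return header, rows


header, rows = static_reader(['A\tB', '1\t1.5', '2\t2.5'])
```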
We also check that the reader handles a tab delimited format with missing data at the end of a line.
.. doctest::
>>> data = ['ab\tcd\t', 'ab\tcd\tef']
>>> tab_reader = SeparatorFormatParser(sep='\t')
>>> for line in tab_reader(data):
...     assert len(line) == 3, line
Defining a custom reader that operates on entire lines
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It can also be the case that data types differ between lines. The basic mechanism is the same as above, except that in defining the converter you must set the argument ``by_column=False``.
We illustrate this capability by writing a short function that tries to cast an entire line to ``int``, then to ``float``, and otherwise leaves it as strings.
.. doctest::
>>> def CastLine():
...     floats = lambda x: map(float, x)
...     ints = lambda x: map(int, x)
...     def call(line):
...         try:
...             line = ints(line)
...         except ValueError:
...             try:
...                 line = floats(line)
...             except ValueError:
...                 pass
...         return line
...     return call
We then define a couple of lines, create an instance of ``ConvertFields`` and call it for each type.
.. doctest::
>>> line_str_ints = '\t'.join(map(str, range(5)))
>>> line_str_floats = '\t'.join(map(str, map(float, range(5))))
>>> data = [line_str_ints, line_str_floats]
>>> cv = ConvertFields(CastLine(), by_column=False)
>>> tab_reader = SeparatorFormatParser(with_header=False, converter=cv,
... sep='\t')
>>> for line in tab_reader(data):
...     print line
[0, 1, 2, 3, 4]
[0.0, 1.0, 2.0, 3.0, 4.0]
Defining a custom writer
^^^^^^^^^^^^^^^^^^^^^^^^
We can likewise specify a writer, using a custom field formatter, and pass it to the table's ``writeToFile`` method. We first illustrate how the writer works to generate output. We then use it to wrap some text fields in quotes. In order to read that back in, we define a custom reader that strips these quotes off.
.. doctest::
>>> from cogent.format.table import FormatFields, SeparatorFormatWriter
>>> formatter = FormatFields([(0,'"%s"'), (1,'"%s"')])
>>> writer = SeparatorFormatWriter(formatter=formatter, sep=" | ")
>>> for formatted in writer(comma_sep, has_header=True):
...     print formatted
edge.name | edge.parent | length | x | y | z
"Human" | "edge.0" | 4.0 | 1.0 | 3.0 | 6.0
"HowlerMon" | "edge.0" | 4.0 | 1.0 | 3.0 | 6.0
"Mouse" | "edge.1" | 4.0 | 1.0 | 3.0 | 6.0
"NineBande" | "root" | 4.0 | 1.0 | 3.0 | 6.0
"DogFaced" | "root" | 4.0 | 1.0 | 3.0 | 6.0
"edge.0" | "edge.1" | 4.0 | 1.0 | 3.0 | 6.0
"edge.1" | "root" | 4.0 | 1.0 | 3.0 | 6.0
>>> t3.writeToFile(filename="t3.tab", writer=writer)
>>> strip = lambda x: x.replace('"', '')
>>> converter = ConvertFields([(0,strip), (1, strip)])
>>> reader = SeparatorFormatParser(with_header=True, converter=converter,
... sep="|", strip_wspace=True)
>>> t3a = LoadTable(filename="t3.tab", reader=reader, title="new title",
... space=2)
>>> print t3a
new title
=============================================
edge.name edge.parent length x y z
---------------------------------------------
Human edge.0 4.0 1.0 3.0 6.0
HowlerMon edge.0 4.0 1.0 3.0 6.0
Mouse edge.1 4.0 1.0 3.0 6.0
NineBande root 4.0 1.0 3.0 6.0
DogFaced root 4.0 1.0 3.0 6.0
edge.0 edge.1 4.0 1.0 3.0 6.0
edge.1 root 4.0 1.0 3.0 6.0
---------------------------------------------
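The writer/reader pairing above can be sketched with plain functions: a formatter applies ``%``-templates to chosen columns, rows are joined with ``' | '``, and reading back splits on ``'|'``, strips whitespace and removes the quotes before converting the numeric columns. ``format_fields``, ``write_rows`` and ``read_rows_back`` are hypothetical stand-ins for ``FormatFields``, ``SeparatorFormatWriter`` and ``SeparatorFormatParser``, not their implementations.

```python
def format_fields(formats):
    """Return a function applying each (index, template) pair to a row."""
    def formatter(row):
        row = list(row)
        for index, template in formats:
            row[index] = template % row[index]
        return row
    return formatter


def write_rows(header, rows, formatter, sep=' | '):
    """Return output lines: the header unformatted, then formatted rows."""
    lines = [sep.join(header)]
    for row in rows:
        lines.append(sep.join(str(value) for value in formatter(row)))
    return lines


def read_rows_back(lines, sep='|', numeric=(2,)):
    """Split lines, strip spaces and quotes, convert numeric columns to float."""
    rows = []
    for line in lines[1:]:
        fields = [f.strip().replace('"', '') for f in line.split(sep)]
        for index in numeric:
            fields[index] = float(fields[index])
        rows.append(fields)
    return rows


formatter = format_fields([(0, '"%s"'), (1, '"%s"')])
written = write_rows(['edge.name', 'edge.parent', 'length'],
                     [['Human', 'edge.0', 4.0]], formatter)
restored = read_rows_back(written)
```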
.. note:: There are performance issues for large files. Pickling has proven very slow for saving very large files and introduces significant file size bloat. A simple delimited format is much more efficient, both for storage and, if you use a custom reader (or specify ``static_column_types=True``), for writing and reading. A custom reader was approximately six-fold faster than the standard delimited file reader.
Table slicing and iteration
---------------------------
A ``Table`` can be sliced by row, by a range of rows, by column or by a range of column headings, and can also be indexed to return a single cell. The ``getColumns`` method can also be used to reorder columns. Columns can be referenced by either their string headings or their integer positions. For rows, if ``row_ids`` was specified as ``True``, the value in the 0'th cell of each row can also be used.
.. doctest::
>>> t4 = Table(['edge.name', 'edge.parent', 'length', 'x', 'y', 'z'], d2D,
... row_order = row_order, row_ids = True, title = 'My Title')
We subset ``t4`` by column and reorder them.
.. doctest::
>>> new = t4.getColumns(['z', 'y'])
>>> print new
My Title
=============================
edge.name z y
-----------------------------
Human 6.0000 3.0000
HowlerMon 6.0000 3.0000
Mouse 6.0000 3.0000
NineBande 6.0000 3.0000
DogFaced 6.0000 3.0000
edge.0 6.0000 3.0000
edge.1 6.0000 3.0000
-----------------------------
We use the column position indexes to get the same table.
.. doctest::
>>> new = t4.getColumns([5, 4])
>>> print new
My Title
=============================
edge.name z y
-----------------------------
Human 6.0000 3.0000
HowlerMon 6.0000 3.0000
Mouse 6.0000 3.0000
NineBande 6.0000 3.0000
DogFaced 6.0000 3.0000
edge.0 6.0000 3.0000
edge.1 6.0000 3.0000
-----------------------------
We can also use more general slicing, by both rows and columns. The following returns all rows from index 4 on, and columns up to (but excluding) 'y':
.. doctest::
>>> k = t4[4:, :'y']
>>> print k
My Title
============================================
edge.name edge.parent length x
--------------------------------------------
DogFaced root 4.0000 1.0000
edge.0 edge.1 4.0000 1.0000
edge.1 root 4.0000 1.0000
--------------------------------------------
We can explicitly reference individual cells, in this case using both row and column keys.
.. doctest::
>>> val = t4['HowlerMon', 'y']
>>> print val
3.0
We slice a single row,
.. doctest::
>>> new = t4[3]
>>> print new
My Title
================================================================
edge.name edge.parent length x y z
----------------------------------------------------------------
NineBande root 4.0000 1.0000 3.0000 6.0000
----------------------------------------------------------------
and a range of rows.
.. doctest::
>>> new = t4[3:6]
>>> print new
My Title
================================================================
edge.name edge.parent length x y z
----------------------------------------------------------------
NineBande root 4.0000 1.0000 3.0000 6.0000
DogFaced root 4.0000 1.0000 3.0000 6.0000
edge.0 edge.1 4.0000 1.0000 3.0000 6.0000
----------------------------------------------------------------
You can get disjoint rows.
.. doctest::
>>> print t4.getDisjointRows(['Human', 'Mouse', 'DogFaced'])
My Title
================================================================
edge.name edge.parent length x y z
----------------------------------------------------------------
Human edge.0 4.0000 1.0000 3.0000 6.0000
Mouse edge.1 4.0000 1.0000 3.0000 6.0000
DogFaced root 4.0000 1.0000 3.0000 6.0000
----------------------------------------------------------------
You can iterate over the table one row at a time and slice the rows. We illustrate this for slicing a single column,
.. doctest::
>>> for row in t:
...     print row['Journal']
INT J PARASITOL
J MED ENTOMOL
Med Vet Entomol
INSECT MOL BIOL
J AM MOSQUITO CONTR
MOL PHYLOGENET EVOL
HEREDITY
AM J TROP MED HYG
MIL MED
MED J AUSTRALIA
and for multiple columns.
.. doctest::
>>> for row in t:
...     print row['Journal'], row['Impact']
INT J PARASITOL 2.9
J MED ENTOMOL 1.4
Med Vet Entomol 1.0
INSECT MOL BIOL 2.85
J AM MOSQUITO CONTR 0.811
MOL PHYLOGENET EVOL 2.8
HEREDITY 1.99
AM J TROP MED HYG 2.105
MIL MED 0.605
MED J AUSTRALIA 1.736
The numerical slice equivalent to the first case above would be ``row[0]``, and to the second case either ``row[:]`` or ``row[:2]``.
Filtering tables - selecting subsets of rows/columns
----------------------------------------------------
We often want to select a subset of a table's rows, based on some condition(s), to produce a new table. To illustrate this, we construct a table with type and probability values.
.. doctest::
>>> header = ['Gene', 'type', 'LR', 'df', 'Prob']
>>> rows = (('NP_003077_hs_mm_rn_dna', 'Con', 2.5386, 1, 0.1111),
... ('NP_004893_hs_mm_rn_dna', 'Con', 0.1214, 1, 0.7276),
... ('NP_005079_hs_mm_rn_dna', 'Con', 0.9517, 1, 0.3293),
... ('NP_005500_hs_mm_rn_dna', 'Con', 0.7383, 1, 0.3902),
... ('NP_055852_hs_mm_rn_dna', 'Con', 0.0000, 1, 0.9997),
... ('NP_057012_hs_mm_rn_dna', 'Unco', 34.3081, 1, 0.0000),
... ('NP_061130_hs_mm_rn_dna', 'Unco', 3.7986, 1, 0.0513),
... ('NP_065168_hs_mm_rn_dna', 'Con', 89.9766, 1, 0.0000),
... ('NP_065396_hs_mm_rn_dna', 'Unco', 11.8912, 1, 0.0006),
... ('NP_109590_hs_mm_rn_dna', 'Con', 0.2121, 1, 0.6451),
... ('NP_116116_hs_mm_rn_dna', 'Unco', 9.7474, 1, 0.0018))
>>> t5 = Table(header, rows)
>>> print t5
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
---------------------------------------------------------
We then seek to obtain only those rows that contain probabilities < 0.05, using valid python code within a string. **Note:** Make sure your column headings are valid python variable names or the string based approach will fail (you could use an external function instead, see below).
.. doctest::
>>> sub_table1 = t5.filtered(callback = "Prob < 0.05")
>>> print sub_table1
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
---------------------------------------------------------
From this filtered table we can extract the raw data for a single column,
.. doctest::
>>> raw = sub_table1.getRawData('LR')
>>> raw
[34.3081..., 89.9766..., 11.8912, 9.7474...]
and from multiple columns.
.. doctest::
>>> raw = sub_table1.getRawData(columns = ['df', 'Prob'])
>>> raw
[[1, 0.0], [1, 0.0],...
We can also do filtering using an external function, in this case we use a ``lambda`` to obtain only those rows of type 'Unco' that contain probabilities < 0.05, modifying our callback function.
.. doctest::
>>> func = lambda (ty, pr): ty == 'Unco' and pr < 0.05
>>> sub_table2 = t5.filtered(columns = ('type', 'Prob'), callback = func)
>>> print sub_table2
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
---------------------------------------------------------
This can also be done using the string approach.
.. doctest::
>>> sub_table2 = t5.filtered(callback = "type == 'Unco' and Prob < 0.05")
>>> print sub_table2
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
---------------------------------------------------------
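Both filtering styles can be sketched with the standard library. For a string callback, each column heading is bound as a variable and the expression is passed to ``eval``, which is why headings must be valid Python identifiers (and why such strings should never come from untrusted input). For a function callback, the values of the requested columns are passed as one tuple. ``filter_rows`` is hypothetical, not PyCogent's ``filtered``. The ``lambda (ty, pr): ...`` tuple-parameter syntax used above is Python 2 only; in Python 3 the lambda must index its single argument, as done here.

```python
def filter_rows(header, rows, callback, columns=None):
    """Return rows for which callback (expression string or function) is true."""
    kept = []
    for row in rows:
        if isinstance(callback, str):
            # bind each heading to its value, then evaluate the expression
            namespace = dict(zip(header, row))
            keep = eval(callback, {}, namespace)
        else:
            values = tuple(row[header.index(name)] for name in columns)
            keep = callback(values)
        if keep:
            kept.append(row)
    return kept


header = ['Gene', 'type', 'LR', 'df', 'Prob']
rows = [('NP_003077_hs_mm_rn_dna', 'Con', 2.5386, 1, 0.1111),
        ('NP_057012_hs_mm_rn_dna', 'Unco', 34.3081, 1, 0.0000),
        ('NP_065168_hs_mm_rn_dna', 'Con', 89.9766, 1, 0.0000),
        ('NP_061130_hs_mm_rn_dna', 'Unco', 3.7986, 1, 0.0513)]

significant = filter_rows(header, rows, "Prob < 0.05")
unco = filter_rows(header, rows, lambda v: v[0] == 'Unco' and v[1] < 0.05,
                   columns=('type', 'Prob'))
```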
We can also filter table columns using ``filteredByColumn``. Say we only want the numerical columns, we can write a callback that returns ``False`` if some numerical operation fails, ``True`` otherwise.
.. doctest::
>>> def is_numeric(values):
...     try:
...         sum(values)
...     except TypeError:
...         return False
...     return True
>>> print t5.filteredByColumn(callback=is_numeric)
=======================
LR df Prob
-----------------------
2.5386 1 0.1111
0.1214 1 0.7276
0.9517 1 0.3293
0.7383 1 0.3902
0.0000 1 0.9997
34.3081 1 0.0000
3.7986 1 0.0513
89.9766 1 0.0000
11.8912 1 0.0006
0.2121 1 0.6451
9.7474 1 0.0018
-----------------------
Appending tables
----------------
Tables may also be appended to each other, to make larger tables. We'll construct two simple tables to illustrate this.
.. doctest::
>>> geneA = Table(['edge.name', 'edge.parent', 'z'], [['Human','root',
... 6.0],['Mouse','root', 6.0], ['Rat','root', 6.0]],
... title='Gene A')
>>> geneB = Table(['edge.name', 'edge.parent', 'z'], [['Human','root',
... 7.0],['Mouse','root', 7.0], ['Rat','root', 7.0]],
... title='Gene B')
>>> print geneB
Gene B
==================================
edge.name edge.parent z
----------------------------------
Human root 7.0000
Mouse root 7.0000
Rat root 7.0000
----------------------------------
We now use the ``appended`` Table method to create a new table, specifying that we want a new column created (by passing the ``new_column`` argument a heading) in which the table titles will be placed.
.. doctest::
>>> new = geneA.appended('Gene', geneB, title='Appended tables')
>>> print new
Appended tables
============================================
Gene edge.name edge.parent z
--------------------------------------------
Gene A Human root 6.0000
Gene A Mouse root 6.0000
Gene A Rat root 6.0000
Gene B Human root 7.0000
Gene B Mouse root 7.0000
Gene B Rat root 7.0000
--------------------------------------------
We repeat this without adding a new column.
.. doctest::
>>> new = geneA.appended(None, geneB, title="Appended, no new column")
>>> print new
Appended, no new column
==================================
edge.name edge.parent z
----------------------------------
Human root 6.0000
Mouse root 6.0000
Rat root 6.0000
Human root 7.0000
Mouse root 7.0000
Rat root 7.0000
----------------------------------
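Appending amounts to concatenating row lists that share a header, optionally prefixing each row with its source table's title. The sketch below uses plain ``(title, header, rows)`` tuples; ``append_tables`` is hypothetical, and its header check is a choice of this sketch rather than documented PyCogent behaviour.

```python
def append_tables(tables, new_column=None):
    """Concatenate (title, header, rows) tables; optionally add a title column."""
    header = list(tables[0][1])
    out_header = ([new_column] if new_column else []) + header
    out_rows = []
    for title, table_header, rows in tables:
        if list(table_header) != header:
            raise ValueError('headers differ: %r' % (table_header,))
        for row in rows:
            prefix = [title] if new_column else []
            out_rows.append(prefix + list(row))
    return out_header, out_rows


columns = ['edge.name', 'edge.parent', 'z']
geneA = ('Gene A', columns, [['Human', 'root', 6.0], ['Mouse', 'root', 6.0]])
geneB = ('Gene B', columns, [['Human', 'root', 7.0], ['Mouse', 'root', 7.0]])

with_titles = append_tables([geneA, geneB], new_column='Gene')
plain = append_tables([geneA, geneB])
```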
Miscellaneous
-------------
Tables have a ``Shape`` attribute, a tuple giving the number of rows followed by the number of columns. We illustrate it for ``t5`` and the ``sub_table`` tables derived from it. Combined with the ``filtered`` method, this attribute can tell you how many rows satisfy a specific condition.
.. doctest::
>>> t5.Shape
(11, 5)
>>> sub_table1.Shape
(4, 5)
>>> sub_table2.Shape
(3, 5)
For instance, 3 of the 11 rows in ``t5`` were significant and belonged to the ``Unco`` type.
For completeness, we generate a table with no rows and assess its shape.
.. doctest::
>>> func = lambda (ty, pr): ty == 'Unco' and pr > 0.1
>>> sub_table3 = t5.filtered(columns = ('type', 'Prob'), callback = func)
>>> sub_table3.Shape
(0, 5)
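The idea of combining a filter with a shape can be sketched in plain Python. This is an illustration under stated assumptions, not PyCogent's code: a table is a header list plus a list of rows, and ``shape`` and ``filtered`` are hypothetical helpers. Because Python 3 no longer allows tuple parameters such as ``lambda (ty, pr): ...``, the callback here is a named function that unpacks its single argument.

```python
def shape(header, rows):
    """Return (number of rows, number of columns), the order used by Shape."""
    return (len(rows), len(header))


def filtered(header, rows, columns, callback):
    """Keep rows for which callback(tuple of the named column values) is true."""
    indices = [header.index(name) for name in columns]
    return [row for row in rows
            if callback(tuple(row[i] for i in indices))]


def is_significant_unco(values):
    type_, prob = values  # unpack explicitly; works in Python 2 and 3
    return type_ == "Unco" and prob < 0.05


header = ["Gene", "type", "LR", "df", "Prob"]
rows = [["NP_061130_hs_mm_rn_dna", "Unco", 3.7986, 1, 0.0513],
        ["NP_116116_hs_mm_rn_dna", "Unco", 9.7474, 1, 0.0018],
        ["NP_055852_hs_mm_rn_dna", "Con", 0.0000, 1, 0.9997]]
subset = filtered(header, rows, ("type", "Prob"), is_significant_unco)
print(shape(header, subset))  # (1, 5)
```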
The distinct values can be obtained for a single column,
.. doctest::
>>> distinct = new.getDistinctValues("edge.name")
>>> assert distinct == set(['Rat', 'Mouse', 'Human'])
or for multiple columns, in which case the distinct combinations are returned as tuples.
.. doctest::
>>> distinct = new.getDistinctValues(["edge.parent", "z"])
>>> assert distinct == set([('root', 6.0), ('root', 7.0)]), distinct
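The behaviour asserted above, a set of values for one column and a set of tuples for several, can be mimicked with a small hypothetical helper, ``distinct_values``. It is a sketch on plain lists, not the ``getDistinctValues`` implementation.

```python
def distinct_values(header, rows, columns):
    """Distinct values of one column, or distinct value tuples of several."""
    if isinstance(columns, str):
        i = header.index(columns)
        return set(row[i] for row in rows)
    indices = [header.index(name) for name in columns]
    return set(tuple(row[i] for i in indices) for row in rows)


header = ["edge.name", "edge.parent", "z"]
rows = [["Human", "root", 6.0], ["Mouse", "root", 6.0], ["Rat", "root", 6.0],
        ["Human", "root", 7.0], ["Mouse", "root", 7.0], ["Rat", "root", 7.0]]
names = distinct_values(header, rows, "edge.name")
pairs = distinct_values(header, rows, ["edge.parent", "z"])
```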
We can compute column sums, assuming a column contains only numerical values.
.. doctest::
>>> assert new.summed('z') == 39., new.summed('z')
We construct an example with mixed numerical and non-numerical data and compute a column sum, passing ``strict=False`` so that the non-numerical values are ignored.
.. doctest::
:options: +NORMALIZE_WHITESPACE
>>> mix = LoadTable(header=['A', 'B'], rows=[[0,''],[1,2],[3,4]])
>>> print mix
======
A B
------
0
1 2
3 4
------
>>> mix.summed('B', strict=False)
6
We also compute row sums for the mixed and the purely numerical rows (rows 0 and 1, respectively). For summing across rows we must set ``col_sum=False`` and specify the row index as an ``int``.
.. doctest::
>>> mix.summed(0, col_sum=False, strict=False)
0
>>> mix.summed(1, col_sum=False)
3
We can compute the totals for all columns or rows too.
.. doctest::
>>> mix.summed(strict=False)
[4, 6]
>>> mix.summed(col_sum=False, strict=False)
[0, 3, 7]
It is not currently possible to sum a subset of columns or rows; we show this for rows here.
.. doctest::
>>> mix.summed([0, 2], col_sum=False, strict=False)
Traceback (most recent call last):
RuntimeError: unknown indices type: [0, 2]
We repeat these sums for a strictly numerical table.
.. doctest::
>>> non_mix = LoadTable(header=['A', 'B'], rows=[[0,1],[1,2],[3,4]])
>>> non_mix.summed()
[4, 7]
>>> non_mix.summed(col_sum=False)
[1, 3, 7]
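The ``summed`` behaviour shown above can be approximated in a few lines of plain Python. This is a hypothetical sketch, not PyCogent's code: it addresses columns by integer position (the real method also accepts column names), treats only ``int`` and ``float`` values as numeric when ``strict=False``, and imitates the ``RuntimeError`` raised for a list of indices.

```python
def summed(rows, index=None, col_sum=True, strict=True):
    """Sum one column/row (integer index) or every column/row (index=None).

    With strict=False, values that are not int or float are skipped;
    with strict=True, a non-numeric value makes sum() raise TypeError.
    """
    def total(values):
        if strict:
            return sum(values)
        return sum(v for v in values if isinstance(v, (int, float)))

    lines = [list(col) for col in zip(*rows)] if col_sum else rows
    if index is None:
        return [total(line) for line in lines]
    if not isinstance(index, int):
        raise RuntimeError("unknown indices type: %r" % (index,))
    return total(lines[index])


mix = [[0, ""], [1, 2], [3, 4]]
print(summed(mix, strict=False))                 # [4, 6]
print(summed(mix, col_sum=False, strict=False))  # [0, 3, 7]
```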
We can normalize a numerical table by row,
.. doctest::
>>> print non_mix.normalized(by_row=True)
================
A B
----------------
0.0000 1.0000
0.3333 0.6667
0.4286 0.5714
----------------
or by column, such that the row/column sums are 1.
.. doctest::
>>> print non_mix.normalized(by_row=False)
================
A B
----------------
0.0000 0.1429
0.2500 0.2857
0.7500 0.5714
----------------
We can also normalize using an arbitrary function, here the maximum value, by row,
.. doctest::
>>> print non_mix.normalized(by_row=True, denominator_func=max)
================
A B
----------------
0.0000 1.0000
0.5000 1.0000
0.7500 1.0000
----------------
or by column.
.. doctest::
>>> print non_mix.normalized(by_row=False, denominator_func=max)
================
A B
----------------
0.0000 0.2500
0.3333 0.5000
1.0000 1.0000
----------------
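The four normalizations above all divide each value by a denominator computed from its row or column. A minimal sketch on plain nested lists follows; ``normalized`` here is a hypothetical stand-in for the ``Table`` method, defaulting to ``sum`` so that rows or columns sum to 1. A zero denominator would raise ``ZeroDivisionError`` in this sketch.

```python
def normalized(rows, by_row=True, denominator_func=sum):
    """Divide each value by denominator_func applied to its row or column."""
    if by_row:
        return [[v / float(denominator_func(row)) for v in row]
                for row in rows]
    denominators = [float(denominator_func(col)) for col in zip(*rows)]
    return [[v / d for v, d in zip(row, denominators)] for row in rows]


non_mix = [[0, 1], [1, 2], [3, 4]]
by_row_max = normalized(non_mix, by_row=True, denominator_func=max)
```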
Extending tables
----------------
In some cases it is desirable to compute an additional column from existing column values. This is done using the ``withNewColumn`` method. We'll use ``t4`` from above, adding two of its columns to create an additional column.
.. doctest::
>>> t7 = t4.withNewColumn('Sum', callback="z+x", digits=2)
>>> print t7
My Title
==================================================================
edge.name edge.parent length x y z Sum
------------------------------------------------------------------
Human edge.0 4.00 1.00 3.00 6.00 7.00
HowlerMon edge.0 4.00 1.00 3.00 6.00 7.00
Mouse edge.1 4.00 1.00 3.00 6.00 7.00
NineBande root 4.00 1.00 3.00 6.00 7.00
DogFaced root 4.00 1.00 3.00 6.00 7.00
edge.0 edge.1 4.00 1.00 3.00 6.00 7.00
edge.1 root 4.00 1.00 3.00 6.00 7.00
------------------------------------------------------------------
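We can also provide a function, together with the names of the columns whose values it receives. Here the new column holds the product of ``y`` and ``z``; when a single column is named, the function receives that value directly.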
.. doctest::
>>> func = lambda (x, y): x * y
>>> t7 = t4.withNewColumn('Sum', callback=func, columns=("y","z"),
... digits=2)
>>> print t7
My Title
===================================================================
edge.name edge.parent length x y z Sum
-------------------------------------------------------------------
Human edge.0 4.00 1.00 3.00 6.00 18.00
HowlerMon edge.0 4.00 1.00 3.00 6.00 18.00
Mouse edge.1 4.00 1.00 3.00 6.00 18.00
NineBande root 4.00 1.00 3.00 6.00 18.00
DogFaced root 4.00 1.00 3.00 6.00 18.00
edge.0 edge.1 4.00 1.00 3.00 6.00 18.00
edge.1 root 4.00 1.00 3.00 6.00 18.00
-------------------------------------------------------------------
>>> func = lambda x: x**3
>>> t7 = t4.withNewColumn('Sum', callback=func, columns="y", digits=2)
>>> print t7
My Title
===================================================================
edge.name edge.parent length x y z Sum
-------------------------------------------------------------------
Human edge.0 4.00 1.00 3.00 6.00 27.00
HowlerMon edge.0 4.00 1.00 3.00 6.00 27.00
Mouse edge.1 4.00 1.00 3.00 6.00 27.00
NineBande root 4.00 1.00 3.00 6.00 27.00
DogFaced root 4.00 1.00 3.00 6.00 27.00
edge.0 edge.1 4.00 1.00 3.00 6.00 27.00
edge.1 root 4.00 1.00 3.00 6.00 27.00
-------------------------------------------------------------------
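The function-based form of ``withNewColumn`` can be sketched on plain lists as follows. ``with_new_column`` is a hypothetical helper, not the PyCogent method; it covers only function callbacks (not string expressions such as ``"z+x"``), and since Python 3 lacks tuple parameters the two-column callback indexes its tuple argument instead of unpacking it in the signature.

```python
def with_new_column(header, rows, new_name, callback, columns):
    """Append a column computed by callback from the named column(s).

    A single column name passes the bare value; a sequence of names
    passes a tuple of values in the order given.
    """
    single = isinstance(columns, str)
    names = [columns] if single else list(columns)
    indices = [header.index(name) for name in names]
    new_rows = []
    for row in rows:
        values = tuple(row[i] for i in indices)
        new_rows.append(list(row) + [callback(values[0] if single else values)])
    return list(header) + [new_name], new_rows


header = ["edge.name", "length", "x", "y", "z"]
rows = [["Human", 4.0, 1.0, 3.0, 6.0], ["Mouse", 4.0, 1.0, 3.0, 6.0]]
prod_header, prod_rows = with_new_column(header, rows, "Product",
                                         lambda yz: yz[0] * yz[1], ("y", "z"))
cube_header, cube_rows = with_new_column(header, rows, "Cube",
                                         lambda y: y ** 3, "y")
```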
Sorting tables
--------------
We sort a table according to the values in a single column.
.. doctest::
>>> sorted = t5.sorted(columns = 'LR')
>>> print sorted
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
---------------------------------------------------------
We can also sort according to the values in several columns; the order of the columns determines the sort order. Here rows are sorted by ``LR`` first, and ``type`` is only used to break ties.
.. doctest::
>>> sorted = t5.sorted(columns=('LR', 'type'))
>>> print sorted
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
---------------------------------------------------------
We now sort by the same two columns in the opposite order, ``type`` first and then ``LR``.
.. doctest::
>>> sorted = t5.sorted(columns=('type', 'LR'))
>>> print sorted
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
---------------------------------------------------------
We reverse sort a single column.
.. doctest::
>>> sorted = t5.sorted('LR', reverse = 'LR')
>>> print sorted
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
---------------------------------------------------------
We reverse sort one column but not the other.
.. doctest::
>>> sorted = t5.sorted(columns=('type', 'LR'), reverse = 'LR')
>>> print sorted
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
---------------------------------------------------------
We reverse sort both columns.
.. doctest::
>>> sorted = t5.sorted(columns=('type', 'LR'), reverse = ('type', 'LR'))
>>> print sorted
=========================================================
Gene type LR df Prob
---------------------------------------------------------
NP_057012_hs_mm_rn_dna Unco 34.3081 1 0.0000
NP_065396_hs_mm_rn_dna Unco 11.8912 1 0.0006
NP_116116_hs_mm_rn_dna Unco 9.7474 1 0.0018
NP_061130_hs_mm_rn_dna Unco 3.7986 1 0.0513
NP_065168_hs_mm_rn_dna Con 89.9766 1 0.0000
NP_003077_hs_mm_rn_dna Con 2.5386 1 0.1111
NP_005079_hs_mm_rn_dna Con 0.9517 1 0.3293
NP_005500_hs_mm_rn_dna Con 0.7383 1 0.3902
NP_109590_hs_mm_rn_dna Con 0.2121 1 0.6451
NP_004893_hs_mm_rn_dna Con 0.1214 1 0.7276
NP_055852_hs_mm_rn_dna Con 0.0000 1 0.9997
---------------------------------------------------------
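Mixing ascending and descending columns, as in the last two examples, can be done with a standard technique: because Python's sort is stable (also when ``reverse=True``), sorting repeatedly from the least significant column to the most significant gives a multi-key sort. The sketch below uses a hypothetical ``sort_rows`` helper on a few rows from ``t5``; it is not necessarily how ``Table.sorted`` works internally.

```python
def sort_rows(header, rows, columns, reverse=()):
    """Sort rows by several columns, reversing only those named in reverse.

    Sorting by the least significant column first and the most significant
    last works because each stable sort preserves the previous ordering
    among rows with equal keys.
    """
    if isinstance(columns, str):
        columns = [columns]
    if isinstance(reverse, str):
        reverse = [reverse]
    result = [list(row) for row in rows]
    for name in reversed(list(columns)):
        i = header.index(name)
        result.sort(key=lambda row: row[i], reverse=name in reverse)
    return result


header = ["Gene", "type", "LR"]
rows = [["NP_055852_hs_mm_rn_dna", "Con", 0.0000],
        ["NP_057012_hs_mm_rn_dna", "Unco", 34.3081],
        ["NP_065168_hs_mm_rn_dna", "Con", 89.9766],
        ["NP_061130_hs_mm_rn_dna", "Unco", 3.7986]]
type_then_lr_desc = sort_rows(header, rows, ("type", "LR"), reverse="LR")
```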
Joining Tables
--------------
The ``Table`` object can join, or merge, the records of two tables. There are two fundamental types of join, inner and outer, each with different sub-types. We demonstrate these by first constructing some simple tables.
.. doctest::
>>> a=Table(header=["index", "col2","col3"],
... rows=[[1,2,3],[2,3,1],[2,6,5]], title="A")
>>> print a
A
=====================
index col2 col3
---------------------
1 2 3
2 3 1
2 6 5
---------------------
>>> b=Table(header=["index", "col2","col3"],
... rows=[[1,2,3],[2,2,1],[3,6,3]], title="B")
>>> print b
B
=====================
index col2 col3
---------------------
1 2 3
2 2 1
3 6 3
---------------------
>>> c=Table(header=["index","col_c2"],rows=[[1,2],[3,2],[3,5]],title="C")
>>> print c
C
===============
index col_c2
---------------
1 2
3 2
3 5
---------------
For a natural inner join, only one copy of each column whose name appears in both tables is retained. Since ``a`` and ``b`` have identical headings, we expect the headings of ``a.joined(b)`` and ``b.joined(a)`` to be identical to them.
.. doctest::
>>> assert a.joined(b).Header == b.Header
>>> assert b.joined(a).Header == a.Header
For a standard inner join, the joined table should contain all columns from ``a`` and ``b``, except that the index column(s) of ``b`` are not repeated. Simply providing a column name (or index) selects this behaviour. In this case, column names from the second table are made unique by prefixing them with that table's title. If the table passed to ``joined`` has no title, a ``RuntimeError`` is raised.
.. doctest::
>>> b.Title = None
>>> try:
... a.joined(b)
... except RuntimeError:
... pass
>>> b.Title = 'B'
>>> assert a.joined(b, "index").Header == ["index", "col2", "col3",
... "B_col2", "B_col3"]
...
Note that the title of the second table was used to prefix its column headings. We further test this using table ``c``, which has different dimensions.
.. doctest::
>>> assert a.joined(c,"index").Header == ["index","col2","col3",
... "C_col_c2"]
Index columns can also be specified by their numerical positions, and the results should be the same.
.. doctest::
>>> assert a.joined(b,[0, 2]).getRawData() ==\
... a.joined(b,["index","col3"]).getRawData()
Additionally, separate index columns can be provided for each of the two tables; here they are identical.
.. doctest::
>>> assert a.joined(b, ["index", "col3"],["index", "col3"]).getRawData()\
... == a.joined(b,["index","col3"]).getRawData()
The result of a standard join between tables ``a`` and ``b`` on ``index`` is
.. doctest::
>>> print a.joined(b, ["index"], title='A&B')
A&B
=========================================
index col2 col3 B_col2 B_col3
-----------------------------------------
1 2 3 2 3
2 3 1 2 1
2 6 5 2 1
-----------------------------------------
We now use table-specific index columns, joining ``col2`` of ``a`` with ``index`` of ``c``.
.. doctest::
>>> print a.joined(c, ["col2"], ["index"], title='A&C by "col2/index"')
A&C by "col2/index"
=================================
index col2 col3 C_col_c2
---------------------------------
2 3 1 2
2 3 1 5
---------------------------------
Tables ``a`` and ``c`` share only a single value (1) in the ``index`` column, hence a join by that index should return a table with just one row.
.. doctest::
>>> print a.joined(c, "index", title='A&C by "index"')
A&C by "index"
=================================
index col2 col3 C_col_c2
---------------------------------
1 2 3 2
---------------------------------
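The standard inner joins above can be reproduced with a classic hash join: index the second table's rows by their key values, then look up each row of the first table. The sketch below is a hypothetical ``inner_join`` helper on plain lists, not PyCogent's implementation; it follows the behaviour shown in the examples (the second table's index columns are dropped, its other columns are prefixed with its title, and a missing title raises ``RuntimeError``).

```python
def inner_join(header_a, rows_a, header_b, rows_b, title_b,
               columns_a, columns_b=None):
    """Join rows whose index-column values match (hypothetical helper)."""
    if not title_b:
        raise RuntimeError("the second table needs a title")
    if columns_b is None:
        columns_b = columns_a
    ia = [header_a.index(name) for name in columns_a]
    ib = [header_b.index(name) for name in columns_b]
    keep_b = [i for i in range(len(header_b)) if i not in ib]
    header = list(header_a) + ["%s_%s" % (title_b, header_b[i]) for i in keep_b]
    # build a lookup of the second table's rows keyed by index values
    lookup = {}
    for row_b in rows_b:
        lookup.setdefault(tuple(row_b[i] for i in ib), []).append(row_b)
    # probe the lookup with each row of the first table
    rows = []
    for row_a in rows_a:
        for row_b in lookup.get(tuple(row_a[i] for i in ia), []):
            rows.append(list(row_a) + [row_b[i] for i in keep_b])
    return header, rows


hdr = ["index", "col2", "col3"]
a_rows = [[1, 2, 3], [2, 3, 1], [2, 6, 5]]
b_rows = [[1, 2, 3], [2, 2, 1], [3, 6, 3]]
c_hdr, c_rows = ["index", "col_c2"], [[1, 2], [3, 2], [3, 5]]
header, rows = inner_join(hdr, a_rows, hdr, b_rows, "B", ["index"])
```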
A natural join of tables ``a`` and ``b`` results in a table containing only the rows that are identical in both parent tables.
.. doctest::
>>> print a.joined(b, title='A&B Natural Join')
A&B Natural Join
=====================
index col2 col3
---------------------
1 2 3
---------------------
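A natural join matches rows on every column name the two tables share and keeps one copy of those columns. A hypothetical ``natural_join`` sketch using a simple nested loop is shown below; how PyCogent names any non-shared columns of the second table in a natural join is not shown above, so leaving them unprefixed here is an assumption.

```python
def natural_join(header_a, rows_a, header_b, rows_b):
    """Join on all shared column names, keeping a single copy of them."""
    shared = [name for name in header_a if name in header_b]
    ia = [header_a.index(name) for name in shared]
    ib = [header_b.index(name) for name in shared]
    extra_b = [i for i, name in enumerate(header_b) if name not in shared]
    header = list(header_a) + [header_b[i] for i in extra_b]
    rows = [list(row_a) + [row_b[i] for i in extra_b]
            for row_a in rows_a for row_b in rows_b
            if [row_a[i] for i in ia] == [row_b[i] for i in ib]]
    return header, rows


hdr = ["index", "col2", "col3"]
a_rows = [[1, 2, 3], [2, 3, 1], [2, 6, 5]]
b_rows = [[1, 2, 3], [2, 2, 1], [3, 6, 3]]
header, rows = natural_join(hdr, a_rows, hdr, b_rows)
```

With identical headings, only rows present in both tables survive, matching the single row printed above.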
We test the outer join by defining an additional table with different dimensions and conducting a join with ``inner_join=False``. As no index columns are specified, every row of ``c`` is paired with every row of ``d``.
.. doctest::
>>> d=Table(header=["index", "col_c2"], rows=[[5,42],[6,23]], title="D")
>>> print d
D
===============
index col_c2
---------------
5 42
6 23
---------------
>>> print c.joined(d,inner_join=False, title='C&D Outer join')
C&D Outer join
======================================
index col_c2 D_index D_col_c2
--------------------------------------
1 2 5 42
1 2 6 23
3 2 5 42
3 2 6 23
3 5 5 42
3 5 6 23
--------------------------------------
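With no index columns, the outer join above is the Cartesian product of the two tables' rows. The following hypothetical ``cross_join`` sketch reproduces that output with ``itertools.product``, prefixing every heading of the second table with its title as in the example. Note that it does not cover keyed outer joins, which this section does not demonstrate.

```python
from itertools import product


def cross_join(header_a, rows_a, header_b, rows_b, title_b):
    """Pair every row of the first table with every row of the second."""
    header = list(header_a) + ["%s_%s" % (title_b, name) for name in header_b]
    rows = [list(row_a) + list(row_b)
            for row_a, row_b in product(rows_a, rows_b)]
    return header, rows


c_hdr, c_rows = ["index", "col_c2"], [[1, 2], [3, 2], [3, 5]]
d_hdr, d_rows = ["index", "col_c2"], [[5, 42], [6, 23]]
header, rows = cross_join(c_hdr, c_rows, d_hdr, d_rows, "D")
```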
We establish that the ``joined`` method works for mixtures of character and numerical data, setting some indices and some cell values to be strings.
.. doctest::
>>> a=Table(header=["index", "col2","col3"],
... rows=[[1,2,"3"],["2",3,1],[2,6,5]], title="A")
>>> b=Table(header=["index", "col2","col3"],
... rows=[[1,2,"3"],["2",2,1],[3,6,3]], title="B")
>>> assert a.joined(b, ["index", "col3"],["index", "col3"]).getRawData()\
... == a.joined(b,["index","col3"]).getRawData()
We test that the ``joined`` method works when the index column occupies a different position in each table.
.. doctest::
>>> t1_header = ['a', 'b']
>>> t1_rows = [(1,2),(3,4)]
>>> t2_header = ['b', 'c']
>>> t2_rows = [(3,6),(4,8)]
>>> t1 = Table(header = t1_header, rows = t1_rows, title='t1')
>>> t2 = Table(header = t2_header, rows = t2_rows, title='t2')
>>> t3 = t1.joined(t2, columns_self = ["b"], columns_other = ["b"])
>>> print t3
==============
a b t2_c
--------------
3 4 8
--------------
We then establish that a join with no matching values does not cause a failure, but simply returns an empty ``Table``.
.. doctest::
>>> t4_header = ['b', 'c']
>>> t4_rows = [(5,6),(7,8)]
>>> t4 = LoadTable(header = t4_header, rows = t4_rows)
>>> t4.Title = 't4'
>>> t5 = t1.joined(t4, columns_self = ["b"], columns_other = ["b"])
>>> print t5
==============
a b t4_c
--------------
--------------
The representation of this empty table is
.. doctest::
>>> t5
Table(numrows=0, numcols=3, header=['a', 'b', 't4_c'], rows=[])
Transposing a table
-------------------
Tables can be transposed.
.. doctest::
>>> from cogent import LoadTable
>>> title='#Full OTU Counts'
>>> header = ['#OTU ID', '14SK041', '14SK802']
>>> rows = [[-2920, '332', 294],
... [-1606, '302', 229],
... [-393, 141, 125],
... [-2109, 138, 120],
... [-5439, 104, 117],
... [-1834, 70, 75],
... [-18588, 65, 47],
... [-1350, 60, 113],
... [-2160, 57, 52],
... [-11632, 47, 36]]
>>> table = LoadTable(header=header,rows=rows,title=title)
>>> print table
#Full OTU Counts
=============================
#OTU ID 14SK041 14SK802
-----------------------------
-2920 332 294
-1606 302 229
-393 141 125
-2109 138 120
-5439 104 117
-1834 70 75
-18588 65 47
-1350 60 113
-2160 57 52
-11632 47 36
-----------------------------
We now transpose this table. We must supply a heading for the new column that will hold the former column headings (``new_column_name``), and can identify which existing column will become the new header (``select_as_header``; the default is index 0).
.. doctest::
>>> tp = table.transposed(new_column_name='sample',
... select_as_header='#OTU ID', space=2)
...
>>> print tp
==============================================================================
sample -2920 -1606 -393 -2109 -5439 -1834 -18588 -1350 -2160 -11632
------------------------------------------------------------------------------
14SK041 332 302 141 138 104 70 65 60 57 47
14SK802 294 229 125 120 117 75 47 113 52 36
------------------------------------------------------------------------------
Transposing with the default value of ``select_as_header`` gives the same result.
.. doctest::
>>> tp = table.transposed(new_column_name='sample', space=2)
...
>>> print tp
==============================================================================
sample -2920 -1606 -393 -2109 -5439 -1834 -18588 -1350 -2160 -11632
------------------------------------------------------------------------------
14SK041 332 302 141 138 104 70 65 60 57 47
14SK802 294 229 125 120 117 75 47 113 52 36
------------------------------------------------------------------------------
We test transposition selecting a different column to become the header.
.. doctest::
>>> tp = table.transposed(new_column_name='sample',
... select_as_header='14SK802', space=2)
...
>>> print tp
==============================================================================
sample 294 229 125 120 117 75 47 113 52 36
------------------------------------------------------------------------------
#OTU ID -2920 -1606 -393 -2109 -5439 -1834 -18588 -1350 -2160 -11632
14SK041 332 302 141 138 104 70 65 60 57 47
------------------------------------------------------------------------------
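The transposition above can be sketched on plain lists: the selected column's values become the new header (after ``new_column_name``), and every other column becomes a row labelled with its old heading. ``transposed`` here is a hypothetical helper, shown on the first two rows of the OTU table.

```python
def transposed(header, rows, new_column_name, select_as_header=0):
    """Transpose rows; the selected column's values become the new header."""
    if isinstance(select_as_header, str):
        select_as_header = header.index(select_as_header)
    key = select_as_header
    new_header = [new_column_name] + [row[key] for row in rows]
    new_rows = [[header[j]] + [row[j] for row in rows]
                for j in range(len(header)) if j != key]
    return new_header, new_rows


header = ["#OTU ID", "14SK041", "14SK802"]
rows = [[-2920, 332, 294], [-1606, 302, 229]]
tp_header, tp_rows = transposed(header, rows, "sample")
```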
Counting rows
-------------
We can count the number of rows for which a condition holds. The ``count`` method accepts the same arguments as ``filtered`` but returns only an integer.
.. doctest::
>>> print c.count("col_c2 == 2")
2
>>> print c.joined(d,inner_join=False).count("index==3 and D_index==5")
2
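One way a string condition such as ``"col_c2 == 2"`` could be evaluated is to compile it once and ``eval`` it for each row, with the column names bound as variables. The hypothetical ``count`` helper below is a sketch of that idea, not PyCogent's code. It only works for column names that are valid Python identifiers, and ``eval`` must never be applied to untrusted expressions.

```python
def count(header, rows, expression):
    """Count rows for which expression is true, with columns as variables."""
    code = compile(expression, "<condition>", "eval")
    return sum(1 for row in rows
               if eval(code, {"__builtins__": {}}, dict(zip(header, row))))


c_header = ["index", "col_c2"]
c_rows = [[1, 2], [3, 2], [3, 5]]
print(count(c_header, c_rows, "col_c2 == 2"))  # 2
```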
Testing a sub-component
-----------------------
We also exercise some of the formatting code in ``cogent.format.table`` directly:
.. doctest::
>>> from cogent.format.table import formattedCells, phylipMatrix, latex
We check that we can format an arbitrary 2D list, together with a header, using the ``formattedCells`` function directly.
.. doctest::
>>> data = [[230, 'acdef', 1.3], [6, 'cc', 1.9876]]
>>> head = ['one', 'two', 'three']
>>> header, formatted = formattedCells(data, header = head)
>>> print formatted
[['230', 'acdef', '1.3000'], [' 6', ' cc', '1.9876']]
>>> print header
['one', ' two', ' three']
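The cell formatting can be approximated by converting every value to a string, showing floats to four decimal places, and right-justifying each column to the width of its longest entry (header included). ``format_cells`` is a hypothetical sketch; the right justification is an assumption suggested by the leading padding in the output above.

```python
def format_cells(rows, header, digits=4):
    """Stringify cells and right-justify each column to a common width."""
    def to_text(value):
        if isinstance(value, float):
            return "%.*f" % (digits, value)
        return str(value)

    cells = [[to_text(v) for v in row] for row in rows]
    widths = [max([len(name)] + [len(row[j]) for row in cells])
              for j, name in enumerate(header)]
    new_header = [name.rjust(w) for name, w in zip(header, widths)]
    new_cells = [[c.rjust(w) for c, w in zip(row, widths)] for row in cells]
    return new_header, new_cells


header, formatted = format_cells([[230, "acdef", 1.3], [6, "cc", 1.9876]],
                                 ["one", "two", "three"])
```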
We directly test the LaTeX formatting with the ``latex`` function.
.. doctest::
>>> print latex(formatted, header, justify='lrl', caption='A legend',
... label="table:test")
\begin{longtable}[htp!]{ l r l }
\hline
\bf{one} & \bf{two} & \bf{three} \\
\hline
\hline
230 & acdef & 1.3000 \\
6 & cc & 1.9876 \\
\hline
\caption{A legend}
\label{table:test}
\end{longtable}
..
Import the ``os`` module so that some file cleanup can be done at the end. To check the contents of those files, just delete the following prior to running the test. The try/except clause below is aimed at the case where ``junk.pdf`` wasn't created due to ``reportlab`` not being present.
.. doctest::
:hide:
>>> import os
>>> to_delete = ['t3.pickle', 't2.csv', 't2.csv.gz', 't3.tab',
... 'test3b.txt']
>>> for f in to_delete:
... try:
... os.remove(f)
... except OSError:
... pass
PyCogent-1.5.3/tests/test_util/test_transform.py
#!/usr/bin/env python
"""Tests of transformation and composition functions.
"""
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import identity
from cogent.util.transform import apply_each, bools, bool_each, \
conjoin, all, both,\
disjoin, any, either, negate, none, neither, compose, compose_many, \
per_shortest, per_longest, for_seq, \
has_field, extract_field, test_field, index, test_container, \
trans_except, trans_all, make_trans, find_any, find_no, find_all,\
keep_if_more, exclude_if_more, keep_if_more_other, exclude_if_more_other,\
keep_chars,exclude_chars, reorder, reorder_inplace, float_from_string,\
first, last, first_in_set, last_in_set, first_not_in_set, last_not_in_set,\
first_index, last_index, first_index_in_set, last_index_in_set, \
first_index_not_in_set, last_index_not_in_set, perm, comb, cross_comb, _increment_comb
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit", "Zongzhi Liu"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class has_x(object):
#convenience class for has_field and related functions
def __init__(self, x):
self.x = x
def __hash__(self):
return hash(self.x)
def __str__(self):
return str(self.x)
class has_y(object):
#convenience class for has_field and related functions
def __init__(self, y):
self.y = y
def __hash__(self):
return hash(self.y)
def __str__(self):
return str(self.y)
class metafunctionsTests(TestCase):
"""Tests of standalone functions."""
def setUp(self):
"""Define some standard functions and data."""
self.Numbers = range(20)
self.SmallNumbers = range(3)
self.SmallNumbersRepeated = range(5) * 4
self.Letters = 'abcde'
self.Mixed = list(self.Letters) + range(5)
self.firsts = 'ab2'
self.seconds = '0bc'
self.is_char = lambda x: isinstance(x, str) and len(x) == 1
self.is_vowel = lambda x: x in 'aeiou'
self.is_consonant = lambda x: x not in 'aeiuo'
self.is_number = lambda x: isinstance(x, int)
self.is_odd_number = lambda x: x%2
self.is_odd_letter = lambda x: x in 'acegikmoqs'
self.is_zero = lambda x: x == 0
self.is_small = lambda x: x < 3
self.double = lambda x: x * 2
self.minusone = lambda x: x - 1
#functions to test *args, **kwargs
self.is_alpha_digit = lambda first, second: \
first.isalpha() and second.isdigit()
self.is_digit_alpha = lambda first, second: \
first.isdigit() and second.isalpha()
def test_apply_each(self):
"""apply_each should apply each function to args, kwargs"""
self.assertEqual(apply_each( \
[self.is_char, self.is_vowel, self.is_consonant, self.is_number], \
self.Letters[0]), [True, True, False, False])
self.assertEqual(apply_each( \
[self.is_char, self.is_vowel, self.is_consonant, self.is_number], \
self.Letters[1]), [True, False, True, False])
self.assertEqual(apply_each( \
[self.double, self.minusone], self.SmallNumbers[0]), [0, -1])
self.assertEqual(apply_each( \
[self.double, self.minusone], self.SmallNumbers[1]), [2, 0])
expects = [[True, False], [False, False], [False, True]]
for i in range(len(expects)):
self.assertEqual(apply_each( \
[self.is_alpha_digit, self.is_digit_alpha],
self.firsts[i], self.seconds[i]), expects[i])
self.assertEqual(apply_each( \
[self.is_alpha_digit, self.is_digit_alpha],
self.firsts[i], second=self.seconds[i]), expects[i])
self.assertEqual(apply_each( \
[self.is_alpha_digit, self.is_digit_alpha],
second=self.seconds[i], first=self.firsts[i]), expects[i])
def test_bools(self):
"""bools should convert items to True or False."""
self.assertEqual(bools(self.Letters), [True]*5)
self.assertEqual(bools(self.Numbers), [False] + [True]*19)
def test_bool_each(self):
"""bool_each should return boolean version of applying each f to args"""
self.assertEqual(bool_each([self.double, self.minusone], \
self.SmallNumbers[0]), [False, True])
self.assertEqual(bool_each([self.double, self.minusone], \
self.SmallNumbers[1]), [True, False])
def test_conjoin(self):
"""conjoin should return True if all components True"""
self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'a'), True)
self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel], x='b'),
False)
self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'c'), False)
self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'e'), True)
#technically, this one should be true as well, but I left it off to
#have an even vowel test case...
self.assertEqual(conjoin([self.is_odd_letter,self.is_vowel],'u'), False)
#should short-circuit, i.e. not evaluate later cases after False
self.assertEqual(conjoin([self.is_odd_letter, self.fail], 'b'), False)
self.assertRaises(AssertionError, conjoin, \
[self.is_odd_letter, self.fail], 'a')
def test_all(self):
"""all should return a function returning True if all components True"""
odd_vowel = all([self.is_odd_letter, self.is_vowel, self.is_char])
self.assertEqual(odd_vowel('a'), True)
self.assertEqual(map(odd_vowel, 'abceu'),
[True,False,False,True,False])
odd_number = all([self.is_odd_number, self.is_number])
self.assertEqual(map(odd_number, range(5)), [False,True]*2+[False])
#should short-circuit, i.e. not evaluate later cases after False
self.assertEqual(all([self.is_odd_letter, self.fail])('b'), False)
self.assertRaises(AssertionError, all([self.is_odd_letter,self.fail]),\
'a')
def test_both(self):
"""both should return True if both components True"""
odd_vowel = both(self.is_odd_letter, self.is_vowel)
self.assertEqual(map(odd_vowel, 'abcu'), [True,False,False,False])
#should short-circuit
self.assertEqual(both(self.is_odd_letter, self.fail)('b'), False)
self.assertRaises(AssertionError, both(self.is_odd_letter, self.fail),\
'a')
def test_disjoin(self):
"""disjoin should return True if any component True"""
self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], 'a'), True)
self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], 'b'),False)
self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], 'c'), True)
self.assertEqual(disjoin([self.is_odd_letter,self.is_vowel], x='u'),
True)
#should short-circuit after first True
self.assertEqual(disjoin([self.is_odd_letter, self.fail], 'a'), True)
self.assertRaises(AssertionError, \
disjoin, [self.is_odd_letter, self.fail], 'b')
def test_any(self):
"""any should return a function returning True if any component True"""
odd_vowel = any([self.is_odd_letter, self.is_vowel])
self.assertEqual(odd_vowel('a'), True)
self.assertEqual(map(odd_vowel, 'abceu'), [True,False,True,True,True])
odd = any([self.is_odd_number, self.is_small])
self.assertEqual(map(odd, range(5)), [True]*4+[False])
#should short-circuit after first True
self.assertEqual(any([self.is_odd_letter, self.fail])(x='a'), True)
self.assertRaises(AssertionError, any([self.is_odd_letter,self.fail]),\
'b')
def test_either(self):
"""either should return function returning True if either component True"""
odd_vowel = either(self.is_odd_letter, self.is_vowel)
self.assertEqual(map(odd_vowel, 'abcu'), [True,False,True,True])
#should short-circuit
self.assertEqual(either(self.is_odd_letter, self.fail)(x='a'), True)
self.assertRaises(AssertionError, \
either(self.is_odd_letter, self.fail), 'b')
def test_negate(self):
"""negate should return True if no component True"""
self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'a'), False)
self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'b'), True)
self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'c'), False)
self.assertEqual(negate([self.is_odd_letter,self.is_vowel], 'u'), False)
#should short-circuit after first True
self.assertEqual(negate([self.is_odd_letter, self.fail], x='a'), False)
self.assertRaises(AssertionError, \
negate, [self.is_odd_letter, self.fail], 'b')
def test_none(self):
"""none should return a function returning True if no component True"""
odd_vowel = none([self.is_odd_letter, self.is_vowel])
self.assertEqual(odd_vowel('a'), False)
self.assertEqual(map(odd_vowel, 'abceu'), [False,True] + [False]*3)
odd = none([self.is_odd_number, self.is_small])
self.assertEqual(map(odd, range(5)), [False]*4+[True])
#should short-circuit after first True
self.assertEqual(none([self.is_odd_letter, self.fail])(x='a'), False)
self.assertRaises(AssertionError, none([self.is_odd_letter,self.fail]),\
'b')
def test_neither(self):
"""neither should return function returning True if each component False"""
odd_vowel = neither(self.is_odd_letter, self.is_vowel)
self.assertEqual(map(odd_vowel, 'abcu'), [False,True,False,False])
#should short-circuit
self.assertEqual(neither(self.is_odd_letter, self.fail)(x='a'), False)
self.assertRaises(AssertionError, \
neither(self.is_odd_letter, self.fail), 'b')
def test_compose(self):
"""compose should return function returning f(g(x))"""
ds = compose(self.double, self.minusone)
sd = compose(self.minusone, self.double)
self.assertEqual(ds(5), 8)
self.assertEqual(sd(x=5), 9)
#check that it works when arg lists are different
commafy = compose(','.join, list)
self.assertEqual(commafy('abc'), 'a,b,c')
self.assertEqual(commafy(''), '')
self.assertEqual(commafy('a'), 'a')
def test_compose_many(self):
"""compose_many should return composition of all args"""
from numpy import arange
def to_strings(x):
return map(str, x)
printable_range = compose_many(''.join, to_strings, range)
printable_arange = compose_many(''.join, to_strings, arange)
self.assertEqual(printable_range(3), '012')
self.assertEqual(printable_range(0), '')
self.assertEqual(printable_range(5), '01234')
self.assertEqual(printable_arange(stop=51, start=10, step=10),
'1020304050')
def test_identity(self):
"""identity should return x"""
for i in ['a', 'abc', None, '', [], [1], 1, 2**50, 0.3e-50, {'a':3}]:
assert identity(i) is i
def test_has_field(self):
"""has_field should return True if specified field exists."""
x = has_x(1)
y = has_y(1)
check_x = has_field('x')
self.assertEqual(check_x(x), True)
self.assertEqual(check_x(y), False)
check_y = has_field('y')
self.assertEqual(check_y(x), False)
self.assertEqual(check_y(y), True)
del y.y
self.assertEqual(check_y(y), False)
y.x = 3
self.assertEqual(check_x(y), True)
def test_extract_field(self):
"""extract_field should apply constructor to field, or return None"""
num = has_x('1')
alpha = has_x('x')
y = has_y('1')
extractor = extract_field('x')
self.assertEqual(extractor(num), '1')
self.assertEqual(extractor(alpha), 'x')
self.assertEqual(extractor(y), None)
int_extractor = extract_field('x', int)
self.assertEqual(int_extractor(num), 1)
self.assertEqual(int_extractor(alpha), None)
self.assertEqual(int_extractor(y), None)
def test_test_field(self):
"""test_field should return boolean result of applying constructor"""
num = has_x('5')
alpha = has_x('x')
zero = has_x(0)
y = has_y('5')
tester = test_field('x')
self.assertEqual(tester(num), True)
self.assertEqual(tester(alpha), True)
self.assertEqual(tester(y), False)
int_tester = test_field('x', int)
self.assertEqual(int_tester(num), True)
self.assertEqual(int_tester(alpha), False)
self.assertEqual(int_tester(y), False)
self.assertEqual(int_tester(zero), False)
def test_index(self):
"""index should index objects by specified field or identity"""
num = has_x(5)
let = has_x('5')
zer = has_x('0')
non = has_x(None)
y = has_y(3)
items = [num, let, zer, non, y]
duplicates = items * 3
basic_indexer = index()
i = basic_indexer(items)
self.assertEqual(i, {num:[num], let:[let], zer:[zer], non:[non], y:[y]})
#test reusability
i = basic_indexer([3,3,4])
self.assertEqual(i, {3:[3, 3], 4:[4]})
#test duplicates
d = basic_indexer(duplicates)
self.assertEqual(d, {num:[num]*3, let:[let]*3, zer:[zer]*3, \
non:[non]*3, y:[y]*3})
#test with constructor
str_indexer = index(str)
i = str_indexer(items)
self.assertEqual(i, {'5':[num,let], '0':[zer], 'None':[non], '3':[y]})
#test order correct in duplicates
i = str_indexer(duplicates)
self.assertEqual(i, {'5':[num,let,num,let,num,let], '0':[zer,zer,zer],
'None':[non,non,non], '3':[y,y,y]})
#test with squashing
overwriter = index(str, overwrite=True)
i = overwriter(duplicates)
self.assertEqual(i, {'5':let, '0':zer, 'None':non, '3':y})
def test_test_container(self):
"""test_container should return True or False in a typesafe way."""
test_dict = test_container({'a':1})
test_list = test_container([1,2,3])
test_str = test_container('438hfanvr438')
for item in (1, 2, 3):
assert test_list(item)
assert not test_dict(item)
assert not test_str(item)
assert test_dict('a')
assert not test_list('a')
assert test_str('a')
for item in ('4', 'h', 'fan'):
assert not test_dict(item)
assert not test_list(item)
assert test_str(item)
for item in (['x','y'],{},{'a':3},'@#@',('a','b'),None,False):
assert not test_dict(item)
assert not test_list(item)
assert not test_str(item)
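# Sketch (hypothetical, not the library's own code): one way to get the
# behaviour that test_index checks above. The returned callable groups items
# by constructor(item), or by the item itself when no constructor is given,
# and keeps duplicates in input order. With overwrite=True only the last item
# seen for each key is kept. The name _sketch_index is illustrative only.
def _sketch_index(constructor=None, overwrite=False):
    """Return a function that indexes a sequence of items by key."""
    def build(items):
        result = {}  # fresh dict on every call, so the indexer is reusable
        for item in items:
            key = item if constructor is None else constructor(item)
            if overwrite:
                result[key] = item  # later items replace earlier ones
            else:
                result.setdefault(key, []).append(item)
        return result
    return build
# e.g. _sketch_index()([3, 3, 4]) gives {3: [3, 3], 4: [4]}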
class SequenceFunctionsTests(TestCase):
"""Tests of standalone functions for dealing with sequences."""
def test_per_shortest(self):
"""per_shortest should divide by min(len(x), len(y))"""
self.assertEqual(per_shortest(20, 'aaaaaa', 'bbbb'), 5)
self.assertEqual(per_shortest(20, 'aaaaaa', 'b'), 20)
self.assertEqual(per_shortest(20, 'a', 'bbbbb'), 20)
self.assertEqual(per_shortest(20, '', 'b'), 0)
self.assertEqual(per_shortest(20, '', ''), 0)
#check that it does it in floating-point
self.assertEqual(per_shortest(1, 'aaaaaa', 'bbbb'), 0.25)
#check that it raises TypeError on non-seq
self.assertRaises(TypeError, per_shortest, 1, 2, 3)
def test_per_longest(self):
"""per_longest should divide by max(len(x), len(y))"""
self.assertEqual(per_longest(20, 'aaaaaa', 'bbbb'), 20/6.0)
self.assertEqual(per_longest(20, 'aaaaaa', 'b'), 20/6.0)
self.assertEqual(per_longest(20, 'a', 'bbbbb'), 20/5.0)
self.assertEqual(per_longest(20, '', 'b'), 20)
self.assertEqual(per_longest(20, '', ''), 0)
#check that it does it in floating-point
self.assertEqual(per_longest(1, 'aaaaaa', 'bbbb'), 1/6.0)
#check that it raises TypeError on non-seq
self.assertRaises(TypeError, per_longest, 1, 2, 3)
def test_for_seq(self):
"""for_seq should return the correct function"""
is_eq = lambda x,y: x == y
is_ne = lambda x,y: x != y
lt_5 = lambda x,y: x + y < 5
diff = lambda x,y: x - y
sumsq = lambda x: sum([i*i for i in x])
long_norm = lambda s, x, y: (s + 0.0) / max(len(x), len(y))
times_two = lambda s, x, y: 2*s
empty = []
s1 = [1,2,3,4,5]
s2 = [1,3,2,4,5]
s3 = [1,1,1,1,1]
s4 = [5,5,5,5,5]
s5 = [3,3,3,3,3]
short = [1]
#test behavior with default aggregator and normalizer
f = for_seq(is_eq)
self.assertFloatEqual(f(s1, s1), 1.0)
self.assertFloatEqual(f(s1, short), 1.0)
self.assertFloatEqual(f(short, s1), 1.0)
self.assertFloatEqual(f(short, s4), 0.0)
self.assertFloatEqual(f(s4, short), 0.0)
self.assertFloatEqual(f(s1,s2), 0.6)
f = for_seq(is_ne)
self.assertFloatEqual(f(s1, s1), 0.0)
self.assertFloatEqual(f(s1, short), 0.0)
self.assertFloatEqual(f(short, s1), 0.0)
self.assertFloatEqual(f(short, s4), 1.0)
self.assertFloatEqual(f(s4, short), 1.0)
self.assertFloatEqual(f(s1, s2), 0.4)
f = for_seq(lt_5)
self.assertFloatEqual(f(s3,s3), 1.0)
self.assertFloatEqual(f(s3,s4), 0.0)
self.assertFloatEqual(f(s2,s3), 0.6)
f = for_seq(diff)
self.assertFloatEqual(f(s1,s1), 0.0)
self.assertFloatEqual(f(s4,s1), 2.0)
self.assertFloatEqual(f(s1,s4), -2.0)
#test behavior with different aggregator
f = for_seq(diff)
self.assertFloatEqual(f(s1,s5), 0)
f = for_seq(diff, aggregator=sum)
self.assertFloatEqual(f(s1,s5), 0)
f = for_seq(diff, aggregator=sumsq)
self.assertFloatEqual(f(s1,s5), 2.0)
#test behavior with different normalizer
f = for_seq(diff, aggregator=sumsq, normalizer=None)
self.assertFloatEqual(f(s1,s5), 10)
f = for_seq(diff, aggregator=sumsq)
self.assertFloatEqual(f(s1,s5), 2.0)
f = for_seq(diff, aggregator=sumsq, normalizer=times_two)
self.assertFloatEqual(f(s1,s5), 20)
f = for_seq(diff, aggregator=sumsq)
self.assertFloatEqual(f(s5,short), 4)
f = for_seq(diff, aggregator=sumsq, normalizer=long_norm)
self.assertFloatEqual(f(s5,short), 0.8)
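# Sketch (hypothetical, names illustrative): how the functions tested in
# SequenceFunctionsTests appear to fit together. for_seq applies a pairwise
# function to zipped positions, aggregates the results (sum by default) and
# then normalizes by the length of the shorter sequence, which is what the
# expected values above imply (e.g. 3 matches over 5 positions gives 0.6).
# This is an assumption reconstructed from the tests, not the real code.
def _sketch_per_shortest(total, x, y):
    """Divide total by min(len(x), len(y)); return 0 for an empty sequence."""
    shortest = min(len(x), len(y))  # raises TypeError for non-sequences
    if not shortest:
        return 0
    return total / float(shortest)

def _sketch_for_seq(f, aggregator=sum, normalizer=_sketch_per_shortest):
    """Return a function scoring two sequences position by position."""
    def score(x, y):
        total = aggregator([f(i, j) for i, j in zip(x, y)])
        if normalizer is None:
            return total
        return normalizer(total, x, y)
    return score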
class Filter_Criteria_Tests(TestCase):
"""Tests of standalone functions used as filter criteria"""
def test_trans_except(self):
"""trans_except should return trans table mapping non-good chars to x"""
a = trans_except('Aa', '-')
none = trans_except('', '*')
some = trans_except('zxcvbnm,.zxcvbnm,.', 'V')
self.assertEqual('abcABA'.translate(a), 'a--A-A')
self.assertEqual(''.translate(a), '')
self.assertEqual('12345678'.translate(a), '--------')
self.assertEqual(''.translate(none), '')
self.assertEqual('abcdeEFGHI12345&*(!@'.translate(none), '*'*20)
self.assertEqual('qazwsxedcrfv'.translate(some),'VVzVVxVVcVVv')
def test_trans_all(self):
"""trans_all should return trans table mapping all bad chars to x"""
a = trans_all('Aa', '-')
none = trans_all('', '*')
some = trans_all('zxcvbnm,.zxcvbnm,.', 'V')
self.assertEqual('abcABA'.translate(a), '-bc-B-')
self.assertEqual(''.translate(a), '')
self.assertEqual('12345678'.translate(a), '12345678')
self.assertEqual(''.translate(none), '')
self.assertEqual('abcdeEFGHI12345&*(!@'.translate(none), \
'abcdeEFGHI12345&*(!@')
self.assertEqual('qazwsxedcrfv'.translate(some),'qaVwsVedVrfV')
def test_make_trans(self):
"""make_trans should return trans table mapping chars to default"""
a = make_trans()
self.assertEqual('abc123'.translate(a), 'abc123')
a = make_trans('a', 'x')
self.assertEqual('abc123'.translate(a), 'xbc123')
a = make_trans('ac', 'xa')
self.assertEqual('abc123'.translate(a), 'xba123')
a = make_trans('ac', 'xa', '.')
self.assertEqual('abc123'.translate(a), 'x.a...')
self.assertRaises(ValueError, make_trans, 'ac', 'xa', 'av')
def test_find_any(self):
"""find_any should be True if one of the words is in the string"""
f = find_any('ab')
self.assertEqual(f(''),0) #empty
self.assertRaises(AttributeError,f,None) # none
self.assertEqual(f('cde'),0) #none of the elements
self.assertEqual(f('axxx'),1) #one of the elements
self.assertEqual(f('bxxx'),1) #one of the elements
self.assertEqual(f('axxxb'),1) #all elements
self.assertEqual(f('aaaa'),1) #repeated element
# works on any sequence
f = find_any(['foo','bar'])
self.assertEqual(f("joe"),0)
self.assertEqual(f("only foo"),1)
self.assertEqual(f("bar and foo"),1)
# does NOT work on numbers
def test_find_no(self):
"""find_no should be True if none of the words is in the string"""
f = find_no('ab')
self.assertEqual(f(''),1) #empty
self.assertRaises(AttributeError,f,None) # none
self.assertEqual(f('cde'),1) #none of the elements
self.assertEqual(f('axxx'),0) #one of the elements
self.assertEqual(f('bxxx'),0) #one of the elements
self.assertEqual(f('axxxb'),0) #all elements
self.assertEqual(f('aaaa'),0) #repeated element
# works on any sequence
f = find_no(['foo','bar'])
self.assertEqual(f("joe"),1)
self.assertEqual(f("only foo"),0)
self.assertEqual(f("bar and foo"),0)
# does NOT work on numbers
def test_find_all(self):
"""find_all should be True if all words appear in the string"""
f = find_all('ab')
self.assertEqual(f(''),0) #empty
self.assertRaises(AttributeError,f,None) # none
self.assertEqual(f('cde'),0) #none of the elements
self.assertEqual(f('axxx'),0) #one of the elements
self.assertEqual(f('bxxx'),0) #one of the elements
self.assertEqual(f('axxxb'),1) #all elements
self.assertEqual(f('aaaa'),0) #repeated element
# works on any sequence
f = find_all(['foo','bar'])
self.assertEqual(f("joe"),0)
self.assertEqual(f("only foo"),0)
self.assertEqual(f("bar and foo"),1)
# does NOT work on numbers
def test_keep_if_more(self):
"""keep_if_more should be True if #items in s > x"""
self.assertRaises(ValueError, keep_if_more,'lksfj','ksfd') #not int
self.assertRaises(IndexError,keep_if_more,'ACGU',-3) #negative
f = keep_if_more('a',0) #zero
self.assertEqual(f(''),0)
self.assertEqual(f('a'),1)
self.assertEqual(f('b'),0)
# works on strings
f = keep_if_more('ACGU',5) #positive
self.assertEqual(f(''),0)
self.assertEqual(f('ACGUAGCUioooNNNNNA'),1)
self.assertEqual(f('NNNNNNN'),0)
# works on words
f = keep_if_more(['foo'],1)
self.assertEqual(f(''),0)
self.assertEqual(f(['foo', 'bar','foo']),1)
self.assertEqual(f(['joe']),0)
# works on numbers
f = keep_if_more([0,1],3)
self.assertEqual(f(''),0)
self.assertEqual(f([0,1,2,3,4,5]),0)
self.assertEqual(f([0,1,0,1]),1)
def test_exclude_if_more(self):
"""exclude_if_more should be True if #items in s <= x"""
self.assertRaises(ValueError, exclude_if_more,'lksfj','ksfd') #not int
self.assertRaises(IndexError,exclude_if_more,'ACGU',-3) #negative
f = exclude_if_more('a',0) #zero
self.assertEqual(f(''),1)
self.assertEqual(f('a'),0)
self.assertEqual(f('b'),1)
# works on strings
f = exclude_if_more('ACGU',5) #positive
self.assertEqual(f(''),1)
self.assertEqual(f('ACGUAGCUioooNNNNNA'),0)
self.assertEqual(f('NNNNNNN'),1)
# works on words
f = exclude_if_more(['foo'],1)
self.assertEqual(f(''),1)
self.assertEqual(f(['foo', 'bar','foo']),0)
self.assertEqual(f(['joe']),1)
# works on numbers
f = exclude_if_more([0,1],3)
self.assertEqual(f(''),1)
self.assertEqual(f([0,1,2,3,4,5]),1)
self.assertEqual(f([0,1,0,1]),0)
def test_keep_if_more_other(self):
"""keep_if_more_other should be True if #other items > x"""
self.assertRaises(ValueError, keep_if_more_other,'lksfj','ks') #not int
self.assertRaises(IndexError,keep_if_more_other,'ACGU',-3) #negative
f = keep_if_more_other('a',0) #zero
self.assertEqual(f(''),0)
self.assertEqual(f('a'),0)
self.assertEqual(f('b'),1)
# works on strings
f = keep_if_more_other('ACGU',5) #positive
self.assertEqual(f(''),0)
self.assertEqual(f('ACGUNNNNN'),0)
self.assertEqual(f('ACGUAGCUioooNNNNNA'),1)
self.assertEqual(f('NNNNNNN'),1)
# works on words
f = keep_if_more_other(['foo'],1)
self.assertEqual(f(''),0)
self.assertEqual(f(['foo', 'bar','foo']),0)
self.assertEqual(f(['joe','oef']),1)
# works on numbers
f = keep_if_more_other([0,1],3)
self.assertEqual(f(''),0)
self.assertEqual(f([0,1,2,3,4,5]),1)
self.assertEqual(f([0,1,0,1]),0)
def test_exclude_if_more_other(self):
"""exclude_if_more_other should be True if #other items <= x"""
self.assertRaises(ValueError, exclude_if_more_other,'lks','ks') #not int
self.assertRaises(IndexError,exclude_if_more_other,'ACGU',-3) #negative
f = exclude_if_more_other('a',0) #zero
self.assertEqual(f(''),1)
self.assertEqual(f('a'),1)
self.assertEqual(f('b'),0)
# works on strings
f = exclude_if_more_other('ACGU',5) #positive
self.assertEqual(f(''),1)
self.assertEqual(f('ACGUNNNNN'),1)
self.assertEqual(f('ACGUAGCUioooNNNNNA'),0)
self.assertEqual(f('NNNNNNN'),0)
# works on words
f = exclude_if_more_other(['foo'],1)
self.assertEqual(f(''),1)
self.assertEqual(f(['foo', 'bar','foo']),1)
self.assertEqual(f(['joe','oef']),0)
# works on numbers
f = exclude_if_more_other([0,1],3)
self.assertEqual(f(''),1)
self.assertEqual(f([0,1,2,3,4,5]),0)
self.assertEqual(f([0,1,0,1]),1)
def test_keep_chars(self):
"""keep_chars returns a string containing only chars in keep"""
f = keep_chars('ab c3*[')
self.assertEqual(f(''),'') #empty
self.assertRaises(AttributeError,f,None) #None
#one character, case sensitive
self.assertEqual(f('b'),'b')
self.assertEqual(f('g'),'')
self.assertEqual(f('xyz123'),'3')
self.assertEqual(f('xyz 123'),' 3')
#more characters, case sensitive
self.assertEqual(f('kjbwherzcagebcujrkcs'),'bcabcc')
self.assertEqual(f('f[ffff*ff*fff3fff'),'[**3')
# case insensitive
f = keep_chars('AbC',False)
self.assertEqual(f('abcdef'),'abc')
self.assertEqual(f('ABCDEF'),'ABC')
self.assertEqual(f('aBcDeF'),'aBc')
def test_exclude_chars(self):
"""exclude_chars returns string containing only chars not in exclude"""
f = exclude_chars('ab c3*[')
self.assertEqual(f(''),'') #empty
self.assertRaises(AttributeError,f,None) #None
#one character, case sensitive
self.assertEqual(f('b'),'')
self.assertEqual(f('g'),'g')
self.assertEqual(f('xyz123'),'xyz12')
self.assertEqual(f('xyz 123'),'xyz12')
#more characters, case sensitive
self.assertEqual(f('axxxbxxxcxxx'),'xxxxxxxxx')
# case insensitive
f = exclude_chars('AbC',False)
self.assertEqual(f('abcdef'),'def')
self.assertEqual(f('ABCDEF'),'DEF')
self.assertEqual(f('aBcDeF'),'DeF')
def test_reorder(self):
"""reorder should always use the same order when invoked"""
list_test = reorder([3,2,1])
dict_test = reorder(['x','y','z'])
multi_test = reorder([3,2,2])
null_test = reorder([])
first_seq = 'abcde'
second_seq = [3,4,5,6,7]
empty_list = []
empty_dict = {}
full_dict = {'a':3, 'c':5, 'x':'abc','y':'234','z':'qaz'}
for i in (first_seq, second_seq, empty_list, empty_dict):
self.assertEqual(null_test(i), [])
self.assertEqual(list_test(first_seq), ['d','c','b'])
self.assertEqual(list_test(second_seq), [6,5,4])
self.assertEqual(multi_test(first_seq), ['d','c','c'])
self.assertEqual(dict_test(full_dict), ['abc','234','qaz'])
self.assertRaises(KeyError, dict_test, empty_dict)
self.assertRaises(IndexError, list_test, empty_list)
def test_reorder_inplace(self):
"""reorder_inplace should replace object's data with new order"""
attr_test = reorder_inplace([3,2,1], 'Data')
obj_test = reorder_inplace([3,2,2])
seq = [3,4,5,6,7]
class obj(object):
pass
o = obj()
o.XYZ = [9, 7, 5]
o.Data = ['a','b','c','d','e']
orig_data = o.Data
self.assertEqual(obj_test(seq), [6,5,5])
self.assertEqual(seq, [6,5,5])
assert attr_test(o) is o
self.assertEqual(o.XYZ, [9,7,5])
self.assertEqual(o.Data, ['d','c','b'])
assert orig_data is o.Data
def test_float_from_string(self):
"""float_from_string should ignore non-numeric characters"""
ffs = float_from_string
self.assertEqual(ffs('3.5'), 3.5)
self.assertEqual(ffs(' -3.45e-10 '), float(' -3.45e-10 '))
self.assertEqual(ffs('jsdjhsdf[]()0.001IVUNZSDFl]]['), 0.001)
def test_first_index(self):
"""first_index should return index of first occurrence where f(s)"""
vowels = 'aeiou'
is_vowel = lambda x: x in vowels
s1 = 'ebcua'
s2 = 'bcbae'
s3 = ''
s4 = 'cbd'
self.assertEqual(first_index(s1, is_vowel), 0)
self.assertEqual(first_index(s2, is_vowel), 3)
self.assertEqual(first_index(s3, is_vowel), None)
self.assertEqual(first_index(s4, is_vowel), None)
def test_last_index(self):
"""last_index should return index of last occurrence where f(s)"""
vowels = 'aeiou'
is_vowel = lambda x: x in vowels
s1 = 'ebcua'
s2 = 'bcbaef'
s3 = ''
s4 = 'cbd'
self.assertEqual(last_index(s1, is_vowel), 4)
self.assertEqual(last_index(s2, is_vowel), 4)
self.assertEqual(last_index(s3, is_vowel), None)
self.assertEqual(last_index(s4, is_vowel), None)
def test_first_index_in_set(self):
"""first_index_in_set should return index of first occurrence """
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbae'
s3 = ''
s4 = 'cbd'
self.assertEqual(first_index_in_set(s1, vowels), 0)
self.assertEqual(first_index_in_set(s2, vowels), 3)
self.assertEqual(first_index_in_set(s3, vowels), None)
self.assertEqual(first_index_in_set(s4, vowels), None)
def test_last_index_in_set(self):
"""last_index_in_set should return index of last occurrence"""
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbaef'
s3 = ''
s4 = 'cbd'
self.assertEqual(last_index_in_set(s1, vowels), 4)
self.assertEqual(last_index_in_set(s2, vowels), 4)
self.assertEqual(last_index_in_set(s3, vowels), None)
self.assertEqual(last_index_in_set(s4, vowels), None)
def test_first_index_not_in_set(self):
"""first_index_not_in_set should return index of first occurrence """
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbae'
s3 = ''
s4 = 'cbd'
self.assertEqual(first_index_not_in_set(s1, vowels), 1)
self.assertEqual(first_index_not_in_set(s2, vowels), 0)
self.assertEqual(first_index_not_in_set(s3, vowels), None)
self.assertEqual(first_index_not_in_set(s4, vowels), 0)
def test_last_index_not_in_set(self):
"""last_index_not_in_set should return index of last occurrence"""
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbaef'
s3 = ''
s4 = 'cbd'
self.assertEqual(last_index_not_in_set(s1, vowels), 2)
self.assertEqual(last_index_not_in_set(s2, vowels), 5)
self.assertEqual(last_index_not_in_set(s3, vowels), None)
self.assertEqual(last_index_not_in_set(s4, vowels), 2)
def test_first(self):
"""first should return first occurrence where f(s)"""
vowels = 'aeiou'
is_vowel = lambda x: x in vowels
s1 = 'ebcua'
s2 = 'bcbae'
s3 = ''
s4 = 'cbd'
self.assertEqual(first(s1, is_vowel), 'e')
self.assertEqual(first(s2, is_vowel), 'a')
self.assertEqual(first(s3, is_vowel), None)
self.assertEqual(first(s4, is_vowel), None)
def test_last(self):
"""last should return last occurrence where f(s)"""
vowels = 'aeiou'
is_vowel = lambda x: x in vowels
s1 = 'ebcua'
s2 = 'bcbaef'
s3 = ''
s4 = 'cbd'
self.assertEqual(last(s1, is_vowel), 'a')
self.assertEqual(last(s2, is_vowel), 'e')
self.assertEqual(last(s3, is_vowel), None)
self.assertEqual(last(s4, is_vowel), None)
def test_first_in_set(self):
"""first_in_set should return first occurrence """
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbae'
s3 = ''
s4 = 'cbd'
self.assertEqual(first_in_set(s1, vowels), 'e')
self.assertEqual(first_in_set(s2, vowels), 'a')
self.assertEqual(first_in_set(s3, vowels), None)
self.assertEqual(first_in_set(s4, vowels), None)
def test_last_in_set(self):
"""last_in_set should return last occurrence"""
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbaef'
s3 = ''
s4 = 'cbd'
self.assertEqual(last_in_set(s1, vowels), 'a')
self.assertEqual(last_in_set(s2, vowels), 'e')
self.assertEqual(last_in_set(s3, vowels), None)
self.assertEqual(last_in_set(s4, vowels), None)
def test_first_not_in_set(self):
"""first_not_in_set should return first occurrence """
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbae'
s3 = ''
s4 = 'cbd'
self.assertEqual(first_not_in_set(s1, vowels), 'b')
self.assertEqual(first_not_in_set(s2, vowels), 'b')
self.assertEqual(first_not_in_set(s3, vowels), None)
self.assertEqual(first_not_in_set(s4, vowels), 'c')
def test_last_not_in_set(self):
"""last_not_in_set should return last occurrence"""
vowels = 'aeiou'
s1 = 'ebcua'
s2 = 'bcbaef'
s3 = ''
s4 = 'cbd'
self.assertEqual(last_not_in_set(s1, vowels), 'c')
self.assertEqual(last_not_in_set(s2, vowels), 'f')
self.assertEqual(last_not_in_set(s3, vowels), None)
self.assertEqual(last_not_in_set(s4, vowels), 'd')
def test_perm(self):
"""perm should return correct permutations"""
self.assertEqual(list(perm('abc')), ['abc','acb','bac','bca','cab','cba'])
def test_comb(self):
"""comb should return correct combinations"""
self.assertEqual(list(comb(range(5), 0)),
[])
self.assertEqual(list(comb(range(5), 1)),
[[0], [1], [2], [3], [4]])
self.assertEqual(list(comb(range(5), 2)),
[[0, 1], [0, 2], [0, 3], [0, 4], [1, 2], [1, 3], [1, 4], [2, 3],
[2, 4], [3, 4]])
self.assertEqual(list(comb(range(5), 3)),
[[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 2, 3], [0, 2, 4], [0, 3, 4],
[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]])
self.assertEqual(list(comb(range(5), 4)),
[[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 3, 4], [0, 2, 3, 4], [1, 2, 3, 4]])
self.assertEqual(list(comb(range(5), 5)),
[[0, 1, 2, 3, 4]])
def test_cross_comb(self):
"""cross_comb should produce correct combinations"""
v1 = range(2)
v2 = range(3)
v3 = list('abc')
vv1 = ([e] for e in v1)
v1_x_v2 = [[0, 0], [0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]
v1v2v3 = [[0, 0, 'a'], [0, 0, 'b'], [0, 0, 'c'], [0, 1, 'a'],
[0, 1, 'b'], [0, 1, 'c'], [0, 2, 'a'], [0, 2, 'b'],
[0, 2, 'c'], [1, 0, 'a'], [1, 0, 'b'], [1, 0, 'c'],
[1, 1, 'a'], [1, 1, 'b'], [1, 1, 'c'], [1, 2, 'a'],
[1, 2, 'b'], [1, 2, 'c']]
self.assertEqual(list( _increment_comb(vv1, v2)), v1_x_v2)
self.assertEqual(list( cross_comb([v1, v2])), v1_x_v2)
self.assertEqual(list(cross_comb([v1, v2, v3])), v1v2v3)
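# Sketch (hypothetical): the cross_comb expectations above are the Cartesian
# product of the input sequences in row-major order, built one sequence at a
# time, which _increment_comb appears to do by extending each partial
# combination with every value of the next sequence. The helpers below are
# illustrative only; itertools.product yields the same order.
def _sketch_increment_comb(partials, values):
    """Extend every partial combination (a list) with each item of values."""
    for partial in partials:
        for value in values:
            yield partial + [value]

def _sketch_cross_comb(seqs):
    """Return an iterator over combinations taking one item per sequence."""
    combos = [[]]
    for values in seqs:
        # materialize values so a generator can be reused for every partial
        combos = _sketch_increment_comb(combos, list(values))
    return combos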
#run tests if invoked from the commandline
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_util/test_trie.py
"""Tests for the Trie and Compressed_Trie classes."""
__author__ = "Jens Reeder"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jens Reeder"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jens Reeder"
__email__ = "jens.reeder@gmail.com"
__status__ = "Prototype"
from cogent.util.unit_test import TestCase, main
from cogent.util.trie import Trie, Compressed_Trie, build_prefix_map, build_trie, \
_build_prefix_map
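# Sketch (hypothetical, not cogent.util.trie): a minimal uncompressed trie
# with the interface TrieTests exercises. Each node stores the labels of the
# keys ending there, so insert(seq, label) walks or creates one node per
# character, and find(seq) returns the stored labels, or [] when seq was
# never inserted. Class names are illustrative only.
class _SketchTrieNode(object):
    def __init__(self):
        self.labels = []    # labels of keys that end at this node
        self.children = {}  # next character -> child node

class _SketchTrie(object):
    def __init__(self):
        self.root = _SketchTrieNode()

    def insert(self, seq, label):
        node = self.root
        for char in seq:
            child = node.children.get(char)
            if child is None:
                child = node.children[char] = _SketchTrieNode()
            node = child
        node.labels.append(label)

    def find(self, seq):
        node = self.root
        for char in seq:
            node = node.children.get(char)
            if node is None:
                return []
        return node.labels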
class TrieTests(TestCase):
def setUp(self):
self.data = dict({"0": "ab", "1":"abababa", "2":"abab",
"3":"baba", "4":"ababaa","5":"a", "6":"abababa",
"7":"bab", "8":"babba"})
def test_init(self):
"""Trie init should create an empty trie."""
t = Trie()
self.assertEqual(t.root.labels, [])
self.assertEqual(t.root.children, {})
def test_insert_find(self):
"""An added key should be found by find."""
data = self.data
t = Trie()
for (label, seq) in data.iteritems():
t.insert(seq, label)
for (label, seq) in data.iteritems():
self.assertEqual(label in t.find(seq), True)
self.assertEqual(t.find("cacgchagc"), [])
self.assertEqual(t.find("abababa"), ["1","6"])
def test_insert_unique(self):
"""insert_unique should insert only unique words."""
data = self.data
t = Trie()
for (label, seq) in data.iteritems():
t._insert_unique(seq, label)
self.assertEqual(t.find("ab"), [])
self.assertEqual(t.find("cacgchagc"), [])
self.assertEqual(t.find("abababa"), ["1"])
def test_build_prefix_map(self):
"""prefix_map should map prefix strings."""
self.assertEqual(dict(_build_prefix_map(self.data.iteritems())),
{'1': ['0', '2', '5', '6'],
'8': [],
'3': ['7'],
'4': []})
def test_build_trie(self):
"""Build_trie should build trie from seqs."""
t = build_trie(self.data.iteritems(), Trie)
self.assertTrue(isinstance(t, Trie))
for (label, seq) in self.data.iteritems():
self.assertContains(t.find(seq), label)
self.assertEqual(t.find(""), [])
self.assertEqual(t.find("ccc"), [])
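# Sketch (hypothetical): the expected prefix maps in these tests group every
# sequence under one longer (or identical) sequence that it is a prefix of,
# keyed by the label of that longer sequence. A shared prefix such as 'bab'
# (label '7') could belong to either 'baba' or 'babba', and the Trie and
# Compressed_Trie expectations disagree about that, so any sketch must pick a
# rule. The rule assumed here, greedy longest-first with ties kept in input
# order, is illustrative and not claimed to be how cogent does it.
def _sketch_prefix_map(labelled_seqs):
    """Map labels of 'maximal' sequences to labels of their prefixes."""
    # longest first, so a sequence is only attached to one at least as long
    ordered = sorted(labelled_seqs, key=lambda pair: len(pair[1]), reverse=True)
    prefix_map = {}
    key_seqs = []  # (label, seq) pairs chosen as keys so far
    for label, seq in ordered:
        for key_label, key_seq in key_seqs:
            if key_seq.startswith(seq):
                prefix_map[key_label].append(label)
                break
        else:
            # seq is not a prefix of any chosen key: it becomes a key itself
            key_seqs.append((label, seq))
            prefix_map[label] = []
    return prefix_map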
class Compressed_Trie_Tests(TestCase):
def setUp(self):
self.data = dict({"0": "ab", "1":"abababa", "2":"abab",
"3":"baba", "4":"ababaa","5":"a", "6":"abababa",
"7":"bab", "8":"babba"})
self.trie = build_trie(self.data.iteritems())
def test_init(self):
"""Trie init should create an empty trie."""
t = Compressed_Trie()
self.assertEqual(t.root.labels, [])
self.assertEqual(t.root.children, {})
self.assertEqual(t.root.key, "")
def test_non_zero(self):
"""__nonzero__ should check for any data in the trie."""
t = Compressed_Trie()
self.assertEqual(t.__nonzero__(), False)
self.assertEqual(self.trie.__nonzero__(), True)
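def test_len(self):
"""__len__ should return the number of seqs in the trie."""
self.assertEqual(len(self.trie), 9)
t = Compressed_Trie()
self.assertEqual(len(t), 0)
def test_size(self):
"""size should return the number of nodes in the trie."""
self.assertEqual(self.trie.size(), 10)
#empty trie contains only root node
t = Compressed_Trie()
self.assertEqual(t.size(), 1)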
def test_to_string(self):
"""_to_string should create a string representation."""
string_rep = """
key
{
\tkey a['5']
\t{
\t\tkey b['0']
\t\t{
\t\t\tkey ab['2']
\t\t\t{
\t\t\t\tkey a
\t\t\t\t{
\t\t\t\t\tkey a['4']
\t\t\t\t}
\t\t\t\t{
\t\t\t\t\tkey ba['1', '6']
\t\t\t\t}
\t\t\t}
\t\t}
\t}
}
{
\tkey bab['7']
\t{
\t\tkey a['3']
\t}
\t{
\t\tkey ba['8']
\t}
}
"""
self.assertEqual(str(self.trie), string_rep)
def test_insert_find(self):
"""An added key should be found by find."""
data = self.data
t = Compressed_Trie()
for (label, seq) in data.iteritems():
t.insert(seq, label)
for (label, seq) in data.iteritems():
self.assertEqual(label in t.find(seq), True)
self.assertEqual(t.find("abababa"), ["1","6"])
self.assertEqual(t.find("cacgchagc"), [])
def test_prefixMap(self):
"""prefix_map (Compressed_Trie) should map prefix strings."""
self.assertEqual(self.trie.prefixMap(),
{'1': ['6', '2', '0', '5'],
'8': ['7'],
'3': [],
'4': []})
def test_build_trie(self):
"""Build_trie should build trie from seqs."""
t = build_trie(self.data.iteritems())
self.assertTrue(isinstance(t, Compressed_Trie))
for (label, seq) in self.data.iteritems():
self.assertContains(t.find(seq), label)
self.assertEqual(t.find(""), [])
self.assertEqual(t.find("ccc"), [])
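# Sketch (hypothetical, not cogent.util.trie): a compressed (radix) trie in
# which each node's key holds a whole run of characters, as in the string
# representation tested above. Inserting a sequence that diverges part-way
# along an existing edge splits that edge at the point of divergence. With
# the nine sequences used in these tests it builds a 10-node tree (root
# included), the count test_size expects. Names are illustrative only.
class _SketchRadixNode(object):
    def __init__(self, key=""):
        self.key = key      # characters on the edge leading into this node
        self.labels = []    # labels of sequences ending at this node
        self.children = {}  # first character of child.key -> child

    def insert(self, seq, label):
        """Insert seq (the part of the key below this node) with label."""
        if not seq:
            self.labels.append(label)
            return
        child = self.children.get(seq[0])
        if child is None:
            leaf = _SketchRadixNode(seq)
            leaf.labels.append(label)
            self.children[seq[0]] = leaf
            return
        # length of the prefix shared by seq and the child's edge
        common = 0
        limit = min(len(seq), len(child.key))
        while common < limit and seq[common] == child.key[common]:
            common += 1
        if common < len(child.key):
            # split the edge: a new node holds the shared part
            middle = _SketchRadixNode(child.key[:common])
            child.key = child.key[common:]
            middle.children[child.key[0]] = child
            self.children[seq[0]] = middle
            child = middle
        child.insert(seq[common:], label)

    def size(self):
        """Number of nodes in this subtree, counting this node."""
        return 1 + sum(c.size() for c in self.children.values())

class _SketchCompressedTrie(object):
    def __init__(self):
        self.root = _SketchRadixNode()

    def insert(self, seq, label):
        self.root.insert(seq, label)

    def size(self):
        return self.root.size()

    def find(self, seq):
        """Labels stored for exactly seq, or [] if it was never inserted."""
        node = self.root
        while seq:
            child = node.children.get(seq[0])
            if child is None or not seq.startswith(child.key):
                return []
            seq = seq[len(child.key):]
            node = child
        return node.labels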
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_util/test_unit_test.py
"""Tests for cogent.util.unit_test, extension of the built-in PyUnit framework.
"""
##SUPPORT2425
#from __future__ import with_statement
from cogent.util.unit_test import TestCase, main, FakeRandom #,numpy_err
import numpy; from numpy import array, zeros, log, inf
from sys import exc_info
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit", "Gavin Huttley", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
## SUPPORT2425
#class NumpyErrTests(TestCase):
#"""Tests numpy_err function."""
#def test_usage(self):
#with numpy_err(divide='raise'):
#self.assertRaises(FloatingPointError, log, 0)
#with numpy_err(divide='ignore'):
#self.assertEqual(log(0), -inf)
#with numpy_err(divide='raise'):
#self.assertRaises(FloatingPointError, log, 0)
#def test_err_status(self):
#ori_status = numpy.geterr()
#numpy.seterr(divide='warn')
#with numpy_err(all='ignore'):
#for v in numpy.geterr().values():
#self.assertEqual(v, 'ignore')
#self.assertEqual(numpy.geterr()['divide'], 'warn')
#numpy.seterr(**ori_status)
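# Sketch (hypothetical, not the cogent.util.unit_test code): a stand-in for a
# random-number function that replays a fixed list, matching what
# FakeRandomTests checks. Each call returns the next item and ignores any
# arguments (such as a shape tuple); once the list is used up it either wraps
# around (circular=True) or raises IndexError. The parameter name circular
# is an assumption.
class _SketchFakeRandom(object):
    def __init__(self, data, circular=False):
        self.data = list(data)
        self.circular = circular
        self.index = 0

    def __call__(self, *args, **kwargs):
        if self.index >= len(self.data):
            if not (self.circular and self.data):
                raise IndexError("no more fake random values")
            self.index = 0  # start again from the first item
        value = self.data[self.index]
        self.index += 1
        return value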
class FakeRandomTests(TestCase):
"""Tests FakeRandom class."""
def test_call_constant(self):
"""FakeRandom __call__ should return next item from list if constant"""
const = FakeRandom([1])
self.assertEqual(const(), 1)
self.assertRaises(IndexError, const)
def test_call_constant_wrap(self):
"""FakeRandom __call__ should wrap for one-item list if specified"""
const = FakeRandom([1], True)
for i in range(10):
self.assertEqual(const(), 1)
def test_call_var(self):
"""FakeRandom __call__ should work with a multi-item list"""
f = FakeRandom([1,2,3])
self.assertEqual(f(), 1)
self.assertEqual(f(), 2)
self.assertEqual(f(), 3)
self.assertRaises(IndexError, f)
def test_call_var_wrap(self):
"""FakeRandom __call__ should work with a multi-item wrapped list"""
f = FakeRandom([1,2,3], True)
result = [f() for i in range(10)]
self.assertEqual(result, [1,2,3,1,2,3,1,2,3,1])
def test_call_var_args(self):
"""FakeRandom __call__ should ignore extra args"""
f = FakeRandom([[1,2,3]], True)
for i in range(5):
result = f((5,5)) #shape parameter ignored
self.assertEqual(result, [1,2,3])
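# Sketch (hypothetical): scalar versions of the two tolerance checks that the
# within_1e6_* and outside_1e6_* pairs in TestCaseTests are built to probe.
# The absolute form compares |observed - expected| with eps directly; the
# relative form, as assumed here, scales the difference by the larger
# magnitude of the two values, so (0, 1e-7) is close in absolute terms but
# far apart in relative terms. The library's own assertion methods may
# differ in detail (for example in how they treat zero or arrays).
def _sketch_close_abs(observed, expected, eps=1e-6):
    return abs(observed - expected) <= eps

def _sketch_close_rel(observed, expected, eps=1e-6):
    scale = max(abs(observed), abs(expected))
    if scale == 0:
        return True  # both values are exactly zero
    return abs(observed - expected) / scale <= eps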
class TestCaseTests(TestCase):
"""Tests for extension of the built-in unittest framework.
For each test, includes an example of success and failure.
"""
unequal_pairs = [
(1, 0),
([], ()),
(None, 0),
('', ' '),
(1, '1'),
(0, '0'),
('', None),
(array([1,2,3]),array([1,2,4])),
(array([[1,2],[3,4]]), array([[1.0,2.0],[3.0,4.1]])),
(array([1]), array([1,2])),
(zeros(0), array([1])),
(array([1,1,1]), array([1])),
(array([[1,1],[1,1]]), array([1,1,1,1])),
(zeros(0), None),
(zeros(3), zeros(5)),
(zeros(0), ''),
]
equal_pairs = [
(1, 1),
(0, 0),
(5, 5L),
(5, 5.0),
(0, 0.0),
('', ''),
(' ', ' '),
('a', 'a'),
(None, None),
([0, 1], [0.0, 1.0]),
(array([1,2,3]), array([1,2,3])),
(array([[1,2],[3,4]]), array([[1.0,2.0],[3.0,4.0]])),
(zeros(0), []),
(zeros(0), zeros(0)),
(array([]), zeros(0)),
(zeros(3), zeros(3)),
(array([0,0,0]), zeros(3)),
(array([]), []),
]
small = 1e-7
big = 1e-5
within_1e6_abs_pairs = [
(1, 1 + small),
(1 + small, 1),
(1, 1 - small),
(1 - small, 1),
(100000, 100000 - small),
(-100000, -100000 - small),
(-1, -1 + small),
(-1, -1 - small),
(0, small),
(0, -small),
(array([1,2]), array([1,2+small])),
(array([[1,2],[3,4]]), array([[1,2+small],[3,4]]))
]
within_1e6_rel_pairs = [
(1, 1 + 1 * small),
(1 + 1 * small, 1),
(1, 1 - 1 * small),
(1 - 1 * small, 1),
(100000, 100000 - 100000 * small),
(-100000, -100000 - 100000 * small),
(-1, -1 + -1 * small),
(-1, -1 - -1 * small),
(array([1,2]), array([1+small,2])),
(array([[1000,1000],[1000,1000]]), \
array([[1000+1000*small, 1000], [1000,1000]])),
]
outside_1e6_abs_pairs = [
(1, 1 + big),
(1 + big, 1),
(1, 1 - big),
(1 - big, 1),
(100000, 100000 - big),
(-100000, -100000 - big),
(-1, -1 + big),
(-1, -1 - big),
(0, big),
(0, -big),
(1e7, 1e7 + 1),
(array([1,1]), array([1,1+big])),
(array([[1,1],[1,1]]), array([[1,1+big],[1,1]])),
]
outside_1e6_rel_pairs = [
(1, 1 + 1 * big),
(1 + 1 * big, 1),
(1, 1 - 1 * big),
(1 - 1 * big, 1),
(100000, 100000 - 100000 * big),
(-100000, -100000 - 100000 * big),
(-1, -1 + -1 * big),
(-1, -1 - -1 * big),
(1e-30, 1e-30 + small),
(0, small),
(1e5, 1e5 + 1),
(array([1,1]), array([1,1+1*big])),
]
def test_assertNotEqual_None(self):
"""assertNotEqual should raise exception with two copies of None"""
try:
self.assertNotEqual(None, None)
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed None and expected None: shouldn\'t test equal')
else:
raise AssertionError, \
"unit_test.assertNotEqual failed on input None and None"
def test_assertNotEqual_numbers(self):
"""assertNotEqual should raise exception with integer and float zero"""
try:
self.assertNotEqual(0, 0.0)
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed 0 and expected 0.0: shouldn\'t test equal')
else:
raise AssertionError, \
"unit_test.assertNotEqual failed on input 0 and 0.0"
def test_assertNotEqual_unequal(self):
"""assertNotEqual should not raise exception when values differ"""
for first, second in self.unequal_pairs:
try:
self.assertNotEqual(first, second)
except:
raise AssertionError, \
"unit_test.assertNotEqual failed on input %s and %s" \
% (`first`, `second`)
def test_assertNotEqual_equal(self):
"""assertNotEqual should raise exception when values test equal"""
for first, second in self.equal_pairs:
try:
self.assertNotEqual(first, second)
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed %s and expected %s: shouldn\'t test equal' \
% (`first`, `second`))
else:
raise AssertionError, \
"unit_test.assertNotEqual failed on input %s and %s" \
% (`first`, `second`)
def test_assertEqual_None(self):
"""assertEqual should not raise exception with two copies of None"""
try:
self.assertEqual(None, None)
except:
raise AssertionError, \
"unit_test.assertEqual failed on input %s and %s" \
% (`None`, `None`)
def test_assertEqual_numbers(self):
"""assertEqual should not raise exception with integer and float zero"""
try:
self.assertEqual(0, 0.0)
except:
raise AssertionError, \
"unit_test.assertEqual failed on input %s and %s" \
% (`0`, `0.0`)
def test_assertEqual_unequal(self):
"""assertEqual should raise exception when values differ"""
for first, second in self.unequal_pairs:
try:
self.assertEqual(first, second)
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Got %s, but expected %s' \
% (`first`, `second`))
else:
raise AssertionError, \
"unit_test.assertEqual failed on input %s and %s" \
% (`first`, `second`)
def test_assertEqual_equal(self):
"""assertEqual should not raise exception when values test equal"""
for first, second in self.equal_pairs:
try:
self.assertEqual(first, second)
except:
raise AssertionError, \
"unit_test.assertEqual failed on input %s and %s" \
% (`first`, `second`)
def test_assertEqual_nested_array(self):
"""assertEqual should compare nested lists and arrays elementwise"""
self.assertEqual([[1,0], [0,1]],
[array([1,0]), array([0,1])])
def test_assertEqual_shape_mismatch(self):
"""assertEqual should raise when obs and exp shapes mismatch"""
obs = [1,2,3]
exp = [1,2,3,4]
self.assertRaises(AssertionError, self.assertEqual, obs, exp)
def test_assertFloatEqualAbs_equal(self):
"""assertFloatEqualAbs should not raise exception when values within eps"""
for first, second in self.within_1e6_abs_pairs:
try:
self.assertFloatEqualAbs(first, second, eps=1e-6)
except:
raise AssertionError, \
"unit_test.assertFloatEqualAbs failed on input %s and %s" \
% (`first`, `second`)
def test_assertFloatEqualAbs_threshold(self):
"""assertFloatEqualAbs should raise exception when eps is very small"""
for first, second in self.within_1e6_abs_pairs:
try:
self.assertFloatEqualAbs(first, second, 1e-30)
except:
message = str(exc_info()[1])
diff = first - second
self.assertEqual(message,
'Got %s, but expected %s (diff was %s)' \
% (`first`, `second`, `diff`))
else:
raise AssertionError, \
"unit_test.assertFloatEqualAbs failed on input %s and %s" \
% (`first`, `second`)
def test_assertFloatEqualAbs_unequal(self):
"""assertFloatEqualAbs should raise exception when values differ by >eps"""
for first, second in self.outside_1e6_abs_pairs:
try:
self.assertFloatEqualAbs(first, second)
except:
message = str(exc_info()[1])
diff = first - second
self.assertEqual(message,
'Got %s, but expected %s (diff was %s)' \
% (`first`, `second`, `diff`))
else:
raise AssertionError, \
"unit_test.assertFloatEqualAbs failed on input %s and %s" \
% (`first`, `second`)
def test_assertFloatEqualAbs_shape_mismatch(self):
"""assertFloatEqualAbs should raise when obs and exp shapes mismatch"""
obs = [1,2,3]
exp = [1,2,3,4]
self.assertRaises(AssertionError, self.assertFloatEqualAbs, obs, exp)
def test_assertFloatEqualRel_equal(self):
"""assertFloatEqualRel should not raise exception when values within eps"""
for first, second in self.within_1e6_rel_pairs:
try:
self.assertFloatEqualRel(first, second)
except:
raise AssertionError, \
"unit_test.assertFloatEqualRel failed on input %s and %s" \
% (`first`, `second`)
def test_assertFloatEqualRel_threshold(self):
"""assertFloatEqualRel should raise exception when eps is very small"""
for first, second in self.within_1e6_rel_pairs:
try:
self.assertFloatEqualRel(first, second, 1e-30)
except:
message = str(exc_info()[1])
diff = first - second
self.assertEqual(message,
'Got %s, but expected %s (diff was %s)' \
% (`first`, `second`, `diff`))
else:
raise AssertionError, \
"unit_test.assertFloatEqualRel failed on input %s and %s" \
% (`first`, `second`)
def test_assertFloatEqualRel_unequal(self):
"""assertFloatEqualRel should raise exception when values differ by >eps"""
for first, second in self.outside_1e6_rel_pairs:
try:
self.assertFloatEqualRel(first, second)
except:
message = str(exc_info()[1])
diff = first - second
self.assertEqual(message,
'Got %s, but expected %s (diff was %s)' \
% (`first`, `second`, `diff`))
else:
raise AssertionError, \
"unit_test.assertFloatEqualRel failed on input %s and %s" \
% (`first`, `second`)
def test_assertFloatEqualRel_shape_mismatch(self):
"""assertFloatEqualRel should raise when obs and exp shapes mismatch"""
obs = [1,2,3]
exp = [1,2,3,4]
self.assertRaises(AssertionError, self.assertFloatEqualRel, obs, exp)
def test_assertFloatEqualList_equal(self):
"""assertFloatEqual should work on two lists of similar values"""
originals = [0, 1, -1, 10, -10, 100, -100]
modified = [i + 1e-7 for i in originals]
try:
self.assertFloatEqual(originals, modified)
self.assertFloatEqual([], []) #test empty lists as well
except:
raise AssertionError, \
"unit_test.assertFloatEqual failed on lists of similar values"
def test_assertFloatEqual_shape_mismatch(self):
"""assertFloatEqual should raise when obs and exp shapes mismatch"""
obs = [1,2,3]
exp = [1,2,3,4]
self.assertRaises(AssertionError, self.assertFloatEqual, obs, exp)
def test_assertFloatEqualList_unequal(self):
"""assertFloatEqual should fail on two lists of dissimilar values"""
originals = [0, 1, -1, 10, -10, 100, -100]
modified = [i + 1e-5 for i in originals]
try:
self.assertFloatEqual(originals, modified)
except:
pass
else:
raise AssertionError, \
"unit_test.assertFloatEqual failed on lists of dissimilar values"
def test_assertFloatEqual_mixed(self):
"""assertFloatEqual should work on equal lists of mixed types."""
first = [i[0] for i in self.equal_pairs]
second = [i[1] for i in self.equal_pairs]
self.assertFloatEqual(first, second)
def test_assertFloatEqualAbs_mixed(self):
"""assertFloatEqualAbs should work on equal lists of mixed types."""
first = [i[0] for i in self.equal_pairs]
second = [i[1] for i in self.equal_pairs]
self.assertFloatEqualAbs(first, second)
def test_assertFloatEqualRel_mixed(self):
"""assertFloatEqualRel should work on equal lists of mixed types."""
first = [i[0] for i in self.equal_pairs]
second = [i[1] for i in self.equal_pairs]
self.assertFloatEqualRel(first, second)
def test_assertFloatEqual_mixed_unequal(self):
"""assertFloatEqual should raise on unequal lists of mixed types."""
first = [i[0] for i in self.unequal_pairs]
second = [i[1] for i in self.unequal_pairs]
self.assertRaises(AssertionError, \
self.assertFloatEqual, first, second)
def test_assertFloatEqualAbs_mixed_unequal(self):
"""assertFloatEqualAbs should raise on unequal lists of mixed types."""
first = [i[0] for i in self.unequal_pairs]
second = [i[1] for i in self.unequal_pairs]
self.assertRaises(AssertionError, \
self.assertFloatEqualAbs, first, second)
def test_assertFloatEqualRel_mixed_unequal(self):
"""assertFloatEqualRel should raise on unequal lists of mixed types."""
first = [i[0] for i in self.unequal_pairs]
second = [i[1] for i in self.unequal_pairs]
self.assertRaises(AssertionError, \
self.assertFloatEqualRel, first, second)
def test_assertEqualItems(self):
"""assertEqualItems should raise exception if items not equal"""
self.assertEqualItems('abc', 'abc')
self.assertEqualItems('abc', 'cba')
self.assertEqualItems('', '')
self.assertEqualItems('abc', ['a','b','c'])
self.assertEqualItems([0], [0.0])
try:
self.assertEqualItems('abc', 'abcd')
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed and expected are different lengths: 3 and 4')
else:
raise AssertionError, \
"unit_test.assertEqualItems failed on input %s and %s" \
% (`'abc'`, `'abcd'`)
try:
self.assertEqualItems('cab', 'acc')
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed b and expected c at sorted index 1')
else:
raise AssertionError, \
"unit_test.assertEqualItems failed on input %s and %s" \
% (`'cab'`, `'acc'`)
try:
self.assertEqualItems('cba', 'yzx')
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed a and expected x at sorted index 0')
else:
raise AssertionError, \
"unit_test.assertEqualItems failed on input %s and %s" \
% (`'cba'`, `'yzx'`)
def test_assertSameItems(self):
"""assertSameItems should raise exception if items not same"""
x = 0
y = 'abcdef'
z = 3
y1 = 'abc' + 'def'
z1 = 3.0
y_id = id(y)
z_id = id(z)
y1_id = id(y1)
z1_id = id(z1)
self.assertSameItems([x,y,z], [x,y,z])
self.assertSameItems([x,y,z], [z,x,y])
self.assertSameItems('', '')
self.assertSameItems([x,y,z], (x,y,z))
try:
self.assertSameItems([x,y,z], [x,y,z,y])
except:
message = str(exc_info()[1])
self.assertEqual(message,
'Observed and expected are different lengths: 3 and 4')
else:
raise AssertionError, \
"unit_test.assertSameItems failed on input %s and %s" \
% (`[x,y,z]`, `[x,y,z,y]`)
try:
first_list = [x,y,z]
second_list = [y,x,z1]
self.assertSameItems(first_list, second_list)
except self.failureException:
pass
else:
raise AssertionError, \
"unit_test.assertSameItems failed on input %s and %s" \
% (`[x,y,z]`, `[y,x,z1]`)
# assert y is not y1
# try:
# self.assertSameItems([y], (y1,))
# except self.failureException:
# pass
# else:
# raise AssertionError, \
# "unit_test.assertEqualItems failed on input %s and %s" \
# % (`[y]`, `(y1,)`)
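def test_sketch_sorted_items_mismatch(self):
    """Sketch: locating the first mismatch between sorted item lists"""
    # Illustrative only: first_item_mismatch is a hypothetical helper, not
    # the cogent implementation of assertEqualItems. It reproduces the
    # messages checked in test_assertEqualItems by sorting both inputs,
    # reporting a length mismatch first, then the first differing index.
    def first_item_mismatch(observed, expected):
        obs, exp = sorted(observed), sorted(expected)
        if len(obs) != len(exp):
            return 'Observed and expected are different lengths: %s and %s' \
                % (len(obs), len(exp))
        for index, (o, e) in enumerate(zip(obs, exp)):
            if o != e:
                return 'Observed %s and expected %s at sorted index %s' \
                    % (o, e, index)
        return None  # same items, order ignored
    assert first_item_mismatch('abc', 'cba') is None
    assert first_item_mismatch('abc', ['a', 'b', 'c']) is None
    assert first_item_mismatch('abc', 'abcd') == \
        'Observed and expected are different lengths: 3 and 4'
    assert first_item_mismatch('cab', 'acc') == \
        'Observed b and expected c at sorted index 1'
    assert first_item_mismatch('cba', 'yzx') == \
        'Observed a and expected x at sorted index 0'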
def test_assertNotEqualItems(self):
"""assertNotEqualItems should raise exception if all items equal"""
self.assertNotEqualItems('abc', '')
self.assertNotEqualItems('abc', 'cbad')
self.assertNotEqualItems([0], [0.01])
try:
self.assertNotEqualItems('abc', 'abc')
except:
message = str(exc_info()[1])
self.assertEqual(message,
"Observed 'abc' has same items as 'abc'")
else:
raise AssertionError, \
"unit_test.assertNotEqualItems failed on input %s and %s" \
% (`'abc'`, `'abc'`)
try:
self.assertNotEqualItems('', [])
except:
message = str(exc_info()[1])
self.assertEqual(message, "Observed '' has same items as []")
else:
raise AssertionError, \
"unit_test.assertNotEqualItems failed on input %s and %s" \
% (`''`, `[]`)
def test_assertContains(self):
"""assertContains should raise exception if item not in test set"""
self.assertContains('abc', 'a')
self.assertContains(['a', 'b', 'c'], 'a')
self.assertContains(['a', 'b', 'c'], 'b')
self.assertContains(['a', 'b', 'c'], 'c')
self.assertContains({'a':1, 'b':2}, 'a')
class _fake_container(object):
def __contains__(self, other):
return True
fake = _fake_container()
self.assertContains(fake, 'x')
self.assertContains(fake, 3)
self.assertContains(fake, {'a':'b'})
try:
self.assertContains('', [])
except:
message = str(exc_info()[1])
self.assertEqual(message, "Item [] not found in ''")
else:
raise AssertionError, \
"unit_test.assertContains failed on input %s and %s" \
% (`''`, `[]`)
try:
self.assertContains('abcd', 'x')
except:
message = str(exc_info()[1])
self.assertEqual(message, "Item 'x' not found in 'abcd'")
else:
raise AssertionError, \
"unit_test.assertContains failed on input %s and %s" \
% (`'abcd'`, `'x'`)
def test_assertNotContains(self):
"""assertNotContains should raise exception if item in test set"""
self.assertNotContains('abc', 'x')
self.assertNotContains(['a', 'b', 'c'], 'x')
self.assertNotContains('abc', None)
self.assertNotContains(['a', 'b', 'c'], {'x':1})
self.assertNotContains({'a':1, 'b':2}, 3.0)
class _fake_container(object):
def __contains__(self, other):
return False
fake = _fake_container()
self.assertNotContains(fake, 'x')
self.assertNotContains(fake, 3)
self.assertNotContains(fake, {'a':'b'})
try:
self.assertNotContains('', '')
except:
message = str(exc_info()[1])
self.assertEqual(message, "Item '' should not have been in ''")
else:
raise AssertionError, \
"unit_test.assertNotContains failed on input %s and %s" \
% (`''`, `''`)
try:
self.assertNotContains('abcd', 'a')
except:
message = str(exc_info()[1])
self.assertEqual(message, "Item 'a' should not have been in 'abcd'")
else:
raise AssertionError, \
"unit_test.assertNotContains failed on input %s and %s" \
% (`'abcd'`, `'a'`)
try:
self.assertNotContains({'a':1, 'b':2}, 'a')
except:
message = str(exc_info()[1])
self.assertEqual(message, \
"Item 'a' should not have been in {'a': 1, 'b': 2}")
else:
raise AssertionError, \
"unit_test.assertNotContains failed on input %s and %s" \
% (`{'a':1, 'b':2}`, `'a'`)
def test_assertGreaterThan_equal(self):
"""assertGreaterThan should raise exception if equal"""
self.assertRaises(AssertionError, self.assertGreaterThan, 5, 5)
self.assertRaises(AssertionError, self.assertGreaterThan, 5.0, 5.0)
self.assertRaises(AssertionError, self.assertGreaterThan, 5.0, 5)
self.assertRaises(AssertionError, self.assertGreaterThan, 5, 5.0)
def test_assertGreaterThan_None(self):
"""assertGreaterThan should raise exception if compared to None"""
self.assertRaises(AssertionError, self.assertGreaterThan, 5, None)
self.assertRaises(AssertionError, self.assertGreaterThan, None, 5)
self.assertRaises(AssertionError, self.assertGreaterThan, 5.0, None)
self.assertRaises(AssertionError, self.assertGreaterThan, None, 5.0)
def test_assertGreaterThan_numbers_true(self):
"""assertGreaterThan should pass when observed > value"""
self.assertGreaterThan(10, 5)
def test_assertGreaterThan_numbers_false(self):
"""assertGreaterThan should raise when observed <= value"""
self.assertRaises(AssertionError, self.assertGreaterThan, 2, 5)
def test_assertGreaterThan_numbers_list_true(self):
"""assertGreaterThan should pass when all elements are > value"""
observed = [1,2,3,4,3,2,3,4,6,3]
self.assertGreaterThan(observed, 0)
def test_assertGreaterThan_numbers_list_false(self):
"""assertGreaterThan should raise when a single element is <= value"""
observed = [2,3,4,3,2,1,3,4,6,3]
self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1)
def test_assertGreaterThan_floats_true(self):
"""assertGreaterThan should pass when observed > value"""
self.assertGreaterThan(5.0, 3.0)
def test_assertGreaterThan_floats_false(self):
"""assertGreaterThan should raise when observed <= value"""
self.assertRaises(AssertionError, self.assertGreaterThan, 3.0, 5.0)
def test_assertGreaterThan_floats_list_true(self):
"""assertGreaterThan should pass when all elements are > value"""
observed = [1.0,2.0,3.0,4.0,6.0,3.0]
self.assertGreaterThan(observed, 0.0)
def test_assertGreaterThan_floats_list_false(self):
"""assertGreaterThan should raise when any elements are <= value"""
observed = [2.0,3.0,4.0,1.0, 3.0,3.0]
self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1.0)
def test_assertGreaterThan_mixed_true(self):
"""assertGreaterThan should pass when observed > value"""
self.assertGreaterThan(5.0, 3)
self.assertGreaterThan(5, 3.0)
def test_assertGreaterThan_mixed_false(self):
"""assertGreaterThan should raise when observed <= value"""
self.assertRaises(AssertionError, self.assertGreaterThan, -3, 5.0)
self.assertRaises(AssertionError, self.assertGreaterThan, 3.0, 5)
def test_assertGreaterThan_mixed_list_true(self):
"""assertGreaterThan should pass when all elements are > value"""
observed = [1.0, 2, 3.0, 4.0, 6, 3.0]
self.assertGreaterThan(observed, 0.0)
self.assertGreaterThan(observed, 0)
def test_assertGreaterThan_mixed_list_false(self):
"""assertGreaterThan should raise when a single element is <= value"""
observed = [2.0, 3, 4, 1.0, 3.0, 3.0]
self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1.0)
self.assertRaises(AssertionError, self.assertGreaterThan, observed, 1)
def test_assertGreaterThan_numpy_array_true(self):
"""assertGreaterThan should pass when all elements are > value"""
observed = array([1,2,3,4])
self.assertGreaterThan(observed, 0)
self.assertGreaterThan(observed, 0.0)
def test_assertGreaterThan_numpy_array_false(self):
"""assertGreaterThan should raise when any element is <= value"""
observed = array([1,2,3,4])
self.assertRaises(AssertionError, self.assertGreaterThan, observed, 3)
self.assertRaises(AssertionError, self.assertGreaterThan, observed, 3.0)
def test_assertLessThan_equal(self):
"""assertLessThan should raise exception if equal"""
self.assertRaises(AssertionError, self.assertLessThan, 5, 5)
self.assertRaises(AssertionError, self.assertLessThan, 5.0, 5.0)
self.assertRaises(AssertionError, self.assertLessThan, 5.0, 5)
self.assertRaises(AssertionError, self.assertLessThan, 5, 5.0)
def test_assertLessThan_None(self):
"""assertLessThan should raise exception if compared to None"""
self.assertRaises(AssertionError, self.assertLessThan, 5, None)
self.assertRaises(AssertionError, self.assertLessThan, None, 5)
self.assertRaises(AssertionError, self.assertLessThan, 5.0, None)
self.assertRaises(AssertionError, self.assertLessThan, None, 5.0)
def test_assertLessThan_numbers_true(self):
"""assertLessThan should pass when observed < value"""
self.assertLessThan(10, 15)
def test_assertLessThan_numbers_false(self):
"""assertLessThan should raise when observed >= value"""
self.assertRaises(AssertionError, self.assertLessThan, 6, 5)
def test_assertLessThan_numbers_list_true(self):
"""assertLessThan should pass when all elements are < value"""
observed = [1,2,3,4,3,2,3,4,6,3]
self.assertLessThan(observed, 8)
def test_assertLessThan_numbers_list_false(self):
"""assertLessThan should raise when a single element is >= value"""
observed = [2,3,4,3,2,1,3,4,6,3]
self.assertRaises(AssertionError, self.assertLessThan, observed, 6)
def test_assertLessThan_floats_true(self):
"""assertLessThan should pass when observed < value"""
self.assertLessThan(-5.0, 3.0)
def test_assertLessThan_floats_false(self):
"""assertLessThan should raise when observed >= value"""
self.assertRaises(AssertionError, self.assertLessThan, 3.0, -5.0)
def test_assertLessThan_floats_list_true(self):
"""assertLessThan should pass when all elements are < value"""
observed = [1.0,2.0,-3.0,4.0,-6.0,3.0]
self.assertLessThan(observed, 5.0)
def test_assertLessThan_floats_list_false(self):
"""assertLessThan should raise when a single element is >= value"""
observed = [2.0,3.0,4.0,1.0, 3.0,3.0]
self.assertRaises(AssertionError, self.assertLessThan, observed, 4.0)
def test_assertLessThan_mixed_true(self):
"""assertLessThan should pass when observed < value"""
self.assertLessThan(2.0, 3)
self.assertLessThan(2, 3.0)
def test_assertLessThan_mixed_false(self):
"""assertLessThan should raise when observed >= value"""
self.assertRaises(AssertionError, self.assertLessThan, 6, 5.0)
self.assertRaises(AssertionError, self.assertLessThan, 6.0, 5)
def test_assertLessThan_mixed_list_true(self):
"""assertLessThan should pass when all elements are < value"""
observed = [1.0, 2, 3.0, 4.0, 6, 3.0]
self.assertLessThan(observed, 7.0)
self.assertLessThan(observed, 7)
def test_assertLessThan_mixed_list_false(self):
"""assertLessThan should raise when a single element is >= value"""
observed = [2.0, 3, 4, 1.0, 3.0, 3.0]
self.assertRaises(AssertionError, self.assertLessThan, observed, 4.0)
self.assertRaises(AssertionError, self.assertLessThan, observed, 4)
def test_assertLessThan_numpy_array_true(self):
"""assertLessThan should pass when all elements are < value"""
observed = array([1,2,3,4])
self.assertLessThan(observed, 5)
self.assertLessThan(observed, 5.0)
def test_assertLessThan_numpy_array_false(self):
"""assertLessThan should raise when any element is >= value"""
observed = array([1,2,3,4])
self.assertRaises(AssertionError, self.assertLessThan, observed, 3)
self.assertRaises(AssertionError, self.assertLessThan, observed, 3.0)
def test_assertIsBetween_bounds(self):
"""assertIsBetween should raise if min_value >= max_value"""
self.assertRaises(AssertionError, self.assertIsBetween, 5, 6, 3)
self.assertRaises(AssertionError, self.assertIsBetween, 5, 3, 3)
def test_assertIsBetween_equal(self):
"""assertIsBetween should raise when a value is equal to either bound"""
self.assertRaises(AssertionError, self.assertIsBetween, 1, 1, 5)
self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, 5)
def test_assertIsBetween_None(self):
"""assertIsBetween should raise when compared to None"""
self.assertRaises(AssertionError, self.assertIsBetween, None, 1, 5)
self.assertRaises(AssertionError, self.assertIsBetween, 1, None, 5)
self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, None)
def test_assertIsBetween_numbers_true(self):
"""assertIsBetween should pass when in bounds"""
self.assertIsBetween(5,3,7)
def test_assertIsBetween_numbers_false(self):
"""assertIsBetween should raise when out of bounds"""
self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, 3)
def test_assertIsBetween_numbers_list_true(self):
"""assertIsBetween should pass when all elements are in bounds"""
observed = [3,4,5,4,3,4,5,4,3]
self.assertIsBetween(observed, 1, 7)
def test_assertIsBetween_numbers_list_false(self):
"""assertIsBetween should raise when any elements are out of bounds"""
observed = [3,4,5,4,3,4,5,6]
self.assertRaises(AssertionError, self.assertIsBetween, observed, 1, 5)
def test_assertIsBetween_floats_true(self):
"""assertIsBetween should pass when in bounds"""
self.assertIsBetween(5.0, 3.0, 7.0)
def test_assertIsBetween_floats_false(self):
"""assertIsBetween should raise when out of bounds"""
self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1.0, 3.0)
def test_assertIsBetween_floats_list_true(self):
"""assertIsBetween should pass when all elements are in bounds"""
observed = [3.0, 4.0, -5.0, 4.0, 3.0]
self.assertIsBetween(observed, -7.0, 7.0)
def test_assertIsBetween_floats_list_false(self):
"""assertIsBetween should raise when any elements are out of bounds"""
observed = [3.0, 4.0, -5.0, 5.0, 6.0]
self.assertRaises(AssertionError, self.assertIsBetween,observed,1.0,5.0)
def test_assertIsBetween_mixed_true(self):
"""assertIsBetween should pass when in bounds"""
self.assertIsBetween(5.0, 3, 7)
self.assertIsBetween(5, 3.0, 7)
self.assertIsBetween(5, 3, 7.0)
self.assertIsBetween(5.0, 3.0, 7)
self.assertIsBetween(5, 3.0, 7.0)
self.assertIsBetween(5.0, 3, 7.0)
def test_assertIsBetween_mixed_false(self):
"""assertIsBetween should raise when out of bounds"""
self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1, 3)
self.assertRaises(AssertionError, self.assertIsBetween, 5, 1.0, 3)
self.assertRaises(AssertionError, self.assertIsBetween, 5, 1, 3.0)
self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1.0, 3)
self.assertRaises(AssertionError, self.assertIsBetween, 5, 1.0, 3.0)
self.assertRaises(AssertionError, self.assertIsBetween, 5.0, 1, 3.0)
def test_assertIsBetween_mixed_list_true(self):
"""assertIsBetween should pass when all elements are in bounds"""
observed = [3,4,5,4.0,3,4.0,5,4,3.0]
self.assertIsBetween(observed, 1, 7)
self.assertIsBetween(observed, 1.0, 7)
self.assertIsBetween(observed, 1, 7.0)
self.assertIsBetween(observed, 1.0, 7.0)
def test_assertIsBetween_mixed_list_false(self):
"""assertIsBetween should raise when any elements are out of bounds"""
observed = [3.0,4,5.0,4,3,4.0,5,6]
self.assertRaises(AssertionError, self.assertIsBetween,observed, 1.0, 5)
self.assertRaises(AssertionError, self.assertIsBetween,observed, 1, 5.0)
self.assertRaises(AssertionError, self.assertIsBetween,observed,1.0,5.0)
self.assertRaises(AssertionError, self.assertIsBetween,observed, 1, 5)
def test_assertIsBetween_numpy_array_true(self):
"""assertIsBetween should pass when all elements are in bounds"""
observed = array([1,2,4,5,6])
self.assertIsBetween(observed, 0, 7)
def test_assertIsBetween_numpy_array_false(self):
"""assertIsBetween should raise when any element is out of bounds"""
observed = array([1,2,4,5,6])
self.assertRaises(AssertionError, self.assertIsBetween, observed, 2, 7)
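def test_sketch_strictly_between(self):
    """Sketch: exclusive bounds applied to a scalar or a sequence"""
    # Illustrative only: strictly_between is a hypothetical helper inferred
    # from the assertIsBetween tests above (bounds must satisfy
    # min_value < max_value, None never passes, and every value must lie
    # strictly inside the open interval). It is not the cogent code, and it
    # raises ValueError for reversed bounds where assertIsBetween fails.
    def strictly_between(observed, min_value, max_value):
        if observed is None or min_value is None or max_value is None:
            return False
        if not min_value < max_value:
            raise ValueError('min_value must be less than max_value')
        try:
            values = list(observed)
        except TypeError:
            values = [observed]  # a single number was passed
        return all(min_value < v < max_value for v in values)
    assert strictly_between(5, 3, 7)
    assert strictly_between([3, 4, 5.0, 4], 1, 7.0)
    assert not strictly_between(5, 1, 3)
    assert not strictly_between(1, 1, 5)  # equal to a bound fails
    assert not strictly_between([3, 4, 5, 6], 1, 5)
    assert not strictly_between(None, 1, 5)
    try:
        strictly_between(5, 6, 3)
    except ValueError:
        pass
    else:
        raise AssertionError('reversed bounds should be rejected')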
def test_assertIsProb_None(self):
"""assertIsProb should raise when compared against None"""
self.assertRaises(AssertionError, self.assertIsProb, None)
def test_assertIsProb_numbers_true(self):
"""assertIsProb should pass when compared against valid numbers"""
self.assertIsProb(0)
self.assertIsProb(1)
def test_assertIsProb_numbers_false(self):
"""assertIsProb should raise when compared against invalid numbers"""
self.assertRaises(AssertionError, self.assertIsProb, -1)
self.assertRaises(AssertionError, self.assertIsProb, 2)
def test_assertIsProb_numbers_list_true(self):
"""assertIsProb should pass when all elements are probs"""
observed = [0, 1, 0]
self.assertIsProb(observed)
def test_assertIsProb_numbers_list_false(self):
"""assertIsProb should raise when any element is not a prob"""
observed = [-2, -4, 3]
self.assertRaises(AssertionError, self.assertIsProb, observed)
def test_assertIsProb_float_true(self):
"""assertIsProb should pass when compared against valid numbers"""
self.assertIsProb(0.0)
self.assertIsProb(1.0)
def test_assertIsProb_float_false(self):
"""assertIsProb should raise when compared against invalid numbers"""
self.assertRaises(AssertionError, self.assertIsProb, -1.0)
self.assertRaises(AssertionError, self.assertIsProb, 2.0)
def test_assertIsProb_float_list_true(self):
"""assertIsProb should pass when all elements are probs"""
observed = [0.0, 1.0, 0.0]
self.assertIsProb(observed)
def test_assertIsProb_float_list_false(self):
"""assertIsProb should raise when any element is not a prob"""
observed = [-2.0, -4.0, 3.0]
self.assertRaises(AssertionError, self.assertIsProb, observed)
def test_assertIsProb_mixed_list_true(self):
"""assertIsProb should pass when all elements are probs"""
observed = [0.0, 1, 0.0]
self.assertIsProb(observed)
def test_assertIsProb_mixed_list_false(self):
"""assertIsProb should raise when any element is not a prob"""
observed = [-2.0, -4, 3.0]
self.assertRaises(AssertionError, self.assertIsProb, observed)
def test_assertIsProb_numpy_array_true(self):
"""assertIsProb should pass when all elements are probs"""
observed = array([0.0,0.4,0.8])
self.assertIsProb(observed)
def test_assertIsProb_numpy_array_false(self):
"""assertIsProb should raise when any element is not a prob"""
observed = array([0.0,-0.4,0.8])
self.assertRaises(AssertionError, self.assertIsProb, observed)
def test_assertSimilarMeans_one_obs_true(self):
"""assertSimilarMeans should pass when p > pvalue"""
obs = [5]
expected = [1,2,3,4,5,6,7,8,9,10,11]
self.assertSimilarMeans(obs, expected)
self.assertSimilarMeans(obs, expected, pvalue=0.25)
self._set_suite_pvalue(0.10)
self.assertSimilarMeans(obs, expected)
def test_assertSimilarMeans_one_obs_false(self):
"""assertSimilarMeans should raise when p < pvalue"""
obs = [5]
expected = [.001,.009,.00012]
self.assertRaises(AssertionError, self.assertSimilarMeans, \
obs, expected)
self.assertRaises(AssertionError, self.assertSimilarMeans, \
obs, expected, 0.1)
self._set_suite_pvalue(0.001)
self.assertRaises(AssertionError, self.assertSimilarMeans, \
obs, expected)
def test_assertSimilarMeans_twosample_true(self):
"""assertSimilarMeans should pass when p > pvalue"""
obs = [4,5,6]
expected = [1,2,3,4,5,6,7,8,9]
self.assertSimilarMeans(obs, expected)
self.assertSimilarMeans(obs, expected, pvalue=0.25)
self._set_suite_pvalue(0.10)
self.assertSimilarMeans(obs, expected)
def test_assertSimilarMeans_twosample_false(self):
"""assertSimilarMeans should raise when p < pvalue"""
obs = [1,2,3]
expected = [6,7,8,9,10,11,12,13,14]
self.assertRaises(AssertionError, self.assertSimilarMeans, \
obs, expected)
self.assertRaises(AssertionError, self.assertSimilarMeans, \
obs, expected, 0.1)
self._set_suite_pvalue(0.001)
self.assertRaises(AssertionError, self.assertSimilarMeans, \
obs, expected)
def test_assertSimilarFreqs_true(self):
"""assertSimilarFreqs should pass when p > pvalue"""
observed = [2,2,3,2,1,2,2,2,2]
expected = [2,2,2,2,2,2,2,2,2]
self.assertSimilarFreqs(observed, expected)
self.assertSimilarFreqs(observed, expected, pvalue=0.25)
self._set_suite_pvalue(0.10)
self.assertSimilarFreqs(observed, expected)
def test_assertSimilarFreqs_false(self):
"""assertSimilarFreqs should raise when p < pvalue"""
observed = [10,15,20,10,12,12,13]
expected = [100,50,10,20,700,2,100]
self.assertRaises(AssertionError, self.assertSimilarFreqs, \
observed, expected)
self.assertRaises(AssertionError, self.assertSimilarFreqs, \
observed, expected, 0.2)
self._set_suite_pvalue(0.001)
self.assertRaises(AssertionError, self.assertSimilarFreqs, \
observed, expected)
def test_assertSimilarFreqs_numpy_array_true(self):
"""assertSimilarFreqs should pass when p > pvalue"""
observed = array([2,2,3,2,1,2,2,2,2])
expected = array([2,2,2,2,2,2,2,2,2])
self.assertSimilarFreqs(observed, expected)
self.assertSimilarFreqs(observed, expected, pvalue=0.25)
self._set_suite_pvalue(0.10)
self.assertSimilarFreqs(observed, expected)
def test_assertSimilarFreqs_numpy_array_false(self):
"""assertSimilarFreqs should raise when p < pvalue"""
observed = array([10,15,20,10,12,12,13])
expected = array([100,50,10,20,700,2,100])
self.assertRaises(AssertionError, self.assertSimilarFreqs, \
observed, expected)
self.assertRaises(AssertionError, self.assertSimilarFreqs, \
observed, expected, 0.2)
self._set_suite_pvalue(0.001)
self.assertRaises(AssertionError, self.assertSimilarFreqs, \
observed, expected)
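def test_sketch_pearson_chi_square(self):
    """Sketch: a goodness-of-fit statistic for observed vs expected counts"""
    # Illustrative only: this computes Pearson's chi-square statistic,
    # sum((o - e)**2 / e), for the count fixtures used above. Whether
    # assertSimilarFreqs uses this statistic or another goodness-of-fit
    # test (e.g. a G-test) is an assumption not confirmed here. Turning the
    # statistic into a p-value needs the chi-square distribution, which
    # this standard-library-only sketch does not compute.
    def chi_square(observed, expected):
        return sum((o - e) ** 2 / float(e)
                   for o, e in zip(observed, expected))
    similar = chi_square([2, 2, 3, 2, 1, 2, 2, 2, 2], [2] * 9)
    assert abs(similar - 1.0) < 1e-12
    dissimilar = chi_square([10, 15, 20, 10, 12, 12, 13],
                            [100, 50, 10, 20, 700, 2, 100])
    assert dissimilar > similar
    assert dissimilar > 100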
def test_set_suite_pvalue(self):
"""Should set the suite pvalue"""
# force stats to fail
self._set_suite_pvalue(0.99)
obs = [2,5,6]
exp = [1,2,3,4,5,6,7,8,9]
self.assertRaises(AssertionError, self.assertSimilarMeans, obs, exp)
# force stats to pass
self._set_suite_pvalue(0.01)
self.assertSimilarMeans(obs, exp)
def test_assertIsPermutation_true(self):
"""assertIsPermutation should pass when a is a permutation of b"""
observed = [3,2,1,4,5]
items = [1,2,3,4,5]
self.assertIsPermutation(observed, items)
def test_assertIsPermutation_false(self):
"""assertIsPermutation should raise when a is not a permutation of b"""
items = [1,2,3,4,5]
self.assertRaises(AssertionError,self.assertIsPermutation, items,items)
self.assertRaises(AssertionError,self.assertIsPermutation, [1,2],[3,4])
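def test_sketch_is_reordering(self):
    """Sketch: same items in a different order"""
    # Illustrative only: is_reordering is a hypothetical helper inferred
    # from the assertIsPermutation tests above, where a list compared with
    # itself fails. It requires the same multiset of items and a different
    # order; it is not the cogent implementation.
    def is_reordering(observed, items):
        observed, items = list(observed), list(items)
        return sorted(observed) == sorted(items) and observed != items
    assert is_reordering([3, 2, 1, 4, 5], [1, 2, 3, 4, 5])
    assert not is_reordering([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
    assert not is_reordering([1, 2], [3, 4])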
def test_assertSameObj_true(self):
"""assertSameObj should pass when 'a is b'"""
self.assertSameObj("foo", "foo")
self.assertSameObj(None, None)
bar = lambda x:5
self.assertSameObj(bar, bar)
def test_assertSameObj_false(self):
"""assertSameObj should raise when 'a is not b'"""
self.assertRaises(AssertionError, self.assertSameObj, "foo", "bar")
self.assertRaises(AssertionError, self.assertSameObj, None, "bar")
self.assertRaises(AssertionError, self.assertSameObj, lambda x:5, \
lambda y:6)
def test_assertNotSameObj_true(self):
"""assertNotSameObj should pass when 'a is not b'"""
self.assertNotSameObj("foo", "bar")
self.assertNotSameObj(None, 5)
self.assertNotSameObj(lambda x:5, lambda y:6)
def test_assertNotSameObj_false(self):
"""assertNotSameObj should raise when 'a is b'"""
self.assertRaises(AssertionError, self.assertNotSameObj, "foo", "foo")
self.assertRaises(AssertionError, self.assertNotSameObj, None, None)
bar = lambda x:5
self.assertRaises(AssertionError, self.assertNotSameObj, bar, bar)
def test_assertIsNotBetween_bounds(self):
"""assertIsNotBetween should raise if min_value >= max_value"""
self.assertRaises(AssertionError, self.assertIsNotBetween, 5, 4, 3)
self.assertRaises(AssertionError, self.assertIsNotBetween, 5, 3, 3)
def test_assertIsNotBetween_equals(self):
"""assertIsNotBetween should pass when equal on either bound"""
self.assertIsNotBetween(1, 1, 2)
self.assertIsNotBetween(1.0, 1, 2)
self.assertIsNotBetween(1, 1.0, 2)
self.assertIsNotBetween(1.0, 1.0, 2)
self.assertIsNotBetween(2, 1, 2)
self.assertIsNotBetween(2.0, 1, 2)
self.assertIsNotBetween(2, 1, 2.0)
self.assertIsNotBetween(2.0, 1, 2.0)
def test_assertIsNotBetween_None(self):
"""assertIsNotBetween should raise when compared against None"""
self.assertRaises(AssertionError, self.assertIsNotBetween, None, 1, 2)
self.assertRaises(AssertionError, self.assertIsNotBetween, 1, None, 2)
self.assertRaises(AssertionError, self.assertIsNotBetween, 1, 2, None)
def test_assertIsNotBetween_numbers_true(self):
"""assertIsNotBetween should pass when a number is not in bounds"""
self.assertIsNotBetween(1,2,3)
self.assertIsNotBetween(4,2,3)
self.assertIsNotBetween(-1,-3,-2)
self.assertIsNotBetween(-4,-3,-2)
self.assertIsNotBetween(2,-1,1)
def test_assertIsNotBetween_numbers_false(self):
"""assertIsNotBetween should raise when a number is in bounds"""
self.assertRaises(AssertionError, self.assertIsNotBetween, 2, 1, 3)
self.assertRaises(AssertionError, self.assertIsNotBetween, 0, -1, 1)
self.assertRaises(AssertionError, self.assertIsNotBetween, -2, -3, -1)
def test_assertIsNotBetween_numbers_list_true(self):
"""assertIsNotBetween should pass when all elements are out of bounds"""
obs = [1,2,3,4,5]
self.assertIsNotBetween(obs, 5, 10)
self.assertIsNotBetween(obs, -2, 1)
def test_assertIsNotBetween_numbers_list_false(self):
"""assertIsNotBetween should raise when any element is in bounds"""
obs = [1,2,3,4,5]
self.assertRaises(AssertionError, self.assertIsNotBetween, obs, 3, 7)
self.assertRaises(AssertionError, self.assertIsNotBetween, obs, -3, 3)
self.assertRaises(AssertionError, self.assertIsNotBetween, obs, 2, 4)
def test_assertIsNotBetween_float_true(self):
"""assertIsNotBetween should pass when a number is not in bounds"""
self.assertIsNotBetween(1.0, 2.0, 3.0)
self.assertIsNotBetween(4.0, 2.0, 3.0)
self.assertIsNotBetween(-1.0, -3.0, -2.0)
self.assertIsNotBetween(-4.0, -3.0, -2.0)
self.assertIsNotBetween(2.0, -1.0, 1.0)
def test_assertIsNotBetween_float_false(self):
"""assertIsNotBetween should raise when a number is in bounds"""
self.assertRaises(AssertionError, self.assertIsNotBetween, 2.0,1.0,3.0)
self.assertRaises(AssertionError, self.assertIsNotBetween, 0.0,-1.0,1.0)
self.assertRaises(AssertionError,self.assertIsNotBetween,-2.0,-3.0,-1.0)
def test_assertIsNotBetween_float_list_true(self):
"""assertIsNotBetween should pass when all elements are out of bounds"""
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
self.assertIsNotBetween(obs, 5.0, 10.0)
self.assertIsNotBetween(obs, -2.0, 1.0)
def test_assertIsNotBetween_float_list_false(self):
"""assertIsNotBetween should raise when any element is in bounds"""
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3.0, 7.0)
self.assertRaises(AssertionError,self.assertIsNotBetween, obs, -3.0,3.0)
self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 2.0, 4.0)
def test_assertIsNotBetween_mixed_true(self):
"""assertIsNotBetween should pass when a number is not in bounds"""
self.assertIsNotBetween(1, 2.0, 3.0)
self.assertIsNotBetween(1.0, 2, 3.0)
self.assertIsNotBetween(1.0, 2.0, 3)
def test_assertIsNotBetween_mixed_false(self):
"""assertIsNotBetween should raise when a number is in bounds"""
self.assertRaises(AssertionError, self.assertIsNotBetween, 2.0, 1.0, 3)
self.assertRaises(AssertionError, self.assertIsNotBetween, 2.0, 1, 3.0)
self.assertRaises(AssertionError, self.assertIsNotBetween, 2, 1.0, 3.0)
def test_assertIsNotBetween_mixed_list_true(self):
"""assertIsNotBetween should pass when all elements are not in bounds"""
obs = [1, 2.0, 3, 4.0, 5.0]
self.assertIsNotBetween(obs, 5.0, 10.0)
self.assertIsNotBetween(obs, 5, 10.0)
self.assertIsNotBetween(obs, 5.0, 10)
def test_assertIsNotBetween_mixed_list_false(self):
"""assertIsNotBetween should raise when any element is in bounds"""
obs = [1, 2.0, 3, 4.0, 5.0]
self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3.0, 7.0)
self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3, 7.0)
self.assertRaises(AssertionError,self.assertIsNotBetween, obs, 3.0, 7)
if __name__ == '__main__':
main()
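The assertIsNotBetween tests above pin down its semantics. The bounds must satisfy min_value < max_value. A value equal to either bound counts as "not between". None is rejected. A sequence passes only if none of its elements lies strictly inside the open interval. The sketch below restates those rules as a standalone helper. `assert_is_not_between` is a hypothetical name, not cogent's implementation.

```python
def assert_is_not_between(observed, min_value, max_value):
    """Raise AssertionError if any value in observed lies strictly inside
    (min_value, max_value). Hypothetical helper mirroring the tests above."""
    # reject None first so no comparison against None is ever attempted
    if observed is None or min_value is None or max_value is None:
        raise AssertionError("cannot compare against None")
    if min_value >= max_value:
        raise AssertionError("min_value must be smaller than max_value")
    try:
        values = list(observed)      # a list of numbers
    except TypeError:
        values = [observed]          # a bare number is not iterable
    # boundary values are allowed: only the open interval is forbidden
    inside = [v for v in values if min_value < v < max_value]
    if inside:
        raise AssertionError("%r lie strictly between %r and %r"
                             % (inside, min_value, max_value))
```

Mixed int and float inputs need no special handling, because Python compares them numerically.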
PyCogent-1.5.3/tests/test_struct/__init__.py
#!/usr/bin/env python
__all__ = ['test_rna2d', 'test_annotation', 'test_selection', 'test_asa',
'test_manipulation', 'test_contact']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit", "Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_struct/test_annotation.py
#!/usr/bin/env python
import os
try:
from cogent.util.unit_test import TestCase, main
from cogent.parse.pdb import PDBParser
from cogent.struct.annotation import xtradata
from cogent.struct.selection import einput
except ImportError:
from zenpdb.cogent.util.unit_test import TestCase, main
from zenpdb.cogent.parse.pdb import PDBParser
from zenpdb.cogent.struct.annotation import xtradata
from zenpdb.cogent.struct.selection import einput
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class AnnotationTest(TestCase):
"""tests that annotations get into xtra."""
def setUp(self):
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
def test_xtradata(self):
"""tests that the full_ids in the data dict are correctly parsed."""
structure = einput(self.input_structure, 'S')[('2E12',)]
model = einput(self.input_structure, 'M')[('2E12', 0)]
chain = einput(self.input_structure, 'C')[('2E12', 0, 'B')]
residue = einput(self.input_structure, 'R')[('2E12', 0, 'B', ('LEU', 24, ' '))]
atom = einput(self.input_structure, 'A')[('2E12', 0, 'B', ('LEU', 24, ' '), ('CD1', ' '))]
data_model = {(None, 0):{'model':1}}
xtradata(data_model, structure)
self.assertEquals(model.xtra, {'model': 1})
data_chain = {(None, None, 'B'):{'chain':1}}
xtradata(data_chain, model)
self.assertEquals(chain.xtra, {'chain': 1})
data_chain = {(None, 0, 'B'):{'chain': 2}}
xtradata(data_chain, structure)
self.assertEquals(chain.xtra['chain'], 2)
data_residue = {(None, None, 'B', ('LEU', 24, ' ')):{'residue':1}}
xtradata(data_residue, model)
self.assertEquals(residue.xtra, {'residue': 1})
data_residue = {(None, 0, 'B', ('LEU', 24, ' ')):{'residue':2}}
xtradata(data_residue, structure)
self.assertEquals(residue.xtra, {'residue': 2})
if __name__ == '__main__':
main()
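The xtradata test suggests that `None` in a data-dict key acts as a wildcard for the leading components of an entity's full_id, such as the structure and model ids. Under that assumption, one dict can annotate matching entities below any parent. The sketch below shows only that matching rule. `full_id_matches`, `annotate`, and the plain dict mapping full_id to xtra dict are hypothetical stand-ins, not cogent's `xtradata` implementation.

```python
def full_id_matches(pattern, full_id):
    """True if pattern matches full_id component-wise; None matches anything.
    (Assumed wildcard rule, inferred from the test above.)"""
    if len(pattern) != len(full_id):
        return False
    return all(p is None or p == f for p, f in zip(pattern, full_id))


def annotate(data, entities):
    """Merge data[pattern] into the xtra dict of every matching entity.

    data:     {pattern_tuple: {key: value}}
    entities: {full_id_tuple: xtra_dict}  (hypothetical stand-in for a structure)
    """
    for pattern, values in data.items():
        for full_id, xtra in entities.items():
            if full_id_matches(pattern, full_id):
                xtra.update(values)  # a later annotation overwrites the same key
```

Overwriting on repeated keys matches what the test expects when `{'chain': 2}` replaces `{'chain': 1}`.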
PyCogent-1.5.3/tests/test_struct/test_asa.py
#!/usr/bin/env python
import os
import numpy as np
from numpy import sum
from cogent.util.unit_test import TestCase, main
from cogent.app.util import ApplicationNotFoundError
from cogent.parse.pdb import PDBParser
from cogent.struct.selection import einput
from cogent.maths.stats.test import correlation
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class DummyFile(object):
def __init__(self, some_string):
self.some_string = some_string
slist = self.some_string.split('\n')
self.some_string_list = [i + '\n' for i in slist]
def readlines(self):
return self.some_string_list
def close(self):
pass
test_file_water = """
HETATM 1185 O HOH 268 141.577 14.676 13.168 1.00 54.76 O
HETATM 1186 O HOH 269 137.019 17.606 19.854 1.00 33.36 O
HETATM 1187 O HOH 270 149.639 55.203 4.611 1.00 49.01 O
HETATM 1188 O HOH 271 156.238 32.191 -4.204 1.00 64.53 O
CONECT 453 685
CONECT 685 453
MASTER 357 0 0 1 10 0 0 6 1187 1 2 12
END
"""
dummy_water = DummyFile(test_file_water)
class asaTest(TestCase):
"""Tests for surface calculations."""
def setUp(self):
self.arr = np.random.random(3000).reshape((1000, 3))
self.point = np.random.random(3)
self.center = np.array([0.5, 0.5, 0.5])
def test_0import(self):
# unittest sorts test methods by name, so this import test runs first
"""tests that the _asa cython extension can be imported."""
global _asa
from cogent.struct import _asa
assert 'asa_loop' in dir(_asa)
def test_1import(self):
# unittest sorts test methods by name, so this runs before the tests using asa
"""tests that asa can be imported."""
global asa
from cogent.struct import asa
def test_asa_loop(self):
"""tests if inner asa_loop (cython) performs correctly"""
self.lcoords = np.array([[-4., 0, 0], [0, 0, 0], [4, 0, 0], [10, 0, 0]])
self.qcoords = np.array([[0., 0, 0], [4., 0, 0]])
self.lradii = np.array([2., 3.])
self.qradii = np.array([3., 2.])
# tail of the Cython asa_loop signature:
# ..., spoints, np.ndarray[DTYPE_t, ndim=1] box,
# DTYPE_t probe, unsigned int bucket_size, MAXSYM=200000)
self.spoints = np.array([[1., 0., 0.], [-1., 0., 0.], [0., 1., 0.], \
[0., -1., 0.], [0., 0., 1.], [0., 0., -1.]])
output = _asa.asa_loop(self.qcoords, self.lcoords, self.qradii, \
self.lradii, self.spoints, \
np.array([-100., -100., -100., 100., 100., 100.]), 1., 10)
self.assertFloatEqual(output, np.array([ 75.39822369, 41.88790205]))
def test_asa_xtra(self):
"""test internal asa"""
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
self.assertRaises(ValueError, asa.asa_xtra, self.input_structure, mode='a')
result = asa.asa_xtra(self.input_structure)
a = einput(self.input_structure, 'A')
for i in range(len(result)):
self.assertEquals(result.values()[i]['ASA'], a[result.keys()[i]].xtra['ASA'])
r = einput(self.input_structure, 'R')
for water in r.selectChildren('H_HOH', 'eq', 'name').values():
self.assertFalse('ASA' in water.xtra)
for residue in r.selectChildren('H_HOH', 'ne', 'name').values():
for a in residue:
self.assertTrue('ASA' in a.xtra)
result = asa.asa_xtra(self.input_structure, xtra_key='SASA')
for residue in r.selectChildren('H_HOH', 'ne', 'name').values():
for a in residue:
self.assertFloatEqual(a.xtra['ASA'], a.xtra['SASA'])
def test_asa_xtra_stride(self):
"""test asa via stride"""
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
try:
result = asa.asa_xtra(self.input_structure, 'stride')
except ApplicationNotFoundError:
return
self.assertAlmostEqual(self.input_structure[(0,)][('B',)]\
[(('LEU', 35, ' '),)].xtra['STRIDE_ASA'], 17.20)
def test_compare(self):
"""compares internal asa to stride."""
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
try:
asa.asa_xtra(self.input_structure, mode='stride')
except ApplicationNotFoundError:
return
asa.asa_xtra(self.input_structure)
self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
residues = einput(self.input_structure, 'R')
asa1 = []
asa2 = []
for residue in residues.selectChildren('H_HOH', 'ne', 'name').values():
asa1.append(residue.xtra['ASA'])
asa2.append(residue.xtra['STRIDE_ASA'])
self.assertAlmostEqual(correlation(asa1, asa2)[1], 0.)
def test_uc(self):
"""compares asa within unit cell."""
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
asa.asa_xtra(self.input_structure, symmetry_mode='uc', xtra_key='ASA_UC')
asa.asa_xtra(self.input_structure)
self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
self.input_structure.propagateData(sum, 'A', 'ASA_UC', xtra=True)
residues = einput(self.input_structure, 'R')
x = residues[('2E12', 0, 'B', ('GLU', 77, ' '))].xtra.values()
self.assertTrue(x[0] != x[1])
def test_uc2(self):
self.input_file = os.path.join('data', '1LJO.pdb')
self.input_structure = PDBParser(open(self.input_file))
asa.asa_xtra(self.input_structure, symmetry_mode='uc', xtra_key='ASA_XTAL')
asa.asa_xtra(self.input_structure)
self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
self.input_structure.propagateData(sum, 'A', 'ASA_XTAL', xtra=True)
residues = einput(self.input_structure, 'R')
r1 = residues[('1LJO', 0, 'A', ('ARG', 65, ' '))]
r2 = residues[('1LJO', 0, 'A', ('ASN', 46, ' '))]
self.assertFloatEqual(r1.xtra.values(),
[128.94081270529105, 22.807700865674093])
self.assertFloatEqual(r2.xtra.values(),
[115.35738419425566, 115.35738419425566])
def test_crystal(self):
"""compares asa within the crystal (symmetry_mode='uc', crystal_mode=2)."""
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
asa.asa_xtra(self.input_structure, symmetry_mode='uc', crystal_mode=2, xtra_key='ASA_XTAL')
asa.asa_xtra(self.input_structure)
self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
self.input_structure.propagateData(sum, 'A', 'ASA_XTAL', xtra=True)
residues = einput(self.input_structure, 'R')
r1 = residues[('2E12', 0, 'A', ('ALA', 42, ' '))]
r2 = residues[('2E12', 0, 'A', ('VAL', 8, ' '))]
r3 = residues[('2E12', 0, 'A', ('LEU', 25, ' '))]
self.assertFloatEqual(r1.xtra.values(), \
[32.041070749038823, 32.041070749038823])
self.assertFloatEqual(r3.xtra.values(), \
[0., 0.])
self.assertFloatEqual(r2.xtra.values(), \
[28.873559956056916, 0.0])
def test__prepare_entities(self):
self.input_structure = PDBParser(dummy_water)
self.assertRaises(ValueError, asa._prepare_entities, self.input_structure)
def _test_bio(self):
"""compares asa within a bio unit."""
self.input_file = os.path.join('data', '1A1X.pdb')
self.input_structure = PDBParser(open(self.input_file))
asa.asa_xtra(self.input_structure, symmetry_mode='bio', xtra_key='ASA_BIO')
asa.asa_xtra(self.input_structure)
self.input_structure.propagateData(sum, 'A', 'ASA', xtra=True)
self.input_structure.propagateData(sum, 'A', 'ASA_BIO', xtra=True)
residues = einput(self.input_structure, 'R')
r1 = residues[('1A1X', 0, 'A', ('GLU', 37, ' '))]
r2 = residues[('1A1X', 0, 'A', ('TRP', 15, ' '))]
self.assertFloatEqual(r1.xtra.values(), \
[20.583191467544726, 78.996394472066541])
self.assertFloatEqual(r2.xtra.values(), \
[136.41436710386989, 136.41436710386989])
if __name__ == '__main__':
main()
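The `asa_loop` test passes in sphere points (unit vectors), a probe radius and atom radii. That setup appears to follow the Shrake-Rupley point-sampling method. Each atom's radius is expanded by the probe radius, and test points are placed on the expanded sphere. The atom's accessible area is the sphere area times the fraction of points not buried inside any other expanded atom. Below is a brute-force NumPy sketch of that textbook method. It assumes nothing about cogent's spatial search (the `box` and `bucket_size` arguments). `golden_spiral_points` and `shrake_rupley` are hypothetical names.

```python
import numpy as np


def golden_spiral_points(n):
    """n roughly uniform unit vectors on a sphere (golden-section spiral)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)        # polar angle in (0, pi)
    theta = np.pi * (1.0 + 5 ** 0.5) * i      # azimuth
    return np.column_stack((np.cos(theta) * np.sin(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(phi)))


def shrake_rupley(coords, radii, probe=1.4, n_points=960):
    """Per-atom accessible surface area by point sampling (brute force)."""
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float) + probe   # probe-expanded radii
    sphere = golden_spiral_points(n_points)
    areas = np.empty(len(coords))
    for i, (center, r) in enumerate(zip(coords, radii)):
        points = center + r * sphere                 # test points around atom i
        # distance from every test point to every atom centre, shape (n_points, n_atoms)
        d = np.linalg.norm(points[:, None, :] - coords[None, :, :], axis=2)
        buried = d < radii[None, :]
        buried[:, i] = False                         # an atom never buries itself
        exposed = np.count_nonzero(~buried.any(axis=1))
        areas[i] = 4.0 * np.pi * r ** 2 * exposed / n_points
    return areas
```

An isolated atom gets the full area 4π(r + probe)², and burial only ever removes points. That makes the sketch easy to sanity-check, though its all-pairs distance matrix is only practical for small inputs.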
PyCogent-1.5.3/tests/test_struct/test_contact.py
#!/usr/bin/env python
import os
import numpy as np
try:
from cogent.util.unit_test import TestCase, main
from cogent.parse.pdb import PDBParser
from cogent.struct.selection import einput
except ImportError:
from zenpdb.cogent.util.unit_test import TestCase, main
from zenpdb.cogent.parse.pdb import PDBParser
from zenpdb.cogent.struct.selection import einput
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class contactTest(TestCase):
"""Tests for contact calculations."""
def setUp(self):
self.arr = np.random.random(3000).reshape((1000, 3))
self.point = np.random.random(3)
self.center = np.array([0.5, 0.5, 0.5])
def test_0import(self):
# unittest sorts test methods by name, so this import test runs first
"""tests that the _contact cython extension can be imported."""
global _contact
from cogent.struct import _contact
assert 'cnt_loop' in dir(_contact)
def test_1import(self):
# unittest sorts test methods by name, so this runs before the tests using contact
"""tests that contact can be imported."""
global contact
from cogent.struct import contact
def test_chains(self):
"""compares contacts diff chains"""
self.input_file = os.path.join('data', '1A1X.pdb') # one chain
self.input_structure = PDBParser(open(self.input_file))
res = contact.contacts_xtra(self.input_structure)
self.assertTrue(res == {})
self.input_file = os.path.join('data', '2E12.pdb') # two chains (A and B)
self.input_structure = PDBParser(open(self.input_file))
res = contact.contacts_xtra(self.input_structure)
self.assertTrue(res)
self.assertFloatEqual(\
res[('2E12', 0, 'B', ('THR', 17, ' '), ('OG1', ' '))]['CONTACTS']\
[('2E12', 0, 'A', ('ALA', 16, ' '), ('CB', ' '))][0], 5.7914192561064004)
def test_symmetry(self):
"""compares contacts diff symmetry mates"""
self.input_file = os.path.join('data', '2E12.pdb') # two chains (A and B)
self.input_structure = PDBParser(open(self.input_file))
res = contact.contacts_xtra(self.input_structure, \
symmetry_mode='uc',
contact_mode='diff_sym')
self.assertTrue(res)
self.assertFloatEqual(\
res[('2E12', 0, 'B', ('GLU', 77, ' '), ('OE2', ' '))]['CONTACTS']\
[('2E12', 0, 'B', ('GLU', 57, ' '), ('OE2', ' '))][0], \
5.2156557833123873)
def test_crystal(self):
"""compares contacts diff unit-cell-mates"""
pass
if __name__ == '__main__':
main()
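test_chains expects contacts only between atoms on different chains. A single-chain structure yields an empty result, while 2E12 yields a chain B to chain A contact at about 5.79 Å. The brute-force sketch below shows that filtering idea. It uses a hypothetical `interchain_contacts` function with an explicit `cutoff`. Cogent's default cutoff is not stated in these tests, and this is not the `contacts_xtra` implementation.

```python
import numpy as np


def interchain_contacts(coords, chain_ids, cutoff):
    """Return {(i, j): distance} for atom pairs i < j that sit on different
    chains and are closer than `cutoff` (same length unit as coords)."""
    coords = np.asarray(coords, dtype=float)
    chain_ids = np.asarray(chain_ids)
    # all-pairs Euclidean distances, shape (n, n)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # keep pairs that are both close and on different chains
    close = (dist < cutoff) & (chain_ids[:, None] != chain_ids[None, :])
    # upper triangle only, so each pair is reported once with i < j
    rows, cols = np.nonzero(np.triu(close, k=1))
    return dict(((int(i), int(j)), float(dist[i, j]))
                for i, j in zip(rows, cols))
```

As in test_chains, input with a single chain id always gives `{}`. The n-by-n distance matrix limits this sketch to small structures.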
PyCogent-1.5.3/tests/test_struct/test_dihedral.py
#!/usr/bin/env python
#
# test_dihedral.py
#
# Tests the dihedral module.
#
"""Provides tests for functions in the file dihedral.py
"""
__author__ = "Kristian Rother"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Kristian Rother", "Sandra Smit"]
__credits__ = ["Janusz Bujnicki", "Nils Goldmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kristian Rother"
__email__ = "krother@rubor.de"
__status__ = "Production"
from cogent.util.unit_test import main, TestCase
from cogent.struct.dihedral import dihedral, scalar, angle, \
calc_angle, DihedralGeometryError, AngleGeometryError
from random import random
from numpy import array
from math import pi, cos, sin
class DihedralTests(TestCase):
def get_random_array(self):
"""
Returns a one-dimensional numpy array with three random floats
in the range between -5.0 and +5.0.
"""
return array([(random()-0.5)*10,(random()-0.5)*10,(random()-0.5)*10])
def assertAlmostEqualAngle(self, is_value, should_value, digits=7):
"""
Checks whether two angles in degrees are equal to the given number
of digits, treating angles that differ by a full turn as equal,
so that e.g. 359.9999991 == 0.0
"""
maxv = 359.0
for i in range(digits):
maxv += 0.9 * 0.1**i
while is_value < 0.0: is_value += 360
while is_value > maxv: is_value -= 360
while should_value < 0.0: should_value += 360
while should_value > maxv: should_value -= 360
self.assertAlmostEqual(is_value,should_value, digits)
def test_scalar(self):
"""Tests the scalar product function for one-dimensional arrays."""
# test one-dimensional arrays
self.assertEqual(scalar(array([0]),array([0])),0.0)
self.assertEqual(scalar(array([2]),array([3])),6.0)
self.assertEqual(scalar(array([0,0]),array([0,0])),0.0)
self.assertEqual(scalar(array([-1,-4]),array([1,4])),-17.0)
self.assertEqual(scalar(array([1,2]),array([3,4])),11.0)
self.assertEqual(scalar(array([-1,-4]),array([-1,4])),-15.0)
self.assertEqual(scalar(array([0,0,0]),array([0,0,0])),0.0)
self.assertEqual(scalar(array([2.5,0,-1]),array([2.5,0,-1])),7.25)
self.assertEqual(scalar(array([1,2,3]),array([0,0,0])),0.0)
self.assertEqual(scalar(array([1,2,3]),array([1,2,3])),14.0)
self.assertEqual(scalar(array([1,2,3]),array([4,5,6])),32.0)
# test two-dimensional arrays (should not be a feature)
self.assertNotEqual(scalar(array([[0,0],[0,0]]),\
array([[0,0],[0,0]])),0.0)
def test_angle_simple(self):
"""Tests the angle function for two- and three-dimensional vectors."""
# test two-dimensional vectors (not arrays!)
self.assertEqual(angle(array([0,1]),array([1,0])),0.5*pi)
self.assertEqual(angle(array([5,0]),array([13,0])),0.0)
self.assertEqual(angle(array([2,3]),array([26,39])),0.0)
self.assertEqual(angle(array([2,3]),array([-3,2])),0.5*pi)
self.assertEqual(angle(array([-5,0]),array([13,0])),pi)
# test three-dimensional vectors (not arrays!)
self.assertEqual(angle(array([0,0,-1]),array([0,0,1])),pi)
self.assertEqual(angle(array([0,15,-1]),array([0,-15,1])),pi)
self.assertEqual(angle(array([0,0,7]),array([14,14,0])),0.5*pi)
self.assertEqual(angle(array([0,7,7]),array([0,14,14])),0.0)
self.assertAlmostEqual(angle(array([100000000.0,0,1]),\
array([1,0,0])),0.0)
def test_calc_angle_simple(self):
"""Tests the calc_angle function for two- and three-dimensional vectors."""
# test two-dimensional vectors (not arrays!)
self.assertEqual(calc_angle(array([0,1]),array([0,0]),array([1,0])),0.5*pi)
self.assertEqual(calc_angle(array([5,0]),array([0,0]),array([13,0])),0.0)
self.assertEqual(calc_angle(array([4,3]),array([2,0]),array([28,39])),0.0)
self.assertEqual(calc_angle(array([2,-13]),array([0,-10]),array([-3,-12])),0.5*pi)
self.assertEqual(calc_angle(array([-5,0]),array([0,0]),array([13,0])),pi)
# test three-dimensional vectors (not arrays!)
self.assertEqual(calc_angle(array([0,0,-1]),array([0,0,0]),array([0,0,1])),pi)
self.assertEqual(calc_angle(array([0,15,-1]),array([0,0,0]),array([0,-15,1])),pi)
self.assertEqual(calc_angle(array([0,10,7]),array([0,10,0]),array([14,24,0])),0.5*pi)
self.assertEqual(calc_angle(array([0,7,7]),array([0,0,0]),array([0,14,14])),0.0)
self.assertAlmostEqual(calc_angle(array([100000000.0,0,1]),\
array([0,0,0]),array([1,0,0])),0.0)
## def make_scipy_angles(self):
## """Generates the test data given below. Was commented out
## for getting rid of the library dependency"""
## from Scientific.Geometry import Vector
## for i in range(20):
## v1 = self.get_random_array()
## v2 = self.get_random_array()
## vec1 = Vector(v1[0],v1[1],v1[2])
## vec2 = Vector(v2[0],v2[1],v2[2])
## scipy_angle = vec1.angle(vec2)
## out = [(vec1[0],vec1[1],vec1[2]),\
## (vec2[0],vec2[1],vec2[2]),scipy_angle]
## print out,","
def test_angle_scipy(self):
"""
Asserts that dihedral and ScientificPython calculate the same angles.
"""
for v1,v2,scipy_angle in SCIPY_ANGLES:
ang = angle(array(v1),array(v2))
self.assertAlmostEqual(ang, scipy_angle)
def test_angle_fail(self):
"""The angle function should fail for zero length vectors."""
# should not work for zero length vectors
self.assertRaises(AngleGeometryError,angle,\
array([0,0]),array([0,0]))
self.assertRaises(AngleGeometryError,angle,\
array([0,0,0]),array([0,0,0]))
def test_dihedral_eight_basic_directions(self):
"""Checks dihedrals in all 45 degree intervals."""
# using vectors with integer positions.
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2,-1, 0]), 0.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2,-1,-1]), 45.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 0,-1]), 90.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 1,-1]),135.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 1, 0]),180.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 1, 1]),225.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2, 0, 1]),270.0)
self.assertAlmostEqualAngle(\
dihedral([-2,-1,0], [-1,0,0], [1,0,0], [2,-1, 1]),315.0)
def test_dihedral_rotation(self):
"""Checks all angles in 0.2 degree intervals."""
# constructs vectors using sin/cos and then calculates dihedrals
precision = 5.0 # the higher the better
v1 = array([1,0,1])
v2 = array([0,0,1])
v3 = array([0,0,2])
for i in range(int(360*precision)):
degrees = i/precision
radians = pi*degrees/180.0
opp_degrees = 360-degrees
if opp_degrees == 360.0: opp_degrees = 0.0
# construct circular motion of vector
v4 = array([cos(radians), sin(radians),2])
self.assertAlmostEqualAngle(dihedral(v4,v3,v2,v1), degrees, 5)
# reversing the point order must give the same dihedral
self.assertAlmostEqualAngle(dihedral(v1,v2,v3,v4), degrees, 5)
def test_dihedral_samples(self):
"""Checks values measured manually from atoms in PyMOL."""
coordinates = [
[(-1.225,4.621,42.070),(-1.407,4.455,43.516),\
(-2.495,4.892,44.221),(-3.587,5.523,43.715)],
[(-2.495,4.892,44.221),(1.513,0.381,40.711),\
(-3.091,4.715,47.723),(-0.567,3.892,44.433)],
[(-0.349,5.577,39.446),(-1.559,3.400,41.427),\
(-4.304,5.563,45.998),(-2.495,4.892,44.221)],
[(-45.819,84.315,19.372),(-31.124,72.286,14.035),\
(-27.975,58.688,7.025),(-16.238,78.659,23.731)],
[(-29.346,66.973,24.152),(-29.977,69.635,24.580),\
(-30.875,68.788,24.663),(-30.668,67.495,24.449)],
[(-34.586,84.884,14.064),(-23.351,69.756,11.028),\
(-40.924,69.442,24.630),(-30.875,68.788,24.663)]
]
angles = [1.201, 304.621, 295.672, 195.184, 358.699, 246.603]
for i in range(len(coordinates)):
v1,v2,v3,v4 = coordinates[i]
self.assertAlmostEqualAngle(dihedral(v1,v2,v3,v4), angles[i],3)
def test_dihedral_linear(self):
"""The dihedral function should fail for collinear vectors."""
v1 = [1,0,0]
v2 = [2,0,0]
v3 = [3,0,0]
v4 = [4,0,0]
# print dihedral(v1,v2,v3,v4)
for i in range(100):
offset = array([int((random()-0.5)*10),\
int((random()-0.5)*10),\
int((random()-0.5)*10)])
v1 = array([int((random()-0.5)*100),\
int((random()-0.5)*100),\
int((random()-0.5)*100)])
v2 = v1 * int((random()-0.5)*100) + offset
v3 = v1 * int((random()-0.5)*100) + offset
v4 = v1 * int((random()-0.5)*100) + offset
v1 += offset
self.assertRaises(DihedralGeometryError,dihedral,v1,v2,v3,v4)
def test_dihedral_identical(self):
"""The dihedral function should fail if two vectors are the same."""
# except for the first and last (the vectors form a triangle),
# in which case the dihedral angle should be 0.0
for i in range(100):
v1 = self.get_random_array()
v2 = self.get_random_array()
v3 = self.get_random_array()
self.assertRaises(DihedralGeometryError,dihedral,v1,v1,v2,v3)
self.assertRaises(DihedralGeometryError,dihedral,v1,v2,v1,v3)
self.assertRaises(DihedralGeometryError,dihedral,v1,v2,v2,v3)
self.assertRaises(DihedralGeometryError,dihedral,v1,v2,v3,v3)
self.assertRaises(DihedralGeometryError,dihedral,v1,v3,v2,v3)
# now the triangular case
# make sure that 359.999998 is equal to 0.0
torsion = dihedral(v1,v2,v3,v1) + 0.000001
if torsion > 360.0: torsion -= 360.0
self.assertAlmostEqualAngle(torsion,0.0,5)
SCIPY_ANGLES = [
[(-4.4891521637990852, -1.2310927013330153, -0.96969777583098771),
(4.2147455310344171, -3.5069051036633514, 2.2430685816310305),
2.2088870817461759] ,
[(0.13959847081794097, 1.7204537912940399, -1.9303780516641089),
(0.35412687539602361, -2.9493521724340743, -4.865941405480644),
1.2704043143950585] ,
[(2.3192363837822327, -3.6376441859213848, -2.2337816400479813),
(-1.0271253661119029, -2.5736009846920425, -4.1470855710278975),
0.83609068310373857] ,
[(-0.38347986357358477, -4.1453876196041719, -2.1354583394773785),
(0.27416747968044608, -2.5747732838982551, -0.68554680652905264),
0.28348352552436806] ,
[(-2.4928231204363449, 1.9263125976608209, -0.34275964486924715),
(0.6721152528064811, 1.5270465172130598, -3.5720133753579564),
1.3701121510959966] ,
[(-1.50101403139692, 2.3218275982958292, 1.044582416480222),
(-3.044743729573085, 2.0655933798619532, 2.9037849925327897),
0.46218988498537578] ,
[(4.9648826388927603, -1.7439743270051977, 1.0432135258334796),
(-3.3694557299188608, 3.7697639370274052, 2.6962018714965055),
2.3004031013653625] ,
[(-3.3337033325729051, -0.79660906888508021, -3.4875326261817454),
(1.4735023133671066, -0.066399047153666846, 0.94171530437632489),
2.8293957790283595] ,
[(-2.1249404252000517, 2.7456001658201568, 1.6891202129451799),
(-0.66412553435299504, 3.371012200444512, -1.1548086037901306),
0.89846374464990042] ,
[(3.3993205618602018, 1.2047703532166887, -1.5839949555795063),
(-4.6759756026580863, -2.8551222890449646, 4.888270217692785),
2.7825564894291754] ,
[(-3.966467296275785, 0.75617096138383189, -3.1711352932360248),
(2.1054362220912326, 4.2867761689586601, -0.65739369331424213),
1.6933117193742961] ,
[(0.44413554305522851, 4.6000690382282361, -3.8338383756621819),
(-2.4947413565865029, 1.8136080147734013, -4.0295344084655405),
0.73110709489481174] ,
[(-1.0971777991639065, -3.3166205797568815, 2.7098739534055563),
(2.1536566381847289, 4.7817155120086055, -0.068554664323454695),
2.4878958372925202] ,
[(2.2696914760438136, 4.8841630875833673, 4.9524177412608861),
(1.2249510822623111, 0.73008672334971658, 1.131607772478449),
0.45741431674591532] ,
[(-2.4456899797216938, 4.7894200033986447, -2.839449354837468),
(0.95035116225980154, 4.2179212878828238, -1.801158217734109),
0.63144509227951684] ,
[(-4.6954041297179474, -2.8326266911591391, -1.1804869511610427),
(3.2585456362256924, -2.2325171051479265, -2.0527260317826901),
1.8363466110668369] ,
[(2.1416146613604283, 3.8577375591718677, 3.1463493245087939),
(-0.32185887240442468, 2.2163051363839505, 2.4704882512058224),
0.52534998201320993] ,
[(-2.8493351354335941, -3.8203784990110954, 2.4657357720402273),
(-2.7799043389229383, 4.358406526726669, -2.8319872383058744),
2.0906833125217235] ,
[(2.274223250163784, -3.6086250253596406, 1.7143006579401876),
(3.2763334328544347, -0.89908959703552171, -4.4068824993431557),
1.4477009361020545] ,
[(0.66737672421842809, -3.4628508908383848, 3.9044108358095366),
(-1.9078974719893915, -0.53231141116878433, 1.3323584972786728),
1.0932781951137689] ,
]
if __name__ == '__main__':
main()
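The dihedral tests fix a convention:

* results are in degrees in the range [0, 360);
* reversing the point order gives the same value;
* collinear or repeated points are rejected.

One standard way to get that behaviour is the atan2 formulation over the normals of the two planes, sketched below. `vector_angle`, `dihedral_degrees` and `GeometryError` are hypothetical names, not the API of `cogent.struct.dihedral`. The sketch reproduces the eight 45-degree cases from test_dihedral_eight_basic_directions.

```python
import math

import numpy as np


class GeometryError(ValueError):
    """Degenerate input (hypothetical stand-in for cogent's geometry errors)."""


def vector_angle(v1, v2):
    """Angle between two vectors in radians."""
    v1 = np.asarray(v1, dtype=float)
    v2 = np.asarray(v2, dtype=float)
    norm = np.linalg.norm(v1) * np.linalg.norm(v2)
    if norm == 0.0:
        raise GeometryError("angle undefined for zero-length vectors")
    # clip guards against |cos| creeping past 1 through rounding error
    return math.acos(float(np.clip(np.dot(v1, v2) / norm, -1.0, 1.0)))


def dihedral_degrees(p1, p2, p3, p4, eps=1e-12):
    """Dihedral angle of p1-p2-p3-p4 in degrees, in [0, 360)."""
    p1, p2, p3, p4 = (np.asarray(p, dtype=float) for p in (p1, p2, p3, p4))
    b1, b2, b3 = p2 - p1, p3 - p2, p4 - p3
    n1 = np.cross(b1, b2)          # normal of plane (p1, p2, p3)
    n2 = np.cross(b2, b3)          # normal of plane (p2, p3, p4)
    if np.linalg.norm(n1) < eps or np.linalg.norm(n2) < eps:
        raise GeometryError("dihedral undefined for collinear or repeated points")
    x = np.dot(n1, n2)
    y = np.linalg.norm(b2) * np.dot(b1, n2)
    return math.degrees(math.atan2(y, x)) % 360.0
```

Using atan2 rather than arccos of the normals' dot product keeps the sign, so the full 0 to 360 range is recovered. Reversing the points leaves both `x` and `y` unchanged, which is why the reversed call in test_dihedral_rotation gives the same angle.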
PyCogent-1.5.3/tests/test_struct/test_knots.py
#!/usr/bin/env python
# test_knots.py
"""Provides tests for classes and functions in the file knots.py
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.util.dict2d import Dict2D
from cogent.struct.rna2d import Pairs
from cogent.struct.knots import PairedRegion, PairedRegionFromPairs,\
PairedRegions, PairedRegionsFromPairs, ConflictMatrix,\
opt_all, contains_true, empty_matrix,\
pick_multi_best, dp_matrix_multi, matrix_solutions,\
opt_single_random, opt_single_property,\
inc_order, inc_length, inc_range,\
find_max_conflicts, find_min_gain,\
conflict_elimination, add_back_non_conflicting,\
num_bps, hydrogen_bonds,\
nussinov_fill, nussinov_traceback, nussinov_restricted
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class PairedRegionTests(TestCase):
"""Tests for PairedRegion class"""
def test_init_valid(self):
"""PairedRegion __init__: should work as expected on valid input
"""
pr = PairedRegion(3,10,2)
self.assertEqual(pr.Start, 3)
self.assertEqual(pr.End, 10)
self.assertEqual(pr.Length, 2)
self.assertEqual(pr.Pairs, [(3,10),(4,9)])
pr = PairedRegion(3,10,2, Id=0)
self.assertEqual(pr.Id, 0)
pr = PairedRegion(3,10,2, Id='A')
self.assertEqual(pr.Id, 'A')
self.assertRaises(ValueError, PairedRegion, 4, 10, 0)
def test_init_weird(self):
"""PairedRegion __init__: no error checking
"""
pr = PairedRegion(3,6,4)
self.assertEqual(pr.Start, 3)
self.assertEqual(pr.End, 6)
self.assertEqual(pr.Length, 4)
self.assertEqual(pr.Pairs, [(3,6),(4,5),(5,4),(6,3)])
def test_str(self):
"""PairedRegion __str__: should print pairs"""
pr = PairedRegion(3,10,2)
p = Pairs([(3,10),(4,9)])
self.assertEqual(str(pr), str(p))
def test_len(self):
"""PairedRegion __len__: should return number of pairs"""
pr = PairedRegion(3,10,2)
self.assertEqual(len(pr), 2)
def test_eq(self):
"""PairedRegion __eq__: should use pairs and IDs"""
pr1 = PairedRegion(3,10,2)
pr2 = PairedRegion(3,10,2)
pr3 = PairedRegion(3,10,2, Id='A')
pr4 = PairedRegion(3,10,2, Id='A')
pr5 = PairedRegion(3,20,4, Id='A')
self.assertEqual(pr1==pr2, True) # same pairs, no IDs
self.assertEqual(pr3==pr4, True) # same pairs, same IDs
self.assertEqual(pr1==pr3, False) # same pairs, diff ID
self.assertEqual(pr3==pr5, False) # diff pairs, same IDs
def test_upstream(self):
"""PairedRegion upstream: single and multiple pair(s)"""
pr = PairedRegion(3,10,2)
self.assertEqual(pr.upstream(), [3,4])
pr = PairedRegion(3,10,1)
self.assertEqual(pr.upstream(), [3])
def test_downstream(self):
"""PairedRegion downstream: single and multiple pair(s)"""
pr = PairedRegion(3,10,2)
self.assertEqual(pr.downstream(), [9,10])
pr = PairedRegion(3,10,1)
self.assertEqual(pr.downstream(), [10])
def test_paired(self):
"""PairedRegion paired: single and multiple pair(s)"""
pr = PairedRegion(3,10,2)
self.assertEqual(pr.paired(), [3,4,9,10])
pr = PairedRegion(3,10,1)
self.assertEqual(pr.paired(), [3,10])
def test_regionRange(self):
"""PairedRegion range: single and multiple pair(s)"""
pr = PairedRegion(3,10,2)
self.assertEqual(pr.range(), 4)
pr = PairedRegion(1,10,4)
self.assertEqual(pr.range(), 2)
pr = PairedRegion(1,5,1)
self.assertEqual(pr.range(), 3)
# no error checking
pr = PairedRegion(5,8,3) # 5,6,7,-- 6,7,8
self.assertEqual(pr.range(), -2)
def test_overlapping(self):
"""PairedRegion overlapping: identical and different regions"""
pr1 = PairedRegion(1,10,2)
pr2 = PairedRegion(3,15,2)
self.assertEqual(pr1.overlapping(pr2), False)
self.assertEqual(pr2.overlapping(pr1), False)
pr1 = PairedRegion(1,10,2)
pr2 = PairedRegion(2,15,2)
pr3 = PairedRegion(9,20,4)
self.assertEqual(pr1.overlapping(pr2), True)
self.assertEqual(pr2.overlapping(pr1), True)
self.assertEqual(pr1.overlapping(pr3), True)
bl1 = PairedRegion(2,10,1)
bl2 = PairedRegion(12,20,3)
self.assertEqual(bl1.overlapping(bl2), False)
pr1 = PairedRegion(1,10,2, Id='A')
pr2 = PairedRegion(1,10,2, Id='A')
pr3 = PairedRegion(1,10,2, Id='B')
self.assertEqual(pr1.overlapping(pr2), True)
self.assertEqual(pr1.overlapping(pr3), True) # ignore ID
def test_conflicting(self):
"""PairedRegion conflicting: identical, nested and pseudoknot"""
bl1 = PairedRegion(2,10,3)
bl2 = PairedRegion(12,20,3)
# identical blocks are NOT conflicting...
self.assertEqual(bl1.conflicting(bl1), False) # identical blocks
self.assertEqual(bl1.conflicting(bl2), False) # one after the other
self.assertEqual(bl2.conflicting(bl1), False) # one after the other
bl1 = PairedRegion(1,30,2) #[(1,30),(2,29)]
bl2 = PairedRegion(14,20,2) #[(14,20),(15,19)]
self.assertEqual(bl1.conflicting(bl2), False) # one inside the other
self.assertEqual(bl2.conflicting(bl1), False) # one inside the other
bl1 = PairedRegion(1,10,2) #[(1,10),(2,9)]
bl2 = PairedRegion(4,15,3) #[(4,15),(5,14),(6,13)]
self.assertEqual(bl1.conflicting(bl2), True) # pseudoknot
self.assertEqual(bl2.conflicting(bl1), True) # pseudoknot
def test_score(self):
"""PairedRegion score: should take arbitrary scoring function"""
f = lambda x: x.Length # scoring function
bl1 = PairedRegion(2,10,3)
bl2 = PairedRegion(12,30,4)
bl1.score(f) # set Score attribute for bl1
bl2.score(f) # set Score attribute for bl2
self.assertEqual(bl1.Score, 3)
self.assertEqual(bl2.Score, 4)
def test_PairedRegionFromPairs(self):
"""PairedRegionFromPairs: should handle valid input"""
p = Pairs([(3,10),(4,9),(5,8)])
pr = PairedRegionFromPairs(p, Id='A')
self.assertEqual(pr.Start, 3)
self.assertEqual(pr.End, 10)
self.assertEqual(pr.Length, 3)
self.assertEqual(pr.Id, 'A')
self.assertEqual(pr.Pairs, [(3,10),(4,9),(5,8)])
def test_PairedRegionFromPairs_invalid(self):
"""PairedRegionFromPairs: conflicts and error checking"""
p = Pairs([(3,10),(4,9),(4,None)])
self.assertRaises(ValueError, PairedRegionFromPairs, p)
# no error checking on input pairs...
p = Pairs([(3,10),(4,9),(6,8)]) # not a real paired region
pr = PairedRegionFromPairs(p, Id='A')
self.assertEqual(pr.Start, 3)
self.assertEqual(pr.End, 10)
self.assertEqual(pr.Length, 3)
self.assertEqual(pr.Id, 'A')
# NOTE: Pairs will differ from the input because the assumption of a
# contiguous (stacked) helix does not hold
self.assertEqual(pr.Pairs, [(3,10),(4,9),(5,8)])
self.assertRaises(ValueError, PairedRegionFromPairs, [])
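# Illustrative sketch (hypothetical helpers, not part of cogent): a minimal
# model of the PairedRegion geometry exercised in PairedRegionTests above.
# It assumes a region (start, end, length) stacks the pairs (start+i, end-i)
# for i in range(length), which reproduces e.g. the upstream/downstream and
# range() expectations. Two regions are treated as conflicting (a
# pseudoknot) when their outermost pairs cross, i.e. s1 < s2 < e1 < e2;
# this assumes the two regions share no positions.
def _sketch_region_pairs(start, end, length):
    """Return the base pairs of a helix given as (start, end, length)."""
    return [(start + i, end - i) for i in range(length)]


def _sketch_region_range(start, end, length):
    """Positions enclosed by the innermost pair (no error checking)."""
    innermost_up = start + length - 1
    innermost_down = end - length + 1
    return innermost_down - innermost_up - 1


def _sketch_regions_conflict(outer_a, outer_b):
    """True if two helices, given by their outermost pairs, cross."""
    (s1, e1), (s2, e2) = sorted([outer_a, outer_b])
    return s1 < s2 < e1 < e2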
class PairedRegionsTests(TestCase):
"""Tests for PairedRegions class"""
def test_init(self):
"""PairedRegions __init__: should accept list of PairedRegion objects
"""
pr1 = PairedRegion(3,10,2)
pr2 = PairedRegion(12,20,3)
obs = PairedRegions([pr1, pr2])
self.assertEqual(obs[0].Start, 3)
self.assertEqual(obs[1].Id, None)
self.assertEqual(obs[1].End, 20)
def test_init_no_validation(self):
"""PairedRegions __init__: does not perform validation
"""
# any list of arbitrary objects is accepted as input
obs = PairedRegions([1,2,3])
self.assertEqual(obs[0], 1)
def test_str(self):
"""PairedRegions __str__: full and empty list
"""
pr1 = PairedRegion(3,10,2)
pr2 = PairedRegion(12,20,3, Id='A')
prs = PairedRegions([pr1, pr2])
self.assertEqual(str(prs), "(None:3,10,2; A:12,20,3;)")
def test_eq(self):
"""PairedRegions __eq__: with/without IDs, in or out of order
"""
pr1 = PairedRegion(3,10,2)
pr2 = PairedRegion(12,20,3, Id='A')
prs1 = PairedRegions([pr1, pr2])
pr3 = PairedRegion(3,10,3)
pr4 = PairedRegion(20,30,3, Id='A')
prs2 = PairedRegions([pr3, pr4])
pr5 = PairedRegion(3,10,2)
pr6 = PairedRegion(12,20,3, Id='A')
prs3 = PairedRegions([pr5, pr6])
prs4 = PairedRegions([pr6, pr5])
self.assertEqual(prs1==prs2, False)
self.assertEqual(prs1==prs1, True)
self.assertEqual(prs1==prs3, True)
self.assertEqual(prs1==prs4, True)
def test_ne(self):
"""PairedRegions __ne__: with/without IDs, in or out of order
"""
pr1 = PairedRegion(3,10,2)
pr2 = PairedRegion(12,20,3, Id='A')
prs1 = PairedRegions([pr1, pr2])
pr3 = PairedRegion(3,10,3)
pr4 = PairedRegion(20,30,3, Id='A')
prs2 = PairedRegions([pr3, pr4])
pr5 = PairedRegion(3,10,2)
pr6 = PairedRegion(12,20,3, Id='A')
prs3 = PairedRegions([pr5, pr6])
prs4 = PairedRegions([pr6, pr5])
self.assertEqual(prs1!=prs2, True)
self.assertEqual(prs1!=prs1, False)
self.assertEqual(prs1!=prs3, False)
self.assertEqual(prs1!=prs4, False)
def test_byId(self):
"""PairedRegions byId: unique IDs and duplicates
"""
pr1 = PairedRegion(3,10,2, Id='A')
pr2 = PairedRegion(12,20,3, Id='B')
prs1 = PairedRegions([pr1, pr2])
obs = prs1.byId()
self.assertEqual(obs['A'], pr1)
self.assertEqual(obs['B'], pr2)
self.assertEqual(len(obs), 2)
pr3 = PairedRegion(3,10,2, Id='A')
pr4 = PairedRegion(12,20,3, Id='A')
prs2 = PairedRegions([pr3, pr4])
self.assertRaises(ValueError, prs2.byId)
pr3 = PairedRegion(3,10,2)
pr4 = PairedRegion(12,20,3)
prs2 = PairedRegions([pr3, pr4])
self.assertRaises(ValueError, prs2.byId)
self.assertEqual(PairedRegions().byId(), {})
def test_numberOfRegions(self):
"""PairedRegions numberOfRegions: full and empty"""
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4)
prs1 = PairedRegions([pr1,pr2])
prs2 = PairedRegions([pr1,pr1,pr2,pr2])
self.assertEqual(prs1.numberOfRegions(), 2)
self.assertEqual(prs2.numberOfRegions(), 4)
self.assertEqual(PairedRegions().numberOfRegions(), 0)
def test_totalLength(self):
"""PairedRegions totalLength: full and empty"""
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4)
prs1 = PairedRegions([pr1,pr2])
self.assertEqual(prs1.totalLength(), 6)
self.assertEqual(PairedRegions().totalLength(), 0)
def test_totalScore(self):
"""PairedRegions totalScore: full, empty, None"""
pr1 = PairedRegion(2,10,2)
pr1.Score = 3
pr2 = PairedRegion(11,20,4)
pr2.Score = 2
pr3 = PairedRegion(11,20,4)
pr3.Score = None
pr4 = PairedRegion(11,20,4)
pr4.Score = "abc"
pr5 = PairedRegion(11,20,4)
prs1 = PairedRegions([pr1,pr2])
prs2 = PairedRegions([pr1,pr3])
prs3 = PairedRegions([pr1,pr4])
prs4 = PairedRegions([pr1,pr5])
self.assertEqual(prs1.totalScore(), 5)
self.assertRaises(ValueError, prs2.totalScore)
self.assertRaises(ValueError, prs3.totalScore)
self.assertRaises(ValueError, prs4.totalScore)
def test_toPairs(self):
"""PairedRegions toPairs: good data"""
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4)
prs1 = PairedRegions([pr2,pr1])
exp = [(2,10),(3,9),(11,20),(12,19),(13,18),(14,17)]
self.assertEqual(prs1.toPairs(), exp)
prs2 = PairedRegions([pr1,pr1])
exp = [(2,10),(2,10),(3,9),(3,9)]
self.assertEqual(prs2.toPairs(), exp)
self.assertEqual(PairedRegions().toPairs(), Pairs())
def test_byStartEnd(self):
"""PairedRegions byStartEnd: unique and duplicate keys"""
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4)
prs1 = PairedRegions([pr2,pr1])
exp = {(2,10): pr1, (11,20): pr2}
self.assertEqual(prs1.byStartEnd(), exp)
pr3 = PairedRegion(2,10,2, Id='A')
pr4 = PairedRegion(2,10,3, Id='B')
prs2 = PairedRegions([pr3,pr4])
self.assertRaises(ValueError, prs2.byStartEnd)
def test_lowestStart(self):
"""PairedRegions lowestStart: full and empty object"""
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4)
pr3 = PairedRegion(2,30,5,Id='A')
prs1 = PairedRegions([pr2,pr1,pr3])
self.assertEqual(prs1.lowestStart(), 2)
self.assertEqual(PairedRegions().lowestStart(), None)
def test_highestEnd(self):
"""PairedRegions highestEnd: full and empty object"""
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4)
pr3 = PairedRegion(2,30,5,Id='A')
prs1 = PairedRegions([pr2,pr1,pr3])
self.assertEqual(prs1.highestEnd(), 30)
self.assertEqual(PairedRegions().highestEnd(), None)
def test_sortedIds(self):
"""PairedRegions sortedIds: full and empty list"""
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(2,30,5,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
self.assertEqual(prs1.sortedIds(), ['A','B','C'])
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(2,30,5,Id='C')
prs1 = PairedRegions([pr1,pr2,pr3])
self.assertEqual(prs1.sortedIds(), ['A','C','C'])
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(2,30,5,Id=2)
prs1 = PairedRegions([pr1,pr2,pr3])
self.assertEqual(prs1.sortedIds(), [None, 2, 'A'])
def test_upstream(self):
"""PairedRegions upstream: full and empty"""
self.assertEqual(PairedRegions().upstream(), [])
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(4,30,1,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
exp = [2,3,11,12,13,14,4]
exp.sort()
self.assertEqual(prs1.upstream(), exp)
def test_downstream(self):
"""PairedRegions downstream: full and empty"""
self.assertEqual(PairedRegions().downstream(), [])
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(4,30,1,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
exp = [10,9,20,19,18,17,30]
exp.sort()
self.assertEqual(prs1.downstream(), exp)
def test_pairedPos(self):
"""PairedRegions pairedPos: full and empty"""
self.assertEqual(PairedRegions().pairedPos(), [])
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(4,30,1,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
exp = [2,3,11,12,13,14,4,10,9,20,19,18,17,30]
exp.sort()
self.assertEqual(prs1.pairedPos(), exp)
def test_boundaries(self):
"""PairedRegions boundaries: full and empty"""
self.assertEqual(PairedRegions().boundaries(), [])
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(4,30,1,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
exp = [2,10,11,20,4,30]
exp.sort()
self.assertEqual(prs1.boundaries(), exp)
def test_enumeratedBoundaries(self):
"""PairedRegions enumeratedBoundaries: full and empty"""
self.assertEqual(PairedRegions().enumeratedBoundaries(), {})
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(4,30,1,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
exp = {0:2,2:10,3:11,4:20,1:4,5:30}
self.assertEqual(prs1.enumeratedBoundaries(), exp)
def test_invertedEnumeratedBoundaries(self):
"""PairedRegions invertedEnumeratedBoundaries: full and empty"""
self.assertEqual(PairedRegions().invertedEnumeratedBoundaries(), {})
pr1 = PairedRegion(2,10,2, Id='C')
pr2 = PairedRegion(11,20,4, Id='A')
pr3 = PairedRegion(4,30,1,Id='B')
prs1 = PairedRegions([pr1,pr2,pr3])
exp = {2:0,10:2,11:3,20:4,4:1,30:5}
self.assertEqual(prs1.invertedEnumeratedBoundaries(), exp)
pr1 = PairedRegion(3,10,2)
pr2 = PairedRegion(5,10,3)
prs = PairedRegions([pr1, pr2])
self.assertRaises(ValueError, prs.invertedEnumeratedBoundaries)
def test_merge(self):
"""PairedRegions merge: different, duplicates, empty"""
pr1 = PairedRegion(3,10,2, Id='A')
pr2 = PairedRegion(11,20,3, Id='B')
pr3 = PairedRegion(15,25,1, Id='C')
prs1 = PairedRegions([pr1, pr2])
prs2 = PairedRegions([pr1, pr3])
prs3 = PairedRegions()
exp = PairedRegions([pr1,pr2,pr3])
self.assertEqual(prs1.merge(prs2), exp)
self.assertEqual(prs2.merge(prs1), exp)
self.assertEqual(prs1.merge(prs3), prs1)
self.assertEqual(prs2.merge(prs3), prs2)
def test_conflicting_no_ids(self):
"""PairedRegions conflicting: raises error on duplicate IDs
"""
pr1 = PairedRegion(1,10,2)
pr2 = PairedRegion(11,20,2)
prs = PairedRegions([pr1, pr2])
self.assertRaises(ValueError, prs.conflicting) # duplicate IDs (both None)
def test_conflicting(self):
"""PairedRegions conflicting: works when IDs are set and unique
"""
pr1 = PairedRegion(3,10,2, Id='A')
pr2 = PairedRegion(11,20,3, Id='B')
pr3 = PairedRegion(15,25,1, Id='C')
prs = PairedRegions([pr1,pr2,pr3])
self.assertEqual(prs.conflicting(), PairedRegions([pr2,pr3]))
prs = PairedRegions()
self.assertEqual(prs.conflicting(), PairedRegions())
def test_non_conflicting_no_ids(self):
"""PairedRegions nonConflicting: raises error on duplicate IDs
"""
pr1 = PairedRegion(1,10,2)
pr2 = PairedRegion(11,20,2)
prs = PairedRegions([pr1, pr2])
self.assertRaises(ValueError, prs.nonConflicting) # duplicate IDs (both None)
def test_non_conflicting(self):
"""PairedRegions nonConflicting: works when IDs are set and unique
"""
pr1 = PairedRegion(3,10,2, Id='A')
pr2 = PairedRegion(11,20,3, Id='B')
pr3 = PairedRegion(15,25,1, Id='C')
prs = PairedRegions([pr1,pr2,pr3])
self.assertEqual(prs.nonConflicting(), PairedRegions([pr1]))
prs = PairedRegions()
self.assertEqual(prs.nonConflicting(), PairedRegions())
def test_conflictCliques(self):
"""PairedRegions conflictCliques: should work when IDs are unique"""
pr1 = PairedRegion(3,10,2, Id='A')
pr2 = PairedRegion(11,20,3, Id='B')
pr3 = PairedRegion(15,25,1, Id='C')
pr4 = PairedRegion(30,40,2, Id='D')
pr5 = PairedRegion(28,35,1, Id='E')
prs = PairedRegions([pr1,pr2,pr3,pr4, pr5])
obs = prs.conflictCliques()
exp = [PairedRegions([pr2,pr3]),PairedRegions([pr5,pr4])]
for i in obs:
self.failUnless(i in exp)
self.assertEqual(len(obs), len(exp))
prs = PairedRegions()
self.assertEqual(prs.conflictCliques(), [])
def test_PairedRegionsFromPairs(self):
"""PairedRegionsFromPairs: should work on valid input"""
p = Pairs([(1,10),(2,9),(12,20),(13,19),(14,18)])
prs = PairedRegionsFromPairs(p)
self.assertEqual(len(prs), 2)
self.assertEqual(prs[0].Id, 0)
self.assertEqual(prs[0].Pairs, [(1,10),(2,9)])
self.assertEqual(prs[0].Start, 1)
self.assertEqual(prs[0].End, 10)
self.assertEqual(PairedRegionsFromPairs(Pairs()), PairedRegions())
def test_PairedRegionsFromPairs_conflict(self):
"""PairedRegionsFromPairs: should raise error on overlapping pairs"""
p = Pairs([(2,20),(5,10),(10,15)])
self.assertRaises(ValueError, PairedRegionsFromPairs, p)
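# Illustrative sketch (hypothetical helper, not cogent's implementation):
# one way to split base pairs into maximal stacked helices, as
# PairedRegionsFromPairs is expected to do in the tests above. It assumes
# pairs are (upstream, downstream) tuples with upstream < downstream, and
# rejects input in which a position occurs in more than one pair, mirroring
# the ValueError expected for overlapping pairs.
def _sketch_group_into_helices(pairs):
    """Group (up, down) pairs into lists of stacked pairs, 5' to 3'."""
    seen = set()
    for up, down in pairs:
        if up in seen or down in seen:
            raise ValueError("position %s or %s is paired twice" % (up, down))
        seen.update((up, down))
    helices = []
    for up, down in sorted(pairs):
        if helices:
            last_up, last_down = helices[-1][-1]
            if (up, down) == (last_up + 1, last_down - 1):
                helices[-1].append((up, down))  # extends the current helix
                continue
        helices.append([(up, down)])  # starts a new helix
    return helices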
class ConflictMatrixTests(TestCase):
"""Tests for ConflictMatrix class"""
def test_conflict_matrix_from_pairs(self):
"""ConflictMatrix __init__: Pairs as input, w/wo conflict"""
f = ConflictMatrix
# conflict free
d = [(1,10),(2,9),(12,20),(13,19),(14,18)]
exp = Dict2D({0:{0:False,1:False},1:{0:False,1:False}})
self.assertEqual(f(d).Matrix, exp)
self.failIf(not isinstance(f(d).Matrix, Dict2D))
# 1 conflict
d = [(1,10),(2,9),(12,20),(13,19),(14,18),(15,30),(16,29)]
exp = Dict2D({0:{0:False,1:False,2:False},\
1:{0:False,1:False,2:True},\
2:{0:False,1:True,2:False}})
self.assertEqual(f(d).Matrix, exp)
# 1 conflict
d = Pairs([(1,10),(2,9),(12,20),(13,19),(14,18),(15,30),(16,29)])
exp = Dict2D({0:{0:False,1:False,2:False},\
1:{0:False,1:False,2:True},\
2:{0:False,1:True,2:False}})
m = f(d).Matrix
self.assertEqual(m, exp)
self.assertEqual(m.RowOrder, [0,1,2])
self.assertEqual(m.ColOrder, [0,1,2])
d = [] # empty input
exp = Dict2D()
self.assertEqual(f(d).Matrix, exp)
def test_ConflictMatrix_Pairs_overlap(self):
"""ConflictMatrix __init__: raises error on overlapping pairs"""
p = Pairs([(1,10),(2,9),(3,9),(12,20)])
self.assertRaises(ValueError, ConflictMatrix, p)
def test_conflict_matrix_from_PairedRegions(self):
"""ConflictMatrix __init__: PairedRegions as input, w/wo conflict
"""
f = ConflictMatrix
# conflict free
pr1 = PairedRegion(1,10,2, Id=0)
pr2 = PairedRegion(12,20,3, Id=1)
prs = PairedRegions([pr1,pr2])
exp = Dict2D({0:{0:False,1:False},1:{0:False,1:False}})
self.assertEqual(f(prs).Matrix, exp)
self.failIf(not isinstance(f(prs).Matrix, Dict2D))
pr1 = PairedRegion(1,10,2, Id=0)
pr2 = PairedRegion(12,20,3, Id=1)
pr3 = PairedRegion(15,30,2, Id=2)
prs = PairedRegions([pr1,pr2, pr3])
# 1 conflict
exp = Dict2D({0:{0:False,1:False,2:False},\
1:{0:False,1:False,2:True},\
2:{0:False,1:True,2:False}})
self.assertEqual(f(prs).Matrix, exp)
# 1 conflict
pr1 = PairedRegion(1,10,2, Id=4)
pr2 = PairedRegion(12,20,3, Id=1)
pr3 = PairedRegion(15,30,2, Id=9)
prs = PairedRegions([pr1,pr2, pr3])
exp = Dict2D({1:{4:False,1:False,9:True},\
4:{1:False,4:False,9:False},\
9:{1:True,4:False,9:False}})
m = f(prs).Matrix
self.assertEqual(m, exp)
self.assertEqual(m.RowOrder, [1,4,9])
self.assertEqual(m.ColOrder, [1,4,9])
prs = PairedRegions()
exp = Dict2D()
self.assertEqual(f(prs).Matrix, exp)
# input some weird data. Other errors might occur.
self.assertRaises(ValueError, f, 'ABC')
self.assertRaises(ValueError, f, [('a','b'),('c','d')])
def test_ConflictMatrix_PairedRegions_overlap(self):
"""ConflictMatrix __init__: raises error on overlapping PairedRegions
"""
pr1 = PairedRegion(1,10,2, Id='A')
pr2 = PairedRegion(8,20,2, Id='B')
prs = PairedRegions([pr1, pr2])
self.assertRaises(ValueError, ConflictMatrix, prs)
def test_conflictsOf(self):
"""ConflictMatrix conflictsOf: with/without conflicts"""
p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)])
cm = ConflictMatrix(p)
self.assertEqual(cm.conflictsOf(0), [])
self.assertEqual(cm.conflictsOf(1), [2])
self.assertEqual(cm.conflictsOf(2), [1])
self.assertEqual(cm.conflictsOf(3), [4,5])
p = Pairs([(1,10),(11,20)])
cm = ConflictMatrix(p)
self.assertEqual(cm.conflictsOf(0), [])
self.assertEqual(cm.conflictsOf(1), [])
self.assertRaises(KeyError, cm.conflictsOf, 2)
def test_conflicting(self):
"""ConflictMatrix conflicting: full and empty Pairs"""
p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)])
cm = ConflictMatrix(p)
obs = cm.conflicting()
exp = [1,2,3,4,5]
self.assertEqual(obs, exp)
self.assertEqual(ConflictMatrix(Pairs()).conflicting(), [])
def test_nonConflicting(self):
"""ConflictMatrix nonConflicting: full and empty Pairs"""
p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)])
cm = ConflictMatrix(p)
obs = cm.nonConflicting()
exp = [0]
self.assertEqual(obs, exp)
self.assertEqual(ConflictMatrix(Pairs()).nonConflicting(), [])
def test_conflictCliques(self):
"""ConflictMatrix conflictCliques: full and empty Pairs"""
p = Pairs([(1,10),(5,15),(20,30),(25,35),(24,32),(0,80)])
cm = ConflictMatrix(p)
obs = cm.conflictCliques()
exp = [[1,2],[3,4,5]]
self.assertEqual(obs, exp)
self.assertEqual(ConflictMatrix(Pairs()).conflictCliques(), [])
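# Illustrative sketch (hypothetical helper, not cogent's implementation) of
# the conflict bookkeeping tested in ConflictMatrixTests. Regions are given
# by their outermost pairs and numbered in order of start position; two
# regions conflict when their pairs cross (s1 < s2 < e1 < e2). The
# "conflict cliques" are assumed here to be the connected components of the
# conflict graph, leaving out conflict-free regions; this reproduces the
# expectations above for single-pair regions.
def _sketch_conflict_cliques(outer_pairs):
    """Return sorted lists of region indices that are linked by conflicts."""
    regions = sorted(outer_pairs)

    def crosses(a, b):
        (s1, e1), (s2, e2) = sorted([a, b])
        return s1 < s2 < e1 < e2

    neighbours = dict(
        (i, [j for j in range(len(regions))
             if j != i and crosses(regions[i], regions[j])])
        for i in range(len(regions)))
    cliques, visited = [], set()
    for i in range(len(regions)):
        if i in visited or not neighbours[i]:
            continue  # already grouped, or conflict-free
        stack, component = [i], []
        visited.add(i)
        while stack:  # depth-first walk over conflicting regions
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if other not in visited:
                    visited.add(other)
                    stack.append(other)
        cliques.append(sorted(component))
    return cliques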
class DPTests(TestCase):
"""Tests for opt_all and related functions"""
def test_num_bps(self):
"""num_bps: should return length of paired region"""
f = num_bps
pr1 = PairedRegion(0,10,3)
self.assertEqual(f(pr1), 3)
def test_hydrogen_bonds(self):
"""hydrogen_bonds: score GC, AU, and GU base pairs"""
f = hydrogen_bonds('UACGAAAUGCGUG')
pr1 = PairedRegion(0,12,5)
self.assertEqual(f(pr1),10)
f = hydrogen_bonds('UACGAAA') # sequence too short
pr1 = PairedRegion(0,12,5)
self.assertRaises(IndexError, f, pr1)
def test_contains_true(self):
"""contains_true: should return True if True in input"""
f = contains_true
self.assertEqual(f([True]), True)
self.assertEqual(f([True, False]), True)
self.assertEqual(f([1, 0]), True)
self.assertEqual(f([1]), True)
self.assertEqual(f([False]), False)
self.assertEqual(f([3]), False)
self.assertEqual(f(["a","b","c"]), False)
self.assertEqual(f("abc"), False)
def test_empty_matrix(self):
"""empty_matrix: valid input and error"""
f = empty_matrix
p = PairedRegions()
exp = [[[p],[p]], [[p],[p]]]
self.assertEqual(f(2), exp)
self.assertEqual(f(1), [[[p]]])
self.assertRaises(ValueError, f, 0)
def test_pick_multi_best_max(self):
"""pick_multi_best: max, full and empty list"""
pr1 = PairedRegion(2,10,2, Id='A')
pr2 = PairedRegion(4,15,3, Id='B')
pr3 = PairedRegion(20,40,5, Id='C')
pr4 = PairedRegion(22,30,3, Id='D')
for i in [pr1,pr2,pr3,pr4]:
i.score(num_bps)
prs1 = PairedRegions([pr1, pr2])
prs2 = PairedRegions([pr3])
prs3 = PairedRegions([pr4])
self.assertEqualItems(pick_multi_best([prs1, prs2, prs3]), [prs1, prs2])
self.assertEqual(pick_multi_best([]), [PairedRegions()])
def test_pick_multi_best_min(self):
"""pick_multi_best: min, full and empty list"""
f = lambda x: -1
pr1 = PairedRegion(2,10,2)
pr2 = PairedRegion(4,15,3)
pr3 = PairedRegion(20,40,5)
pr4 = PairedRegion(22,30,3)
for i in [pr1,pr2,pr3,pr4]:
i.score(f)
prs1 = PairedRegions([pr1, pr2])
prs2 = PairedRegions([pr3])
prs3 = PairedRegions([pr4])
self.assertEqual(pick_multi_best([prs1, prs2, prs3], goal='min'),\
[prs1])
self.assertEqual(pick_multi_best([], goal='min'), [PairedRegions()])
def test_dp_matrix_multi_toy(self):
"""dp_matrix_multi: test on initial toy example"""
pr0 = PairedRegion(0, 70, 2, Id='C')
pr1 = PairedRegion(10, 30, 4, Id='A')
pr2 = PairedRegion(20, 50, 3, Id='B')
pr3 = PairedRegion(40, 90, 2, Id='E')
pr4 = PairedRegion(60, 80, 3, Id='D')
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4])
obs = dp_matrix_multi(prs)
self.assertEqual(obs[0][0], [PairedRegions()])
self.assertEqual(obs[0][3], [PairedRegions([pr1])])
self.assertEqual(obs[2][5], [PairedRegions([pr2])])
self.assertEqual(obs[4][9], [PairedRegions([pr3,pr4])])
self.assertEqual(obs[2][9], [PairedRegions([pr2,pr4])])
self.assertEqual(obs[1][8], [PairedRegions([pr1,pr4])])
self.assertEqual(obs[1][9], [PairedRegions([pr1,pr3,pr4])])
self.assertEqual(obs[0][9], [PairedRegions([pr1,pr3,pr4])])
def test_dp_matrix_multi_lsu(self):
"""dp_matrix_multi: test on LSU rRNA domain I case"""
pr0 = PairedRegion(56, 69, 3, Id=0)
pr1 = PairedRegion(60, 92, 1, Id=1)
pr2 = PairedRegion(62, 89, 3, Id=2)
pr3 = PairedRegion(75, 109, 6, Id=3)
pr4 = PairedRegion(84, 96, 3, Id=4)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4])
obs = dp_matrix_multi(prs)
self.assertEqual(obs[0][0], [PairedRegions()])
self.assertEqual(obs[0][5], [PairedRegions([pr0])])
self.assertEqual(obs[1][6], [PairedRegions([pr2])])
self.assertEqual(obs[1][7], [PairedRegions([pr1,pr2])])
self.assertEqualItems(obs[2][8],\
[PairedRegions([pr2]),PairedRegions([pr4])])
self.assertEqual(obs[1][9], [PairedRegions([pr3,pr4])])
self.assertEqual(obs[0][9], [PairedRegions([pr0,pr3,pr4])])
def test_dp_matrix_multi_artificial(self):
"""dp_matrix_multi: test on artificial structure"""
pr0 = PairedRegion(0, 77, 2, Id=0)
pr1 = PairedRegion(7, 75, 5, Id=1)
pr2 = PairedRegion(13, 83, 3, Id=2)
pr3 = PairedRegion(18, 41, 5, Id=3)
pr4 = PairedRegion(23, 53, 10, Id=4)
pr5 = PairedRegion(33, 70, 3, Id=5)
pr6 = PairedRegion(59, 93, 9, Id=6)
pr7 = PairedRegion(78, 96, 3, Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
obs = dp_matrix_multi(prs)
self.assertEqual(obs[0][0], [PairedRegions()])
self.assertEqual(obs[0][6], [PairedRegions([pr3])])
self.assertEqual(obs[0][7], [PairedRegions([pr4])])
self.assertEqual(obs[9][15], [PairedRegions([pr7])])
self.assertEqual(obs[1][10], [PairedRegions([pr1,pr4])])
self.assertEqual(obs[0][11], [PairedRegions([pr0,pr1,pr4])])
self.assertEqual(obs[3][14], [PairedRegions([pr4, pr6])])
self.assertEqual(obs[0][13], [PairedRegions([pr0, pr1, pr4])])
self.assertEqual(obs[0][14], [PairedRegions([pr4, pr6])])
self.assertEqual(obs[1][15], [PairedRegions([pr4, pr6])])
self.assertEqual(obs[0][15], [PairedRegions([pr0, pr1, pr4, pr7])])
def test_pick_multi_best_saturated(self):
"""pick_multi_best: should only include saturated solutions"""
pr1 = PairedRegion(2,10,2, Id='A')
pr1.Score = 2
pr2 = PairedRegion(15,25,2, Id='B')
pr2.Score = 2
pr3 = PairedRegion(4,22,4, Id='C')
pr3.Score = 0
prs1 = PairedRegions([pr1])
prs2 = PairedRegions([pr2])
prs3 = PairedRegions([pr1, pr3])
self.assertEqualItems(pick_multi_best([prs1, prs2, prs3]),\
[prs2, prs3])
self.assertEqual(pick_multi_best([]), [PairedRegions()])
def test_matrix_solutions(self):
"""matrix_solutions: should return contents of top-right cell"""
pr0 = PairedRegion(56, 69, 3, Id=0)
pr1 = PairedRegion(60, 92, 1, Id=1)
pr2 = PairedRegion(62, 89, 3, Id=2)
pr3 = PairedRegion(75, 109, 6, Id=3)
pr4 = PairedRegion(84, 96, 3, Id=4)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4])
obs = matrix_solutions(prs)
self.assertEqual(obs, [PairedRegions([pr0,pr3,pr4])])
# error, size should be at least 1
prs = PairedRegions()
self.assertRaises(ValueError, matrix_solutions, prs)
pr = PairedRegion(2,20, 5, Id='A')
prs = PairedRegions([pr])
obs = matrix_solutions(prs)
self.assertEqual(obs, [prs])
def test_opt_all_nested(self):
"""opt_all: should return input when already nested"""
p = Pairs([(1,10),(2,9),(20,30),(22,29)])
obs = opt_all(p)
self.assertEqual(len(obs),1)
self.assertEqual(obs[0], p)
p = Pairs()
self.assertEqual(opt_all(p), [[]])
def test_opt_all_overlap(self):
"""opt_all: should raise error on overlapping pairs"""
p = Pairs([(1,10),(2,9),(9,30),(22,29),(1,None)])
self.assertRaises(ValueError, opt_all, p)
def test_opt_all_knot(self):
"""opt_all: single/multiple solution(s)"""
p = Pairs([(1,10),(2,9),(3,15),(4,14),(11,20),(12,19),(25,30)])
obs = opt_all(p)
exp = Pairs([(1,10),(2,9),(11,20),(12,19),(25,30)])
exp_rem = [(3,15),(4,14)]
self.assertEqual(len(obs), 1)
self.assertEqual(obs[0], exp)
self.assertEqual(opt_all(p, return_removed=True)[0][1],\
exp_rem)
p = Pairs([(1,10),(2,9),(4,14),(3,15)])
obs = opt_all(p)
self.assertEqual(len(obs), 2)
self.assertEqualItems(obs, [Pairs([(1,10),(2,9)]),\
Pairs([(3,15),(4,14)])])
exp_rem = [(Pairs([(1,10),(2,9)]),Pairs([(3,15),(4,14)])),\
(Pairs([(3,15),(4,14)]),Pairs([(1,10),(2,9)]))]
self.assertEqualItems(opt_all(p, return_removed=True),\
exp_rem)
def test_opt_all_some_non_conflicting(self):
"""opt_all: some conflicting, other not"""
p = Pairs([(30,40),(10,20),(12,17),(13,None),(17,12),(35,45),(36,44)])
exp = Pairs([(10,20),(12,17),(35,45),(36,44)])
exp_rem = [(30,40)]
self.assertEqual(opt_all(p, return_removed=True),\
[(exp,exp_rem)])
def test_opt_all_scoring1(self):
"""opt_all: one optimal in bps, both optimal in energy"""
p = Pairs([(1,10),(2,9),(4,15),(5,14),(6,13)])
obs_bps = opt_all(p, goal='max', scoring_function=num_bps)
obs_energy = opt_all(p, goal='max',\
scoring_function=hydrogen_bonds('CCCAAAUGGGGUCGUUC'))
exp_bps = [[(4,15),(5,14),(6,13)]]
exp_energy = [[(1,10),(2,9)],[(4,15),(5,14),(6,13)]]
self.assertEqualItems(obs_bps, exp_bps)
self.assertEqualItems(obs_energy, exp_energy)
def test_opt_all_scoring2(self):
"""opt_all: both optimal in bps, one optimal in energy"""
p = Pairs([(0,9),(1,8),(2,7),(3,13),(4,12),(5,11)])
obs_bps = opt_all(p, goal='max', scoring_function=num_bps)
obs_energy = opt_all(p, goal='max',\
scoring_function=hydrogen_bonds('CCCAAAAGGGUUUU'))
exp_bps = [[(0,9),(1,8),(2,7)],[(3,13),(4,12),(5,11)]]
exp_energy = [[(0,9),(1,8),(2,7)]]
self.assertEqualItems(obs_bps, exp_bps)
self.assertEqualItems(obs_energy, exp_energy)
def test_opt_all_scoring3(self):
"""opt_all: one optimal in bps, the other optimal in energy"""
p = Pairs([(0,11),(1,10),(2,9),(4,15),(5,14),(6,13),(7,12)])
obs_bps = opt_all(p, goal='max', scoring_function=num_bps)
obs_energy = opt_all(p, goal='max',\
scoring_function=hydrogen_bonds('CCCCAAAAGGGGUUUU'))
exp_bps = [[(4,15),(5,14),(6,13),(7,12)]]
exp_energy = [[(0,11),(1,10),(2,9)]]
self.assertEqualItems(obs_bps, exp_bps)
self.assertEqualItems(obs_energy, exp_energy)
def test_opt_single_random(self):
"""opt_single_random: should return single solution"""
p = Pairs([(10,20),(11,19),(15,25),(16,24)])
exp1, exp_rem1 = [(10,20),(11,19)], [(15,25),(16,24)]
exp2, exp_rem2 = [(15,25),(16,24)], [(10,20),(11,19)]
obs = opt_single_random(p)
self.failUnless(obs == exp1 or obs == exp2)
obs = opt_single_random(p, return_removed=True)
self.failUnless(obs == (exp1, exp_rem1) or obs == (exp2, exp_rem2))
def test_opt_single_property(self):
"""opt_single_property: three properties"""
# one solution is a single region, the other consists of two regions
p = Pairs([(10,20),(25,35),(26,34),(27,33),\
(12,31),(13,30),(14,29),(15,28)])
exp = [(12,31),(13,30),(14,29),(15,28)]
exp_rem = [(10,20),(25,35),(26,34),(27,33)]
self.assertEqual(opt_single_property(p), exp)
self.assertEqual(opt_single_property(p, return_removed=True),\
(exp,exp_rem))
# both solutions have two blocks; one has a shorter average range
p = Pairs([(10,20),(22,40),(23,39),(24,38),\
(17,26),(18,25),(36,43),(37,42)])
exp = [(17,26),(18,25),(36,43),(37,42)]
exp_rem = [(10,20),(22,40),(23,39),(24,38)]
self.assertEqual(opt_single_property(p), exp)
self.assertEqual(opt_single_property(p, return_removed=True),\
(exp,exp_rem))
# each solution is a single block with the same range: pick lowest start
p = Pairs([(10,20),(15,25)])
exp = [(10,20)]
exp_rem = [(15,25)]
self.assertEqual(opt_single_property(p), exp)
self.assertEqual(opt_single_property(p, return_removed=True),\
(exp,exp_rem))
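# Illustrative sketch (hypothetical, not cogent's hydrogen_bonds): the
# expected values in DPTests are consistent with scoring each base pair by
# hydrogen bonds with GC = 3, AU = 2 and GU = 2, and any other combination
# = 0. These weights are inferred from the tests, not taken from the
# library. As in test_hydrogen_bonds, a pair beyond the end of the sequence
# raises IndexError.
_SKETCH_HBOND_WEIGHTS = {
    frozenset("GC"): 3,
    frozenset("AU"): 2,
    frozenset("GU"): 2,
}


def _sketch_hbond_score(seq, pairs):
    """Sum hydrogen-bond weights over (i, j) pairs of positions in seq."""
    total = 0
    for i, j in pairs:
        bases = frozenset((seq[i].upper(), seq[j].upper()))
        total += _SKETCH_HBOND_WEIGHTS.get(bases, 0)
    return total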
class EliminationMethodsTests(TestCase):
"""Tests for conflict_elimination and related functions"""
def test_find_max_conflicts(self):
"""find_max_conflicts: simple case"""
f = find_max_conflicts
pr0 = PairedRegion(0, 77, 2, Id=0)
pr1 = PairedRegion(7, 75, 5, Id=1)
pr2 = PairedRegion(13, 83, 3, Id=2)
pr3 = PairedRegion(18, 41, 5, Id=3)
pr4 = PairedRegion(23, 53, 10, Id=4)
pr5 = PairedRegion(33, 70, 3, Id=5)
pr6 = PairedRegion(59, 93, 9, Id=6)
pr7 = PairedRegion(78, 96, 3, Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 6)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr7])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 2)
prs = PairedRegions([pr0, pr1, pr3, pr4, pr5, pr7])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 5)
def test_find_max_conflicts_on_start(self):
"""find_max_conflicts: in case of equal conflicts and gain"""
f = find_max_conflicts
pr0 = PairedRegion(10, 20, 2, Id=0)
pr1 = PairedRegion(15, 25, 2, Id=1)
prs = PairedRegions([pr0, pr1])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 1)
def test_find_min_gain(self):
"""find_min_gain: differentiate on gain only"""
f = find_min_gain
pr0 = PairedRegion(0, 77, 2, Id=0)
pr1 = PairedRegion(7, 75, 5, Id=1)
pr2 = PairedRegion(13, 83, 3, Id=2)
pr3 = PairedRegion(18, 41, 5, Id=3)
pr4 = PairedRegion(23, 53, 10, Id=4)
pr5 = PairedRegion(33, 70, 3, Id=5)
pr6 = PairedRegion(59, 93, 9, Id=6)
pr7 = PairedRegion(78, 96, 3, Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 5)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr6, pr7])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 2)
def test_find_min_gain_conf(self):
"""find_min_gain: in case of equal gain, differentiate on conflicts"""
f = find_min_gain
pr0 = PairedRegion(10,30,3, Id=0)
pr1 = PairedRegion(1,20,6, Id=1)
pr2 = PairedRegion(22,40,2, Id=2)
pr3 = PairedRegion(50,80,3, Id=3)
pr4 = PairedRegion(60,90,8, Id=4)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 0)
def test_find_min_gain_start(self):
"""find_min_gain: in case of equal gain and number of conflicts"""
f = find_min_gain
pr0 = PairedRegion(10,30,3, Id=0)
pr1 = PairedRegion(1,20,6, Id=1)
pr2 = PairedRegion(22,40,2, Id=2)
pr3 = PairedRegion(50,80,3, Id=3)
pr4 = PairedRegion(60,90,7, Id=4)
pr5 = PairedRegion(45,55,1, Id=5)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5])
id_to_pr = prs.byId()
cm = ConflictMatrix(prs)
conf = cm.conflicting()
self.assertEqual(f(conf, cm, id_to_pr), 3)
def test_add_back_non_conflicting(self):
"""add_back_non_conflicting: should add all non-confl regions"""
f = add_back_non_conflicting
pr0 = PairedRegion(10,20,3, Id=0)
pr1 = PairedRegion(30,40,2, Id=1)
pr2 = PairedRegion(50,60,2, Id=2)
pr3 = PairedRegion(45,55,3, Id=3) # confl with pr2 (disjoint from pr1)
pr4 = PairedRegion(0,90,7, Id=4) # not confl with 1,2,3
pr5 = PairedRegion(32,38,2, Id=5) # not confl with 1,2,3
prs = PairedRegions([pr0, pr1, pr2])
removed = {3: pr3, 4: pr4, 5: pr5}
exp_prs = PairedRegions([pr0, pr1, pr2, pr4, pr5])
exp_rem = {3: pr3}
self.assertEqual(f(prs, removed), (exp_prs, exp_rem))
def test_add_back_non_conflicting_order(self):
"""add_back_non_conflicting: should add 5' side first"""
f = add_back_non_conflicting
pr0 = PairedRegion(10,20,3, Id=0)
pr1 = PairedRegion(30,40,2, Id=1)
pr2 = PairedRegion(50,60,2, Id=2)
pr3 = PairedRegion(45,55,3, Id=3) # confl with pr2 (disjoint from pr1)
pr4 = PairedRegion(0,90,7, Id=4) # not confl with 1,2,3
pr5 = PairedRegion(80,95,2, Id=5) # not confl with 1,2,3, but confl with 4
prs = PairedRegions([pr0, pr1, pr2])
removed = {3: pr3, 4: pr4, 5: pr5}
exp_prs = PairedRegions([pr0, pr1, pr2, pr4])
exp_rem = {3: pr3, 5: pr5}
self.assertEqual(f(prs, removed), (exp_prs, exp_rem))
def test_elim_most_conflict(self):
"""conflict_elimination: find_max_conflicts, simple case"""
f = conflict_elimination
func = find_max_conflicts
pr0 = PairedRegion(0, 77, 2, Id=0)
pr1 = PairedRegion(7, 75, 5, Id=1)
pr2 = PairedRegion(13, 83, 3, Id=2)
pr3 = PairedRegion(18, 41, 5, Id=3)
pr4 = PairedRegion(23, 53, 10, Id=4)
pr5 = PairedRegion(33, 70, 3, Id=5)
pr6 = PairedRegion(59, 93, 9, Id=6)
pr7 = PairedRegion(78, 96, 3, Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
pairs = prs.toPairs()
exp = PairedRegions([pr0, pr1, pr4, pr7]).toPairs()
exp_rem = PairedRegions([pr2, pr3, pr5, pr6]).toPairs()
self.assertEqual(f(pairs, func), exp)
self.assertEqual(f(pairs, func, return_removed=True), (exp, exp_rem))
def test_elim_mc_circular(self):
"""conflict_elimination: find_max_conflicts, circular removal"""
# simply remove in order of most conflicts, don't add back
prfp = PairedRegionFromPairs
f = conflict_elimination
func = find_max_conflicts
pr0 = prfp([(13, 65), (14, 64)], Id=0)
pr1 = prfp([(15, 102), (16, 101), (17, 100), (18, 99), (19, 98)], Id=1)
pr2 = prfp([(22, 72), (23, 71), (24, 70), (25, 69),\
(26, 68), (27, 67), (28, 66)], Id=2)
pr3 = prfp([(31, 147), (32, 146), (33, 145), (34, 144), (35, 143),\
(36, 142), (37, 141), (38, 140), (39, 139)], Id=3)
pr4 = prfp([(42, 129), (43, 128), (44, 127)], Id=4)
pr5 = prfp([(46, 149), (47, 148)], Id=5)
pr6 = prfp([(49, 92), (50, 91), (51, 90), (52, 89), (53, 88)], Id=6)
pr7 = prfp([(75, 138), (76, 137), (77, 136), (78, 135)], Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
exp = PairedRegions([pr3, pr6]).toPairs()
exp_rem = PairedRegions([pr0, pr1, pr2, pr4, pr5, pr7]).toPairs()
self.assertEqual(f(prs.toPairs(), func, add_back=False,\
return_removed=True), (exp, exp_rem))
# add back circular removals
exp = PairedRegions([pr3, pr4, pr6]).toPairs()
exp_rem = PairedRegions([pr0, pr1, pr2, pr5, pr7]).toPairs()
self.assertEqual(f(prs.toPairs(), func, add_back=True,\
return_removed=True), (exp, exp_rem))
def test_elim_min_gain(self):
"""conflict_elimination: find_min_gain, simple case"""
f = conflict_elimination
func = find_min_gain
pr0 = PairedRegion(0, 77, 2, Id=0)
pr1 = PairedRegion(7, 75, 5, Id=1)
pr2 = PairedRegion(13, 83, 3, Id=2)
pr3 = PairedRegion(18, 41, 5, Id=3)
pr4 = PairedRegion(23, 53, 10, Id=4)
pr5 = PairedRegion(33, 70, 3, Id=5)
pr6 = PairedRegion(59, 93, 9, Id=6)
pr7 = PairedRegion(78, 96, 3, Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
pairs = prs.toPairs()
exp = PairedRegions([pr4, pr6]).toPairs()
exp_rem = PairedRegions([pr0, pr1, pr2, pr3, pr5, pr7]).toPairs()
self.assertEqual(f(pairs, func), exp)
self.assertEqual(f(pairs, func, return_removed=True), (exp, exp_rem))
def test_elim_min_gain_circular(self):
"""conflict_elimination: find_min_gain, circular removal"""
# simply remove regions selected by find_min_gain, don't add back
prfp = PairedRegionFromPairs
f = conflict_elimination
func = find_min_gain
pr0 = prfp([(5, 170), (6, 169), (7, 168), (8, 167), (9, 166),\
(10, 165)], Id=0)
pr1 = prfp([(25, 62), (26, 61)], Id=1)
pr2 = prfp([(29, 46), (30, 45), (31, 44)], Id=2)
pr3 = prfp([(48, 124), (49, 123)], Id=3)
pr4 = prfp([(67, 183), (68, 182), (69, 181), (70, 180), (71, 179),\
(72, 178), (73, 177), (74, 176), (75, 175), (76, 174)], Id=4)
pr5 = prfp([(82, 172), (83, 171)], Id=5)
pr6 = prfp([(117, 135), (118, 134), (119, 133)], Id=6)
pr7 = prfp([(151, 199), (152, 198), (153, 197), (154, 196),\
(155, 195), (156, 194), (157, 193), (158, 192), (159, 191),\
(160, 190)], Id=7)
prs = PairedRegions([pr0, pr1, pr2, pr3, pr4, pr5, pr6, pr7])
exp = PairedRegions([pr1, pr2, pr4, pr6]).toPairs()
exp_rem = PairedRegions([pr0, pr3, pr5, pr7]).toPairs()
self.assertEqual(f(prs.toPairs(), func, add_back=False,\
return_removed=True), (exp, exp_rem))
# add back circular removals
exp = PairedRegions([pr1, pr2, pr4, pr5, pr6]).toPairs()
exp_rem = PairedRegions([pr0, pr3, pr7]).toPairs()
self.assertEqual(f(prs.toPairs(), func, add_back=True,\
return_removed=True), (exp, exp_rem))
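# Illustrative sketch only: `region_pairs`, `pairs_conflict`,
# `regions_conflict` and `add_back_sketch` are hypothetical helpers, not
# part of cogent.struct. A region is modelled as a (start, end, length)
# tuple, assumed to mirror the PairedRegion(start, end, length) arguments
# used in the tests above. Removed regions are re-added 5' side first
# whenever they conflict with nothing already kept, which reproduces the
# expectations of the add_back_non_conflicting tests; it is not claimed to
# be how add_back_non_conflicting is implemented.

def region_pairs(region):
    """Return the base pairs (i, j) of a (start, end, length) region."""
    start, end, length = region
    return [(start + k, end - k) for k in range(length)]

def pairs_conflict(p, q):
    """True if two pairs share a base or cross each other (pseudoknot)."""
    i, j = sorted(p)
    k, l = sorted(q)
    if set([i, j]) & set([k, l]):
        return True
    return i < k < j < l or k < i < l < j

def regions_conflict(r1, r2):
    """True if any pair of r1 conflicts with any pair of r2."""
    return any(pairs_conflict(p, q)
               for p in region_pairs(r1) for q in region_pairs(r2))

def add_back_sketch(kept, removed):
    """Re-add removed regions (dict id -> region), lowest start first.

    Returns (kept_list, still_removed_dict).
    """
    kept = list(kept)
    still_removed = {}
    for rid, region in sorted(removed.items(), key=lambda item: item[1][0]):
        if any(regions_conflict(region, other) for other in kept):
            still_removed[rid] = region
        else:
            kept.append(region)
    return kept, still_removed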
class IncrementalMethodsTests(TestCase):
"""Tests for incremental pseudoknot-removal methods"""
def test_inc_order_forward(self):
"""inc_order: starting at 5' end"""
f = inc_order
p = Pairs([(1,10),(2,9),(3,15),(4,14),(11,20),(12,19),(25,30)])
exp = Pairs([(1,10),(2,9),(11,20),(12,19),(25,30)])
exp_rem = Pairs([(3,15),(4,14)])
self.assertEqual(f(p,reversed=False), exp)
self.assertEqual(f(p,return_removed=True), (exp, exp_rem))
p = Pairs([(1,20),(2,30),(3,29),(4,28),(5,27),(7,24)])
exp = Pairs([(1,20)])
exp_rem = Pairs([(2,30),(3,29),(4,28),(5,27),(7,24)])
self.assertEqual(f(p,reversed=False), exp)
self.assertEqual(f(p,return_removed=True), (exp, exp_rem))
self.assertEqual(f([]), [])
p = [(1,10),(3,13)] # input as list of tuples
exp = Pairs([(1,10)])
self.assertEqual(f(p), exp)
p = [(1,10),(4,7),(2,9),(5,None)] # pseudoknot-free
exp = [(1,10),(2,9),(4,7)]
self.assertEqual(f(p), exp)
p = [(1,10),(2,10)] #conflict
self.assertRaises(ValueError, f, p)
def test_inc_order_reversed(self):
"""inc_order: starting at 3' end"""
f = inc_order
p = Pairs([(1,10),(2,9),(3,15),(4,14),(24,31),(25,30)])
exp = Pairs([(3,15),(4,14),(24,31),(25,30)])
exp_rem = Pairs([(1,10),(2,9)])
self.assertEqual(f(p,reversed=True), exp)
self.assertEqual(f(p, reversed=True, return_removed=True),\
(exp, exp_rem))
p = Pairs([(1,20),(2,30),(3,29),(4,28),(5,27),(7,24)])
exp = Pairs([(2,30),(3,29),(4,28),(5,27),(7,24)])
exp_rem = Pairs([(1,20)])
self.assertEqual(f(p,reversed=True), exp)
self.assertEqual(f(p, reversed=True, return_removed=True),\
(exp, exp_rem))
self.assertEqual(f([], reversed=True), [])
p = [(1,10),(3,13)] # input as list of tuples
exp = Pairs([(3,13)])
self.assertEqual(f(p, reversed=True), exp)
p = [(1,10),(4,7),(2,9),(5,None)] # pseudoknot-free
exp = [(1,10),(2,9),(4,7)]
self.assertEqual(f(p, reversed=True), exp)
p = [(1,10),(2,10)] #conflict
self.assertRaises(ValueError, f, p)
def test_inc_length(self):
"""inc_length: should handle standard input
"""
f = inc_length
# All blocks in conflict, start empty, add first
p = Pairs([(1,10),(2,9),(3,8),(5,13),(6,12),(7,11)])
exp = Pairs([(1,10),(2,9),(3,8)])
self.assertEqual(f(p), exp)
# Start with length 3 and 2, add 1 block
p = Pairs([(1,10),(2,9),(3,8),(20,30),(21,29),(25,40),(32,38)])
exp = Pairs([(1,10),(2,9),(3,8),(20,30),(21,29),(32,38)])
self.assertEqual(f(p), exp)
p = Pairs([(1,10),(2,9),(3,8),(12,20),(13,19),(15,23),(16,22)])
exp_5 = Pairs([(1,10),(2,9),(3,8),(12,20),(13,19)])
exp_3 = Pairs([(1,10),(2,9),(3,8),(15,23),(16,22)])
self.assertEqual(f(p), exp_5)
self.assertEqual(f(p, reversed=True), exp_3)
self.assertEqual(f(p, return_removed=True),(exp_5,[(15,23),(16,22)]))
p = [(1,10),(4,7),(2,9),(5,None)] # pseudoknot-free
exp = [(1,10),(2,9),(4,7)]
self.assertEqual(f(p), exp)
p = [(1,10),(2,10)] #conflict
self.assertRaises(ValueError, f, p)
def test_inc_length_rev(self):
"""inc_length: should prefer 3' side when reversed is True
"""
f = inc_length
p = Pairs([(1,10),(2,9),(5,20),(6,19)])
self.assertEqual(f(p), [(1,10),(2,9)])
self.assertEqual(f(p, reversed=True), [(5,20),(6,19)])
def test_inc_range(self):
"""inc_range: should handle normal input
"""
f = inc_range
p = [(1,5),(4,20),(15,23),(16,22)]
exp = [(1,5),(15,23),(16,22)]
self.assertEqual(f(p), exp)
self.assertEqual(f(p, return_removed=True), (exp, [(4,20)]))
p = [(1,11),(5,15)] # same range
self.assertEqual(f(p), [(1,11)]) # 5' wins
self.assertEqual(f(p, reversed=True), [(5,15)]) # 3' wins
p = [(1,10),(2,10)] #conflict
self.assertRaises(ValueError, f, p)
def test_inc_range_empty(self):
"""inc_range: should handle empty or pseudoknot-free pairs
"""
f = inc_range
p = []
exp = []
self.assertEqual(f(p), exp)
p = [(1,10),(4,7),(2,9),(5,None)]
exp = [(1,10),(2,9),(4,7)]
self.assertEqual(f(p), exp)
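# Illustrative sketch only: `check_no_conflicts`, `crosses`,
# `greedy_nested`, `by_position` and `by_range` are hypothetical names, not
# cogent's API. The incremental methods tested above visit candidates in a
# priority order and keep one only if it crosses nothing kept so far. This
# sketch works pair by pair, whereas the real inc_order and inc_range may
# operate on whole helices, so treat it as an approximation that happens to
# agree with the single-pair examples in IncrementalMethodsTests.

def check_no_conflicts(pairs):
    """Raise ValueError if a base occurs in two different pairs."""
    partner = {}
    for i, j in pairs:
        for a, b in ((i, j), (j, i)):
            if partner.get(a, b) != b:
                raise ValueError("base %s is in more than one pair" % a)
            partner[a] = b

def crosses(p, q):
    """True if pairs p and q form a pseudoknot."""
    i, j = sorted(p)
    k, l = sorted(q)
    return i < k < j < l or k < i < l < j

def greedy_nested(pairs, key):
    """Keep pairs, visited in order of `key`, that cross no kept pair."""
    # unpaired entries such as (5, None) are ignored
    pairs = [tuple(sorted(p)) for p in pairs if None not in p]
    check_no_conflicts(pairs)
    kept = []
    for p in sorted(pairs, key=key):
        if not any(crosses(p, q) for q in kept):
            kept.append(p)
    return sorted(kept)

def by_position(reverse=False):
    """5' to 3' by opening base, or 3' to 5' by closing base if reverse."""
    if reverse:
        return lambda p: -p[1]
    return lambda p: p[0]

def by_range(reverse=False):
    """Shortest span first; ties go to the 5' pair (3' pair if reverse)."""
    if reverse:
        return lambda p: (p[1] - p[0], -p[1])
    return lambda p: (p[1] - p[0], p[0])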
class NussinovTests(TestCase):
"""Tests for restricted nussinov algorithm and related functions"""
def test_nussinov_fill(self):
"""nussinov_fill: basic test"""
p = Pairs([(0,7),(1,6),(2,5),(3,9),(4,8)])
exp = [[0,0,0,0,0,1,2,3,3,3],
[0,0,0,0,0,1,2,2,2,2],
[0,0,0,0,0,1,1,1,1,2],
[0,0,0,0,0,0,0,0,1,2],
[0,0,0,0,0,0,0,0,1,1],
[0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0,0]]
obs = nussinov_fill(p,size=10)
self.assertEqual(obs, exp)
def test_nussinov_traceback(self):
"""nussinov_traceback: basic test"""
p = Pairs([(0,7),(1,6),(2,5),(3,9),(4,8)])
m = nussinov_fill(p,size=10)
exp = set([(0,7),(1,6),(2,5)])
obs = nussinov_traceback(m, 0, 9, p)
self.assertEqual(obs, exp)
def test_nussinov_restricted(self):
"""nussinov_restricted: basic test"""
p = Pairs([(0,7),(1,6),(2,5),(3,9),(4,8)])
obs = nussinov_restricted(p)
obs_rem = nussinov_restricted(p, return_removed=True)
exp = [(0,7),(1,6),(2,5)]
exp_rem = ([(0,7),(1,6),(2,5)],[(3,9),(4,8)])
self.assertEqual(obs, exp)
self.assertEqual(obs_rem, exp_rem)
p = Pairs([(0,7),(1,6),(2,6)])
self.assertRaises(ValueError, nussinov_restricted, p)
p = Pairs([(0,7),(1,6),(2,5)])
exp = Pairs([(0,7),(1,6),(2,5)])
self.assertEqual(nussinov_restricted(p), exp)
def test_nussinov_restricted_bi(self):
"""nussinov_restricted: include bifurcation"""
p = Pairs([(0,7),(1,6),(2,14),(3,13),(4,12),(5,11),\
(8,17),(9,16),(10,15)])
obs = nussinov_restricted(p)
obs_rem = nussinov_restricted(p, return_removed=True)
exp = [(0,7),(1,6),(8,17),(9,16),(10,15)]
exp_rem = ([(0,7),(1,6),(8,17),(9,16),(10,15)],\
[(2,14),(3,13),(4,12),(5,11)])
self.assertEqual(obs, exp)
self.assertEqual(obs_rem, exp_rem)
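# Illustrative sketch only: a restricted Nussinov dynamic program written
# to reproduce the expected values in NussinovTests. `partners`,
# `restricted_fill`, `restricted_traceback` and `restricted_nussinov` are
# hypothetical names, not cogent's nussinov_fill / nussinov_traceback /
# nussinov_restricted. "Restricted" means a base may only pair with its
# partner from the given (possibly pseudoknotted) pair list, so m[i][j] is
# the largest number of mutually nested candidate pairs inside i..j.

def partners(pairs):
    """Map each base to its single candidate partner; reject conflicts."""
    partner = {}
    for i, j in pairs:
        for a, b in ((i, j), (j, i)):
            if partner.get(a, b) != b:
                raise ValueError("base %s is in more than one pair" % a)
            partner[a] = b
    return partner

def restricted_fill(pairs, size):
    """Fill the upper triangle of a size x size score matrix."""
    partner = partners(pairs)
    m = [[0] * size for _ in range(size)]
    for span in range(1, size):
        for i in range(size - span):
            j = i + span
            best = m[i + 1][j]  # case 1: base i stays unpaired
            k = partner.get(i)
            if k is not None and i < k <= j:  # case 2: i pairs with k
                inner = m[i + 1][k - 1]  # entries below the diagonal are 0
                outer = m[k + 1][j] if k < j else 0
                best = max(best, 1 + inner + outer)
            m[i][j] = best
    return m

def restricted_traceback(m, i, j, pairs):
    """Recover one optimal set of nested pairs for the interval i..j."""
    partner = partners(pairs)
    result = set()
    stack = [(i, j)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if m[i][j] == m[i + 1][j]:  # an optimum leaves base i unpaired
            stack.append((i + 1, j))
        else:  # otherwise i must pair with its partner k
            k = partner[i]
            result.add((i, k))
            stack.append((i + 1, k - 1))
            stack.append((k + 1, j))
    return result

def restricted_nussinov(pairs):
    """Return the maximal nested subset of `pairs`, sorted."""
    pairs = [tuple(sorted(p)) for p in pairs]
    if not pairs:
        return []
    size = max(j for _, j in pairs) + 1
    m = restricted_fill(pairs, size)
    return sorted(restricted_traceback(m, 0, size - 1, pairs))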
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_struct/test_manipulation.py
#!/usr/bin/env python
import os, tempfile
try:
from cogent.util.unit_test import TestCase, main
from cogent.parse.pdb import PDBParser
from cogent.format.pdb import PDBWriter
from cogent.struct.selection import einput
from cogent.struct.manipulation import copy, clean_ical, \
expand_symmetry, expand_crystal
except ImportError:
from zenpdb.cogent.util.unit_test import TestCase, main
from zenpdb.cogent.parse.pdb import PDBParser
from zenpdb.cogent.format.pdb import PDBWriter
from zenpdb.cogent.struct.selection import einput
from zenpdb.cogent.struct.manipulation import copy, clean_ical, \
expand_symmetry, expand_crystal
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class ManipulationTest(TestCase):
"""tests manipulating entities"""
def setUp(self):
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
def test_clean_ical(self):
"""tests the clean ical function which cleans structures."""
chainB = self.input_structure.table['C'][('2E12', 0, 'B')]
leu25 = self.input_structure.table['R'][('2E12', 0, 'B', \
('LEU', 25, ' '))]
leu25icA = copy(leu25)
self.assertTrue(leu25icA.parent is None)
self.assertTrue(leu25icA is not leu25)
self.assertTrue(leu25icA[(('N', ' '),)] is not leu25[(('N', ' '),)])
leu25icA.setIc('A')
self.assertEquals(leu25icA.getId(), (('LEU', 25, 'A'),))
chainB.addChild(leu25icA)
self.assertFalse(chainB[(('LEU', 25, 'A'),)] is \
chainB[(('LEU', 25, ' '),)])
self.assertEquals(clean_ical(self.input_structure), \
([], [('2E12', 0, 'B', ('LEU', 25, 'A'))]))
clean_ical(self.input_structure, pretend=False)
self.assertTrue(chainB[(('LEU', 25, 'A'),)] is leu25icA)
self.assertFalse((('LEU', 25, 'A'),) in chainB.keys())
self.assertFalse((('LEU', 25, 'A'),) in chainB)
self.assertTrue((('LEU', 25, 'A'),) in chainB.keys(unmask=True))
self.input_structure.setUnmasked(force=True)
self.assertEquals(clean_ical(self.input_structure), \
([], [('2E12', 0, 'B', ('LEU', 25, 'A'))]))
clean_ical(self.input_structure, pretend=False, mask=False)
self.assertFalse((('LEU', 25, 'A'),) in chainB.keys())
self.assertFalse((('LEU', 25, 'A'),) in chainB)
self.assertFalse((('LEU', 25, 'A'),) in chainB.keys(unmask=True))
def test_0expand_symmetry(self):
"""tests the expansion of an asymmetric unit (ASU) to a unit cell."""
global fn
mh = expand_symmetry(self.input_structure[(0,)])
fd, fn = tempfile.mkstemp('.pdb')
os.close(fd)
fh = open(fn, 'w')
PDBWriter(fh, mh, self.input_structure.raw_header)
fh.close()
def test_1expand_crystal(self):
"""tests the expansion of a unit-cell to a crystal"""
fh = open(fn, 'r')
input_structure = PDBParser(fh)
self.assertEqual(len(input_structure.values()), 4) # 4 models
sh = expand_crystal(input_structure)
self.assertTrue(len(sh) == 27)
fd, fn2 = tempfile.mkstemp('.pdb')
os.close(fd)
fh = open(fn2, 'w')
a1 = einput(input_structure, 'A')
a2 = einput(sh.values()[3], 'A')
k = a1.values()[99].getFull_id()
name = sh.values()[3].name
a1c = a1[k].coords
a2c = a2[(name,) + k[1:]].coords
self.assertEqual(len(a1), len(a2))
self.assertRaises(AssertionError, self.assertFloatEqual, a1c, a2c)
PDBWriter(fh, sh)
fh.close()
os.unlink(fn)
os.unlink(fn2)
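# Illustrative sketch only (hypothetical helper, not cogent's code): the
# assertion that expand_crystal returns 27 structures is consistent with
# placing a copy of the unit cell at every integer lattice offset in
# {-1, 0, 1} along each of the three cell axes, i.e. 3 ** 3 = 27 cells
# including the original one. The cell vectors a, b and c are assumed
# inputs here; how cogent derives them from the PDB header is not shown.
import itertools

def lattice_translations(a, b, c):
    """Cartesian shifts for the 3 x 3 x 3 block of cells around the origin."""
    shifts = []
    for i, j, k in itertools.product((-1, 0, 1), repeat=3):
        shifts.append(tuple(i * a[n] + j * b[n] + k * c[n] for n in range(3)))
    return shifts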
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_struct/test_pairs_util.py
#!/usr/bin/env python
# test_pairs_util.py
"""Provides tests for gapping/ungapping functions and base pair comparison
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.core.sequence import RnaSequence, ModelSequence, Sequence
from cogent.core.moltype import RNA
from cogent.core.alphabet import CharAlphabet
from cogent.struct.rna2d import Pairs
from cogent.struct.pairs_util import PairsAdjustmentError,\
adjust_base, adjust_base_structures, adjust_pairs_from_mapping,\
delete_gaps_from_pairs, insert_gaps_in_pairs, gapped_to_ungapped,\
get_gap_symbol, get_gap_list, degap_model_seq, degap_seq,\
ungapped_to_gapped,\
pairs_intersection, pairs_union, compare_pairs,\
compare_pairs_mapping, compare_random_to_correct,\
sensitivity, selectivity, get_all_pairs, get_counts, extract_seqs,\
mcc, approximate_correlation, correlation_coefficient, all_metrics
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Shandy Wikman", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class GappedUngappedTests(TestCase):
"""Provides tests for gapped_to_ungapped and ungapped_to_gapped functions
"""
def setUp(self):
"""setUp: set up method for all tests"""
self.rna1 = RnaSequence('UCAG-RYN-N', Name='rna1')
self.m1 = ModelSequence('UCAG-RYN-N', Name='rna1',\
Alphabet=RNA.Alphabets.DegenGapped)
self.s1 = 'UCAG-RYN-N'
def test_adjust_base(self):
"""adjust_base: should work for pairs object or list of pairs"""
p = Pairs()
self.assertEqual(adjust_base(p,10),[])
pairs = [(1,21),(2,15),(3,13),(4,11),(5,10),(6,9)]
offset = -1
expected = [(0,20),(1,14),(2,12),(3,10),(4,9),(5,8)]
obs_pairs = adjust_base(pairs, offset)
self.assertEqual(obs_pairs, expected)
pairs = Pairs([(0,10),(1,9)])
self.assertEqual(adjust_base(pairs, -1), Pairs([(-1,9),(0,8)]))
self.assertEqual(adjust_base(pairs, 5), Pairs([(5,15),(6,14)]))
self.assertRaises(PairsAdjustmentError, adjust_base, pairs, 3.5)
def test_adjust_base_structures(self):
"""adjust_base_structures: simple structure"""
p = Pairs([(3,10),(4,9)])
p2 = Pairs([(2,7), (30,40)])
self.assertEqual(adjust_base_structures([p,p2], -1),\
[[(2,9),(3,8)],[(1,6),(29,39)]])
def test_adjust_base_None(self):
"""adjust_base: should keep Nones or duplicates, ignore conflicts"""
pairs = Pairs([(2,8),(3,7),(6,None),(None,None),(2,10)])
expected = Pairs([(1,7),(2,6),(5,None),(None, None),(1,9)])
self.assertEqual(adjust_base(pairs,-1), expected)
p = Pairs([(1,2),(2,1),(1,2),(2,None)])
self.assertEqual(adjust_base(p, 1), [(2,3),(3,2),(2,3),(3,None)])
def test_adjust_pairs_from_mapping_confl(self):
"""adjust_pairs_from_mapping: should handle conflicts, pseudo, dupl
"""
f = adjust_pairs_from_mapping
p = Pairs([(0,6),(1,5),(2,None),(None,None),(1,4),(3,7),(6,0)])
m = {0:1,1:3,2:6,3:7,4:8,5:10,6:11,7:12}
exp = Pairs([(1,11),(3,10),(6,None),(None,None),(3,8),(7,12),(11,1)])
self.assertEqual(f(p, m), exp)
p = Pairs([(1,11),(3,10),(7,12),(6,None),(None,None),(5,8)])
m = {1: 0, 3: 1, 6: 2, 7: 3, 8: 4, 10: 5, 11: 6, 12: 7}
exp = Pairs([(0,6),(1,5),(3,7),(2,None),(None,None)])
self.assertEqual(f(p,m), exp)
def test_delete_gaps_from_pairs(self):
"""delete_gaps_from_pairs: should work on standard input"""
r = delete_gaps_from_pairs
# empty list
p = Pairs([])
self.assertEqual(r(p,[1,2,3]), [])
# normal list
p1 = Pairs([(2,8), (3,6)])
gap_list = [0,1,4,5,7,9]
self.assertEqualItems(r(p1, gap_list), [(0,3),(1,2)])
p2 = Pairs([(2,8),(3,6),(4,9)])
self.assertEqualItems(r(p2, gap_list), [(0,3),(1,2)])
p3 = Pairs([(2,8),(3,6),(4,10)])
self.assertEqualItems(r(p3, gap_list), [(0,3),(1,2)])
def test_delete_gaps_from_pairs_weird(self):
"""delete_gaps_from_pairs: should ignore conflicts etc"""
r = delete_gaps_from_pairs
gap_list = [0,1,4,5,7,9]
p = Pairs([(2,6),(3,8)])
self.assertEqualItems(r(p, gap_list), [(0,2),(1,3)])
p = Pairs([(2,6),(3,8),(3,None),(6,2),(3,8), (None, None)])
self.assertEqualItems(r(p, gap_list),\
[(0,2),(1,3),(1,None),(2,0),(1,3),(None, None)])
def test_insert_gaps_in_pairs(self):
"""insert_gaps_in_pairs: should work with normal and conflicts"""
p = Pairs([(0,3),(1,2),(1,4),(3,None)])
gaps = [0,1,4,5,7]
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(2,8),(3,6),(3,9),(8,None)])
p = Pairs([(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)])
gaps = [0,2,6,9]
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(1,10),(3,8),(4,None),(5,11),(1,3),(8,3)])
gaps = [2,3,4,9]
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(0,10),(1,8),(5,None),(6,11),(0,1),(8,1)])
p = Pairs([(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)])
gaps = []
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)])
def test_get_gap_symbol(self):
"""get_gap_symbol: Sequence, ModelSequence, old_cogent, string"""
self.assertEqual(get_gap_symbol(self.rna1), '-')
self.assertEqual(get_gap_symbol(self.m1), '-')
self.assertEqual(get_gap_symbol(self.s1), '-')
self.assertEqual(get_gap_symbol(''), '-')
def test_get_gap_list(self):
"""get_gap_list: Sequence, ModelSequence, old_cogent, string"""
gs = '-'
self.assertEqual(get_gap_list(self.rna1), [4,8])
self.assertEqual(get_gap_list(self.m1), [4,8])
self.assertEqual(get_gap_list(self.s1,gs),[4,8])
self.assertEqual(get_gap_list('',gs), [])
def test_degap_model_seq(self):
"""degap_model_seq: replacement for broken method"""
self.assertEqual(str(degap_model_seq(self.m1)),'UCAGRYNN')
def test_degap_seq(self):
"""degap_seq: Sequence, ModelSequence, old_cogent, string"""
f = degap_seq
gs = '-'
self.assertEqual(f(self.rna1, gs), 'UCAGRYNN')
self.assertEqual(str(f(self.m1, gs)), 'UCAGRYNN')
self.assertEqual(f(self.s1, gs), 'UCAGRYNN')
def test_gapped_to_ungapped(self):
"""gapped_to_ungapped: Sequence, ModelSequence, old_cogent, string
"""
p = Pairs([(0,6),(1,5),(3,9)])
exp = Pairs([(0,5),(1,4),(3,7)])
f = gapped_to_ungapped
self.assertEqual(f(self.rna1, p)[1], exp)
self.assertEqual(f(self.m1, p)[1], exp)
self.assertEqual(f(self.s1, p)[1], exp)
def test_ungapped_to_gapped(self):
"""ungapped_to_gapped: Sequence, ModelSequence, old_cogent, string
"""
p = Pairs([(0,6),(1,5),(3,9)])
exp = Pairs([(0,5),(1,4),(3,7)])
f = ungapped_to_gapped
self.assertEqual(f(self.rna1, exp)[1], p)
self.assertEqual(f(self.m1, exp)[1], p)
self.assertEqual(f(self.s1, exp)[1], p)
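# Illustrative sketch only: `to_ungapped` and `to_gapped` are hypothetical
# helpers, not cogent's delete_gaps_from_pairs / insert_gaps_in_pairs.
# They show the index arithmetic the gap tests above rely on: a gapped
# column maps to (column - number of gaps before it), and ungapped index k
# maps to the k-th non-gap column. None entries pass through unchanged, and
# a pair touching a gap column is dropped when gaps are removed.
from bisect import bisect_left

def to_ungapped(pairs, gaps):
    """Translate pairs from gapped columns to ungapped positions."""
    gaps = sorted(gaps)
    gap_set = set(gaps)
    result = []
    for pair in pairs:
        if any(x in gap_set for x in pair):
            continue  # a base aligned to a gap column cannot be kept
        # bisect_left counts the gaps strictly before column x
        result.append(tuple(None if x is None else x - bisect_left(gaps, x)
                            for x in pair))
    return result

def to_gapped(pairs, gaps):
    """Translate pairs from ungapped positions to gapped columns."""
    gap_set = set(gaps)

    def column(k):
        # walk the columns, counting only non-gap ones, until k is reached
        col = -1
        while True:
            col += 1
            if col in gap_set:
                continue
            if k == 0:
                return col
            k -= 1

    return [tuple(None if x is None else column(x) for x in pair)
            for pair in pairs]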
class OldAdjustmentFunctionsTests(TestCase):
"""Provides tests for gapped_to_ungapped and ungapped_to_gapped functions
"""
def setUp(self):
"""setUp: set up method for all tests"""
self.ungapped = 'AGAUGCUAGCUAC'
self.gapped = '-AGA--UGC-UAG--CUAC'
self.diff_sym = '*AGA**UGC*UAG**CUAC'
self.simple = Pairs([(2,7),(3,6),(8,12)])
self.simple_g = Pairs([(3,11),(6,10),(12,18)])
self.out_order = Pairs([(6,10),(4,1),(9,7),(5,11)])
self.out_order_g = Pairs([(10,16),(7,2),(15,11),(8,17)])
self.duplicates = Pairs([(3,9),(3,9),(2,10),(0,12)])
self.duplicates_g = Pairs([(6,15),(6,15),(3,16),(1,18)])
self.pseudo = Pairs([(0,7),(2,6),(3,10)])
self.pseudo_g = Pairs([(1,11),(3,10),(6,16)])
def test_adjust_base(self):
"""adjust_base: should work for pairs object or list of pairs"""
p = Pairs()
self.assertEqual(adjust_base(p,10),[])
pairs = [(1,21),(2,15),(3,13),(4,11),(5,10),(6,9)]
offset = -1
expected = [(0,20),(1,14),(2,12),(3,10),(4,9),(5,8)]
obs_pairs = adjust_base(pairs, offset)
self.assertEqual(obs_pairs, expected)
pairs = Pairs([(0,10),(1,9)])
self.assertEqual(adjust_base(pairs, -1), Pairs([(-1,9),(0,8)]))
self.assertEqual(adjust_base(pairs, 5), Pairs([(5,15),(6,14)]))
self.assertRaises(PairsAdjustmentError, adjust_base, pairs, 3.5)
def test_adjust_base_None(self):
"""adjust_base: should keep Nones or duplicates, ignore conflicts"""
pairs = Pairs([(2,8),(3,7),(6,None),(None,None),(2,10)])
expected = Pairs([(1,7),(2,6),(5,None),(None, None),(1,9)])
self.assertEqual(adjust_base(pairs,-1), expected)
p = Pairs([(1,2),(2,1),(1,2),(2,None)])
self.assertEqual(adjust_base(p, 1), [(2,3),(3,2),(2,3),(3,None)])
def test_delete_gaps_from_pairs(self):
"""delete_gaps_from_pairs: should work on standard input"""
r = delete_gaps_from_pairs
# empty list
p = Pairs([])
self.assertEqual(r(p,[1,2,3]), [])
# normal list
p1 = Pairs([(2,8), (3,6)])
gap_list = [0,1,4,5,7,9]
self.assertEqualItems(r(p1, gap_list), [(0,3),(1,2)])
p2 = Pairs([(2,8),(3,6),(4,9)])
self.assertEqualItems(r(p2, gap_list), [(0,3),(1,2)])
p3 = Pairs([(2,8),(3,6),(4,10)])
self.assertEqualItems(r(p3, gap_list), [(0,3),(1,2)])
def test_delete_gaps_from_pairs_weird(self):
"""delete_gaps_from_pairs: should ignore conflicts etc"""
r = delete_gaps_from_pairs
gap_list = [0,1,4,5,7,9]
p = Pairs([(2,6),(3,8)])
self.assertEqualItems(r(p, gap_list), [(0,2),(1,3)])
p = Pairs([(2,6),(3,8),(3,None),(6,2),(3,8), (None, None)])
self.assertEqualItems(r(p, gap_list),\
[(0,2),(1,3),(1,None),(2,0),(1,3),(None, None)])
def test_insert_gaps_in_pairs(self):
"""insert_gaps_in_pairs: should work with normal and conflicts"""
p = Pairs([(0,3),(1,2),(1,4),(3,None)])
gaps = [0,1,4,5,7]
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(2,8),(3,6),(3,9),(8,None)])
p = Pairs([(0,6),(1,5),(2,None),(3,7),(0,1),(5,1)])
gaps = [0,2,6,9]
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(1,10),(3,8),(4,None),(5,11),(1,3),(8,3)])
gaps = [2,3,4,9]
self.assertEqual(insert_gaps_in_pairs(p, gaps),\
[(0,10),(1,8),(5,None),(6,11),(0,1),(8,1)])
def test_gapped_to_ungapped_simple(self):
"""gapped_to_ungapped: should work for simple case"""
s = RnaSequence(self.gapped)
p = self.simple_g
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.simple)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_out_of_order(self):
"""gapped_to_ungapped: should work when pairs are out of order
"""
s = RnaSequence(self.gapped)
p = Pairs(self.out_order_g)
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.out_order)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_duplicates(self):
"""gapped_to_ungapped: should work when pairs contains duplicates
"""
s = RnaSequence(self.gapped)
p = Pairs(self.duplicates_g)
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.duplicates)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_pseudo(self):
"""gapped_to_ungapped: shouldn't care about pseudoknots
"""
s = RnaSequence(self.gapped)
p = Pairs(self.pseudo_g)
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.pseudo)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_no_gaps(self):
"""gapped_to_ungapped: should return same pairs when no gaps
"""
s = RnaSequence(self.ungapped)
p = Pairs(self.simple)
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.simple)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_ungapped_to_gapped(self):
"""ungapped_to_gapped: should work for basic case
"""
s = RnaSequence(self.gapped)
p = self.simple
obs_seq, obs_pairs = ungapped_to_gapped(s,p)
assert obs_seq is s
self.assertEqualItems(obs_pairs, self.simple_g)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_ungapped_to_gapped_out_of_order(self):
"""ungapped_to_gapped: should work when pairs out of order
"""
s = RnaSequence(self.gapped)
p = self.out_order
obs_seq, obs_pairs = ungapped_to_gapped(s,p)
assert obs_seq is s
self.assertEqualItems(obs_pairs, self.out_order_g)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_simple_str(self):
"""gapped_to_ungapped: should work on simple case
"""
s = self.gapped
p = self.simple_g
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.simple)
assert not isinstance(obs_seq, RnaSequence)
assert isinstance(obs_seq, str)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_pseudo_str(self):
"""gapped_to_ungapped: shouldn't care about pseudoknots
"""
s = self.gapped
p = self.pseudo_g
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.pseudo)
assert not isinstance(obs_seq, RnaSequence)
assert isinstance(obs_seq, str)
assert isinstance(obs_pairs, Pairs)
def test_ungapped_to_gapped_simple(self):
"""ungapped_to_gapped: should work on basic case"""
s = self.gapped
p = self.simple
obs_seq, obs_pairs = ungapped_to_gapped(s,p)
assert obs_seq is s
self.assertEqualItems(obs_pairs, self.simple_g)
assert not isinstance(obs_seq, RnaSequence)
assert isinstance(obs_seq, str)
assert isinstance(obs_pairs, Pairs)
def test_ungapped_to_gapped_duplicates(self):
"""ungapped_to_gapped: should work when pairs are duplicated"""
s = self.gapped
p = self.duplicates
obs_seq, obs_pairs = ungapped_to_gapped(s,p)
assert obs_seq is s
self.assertEqualItems(obs_pairs, self.duplicates_g)
assert not isinstance(obs_seq, RnaSequence)
assert isinstance(obs_seq, str)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_general(self):
"""gapped_to_ungapped: should return object of right type
"""
s = RnaSequence(self.gapped)
p = self.simple_g
#in case of RnaSequence
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.simple)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
#in case of str input
s = self.gapped
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.simple)
assert not isinstance(obs_seq, RnaSequence)
assert isinstance(obs_seq, str)
assert isinstance(obs_pairs, Pairs)
def test_ungapped_to_gapped_general(self):
"""ungapped_to_gapped: should return object of right type
"""
s = RnaSequence(self.gapped)
p = self.simple
#in case of RnaSequence
obs_seq, obs_pairs = ungapped_to_gapped(s,p)
assert obs_seq is s
self.assertEqualItems(obs_pairs, self.simple_g)
assert isinstance(obs_seq, RnaSequence)
assert isinstance(obs_pairs, Pairs)
#in case of str input
s = self.gapped
obs_seq, obs_pairs = ungapped_to_gapped(s,p)
assert obs_seq is s
self.assertEqualItems(obs_pairs, self.simple_g)
assert not isinstance(obs_seq, RnaSequence)
assert isinstance(obs_seq, str)
assert isinstance(obs_pairs, Pairs)
def test_gapped_to_ungapped_general_seq(self):
"""gapped_to_ungapped: when input is Sequence obj, treat as string
"""
s = Sequence(self.gapped)
p = self.simple_g
obs_seq, obs_pairs = gapped_to_ungapped(s,p)
self.assertEqual(obs_seq, self.ungapped)
self.assertEqualItems(obs_pairs, self.simple)
#assert not isinstance(obs_seq, Sequence)
#assert isinstance(obs_seq, str)
assert isinstance(obs_seq, Sequence)
assert isinstance(obs_pairs, Pairs)
def test_adjust_pairs_from_mapping(self):
"""adjust_pairs_from_mapping: should work both ways
"""
#ungapped to gapped
r = RnaSequence('UC-AG-UC-CG-A-')
u_to_g = r.gapMaps()[0]
#{0: 0, 1: 1, 2: 3, 3: 4, 4: 6, 5: 7, 6: 9, 7: 10, 8: 12}
ungapped_pairs = Pairs([(0,8),(1,6),(2,5)])
exp_pairs = Pairs([(0,12),(1,9),(3,7)])
self.assertEqualItems(adjust_pairs_from_mapping(ungapped_pairs,\
u_to_g), exp_pairs)
#gapped to ungapped
r = RnaSequence('UC-AG-UC-CG-A-')
g_to_u = r.gapMaps()[1]
#{0: 0, 1: 1, 3: 2, 4: 3, 6: 4, 7: 5, 9: 6, 10: 7, 12: 8}
gapped_pairs = Pairs([(0,12),(1,9),(3,7)])
exp_pairs = Pairs([(0,8),(1,6),(2,5)])
self.assertEqualItems(adjust_pairs_from_mapping(gapped_pairs,\
g_to_u), exp_pairs)
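# Illustrative sketch only: `gap_maps` and `remap_pairs` are hypothetical
# helpers. `gap_maps` rebuilds, from a gapped string, the two position
# dictionaries shown in the comments of test_adjust_pairs_from_mapping
# (the test itself takes them from RnaSequence.gapMaps()). `remap_pairs`
# translates each base through such a mapping, keeps None as None, and
# drops a pair whose base is missing from the mapping, matching what
# test_adjust_pairs_from_mapping_confl expects.

def gap_maps(seq, gap='-'):
    """Return (ungapped -> gapped, gapped -> ungapped) position dicts."""
    ungapped_to_gapped = {}
    for col, char in enumerate(seq):
        if char != gap:
            ungapped_to_gapped[len(ungapped_to_gapped)] = col
    gapped_to_ungapped = dict((v, k) for k, v in ungapped_to_gapped.items())
    return ungapped_to_gapped, gapped_to_ungapped

def remap_pairs(pairs, mapping):
    """Apply `mapping` to every base; skip pairs with an unmapped base."""
    result = []
    for pair in pairs:
        if any(x is not None and x not in mapping for x in pair):
            continue
        result.append(tuple(None if x is None else mapping[x] for x in pair))
    return result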
class PairsComparisonTests(TestCase):
"""Provides tests for comparing different Pairs objects"""
def test_pairs_intersection(self):
"""pairs_intersection: should work on simple case
"""
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([(1,12),(4,9),(5,8)])
self.assertEqualItems(pairs_intersection(p1,p2),[(4,9),(5,8)])
#works when one is empty
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([])
self.assertEqualItems(pairs_intersection(p1,p2),[])
#works also on lists (not Pairs)
p1 = [(3,10),(4,9),(5,8),(20,24)]
p2 = [(1,12),(4,9),(5,8)]
self.assertEqualItems(pairs_intersection(p1,p2),[(4,9),(5,8)])
def test_pairs_intersection_duplicates(self):
"""pairs_intersection: should work on flipped pairs and duplicates
"""
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([(10,3),(4,9),(5,8),(9,4),(4,9),(23,30)])
self.assertEqualItems(pairs_intersection(p1,p2),[(3,10),(4,9),(5,8)])
# Conflicts, duplicates, None, pseudoknots
p1 = Pairs([(3,10),(4,9),(5,8),(20,24),(22,26),(3,2),(9,4),(6,None)])
p2 = Pairs([(1,12),(4,9),(5,8)])
self.assertEqualItems(pairs_intersection(p1,p2),\
[(4,9),(5,8)])
def test_pairs_union(self):
"""pairs_union: should work on simple case
"""
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([(1,12),(4,9),(5,8)])
self.assertEqualItems(pairs_union(p1,p2),\
[(1,12),(3,10),(4,9),(5,8),(20,24)])
#works when one is empty
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([])
self.assertEqualItems(pairs_union(p1,p2),p1)
#works also on lists (not Pairs)
p1 = [(3,10),(4,9),(5,8),(20,24)]
p2 = [(1,12),(4,9),(5,8)]
self.assertEqualItems(pairs_union(p1,p2),\
[(1,12),(3,10),(4,9),(5,8),(20,24)])
def test_union_duplicates(self):
"""pairs_union: should work on flipped base pairs and duplicates
"""
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([(10,3),(4,9),(5,8),(9,4),(4,9),(23,30)])
self.assertEqualItems(pairs_union(p1,p2),\
[(3,10),(4,9),(5,8),(20,24),(23,30)])
# Conflicts, duplicates, None, pseudoknots
p1 = Pairs([(3,10),(4,9),(5,8),(20,24),(22,26),(3,2),(9,4),(6,None)])
p2 = Pairs([(1,12),(4,9),(5,8)])
self.assertEqualItems(pairs_union(p1,p2),\
[(1,12),(3,10),(4,9),(5,8),(20,24),(22,26),(2,3)])
def test_compare_pairs(self):
"""compare_pairs: should work on simple case"""
#all the same
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([(3,10),(4,9),(5,8),(20,24)])
self.assertEqual(compare_pairs(p1,p2),1)
#all different
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([(1,2),(3,4),(5,6)])
self.assertEqual(compare_pairs(p1,p2),0)
#one empty
p1 = Pairs([(3,10),(4,9),(5,8),(20,24)])
p2 = Pairs([])
self.assertEqual(compare_pairs(p1,p2),0)
#partially different
p1 = Pairs([(1,2),(3,4),(5,6),(7,8)])
p2 = Pairs([(1,2),(3,4),(9,10),(11,12)])
self.assertFloatEqual(compare_pairs(p1,p2),.33333333333333333)
#partially different
p1 = Pairs([(1,2),(3,4),(5,6)])
p2 = Pairs([(1,2),(3,4),(9,10)])
self.assertFloatEqual(compare_pairs(p1,p2),.5)
def test_compare_pairs_both_empty(self):
"""compare_pairs: should return 1.0 when both lists are empty
"""
p1 = Pairs([])
p2 = Pairs([])
self.assertEqual(compare_pairs(p1,p2),1)
def test_compare_pairs_weird(self):
"""compare_pairs: should handle conflicts, duplicates, pseudo, None
"""
#Should raise error on conflict
p1 = Pairs([(1,2),(3,4),(5,6),(2,None),(4,3),(None,None)])
p2 = Pairs([(1,2),(3,4),(9,10)])
self.assertRaises(ValueError, compare_pairs, p1, p2)
p1 = Pairs([(1,2),(3,4),(5,6),(4,3),(None,None),(10,None)])
p2 = Pairs([(1,2),(3,4),(9,10)])
self.assertFloatEqual(compare_pairs(p1,p2),.5)
p1 = Pairs([(1,8),(2,10),(7,3)])
p2 = Pairs([(1,8),(10,2),(3,7),(4,6)])
self.assertFloatEqual(compare_pairs(p1,p2), 0.75)
def test_compare_pairs_mapping(self):
"""compare_pairs_mapping: should work with correct mapping
"""
# pos in first seq, base, pos in second seq
#1 U 0
#2 C 1
#3 G 2
#4 A 3
# A 4
#5 C 5
#6 C 6
#7 U
#8 G 7
#all the same
p1 = Pairs([(3,6),(1,8)])
p2 = Pairs([(2,6),(0,7)])
mapping = {1:0,2:1,3:2,4:3,5:5,6:6,7:None, 8:7}
self.assertEqual(compare_pairs_mapping(p1,p2, mapping),1)
#all different
p1 = Pairs([(3,6),(1,8)])
p2 = Pairs([(1,5),(4,7)])
mapping = {1:0,2:1,3:2,4:3,5:5,6:6,7:None, 8:7}
self.assertEqual(compare_pairs_mapping(p1,p2, mapping),0)
#partially the same
p1 = Pairs([(5,6),(1,4),(2,7)])
p2 = Pairs([(5,6),(0,3),(4,7)])
self.assertEqual(compare_pairs_mapping(p1,p2, mapping),.5)
p1 = Pairs([(1,8),(2,7),(3,6),(4,5)])
p2 = Pairs([(0,7),(1,6),(2,5),(3,4)])
self.assertFloatEqual(compare_pairs_mapping(p1,p2, mapping),1/7)
#one empty
p1 = Pairs([(1,8),(2,7),(3,6),(4,5)])
p2 = []
self.assertEqual(compare_pairs_mapping(p1,p2, mapping),0)
#both empty
p1 = []
p2 = []
self.assertEqual(compare_pairs_mapping(p1,p2, mapping),1)
def test_compare_random_to_correct(self):
"""compare_random_to_correct: should return correct fraction
"""
p1 = Pairs([(1,8),(2,7),(3,6),(4,5)])
p2 = Pairs([(1,8)])
p3 = Pairs([(1,8), (2,7), (4,5)])
p4 = Pairs([(1,8),(2,7),(9,10),(11,12)])
self.assertFloatEqual(compare_random_to_correct(p2,p1),1)
self.assertFloatEqual(compare_random_to_correct(p3,p1),1)
self.assertFloatEqual(compare_random_to_correct(p4,p1),0.5)
self.assertFloatEqual(compare_random_to_correct([],p1),0)
self.assertFloatEqual(compare_random_to_correct(p2,[]),0)
self.assertFloatEqual(compare_random_to_correct([],[]),1)
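# The expected values in the compare_pairs tests above are consistent with
# a Jaccard-style score: shared pairs divided by all distinct pairs, after
# dropping entries that contain None and orienting each pair as (low, high).
# The helper below is a hypothetical sketch of that reading, not the
# cogent.struct implementation; in particular it does not reproduce the
# ValueError that compare_pairs raises on conflicting pairs.
def _sketch_compare_pairs(one, other):
    """Hypothetical helper: fraction of shared base pairs (Jaccard index)."""
    def normalize(pairs):
        # skip half-specified entries and make (4, 3) equal to (3, 4)
        return set(tuple(sorted(p)) for p in pairs if None not in p)
    first, second = normalize(one), normalize(other)
    if not first and not second:
        return 1.0  # two empty structures count as identical
    return float(len(first & second)) / len(first | second)

# e.g. _sketch_compare_pairs([(1,2),(3,4),(5,6)], [(1,2),(3,4),(9,10)])
# gives 0.5, matching test_compare_pairs.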
class GardnerMetricsTest(TestCase):
"""Tests for the metrics from Gardner & Giegerich 2004"""
def setUp(self):
"""setUp: setup method for all tests"""
self.true = Pairs([(0,40),(1,39),(2,38),(3,37),(10,20),\
(11,19),(12,18),(13,17),(26,33),(27,32)])
self.predicted = Pairs([(0,40),(1,39),(2,38),(3,37),(4,36),\
(5,35),(10,22),(11,20),(14,29),(15,28)])
self.seq = ['>seq1\n','agguugaaggggauccgauccacuccccggcuggucaaccu']
def test_conflicts(self):
"""all metrics should raise error when conflicts in one of the structs
"""
ref = Pairs([(1,6),(2,5),(3,10),(7,None),(None,None),(5,2),(1,12)])
pred = Pairs([(6,1),(10,11),(3,12)])
self.assertRaises(ValueError, sensitivity, ref, pred)
self.assertRaises(ValueError, sensitivity, pred, ref)
self.assertRaises(ValueError, selectivity, ref, pred)
self.assertRaises(ValueError, selectivity, pred, ref)
self.assertRaises(ValueError, approximate_correlation, ref, pred,\
self.seq)
self.assertRaises(ValueError, approximate_correlation, pred, ref,\
self.seq)
self.assertRaises(ValueError, correlation_coefficient, ref, pred,\
self.seq)
self.assertRaises(ValueError, correlation_coefficient, pred, ref,\
self.seq)
self.assertRaises(ValueError, mcc, ref, pred, self.seq)
self.assertRaises(ValueError, mcc, pred, ref, self.seq)
def test_get_all_pairs(self):
"""get_all_pairs: should return the number of possible pairs"""
seq = RnaSequence('UCAG-NACGU')
seq2 = RnaSequence('UAAG-CACGC')
self.assertEqual(get_all_pairs([seq], min_dist=4), 6)
self.assertEqual(get_all_pairs([seq2], min_dist=4), 4)
# when given multiple sequences, should average over all of them
self.assertEqual(get_all_pairs([seq,seq2], min_dist=4), 5)
# different min distance
self.assertEqual(get_all_pairs([seq], min_dist=2),10)
# error on invalid minimum distance
self.assertRaises(ValueError, get_all_pairs, [seq], min_dist=-2)
def test_get_counts(self):
"""get_counts: should work with all parameters"""
seq = RnaSequence('UCAG-NAUGU')
seq2 = RnaSequence('UAAG-CACGC')
p = Pairs([(1,8),(2,7)])
p2 = Pairs([(1,8),(2,6),(3,6),(4,9),])
exp = {'TP':1,'TN':0, 'FN':1,'FP':3,\
'FP_INCONS':0, 'FP_CONTRA':0, 'FP_COMP':0}
self.assertEqual(get_counts(p, p2), exp)
exp = {'TP':1,'TN':0, 'FN':1,'FP':3,\
'FP_INCONS':1, 'FP_CONTRA':1, 'FP_COMP':1}
self.assertEqual(get_counts(p, p2, split_fp=True), exp)
seq = RnaSequence('UCAG-NACGU')
exp = {'TP':1,'TN':7, 'FN':1,'FP':3,\
'FP_INCONS':1, 'FP_CONTRA':1, 'FP_COMP':1}
self.assertEqual(get_counts(p, p2, split_fp=True,\
sequences=[seq], min_dist=2), exp)
# check against compare_ct.pm
exp = {'TP':4,'TN':266, 'FN':6,'FP':6,\
'FP_INCONS':2, 'FP_CONTRA':2, 'FP_COMP':2}
seq = 'agguugaaggggauccgauccacuccccggcuggucaaccu'.upper()
self.assertEqual(get_counts(self.true, self.predicted, split_fp=True,\
sequences=[seq], min_dist=4), exp)
def test_extract_seqs(self):
"""extract_seqs: should handle different input formats"""
s1 = ">seq1\nACGUAGC\n>seq2\nGGUAGCG"
s2 = [">seq1","ACGUAGC",">seq2","GGUAGCG"]
s3 = ['ACGUAGC','GGUAGCG']
s4 = [RnaSequence('ACGUAGC'), RnaSequence('GGUAGCG')]
m1 = ModelSequence('ACGUAGC', Name='rna1',\
Alphabet=RNA.Alphabets.DegenGapped)
m2 = ModelSequence('GGUAGCG', Name='rna2',\
Alphabet=RNA.Alphabets.DegenGapped)
s5 = [m1, m2]
f = extract_seqs
self.assertEqual(f(s1), ['ACGUAGC', 'GGUAGCG'])
self.assertEqual(f(s2), ['ACGUAGC', 'GGUAGCG'])
self.assertEqual(f(s3), ['ACGUAGC', 'GGUAGCG'])
self.assertEqual(f(s4), ['ACGUAGC', 'GGUAGCG'])
self.assertEqual(f(s5), ['ACGUAGC', 'GGUAGCG'])
def test_sensitivity(self):
"""sensitivity: check against compare_ct.pm"""
sen = sensitivity(self.true,self.predicted)
self.assertEqual(sen, 0.4)
def test_sensitivity_general(self):
"""sensitivity: should work in general"""
ref = Pairs([(1,6),(2,5),(3,10)])
pred = Pairs([(6,1),(10,11),(3,12)])
# one good prediction
self.assertFloatEqual(sensitivity(ref, pred), 1/3)
# over-prediction not penalized
pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
self.assertFloatEqual(sensitivity(ref, pred), 1/3)
def test_sensitivity_dupl(self):
"""sensitivity: should handle duplicates, pseudo, None"""
ref = Pairs([(1,6),(2,5),(3,10),(7,None),(None,None),(5,2),(4,9)])
pred = Pairs([(6,1),(10,11),(3,12)])
self.assertFloatEqual(sensitivity(ref, pred), 0.25)
pred = Pairs([(6,1),(10,11),(3,12),(20,None),(None,None),(1,6)])
self.assertFloatEqual(sensitivity(ref, pred), 0.25)
def test_sensitivity_empty(self):
"""sensitivity: should work on empty Pairs"""
# both empty
self.assertFloatEqual(sensitivity(Pairs(), Pairs()), 1)
pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
# reference empty
self.assertFloatEqual(sensitivity(Pairs(), pred), 0)
# prediction empty
self.assertFloatEqual(sensitivity(pred, Pairs()), 0)
def test_selectivity(self):
"""selectivity: check against compare_ct.pm"""
sel = selectivity(self.true,self.predicted)
self.assertEqual(sel, 0.5)
def test_selectivity_general(self):
"""selectivity: should work in general"""
ref = Pairs([(1,6),(2,5),(10,13)])
pred = Pairs([(6,1),(3,4),(10,12)])
# one good prediction
self.assertFloatEqual(selectivity(ref, pred), 0.5)
# compatible over-predictions are not penalized, but the others are
pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
self.assertFloatEqual(selectivity(ref, pred), 0.25)
def test_selectivity_dupl(self):
"""selectivity: duplicates and Nones shouldn't influence the calc.
"""
ref = Pairs([(1,6),(2,5),(10,13),(6,1),(7,None),(None,None)])
pred = Pairs([(6,1),(3,4),(10,12)])
self.assertFloatEqual(selectivity(ref, pred), 0.5)
def test_selectivity_empty(self):
"""selectivity: should handle empty reference/predicted structure"""
# both empty
self.assertFloatEqual(selectivity(Pairs(), Pairs()), 1)
pred = Pairs([(6,1),(10,11),(3,12),(13,20),(14,19),(15,18)])
# reference empty
self.assertFloatEqual(selectivity(Pairs(), pred), 0)
# prediction empty
self.assertFloatEqual(selectivity(pred, Pairs()), 0)
def test_approximate_correlation(self):
"""approximate_correlation: check against compare_ct.pm"""
self.assertFloatEqual(approximate_correlation(self.true,\
self.predicted, seqs=self.seq), 0.45)
def test_correlation_coefficient(self):
"""correlation_coefficient: check against compare_ct.pm"""
self.assertFloatEqual(correlation_coefficient(self.true,\
self.predicted, seqs=self.seq, min_dist=4), 0.42906394)
def test_cc_bad_pred(self):
"""correlation_coefficient: should give 0 when TP=0"""
ref = Pairs([(1,7),(2,5)])
pred = Pairs([(0,8)])
seqs = ['CAUCGAUUG']
self.assertEqual(correlation_coefficient(ref, pred, seqs=seqs), 0.0)
def test_mcc(self):
"""mcc: check against compare_ct.pm"""
res = mcc(self.true,self.predicted,self.seq, min_dist=4)
self.assertFloatEqual(res, 0.42906394)
def test_all_metrics(self):
"""all_metrics: check against compare_ct.pm"""
exp = {'SENSITIVITY':0.4, 'SELECTIVITY':0.5, 'AC':0.45,\
'CC':0.42906394, 'MCC':0.42906394}
obs = all_metrics(self.true, self.predicted, seqs=self.seq, min_dist=4)
self.assertEqualItems(obs.keys(), exp.keys())
for k in exp:
self.assertFloatEqual(obs[k], exp[k])
def test_get_counts_pseudo(self):
"""get_counts: should work when pseudo in ref -> classification off"""
# pairs that would normally be compatible, are now contradicting
ref = Pairs([(0,8),(1,7),(4,10)])
pred = Pairs([(0,8),(3,6),(4,10)])
seq = 'GACUGUGUCAU'
exp = {'TP':2,'TN':13-2-1, 'FN':1,'FP':1,\
'FP_INCONS':0, 'FP_CONTRA':1, 'FP_COMP':0}
self.assertEqual(get_counts(ref, pred, split_fp=True,\
sequences=[seq], min_dist=4), exp)
def test_all_metrics_pseudo(self):
"""all_metrics: pseudoknot in ref, check against compare_ct.pm"""
ref = Pairs([(0,8),(1,7),(4,10)])
pred = Pairs([(0,8),(3,6),(4,10)])
seq = 'GACUGUGUCAU'
exp = {'SENSITIVITY':0.6666667, 'SELECTIVITY':0.6666667,\
'AC':0.6666667, 'CC':0.57575758, 'MCC':0.57575758}
obs = all_metrics(ref, pred, seqs=[seq], min_dist=4)
self.assertEqualItems(obs.keys(), exp.keys())
for k in exp:
self.assertFloatEqual(obs[k], exp[k])
def test_all_metrics_weird_input(self):
"""all_metrics: should work when ref or prediction empty or no seqs"""
ref = Pairs([(3,10)])
pred = Pairs()
seqs = ['UACGUAGCUAGCUAGCUACG']
obs = all_metrics(ref, pred, seqs=[seqs], min_dist=4)
exp = {'SENSITIVITY':0, 'SELECTIVITY':0,\
'AC':0, 'CC':0, 'MCC':0}
for k in exp:
self.assertFloatEqual(obs[k], exp[k])
ref = Pairs()
pred = Pairs()
seqs = ['UACGUAGCUAGCUAGCUACG']
obs = all_metrics(ref, pred, seqs=[seqs], min_dist=4)
exp = {'SENSITIVITY':1, 'SELECTIVITY':1,\
'AC':1, 'CC':1, 'MCC':1}
for k in exp:
self.assertFloatEqual(obs[k], exp[k])
ref = Pairs([(3,10)])
pred = Pairs([(1,12)])
seqs = ['UACGUAGCUAGCUAGCUACG']
self.assertRaises(ValueError, all_metrics, ref, pred, seqs="",\
min_dist=4)
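# The expected values in GardnerMetricsTest can be reproduced from the
# confusion counts that get_counts(..., split_fp=True) returns, if false
# positives that are merely compatible with the reference (FP_COMP) are not
# counted against selectivity or MCC, and AC is the mean of sensitivity and
# selectivity. The helper below is a hypothetical sketch of that reading,
# not the cogent.struct implementation; unlike the real functions it returns
# 0.0 for every zero denominator instead of special-casing empty structures.
def _sketch_metrics_from_counts(counts):
    """Hypothetical helper: (sensitivity, selectivity, AC, MCC) from counts."""
    from math import sqrt
    tp, tn, fn = counts['TP'], counts['TN'], counts['FN']
    # only inconsistent and contradicting false positives are penalized
    fp = counts['FP'] - counts['FP_COMP']
    sens = float(tp) / (tp + fn) if tp + fn else 0.0
    sel = float(tp) / (tp + fp) if tp + fp else 0.0
    ac = (sens + sel) / 2
    denom = sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, sel, ac, mcc

# e.g. with the compare_ct.pm counts from test_get_counts
# ({'TP':4, 'TN':266, 'FN':6, 'FP':6, 'FP_COMP':2, ...}) this gives
# roughly (0.4, 0.5, 0.45, 0.42906), matching test_all_metrics.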
if __name__ == "__main__":
main()
#!/usr/bin/env python
"""Tests for ViennaStructure and related classes.
"""
from cogent.util.unit_test import TestCase, main
from cogent.struct.rna2d import ViennaStructure, Vienna, Pairs,\
Partners, EmptyPartners, WussStructure, wuss_to_vienna, StructureNode, \
Stem, classify, PairError
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class RnaAlphabet(object):
Pairs = {
('A','U'): True, #True vs False for 'always' vs 'sometimes' pairing
('C','G'): True,
('G','C'): True,
('U','A'): True,
('G','U'): False,
('U','G'): False,
}
class Rna(str):
Alphabet = RnaAlphabet
class StemTests(TestCase):
"""Tests for the Stem object"""
def test_init_empty(self):
"""Stem should init ok with no parameters."""
s = Stem()
self.assertEqual((s.Start, s.End, s.Length), (None, None, 0))
def test_init(self):
"""Stem should init with Start, End, and Length"""
s = Stem(Length=3)
self.assertEqual((s.Start, s.End, s.Length), (None, None, 3))
#should set Length to 0 if not supplied and unpaired
s = Stem(Start=3)
self.assertEqual((s.Start, s.End, s.Length), (3, None, 0))
s = Stem(End=3)
self.assertEqual((s.Start, s.End, s.Length), (None, 3, 0))
#should set Length to 1 if not supplied and paired
s = Stem(Start=3, End=5)
self.assertEqual((s.Start, s.End, s.Length), (3, 5, 1))
#parameters should be in order Start, End, Length
#note that you're allowed to initialize an invalid stem, like the
#following one (can't have 7 pairs between 3 and 5); this is often
#useful when building up a tree that you plan to renumber().
s = Stem(3, 5, 7)
self.assertEqual((s.Start, s.End, s.Length), (3, 5, 7))
#not allowed more than 3 parameters
self.assertRaises(TypeError, Stem, 1, 2, 3, 4)
def test_len(self):
"""Stem len() should return self.Length"""
s = Stem()
self.assertEqual(len(s), 0)
s.Length = 5
self.assertEqual(len(s), 5)
s.Length = None
self.assertRaises(TypeError, len, s)
def test_getitem(self):
"""Stem getitem should return a Stem object for the ith pair in the stem"""
s = Stem()
self.assertRaises(IndexError, s.__getitem__, 0)
s.Start = 5
s.End = 8
s.Length = 1
pairs = list(s)
self.assertEqual(pairs, [Stem(5, 8, 1)])
s.Length = 2
pairs = list(s)
self.assertEqual(pairs, [Stem(5,8,1),Stem(6,7,1)])
#WARNING: Stem will not complain when iterating over an invalid helix,
#as per the one below
s.Length = 5
pairs = list(s)
self.assertEqual(pairs, [Stem(5,8,1),Stem(6,7,1),Stem(7,6,1),\
Stem(8,5,1), Stem(9,4,1)])
def test_cmp(self):
"""Stems should compare equal when the data is the same"""
self.assertEqual(Stem(1,2,3), Stem(1,2,3))
self.assertNotEqual(Stem(1,2,5), Stem(1,2,3))
a = Stem(1, 10, 2)
b = Stem(2, 8, 1)
c = Stem(15, 20, 2)
l = [c, b, a]
l.sort()
self.assertEqual(l, [a,b,c])
def test_extract(self):
"""Stems extract should return a list of 'pairs' from a sequence"""
seq = 'UGAGAUUUUCU'
s = Stem(1, 10, 3)
self.assertEqual(s.extract(seq), [('G','U'),('A','C'),('G','U')])
s = Stem(0, 1)
self.assertEqual(s.extract(seq), [('U','G')])
#should put in None if either position hasn't been specified
s = Stem(5)
self.assertEqual(s.extract(seq), [('U', None)])
s = Stem()
self.assertEqual(s.extract(seq), [(None, None)])
#should raise IndexError if the stem contains bases outside the seq
s = Stem(50, 60, 5)
self.assertRaises(IndexError, s.extract, seq)
def test_hash(self):
"""Stems hash should allow use as dict keys if unchanged"""
#WARNING: if you change the Stem after putting it in a dict, all bets
#are off as to behavior. Don't do it!
s = Stem(1, 5, 2)
t = Stem(1, 5, 2)
u = Stem(2, 4, 6)
v = Stem(2, 4, 6)
w = Stem(2, 4, 4)
d = {}
assert s is not t
for i in (s, t, u, v, w):
if i in d:
d[i] += 1
else:
d[i] = 1
self.assertEqual(len(d), 3)
self.assertEqual(d[Stem(1, 5, 2)], 2)
self.assertEqual(d[Stem(2, 4, 6)], 2)
self.assertEqual(d[Stem(2, 4, 4)], 1)
assert Stem(1,5) not in d
def test_str(self):
"""Stem str should print Start, End and Length as a tuple"""
self.assertEqual(str(Stem()), '(None,None,0)')
self.assertEqual(str(Stem(3)), '(3,None,0)')
self.assertEqual(str(Stem(3,4)), '(3,4,1)')
self.assertEqual(str(Stem(3,4,5)), '(3,4,5)')
def test_nonzero(self):
"""Stem nonzero should return True if paired (length > 0)"""
assert not Stem()
assert not Stem(1)
assert Stem(7, 10)
assert Stem(1, 7, 1)
assert Stem(2, 8, 3)
#go strictly by length; don't check if data is invalid
assert Stem(0, 0)
assert Stem(5, None, 10)
assert not Stem(5, 7, -1)
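# Judging by test_getitem and test_extract above, a Stem(Start, End, Length)
# stands for the pairs (Start+i, End-i) for i in range(Length). The helpers
# below are a hypothetical sketch of that mapping, not Stem's own code; like
# Stem, they do not check that the helix is valid.
def _sketch_stem_pairs(start, end, length):
    """Hypothetical helper: base pairs covered by a helix."""
    return [(start + i, end - i) for i in range(length)]

def _sketch_stem_extract(seq, start, end, length):
    """Hypothetical helper: the (base, base) tuples for a helix in seq."""
    return [(seq[i], seq[j]) for i, j in _sketch_stem_pairs(start, end, length)]

# e.g. _sketch_stem_extract('UGAGAUUUUCU', 1, 10, 3)
# gives [('G','U'), ('A','C'), ('G','U')], as in test_extract.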
class PartnersTests(TestCase):
"""Tests for Partners object"""
def test_init(self):
"""Partners should init with empty list and stay free of conflicts"""
self.assertEqual(Partners([]),[])
empty = Partners([None]*6)
self.assertEqual(empty,[None,None,None,None,None,None])
self.assertRaises(ValueError,empty.__setitem__,2,2)
empty[2] = 3
self.assertEqual(empty,[None,None,3,2,None,None])
empty[3] = 1
self.assertEqual(empty,[None,3,None,1,None,None])
empty[3] = 5
self.assertEqual(empty,[None,None,None,5,None,3])
empty[1] = None
self.assertEqual(empty,[None,None,None,5,None,3])
def test_toPairs(self):
"""Partners toPairs() should return a Pairs object"""
p = Partners([None,3,None,1,5,4])
self.assertEqualItems(p.toPairs(),[(1,3),(4,5)])
assert isinstance(p.toPairs(),Pairs)
self.assertEqual(Partners([None]*10).toPairs(),[])
def test_not_implemented(self):
"""Partners not_implemented should raise error for 'naughty' methods"""
p = Partners([None,3,1,5,4])
self.assertRaises(NotImplementedError,p.pop)
self.assertRaises(NotImplementedError,p.sort)
self.assertRaises(NotImplementedError,p.__delitem__,3)
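# A Partners list stores, at each position, the index of that position's
# partner (or None), as the toPartners expectations in PairsTests show. The
# function below is a hypothetical sketch of that conversion, not the
# library's Pairs.toPartners; it assumes `offset` is added to every index
# and that a position paired to two different partners is an error.
def _sketch_pairs_to_partners(pairs, length, offset=0):
    """Hypothetical helper: build a partners list from (i, j) pairs."""
    partners = [None] * length
    for up, down in pairs:
        if up is None or down is None:
            continue  # half-specified pairs carry no partner information
        up, down = up + offset, down + offset
        for pos, other in ((up, down), (down, up)):
            if partners[pos] is not None and partners[pos] != other:
                raise ValueError("Conflicting pair at position %d" % pos)
        partners[up] = down
        partners[down] = up
    return partners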
class PairsTests(TestCase):
"""Tests for Pairs object"""
def setUp(self):
"""Pairs SetUp method for all tests"""
self.Empty = Pairs([])
self.OneList = Pairs([[1,2]])
self.OneTuple = Pairs([(1,2)])
self.MoreLists = Pairs([[2,4],[3,9],[6,36],[7,49]])
self.MoreTuples = Pairs([(2,4),(3,9),(6,36),(7,49)])
self.MulNoOverlap = Pairs([(1,10),(2,9),(3,7),(4,12)])
self.MulOverlap = Pairs([(1,2),(2,3)])
self.Doubles = Pairs([[1,2],[1,2],[2,3],[1,3]])
self.Undirected = Pairs([(2,1),(6,4),(1,7),(8,3)])
self.UndirectedNone = Pairs([(5,None),(None,3)])
self.UndirectedDouble = Pairs([(2,1),(1,2)])
self.NoPseudo = Pairs([(1,20),(2,19),(3,7),(4,6),(10,15),(11,14)])
self.NoPseudo2 = Pairs([(1,3),(4,6)])
#((.(.)).)
self.p0 = Pairs([(0,6),(1,5),(3,8)])
#(.((..(.).).))
self.p1 = Pairs([(0,9),(2,12),(3,10),(5,7)])
#((.(.(.).)).)
self.p2 = Pairs([(0,10),(1,9),(3,12),(5,7)])
#((.((.(.)).).))
self.p3 = Pairs([(0,9),(1,8),(3,14),(4,13),(6,11)])
#(.(((.((.))).)).(((.((((..))).)))).)
self.p4 = Pairs([(0,35),(2,11),(3,10),(4,9),(6,14),(7,13),(16,28),\
(17,27),(18,26),(20,33),(21,32),(22,31),(23,30)])
#(.((.).))
self.p5 = Pairs([(0,5),(2,8),(3,7)])
self.p6 = Pairs([(0,19),(2,6),(3,5),(8,14),(9,13),(10,12),\
(16,22),(17,21)])
self.p7 = Pairs([(0,20),(2,6),(3,5),(8,14),(9,10),(11,16),(12,15),\
(17,23),(18,22)])
def test_init(self):
"""Pairs should initialize with both lists and tuples"""
self.assertEqual(self.Empty,[])
self.assertEqual(self.OneList,[[1,2]])
self.assertEqual(self.OneTuple,[(1,2)])
self.assertEqual(self.MulNoOverlap,[(1,10),(2,9),(3,7),(4,12)])
self.assertEqual(self.MulOverlap,[(1,2),(2,3)])
def test_toPartners(self):
"""Pairs toPartners() should return a Partners object"""
a = Pairs([(1,5),(3,4),(6,9),(7,8)]) #normal
b = Pairs([(0,4),(2,6)]) #pseudoknot
c = Pairs([(1,6),(3,6),(4,5)]) #conflict
self.assertEqual(a.toPartners(10),[None,5,None,4,3,1,9,8,7,6])
self.assertEqual(a.toPartners(13,3),\
[None,None,None,None,8,None,7,6,4,12,11,10,9])
assert isinstance(a.toPartners(10),Partners)
self.assertEqual(b.toPartners(7),[4,None,6,None,0,None,2])
self.assertRaises(ValueError,c.toPartners,7)
self.assertEqual(c.toPartners(7,strict=False),[None,None,None,6,5,4,3])
#raises an error when trying to insert something at non-existing indices
self.assertRaises(IndexError,c.toPartners,0)
def test_toVienna(self):
"""Pairs toVienna() should return a ViennaStructure if possible"""
a = Pairs([(1,5),(3,4),(6,9),(7,8)]) #normal
b = Pairs([(0,4),(2,6)]) #pseudoknot
c = Pairs([(1,6),(3,6),(4,5)]) #conflict
d = Pairs([(1,6),(3,None)])
e = Pairs([(1,9),(8,2),(7,3)]) #not directed
f = Pairs([(1,6),(2,5),(10,15),(14,11)]) # not directed
self.assertEqual(a.toVienna(10),'.(.())(())')
self.assertEqual(a.toVienna(13,offset=3),'....(.())(())')
self.assertRaises(PairError,b.toVienna,7) #pseudoknot NOT accepted
self.assertRaises(Exception,b.toVienna,7) #old test for exception
self.assertRaises(ValueError,c.toVienna,7)
#pairs containing None are being skipped
self.assertEqual(d.toVienna(7),'.(....)')
#raises error when trying to insert at non-existing indices
self.assertRaises(IndexError,a.toVienna,3)
self.assertEqual(Pairs().toVienna(3),'...')
#test when passing in the sequence
self.assertEqual(a.toVienna('ACGUAGCUAG'),'.(.())(())')
self.assertEqual(a.toVienna(Rna('AACCGGUUAGCUA'), offset=3),\
'....(.())(())')
self.assertEqual(e.toVienna(10),'.(((...)))')
self.assertEqual(f.toVienna(20),'.((..))...((..))....')
def test_tuples(self):
"""Pairs tuples() should transform the elements of list to tuples"""
x = Pairs([])
x.tuples()
assert x == []
x = Pairs([[1,2],[3,4]])
x.tuples()
assert x == [(1,2),(3,4)]
x = Pairs([(1,2),(3,4)])
x.tuples()
assert x == [(1,2),(3,4)]
assert x != [[1,2],[3,4]]
def test_unique(self):
"""Pairs unique() should remove duplicate occurrences of certain tuples"""
self.assertEqual(self.Empty.unique(),[])
self.assertEqual(self.MoreTuples.unique(),self.MoreTuples)
self.assertEqual(self.Doubles.unique(),Pairs([(1,2),(2,3),(1,3)]))
def test_directed(self):
"""Pairs directed() should change all pairs so that a}]',-0.01)
self.WussTwoHelix = WussStructure('{[.]}(<>).',1.11)
self.WussThreeHelix = WussStructure('::(<<({__}),,([(__)])-->>)')
self.WussPseudo = WussStructure('<<__AA>>_aa::')
def test_wuss_toPairs(self):
"""WussStructure toPairs() should return a valid Pairs object"""
self.assertEqual(self.WussNoPairs.toPairs(),[])
self.assertEqualItems(self.WussOneHelix.toPairs(),\
[(0,12),(2,11),(3,10),(4,7)])
self.assertEqualItems(self.WussTwoHelix.toPairs(),\
[(0,4),(1,3),(5,8),(6,7)])
self.assertEqualItems(self.WussThreeHelix.toPairs(),\
[(2,25),(3,24),(4,23),(5,10),(6,9),(13,20),(14,19),(15,18)])
self.assertEqualItems(self.WussPseudo.toPairs(),\
[(0,7),(1,6)])
def test_wuss_toPartners(self):
"""WussStructure toPartners() should return valid Partners object"""
self.assertEqual(self.WussNoPairs.toPartners(),[None]*6)
self.assertEqualItems(self.WussThreeHelix.toPartners(),\
[None,None,25,24,23,10,9,None,None,6,5,None,None,20,19,\
18,None,None,15,14,13,None,None,4,3,2])
self.assertEqualItems(self.WussPseudo.toPartners(),\
[7,6,None,None,None,None,1,0,None,None,None,None,None])
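# Dot-bracket (Vienna) strings like those in Rna2dTests can be turned into
# base pairs with a stack: push the index of each '(' and pop it at the
# matching ')'. This is a hypothetical sketch of that standard technique,
# not ViennaStructure.toPairs; it expects the bare structure string without
# an energy annotation and treats every other character as unpaired.
def _sketch_vienna_to_pairs(structure):
    """Hypothetical helper: parse a dot-bracket string into sorted pairs."""
    stack, pairs = [], []
    for i, char in enumerate(structure):
        if char == '(':
            stack.append(i)
        elif char == ')':
            if not stack:
                raise ValueError("Unmatched ')' at position %d" % i)
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("Unmatched '(' at position %d" % stack[-1])
    return sorted(pairs)

# e.g. _sketch_vienna_to_pairs('((.))(()).') gives [(0,4), (1,3), (5,8), (6,7)].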
class Rna2dTests(TestCase):
def test_Vienna(self):
"""Vienna should initialize from several formats"""
self.NoPairs = Vienna('.......... (0.0)')
self.OneHelix = Vienna('((((())))) (-1e-02)')
self.TwoHelix = Vienna('((.))(()). \t(1.11)')
self.ThreeHelix = Vienna('(((((..))..(((..)))..)))')
self.GivenEnergy = Vienna('((.))',0.1)
self.TwoEnergies = Vienna('((.)) (4.6)',2.1)
self.assertEqual(self.NoPairs, '..........')
self.assertEqual(self.NoPairs.Energy, 0.0)
self.assertEqual(self.OneHelix, '((((()))))')
self.assertEqual(self.OneHelix.Energy, -1e-2)
self.assertEqual(self.TwoHelix, '((.))(()).')
self.assertEqual(self.TwoHelix.Energy, 1.11)
self.assertEqual(self.ThreeHelix, '(((((..))..(((..)))..)))')
self.assertEqual(self.ThreeHelix.Energy, None)
self.assertEqual(self.GivenEnergy.Energy,0.1)
self.assertEqual(self.TwoEnergies.Energy,2.1)
def test_EmptyPartners(self):
"""EmptyPartners should return list of 'None's of given length"""
self.assertEqual(EmptyPartners(0),[])
self.assertEqual(EmptyPartners(1),[None])
self.assertEqual(EmptyPartners(10),[None]*10)
def test_wuss_to_vienna(self):
"""wuss_to_vienna() should transform Wuss into Vienna"""
empty = WussStructure('.....')
normal = WussStructure('[.{[<...}}}}')
pseudo = WussStructure('[..AA..]..aa')
self.assertEqual(wuss_to_vienna(normal),'(.(((...))))')
self.assertEqual(wuss_to_vienna(empty),'.....')
self.assertEqual(wuss_to_vienna(pseudo),'(......)....')
def test_classify(self):
"""classify() should classify valid structures correctly"""
Empty = ''
NoPairs = '.....'
OneHelix = '((((()))))'
ManyHelices = '(..(((...)).((.(((((..))).)))..((((..))))))...)'
Ends = '..(.)..'
FirstEnd = '..((()))'
LastEnd = '((..((.))))...'
Internal = '(((...)))..((.)).'
#following structure is from p 25 of Eddy's WUSS description manual
Eddy = '..((((.(((...)))...((.((....))..)).)).))'
structs = [Empty, NoPairs, OneHelix, ManyHelices, Ends, \
FirstEnd, LastEnd, Internal, Eddy]
EmptyResult = ''
NoPairsResult = 'EEEEE'
OneHelixResult = 'SSSSSSSSSS'
ManyHelicesResult = 'SBBSSSLLLSSJSSBSSSSSLLSSSBSSSJJSSSSLLSSSSSSBBBS'
EndsResult = 'EESLSEE'
FirstEndResult = 'EESSSSSS'
LastEndResult = 'SSBBSSLSSSSEEE'
InternalResult = 'SSSLLLSSSFFSSLSSE'
#expected classification of Eddy's structure (p 25 of the WUSS manual)
EddyResult = 'EESSSSJSSSLLLSSSJJJSSBSSLLLLSSBBSSJSSBSS'
results = [EmptyResult, NoPairsResult, OneHelixResult,
ManyHelicesResult, EndsResult, FirstEndResult, LastEndResult,
InternalResult, EddyResult]
for s, r in zip(structs, results):
c = classify(s)
self.assertEqual(c, r)
long_struct = ".((((((((((((((.((((((..((((.....)))))))))).))..))))))))))))....(((.((((.((((((((......((((((.((..(((((((....)))).)))..))))))))...))))))))...........(((((.(..(((((((((......((((((((((((.........))))))))))))....))))).))))..)..)))))..(((((((((((((((((((......(((((((((((((((((((((((((((((((...(((.......((((((((........)))))))).......)))...))))))))))))))))))))))))))))))).((((........(((((((((((((((((((...))))))))))))))))))).......)))).....((((((((((((((((((((((((((((((.(((...))).)))))))))))))))))))))))...........))))))).))))))))))))))))))).....)))).)))......"
#compare standalone method with classification from tree
c = classify(long_struct)
d = ViennaStructure(long_struct).toTree().classify()
self.assertEqual(c,d)
#Error is raised when trying to classify invalid structures
invalid_structure = '(((..)).))))(...)(...'
self.assertRaises(IndexError, classify, invalid_structure)
class ViennaNodeTests(TestCase):
"""Tests of the ViennaNode class."""
def setUp(self):
"""Instantiate some standard ViennaNodes."""
self.EmptyStr = ''
self.NoPairsStr = '.....'
self.OneHelixStr = '((((()))))'
self.ManyHelicesStr = '(..(((...)).((.(((((..))).)))..((((..))))))...)'
self.EndsStr = '..(.)..'
self.FirstEndStr = '..((()))'
self.LastEndStr = '((..((.))))...'
self.InternalStr = '(((...)))..((.)).'
#following structure is from p 25 of Eddy's WUSS description manual
self.EddyStr = '..((((.(((...)))...((.((....))..)).)).))'
#add in the tree versions by deleting trailing 'Str'
for s in self.__dict__.keys():
if s.endswith('Str'):
self.__dict__[s[:-3]] = \
ViennaStructure(self.__dict__[s]).toTree()
def test_str(self):
"""ViennaNode str should return Vienna-format string"""
for s in [self.EmptyStr, self.NoPairsStr, self.OneHelixStr,
self.ManyHelicesStr, self.EndsStr, self.InternalStr]:
self.assertEqual(str(ViennaStructure(s).toTree()), s)
#test with multiple-base helix in a node
r = StructureNode()
r.append(StructureNode())
r.append(StructureNode(Data=Stem(1,7,5)))
r[1].append(StructureNode())
r.append(StructureNode())
r.append(StructureNode())
r.renumber()
self.assertEqual(str(r), '.(((((.)))))..')
def test_classify(self):
"""ViennaNode classify should return correct classification string"""
self.assertEqual(self.Empty.classify(), '')
self.assertEqual(self.NoPairs.classify(), 'EEEEE')
self.assertEqual(self.OneHelix.classify(), 'SSSSSSSSSS')
self.assertEqual(self.ManyHelices.classify(), \
'SBBSSSLLLSSJSSBSSSSSLLSSSBSSSJJSSSSLLSSSSSSBBBS')
self.assertEqual(self.Ends.classify(), 'EESLSEE')
self.assertEqual(self.FirstEnd.classify(), 'EESSSSSS')
self.assertEqual(self.LastEnd.classify(), 'SSBBSSLSSSSEEE')
self.assertEqual(self.Internal.classify(), 'SSSLLLSSSFFSSLSSE')
self.assertEqual(self.Eddy.classify(), \
'EESSSSJSSSLLLSSSJJJSSBSSLLLLSSBBSSJSSBSS')
def test_renumber(self):
"""ViennaNode renumber should assign correct numbers to nodes"""
#should have no effect on empty structure
se = self.Empty
self.assertEqual(se.renumber(5), 5)
self.assertEqual((se.Start, se.End, se.Length), (None, None, 0))
#with no pairs, should number consecutively
sn = self.NoPairs
self.assertEqual(sn.renumber(5), 10)
self.assertEqual([i.Start for i in sn], [5, 6, 7, 8, 9])
self.assertEqual([i.End for i in sn], [None]*5)
self.assertEqual([i.Length for i in sn], [0]*5)
#spot checks on a complex structure
sm = self.ManyHelices
self.assertEqual(sm.renumber(5), 52)
s0 = sm[0]
self.assertEqual((s0.Start, s0.End, s0.Length), (5, 51, 1))
s5 = sm[0][2][2][0]
self.assertEqual(len(s5), 2)
self.assertEqual((s5.Start, s5.End, s5.Length), (18, 33, 1))
s6 = s5[0]
self.assertEqual((s6.Start, s6.End, s6.Length), (19,None,0))
#test with some helices of different lengths
root = StructureNode()
root.extend([StructureNode() for i in range(3)])
root.insert(1, StructureNode(Data=Stem(3, 7, 5)))
root.insert(3, StructureNode(Data=Stem(6,2,2)))
root.append(StructureNode())
self.assertEqual(root.renumber(0), 18)
self.assertEqual(len(root), 6)
curr = root[0]
self.assertEqual((curr.Start,curr.End,curr.Length), (0, None, 0))
curr = root[1]
self.assertEqual((curr.Start, curr.End, curr.Length), (1, 10, 5))
curr = root[2]
self.assertEqual((curr.Start, curr.End, curr.Length), (11, None, 0))
curr = root[3]
self.assertEqual((curr.Start, curr.End, curr.Length), (12, 15, 2))
curr = root[4]
self.assertEqual((curr.Start, curr.End, curr.Length), (16, None, 0))
curr = root[5]
self.assertEqual((curr.Start, curr.End, curr.Length), (17, None, 0))
def test_unpair(self):
"""StructureNode unpair should break a base pair and add correct nodes"""
i = self.Internal
self.assertEqual(i[0].unpair(), True)
self.assertEqual(str(i), '.((...))...((.)).')
e = self.Ends
self.assertEqual(e[0].unpair(), False)
self.assertEqual(str(e), self.EndsStr)
o = self.OneHelix
self.assertEqual(o[0].unpair(), True)
self.assertEqual(str(o), '.(((()))).')
self.assertEqual(o[1][0][0].unpair(), True)
self.assertEqual(str(o), '.((.().)).')
self.assertEqual(o[1].unpair(), True)
self.assertEqual(str(o), '..(.().)..')
self.assertEqual(o[2][1].unpair(), True)
self.assertEqual(str(o), '..(....)..')
self.assertEqual(o[2].unpair(), True)
self.assertEqual(str(o), '..........')
#test with multiple bases in helix
r = StructureNode()
r.append(StructureNode(Data=Stem(0,0, 5)))
r.renumber()
self.assertEqual(str(r), '((((()))))')
self.assertEqual(r[0].unpair(), True)
self.assertEqual(str(r), '.(((()))).')
def test_pairBefore(self):
"""StructureNode pairBefore should make a pair before the current node"""
#shouldn't be able to make any pairs if everything is paired already
o = self.OneHelix
for i in o:
self.assertEqual(i.pairBefore(), False)
n = self.NoPairs
#shouldn't be able to pair at the start...
self.assertEqual(n[0].pairBefore(), False)
#...or at the end...
self.assertEqual(n[-1].pairBefore(), False)
#...but should work OK in the middle
self.assertEqual(n[1].pairBefore(), True)
self.assertEqual(str(n), '(.)..')
e = self.Ends
self.assertEqual(e[2].pairBefore(), True)
self.assertEqual(e[1].pairBefore(), True)
self.assertEqual(str(e), '(((.)))')
self.assertEqual((e[0].Start, e[0].End, e[0].Length), (0,6,1))
def test_pairAfter(self):
"""StructureNode pairAfter should create pairs after a node"""
n = self.NoPairs
self.assertEqual(n.pairAfter(), True)
self.assertEqual(str(n), '(...)')
self.assertEqual(n[0].pairAfter(), True)
self.assertEqual(str(n), '((.))')
self.assertEqual(n[0][0].pairAfter(), False)
self.assertEqual(str(n), '((.))')
curr = n[0][0]
#check that child is correct
self.assertEqual(len(curr), 1)
self.assertEqual((curr[0].Start, curr[0].End, curr[0].Length), \
(2,None,0))
#check that pair is correct
self.assertEqual((curr.Start, curr.End, curr.Length), (1,3,1))
m = self.ManyHelices
n = m[0][2][0][0]
self.assertEqual(n.pairAfter(), True)
self.assertEqual(str(m), \
'(..((((.))).((.(((((..))).)))..((((..))))))...)')
self.assertEqual(n[0].pairAfter(), False)
def test_pairChildren(self):
"""StructureNode PairChildren should make the correct pairs"""
n = ViennaStructure('.....').toTree() #same as self.NoPairs
self.assertEqual(n.pairChildren(0, 4), True)
self.assertEqual(str(n), '(...)')
n = ViennaStructure('.....').toTree() #same as self.NoPairs
self.assertEqual(n.pairChildren(1, 4), True)
self.assertEqual(str(n), '.(..)')
n = ViennaStructure('.....').toTree() #same as self.NoPairs
#can't pair same object
self.assertEqual(n.pairChildren(1, 1), False)
self.assertEqual(str(n), '.....')
self.assertEqual(n.pairChildren(1, -1), True)
self.assertEqual(str(n), '.(..)')
#can't pair something already paired
self.assertEqual(n.pairChildren(0,1), False)
#IndexError if out of range
self.assertRaises(IndexError, n.pairChildren, 0, 5)
n.append(StructureNode())
n.append(StructureNode())
n.renumber()
self.assertEqual(str(n), '.(..)..')
self.assertEqual(n.pairChildren(0, -2), True)
self.assertEqual(str(n), '((..)).')
def test_expand(self):
"""StructureNode expand should extend helices."""
s = StructureNode(Data=(Stem(1, 10, 3)))
s.append(StructureNode())
#need to make a root node for consistency
r = StructureNode()
r.append(s)
self.assertEqual(str(s), '(((.)))')
s.expand()
self.assertEqual(str(s), '(((.)))')
self.assertEqual((s.Start, s.End, s.Length), (1, 10, 1))
n = s[0]
self.assertEqual((n.Start, n.End, n.Length), (2, 9, 1))
n = s[0][0]
self.assertEqual((n.Start, n.End, n.Length), (3, 8, 1))
n = s[0][0][0]
self.assertEqual((n.Start, n.End, n.Length), (None, None, 0))
s.renumber()
self.assertEqual((s.Start, s.End, s.Length), (0, 6, 1))
n = s[0]
self.assertEqual((n.Start, n.End, n.Length), (1, 5, 1))
n = s[0][0]
self.assertEqual((n.Start, n.End, n.Length), (2, 4, 1))
n = s[0][0][0]
self.assertEqual((n.Start, n.End, n.Length), (3, None, 0))
#check that it's not recursive
s[0][0].append(StructureNode(Data=Stem(20, 24, 2)))
s.expand()
n = s[0][0][-1]
self.assertEqual((n.Start, n.End, n.Length), (20, 24, 2))
n.expand()
self.assertEqual((n.Start, n.End, n.Length), (20, 24, 1))
n = n[0]
self.assertEqual((n.Start, n.End, n.Length), (21, 23, 1))
def test_expandAll(self):
"""StructureNode expandAll should act recursively"""
r = StructureNode()
r.append(StructureNode(Data=Stem(0, 6, 4)))
r.append(StructureNode(Data=Stem(0, 6, 3)))
r.append(StructureNode())
r[0].append(StructureNode())
r[0].append(StructureNode(Data=Stem(0,6,2)))
r[0][-1].append(StructureNode())
r.renumber()
self.assertEqual(str(r), '((((.((.))))))((())).')
r.expandAll()
self.assertEqual(str(r), '((((.((.))))))((())).')
expected_nodes = [
(None, None, 0),
(0, 13, 1),
(1, 12, 1),
(2, 11, 1),
(3, 10, 1),
(4, None, 0),
(5, 9, 1),
(6, 8, 1),
(7, None, 0),
(14, 19, 1),
(15, 18, 1),
(16, 17, 1),
(20, None, 0),
]
for obs, exp in zip(r.traverse(), expected_nodes):
self.assertEqual((obs.Start, obs.End, obs.Length), exp)
def test_collapse(self):
"""StructureNode collapse should collapse consecutive pairs from self"""
one = ViennaStructure('(.)').toTree()
self.assertEqual(one.collapse(), False)
self.assertEqual(str(one), '(.)')
two = ViennaStructure('((.))').toTree()
#can't collapse root node
self.assertEqual(two.collapse(), False)
#should be able to collapse next node
self.assertEqual(two[0].collapse(), True)
self.assertEqual((two[0].Start, two[0].End, two[0].Length), (0,4,2))
self.assertEqual(str(two), '((.))')
three = ViennaStructure('(((...)))..').toTree()
self.assertEqual(three[0].collapse(), True)
self.assertEqual((three[0].Start, three[0].End, three[0].Length), \
(0,8,3))
self.assertEqual(str(three), '(((...)))..')
self.assertEqual(three[0].collapse(), False)
self.assertEqual(three[-1].collapse(), False)
oh = self.OneHelix
self.assertEqual(oh[0].collapse(), True)
self.assertEqual(str(oh), '((((()))))')
def test_collapseAll(self):
"""StructureNode collapseAll should collapse consecutive pairs"""
for s in [self.Empty, self.NoPairs, self.OneHelix, self.ManyHelices,\
self.Ends, self.FirstEnd, self.LastEnd, self.Internal, self.Eddy]:
before = str(s)
s.collapseAll()
after = str(s)
self.assertEqual(after, before)
oh = self.OneHelix[0]
self.assertEqual((oh.Start, oh.End, oh.Length), (0,9,5))
m_obs = self.ManyHelices.traverse()
m_exp = [
(None, None, 0),
(0, 46, 1),
(1, None, 0),
(2, None, 0),
(3, 42, 1),
(4, 10, 2),
(6, None, 0),
(7, None, 0),
(8, None, 0),
(11, None, 0),
(12, 41, 1),
(13, 28, 1),
(14, None, 0),
(15, 27, 2),
(17, 24, 3),
(20, None, 0),
(21, None, 0),
(25, None, 0),
(29, None, 0),
(30, None, 0),
(31, 40, 4),
(35, None, 0),
(36, None, 0),
(43, None, 0),
(44, None, 0),
(45, None, 0),
(46, None, 0),
]
for obs, exp in zip([(i.Start, i.End, i.Length) for i in m_obs], m_exp):
self.assertEqual(obs, exp)
def test_breakBadPairs(self):
"""StructureNode breakBadPairs should eliminate mispaired bases."""
oh_str = ViennaStructure(self.OneHelixStr)
#no change if all pairs valid
oh = oh_str.toTree()
oh.breakBadPairs(Rna('CCCCCGGGGG'))
self.assertEqual(str(oh), str(oh_str))
#break everything if all pairs invalid
oh.breakBadPairs(Rna('CCCCCAAAAA'))
self.assertEqual(str(oh), '..........')
#break a single pair
oh = oh_str.toTree()
oh.breakBadPairs(Rna('GCCCCGGGGG'))
self.assertEqual(str(oh), '.(((()))).')
#break two pairs
oh = oh_str.toTree()
oh.breakBadPairs(Rna('GCCCCCGGGG'))
self.assertEqual(str(oh), '.(((..))).')
#break internal pairs
oh = oh_str.toTree()
oh.breakBadPairs(Rna('GCCGCGGGGG'))
self.assertEqual(str(oh), '.((.().)).')
#repeat with multiple independent helices
th_str = ViennaStructure('((.)).((.))')
th = th_str.toTree()
th.breakBadPairs(Rna('CCUGGCUUCGG'))
self.assertEqual(str(th), th_str)
th.breakBadPairs(Rna('CGUAGCAGUUU'))
self.assertEqual(str(th), '(...).((.))')
th = th_str.toTree()
th.breakBadPairs(Rna('UUUUUUUUUUU'))
self.assertEqual(str(th), '...........')
def test_extendHelix(self):
"""StructureNode extendHelix should extend the helix as far as possible
"""
#single paired node is root[4]
op_str = ViennaStructure('....(......)...')
op = op_str.toTree()
#can't extend if base pairs not allowed
op[4].extendHelix(Rna('AAAAAAAAAAAAAAA'))
self.assertEqual(str(op), op_str)
#should extend a pair 5'
op[4].extendHelix(Rna('AAACCAAAAAAGGAA'))
self.assertEqual(str(op), '...((......))..')
#should extend multiple pairs 5'
op = op_str.toTree()
op[4].extendHelix(Rna('CCCCCUUUUUUGGGG'))
self.assertEqual(str(op), '.((((......))))')
#should extend a pair 3', but must leave > 2-base loop
op = op_str.toTree()
op[4].extendHelix(Rna('AAAACCCCGGGGAAA'))
self.assertEqual(str(op), '....((....))...')
op[4][0].insert(1, StructureNode(Data=Stem(Start=1,End=1,Length=1)))
op.renumber()
self.assertEqual(str(op), '....((.()...))...')
op[4][0].extendHelix(Rna( 'AAAACCCUACGGGGAAA'))
self.assertEqual(str(op), '....(((()..)))...')
#should extend a pair in both directions if possible
op = op_str.toTree()
op[4].extendHelix(Rna('AAACCCAAAAGGGAA'))
self.assertEqual(str(op), '...(((....)))..')
def test_extendHelices(self):
"""StructureNode extendHelices should extend all helices"""
e = ViennaStructure('........')
t = e.toTree()
t.extendHelices(Rna('CCCCCCCCCC'))
self.assertEqual(str(t), e)
#no pairs if sequence can't form them
s = ViennaStructure('(.....(...)..)...((.....))...')
r = Rna('AAAAAAAAAAAAAAAAAAAAAAAAAAAAA')
t = s.toTree()
t.extendHelices(r)
self.assertEqual(str(t), s)
#should be able to extend a single helix
s = ViennaStructure('(.....(...)..)...((.....))...')
r = Rna('CAAAAACAAAGAAGCCCCCCCAGGGGGGG')
t = s.toTree()
t.extendHelices(r)
self.assertEqual(str(t), '(.....(...)..)((((((...))))))')
#should be able to extend multiple helices
s = ViennaStructure('(.....(...)..)...((.....))...')
r = Rna('AAAAACCCAGGGUUCCCCCAUAAAGGGAA')
t = s.toTree()
t.extendHelices(r)
self.assertEqual(str(t), '((...((...))))..(((.....)))..')
def test_fitSeq(self):
"""StructureNode fitSeq should adjust structure to match sequence"""
#this is just a minimal test, since we know that both breakBadPairs()
#and extendHelices() work fine with more extensive tests.
s = ViennaStructure('..(((.....)))......(((.....)))...')
r = Rna( 'UCCCCACUGAGGGGUUUGGGGGGUUUUCGCCCU')
t = s.toTree()
t.fitSeq(r)
self.assertEqual(str(t), '.((((.....))))...(((.((...)).))).')
#run the test suites if invoked as a script from the command line
if __name__ == "__main__":
main()
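test_breakBadPairs expects a bracket pair to be dropped whenever the two bases it joins cannot pair. The standalone sketch below reproduces that behaviour on plain dot-bracket strings rather than on StructureNode trees. The pairing rules (Watson-Crick plus G-U wobble) are an assumption, although they agree with the expectations in test_breakBadPairs, e.g. the G-U pair that survives in 'CGUAGCAGUUU'. parse_pairs and break_bad_pairs are hypothetical helper names, not part of cogent.

```python
# Assumed pairing rules: canonical Watson-Crick pairs plus G-U wobble.
VALID_PAIRS = {('A', 'U'), ('U', 'A'), ('G', 'C'),
               ('C', 'G'), ('G', 'U'), ('U', 'G')}


def parse_pairs(structure):
    """Return (open, close) index pairs from a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == '(':
            stack.append(i)
        elif ch == ')':
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
        elif ch != '.':
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def break_bad_pairs(structure, seq):
    """Turn both brackets of a pair into dots if seq cannot form that pair."""
    if len(structure) != len(seq):
        raise ValueError("structure and sequence lengths differ")
    out = list(structure)
    for i, j in parse_pairs(structure):
        if (seq[i].upper(), seq[j].upper()) not in VALID_PAIRS:
            out[i] = out[j] = '.'
    return ''.join(out)


print(break_bad_pairs('((.)).((.))', 'CGUAGCAGUUU'))  # -> (...).((.))
```

Each pair is checked independently, so breaking an outer pair never affects the inner ones, which is the behaviour the '.((.().)).' expectation relies on.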
#!/usr/bin/env python
#file tests/test_struct/test_selection.py
import os
try:
from cogent.util.unit_test import TestCase, main
from cogent.parse.pdb import PDBParser
from cogent.struct.selection import einput, select
except ImportError:
from zenpdb.cogent.util.unit_test import TestCase, main
from zenpdb.cogent.parse.pdb import PDBParser
from zenpdb.cogent.struct.selection import einput, select
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class AnnotationTest(TestCase):
"""tests selecting entities"""
def setUp(self):
self.input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(self.input_file))
def test_einput(self):
"""tests einput."""
structures = einput(self.input_structure, 'S')
models = einput(self.input_structure, 'M')
chains = einput(self.input_structure, 'C')
residues = einput(self.input_structure, 'R')
atoms = einput(self.input_structure, 'A')
self.assertEquals(structures.level, 'H')
self.assertEquals(models.level, 'S')
self.assertEquals(chains.level, 'M')
self.assertEquals(residues.level, 'C')
self.assertEquals(atoms.level, 'R')
atoms2 = einput(models, 'A')
self.assertEquals(atoms, atoms2)
atoms3 = einput(chains, 'A')
self.assertEquals(atoms, atoms3)
structures2 = einput(atoms, 'S')
self.assertEquals(self.input_structure, structures2.values()[0])
residues2 = einput(atoms, 'R')
self.assertEquals(residues, residues2)
def test_select(self):
"""tests select."""
water = select(self.input_structure, 'R', 'H_HOH', 'eq', 'name')
for residue in water:
self.assertTrue(residue.name == 'H_HOH')
non_water = select(self.input_structure, 'R', 'H_HOH', 'ne', 'name')
for residue in non_water:
self.assertTrue(residue.name != 'H_HOH')
if __name__ == '__main__':
main()
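test_select filters residues by comparing an attribute ('name') against a value using an operator given by name ('eq', 'ne'). A minimal sketch of that idea on a flat list follows. It assumes the operator names map onto Python's standard operator module and ignores the hierarchy-level argument ('R') that cogent.struct.selection.select also takes; Residue and select_by_attribute are hypothetical stand-ins, not cogent classes.

```python
import operator
from collections import namedtuple

# Hypothetical stand-in for a parsed residue; cogent's real entities
# also carry atoms, a parent chain and coordinates.
Residue = namedtuple('Residue', ['name', 'number'])


def select_by_attribute(entities, value, op_name, attr):
    """Return entities for which op(getattr(entity, attr), value) is true.

    op_name is looked up in the standard operator module, so 'eq', 'ne',
    'lt', 'le', 'gt' and 'ge' all work.
    """
    compare = getattr(operator, op_name)
    return [e for e in entities if compare(getattr(e, attr), value)]


residues = [Residue('H_HOH', 1), Residue('ALA', 2),
            Residue('H_HOH', 3), Residue('GLY', 4)]
water = select_by_attribute(residues, 'H_HOH', 'eq', 'name')
non_water = select_by_attribute(residues, 'H_HOH', 'ne', 'name')
```

As in test_select, 'eq' and 'ne' partition the input, so every residue ends up in exactly one of the two selections.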
#!/usr/bin/env python
#file tests/test_seqsim/__init__.py
__all__ = ['test_seqsim']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
#!/usr/bin/env python
#file tests/test_seqsim/test_analysis.py
"""Unit tests for analysis.py: substitution matrix analysis code."""
from cogent.seqsim.analysis import tree_threeway_counts, \
tree_twoway_counts, counts_to_probs, probs_to_rates, \
tree_threeway_rates, tree_twoway_rates, \
rates_to_array, multivariate_normal_prob
from cogent.seqsim.tree import RangeNode
from cogent.core.usage import DnaPairs, ABPairs
from cogent.seqsim.usage import Rates, Counts, Probs
from numpy import array, average, ones, zeros, float64, ravel, diag, any
from numpy.random import random, randint
from copy import deepcopy
from cogent.parse.tree import DndParser
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class analysisTests(TestCase):
"""Tests of top-level functions."""
def setUp(self):
"""Make a couple of standard trees"""
self.t1 = DndParser('((a,(b,c)),(d,e))', RangeNode)
#self.t1 indices: ((0,(1,2)5)6,(3,4)7)8
def test_threeway_counts(self):
"""threeway_counts should produce correct count matrix"""
self.t1.makeIdIndex()
ind = self.t1.IdIndex
ind[0].Sequence = array([0,0,0])
ind[1].Sequence = array([0,1,0])
ind[2].Sequence = array([1,0,1])
ind[3].Sequence = array([1,1,0])
ind[4].Sequence = array([1,1,1])
depths = self.t1.leafLcaDepths()
result = tree_threeway_counts(self.t1, depths, ABPairs)
#check we got the right number of comparisons
self.assertEqual(len(result), 20)
#check we got the right keys
for k in [(1,2,0),(2,1,0),(0,1,3),(1,0,3),(0,1,4),(1,0,4),(0,2,3),\
(2,0,3),(0,2,4),(2,0,4),(1,2,3),(2,1,3),(1,2,4),(2,1,4),(3,4,1),\
(4,3,1),(3,4,2),(4,3,2)]:
assert k in result
#spot-check a few results
self.assertEqual(result[(1,2,0)]._data, array([[2,1],[0,0]]))
self.assertEqual(result[(2,1,0)]._data, array([[1,2],[0,0]]))
self.assertEqual(result[(2,1,3)]._data, array([[0,1],[1,1]]))
def test_twoway_counts(self):
"""twoway_counts should produce correct count matrix"""
self.t1.makeIdIndex()
ind = self.t1.IdIndex
ind[0].Sequence = array([0,0,0])
ind[1].Sequence = array([0,1,0])
ind[2].Sequence = array([1,0,1])
ind[3].Sequence = array([1,1,0])
ind[4].Sequence = array([1,1,1])
depths = self.t1.leafLcaDepths()
#check that it works with averaging
result = tree_twoway_counts(self.t1, ABPairs)
#check we got the right number of comparisons: average by default
self.assertEqual(len(result), 10)
#check we got the right keys
for k in [(0,1),(0,2),(0,3),(0,4),(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]:
assert k in result
#spot-check a few results
self.assertEqual(result[(0,1)]._data, array([[2,.5],[.5,0]]))
self.assertEqual(result[(2,3)]._data, array([[0,1],[1,1]]))
#check that it works when we don't average
result = tree_twoway_counts(self.t1, ABPairs, average=False)
self.assertEqual(len(result), 20)
#check we got the right keys
for k in [(0,1),(0,2),(0,3),(0,4),(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]:
assert k in result
#reverse should be in result too
assert (k[1],k[0]) in result
#spot-check values
self.assertEqual(result[(0,1)]._data, array([[2,1],[0,0]]))
self.assertEqual(result[(1,0)]._data, array([[2,0],[1,0]]))
def test_counts_to_probs(self):
"""counts_to_probs should skip cases with zero rows"""
counts = {
(0,1): Counts(array([[0,1],[1,0]]), ABPairs),
(1,2): Counts(array([[0,0],[1,0]]), ABPairs), #bad row
(0,3): Counts(array([[0,0],[0,0]]), ABPairs), #bad row
(0,4): Counts(array([[0.0,0.0],[0.0,0.0]]), ABPairs), #bad row
(0,5): Counts(array([[0.1,0.3],[0.0,0.0]]), ABPairs), #bad row
(3,4): Counts(array([[0.1,0.3],[0.4,0.1]]), ABPairs),
(2,1): Counts(array([[0,5],[1,0]]), ABPairs),
}
result = counts_to_probs(counts)
self.assertEqual(len(result), 3)
self.assertFloatEqual(result[(0,1)]._data, array([[0,1],[1,0]]))
self.assertFloatEqual(result[(3,4)]._data, \
array([[0.25,0.75],[0.8,0.2]]))
self.assertFloatEqual(result[(2,1)]._data, array([[0,1],[1,0]]))
def test_probs_to_rates(self):
"""probs_to_rates converts probs to rates, omitting problem cases"""
probs = dict([(i, Probs.random(DnaPairs)) for i in range(100)])
rates = probs_to_rates(probs)
#check we got at most the same number of items as in probs
assert len(rates) <= len(probs)
#check that we didn't get anything bad
vals = rates.values()
for v in vals:
assert not v.isSignificantlyComplex()
#check that we didn't miss anything good
for key, val in probs.items():
if key not in rates:
try:
r = val.toRates()
assert r.isSignificantlyComplex() or (not r.isValid())
except (ZeroDivisionError, OverflowError, ValueError):
pass
def test_rates_to_array(self):
"""rates_to_array should pack rates into array correctly"""
m1 = array([[-1,1,1,1],[2,-2,2,2],[3,3,-3,3],[1,2,3,-4]])
m2 = m1 * 2
m3 = m1 * 0.5
m4 = zeros((4,4))
m5 = array([0,0])
r1, r2, r3, r4, r5 = [Rates(i, DnaPairs) for i in (m1, m2, m3, m4, m5)]
data = {(0,1,0):r1, (1,2,0):r2, (2,0,0):r3, (2,1,1):r4}
#note that array can be, but need not be, floating point
to_fill = zeros((3,3,3,16), 'float64')
result = rates_to_array(data, to_fill)
#check that the things we deliberately set are OK
self.assertEqual(to_fill[0][1][0], ravel(m1))
self.assertNotEqual(to_fill[0][1][0], ravel(m2))
self.assertEqual(to_fill[1,2,0], ravel(m2))
self.assertEqual(to_fill[2][0][0], ravel(m3))
self.assertEqual(to_fill[2][1][1], ravel(m4))
#check that everything else is zero
nonzero = [(0,1,0),(1,2,0),(2,0,0)]
for x in [(i, j, k) for i in range(3) for j in range(3) \
for k in range(3)]:
if x not in nonzero:
self.assertEqual(to_fill[x], zeros(16))
#check that it works omitting the diagonal
to_fill = zeros((3,3,3,12), 'float64')
result = rates_to_array(data, to_fill, without_diagonal=True)
#check that the things we deliberately set are OK
m1_nodiag = array([[1,1,1],[2,2,2],[3,3,3],[1,2,3]])
self.assertEqual(to_fill[0][1][0], ravel(m1_nodiag))
self.assertNotEqual(to_fill[0][1][0], ravel(m1_nodiag*2))
self.assertEqual(to_fill[1,2,0], ravel(m1_nodiag*2))
self.assertEqual(to_fill[2][0][0], ravel(m1_nodiag*0.5))
self.assertEqual(to_fill[2][1][1], zeros(12))
#check that everything else is zero
nonzero = [(0,1,0),(1,2,0),(2,0,0)]
for x in [(i, j, k) for i in range(3) for j in range(3) \
for k in range(3)]:
if x not in nonzero:
self.assertEqual(to_fill[x], zeros(12))
def test_tree_threeway_rates(self):
"""tree_threeway_rates should give plausible results on rand trees"""
#note: the following fails occasionally, but repeating it 5 times
#and checking that one passes is fairly safe
for i in range(5):
try:
t = self.t1
t.assignLength(0.05)
t.Q = Rates.random(DnaPairs).normalize()
t.assignQ()
t.assignP()
t.evolve(randint(0,4,100))
t.makeIdIndex()
depths = t.leafLcaDepths()
result = tree_threeway_rates(t, depths)
self.assertEqual(result.shape, (5,5,5,16))
#check that row sums are 0
for x in [(i,j,k) for i in range(5) for j in range(5) \
for k in range(5)]:
self.assertFloatEqual(sum(result[x]), 0)
assert any(result)
#check that it works without_diag
result = tree_threeway_rates(t, depths, without_diag=True)
self.assertEqual(result.shape, (5,5,5,12))
#check that it works with/without normalize
#default: no normalization, so row sums shouldn't be 1 after
#omitting diagonal
result = tree_threeway_rates(t, depths, without_diag=True)
self.assertEqual(result.shape, (5,5,5,12))
for x in [(i,j,k) for i in range(5) for j in range(5) \
for k in range(5)]:
assert sum(result[x]) == 0 or abs(sum(result[x]) - 1) > 0.01
#...but if we tell it to normalize, row sums should be nearly 1
#after omitting diagonal
result = tree_threeway_rates(t, depths, without_diag=True, \
normalize=True)
self.assertEqual(result.shape, (5,5,5,12))
for x in [(i,j,k) for i in range(5) for j in range(5) \
for k in range(5)]:
s = sum(result[x])
if s != 0:
self.assertFloatEqual(s, 1)
break
except AssertionError:
pass
def test_tree_twoway_rates(self):
"""tree_twoway_rates should give plausible results on rand trees"""
t = self.t1
t.assignLength(0.05)
t.Q = Rates.random(DnaPairs).normalize()
t.assignQ()
t.assignP()
t.evolve(randint(0,4,100))
t.makeIdIndex()
result = tree_twoway_rates(t)
self.assertEqual(result.shape, (5,5,16))
#check that row sums are 0
for x in [(i,j) for i in range(5) for j in range(5)]:
self.assertFloatEqual(sum(result[x]), 0)
#need to make sure we didn't just get an empty array
self.assertGreaterThan((abs(result)).sum(), 0)
#check that it works without_diag
result = tree_twoway_rates(t, without_diag=True)
self.assertEqual(result.shape, (5,5,12))
#check that it works with/without normalize
#default: no normalization, so row sums shouldn't be 1 after omitting
#diagonal
result = tree_twoway_rates(t, without_diag=True)
self.assertEqual(result.shape, (5,5,12))
#check that the row sums are not 1 before normalization (note that they
#can be zero, though)
sums_before = []
for x in [(i,j) for i in range(5) for j in range(5)]:
curr_sum = sum(result[x])
sums_before.append(curr_sum)
#...but if we tell it to normalize, row sums should be nearly 1
#after omitting diagonal
result = tree_twoway_rates(t, without_diag=True, \
normalize=True)
self.assertEqual(result.shape, (5,5,12))
sums_after = []
for x in [(i,j) for i in range(5) for j in range(5)]:
curr_sum = sum(result[x])
sums_after.append(curr_sum)
if curr_sum != 0:
self.assertFloatEqual(curr_sum, 1)
try:
self.assertFloatEqual(sums_before, sums_after)
except AssertionError:
pass
else:
raise AssertionError, "Expected different arrays before/after norm"
def test_multivariate_normal_prob(self):
"""Multivariate normal prob should match R results"""
cov = array([[3,1,2],[1,5,4],[2,4,6]])
a = array([0,0,0])
b = array([1,1,1])
c = array([0.1, 0.2, 0.3])
small_cov = cov/10.0
mvp = multivariate_normal_prob
self.assertFloatEqual(mvp(a, cov), 0.01122420)
self.assertFloatEqual(mvp(a, cov, b), 0.009018894)
self.assertFloatEqual(mvp(a, small_cov, b), 0.03982319)
self.assertFloatEqual(mvp(c, small_cov, b), 0.06091317)
if __name__ == "__main__":
main()
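Two of the analysis tests pin down simple numerical behaviour: counts_to_probs drops any count matrix that contains an all-zero row and row-normalizes the rest, and multivariate_normal_prob is checked against density values from R. The numpy sketch below reproduces both on plain arrays. It is an illustration under those assumptions, not cogent.seqsim.analysis itself, which works on Counts/Probs objects; the argument order (point, covariance, optional mean defaulting to the zero vector) is inferred from how the tests call mvp.

```python
import numpy as np


def counts_to_probs(counts):
    """Row-normalize each count matrix, dropping any with an all-zero row."""
    probs = {}
    for key, matrix in counts.items():
        m = np.asarray(matrix, dtype=float)
        row_sums = m.sum(axis=1)
        if np.any(row_sums == 0):
            continue  # a zero row cannot become a probability row
        probs[key] = m / row_sums[:, np.newaxis]
    return probs


def multivariate_normal_density(x, cov, mean=None):
    """Density of N(mean, cov) at x; mean defaults to the zero vector."""
    x = np.asarray(x, dtype=float)
    cov = np.asarray(cov, dtype=float)
    mean = np.zeros_like(x) if mean is None else np.asarray(mean, dtype=float)
    diff = x - mean
    k = x.shape[0]
    # Solve cov * y = diff instead of forming the inverse explicitly.
    quad = diff.dot(np.linalg.solve(cov, diff))
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))
    return float(np.exp(-0.5 * quad) / norm)
```

With the covariance from test_multivariate_normal_prob, the density at the mean is 1 / sqrt((2*pi)**3 * 32), roughly 0.0112242, matching the first R value in that test.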
#!/usr/bin/env python
#file test_birth_death.py
"""Unit tests of birth_death.py: implementation of the birth-death model.
"""
from cogent.seqsim.birth_death import ExtinctionError, TooManyTaxaError, \
BirthDeathModel, DoubleBirthDeathModel
from cogent.util.unit_test import TestCase, main, FakeRandom
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Mike Robeson"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class BirthDeathModelTests(TestCase):
"""Tests of the BirthDeathModel class, which makes birth-death trees."""
def test_init_defaults(self):
"""BirthDeathModel should init correctly w/ default params"""
m = BirthDeathModel(0.1, 0.2, 0.3)
self.assertEqual(m.BirthProb, 0.1)
self.assertEqual(m.DeathProb, 0.2)
self.assertEqual(m.TimePerStep, 0.3)
self.assertEqual(m.MaxStep, 1000)
self.assertEqual(m.MaxTaxa, None)
self.assertEqual(m.CurrStep, 0)
self.assertEqual(m.Tree.__class__, m.NodeClass)
self.assertEqual(m.CurrTaxa, [m.Tree])
self.assertEqual(m.ChangedBirthProb,None)
self.assertEqual(m.ChangedDeathProb,None)
self.assertEqual(m.ChangedBirthStep,None)
self.assertEqual(m.ChangedDeathStep,None)
self.assertEqual(m.CurrBirthProb, 0.1)
self.assertEqual(m.CurrDeathProb, 0.2)
def test_init_bad(self):
"""BirthDeathModel should raise exceptions on init with bad data"""
#BirthProb and DeathProb must be probabilities between 0 and 1
self.assertRaises(ValueError, BirthDeathModel, -1, 0.2, 0.3)
self.assertRaises(ValueError, BirthDeathModel, 2, 0.2, 0.3)
self.assertRaises(ValueError, BirthDeathModel, 0.1, -1, 0.3)
self.assertRaises(ValueError, BirthDeathModel, 0.1, 2, 0.3)
#TimePerStep can't be negative or 0
self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, -1)
self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0)
def test_init_extras(self):
"""BirthDeathModel should init OK with extra params"""
m = BirthDeathModel(BirthProb=0.1, DeathProb=0.2, TimePerStep=0.3, \
ChangedBirthProb=0.4, ChangedDeathProb=0.3, ChangedBirthStep=3,\
ChangedDeathStep=4, MaxStep=5, MaxTaxa=10)
self.assertEqual(m.BirthProb, 0.1)
self.assertEqual(m.DeathProb, 0.2)
self.assertEqual(m.TimePerStep, 0.3)
self.assertEqual(m.ChangedBirthProb, 0.4)
self.assertEqual(m.ChangedDeathProb, 0.3)
self.assertEqual(m.ChangedBirthStep, 3)
self.assertEqual(m.ChangedDeathStep, 4)
self.assertEqual(m.MaxStep, 5)
self.assertEqual(m.MaxTaxa, 10)
self.assertEqual(m.CurrStep, 0)
self.assertEqual(m.Tree.__class__, m.NodeClass)
self.assertEqual(m.CurrTaxa, [m.Tree])
def test_step(self):
"""BirthDeathModel step should match hand-calculated results"""
m = BirthDeathModel(BirthProb=0.1, DeathProb=0.2, TimePerStep=1)
born_and_died = FakeRandom([0],True)
born_only = FakeRandom([1,0],True)
died_only = FakeRandom([0,1],True)
neither = FakeRandom([1],True)
kill_alternate = FakeRandom([0,1,1,1], True)
born_alternate = FakeRandom([1,1,1,0], True)
#check that with neither birth nor death, we just continue
m.step(neither)
self.assertEqual(len(m.Tree.Children), 0)
#check that with born_only we get a duplication
m.step(born_only)
self.assertEqual(len(m.Tree.Children), 2)
assert m.Tree not in m.CurrTaxa
for i in m.CurrTaxa:
assert i.Parent is m.Tree
self.assertEqual(i.Length, 1)
#check that with a second round of born_only we duplicate again
m.step(born_only)
self.assertEqual(len(m.Tree.Children), 2)
self.assertEqual(len(list(m.Tree.traverse())), 4)
for i in m.Tree.traverse():
self.assertEqual(i.Length, 1)
for i in m.Tree.Children:
self.assertEqual(i.Length, 1)
#check that branch lengths add correctly
for i in range(4):
m.step(neither)
self.assertEqual(len(m.CurrTaxa), 4)
self.assertEqual(len(m.Tree.Children), 2)
self.assertEqual(len(list(m.Tree.traverse())), 4)
for i in m.Tree.traverse():
self.assertEqual(i.Length, 5)
for i in m.Tree.Children:
self.assertEqual(i.Length, 1)
#check that we can kill offspring correctly
m.step(kill_alternate)
self.assertEqual(len(m.CurrTaxa), 2)
#make sure we killed the right children
m.Tree.assignIds()
for i in m.Tree.Children:
#note that killing a child doesn't remove it, just stops it changing
self.assertEqual(len(i.Children), 2)
self.assertEqual(i.Children[0].Length, 5)
self.assertEqual(i.Children[1].Length, 6)
self.assertEqual([i.Length for i in m.Tree.traverse()], \
[5,6,5,6])
#make sure that born_and_died does the same thing as neither
m.step(born_and_died)
self.assertEqual([i.Length for i in m.Tree.traverse()], \
[5,7,5,7])
m.step(neither)
self.assertEqual([i.Length for i in m.Tree.traverse()], \
[5,8,5,8])
#check that only CurrTaxa are brought forward
self.assertEqual([i.Length for i in m.CurrTaxa], [8,8])
#check that we can duplicate a particular taxon
m.step(born_alternate)
self.assertEqual([i.Length for i in m.CurrTaxa], [9,1,1])
self.assertEqual(m.CurrTaxa[1].Parent.Length, 8)
#check that we can kill 'em all
m.step(died_only)
self.assertEqual(len(m.CurrTaxa), 0)
def test_prob_step_check(self):
"""prob_check and step_check should raise ValueError when out of bounds.
Prob values should be between zero and one.
Step values should be greater than zero.
"""
#ChangedBirthProb = -0.1 , raises ValueError
self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\
ChangedBirthProb=-0.1,ChangedBirthStep=3,ChangedDeathProb=0.3,\
ChangedDeathStep=4, MaxStep=5)
#ChangedBirthStep = 0 , raises ValueError
self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\
ChangedBirthProb=0.6,ChangedBirthStep=0,ChangedDeathProb=0.3,\
ChangedDeathStep=4, MaxStep=5)
#ChangedDeathProb = 2 , raises ValueError
self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\
ChangedBirthProb=0.6,ChangedBirthStep=3,ChangedDeathProb=2,\
ChangedDeathStep=4, MaxStep=5)
#ChangedDeathStep = -1 , raises ValueError
self.assertRaises(ValueError, BirthDeathModel, 0.1, 0.2, 0.3,\
ChangedBirthProb=0.6,ChangedBirthStep=3,ChangedDeathProb=0.3,\
ChangedDeathStep=-1, MaxStep=5)
def test_timeOk(self):
"""BirthDeathModel timeOk should return True while CurrStep is below MaxStep"""
b = BirthDeathModel(0.1, 0.2, 0.3, MaxStep=5)
assert b.timeOk()
b.CurrStep = 4
assert b.timeOk()
b.CurrStep = 5
assert not b.timeOk()
b.CurrStep = 1000
assert not b.timeOk()
b.MaxStep = None
assert b.timeOk()
b.MaxStep = 1001
assert b.timeOk()
b.step()
assert not b.timeOk()
def test_taxaOk(self):
"""BirthDeathModel taxaOk should return True if taxa exist and are below MaxTaxa"""
b = BirthDeathModel(0.1, 0.2, 0.3, MaxTaxa=5)
born_alternate = FakeRandom([1,1,1,0], True)
born_only = FakeRandom([1,0],True)
kill_only = FakeRandom([0,1,0,1], True)
#start off with single taxon
assert b.taxaOk()
#taxa are OK if there are a few
b.step(born_only) #now 2 taxa
assert b.taxaOk()
b.step(born_only) #now 4 taxa
assert b.taxaOk()
b.step(born_only) #now 8 taxa
assert not b.taxaOk()
b.MaxTaxa = 8
assert not b.taxaOk()
b.MaxTaxa = 9
assert b.taxaOk()
b.MaxTaxa = 17
assert b.taxaOk()
b.step(born_only)
assert b.taxaOk()
b.step(born_only)
assert not b.taxaOk()
#ok if no maximum
b.MaxTaxa = None
assert b.taxaOk()
#not ok if there are no taxa left
b.step(kill_only)
assert not b.taxaOk()
#still not OK when MaxTaxa is None
b.MaxTaxa = None
assert not b.taxaOk()
def test_call_exact(self):
"""BirthDeathModel call should produce right # taxa when exact"""
m = BirthDeathModel(0.01, 0.005, 0.1, MaxTaxa=10)
for i in range(10):
try:
result = m(filter=True, exact=True)
self.assertEqual(len(list(result.traverse())), 10)
except (TooManyTaxaError, ExtinctionError), e:
pass
def test_call(self):
"""BirthDeathModel call should produce hand-calculated trees"""
m = BirthDeathModel(0.01, 0.005, 0.1, MaxTaxa=10)
r = FakeRandom(\
[1,0,\
1,1, 1,1,\
1,0, 0,0,\
0,0, 0,0, 1,0,\
0,0, 0,0, 0,1, 0,0, \
1,0, 0,0, 0,0,\
1,0, 0,0, 0,0, 1,0, \
1,0, 1,0, 0,1, 1,1, 1,0, 1,0, \
1,1, 1,1, 1,1, 1,1, 1,0, 1,1, 1,1, 1,1, 1,1], True)
m = BirthDeathModel(0.1, 0.5, 1, MaxTaxa=10)
result = m(filter=False, random_f=r)
self.assertEqual([i.Length for i in result.traverse()], \
[2,2,2,2,2,1,1,1,2,2,2,2])
#try it with pruning
m = BirthDeathModel(0.1, 0.5, 1, MaxTaxa=10)
result = m(filter=True, random_f=r)
self.assertEqual([i.Length for i in result.traverse()], \
[2,2,2,2,1,1,2,2,2,2])
#try it with fewer taxa
m = BirthDeathModel(0.1, 0.5, 1, MaxTaxa=4)
result = m(filter=True, random_f=r)
self.assertEqual([i.Length for i in result.traverse()], \
[2,2,1,1])
def test_changed_values_step(self):
"""Tests if values changed at specified steps in step().
Note, in m.step() CurrStep is logically tested one step later.
"""
m = BirthDeathModel( 0.1, 0.2, 0.3,ChangedBirthProb=0.6,\
ChangedBirthStep=3,ChangedDeathProb=0.3,ChangedDeathStep=4,\
MaxStep=5)
# all values should be as initialized
m.step()
assert m.CurrStep == 1
assert m.BirthProb == 0.1
assert m.DeathProb == 0.2
assert m.CurrBirthProb == 0.1
assert m.CurrDeathProb == 0.2
# continue 2 steps
m.step()
m.step()
# when logically evaluated CurrBirthProb should change
# from 0.1 to 0.6
m.step()
assert m.CurrStep == 4
assert m.BirthProb == 0.1
assert m.DeathProb == 0.2
assert m.CurrBirthProb == 0.6
assert m.CurrDeathProb == 0.2
# All values other than CurrStep should be as above
# except that CurrDeathProb should change from 0.2 to 0.3
m.step()
assert m.CurrStep == 5
assert m.BirthProb == 0.1
assert m.DeathProb == 0.2
assert m.CurrBirthProb == 0.6
assert m.CurrDeathProb == 0.3
class DoubleBirthDeathTests(TestCase):
"""Tests of the double birth-death model."""
def test_double_birth_death(self):
"""double_birth_death should run without errors"""
pass
if __name__ == "__main__":
main()
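test_step drives BirthDeathModel.step with scripted random draws. The lineage-counting sketch below shows the same discrete-time process without building a tree: each living lineage draws once for death and once for birth, splits on birth alone, goes extinct on death alone, and simply continues when both or neither happen. The draw order and the cancellation rule are assumptions modeled on the FakeRandom sequences and the "born_and_died does the same thing as neither" comment in test_step; branch lengths, TimePerStep and the Changed* probabilities are omitted, and birth_death_step and simulate are hypothetical names.

```python
import random


def birth_death_step(n_lineages, birth_prob, death_prob, rand=random.random):
    """Advance every living lineage by one discrete step; return the new count."""
    survivors = 0
    for _ in range(n_lineages):
        died = rand() < death_prob   # assumed: death is drawn first
        born = rand() < birth_prob   # then birth
        if died and not born:
            continue                 # lineage goes extinct
        elif born and not died:
            survivors += 2           # lineage splits into two daughters
        else:
            survivors += 1           # neither event, or both cancel out
    return survivors


def simulate(birth_prob, death_prob, max_steps, max_taxa=None,
             rand=random.random):
    """Run from one lineage until extinction, max_taxa or max_steps."""
    n = 1
    for _ in range(max_steps):
        if n == 0 or (max_taxa is not None and n >= max_taxa):
            break
        n = birth_death_step(n, birth_prob, death_prob, rand)
    return n


random.seed(1)
print(simulate(0.1, 0.05, max_steps=200, max_taxa=50))
```

Because the whole population steps at once, the count can overshoot max_taxa (4 lineages become 8 when the cap is 5), which is also what test_taxaOk observes for MaxTaxa.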
#!/usr/bin/env python
#file tests/test_seqsim/test_markov.py
"""test_markov.py: tests of the MarkovGenerator class.
"""
from cogent.seqsim.markov import MarkovGenerator
from StringIO import StringIO
from operator import mul
from sys import path
from cogent.util.unit_test import TestCase, main
from numpy import array
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Jesse Zaneveld", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
class MarkovGeneratorTests(TestCase):
"""Tests of the MarkovGenerator class."""
def setUp(self):
"""Define a few well-known frequencies."""
self.single = MarkovGenerator(['UUUUUUUUUU'], order=0)
self.equal = MarkovGenerator(['UUUUUCCCCC'], order=0)
self.unequal_0 = MarkovGenerator(['UCCCCCCCCC'], order=0)
self.unequal_1 = MarkovGenerator(['UCCCCCCCCC'], order=-1)
self.pairs = MarkovGenerator(['UCAUCAUCAUCAUCA'], order=1)
self.randquads = MarkovGenerator(['AACCUAUCUACUACUAUCUUCAUAUUCC'], \
order=3, calc_entropy=True, delete_bad_suffixes=False)
self.empty = MarkovGenerator('', order=0)
self.linebreaks= MarkovGenerator(StringIO('abb\nbcc\nd\n'))
self.dinucs=MarkovGenerator(['ATACATAC'],order=1)
self.orderfive=MarkovGenerator(['AAAAAGAAAAATAAAAAGAAAAAT'],order=5)
def test_init(self):
"""MarkovGenerator init should give right frequency distributions."""
self.assertEqual(self.empty.Frequencies, {})
self.assertEqual(self.single.Frequencies, {'':{'U':1.0}})
self.assertEqual(self.equal.Frequencies, {'':{'U':0.5,'C':0.5}})
self.assertEqual(self.unequal_0.Frequencies, {'':{'U':0.1,'C':0.9}})
self.assertEqual(self.unequal_1.Frequencies, {'':{'U':0.5, 'C':0.5}})
self.assertEqual(self.pairs.Frequencies, \
{'U':{'C':1},'C':{'A':1},'A':{'U':1}})
#check that recalculating the frequencies doesn't break anything
self.pairs.calcFrequencies()
self.assertEqual(self.pairs.Frequencies, \
{'U':{'C':1},'C':{'A':1},'A':{'U':1}})
exp={'AAC':{'C':1},'ACC':{'U':1},'CCU':{'A':1},'CUA':{'U':0.5,'C':0.5},\
'UAU':{'U':1/3.0,'C':2/3.0},'AUC':{'U':1},'UCU':{'U':0.5,'A':0.5},\
'UAC':{'U':1},'ACU':{'A':1},'CUU':{'C':1},'UUC':{'C':0.5,'A':0.5},\
'UCA':{'U':1},'CAU':{'A':1},'AUA':{'U':1},'AUU':{'C':1},
}
obs = self.randquads.Frequencies
self.assertFloatEqual(obs, exp)
#check that resetting linebreaks has the desired effect
self.assertEqual(self.linebreaks.Frequencies, \
{'a':{'b':1},'b':{'b':0.5,'c':0.5},'c':{'c':1}})
self.linebreaks.Linebreaks = True
self.linebreaks.Text.seek(0)
self.linebreaks.calcFrequencies()
#NOTE: the current algorithm won't extend chains over line breaks. If you
#want transitions across line breaks, read the text into a single string.
self.assertEqual(self.linebreaks.Frequencies, \
{'a':{'b':1},'b':{'b':0.5,'c':0.5},'c':{'c':1}})
def test_next(self):
"""MarkovGenerator.next should generate text with expected properties"""
#haven't figured out how to test this for longer correlation lengths yet
pass
def test_entropy(self):
"""MarkovGenerator._entropy() should correctly calculate average H"""
self.assertFloatEqual(self.randquads.Entropy, \
3.0/25 * 0.91829583405448956 + 8.0/25)
def test_evaluateProbability(self):
"""evaluateProbability should calculate the probability of seq"""
self.dinucs.Prior=1
q=self.dinucs.evaluateProbability('AT')
self.assertFloatEqual(q,.50)
z=self.dinucs.evaluateProbability('ATAT')
self.assertFloatEqual(z,.25)
p=self.dinucs.evaluateProbability('ATATAT')
self.assertFloatEqual(p,.125)
j=self.dinucs.evaluateProbability('ATACAT')
self.assertFloatEqual(j,.125)
h=self.orderfive.evaluateProbability('AAAAAT')
self.assertFloatEqual(h,.50)
def test_replaceDegenerateBases(self):
"""replaceDegenerateBases should replace degenerate bases with concrete lowercase bases"""
text = 'AATCGCRRCCYAATC'
m=MarkovGenerator([text],order=2)
self.assertEqual(m.Text, [text])
m.replaceDegenerateBases()
self.assertEqual(m.Text[0][0:6],'aatcgc')
#the degenerate base R at position 6 should now be one of a, t, c, g
self.assertTrue(m.Text[0][6] in ['a','t','c','g'])
def test_wordToUniqueKey(self):
"""wordToUniqueKey should generate proper integers"""
m=MarkovGenerator(['aataacaataac'],order=2)
word='gca'
uniqueKey=m.wordToUniqueKey(word)
#a=0 c=1 t=2 g=3, first letter least significant:
#'gca' -> 3*4**0 + 1*4**1 + 0*4**2 = 3 + 4 + 0 = 7
self.assertEqual(uniqueKey,7)
def test_evaluateArrayProbability(self):
"""evaluateArrayProbability should calc prob from array indices"""
m=MarkovGenerator(['aaaaaaaatt'],order=0)
#8 a's, 2 t's
m.calcFrequencies()
prob=m.evaluateArrayProbability(array([0,2]))
self.assertFloatEqual(prob,0.16) #0.8*0.2
prob=m.evaluateArrayProbability(array([0,1]))
self.assertFloatEqual(prob,0) #0.8*0
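# A sketch of the chain-rule calculation that test_evaluateProbability
# exercises, assuming that Prior=1 means the first `order` symbols count as
# probability 1. Every later symbol multiplies in P(symbol | previous `order`
# symbols), and an unseen transition contributes 0 (compare the 0.8*0 case of
# test_evaluateArrayProbability). transition_frequencies and
# sequence_probability are hypothetical helpers, not cogent functions.
from collections import defaultdict


def transition_frequencies(texts, order):
    """Order-k successor frequencies, counted separately within each string."""
    counts = defaultdict(lambda: defaultdict(int))
    for text in texts:
        for i in range(len(text) - order):
            counts[text[i:i + order]][text[i + order]] += 1
    freqs = {}
    for prefix, nexts in counts.items():
        total = float(sum(nexts.values()))
        freqs[prefix] = dict((s, n / total) for s, n in nexts.items())
    return freqs


def sequence_probability(seq, freqs, order):
    """Product of conditional probabilities for every symbol in seq[order:]."""
    p = 1.0
    for i in range(order, len(seq)):
        p *= freqs.get(seq[i - order:i], {}).get(seq[i], 0.0)
    return p


# With frequencies from 'ATACATAC' (order 1), A->T is 0.5 and T->A is 1, so
# 'ATAT' scores 0.5 * 1 * 0.5 = 0.25, as the test expects.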
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_seqsim/test_microarray.py
"""Unit tests for the microarray module, dealing with fake expression data."""
from cogent.util.unit_test import TestCase, main
from cogent.seqsim.microarray import MicroarrayNode
from numpy import ones
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
class MicroarrayNodeTests(TestCase):
"""Tests of the MicroarrayNode class"""
def test_init_empty(self):
"""MicroarrayNode empty init should return new object as expected"""
m = MicroarrayNode()
self.assertEqual(m.Length, 0)
self.assertEqual(m.Array, None)
self.assertEqual(m.Name, None)
self.assertEqual(m.Children, [])
self.assertEqual(m.Parent, None)
def test_init(self):
"""MicroarrayNode init should return new object w/ correct attributes"""
m = MicroarrayNode(Name='x')
self.assertEqual(m.Length, 0)
self.assertEqual(m.Array, None)
self.assertEqual(m.Name, 'x')
self.assertEqual(m.Children, [])
self.assertEqual(m.Parent, None)
n = MicroarrayNode(3, 'xyz', Parent=m)
self.assertEqual(n.Length, 3)
self.assertEqual(n.Array, 'xyz')
self.assertEqual(n.Name, None)
self.assertEqual(n.Children, [])
assert n.Parent is m
def test_mutate(self):
"""MicroarrayNode setExpression should set arrays appropriately"""
#check that it works as the root
a = ones(25, 'float64')
m = MicroarrayNode()
m.setExpression(a)
assert m.Array is not a
self.assertEqual(m.Array, a)
#check that it works on a single node w/ branchlength set
m.Length = 1
m.setExpression(a)
self.assertNotEqual(m.Array, a)
assert min(m.Array) > -4
assert max(m.Array) < 6
#check that it works for the children
m.Length = None
m2, m3, m4 = MicroarrayNode(), MicroarrayNode(), MicroarrayNode()
m5, m6, m7 = MicroarrayNode(), MicroarrayNode(), MicroarrayNode()
m8, m9, m10 = MicroarrayNode(), MicroarrayNode(), MicroarrayNode()
m.Children = [m2,m3, m4]
m2.Children = [m5]
m3.Children = [m6,m7,m8]
m8.Children = [m9,m10]
m2.Length = 2 # should be ~ 2 sd from 1
m3.Length = 0 # should test equal to m.Array
m4.Length = 0.1 # should be ~ 0.1 sd from 1
m5.Length = 1 # should be ~ 3 sd from 1
m6.Length = 0.1 # should be in same bounds as m4
m7.Length = 2 # should be in same bounds as m2
m8.Length = 1 # should be ~ 1 sd from 1
m9.Length = 1 # should be in same bounds as m2
m10.Length = 0 # should test equal to m8
m.setExpression(a)
self.assertNotEqual(m.Array, m2.Array)
self.assertEqual(m.Array, m3.Array)
self.assertNotEqual(m.Array, m4.Array)
self.assertNotEqual(m.Array, m5.Array)
self.assertNotEqual(m.Array, m6.Array)
self.assertNotEqual(m.Array, m7.Array)
self.assertNotEqual(m.Array, m8.Array)
self.assertNotEqual(m.Array, m9.Array)
self.assertNotEqual(m.Array, m10.Array)
self.assertNotEqual(m2.Array, m3.Array)
self.assertNotEqual(m2.Array, m4.Array)
self.assertNotEqual(m2.Array, m5.Array)
self.assertNotEqual(m2.Array, m6.Array)
self.assertNotEqual(m2.Array, m7.Array)
self.assertNotEqual(m2.Array, m8.Array)
self.assertNotEqual(m2.Array, m9.Array)
self.assertNotEqual(m2.Array, m10.Array)
self.assertNotEqual(m3.Array, m4.Array)
self.assertNotEqual(m3.Array, m5.Array)
self.assertNotEqual(m3.Array, m6.Array)
self.assertNotEqual(m3.Array, m7.Array)
self.assertNotEqual(m3.Array, m8.Array)
self.assertNotEqual(m3.Array, m9.Array)
self.assertNotEqual(m3.Array, m10.Array)
self.assertNotEqual(m4.Array, m5.Array)
self.assertNotEqual(m4.Array, m6.Array)
self.assertNotEqual(m4.Array, m7.Array)
self.assertNotEqual(m4.Array, m8.Array)
self.assertNotEqual(m4.Array, m9.Array)
self.assertNotEqual(m4.Array, m10.Array)
self.assertNotEqual(m5.Array, m6.Array)
self.assertNotEqual(m5.Array, m7.Array)
self.assertNotEqual(m5.Array, m8.Array)
self.assertNotEqual(m5.Array, m9.Array)
self.assertNotEqual(m5.Array, m10.Array)
self.assertNotEqual(m6.Array, m7.Array)
self.assertNotEqual(m6.Array, m8.Array)
self.assertNotEqual(m6.Array, m9.Array)
self.assertNotEqual(m6.Array, m10.Array)
self.assertNotEqual(m7.Array, m8.Array)
self.assertNotEqual(m7.Array, m9.Array)
self.assertNotEqual(m7.Array, m10.Array)
self.assertNotEqual(m8.Array, m9.Array)
self.assertEqual(m8.Array, m10.Array)
self.assertNotEqual(m9.Array, m10.Array)
#check that amount of change is about right
#assert 1 > min(m2.Array) > -15
#assert 1 < max(m2.Array) < 15
#assert 1 > min(m4.Array) > 0.4
#assert 1 < max(m4.Array) < 1.6
# might want stochastic tests here...
self.assertIsBetween(m2.Array, -11, 13)
self.assertIsBetween(m4.Array, 0.4, 1.6)
self.assertIsBetween(m5.Array, -15, 17)
self.assertIsBetween(m6.Array, 0.4, 1.6)
self.assertIsBetween(m7.Array, -11, 13)
self.assertIsBetween(m8.Array, -4, 7)
self.assertIsBetween(m9.Array, -11, 13)
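# A sketch of the top-down propagation that test_mutate exercises. Assumption:
# each node's values are its parent's values plus independent Gaussian noise
# whose standard deviation equals the node's branch length (read from the
# "~ 2 sd from 1" comments above; MicroarrayNode's real noise model may
# differ). A branch length of 0 or None copies the parent values unchanged.
# The tree is a hypothetical (name, length, children) tuple, not a
# MicroarrayNode, and propagate_expression is a hypothetical name.
import random


def propagate_expression(node, parent_values, rng=random, out=None):
    """Return {name: values} for node and all of its descendants."""
    if out is None:
        out = {}
    name, length, children = node
    if length:
        values = [v + rng.gauss(0.0, length) for v in parent_values]
    else:
        values = list(parent_values)  # copy, so nodes never share one list
    out[name] = values
    for child in children:
        propagate_expression(child, values, rng, out)
    return out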
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_seqsim/test_microarray_normalize.py
"""Unit tests of microarray_normalize.py: code for normalizing microarrays.
"""
from cogent.seqsim.microarray_normalize import (zscores, logzscores, ranks,
quantiles, make_quantile_normalizer, make_normal_quantile_normalizer,
make_empirical_quantile_normalizer, geometric_mean)
from cogent.util.unit_test import TestCase, main
from numpy import array, arange, reshape, log2
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
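# A sketch of column-wise z-scores as test_zscores expects them: each value
# has its column mean subtracted and is divided by the column's population
# standard deviation (dividing by n, not n - 1; only that choice gives the
# -1.41421356 in the test). Plain lists of rows stand in for numpy arrays, and
# column_zscores / column_logzscores are hypothetical names.
from math import log, sqrt


def column_zscores(rows):
    ncols = len(rows[0])
    n = float(len(rows))
    out = [[0.0] * ncols for _ in rows]
    for j in range(ncols):
        col = [row[j] for row in rows]
        mean = sum(col) / n
        sd = sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i, x in enumerate(col):
            out[i][j] = (x - mean) / sd
    return out


def column_logzscores(rows):
    # log2 first, as test_logzscores checks; every value must be positive
    return column_zscores([[log(x, 2) for x in row] for row in rows])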
class microarray_normalize_tests(TestCase):
"""Tests of top-level functions."""
def test_zscores(self):
"""zscores should convert array to zscores within each column"""
a = reshape(arange(15),(5,3))
z = zscores(a)
self.assertEqual(z[2], array([0,0,0])) #middle should be mean
self.assertFloatEqual(z[0], [-1.41421356]*3)
#check that it works when arrays aren't sorted
a[0] = a[-1]
a[1] = a[-2]
a[0, -1] = 50
z = zscores(a)
self.assertEqual(z[0,0],z[-1,0])
self.assertFloatEqual(z[0,-1], 1.9853692256351525)
self.assertFloatEqual(z[-1,-1], -0.30544141932848506)
def test_logzscores(self):
"""logzscores should perform zscores on log of a"""
a = reshape(arange(1,16),(5,3)) #won't work with zero value
self.assertFloatEqual(logzscores(a), zscores(log2(a)))
def test_ranks(self):
"""ranks should convert array to ranks within each column"""
a = array([[10,20,30],[20,10,50],[30,5,10]])
r = ranks(a)
self.assertEqual(r, array([[0,2,1],[1,1,2],[2,0,0]]))
def test_quantiles(self):
"""quantiles should convert array to quantiles within each column"""
a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]])
q = quantiles(a)
self.assertEqual(q, \
array([[0,.5,.25],[.25,.25,.75],[.5,0,0],[.75,.75,.5]]))
def test_make_quantile_normalizer(self):
"""make_quantile_normalizer should sample from right distribution."""
dist = array([1,2,3,4])
qn = make_quantile_normalizer(dist)
a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]])
q = qn(a)
self.assertEqual(q, \
array([[1,3,2],[2,2,4],[3,1,1],[4,4,3]]))
#check that it works when they don't match in size exactly
dist = array([2,4,6,7,8,8,8,8])
qn = make_quantile_normalizer(dist)
a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]])
q = qn(a)
self.assertEqual(q, \
array([[2,8,6],[6,6,8],[8,2,2],[8,8,8]]))
def test_make_normal_quantile_normalizer(self):
"""make_normal_quantile_normalizer should sample from normal dist."""
nqn = make_normal_quantile_normalizer(20, 10)
a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]])
q = nqn(a)
exp = array([[-289.02323062, 20. , -47.44897502],
[ -47.44897502, -47.44897502, 87.44897502],
[ 20. , -289.02323062, -289.02323062],
[ 87.44897502, 87.44897502, 20. ]])
self.assertFloatEqual(q, exp)
def test_make_empirical_quantile_normalizer(self):
"""make_empirical_quantile_normalizer should convert a to dist of data"""
dist = array([4,2,3,1]) #note: out of order
qn = make_empirical_quantile_normalizer(dist)
a = array([[10,20,30],[20,10,50],[30,5,10],[40,40,40]])
q = qn(a)
self.assertEqual(q, \
array([[1,3,2],[2,2,4],[3,1,1],[4,4,3]]))
def test_geometric_mean(self):
"""geometric_mean should return geometric mean."""
a = array([1.05, 1.2, .96])
gmean = geometric_mean(a)
self.assertFloatEqual(gmean, 1.065484802091121)
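# A sketch of rank-based quantile normalization that reproduces the expected
# values in test_make_quantile_normalizer and
# test_make_empirical_quantile_normalizer. Assumption: a value of 0-based rank
# r in a column of n values is replaced by sorted(dist)[r * len(dist) // n],
# which is how the size-mismatch case (8 reference values, 4 rows) yields
# 2, 6, 8, 8. Ties are broken by row order. column_ranks and
# make_quantile_normalizer_sketch are hypothetical names.
def column_ranks(col):
    order = sorted(range(len(col)), key=lambda i: col[i])
    ranks = [0] * len(col)
    for r, i in enumerate(order):
        ranks[i] = r
    return ranks


def make_quantile_normalizer_sketch(dist):
    ref = sorted(dist)  # sorting also covers the out-of-order empirical case

    def normalize(rows):
        n = len(rows)
        out = [list(row) for row in rows]
        for j in range(len(rows[0])):
            ranks = column_ranks([row[j] for row in rows])
            for i, r in enumerate(ranks):
                out[i][j] = ref[r * len(ref) // n]
        return out

    return normalize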
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_seqsim/test_randomization.py
"""Unit tests for the randomization module: shuffling sequences while holding selected positions or characters fixed."""
from cogent.util.unit_test import TestCase, main
from cogent.seqsim.randomization import shuffle_range, shuffle_between, \
shuffle_except_indices, shuffle_except
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
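# A sketch of an in-place range shuffle with the behaviour test_shuffle_range
# checks: only items in items[start:end] move, and negative indices follow
# ordinary slice rules, so (3, -3) and (-15, 15) select the same middle block
# of an 18-item list. shuffle_range_sketch is a hypothetical name, not
# cogent's shuffle_range.
import random


def shuffle_range_sketch(items, start, end, rng=random):
    middle = items[start:end]   # copy of the block to shuffle
    rng.shuffle(middle)
    items[start:end] = middle   # write it back in place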
class randomization_tests(TestCase):
"""Tests of the top-level functionality"""
def setUp(self):
"""Make some standard objects to randomize"""
self.numbers = list('123')
self.letters = list('abcdef')
self.to_test = self.numbers + 2*self.letters + self.numbers
def test_shuffle_range(self):
"""shuffle_range should shuffle only inside range"""
shuffle_range(self.to_test, 3, -3)
self.assertEqual(self.to_test[:3],self.numbers)
self.assertEqual(self.to_test[-3:], self.numbers)
self.assertNotEqual(self.to_test[3:-3], 2*self.letters)
self.assertEqualItems(self.to_test[3:-3], 2*self.letters)
#this time, start is negative and end is positive
shuffle_range(self.to_test, -15, 15)
self.assertEqual(self.to_test[:3],self.numbers)
self.assertEqual(self.to_test[-3:], self.numbers)
self.assertNotEqual(self.to_test[3:-3], 2*self.letters)
self.assertEqualItems(self.to_test[3:-3], 2*self.letters)
def test_shuffle_between(self):
"""shuffle_between should shuffle between specified chars"""
shuffle_peptides = shuffle_between('KR')
seq1 = 'AGHCDSGAHF' #each 10 chars long
seq2 = 'PLMIDNYHGT'
protein = seq1 + 'K' + seq2
result = shuffle_peptides(protein)
self.assertEqual(result[10], 'K')
self.assertNotEqual(result[:10], seq1)
self.assertEqualItems(result[:10], seq1)
self.assertNotEqual(result[11:], seq2)
self.assertEqualItems(result[11:], seq2)
def test_shuffle_except_indices(self):
"""shuffle_except_indices should shuffle all except specified indices"""
seq1 = 'AGHCDSGAHF' #each 10 chars long
seq2 = 'PLMIDNYHGT'
protein = seq1 + 'K' + seq2
result = list(protein)
shuffle_except_indices(result, [10])
self.assertEqual(result[10], 'K')
self.assertNotEqual(''.join(result), protein)
self.assertEqualItems(''.join(result), protein)
self.assertNotEqualItems(''.join(result[:10]), seq1)
def test_shuffle_except(self):
"""shuffle_except should shuffle all except the specified characters"""
seq1 = 'AGHCDSGAHF' #each 10 chars long
seq2 = 'PLMIDNYHGT'
protein = seq1 + 'K' + seq2
prot = protein
se = shuffle_except('K')
result = se(prot)
self.assertEqual(result[10], 'K')
self.assertNotEqual(''.join(result), protein)
self.assertEqualItems(''.join(result), protein)
self.assertNotEqualItems(''.join(result[:10]), seq1)
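# A sketch of the two character-based factories tested above, assuming
# shuffle_between(chars) keeps every character in chars at its position and
# shuffles each run between them independently, while shuffle_except(chars)
# keeps those characters fixed and shuffles all other characters across the
# whole string. Both sketches return strings; the names are hypothetical and
# this is not cogent's implementation.
import random


def shuffle_between_sketch(chars, rng=random):
    def shuffler(seq):
        out, run = [], []
        for c in seq:
            if c in chars:
                rng.shuffle(run)      # shuffle the run that just ended
                out.extend(run)
                out.append(c)         # the delimiter stays in place
                run = []
            else:
                run.append(c)
        rng.shuffle(run)              # trailing run after the last delimiter
        out.extend(run)
        return ''.join(out)
    return shuffler


def shuffle_except_sketch(chars, rng=random):
    def shuffler(seq):
        free = [i for i, c in enumerate(seq) if c not in chars]
        values = [seq[i] for i in free]
        rng.shuffle(values)
        out = list(seq)
        for i, v in zip(free, values):
            out[i] = v
        return ''.join(out)
    return shuffler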
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_seqsim/test_searchpath.py
"""Tests public and private methods of the SearchPath and SearchNode classes.
"""
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import NonnegIntError
from cogent.seqsim.searchpath import SearchPath, SearchNode
__author__ = "Amanda Birmingham"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Amanda Birmingham"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Amanda Birmingham"
__email__ = "amanda.birmingham@thermofisher.com"
__status__ = "Production"
class SearchPathHelper(object):
"""Contains data and function defs used by public AND private tests"""
#all primers have certain forbidden sequences: no runs of more than 3
#consecutive purines (A/G) or pyrimidines (C/T)
standard_forbid_seq = ['AAAA', 'GAAA', 'AGAA', 'GGAA', 'AAGA', 'GAGA', \
'AGGA', 'GGGA', 'AAAG', 'GAAG', 'AGAG', 'GGAG', \
'AAGG', 'GAGG', 'AGGG', 'GGGG', 'CCCC', 'TCCC', \
'CTCC', 'TTCC', 'CCTC', 'TCTC', 'CTTC', 'TTTC', \
'CCCT', 'TCCT', 'CTCT', 'TTCT', 'CCTT', 'TCTT', \
'CTTT', 'TTTT']
alphabets = {SearchPath.DEFAULT_KEY:"ACGT"}
#end SearchPathHelper
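# The hard-coded standard_forbid_seq above is every 4-mer built only from
# purines (A, G) or only from pyrimidines (C, T): 2**4 + 2**4 = 32 strings.
# A sketch of generating the same list with itertools.product (the function
# name is hypothetical; the tests keep the explicit list):
from itertools import product


def same_class_runs(run_length=4, classes=("AG", "CT")):
    """All strings of run_length drawn entirely from one class of bases."""
    return sorted(''.join(p) for cls in classes
                  for p in product(cls, repeat=run_length))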
class SearchNodeHelper(object):
"""Contains data and function defs used by public AND private tests"""
#list of possible bases at any position
alphabet = ["A", "C", "G", "T"]
#end SearchNodeHelper
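# A sketch of the randomized depth-first search with backtracking that the
# SearchPath tests describe: each position draws a shuffled copy of its
# alphabet (falling back to a default alphabet), the first remaining option
# of each position is its current value, and when the newest symbol completes
# a forbidden string that option is discarded; a position whose options run
# out is popped and its parent moves on to its next option. If the stack
# empties, no legal path exists and None is returned. Unlike
# SearchPath.generate, this sketch always starts from an empty path.
# DEFAULT_KEY, generate_path and the helpers are hypothetical names.
import random

DEFAULT_KEY = "default"  # hypothetical stand-in for SearchPath.DEFAULT_KEY


def _path_value(stack):
    return ''.join(options[0] for options in stack)


def _ends_in_forbidden(path, forbidden):
    # earlier positions were checked when they were added, so only strings
    # ending at the newest symbol need testing
    return any(path.endswith(f) for f in forbidden)


def _drop_current(stack):
    """Discard the top node's current option, popping any exhausted nodes."""
    stack[-1].pop(0)
    while stack and not stack[-1]:
        stack.pop()
        if stack:
            stack[-1].pop(0)


def generate_path(length, alphabets, forbidden=(), rng=random):
    forbidden = [f.upper() for f in forbidden]
    stack = []  # one list of untried options per position; [0] is current
    while len(stack) < length:
        options = list(alphabets.get(len(stack), alphabets[DEFAULT_KEY]))
        if not options:
            return None  # an empty alphabet makes every path impossible
        rng.shuffle(options)
        stack.append(options)
        while stack and _ends_in_forbidden(_path_value(stack), forbidden):
            _drop_current(stack)
        if not stack:
            return None
    return _path_value(stack)


# With alphabets {0: 'A', 1: 'BC', 2: 'D', DEFAULT_KEY: 'X'} and forbidden
# ['CD'], a 3-long path can only be 'ABD', mirroring test_generate_correctAlph.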
class SearchPathTests(TestCase):
"""Tests public SearchPath methods."""
#-------------------------------------------------------------------
#Tests of clearNodes
def test_clearNodes(self):
"""Should empty path stack and variable forbidden"""
#create a searchpath and add just one node
test = SearchPath(SearchPathHelper.alphabets)
test.generate(1)
#now call clear and make sure path value is "" (empty)
test.clearNodes()
self.assertEquals(test.Value, "")
#end test_clearNodes
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of generate method
def tripletGenerator(self, pathobj, path_length):
"""Helper function to generate primers and catalogue triplets"""
found_triplets = {}
#make a hundred random searchpaths
for i in xrange(100):
curr_path_val = pathobj.generate(path_length)
#now find all the triplets in this path
for r in xrange(path_length-2):
curr_triplet = curr_path_val[r:r+3]
found_triplets[curr_triplet] = True
#next r
#clear out the path
pathobj.clearNodes()
#next rand sequence
return found_triplets
#end tripletGenerator
def test_generate_fullCoverage(self):
"""With no constraints, should produce all possible triplets"""
path_length = 20
test = SearchPath(SearchPathHelper.alphabets)
#make a hundred random searchpaths and see which triplets are produced
found_triplets = self.tripletGenerator(test, path_length)
num_found = len(found_triplets.keys())
self.assertEquals(num_found, 64)
#end test_generate_fullCoverage
def test_generate_withForbidden(self):
"""With 2 triplet constraints, should produce all others"""
forbidden_triplet = ["ATG", "CCT"]
path_length = 20
test = SearchPath(SearchPathHelper.alphabets, forbidden_triplet)
#make a hundred random searchpaths and see which triplets are produced
found_triplets = self.tripletGenerator(test, path_length)
num_found = len(found_triplets.keys())
self.assertEquals(num_found, 62)
#end test_generate_withForbidden
def test_generate_nonePossible(self):
"""Should return null if no path can match constraints"""
alphabet = {SearchPath.DEFAULT_KEY:"AB"}
#forbid all combinations of alphabet
forbidden_seqs = ["AA", "AB", "BB", "BA"]
test = SearchPath(alphabet, forbidden_seqs)
output = test.generate(2)
self.assertEquals(output, None)
#end test_generate_nonePossible
def test_generate_multiple(self):
"""Should be able to call generate multiple times to extend path"""
test = SearchPath(SearchPathHelper.alphabets)
output1 = test.generate(2)
output2 = test.generate(3)
#make sure that the length of the path is now three
self.assertEquals(len(output2), 3)
#make sure that the new path extends the old one (old path is its prefix)
self.assertEquals(output1, output2[:2])
#end test_generate_multiple
def test_generate_correctAlph(self):
"""Should get correct alphabet even if node is popped then readded"""
test_alphs = {0:"A",1:"BC",2:"D",3:"E",SearchPath.DEFAULT_KEY:"X"}
forbidden_seqs = ["CD"]
test = SearchPath(test_alphs, forbidden_seqs)
#given these position alphabets and this forbidden seq,
#the only legal 3-node searchpath should be ABD. Make
#a hundred searchpaths and make sure this is the only one
#that actually shows up.
found_paths = {}
for i in xrange(100):
curr_path = test.generate(3)
found_paths[curr_path] = True
test.clearNodes()
#next
#make sure there is only one path found and that it is the right one
found_path_str = str("".join(found_paths.keys()))
self.assertEquals(len(found_paths), 1)
self.assertEquals(found_path_str, "ABD")
#end test_generate_correctAlph
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of findAllowedOption
def test_findAllowedOption_currentAllowed(self):
"""Should return true when current option is allowed"""
#searchpath with no forbidden seqs, so anything should work
test = SearchPath(SearchPathHelper.alphabets)
test._add_node(SearchNode(SearchNodeHelper.alphabet))
allowed_found = test.findAllowedOption()
self.assertEquals(allowed_found, True)
#end test_findAllowedOption_currentAllowed
def test_findAllowedOption_otherAllowed(self):
"""Should return true when curr option is bad but another is good"""
node_vals = []
#create a path and put in 2 nodes; since all the forbidden seqs I
#used to init the path are 4 bases long, there is no chance that
#the path I just created has anything forbidden in it
test = self._fill_path(2, node_vals)
#add the existing path value to the forbidden list
test._fixed_forbidden["".join(node_vals)] = True
test._forbidden_lengths[2] = True
#call findAllowedOption ... should find next available good option
allowed_found = test.findAllowedOption()
self.assertEquals(allowed_found, True)
#end test_findAllowedOption_otherAllowed
def test_findAllowedOption_none(self):
"""Should return false if curr option is bad and no good exist"""
test = self._fill_path(1)
self._empty_top(test)
#get the value of the top node's only remaining option;
#add to forbidden
last_option = test._get_top().Options
test._fixed_forbidden["".join(last_option)] = True
test._forbidden_lengths[1] = True
#now make sure we get back result that no options for path remain
allowed_found = test.findAllowedOption()
self.assertEquals(allowed_found, False)
#end test_findAllowedOption_none
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of removeOption
#Helper function
def _fill_path(self, num_nodes, node_vals=None):
"""create a searchpath and add searchnodes; return path"""
if node_vals is None: node_vals = []
test = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
for i in xrange(num_nodes):
curr_node = SearchNode(SearchNodeHelper.alphabet)
node_vals.append(curr_node.Value)
test._add_node(curr_node)
#next i
return test
#end _fill_path
#Helper function
def _empty_top(self, spath):
"""remove all but one option from the top node"""
top_node = spath._get_top()
num_options = len(top_node.Options)
for i in xrange(num_options-1): top_node.removeOption()
#end _empty_top
def test_removeOption_simple(self):
"""Should correctly remove option from untapped node"""
#create a searchpath and a searchnode
test = self._fill_path(1)
orig_len_minus1 = len(test._get_top().Options) - 1
#also check that remove result is true: node still has options
has_options = test.removeOption()
self.assertEqual(has_options, True)
#get the top node and make sure that it has fewer options
top_node = test._get_top()
option_len = len(top_node.Options)
self.assertEqual(option_len, orig_len_minus1)
#end test_removeOption_simple
def test_removeOption_empty(self):
"""Should return False if removing option leads to empty stack"""
#create a searchpath with just one searchnode, then (almost) empty it
test = self._fill_path(1)
self._empty_top(test)
#now remove the last option, and make sure the stack is now empty
some_left = test.removeOption()
self.assertEquals(some_left, False)
#end test_removeOption_empty
def test_removeOption_recurse(self):
"""Should correctly remove empty node and curr option of next"""
#put two nodes in the search path and almost empty the top one
test = self._fill_path(2)
self._empty_top(test)
test.removeOption()
#make sure there's only one item left in the path
self.assertEquals(len(test._path_stack), 1)
#make sure that it has one fewer options
top_node = test._get_top()
self.assertEquals(len(top_node.Options), len(top_node.Alphabet)-1)
#end test_removeOption_recurse
#-------------------------------------------------------------------
#end SearchPathTests
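# A sketch of two small lookups the private tests pin down, written against a
# plain list of node values rather than a SearchPath (get_nmer_sketch and
# ends_in_forbidden_sketch are hypothetical names):
# - the n-mer is the last n values joined, "" for n == 0 and None when n is
#   longer than the path; slicing from len(values) - n rather than -n matters
#   here, because values[-0:] would return the whole list;
# - a forbidden check only needs suffixes whose lengths occur in the
#   forbidden set, which is presumably what SearchPath's _forbidden_lengths
#   dictionary is for.
# Unlike _get_nmer, this sketch raises ValueError (not NonnegIntError).
def get_nmer_sketch(values, n):
    n = int(n)
    if n < 0:
        raise ValueError("n must be a non-negative integer")
    if n > len(values):
        return None
    return ''.join(values[len(values) - n:])


def ends_in_forbidden_sketch(values, forbidden):
    forbidden = set(f.upper() for f in forbidden if f)
    lengths = set(len(f) for f in forbidden)
    path = ''.join(values)
    return any(len(path) >= n and path[-n:] in forbidden for n in lengths)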
class SearchNodeTests(TestCase):
"""Tests public SearchNode methods."""
#-------------------------------------------------------------------
#Tests of removeOption method
def test_removeOption_someLeft(self):
"""removeOption should cull options and return T when some left."""
#create a search node and get its current value
test = SearchNode(SearchNodeHelper.alphabet)
last_val = test.Value
some_left = test.removeOption()
#new current value must be different from old
#and return value must be true
self.assertNotEqual(test.Value, last_val)
self.assertEqual(some_left, True)
#end test_removeOption_someLeft
def test_removeOption_noneLeft(self):
"""removeOption should cull options and return F when none left."""
test = SearchNode(SearchNodeHelper.alphabet)
num_options = len(test.Options)
#removeOption num_options times: that should get 'em all
for i in xrange(num_options): some_left = test.removeOption()
#return value should be false (no options remain)
self.assertEqual(some_left, False)
#end test_removeOption_noneLeft
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Test of options property (and _get_options method)
def test_options(self):
"""Should return a copy of real options"""
test = SearchNode(SearchNodeHelper.alphabet)
optionsA = test.Options
del optionsA[0]
optionsB = test.Options
self.assertNotEqual(len(optionsA), len(optionsB))
#end test_options
#-------------------------------------------------------------------
#end SearchNodeTests
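# A sketch of the SearchNode behaviour the public tests rely on, assuming a
# node holds a shuffled copy of its alphabet, reports the first remaining
# option as its Value, and hands out a copy from Options so callers cannot
# alter its state (test_options). OptionNode is a hypothetical stand-in, not
# cogent's SearchNode.
import random


class OptionNode(object):
    def __init__(self, alphabet, rng=random):
        self.Alphabet = list(alphabet)
        self._options = list(alphabet)
        rng.shuffle(self._options)

    @property
    def Value(self):
        return self._options[0] if self._options else None

    @property
    def Options(self):
        return list(self._options)  # a copy, never the internal list

    def removeOption(self):
        """Drop the current option; return True if any options remain."""
        self._options.pop(0)
        return bool(self._options)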
class SearchPathTests_private(TestCase):
"""Tests for private SearchPath methods."""
#No need to test __str__: just calls toString in general_tools
#No need to test _accept_option or _remove_accepted_option: they
#simply pass in the base class implementation
#No need to test _in_extra_forbidden: just returns False in base class
#implementation
#-------------------------------------------------------------------
#Helper functions
def _fill_path(self, num_nodes, node_vals=None):
"""create a searchpath and add searchnodes; return path"""
if node_vals is None: node_vals = []
test = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
for i in xrange(num_nodes):
curr_node = SearchNode(SearchNodeHelper.alphabet)
node_vals.append(curr_node.Value)
test._add_node(curr_node)
#next i
return test
#end _fill_path
def _empty_top(self, spath):
"""remove all but one option from the top node"""
top_node = spath._get_top()
num_options = len(top_node.Options)
for i in xrange(num_options-1): top_node.removeOption()
#end _empty_top
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of __init__
def test_init_noForbid(self):
"""Init should correctly set private properties w/o forbid list"""
test = SearchPath(SearchPathHelper.alphabets)
real_result = len(test._fixed_forbidden.keys())
self.assertEquals(real_result, 0)
#end test_init_noForbid
def test_init_withForbid(self):
"""Init should correctly set private properties, w/forbid list"""
user_input = SearchPathHelper.standard_forbid_seq[:]
user_input.extend(["AUG", "aaaaccuag"])
test = SearchPath(SearchPathHelper.alphabets, user_input)
user_input = [i.upper() for i in user_input]
user_input.sort()
real_result = test._fixed_forbidden.keys()
real_result.sort()
self.assertEquals(str(real_result), str(user_input))
#end test_init_withForbid
def test_init_badAlphabets(self):
"""Init should fail if alphabets param is not dictionary-like"""
self.assertRaises(ValueError, SearchPath, "blue")
#end test_init_badAlphabets
def test_init_noDefault(self):
"""Init should fail if alphabets param has no 'default' key"""
self.assertRaises(ValueError, SearchPath, {12:"A"})
#end test_init_noDefault
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of value property (and _get_value method)
def test_value_empty(self):
"""Should return empty string when path is empty"""
test = SearchPath(SearchPathHelper.alphabets)
self.assertEquals(test.Value, "")
#end test_value_empty
def test_value(self):
"""Should return string of node values when nodes exist"""
node_vals = []
test = self._fill_path(3, node_vals)
self.assertEquals(test.Value, "".join(node_vals))
#end test_value
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of _top_index property (and _get_top_index method)
def test_top_index(self):
"""Should return index of top node when one exists"""
test = self._fill_path(3)
top_index = test._top_index
self.assertEquals(top_index,2)
#end test_top_index
def test_top_index_None(self):
"""Should return None when stack has no entries"""
test = SearchPath(SearchPathHelper.alphabets)
top_index = test._top_index
self.assertEquals(top_index, None)
#end test_top_index_None
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of _get_top method
def test_get_top(self):
"""Should return a reference to top node on stack if there is one"""
test = SearchPath(SearchPathHelper.alphabets)
test._add_node(SearchNode(SearchNodeHelper.alphabet))
topnode = SearchNode(SearchNodeHelper.alphabet)
test._add_node(topnode)
resultnode = test._get_top()
self.assertEquals(resultnode, topnode)
#end test_get_top
def test_get_top_None(self):
"""Should return None if stack is empty"""
test = SearchPath(SearchPathHelper.alphabets)
topnode = test._get_top()
self.assertEquals(topnode, None)
#end test_get_top_None
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Test of _get_forbidden_lengths
def test_get_forbidden_lengths(self):
"""get_forbidden_lengths should return dict of forbidden seq lens"""
correct_result = str([3, 4, 9])
user_input = SearchPathHelper.standard_forbid_seq[:]
user_input.extend(["AUG", "aaaaccuag"])
test = SearchPath(SearchPathHelper.alphabets, user_input)
real_dict = test._get_forbidden_lengths()
real_list = real_dict.keys()
real_list.sort()
real_result = str(real_list)
self.assertEquals(real_result, correct_result)
#end test_get_forbidden_lengths
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of _add_node
def test_add_node_first(self):
"""add_node should correctly add first node and increase top index."""
test = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
test_node = SearchNode(SearchNodeHelper.alphabet)
test._add_node(test_node)
self.assertEquals(len(test._path_stack), 1)
self.assertEquals(test._top_index, 0)
#end test_add_node_first
def test_add_node_subsequent(self):
"""add_node should correctly add additional nodes and increment top index."""
test = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
test_node = SearchNode(SearchNodeHelper.alphabet)
test_node2 = SearchNode(SearchNodeHelper.alphabet)
test._add_node(test_node)
test._add_node(test_node2)
self.assertEquals(len(test._path_stack), 2)
self.assertEquals(test._top_index, 1)
#end test_add_node_subsequent
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of _get_nmer
def test_get_nmer(self):
"""get_nmer should return correct nmer for n <= length of stack"""
node_values = []
n = 4
test_path = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
for i in xrange(n+1):
curr_node = SearchNode(SearchNodeHelper.alphabet)
test_path._add_node(curr_node)
node_values.append(curr_node.Value)
#next
#get a nmer, and get the last n values that were put on stack;
#should be the same
real_result = test_path._get_nmer(n)
correct_result = "".join(node_values[-n:])
self.assertEquals(real_result, correct_result)
#end test_get_nmer
def test_get_nmer_tooLong(self):
"""get_nmer should return None for n > length of stack"""
test_path = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
test_node = SearchNode(SearchNodeHelper.alphabet)
test_path._add_node(test_node)
#stack is 1 long. Ask for a 2 mer
real_result = test_path._get_nmer(2)
self.assertEquals(real_result, None)
#end test_get_nmer_tooLong
def test_get_nmer_len1(self):
"""get_nmer should return correct result for n=1 on a one-node stack"""
test_path = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
test_node = SearchNode(SearchNodeHelper.alphabet)
test_path._add_node(test_node)
correct_result = test_node.Value
real_result = test_path._get_nmer(1)
self.assertEquals(real_result, correct_result)
#end test_get_nmer_len1
def test_get_nmer_len0(self):
"""get_nmer should return an empty string if n is 0"""
#if n is zero, this should return "" even when stack is empty
test_path = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
real_result = test_path._get_nmer(0)
self.assertEquals(real_result, "")
#end test_get_nmer_len0
def test_get_nmer_badArg(self):
"""get_nmer should error if given a non integer-castable n"""
test_path = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
self.assertRaises(NonnegIntError, test_path._get_nmer, "blue")
#end test_get_nmer_badArg
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of _check_forbidden_seqs
def test_check_forbidden_seqs_fixed(self):
"""Should return True if path includes a fixed forbidden seq"""
forbidden_seq = ["G", "U", "A"]
user_input = ["".join(forbidden_seq)]
user_input.extend(SearchPathHelper.standard_forbid_seq)
test = SearchPath(SearchPathHelper.alphabets, user_input)
test._add_node(SearchNode(SearchNodeHelper.alphabet))
#add more values, and cheat so as to make them something forbidden
for bad_val in forbidden_seq:
curr_node = SearchNode(SearchNodeHelper.alphabet)
curr_node._options[0] = bad_val #torque the node's innards
test._add_node(curr_node)
#next bad_val
real_result = test._check_forbidden_seqs()
self.assertEquals(real_result, True)
#end test_check_forbidden_seqs_fixed
def test_check_forbidden_seqs_none(self):
"""Should return False if path includes no forbidden seqs"""
#a seq that isn't in the standard fixed forbidden lib
allowed_seq = ["C", "U", "A", "T"]
test = SearchPath(SearchPathHelper.alphabets, \
SearchPathHelper.standard_forbid_seq)
test._add_node(SearchNode(SearchNodeHelper.alphabet))
#add more values, and cheat so as to make them something known
for known_val in allowed_seq:
curr_node = SearchNode(SearchNodeHelper.alphabet)
curr_node._options[0] = known_val #torque the node's innards
test._add_node(curr_node)
#next known_val
real_result = test._check_forbidden_seqs()
self.assertEquals(real_result, False)
#end test_check_forbidden_seqs_none
#-------------------------------------------------------------------
#-------------------------------------------------------------------
#Tests of _get_alphabet
def test_get_alphabet_exists(self):
"""Should return alphabet for position when one exists"""
alph1 = "G"
alph2 = "ACGT"
test_alphs = {0:alph1, 2:alph1, SearchPath.DEFAULT_KEY:alph2}
test = SearchPath(test_alphs)
real_alph = test._get_alphabet(2)
self.assertEquals(str(real_alph), alph1)
#end test_get_alphabet_exists
def test_get_alphabet_default(self):
"""Should return default alphabet if none defined for position"""
#SearchPathHelper.alphabets has only a default entry
test = SearchPath(SearchPathHelper.alphabets)
real_alph = test._get_alphabet(0)
correct_alph = SearchPathHelper.alphabets[SearchPath.DEFAULT_KEY]
self.assertEquals(str(real_alph), str(correct_alph))
#end test_get_alphabet_default
def test_get_alphabet_badPosition(self):
"""Should raise error if input isn't castable to nonneg int"""
test = SearchPath(SearchPathHelper.alphabets)
self.assertRaises(NonnegIntError, test._get_alphabet, "blue")
#end test_get_alphabet_badPosition
#-------------------------------------------------------------------
#end SearchPathTests_private
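# A minimal sketch of the SearchPath behaviour exercised above. It assumes the
# path is a plain list of single-character node values and that alphabets live
# in a dict with a default key. It is not PyCogent's implementation:
# sketch_get_nmer, sketch_get_alphabet, sketch_has_forbidden and
# SKETCH_DEFAULT_KEY are hypothetical names, ValueError stands in for
# NonnegIntError, and checking forbidden sequences only as a suffix is an
# assumption (it is enough if the path is re-checked after every node is added).
SKETCH_DEFAULT_KEY = "default"  # hypothetical stand-in for SearchPath.DEFAULT_KEY


def sketch_nonneg_int(value):
    """Cast value to a nonnegative int, raising ValueError otherwise."""
    result = int(value)  # int("blue") raises ValueError
    if result < 0:
        raise ValueError("expected a nonnegative integer, got %r" % (value,))
    return result


def sketch_get_nmer(values, n):
    """Return the last n values joined as a string, or None if fewer than n."""
    n = sketch_nonneg_int(n)
    if n == 0:
        return ""  # an empty n-mer is defined even for an empty path
    if n > len(values):
        return None
    return "".join(values[-n:])


def sketch_get_alphabet(alphabets, position):
    """Return the alphabet for position, falling back to the default entry."""
    position = sketch_nonneg_int(position)
    if position in alphabets:
        return alphabets[position]
    return alphabets[SKETCH_DEFAULT_KEY]


def sketch_has_forbidden(values, forbidden_seqs):
    """True if any forbidden sequence is a suffix of the current path."""
    for forbidden in forbidden_seqs:
        if sketch_get_nmer(values, len(forbidden)) == forbidden:
            return True
    return False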
class SearchNodeTests_private(TestCase):
"""Tests for private SearchNode methods."""
#No need to test __str__: just calls toString in general_tools
#No need to test _get_value and the value property: just references
#an item in an array
#No need to test _get_alphabet and alphabet property: ibid
#-------------------------------------------------------------------
#Tests of __init__
def test_init_noArg(self):
"""Init should correctly set private properties w/no arg"""
correct_result = str(SearchNodeHelper.alphabet)
test = SearchNode(SearchNodeHelper.alphabet)
options = test.Options
options.sort()
real_result = str(options)
self.assertEquals(real_result, correct_result)
#end test_init_noArg
#-------------------------------------------------------------------
#end SearchNodeTests_private
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""test_sequence_generators.py: tests of the sequence_generators module.
"""
from cogent.seqsim.sequence_generators import permutations, combinations, \
SequenceGenerator, Partition, Composition, \
MageFrequencies, SequenceHandle, IUPAC_DNA, IUPAC_RNA, BaseFrequency, \
PairFrequency, BasePairFrequency, RegionModel, ConstantRegion, \
UnpairedRegion, ShuffledRegion, PairedRegion, MatchingRegion, \
SequenceModel, Rule, Motif, Module, SequenceEmbedder
from StringIO import StringIO
from operator import mul
from sys import path
from cogent.maths.stats.util import Freqs
from cogent.util.misc import app_path
from cogent.struct.rna2d import ViennaStructure
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
#need to skip some tests if RNAfold absent
if app_path('RNAfold'):
RNAFOLD_PRESENT = True
else:
RNAFOLD_PRESENT = False
class FunctionTests(TestCase):
"""Tests of standalone functions"""
def setUp(self):
self.standards = (0, 1, 5, 30, 173, 1000, 4382)
def test_permutations_negative_k(self):
"""permutations should raise IndexError if k negative"""
self.assertRaises(IndexError, permutations, 3, -1)
def test_permutations_k_more_than_n(self):
"""permutations should raise IndexError if k > n"""
self.assertRaises(IndexError, permutations, 3, 4)
def test_permutations_negative_n(self):
"""permutations should raise IndexError if n negative"""
self.assertRaises(IndexError, permutations, -3, -2)
def test_permutations_k_equals_1(self):
"""permutations should return n if k=1"""
for n in self.standards[1:]:
self.assertEqual(permutations(n,1), n)
def test_permutations_k_equals_2(self):
"""permutations should return n*(n-1) if k=2"""
for n in self.standards[2:]:
self.assertEqual(permutations(n,2), n*(n-1))
def test_permutations_k_equals_n(self):
"""permutations should return n! if k=n"""
for n in self.standards[1:]:
self.assertEqual(permutations(n,n), reduce(mul, range(1,n+1)))
def test_combinations_k_equals_n(self):
"""combinations should return 1 if k = n"""
for n in self.standards:
self.assertEqual(combinations(n,n), 1)
def test_combinations_k_equals_n_minus_1(self):
"""combinations should return n if k=(n-1)"""
for n in self.standards[1:]:
self.assertEqual(combinations(n, n-1), n)
def test_combinations_zero_k(self):
"""combinations should return 1 if k is zero"""
for n in self.standards:
self.assertEqual(combinations(n, 0), 1)
def test_combinations_symmetry(self):
"""combinations(n,k) should equal combinations(n,n-k)"""
for n in self.standards[3:]:
for k in (0, 1, 5, 18):
self.assertEquals(combinations(n, k), combinations(n, n-k))
def test_combinations_arbitrary_values(self):
"""combinations(n,k) should equal results from spreadsheet"""
results = {
30:{0:1, 1:30, 5:142506, 18:86493225, 29:30, 30:1},
173:{0:1, 1:173, 5:1218218079, 18:1.204353e24, 29:7.524850e32, \
30:3.611928e33},
1000:{0:1, 1:1000, 5:8.2502913e12, 18:1.339124e38,29:7.506513e55, \
30:2.429608e57},
4382:{0:1, 1:4382, 5:1.343350e16, 18:5.352761e49, 29:4.184411e74, \
30:6.0715804e76},
}
for n in self.standards[3:]:
for k in (0, 1, 5, 18, 29, 30):
self.assertFloatEqualRel(combinations(n,k), results[n][k], 1e-5)
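# A sketch of the counting functions tested by FunctionTests, written from the
# textbook formulas P(n, k) = n!/(n-k)! and C(n, k) = n!/(k!(n-k)!) using exact
# integer arithmetic. sketch_permutations and sketch_combinations are
# hypothetical names; raising IndexError for k < 0, n < 0 or k > n simply
# mirrors what the tests above expect, and PyCogent's own functions may be
# implemented differently (for example with floating point for large n).
def sketch_permutations(n, k):
    """Ordered selections of k items from n: n*(n-1)*...*(n-k+1)."""
    if n < 0 or k < 0 or k > n:
        raise IndexError("need 0 <= k <= n, got n=%s, k=%s" % (n, k))
    result = 1
    for factor in range(n - k + 1, n + 1):
        result *= factor
    return result


def sketch_combinations(n, k):
    """Unordered selections of k items from n, as an exact integer."""
    if n < 0 or k < 0 or k > n:
        raise IndexError("need 0 <= k <= n, got n=%s, k=%s" % (n, k))
    k = min(k, n - k)  # C(n, k) == C(n, n-k); a smaller k means fewer steps
    result = 1
    for i in range(1, k + 1):
        # after step i, result == C(n-k+i, i), so the division is always exact
        result = result * (n - k + i) // i
    return result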
class SequenceGeneratorTests(TestCase):
"""Tests of SequenceGenerator, which fills in degenerate bases"""
def setUp(self):
"""Defines a few standard generators"""
self.rna_codons = SequenceGenerator('NNN')
self.dna_iupac_small = SequenceGenerator('RH', IUPAC_DNA)
self.empty = SequenceGenerator('')
self.huge = SequenceGenerator('N'*50)
self.binary = SequenceGenerator('01??01', {'0':'0','1':'1','?':'10'})
def test_len(self):
"""len(SequenceGenerator) should return number of possible matches"""
lengths = ((self.rna_codons, 64), (self.dna_iupac_small, 6),
(self.empty, 0), (self.binary, 4))
for item, expected in lengths:
self.assertEqual(len(item), expected)
try:
len(self.huge)
except OverflowError:
pass
else:
raise AssertionError, "Failed to raise expected OverflowError"
def test_numPossibilities(self):
"""SequenceGenerator.numPossibilities() should be robust to overflow"""
lengths = ((self.rna_codons, 64), (self.dna_iupac_small, 6),
(self.empty, 0), (self.binary, 4), (self.huge, 4**50))
for item, expected in lengths:
self.assertEqual(item.numPossibilities(), expected)
def test_sequences(self):
"""SequenceGenerator should produce the correct list of sequences"""
self.assertEqual(list(self.empty), [])
self.assertEqual(list(self.dna_iupac_small), \
['AT','AC','AA','GT','GC','GA'])
codons = []
for first in 'UCAG':
for second in 'UCAG':
for third in 'UCAG':
codons.append(''.join([first, second, third]))
self.assertEqual(list(self.rna_codons), codons)
#test that it still works if we call the generator a second time
self.assertEqual(list(self.rna_codons), codons)
def test_iter(self):
"""SequenceGenerator should act like a list with for..in syntax"""
as_list = list(self.rna_codons)
for obs, exp in zip(self.rna_codons, as_list):
self.assertEqual(obs, exp)
def test_getitem(self):
"""SequenceGenerator should allow __getitem__ like a list"""
as_list = list(self.rna_codons)
for i in range(64):
self.assertEqual(self.rna_codons[i], as_list[i])
for i in range(1,65):
self.assertEqual(self.rna_codons[-i], as_list[-i])
self.assertEqual(self.huge[-1], 'G'*50)
def test_getitem_slices(self):
"""SequenceGenerator slicing should work the same as a list"""
e = list(self.rna_codons)
o = self.rna_codons
values = (
(o[:], e[:]),
(o[0:], e[0:]),
(o[1:], e[1:]),
(o[5:], e[5:]),
(o[0:5], e[0:5]),
(o[1:5], e[1:5]),
(o[5:5], e[5:5]),
(o[0:1], e[0:1]),
(o[len(o)-1:len(o)], e[len(e)-1:len(e)]),
(o[len(o):len(o)], e[len(e):len(e)]),
)
testnum = 0
for obs, exp in values:
testnum += 1
self.assertEqual(list(obs), exp)
big = list(self.huge[1:5])
self.assertEqual(['U'*49+'C', 'U'*49+'A', 'U'*49+'G', 'U'*48+'CU'], big)
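# A sketch of how a degenerate template such as 'NNN' or 'RH' can be iterated
# and indexed like a list without building every sequence: item i is found by
# reading i as a mixed-radix number whose last digit varies fastest. This
# reproduces the orderings in the tests above (e.g. 'RH' -> AT, AC, AA, GT, GC,
# GA) *assuming* R expands to 'AG', H to 'TCA' and N to 'UCAG' in that order.
# SketchDegenerateExpander is a hypothetical class, not PyCogent's
# SequenceGenerator, and slicing (which the tests also cover) is omitted.
from itertools import product


class SketchDegenerateExpander(object):
    """List-like view of every sequence matching a degenerate template."""

    def __init__(self, template, mapping):
        # choices[i] lists the bases allowed at position i, in output order
        self.choices = [mapping[symbol] for symbol in template]

    # No __len__: CPython's len() raises OverflowError for counts above
    # sys.maxsize (4**50 is one), so callers use num_possibilities() instead.
    def num_possibilities(self):
        if not self.choices:
            return 0  # mirror the tests: an empty template yields nothing
        total = 1
        for options in self.choices:
            total *= len(options)
        return total

    def __iter__(self):
        if not self.choices:
            return iter([])
        return ("".join(bases) for bases in product(*self.choices))

    def __getitem__(self, index):
        total = self.num_possibilities()
        if index < 0:
            index += total  # negative indices count from the end, as for lists
        if not 0 <= index < total:
            raise IndexError("index out of range")
        bases = []
        # peel off mixed-radix digits, starting at the fastest-varying position
        for options in reversed(self.choices):
            index, digit = divmod(index, len(options))
            bases.append(options[digit])
        return "".join(reversed(bases))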
class PartitionTests(TestCase):
"""Tests of the Partition object."""
def test_single_partition(self):
"""If number of objects = bins * min, only one way to partition"""
for num_bins in range(1, 10):
for occupancy in range(10):
self.assertEqual(len(Partition(num_bins*occupancy,
num_bins, occupancy)), 1)
def test_partitions(self):
"""Test several properties of partitions, especially start/end"""
for num_bins in range(1, 5):
for occupancy in range(5):
for num_items in \
range(num_bins*occupancy, num_bins*occupancy + 10):
p = Partition(num_items, num_bins, occupancy)
l = [i for i in p]
l2 = [i for i in p]
#check that calling it twice doesn't break it
self.assertEqual(l, l2)
#check the lengths
self.assertEqual(len(p), len(l))
#check the ranges are the same...
self.assertEqual(l[0][1:], l[-1][0:-1])
#and that they contain the right values.
self.assertEqual(l[0][1:], [occupancy]*(num_bins - 1))
#check the first and last elements
self.assertEqual(l[0][0], l[-1][-1])
self.assertEqual(l[0][0], \
num_items - occupancy * (num_bins - 1))
def test_values(self):
"""Partition should match precalculated values"""
self.assertEqual(len(Partition(20, 4, 1)), 969)
def test_str(self):
"""str(partition) should work as expected"""
p = Partition(20,4,1)
self.assertEqual(str(p), "Items: 20 Pieces: 4 Min Per Piece: 1")
p.NumItems = 13
p.NumPieces = 2
p.MinOccupancy = 0
self.assertEqual(len(p), len(Partition(13, 2, 0)))
self.assertEqual(str(p), "Items: 13 Pieces: 2 Min Per Piece: 0")
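# A minimal sketch of the counting behind the Partition tests, assuming a
# partition places num_items identical items into num_pieces ordered bins with
# at least min_occupancy in each. Reserving the minimum leaves
# free = num_items - num_pieces*min_occupancy items to distribute, and by
# "stars and bars" there are C(free + num_pieces - 1, num_pieces - 1) ways,
# e.g. C(19, 3) = 969 for Partition(20, 4, 1). Judging from test_lengths,
# Composition(spacing, ...) appears to partition 100/spacing items, which
# fits Composition(5, 0, "ACGU") having C(23, 3) = 1771 elements.
# sketch_count_partitions and sketch_partitions are hypothetical helpers, not
# PyCogent's Partition class; the generator yields the largest first bin first,
# matching the first/last elements checked in test_partitions.
def sketch_count_partitions(num_items, num_pieces, min_occupancy):
    """Stars-and-bars count of ordered partitions with a per-bin minimum."""
    free = num_items - num_pieces * min_occupancy
    if num_pieces < 1 or free < 0:
        return 0
    k = num_pieces - 1
    result = 1
    for i in range(1, k + 1):
        # after step i, result == C(free + i, i); the final value is C(free + k, k)
        result = result * (free + i) // i
    return result


def sketch_partitions(num_items, num_pieces, min_occupancy):
    """Yield each partition as a list of bin sizes, largest first bin first."""
    if num_pieces == 1:
        if num_items >= min_occupancy:
            yield [num_items]
        return
    # leave enough items for every remaining bin to reach its minimum
    largest = num_items - min_occupancy * (num_pieces - 1)
    for first in range(largest, min_occupancy - 1, -1):
        for rest in sketch_partitions(num_items - first, num_pieces - 1,
                                      min_occupancy):
            yield [first] + rest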
class CompositionTests(TestCase):
"""Tests of the Composition class."""
def setUp(self):
"""Define a few standard compositions."""
self.bases_10pct = Composition(10, 0, "ACGU")
self.bases_5pct = Composition(5, 1, "ACGU")
self.bases_extra = Composition(10, 0, "CYGEJ")
self.small = Composition(20, 0, "xy")
self.unique = Composition(20, 1, "z")
def test_lengths(self):
"""Composition should return correct number of elements"""
self.assertEqual(len(self.bases_10pct), len(Partition(10,4,0)))
self.assertEqual(len(self.bases_5pct), len(Partition(20,4,1)))
self.assertEqual(len(self.bases_extra), len(Partition(10,5,0)))
self.assertEqual(len(self.small), len(Partition(5, 2, 0)))
self.assertEqual(len(self.unique), len(Partition(5, 1, 1)))
def test_known_vals(self):
"""Composition should return precalculated elements for known cases"""
self.assertEqual(len(Composition(5,1,"ACGU")), 969)
self.assertEqual(len(Composition(5,0,"ACGU")), 1771)
as_list = list(Composition(5,1,"ACGU"))
self.assertEqual(as_list[0], Freqs('A'*17+'CGU'))
self.assertEqual(as_list[-1], Freqs('U'*17+'ACG'))
def test_updating(self):
"""Composition updates should reset frequencies correctly."""
exp_list = list(Composition(5, 1, "GCAUN"))
self.bases_10pct.Spacing = 5
self.bases_10pct.Alphabet = "GCAUN"
self.bases_10pct.MinOccupancy = 1
self.assertEqual(list(self.bases_10pct), exp_list)
class MageFrequenciesTests(TestCase):
"""Tests of the MageFrequencies class -- presentation for Composition."""
def setUp(self):
"""Define a few standard compositions."""
self.bases_10pct = Composition(10, 0, "ACGU")
def test_str(self):
"""MageFrequencies string conversions work correctly"""
obs_list = list(self.bases_10pct)
self.assertEqual(str(MageFrequencies(obs_list[0])), '1.0 0.0 0.0')
self.assertEqual(str(MageFrequencies(obs_list[-1], "last")), \
'{last} 0.0 0.0 0.0')
self.assertEqual(str(MageFrequencies({'C':2, 'A':3, 'T':5, 'x':17}, \
'bases')), '{bases} 0.3 0.2 0.0')
class SequenceHandleTests(TestCase):
"""Tests of the SequenceHandle class."""
def setUp(self):
"""Define some standard SequenceHandles."""
self.rna = SequenceHandle('uuca', 'ucag')
self.any = SequenceHandle(['u', 1, None])
self.empty = SequenceHandle()
def test_init_good(self):
"""SequenceHandle should init OK without alphabet"""
self.assertEqual(SequenceHandle('abc123'), list('abc123'))
self.assertEqual(SequenceHandle(), list())
self.assertEqual(SequenceHandle('abcaaa', 'abcd'), list('abcaaa'))
self.assertEqual(SequenceHandle([1,2,3]), [1,2,3])
def test_init_bad(self):
"""SequenceHandle should raise ValueError if item not in alphabet"""
self.assertRaises(ValueError, SequenceHandle, 'abc1', 'abc')
self.assertRaises(ValueError, SequenceHandle, '1', [1])
def test_setitem_good(self):
"""SequenceHandle setitem should allow items in alphabet"""
self.rna[0] = 'c'
self.assertEqual(self.rna, list('cuca'))
self.rna[-1] = 'u'
self.assertEqual(self.rna, list('cucu'))
self.any[1] = [1, 2, 3]
self.assertEqual(self.any, ['u', [1, 2, 3], None])
def test_setitem_bad(self):
"""SequenceHandle setitem should reject items not in alphabet"""
self.assertRaises(ValueError, self.rna.__setitem__, 0, 'x')
def test_setslice_good(self):
"""SequenceHandle setslice should allow same-length slice"""
self.rna[:] = list('aaaa')
self.assertEqual(self.rna, list('aaaa'))
self.rna[0:1] = ['u']
self.assertEqual(self.rna, list('uaaa'))
self.rna[-2:] = ['g','g']
self.assertEqual(self.rna, list('uagg'))
def test_setslice_bad(self):
"""SequenceHandle setslice should reject bad items or length change"""
self.assertRaises(ValueError, self.rna.__setslice__, 0, len(self.rna), \
['a']*5)
self.assertRaises(ValueError, self.any.__setslice__, 0, len(self.any), \
['a']*5)
self.assertRaises(ValueError, self.rna.__setslice__, 0, 1, ['x'])
def test_string(self):
"""SequenceHandle str should join items without spaces"""
#use ''.join if items are strings
self.assertEqual(str(self.rna), 'uuca')
self.assertEqual(str(self.empty), '')
#if some of the items raise errors, use built-in method instead
self.assertEqual(str(self.any), str(['u', 1, None]))
def test_naughty_methods(self):
"""SequenceHandle list mutators should raise NotImplementedError"""
r = self.rna
naughty = [r.__delitem__, r.__delslice__, r.__iadd__, r.__imul__, \
r.append, r.extend, r.insert, r.pop, r.remove]
for n in naughty:
self.assertRaises(NotImplementedError, n)
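# A sketch of a fixed-length, alphabet-checked list in the spirit of the
# SequenceHandle tests: item and slice assignment are validated, and methods
# that would change the length raise NotImplementedError. SketchFixedList is a
# hypothetical class, not PyCogent's SequenceHandle. Python 3 routes slice
# assignment through __setitem__; the __setslice__/__delslice__ hooks are only
# consulted by Python 2 lists, and clear() only exists on Python 3 lists.
class SketchFixedList(list):
    """A list whose length is fixed and whose items must be in alphabet."""

    def __init__(self, data="", alphabet=None):
        self.alphabet = alphabet
        items = list(data)
        self._check(items)
        list.__init__(self, items)

    def _check(self, items):
        if self.alphabet is not None:
            for item in items:
                if item not in self.alphabet:
                    raise ValueError("%r is not in the alphabet" % (item,))

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            new_items = list(value)
            old_length = len(range(*key.indices(len(self))))
            if len(new_items) != old_length:
                raise ValueError("slice assignment may not change the length")
            self._check(new_items)
            list.__setitem__(self, key, new_items)
        else:
            self._check([value])
            list.__setitem__(self, key, value)

    def __setslice__(self, start, stop, value):  # Python 2 only
        self.__setitem__(slice(start, stop), value)

    def __str__(self):
        try:
            return "".join(self)
        except TypeError:  # some items are not strings
            return str(list(self))

    def _refuse(self, *args, **kwargs):
        raise NotImplementedError("length-changing operations are disabled")

    __delitem__ = __delslice__ = __iadd__ = __imul__ = _refuse
    append = extend = insert = pop = remove = clear = _refuse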
class BaseFrequencyTests(TestCase):
"""Tests of BaseFrequency class: wrapper for FrequencyDistribution."""
def test_init(self):
"""BaseFrequency should init as expected"""
self.assertEqual(BaseFrequency('UUUCCCCAG'), \
Freqs('UUUCCCCAG', 'UCAG'))
self.assertEqual(BaseFrequency('TTTCAGG', RNA=False), \
Freqs('TTTCAGG'))
def test_init_bad(self):
"""BaseFrequency init should disallow bad characters"""
self.assertRaises(Exception, BaseFrequency, 'TTTCAGG')
self.assertRaises(Exception, BaseFrequency, 'UACGUA', False)
class PairFrequencyTests(TestCase):
"""Tests of PairFrequency class: wrapper for Freqs."""
def test_init_one_parameter(self):
"""PairFrequency should interpret single parameter as pair probs"""
obs = PairFrequency('UCCC')
exp = Freqs({('U','U'):0.0625, ('U','C'):0.1875,
('C','U'):0.1875, ('C','C'):0.5625})
for k, v in exp.items():
self.assertEqual(v, obs[k])
for k, v in obs.items():
if k not in exp:
self.assertEqual(v, 0)
self.assertEqual(PairFrequency('UCCC', [('U','U'),('C','C')]), \
Freqs({('U','U'):0.1, ('C','C'):0.9}))
#check that the alphabets are right: should not raise error on
#incrementing characters already there, but should raise KeyError
#on anything that's missing.
p = PairFrequency('UCCC')
p[('U','U')] += 1
try:
p[('X','U')] += 1
except KeyError:
pass
else:
raise AssertionError, "Expected KeyError."
p = PairFrequency('UCCC', (('C','C'),))
p[('C','C')] += 1
try:
p[('U','U')] += 1
except KeyError:
pass
else:
raise AssertionError, "Expected KeyError."
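# A sketch of the arithmetic the PairFrequency tests rely on: with base
# frequencies p, an unconstrained pair (x, y) has probability p[x]*p[y], and
# restricting to an allowed set of pairs renormalizes over that set. For
# 'UCCC' (p_U = 1/4, p_C = 3/4) this gives UU 1/16 and CC 9/16, which become
# 0.1 and 0.9 once only UU and CC are allowed; the same arithmetic gives the
# 1/14, 4/14 and 2/14 expectations in PairedRegionTests for 'UCCGGA' with GU
# pairs allowed. sketch_pair_probs is a hypothetical helper that uses
# fractions.Fraction for exact values; it is not PyCogent's PairFrequency.
from fractions import Fraction


def sketch_pair_probs(seq, allowed_pairs=None):
    """Return {(x, y): probability} built from the base composition of seq."""
    counts = {}
    for base in seq:
        counts[base] = counts.get(base, 0) + 1
    total = len(seq)
    freqs = dict((base, Fraction(n, total)) for base, n in counts.items())
    if allowed_pairs is None:
        # every ordered pair of observed bases is allowed
        allowed_pairs = [(x, y) for x in freqs for y in freqs]
    raw = dict((pair, freqs.get(pair[0], 0) * freqs.get(pair[1], 0))
               for pair in allowed_pairs)
    norm = sum(raw.values())  # 1 when unconstrained, < 1 after restriction
    return dict((pair, p / norm) for pair, p in raw.items())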
class BasePairFrequencyTests(TestCase):
"""Tests of the BasePairFrequency class, constructed for easy initialization."""
def test_init(self):
"""BasePairFrequency init should provide correct PairFrequency"""
WatsonCrick = [('A','U'), ('U','A'),('G','C'),('C','G')]
Wobble = WatsonCrick + [('G','U'), ('U','G')]
#by default, basepair should have the wobble alphabet
bpf = BasePairFrequency('UUACG')
pf = PairFrequency('UUACG', Wobble)
self.assertEqual(bpf, pf)
self.assertEqual(bpf.Constraint, pf.Constraint)
#can turn GU off, leaving only Watson-Crick pairs
bpf = BasePairFrequency('UUACG', False)
#make sure this gives different results...
self.assertNotEqual(bpf, pf)
self.assertNotEqual(bpf.Constraint, pf.Constraint)
#...but that the results are the same when the correct alphabet is used
pf = PairFrequency('UUACG', WatsonCrick)
self.assertEqual(bpf, pf)
self.assertEqual(bpf.Constraint, pf.Constraint)
class RegionModelTests(TestCase):
"""Tests of the RegionModel class. Base class just returns the template."""
def test_init(self):
"""RegionModel base class should always return current template."""
#test blank region model
r = RegionModel()
self.assertEqual(str(r.Current), '')
self.assertEqual(len(r), 0)
#now assign it to a template
r.Template = ('ACGUUCGA')
self.assertEqual(str(r.Current), 'ACGUUCGA')
self.assertEqual(len(r), len('ACGUUCGA'))
#check that refresh doesn't break anything
r.refresh()
self.assertEqual(str(r.Current), 'ACGUUCGA')
self.assertEqual(len(r), len('ACGUUCGA'))
#check composition
self.assertEqual(r.Composition, None)
d = {'A':3, 'U':10}
r.Composition = Freqs(d)
self.assertEqual(r.Composition, d)
#check that composition doesn't break the update
r.refresh()
self.assertEqual(str(r.Current), 'ACGUUCGA')
self.assertEqual(len(r), len('ACGUUCGA'))
class ConstantRegionTests(TestCase):
"""Tests of the ConstantRegion class. Just returns the template."""
def test_init(self):
"""ConstantRegion should always return current template."""
#test blank region model
r = ConstantRegion()
self.assertEqual(str(r.Current), '')
self.assertEqual(len(r), 0)
#now assign it to a template
r.Template = ('ACGUUCGA')
self.assertEqual(str(r.Current), 'ACGUUCGA')
self.assertEqual(len(r), len('ACGUUCGA'))
#check that refresh doesn't break anything
r.refresh()
self.assertEqual(str(r.Current), 'ACGUUCGA')
self.assertEqual(len(r), len('ACGUUCGA'))
#check composition
self.assertEqual(r.Composition, None)
d = {'A':3, 'U':10}
r.Composition = Freqs(d)
self.assertEqual(r.Composition, d)
#check that composition doesn't break the update
r.refresh()
self.assertEqual(str(r.Current), 'ACGUUCGA')
self.assertEqual(len(r), len('ACGUUCGA'))
class UnpairedRegionTests(TestCase):
"""Tests of unpaired region: should fill in w/ single-base frequencies."""
def test_init(self):
"""Unpaired region should generate right freqs, even after change"""
freqs = Freqs({'C':10,'U':1, 'A':0})
r = UnpairedRegion('NN', freqs)
seq = r.Current
assert seq[0] in 'CU'
assert seq[1] in 'CU'
self.assertEqual(len(seq), 2)
fd = []
for i in range(1000):
r.refresh()
fd.append(str(seq))
fd = Freqs(''.join(fd))
observed = [fd['C'], fd['U']]
expected = [1800, 200]
self.assertSimilarFreqs(observed, expected)
self.assertEqual(fd['U'] + fd['C'], 2000)
freqs2 = Freqs({'A':5, 'U':5})
r.Composition = freqs2
r.Template = 'NNN' #note that changing the Template changes seq ref
seq = r.Current
self.assertEqual(len(seq), 3)
assert seq[0] in 'AU'
assert seq[1] in 'AU'
assert seq[2] in 'AU'
fd = []
for i in range(1000):
r.refresh()
fd.append(str(seq))
fd = Freqs(''.join(fd))
observed = [fd['A'], fd['U']]
expected = [1500, 1500]
self.assertSimilarFreqs(observed, expected)
self.assertEqual(fd['A'] + fd['U'], 3000)
class ShuffledRegionTests(TestCase):
"""Shuffled region should randomize string"""
def test_init(self):
"""Shuffled region should init ok with string, ignoring base freqs"""
#general strategy: seqs should be different, but sorted seqs should
#be the same
empty = ''
seq = 'UUUCCCCAAAGGG'
#check that we don't get errors on empty template
r = ShuffledRegion(empty)
r.refresh()
self.assertEqual(str(r.Current), '')
#check that changing the template changes the sequence
r.Template = seq
self.assertNotEqual(str(r.Current), '')
#check that it shuffled the sequence the first time
self.assertNotEqual(str(r.Current), seq)
curr = str(r.Current)
as_list = list(curr)
#check that we have the right number of each type of base
as_list.sort()
exp_as_list = list(seq)
exp_as_list.sort()
self.assertEqual(as_list, exp_as_list)
#check that we get something different if we refresh again
r.refresh()
self.assertNotEqual(str(r.Current), curr)
as_list = list(str(r.Current))
as_list.sort()
self.assertEqual(as_list, exp_as_list)
class PairedRegionTests(TestCase):
"""Tests of paired region generation."""
def test_init(self):
"""Paired region init and mutation should give expected results"""
WatsonCrick = {'A':'U', 'U':'A', 'C':'G', 'G':'C'}
Wobble = {'A':'U', 'U':'AG', 'C':'G', 'G':'UC'}
#check that empty init doesn't give errors
r = PairedRegion()
r.refresh()
#check that mutation works correctly
r.Template = "N"
self.assertEqual(len(r), 1)
r.monomers('UCCGGA')
upstream = r.Current[0]
downstream = r.Current[1]
states = {}
num_to_do = 10000
for i in range(num_to_do):
r.refresh()
curr = (upstream[0], downstream[0])
assert upstream[0] in Wobble[downstream[0]]
states[curr] = states.get(curr, 0) + 1
for i in states.keys():
assert i[1] in Wobble[i[0]]
for i in Wobble:
for j in Wobble[i]:
assert (i, j) in states.keys()
expected_dict = {('A','U'):num_to_do/14, ('U','A'):num_to_do/14,
('C','G'):num_to_do/14*4, ('G','C'):num_to_do/14*4,
('U','G'):num_to_do/14*2, ('G','U'):num_to_do/14*2,}
# the following for loop was replaced with the assertSimilarFreqs
# call below it
#for key, val in expected.items():
#self.assertFloatEqualAbs(val, states[key], 130) #conservative?
expected = [val for key, val in expected_dict.items()]
observed = [states[key] for key, val in expected_dict.items()]
self.assertSimilarFreqs(observed, expected)
assert ('G','U') in states
assert ('U','G') in states
r.monomers('UCGA', GU=False)
upstream = r.Current[0]
downstream = r.Current[1]
states = {}
num_to_do = 10000
for i in range(num_to_do):
r.refresh()
curr = (upstream[0], downstream[0])
assert upstream[0] in WatsonCrick[downstream[0]]
states[curr] = states.get(curr, 0) + 1
for i in states.keys():
assert i[1] in WatsonCrick[i[0]]
for i in WatsonCrick:
for j in WatsonCrick[i]:
assert (i, j) in states.keys()
expected_dict = {('A','U'):num_to_do/4, ('U','A'):num_to_do/4,
('C','G'):num_to_do/4, ('G','C'):num_to_do/4,}
expected = [val for key, val in expected_dict.items()]
observed = [states[key] for key, val in expected_dict.items()]
self.assertSimilarFreqs(observed, expected)
#for key, val in expected.items():
# self.assertFloatEqualAbs(val, states[key], 130) #3 std devs
assert ('G','U') not in states
assert ('U','G') not in states
class SequenceModelTests(TestCase):
"""Tests of the SequenceModel class."""
def test_init(self):
"""SequenceModel should init OK with Isoleucine motif."""
helices = [PairedRegion('NNN'), PairedRegion('NNNNN')]
constants = [ConstantRegion('CUAC'), ConstantRegion('UAUUGGGG')]
order = "H0 C0 H1 - H1 C1 H0"
isoleucine = SequenceModel(order=order, constants=constants, \
helices=helices)
isoleucine.Composition = BaseFrequency('UCAG')
for i in range(10):
isoleucine.refresh()
isoleucine.Composition = BaseFrequency('UCAG')
isoleucine.GU = False
for i in range(10):
isoleucine.refresh()
class RuleTests(TestCase):
"""Tests of the Rule class"""
def test_init_bad_params(self):
"""Rule should fail validation except with exactly 5 parameters"""
self.assertRaises(TypeError, Rule, 1, 1, 1, 1)
self.assertRaises(TypeError, Rule, 1, 1, 1, 1, 1, 1)
def test_init_bad_length(self):
"""Rule should fail validation if helix extends past downstream start"""
self.assertRaises(ValueError, Rule, 0, 0, 1, 0, 2)
self.assertRaises(ValueError, Rule, 0, 0, 10, 10, 12)
def test_init_bad_negative_params(self):
"""Rule should fail validation if any parameters are negative"""
self.assertRaises(ValueError, Rule, -1, 0, 1, 0, 1)
self.assertRaises(ValueError, Rule, 0, -1, 1, 1, 1)
self.assertRaises(ValueError, Rule, 0, 0, -1, 0, 5)
self.assertRaises(ValueError, Rule, 0, 0, 0, -1, 1)
self.assertRaises(ValueError, Rule, 0, 0, 1, 1, -1)
def test_init_bad_zero_length(self):
"""Rule should fail validation if length is zero"""
self.assertRaises(ValueError, Rule, 0, 0, 1, 1, 0)
def test_init_overlap(self):
"""Rule should fail validation if bases must pair with themselves"""
self.assertRaises(ValueError, Rule, 0, 0, 0, 0, 1)
self.assertRaises(ValueError, Rule, 0, 10, 0, 15, 4)
def test_init_wrong_order(self):
"""First sequence must have lower index"""
self.assertRaises(ValueError, Rule, 1, 0, 0, 5, 3)
def test_init_ok_length(self):
"""Rule should init OK if helix extends to exactly downstream start"""
x = Rule(0, 0, 1, 0, 1)
self.assertEqual(str(x), \
"Up Seq: 0 Up Pos: 0 Down Seq: 1 Down Pos: 0 Length: 1")
#check adjacent bases
x = Rule(0, 0, 0, 1, 1)
self.assertEqual(str(x), \
"Up Seq: 0 Up Pos: 0 Down Seq: 0 Down Pos: 1 Length: 1")
#check rule that would cause overlap if motifs weren't different
x = Rule(1, 10, 2, 8, 7)
self.assertEqual(str(x), \
"Up Seq: 1 Up Pos: 10 Down Seq: 2 Down Pos: 8 Length: 7")
def test_str(self):
"""Rule str method should give expected results"""
x = Rule(1, 10, 2, 8, 7)
self.assertEqual(str(x), \
"Up Seq: 1 Up Pos: 10 Down Seq: 2 Down Pos: 8 Length: 7")
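# A sketch of the validation that RuleTests exercises, reconstructed from the
# test cases rather than from PyCogent's source. It assumes a Rule pairs Length
# bases running 5'->3' from (UpstreamSequence, UpstreamPosition) with bases
# running 3'->5' from (DownstreamSequence, DownstreamPosition), so the
# downstream strand must not run past position 0 and, within one sequence, the
# two strands must not overlap. sketch_validate_rule is hypothetical; the
# TypeError cases for a wrong argument count come from Python itself.
def sketch_validate_rule(up_seq, up_pos, down_seq, down_pos, length):
    """Raise ValueError if a helix rule is impossible; return its description."""
    if min(up_seq, up_pos, down_seq, down_pos, length) < 0:
        raise ValueError("parameters must be nonnegative")
    if length == 0:
        raise ValueError("helix length must be positive")
    if up_seq > down_seq:
        raise ValueError("upstream sequence must have the lower index")
    # the downstream strand is read backwards from down_pos
    if down_pos - length + 1 < 0:
        raise ValueError("helix extends past the start of the downstream sequence")
    # in a single sequence the upstream strand must end before the downstream starts
    if up_seq == down_seq and up_pos + length > down_pos - length + 1:
        raise ValueError("bases would pair with themselves or overlap")
    return ("Up Seq: %s Up Pos: %s Down Seq: %s Down Pos: %s Length: %s"
            % (up_seq, up_pos, down_seq, down_pos, length))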
class RuleTests_compatibility(TestCase):
"""Tests to see whether the Rule compatibility code works"""
def setUp(self):
"""Sets up some standard rules"""
self.x = Rule(1, 5, 2, 10, 3)
self.x_ok = Rule(1, 8, 2, 14, 4)
self.x_ok_diff_sequences = Rule(3, 5, 5, 10, 3)
self.x_bad_first = Rule(1, 0, 3, 10, 10)
self.x_bad_first_2 = Rule(0, 0, 1, 8, 2)
self.x_bad_second = Rule(1, 15, 2, 15, 8)
self.x_bad_second_2 = Rule(1, 14, 2, 8, 4)
def test_is_compatible_ok(self):
"""Rule.isCompatible should return True if rules don't overlap"""
self.assertEqual(self.x.isCompatible(self.x_ok), True)
self.assertEqual(self.x.isCompatible(self.x_ok_diff_sequences), True)
#check that it's symmetric
self.assertEqual(self.x_ok.isCompatible(self.x), True)
self.assertEqual(self.x_ok_diff_sequences.isCompatible(self.x), True)
def test_is_compatible_bad(self):
"""Rule.isCompatible should return False if rules overlap"""
tests = [ (self.x, self.x_bad_first),
(self.x, self.x_bad_first_2),
(self.x, self.x_bad_second),
(self.x, self.x_bad_second_2),
]
for first, second in tests:
self.assertEqual(first.isCompatible(second), False)
#check that it's symmetric
self.assertEqual(second.isCompatible(first), False)
def test_fits_in_sequence(self):
"""Rule.fitsInSequence should return True if sequence long enough"""
sequences = map('x'.__mul__, range(21)) #0 to 20 copies of 'x'
rules = [self.x, self.x_ok, self.x_ok_diff_sequences, self.x_bad_first,
self.x_bad_first_2, self.x_bad_second, self.x_bad_second_2]
#test a bunch of values for all the rules we have handy
for s in sequences:
for r in rules:
if r.UpstreamPosition + r.Length > len(s):
self.assertEqual(r.fitsInSequence(s), False)
else:
self.assertEqual(r.fitsInSequence(s), True)
#test a couple of specific boundary cases
#length-1 helix
r = Rule(0, 0, 1, 0, 1)
self.assertEqual(r.fitsInSequence(''), False)
self.assertEqual(r.fitsInSequence('x'), True)
self.assertEqual(r.fitsInSequence('xx'), True)
#length-2 helix starting one base from the start
r = Rule(1, 1, 2, 2, 2)
self.assertEqual(r.fitsInSequence(''), False)
self.assertEqual(r.fitsInSequence('x'), False)
self.assertEqual(r.fitsInSequence('xx'), False)
self.assertEqual(r.fitsInSequence('xxx'), True)
self.assertEqual(r.fitsInSequence('xxxx'), True)
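# A sketch of the compatibility and fit checks, again inferred from the tests
# rather than from PyCogent's source: a rule occupies Length bases on each
# strand, and two rules are treated as compatible exactly when they claim no
# base in common, which makes the relation symmetric as the tests check. The
# fit check mirrors the loop in test_fits_in_sequence by looking only at the
# upstream strand. sketch_rule_bases, sketch_is_compatible and
# sketch_fits_in_sequence are hypothetical helpers that take plain
# (up_seq, up_pos, down_seq, down_pos, length) tuples instead of Rule objects.
def sketch_rule_bases(rule):
    """Return the set of (sequence index, position) pairs a rule uses."""
    up_seq, up_pos, down_seq, down_pos, length = rule
    bases = set()
    for offset in range(length):
        bases.add((up_seq, up_pos + offset))      # upstream strand runs 5'->3'
        bases.add((down_seq, down_pos - offset))  # downstream strand runs 3'->5'
    return bases


def sketch_is_compatible(rule, other):
    """True if the two rules share no base."""
    return not (sketch_rule_bases(rule) & sketch_rule_bases(other))


def sketch_fits_in_sequence(rule, seq):
    """True if the upstream half of the helix fits inside seq."""
    up_pos, length = rule[1], rule[4]
    return up_pos + length <= len(seq)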
class ModuleTests(TestCase):
"""Tests of the Module class, which holds sequences and structures."""
def test_init_bad(self):
"""Module init should fail if seq/struct missing, or mismatched lengths"""
#test incorrect param number
self.assertRaises(TypeError, Module, 'abc')
self.assertRaises(TypeError, Module, 'abc', 'def', 'ghi')
#test incorrect lengths
self.assertRaises(ValueError, Module, 'abc', 'abcd')
self.assertRaises(ValueError, Module, 'abcd', 'acb')
def test_init_good(self):
"""Module init should work if seq and struct same length"""
m = Module('U', '.')
self.assertEqual(m.Sequence, 'U')
self.assertEqual(m.Structure, '.')
m.Sequence = ''
m.Structure = ''
self.assertEqual(m.Sequence, '')
self.assertEqual(m.Structure, '')
m.Sequence = 'CCUAGG'
m.Structure = '((..))'
self.assertEqual(m.Sequence, 'CCUAGG')
self.assertEqual(m.Structure, '((..))')
m.Structure = ''
self.assertRaises(ValueError, m.__len__)
def test_len(self):
"""Module len should work if seq and struct same length"""
m = Module('CUAG', '....')
self.assertEqual(len(m), 4)
m = Module('', '')
self.assertEqual(len(m), 0)
m.Sequence = 'AUCGAUCGA'
self.assertRaises(ValueError, m.__len__)
def test_str(self):
"""Module str should contain sequence and structure"""
m = Module('CUAG', '....')
self.assertEqual(str(m), 'Sequence: CUAG\nStructure: ....')
m = Module('', '')
self.assertEqual(str(m), 'Sequence: \nStructure: ')
def test_matches(self):
"""Module matches should return correct result for seq/struct match"""
empty = Module('', '')
short_p = Module('AC', '((')
short_u = Module('UU', '..')
short_up = Module('UU', '((')
long_all = Module('GGGACGGUUGGUUGGUU', ')))((..((....((((') #struct+seq
long_seq = Module('GGGACGGUUGGUU', ')))))))))))))') #seq but not struct
long_struct = Module('GGGGGGGGGGGGG', ')))((..((....') #struct, not seq
long_none = Module('GGGGGGGGGGGGG', ')))))))))))))') #not struct or seq
#test overall matching
for matcher in [empty, short_p, short_u, short_up]:
self.assertEqual(matcher.matches(long_all), True)
for longer in [long_seq, long_struct, long_none]:
if matcher is empty:
self.assertEqual(matcher.matches(longer), True)
else:
self.assertEqual(matcher.matches(longer), False)
#test specific positions
positions = {3:short_p, 11:short_u, 7:short_up, 15:short_up}
for module in [short_p, short_u, short_up]:
for i in range(len(long_all)):
result = module.matches(long_all, i)
if positions.get(i, None) is module:
self.assertEqual(result, True)
else:
self.assertEqual(result, False)
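# A sketch of the matching logic that the Module tests above and the Motif
# tests that follow exercise, written with plain strings. It assumes a module
# matches at a position when both its sequence and its dot-bracket structure
# occur there (with no position given, anywhere), and that a motif also needs
# every rule's bases to be paired with each other in the structure. Exact
# character comparison (no 'N' wildcards) is an assumption, and
# sketch_pair_table, sketch_module_matches and sketch_motif_matches are
# hypothetical helpers, not PyCogent's Module/Motif API or ViennaStructure.
def sketch_pair_table(structure):
    """Map each paired index of a dot-bracket string to its partner."""
    partners, stack = {}, []
    for i, char in enumerate(structure):
        if char == "(":
            stack.append(i)
        elif char == ")":
            j = stack.pop()  # IndexError here means an unmatched ')'
            partners[i], partners[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return partners


def sketch_module_matches(mod_seq, mod_struct, seq, struct, pos=None):
    """True if (mod_seq, mod_struct) occurs in (seq, struct) at pos, or anywhere."""
    if len(mod_seq) != len(mod_struct):
        raise ValueError("module sequence and structure lengths differ")
    if pos is None:
        positions = range(len(seq) - len(mod_seq) + 1)
    else:
        positions = [pos]
    for p in positions:
        end = p + len(mod_seq)
        if seq[p:end] == mod_seq and struct[p:end] == mod_struct:
            return True
    return False


def sketch_motif_matches(modules, rules, seq, struct, positions):
    """modules: [(seq, struct)]; rules: 5-tuples; positions: start of each module."""
    for (mod_seq, mod_struct), pos in zip(modules, positions):
        if not sketch_module_matches(mod_seq, mod_struct, seq, struct, pos):
            return False
    partners = sketch_pair_table(struct)
    for up_seq, up_pos, down_seq, down_pos, length in rules:
        for offset in range(length):
            up = positions[up_seq] + up_pos + offset
            down = positions[down_seq] + down_pos - offset
            if partners.get(up) != down:
                return False
    return True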
class MotifTests(TestCase):
"""Tests of the Motif object, which has a set of Modules and Rules."""
def setUp(self):
"""Defines a few standard motifs"""
self.ile_mod_0 = Module('NNNCUACNNNNN', '(((((..(((((')
self.ile_mod_1 = Module('NNNNNUAUUGGGGNNN', ')))))......)))))')
self.ile_rule_0 = Rule(0, 0, 1, 15, 3)
self.ile_rule_1 = Rule(0, 7, 1, 4, 5)
self.ile = Motif([self.ile_mod_0, self.ile_mod_1], \
[self.ile_rule_0, self.ile_rule_1])
self.hh_mod_0 = Module('NNNNUNNNNN', '(((((.((((')
self.hh_mod_1 = Module('NNNNCUGANGAGNNN', ')))).......((((')
self.hh_mod_2 = Module('NNNCGAAANNNN', '))))...)))))')
self.hh_rule_0 = Rule(0, 0, 2, 11, 5)
self.hh_rule_1 = Rule(0, 6, 1, 3, 4)
self.hh_rule_2 = Rule(1, 11, 2, 3, 4)
self.hh = Motif([self.hh_mod_0, self.hh_mod_1, self.hh_mod_2], \
[self.hh_rule_0, self.hh_rule_1, self.hh_rule_2])
self.simple_0 = Module('CCCCC', '(((..')
self.simple_1 = Module('GGGGG', '..)))')
self.simple_r = Rule(0, 0, 1, 4, 3)
self.simple = Motif([self.simple_0, self.simple_1], [self.simple_r])
def test_init_bad_rule_lengths(self):
"""Motif init should fail if rules don't match module lengths"""
bad_rule = Rule(0, 0, 1, 8, 6)
self.assertRaises(ValueError, Motif, [self.simple_0, self.simple_1], \
[bad_rule])
def test_init_conflicting_rules(self):
"""Motif init should fail if rules overlap"""
interferer = Rule(0, 2, 2, 20, 4)
self.assertRaises(ValueError, Motif, [self.ile_mod_0, self.ile_mod_1, \
self.ile_mod_0], [self.ile_rule_0, interferer])
def test_matches_simple(self):
"""Test of simple match should work correctly"""
index = '01234567890123456789012345678901'
seq = 'AAACCCCCUUUGGGGGAAACCCCCUUUGGGGG'
struct = ViennaStructure('((..((..))....))...(((.......)))')
struct_2 = ViennaStructure('((((((..((())))))))).....(((.)))')
#same positions: the helix pairs in struct but not in struct_2
self.assertEqual(self.simple.matches(seq, struct, [19, 27]), True)
self.assertEqual(self.simple.matches(seq, struct_2, [19,27]), False)
for first_pos in range(len(seq) - len(self.simple_0) + 1):
for second_pos in range(len(seq) - len(self.simple_1) + 1):
#should match struct only at one location
match=self.simple.matches(seq, struct, [first_pos, second_pos])
if (first_pos == 19) and (second_pos == 27):
self.assertEqual(match, True)
else:
self.assertEqual(match, False)
#should never match in struct_2
self.assertEqual(self.simple.matches(seq, struct_2, \
[first_pos, second_pos]), False)
#check that it doesn't fail if there are _two_ matches
index = '01234567890123456789'
seq = 'CCCCCGGGGGCCCCCGGGGG'
struct = '(((....)))(((....)))'
struct = ViennaStructure(struct)
self.assertEqual(self.simple.matches(seq, struct, [0, 5]), True)
self.assertEqual(self.simple.matches(seq, struct, [10,15]), True)
#not allowed to cross-pair
self.assertEqual(self.simple.matches(seq, struct, [0, 15]), False)
def test_matches_ile(self):
"""Test of isoleucine match should work correctly"""
index = '012345678901234567890123456789012345'
seq_good = 'AAACCCCUACUUUUUCCCAAAAAUAUUGGGGGGGAA'
seq_bad = 'AAACCCCUACUUUUUCCCAAAAAUAUUGGGCGGGAA'
st_good = '...(((((..(((((...)))))......)))))..'
st_bad = '((((((((..(((((...)))))...))))))))..'
st_good = ViennaStructure(st_good)
st_bad = ViennaStructure(st_bad)
for first_pos in range(len(seq_good) - len(self.ile_mod_0) + 1):
for second_pos in range(len(seq_good) - len(self.ile_mod_1) + 1):
#seq_good and struct_good should match at one location
match=self.ile.matches(seq_good,st_good,[first_pos,second_pos])
if (first_pos == 3) and (second_pos == 18):
self.assertEqual(match, True)
else:
self.assertEqual(match, False)
self.assertEqual(self.ile.matches(seq_good, st_bad, \
[first_pos, second_pos]), False)
self.assertEqual(self.ile.matches(seq_bad, st_good, \
[first_pos, second_pos]), False)
self.assertEqual(self.ile.matches(seq_bad, st_bad, \
[first_pos, second_pos]), False)
def test_matches_hh(self):
"""Test of hammerhead match should work correctly"""
index = '0123456789012345678901234567890123456'
seq_good = 'CCCCUAGGGGCCCCCUGAAGAGAAAUUUCGAAAGGGG'
seq_bad ='CCCCCAGGGGCCCCCUGAAGAGAAAUUUCGAAGGGGG'
structure ='(((((.(((()))).......(((())))...)))))'
struct = ViennaStructure(structure)
self.assertEqual(self.hh.matches(seq_good, struct, [0, 10, 25]), True)
self.assertEqual(self.hh.matches(seq_bad, struct, [0, 10, 25]), False)
def test_structureMatches_hh(self):
"""Test of hammerhead structureMatches should work correctly"""
index = '0123456789012345678901234567890123456'
seq_good = 'CCCCUAGGGGCCCCCUGAAGAGAAAUUUCGAAAGGGG'
seq_bad ='CCCCCAGGGGCCCCCUGAAGAGAAAUUUCGAAGGGGG'
structure ='(((((.(((()))).......(((())))...)))))'
struct = ViennaStructure(structure)
self.assertEqual(self.hh.structureMatches(struct, [0, 10, 25]), True)
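# Sketch (hypothetical helpers, not PyCogent API): a structure-only check in
# the spirit of Motif.structureMatches. A dot-bracket string is turned into
# a pair table, and each rule is read, as an assumption inferred from the
# tests above, as (first_module, first_offset, second_module, second_offset,
# length): base first_offset + i of the first module must pair with base
# second_offset - i of the second module, for every i < length.
def sketch_pair_table(dot_bracket):
    """Map each paired index to its partner; raise ValueError if unbalanced."""
    stack, pairs = [], {}
    for i, char in enumerate(dot_bracket):
        if char == '(':
            stack.append(i)
        elif char == ')':
            if not stack:
                raise ValueError("unmatched ')' at %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unmatched '(' at %d" % stack[-1])
    return pairs

def sketch_rules_satisfied(dot_bracket, positions, rules):
    """True if every rule's bases pair as required, given module starts."""
    pairs = sketch_pair_table(dot_bracket)
    for first, first_offset, second, second_offset, length in rules:
        for i in range(length):
            left = positions[first] + first_offset + i
            right = positions[second] + second_offset - i
            if pairs.get(left) != right:
                return False
    return True
# With the simple motif's rule as (0, 0, 1, 4, 3), the structure
# '(((....)))(((....)))' satisfies it at [0, 5] and [10, 15] but not at
# [0, 15], mirroring the cross-pairing case in test_matches_simple.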
class SequenceEmbedderTests(TestCase):
"""Tests of the SequenceEmbedder class."""
def setUp(self):
"""Define a few standard models and motifs"""
ile_mod_0 = Module('NNNCUACNNNNN', '(((((..(((((')
ile_mod_1 = Module('NNNNNUAUUGGGGNNN', ')))))......)))))')
ile_rule_0 = Rule(0, 0, 1, 15, 5)
ile_rule_1 = Rule(0, 7, 1, 4, 5)
ile_motif = Motif([ile_mod_0, ile_mod_1], \
[ile_rule_0, ile_rule_1])
helices = [PairedRegion('NNN'), PairedRegion('NNNNN')]
constants = [ConstantRegion('CUAC'), ConstantRegion('UAUUGGGG')]
order = "H0 C0 H1 - H1 C1 H0"
ile_model = SequenceModel(order=order, constants=constants, \
helices=helices, composition=BaseFrequency('UCAG'))
self.ile_embedder = SequenceEmbedder(length=50, num_to_do=10, \
motif=ile_motif, model=ile_model, composition=BaseFrequency('UCAG'))
short_ile_mod_0 = Module('NCUACNN', '(((..((')
short_ile_mod_1 = Module('NNUAUUGGGGN', '))......)))')
short_ile_rule_0 = Rule(0, 0, 1, 10, 3)
short_ile_rule_1 = Rule(0, 5, 1, 1, 2)
short_ile_motif = Motif([short_ile_mod_0, short_ile_mod_1], \
[short_ile_rule_0, short_ile_rule_1])
short_helices = [PairedRegion('N'), PairedRegion('NN')]
short_constants = [ConstantRegion('CUAC'), ConstantRegion('UAUUGGGG')]
short_order = "H0 C0 H1 - H1 C1 H0"
short_ile_model = SequenceModel(order=short_order, \
constants=short_constants, \
helices=short_helices, composition=BaseFrequency('UCAG'))
self.short_ile_embedder = SequenceEmbedder(length=50, num_to_do=10, \
motif=short_ile_motif, model=short_ile_model, \
composition=BaseFrequency('UCAG'))
def test_composition_change(self):
"""Changes in composition should propagate."""
rr = str(self.ile_embedder.RandomRegion.Current)
#for base in 'UCAG':
# assert base in rr
#the above two lines should generally be true but fail stochastically
self.ile_embedder.Composition = BaseFrequency('CG')
self.assertEqual(self.ile_embedder.Model.Composition, \
BaseFrequency('CG'))
self.assertEqual(self.ile_embedder.RandomRegion.Composition, \
BaseFrequency('CG'))
self.ile_embedder.RandomRegion.refresh()
self.assertEqual(len(self.ile_embedder.RandomRegion), 22)
rr = str(self.ile_embedder.RandomRegion.Current)
assert ('C' in rr or 'G' in rr)
assert 'A' not in rr
assert 'U' not in rr
def test_choose_locations_too_short(self):
"""SequenceEmbedder _choose_locations should fail if too little space"""
self.ile_embedder.Length = 28 #no positions left over
self.assertRaises(ValueError, self.ile_embedder._choose_locations)
self.ile_embedder.Length = 29 #one position left over
self.assertRaises(ValueError, self.ile_embedder._choose_locations)
def test_choose_locations_exact(self):
"""SequenceEmbedder _choose_locations should pick all locations"""
self.ile_embedder.Length = 30 #two positions left: must both be filled
for i in range(10):
first, second = self.ile_embedder._choose_locations()
self.assertEqual(first, 0)
self.assertEqual(second, 1)
def test_choose_locations_even(self):
"""SequenceEmbedder _choose_locations should pick locations evenly"""
self.ile_embedder.Length = 31 #three positions left
counts = {}
for i in range(1000):
key = tuple(self.ile_embedder._choose_locations())
assert key[0] != key[1]
curr = counts.get(key, 0)
counts[key] = curr + 1
expected = [333, 333, 333]
observed = [counts[(0,1)], counts[(0,2)], counts[(1,2)]]
self.assertSimilarFreqs(observed, expected)
#make sure nothing else snuck in there
self.assertEqual(counts[(0,1)]+counts[(0,2)]+counts[(1,2)], 1000)
def test_choose_locations_with_replacement(self):
"""SequenceEmbedder _choose_locations can sample with replacement"""
self.ile_embedder.Length = 28 #exact fit
self.ile_embedder.WithReplacement = True
for i in range(10):
first, second = self.ile_embedder._choose_locations()
self.assertEqual(first, 0)
self.assertEqual(second, 0)
self.ile_embedder.Length = 29 #one left over: can be 0,0 0,1 1,1
counts = {}
for i in range(1000):
key = tuple(self.ile_embedder._choose_locations())
curr = counts.get(key, 0)
counts[key] = curr + 1
expected = [250, 500, 250]
observed = [counts[(0,0)], counts[(0,1)], counts[(1,1)]]
self.assertSimilarFreqs(observed, expected)
#make sure nothing else snuck in there
self.assertEqual(counts[(0,0)]+counts[(0,1)]+counts[(1,1)], 1000)
def test_insert_modules(self):
"""SequenceEmbedder _insert_modules should make correct sequence"""
ile = self.ile_embedder
ile.Length = 50
ile.RandomRegion.Current[:] = ['A'] * 22
modules = list(ile.Model)
ile.Positions = [0, 0] #try inserting at first position
self.assertEqual(str(ile), modules[0] + modules[1] + 'A'*22)
ile.Positions = [3, 20]
self.assertEqual(str(ile), 'A'*3+modules[0]+'A'*17+modules[1]+'A'*2)
def test_refresh(self):
"""SequenceEmbedder refresh should change module sequences"""
modules_before = list(self.ile_embedder.Model)
random_before = str(self.ile_embedder.RandomRegion.Current)
self.ile_embedder.refresh()
random_after = str(self.ile_embedder.RandomRegion.Current)
self.assertNotEqual(random_before, random_after)
modules_after = list(self.ile_embedder.Model)
for before, after in zip(modules_before, modules_after):
self.assertNotEqual(before, after)
#check that it works twice
self.ile_embedder.refresh()
random_third = str(self.ile_embedder.RandomRegion.Current)
modules_third = list(self.ile_embedder.Model)
self.assertNotEqual(random_third, random_before)
self.assertNotEqual(random_third, random_after)
for first, second, third in \
zip(modules_before, modules_after, modules_third):
self.assertNotEqual(first, third)
self.assertNotEqual(second, third)
def test_countMatches(self):
"""Shouldn't find any Ile matches if all the pairs are GU"""
if not RNAFOLD_PRESENT:
return
self.ile_embedder.NumToDo = 100
self.ile_embedder.Composition = BaseFrequency('GGGGGGGGGU')
self.ile_embedder.Length = 40
good_count = self.ile_embedder.countMatches()
self.assertEqual(good_count, 0)
def test_countMatches_pass(self):
"""Should find some matches against a random background"""
if not RNAFOLD_PRESENT:
return
self.ile_embedder.NumToDo = 100
self.ile_embedder.Composition = BaseFrequency('UCAG')
self.ile_embedder.Length = 40
good_count = self.ile_embedder.countMatches()
self.assertNotEqual(good_count, 0)
def test_refresh_specific_position(self):
"""Should always find the module in the same position if specified"""
first_module = Module('AAAAA', '(((((')
second_module = Module('UUUUU', ')))))')
rule_1 = Rule(0, 0, 1, 4, 5)
helix = Motif([first_module, second_module], [rule_1])
model = SequenceModel(constants=[ConstantRegion('AAAAA'), \
ConstantRegion('UUUUU')], order='C0 - C1', \
composition=BaseFrequency('A'))
embedder = SequenceEmbedder(length=30, num_to_do=100, \
motif=helix, model=model, composition=BaseFrequency('CG'), \
positions=[3, 6])
last = ''
for i in range(100):
embedder.refresh()
curr = str(embedder)
self.assertEqual(curr[3:8], 'AAAAA')
self.assertEqual(curr[11:16], 'UUUUU')
self.assertEqual(curr.count('A'), 5)
self.assertEqual(curr.count('U'), 5)
self.assertNotEqual(last, curr)
last = curr
def test_refresh_primers(self):
"""Module should appear in correct location with primers"""
first_module = Module('AAAAA', '(((((')
second_module = Module('UUUUU', ')))))')
rule_1 = Rule(0, 0, 1, 4, 5)
helix = Motif([first_module, second_module], [rule_1])
model = SequenceModel(constants=[ConstantRegion('AAAAA'), \
ConstantRegion('UUUUU')], order='C0 - C1', \
composition=BaseFrequency('A'))
embedder = SequenceEmbedder(length=30, num_to_do=100, \
motif=helix, model=model, composition=BaseFrequency('CG'), \
positions=[3, 6], primer_5 = 'UUU', primer_3 = 'AAA')
last = ''
for i in range(100):
embedder.refresh()
curr = str(embedder)
self.assertEqual(curr[0:3], 'UUU')
self.assertEqual(curr[6:11], 'AAAAA')
self.assertEqual(curr[14:19], 'UUUUU')
self.assertEqual(curr.count('A'), 8)
self.assertEqual(curr.count('U'), 8)
self.assertEqual(curr[-3:], 'AAA')
self.assertNotEqual(last, curr)
last = curr
def xxx_test_count_long(self):
self.ile_embedder.NumToDo = 100000
self.ile_embedder.Composition = BaseFrequency('UCAG')
print
print "Extended helices"
for length in range(30, 150):
self.ile_embedder.Length = length
good_count = self.ile_embedder.countMatches()
print "Length: %s Matches: %s/100000" % (length, good_count)
print
def xxx_test_count_short(self):
self.short_ile_embedder.NumToDo = 10000
self.short_ile_embedder.Composition = BaseFrequency('UCAG')
print
print "Minimal motif"
for length in range(20, 150):
self.short_ile_embedder.Length = length
good_count = self.short_ile_embedder.countMatches()
print "Length: %s Matches: %s/10000" % (length, good_count)
print
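# Sketch (hypothetical helpers, not the SequenceEmbedder implementation):
# test_insert_modules treats each position as an insertion point in the
# random region, so modules can be spliced in by slicing. Positions are
# assumed to be sorted and given in module order.
import itertools

def sketch_insert_modules(random_region, modules, positions):
    """Splice each module into random_region just before positions[k]."""
    pieces, last = [], 0
    for module, pos in zip(modules, positions):
        pieces.append(random_region[last:pos])
        pieces.append(module)
        last = pos
    pieces.append(random_region[last:])
    return ''.join(pieces)

# The expected counts in test_choose_locations_with_replacement follow from
# placing two modules independently and uniformly in two slots and sorting
# the result: (0,0) and (1,1) each arise one way, (0,1) two ways, giving
# 1/4, 1/2, 1/4 (250, 500, 250 out of 1000). Without replacement, the
# expected counts in test_choose_locations_even correspond to each pair of
# distinct slots being equally likely (about 333 each).
def sketch_sorted_pair_probabilities(num_slots):
    """Exact probability of each sorted pair of two independent draws."""
    counts = {}
    for pair in itertools.product(range(num_slots), repeat=2):
        key = tuple(sorted(pair))
        counts[key] = counts.get(key, 0) + 1
    total = float(num_slots ** 2)
    return dict((key, n / total) for key, n in counts.items())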
if __name__ == '__main__':
main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.tree import DndParser
from cogent.seqsim.tree import RangeNode, balanced_breakpoints, BalancedTree, \
RandomTree, CombTree, StarTree, LineTree
from cogent.core.usage import DnaPairs
from copy import deepcopy
from operator import mul, or_, add
from numpy import array, average, diag
from numpy.random import random, randint
from cogent.seqsim.usage import Rates
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class treeTests(TestCase):
"""Tests for top-level functions."""
def test_init(self):
"""Make sure keyword arguments are being passed to baseclass"""
node = RangeNode(LeafRange=1, Id=2, Name='foo', Length=42)
self.assertEqual(node.LeafRange, 1)
self.assertEqual(node.Id, 2)
self.assertEqual(node.Name, 'foo')
self.assertEqual(node.Length, 42)
def test_balanced_breakpoints(self):
"""balanced_breakpoints should produce expected arrays."""
self.assertRaises(ValueError, balanced_breakpoints, 1)
self.assertEqual(balanced_breakpoints(2), array([0]))
self.assertEqual(balanced_breakpoints(4), array([1,0,2]))
self.assertEqual(balanced_breakpoints(8), \
array([3,1,5,0,2,4,6]))
self.assertEqual(balanced_breakpoints(16), \
array([7,3,11,1,5,9,13,0,2,4,6,8,10,12,14]))
self.assertEqual(balanced_breakpoints(32), \
array([15,7,23,3,11,19,27,1,5,9,13,17,21,25,29,\
0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30]))
def test_BalancedTree(self):
"""BalancedTree should return a balanced tree"""
b = BalancedTree(4)
self.assertEqual(len(list(b.traverse())), 4)
b.assignIds()
self.assertEqual(str(b), '((0,1)4,(2,3)5)6')
def test_RandomTree(self):
"""RandomTree should return correct number of nodes
NOTE: all the work is done in breakpoints, which is thoroughly
tested independently. RandomTree just makes permutations.
"""
d = {}
for i in range(10):
r = RandomTree(100)
self.assertEqual(len(list(r.traverse())), 100)
r.assignIds()
#make sure we get different trees each time...
s = str(r)
assert s not in d
d[s] = None
def test_CombTree(self):
"""CombTree should return correct topology"""
c = CombTree(4)
c.assignIds()
self.assertEqual(str(c), '(0,(1,(2,3)4)5)6')
c = CombTree(4, deepest_first=False)
c.assignIds()
self.assertEqual(str(c), '(((0,1)4,2)5,3)6')
def test_StarTree(self):
"""StarTree should return correct star topology and # nodes"""
t = StarTree(5)
self.assertEqual(len(t.Children), 5)
for c in t.Children:
assert c.Parent is t
def test_LineTree(self):
"""LineTree should return correct number of nodes"""
t = LineTree(5)
depth = 1
curr = t
while curr.Children:
self.assertEqual(len(curr.Children), 1)
depth += 1
curr = curr.Children[0]
self.assertEqual(depth, 5)
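# Sketch (hypothetical re-implementation, not cogent.seqsim.tree's code):
# the arrays expected in test_balanced_breakpoints are the breadth-first
# (level-order) midpoints of the breakpoint range 0..n-2: the root split
# first, then the splits of each half, and so on. This reproduces them for
# the powers of two tested above; behaviour for other n is not claimed.
from collections import deque

def sketch_balanced_breakpoints(num_leaves):
    """Return level-order midpoints of range(num_leaves - 1) as a list."""
    if num_leaves < 2:
        raise ValueError("need at least two leaves, got %s" % num_leaves)
    result = []
    # inclusive (low, high) bounds of the breakpoints still to be split
    queue = deque([(0, num_leaves - 2)])
    while queue:
        low, high = queue.popleft()
        if low > high:
            continue
        mid = (low + high) // 2
        result.append(mid)
        queue.append((low, mid - 1))
        queue.append((mid + 1, high))
    return result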
class RangeTreeTests(TestCase):
"""Tests of the RangeTree class."""
def setUp(self):
"""Make some standard objects to test."""
#Notes on sample string:
#
#1. trailing zeros are stripped in conversion to/from float, so result
# is only exactly the same without them.
#
#2. trailing chars (e.g. semicolon) are not recaptured in the output,
# so were deleted from original Newick-format string.
#
#3. whitespace is stripped, but is handy for formatting, so is stripped
# from original string before comparisons.
self.sample_tree_string = """
(
(
xyz:0.28124,
(
def:0.24498,
mno:0.03627)
A:0.1771)
B:0.0487,
abc:0.05925,
(
ghi:0.06914,
jkl:0.13776)
C:0.09853)
"""
self.t = DndParser(self.sample_tree_string, RangeNode)
self.i = self.t.indexByAttr('Name')
self.sample_string_2 = '((((a,b),c),(d,e)),((f,g),h))'
self.t2 = DndParser(self.sample_string_2, RangeNode)
self.i2 = self.t2.indexByAttr('Name')
self.sample_string_3 = '(((a,b),c),(d,e))'
self.t3 = DndParser(self.sample_string_3, RangeNode)
def test_str(self):
"""RangeNode should round-trip Newick string correctly."""
r = RangeNode()
self.assertEqual(str(r), '()')
#should work for tree with branch lengths set
t = DndParser(self.sample_tree_string, RangeNode)
expected = self.sample_tree_string.replace('\n', '')
expected = expected.replace(' ', '')
self.assertEqual(str(t), expected)
#self.assertEqual(t.getNewick(with_distances=True), expected)
#should also work for tree w/o branch lengths
t2 = DndParser(self.sample_string_2, RangeNode)
self.assertEqual(str(t2), self.sample_string_2)
def test_traverse(self):
"""RangeTree traverse should visit all nodes in correct order"""
t = self.t
i = self.i
#first, check that lengths are correct
#naked traverse() only does leaves; should be 6.
self.assertEqual(len(list(t.traverse())), 6)
#traverse() with self_before should count all nodes.
self.assertEqual(len(list(t.traverse(self_before=True))), 10)
#traverse() with self_after should have same count as self_before
self.assertEqual(len(list(t.traverse(self_after=True))), 10)
#traverse() with self_before and self_after should visit internal
#nodes multiple times
self.assertEqual(len(list(t.traverse(True,True))), 14)
#now, check that items are in correct order
exp = ['xyz','def','mno','abc','ghi','jkl']
obs = [i.Name for i in t.traverse()]
self.assertEqual(obs, exp)
exp = [None, 'B', 'xyz', 'A', 'def', 'mno', 'abc', 'C', 'ghi', 'jkl']
obs = [i.Name for i in t.traverse(self_before=True)]
self.assertEqual(obs, exp)
exp = ['xyz', 'def', 'mno', 'A', 'B', 'abc', 'ghi', 'jkl', 'C', None]
obs = [i.Name for i in t.traverse(self_after=True)]
self.assertEqual(obs, exp)
exp = [None, 'B', 'xyz', 'A', 'def', 'mno', 'A', 'B', 'abc', 'C', \
'ghi', 'jkl', 'C', None]
obs = [i.Name for i in t.traverse(self_before=True, self_after=True)]
self.assertEqual(obs, exp)
def test_indexByAttr(self):
"""RangeNode indexByAttr should make index using correct attr"""
t = self.t
i = self.i
#check that we got the right number of elements
#all elements unique, so should be same as num nodes
self.assertEqual(len(i), len(list(t.traverse(self_before=True))))
#check that we got everything
i_keys = i.keys()
i_vals = i.values()
for node in t.traverse(self_before=True):
assert node.Name in i_keys
assert node in i_vals
#can't predict which node will have None as the key
if node.Name is not None:
assert i[node.Name] is node
#check that it works when elements are not unique
t = self.t3
for node in t.traverse(self_before=True):
node.X = 'b'
for node in t.traverse():
node.X = 'a'
result = t.indexByAttr('X', multiple=True)
self.assertEqual(len(result), 2)
self.assertEqual(len(result['a']), 5)
self.assertEqual(len(result['b']), 4)
for n in t.traverse():
assert n in result['a']
assert not n in result['b']
def test_indexbyFunc(self):
"""RangeNode indexByFunc should make index from function"""
t = self.t
def f(n):
try:
return n.Name.isupper()
except AttributeError:
return None
i = self.i
f_i = t.indexByFunc(f)
self.assertEqual(len(f_i), 3)
self.assertEqual(f_i[True], [i['B'], i['A'], i['C']])
self.assertEqual(f_i[False], [i['xyz'], i['def'], i['mno'], \
i['abc'], i['ghi'], i['jkl']])
self.assertEqual(f_i[None], [i[None]])
def test_assignIds(self):
"""RangeNode assignIds should work as expected"""
t = self.t2
index = self.i2
t.assignIds()
#check that ids were set correctly on the leaves
for i, a in enumerate('abcdefgh'):
self.assertEqual(index[a].Id, i)
#check that ranges were set correctly on the leaves
for i, a in enumerate('abcdefgh'):
self.assertEqual(index[a].LeafRange, (i, i+1))
#check that internal ids were set correctly
obs = [i.Id for i in t.traverse(self_after=True)]
exp = [0,1,8,2,9,3,4,10,11,5,6,12,7,13,14]
self.assertEqual(obs, exp)
#check that internal ranges were set correctly
obs = [i.LeafRange for i in t.traverse(self_after=True)]
exp = [(0,1),(1,2),(0,2),(2,3),(0,3),(3,4),(4,5),(3,5),(0,5), \
(5,6),(6,7),(5,7),(7,8),(5,8),(0,8)]
self.assertEqual(obs, exp)
def test_propagateAttr(self):
"""RangeNode propagateAttr should send attr down tree, unless set"""
t = self.t
i = self.i
for n in t.traverse(self_before=True):
assert not hasattr(n, 'XYZ')
t.XYZ = 3
t.propagateAttr('XYZ')
for n in t.traverse(self_before=True):
self.assertEqual(n.XYZ, 3)
#shouldn't overwrite internal nodes by default
a_children = list(i['A'].traverse(self_before=True))
i['A'].GHI = 5
t.GHI = 1
t.propagateAttr('GHI')
for n in t.traverse(self_before=True):
if n in a_children:
self.assertEqual(n.GHI, 5)
else:
self.assertEqual(n.GHI, 1)
t.GHI = 0
t.propagateAttr('GHI', overwrite=True)
for n in t.traverse(self_before=True):
self.assertEqual(n.GHI, 0)
def test_delAttr(self):
"""RangeNode delAttr should delete attr from self and children"""
t = self.t2
for n in t.traverse(self_before=True):
assert hasattr(n, 'Name')
t.delAttr('Name')
for n in t.traverse(self_before=True):
assert not hasattr(n, 'Name')
def test_accumulateAttr(self):
"""RangeNode accumulateAttr should accumulate attr in right direction"""
t = self.t3
#test towards_leaves (the default)
f = lambda a, b: b + 1
for n in t.traverse(self_before=True):
n.Level = 0
t.accumulateAttr('Level', f=f)
levels = [i.Level for i in t.traverse(self_before=True)]
self.assertEqual(levels, [0,1,2,3,3,2,1,2,2])
for n in t.traverse(self_before=True):
n.Level=0
#test away from leaves
f = lambda a, b : max(a, b+1)
for n in t.traverse(self_before=True):
n.Level=0
t.accumulateAttr('Level', towards_leaves=False,f=f)
levels = [i.Level for i in t.traverse(self_before=True)]
self.assertEqual(levels, [3,2,1,0,0,0,1,0,0])
def test_accumulateChildAttr(self):
"""RangeNode accumulateChildAttr should work as expected"""
t = self.t2
i = self.i2
i['a'].x = 3
i['b'].x = 4
i['d'].x = 0
i['f'].x = 1
i['g'].x = 1
i['h'].x = 1
t.accumulateChildAttr('x', f=mul)
self.assertEqual([i.__dict__.get('x', None) for i in \
t.traverse(self_after=True)],
[3, 4, 12, None, 12, 0, None, 0, 0, 1, 1, 1, 1, 1, 0])
t.accumulateChildAttr('x', f=add)
self.assertEqual([i.__dict__.get('x', None) for i in \
t.traverse(self_after=True)],
[3, 4, 7, None, 7, 0, None, 0, 7, 1, 1, 2, 1, 3, 10])
def test_assignLevelsFromRoot(self):
"""RangeNode assignLevelsFromRoot should match hand-calculated levels"""
t = self.t3
t.assignLevelsFromRoot()
levels = [i.Level for i in t.traverse(self_before=True)]
self.assertEqual(levels, [0,1,2,3,3,2,1,2,2])
def test_assignLevelsFromLeaves(self):
"""RangeNode assignLevelsFromLeaves should match hand-calculated levels"""
t = self.t3
t.assignLevelsFromLeaves()
levels = [i.Level for i in t.traverse(self_before=True)]
self.assertEqual(levels, [3,2,1,0,0,0,1,0,0])
t.assignLevelsFromLeaves(use_min=True)
levels = [i.Level for i in t.traverse(self_before=True)]
self.assertEqual(levels, [2,1,1,0,0,0,1,0,0])
def test_attrToList(self):
"""RangeNode attrToList should return correct list of attr"""
t = self.t3
t.assignIds()
t.assignLevelsFromRoot()
#make sure nodes are in the order we expect
self.assertEqual([n.Id for n in t.traverse(self_before=True)],
[8,6,5,0,1,2,7,3,4])
#by default, should return list containing all nodes
obs = t.attrToList('Level')
self.assertEqual(obs, [3,3,2,2,2,2,1,1,0])
#should be able to do leaves only if specified
obs = t.attrToList('Level', leaves_only=True)
self.assertEqual(obs, [3,3,2,2,2])
#should be able to specify larger size
obs=t.attrToList('Level', size=12)
self.assertEqual(obs, [3,3,2,2,2,2,1,1,0,None,None,None])
#should be able to set default
obs=t.attrToList('Level', default='x', size=12)
self.assertEqual(obs, [3,3,2,2,2,2,1,1,0,'x','x','x'])
def test_attrFromList(self):
"""RangeNode attrFromList should set values correctly"""
t = self.t3
t.assignIds()
#by default, should set all nodes from array
t.attrFromList('Level', [3,3,2,2,2,2,1,1,0])
self.assertEqual([n.Level for n in t.traverse(self_before=True)], \
[0,1,2,3,3,2,1,2,2])
#should also work if we choose to set only the leaves (rest should
#stay at default values)
t.Level = -1
t.propagateAttr('Level', overwrite=True)
t.attrFromList('Level', [3,3,2,2,2,2,1,1,0], leaves_only=True)
self.assertEqual([n.Level for n in t.traverse(self_before=True)], \
[-1,-1,-1,3,3,2,-1,2,2])
def test_toBreakpoints(self):
"""RangeNode toBreakpoints should give expected list"""
t = self.t2
t.assignIds()
self.assertEqual(t.toBreakpoints(), [4,2,1,0,3,6,5])
def test_fromBreakpoints(self):
"""RangeNode fromBreakpoints should have correct topology"""
breakpoints = [4,2,1,0,3,6,5]
t = RangeNode.fromBreakpoints(breakpoints)
#check number of leaves
self.assertEqual(len(list(t.traverse())), 8)
self.assertEqual(len(list(t.traverse(self_before=True))), 15)
#check that leaves were created in right order
self.assertEqual([i.Id for i in t.traverse()], range(8))
#check that whole topology is right wrt ids...
nodes = list(t.traverse(self_before=True))
obs = [i.Id for i in nodes]
exp = [8, 9, 11, 13, 0, 1, 2, 12, 3, 4, 10, 14, 5, 6, 7]
self.assertEqual(obs, exp)
#...and ranges
obs = [i.LeafRange for i in nodes]
exp = [(0,8),(0,5),(0,3),(0,2),(0,1),(1,2),(2,3),(3,5),(3,4),(4,5), \
(5,8),(5,7),(5,6),(6,7),(7,8)]
self.assertEqual(obs, exp)
def test_leafLcaDepths(self):
"""RangeNode leafLcaDepths should return expected depths"""
t = self.t3
result = t.leafLcaDepths()
self.assertEqual(result, array([[0,1,2,3,3],
[1,0,2,3,3],
[2,2,0,3,3],
[3,3,3,0,1],
[3,3,3,1,0]]))
def test_randomNode(self):
"""RangeNode randomNode should be able to return every node"""
t = self.t3
result = {}
for i in range(100):
ans = id(t.randomNode())
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 9)
for node in t.traverse(self_before=True):
assert id(node) in result
def test_randomLeaf(self):
"""RangeNode randomLeaf should be able to return every leaf"""
t = self.t3
result = {}
for i in range(100):
ans = id(t.randomLeaf())
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 5)
for node in t.traverse():
assert id(node) in result
def test_randomNodeWithNLeaves(self):
"""RandomNodeWithNLeaves should return node with correct # leaves"""
t = self.t3
#check that only the root gets selected with 5 leaves
result = {}
for i in range(20):
ans = id(t.randomNodeWithNLeaves(5))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 1)
assert id(t) in result
#check that nothing has 6 or 4 (for this tree) leaves
self.assertRaises(KeyError, t.randomNodeWithNLeaves, 6)
self.assertRaises(KeyError, t.randomNodeWithNLeaves, 4)
#check that it works with fewer than 5 leaves
#SINGLE LEAF:
result = {}
for i in range(40):
ans = id(t.randomNodeWithNLeaves(1))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 5)
self.assertEqual(sum(result.values()), 40)
#TWO LEAVES:
result = {}
for i in range(20):
ans = id(t.randomNodeWithNLeaves(2))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 2)
self.assertEqual(sum(result.values()), 20)
#THREE LEAVES:
result = {}
for i in range(20):
ans = id(t.randomNodeWithNLeaves(3))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 1)
self.assertEqual(sum(result.values()), 20)
def test_randomNodeAtLevel(self):
"""RangeNode randomNodeAtLevel should return random node at correct level"""
t = self.t3
#LEAVES:
result = {}
for i in range(40):
ans = id(t.randomNodeAtLevel(0))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 5)
self.assertEqual(sum(result.values()), 40)
#BACK ONE LEVEL:
result = {}
for i in range(20):
ans = id(t.randomNodeAtLevel(1))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 2)
self.assertEqual(sum(result.values()), 20)
#BACK TWO LEVELS:
result = {}
for i in range(20):
ans = id(t.randomNodeAtLevel(2))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 1)
self.assertEqual(sum(result.values()), 20)
#BACK THREE LEVELS (to root):
result = {}
for i in range(20):
ans = id(t.randomNodeAtLevel(3))
if ans not in result:
result[ans] = 0
result[ans] += 1
self.assertEqual(len(result), 1)
self.assertEqual(sum(result.values()), 20)
self.assertEqual(result.keys()[0], id(t))
def test_outgroupLast(self):
"""RangeNode outgroupLast should reorder nodes to put outgroup last"""
t = self.t3
a, b, c, d, e = t.traverse()
self.assertEqual(t.outgroupLast(c,a,b), (a, b, c))
self.assertEqual(t.outgroupLast(c,b,a), (b, a, c))
self.assertEqual(t.outgroupLast(b,d,a), (b, a, d))
self.assertEqual(t.outgroupLast(c,d,e), (d, e, c))
self.assertEqual(t.outgroupLast(a,d,e), (d, e, a))
self.assertEqual(t.outgroupLast(a,d,b), (a, b, d))
#check that it works if we suppress the cache
self.assertEqual(t.outgroupLast(c,a,b, False), (a, b, c))
self.assertEqual(t.outgroupLast(c,b,a, False), (b, a, c))
self.assertEqual(t.outgroupLast(b,d,a, False), (b, a, d))
self.assertEqual(t.outgroupLast(c,d,e, False), (d, e, c))
self.assertEqual(t.outgroupLast(a,d,e, False), (d, e, a))
self.assertEqual(t.outgroupLast(a,d,b, False), (a, b, d))
def test_filter(self):
"""RangeNode filter should keep or omit selected nodes."""
t_orig = self.t2
t = deepcopy(t_orig)
idx = t.indexByAttr('Name')
to_keep = map(idx.__getitem__, 'abch')
curr_leaves = list(t.traverse())
t.filter(to_keep)
curr_leaves = list(t.traverse())
for i in to_keep:
assert i in curr_leaves
for i in map(idx.__getitem__, 'defg'):
assert i not in curr_leaves
#note that it collapses one-child nodes
self.assertEqual(str(t), '(((a,b),c),h)')
#test same thing but omitting
t = deepcopy(t_orig)
idx = t.indexByAttr('Name')
to_omit = map(idx.__getitem__, 'abch')
t.filter(to_omit, keep=False)
curr_leaves = list(t.traverse())
for i in to_omit:
assert i not in curr_leaves
for i in map(idx.__getitem__, 'defg'):
assert i in curr_leaves
#note that it collapses one-child nodes
self.assertEqual(str(t), '((d,e),(f,g))')
#test that it works with internal nodes
t = deepcopy(t_orig)
idx = t.indexByAttr('Name')
to_omit = [idx['a'].Parent.Parent]
t.filter(to_omit, keep=False)
self.assertEqual(str(t), '((d,e),((f,g),h))')
#test that it adds branch lengths
t = deepcopy(t_orig)
idx = t.indexByAttr('Name')
for i in t.traverse(self_after=True):
i.BranchLength = 1
to_omit = map(idx.__getitem__, 'abdefg')
t.filter(to_omit, keep=False)
self.assertEqual(str(t), '(c,h)')
#test that it got rid of the temporary '_selected' attribute
for node in t.traverse(self_before=True):
assert not hasattr(node, '_selected')
#if nothing valid in to_keep, should return empty tree
t = deepcopy(t_orig)
idx = t.indexByAttr('Name')
to_keep = []
t.filter(to_keep, keep=True)
curr_leaves = list(t.traverse())
assert len(curr_leaves), 0
#plain labels are not valid entries in to_keep either: should return empty tree
t = deepcopy(t_orig)
idx = t.indexByAttr('Name')
to_keep = list('abcde') #note: just labels, not nodes
t.filter(to_keep, keep=True)
curr_leaves = list(t.traverse())
assert len(curr_leaves), 0
def test_addChildren(self):
"""RangeNode addChildren should add specified # children to list"""
t = RangeNode()
t2 = RangeNode(Parent=t)
t.addChildren(5)
self.assertEqual(len(t.Children), 6)
assert t.Children[0] is t2
for c in t.Children:
assert c.Parent is t
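# Sketch (hypothetical helpers on nested tuples, not RangeNode methods):
# test_assignIds expects leaves to be numbered 0..n-1 from left to right,
# internal nodes to be numbered n, n+1, ... in postorder, and every node to
# carry the half-open range of leaf ids below it. test_filter expects
# dropped leaves to vanish and one-child nodes to collapse. Here a tree is
# assumed to be a tuple of children, with leaves given as name strings.
def sketch_assign_ids(tree):
    """Return [(id, (first_leaf, last_leaf + 1))] for each node, postorder."""
    def count_leaves(node):
        if isinstance(node, tuple):
            return sum(count_leaves(child) for child in node)
        return 1
    counters = {'leaf': 0, 'internal': count_leaves(tree)}
    result = []
    def visit(node):
        start = counters['leaf']
        if isinstance(node, tuple):
            for child in node:
                visit(child)
            node_id = counters['internal']
            counters['internal'] += 1
        else:
            node_id = counters['leaf']
            counters['leaf'] += 1
        result.append((node_id, (start, counters['leaf'])))
    visit(tree)
    return result

def sketch_filter_leaves(tree, names, keep=True):
    """Keep (or omit) the named leaves; return None if nothing survives."""
    if not isinstance(tree, tuple):
        return tree if (tree in names) == keep else None
    children = [c for c in (sketch_filter_leaves(child, names, keep)
                            for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]  # collapse one-child nodes
    return tuple(children)

def sketch_to_newick(tree):
    """Render a nested-tuple tree as a Newick string without lengths."""
    if isinstance(tree, tuple):
        return '(' + ','.join(sketch_to_newick(c) for c in tree) + ')'
    return tree
# For '((((a,b),c),(d,e)),((f,g),h))', keeping a, b, c and h yields
# '(((a,b),c),h)', as in the first part of test_filter.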
class OldPhyloNodeTests(TestCase):
"""Tests of the PhyloNode class -- these are all now methods of RangeNode."""
def setUp(self):
"""Make a couple of standard trees"""
self.t1 = DndParser('((a,(b,c)),(d,e))', RangeNode)
#self.t1 indices: ((0,(1,2)5)6,(3,4)7)8
def test_makeIdIndex(self):
"""RangeNode makeIdIndex should assign ids to every node"""
self.t1.makeIdIndex()
result = self.t1.IdIndex
nodes = list(self.t1.traverse(self_before=True))
#check we got an entry for each node
self.assertEqual(len(result), len(nodes))
#check the ids are in the result
for i in nodes:
assert hasattr(i, 'Id')
assert i.Id in result
def test_assignQ_single_passed(self):
"""RangeNode assignQ should propagate single Q param down tree"""
#should work if Q explicitly passed
t = self.t1
Q = ['a']
t.assignQ(Q)
for node in t.traverse(self_before=True):
assert node.Q is Q
def test_assignQ_single_set(self):
"""RangeNode assignQ should propagate single Q if set"""
t = self.t1
Q = ['a']
assert not hasattr(t, 'Q')
t.Q = Q
t.assignQ()
for node in t.traverse(self_before=True):
assert node.Q is Q
def test_assignQ_single_overwrite(self):
"""RangeNode assignQ should overwrite root Q if new Q passed"""
t = self.t1
Q = ['a']
Q2 = ['b']
t.Q = Q
t.assignQ(Q2)
for node in t.traverse(self_before=True):
assert node.Q is Q2
assert not node.Q is Q
def test_assignQ_multiple(self):
"""RangeNode assignQ should propagate multiple Qs"""
t = self.t1
Q1 = ['a']
Q2 = ['b']
Q3 = ['c']
t.makeIdIndex()
t.IdIndex[7].Q = Q1
t.IdIndex[5].Q = Q2
t.assignQ(Q3)
result = [i.Q for i in t.traverse(self_after=True)]
assert t.Q is Q3
self.assertEqual(result, [Q3,Q2,Q2,Q2,Q3,Q1,Q1,Q1,Q3])
def test_assignQ_multiple_overwrite(self):
"""RangeNode assignQ should allow overwrite"""
t = self.t1
Q1 = ['a']
Q2 = ['b']
Q3 = ['c']
t.makeIdIndex()
t.IdIndex[7].Q = Q1
t.IdIndex[5].Q = Q2
t.assignQ(Q3, overwrite=True)
for i in t.traverse(self_after=True):
assert i.Q is Q3
def test_assignQ_special(self):
"""RangeNode assignQ should work with special Qs"""
t = self.t1
Q1 = 'a'
Q2 = 'b'
Q3 = 'c'
t.makeIdIndex()
special = {7:Q1, 1:Q2}
#won't work if no Q at root
self.assertRaises(ValueError, t.assignQ, special_qs=special)
t.assignQ(Q3, special_qs=special)
result = [i.Q for i in t.traverse(self_after=True)]
self.assertEqual(result, ['c','b','c','c','c','a','a','a','c'])
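The assignQ tests pin down an inheritance rule: a Q passed explicitly replaces the root's Q, every other node keeps its own Q unless `overwrite=True`, nodes without one inherit their parent's, and for non-root nodes an entry in `special_qs` (keyed by node id) always wins. A minimal sketch of that rule, inferred from the expected values rather than taken from RangeNode; `Node`, `assign_q`, `postorder` and `make_tree` are hypothetical helpers, and the ids follow the `self.t1` layout `((0,(1,2)5)6,(3,4)7)8`.

```python
class Node:
    """Hypothetical minimal tree node standing in for RangeNode."""

    def __init__(self, node_id, children=()):
        self.id = node_id
        self.children = list(children)
        self.q = None


def postorder(node):
    """Yield children before their parent (like traverse(self_after=True))."""
    for child in node.children:
        yield from postorder(child)
    yield node


def assign_q(root, q=None, special=None, overwrite=False):
    """Push rate matrices from the root down the tree (sketch)."""
    special = special or {}
    if q is not None:
        root.q = q  # an explicitly passed Q always replaces the root's
    if root.q is None:
        raise ValueError("a Q must be set on, or passed for, the root")

    def visit(node, inherited):
        if node.id in special:
            node.q = special[node.id]
        elif overwrite or node.q is None:
            node.q = inherited
        for child in node.children:
            visit(child, node.q)

    for child in root.children:
        visit(child, root.q)


def make_tree():
    """Build ((0,(1,2)5)6,(3,4)7)8 with integer node ids."""
    tips = [Node(i) for i in range(5)]
    n5 = Node(5, [tips[1], tips[2]])
    n6 = Node(6, [tips[0], n5])
    n7 = Node(7, [tips[3], tips[4]])
    return Node(8, [n6, n7])


root = make_tree()
assign_q(root, "c", special={7: "a", 1: "b"})
print([n.q for n in postorder(root)])
# ['c', 'b', 'c', 'c', 'c', 'a', 'a', 'a', 'c']
```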
def test_assignP(self):
"""RangeNode assignP should work when Qs set."""
t = self.t1
for i in t.traverse(self_before=True):
i.Length = random() * 0.5 #range 0 to 0.5
t.Q = Rates.random(DnaPairs)
t.assignQ()
t.assignP()
t.assignIds()
for node in t.traverse(self_after=True):
if node.Parent is not None:
self.assertFloatEqual(average(1-diag(node.P._data), axis=0), \
node.Length)
def test_assignLength(self):
"""RangeNode assignLength should set branch length"""
t = self.t1
t.assignLength(0.3)
for i in t.traverse(self_before=True):
self.assertEqual(i.Length, 0.3)
def test_evolve(self):
"""RangeNode evolve should work on a starting vector"""
t = self.t1
t.Q = Rates.random(DnaPairs)
t.assignQ()
t.assignLength(0.1)
t.assignP()
start = array([1,0,2,1,0,0,2,1,2,0,1,2,1,0,2,0,0,3,0,2,1,0,3,1,0,2,0,0,0,0,0,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,3])
t.evolve(start)
for i in t.traverse():
self.assertEqual(len(i.Sequence), len(start))
self.assertNotEqual(i.Sequence, start)
#WARNING: Doesn't test base freqs etc. at this point, but those aren't
#really evolve()'s responsibility (tested as self.P.mutate(seq) once
#P is set, which we've already demonstrated works.)
def test_assignPs(self):
"""RangeNode assignPs should assign multiple scaled P matrices"""
t = self.t1
for i in t.traverse(self_before=True):
i.Length = random() * 0.5 #range 0 to 0.5
t.Q = Rates.random(DnaPairs)
t.assignQ()
t.assignPs([1, 0.5, 0.25])
t.assignIds()
for node in t.traverse(self_after=True):
if node.Parent is not None:
self.assertEqual(len(node.Ps), 3)
self.assertFloatEqual(average(1-diag(node.Ps[0]._data), axis=0), \
node.Length)
self.assertFloatEqual(average(1-diag(node.Ps[1]._data), axis=0), \
0.5*node.Length)
self.assertFloatEqual(average(1-diag(node.Ps[2]._data), axis=0), \
0.25*node.Length)
def test_evolveSeqs(self):
"""RangeNode evolveSeqs should evolve multiple sequences"""
t = self.t1
for i in t.traverse(self_before=True):
i.Length = 0.5
t.Q = Rates.random(DnaPairs)
t.assignQ()
t.assignPs([1, 1, 0.1])
t.assignIds()
orig_seqs = [array(i) for i in [randint(0,4,200), randint(0,4,200), \
randint(0,4,200)]]
t.evolveSeqs(orig_seqs)
for node in t.traverse(): #only look at leaves
if node.Parent is not None:
self.assertEqual(len(node.Sequences), 3)
for orig, new in zip(orig_seqs, node.Sequences):
self.assertEqual(len(orig), len(new))
self.assertNotEqual(orig, new)
assert sum(orig_seqs[1]!=node.Sequences[1]) > \
sum(orig_seqs[2]!=node.Sequences[2])
if __name__ == "__main__":
main()
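test_assignP and test_assignPs only require that each branch's P satisfy mean(1 - diag(P)) = branch length (times the scale factor), and RatesTests checks the related timeForSimilarity. One way to meet that constraint, sketched here in plain Python under the assumption that P = exp(Qt) and that the mean diagonal drops below the target for some finite t, is to bisect on t. `mat_mul`, `expm`, `mean_diag` and `time_for_similarity` are illustrative helpers, not PyCogent's FastExponentiator or Rates methods, and the Jukes-Cantor matrix is only an example input.

```python
from math import log


def mat_mul(a, b):
    """Product of two n x n matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(q, t=1.0, terms=20):
    """exp(Q t) by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    a = [[x * t for x in row] for row in q]
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = 0
    while norm > 0.5:  # halve until the max row sum of |A| is <= 0.5
        norm /= 2.0
        squarings += 1
    a = [[x / 2.0 ** squarings for x in row] for row in a]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, a)]  # A^k / k!
        result = [[r + s for r, s in zip(r_row, s_row)]
                  for r_row, s_row in zip(result, term)]
    for _ in range(squarings):  # exp(A) = exp(A / 2^s) ** (2^s)
        result = mat_mul(result, result)
    return result


def mean_diag(m):
    """Average of the diagonal entries of a square matrix."""
    return sum(m[i][i] for i in range(len(m))) / len(m)


def time_for_similarity(q, d, tol=1e-10):
    """Bisect for t such that mean(diag(exp(Q t))) == d."""
    if d >= 1:
        return 0.0
    lo, hi = 0.0, 1.0
    while mean_diag(expm(q, hi)) > d:  # widen until d is bracketed
        hi *= 2.0
        if hi > 1e6:
            raise ValueError("similarity %r is never reached" % d)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if mean_diag(expm(q, mid)) > d:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Jukes-Cantor rate matrix, used here only as an example
jc = [[-1.0 if i == j else 1.0 / 3 for j in range(4)] for i in range(4)]
t = time_for_similarity(jc, 0.5)
print(round(t, 6), round(0.75 * log(3), 6))  # 0.823959 0.823959

# scale a branch so that mean(1 - diag(P)) equals its length of 0.3
p = expm(jc, time_for_similarity(jc, 1 - 0.3))
print(round(1 - mean_diag(p), 6))  # 0.3
```

For this matrix P_ii(t) = 1/4 + 3/4·exp(-4t/3), so a mean similarity of 0.5 has the closed-form solution t = (3/4)·ln 3, which the bisection reproduces.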
#!/usr/bin/env python
"""Unit tests for usage and substitution matrices.
"""
from cogent.util.unit_test import TestCase, main
from cogent.core.moltype import RNA
from cogent.core.usage import RnaBases, DnaBases, RnaPairs, DnaPairs
from cogent.core.alphabet import Alphabet
from cogent.core.sequence import ModelRnaSequence as RnaSequence, \
ModelRnaCodonSequence
from cogent.seqsim.usage import Usage, DnaUsage, RnaUsage, PairMatrix, Counts,\
Probs, Rates, goldman_q_dna_pair, goldman_q_rna_pair,\
goldman_q_dna_triple, goldman_q_rna_triple
from numpy import average, asarray, sqrt, identity, diagonal, trace, \
array, sum
from cogent.maths.matrix_logarithm import logm
from cogent.maths.matrix_exponentiation import FastExponentiator as expm
#need to find test directory to get access to the tests of the Freqs interface
try:
from os import getcwd
from sys import path
from os.path import sep,join
test_path = getcwd().split(sep)
index = test_path.index('tests')
fields = test_path[:index+1] + ["test_maths"]
test_path = sep + join(*fields)
path.append(test_path)
from test_stats.test_util import StaticFreqsTestsI
my_alpha = Alphabet('abcde')
class myUsage(Usage):
Alphabet = my_alpha
class UsageAsFreqsTests(StaticFreqsTestsI, TestCase):
"""Note that the remaining Usage methods are tested here."""
ClassToTest=myUsage
except ValueError: #couldn't find directory
pass
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
NUM_TESTS = 10 #for randomization trials
class UsageTests(TestCase):
"""Tests of the Usage object."""
def setUp(self):
"""Defines some standard test items."""
self.ab = Alphabet('ab')
class abUsage(Usage):
Alphabet = self.ab
self.abUsage = abUsage
def test_init(self):
"""Usage init should succeed only in subclasses"""
self.assertRaises(TypeError, Usage, [1,2,3,4])
self.assertEqual(self.abUsage().items(), [('a',0),('b',0)])
self.assertEqual(self.abUsage([5,6]).items(), [('a',5.0),('b',6.0)])
#should also construct from seq, if not same length as freqs
self.assertEqual(self.abUsage([0,0,1,1,1,0,1,1]).items(), \
[('a',3),('b',5)])
def test_getitem(self):
"""Usage getitem should get item via alphabet"""
u = self.abUsage([3,4])
self.assertEqual(u['a'], 3)
self.assertEqual(u['b'], 4)
def test_setitem(self):
"""Usage setitem should set item via alphabet"""
u = self.abUsage([3,4])
self.assertEqual(u['a'], 3)
u['a'] = 10
self.assertEqual(u['a'], 10)
u['b'] += 5
self.assertEqual(u['a'], 10)
self.assertEqual(u['b'], 9)
def test_str(self):
"""Usage str should print like equivalent list"""
u = self.abUsage()
self.assertEqual(str(u), "[('a', 0.0), ('b', 0.0)]")
u = self.abUsage([1,2.0])
self.assertEqual(str(u), \
"[('a', 1.0), ('b', 2.0)]")
def test_iter(self):
"""Usage iter should iterate over keys"""
u = self.abUsage([1,2])
x = tuple(u)
self.assertEqual(x, ('a', 'b'))
#should be able to convert to dict via iter
d = dict(u)
self.assertEqual(d, {'a':1,'b':2})
def test_cmp(self):
"""Usage cmp should work as expected"""
a = self.abUsage([3,4])
b = self.abUsage([3,2])
c = self.abUsage([3,4])
self.assertEqual(a, a)
self.assertNotEqual(a,b)
self.assertEqual(a,c)
self.assertEqual(a==a, True)
self.assertEqual(a!=a, False)
self.assertEqual(a==b, False)
self.assertEqual(a!=b, True)
self.assertEqual(a==c, True)
self.assertEqual(a!=c, False)
self.assertEqual(a==3, False)
self.assertEqual(a!=3, True)
def test_add(self):
"""Usage add should add two sets of counts together"""
u, v = self.abUsage([1,2]), self.abUsage([6,4])
x = self.abUsage([7,6])
y = self.abUsage([7,6])
self.assertEqual(x, y)
self.assertEqual(x, u+v)
self.assertEqual(u + v, self.abUsage([7,6]))
def test_sub(self):
"""Usage sub should subtract one set of counts from the other"""
u, v = self.abUsage([1,2]), self.abUsage([6,4])
self.assertEqual(v-u, self.abUsage([5,2]))
def test_mul(self):
"""Usage mul should multiply usage by a scalar"""
u = self.abUsage([0,4])
self.assertEqual(u*3, self.abUsage([0,12]))
def test_div(self):
"""Usage div should divide usage by scalar (unsafely)"""
u = self.abUsage([0,4])
self.assertEqual(u/2, self.abUsage([0,2]))
self.assertEqual(u/8, self.abUsage([0.0,0.5]))
#note: don't need to divide by floating point to get fractions
self.assertEqual(u/8.0, self.abUsage([0.0,0.5]))
def test_scale_sum(self):
"""Usage scale_sum should scale usage to specified sum"""
u = self.abUsage([1,3])
self.assertEqual(u.scale_sum(12), self.abUsage([3.0, 9.0]))
self.assertEqual(u.scale_sum(1), self.abUsage([0.25,0.75]))
#default is sum to 1
self.assertEqual(u.scale_sum(), self.abUsage([0.25,0.75]))
def test_scale_max(self):
"""Usage scale_max should scale usage to specified max"""
u = self.abUsage([1,3])
self.assertEqual(u.scale_max(12), self.abUsage([4.0, 12.0]))
self.assertEqual(u.scale_max(1), self.abUsage([1/3.0,1.0]))
#default is max to 1
self.assertEqual(u.scale_max(), self.abUsage([1/3.0,1.0]))
def test_probs(self):
"""Usage probs should scale usage to sum to 1"""
u = self.abUsage([1,3])
self.assertEqual(u.probs(), self.abUsage([0.25,0.75]))
def test_randomIndices(self):
"""Usage randomIndices should return correct sequence."""
d = DnaUsage([0.25, 0.5, 0.1, 0.15])
s = d.randomIndices(7, [0, 0.49, 1, 0.74, 0.76, 0.86, 0.2])
self.assertEqual(s, array([0,1,3,1,2,3,0]))
s = d.randomIndices(10000)
u, c, a, g = [asarray(s==i, 'int32') for i in [0,1,2,3]]
assert 2300 < sum(u) < 2700
assert 4800 < sum(c) < 5200
assert 800 < sum(a) < 1200
assert 1300 < sum(g) < 1700
def test_fromSeqData(self):
"""Usage fromSeqData should construct from a sequence object w/ data"""
class o(object): pass
s = o()
s._data = array([0,0,0,1])
self.assertEqual(self.abUsage.fromSeqData(s), self.abUsage([3,1]))
def test_fromArray(self):
"""Usage fromArray should construct from array holding seq of indices"""
s = array([0,0,0,1])
self.assertEqual(self.abUsage.fromArray(s), self.abUsage([3,1]))
def test_get(self):
"""Usage get should behave like dict"""
u = self.abUsage([3,4])
self.assertEqual(u.get('a', 5), 3)
self.assertEqual(u.get('b', 5), 4)
self.assertEqual(u.get('x', 5), 5)
def test_values(self):
"""Usage values should return list of values in alphabet order"""
u = self.abUsage([3,4])
self.assertEqual(u.values(), [3,4])
def test_keys(self):
"""Usage keys should return list of symbols in alphabet order"""
u = self.abUsage([3,4])
self.assertEqual(u.keys(), ['a','b'])
def test_items(self):
"""Usage items should return list of key-value pairs"""
u = self.abUsage([3,4])
self.assertEqual(u.items(), [('a',3),('b',4)])
def test_entropy(self):
"""Usage entropy should calculate Shannon entropy in bits"""
#two equal choices implies one bit of entropy
u = RnaUsage([1,1,0,0])
self.assertEqual(u.entropy(), 1)
u = RnaUsage([10,10,0,0])
self.assertEqual(u.entropy(), 1)
#four equal choices implies two bits
u = RnaUsage([3,3,3,3])
self.assertEqual(u.entropy(), 2)
#only one choice -> no entropy
u = RnaUsage([3,0,0,0])
self.assertEqual(u.entropy(), 0)
#empty usage also has no entropy
u = RnaUsage([0,0,0,0])
self.assertEqual(u.entropy(), 0)
#calculated this one by hand
u = RnaUsage([.5,.3,.1,.1])
self.assertFloatEqual(u.entropy(),1.6854752972273346)
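test_entropy fixes the conventions: entropy is measured in bits, zero counts contribute nothing (0·log 0 is taken as 0), and an all-zero usage has entropy 0. A plain-Python sketch of H = -Σ p_i log2 p_i over normalized counts; the `entropy` function here is hypothetical and is not Usage.entropy itself.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a vector of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty usage carries no information
    h = 0.0
    for c in counts:
        if c > 0:  # terms with c == 0 contribute 0 (0 * log 0 -> 0)
            p = c / total
            h -= p * log2(p)
    return h


print(entropy([1, 1, 0, 0]))  # 1.0
print(entropy([3, 3, 3, 3]))  # 2.0
print(entropy([3, 0, 0, 0]))  # 0.0
print(round(entropy([.5, .3, .1, .1]), 10))  # 1.6854752972
```

The last value agrees with the hand calculation in the test: 0.5·1 + 0.3·log2(1/0.3) + 2·0.1·log2(10) ≈ 0.5 + 0.521 + 0.664 ≈ 1.685.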
class PairMatrixTests(TestCase):
"""Tests of the PairMatrix base class."""
def setUp(self):
"""Define standard alphabet and matrices for tests."""
self.ab = Alphabet('ab')
self.ab_pairs = self.ab*self.ab
self.empty = PairMatrix([0,0,0,0], self.ab_pairs)
self.named = PairMatrix([[1,2],[3,4]], self.ab_pairs, 'name')
def test_init(self):
"""PairMatrix init requires data and alphabet"""
#should only care about number of elements, not shape
p = PairMatrix([1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8], RnaPairs)
assert p.Alphabet is RnaPairs
self.assertEqual(len(p._data), 4)
self.assertEqual(len(p._data.flat), 16)
self.assertEqual(p._data[0], array([1,2,3,4]))
self.assertEqual(p._data[1], array([5,6,7,8]))
def test_init_bad(self):
"""PairMatrix init should fail if data wrong length"""
self.assertRaises(ValueError, PairMatrix, [1,2,3,4], RnaPairs)
#should also require alphabet
self.assertRaises(TypeError, PairMatrix, [1,2,3,4])
def test_toMatlab(self):
"""PairMatrix toMatlab should return correct format string"""
self.assertEqual(self.empty.toMatlab(), "m=[0.0 0.0;\n0.0 0.0];\n")
self.assertEqual(self.named.toMatlab(), \
"name=[1.0 2.0;\n3.0 4.0];\n")
def test_str(self):
"""PairMatrix __str__ should return string corresponding to data"""
self.assertEqual(str(self.named), str(self.named._data))
def test_repr(self):
"""PairMatrix __repr__ should return reconstructable string"""
self.assertEqual(repr(self.named), \
'PairMatrix('+ repr(self.named._data) + ',' +\
repr(self.ab_pairs)+",'name')")
def test_getitem(self):
"""PairMatrix __getitem__ should translate indices and get from array"""
n = self.named
self.assertEqual(n['a'], array([1,2]))
self.assertEqual(n['b'], array([3,4]))
self.assertEqual(n['a','a'], 1)
self.assertEqual(n['a','b'], 2)
self.assertEqual(n['b','a'], 3)
self.assertEqual(n['b','b'], 4)
#WARNING: m[a][b] doesn't work b/c indices not translated!
#must access as m[a,b] instead.
try:
x = n['a']['b']
except ValueError:
pass
#should work even if SubAlphabets not the same
a = Alphabet('ab')
x = Alphabet('xyz')
j = a * x
m = PairMatrix([1,2,3,4,5,6], j)
self.assertEqual(m['a','x'], 1)
self.assertEqual(m['a','y'], 2)
self.assertEqual(m['a','z'], 3)
self.assertEqual(m['b','x'], 4)
self.assertEqual(m['b','y'], 5)
self.assertEqual(m['b','z'], 6)
#should work even if SubAlphabets are different types
a = Alphabet([1,2,3])
b = Alphabet(['abc', 'xyz'])
j = a * b
m = PairMatrix([1,2,3,4,5,6], j)
self.assertEqual(m[1,'abc'], 1)
self.assertEqual(m[1,'xyz'], 2)
self.assertEqual(m[2,'abc'], 3)
self.assertEqual(m[2,'xyz'], 4)
self.assertEqual(m[3,'abc'], 5)
self.assertEqual(m[3,'xyz'], 6)
self.assertEqual(list(m[2]), [3,4])
#gives KeyError if single item not present in first level
self.assertRaises(KeyError, m.__getitem__, 'x')
def test_empty(self):
"""PairMatrix empty classmethod should produce correct class"""
p = PairMatrix.empty(self.ab_pairs)
self.assertEqual(p._data.flat, array([0,0,0,0]))
self.assertEqual(p._data, array([[0,0],[0,0]]))
self.assertEqual(p._data.shape, (2,2))
def test_eq(self):
"""PairMatrix test for equality should check all elements"""
p = self.ab_pairs
a = PairMatrix.empty(p)
b = PairMatrix.empty(p)
assert a is not b
self.assertEqual(a, b)
c = PairMatrix([1,2,3,4], p)
d = PairMatrix([1,2,3,4], p)
assert c is not d
self.assertEqual(c, d)
self.assertNotEqual(a, c)
#Note: still compare equal if alphabets are different
x = Alphabet('xy')
x = x*x
y = PairMatrix([1,2,3,4], x)
self.assertEqual(y, c)
#should check all elements, not just first
c = PairMatrix([1,1,1,1], p)
d = PairMatrix([1,1,1,4], p)
assert c is not d
self.assertNotEqual(c, d)
def test_ne(self):
"""PairMatrix test for inequality should check all elements"""
p = self.ab_pairs
a = PairMatrix.empty(p)
b = PairMatrix.empty(p)
c = PairMatrix([1,2,3,4], p)
d = PairMatrix([1,2,3,4], p)
assert a != c
assert a == b
assert c == d
#Note: still compare equal if alphabets are different
x = Alphabet('xy')
x = x*x
y = PairMatrix([1,2,3,4], x)
assert y == c
#should check all elements, not just first
c = PairMatrix([1,1,1,1], p)
d = PairMatrix([1,1,1,4], p)
assert c != d
def test_iter(self):
"""PairMatrix __iter__ should iterate over rows."""
p = self.ab_pairs
c = PairMatrix([1,2,3,4], p)
l = list(c)
self.assertEqual(len(l), 2)
self.assertEqual(list(l[0]), [1,2])
self.assertEqual(list(l[1]), [3,4])
def test_len(self):
"""PairMatrix __len__ should return number of rows"""
p = self.ab_pairs
c = PairMatrix([1,2,3,4], p)
self.assertEqual(len(c), 2)
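PairMatrix lookups translate a pair of symbols into a row index and a column index, which is why `n['a','b']` works while `n['a']['b']` does not: the single-key lookup returns a row that is indexed by position, not by symbol. A minimal sketch of that translation using one dict per axis; `PairTable` is a hypothetical class, not PairMatrix, and it stores rows as plain lists and accepts only a flat sequence of values.

```python
class PairTable:
    """Hypothetical 2-D table addressed by (row symbol, column symbol)."""

    def __init__(self, data, rows, cols):
        rows, cols = list(rows), list(cols)
        flat = list(data)  # this sketch accepts a flat sequence only
        n = len(cols)
        if len(flat) != len(rows) * n:
            raise ValueError("expected %d values, got %d"
                             % (len(rows) * n, len(flat)))
        # symbol -> position lookups for each axis
        self._row = {sym: i for i, sym in enumerate(rows)}
        self._col = {sym: j for j, sym in enumerate(cols)}
        self._data = [flat[i * n:(i + 1) * n] for i in range(len(rows))]

    def __getitem__(self, key):
        if isinstance(key, tuple):  # m[row_sym, col_sym] -> a single cell
            r, c = key
            return self._data[self._row[r]][self._col[c]]
        return self._data[self._row[key]]  # m[row_sym] -> a plain list row


m = PairTable([1, 2, 3, 4, 5, 6], [1, 2, 3], ["abc", "xyz"])
print(m[2, "xyz"])  # 4
print(m[2])         # [3, 4]
```

Because `m[2]` is an ordinary list here, `m[2]["xyz"]` raises `TypeError`; the equivalent PairMatrix lookup also fails, which is what the WARNING comment in test_getitem records.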
class CountsTests(TestCase):
"""Tests of the Counts class, including inferring counts from sequences."""
def test_toProbs(self):
"""Counts toProbs should return valid prob matrix."""
c = Counts([1,2,3,4,2,2,2,2,0.2,0.4,0.6,0.8,1,0,0,0], RnaPairs)
p = c.toProbs()
assert isinstance(p, Probs)
self.assertEqual(p, Probs([0.1,0.2,0.3,0.4,0.25,0.25,0.25,0.25, \
0.1,0.2,0.3,0.4,1.0,0.0,0.0,0.0], RnaPairs))
self.assertEqual(p['U','U'], 0.1)
self.assertEqual(p['G','U'], 1.0)
self.assertEqual(p['G','G'], 0.0)
def test_fromPair(self):
"""Counts fromPair should return correct counts."""
s = Counts.fromPair( RnaSequence('UCCGAUCGAUUAUCGGGUACGUA'), \
RnaSequence('GUCGAGUAUAGCGUACGGCUACG'),
RnaPairs)
assert isinstance(s, Counts)
vals = [
('U','U',0),('U','C',2.5),('U','A',1),('U','G',2.5),
('C','U',2.5),('C','C',1),('C','A',1),('C','G',0.5),
('A','U',1),('A','C',1),('A','A',1),('A','G',2),
('G','U',2.5),('G','C',0.5),('G','A',2),('G','G',2),
]
for i, j, val in vals:
self.assertFloatEqual(s[i,j], val)
#check that it works for big seqs
s = Counts.fromPair( RnaSequence('UCAG'*1000), \
RnaSequence('UGAG'*1000),
RnaPairs)
assert isinstance(s, Counts)
vals = [
('U','U',1000),('U','C',0),('U','A',0),('U','G',0),
('C','U',0),('C','C',0),('C','A',0),('C','G',500),
('A','U',0),('A','C',0),('A','A',1000),('A','G',0),
('G','U',0),('G','C',500),('G','A',0),('G','G',1000),
]
for i, j, val in vals:
self.assertFloatEqual(s[i,j], val)
#check that it works for codon seqs
s1 = ModelRnaCodonSequence('UUCGCG')
s2 = ModelRnaCodonSequence('UUUGGG')
c = Counts.fromPair(s1, s2, RNA.Alphabet.Triples**2)
self.assertEqual(c._data.sum(), 2)
self.assertEqual(c._data[0,1], 0.5)
self.assertEqual(c._data[1,0], 0.5)
self.assertEqual(c._data[55,63], 0.5)
self.assertEqual(c._data[63,55], 0.5)
def test_fromTriple(self):
"""Counts fromTriple should return correct counts."""
cft = Counts.fromTriple
rs = RnaSequence
A, C, G, U = map(rs, 'ACGU')
#counts a change when seq1 differs but seq2 and the outgroup agree
s = cft(A, C, C, RnaPairs)
assert isinstance(s, Counts)
self.assertEqual(s['C','A'], 1)
self.assertEqual(s['A','C'], 0)
self.assertEqual(s['C','C'], 0)
#try it with longer sequences
AAA, CCC = map(rs, ['AAA', 'CCC'])
s = cft(AAA, CCC, CCC, RnaPairs)
self.assertEqual(s['C','A'], 3)
self.assertEqual(s['A','C'], 0)
#doesn't count if all three differ
ACG, CGA, GAC = map(rs, ['ACG','CGA','GAC'])
s = cft(ACG, CGA, GAC, RnaPairs)
self.assertEqual(s['C','A'], 0)
self.assertEqual(s['A','C'], 0)
self.assertEqual(s, Counts.empty(RnaPairs))
#counts as no change if same as other sequence...
s = cft(AAA, AAA, CCC, RnaPairs)
self.assertEqual(s['A','A'], 3)
self.assertEqual(s['A','C'], 0)
#...or same as the outgroup
s = cft(AAA, CCC, AAA, RnaPairs)
self.assertEqual(s['A','A'], 3)
self.assertEqual(s['A','C'], 0)
#spot-check a mixed example
s = cft( \
rs('AUCGCUAGCAUACGUCA'),
rs('AAGCUGCGUAGCGCAUA'),
rs('GCGCAUAUGACGAUAGC'),
RnaPairs
)
vals = [
('U','U',1),('U','C',0),('U','A',0),('U','G',0),
('C','U',0),('C','C',0),('C','A',0),('C','G',1),
('A','U',1),('A','C',0),('A','A',4),('A','G',0),
('G','U',0),('G','C',1),('G','A',0),('G','G',1),
]
for i, j, val in vals:
self.assertFloatEqual(s[i,j], val)
#check a long sequence
s = cft( \
rs('AUCGCUAGCAUACGUCA'*1000),
rs('AAGCUGCGUAGCGCAUA'*1000),
rs('GCGCAUAUGACGAUAGC'*1000),
RnaPairs
)
vals = [
('U','U',1000),('U','C',0),('U','A',0),('U','G',0),
('C','U',0),('C','C',0),('C','A',0),('C','G',1000),
('A','U',1000),('A','C',0),('A','A',4000),('A','G',0),
('G','U',0),('G','C',1000),('G','A',0),('G','G',1000),
]
for i, j, val in vals:
self.assertFloatEqual(s[i,j], val)
#check that it works when forced to use both variants of fromTriple
s = cft( \
rs('AUCGCUAGCAUACGUCA'*1000),
rs('AAGCUGCGUAGCGCAUA'*1000),
rs('GCGCAUAUGACGAUAGC'*1000),
RnaPairs,
threshold=0 #forces "large" method
)
vals = [
('U','U',1000),('U','C',0),('U','A',0),('U','G',0),
('C','U',0),('C','C',0),('C','A',0),('C','G',1000),
('A','U',1000),('A','C',0),('A','A',4000),('A','G',0),
('G','U',0),('G','C',1000),('G','A',0),('G','G',1000),
]
for i, j, val in vals:
self.assertFloatEqual(s[i,j], val)
s = cft( \
rs('AUCGCUAGCAUACGUCA'*1000),
rs('AAGCUGCGUAGCGCAUA'*1000),
rs('GCGCAUAUGACGAUAGC'*1000),
RnaPairs,
threshold=1e12 #forces "small" method
)
vals = [
('U','U',1000),('U','C',0),('U','A',0),('U','G',0),
('C','U',0),('C','C',0),('C','A',0),('C','G',1000),
('A','U',1000),('A','C',0),('A','A',4000),('A','G',0),
('G','U',0),('G','C',1000),('G','A',0),('G','G',1000),
]
for i, j, val in vals:
self.assertFloatEqual(s[i,j], val)
#check that it works for codon seqs
s1 = ModelRnaCodonSequence('UUCGCG')
s2 = ModelRnaCodonSequence('UUUGGG')
s3 = s2
c = Counts.fromTriple(s1, s2, s3, RNA.Alphabet.Triples**2)
self.assertEqual(c._data.sum(), 2)
self.assertEqual(c._data[0,1], 1)
self.assertEqual(c._data[63,55], 1)
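The expected values in test_fromPair and test_fromTriple follow two counting rules. For a pair, a matching site adds 1 to (x, x) and a mismatch is split evenly, 0.5 to (x, y) and 0.5 to (y, x), because the direction of change is unknown. For a triple, the outgroup polarizes changes on the first sequence: a site where seq1 matches seq2 or the outgroup counts as no change, a site where only seq2 matches the outgroup is counted as (ancestor, base in seq1) with the ancestor taken to be that shared base, and sites where all three differ are skipped. A dictionary-based sketch of these rules, inferred from the tests rather than copied from Counts; `counts_from_pair` and `counts_from_triple` are hypothetical names.

```python
from collections import defaultdict


def counts_from_pair(seq1, seq2):
    """Symmetric substitution counts for two aligned sequences."""
    counts = defaultdict(float)
    for x, y in zip(seq1, seq2):
        if x == y:
            counts[x, x] += 1
        else:  # direction unknown: split the change between (x, y) and (y, x)
            counts[x, y] += 0.5
            counts[y, x] += 0.5
    return dict(counts)


def counts_from_triple(seq1, seq2, outgroup):
    """Directional counts for seq1, using the outgroup to infer the ancestor."""
    counts = defaultdict(float)
    for x, y, o in zip(seq1, seq2, outgroup):
        if x == y or x == o:
            counts[x, x] += 1  # no change on the seq1 lineage
        elif y == o:
            counts[y, x] += 1  # ancestor inferred as y, changed to x
        # otherwise all three differ and the ancestor is ambiguous: skip
    return dict(counts)


pair = counts_from_pair("UCAG" * 1000, "UGAG" * 1000)
print(pair["C", "G"], pair["G", "C"], pair["U", "U"])  # 500.0 500.0 1000.0

triple = counts_from_triple("AUCGCUAGCAUACGUCA",
                            "AAGCUGCGUAGCGCAUA",
                            "GCGCAUAUGACGAUAGC")
print(sorted(triple.items()))
```

On the mixed 17-site example from test_fromTriple this gives 4 for (A, A) and 1 each for (U, U), (C, G), (A, U), (G, C) and (G, G), matching the expected table.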
class ProbsTests(TestCase):
"""Tests of the Probs class."""
def setUp(self):
"""Define an alphabet and some probs."""
self.ab = Alphabet('ab')
self.ab_pairs = self.ab**2
def test_isValid(self):
"""Probs isValid should return True if it's a prob matrix"""
a = self.ab_pairs
m = Probs([0.5,0.5,1,0], a)
self.assertEqual(m.isValid(), True)
#fails if don't sum to 1
m = Probs([0.5, 0, 1, 0], a)
self.assertEqual(m.isValid(), False)
#fails if negative elements
m = Probs([1, -1, 0, 1], a)
self.assertEqual(m.isValid(), False)
def test_makeModel(self):
"""Probs makeModel should return correct substitution pattern"""
a = Alphabet('abc')**2
m = Probs([0.5,0.25,0.25,0.1,0.8,0.1,0.3,0.6,0.1], a)
obs = m.makeModel(array([0,1,1,0,2,2]))
exp = array([[0.5,0.25,0.25],[0.1,0.8,0.1],[0.1,0.8,0.1],\
[0.5,0.25,0.25],[0.3,0.6,0.1],[0.3,0.6,0.1]])
self.assertEqual(obs, exp)
def test_mutate(self):
"""Probs mutate should return correct vector from input vector"""
a = Alphabet('abc')**2
m = Probs([0.5,0.25,0.25,0.1,0.8,0.1,0.3,0.6,0.1], a)
#because of fp math in accumulate, can't predict boundaries exactly
#so add/subtract eps to get the result we expect
eps = 1e-6
# a b b a c c a b c
seq = array([0,1,1,0,2,2,0,1,2])
random_vec = array([0,.01,.8-eps,1,1,.3,.05,.9+eps,.95])
self.assertEqual(m.mutate(seq, random_vec), \
# a a b c c a a c c
array([0,0,1,2,2,0,0,2,2]))
#check that freq. distribution is about right
seqs = array([m.mutate(seq) for i in range(1000)])
#WARNING: bool operators return byte arrays, whose sums wrap at 256!
zero_count = asarray(seqs == 0, 'int32')
sums = sum(zero_count, axis=0)
#expect: 500, 100, 100, 500, 300, 300, 500, 100, 300
#std dev = sqrt(npq), which is sqrt(250), sqrt(90), sqrt(210)
means = array([500, 100, 100, 500, 300, 300, 500, 100, 300])
var = array([250, 90, 90, 250, 210, 210, 250, 90, 210])
#allow +/- 6 standard deviations so this randomized check rarely fails
six_sd = 6 * sqrt(var)
for obs, exp, tol in zip(sums, means, six_sd):
assert exp - tol < obs < exp + tol
def test_toCounts(self):
"""Probs toCounts should return counts object w/ right numbers"""
a = Alphabet('abc')**2
m = Probs([0.5,0.25,0.25,0.1,0.8,0.1,0.3,0.6,0.1], a)
obs = m.toCounts(30)
assert isinstance(obs, Counts)
exp = Counts([[5.,2.5,2.5,1,8,1,3,6,1]], a)
self.assertEqual(obs, exp)
def test_toRates(self):
"""Probs toRates should return log of probs, optionally normalized"""
a = Alphabet('abc')**2
p = Probs([0.9,0.05,0.05,0.1,0.85,0.05,0.02,0.02,0.96], a)
assert p.isValid()
r = p.toRates()
assert isinstance(r, Rates)
assert r.isValid()
assert not r.isComplex()
self.assertEqual(r._data, logm(p._data))
r_norm = p.toRates(normalize=True)
self.assertFloatEqual(trace(r_norm._data), -1.0)
def test_random_p_matrix(self):
"""Probs random should return a random Probs matrix whose rows sum to 1"""
for i in range(NUM_TESTS):
p = Probs.random(RnaPairs)._data
for row in p:
self.assertFloatEqual(sum(row), 1.0)
#length should be 4 by default
self.assertEqual(len(p), 4)
self.assertEqual(len(p[0]), 4)
def test_random_p_matrix_diag(self):
"""Probs random should work with a scalar diagonal"""
#if diagonal is 1, off-diagonal elements should be 0
for i in range(NUM_TESTS):
p = Probs.random(RnaPairs, 1)._data
self.assertEqual(p, identity(4, 'd'))
#if diagonal is between 0 and 1, rows should sum to 1
for i in range(NUM_TESTS):
p = Probs.random(RnaPairs, 0.5)._data
for i in range(4):
self.assertFloatEqual(sum(p[i]), 1.0)
self.assertEqual(p[i][i], 0.5)
assert min(p[i]) >= 0
assert max(p[i]) <= 1
#if diagonal > 1, rows should still sum to 1
for i in range(NUM_TESTS):
p = Probs.random(RnaPairs, 2)._data
for i in range(4):
self.assertEqual(p[i][i], 2.0)
self.assertFloatEqual(sum(p[i]), 1.0)
assert min(p[i]) < 0
def test_random_p_matrix_diag_vector(self):
"""Probs random should work with a vector diagonal"""
for i in range(NUM_TESTS):
diag = [0, 0.2, 0.6, 1.0]
p = Probs.random(RnaPairs, diag)._data
for i, d, row in zip(range(4), diag, p):
self.assertFloatEqual(sum(row), 1.0)
self.assertEqual(row[i], diag[i])
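test_mutate pins down how a random draw picks the new state: the row of P for the current state is turned into cumulative probabilities, and the first column whose cumulative value reaches the draw wins, which is why the test nudges draws by `eps` around the boundaries. A stdlib sketch of that inverse-CDF lookup; `mutate` here is a hypothetical function, not Probs.mutate. It clamps the index because rounding can leave the last cumulative sum just below 1.0 (in double precision, 0.3 + 0.6 + 0.1 accumulates to 0.9999999999999999).

```python
import random
from bisect import bisect_left
from itertools import accumulate


def mutate(seq, probs, random_vec=None):
    """Replace each state i in seq by a state drawn from row i of probs."""
    if random_vec is None:
        random_vec = [random.random() for _ in seq]
    cumulative = [list(accumulate(row)) for row in probs]
    result = []
    for state, r in zip(seq, random_vec):
        row = cumulative[state]
        # first column whose cumulative probability is >= r; the clamp covers
        # a final cumulative sum that rounds to just under 1.0
        result.append(min(bisect_left(row, r), len(row) - 1))
    return result


p = [[0.5, 0.25, 0.25],
     [0.1, 0.8, 0.1],
     [0.3, 0.6, 0.1]]
eps = 1e-6
seq = [0, 1, 1, 0, 2, 2, 0, 1, 2]
draws = [0, .01, .8 - eps, 1, 1, .3, .05, .9 + eps, .95]
print(mutate(seq, p, draws))  # [0, 0, 1, 2, 2, 0, 0, 2, 2]
```

The same lookup with a single probability vector is what Usage randomIndices is tested for above.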
class RatesTests(TestCase):
"""Tests of the Rates class."""
def setUp(self):
"""Define standard alphabets."""
self.abc = Alphabet('abc')
self.abc_pairs = self.abc**2
def test_init(self):
"""Rates init should take additional parameter to normalize"""
r = Rates([-2,1,1,0,-1,1,0,0,0], self.abc_pairs)
self.assertEqual(r._data, array([[-2,1,1],[0,-1,1],[0,0,0]]))
r = Rates([-2.5,1,1,0,-1,1,0,0,0], self.abc_pairs)
self.assertEqual(r._data, array([[-2.5,1.,1.],[0.,-1.,1.],[0.,0.,0.]]))
r = Rates([-2,1,1,0,-1,1,2,0,-1], self.abc_pairs, normalize=True)
self.assertEqual(r._data, \
array([[-0.5,.25,.25],[0.,-.25,.25],[.5,0.,-.25]]))
def test_isComplex(self):
"""Rates isComplex should return True if complex elements"""
r = Rates([0,0,0.1j,0,0,0,0,0,0], self.abc_pairs)
assert r.isComplex()
r = Rates([0,0,0.1,0,0,0,0,0,0], self.abc_pairs)
assert not r.isComplex()
def test_isSignificantlyComplex(self):
"""Rates isSignificantlyComplex should be true if large imag component"""
r = Rates([0,0,0.2j,0,0,0,0,0,0], self.abc_pairs)
assert r.isSignificantlyComplex()
assert r.isSignificantlyComplex(0.01)
assert not r.isSignificantlyComplex(0.2)
assert not r.isSignificantlyComplex(0.3)
r = Rates([0,0,0.1,0,0,0,0,0,0], self.abc_pairs)
assert not r.isSignificantlyComplex()
assert not r.isSignificantlyComplex(1e-30)
assert not r.isSignificantlyComplex(1e3)
def test_isValid(self):
"""Rates isValid should check row sums and neg off-diags"""
r = Rates([-2,1,1,0,-1,1,0,0,0], self.abc_pairs)
assert r.isValid()
r = Rates([0,0,0,0,0,0,0,0,0], self.abc_pairs)
assert r.isValid()
#not valid if negative off-diagonal
r = Rates([-2,-1,3,1,-1,0,2,2,-4], self.abc_pairs)
assert not r.isValid()
#not valid if rows don't all sum to 0
r = Rates([0,0.0001,0,0,0,0,0,0,0], self.abc_pairs)
assert not r.isValid()
def test_normalize(self):
"""Rates normalize should return normalized copy of self where trace=-1"""
r = Rates([-2,1,1,0,-1,1,2,0,-1], self.abc_pairs)
n = r.normalize()
self.assertEqual(n._data, \
array([[-0.5,.25,.25],[0.,-.25,.25],[.5,0.,-.25]]))
#check that we didn't change the original
assert n._data is not r._data
self.assertEqual(r._data, \
array([[-2,1,1,],[0,-1,1,],[2,0,-1]]))
def test_toProbs(self):
"""Rates toProbs should return correct probability matrix"""
a = self.abc_pairs
p = Probs([0.75, 0.1, 0.15, 0.2, 0.7, 0.1, 0.05, 0.1, 0.85], a)
q = p.toRates()
self.assertEqual(q._data, logm(p._data))
p2 = q.toProbs()
self.assertFloatEqual(p2._data, p._data)
#test a case that didn't work for DNA
q = Rates(array(
[[-0.64098451, 0.0217681 , 0.35576469, 0.26345171],
[ 0.31144238, -0.90915091, 0.25825858, 0.33944995],
[ 0.01578521, 0.43162879, -0.99257581, 0.54516182],
[ 0.13229986, 0.04027147, 0.05817791, -0.23074925]]),
DnaPairs)
self.assertFloatEqual(q.toProbs(0.5)._data, expm(q._data)(t=0.5))
def test_timeForSimilarity(self):
"""Rates timeForSimilarity should return correct time"""
a = self.abc_pairs
p = Probs([0.75, 0.1, 0.15, 0.2, 0.7, 0.1, 0.05, 0.15, 0.8], a)
q = p.toRates()
d = 0.5
t = q.timeForSimilarity(d)
x = expm(q._data)(t)
self.assertFloatEqual(average(diagonal(x), axis=0), d)
t = q.timeForSimilarity(d, array([1/3.0]*3))
x = expm(q._data)(t)
self.assertFloatEqual(average(diagonal(x), axis=0), d)
self.assertEqual(q.timeForSimilarity(1), 0)
def test_toSimilarProbs(self):
"""Rates toSimilarProbs should match individual steps"""
a = self.abc_pairs
p = Probs([0.75, 0.1, 0.15, 0.2, 0.7, 0.1, 0.05, 0.15, 0.8], a)
q = p.toRates()
self.assertEqual(q.toSimilarProbs(0.5), \
q.toProbs(q.timeForSimilarity(0.5)))
#test a case that didn't work for DNA
q = Rates(array(
[[-0.64098451, 0.0217681 , 0.35576469, 0.26345171],
[ 0.31144238, -0.90915091, 0.25825858, 0.33944995],
[ 0.01578521, 0.43162879, -0.99257581, 0.54516182],
[ 0.13229986, 0.04027147, 0.05817791, -0.23074925]]),
DnaPairs)
p = q.toSimilarProbs(0.66)
self.assertFloatEqual(average(diagonal(p._data), axis=0), 0.66)
def test_random_q_matrix(self):
"""Rates random should return matrix of correct size"""
for i in range(NUM_TESTS):
q = Rates.random(RnaPairs)._data
self.assertEqual(len(q), 4)
self.assertEqual(len(q[0]), 4)
for row in q:
self.assertFloatEqual(sum(row), 0.0)
assert min(row) < 0
assert max(row) > 0
l = list(row)
l.sort()
assert min(l[1:]) >= 0
assert max(l[1:]) <= 1
def test_random_q_matrix_diag(self):
"""Rates random should set diagonal correctly from scalar"""
for i in range(NUM_TESTS):
q = Rates.random(RnaPairs, -1)._data
self.assertEqual(len(q), 4)
for i, row in enumerate(q):
self.assertFloatEqual(sum(row), 0)
self.assertEqual(row[i], -1)
assert max(row) <= 1
l = list(row)
l.sort()
assert min(l[1:]) >= 0
assert max(l[1:]) <= 1
for i in range(NUM_TESTS):
q = Rates.random(RnaPairs, -5)._data
self.assertEqual(len(q), 4)
for i, row in enumerate(q):
self.assertFloatEqual(sum(row), 0)
self.assertEqual(row[i], -5)
assert max(row) <= 5
l = list(row)
l.sort()
assert min(l[1:]) >= 0
assert max(l[1:]) <= 5
def test_random_q_matrix_diag_vector(self):
"""Rates random should init with vector as diagonal"""
diag = [1, -1, 2, -2]
for i in range(NUM_TESTS):
q = Rates.random(RnaPairs, diag)._data
for i, d, row in zip(range(4), diag, q):
self.assertFloatEqual(sum(row, axis=0), 0.0)
self.assertEqual(row[i], diag[i])
def test_fixNegsDiag(self):
"""Rates fixNegsDiag should fix negatives by adding to diagonal"""
q = Rates([[-6,2,2,2],[-6,-2,4,4],[2,2,-6,2],[4,4,-2,-6]], RnaPairs)
m = q.fixNegsDiag()._data
self.assertEqual(m,array([[-6,2,2,2],[0,-8,4,4],[2,2,-6,2],[4,4,0,-8]]))
def test_fixNegsEven(self):
"""Rates fixNegsEven should fix negatives by adding evenly to others"""
q = Rates([[-6,2,2,2],[-3,-2,3,2],[-2,-2,-6,2],[4,4,-6,-2]], RnaPairs)
m = q.fixNegsEven()._data
self.assertEqual(m,array([[-6,2,2,2],[0,-3,2,1],[0,0,-0,0],[2,2,0,-4]]))
def test_fixNegsFmin(self):
"""Rates fixNegsFmin should fix negatives using fmin method"""
q = Rates(array([[-0.28936029, 0.14543346, -0.02648614, 0.17041297],
[ 0.00949624, -0.31186005, 0.17313171, 0.1292321 ],
[ 0.10443209, 0.16134479, -0.30480186, 0.03902498],
[ 0.01611264, 0.12999161, 0.15558259, -0.30168684]]), DnaPairs)
r = q.fixNegsFmin()
assert not q.isValid()
assert r.isValid()
def test_fixNegsConstrainedOpt(self):
"""Rates fixNegsConstrainedOpt should fix negatives w/ constrained opt"""
q = Rates(array([[-0.28936029, 0.14543346, -0.02648614, 0.17041297],
[ 0.00949624, -0.31186005, 0.17313171, 0.1292321 ],
[ 0.10443209, 0.16134479, -0.30480186, 0.03902498],
[ 0.01611264, 0.12999161, 0.15558259, -0.30168684]]), DnaPairs)
r = q.fixNegsConstrainedOpt()
assert not q.isValid()
assert r.isValid()
def test_fixNegsReflect(self):
"""Rates fixNegsReflect should reflect negatives across diagonal"""
ab = Alphabet('ab')**2
#should leave matrix alone if no off-diagonal elements
q = Rates([0,0,1,-1], ab)
self.assertEqual(q.fixNegsReflect()._data, array([[0,0],[1,-1]]))
q = Rates([-2,2,1,-1], ab)
self.assertEqual(q.fixNegsReflect()._data, array([[-2,2],[1,-1]]))
#should work if precisely one off-diag element in a pair is negative
q = Rates([2,-2,1,-1], ab)
self.assertEqual(q.fixNegsReflect()._data, array([[0,0],[3,-3]]))
q = Rates([-1,1,-2,2], ab)
self.assertEqual(q.fixNegsReflect()._data, array([[-3,3],[0,-0]]))
#should work if both off-diag elements in a pair are negative
q = Rates([1,-1,-2,2], ab)
self.assertEqual(q.fixNegsReflect()._data, array([[-2,2],[1,-1]]))
q = Rates([2,-2,-1,1], ab)
self.assertEqual(q.fixNegsReflect()._data, array([[-1,1],[2,-2]]))
q = Rates([[ 0, 3, -2, -1],
[ 2, -1, 2, -3],
[-1, -1, 2, 0],
[-3, 2, 0, 1]], RnaPairs)
q2 = q.fixNegsReflect()
self.assertEqual(q2._data, \
array([[-7, 3, 1, 3],
[ 2, -5, 3, 0],
[ 2, 0, -2, 0],
[ 1, 5, 0, -6]]))
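# The expectations in test_fixNegsReflect are consistent with a simple rule:
# each negative off-diagonal rate q[i][j] is zeroed and its magnitude is
# added to the mirror entry q[j][i]; the diagonal is then reset so every row
# sums to zero. reflect_negatives is a hypothetical helper, a minimal numpy
# sketch of that rule inferred from the test data; it is not the
# Rates.fixNegsReflect source.
import numpy as np


def reflect_negatives(q):
    """Return a copy of square rate matrix q with negative rates reflected."""
    q = np.asarray(q, dtype=float)
    positive = np.where(q > 0, q, 0.0)  # positive rates are kept in place
    negative = np.where(q < 0, -q, 0.0)  # magnitudes of the negative rates
    # transposing moves the magnitude of a negative q[i][j] to position [j][i]
    fixed = positive + negative.T
    np.fill_diagonal(fixed, 0.0)
    # rows of a rate matrix sum to zero, so rebuild the diagonal last
    np.fill_diagonal(fixed, -fixed.sum(axis=1))
    return fixed

# Applied to the 4x4 RnaPairs case above, this sketch gives the same
# [[-7, 3, 1, 3], [2, -5, 3, 0], [2, 0, -2, 0], [1, 5, 0, -6]] result.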
class GoldmanTests(TestCase):
def setUp(self):
pass
def test_goldman_q_dna_pair(self):
"""Should return expected rate matrix"""
seq1 = "ATGCATGCATGC"
seq2 = "AAATTTGGGCCC"
expected = array([[-(2/3.0), (1/3.0), (1/3.0), 0],
[(1/3.0), -(2/3.0), 0, (1/3.0)],
[(1/3.0), 0, -(2/3.0), (1/3.0)],
[0, (1/3.0), (1/3.0), -(2/3.0)]])
observed = goldman_q_dna_pair(seq1, seq2)
self.assertFloatEqual(observed, expected)
def test_goldman_q_rna_pair(self):
"""Should return expected rate matrix"""
seq1 = "AUGCAUGCAUGC"
seq2 = "AAAUUUGGGCCC"
expected = array([[-(2/3.0), (1/3.0), (1/3.0), 0],
[(1/3.0), -(2/3.0), 0, (1/3.0)],
[(1/3.0), 0, -(2/3.0), (1/3.0)],
[0, (1/3.0), (1/3.0), -(2/3.0)]])
observed = goldman_q_rna_pair(seq1, seq2)
self.assertFloatEqual(observed, expected)
def test_goldman_q_dna_triple(self):
"""Should return expected rate matrix"""
seq1 = "ATGCATGCATGC"
seq2 = "AAATTTGGGCCC"
outgroup = "AATTGGCCAATT"
expected = array([[-(1/2.0), (1/2.0), 0, 0],
[0, 0, 0, 0],
[(1/3.0), 0, -(1/3.0), 0],
[0, 0, 0, 0]])
observed = goldman_q_dna_triple(seq1, seq2, outgroup)
self.assertFloatEqual(observed, expected)
def test_goldman_q_rna_triple(self):
"""Should return expected rate matrix"""
seq1 = "AUGCAUGCAUGC"
seq2 = "AAAUUUGGGCCC"
outgroup = "AAUUGGCCAAUU"
expected = array([[-(1/2.0), (1/2.0), 0, 0],
[0, 0, 0, 0],
[(1/3.0), 0, -(1/3.0), 0],
[0, 0, 0, 0]])
observed = goldman_q_rna_triple(seq1, seq2, outgroup)
self.assertFloatEqual(observed, expected)
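# For the two-sequence cases in GoldmanTests, each expected row equals the
# observed substitution frequencies minus the identity: P[x][y] is the
# fraction of sites with base x in seq1 that carry base y in seq2, and the
# diagonal is reset so every row sums to zero. The base order is assumed to
# be T, C, A, G, which matches the zeros in the expected matrices.
# pair_q_estimate is a hypothetical helper that reproduces those numbers; it
# only illustrates the arithmetic, is not the goldman_q_dna_pair source, and
# does not cover the outgroup-based triple functions.
import numpy as np


def pair_q_estimate(seq1, seq2, alphabet="TCAG"):
    """Estimate a rate matrix from aligned seq1 -> seq2 substitutions."""
    index = dict((base, i) for i, base in enumerate(alphabet))
    n = len(alphabet)
    counts = np.zeros((n, n))
    for x, y in zip(seq1, seq2):
        counts[index[x], index[y]] += 1  # one site: x in seq1, y in seq2
    row_totals = counts.sum(axis=1, keepdims=True)
    # row-normalise; rows for bases absent from seq1 stay all zero
    q = np.divide(counts, row_totals, out=np.zeros_like(counts),
                  where=row_totals > 0)
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))  # rows of a rate matrix sum to zero
    return q

# For RNA the same sketch is called with alphabet="UCAG".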
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_parse/__init__.py
#!/usr/bin/env python
__all__ = ['test_aaindex',
'test_agilent_microarray',
'test_blast',
'test_bpseq',
'test_cigar',
'test_clustal',
'test_cutg',
'test_dialign',
'test_ebi',
'test_fasta',
'test_fastq',
'test_genbank',
'test_illumina_sequence',
'test_locuslink',
'test_mage',
'test_meme',
'test_msms',
'test_ncbi_taxonomy',
'test_nexus',
'test_pdb',
'test_structure',
'test_phylip',
'test_record',
'test_record_finder',
'test_rfam',
'test_rna_fold',
'test_rnaview',
'test_sprinzl',
'test_tree',
'test_unigene',
'test_stride']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Catherine Lozupone", "Gavin Huttley",
"Rob Knight", "Sandra Smit", "Micah Hamady",
"Hua Ying", "Greg Caporaso",
"Zongzhi Liu", "Jason Carnes", "Peter Maxwell",
"Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_parse/test_aaindex.py
#!/usr/bin/env python
"""Tests of the AAIndex parser.
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.aaindex import AAIndex1Parser, AAIndex2Parser,\
AAIndexRecord, AAIndex1Record, AAIndex2Record, AAIndex1FromFiles,\
AAIndex2FromFiles
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "caporaso@colorado.edu"
__status__ = "Production"
class test_aaindex1_parser(TestCase):
""" Tests AAIndex1Parser class """
def setUp(self):
""" Set up some variables """
self._fake_file = list(fake_file_aaindex1.split('\n'))
self.AAIndexObjects = AAIndex1FromFiles(self._fake_file)
def test_init(self):
""" AAI1: Test that init runs without error """
aa1p = AAIndex1Parser()
def test_read_file_as_list(self):
"""AAI1: Test that a file given as a list of lines is parsed """
aap = AAIndex1Parser()
AAIndexObjects = aap(self._fake_file)
def test_correct_num_of_records(self):
"""AAI1: Test that one object is created per record """
self.assertEqual(6, len(self.AAIndexObjects))
def test_ID_entries(self):
""" AAI1: Test ID Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].ID, 'ANDN920101')
self.assertEqual(self.AAIndexObjects['ARGP820103'].ID, 'ARGP820103')
self.assertEqual(self.AAIndexObjects['JURD980101'].ID, 'JURD980101')
def test_single_line_Description_entries(self):
""" AAI1: Test Single Line Description Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].Description,\
'alpha-CH chemical shifts (Andersen et al., 1992)')
self.assertEqual(self.AAIndexObjects['ARGP820103'].Description,\
'Membrane-buried preference parameters (Argos et al., 1982)')
def test_multi_line_Description_entries(self):
""" AAI1: Test Multi Line Description Entries """
self.assertEqual(self.AAIndexObjects['JURD980101'].Description,\
'Modified Kyte-Doolittle hydrophobicity scale (Juretic et al., 1998)')
def test_LITDB_entries(self):
""" AAI1: Test LITDB Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].LITDBEntryNum,\
'LIT:1810048b PMID:1575719')
self.assertEqual(self.AAIndexObjects['ARGP820103'].LITDBEntryNum,\
'LIT:0901079b PMID:7151796')
self.assertEqual(self.AAIndexObjects['JURD980101'].LITDBEntryNum,\
'')
def test_Authors_entries(self):
""" AAI1: Test Authors Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].Authors,\
'Andersen, N.H., Cao, B. and Chen, C.')
self.assertEqual(self.AAIndexObjects['ARGP820103'].Authors,\
'Argos, P., Rao, J.K.M. and Hargrave, P.A.')
self.assertEqual(self.AAIndexObjects['JURD980101'].Authors,\
'Juretic, D., Lucic, B., Zucic, D. and Trinajstic, N.')
def test_mult_line_Title_entries(self):
""" AAI1: Test Multi Line Title Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].Title,\
'Peptide/protein structure analysis using the chemical shift index ' +\
'method: upfield alpha-CH values reveal dynamic helices and aL sites')
self.assertEqual(self.AAIndexObjects['JURD980101'].Title,\
'Protein transmembrane structure: recognition and prediction by ' +\
'using hydrophobicity scales through preference functions')
def test_sing_line_Title_entries(self):
""" AAI1: Test Single Line Title Entries """
self.assertEqual(self.AAIndexObjects['ARGP820103'].Title,\
'Structural prediction of membrane-bound proteins')
def test_Citation_entries(self):
""" AAI1: Test Citation Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].Citation,\
'Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)')
self.assertEqual(self.AAIndexObjects['ARGP820103'].Citation,\
'Eur. J. Biochem. 128, 565-575 (1982)')
self.assertEqual(self.AAIndexObjects['JURD980101'].Citation,\
'Theoretical and Computational Chemistry, 5, 405-445 (1998)')
def test_Comments_entries(self):
""" AAI1: Test Comments Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].Comments,\
'')
self.assertEqual(self.AAIndexObjects['ARGP820103'].Comments,\
'')
self.assertEqual(self.AAIndexObjects['JURD980101'].Comments,\
'')
self.assertEqual(self.AAIndexObjects['TSAJ990102'].Comments,\
'(Cyh 113.7)')
def test_single_line_Correlating_entries(self):
""" AAI1: Test single line Correlating Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].\
Correlating['BUNA790102'], 0.949)
def test_empty_Correlating_entries(self):
""" AAI1: Test empty Correlating Entries """
self.assertEqual(self.AAIndexObjects['WILM950104'].Correlating, {})
def test_multi_line_Correlating_entries(self):
""" AAI1: Test multi line Correlating Entries """
self.assertEqual(self.AAIndexObjects['ARGP820103'].\
Correlating['ARGP820102'], 0.961)
self.assertEqual(self.AAIndexObjects['ARGP820103'].\
Correlating['MIYS850101'], 0.822)
self.assertEqual(self.AAIndexObjects['ARGP820103'].\
Correlating['JURD980101'], 0.800)
self.assertEqual(self.AAIndexObjects['JURD980101'].\
Correlating['KYTJ820101'], 0.996)
self.assertEqual(self.AAIndexObjects['JURD980101'].\
Correlating['NADH010101'], 0.925)
self.assertEqual(self.AAIndexObjects['JURD980101'].\
Correlating['OOBM770101'], -0.903)
def test_Data_entries(self):
""" AAI1: Test Data Entries """
self.assertEqual(self.AAIndexObjects['ANDN920101'].Data['A'],\
4.35)
self.assertEqual(self.AAIndexObjects['ANDN920101'].Data['Q'],\
4.37)
self.assertEqual(self.AAIndexObjects['ANDN920101'].Data['V'],\
3.95)
self.assertEqual(self.AAIndexObjects['ARGP820103'].Data['A'],\
1.56)
self.assertEqual(self.AAIndexObjects['ARGP820103'].Data['Q'],\
0.51)
self.assertEqual(self.AAIndexObjects['ARGP820103'].Data['V'],\
1.14)
self.assertEqual(self.AAIndexObjects['JURD980101'].Data['A'],\
1.10)
self.assertEqual(self.AAIndexObjects['JURD980101'].Data['Q'],\
-3.68)
self.assertEqual(self.AAIndexObjects['JURD980101'].Data['V'],\
4.2)
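# In the AAindex1 fixture below, the "I" header pairs residues as A/L, R/K,
# ..., I/V; the first data line holds the values for the left-hand residues
# and the second line those for the right-hand residues. "C" lines list
# accession/correlation pairs and may wrap onto continuation lines, or be
# empty. parse_index_values and parse_correlating are hypothetical helpers
# sketching that layout as read from the fixture; they are not part of
# cogent.parse.aaindex.


def parse_index_values(header, first_line, second_line):
    """Map each residue in an AAindex1 "I" block to its float value."""
    pairs = [p.split("/") for p in header.split()[1:]]  # skip the "I" tag
    data = {}
    for (left, right), v1, v2 in zip(pairs, first_line.split(),
                                     second_line.split()):
        data[left] = float(v1)
        data[right] = float(v2)
    return data


def parse_correlating(lines):
    """Turn the lines of a "C" field into a dict of accession -> value."""
    tokens = []
    for line in lines:
        tokens.extend(line.split())
    if tokens and tokens[0] == "C":
        tokens = tokens[1:]  # drop the field tag on the first line
    # the remaining tokens alternate: accession, coefficient, accession, ...
    return dict((acc, float(val)) for acc, val in zip(tokens[::2], tokens[1::2]))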
class test_aaindex2_parser(TestCase):
def setUp(self):
""" Set up some variables """
self._fake_file = list(fake_file_aaindex2.split('\n'))
self.AAIndexObjects = AAIndex2FromFiles(self._fake_file)
def test_init(self):
""" AAI2: Test that init runs without error """
aa2p = AAIndex2Parser()
def test_read_file_as_list(self):
"""AAI2: Test that a file given as a list of lines is parsed """
aap = AAIndex2Parser()
AAIndexObjects = aap(self._fake_file)
def test_correct_num_of_records(self):
"""AAI2: Test that one object is created per record """
self.assertEqual(6, len(self.AAIndexObjects))
def test_ID_entries(self):
""" AAI2: Test ID Entries """
self.assertEqual(self.AAIndexObjects['ALTS910101'].ID, 'ALTS910101')
self.assertEqual(self.AAIndexObjects['BENS940103'].ID, 'BENS940103')
self.assertEqual(self.AAIndexObjects['QUIB020101'].ID, 'QUIB020101')
def test_Description_entries(self):
""" AAI2: Test Description Entries """
self.assertEqual(self.AAIndexObjects['ALTS910101'].Description,\
'The PAM-120 matrix (Altschul, 1991)')
self.assertEqual(self.AAIndexObjects['BENS940103'].Description,\
'Log-odds scoring matrix collected in 74-100 PAM (Benner et al., '+\
'1994)')
self.assertEqual(self.AAIndexObjects['QUIB020101'].Description,\
'STROMA score matrix for the alignment of known distant homologs ' +\
'(Qian-Goldstein, 2002)')
def test_LITDB_entries(self):
""" AAI2: Test LITDB Entries """
self.assertEqual(self.AAIndexObjects['ALTS910101'].LITDBEntryNum,\
'LIT:1713145 PMID:2051488')
self.assertEqual(self.AAIndexObjects['BENS940103'].LITDBEntryNum,\
'LIT:2023094 PMID:7700864')
self.assertEqual(self.AAIndexObjects['QUIB020101'].LITDBEntryNum,\
'PMID:12211027')
def test_Authors_entries(self):
""" AAI2: Test Authors Entries """
self.assertEqual(self.AAIndexObjects['ALTS910101'].Authors,\
'Altschul, S.F.')
self.assertEqual(self.AAIndexObjects['BENS940103'].Authors,\
'Benner, S.A., Cohen, M.A. and Gonnet, G.H.')
self.assertEqual(self.AAIndexObjects['QUIB020101'].Authors,\
'Qian, B. and Goldstein, R.A.')
def test_Title_entries(self):
""" AAI2: Test Title Entries """
self.assertEqual(self.AAIndexObjects['ALTS910101'].Title,\
'Amino acid substitution matrices from an information theoretic ' +\
'perspective')
self.assertEqual(self.AAIndexObjects['BENS940103'].Title,\
'Amino acid substitution during functionally constrained divergent ' +\
'evolution of protein sequences')
self.assertEqual(self.AAIndexObjects['QUIB020101'].Title,\
'Optimization of a new score function for the generation of '+\
'accurate alignments')
def test_Citation_entries(self):
""" AAI2: Test citation entries """
self.assertEqual(self.AAIndexObjects['ALTS910101'].Citation,\
'J. Mol. Biol. 219, 555-565 (1991)')
self.assertEqual(self.AAIndexObjects['BENS940103'].Citation,\
'Protein Engineering 7, 1323-1332 (1994)')
self.assertEqual(self.AAIndexObjects['QUIB020101'].Citation,\
'Proteins. 48, 605-610 (2002)')
def test_Comments_entries(self):
""" AAI2: Tests null, single line, multi line comments """
self.assertEqual(self.AAIndexObjects['ALTS910101'].Comments,\
'')
self.assertEqual(self.AAIndexObjects['BENS940103'].Comments,\
'extrapolated to 250 PAM')
self.assertEqual(self.AAIndexObjects['QUIB020101'].Comments,\
'')
self.assertEqual(self.AAIndexObjects['HENS920104'].Comments,\
'# Matrix made by matblas from blosum50.iij ' +
'* # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units ' +
'* # Blocks Database = /data/blocks_5.0/blocks.dat ' +
'* # Cluster Percentage: >= 50 ' +
'* # Entropy = 0.4808, Expected = -0.3573')
def test_Data_entries_20x20_LTM(self):
""" AAI2: correct data entries when 20x20 LTM"""
self.assertEqual(self.AAIndexObjects['ALTS910101'].Data['A']['A'],\
3.)
self.assertEqual(self.AAIndexObjects['ALTS910101'].Data['Y']['R'],\
-6.)
self.assertEqual(self.AAIndexObjects['ALTS910101'].Data['V']['V'],\
5.)
self.assertEqual(self.AAIndexObjects['BENS940103'].Data['A']['A'],\
2.4)
self.assertEqual(self.AAIndexObjects['BENS940103'].Data['Y']['R'],\
-2.0)
self.assertEqual(self.AAIndexObjects['BENS940103'].Data['V']['V'],\
3.4)
self.assertEqual(self.AAIndexObjects['QUIB020101'].Data['A']['A'],\
2.5)
self.assertEqual(self.AAIndexObjects['QUIB020101'].Data['Y']['R'],\
-0.9)
self.assertEqual(self.AAIndexObjects['QUIB020101'].Data['V']['V'],\
4.2)
def test_Data_entries_20x20_Square(self):
""" AAI2: correct data entries when 20x20 squ matrix """
self.assertEqual(self.AAIndexObjects['HENS920104'].Data['V']['Y'],\
-1)
self.assertEqual(self.AAIndexObjects['HENS920104'].Data['Q']['A'],\
-1)
self.assertEqual(self.AAIndexObjects['HENS920104'].Data['N']['N'],\
7)
def test_Data_entries_with_abnormal_fields(self):
""" AAI2: test correct data entries when more than std fields present
Some entries in AAIndex2 have more than 20 fields; this tests that
the data is correctly parsed and identified.
"""
# There are no entries that fit this category that are square
# matrices, which is all we are concerned with at this point,
# so this method should just serve as a reminder to test this
# when we begin parsing data other than square matrices.
pass
def test_Data_entries_21x21_LTM(self):
""" AAI2: correct data entries when 21x21 LTM"""
self.assertEqual(self.AAIndexObjects['KOSJ950101'].Data['-']['-'],\
55.7)
self.assertEqual(self.AAIndexObjects['KOSJ950101'].Data['Y']['-'],\
0.3)
self.assertEqual(self.AAIndexObjects['KOSJ950101'].Data['N']['R'],\
3.0)
def test_Data_entries_22x21_square(self):
""" AAI2: correct data entries when 22x21 square matrix """
# It's not really a square matrix, but it's fully populated ...
self.assertEqual(self.AAIndexObjects['OVEJ920102'].Data['J']['D'],\
0.001)
self.assertEqual(self.AAIndexObjects['OVEJ920102'].Data['-']['I'],\
0.022)
self.assertEqual(self.AAIndexObjects['OVEJ920102'].Data['D']['E'],\
0.109)
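# Matrices in the AAindex2 fixture are either full 20x20 blocks (HENS920104)
# or lower-triangular (ALTS910101), where data row i holds i + 1 values for
# the first i + 1 column residues. parse_lower_triangle is a hypothetical
# helper sketching that row-to-key mapping; it leaves upper-triangle cells
# as None, like the LTM data built in AAIndex2RecordTests.setUp, and is not
# the AAIndex2Parser implementation.


def parse_lower_triangle(lines, keys):
    """Map lower-triangular rows onto a dict of dicts keyed by residue."""
    data = {}
    for i, (row_key, line) in enumerate(zip(keys, lines)):
        values = [float(v) for v in line.split()]
        if len(values) != i + 1:
            raise ValueError("row %s: expected %d values, got %d"
                             % (row_key, i + 1, len(values)))
        row = dict.fromkeys(keys)  # upper-triangle cells stay None
        row.update(zip(keys, values))  # fills columns 0..i of this row
        data[row_key] = row
    return data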
class AAIndexRecordTests(TestCase):
""" AAIR: Tests AAIndexRecord class """
def setUp(self):
self.id = "5"
self.description = "Some Info"
self.LITDB_entry_num = "25"
self.authors = "Greg"
self.title = "A test"
self.citation = "something"
self.comments = "This is a test, this is only a test"
self.data = {}
class AAIndex1RecordTests(AAIndexRecordTests):
""" AAIR1: Tests AAIndex1Record class """
def setUp(self):
AAIndexRecordTests.setUp(self)
self.correlating = [0.987, 0.783, 1., 0]
values = []
keys = 'ARNDCQEGHILKMFPSTWYV'
for i in range(20):
values += [float(i) + 0.15]
self.data = dict(zip(keys,values))
self.aar = AAIndex1Record(self.id, self.description,\
self.LITDB_entry_num, self.authors, self.title,\
self.citation, self.comments, self.correlating, self.data)
def test_init(self):
""" AAIR1: Tests init method returns with no errors"""
test_aar = AAIndex1Record(self.id, self.description,\
self.LITDB_entry_num, self.authors, self.title,\
self.citation, self.comments, self.correlating, self.data)
def test_general_init_data(self):
""" AAIR1: Tests init correctly initializes data"""
self.assertEqual(self.aar.ID, str(self.id))
self.assertEqual(self.aar.Description, str(self.description))
self.assertEqual(self.aar.LITDBEntryNum,\
str(self.LITDB_entry_num))
self.assertEqual(self.aar.Authors, str(self.authors))
self.assertEqual(self.aar.Title, str(self.title))
self.assertEqual(self.aar.Citation, str(self.citation))
self.assertEqual(self.aar.Comments, str(self.comments))
self.assertEqual(self.aar.Correlating, self.correlating)
self.assertEqual(self.aar.Data,self.data)
def test_toSquareDistanceMatrix(self):
""" AAIR1: Tests that _toSquareDistanceMatrix runs without raising an error """
square = self.aar._toSquareDistanceMatrix()
def test_toSquareDistanceMatrix_data_integrity_diagonal(self):
""" AAIR1: Tests that diag = 0 when square matrix is built """
square = self.aar._toSquareDistanceMatrix()
# Test diagonal
keys = 'ARNDCQEGHILKMFPSTWYV'
for k in keys:
self.assertEqual(square[k][k], 0.)
def test_toSquareDistanceMatrix_data_integrity(self):
""" AAIR1: Tests that _toSquareDistanceMatrix works right w/o stops """
square = self.aar._toSquareDistanceMatrix()
self.assertFloatEqualAbs(square['R']['A'], square['A']['R'])
self.assertFloatEqualAbs(square['A']['R'], 1.)
self.assertFloatEqualAbs(square['D']['N'], square['N']['D'])
self.assertFloatEqualAbs(square['D']['N'], 1.)
self.assertFloatEqualAbs(square['A']['C'], square['C']['A'])
self.assertFloatEqualAbs(square['A']['C'], 4.)
self.assertFloatEqualAbs(square['V']['A'], square['A']['V'])
self.assertFloatEqualAbs(square['V']['A'], 19.)
self.assertFloatEqualAbs(square['V']['Y'], square['Y']['V'])
self.assertFloatEqualAbs(square['V']['Y'], 1.)
def test_toSquareDistanceMatrix_data_integrity_w_stops(self):
""" AAIR1: Tests that _toSquareDistanceMatrix works right w/ stops """
square = self.aar._toSquareDistanceMatrix(include_stops=1)
self.assertFloatEqualAbs(square['R']['A'], square['A']['R'])
self.assertFloatEqualAbs(square['A']['R'], 1.)
self.assertFloatEqualAbs(square['D']['N'], square['N']['D'])
self.assertFloatEqualAbs(square['D']['N'], 1.)
self.assertFloatEqualAbs(square['A']['C'], square['C']['A'])
self.assertFloatEqualAbs(square['A']['C'], 4.)
self.assertFloatEqualAbs(square['V']['A'], square['A']['V'])
self.assertFloatEqualAbs(square['V']['A'], 19.)
self.assertFloatEqualAbs(square['V']['Y'], square['Y']['V'])
self.assertFloatEqualAbs(square['V']['Y'], 1.)
self.assertFloatEqualAbs(square['V']['*'], None)
self.assertFloatEqualAbs(square['*']['Y'], None)
self.assertFloatEqualAbs(square['*']['*'], None)
self.assertFloatEqualAbs(square['*']['R'], None)
def test_toDistanceMatrix(self):
""" AAIR1: Tests that toDistanceMatrix functions as expected """
dm = self.aar.toDistanceMatrix()
self.assertFloatEqualAbs(dm['R']['A'], dm['A']['R'])
self.assertFloatEqualAbs(dm['A']['R'], 1.)
self.assertFloatEqualAbs(dm['D']['N'], dm['N']['D'])
self.assertFloatEqualAbs(dm['D']['N'], 1.)
self.assertFloatEqualAbs(dm['A']['C'], dm['C']['A'])
self.assertFloatEqualAbs(dm['A']['C'], 4.)
self.assertFloatEqualAbs(dm['V']['A'], dm['A']['V'])
self.assertFloatEqualAbs(dm['V']['A'], 19.)
self.assertFloatEqualAbs(dm['V']['Y'], dm['Y']['V'])
self.assertFloatEqualAbs(dm['V']['Y'], 1.)
def test_toDistanceMatrix_w_stops(self):
""" AAIR1: Tests that toDistanceMatrix works right w/ stops """
square = self.aar.toDistanceMatrix(include_stops=1)
self.assertFloatEqualAbs(square['R']['A'], square['A']['R'])
self.assertFloatEqualAbs(square['A']['R'], 1.)
self.assertFloatEqualAbs(square['D']['N'], square['N']['D'])
self.assertFloatEqualAbs(square['D']['N'], 1.)
self.assertFloatEqualAbs(square['A']['C'], square['C']['A'])
self.assertFloatEqualAbs(square['A']['C'], 4.)
self.assertFloatEqualAbs(square['V']['A'], square['A']['V'])
self.assertFloatEqualAbs(square['V']['A'], 19.)
self.assertFloatEqualAbs(square['V']['Y'], square['Y']['V'])
self.assertFloatEqualAbs(square['V']['Y'], 1.)
self.assertFloatEqualAbs(square['V']['*'], None)
self.assertFloatEqualAbs(square['*']['Y'], None)
self.assertFloatEqualAbs(square['*']['*'], None)
self.assertFloatEqualAbs(square['*']['R'], None)
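# The AAIndex1Record distance tests above expect the distance between two
# residues to be the absolute difference of their index values, with zeros
# on the diagonal and, when stops are included, a "*" row and column filled
# with None. scale_to_distances is a hypothetical helper that reproduces
# those expectations; it is not the toDistanceMatrix source.


def scale_to_distances(scale, include_stops=False):
    """Build a dict-of-dicts distance matrix from a one-dimensional scale."""
    keys = list(scale)
    dist = {}
    for a in keys:
        dist[a] = dict((b, abs(scale[a] - scale[b])) for b in keys)
    if include_stops:
        for a in keys:
            dist[a]["*"] = None  # no index value is defined for stops
        dist["*"] = dict.fromkeys(keys + ["*"])  # every stop entry is None
    return dist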
class AAIndex2RecordTests(AAIndexRecordTests):
""" AAIR2: Tests AAIndex2Record class """
def setUp(self):
AAIndexRecordTests.setUp(self)
# Build LTM data
values = range(210)
keys = 'ARNDCQEGHILKMFPSTWYV'
self.LTMdata = dict.fromkeys(keys)
i = 0
for r in keys:
new_row = dict.fromkeys(keys)
for c in keys:
if keys.find(c) <= keys.find(r):
new_row[c] = values[i]
i +=1
self.LTMdata[r] = new_row
self.aarLTM = AAIndex2Record(self.id, self.description,\
self.LITDB_entry_num, self.authors, self.title,\
self.citation, self.comments, self.LTMdata)
# Build Square matrix data
values = range(400)
self.SQUdata = dict.fromkeys(keys)
i = 0
for r in keys:
new_row = dict.fromkeys(keys)
for c in keys:
new_row[c] = values[i]
i +=1
self.SQUdata[r] = new_row
self.aarSquare = AAIndex2Record(self.id, self.description,\
self.LITDB_entry_num, self.authors, self.title,\
self.citation, self.comments, self.SQUdata)
def test_init(self):
""" AAIR2: Tests init method returns with no errors"""
test_aar = AAIndex2Record(self.id, self.description,\
self.LITDB_entry_num, self.authors, self.title,\
self.citation, self.comments, self.SQUdata)
def test_init_data(self):
""" AAIR2: Tests init correctly initializes data"""
self.assertEqual(self.aarLTM.ID, str(self.id))
self.assertEqual(self.aarLTM.Description, str(self.description))
self.assertEqual(self.aarLTM.LITDBEntryNum,\
str(self.LITDB_entry_num))
self.assertEqual(self.aarLTM.Authors, str(self.authors))
self.assertEqual(self.aarLTM.Title, str(self.title))
self.assertEqual(self.aarLTM.Citation, str(self.citation))
self.assertEqual(self.aarLTM.Comments, str(self.comments))
# def test_matrix_values_col_by_row(self):
# """ Tests that keys and values correctly correspond in data LTM
#
#
# Also tests that reverse keys are same as forward keys.
#
# """
#
# data_matrix = self.aarLTM.Data
# self.assertEqual(data_matrix['A']['A'], 0)
# self.assertEqual(data_matrix['A']['R'], 1)
# self.assertEqual(data_matrix['R']['R'], 2)
# self.assertEqual(data_matrix['C']['H'], 40)
# self.assertEqual(data_matrix['I']['M'], 87)
# self.assertEqual(data_matrix['D']['P'], 108)
# self.assertEqual(data_matrix['W']['V'], 207)
# self.assertEqual(data_matrix['Y']['V'], 208)
# self.assertEqual(data_matrix['V']['V'], 209)
# def test_LTM_values_row_by_col(self):
# """ Tests that keys are correctly linked to values in a LTM
#
# This tests that some random places hold the correct values.
# These are some randomly selected keys with hand calculated
# values. Also included are the extreme values. Technically if
# the first and last three are correct all values should be
# correct.
#
# """
# data_matrix = self.aarLTM.Data
# self.assertEqual(data_matrix['R']['A'], 1)
# self.assertEqual(data_matrix['H']['C'], 40)
# self.assertEqual(data_matrix['M']['I'], 87)
# self.assertEqual(data_matrix['P']['D'], 108)
# self.assertEqual(data_matrix['V']['W'], 207)
# self.assertEqual(data_matrix['V']['Y'], 208)
# self.assertEqual(data_matrix['A']['A'], 0)
# self.assertEqual(data_matrix['R']['R'], 2)
# self.assertEqual(data_matrix['V']['V'], 209)
def test_Square_Matrix_values_row_by_col(self):
""" AAIR2: Tests that key -> value pair integrity in Square matrix
"""
data_matrix = self.aarSquare.Data
self.assertEqual(data_matrix['R']['A'], 20)
#self.assertEqual(data_matrix['H']['C'], 40)
#self.assertEqual(data_matrix['M']['I'], 87)
#self.assertEqual(data_matrix['P']['D'], 108)
#self.assertEqual(data_matrix['V']['W'], 207)
#self.assertEqual(data_matrix['V']['Y'], 208)
self.assertEqual(data_matrix['A']['A'], 0)
self.assertEqual(data_matrix['R']['R'], 21)
self.assertEqual(data_matrix['V']['V'], 399)
def test_toSquareDistanceMatrix_data_integrity(self):
""" AAIR2: Tests that _toSquareDistanceMatrix works right w/o stops
"""
square = self.aarSquare._toSquareDistanceMatrix()
self.assertEqual(square['R']['A'], 20)
self.assertEqual(square['A']['A'], 0)
self.assertEqual(square['R']['R'], 21)
self.assertEqual(square['V']['V'], 399)
def test_toSquareDistanceMatrix_data_integrity_w_stops(self):
""" AAIR2: Tests that _toSquareDistanceMatrix works right with stops
"""
square = self.aarSquare._toSquareDistanceMatrix(include_stops=1)
self.assertEqual(square['R']['A'], 20)
self.assertEqual(square['A']['A'], 0)
self.assertEqual(square['R']['R'], 21)
self.assertEqual(square['V']['V'], 399)
self.assertEqual(square['V']['*'], None)
self.assertEqual(square['*']['Y'], None)
self.assertEqual(square['*']['*'], None)
self.assertEqual(square['*']['R'], None)
# Data for parser tests
fake_file_aaindex1 =\
"""
H ANDN920101
D alpha-CH chemical shifts (Andersen et al., 1992)
R LIT:1810048b PMID:1575719
A Andersen, N.H., Cao, B. and Chen, C.
T Peptide/protein structure analysis using the chemical shift index method:
upfield alpha-CH values reveal dynamic helices and aL sites
J Biochem. and Biophys. Res. Comm. 184, 1008-1014 (1992)
C BUNA790102 0.949
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
4.35 4.38 4.75 4.76 4.65 4.37 4.29 3.97 4.63 3.95
4.17 4.36 4.52 4.66 4.44 4.50 4.35 4.70 4.60 3.95
//
H ARGP820101
D Hydrophobicity index (Argos et al., 1982)
R LIT:0901079b PMID:7151796
A Argos, P., Rao, J.K.M. and Hargrave, P.A.
T Structural prediction of membrane-bound proteins
J Eur. J. Biochem. 128, 565-575 (1982)
C JOND750101 1.000 SIMZ760101 0.967 GOLD730101 0.936
TAKK010101 0.906 MEEJ810101 0.891 CIDH920105 0.867
LEVM760106 0.865 CIDH920102 0.862 MEEJ800102 0.855
MEEJ810102 0.853 CIDH920103 0.827 PLIV810101 0.820
CIDH920104 0.819 LEVM760107 0.806 NOZY710101 0.800
PARJ860101 -0.835 WOLS870101 -0.838 BULH740101 -0.854
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
0.61 0.60 0.06 0.46 1.07 0. 0.47 0.07 0.61 2.22
1.53 1.15 1.18 2.02 1.95 0.05 0.05 2.65 1.88 1.32
//
H TSAJ990102
D Volumes not including the crystallographic waters using the ProtOr (Tsai et
al., 1999)
R PMID:10388571
A Tsai, J., Taylor, R., Chothia, C. and Gerstein, M.
T The packing density in proteins: standard radii and volumes
J J Mol Biol. 290, 253-266 (1999)
* (Cyh 113.7)
C TSAJ990101 1.000 CHOC750101 0.996 BIGC670101 0.992
GOLD730102 0.991 KRIW790103 0.987 FAUJ880103 0.985
GRAR740103 0.978 CHAM820101 0.978 CHOC760101 0.972
FASG760101 0.940 LEVM760105 0.928 LEVM760102 0.918
ROSG850101 0.909 DAWD720101 0.905 CHAM830106 0.896
FAUJ880106 0.882 RADA880106 0.864 LEVM760107 0.861
LEVM760106 0.841 RADA880103 -0.879
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
90.0 194.0 124.7 117.3 103.3 149.4 142.2 64.9 160.0 163.9
164.0 167.3 167.0 191.9 122.9 95.4 121.5 228.2 197.0 139.0
//
H JURD980101
D Modified Kyte-Doolittle hydrophobicity scale (Juretic et al., 1998)
R
A Juretic, D., Lucic, B., Zucic, D. and Trinajstic, N.
T Protein transmembrane structure: recognition and prediction by using
hydrophobicity scales through preference functions
J Theoretical and Computational Chemistry, 5, 405-445 (1998)
C KYTJ820101 0.996 CHOC760103 0.967 NADH010102 0.931
JANJ780102 0.928 NADH010101 0.925 EISD860103 0.901
DESM900102 0.900 NADH010103 0.900 EISD840101 0.895
RADA880101 0.893 MANP780101 0.887 WOLR810101 0.881
PONP800103 0.879 JANJ790102 0.879 NADH010104 0.873
CHOC760104 0.870 PONP800102 0.869 JANJ790101 0.868
MEIH800103 0.861 PONP800101 0.858 NAKH920108 0.858
RADA880108 0.857 PONP800108 0.856 ROSG850102 0.854
PONP930101 0.849 RADA880107 0.842 BIOV880101 0.840
MIYS850101 0.837 FAUJ830101 0.833 CIDH920104 0.832
DESM900101 0.829 WARP780101 0.827 KANM800104 0.826
LIFS790102 0.824 RADA880104 0.824 NADH010105 0.821
NISK800101 0.816 NISK860101 0.808 BIOV880102 0.805
ARGP820102 0.802 ARGP820103 0.800 VHEG790101 -0.814
KRIW790101 -0.824 CHOC760102 -0.851 ROSM880101 -0.851
MONM990101 -0.853 JANJ780103 -0.853 RACS770102 -0.855
PRAM900101 -0.862 JANJ780101 -0.862 GUYH850101 -0.864
GRAR740102 -0.864 MEIH800102 -0.879 KUHL950101 -0.884
ROSM880102 -0.894 OOBM770101 -0.903
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
1.10 -5.10 -3.50 -3.60 2.50 -3.68 -3.20 -0.64 -3.20 4.50
3.80 -4.11 1.90 2.80 -1.90 -0.50 -0.70 -0.46 -1.3 4.2
//
H WILM950104
D Hydrophobicity coefficient in RP-HPLC, C18 with 0.1%TFA/2-PrOH/MeCN/H2O
(Wilce et al. 1995)
R
A Wilce, M.C., Aguilar, M.I. and Hearn, M.T.
T Physicochemical basis of amino acid hydrophobicity scales: evaluation of four
new scales of amino acid hydrophobicity coefficients derived from RP-HPLC of
peptides
J Anal Chem. 67, 1210-1219 (1995)
C
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
-2.34 1.60 2.81 -0.48 5.03 0.16 1.30 -1.06 -3.00 7.26
1.09 1.56 0.62 2.57 -0.15 1.93 0.19 3.59 -2.58 2.06
//
H ARGP820103
D Membrane-buried preference parameters (Argos et al., 1982)
R LIT:0901079b PMID:7151796
A Argos, P., Rao, J.K.M. and Hargrave, P.A.
T Structural prediction of membrane-bound proteins
J Eur. J. Biochem. 128, 565-575 (1982)
C ARGP820102 0.961 MIYS850101 0.822 NAKH900106 0.810
EISD860103 0.810 KYTJ820101 0.806 JURD980101 0.800
I A/L R/K N/M D/F C/P Q/S E/T G/W H/Y I/V
1.56 0.45 0.27 0.14 1.23 0.51 0.23 0.62 0.29 1.67
2.93 0.15 2.96 2.03 0.76 0.81 0.91 1.08 0.68 1.14
//
"""
fake_file_aaindex2 =\
"""
H ALTS910101
D The PAM-120 matrix (Altschul, 1991)
R LIT:1713145 PMID:2051488
A Altschul, S.F.
T Amino acid substitution matrices from an information theoretic perspective
J J. Mol. Biol. 219, 555-565 (1991)
M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV
3.
-3. 6.
0. -1. 4.
0. -3. 2. 5.
-3. -4. -5. -7. 9.
-1. 1. 0. 1. -7. 6.
0. -3. 1. 3. -7. 2. 5.
1. -4. 0. 0. -5. -3. -1. 5.
-3. 1. 2. 0. -4. 3. -1. -4. 7.
-1. -2. -2. -3. -3. -3. -3. -4. -4. 6.
-3. -4. -4. -5. -7. -2. -4. -5. -3. 1. 5.
-2. 2. 1. -1. -7. 0. -1. -3. -2. -2. -4. 5.
-2. -1. -3. -4. -6. -1. -4. -4. -4. 1. 3. 0. 8.
-4. -4. -4. -7. -6. -6. -6. -5. -2. 0. 0. -6. -1. 8.
1. -1. -2. -2. -3. 0. -1. -2. -1. -3. -3. -2. -3. -5. 6.
1. -1. 1. 0. -1. -2. -1. 1. -2. -2. -4. -1. -2. -3. 1. 3.
1. -2. 0. -1. -3. -2. -2. -1. -3. 0. -3. -1. -1. -4. -1. 2. 4.
-7. 1. -5. -8. -8. -6. -8. -8. -5. -7. -5. -5. -7. -1. -7. -2. -6. 12.
-4. -6. -2. -5. -1. -5. -4. -6. -1. -2. -3. -6. -4. 4. -6. -3. -3. -1. 8.
0. -3. -3. -3. -2. -3. -3. -2. -3. 3. 1. -4. 1. -3. -2. -2. 0. -8. -3. 5.
//
H BENS940103
D Log-odds scoring matrix collected in 74-100 PAM (Benner et al., 1994)
R LIT:2023094 PMID:7700864
A Benner, S.A., Cohen, M.A. and Gonnet, G.H.
T Amino acid substitution during functionally constrained divergent
evolution of protein sequences
J Protein Engineering 7, 1323-1332 (1994)
* extrapolated to 250 PAM
M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV
2.4
-0.8 4.8
-0.2 0.3 3.6
-0.3 -0.5 2.2 4.8
0.3 -2.2 -1.8 -3.2 11.8
-0.3 1.6 0.7 0.8 -2.6 3.0
-0.1 0.3 1.0 2.9 -3.2 1.7 3.7
0.6 -1.0 0.4 0.2 -2.0 -1.1 -0.5 6.6
-1.0 1.0 1.2 0.4 -1.3 1.4 0.2 -1.6 6.1
-0.8 -2.6 -2.8 -3.9 -1.2 -2.0 -2.9 -4.3 -2.3 4.0
-1.4 -2.4 -3.1 -4.2 -1.6 -1.7 -3.1 -4.6 -1.9 2.8 4.2
-0.4 2.9 0.9 0.4 -2.9 1.7 1.2 -1.1 0.6 -2.3 -2.4 3.4
-0.8 -1.8 -2.2 -3.2 -1.2 -1.0 -2.2 -3.5 -1.5 2.6 2.9 -1.5 4.5
-2.6 -3.5 -3.2 -4.7 -0.7 -2.8 -4.3 -5.4 0.0 0.9 2.1 -3.6 1.3 7.2
0.4 -1.0 -1.0 -1.0 -3.1 -0.2 -0.7 -1.7 -1.0 -2.6 -2.2 -0.8 -2.4 -3.8 7.5
1.1 -0.2 0.9 0.4 0.1 0.1 0.1 0.4 -0.3 -1.8 -2.2 0.0 -1.4 -2.6 0.5 2.1
0.7 -0.3 0.4 -0.2 -0.6 -0.1 -0.2 -1.0 -0.5 -0.3 -1.1 0.1 -0.4 -2.2 0.1 1.4 2.5
-4.1 -1.6 -4.0 -5.5 -0.9 -2.8 -4.7 -4.1 -1.0 -2.3 -0.9 -3.6 -1.3 3.0 -5.2 -3.4 -3.7 14.7
-2.6 -2.0 -1.4 -2.8 -0.4 -1.8 -3.0 -4.3 2.5 -1.0 -0.1 -2.4 -0.5 5.3 -3.4 -1.9 -2.1 3.6 8.1
0.1 -2.2 -2.2 -2.9 -0.2 -1.7 -2.1 -3.1 -2.1 3.2 1.9 -1.9 1.8 0.1 -1.9 -1.0 0.2 -2.9 -1.4 3.4
//
H QUIB020101
D STROMA score matrix for the alignment of known distant homologs
(Qian-Goldstein, 2002)
R PMID:12211027
A Qian, B. and Goldstein, R.A.
T Optimization of a new score function for the generation of accurate
alignments
J Proteins. 48, 605-610 (2002)
M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV
2.5
0.2 5.2
1.1 0.7 2.5
1 0.1 3.3 5.3
1.2 -1.3 -1.9 -3.1 11.5
-0.1 2 1.9 1.1 -2.5 3.6
1.2 1.9 2.3 3.2 -2.4 1.7 3.7
1.4 -0.2 0.7 0.9 -1.3 -0.3 0.5 7.5
-1.4 1.5 1.4 0.5 -1.7 1.4 0.3 -1.7 6.8
0.3 -1.9 -2.4 -2.9 -3.2 -0.9 -3.1 -3.7 -1.8 4.5
-0.2 -1.5 -2.4 -3.4 -1.6 -1.2 -1.5 -3.8 -2.4 3.4 5.2
-0.2 3.4 1.6 1.4 -3 2.2 1.2 0.4 1.1 -1.5 -2 3.9
-0.2 -1.4 -2.1 -2.8 -1.3 -0.6 -2 -3.8 -0.8 2.2 3.1 -0.5 5.4
-1.6 -3.2 -2.5 -3.7 -0.8 -1.7 -13.7 -4.7 -0.9 2.2 3.7 -2.8 1.7 7
0.7 -0.6 -0.1 -0.2 -3.6 1 0 -0.8 -2.1 -2.4 -1.4 0.2 -1.9 -4.1 8.1
1.7 0.2 1.4 1.7 0.7 0.9 1.1 1.6 -0.1 -1.1 -0.8 1.4 -1.1 -2.5 2 2.8
1.7 0.2 1.4 0.1 0.3 -0.1 1.6 -0.6 -0.2 0 0.3 1 -0.3 -0.8 1.1 2.6 0.4
-3.3 -1.5 -4 -5.7 -0.5 -2.9 -4.7 -4.2 -1.2 -1.8 -1.2 -3 -0.6 3.7 -5 -2.8 -2.9 14.9
-1.8 -0.9 -0.8 -2.9 -0.3 -1.5 -2.2 -4.8 2.9 0.2 0.8 -1.5 0.5 5.2 -3.3 -0.9 -0.8 4.9 8.1
1.9 -2.8 -0.9 -2.5 0.7 -1.5 -1.3 -1.4 -2.5 4.5 3.4 -1 1.7 0.9 -1.1 -3 1.5 -2.5 0.3 4.2
//
H HENS920104
D BLOSUM50 substitution matrix (Henikoff-Henikoff, 1992)
R LIT:1902106 PMID:1438297
A Henikoff, S. and Henikoff, J.G.
T Amino acid substitution matrices from protein blocks
J Proc. Natl. Acad. Sci. USA 89, 10915-10919 (1992)
* # Matrix made by matblas from blosum50.iij
* # BLOSUM Clustered Scoring Matrix in 1/3 Bit Units
* # Blocks Database = /data/blocks_5.0/blocks.dat
* # Cluster Percentage: >= 50
* # Entropy = 0.4808, Expected = -0.3573
M rows = ARNDCQEGHILKMFPSTWYV, cols = ARNDCQEGHILKMFPSTWYV
5 -2 -1 -2 -1 -1 -1 0 -2 -1 -2 -1 -1 -3 -1 1 0 -3 -2 0
-2 7 -1 -2 -4 1 0 -3 0 -4 -3 3 -2 -3 -3 -1 -1 -3 -1 -3
-1 -1 7 2 -2 0 0 0 1 -3 -4 0 -2 -4 -2 1 0 -4 -2 -3
-2 -2 2 8 -4 0 2 -1 -1 -4 -4 -1 -4 -5 -1 0 -1 -5 -3 -4
-1 -4 -2 -4 13 -3 -3 -3 -3 -2 -2 -3 -2 -2 -4 -1 -1 -5 -3 -1
-1 1 0 0 -3 7 2 -2 1 -3 -2 2 0 -4 -1 0 -1 -1 -1 -3
-1 0 0 2 -3 2 6 -3 0 -4 -3 1 -2 -3 -1 -1 -1 -3 -2 -3
0 -3 0 -1 -3 -2 -3 8 -2 -4 -4 -2 -3 -4 -2 0 -2 -3 -3 -4
-2 0 1 -1 -3 1 0 -2 10 -4 -3 0 -1 -1 -2 -1 -2 -3 2 -4
-1 -4 -3 -4 -2 -3 -4 -4 -4 5 2 -3 2 0 -3 -3 -1 -3 -1 4
-2 -3 -4 -4 -2 -2 -3 -4 -3 2 5 -3 3 1 -4 -3 -1 -2 -1 1
-1 3 0 -1 -3 2 1 -2 0 -3 -3 6 -2 -4 -1 0 -1 -3 -2 -3
-1 -2 -2 -4 -2 0 -2 -3 -1 2 3 -2 7 0 -3 -2 -1 -1 0 1
-3 -3 -4 -5 -2 -4 -3 -4 -1 0 1 -4 0 8 -4 -3 -2 1 4 -1
-1 -3 -2 -1 -4 -1 -1 -2 -2 -3 -4 -1 -3 -4 10 -1 -1 -4 -3 -3
1 -1 1 0 -1 0 -1 0 -1 -3 -3 0 -2 -3 -1 5 2 -4 -2 -2
0 -1 0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1 2 5 -3 -2 0
-3 -3 -4 -5 -5 -1 -3 -3 -3 -3 -2 -3 -1 1 -4 -4 -3 15 2 -3
-2 -1 -2 -3 -3 -1 -2 -3 2 -1 -1 -2 0 4 -3 -2 -2 2 8 -1
0 -3 -3 -4 -1 -3 -3 -4 -4 4 1 -3 1 -1 -3 -2 0 -3 -1 5
//
H KOSJ950101
D Context-dependent optimal substitution matrices for exposed helix
(Koshi-Goldstein, 1995)
R LIT:2124140 PMID:8577693
A Koshi, J.M. and Goldstein, R.A.
T Context-dependent optimal substitution matrices.
J Protein Engineering 8, 641-645 (1995)
M rows = -ARNDCQEGHILKMFPSTWYV, cols = -ARNDCQEGHILKMFPSTWYV
55.7
3.0 3.0
3.0 3.0 0.4
0.1 3.0 3.0 2.1
3.0 3.0 3.0 0.1 1.9
2.2 2.4 3.0 0.8 1.3 3.0
25.6 47.2 1.5 1.0 0.7 0.3 1.9
2.3 4.3 0.6 0.2 2.0 0.8 0.1 0.3
3.1 2.8 3.7 0.4 0.1 2.0 14.8 0.9 62.7
1.3 0.4 0.3 4.6 0.3 0.1 1.9 0.5 2.2 5.1
0.6 0.2 0.4 1.9 1.5 0.4 0.2 0.3 15.2 0.2 0.5
48.2 3.3 0.1 3.2 4.9 0.1 1.7 1.7 1.4 3.0 0.6 1.0
0.1 9.7 2.7 0.7 1.1 1.5 15.9 3.9 1.4 7.3 52.1 0.3 0.9
11.0 2.0 0.4 0.6 0.1 0.6 0.5 0.1 0.6 2.9 0.1 0.1 0.1 0.1
9.4 1.5 0.1 1.5 1.6 73.6 0.1 2.6 0.1 0.1 2.1 4.0 0.1 0.1 0.8
0.7 0.3 2.2 0.1 0.1 0.1 0.1 8.4 5.7 2.0 4.5 0.3 47.5 8.2 0.9 1.6
0.1 3.4 7.8 0.5 0.1 0.7 5.3 2.2 0.2 0.7 0.5 5.2 5.3 1.0 1.5 8.6 0.1
4.9 56.8 1.5 1.0 0.3 0.9 5.8 0.1 0.2 1.6 2.1 2.4 0.2 0.1 1.1 20.2 2.0 1.2
2.3 3.3 0.1 0.4 0.1 6 4.8 0.8 0.1 0.1 1.4 0.3 0.6 0.1 1.2 0.6 0.1 0.5 13.3
0.3 4.7 7.5 1.8 0.1 4.4 0.7 0.1 56.9 0.6 0.1 2.3 1.2 2.2 0.1 0.1 0.1 0.1 4.4 0.1
18.4 0.1 0.1 0.1 0.1 0.1 0.4 0.1 0.1 0.1 5 2.6 10.8 1.2 3.5 1.3 0.1 0.1 3.4 0.1 0.1
//
H OVEJ920102
D Environment-specific amino acid substitution matrix for alpha residues
(Overington et al., 1992)
R LIT:1811128 PMID:1304904
A Overington, J., Donnelly, D., Johnson, M.S., Sali, A. and Blundell, T.L.
T Environment-specific amino acid substitution tables: tertiary templates
and prediction of protein folds
J Protein Science 1, 216-226 (1992)
M rows = ACDEFGHIKLMNPQRSTVWYJ-, cols = ACDEFGHIKLMNPQRSTVWYJ
0.355 0.007 0.090 0.100 0.050 0.177 0.037 0.077 0.096 0.056 0.081 0.103 0.106 0.090 0.088 0.163 0.120 0.098 0.065 0.036 0.252
0.001 0.901 0.000 0.000 0.000 0.000 0.000 0.004 0.001 0.000 0.000 0.003 0.000 0.006 0.006 0.004 0.002 0.000 0.007 0.000 0.000
0.038 0.000 0.315 0.109 0.006 0.041 0.027 0.009 0.033 0.004 0.009 0.088 0.051 0.089 0.023 0.065 0.048 0.013 0.012 0.011 0.009
0.044 0.011 0.111 0.305 0.011 0.048 0.026 0.011 0.059 0.013 0.009 0.068 0.069 0.086 0.053 0.033 0.045 0.017 0.012 0.018 0.000
0.017 0.000 0.005 0.007 0.415 0.004 0.009 0.039 0.025 0.097 0.042 0.013 0.006 0.011 0.009 0.009 0.014 0.041 0.053 0.085 0.009
0.065 0.000 0.070 0.042 0.006 0.370 0.017 0.022 0.029 0.013 0.015 0.036 0.043 0.031 0.013 0.068 0.049 0.014 0.009 0.021 0.045
0.010 0.000 0.012 0.011 0.010 0.007 0.571 0.003 0.022 0.005 0.015 0.043 0.006 0.035 0.021 0.016 0.008 0.017 0.009 0.037 0.009
0.029 0.014 0.009 0.008 0.048 0.021 0.004 0.325 0.017 0.076 0.107 0.018 0.007 0.007 0.015 0.014 0.033 0.112 0.016 0.030 0.018
0.053 0.007 0.044 0.081 0.020 0.041 0.044 0.026 0.336 0.029 0.059 0.073 0.045 0.094 0.163 0.041 0.054 0.026 0.041 0.028 0.036
0.038 0.000 0.006 0.018 0.210 0.019 0.004 0.139 0.033 0.415 0.225 0.033 0.016 0.041 0.028 0.029 0.026 0.133 0.037 0.057 0.036
0.013 0.000 0.004 0.003 0.016 0.007 0.000 0.043 0.014 0.053 0.197 0.010 0.000 0.018 0.004 0.003 0.010 0.018 0.021 0.021 0.018
0.031 0.007 0.057 0.035 0.010 0.026 0.054 0.012 0.034 0.012 0.013 0.195 0.015 0.066 0.026 0.037 0.046 0.012 0.002 0.048 0.000
0.022 0.000 0.036 0.035 0.005 0.026 0.011 0.009 0.020 0.006 0.000 0.013 0.424 0.013 0.016 0.039 0.011 0.009 0.002 0.000 0.000
0.025 0.011 0.045 0.039 0.011 0.021 0.031 0.004 0.045 0.015 0.035 0.059 0.015 0.183 0.029 0.030 0.030 0.008 0.007 0.025 0.009
0.019 0.011 0.012 0.023 0.005 0.008 0.019 0.010 0.069 0.009 0.004 0.018 0.013 0.028 0.348 0.030 0.019 0.005 0.007 0.018 0.018
0.086 0.021 0.075 0.047 0.012 0.079 0.033 0.020 0.041 0.020 0.009 0.089 0.082 0.069 0.063 0.264 0.096 0.028 0.005 0.020 0.054
0.043 0.007 0.039 0.033 0.020 0.038 0.014 0.026 0.032 0.015 0.026 0.057 0.028 0.046 0.035 0.065 0.266 0.037 0.016 0.034 0.000
0.055 0.000 0.018 0.021 0.069 0.022 0.044 0.178 0.025 0.111 0.016 0.018 0.025 0.017 0.015 0.129 0.060 0.350 0.012 0.043 0.162
0.009 0.000 0.003 0.004 0.022 0.004 0.007 0.006 0.012 0.006 0.020 0.001 0.001 0.006 0.004 0.002 0.007 0.003 0.588 0.064 0.000
0.009 0.000 0.006 0.006 0.046 0.006 0.029 0.014 0.007 0.013 0.031 0.033 0.003 0.020 0.010 0.007 0.017 0.016 0.078 0.377 0.027
0.009 0.000 0.001 0.000 0.001 0.004 0.001 0.002 0.002 0.002 0.004 0.000 0.000 0.004 0.003 0.006 0.004 0.010 0.000 0.005 0.297
0.028 0.004 0.041 0.074 0.010 0.029 0.017 0.022 0.050 0.031 0.033 0.031 0.045 0.039 0.028 0.047 0.034 0.032 0.002 0.021 0.000
//
"""
# Run tests if called from the command line
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_parse/test_agilent_microarray.py
#!/usr/bin/env python
"""Tests for the Microarray output parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.parse.agilent_microarray import *
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class MicroarrayParserTests(TestCase):
    """Tests for MicroarrayParser.
    """
    def setUp(self):
        """Setup function for MicroarrayParser tests.
        """
        self.sample_file = ['first line in file',
                            'second line, useless data',
                            'FEATURES\tFirst\tL\tProbeName\tGeneName\tLogRatio',
                            'DATA\tFirst\tData\tProbe1\tGene1\t0.02',
                            'DATA\tSecond\tData\tProbe2\tGene2\t-0.34']
    def test_MicroarrayParser_empty_list(self):
        #Empty list should return tuple of empty lists
        self.assertEqual(MicroarrayParser([]), ([], [], []))
    def test_MicroarrayParser(self):
        #Given correct file format, return correct results
        self.assertEqual(MicroarrayParser(self.sample_file),
                         (['PROBE1', 'PROBE2'],
                          ['GENE1', 'GENE2'], [float(0.02), float(-0.34)]))
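# Illustrative sketch only: sketch_parse_agilent is a hypothetical helper,
# not cogent.parse.agilent_microarray.MicroarrayParser. It shows one way to
# obtain the results asserted in MicroarrayParserTests, assuming that the
# 'FEATURES' row names the tab-separated columns of the 'DATA' rows and
# that every line before it can be skipped.
def sketch_parse_agilent(lines):
    """Return (probe_names, gene_names, log_ratios) from Agilent-style lines."""
    probes, genes, ratios = [], [], []
    columns = None
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        if fields[0] == 'FEATURES':
            # Map each column name to its index for the DATA rows that follow.
            columns = dict((name, i) for i, name in enumerate(fields))
        elif fields[0] == 'DATA' and columns is not None:
            # Names are upper-cased and ratios converted, as the tests expect.
            probes.append(fields[columns['ProbeName']].upper())
            genes.append(fields[columns['GeneName']].upper())
            ratios.append(float(fields[columns['LogRatio']]))
    return probes, genes, ratios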
#run if called from command-line
if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_parse/test_binary_sff.py
#!/usr/bin/env python
import copy
import os
import tempfile
from unittest import TestCase, main
from cogent.parse.binary_sff import (
seek_pad, parse_common_header, parse_read_header, parse_read_data,
validate_common_header, parse_read, parse_binary_sff, UnsupportedSffError,
write_pad, write_common_header, write_read_header, write_read_data,
write_read, write_binary_sff, format_common_header, format_read_header,
format_read_data, format_binary_sff, base36_encode, base36_decode,
decode_location, decode_timestamp, decode_accession, decode_sff_filename,
)
__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Production"
TEST_DIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
SFF_FP = os.path.join(TEST_DIR, 'data', 'F6AVWTA01.sff')
class WritingFunctionTests(TestCase):
    def setUp(self):
        self.output_file = tempfile.TemporaryFile()
    def test_write_pad(self):
        self.output_file.write('\x01\x02\x03\x04')
        write_pad(self.output_file)
        self.output_file.seek(0)
        buff = self.output_file.read()
        self.assertEqual(buff, '\x01\x02\x03\x04\x00\x00\x00\x00')
    def test_write_common_header(self):
        write_common_header(self.output_file, COMMON_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
        self.output_file.seek(0)
        observed = parse_common_header(self.output_file)
        self.assertEqual(observed, COMMON_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
    def test_write_read_header(self):
        write_read_header(self.output_file, READ_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
        self.output_file.seek(0)
        observed = parse_read_header(self.output_file)
        self.assertEqual(observed, READ_HEADER)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
    def test_write_read_data(self):
        write_read_data(self.output_file, READ_DATA)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
        self.output_file.seek(0)
        num_flows = len(READ_DATA['flowgram_values'])
        num_bases = len(READ_DATA['Bases'])
        observed = parse_read_data(self.output_file, num_bases, num_flows)
        self.assertEqual(observed, READ_DATA)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
    def test_write_read(self):
        read = READ_HEADER.copy()
        read.update(READ_DATA)
        write_read(self.output_file, read)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
        self.output_file.seek(0)
        # Pass the flow count explicitly, as test_parse_read does.
        num_flows = len(read['flowgram_values'])
        observed = parse_read(self.output_file, num_flows)
        self.assertEqual(observed, read)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
    def test_write_binary_sff(self):
        read = READ_HEADER.copy()
        read.update(READ_DATA)
        header = COMMON_HEADER.copy()
        header['number_of_reads'] = 1
        write_binary_sff(self.output_file, header, [read])
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
        self.output_file.seek(0)
        observed_header, observed_reads = parse_binary_sff(
            self.output_file, native_flowgram_values=True)
        observed_reads = list(observed_reads)
        self.assertEqual(observed_header, header)
        self.assertEqual(len(observed_reads), 1)
        self.assertEqual(observed_reads[0], read)
        file_pos = self.output_file.tell()
        self.assertTrue(file_pos % 8 == 0)
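# A minimal sketch of the 8-byte alignment these tests check: every SFF
# section must end on a file offset divisible by 8. The helper names are
# hypothetical, not part of cogent.parse.binary_sff; the expected values
# mirror test_write_pad (4 bytes written, 4 zero bytes of padding) and
# test_seek_pad (offsets 9 to 16 round up to 16, 17 rounds up to 24).
def sketch_padded_offset(pos, boundary=8):
    """Round a file offset up to the next multiple of `boundary`."""
    return (pos + boundary - 1) // boundary * boundary

def sketch_pad_bytes(pos, boundary=8):
    """Return the zero bytes needed to move offset `pos` onto the boundary."""
    return b'\x00' * (sketch_padded_offset(pos, boundary) - pos)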
class ParsingFunctionTests(TestCase):
    def setUp(self):
        # SFF is a binary format, so open the file in binary mode.
        self.sff_file = open(SFF_FP, 'rb')
    def tearDown(self):
        self.sff_file.close()
    def test_seek_pad(self):
        f = self.sff_file
        f.seek(8)
        seek_pad(f)
        self.assertEqual(f.tell(), 8)
        f.seek(9)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(10)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(15)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(16)
        seek_pad(f)
        self.assertEqual(f.tell(), 16)
        f.seek(17)
        seek_pad(f)
        self.assertEqual(f.tell(), 24)
    def test_parse_common_header(self):
        observed = parse_common_header(self.sff_file)
        self.assertEqual(observed, COMMON_HEADER)
    def test_validate_common_header(self):
        header = {
            'magic_number': 779314790,
            'version': 1,
            'flowgram_format_code': 1,
            'index_offset': 0,
            'index_length': 0,
            'number_of_reads': 0,
            'header_length': 0,
            'key_length': 0,
            'number_of_flows_per_read': 0,
            'flow_chars': 'A',
            'key_sequence': 'A',
            }
        self.assertEqual(validate_common_header(header), None)
        header['version'] = 2
        self.assertRaises(UnsupportedSffError, validate_common_header, header)
    def test_parse_read_header(self):
        # The first read header follows the 440-byte common header.
        self.sff_file.seek(440)
        observed = parse_read_header(self.sff_file)
        self.assertEqual(observed, READ_HEADER)
    def test_parse_read_data(self):
        # Read data follow the 32-byte read header.
        self.sff_file.seek(440 + 32)
        observed = parse_read_data(self.sff_file, 271, 400)
        self.assertEqual(observed, READ_DATA)
    def test_parse_read(self):
        self.sff_file.seek(440)
        observed = parse_read(self.sff_file, 400)
        expected = READ_HEADER.copy()
        expected.update(READ_DATA)
        self.assertEqual(observed, expected)
    def test_parse_sff(self):
        header, reads = parse_binary_sff(self.sff_file)
        self.assertEqual(header, COMMON_HEADER)
        counter = 0
        for read in reads:
            self.assertEqual(
                len(read['flowgram_values']), header['number_of_flows_per_read'])
            counter += 1
        self.assertEqual(counter, 20)
class FormattingFunctionTests(TestCase):
    def setUp(self):
        self.output_file = tempfile.TemporaryFile()
    def test_format_common_header(self):
        self.assertEqual(
            format_common_header(COMMON_HEADER), COMMON_HEADER_TXT)
    def test_format_read_header(self):
        self.assertEqual(
            format_read_header(READ_HEADER), READ_HEADER_TXT)
    def test_format_read_data(self):
        self.assertEqual(
            format_read_data(READ_DATA, READ_HEADER), READ_DATA_TXT)
    def test_format_binary_sff(self):
        output_buffer = format_binary_sff(open(SFF_FP, 'rb'))
        output_buffer.seek(0)
        expected = COMMON_HEADER_TXT + READ_HEADER_TXT + READ_DATA_TXT
        observed = output_buffer.read(len(expected))
        self.assertEqual(observed, expected)
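# Illustrative sketch of how three fields of READ_DATA_TXT (defined later in
# this module) relate to READ_DATA, as inferred from those two constants:
# flowgram values are stored in hundredths (101 is printed as 1.01), flow
# indexes are stored as increments from the previous base (1, 2, 3, 2 gives
# 1 3 6 8), and bases outside the 1-based quality clip window are printed in
# lower case (clip_qual_left = 5 gives 'tcag...'). The helpers are
# hypothetical, not cogent's format_read_data; treating a clip value of 0 as
# "no clip" is an assumption taken from the SFF convention.
def sketch_flowgram_text(flowgram_values):
    """Format integer flowgram values (hundredths) as '1.01 0.00 ...'."""
    return ' '.join('%.2f' % (v / 100.0) for v in flowgram_values)

def sketch_flow_indexes(flow_index_per_base):
    """Convert per-base increments into absolute 1-based flow positions."""
    positions, total = [], 0
    for step in flow_index_per_base:
        total += step
        positions.append(total)
    return positions

def sketch_clipped_bases(bases, clip_left, clip_right):
    """Lower-case bases outside the 1-based [clip_left, clip_right] window."""
    start = max(clip_left - 1, 0)
    end = clip_right if clip_right > 0 else len(bases)
    return bases[:start].lower() + bases[start:end] + bases[end:].lower()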
class Base36Tests(TestCase):
    def test_base36_encode(self):
        self.assertEqual(base36_encode(2), 'C')
        self.assertEqual(base36_encode(37), 'BB')
    def test_base36_decode(self):
        self.assertEqual(base36_decode('C'), 2)
        self.assertEqual(base36_decode('BB'), 37)
    def test_decode_location(self):
        self.assertEqual(decode_location('C'), (0, 2))
    def test_decode_timestamp(self):
        self.assertEqual(decode_timestamp('C3U5GW'), (2004, 9, 22, 16, 59, 10))
        self.assertEqual(decode_timestamp('GA202I'), (2010, 1, 22, 13, 28, 56))
    def test_decode_accession(self):
        self.assertEqual(
            decode_accession('GA202I001ER3QL'),
            ((2010, 1, 22, 13, 28, 56), '0', 1, (1843, 859)))
    def test_decode_sff_filename(self):
        self.assertEqual(
            decode_sff_filename('F6AVWTA01.sff'),
            ((2009, 11, 25, 14, 30, 19), 'A', 1))
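# A minimal sketch of the 454 accession encoding exercised by Base36Tests.
# The digit alphabet (A-Z then 0-9), the mixed-radix timestamp layout and
# the 4096 divisor for plate coordinates are inferred from the test vectors
# above (e.g. 'GA202I' -> 2010-01-22 13:28:56, 'ER3QL' -> (1843, 859)); the
# helper names are hypothetical and are not the cogent.parse.binary_sff
# functions.
SKETCH_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'

def sketch_base36_decode(text):
    """Decode a string written with SKETCH_ALPHABET digits."""
    value = 0
    for char in text:
        value = value * 36 + SKETCH_ALPHABET.index(char)
    return value

def sketch_base36_encode(value):
    """Inverse of sketch_base36_decode for non-negative integers."""
    digits = ''
    while True:
        value, rem = divmod(value, 36)
        digits = SKETCH_ALPHABET[rem] + digits
        if value == 0:
            return digits

def sketch_decode_timestamp(text):
    """Unpack (year, month, day, hour, minute, second) from base36 text."""
    value = sketch_base36_decode(text)
    value, second = divmod(value, 60)
    value, minute = divmod(value, 60)
    value, hour = divmod(value, 24)
    value, day = divmod(value, 32)
    year_offset, month = divmod(value, 13)
    return (2000 + year_offset, month, day, hour, minute, second)

def sketch_decode_location(text):
    """Split the base36 location field into (x, y) plate coordinates."""
    return divmod(sketch_base36_decode(text), 4096)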
COMMON_HEADER = {
'header_length': 440,
'flowgram_format_code': 1,
'index_length': 900,
'magic_number': 779314790,
'number_of_flows_per_read': 400,
'version': 1,
'flow_chars': 100 * 'TACG',
'key_length': 4,
'key_sequence': 'TCAG',
'number_of_reads': 20,
'index_offset': 33464,
}
COMMON_HEADER_TXT = """\
Common Header:
Magic Number: 0x2E736666
Version: 0001
Index Offset: 33464
Index Length: 900
# of Reads: 20
Header Length: 440
Key Length: 4
# of Flows: 400
Flowgram Code: 1
Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG
Key Sequence: TCAG
"""
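# A sketch of reading the fixed-width part of the SFF common header, assuming
# the big-endian layout of the SFF specification: magic number (0x2E736666,
# ASCII '.sff'), 4-byte version, 8-byte index offset, index length, number of
# reads, header length, key length, flows per read and a 1-byte flowgram
# code (31 bytes), then the flow characters and key sequence, zero-padded to
# a multiple of 8. With 400 flows and a 4-base key this gives 31 + 404 = 435
# bytes, padded to the 440 recorded in COMMON_HEADER. The helper is
# hypothetical, not cogent.parse.binary_sff.parse_common_header.
import struct

SKETCH_HEADER_FORMAT = '>IIQIIHHHB'  # big-endian, no alignment, 31 bytes

def sketch_unpack_common_header(buff):
    """Parse the common header at the start of the byte string `buff`."""
    size = struct.calcsize(SKETCH_HEADER_FORMAT)
    (magic, version, index_offset, index_length, number_of_reads,
     header_length, key_length, number_of_flows,
     flowgram_code) = struct.unpack(SKETCH_HEADER_FORMAT, buff[:size])
    flow_end = size + number_of_flows
    return {
        'magic_number': magic,
        'version': version,
        'index_offset': index_offset,
        'index_length': index_length,
        'number_of_reads': number_of_reads,
        'header_length': header_length,
        'key_length': key_length,
        'number_of_flows_per_read': number_of_flows,
        'flowgram_format_code': flowgram_code,
        'flow_chars': buff[size:flow_end].decode('ascii'),
        'key_sequence': buff[flow_end:flow_end + key_length].decode('ascii'),
        }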
READ_HEADER = {
'name_length': 14,
'Name': 'GA202I001ER3QL',
'clip_adapter_left': 0,
'read_header_length': 32,
'clip_adapter_right': 0,
'number_of_bases': 271,
'clip_qual_left': 5,
'clip_qual_right': 271,
}
READ_HEADER_TXT = """
>GA202I001ER3QL
Run Prefix: R_2010_01_22_13_28_56_
Region #: 1
XY Location: 1843_0859
Read Header Len: 32
Name Length: 14
# of Bases: 271
Clip Qual Left: 5
Clip Qual Right: 271
Clip Adap Left: 0
Clip Adap Right: 0
"""
READ_DATA = {
'flow_index_per_base': (
1, 2, 3, 2, 3, 3, 2, 1, 1, 2, 1, 2, 0, 2, 3, 3, 2, 3, 3, 0, 2, 0, 2, 0,
1, 1, 1, 2, 0, 2, 2, 1, 0, 0, 3, 0, 2, 1, 0, 1, 1, 3, 1, 2, 2, 2, 3, 2,
1, 0, 2, 0, 3, 0, 3, 3, 1, 3, 0, 0, 0, 0, 2, 1, 0, 2, 0, 2, 0, 2, 2, 2,
2, 3, 2, 2, 0, 1, 0, 0, 0, 2, 1, 3, 2, 0, 3, 3, 2, 1, 2, 0, 2, 2, 1, 2,
1, 2, 0, 1, 3, 0, 0, 3, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 3, 0, 2, 1, 1, 2,
1, 3, 2, 2, 1, 0, 3, 3, 0, 2, 0, 1, 1, 3, 3, 3, 2, 0, 0, 0, 3, 3, 2, 1,
1, 2, 2, 1, 1, 0, 1, 0, 2, 0, 3, 1, 1, 0, 2, 0, 0, 1, 0, 3, 2, 3, 3, 3,
1, 3, 2, 0, 1, 3, 3, 3, 1, 3, 2, 0, 1, 2, 2, 3, 3, 3, 2, 3, 3, 3, 0, 3,
3, 2, 2, 0, 3, 1, 1, 3, 0, 1, 0, 3, 2, 2, 0, 2, 0, 2, 0, 0, 2, 3, 2, 2,
0, 2, 0, 3, 2, 3, 1, 2, 0, 3, 0, 2, 2, 2, 1, 1, 2, 2, 1, 1, 0, 3, 3, 2,
0, 1, 0, 3, 0, 2, 3, 1, 1, 1, 1, 3, 1, 0, 1, 1, 2, 2, 3, 1, 0, 0, 1, 1,
3, 3, 1, 3, 0, 1, 0),
'flowgram_values': (
101, 0, 98, 3, 0, 104, 2, 95, 1, 0, 97, 3, 0, 110, 2, 102, 102, 110, 2,
99, 101, 0, 195, 5, 102, 0, 5, 96, 7, 0, 95, 7, 101, 0, 8, 98, 9, 0,
190, 9, 201, 0, 194, 101, 107, 104, 12, 198, 13, 104, 2, 105, 295, 7,
4, 197, 10, 101, 195, 98, 101, 3, 10, 100, 102, 0, 100, 7, 101, 0, 96,
8, 11, 102, 12, 102, 203, 9, 196, 8, 13, 206, 13, 6, 103, 10, 4, 103,
102, 3, 7, 479, 9, 102, 202, 10, 198, 6, 195, 9, 102, 0, 100, 5, 100,
2, 103, 8, 8, 100, 6, 102, 7, 200, 388, 10, 97, 100, 8, 5, 100, 12, 197,
7, 13, 103, 8, 7, 104, 10, 101, 104, 12, 201, 12, 99, 8, 99, 106, 13,
103, 102, 8, 202, 108, 9, 13, 293, 7, 4, 203, 103, 202, 107, 376, 103,
8, 11, 188, 8, 99, 101, 104, 8, 92, 101, 12, 4, 92, 11, 101, 7, 96, 202,
8, 12, 93, 11, 11, 202, 7, 195, 101, 102, 6, 0, 101, 7, 7, 106, 2, 6,
107, 4, 404, 12, 6, 104, 8, 10, 98, 2, 105, 110, 100, 8, 95, 3, 105,
102, 208, 201, 13, 195, 14, 0, 99, 86, 202, 9, 301, 206, 8, 8, 85, 6,
101, 6, 9, 103, 8, 9, 96, 4, 7, 102, 111, 0, 8, 93, 7, 194, 111, 5, 10,
95, 5, 10, 104, 2, 6, 98, 103, 0, 11, 99, 15, 192, 110, 5, 98, 8, 91, 8,
10, 92, 5, 10, 102, 8, 7, 105, 15, 102, 7, 9, 100, 2, 3, 102, 6, 9, 203,
6, 14, 107, 12, 8, 107, 1, 103, 13, 202, 2, 6, 108, 103, 99, 11, 2, 201,
207, 14, 8, 94, 4, 95, 9, 195, 13, 193, 9, 306, 13, 100, 11, 6, 75, 13,
91, 12, 205, 7, 203, 10, 3, 107, 17, 111, 12, 4, 105, 106, 7, 208, 5, 9,
202, 8, 108, 6, 84, 16, 103, 108, 92, 16, 93, 8, 95, 94, 207, 17, 10,
103, 3, 0, 104, 0, 202, 217, 16, 12, 197, 4, 90, 15, 17, 108, 98, 125,
104, 88, 14, 15, 99, 187, 106, 109, 12, 100, 11, 81, 8, 11, 92, 304,
112, 107, 2, 11, 94, 7, 6, 86, 97, 19, 3, 225, 206),
'Bases': (
'TCAGCAGTAGTCCTGCTGCCTTCCGTAGGAGTTTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCT'
'CTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCCCGCCTACTATCTAATGGAACGCATCCCC'
'ATCGTCTACCGGAATACCTTTAATCATGTGAACATGTGAACTCATGATGCCATCTTGTATTAATCTTCCT'
'TTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGG'),
'quality_scores': (
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 37, 37, 37, 37, 37,
37, 37, 37, 34, 34, 34, 34, 34, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
38, 32, 32, 32, 32, 38, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37,
37, 37, 37, 37, 38, 38, 38, 38, 40, 40, 40, 38, 38, 38, 38, 38, 38, 38,
40, 38, 38, 38, 38, 38, 38, 37, 38, 38, 36, 37, 37, 36, 33, 28, 28, 31,
31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 32, 32, 31, 30, 30, 25, 25, 25,
25),
}
READ_DATA_TXT = """
Flowgram: 1.01 0.00 0.98 0.03 0.00 1.04 0.02 0.95 0.01 0.00 0.97 0.03 0.00 1.10 0.02 1.02 1.02 1.10 0.02 0.99 1.01 0.00 1.95 0.05 1.02 0.00 0.05 0.96 0.07 0.00 0.95 0.07 1.01 0.00 0.08 0.98 0.09 0.00 1.90 0.09 2.01 0.00 1.94 1.01 1.07 1.04 0.12 1.98 0.13 1.04 0.02 1.05 2.95 0.07 0.04 1.97 0.10 1.01 1.95 0.98 1.01 0.03 0.10 1.00 1.02 0.00 1.00 0.07 1.01 0.00 0.96 0.08 0.11 1.02 0.12 1.02 2.03 0.09 1.96 0.08 0.13 2.06 0.13 0.06 1.03 0.10 0.04 1.03 1.02 0.03 0.07 4.79 0.09 1.02 2.02 0.10 1.98 0.06 1.95 0.09 1.02 0.00 1.00 0.05 1.00 0.02 1.03 0.08 0.08 1.00 0.06 1.02 0.07 2.00 3.88 0.10 0.97 1.00 0.08 0.05 1.00 0.12 1.97 0.07 0.13 1.03 0.08 0.07 1.04 0.10 1.01 1.04 0.12 2.01 0.12 0.99 0.08 0.99 1.06 0.13 1.03 1.02 0.08 2.02 1.08 0.09 0.13 2.93 0.07 0.04 2.03 1.03 2.02 1.07 3.76 1.03 0.08 0.11 1.88 0.08 0.99 1.01 1.04 0.08 0.92 1.01 0.12 0.04 0.92 0.11 1.01 0.07 0.96 2.02 0.08 0.12 0.93 0.11 0.11 2.02 0.07 1.95 1.01 1.02 0.06 0.00 1.01 0.07 0.07 1.06 0.02 0.06 1.07 0.04 4.04 0.12 0.06 1.04 0.08 0.10 0.98 0.02 1.05 1.10 1.00 0.08 0.95 0.03 1.05 1.02 2.08 2.01 0.13 1.95 0.14 0.00 0.99 0.86 2.02 0.09 3.01 2.06 0.08 0.08 0.85 0.06 1.01 0.06 0.09 1.03 0.08 0.09 0.96 0.04 0.07 1.02 1.11 0.00 0.08 0.93 0.07 1.94 1.11 0.05 0.10 0.95 0.05 0.10 1.04 0.02 0.06 0.98 1.03 0.00 0.11 0.99 0.15 1.92 1.10 0.05 0.98 0.08 0.91 0.08 0.10 0.92 0.05 0.10 1.02 0.08 0.07 1.05 0.15 1.02 0.07 0.09 1.00 0.02 0.03 1.02 0.06 0.09 2.03 0.06 0.14 1.07 0.12 0.08 1.07 0.01 1.03 0.13 2.02 0.02 0.06 1.08 1.03 0.99 0.11 0.02 2.01 2.07 0.14 0.08 0.94 0.04 0.95 0.09 1.95 0.13 1.93 0.09 3.06 0.13 1.00 0.11 0.06 0.75 0.13 0.91 0.12 2.05 0.07 2.03 0.10 0.03 1.07 0.17 1.11 0.12 0.04 1.05 1.06 0.07 2.08 0.05 0.09 2.02 0.08 1.08 0.06 0.84 0.16 1.03 1.08 0.92 0.16 0.93 0.08 0.95 0.94 2.07 0.17 0.10 1.03 0.03 0.00 1.04 0.00 2.02 2.17 0.16 0.12 1.97 0.04 0.90 0.15 0.17 1.08 0.98 1.25 1.04 0.88 0.14 0.15 0.99 1.87 1.06 1.09 0.12 1.00 0.11 0.81 0.08 0.11 0.92 3.04 1.12 1.07 0.02 0.11 0.94 0.07 0.06 0.86 0.97 0.19 0.03 
2.25 2.06
Flow Indexes: 1 3 6 8 11 14 16 17 18 20 21 23 23 25 28 31 33 36 39 39 41 41 43 43 44 45 46 48 48 50 52 53 53 53 56 56 58 59 59 60 61 64 65 67 69 71 74 76 77 77 79 79 82 82 85 88 89 92 92 92 92 92 94 95 95 97 97 99 99 101 103 105 107 110 112 114 114 115 115 115 115 117 118 121 123 123 126 129 131 132 134 134 136 138 139 141 142 144 144 145 148 148 148 151 151 152 153 153 154 155 155 155 155 156 159 159 161 162 163 165 166 169 171 173 174 174 177 180 180 182 182 183 184 187 190 193 195 195 195 195 198 201 203 204 205 207 209 210 211 211 212 212 214 214 217 218 219 219 221 221 221 222 222 225 227 230 233 236 237 240 242 242 243 246 249 252 253 256 258 258 259 261 263 266 269 272 274 277 280 283 283 286 289 291 293 293 296 297 298 301 301 302 302 305 307 309 309 311 311 313 313 313 315 318 320 322 322 324 324 327 329 332 333 335 335 338 338 340 342 344 345 346 348 350 351 352 352 355 358 360 360 361 361 364 364 366 369 370 371 372 373 376 377 377 378 379 381 383 386 387 387 387 388 389 392 395 396 399 399 400 400
Bases: tcagCAGTAGTCCTGCTGCCTTCCGTAGGAGTTTGGACCGTGTCTCAGTTCCAATGTGGGGGACCTTCCTCTCAGAACCCCTATCCATCGAAGACTAGGTGGGCCGTTACCCCGCCTACTATCTAATGGAACGCATCCCCATCGTCTACCGGAATACCTTTAATCATGTGAACATGTGAACTCATGATGCCATCTTGTATTAATCTTCCTTTCAGAAGGCTGTCCAAGAGTAGACGGCAGGTTGGATACGTGTTACTCACCCGTGCGCCGG
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 34 34 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 32 32 32 32 38 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 38 40 40 40 38 38 38 38 38 38 38 40 38 38 38 38 38 38 37 38 38 36 37 37 36 33 28 28 31 31 31 31 31 31 31 31 31 31 31 32 32 31 30 30 25 25 25 25
"""
if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_parse/test_blast.py
#!/usr/bin/env python
"""Tests of BLAST parser.
"""
from string import split, strip
from cogent.util.unit_test import TestCase, main
from cogent.parse.blast import iter_finder, query_finder, iteration_set_finder,\
is_blast_junk, is_blat_junk, make_label, PsiBlastQueryFinder, \
TableToValues, \
PsiBlastTableParser, PsiBlastFinder, GenericBlastParser9, \
PsiBlastParser9, LastProteinIds9, QMEBlast9, QMEPsiBlast9, \
fastacmd_taxonomy_splitter, FastacmdTaxonomyParser
__author__ = "Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Micah Hamady", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Micah Hamady"
__email__ = "hamady@colorado.edu"
__status__ = "Production"
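# A minimal sketch of turning the '# Key: value' comment lines of BLAST
# tabular output (format 9) into (KEY, value) pairs, written to reproduce
# what test_make_label asserts: the key is upper-cased, only the first ':'
# separates key from value, the iteration number becomes an int, and lines
# without a leading '#' or without a ':' raise ValueError. It is a
# hypothetical stand-in, not cogent.parse.blast.make_label.
def sketch_make_label(line):
    if not line.startswith('#'):
        raise ValueError("Comment line must start with '#': %r" % line)
    key, sep, value = line[1:].partition(':')
    if not sep:
        raise ValueError("Comment line has no ':' separator: %r" % line)
    key, value = key.strip().upper(), value.strip()
    if key == 'ITERATION':
        value = int(value)  # iteration numbers are converted to int
    return key, value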
class BlastTests(TestCase):
"""Tests of top-level functions"""
def setUp(self):
"""Define some standard data"""
self.rec = """# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 2
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0
ece:Z4181 sfl:CP0138 33.98 103 57 2 8 110 6 97 6e-06 50.5
ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8
ece:Z4181 sec:SC2804 37.50 72 45 0 39 110 30 101 1e-05 49.8
ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8""".split('\n')
self.rec2 = """# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-06 52.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 2
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0
ece:Z4181 sfl:CP0138 33.98 103 57 2 8 110 6 97 6e-06 50.5
ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8
ece:Z4181 sec:SC2804 37.50 72 45 0 39 110 30 101 1e-05 49.8
ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4182
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 ecs:ECs3718 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 cvi:CV2422 41.67 72 42 0 39 110 29 100 2e-06 52.8""".split('\n')
self.rec3 = """# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ece:Z4181 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4181 spt:SPA2730 37.50 72 45 0 39 110 30 101 1e-05 49.8
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 2
# Query: ece:Z4181
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4181 ecs:ECs3717 100.00 110 0 0 1 110 1 110 3e-54 211
ece:Z4181 cvi:CV2421 41.67 72 42 0 39 110 29 100 2e-08 59.0
# BLASTP 2.2.10 [Oct-19-2004]
# Iteration: 1
# Query: ece:Z4182
# Database: db/everything.faa
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
ece:Z4182 ece:Z4182 100.00 110 0 0 1 110 1 110 3e-47 187
ece:Z4182 cvi:CV2422 41.67 72 42 0 39 110 29 100 2e-06 52.8""".split('\n')
def test_iter_finder(self):
"""iter_finder should split on lines starting with '# Iteration:'"""
lines = 'abc\n# Iteration: 3\ndef'.split('\n')
self.assertEqual(map(iter_finder,lines), [False, True, False])
def test_query_finder(self):
"""query_finder should split on lines starting with '# Query:'"""
lines = 'abc\n# Query: dfdsffsd\ndef'.split('\n')
self.assertEqual(map(query_finder,lines), [False, True, False])
def test_iteration_set_finder(self):
"""iteration_set_finder should flag the '# Iteration: 1' line that starts a new set"""
lines = 'abc\n# Iteration: 3\ndef\n# Iteration: 1'.split('\n')
self.assertEqual(map(iteration_set_finder,lines), \
[False, False, False, True])
def test_is_junk(self):
"""is_junk should reject an assortment of invalid lines"""
#Note: testing two functions that call it instead of function itself
lines = 'abc\n# BLAST blah blah\n \n# BLAT blah\n123'.split('\n')
self.assertEqual(map(is_blast_junk, lines), \
[False, True, True, False, False])
self.assertEqual(map(is_blat_junk, lines), \
[False, False, True, True, False])
def test_make_label(self):
"""make_label should turn comment lines into (key, val) pairs"""
a = 'this test will fail: no # at start'
b = '#this test will fail because no colon'
c = '# Iteration: 1'
d = '# Query: ece:Z4147 ygdP; putative invasion protein [EC:3.6.1.-]'
e = '#Iteration: 1' #no space after the hash
self.assertRaises(ValueError, make_label, a)
self.assertRaises(ValueError, make_label, b)
#Note that we _do_ convert the data type of known values, so the
#value of the iteration will be 1, not '1'
self.assertEqual(make_label(c), ('ITERATION', 1))
self.assertEqual(make_label(d), ('QUERY', \
'ece:Z4147 ygdP; putative invasion protein [EC:3.6.1.-]'))
self.assertEqual(make_label(e), ('ITERATION', 1))
def test_TableToValues(self):
"""TableToValues should convert itself into the correct type."""
constructors = {'a':int, 'b':float, 'c':str}
table=[['c','b','a','d'], ['1.5', '3.5', '2', '2.5'],['1','2','3','4']]
self.assertEqual(TableToValues(table, constructors), \
([['1.5',3.5,2,'2.5'],['1',2.0,3,'4']], ['c','b','a','d']))
#check that it works with supplied header
self.assertEqual(TableToValues(table[1:], constructors, list('cbad')), \
([['1.5',3.5,2,'2.5'],['1',2.0,3,'4']], ['c','b','a','d']))
def test_PsiBlastTableParser(self):
"""PsiBlastTableParser should wrap values in table."""
fields = map(strip,
'Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score'.split(','))
table = map(split, """ece:Z4147 ece:Z4147 100.00 176 0 0 1 176 1 176 2e-89 328
ece:Z4147 ecs:ECs3687 100.00 176 0 0 1 176 1 176 2e-89 328
ece:Z4147 ecc:c3425 100.00 176 0 0 1 176 1 176 2e-89 328
ece:Z4147 sfl:SF2840 100.00 176 0 0 1 176 1 176 2e-89 328""".split('\n'))
headed_table = [fields] + table
new_table, new_fields = PsiBlastTableParser(headed_table)
self.assertEqual(new_fields, fields)
self.assertEqual(len(new_table), 4)
self.assertEqual(new_table[1], ['ece:Z4147', 'ecs:ECs3687', 100.0, \
176, 0, 0, 1, 176, 1, 176, 2e-89, 328])
def test_GenericBlastParser9(self):
"""GenericBlastParser9 should read blast's tabular format (#9)."""
rec = self.rec
p = GenericBlastParser9(rec, PsiBlastFinder)
result = list(p)
self.assertEqual(len(result), 2)
first, second = result
self.assertEqual(first[0], {'ITERATION':1,'QUERY':'ece:Z4181',\
'DATABASE':'db/everything.faa', 'FIELDS':'Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score'})
self.assertEqual(len(first[1]), 3)
self.assertEqual(second[0]['ITERATION'], 2)
self.assertEqual(len(second[1]), 7)
self.assertEqual(second[1][-1], \
'ece:Z4181 stm:STM2872 37.50 72 45 0 39 110 30 101 1e-05 49.8'.split())
def test_PsiBlastParser9(self):
"""PsiBlastParser9 should provide convenient results for format #9."""
result = PsiBlastParser9(self.rec2)
self.assertEqual(len(result), 2)
assert 'ece:Z4181' in result
assert 'ece:Z4182' in result
first = result['ece:Z4181']
second = result['ece:Z4182']
self.assertEqual(len(first), 2)
self.assertEqual(len(second), 1)
iter_1 = first[0]
iter_2 = first[1]
self.assertEqual(len(iter_1), 3)
self.assertEqual(len(iter_2), 7)
iter_1_2 = second[0]
self.assertEqual(len(iter_1_2), 3)
self.assertEqual(len(result['ece:Z4181'][1][3]), 12)
self.assertEqual(result['ece:Z4181'][1][3]['ALIGNMENT LENGTH'], 103)
def test_LastProteinIds9(self):
"""LastProteinIds9 should give last protein ids in iter"""
result = LastProteinIds9(self.rec)
self.assertEqual(result, ['ece:Z4181', 'ecs:ECs3717', 'cvi:CV2421',\
'sfl:CP0138', 'spt:SPA2730', 'sec:SC2804', 'stm:STM2872'])
#should also work if threshold set
result = LastProteinIds9(self.rec, False, threshold=8e-6)
self.assertEqual(result, ['ece:Z4181', 'ecs:ECs3717', 'cvi:CV2421',\
'sfl:CP0138'])
#should work on multiple records
result = map(LastProteinIds9, PsiBlastQueryFinder(self.rec2))
self.assertEqual(len(result), 2)
self.assertEqual(result[0], ['ece:Z4181', 'ecs:ECs3717', 'cvi:CV2421',\
'sfl:CP0138', 'spt:SPA2730', 'sec:SC2804', 'stm:STM2872'])
self.assertEqual(result[1], ['ece:Z4182','ecs:ECs3718','cvi:CV2422'])
def test_QMEBlast9(self):
"""QMEBlast9 should return expected lines from all iterations"""
self.assertFloatEqual(QMEBlast9(self.rec3), [\
('ece:Z4181','ece:Z4181',3e-47),
('ece:Z4181','ecs:ECs3717',3e-47),
('ece:Z4181','spt:SPA2730', 1e-5),
('ece:Z4181','ecs:ECs3717',3e-54), #WARNING: allows duplicates
('ece:Z4181','cvi:CV2421',2e-8),
('ece:Z4182','ece:Z4182',3e-47),
('ece:Z4182','cvi:CV2422',2e-6),
])
def test_QMEPsiBlast9(self):
"""QMEPsiBlast9 should only return items from last iterations"""
self.assertFloatEqual(QMEPsiBlast9(self.rec3), [\
('ece:Z4181','ecs:ECs3717',3e-54),
('ece:Z4181','cvi:CV2421',2e-8),
('ece:Z4182','ece:Z4182',3e-47),
('ece:Z4182','cvi:CV2422',2e-6),
])
def test_fastacmd_taxonomy_splitter(self):
"""fastacmd_taxonomy_splitter should split records into groups"""
text = """NCBI sequence id: gi|3021565|emb|AJ223314.1|PSAJ3314
NCBI taxonomy id: 3349
Common name: Scots pine
Scientific name: Pinus sylvestris
NCBI sequence id: gi|37777029|dbj|AB108787.1|
NCBI taxonomy id: 228610
Common name: cf. Acremonium sp. KR21-2
Scientific name: cf. Acremonium sp. KR21-2
""".splitlines()
recs = list(fastacmd_taxonomy_splitter(text))
self.assertEqual(len(recs), 2)
self.assertEqual(recs[0], text[:5]) #includes trailing blank
def test_FastaCmdTaxonomyParser(self):
"""FastacmdTaxonomyParser should parse each taxonomy record to a dict"""
text = """NCBI sequence id: gi|3021565|emb|AJ223314.1|PSAJ3314
NCBI taxonomy id: 3349
Common name: Scots pine
Scientific name: Pinus sylvestris
NCBI sequence id: gi|37777029|dbj|AB108787.1|
NCBI taxonomy id: 228610
Common name: cf. Acremonium sp. KR21-2
Scientific name: cf. Acremonium sp. KR21-2
""".splitlines()
recs = list(FastacmdTaxonomyParser(text))
self.assertEqual(len(recs), 2)
for r in recs:
self.assertEqual(sorted(r.keys()), ['common_name','scientific_name',
'seq_id', 'tax_id'])
r0, r1 = recs
self.assertEqual(r0['tax_id'], '3349')
self.assertEqual(r0['common_name'], 'Scots pine')
self.assertEqual(r0['scientific_name'], 'Pinus sylvestris')
self.assertEqual(r0['seq_id'], 'gi|3021565|emb|AJ223314.1|PSAJ3314')
self.assertEqual(r1['tax_id'], '228610')
if __name__ == "__main__":
main()
#! /usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_blast_xml.py
#
# test_blast_xml.py
#
__author__ = "Kristian Rother"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Micah Hamady"]
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kristian Rother"
__email__ = "krother@rubor.de"
__status__ = "Prototype"
from cogent.util.unit_test import main, TestCase
from cogent.parse.blast_xml import BlastXMLResult, MinimalBlastParser7,\
get_tag, parse_hsp, parse_hit, parse_header, parse_parameters,\
HSP_XML_FIELDNAMES, HIT_XML_FIELDNAMES
import xml.dom.minidom
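
# Illustrative sketch (hypothetical helper, NOT cogent.parse.blast_xml.get_tag):
# one way to obtain the behavior GetTagTests checks using only
# xml.dom.minidom. It returns the text of the first element called `name`,
# or `default` when that element is missing or empty. Input that is not a
# DOM node (None, a plain string) has no getElementsByTagName attribute, so
# it raises AttributeError, as test_get_tag_fail expects of the real function.
import xml.dom.minidom

def sketch_get_tag(node, name, default=None):
    """Return the text of the first <name> element under node, or default."""
    elements = node.getElementsByTagName(name)
    if not elements or elements[0].firstChild is None:
        return default
    return elements[0].firstChild.data

# Example usage:
# doc = xml.dom.minidom.parseString('<outer>bla <inner>content</inner> bla</outer>')
# sketch_get_tag(doc, 'inner')  -> 'content'
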
class GetTagTests(TestCase):
"""Tests for the auxiliary function evaluating the tag objects."""
def setUp(self):
self.single_tag = xml.dom.minidom.parseString(\
"bla content bla ")
self.double_tag = xml.dom.minidom.parseString(\
"first content second content ")
self.empty_tag = xml.dom.minidom.parseString(" ")
def test_get_tag_works(self):
self.assertEqual(get_tag(self.single_tag,'inner'),'content')
self.assertEqual(get_tag(self.double_tag,'inner'),'first content')
self.assertEqual(get_tag(self.empty_tag,'inner'),None)
self.assertEqual(get_tag(self.empty_tag,'inner', 'blue elephant'),\
'blue elephant')
self.assertEqual(get_tag(self.single_tag,'non-existing tag'),None)
self.assertEqual(get_tag(self.single_tag,'non-existing tag',\
'pink elephant'),'pink elephant')
self.assertEqual(get_tag(self.single_tag,'inner'),'content')
def test_get_tag_fail(self):
"""Make sure the tag and name parameters are of the proper types."""
self.assertRaises(AttributeError, get_tag,None,"h1")
self.assertRaises(AttributeError, get_tag,\
"This is not a XML tag object ","h1")
class MinimalBlastParser7Tests(TestCase):
"""Tests for the functions required by the Blast XML parsers."""
def setUp(self):
self.hit1 = xml.dom.minidom.parseString(HIT_WITH_ONE_HSP)
self.hit2 = xml.dom.minidom.parseString(HIT_WITH_TWO_HSPS)
self.hsp1 = xml.dom.minidom.parseString(HSP_ONE)
self.hsp2 = xml.dom.minidom.parseString(HSP_TWO)
self.hsp_gaps = xml.dom.minidom.parseString(HSP_WITH_GAPS)
self.param = xml.dom.minidom.parseString(PARAM_XML)
self.header = xml.dom.minidom.parseString(HEADER_XML)
self.complete = xml.dom.minidom.parseString(HEADER_COMPLETE)
def test_parse_header(self):
"""Fields from XML header tag should be available as dict."""
data = parse_header(self.header)
self.assertEqual(data.get('application'), 'my Grandma')
self.assertEqual(data.get('version'), 'has')
self.assertEqual(data.get('reference'), 'furry')
self.assertEqual(data.get('query_letters'), 27)
self.assertEqual(data.get('database'), 'Cats')
def test_parse_parameters(self):
"""Fields from XML parameter tag should be available as dict."""
data = parse_parameters(self.param)
self.assertEqual(data.get('matrix'), 'BLOSUM62')
self.assertEqual(data.get('expect'), '10')
self.assertEqual(data.get('gap_open_penalty'), 11.1)
self.assertEqual(data.get('gap_extend_penalty'), 22.2)
self.assertEqual(data.get('filter'), 'F')
def test_parse_header_complete(self):
"""Fields from header+param tag should be available as dict."""
# try to process header with parameters etc in the XML
data = parse_header(self.complete)
self.assertEqual(data.get('database'), 'Cats')
self.assertEqual(data.get('matrix'), 'BLOSUM62')
def test_parse_hit(self):
"""Should return a list with all values for a hit+hsp."""
data = parse_hit(self.hit1)
self.assertEqual(len(data),1)
d = dict(zip(HIT_XML_FIELDNAMES,data[0]))
self.assertEqual(d['SUBJECT_ID'],"gi|148670104|gb|EDL02051.1|")
self.assertEqual(d['HIT_DEF'],
"insulin-like growth factor 2 receptor, isoform CRA_c [Mus musculus]")
self.assertEqual(d['HIT_ACCESSION'],"2001")
self.assertEqual(int(d['HIT_LENGTH']),707)
# check hit with more HSPs
data = parse_hit(self.hit2)
self.assertEqual(len(data),2)
self.assertNotEqual(data[0],data[1])
def test_parse_hsp(self):
"""Should return a list with all values for an HSP."""
data = parse_hsp(self.hsp1)
d = dict(zip(HSP_XML_FIELDNAMES,data))
self.assertEqual(float(d['BIT_SCORE']),1023.46)
self.assertEqual(float(d['SCORE']),2645)
self.assertEqual(float(d['E_VALUE']),0.333)
self.assertEqual(int(d['QUERY_START']),4)
self.assertEqual(int(d['QUERY_END']),18)
self.assertEqual(int(d['SUBJECT_START']),5)
self.assertEqual(int(d['SUBJECT_END']),19)
self.assertEqual(int(d['GAP_OPENINGS']),0)
self.assertEqual(int(d['ALIGNMENT_LENGTH']),14)
self.assertEqual(d['QUERY_ALIGN'],'ELEPHANTTHISISAHITTIGER')
self.assertEqual(d['MIDLINE_ALIGN'],'ORCA-WHALE')
self.assertEqual(d['SUBJECT_ALIGN'],'SEALSTHIS---HIT--GER')
class BlastXmlResultTests(TestCase):
"""Tests parsing of output of Blast with output mode 7 (XML)."""
def setUp(self):
self.result = BlastXMLResult(COMPLETE_XML,xml=True)
def test_options(self):
"""Constructor should take parser as an option."""
result = BlastXMLResult(COMPLETE_XML,parser=MinimalBlastParser7)
self.assertEqual(len(result.keys()),1)
# make sure the normal Blast parser still works after the code merge!
def test_parsed_query_sequence(self):
"""The result dict should have one query sequence as a key."""
# The full query sequence is not given in the XML file.
# Thus it is not checked explicitly, only whether there is
# exactly one found.
self.assertEqual(len(self.result.keys()),1)
def test_parsed_iterations(self):
"""The result should have the right number of iterations."""
n_iter = 0
for query_id,hits in self.result.iterHitsByQuery():
n_iter += 1
self.assertEqual(n_iter,1)
def test_parsed_hsps(self):
"""The result should have the right number of hsps."""
n_hsps = 0
for query_id,hsps in self.result.iterHitsByQuery():
n_hsps += len(hsps)
self.assertEqual(n_hsps,3)
def test_parse_hit_details(self):
"""The result should have data from hit fields."""
for query in self.result:
first_hsp = self.result[query][0][0]
self.assertEqual(first_hsp['SUBJECT_ID'],
"gi|148670104|gb|EDL02051.1|")
self.assertEqual(first_hsp['HIT_DEF'],
"insulin-like growth factor 2 receptor, isoform CRA_c [Mus musculus]")
self.assertEqual(first_hsp['HIT_ACCESSION'],"2001")
self.assertEqual(first_hsp['HIT_LENGTH'],707)
def test_parse_hsp_details(self):
"""The result should have data from hsp fields."""
for query in self.result:
# should check integers in next version.
first_hsp = self.result[query][0][0]
self.assertEqual(first_hsp['QUERY ID'],1)
self.assertEqual(first_hsp['BIT_SCORE'],'1023.46')
self.assertEqual(first_hsp['SCORE'],'2645')
self.assertEqual(first_hsp['E_VALUE'],'0.333')
self.assertEqual(first_hsp['QUERY_START'],'4')
self.assertEqual(first_hsp['QUERY_END'],'18')
self.assertEqual(first_hsp['QUERY_ALIGN'],'ELEPHANTTHISISAHITTIGER')
self.assertEqual(first_hsp['MIDLINE_ALIGN'],'ORCA-WHALE')
self.assertEqual(first_hsp['SUBJECT_ALIGN'],'SEALSTHIS---HIT--GER')
self.assertEqual(first_hsp['SUBJECT_START'],'5')
self.assertEqual(first_hsp['SUBJECT_END'],'19')
self.assertEqual(first_hsp['PERCENT_IDENTITY'],'55')
self.assertEqual(first_hsp['POSITIVE'],'555')
self.assertEqual(first_hsp['GAP_OPENINGS'],0)
self.assertEqual(first_hsp['ALIGNMENT_LENGTH'],'14')
gap_hsp = self.result[query][0][1]
self.assertEqual(gap_hsp['GAP_OPENINGS'],'33')
HSP_XML = """
1
1023.46
2645
0.333
4
18
5
19
1
1
55
%s
555
14
ELEPHANTTHISISAHITTIGER
SEALSTHIS---HIT--GER
ORCA-WHALE
"""
HSP_ONE = HSP_XML%''
HSP_WITH_GAPS = HSP_XML%'33 '
HSP_TWO = """
2
1023.46
2645
0.333
6
22
5
23
1
1
55
%s
555
18
EPHANT---THISISAHIT-TIGER
ALSWWWTHIS---HITW--GER
ORCA-WHALE
"""
HIT_XML = """
1
gi|148670104|gb|EDL02051.1|
insulin-like growth factor 2 receptor, isoform CRA_c [Mus musculus]
2001
707
%s
"""
HIT_WITH_ONE_HSP = HIT_XML%HSP_ONE
HIT_WITH_TWO_HSPS = HIT_XML%(HSP_WITH_GAPS+HSP_TWO)
PARAM_XML = """
BLOSUM62
10
11.1
22.2
F
"""
HEADER_XML = """
my Grandma
has
Cats
furry
27
%s
"""
HIT_PREFIX = """
"""
HIT_SUFFIX = """
"""
HEADER_COMPLETE=HEADER_XML%(PARAM_XML+HIT_PREFIX+HIT_WITH_ONE_HSP+\
HIT_WITH_TWO_HSPS+HIT_SUFFIX)
COMPLETE_XML = """
"""+HEADER_COMPLETE
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_bowtie.py
"""Unit tests for the bowtie default output parser.
Compatible with bowtie version 0.12.5
"""
from cogent.parse.bowtie import BowtieOutputParser, BowtieToTable
from cogent.util.unit_test import TestCase, main
from cogent import LoadTable
__author__ = "Gavin Huttley, Anuj Pahwa"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley", "Anuj Pahwa"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Development"
fname = 'data/bowtie_output.map'
expected = [['GAPC_0015:6:1:1283:11957#0/1', '-', 'Mus', 66047927, 'TGTATATATAAACATATATGGAAACTGAATATATATACATTATGTATGTATATATGTATATGTTATATATACATA', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 0, ['55:A>G', '64:C>A']],
['GAPC_0015:6:1:1394:18813#0/1', '+', 'Mus', 77785518, 'ATGAAATTCCTAGCCAAATGGATGGACCTGGAGGGCATCATC', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 447, []],
['GAPC_0015:6:1:1560:18056#0/1', '+', 'Mus', 178806665, 'TAGATAAAGGCTCTGTTTTTCATCATTGAGAAATTGTTATTTTTCTGATGTTATA', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 0, ['9:T>G']],
['GAPC_0015:6:1:1565:19849#0/1', '+', 'Mus', 116516430, 'ACCATTTGCTTGGAAAATTGTTTTCCAGCCTTTCACTCTGAG', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 141, []],
['GAPC_0015:6:1:1591:17397#0/1', '-', 'Mus', 120440696, 'TCTAAATCTGTTCATTAATTAAGCCTGTTTCCATGTCCTTGGTCTTAAGACCAATCTGTTATGCGGGTGTGA', 'IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII', 0, ['70:A>C', '71:G>T']]]
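
# Illustrative sketch (hypothetical helper, NOT cogent.parse.bowtie): how a
# single line of bowtie's default output could be turned into a row shaped
# like the entries of `expected` above. It ASSUMES the eight tab-separated
# columns are: read name, strand, reference name, 0-based offset, read
# sequence, read qualities, number of other alignments, and comma-separated
# mismatch descriptors. The offset and count become ints and the mismatches
# a list; the real BowtieOutputParser can instead keep everything as strings
# (row_converter=None), as test_no_row_converter checks.
def sketch_parse_bowtie_line(line):
    """Split one bowtie output line into a typed row."""
    fields = line.rstrip('\n').split('\t')
    # pad so a line without a mismatch column still unpacks cleanly
    fields += [''] * (8 - len(fields))
    name, strand, ref, offset, seq, qual, others, mismatches = fields[:8]
    return [name, strand, ref, int(offset), seq, qual, int(others),
            mismatches.split(',') if mismatches else []]
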
class BowtieOutputTest(TestCase):
def test_parsing(self):
"""make sure that the bowtie output file is parsed properly"""
parser = BowtieOutputParser(fname)
header = parser.next()
index = 0
for row in parser:
self.assertEqual(row, expected[index])
index += 1
def test_bowtie_to_table(self):
"""make sure that the table is built without any errors"""
table = BowtieToTable(fname)
def test_getting_seq_coords(self):
"""get correct information from the table"""
table = BowtieToTable(fname)
index = 0
for row in table:
query_name = row['Query Name']
strand_direction = row['Strand Direction']
query_offset = row['Offset']
self.assertEqual(query_name, expected[index][0])
self.assertEqual(strand_direction, expected[index][1])
self.assertEqual(query_offset, expected[index][3])
index += 1
def test_no_row_converter(self):
"""setting row_converter=None returns strings"""
# straight parser
parser = BowtieOutputParser(fname, row_converter=None)
header = parser.next()
for index, row in enumerate(parser):
query_offset = row[3]
other_matches = row[6]
self.assertEqual(query_offset, str(expected[index][3]))
self.assertEqual(other_matches, str(expected[index][6]))
# table
table = BowtieToTable(fname, row_converter=None)
for index, row in enumerate(table):
query_offset = row['Offset']
other_matches = row['Other Matches']
self.assertEqual(query_offset, str(expected[index][3]))
self.assertEqual(other_matches, str(expected[index][6]))
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_bpseq.py
"""Provides Tests for BpseqParser and related functions.
"""
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.struct.knots import inc_order
from cogent.parse.bpseq import BpseqParseError, construct_sequence,\
parse_header, parse_residues, MinimalBpseqParser, BpseqParser,\
bpseq_specify_output
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
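
# Illustrative sketch (hypothetical helper, NOT cogent.parse.bpseq): the core
# of what parse_residues is tested for. Each residue line is
# "<index> <residue> <partner>"; indices start at num_base, a partner equal
# to unpaired_symbol means unpaired, and pairs come back 0-based as sorted
# (i, j) tuples with i < j. Unlike the real parser it silently skips header
# and unknown lines and does NOT raise BpseqParseError on missing indices or
# conflicting pairs.
def sketch_bpseq_pairs(lines, num_base=1, unpaired_symbol='0'):
    residues, pairs = {}, set()
    for line in lines:
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # header or unrecognised line
        index, residue, partner = parts
        i = int(index) - num_base
        residues[i] = residue
        if partner != unpaired_symbol:
            j = int(partner) - num_base
            # both partners list each other; the set keeps one copy
            pairs.add((min(i, j), max(i, j)))
    seq = ''.join(residues[k] for k in sorted(residues))
    return seq, sorted(pairs)
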
class BpseqParserTests(TestCase):
"""Provides tests for BpseqParser and related functions"""
def test_parse_header(self):
"""parse_header: should work on standard header"""
h1 = ['Filename: d.16.b.E.coli.bpseq','Organism: Escherichia coli',\
'Accession Number: J01695', 'Citation and related information'+\
' available at http://www.rna.icmb.utexas.edu']
self.assertEqual(parse_header(h1),{'Filename':'d.16.b.E.coli.bpseq',\
'Accession Number': 'J01695', 'Organism': 'Escherichia coli',\
'Refs': {},'Citation':'http://www.rna.icmb.utexas.edu'})
assert isinstance(parse_header(h1), Info)
# lines without ':' are skipped
h2 = ['Filename: d.16.b.E.coli.bpseq','Organism: Escherichia coli',\
'Accession Number: J01695', 'Remark this is an interesting seq']
exp = {'Filename':'d.16.b.E.coli.bpseq', 'Refs': {},\
'Organism': 'Escherichia coli', 'Accession Number':'J01695'}
self.assertEqual(parse_header(h2),exp)
def test_construct_sequence(self):
"""construct_sequence: should return correct sequence or raise error
"""
d = {0:'A',1:'C',2:'G',3:'U'}
self.assertEqual(construct_sequence(d),'ACGU')
# doesn't check residue identity
d = {0:'A',1:'-',2:'R',3:'U'}
self.assertEqual(construct_sequence(d),'A-RU')
# error when sequence isn't continuous
d = {0:'A',1:'C',2:'G',5:'U'}
self.assertRaises(BpseqParseError, construct_sequence, d)
# error when first index is not zero
d = {1:'C',2:'G',3:'U',4:'A'}
self.assertRaises(BpseqParseError, construct_sequence, d)
def test_parse_residues(self):
"""parse_residues: should work on valid data
"""
lines = RES_LINES.split('\n')
exp_seq = 'UGGUAAUACGUUGCGAAGCC'
exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)]
self.assertEqual(parse_residues(lines, num_base=1,\
unpaired_symbol='0'), (exp_seq, exp_pairs))
def test_parse_residues_errors(self):
"""parse_residues: should raise BpseqParseErrors in several cases
"""
not_all_lines = RES_LINES_NOT_ALL.split('\n')
wrong_lines = RES_LINES_WRONG.split('\n')
conflict_lines = RES_LINES_CONFLICT.split('\n')
bp_conflict = RES_LINES_BP_CONFLICT.split('\n')
self.assertRaises(BpseqParseError, parse_residues, not_all_lines,\
num_base=1, unpaired_symbol='0')
self.assertRaises(BpseqParseError, parse_residues, wrong_lines,\
num_base=1, unpaired_symbol='0')
self.assertRaises(BpseqParseError, parse_residues, conflict_lines,\
num_base=1, unpaired_symbol='0')
self.assertRaises(BpseqParseError, parse_residues, bp_conflict,\
num_base=1, unpaired_symbol='0')
def test_parse_residues_diff_base(self):
"""parse_residues: should work with diff base and unpaired_symbol"""
lines = RES_LINES_DIFF_BASE.split('\n')
exp_seq = 'CAGACU'
exp_pairs = [(1,5),(2,4)]
obs = parse_residues(lines, num_base=3, unpaired_symbol='xxx')
self.assertEqual(obs, (exp_seq, exp_pairs))
def test_MinimalBpseqParser(self):
"""MinimalBpseqParser: should separate lines correctly"""
lines = ['Accesion: J01234', 'LABEL : label', '1 U 4', '2 A 10', 'xx',\
'A B C D E']
exp = {'HEADER': ['Accesion: J01234', 'LABEL : label'],\
'SEQ_STRUCT': ['1 U 4', '2 A 10']}
self.assertEqual(MinimalBpseqParser(lines), exp)
def test_BpseqParser(self):
"""BpseqParser: should work on valid data, returning Vienna or Pairs
"""
lines = RES_LINES_W_HEADER.split('\n')
exp_seq = 'UGGUAAUACGUUGCGAAGCC'
exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)]
self.assertEqual(BpseqParser(lines),(exp_seq, exp_pairs))
self.assertEqual(BpseqParser(lines)[0].Info,\
{'Filename':'d.16.b.E.coli.bpseq',\
'Accession Number': 'J01695', 'Organism': 'Escherichia coli',\
'Refs': {},'Citation':'http://www.rna.icmb.utexas.edu'})
# should work with different base
lines = RES_LINES_DIFF_BASE.split('\n')
exp_seq = 'CAGACU'
exp_pairs = [(1,5),(2,4)]
obs_seq, obs_pairs = BpseqParser(lines, num_base=3,\
unpaired_symbol='xxx')
self.assertEqual(obs_seq, exp_seq)
self.assertEqual(obs_seq.Info, {'Refs':{}})
self.assertEqual(obs_pairs, exp_pairs)
def test_BpseqParser_errors(self):
"""BpseqParser: should skip lines in unknown format"""
exp_seq = 'UGGUAAUACGUUGCGAAGCC'
exp_vienna_m = '....(((..)))((...)).'
exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)]
#skips lines in unknown format
lines = RES_LINES_UNKNOWN.split('\n')
obs_seq, obs_pairs = BpseqParser(lines)
self.assertEqual(obs_seq, exp_seq)
self.assertEqual(obs_pairs, exp_pairs)
self.assertEqual(obs_seq.Info,\
{'Filename':'d.16.b.E.coli.bpseq',\
'Accession Number': 'J01695', 'Organism': 'Escherichia coli',\
'Refs': {},'Citation':'http://www.rna.icmb.utexas.edu'})
class ConvenienceFunctionTests(TestCase):
"""Tests for convenience functions"""
def test_bpseq_specify_output(self):
"""bpseq_specify_output: different return values"""
f = bpseq_specify_output
lines = RES_LINES_W_HEADER.split('\n')
exp_seq = 'UGGUAAUACGUUGCGAAGCC'
exp_pairs = [(2,8),(3,7),(4,11),(5,10),(6,9),(12,18),(13,17)]
exp_pairs_majority = [(4,11),(5,10),(6,9),(12,18),(13,17)]
exp_pairs_first = [(2,8),(3,7),(12,18),(13,17)]
exp_vienna_majority = '....(((..)))((...)).'
self.assertEqual(f(lines),(exp_seq, exp_pairs))
self.assertEqual(f(lines, remove_pseudo=True),\
(exp_seq, exp_pairs_majority))
self.assertEqual(f(lines, remove_pseudo=True, pseudoknot_function=inc_order),\
(exp_seq, exp_pairs_first))
self.assertEqual(f(lines, return_vienna=True),\
(exp_seq, exp_vienna_majority))
RES_LINES=\
"""1 U 0
2 G 0
3 G 9
4 U 8
5 A 12
6 A 11
7 U 10
8 A 4
9 C 3
10 G 7
11 U 6
12 U 5
13 G 19
14 C 18
15 G 0
16 A 0
17 A 0
18 G 14
19 C 13
20 C 0"""
RES_LINES_NOT_ALL=\
"""1 U 0
2 G 0
3 G 0
6 A 0"""
RES_LINES_WRONG=\
"""1 U0
2 G 0
3 G 0
6 A 0"""
RES_LINES_CONFLICT=\
"""1 U 4
2 G 3
3 G 2
4 A 1
4 A 0"""
RES_LINES_BP_CONFLICT=\
"""1 U 0
2 G 4
3 G 4
4 C 2
5 A 0"""
RES_LINES_DIFF_BASE=\
"""3 C xxx
4 A 8
5 G 7
6 A xxx
7 C 5
8 U 4"""
RES_LINES_W_HEADER=\
"""Filename: d.16.b.E.coli.bpseq
Organism: Escherichia coli
Accession Number: J01695
Citation and related information available at http://www.rna.icmb.utexas.edu
1 U 0
2 G 0
3 G 9
4 U 8
5 A 12
6 A 11
7 U 10
8 A 4
9 C 3
10 G 7
11 U 6
12 U 5
13 G 19
14 C 18
15 G 0
16 A 0
17 A 0
18 G 14
19 C 13
20 C 0"""
RES_LINES_UNKNOWN=\
"""Filename: d.16.b.E.coli.bpseq
Organism: Escherichia coli
Accession Number: J01695
Citation and related information available at http://www.rna.icmb.utexas.edu
1 U 0
2 G 0
3 G 9
UNKNOWN LINE
4 U 8
5 A 12
6 A 11
7 U 10
8 A 4
9 C 3
10 G 7
11 U 6
12 U 5
13 G 19
14 C 18
15 G 0
16 A 0
17 A 0
18 G 14
19 C 13
20 C 0"""
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_cigar.py
import unittest, sys, os
from cogent import DNA, LoadSeqs
from cogent.parse.cigar import map_to_cigar, cigar_to_map, aligned_from_cigar, \
slice_cigar, CigarParser
__author__ = "Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Hua Ying", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Hua Ying"
__email__ = "hua.ying@anu.edu.au"
__status__ = "Production"
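
# Illustrative sketch (hypothetical helpers, NOT cogent.parse.cigar): the
# CIGAR strings in these tests use only M (residue present) and D (gap in
# this sequence), with a run length of 1 written without a number, e.g.
# '3D2M3D6MDM2D3MD' for '---AA---GCTTAG-A--CCT-'. The two functions below
# expand such a string against an ungapped sequence and rebuild it from a
# gapped one, working on plain strings; the real module works with Map
# objects instead.
import re
from itertools import groupby

def sketch_cigar_to_gapped(cigar, seq, gap='-'):
    out, pos = [], 0
    for count, op in re.findall(r'(\d*)([MD])', cigar):
        n = int(count) if count else 1
        if op == 'M':
            out.append(seq[pos:pos + n])  # consume n residues
            pos += n
        else:
            out.append(gap * n)  # insert n gap characters
    return ''.join(out)

def sketch_gapped_to_cigar(gapped, gap='-'):
    parts = []
    for is_gap, run in groupby(gapped, key=lambda c: c == gap):
        n = len(list(run))
        parts.append(('%d' % n if n > 1 else '') + ('D' if is_gap else 'M'))
    return ''.join(parts)
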
class TestCigar(unittest.TestCase):
def setUp(self):
self.cigar_text = '3D2M3D6MDM2D3MD'
self.aln_seq = DNA.makeSequence('---AA---GCTTAG-A--CCT-')
self.aln_seq1 = DNA.makeSequence('CCAAAAAA---TAGT-GGC--G')
self.map, self.seq = self.aln_seq.parseOutGaps()
self.map1, self.seq1 = self.aln_seq1.parseOutGaps()
self.slices = [(1, 4), (0, 8), (7, 12), (0, 1), (3, 5)]
self.aln = LoadSeqs(data = {"FAKE01": self.aln_seq, "FAKE02": self.aln_seq1})
self.cigars = {"FAKE01": self.cigar_text, "FAKE02": map_to_cigar(self.map1)}
self.seqs = {"FAKE01": str(self.seq), "FAKE02": str(self.seq1)}
def test_map_to_cigar(self):
"""convert a Map to cigar string"""
assert map_to_cigar(self.map) == self.cigar_text
def test_cigar_to_map(self):
"""test generating a Map from cigar"""
cigar_map = cigar_to_map(self.cigar_text)
assert str(cigar_map) == str(self.map)
def test_aligned_from_cigar(self):
"""test generating aligned seq from cigar"""
aligned_seq = aligned_from_cigar(self.cigar_text, self.seq)
assert aligned_seq == self.aln_seq
def test_slice_cigar(self):
"""test slicing cigars"""
for start, end in self.slices:
# test by_align = True
map1, loc1 = slice_cigar(self.cigar_text, start, end)
ori1 = self.aln_seq[start:end]
if loc1:
slicealn1 = self.seq[loc1[0]:loc1[1]].gappedByMap(map1)
assert ori1 == slicealn1
else:
assert map1.length == len(ori1)
# test by_align = False
map2, loc2 = slice_cigar(self.cigar_text, start, end, by_align = False)
slicealn2 = self.seq[start:end].gappedByMap(map2)
ori2 = self.aln_seq[loc2[0]:loc2[1]]
assert slicealn2 == ori2
def test_CigarParser(self):
"""test without slice"""
aln = CigarParser(self.seqs, self.cigars)
assert aln == self.aln
# test slice
i = 1
for start, end in self.slices:
self.aln.getSeq("FAKE01").addFeature("annot%d"%i, "annot", [(start, end)])
annot = self.aln.getAnnotationsFromAnySequence("annot%d"%i)
slice_aln = aln.getRegionCoveringAll(annot).asOneSpan().getSlice()
i += 1
cmp_aln = CigarParser(self.seqs, self.cigars, sliced = True,
ref_seqname = "FAKE01", start = start, end = end)
assert cmp_aln == slice_aln
if __name__ == '__main__':
unittest.main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_clustal.py
"""Unit tests for the clustal parsers.
"""
from cogent.parse.clustal import LabelLineParser, is_clustal_seq_line, \
last_space, delete_trailing_number, MinimalClustalParser
from cogent.parse.record import RecordError
from cogent.util.unit_test import TestCase, main
from cogent.core.alignment import Alignment
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
#Note: the data are all strings and hence immutable, so it's OK to define
#them here instead of in setUp and then subclassing everything from that
#base class. If the data were mutable, we'd need to take more precautions
#to avoid crossover between tests.
minimal = 'abc\tucag'
two = 'abc\tuuu\ndef\tccc\n\n ***\n\ndef ggg\nabc\taaa\n'.split('\n')
real = """CLUSTAL W (1.82) multiple sequence alignment
abc GCAUGCAUGCAUGAUCGUACGUCAGCAUGCUAGACUGCAUACGUACGUACGCAUGCAUCA 60
def ------------------------------------------------------------
xyz ------------------------------------------------------------
abc GUCGAUACGUACGUCAGUCAGUACGUCAGCAUGCAUACGUACGUCGUACGUACGU-CGAC 119
def -----------------------------------------CGCGAUGCAUGCAU-CGAU 18
xyz -------------------------------------CAUGCAUCGUACGUACGCAUGAC 23
* * * * * **
abc UGACUAGUCAGCUAGCAUCGAUCAGU 145
def CGAUCAGUCAGUCGAU---------- 34
xyz UGCUGCAUCA---------------- 33
* ***""".split('\n')
bad = ['dshfjsdfhdfsj','hfsdjksdfhjsdf']
space_labels = ['abc uca','def ggg ccc']
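
# Illustrative sketch (hypothetical helper, NOT cogent.parse.clustal): a
# compact version of what MinimalClustalParser is tested for. Header lines
# (CLUSTAL/MUSCLE), blank lines and consensus lines starting with whitespace
# are skipped; a trailing residue count is dropped; the label is everything
# before the last run of whitespace, so labels may contain spaces. Unlike
# the real parser it never raises RecordError (it behaves like strict=False).
import re

def sketch_parse_clustal(lines):
    data, order = {}, []
    for line in lines:
        if not line or line[0].isspace() or line.startswith(('CLUSTAL', 'MUSCLE')):
            continue
        line = re.sub(r'\s+\d+\s*$', '', line)  # drop trailing position number
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue  # no label/sequence split possible
        label, chunk = parts
        if label not in data:
            data[label] = []
            order.append(label)
        data[label].append(chunk)
    return data, order
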
class clustalTests(TestCase):
"""Tests of top-level functions."""
def test_is_clustal_seq_line(self):
"""is_clustal_seq_line should reject blanks and 'CLUSTAL'"""
ic = is_clustal_seq_line
assert ic('abc')
assert ic('abc def')
assert not ic('CLUSTAL')
assert not ic('CLUSTAL W fsdhicjkjsdk')
assert not ic(' * *')
assert not ic(' abc def')
assert not ic('MUSCLE (3.41) multiple sequence alignment')
def test_last_space(self):
"""last_space should split on last whitespace"""
self.assertEqual(last_space('a\t\t\t b c'), ['a b', 'c'])
self.assertEqual(last_space('xyz'), ['xyz'])
self.assertEqual(last_space(' a b'), ['a','b'])
def test_delete_trailing_number(self):
"""delete_trailing_number should delete the trailing number if present"""
dtn = delete_trailing_number
self.assertEqual(dtn('abc'), 'abc')
self.assertEqual(dtn('a b c'), 'a b c')
self.assertEqual(dtn('a \t b \t c'), 'a \t b \t c')
self.assertEqual(dtn('a b 3'), 'a b')
self.assertEqual(dtn('a b c \t 345'), 'a b c')
class MinimalClustalParserTests(TestCase):
"""Tests of the MinimalClustalParser class"""
def test_null(self):
"""MinimalClustalParser should return empty dict and list on null input"""
result = MinimalClustalParser([])
self.assertEqual(result, ({},[]))
def test_minimal(self):
"""MinimalClustalParser should handle single-line input correctly"""
result = MinimalClustalParser([minimal]) #expects seq of lines
self.assertEqual(result, ({'abc':['ucag']}, ['abc']))
def test_two(self):
"""MinimalClustalParser should handle two-sequence input correctly"""
result = MinimalClustalParser(two)
self.assertEqual(result, ({'abc':['uuu','aaa'],'def':['ccc','ggg']}, \
['abc', 'def']))
def test_real(self):
"""MinimalClustalParser should handle real Clustal output"""
data, labels = MinimalClustalParser(real)
self.assertEqual(labels, ['abc', 'def', 'xyz'])
self.assertEqual(data, {
'abc':
[ 'GCAUGCAUGCAUGAUCGUACGUCAGCAUGCUAGACUGCAUACGUACGUACGCAUGCAUCA',
'GUCGAUACGUACGUCAGUCAGUACGUCAGCAUGCAUACGUACGUCGUACGUACGU-CGAC',
'UGACUAGUCAGCUAGCAUCGAUCAGU'
],
'def':
[ '------------------------------------------------------------',
'-----------------------------------------CGCGAUGCAUGCAU-CGAU',
'CGAUCAGUCAGUCGAU----------'
],
'xyz':
[ '------------------------------------------------------------',
'-------------------------------------CAUGCAUCGUACGUACGCAUGAC',
'UGCUGCAUCA----------------'
]
})
def test_bad(self):
"""MinimalClustalParser should reject bad data if strict"""
result = MinimalClustalParser(bad, strict=False)
self.assertEqual(result, ({},[]))
#should fail unless we turned strict processing off
self.assertRaises(RecordError, MinimalClustalParser, bad)
def test_space_labels(self):
"""MinimalClustalParser should tolerate spaces in labels"""
result = MinimalClustalParser(space_labels)
self.assertEqual(result, ({'abc':['uca'],'def ggg':['ccc']},\
['abc', 'def ggg']))
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_column.py
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.foldalign import find_struct
from cogent.parse.pfold import tree_struct_sep
from cogent.parse.column import column_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
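
# Illustrative sketch (hypothetical helper, NOT cogent.parse.column): in the
# RNA section of column-format output each data row reads
# "N <residue> <seqpos> <alignpos> <align_bp> <score>", where align_bp is the
# 1-based partner position or '.' when unpaired. Collecting residues and
# partners gives a [sequence, pairs] entry like those compared against in
# these tests. It ASSUMES only RNA rows are passed in and that seqpos and
# alignpos coincide (no gaps, as in the rows below); pfold TREE rows also
# start with 'N' and would need to be split off first, as tree_struct_sep does.
def sketch_column_pairs(lines):
    seq, pairs = [], []
    for line in lines:
        parts = line.split()
        if not parts or parts[0] != 'N':
            continue  # comment and header rows start with ';'
        residue, pos, partner = parts[1], int(parts[2]), parts[4]
        seq.append(residue)
        # record each pair once, from its 5' partner, as 0-based indices
        if partner != '.' and int(partner) > pos:
            pairs.append((pos - 1, int(partner) - 1))
    return [''.join(seq), pairs]
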
class ColumnParserTest(TestCase):
"""Provides tests for Column format RNA secondary structure parsers"""
def setUp(self):
"""Setup function"""
#output
self.pfold_out = PFOLD
self.foldalign_out = FOLDALIGN
#expected
self.pfold_exp = [['GCAGAUUUAGAUGC',[(0,13),(1,12),(2,11),(6,10)]]]
self.foldalign_exp = [['GCAGAUUUAGAUGC',[(0,13),(1,12),(2,11),(6,10)]]]
self.find_struct_exp = [[[(0,13),(1,12),(2,11),(6,10)],'GCCACGUAGCUCAG',
'GCCGUAUGUUUCAG']]
def test_pfold_output(self):
"""Test for column_parser for pfold format"""
tree,lines = tree_struct_sep(self.pfold_out)
self.assertEqual(tree,PFOLD_tree)
obs = column_parser(lines)
self.assertEqual(obs,self.pfold_exp)
def test_foldalign_output(self):
"""Test for column_parser for foldalign format"""
obs = column_parser(self.foldalign_out)
self.assertEqual(obs,self.foldalign_exp)
def test_foldalign_find_struct(self):
""" Test for foldalign parser find struct function"""
obs = find_struct(self.foldalign_out)
self.assertEqual(obs,self.find_struct_exp)
FOLDALIGN = ['; FOLDALIGN 2.0.3\n',
'; REFERENCE J.H. Havgaard, R.B. Lyngs\xf8, G.D. Stormo, J. Gorodkin\n',
'; REFERENCE Pairwise local structural alignment of RNA sequences\n',
'; REFERENCE with sequence similarity less than 40%\n',
'; REFERENCE Bioinformatics 21(9), 1815-1824, 2005\n',
'; ALIGNMENT_ID n.a.\n', '; ALIGNING seq1 against seq2\n',
'; ALIGN seq1 \n',
'; ALIGN seq2 \n',
'; ALIGN Score: 929\n',
'; ALIGN Identity: 69 % ( 48 / 70 )\n',
'; ALIGN Begin\n', '; ALIGN\n',
'; ALIGN seq1 GCCACGUAGC UCAG\n',
'; ALIGN Structure (((...(... ))))\n',
'; ALIGN seq2 GCCGUAUGUU UCAG\n',
'; ALIGN \n', '; ALIGN End\n',
'; ==============================================================================\n',
'; TYPE RNA\n', '; COL 1 label\n',
'; COL 2 residue\n', '; COL 3 seqpos\n',
'; COL 4 alignpos\n', '; COL 5 align_bp\n',
'; COL 6 seqpos_bp\n', '; ENTRY seq1\n',
'; ALIGNMENT_ID n.a.\n', '; ALIGNMENT_LIST seq1 seq2\n',
'; FOLDALIGN_SCORE 929\n', '; GROUP 1\n',
'; FILENAME seq1.fasta\n', '; START_POSITION 2\n',
'; END_POSITION 71\n', '; ALIGNMENT_SIZE 2\n',
'; ALIGNMENT_LENGTH 70\n', '; SEQUENCE_LENGTH 76\n',
'; PARAMETER max_length=76\n',
'; PARAMETER max_diff=76\n',
'; PARAMETER min_loop=3\n',
'; PARAMETER score_matrix=\n',
'; PARAMETER nobranching=\n',
'; PARAMETER global=\n',
'; ----------\n',
'N G 1 1 14 0.90\n',
'N C 2 2 13 0.79\n',
'N A 3 3 12 0.87\n',
'N G 4 4 . 0.60\n',
'N A 5 5 . 0.34\n',
'N U 6 6 . 0.34\n',
'N U 7 7 11 0.98\n',
'N U 8 8 . 0.34\n',
'N A 9 9 . 0.56\n',
'N G 10 10 . 0.67\n',
'N A 11 11 7 0.78\n',
'N U 12 12 3 0.87\n',
'N G 13 13 2 0.87\n',
'N C 14 14 1 0.90\n',
'; **********\n']
PFOLD = ['; generated by fasta2col\n',
'; ============================================================\n',
'; TYPE TREE\n', '; COL 1 label\n',
'; COL 2 number\n', '; COL 3 name\n',
'; COL 4 uplen\n', '; COL 5 child\n',
'; COL 6 brother\n', '; ENTRY tree\n',
'; root 1\n', '; ----------\n',
' N 1 seq1 0.001000 . .\n',
'; **********\n', '; TYPE RNA\n', '; COL 1 label\n',
'; COL 2 residue\n', '; COL 3 seqpos\n',
'; COL 4 alignpos\n', '; COL 5 align_bp\n',
'; COL 6 certainty\n', '; ENTRY seq1\n',
'; ----------\n',
'N G 1 1 14 0.90\n',
'N C 2 2 13 0.79\n',
'N A 3 3 12 0.87\n',
'N G 4 4 . 0.60\n',
'N A 5 5 . 0.34\n',
'N U 6 6 . 0.34\n',
'N U 7 7 11 0.98\n',
'N U 8 8 . 0.34\n',
'N A 9 9 . 0.56\n',
'N G 10 10 . 0.67\n',
'N A 11 11 7 0.78\n',
'N U 12 12 3 0.87\n',
'N G 13 13 2 0.87\n',
'N C 14 14 1 0.90\n',
'; **********\n']
PFOLD_tree = ['; generated by fasta2col\n',
'; ============================================================\n',
'; TYPE TREE\n', '; COL 1 label\n',
'; COL 2 number\n', '; COL 3 name\n',
'; COL 4 uplen\n', '; COL 5 child\n',
'; COL 6 brother\n', '; ENTRY tree\n',
'; root 1\n', '; ----------\n',
' N 1 seq1 0.001000 . .\n', '; **********\n']
if __name__ == '__main__':
    main()
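The expected values in `ColumnParserTest` follow directly from the column rows. The third field is the 1-based sequence position, and the fifth is the 1-based partner (or `.` when the base is unpaired). Each pair is listed twice, once from each end. The following is a minimal sketch of that conversion. It assumes the caller passes only the RNA-section rows (for pfold, the rows left after `tree_struct_sep` has removed the tree block). `pairs_from_column_rows` is a hypothetical helper, not part of `cogent.parse.column`.

```python
def pairs_from_column_rows(rows):
    """Return [sequence, pairs] from RNA column-format rows (hypothetical sketch).

    Assumes rows look like 'label residue seqpos alignpos align_bp extra',
    with 1-based positions and '.' marking an unpaired base.
    """
    residues = []
    pairs = []
    for row in rows:
        fields = row.split()
        # Skip ';' comment/header lines and anything too short to be a data row.
        if not fields or fields[0].startswith(';') or len(fields) < 5:
            continue
        residues.append(fields[1])
        pos, partner = fields[2], fields[4]
        if partner == '.':
            continue
        i, j = int(pos) - 1, int(partner) - 1
        # Each pair appears once from each end; keep only the i < j copy.
        if i < j:
            pairs.append((i, j))
    return [''.join(residues), pairs]
```

Because pairs are emitted in row order from their 5' end, the list comes out ordered by its first position, which matches `self.foldalign_exp`.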
PyCogent-1.5.3/tests/test_parse/test_comrna.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.comrna import comRNA_parser,common
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class ComrnaParserTest(TestCase):
    """Provides tests for COMRNA RNA secondary structure format parsers"""
    def setUp(self):
        """Setup function"""
        #output
        self.comrna_out = COMRNA
        #expected
        self.comrna_exp = [['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]],
                           ['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]],
                           ['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]],
                           ['GGCUAGAUAGCUCA',[(0,13),(1,12)]]]
    def test_comrna_output(self):
        """Test for comrna format parser"""
        obs = comRNA_parser(self.comrna_out)
        self.assertEqual(obs,self.comrna_exp)
    def test_common_func(self):
        """Test common function in comrna parser"""
        obs = common(self.comrna_exp)
        exp = [['GGCUAGAUAGCUCA',[(0,13),(1,12),(4,9),(5,8)]],
               ['GGCUAGAUAGCUCA',[(0,13),(1,12)]]]
        self.assertEqual(obs,exp)
COMRNA = ['comRNA input.fasta \n', '\n', 'PARAMETERS: \n', 'L = 4, Minimum length of a straight stem;\n', 'E = -5.00, Maximum stem energy allowed for a stem to be analyzed, in kcal/mol;\n', 'S = 0.00, Minimum stem similarity score b/w two stems compared;\n', 'Sh = 0.60, Maximum stem similarity score threshold that will be tested;\n', 'Sl = 0.20, Minimum stem similarity score threshold that will be tested;\n', 'P = 0.50, Minimum percentage of sequences in which a common structure should occur;\n', 'n = 10, Number of common structures to be reported;\n', 'x = 999, Maximum number of pseudoknot crossover pattern allowed between one stem and other stems in a structure;\n', 'a = 1, Use anchor region during stem comparison;\n', 'o = 4, Maximum number of overlapping nucleotides allowed between two stems;\n', 'c = 0.30, Maximum percentage of stem length that is allowed overlapping between two stems;\n', 'j = 0.70, Maximum percentage of stems allowed overlapping between two different cliques;\n', 'r = 0.40, Minimum percentage of stems required to be same for two cliques to be considered same when reporting structures;\n', 'f = 10, Number of flanking nucleotides of a stem to be refolded together during structure refinement;\n', 'v = 5, Maximum length of nucleotides allowed for a new loop to deviate from its length in the original structure pattern;\n', 'g = 0, Use topological sort to assemble stem blocks;\n', '\n', 'Sequence file name: "input.fasta"\n', '\n', 'Sequences loaded ...\n', '1 seq1 72 nt\n', '2 seq2 72 nt\n', '3 seq3 72 nt\n', '4 seq4 72 nt\n', '\n', '\n', 'Number of stems in each energy bin for each sequence:\n', '\n', 'energy[kc/m] -10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0\n', 'seq1 2 1 1 6 1 5 9 2 1 3 0 1\n', 'seq2 2 1 1 6 1 5 9 2 1 3 0 1\n', 'seq3 2 1 1 6 1 5 9 2 1 3 0 1\n', 'seq4 2 1 1 6 1 5 9 2 1 3 0 1\n', '\n', '\n', 'Pairwise Sequence Identity (%): \n', '\n', ' 1 2 3 4\n', '\n', ' 1 - 100 100 100\n', ' 2 100 - 100 100\n', ' 3 100 100 - 100\n', ' 4 100 100 100 - \n', 
'\n', 'Average Pairwise Sequence Identity (%): 100\n', '\n', 'Comparing stems pairwise ... \n', '\n', 'Number of edges that has stem-similarity-score higher than a certain threshold in the stem graph:\n', '\n', 'Score: 0.8 0.78 0.76 0.74 0.72 0.7 0.68 0.66 0.64 0.62 0.6 0.58 0.56 0.54 0.52 0.5 0.48 0.46 0.44 0.42 0.4 0.38 0.36 0.34 0.32 0.3 0.28 0.26 0.24 0.22 0.2\n', 'Num of edges: 12 12 18 18 18 24 24 54 72 78 102 114 114 120 132 132 144 156 168 168 174 180 180 180 180 180 180 180 180 180 180 180\n', '\n', 'Time spent on comparing stems: 0.03 seconds user CPU time; 0.04 seconds real time.\n', '\n', 'Maximum structure finding time: 1 min\n', '\n', '=========================== S = 0.6 ===========================\n', '\n', 'Find conserved stems (cliques) ... ==== 17 cliques ==== 17 unique ====\n', 'Time spent on finding conserved stems: 0 sec CPU time; 0 sec clock time.\n', '\n', 'Construct clique topological graph ... ==== 53 edges ====\n', 'Assemble conserved stems (cliques) ... ==== 44 structures ====\n', 'Time spent on topologically assembling conserved stems: 0 sec CPU time; 0 sec clock time.\n', '\n', 'Report top 10 structures.\n', '-------------------------------------------\n', 'Structure #1: Score = 10.1, pattern: 41, path: 0 1 3 , comseq: 1 2 3 4 , incompatible_seq: 0() 1() 3() \n', '(a) Clique 0: OriginalScore = 3.82, ModifiedScore = 3.82\n', ' 1, seq1 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', ' 2, seq2 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', ' 3, seq3 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', ' 4, seq4 1 GGCUAGA 7 ... 66 UCUGGCC 72 [-13 kc/m]\n', '(b) Clique 1: OriginalScore = 3.45, ModifiedScore = 3.45\n', ' 1, seq1 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n', ' 2, seq2 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n', ' 3, seq3 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n',
' 4, seq4 29 GGAUUGAA 36 ... 54 UUCGAUCC 61 [-11.6 kc/m]\n', '(c) Clique 3: OriginalScore = 2.82, ModifiedScore = 2.82\n', ' 1, seq1 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', ' 2, seq2 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', ' 3, seq3 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', ' 4, seq4 49 GUCGG 53 UUCGAUC 61 CCGGC 65 [-8.4 kc/m]\n', '\n', '\n', 'seq1 1 GGCUAGAUAGCUCA 14 \n', ' aa bb bb aa\n', 'seq2 1 GGCUAGAUAGCUCA 14 \n', ' aa bb bb aa\n', 'seq3 1 GGCUAGAUAGCUCA 14 \n', ' aa bb bb aa\n', 'seq4 1 GGCUAGAUAGCUCA 14 \n', ' aa aa\n', '\n', '\n', '-------------------------------------------']
if __name__ == '__main__':
    main()
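`test_common_func` expects four parsed structures, three of them identical, to collapse to two. One behaviour consistent with that expectation is an order-preserving de-duplication. We have not checked this against the actual rule in `cogent.parse.comrna.common`, which may differ. `unique_structures` is a hypothetical sketch of that behaviour only.

```python
def unique_structures(structures):
    """Drop repeated [sequence, pairs] entries, keeping first-seen order.

    Hypothetical sketch; pairs lists are turned into tuples so that each
    entry can be used as a hashable set key.
    """
    seen = set()
    result = []
    for seq, pairs in structures:
        key = (seq, tuple(pairs))
        if key not in seen:
            seen.add(key)
            result.append([seq, pairs])
    return result
```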
PyCogent-1.5.3/tests/test_parse/test_consan.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.consan import consan_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class ConsanParserTest(TestCase):
    """Provides tests for CONSAN RNA secondary structure parsers"""
    def setUp(self):
        """Setup function"""
        #output
        self.consan_out = CONSAN
        #expected
        self.consan_exp = [{'seq1':'GGACACGUCGCUCA','seq2':'G.ACAGAUCGCUCA'},
                           [(0,13),(2,11),(5,8)]]
    def test_consan_output(self):
        """Test for consan format parser"""
        obs = consan_parser(self.consan_out)
        self.assertEqual(obs,self.consan_exp)
CONSAN = ['Using standard STA scoring\n',
'Using QRADIUS constraints (Quality > 0.9500 SPACED 20) !\n',
'# STOCKHOLM 1.0\n', '\n', '#=GF SC\t 1.000000\n', '#=GF PI\t 0.720000\n',
'#=GF ME\t QRadius Num: 3 Win: 20 Cutoff: 0.95\n',
'\n', 'seq1 \tGGACACGUCGCUCA\n',
'seq2 \tG.ACAGAUCGCUCA\n',
'#=GC SS_cons\t\t >.>..>..<..<.<\n',
'#=GC PN \t\t .......*......\n', '\n', '\n', '\n',
'#=GF TIME 43.0\n', '\n', '//\n']
if __name__ == '__main__':
    main()
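The `SS_cons` line in the CONSAN output marks an opening base with `>` and its partner with `<`. Matching them with a stack gives exactly the `(0,13),(2,11),(5,8)` pairs expected in `consan_exp`. The following is a minimal sketch, assuming the structure string has already been taken from that line (for example with `line.split()[-1]`). `pairs_from_angle_brackets` is a hypothetical name, and the bracket characters are parameters so the same routine also handles `(`/`)`.

```python
def pairs_from_angle_brackets(structure, open_char='>', close_char='<'):
    """Return sorted 0-based (i, j) pairs from a bracket-notation string.

    Hypothetical sketch: every open_char is pushed on a stack and popped by
    the next close_char; any other character (e.g. '.') is unpaired.
    """
    stack = []
    pairs = []
    for pos, char in enumerate(structure):
        if char == open_char:
            stack.append(pos)
        elif char == close_char:
            if not stack:
                raise ValueError('unmatched %r at position %d' % (close_char, pos))
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError('unmatched %r at position(s) %s' % (open_char, stack))
    return sorted(pairs)
```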
PyCogent-1.5.3/tests/test_parse/test_cove.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.cove import coves_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class CovesParserTest(TestCase):
    """Provides tests for Coves RNA secondary structure parsers"""
    def setUp(self):
        """Setup function"""
        #output
        self.cove_out = COVE
        #expected
        self.cove_exp = [['UAGAUGGCUCUCAU',[(0,13),(2,11),(3,10)]]]
    def test_cove_output(self):
        """Test coves format parser"""
        obs = coves_parser(self.cove_out)
        self.assertEqual(obs,self.cove_exp)
COVE = ['coves - scoring and structure prediction of RNA sequences\n', ' using a covariance model\n',
' version 2.4.4, January 1996\n', '\n',
'---------------------------------------------------\n',
'Database to search/score: single.fasta\n',
'Model: single.fasta.cm\n',
'GC% of background model: 50%\n',
'---------------------------------------------------\n', '\n',
'-32.55 bits : seq1\n', ' seq1 UAGAUGGCUCUCAU\n',
' seq1 >.>>......<<.<\n', '\n']
if __name__ == '__main__':
    main()
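In the coves output above, the score appears on a `bits :` line. The sequence and its structure follow on two lines that both start with the sequence name. The following is a minimal sketch that separates those four pieces. It assumes one sequence per record and identifies the structure line as the one made only of `<`, `>` and `.`. `split_coves_record` is hypothetical and is not `coves_parser`.

```python
def split_coves_record(lines):
    """Return (score, name, sequence, structure) from one coves result.

    Hypothetical sketch: assumes a single '<score> bits : <name>' line
    followed by two '<name> <text>' lines.
    """
    score = name = sequence = structure = None
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[1] == 'bits' and fields[2] == ':':
            score, name = float(fields[0]), fields[3]
        elif len(fields) == 2 and fields[0] == name:
            # Structure lines use only '>', '<' and '.'; anything else is sequence.
            if set(fields[1]) <= set('<>.'):
                structure = fields[1]
            else:
                sequence = fields[1]
    return score, name, sequence, structure
```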
PyCogent-1.5.3/tests/test_parse/test_ct.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.ct import ct_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class CtParserTest(TestCase):
    """Provides tests for CT format RNA secondary structure parsers"""
    def setUp(self):
        """Setup function"""
        #output
        self.carnac_out = CARNAC
        self.dynalign_out = DYNALIGN
        self.mfold_out = MFOLD
        self.sfold_out = SFOLD
        self.unafold_out = UNAFOLD
        self.knetfold_out = KNETFOLD
        #expected
        self.carnac_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)]]]
        self.dynalign_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-46.3]]
        self.mfold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-23.47]]
        self.sfold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-22.40]]
        self.unafold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)],-20.5]]
        self.knetfold_exp = [['GCAGAUGGCUUC',[(0,11),(2,9),(3,8)]]]
    def test_carnac_output(self):
        """Test for ct_parser for carnac format"""
        obs = ct_parser(self.carnac_out)
        self.assertEqual(obs,self.carnac_exp)
    def test_dynalign_output(self):
        """Test for ct_parser for dynalign format"""
        obs = ct_parser(self.dynalign_out)
        self.assertEqual(obs,self.dynalign_exp)
    def test_mfold_output(self):
        """Test for ct_parser for mfold format"""
        obs = ct_parser(self.mfold_out)
        self.assertEqual(obs,self.mfold_exp)
    def test_sfold_output(self):
        """Test for ct_parser for sfold format"""
        obs = ct_parser(self.sfold_out)
        self.assertEqual(obs,self.sfold_exp)
    def test_unafold_output(self):
        """Test for ct_parser for unafold format"""
        obs = ct_parser(self.unafold_out)
        self.assertEqual(obs,self.unafold_exp)
    def test_knetfold_output(self):
        """Test for ct_parser for knetfold format"""
        obs = ct_parser(self.knetfold_out)
        self.assertEqual(obs,self.knetfold_exp)
CARNAC = [' 12 seq1\n', ' 1 G 0 2 12 1\n',
' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n',
' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n',
' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n',
' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n',
' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n',
' 12 C 8 13 1 12\n']
DYNALIGN = [' 72 ENERGY = -46.3 seq 3\n',
' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n',
' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n',
' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n',
' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n',
' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n',
' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n']
MFOLD = [' 12 dG = -23.47 [initially -22.40] seq1 \n',
' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n',
' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n',
' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n',
' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n',
' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n',
' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n']
SFOLD = ['Structure 1 -22.40 0.63786E-01\n',
' 1 G 0 2 12 1\n', ' 2 C 1 3 0 2\n',
' 3 A 2 4 10 3\n', ' 4 G 3 5 9 4\n',
' 5 A 4 6 0 5\n', ' 6 U 5 7 0 6\n',
' 7 G 6 8 0 7\n', ' 8 G 7 9 0 8\n',
' 9 C 8 10 4 9\n', ' 10 U 6 11 3 10 \n',
' 11 U 7 12 0 11\n', ' 12 C 8 13 1 12\n']
UNAFOLD = ['12\tdG = -20.5\tseq1\n', ' 1 G 0 2 12 1\n',
' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n',
' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n',
' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n',
' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n',
' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n',
' 12 C 8 13 1 12\n']
KNETFOLD = [' 12 \n', ' 1 G 0 2 12 1\n',
' 2 C 1 3 0 2\n', ' 3 A 2 4 10 3\n',
' 4 G 3 5 9 4\n', ' 5 A 4 6 0 5\n',
' 6 U 5 7 0 6\n', ' 7 G 6 8 0 7\n',
' 8 G 7 9 0 8\n', ' 9 C 8 10 4 9\n',
' 10 U 6 11 3 10 \n', ' 11 U 7 12 0 11\n',
' 12 C 8 13 1 12\n']
if __name__ == '__main__':
    main()
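All six CT fixtures share the same body rows: index, base, previous index, next index, partner (0 when unpaired) and natural numbering. They differ mainly in their header lines. The following is a minimal sketch of turning one block into `(sequence, pairs, energy)`. It keeps each pair once by requiring `partner > index`. It reads the energy only from headers containing `dG =` or `ENERGY =` (the mfold, unafold and dynalign styles above). The sfold `Structure 1 -22.40 ...` header is deliberately not handled. `parse_ct_block` is a hypothetical helper, not `ct_parser`.

```python
import re

# Matches 'dG = -23.47' (mfold/unafold style) or 'ENERGY = -46.3' (dynalign style).
ENERGY_RE = re.compile(r'(?:dG|ENERGY)\s*=\s*(-?\d+(?:\.\d+)?)')

def parse_ct_block(lines):
    """Return (sequence, pairs, energy) for one CT block (hypothetical sketch).

    The first line is the header; energy is None when no recognised
    'dG =' / 'ENERGY =' field is present.
    """
    header, rows = lines[0], lines[1:]
    match = ENERGY_RE.search(header)
    energy = float(match.group(1)) if match else None
    bases = []
    pairs = []
    for row in rows:
        fields = row.split()
        if len(fields) < 6:
            continue
        index, base, partner = int(fields[0]), fields[1], int(fields[4])
        bases.append(base)
        # partner == 0 means unpaired; partner > index keeps each pair once.
        if partner > index:
            pairs.append((index - 1, partner - 1))
    return ''.join(bases), pairs, energy
```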
PyCogent-1.5.3/tests/test_parse/test_cut.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.cut import cut_parser
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Development"
class CutParserTest(TestCase):
    """Provides tests for codon usage table parser"""
    def test_cut_parser(self):
        """cut_parser should work on first few lines of supplied file"""
        lines = """#Species: Saccharomyces cerevisiae
#Division: gbpln
#Release: TranstermMay1994
#CdsCount: 24
#Coding GC 44.99%
#1st letter GC 47.28%
#2nd letter GC 40.83%
#3rd letter GC 46.86%
#Codon AA Fraction Frequency Number
GCA A 0.010 1.040 6
GCC A 0.240 22.420 130
GCG A 0.000 0.000 0
GCT A 0.750 71.610 411
TGC C 0.070 0.520 3
"""
        result = cut_parser(lines.splitlines())
        self.assertEqual(result,
            {'GCA':6,'GCC':130,'GCG':0,'GCT':411,'TGC':3})
if __name__ == '__main__':
    main()
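The codon usage table in `test_cut_parser` has `#` metadata lines followed by rows of `Codon AA Fraction Frequency Number`. The expected result keeps only the codon and the final `Number` column. The following is a minimal sketch of that mapping. `codon_counts` is a hypothetical helper, and it assumes the count is always the last whitespace-separated field.

```python
def codon_counts(lines):
    """Map each codon to its 'Number' column (hypothetical sketch)."""
    counts = {}
    for line in lines:
        line = line.strip()
        # '#' lines hold species metadata and the column header.
        if not line or line.startswith('#'):
            continue
        fields = line.split()
        # Columns: Codon AA Fraction Frequency Number; keep the first and last.
        counts[fields[0]] = int(fields[-1])
    return counts
```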
PyCogent-1.5.3/tests/test_parse/test_cutg.py
#!/usr/bin/env python
"""Unit tests for the CUTG database parsers.
"""
from cogent.parse.cutg import CutgParser, CutgSpeciesParser, InfoFromLabel
from cogent.parse.record import RecordError
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
sample_gene = r""">AB000406\AB000406\100..1506\1407\BAA19100.1\Xenopus laevis\Xenopus laevis mRNA for protein phosphatase 2A regulatory subunit,complete cds./gene="PP2A"/codon_start=1/product="protein phosphatase 2A regulatory subunit"/protein_id="BAA19100.1"/db_xref="GI:1783183"
1 7 4 2 15 6 4 4 11 4 1 7 6 7 4 7 14 7 10 7 4 6 5 2 3 8 3 4 3 6 5 4 6 0 8 5 8 8 20 10 21 13 2 9 5 7 22 16 20 10 10 8 7 3 6 15 12 5 14 11 6 0 1 0
>AB000458#1\AB000458\105..623\519\BAA22881.1\Xenopus laevis\Xenopus laevis Xem1 mRNA for transmembrane protein, complete cds./gene="Xem1"/codon_start=1/product="transmembrane protein"/protein_id="BAA22881.1"/db_xref="GI:2554596"
0 1 1 0 2 1 6 7 10 4 0 1 4 6 1 5 2 1 3 1 1 5 2 5 0 5 2 4 0 2 2 1 1 1 2 5 4 4 2 2 1 3 1 6 4 2 2 5 1 2 3 2 1 2 2 9 3 5 2 6 4 1 0 0
>AB000736\AB000736\27..557\531\BAA19174.1\Xenopus laevis\Xenopus laevis mRNA for myelin basic protein, complete cds./codon_start=1/product="myelin basic protein"/protein_id="BAA19174.1"/db_xref="GI:1816437"
1 1 1 2 5 11 1 0 3 2 0 1 8 4 2 11 3 1 2 1 0 0 5 2 0 5 3 3 0 1 14 3 4 2 1 0 1 4 3 7 3 0 2 3 9 7 1 4 3 3 5 4 0 0 5 2 1 1 0 4 1 0 0 1 """.split('\n')
sample_species = r"""Salmonella enterica: 332
432 1587 640 1808 586 275 758 825 3557 1668 2943 1663 1267 928 1002 1397 1359 1277 1096 1880 1305 1267 911 608 1752 1002 1684 1868 2960 1627 995 2399 1228 2103 1605 1289 2024 2017 4800 1681 1719 3181 1699 2857 700 1723 4347 2560 1577 4174 1031 2938 496 678 1354 3501 1713 1955 3743 2497 1366 214 22 96
Salmonella enterica IIIb 50:k:z: 3
2 5 2 4 3 3 4 6 12 5 7 1 7 5 1 4 7 3 10 13 5 6 9 4 3 5 17 14 14 14 6 18 5 12 15 10 4 11 24 5 16 17 9 19 6 3 11 9 9 18 7 20 2 0 8 17 6 11 16 11 5 2 0 1
Salmonella enterica subsp. VII: 5
4 1 6 7 6 0 8 15 10 9 18 9 15 6 6 19 21 3 21 14 11 22 16 10 11 6 38 13 20 10 8 8 9 9 7 5 7 22 58 21 25 40 24 22 11 16 41 21 6 14 9 11 2 12 10 31 16 23 22 22 3 0 1 4""".split('\n')
strange_db = r'''>AB001737\AB001737\1..696\696\BAA19944.1\Mus musculus\Mus musculus mRNA for anti-CEA scFv antibody, complete cds./codon_start=1/product="anti-CEA scFv antibody"/protein_id="BAA19944.1"/db_xref="GI:2094751"/db_xref="IMGT/LIGM:AB001737"'''
class InfoFromLabelTests(TestCase):
    """Tests of the InfoFromLabel constructor."""
    def test_init(self):
        """InfoFromLabel should handle a typical label line"""
        i = InfoFromLabel(sample_gene[0])
        sa = self.assertEqual
        sa(i.GenBank, ['AB000406'])
        sa(i.Locus, 'AB000406')
        sa(i.CdsNumber, '1')
        sa(i.Location, '100..1506')
        sa(i.Length, '1407')
        sa(i.Species, 'Xenopus laevis')
        sa(i.Description, r'Xenopus laevis mRNA for protein phosphatase 2A regulatory subunit,complete cds./gene="PP2A"/codon_start=1/product="protein phosphatase 2A regulatory subunit"/protein_id="BAA19100.1"/db_xref="GI:1783183"')
        sa(i.Gene, 'PP2A')
        sa(i.CodonStart, '1')
        sa(i.Product, 'protein phosphatase 2A regulatory subunit')
        sa(i.GenPept, ['BAA19100.1'])
        sa(i.GI, ['1783183'])
        j = InfoFromLabel(sample_gene[2])
        assert j.Refs is not i.Refs
        assert j._handler is not i._handler
        assert j._handler is j.Refs
        assert j.Refs.GI is not i.Refs.GI
        assert j.GI is not i.GI
        sa(j.GenBank, ['AB000458'])
        sa(j.Locus, 'AB000458')
        sa(j.CdsNumber, '1')
        sa(j.Location, '105..623')
        sa(j.Length, '519')
        sa(j.Species, 'Xenopus laevis')
        sa(j.Description, 'Xenopus laevis Xem1 mRNA for transmembrane protein, complete cds./gene="Xem1"/codon_start=1/product="transmembrane protein"/protein_id="BAA22881.1"/db_xref="GI:2554596"')
        sa(j.GenPept, ['BAA22881.1'])
        sa(j.GI, ['2554596'])
        sa(j.Product, 'transmembrane protein')
    def test_init_unknown_db(self):
        """InfoFromLabel should handle a line whose database is unknown"""
        i = InfoFromLabel(strange_db)
        self.assertEqual(i.Locus, 'AB001737')
class CutgSpeciesParserTests(TestCase):
    """Tests of the CutgSpeciesParser."""
    def test_init(self):
        """CutgSpeciesParser should read records one at a time from lines"""
        recs = list(CutgSpeciesParser(sample_species))
        self.assertEqual(len(recs), 3)
        a, b, c = recs
        self.assertEqual(a.Species, 'Salmonella enterica')
        self.assertEqual(a.NumGenes, 332)
        self.assertEqual(a['CGA'], 432)
        self.assertEqual(a['UGG'], 1366)
        self.assertEqual(b.Species, 'Salmonella enterica IIIb 50:k:z')
        self.assertEqual(b.NumGenes, 3)
        self.assertEqual(b['CGA'], 2)
        self.assertEqual(b['UGG'], 5)
        self.assertEqual(c.Species, 'Salmonella enterica subsp. VII')
        self.assertEqual(c.NumGenes, 5)
        self.assertEqual(c['CGA'], 4)
        self.assertEqual(c['UGG'], 3)
        #check that it won't work if we're missing any lines
        self.assertRaises(RecordError, list,
            CutgSpeciesParser(sample_species[1:]))
        self.assertRaises(RecordError, list,
            CutgSpeciesParser(sample_species[:-1]))
        #...but that it does work if we only have some of them
        recs = list(CutgSpeciesParser(sample_species[2:]))
        self.assertEqual(recs[0], b)
        self.assertEqual(len(list(CutgSpeciesParser(sample_species[1:],
            strict=False))), 2)
class CutgParserTests(TestCase):
    """Tests of the CutgParser.
    Note: these are fairly incomplete at present since most of the work is in
    parsing the label line, which is tested by itself.
    """
    def test_init(self):
        """CutgParser should read records one at a time from lines"""
        recs = list(CutgParser(sample_gene))
        self.assertEqual(len(recs), 3)
        a, b, c = recs
        self.assertEqual(a.Species, 'Xenopus laevis')
        self.assertEqual(a['CGC'], 7)
        self.assertEqual(a.GI, ['1783183'])
        self.assertRaises(RecordError, list, CutgParser(sample_gene[1:]))
        self.assertEqual(len(list(CutgParser(sample_gene[1:],strict=False))), 2)
if __name__ == '__main__':
    main()
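The CUTG labels in `sample_gene` and `strange_db` are seven backslash-separated fields: name (optionally with a `#n` CDS suffix), locus, location, length, protein id, species and a free-text description. The description carries `/key="value"` qualifiers. The following is a minimal sketch of splitting such a label into a plain dict. It is not `InfoFromLabel`, and the dict keys are hypothetical. It collects only quoted qualifiers, so `/codon_start=1` is skipped. It also keeps repeated keys such as `db_xref` as lists.

```python
import re

# Quoted qualifiers such as /gene="PP2A"; unquoted ones like /codon_start=1
# are deliberately ignored by this sketch.
QUALIFIER_RE = re.compile(r'/(\w+)="([^"]*)"')

def split_cutg_label(label):
    """Split a CUTG '>' label into a dict of fields (hypothetical sketch)."""
    # Split into at most seven fields so the description is kept whole.
    fields = label.lstrip('>').split('\\', 6)
    if len(fields) != 7:
        raise ValueError('expected 7 backslash-separated fields, got %d' % len(fields))
    name, locus, location, length, protein_id, species, description = fields
    # 'AB000458#1' carries a CDS number after '#'; the tests above expect '1'
    # when there is no suffix.
    _, _, cds_number = name.partition('#')
    qualifiers = {}
    for key, value in QUALIFIER_RE.findall(description):
        # Keys such as db_xref can repeat, so every value goes into a list.
        qualifiers.setdefault(key, []).append(value)
    return {
        'Locus': locus,
        'CdsNumber': cds_number or '1',
        'Location': location,
        'Length': length,
        'ProteinId': protein_id,
        'Species': species,
        'Description': description,
        'Qualifiers': qualifiers,
    }
```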
PyCogent-1.5.3/tests/test_parse/test_dialign.py
#!/usr/bin/env python
import unittest
from cogent import PROTEIN, LoadSeqs
from cogent.parse.dialign import align_block_lines, parse_data_line, DialignParser
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
data = \
"""
DIALIGN 2.2.1
*************
Program code written by Burkhard Morgenstern and Said Abdeddaim
e-mail contact: dialign (at) gobics (dot) de
Published research assisted by DIALIGN 2 should cite:
Burkhard Morgenstern (1999).
DIALIGN 2: improvement of the segment-to-segment
approach to multiple sequence alignment.
Bioinformatics 15, 211 - 218.
For more information, please visit the DIALIGN home page at
http://bibiserv.techfak.uni-bielefeld.de/dialign/
************************************************************
program call: dialign -fa -fn /tmp/ct/seq1.fasta /tmp/ct/seq1.txt
Aligned sequences: length:
================== =======
1) HTL2 57
2) MMLV 58
3) HEPB 62
4) ECOL 54
Average seq. length: 57.8
Please note that only upper-case letters are considered to be aligned.
Alignment (DIALIGN format):
===========================
HTL2 1 ldtapC-LFS DGS------P QKAAYVL--- ----WDQTIL QQDITPLPSH
MMLV 1 pdadhtw-YT DGSSLLQEGQ RKAGAAVtte teviWa---- KALDAG---T
HEPB 1 rpgl-CQVFA DAT------P TGWGLVM--- ----GHQRMR GTFSAPLPIH
ECOL 1 mlkqv-EIFT DGSCLGNPGP GGYGAIL--- ----RYRGRE KTFSAGytrT
0000000588 8882222229 9999999000 0000666666 6666633334
HTL2 37 ethSAQKGEL LALICGLRAa k--------- ---
MMLV 43 ---SAQRAEL IALTQALKm- ---------- ---
HEPB 37 t------AEL LAA-CFARSr sganiigtdn svv
ECOL 43 ---TNNRMEL MAAIv----- ---------- ---
0003333455 5533333300 0000000000 000
Sequence tree:
==============
Tree constructed using UPGMAbased on DIALIGN fragment weight scores
((HTL2 :0.130254MMLV :0.130254):0.067788(HEPB :0.120520ECOL :0.120520):0.077521);
""".splitlines()
class TestDialign(unittest.TestCase):
    def setUp(self):
        aln_seqs = {"HTL2": "ldtapC-LFSDGS------PQKAAYVL-------WDQTILQQDITPLPSHethSAQKGELLALICGLRAak------------",
                    "MMLV": "pdadhtw-YTDGSSLLQEGQRKAGAAVtteteviWa----KALDAG---T---SAQRAELIALTQALKm--------------",
                    "HEPB": "rpgl-CQVFADAT------PTGWGLVM-------GHQRMRGTFSAPLPIHt------AELLAA-CFARSrsganiigtdnsvv",
                    "ECOL": "mlkqv-EIFTDGSCLGNPGPGGYGAIL-------RYRGREKTFSAGytrT---TNNRMELMAAIv------------------"}
        self.aln_seqs = {}
        for name, seq in aln_seqs.items():
            self.aln_seqs[name] = PROTEIN.Sequence(seq,Name=name)
        self.QualityScores = "00000005888882222229999999900000006666666666633334000333345555333333000000000000000"
    def test_line_split(self):
        """test splitting of sequence record lines"""
        result = parse_data_line("HTL2 1 ldtapcLFSD GS------PQ KAAYVLWDQT ILQQDITPLP SHethsaqkg ")
        self.assertEqual(result, ("HTL2", "ldtapcLFSDGS------PQKAAYVLWDQTILQQDITPLPSHethsaqkg"))
        result = parse_data_line(" 1111111111 1000001111 1111033333 3333333333 3000000000 ")
        self.assertEqual(result, (None, "11111111111000001111111103333333333333333000000000"))
    def test_aligned_from_dialign(self):
        """test getting aligned seqs"""
        aligned_seq = dict(list(DialignParser(data, seq_maker=PROTEIN.Sequence)))
        assert aligned_seq == self.aln_seqs
    def test_quality_scores(self):
        """test quality scores correctly returned"""
        result = dict(list(DialignParser(data, seq_maker=PROTEIN.Sequence,
                get_scores=True)))
        assert result["QualityScores"] == self.QualityScores
if __name__ == '__main__':
    unittest.main()
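`test_line_split` shows `parse_data_line` returning `(name, residues)` for a sequence row and `(None, digits)` for a quality-score row. The aligned sequences in `setUp` are those rows concatenated across blocks. The following is a minimal sketch of both steps. It assumes that a row whose first token is all digits is a score row, so a sequence whose name is purely numeric would be misread. It also assumes the input has already been cut down to the alignment blocks. Both helper names are hypothetical and are not PyCogent's.

```python
def parse_block_line(line):
    """Split one DIALIGN block row into (name, residues) (hypothetical sketch)."""
    tokens = line.split()
    if not tokens:
        return None, ''
    if tokens[0].isdigit():
        # Score rows carry only digit groups: no name and no start column.
        return None, ''.join(tokens)
    # Sequence rows: name, 1-based start position, then residue groups.
    return tokens[0], ''.join(tokens[2:])

def assemble_blocks(lines):
    """Join each sequence's chunks, and the score chunks, across blocks."""
    chunks = {}
    order = []
    scores = []
    for line in lines:
        name, chunk = parse_block_line(line)
        if not chunk:
            continue
        if name is None:
            scores.append(chunk)
            continue
        if name not in chunks:
            order.append(name)
            chunks[name] = []
        chunks[name].append(chunk)
    return [(name, ''.join(chunks[name])) for name in order], ''.join(scores)
```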
PyCogent-1.5.3/tests/test_parse/test_dotur.py
#!/usr/bin/env python
# test_dotur.py
from cogent.util.unit_test import TestCase, main
from cogent.parse.dotur import get_otu_lists, OtuListParser
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Development"
class DoturParserTests(TestCase):
    """Tests for DoturParser.
    """
    def setUp(self):
        """setup for DoturParserTests.
        """
        self.otu_list_string = \
"""unique 3 a b c
0.00 3 a b c
0.59 2 a,c b
0.78 1 a,c,b
"""
        self.otu_res_list = [
            [0.0,3,[['a'],['b'],['c']]],
            [0.0,3,[['a'],['b'],['c']]],
            [float(0.59),2,[['a','c'],['b']]],
            [float(0.78),1,[['a','c','b']]],
            ]
        self.otu_lists_unparsed = [
            ['a','b','c'],
            ['a','b','c'],
            ['a,c','b'],
            ['a,c,b']
            ]
        self.otu_lists_parsed = [
            [['a'],['b'],['c']],
            [['a'],['b'],['c']],
            [['a','c'],['b']],
            [['a','c','b']]
            ]
    def test_get_otu_lists_no_data(self):
        """get_otu_lists should return an empty list given no data.
        """
        self.assertEqual(get_otu_lists([]),[])
    def test_get_otu_lists(self):
        """get_otu_lists should split comma-joined OTU members into lists.
        """
        for obs, exp in zip(self.otu_lists_unparsed,self.otu_lists_parsed):
            self.assertEqual(get_otu_lists(obs),exp)
    def test_otulistparser_no_data(self):
        """OtuListParser should return correct result given no data.
        """
        res = OtuListParser([])
        self.assertEqual(list(res),[])
    def test_otulistparser_parser(self):
        """OtuListParser should return correct result given basic output.
        """
        res = OtuListParser(self.otu_list_string.split('\n'))
        self.assertEqual(res,self.otu_res_list)
if __name__ == '__main__':
    main()
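Each line of `otu_list_string` holds a distance label, an OTU count, and whitespace-separated OTUs whose members are comma-joined. `otu_res_list` maps the `unique` label to a distance of `0.0`. The following is a minimal sketch of parsing one such line. It adds a consistency check between the stated count and the OTUs actually listed, which is an assumption of this sketch and not something the tests require. `parse_otu_line` is hypothetical and is not `OtuListParser`.

```python
def parse_otu_line(line):
    """Parse one DOTUR OTU-list line into [distance, count, otus] (hypothetical sketch)."""
    fields = line.split()
    label, count, groups = fields[0], int(fields[1]), fields[2:]
    # The 'unique' row corresponds to distance 0.0 in the expected output above.
    distance = 0.0 if label == 'unique' else float(label)
    # Members of one OTU are comma-joined; separate OTUs are whitespace-separated.
    otus = [group.split(',') for group in groups]
    if len(otus) != count:
        raise ValueError('line says %d OTUs but lists %d' % (count, len(otus)))
    return [distance, count, otus]
```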
PyCogent-1.5.3/tests/test_parse/test_ebi.py
#!/usr/bin/env python
""" Provides tests for EbiParser and related classes and functions.
"""
from cogent.parse.ebi import cc_alternative_products_parser, \
cc_basic_itemparser, cc_biophysicochemical_properties_parser, \
cc_interaction_parser, cc_itemfinder, cc_parser, EbiFinder, \
MinimalEbiParser, hanging_paragraph_finder, join_parser, \
join_split_dict_parser, join_split_parser, labeloff, linecode_maker, \
linecode_merging_maker, mapping_parser, pairs_to_dict, period_tail_finder, \
rstrip_, ft_basic_itemparser, ft_id_parser, ft_mutagen_parser, \
ft_mutation_parser, ft_parser, try_int, ra_parser, rc_parser, \
rg_parser, rl_parser, rn_parser, rp_parser, rt_parser, rx_parser, \
gn_parser, single_ref_parser, ac_parser, de_itemparser, dr_parser, \
dt_parser, id_parser, oc_parser, og_parser, os_parser, ox_parser, \
sq_parser, de_parser, RecordError, FieldError, curry, required_labels, \
EbiParser
from cogent.util.unit_test import TestCase, main
from cogent.core.sequence import Sequence
from cogent.core.info import Info
__author__ = "Zongzhi Liu"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Zongzhi Liu", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Zongzhi Liu"
__email__ = "zongzhi.liu@gmail.com"
__status__ = "Development"
def item_empty_filter(d):
    """return a dict with only nonempty values"""
    pairs = [(k,v) for (k,v) in d.iteritems() if v]
    return dict(pairs)
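The `test_pairs_to_dict` cases below describe four policies for duplicate keys when building a dict from `(key, value)` pairs:

- keep the last value (`overwrite_value`, also what the default call produces in that test);
- raise on a repeat (`no_duplicated_key`);
- always wrap values in lists (`always_multi_value`);
- use a list only for keys that actually repeat (`allow_multi_value`).

The following is a minimal sketch of just those four policies. `merge_pairs` is a hypothetical name, not `pairs_to_dict`, and it omits the `all_keys`, `handlers` and `default_handler` options.

```python
MODES = ('overwrite_value', 'no_duplicated_key',
         'always_multi_value', 'allow_multi_value')

def merge_pairs(pairs, mode='overwrite_value'):
    """Build a dict from (key, value) pairs under one duplicate-key policy.

    Hypothetical sketch covering only the four policies named above.
    """
    if mode not in MODES:
        raise ValueError('unknown mode: %r' % (mode,))
    result = {}
    multi = set()  # keys already promoted to a list under allow_multi_value
    for key, value in pairs:
        if mode == 'always_multi_value':
            result.setdefault(key, []).append(value)
        elif key not in result:
            result[key] = value
        elif mode == 'overwrite_value':
            result[key] = value
        elif mode == 'no_duplicated_key':
            raise ValueError('duplicated key: %r' % (key,))
        elif key in multi:
            result[key].append(value)
        else:
            # allow_multi_value: promote to a list on the first repeat.
            result[key] = [result[key], value]
            multi.add(key)
    return result
```

The `multi` set keeps `allow_multi_value` correct even when a stored value is itself a list, because promotion depends on key history rather than on the value's type.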
class EbiTests(TestCase):
""" Tests ebi parsers and generic parsers and general functions """
def setUp(self):
""" Construct some fake data for testing purposes """
pass
def test_item_empty_filter(self):
"""item_empty_filter: known values"""
inputs = [
{1:0, 2:1, 3:'', 4:[], 5:False, 6:{}}]
expects = [
{2:1}]
self.assertEqual(map(item_empty_filter, inputs), expects)
def test_rstrip_(self):
"""rstrip_ should generate the expected function"""
test = ' aaa; '
self.assertEqual(rstrip_('; ')(test),
test.rstrip('; '))
#test default
self.assertEqual(rstrip_()(test),
test.rstrip())
def test_hanging_paragraph_finder(self):
"""hanging_paragraph_finder should give expected results"""
f = hanging_paragraph_finder
test = [
'-aaa:',
' content',
' content',
'-bbb:', #3
' bbb',
'c',]
self.assertEqual(list(f(test)),
[test[0:3], test[3:-1], test[-1:]])
#test all indent lines
all_indent = [' aa', ' bb']
self.assertEqual(list(f(all_indent)), [all_indent])
#test empty lines
self.assertEqual(list(f(['',' '])), [])
def test_period_tail_finder(self):
"""period_tail_finder should yield each group of expected lines."""
test = "a\naa.\nb\nbb.".splitlines()
self.assertEqual(list(period_tail_finder(test)),
[['a','aa.'],['b','bb.']])
def test_EbiFinder(self):
"""EbiFinder should return expected list"""
test = "a\n//\nb\n//".splitlines()
self.assertEqual(list(EbiFinder(test)),
[['a','//'],['b','//']])
test_fail = test + ['c']
self.assertRaises(RecordError,
list, EbiFinder(test_fail))
def test_pairs_to_dict(self):
"""pairs_to_dict should return the expected dict"""
test_dict = {'a': 1, 'b': 2, 'b': 3,}
sorted_items = [('a', 1), ('b', 2), ('b', 3),]
add_one = lambda x: x + 1
double = lambda x: x*2
set_zero = lambda x: 0
handlers ={ 'a': add_one, 'b': double,}
#test default all
self.assertEqual(pairs_to_dict(sorted_items),
{'a': 1, 'b': 3})
#test 'overwrite_value'
self.assertEqual(pairs_to_dict(sorted_items, 'overwrite_value'),
{'a': 1, 'b': 3})
#test no_duplicated_key, raise
self.assertRaises(ValueError, pairs_to_dict,
sorted_items, 'no_duplicated_key')
#test always_multi_value
self.assertEqual(pairs_to_dict(sorted_items, 'always_multi_value'),
{'a': [1], 'b': [2, 3]})
#test allow multi_value
self.assertEqual(pairs_to_dict(sorted_items, 'allow_multi_value'),
{'a': 1, 'b': [2, 3]})
#test raise error when key not found in all_keys
f = curry(pairs_to_dict, all_keys=['a','c'])
self.assertRaises(ValueError, f, sorted_items)
#test handlers
sorted_items.append(('c', 4))
self.assertEqual(pairs_to_dict(sorted_items, handlers=handlers,
default_handler=set_zero), {'a': 2, 'b': 6, 'c': 0})
#test raise error when no valid handlers were found
f = curry(pairs_to_dict, handlers=handlers)
self.assertRaises(ValueError, f, sorted_items)
#test sanity
test_dict = dict(sorted_items)
self.assertEqual(pairs_to_dict(test_dict.items()),
test_dict)
def test_linecode_maker(self):
"""linecode_maker: should return expected tuple"""
tests = ['AA aa.',
'BB bb.',
'CC C cc.',
'DD dd.']
expected_linecodes =[
'AA', 'BB', 'CC C', 'DD dd.']
#pprint(map(linecode_maker, tests))
self.assertEqual(map(linecode_maker, tests),
zip(expected_linecodes, tests))
def test_labeloff(self):
"""labeloff: should return expected lines"""
tests = ['AA aa.',
'BB bb.',
'CC C cc.',
'DD dd.',
'EE',
'']
#expects = [line[5:] for line in tests]
expects = ['aa.', ' bb.', ' cc.', '.','','']
self.assertEqual(labeloff(tests), expects)
def test_join_parser(self):
"""join_parser should return expected str."""
test_list = '"aaa\nbbb \nccc"; \n'.splitlines()
test_str = 'aaa bb. '
#test default, list input
self.assertEqual(join_parser(test_list),
'"aaa bbb ccc"')
#test default, str input
self.assertEqual(join_parser(test_str),
'aaa bb')
#test no strip
self.assertEqual(join_parser(test_list, chars_to_strip=''),
'"aaa bbb ccc"; ')
#test strip
self.assertEqual(join_parser(test_list, chars_to_strip='"; '),
'aaa bbb ccc')
#test empty
self.assertEqual(join_parser([]),'')
self.assertEqual(join_parser(['', ' ']),'')
self.assertEqual(join_parser(''),'')
def test_join_split_parser(self):
"""join_split_parser: should return expected"""
f = join_split_parser
assertEqual = self.assertEqual
assertEqual(f(['aa; bb;', 'cc.']),
['aa', 'bb', 'cc'])
assertEqual(f(['aa; bb, bbb;', 'cc.'],delimiters=';,'),
['aa', ['bb','bbb'], 'cc'])
#test item_modifier
assertEqual(f('aa (bb) (cc).', '(', item_modifier=rstrip_(') ')),
['aa','bb','cc'])
assertEqual(f('aa (bb)xx (cc).', '(', item_modifier=rstrip_(') ')),
['aa','bb)xx','cc'])
#test empty
assertEqual(f([]),[''])
assertEqual(f(['', ' ']),[''])
assertEqual(f(''),[''])
def test_join_split_dict_parser(self):
"""join_split_dict_parser: should return expected"""
f = join_split_dict_parser
#test default
self.assertEqual(f('aa=1; bb=2,3; cc=4 (if aa=1);'),
{'aa':'1', 'bb': ['2','3'], 'cc': '4 (if aa=1)'})
self.assertEqual(f('aa=1; bb=2,3; cc=4:5', delimiters=';=,:'),
{'aa':'1', 'bb': ['2','3'], 'cc': '4:5'})
#test strict=False -> splits without dict()
self.assertEqual(f('aa=1; bb.', strict=False), [['aa', '1'], ['bb']])
#test strict -> raise ValueError
self.assertRaises(ValueError, f, 'aa=1; bb.')
self.assertRaises(ValueError, f, 'aa=1; bb=2=3.', ';=')
self.assertRaises(ValueError, f, '')
def test_mapping_parser(self):
"""mapping_parser: should return expected dict"""
fields = [None, 'A', 'B', ('C', int), ('D', float)]
line = 'blah aa bb; 2 3.1;'
expect = dict(A='aa', B='bb', C=2, D=3.1)
self.assertEqual(mapping_parser(line, fields), expect)
#test more splits -> ignore the last splits
line_leftover = line + 'blah blah'
self.assertEqual(mapping_parser(line_leftover, fields), expect)
#test more fields -> truncate the last fields
fields_leftover = fields + ['E']
self.assertEqual(mapping_parser(line, fields_leftover), expect)
#test empty
self.assertEqual(mapping_parser('', fields), {})
def test_linecode_merging_maker(self):
"""linecode_merging_maker: should return expected (label, line) tuples"""
f = linecode_merging_maker
lines =['ID id.', 'RN rn.', 'RR invalid', 'RN rn.']
labels = ['ID', 'REF', 'RR', 'RN rn.']
self.assertEqual(map(f, lines), zip(labels, lines))
def test_parse_header(self):
"""parse_header: placeholder, not yet tested"""
pass
def test_MinimalEbiParser_valid(self):
"""MinimalEbiParser: integrity of output """
f = curry(MinimalEbiParser, strict=False)
#test valid result: sequence, number of records, keys of a header
valid_result = list(f(fake_records_valid))
self.assertEqual(len(valid_result), 2)
sequence, header = valid_result[0]
self.assertEqual(sequence, 'aaccppgghhh')
#the first fake record uses only the required labels; the '' label,
#which was assigned to the sequence, is removed from the header
self.assertEqual(list(sorted(header.keys())),
list(sorted(required_labels))[1:]) #[1:] to exclude the ''
#test selected_labels
selected_labels = ['ID', 'DE']
select_result = list(f(fake_records_valid,
selected_labels=selected_labels))
self.assertEqual(list(sorted(select_result[0][1].keys())),
list(sorted(selected_labels)))
#test bad record - unknown linecode or wrong line format
self.assertRaises(ValueError, list,
f(fake_records_valid + ['ID id.', 'RR not valid.','//']))
self.assertRaises(ValueError, list,
f(fake_records_valid + ['ID id.', ' RN bad format.','//']))
self.assertRaises(ValueError, list,
f(fake_records_valid + ['ID id.', 'RN bad format.','//']))
#test bad record - not end with '//'
self.assertRaises(RecordError, list, f(fake_records_valid +
['ID not end with //', ' seq']))
#test strict: lacking required linecodes
#?? How to test the error message? warn message?
#the first record, [:-5], is valid even when strict=True.
the_first_valid = list(f(fake_records_valid[:-5], strict=True))[0]
#[1] gets the header dict
self.assertEqual(len(the_first_valid[1]),9)
self.assertRaises(RecordError, list,
f(fake_records_valid, strict=True))
def test_EbiParser(self):
"""EbiParser: should parse valid records"""
f = curry(EbiParser, strict=False)
first_valid = fake_records_valid[:-5]
#test valid
self.assertEqual(len(list(f(fake_records_valid))), 2)
#test skipping bad record when strict=False
#self.assertEqual(len(list(f(fake_records_valid[:-1] +
# ['OX xx=no equal.', '//']))), 1)
##test Raise RecordError from parse_head when strict=True
#self.assertRaises(RecordError, list, f(first_valid[:-1] +
# ['OX xx=no equal.', '//'], strict=True))
class RootParsersKnownValues(TestCase):
"""Test most xx_parsers with known value"""
def test_id_parser(self):
"""id_parser should return expected dict"""
id_line = [
'ID CYC_BOVIN STANDARD; PRT; 104 AA.']
self.assertEqual( id_parser(id_line),
{'DataClass': 'STANDARD', 'Length': 104, 'MolType': 'PRT',
'EntryName': 'CYC_BOVIN'})
def test_sq_parser(self):
"""sq_parser should return expected dict"""
lines = [
"SQ SEQUENCE 486 AA; 55639 MW; D7862E867AD74383 CRC64;"]
self.assertEqual( sq_parser(lines),
{'Crc64': 'D7862E867AD74383', 'Length': 486,
'MolWeight': 55639})
def test_ac_parser(self):
"""ac_parser should return expected list"""
lines = [
"AC Q16653; O00713; O00714;",
"AC Q92892; Q92893;"]
self.assertEqual( ac_parser(lines),
['Q16653', 'O00713', 'O00714', 'Q92892', 'Q92893'])
def test_oc_parser(self):
"""oc_parser should return expected list"""
lines = [
"OC Eukaryota; Metazoa; Chordata;",
"OC Mammalia;"]
self.assertEqual(oc_parser(lines),
['Eukaryota', 'Metazoa', 'Chordata', 'Mammalia'])
def test_dt_parser(self):
"""dt_parser should return expected list"""
lines = \
"DT 01-AUG-1988 (Rel. 08, Created)\n"\
"DT 30-MAY-2000 (Rel. 39, Last sequence update)\n"\
"DT 10-MAY-2005 (Rel. 47, Last annotation update)\n".splitlines()
self.assertEqual( dt_parser(lines),
['01-AUG-1988 (Rel. 08, Created)',
'30-MAY-2000 (Rel. 39, Last sequence update)',
'10-MAY-2005 (Rel. 47, Last annotation update)'])
def test_de_itemparser(self):
"""de_itemparser: known values"""
inputs = [
' AAA (aa) ',
'AAA [xx] (aa)',
'AAA',
'',]
expects = [
{'OfficalName': 'AAA', 'Synonyms': ['aa']},
{'OfficalName': 'AAA [xx]', 'Synonyms': ['aa']},
{'OfficalName': 'AAA', 'Synonyms': []},
{'OfficalName': '', 'Synonyms': []}]
#pprint(map(de_itemparser, inputs))
self.assertEqual(map(de_itemparser, inputs), expects)
def test_de_parser(self):
"""de_parser should return expected list"""
inputs = [
"DE Annexin [Includes: CCC] [Contains: AAA] (Fragment).",
"DE A [Includes: II] (Fragment).",
"DE A [Contains: CC].",
"DE A (Fragment).",
"DE A (aa)."]
filtered_dicts = [item_empty_filter(de_parser([e])) for e in inputs]
self.assertEqual(map(len, filtered_dicts), [4, 3, 2, 2, 2])
def test_os_parser(self):
"""os_parser should return expected list"""
lines = [
'OS Solanum melongena (Eggplant) (Auber-',
'OS gine).']
self.assertEqual(os_parser(lines),
['Solanum melongena', 'Eggplant', 'Auber- gine'])
lines = \
"""OS Escherichia coli.""".splitlines()
self.assertEqual(os_parser(lines),
['Escherichia coli'])
def test_og_parser(self):
"""og_parser should return expected list"""
lines = [
"OG XXX; xx.",
"OG Plasmid R6-5, Plasmid IncFII R100 (NR1), and",
"OG Plasmid IncFII R1-19 (R1 drd-19)."]
self.assertEqual(og_parser(lines),
['XXX; xx', [
'Plasmid R6-5', 'Plasmid IncFII R100 (NR1)',
'Plasmid IncFII R1-19 (R1 drd-19)']])
def test_ox_parser(self):
"""ox_parser should return expected dict"""
lines = ["OX NCBI_TaxID=9606;"]
self.assertEqual(ox_parser(lines),
{'NCBI_TaxID': '9606'})
def test_gn_parser(self):
"""gn_parser should return expected list of dict"""
lines = [
"GN Name=hns; Synonyms=bglY, cur, topS;",
"GN OrderedLocusNames=b1237, c1701, ECs1739;"]
self.assertEqual(gn_parser(lines),
[{'Synonyms': ['bglY', 'cur', 'topS'],
'OrderedLocusNames': ['b1237', 'c1701', 'ECs1739'],
'Name': 'hns'}])
lines = [
"GN Name=Jon99Cii; Synonyms=SER1, Ser99Da; ORFNames=CG7877;",
"GN and",
"GN Name=Jon99Ciii; Synonyms=SER2, Ser99Db; ORFNames=CG15519;"]
self.assertEqual(gn_parser(lines),
[{'ORFNames': 'CG7877', 'Synonyms': ['SER1', 'Ser99Da'],
'Name': 'Jon99Cii'},
{'ORFNames': 'CG15519', 'Synonyms': ['SER2', 'Ser99Db'],
'Name': 'Jon99Ciii'}])
def test_dr_parser(self):
"""dr_parser should return expected dict"""
lines = dr_lines
self.assertEqual(dr_parser(lines), dr_expect)
class FT_Tests(TestCase):
"""Tests for FT parsers. """
def test_ft_basic_itemparser(self):
"""ft_basic_itemparser: known values"""
inputs = [
['DNA_BIND >102 292'],
['CONFLICT 327 327 E -> R (in Ref. 2).'],
['PROPEP ?25 48',
' /FTId=PRO_021449.',],
['VARIANT 214 214 V -> I.',
' /FTId=VAR_009122.',],]
expects = [
('DNA_BIND', '>102', 292, ''),
('CONFLICT', 327, 327, 'E -> R (in Ref. 2)'),
('PROPEP', '?25', 48, '/FTId=PRO_021449'),
('VARIANT', 214, 214, 'V -> I. /FTId=VAR_009122')]
#pprint(map(ft_basic_itemparser, inputs))
self.assertEqual(map(ft_basic_itemparser, inputs), expects)
def test_try_int(self):
"""try_int: known values"""
inputs = ['9', '0', '-3', '2.3', '<9', '>9', '?', '?35', '']
expects = [9, 0, -3, '2.3', '<9', '>9', '?', '?35', '']
self.assertEqual(map(try_int, inputs), expects)
def test_ft_id_parser(self):
"""ft_id_parser: known values"""
inputs = [
'',
'ddd',
'/FTId=PRO_021449',
'V -> I. /FTId=VAR_009122',
'E -> R (tumor). /FTId=VAR_002343',]
expects = [
{'Description': '', 'Id': ''},
{'Description': 'ddd', 'Id': ''},
{'Description': '', 'Id': 'PRO_021449'},
{'Description': 'V -> I', 'Id': 'VAR_009122'},
{'Description': 'E -> R (tumor)', 'Id': 'VAR_002343'}]
#pprint(map(ft_id_parser, inputs))
self.assertEqual(map(ft_id_parser, inputs), expects)
def test_ft_mutation_parser(self):
"""ft_mutation_parser: known values"""
inputs = [
'',
'ddd', #should raise error?
'V -> I. /FTId=xxxxxx', #should raise error?
'V -> I',
'E -> R (tumor)',
'missing (tumor)', ]
expects = [
{'MutateFrom': '', 'Comment': '', 'MutateTo': ''},
{'MutateFrom': 'ddd', 'Comment': '', 'MutateTo': ''},
{'MutateFrom': 'V', 'Comment': '', 'MutateTo': 'I. /FTId=xxxxxx'},
{'MutateFrom': 'V', 'Comment': '', 'MutateTo': 'I'},
{'MutateFrom': 'E', 'Comment': 'tumor', 'MutateTo': 'R'},
{'MutateFrom': 'missing ', 'Comment': 'tumor', 'MutateTo': ''}]
#pprint(map(ft_mutation_parser, inputs))
self.assertEqual(map(ft_mutation_parser, inputs), expects)
def test_ft_mutation_parser_raise(self):
"""ft_mutation_parser: raise ValueError"""
pass
def test_ft_mutagen_parser(self):
"""ft_mutagen_parser: known values"""
inputs = [
'C->R,E,A: Loss of cADPr hydrolas',
'Missing: Abolishes ATP-binding', ]
expects = [
{'Comment': ' Loss of cADPr hydrolas',
'MutateFrom': 'C', 'MutateTo': 'R,E,A'},
{'Comment': ' Abolishes ATP-binding',
'MutateFrom': 'Missing', 'MutateTo': ''}]
#pprint(map(ft_mutagen_parser, inputs))
self.assertEqual(map(ft_mutagen_parser, inputs), expects)
def test_ft_id_mutation_parser(self):
"""ft_id_mutation_parser: known values"""
pass
def test_ft_parser(self):
"""ft_parser should return expected dict"""
lines = ft_lines
#pprint(ft_parser(lines))
self.assertEqual(ft_parser(lines), ft_expect)
class CC_Tests(TestCase):
"""tests for cc_parsers. """
def test_cc_itemfinder_valid(self):
"""cc_itemfinder: yield each expected block."""
#pprint(list(cc_itemfinder(labeloff(cc_lines))))
input_with_license = labeloff(cc_lines)
self.assertEqual(len(list(cc_itemfinder(
input_with_license))), 9)
input_without_license = labeloff(cc_lines[:-4])
self.assertEqual(len(list(cc_itemfinder(
input_without_license))), 8)
def test_cc_itemfinder_raise(self):
"""cc_itemfinder: raise FieldError if license block bad."""
input_with_license_lacking_bottom = labeloff(cc_lines[:-1])
self.assertRaises(FieldError, cc_itemfinder,
input_with_license_lacking_bottom)
def test_cc_basic_itemparser(self):
"""cc_basic_itemparser: known results or FieldError"""
valid_topics = [
['-!- topic1: first: line', ' second line'],
['-!- topic2: ', ' first line', ' second line'],
[' topic3: not treated invalid topic format'],]
expects = [
('topic1', ['first: line', 'second line']),
('topic2', ['first line', 'econd line']),
('topic3', ['not treated invalid topic format']),]
self.assertEqual(map(cc_basic_itemparser, valid_topics),
expects)
bad_topic = ['-!- bad_topic without colon', ' FieldError']
self.assertRaises(FieldError, cc_basic_itemparser, bad_topic)
def test_cc_interaction_parser(self):
"""cc_interaction_parser: known values"""
inputs = [
['Self; NbExp=1; IntAct=EBI-123485, EBI-123485;',
'Q9W158:CG4612; NbExp=1; IntAct=EBI-123485, EBI-89895;',
'Q9VYI0:fne; NbExp=1; IntAct=EBI-123485, EBI-126770;',]]
expects =[
[('Self', {'NbExp': '1',
'IntAct': ['EBI-123485', 'EBI-123485']}),
('Q9W158:CG4612', {'NbExp': '1',
'IntAct': ['EBI-123485', 'EBI-89895']}),
('Q9VYI0:fne', {'NbExp': '1',
'IntAct': ['EBI-123485', 'EBI-126770']})]]
self.assertEqual(map(cc_interaction_parser, inputs), expects)
def test_cc_biophysicochemical_properties_parser(self):
"""cc_biophysicochemical_properties_parser: known values"""
#pprint(cc['BIOPHYSICOCHEMICAL PROPERTIES']) #topic specific parser
f = cc_biophysicochemical_properties_parser
valid_inputs = [
['Kinetic parameters:',
' KM=98 uM for ATP;',
' KM=688 uM for pyridoxal;',
' Vmax=1.604 mmol/min/mg enzyme;',
'pH dependence:',
' Optimum pH is 6.0. Active pH 4.5 to 10.5;',]
]
expects = [
{'Kinetic parameters': {
'KM': ['98 uM for ATP', '688 uM for pyridoxal'],
'Vmax': '1.604 mmol/min/mg enzyme'},
'pH dependence': 'Optimum pH is 6.0. Active pH 4.5 to 10.5'},
]
self.assertEqual(map(f, valid_inputs), expects)
def test_cc_alternative_products_parser(self):
"""cc_alternative_products_parser: known values"""
f = cc_alternative_products_parser
valid_inputs = [
['Event=Alternative initiation;'
' Comment=Free text;',
'Event=Alternative splicing; Named isoforms=3;',
' Comment=Additional isoforms seem to exist.',
' confirmation;',
'Name=1; Synonyms=AIRE-1;',
' IsoId=O43918-1; Sequence=Displayed;',
'Name=3; Synonyms=AIRE-3,',
'ai-2, ai-3;', #breaks the hanging_paragraph_finder
' IsoId=O43918-3; Sequence=VSP_004089, VSP_004090;',]]
expects = \
[[{'Comment': 'Free text', 'Event': 'Alternative initiation'},
{'Comment': 'Additional isoforms seem to exist. confirmation',
'Event': 'Alternative splicing',
'Named isoforms': '3',
'Names': [{'IsoId': 'O43918-1',
'Name': '1',
'Sequence': 'Displayed',
'Synonyms': 'AIRE-1'},
{'IsoId': 'O43918-3',
'Name': '3',
'Sequence': ['VSP_004089', 'VSP_004090'],
'Synonyms': ['AIRE-3', 'ai-2', 'ai-3']}]}]]
#pprint(map(f,valid_inputs))
self.assertEqual(map(f, valid_inputs), expects)
def test_cc_parser(self):
"""cc_parser: known values and raise when strict"""
cc = cc_parser(cc_lines)
#pprint(cc)
#print cc.keys()
self.assertEqual(list(sorted(cc.keys())),
['ALLERGEN', 'ALTERNATIVE PRODUCTS',
'BIOPHYSICOCHEMICAL PROPERTIES', 'DATABASE', 'DISEASE',
'INTERACTION', 'LICENSE', 'MASS SPECTROMETRY'])
#test Disease topic (default_handler)
self.assertEqual(cc['DISEASE'], [
'Defects in PHKA1 are linked to X-linked muscle glycogenosis '
'[MIM:311870]',
'Defects in ABCD1 are the cause of recessive X-linked '
'adrenoleukodystrophy (X-ALD) [MIM:300100]. X-ALD is a rare '
'phenotype'
])
#test License (default_handler)
#pprint(cc['LICENSE'])
self.assertEqual(cc['LICENSE'], [
'This SWISS-PROT entry is copyright. It is produced through a '
'collaboration removed'])
#pprint(cc['DATABASE']) #join_split_dict
self.assertEqual(cc['DATABASE'], [{
'NAME': 'CD40Lbase',
'NOTE': 'European CD40L defect database (mutation db)',
'WWW': '"http://www.expasy.org/cd40lbase/"'}])
#test strict
cc_lines_with_unknown_topic = ['CC -!- BLAHBLAH: xxxxx'] + cc_lines
#pprint(cc_parser(cc_lines_with_unknown_topic))
self.assertEqual(cc_parser(cc_lines_with_unknown_topic)['BLAHBLAH'],
['xxxxx'])
self.assertRaises(FieldError, cc_parser,
cc_lines_with_unknown_topic, strict=True)
class ReferenceTests(TestCase):
"""Tests for parsers related to reference blocks"""
def test_ref_finder(self):
"""ref_finder: should return a list of ref blocks"""
pass
def test_refs_parser(self):
"""refs_parser: should return a dict of {RN: ref_dict}"""
pass
def test_single_ref_parser(self):
"""single_ref_parser: should return the expected dict"""
fake_ref_block = ['RN [1]',
'RP NUCLEOTIDE',
'RC STRAIN=Bristol N2;',
'RX PubMed=1113;',
'RA Zhang L., Wu S.-L., Rubin C.S.;',
'RT "A novel ";',
'RL J. Biol. Chem. 276:10.',]
rn, others = single_ref_parser(fake_ref_block)
self.assertEqual(rn, 1)
self.assertEqual(len(others), 6)
#test strict: lacking required labels
self.assertEqual(len(single_ref_parser(
fake_ref_block[:-1], strict=False)[1]), 5)
self.assertRaises(RecordError, single_ref_parser, fake_ref_block[:-1],
True)
def test_ra_parser(self):
"""ra_parser should return expected list"""
lines = \
"RA Galinier A., Bleicher F., Negre D.,\n"\
"RA Cozzone A.J., Cortay J.-C.;\n".splitlines()
self.assertEqual( ra_parser(lines),
['Galinier A.', 'Bleicher F.', 'Negre D.', 'Cozzone A.J.',
'Cortay J.-C.'])
def test_rx_parser(self):
"""rx_parser should return expected dict"""
inputs = [
['RX MEDLINE=22709107; PubMed=12788972; DOI=10.1073/pnas.113'],
['RX PubMed=14577811; '\
'DOI=10.1597/1545-1569(2003)040<0632:AMMITS>2.0.CO;2;']]
expects = [
{'DOI': '10.1073/pnas.113', 'MEDLINE': '22709107',
'PubMed': '12788972'},
{'DOI': '10.1597/1545-1569(2003)040<0632:AMMITS>2.0.CO;2',
'PubMed': '14577811'}]
self.assertEqual(map(rx_parser, inputs), expects)
def test_rc_parser(self):
"""rc_parser should return expected dict"""
lines = [
"RC PLASMID=R1 (R7268); TRANSPOSON=Tn3;",
"RC STRAIN=AL.012, AZ.026;"]
self.assertEqual(rc_parser(lines),
{'TRANSPOSON': 'Tn3', 'PLASMID': 'R1 (R7268)',
'STRAIN': ['AL.012', 'AZ.026']})
def test_rt_parser(self):
"""rt_parser should return expected str"""
lines = [
'RT "New insulin-like proteins',
'RT analysis and homology modeling.";']
self.assertEqual( rt_parser(lines),
'New insulin-like proteins analysis and homology modeling')
def test_rl_parser(self):
"""rl_parser should return expected str"""
lines = [
"RL J. Mol. Biol. 168:321-331(1983)."]
self.assertEqual( rl_parser(lines),
'J. Mol. Biol. 168:321-331(1983)')
def test_rn_parser(self):
"""rn_parser should return expected int"""
lines = [ "RN [8]"]
self.assertEqual( rn_parser(lines), 8)
def test_rg_parser(self):
"""rg_parser should return expected list"""
lines = [ "RG The mouse genome sequencing consortium;"]
self.assertEqual( rg_parser(lines),
['The mouse genome sequencing consortium'])
def test_rp_parser(self):
"""rp_parser should return expected str"""
lines = [ "RP X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS)."]
self.assertEqual( rp_parser(lines),
'X-RAY CRYSTALLOGRAPHY (1.8 ANGSTROMS)')
#################################
# global test data
ft_lines = \
"""FT CHAIN 29 262 Granzyme A.
FT /FTId=PRO_0000027394.
FT ACT_SITE 69 69 Charge relay system.
FT VARIANT 121 121 T -> M (in dbSNP:3104233).
FT /FTId=VAR_024291.
FT VARIANT 1 7 unknown (in a skin tumor).
FT /FTId=VAR_005851.
FT CONFLICT 282 282 R -> Q (in Ref. 18).
FT STRAND 30 30
FT STRAND 33 34
FT TURN 37 38""".splitlines()
ft_expect = \
{'ACT_SITE': [{'Start': 69, 'End': 69, 'Description': 'Charge relay system'}],
'CHAIN': [{'Description': {'Description': 'Granzyme A',
'Id': 'PRO_0000027394'},
'End': 262,
'Start': 29}],
'CONFLICT': [{'Description': {'Comment': 'in Ref. 18',
'MutateFrom': 'R',
'MutateTo': 'Q'},
'End': 282,
'Start': 282}],
'SecondaryStructure': [('STRAND', 30, 30),
('STRAND', 33, 34),
('TURN', 37, 38)],
'VARIANT': [{'Description': {'Comment': 'in dbSNP:3104233',
'Id': 'VAR_024291',
'MutateFrom': 'T',
'MutateTo': 'M'},
'End': 121,
'Start': 121},
{'Description': {'Comment': 'in a skin tumor',
'Id': 'VAR_005851',
'MutateFrom': 'unknown ',
'MutateTo': ''},
'End': 7,
'Start': 1}]}
dr_lines =\
"""DR MIM; 140050; gene.
DR GO; GO:0001772; C:immunological synapse; TAS.
DR GO; GO:0005634; C:nucleus; TAS.
DR GO; GO:0006915; P:apoptosis; TAS.
DR GO; GO:0006922; P:cleavage of lamin; IDA.
DR GO; GO:0006955; P:immune response; TAS.
DR InterPro; IPR001254; Peptidase_S1_S6.
DR InterPro; IPR001314; Peptidase_S1A.
DR Pfam; PF00089; Trypsin; 1.""".splitlines()
dr_expect =\
{'GO': [['GO:0001772', 'C:immunological synapse', 'TAS'],
['GO:0005634', 'C:nucleus', 'TAS'],
['GO:0006915', 'P:apoptosis', 'TAS'],
['GO:0006922', 'P:cleavage of lamin', 'IDA'],
['GO:0006955', 'P:immune response', 'TAS']],
'Pfam': [['PF00089', 'Trypsin', '1']],
'InterPro': [['IPR001254', 'Peptidase_S1_S6'],
['IPR001314', 'Peptidase_S1A']],
'MIM': [['140050', 'gene']]}
cc_lines = \
"""CC -!- ALLERGEN: Causes an allergic reaction in human. Binds to IgE.
CC bovine dander.
CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=3;
CC Comment=Additional isoforms seem to exist.
CC confirmation;
CC Name=1; Synonyms=AIRE-1;
CC IsoId=O43918-1; Sequence=Displayed;
CC Name=2; Synonyms=AIRE-2;
CC IsoId=O43918-2; Sequence=VSP_004089;
CC Name=3; Synonyms=AIRE-3;
CC IsoId=O43918-3; Sequence=VSP_004089, VSP_004090;
CC -!- BIOPHYSICOCHEMICAL PROPERTIES:
CC Kinetic parameters:
CC KM=98 uM for ATP;
CC KM=688 uM for pyridoxal;
CC Vmax=1.604 mmol/min/mg enzyme;
CC pH dependence:
CC Optimum pH is 6.0. Active pH 4.5 to 10.5;
CC -!- DATABASE: NAME=CD40Lbase;
CC NOTE=European CD40L defect database (mutation db);
CC WWW="http://www.expasy.org/cd40lbase/".
CC -!- DISEASE: Defects in PHKA1 are linked to X-linked muscle
CC glycogenosis [MIM:311870].
CC -!- DISEASE: Defects in ABCD1 are the cause of recessive X-linked
CC adrenoleukodystrophy (X-ALD) [MIM:300100]. X-ALD is a rare
CC phenotype.
CC -!- INTERACTION:
CC Self; NbExp=1; IntAct=EBI-123485, EBI-123485;
CC Q9W158:CG4612; NbExp=1; IntAct=EBI-123485, EBI-89895;
CC Q9VYI0:fne; NbExp=1; IntAct=EBI-123485, EBI-126770;
CC -!- MASS SPECTROMETRY: MW=24948; MW_ERR=6; METHOD=MALDI; RANGE=1-228;
CC NOTE=Ref.2.
CC --------------------------------------------------------------------------
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC removed.
CC --------------------------------------------------------------------------
""".splitlines()
fake_records_valid = """ID CYC_BOVIN STANDARD; PRT; 104 AA.
AC ac1; ac2;
DT dt.
DE de.
OS os.
OC oc.
OX NCBI_TaxID=9606;
RN [1]
SQ SEQUENCE 486 AA; 55639 MW; D7862E867AD74383 CRC64
aac cpp
ggh hh
//
ID idid std; prt; 104 #-5
OX NCBI_TaxID=9606;
DE dede.
ggaaccpp
//""".splitlines()
# Run tests if called from the command line
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_parse/test_fasta.py
#!/usr/bin/env python
"""Unit tests for FASTA and related parsers.
"""
from cogent.parse.fasta import FastaParser, MinimalFastaParser, \
NcbiFastaLabelParser, NcbiFastaParser, RichLabel, LabelParser, GroupFastaParser
from cogent.core.sequence import DnaSequence, Sequence, ProteinSequence as Protein
from cogent.core.info import Info
from cogent.parse.record import RecordError
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
def Dna(seq, *args, **kwargs):
seq = seq.replace('u','t')
seq = seq.replace('U','T')
d = DnaSequence(seq, *args, **kwargs)
return d
class GenericFastaTest(TestCase):
"""Sets up data for all the various FASTA parsers."""
def setUp(self):
"""standard files"""
self.labels = '>abc\n>def\n>ghi\n'.split('\n')
self.oneseq = '>abc\nUCAG\n'.split('\n')
self.multiline = '>xyz\nUUUU\nCC\nAAAAA\nG'.split('\n')
self.threeseq='>123\na\n> \t abc \t \ncag\ngac\n>456\nc\ng'.split('\n')
self.twogood='>123\n\n> \t abc \t \ncag\ngac\n>456\nc\ng'.split('\n')
self.oneX='>123\nX\n> \t abc \t \ncag\ngac\n>456\nc\ng'.split('\n')
self.nolabels = 'GJ>DSJGSJDF\nSFHKLDFS>jkfs\n'.split('\n')
self.empty = []
class MinimalFastaParserTests(GenericFastaTest):
"""Tests of MinimalFastaParser: returns (label, seq) tuples."""
def test_empty(self):
"""MinimalFastaParser should return empty list from 'file' w/o labels"""
self.assertEqual(list(MinimalFastaParser(self.empty)), [])
self.assertEqual(list(MinimalFastaParser(self.nolabels, strict=False)),
[])
self.assertRaises(RecordError, list, MinimalFastaParser(self.nolabels))
def test_no_labels(self):
"""MinimalFastaParser should return empty list from file w/o seqs"""
#should fail if strict (the default)
self.assertRaises(RecordError, list,
MinimalFastaParser(self.labels,strict=True))
#if not strict, should skip the records
self.assertEqual(list(MinimalFastaParser(self.labels, strict=False)),
[])
def test_single(self):
"""MinimalFastaParser should read single record as (label, seq) tuple"""
f = list(MinimalFastaParser(self.oneseq))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, ('abc', 'UCAG'))
f = list(MinimalFastaParser(self.multiline))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, ('xyz', 'UUUUCCAAAAAG'))
def test_multiple(self):
"""MinimalFastaParser should read multiline records correctly"""
f = list(MinimalFastaParser(self.threeseq))
self.assertEqual(len(f), 3)
a, b, c = f
self.assertEqual(a, ('123', 'a'))
self.assertEqual(b, ('abc', 'caggac'))
self.assertEqual(c, ('456', 'cg'))
def test_multiple_bad(self):
"""MinimalFastaParser should complain or skip bad records"""
self.assertRaises(RecordError, list, MinimalFastaParser(self.twogood))
f = list(MinimalFastaParser(self.twogood, strict=False))
self.assertEqual(len(f), 2)
a, b = f
self.assertEqual(a, ('abc', 'caggac'))
self.assertEqual(b, ('456', 'cg'))
class FastaParserTests(GenericFastaTest):
"""Tests of FastaParser: returns sequence objects."""
def test_empty(self):
"""FastaParser should return empty list from 'file' w/o labels"""
self.assertEqual(list(FastaParser(self.empty)), [])
self.assertEqual(list(FastaParser(self.nolabels, strict=False)),
[])
self.assertRaises(RecordError, list, FastaParser(self.nolabels))
def test_no_labels(self):
"""FastaParser should return empty list from file w/o seqs"""
#should fail if strict (the default)
self.assertRaises(RecordError, list,
FastaParser(self.labels,strict=True))
#if not strict, should skip the records
self.assertEqual(list(FastaParser(self.labels, strict=False)), [])
def test_single(self):
"""FastaParser should read single record as seq object"""
f = list(FastaParser(self.oneseq))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, ('abc', 'UCAG'))
self.assertEqual(a[1].Name, 'abc')
f = list(FastaParser(self.multiline))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, ('xyz', 'UUUUCCAAAAAG'))
self.assertEqual(a[1].Name, 'xyz')
def test_single_constructor(self):
"""FastaParser should use constructors if supplied"""
f = list(FastaParser(self.oneseq, Dna))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, ('abc', 'TCAG'))
self.assertEqual(a[1].Name, 'abc')
def upper_abc(x):
return None, {'ABC': x.upper()}
f = list(FastaParser(self.multiline, Dna, upper_abc))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, (None, 'TTTTCCAAAAAG'))
self.assertEqual(a[1].Name, None)
self.assertEqual(a[1].Info.ABC, 'XYZ')
def test_multiple(self):
"""FastaParser should read multiline records correctly"""
f = list(FastaParser(self.threeseq))
self.assertEqual(len(f), 3)
for i in f:
assert isinstance(i[1], Sequence)
a, b, c = f
self.assertEqual((a[1].Name, a[1]), ('123', 'a'))
self.assertEqual((b[1].Name, b[1]), ('abc', 'caggac'))
self.assertEqual((c[1].Name, c[1]), ('456', 'cg'))
def test_multiple_bad(self):
"""Parser should complain or skip bad records"""
self.assertRaises(RecordError, list, FastaParser(self.twogood))
f = list(FastaParser(self.twogood, strict=False))
self.assertEqual(len(f), 2)
a, b = f
a, b = a[1], b[1] #field 0 is name
self.assertEqual((a.Name, a), ('abc', 'caggac'))
self.assertEqual((b.Name, b), ('456', 'cg'))
def test_multiple_constructor_bad(self):
"""Parser should complain or skip bad records w/ constructor"""
def dnastrict(x, **kwargs):
try:
return Dna(x, check=True, **kwargs)
except Exception:
raise RecordError("Could not convert sequence")
self.assertRaises(RecordError, list, FastaParser(self.oneX, dnastrict))
f = list(FastaParser(self.oneX, dnastrict, strict=False))
self.assertEqual(len(f), 2)
a, b = f
a, b = a[1], b[1]
self.assertEqual((a.Name, a), ('abc', 'caggac'.upper()))
self.assertEqual((b.Name, b), ('456', 'cg'.upper()))
class NcbiFastaLabelParserTests(TestCase):
"""Tests of the label line parser for NCBI's FASTA identifiers."""
def test_init(self):
"""Labels from genpept.fsa should work as expected"""
i = NcbiFastaLabelParser(
'>gi|37549575|ref|XP_352503.1| similar to EST gb|ATTS1136')[1]
self.assertEqual(i.GI, ['37549575'])
self.assertEqual(i.RefSeq, ['XP_352503.1'])
self.assertEqual(i.Description, 'similar to EST gb|ATTS1136')
i = NcbiFastaLabelParser(
'>gi|32398734|emb|CAD98694.1| (BX538350) dbj|baa86974.1, possible')[1]
self.assertEqual(i.GI, ['32398734'])
self.assertEqual(i.RefSeq, [])
self.assertEqual(i.EMBL, ['CAD98694.1'])
self.assertEqual(i.Description, '(BX538350) dbj|baa86974.1, possible')
i = NcbiFastaLabelParser(
'>gi|10177064|dbj|BAB10506.1| (AB005238) ')[1]
self.assertEqual(i.GI, ['10177064'])
self.assertEqual(i.DDBJ, ['BAB10506.1'])
self.assertEqual(i.Description, '(AB005238)')
class NcbiFastaParserTests(TestCase):
"""Tests of the NcbiFastaParser."""
def setUp(self):
"""Define a few standard files"""
self.peptide = [
'>gi|10047090|ref|NP_055147.1| small muscle protein, X-linked [Homo sapiens]',
'MNMSKQPVSNVRAIQANINIPMGAFRPGAGQPPRRKECTPEVEEGVPPTSDEEKKPIPGAKKLPGPAVNL',
'SEIQNIKSELKYVPKAEQ',
'>gi|10047092|ref|NP_037391.1| neuronal protein [Homo sapiens]',
'MANRGPSYGLSREVQEKIEQKYDADLENKLVDWIILQCAEDIEHPPPGRAHFQKWLMDGTVLCKLINSLY',
'PPGQEPIPKISESKMAFKQMEQISQFLKAAETYGVRTTDIFQTVDLWEGKDMAAVQRTLMALGSVAVTKD'
]
self.nasty = [
' ', #0 ignore leading blank line
'>gi|abc|ref|def|', #1 no description -- ok
'UCAG', #2 single line of sequence
'#comment', #3 comment -- skip
' \t ', #4 ignore blank line between records
'>gi|xyz|gb|qwe| \tdescr \t\t', #5 description has whitespace
'UUUU', #6 two lines of sequence
'CCCC', #7
'>gi|bad|ref|nonsense', #8 missing last pipe -- error
'ACU', #9
'>gi|bad|description', #10 not enough fields -- error
'AAA', #11
'>gi|bad|ref|stuff|label', #12
'XYZ', #13 bad sequence -- error
'>gi|bad|gb|ignore| description', #14 label without sequence -- error
'> gi | 123 | dbj | 456 | desc|with|pipes| ',#15 label w/ whitespace -- OK
'ucag', #16
' \t ', #17 ignore blank line inside record
'UCAG', #18
'tgac', #19 lowercase should be OK
'# comment', #20 comment -- skip
'NNNN', #21 degenerates should be OK
' ', #22 ignore trailing blank line
]
self.empty = []
self.no_label = ['ucag']
def test_empty(self):
"""NcbiFastaParser should accept empty input"""
self.assertEqual(list(NcbiFastaParser(self.empty)), [])
self.assertEqual(list(NcbiFastaParser(self.empty, Protein)), [])
def test_normal(self):
"""NcbiFastaParser should accept normal record if loose or strict"""
f = list(NcbiFastaParser(self.peptide, Protein))
self.assertEqual(len(f), 2)
a, b = f
a, b = a[1], b[1] #field 0 is the name
self.assertEqual(a, 'MNMSKQPVSNVRAIQANINIPMGAFRPGAGQPPRRKECTPEVEEGVPPTSDEEKKPIPGAKKLPGPAVNLSEIQNIKSELKYVPKAEQ')
self.assertEqual(a.Info.GI, ['10047090'])
self.assertEqual(a.Info.RefSeq, ['NP_055147.1'])
self.assertEqual(a.Info.DDBJ, [])
self.assertEqual(a.Info.Description,
'small muscle protein, X-linked [Homo sapiens]')
self.assertEqual(b, 'MANRGPSYGLSREVQEKIEQKYDADLENKLVDWIILQCAEDIEHPPPGRAHFQKWLMDGTVLCKLINSLYPPGQEPIPKISESKMAFKQMEQISQFLKAAETYGVRTTDIFQTVDLWEGKDMAAVQRTLMALGSVAVTKD')
self.assertEqual(b.Info.GI, ['10047092'])
self.assertEqual(b.Info.RefSeq, ['NP_037391.1'])
self.assertEqual(b.Info.Description, 'neuronal protein [Homo sapiens]')
def test_bad(self):
"""NcbiFastaParser should raise error on bad records if strict"""
#if strict, starting anywhere in the first 15 lines should cause errors
for i in range(15):
self.assertRaises(RecordError,list,NcbiFastaParser(self.nasty[i:]))
#...but the 16th is OK.
r = list(NcbiFastaParser(self.nasty[15:]))[0]
self.assertEqual(r, ('123', 'ucagUCAGtgacNNNN'))
#test that we get what we expect if not strict
r = list(NcbiFastaParser(self.nasty, Sequence, strict=False))
self.assertEqual(len(r), 4)
a, b, c, d = r
self.assertEqual((a[1], a[1].Info.GI, a[1].Info.RefSeq, \
a[1].Info.Description),
('UCAG', ['abc'], ['def'], ''))
self.assertEqual((b[1], b[1].Info.GI, b[1].Info.GenBank, \
b[1].Info.Description),
('UUUUCCCC', ['xyz'], ['qwe'], 'descr'))
self.assertEqual((c[1], c[1].Info.GI, c[1].Info.RefSeq, \
c[1].Info.Description),
('XYZ', ['bad'], ['stuff'], 'label'))
self.assertEqual((d[1], d[1].Info.GI, d[1].Info.DDBJ, \
d[1].Info.Description),
('ucagUCAGtgacNNNN'.upper(), ['123'], ['456'], 'desc|with|pipes|'))
#...and when we explicitly supply a constructor
r = list(NcbiFastaParser(self.nasty, Dna, strict=False))
self.assertEqual(len(r), 3)
a, b, c = r
a, b, c = a[1], b[1], c[1]
self.assertEqual((a, a.Info.GI, a.Info.RefSeq, a.Info.Description),
('TCAG', ['abc'], ['def'], ''))
self.assertEqual((b, b.Info.GI, b.Info.GenBank, b.Info.Description),
('TTTTCCCC', ['xyz'], ['qwe'], 'descr'))
self.assertEqual((c, c.Info.GI, c.Info.DDBJ, c.Info.Description),
('tcagTCAGtgacNNNN'.upper(), ['123'], ['456'], 'desc|with|pipes|'))
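# A minimal sketch (a hypothetical helper, not cogent's NcbiFastaParser) of
# how an NCBI-style label such as '>gi|10047090|ref|NP_055147.1| description'
# can be split into identifier fields and a free-text description. The name
# sketch_parse_ncbi_label and the returned (dict, str) layout are assumptions;
# the tags 'ref', 'gb' and 'dbj' correspond to the RefSeq, GenBank and DDBJ
# fields checked in the tests above.
def sketch_parse_ncbi_label(label):
    """Return (fields, description) for an NCBI-style label (hypothetical)."""
    text = label.strip().lstrip('>')
    parts = [p.strip() for p in text.split('|')]
    # expect 'gi', gi_value, db_tag, accession, then the description after
    # the fourth pipe; fewer pieces means a missing pipe or missing fields
    if len(parts) < 5:
        raise ValueError("label %r has too few pipe-delimited fields" % label)
    fields = {}
    tokens = parts[:4]
    for tag, value in zip(tokens[::2], tokens[1::2]):
        fields.setdefault(tag, []).append(value)
    # pipes inside the description are kept, as in the 'desc|with|pipes|' case
    description = '|'.join(parts[4:]).strip()
    return fields, description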
class LabelParsingTest(TestCase):
"""Test generic fasta label parsing"""
def test_rich_label(self):
"""rich label correctly constructs label strings"""
# labels should be equal based on the result of applying their
# attributes to their string template
k = RichLabel(Info(species="rat"), "%(species)s")
l = RichLabel(Info(species="rat", seq_id="xy5"), "%(species)s")
self.assertEqual(k, l)
# labels should construct from Info components correctly
k = RichLabel(Info(species="rat", seq_id="xy5"),
"%(seq_id)s:%(species)s")
self.assertEqual(k, "xy5:rat")
k = RichLabel(Info(species="rat", seq_id="xy5"),
"%(species)s:%(seq_id)s")
self.assertEqual(k, "rat:xy5")
# extra components should be ignored
k = RichLabel(Info(species="rat", seq_id="xy5"), "%(species)s")
self.assertEqual(k, "rat")
# the label should have Info object
self.assertEqual(k.Info.species, "rat")
self.assertEqual(k.Info.seq_id, "xy5")
# label should be constructable just like a normal string
self.assertEqual(RichLabel('a'), 'a')
def test_label_parser(self):
"""label parser factory function copes with mixed-structure labels"""
# the label parser factory function should correctly handle label lines
# with mixed separators
make = LabelParser("%(species)s:%(accession)s",
[[0,"accession", str],
[2, "species", str]],
split_with=": ")
for label, expect in [(">abcd:human:misc", "misc:abcd"),
("abcd:human:misc", "misc:abcd"),
(">abcd:Human misc", "misc:abcd"),
(">abcd Human:misc", "misc:abcd"),
(">abcd:Human misc", "misc:abcd")]:
self.assertEqual(make(label), expect)
# should raise an assertion error if template doesn't match at least one field name
self.assertRaises(AssertionError, LabelParser, "%s:%s",
[[0,"accession", str],
[2, "species", str]],
split_with=": ")
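# A minimal sketch (hypothetical, not cogent.parse.fasta.LabelParser) of the
# label-parser-factory idea tested above: split a label on any run of the
# characters in split_with, pull out (index, name, converter) fields, and
# render them through a %-style template. sketch_label_parser is an assumed
# name used only for illustration.
import re

def sketch_label_parser(template, field_specs, split_with=":"):
    # character class of separators, e.g. ': ' -> any colon or space
    pattern = "[" + re.escape(split_with) + "]+"
    def parse(label):
        parts = re.split(pattern, label.strip().lstrip(">"))
        info = dict((name, conv(parts[index]))
                    for index, name, conv in field_specs)
        return template % info
    return parse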
class GroupFastaParsingTest(TestCase):
"""test parsing of grouped sequences in a collection"""
def test_groups(self):
"""correctly yield grouped sequences from fasta formatted data"""
data = [">group1:seq1_id:species1",
"ACTG",
">group1:seq2_id:species2",
"ACTG",
">group2:seq3_id:species1",
"ACGT",
">group2:seq4_id:species2",
"ACGT"]
expected = [{"species1": "ACTG", "species2":"ACTG"},
{"species1":"ACGT", "species2":"ACGT"}]
label_to_name = LabelParser("%(species)s", [(0,"Group",str),
(1,"seq_id",str),(2,"species",str)], split_with=":")
parser = GroupFastaParser(data, label_to_name, aligned=True)
count = 0
for group in parser:
got = group.todict()
want = expected[count]
self.assertEqual(got, want)
self.assertEqual(group.Info.Group, "group%s" % (count+1))
count += 1
# check we don't return a done group
done_groups = ["group1"]
parser = GroupFastaParser(data, label_to_name, done_groups=done_groups,
aligned=True)
for group in parser:
got = group.todict()
want = expected[1]
self.assertEqual(got, want)
self.assertEqual(group.Info.Group, "group2")
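# A minimal sketch (a hypothetical helper, not cogent's GroupFastaParser) of
# the grouping behaviour test_groups expects: consecutive records are grouped
# by the first ':'-delimited label field, each group's sequences are keyed by
# the third field, and groups listed in done_groups are skipped. It assumes
# strictly alternating label/sequence lines.
from itertools import groupby

def sketch_group_fasta(lines, done_groups=()):
    records = zip(lines[0::2], lines[1::2])  # (label, seq) pairs
    def group_of(record):
        return record[0].lstrip(">").split(":")[0]
    for group, members in groupby(records, key=group_of):
        if group in done_groups:
            continue
        yield group, dict((label.lstrip(">").split(":")[2], seq)
                          for label, seq in members)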
if __name__ == '__main__':
main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.fastq import MinimalFastqParser
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
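# A minimal sketch (hypothetical, not cogent.parse.fastq.MinimalFastqParser)
# of the four-line FASTQ record layout the test below relies on: '@label',
# sequence, '+' (optionally repeating the label), quality string. It assumes
# unwrapped records and yields (label, seq, qual) tuples, the same shape the
# test unpacks from MinimalFastqParser.
def sketch_fastq_records(lines):
    lines = [line.rstrip("\n") for line in lines if line.strip()]
    if len(lines) % 4:
        raise ValueError("FASTQ input is not a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError("malformed FASTQ record at line %d" % (i + 1))
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], seq, qual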
data = {
"GAPC_0015:6:1:1259:10413#0/1":
dict(seq='AACACCAAACTTCTCCACCACGTGAGCTACAAAAG',
qual=r'````Y^T]`]c^cabcacc`^Lb^ccYT\T\Y\WF'),
"GAPC_0015:6:1:1283:11957#0/1":
dict(seq='TATGTATATATAACATATACATATATACATACATA',
qual=r']KZ[PY]_[YY^```ac^\\`bT``c`\aT``bbb'),
"GAPC_0015:6:1:1284:10484#0/1":
dict(seq='TCAGTTTTCCTCGCCATATTTCACGTCCTAAAGCG',
qual=r'UM_]]U_]Z_Y^\^^``Y]`^SZ]\Ybb`^_LbL_'),
"GAPC_0015:6:1:1287:17135#0/1":
dict(seq='TGTGCCTATGGAAGCAGTTCTAGGATCCCCTAGAA',
qual=r'^aacccL\ccc\c\cTKTS]KZ\]]I\[Wa^T`^K'),
"GAPC_0015:6:1:1293:3171#0/1":
dict(seq="AAAGAAAGGAAGAAAAGAAAAAGAAACCCGAGTTA",
qual=r"b`bbbU_[YYcadcda_LbaaabWbaacYcc`a^c"),
"GAPC_0015:6:1:1297:10729#0/1":
dict(seq="TAATGCCAAAGAAATATTTCCAAACTACATGCTTA",
qual=r"T\ccLbb``bacc]_cacccccLccc\ccTccYL^"),
"GAPC_0015:6:1:1299:5940#0/1":
dict(seq="AATCAAGAAATGAAGATTTATGTATGTGAAGAATA",
qual=r"dcddbcfffdfffd`dd`^`c`Oc`Ybb`^eecde"),
"GAPC_0015:6:1:1308:6996#0/1":
dict(seq="TGGGACACATGTCCATGCTGTGGTTTTAACCGGCA",
qual=r"a]`aLY`Y^^ccYa`^^TccK_X]\c\c`caTTTc"),
"GAPC_0015:6:1:1314:13295#0/1":
dict(seq="AATATTGCTTTGTCTGAACGATAGTGCTCTTTGAT",
qual=r"cLcc\\dddddaaYd`T```bLYT\`a```bZccc"),
"GAPC_0015:6:1:1317:3403#0/1":
dict(seq="TTGTTTCCACTTGGTTGATTTCACCCCTGAGTTTG",
qual=r"\\\ZTYTSaLbb``\_UZ_bbcc`cc^[ac\a\T\ ".strip()) # space added (then stripped) because a raw string cannot end in a backslash
}
class ParseFastq(TestCase):
def test_parse(self):
"""sequence and info objects should correctly match"""
for label, seq, qual in MinimalFastqParser('data/fastq.txt'):
self.assertTrue(label in data)
self.assertEqual(seq, data[label]["seq"])
self.assertEqual(qual, data[label]["qual"])
if __name__ == "__main__":
main()
#!/usr/bin/env python
"""tests for Flowgram and FlowgramCollection objects
"""
__author__ = "Jens Reeder, Julia Goodrich"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jens Reeder","Julia Goodrich"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jens Reeder"
__email__ = "jreeder@colorado.edu"
__status__ = "Development"
from cogent.util.unit_test import TestCase, main
from types import GeneratorType
from numpy import array, transpose
from cogent.core.sequence import Sequence
from cogent.parse.flowgram import Flowgram, build_averaged_flowgram
from cogent.parse.flowgram_parser import parse_sff
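# Two minimal sketches (hypothetical helpers, not cogent's Flowgram methods).
# 1) Flow-to-base translation: flow i belongs to nucleotide floworder[i % 4]
# and its value, rounded half-up, gives the homopolymer length. The 'N' rule
# for values just below 0.5 (ambiguous_low) is an assumption chosen to
# reproduce the 'NAG' case in test_toSeq; cogent's actual cutoff may differ.
def sketch_flows_to_seq(flows, floworder="TACG", ambiguous_low=0.35):
    bases = []
    for i, value in enumerate(flows):
        nucleotide = floworder[i % len(floworder)]
        if ambiguous_low < value < 0.5:
            bases.append("N")  # weak signal: ambiguous base (assumed rule)
        else:
            bases.append(nucleotide * int(value + 0.5))  # round half-up
    return "".join(bases)

# 2) Quality trimming as exercised in test_getQualityTrimmedFlowgram: in that
# test's header data, 'Flow Indexes' lists the cumulative 1-based flow that
# produced each base, so keeping the first clip_right bases means keeping the
# flows up to and including the flow of the last kept base.
def sketch_quality_trim(flows, bases, flow_indexes, clip_right):
    last_flow = flow_indexes[clip_right - 1]  # 1-based flow of last kept base
    return flows[:last_flow], bases[:clip_right]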
class FlowgramTests(TestCase):
def test_init_empty(self):
"""Flowgram should init correctly."""
f = Flowgram()
self.assertEqual(f._flowgram, '')
self.assertEqual(f.flowgram, [])
def test_init_data(self):
"""Flowgram init with data should set data in correct location"""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',KeySeq = "ATCG",
floworder = "TACG", header_info = {'Bases':'TACCCC'})
self.assertEqual(f._flowgram, '0.5 1.0 4.0 0.0')
self.assertEqual(f.flowgram, [0.5, 1.0, 4.0, 0.0])
self.assertEqual(f.Name, 'a')
self.assertEqual(f.keySeq, "ATCG")
self.assertEqual(f.floworder, "TACG")
self.assertEqual(f.Bases, 'TACCCC')
self.assertEqual(f.header_info, {'Bases':'TACCCC'})
f = Flowgram([0.5, 1.0, 4.0, 0.0], Name = 'a',KeySeq = "ATCG",
floworder = "TACG", header_info = {'Bases':'TACCCC'})
self.assertEqual(f._flowgram, '0.5 1.0 4.0 0.0')
self.assertEqual(f.flowgram, [0.5, 1.0, 4.0, 0.0])
def test_cmpSeqToString(self):
"""Sequence should compare equal to same string."""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertTrue(f.cmpSeqToString('TACCCC'))
self.assertFalse(f.cmpSeqToString('TACCC'))
f = Flowgram('0.5 1.0 4.0 0.0',floworder = "TACG")
self.assertTrue(f.cmpSeqToString('TACCCC'))
self.assertFalse(f.cmpSeqToString('TACCC'))
def test_cmp_flow_to_string(self):
"""Flowgram should compare equal to the same flowgram string."""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertEqual(f, '0.5 1.0 4.0 0.0')
self.assertNotEqual(f,'0.5 1.0 4.0')
f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertEqual(f,f2)
def test_cmpBySeqs(self):
"""cmpBySeqs should return 0 when the flowgrams yield the same sequence"""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertEqual(f.cmpBySeqs(f2), 0)
f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'b',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertEqual(f.cmpBySeqs(f2), 0)
f2 = Flowgram('0.5 1.0 4.0 0.0',floworder = "TACG")
self.assertEqual(f.cmpBySeqs(f2), 0)
def test_cmpByName(self):
"""cmpByName should return 0 only when the flowgram names are equal"""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertEqual(f.cmpByName(f2), 0)
self.assertEqual(f.cmpByName(f), 0)
f2 = Flowgram('0.5 1.0 4.0 0.0', Name = 'b',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
self.assertNotEqual(f.cmpByName(f2), 0)
def test_toFasta(self):
"""Flowgram toFasta() should return Fasta-format string"""
even = '0.5 1.0 4.0 0.0'
odd = '0.5 1.0 4.0 1.0'
even_f = Flowgram(even, Name='even', floworder = "TACG")
odd_f = Flowgram(odd, Name='odd', floworder = "TACG")
self.assertEqual(even_f.toFasta(), '>even\nTACCCC')
#set line wrap to small number so we can test that it works
self.assertEqual(even_f.toFasta(LineWrap = 2), '>even\nTA\nCC\nCC')
self.assertEqual(odd_f.toFasta(LineWrap = 2), '>odd\nTA\nCC\nCC\nG')
even_f = Flowgram(even, Name='even', floworder = "TACG",
header_info ={'Bases':'TACCCG'})
odd_f = Flowgram(odd, Name='odd', floworder = "TACG",
header_info ={'Bases':'TACCCGG'})
self.assertEqual(even_f.toFasta(), '>even\nTACCCG')
#set line wrap to small number so we can test that it works
self.assertEqual(even_f.toFasta(LineWrap = 2), '>even\nTA\nCC\nCG')
self.assertEqual(odd_f.toFasta(LineWrap = 2), '>odd\nTA\nCC\nCG\nG')
def test_contains(self):
"""Flowgram contains should return correct result"""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCC'})
assert '0.5' in f
assert '0.5 1.0' in f
assert '2.0' not in f
assert '5.0' not in f
def test_cmp(self):
"""__cmp__ should compare the flowgram strings."""
f1 = Flowgram(['1 2 3 4'])
f2 = Flowgram(['2 2 3 4'])
self.assertNotEqual(f1,f2)
self.assertEqual(f1,f1)
#works also with string
self.assertNotEqual(f1,"1 2 3 5")
self.assertEqual(f1,"1 2 3 4")
self.assertNotEqual(f1,"")
def test_iter(self):
"""Flowgram iter should iterate over sequence"""
f = Flowgram('0.5 1.0 4.0 0.0')
self.assertEqual(list(f), [0.5,1.0,4.0,0.0])
def test_str(self):
"""__str__ returns the flowgram values as a tab-separated string."""
f = Flowgram('0.5 1.0 4.0 0.0')
self.assertEqual(str(f), '0.5\t1.0\t4.0\t0.0')
f = Flowgram([0.5, 1.0, 4.0, 0.0])
self.assertEqual(str(f), '0.5\t1.0\t4.0\t0.0')
def test_len(self):
"""returns the length of the flowgram"""
f = Flowgram('0.5 1.0 4.0 0.0')
self.assertEqual(len(f), 4)
f = Flowgram()
self.assertEqual(len(f), 0)
def test_hash(self):
"""__hash__ behaves like the flowgram string for dict lookup."""
f = Flowgram('0.5 1.0 4.0 0.0', floworder = "TACG")
self.assertEqual(hash(f), hash('0.5 1.0 4.0 0.0'))
f = Flowgram()
self.assertEqual(hash(f), hash(''))
def test_toSeq(self):
"""toSeq should translate flowgram to sequence"""
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCG'})
self.assertEqual(f.toSeq(), 'TACCCG')
self.assertEqual(isinstance(f.toSeq(),Sequence), True)
self.assertEqual(f.toSeq(Bases = False), 'TACCCC')
f = Flowgram('0.5 1.0 4.0 0.0 0.0 1.23 0.0 6.1',
Name = 'a',floworder = "TACG",
header_info = {'Bases':'TACCCG'})
self.assertEqual(f.toSeq(), 'TACCCG')
self.assertEqual(f.toSeq(Bases = False), 'TACCCCAGGGGGG')
f = Flowgram('0.5 1.0 4.0 0.0', Name = 'a',floworder = "TACG",
header_info = {})
self.assertEqual(f.toSeq(), 'TACCCC')
self.assertEqual(isinstance(f.toSeq(),Sequence), True)
self.assertEqual(f.toSeq(Bases = False), 'TACCCC')
f = Flowgram('0.5 1.0 4.0 0.0 0.0 1.23 0.0 6.1',
Name = 'a',floworder = "TACG",
header_info = {})
self.assertEqual(f.toSeq(Bases = True), 'TACCCCAGGGGGG')
f = Flowgram('0.4 0.0 0.0 0.0 0.0 1.23 0.0 1.1',
Name = 'a',floworder = "TACG",
header_info = {})
self.assertEqual(f.toSeq(), 'NAG')
def test_getQualityTrimmedFlowgram(self):
"""getQualityTrimmedFlowgram trims the flowgram correctly"""
f = Flowgram('0.5 1.0 4.1 0.0 0.0 1.23 0.0 3.1',
Name = 'a', floworder = "TACG",
header_info = {'Bases':'TACCCCAGGG', 'Clip Qual Right': 7,
'Flow Indexes': "1\t2\t3\t3\t3\t3\t6\t8\t8\t8"})
trimmed = f.getQualityTrimmedFlowgram()
self.assertEqual(trimmed.toSeq(), "TACCCCA")
self.assertEqual(str(trimmed), "0.5\t1.0\t4.1\t0.0\t0.0\t1.23")
# tests on real data
flow1 = self.flows[0]
flow2 = self.flows[1]
flow1_trimmed = flow1.getQualityTrimmedFlowgram()
self.assertEqual(str(flow1_trimmed), "1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97")
self.assertEqual(flow1_trimmed.Bases,
"tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCA")
flow2_trimmed = flow2.getQualityTrimmedFlowgram()
self.assertEqual(str(flow2_trimmed), "1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99")
self.assertEqual(flow2_trimmed.Bases,
"tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAA")
def test_getPrimerTrimmedFlowgram(self):
"""getPrimerTrimmedFlowgram cuts the barcode of the flowgram correctly"""
f = Flowgram('0.5 1.0 4.1 0.0 0.0 1.23 0.0 3.1',
Name = 'a', floworder = "TACG",
header_info = {'Bases':'TACCCCAGGG', 'Clip Qual Right': 7,
'Flow Indexes': "1\t2\t3\t3\t3\t3\t6\t8\t8\t8"})
trimmed = f.getPrimerTrimmedFlowgram(primerseq="TA")
#test primer trimming
self.assertEqual(trimmed.toSeq(), "CCCCAGGG")
self.assertEqual(str(trimmed), "0.00\t0.00\t4.10\t0.00\t0.00\t1.23\t0.00\t3.10")
for (a,b) in zip(trimmed.flowgram, [0.0,0.0,4.1,0.0,0.0,1.23,0.0,3.1]):
self.assertFloatEqual(a,b)
trimmed = f.getPrimerTrimmedFlowgram(primerseq="TACC")
for (a,b) in zip(trimmed.flowgram, [0.0,0.0,2.1,0.0,0.0,1.23,0.0,3.1]):
self.assertFloatEqual(a,b)
self.assertEqual(trimmed.toSeq(), "CCAGGG")
self.assertEqual(str(trimmed), "0.00\t0.00\t2.10\t0.00\t0.00\t1.23\t0.00\t3.10")
# test that primer trimming does not leave an ambiguous flow at the beginning
trimmed = f.getPrimerTrimmedFlowgram(primerseq="TACCCC")
for (a,b) in zip(trimmed.flowgram, [0.0,1.23,0.0,3.1]):
self.assertFloatEqual(a,b)
self.assertEqual(trimmed.toSeq(), "AGGG")
self.assertEqual(str(trimmed), "0.00\t1.23\t0.00\t3.10")
# tests on real data
flow1 = self.flows[0]
flow2 = self.flows[1]
flow3 = self.flows[2]
flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAA")
self.assertEqual(str(flow1_trimmed), "0.00\t0.00\t2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04")
self.assertEqual(flow1_trimmed.Bases,
"CCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc")
flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAAC")
self.assertEqual(str(flow1_trimmed), "0.00\t0.00\t1.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04")
self.assertEqual(flow1_trimmed.Bases,
"CCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc")
#test that trimming does not leave 4 zero flows (homopolymer)
flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAACCC")
self.assertEqual(str(flow1_trimmed), "0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04")
self.assertEqual(flow1_trimmed.Bases,
"TCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc")
#test that trimming does not leave 4 zero flows (signal <1.5)
flow1_trimmed = flow1.getPrimerTrimmedFlowgram(primerseq="TCAG"+"GCTAACTGTAACCCTC")
self.assertEqual(str(flow1_trimmed), "1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04")
self.assertEqual(flow1_trimmed.Bases,
"TTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc")
flow1_untrimmed = flow1.getPrimerTrimmedFlowgram("")
self.assertEqual(str(flow1_untrimmed), "1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04")
self.assertEqual(flow1_untrimmed.Bases, "tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc")
flow2_trimmed = flow2.getPrimerTrimmedFlowgram(primerseq="TCAG"+"AGACGCACT")
self.assertEqual(str(flow2_trimmed), "0.00\t0.05\t0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05")
self.assertEqual(flow2_trimmed.Bases,
"CAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc")
#trimming at the end of the flow cycle works
flow3_trimmed = flow3.getPrimerTrimmedFlowgram(primerseq="TCAG"+"ATTAGATACCCNGGTAGG")
self.assertEqual(str(flow3_trimmed), "0.05 0.05 2.04 0.10 0.03 1.06 1.05 1.01 0.07 0.09 2.07 1.01 0.93 2.88 1.06 1.95 1.00 0.05 0.05 2.97 0.09 0.00 0.93 1.01 0.06 0.05 0.99 0.09 0.98 1.01 0.03 1.02 1.92 0.07 0.01 1.03 1.01 0.01 0.05 0.96 0.09 0.05 0.98 1.07 0.02 2.02 2.05 0.09 1.87 0.12 2.15 0.05 0.13 0.92 1.05 1.96 3.01 0.13 0.04 1.05 0.96 0.05 0.05 0.95 0.12 0.01 1.00 2.02 0.03 0.03 0.99 1.01 0.05 0.06 0.98 0.13 0.06 0.97 0.11 1.01 0.08 0.12 1.02 0.12 1.02 2.19 1.03 1.01 0.08 0.11 0.96 0.09 0.08 1.01 0.08 0.06 2.10 2.11 0.12 1.04 0.13 0.09 0.94 1.03 0.08 0.05 3.06 0.12 1.00 0.03 0.09 0.95 0.10 0.03 2.09 0.21 0.99 0.06 0.11 4.06 0.10 1.04 0.04 1.05 1.05 1.04 1.02 0.97 0.13 0.93 0.10 0.12 1.08 0.12 0.99 1.06 0.10 0.11 0.98 0.10 0.02 2.01 0.10 1.01 0.09 0.96 0.07 0.11 2.03 4.12 1.05 0.08 1.01 0.04 0.98 0.14 0.12 2.96 0.13 1.98 0.12 2.08 0.10 0.12 1.99 0.13 0.07 0.98 0.03 0.93 0.86 4.10 0.13 0.10 3.99 1.13 0.07 0.06 1.07 0.09 0.05 1.03 1.12 0.13 0.05 2.01 0.08 0.80 0.05 0.11 0.98 0.13 0.04 1.01 0.07 1.02 0.07 0.11 1.07 2.19 0.06 0.97 0.11 1.03 0.05 0.11 1.05 0.14 0.06 1.03 0.13 0.10 0.97 0.16 0.13 1.00 0.13 0.06 1.02 2.15 0.02 0.16 0.95 0.09 2.06 2.12 0.07 0.07 2.08 0.12 0.97 1.00 0.03 0.99 1.02 1.01 0.03 0.15 0.90 0.07 0.01 2.00 1.01 1.00 0.06 0.11 1.08 1.00 0.03 1.99 0.03 1.00 0.02 1.85 1.93 0.14 1.97 0.91 1.83 0.06 0.04 1.97 0.05 2.08 0.04 0.06 1.05 0.05 2.13 0.16 0.09 1.17 0.01 1.01 1.07 0.09 0.14 0.91 0.06 0.08 1.03 1.04 0.08 0.05 1.05 1.03 1.16 0.06 0.05 1.01 0.06 2.15 0.06 1.99 0.13 0.04 1.08 0.97 0.11 0.07 1.05 0.08 0.07 2.13 0.14 0.09 1.10 0.15 0.00 1.02 0.07 1.05 0.05 0.95 0.09 1.00 0.15 0.95 0.08 0.15 1.11 0.07 0.12 1.05 1.06 0.09 1.03 0.07 0.11 1.01 0.05 0.05 1.05 0.98 0.00 0.93 0.08 0.12 1.85 1.11 0.10 0.07 1.00 0.01 0.10 1.87 0.05 2.14 1.10 0.03 1.06 0.10 0.91 0.10 0.06 1.05 1.02 1.02 0.07 0.06 0.98 0.95 1.09 0.06 0.14 0.97 0.04 2.44")
self.assertEqual(flow3_trimmed.Bases,
"CCACGCCGTAAACGGTGGGCGCTAGTTGTGCGAACCTTCCACGGTTTGTGCGGCGCAGCTAACGCATTAAGCGCCCTGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGCCCCGCACAAGCAGCGGAGCATGCGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACAGGAATATGGCAGAGATGTCATAGCCGCAAGGTCTGTATACAGG")
flow3_trimmed = flow3.getPrimerTrimmedFlowgram(primerseq="TCAG"+"ATTAGATACCCNGGTAG")
self.assertEqual(str(flow3_trimmed), "0.00\t0.00\t0.00 1.10 0.05 0.05 2.04 0.10 0.03 1.06 1.05 1.01 0.07 0.09 2.07 1.01 0.93 2.88 1.06 1.95 1.00 0.05 0.05 2.97 0.09 0.00 0.93 1.01 0.06 0.05 0.99 0.09 0.98 1.01 0.03 1.02 1.92 0.07 0.01 1.03 1.01 0.01 0.05 0.96 0.09 0.05 0.98 1.07 0.02 2.02 2.05 0.09 1.87 0.12 2.15 0.05 0.13 0.92 1.05 1.96 3.01 0.13 0.04 1.05 0.96 0.05 0.05 0.95 0.12 0.01 1.00 2.02 0.03 0.03 0.99 1.01 0.05 0.06 0.98 0.13 0.06 0.97 0.11 1.01 0.08 0.12 1.02 0.12 1.02 2.19 1.03 1.01 0.08 0.11 0.96 0.09 0.08 1.01 0.08 0.06 2.10 2.11 0.12 1.04 0.13 0.09 0.94 1.03 0.08 0.05 3.06 0.12 1.00 0.03 0.09 0.95 0.10 0.03 2.09 0.21 0.99 0.06 0.11 4.06 0.10 1.04 0.04 1.05 1.05 1.04 1.02 0.97 0.13 0.93 0.10 0.12 1.08 0.12 0.99 1.06 0.10 0.11 0.98 0.10 0.02 2.01 0.10 1.01 0.09 0.96 0.07 0.11 2.03 4.12 1.05 0.08 1.01 0.04 0.98 0.14 0.12 2.96 0.13 1.98 0.12 2.08 0.10 0.12 1.99 0.13 0.07 0.98 0.03 0.93 0.86 4.10 0.13 0.10 3.99 1.13 0.07 0.06 1.07 0.09 0.05 1.03 1.12 0.13 0.05 2.01 0.08 0.80 0.05 0.11 0.98 0.13 0.04 1.01 0.07 1.02 0.07 0.11 1.07 2.19 0.06 0.97 0.11 1.03 0.05 0.11 1.05 0.14 0.06 1.03 0.13 0.10 0.97 0.16 0.13 1.00 0.13 0.06 1.02 2.15 0.02 0.16 0.95 0.09 2.06 2.12 0.07 0.07 2.08 0.12 0.97 1.00 0.03 0.99 1.02 1.01 0.03 0.15 0.90 0.07 0.01 2.00 1.01 1.00 0.06 0.11 1.08 1.00 0.03 1.99 0.03 1.00 0.02 1.85 1.93 0.14 1.97 0.91 1.83 0.06 0.04 1.97 0.05 2.08 0.04 0.06 1.05 0.05 2.13 0.16 0.09 1.17 0.01 1.01 1.07 0.09 0.14 0.91 0.06 0.08 1.03 1.04 0.08 0.05 1.05 1.03 1.16 0.06 0.05 1.01 0.06 2.15 0.06 1.99 0.13 0.04 1.08 0.97 0.11 0.07 1.05 0.08 0.07 2.13 0.14 0.09 1.10 0.15 0.00 1.02 0.07 1.05 0.05 0.95 0.09 1.00 0.15 0.95 0.08 0.15 1.11 0.07 0.12 1.05 1.06 0.09 1.03 0.07 0.11 1.01 0.05 0.05 1.05 0.98 0.00 0.93 0.08 0.12 1.85 1.11 0.10 0.07 1.00 0.01 0.10 1.87 0.05 2.14 1.10 0.03 1.06 0.10 0.91 0.10 0.06 1.05 1.02 1.02 0.07 0.06 0.98 0.95 1.09 0.06 0.14 0.97 0.04 2.44")
self.assertEqual(flow3_trimmed.Bases,
"GCCACGCCGTAAACGGTGGGCGCTAGTTGTGCGAACCTTCCACGGTTTGTGCGGCGCAGCTAACGCATTAAGCGCCCTGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGCCCCGCACAAGCAGCGGAGCATGCGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACAGGAATATGGCAGAGATGTCATAGCCGCAAGGTCTGTATACAGG")
def test_createFlowHeader(self):
"""header_info dict turned into flowgram header"""
f = Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
header_info = {'Bases':'TACCCCTTGG','Name Length':'14'})
self.assertEqual(f.createFlowHeader(),
""">a\n Name Length:\t14\nBases:\tTACCCCTTGG\nFlowgram:\t0.5\t1.0\t4.0\t0.0\t1.5\t0.0\t0.0\t2.0\n""")
def test_build_averaged_flowgram(self):
f1 = [0.3, 1.1, 4.0 , 0.01, 0.8, 0.0, 0.0, 2.0]
f2 = [0.6, 0.9, 4.05, 0.1, 1.2, 0.1, 0.4]
f3 = [0.4, 1.2, 4.05, 0.2, 1.3, 0.2]
f4 = [0.7, 1.0, 4.0 , 0.02, 1.5]
flowgrams = [f1,f2,f3,f4]
self.assertFloatEqual(build_averaged_flowgram(flowgrams),
[0.5, 1.05, 4.03, 0.08, 1.2, 0.1, 0.2, 2.0])
self.assertFloatEqual(build_averaged_flowgram([f1,f1,f1,f1,f1,f1]),
[0.3, 1.1, 4.0 , 0.01, 0.8, 0.0, 0.0, 2.0])
def setUp(self):
"""Define some standard data"""
self.rec = """Common Header:
Magic Number: 0x2E736666
Version: 0001
Index Offset: 96099976
Index Length: 1158685
# of Reads: 57902
Header Length: 440
Key Length: 4
# of Flows: 400
Flowgram Code: 1
Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG
Key Sequence: TCAG
>FIQU8OX05GCVRO
Run Prefix: R_2008_10_15_16_11_02_
Region #: 5
XY Location: 2489_3906
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 104
Clip Qual Left: 5
Clip Qual Right: 85
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 
0.05 0.04
Flow Indexes: 1 3 6 8 8 11 13 14 14 15 17 20 21 22 22 23 23 23 25 27 29 29 32 32 35 38 39 39 39 42 43 45 46 46 46 47 48 51 51 54 54 57 59 61 61 64 67 69 72 72 74 76 77 80 81 81 81 82 83 83 86 88 88 91 94 95 95 95 98 100 103 106 106 109 112 113 116 118 118 121 122 124 125 127 130 131 133 136 138 140 143 144 144 144 147 149 152 152 155 158 158 160 160 163
Bases: tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 37 37 37 37 37 39 39 39 39 24 24 24 37 34 28 24 24 24 28 34 39 39 39 39 39 39 39 39 39 39 39 39 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37
>FIQU8OX05F8ILF
Run Prefix: R_2008_10_15_16_11_02_
Region #: 5
XY Location: 2440_0913
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 206
Clip Qual Left: 5
Clip Qual Right: 187
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 
0.01 0.05
Flow Indexes: 1 3 6 8 10 12 14 15 16 19 22 23 25 27 30 30 33 33 34 37 37 37 39 39 42 45 46 48 51 53 53 56 56 56 57 58 60 61 64 65 67 70 70 73 74 74 77 80 83 85 88 91 93 94 97 100 102 102 103 106 109 112 112 112 114 116 117 118 119 122 122 122 125 126 129 129 131 133 133 135 138 138 140 142 145 146 147 149 152 154 157 159 161 163 166 169 169 169 171 171 173 173 173 174 176 178 181 182 185 186 189 190 191 191 191 194 196 198 198 200 201 204 206 206 206 209 209 211 211 213 216 216 218 221 223 226 227 230 233 234 236 237 238 240 241 241 243 245 246 249 249 249 249 249 250 253 253 253 256 258 261 264 266 268 270 270 270 271 273 273 273 274 277 278 279 281 282 285 285 285 285 285 287 290 293 294 294 295 297 300 302 304 307 308 308 308 311 313 316 316 319 322 322 324 324 327
Bases: tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 36 36 36 38 25 25 25 38 37 37 37 37 37 37 33 33 34 37 37 37 37 37 37 37 38 34 20 20 26 26 20 34 38 37 37 37 37 37 37 37 37 37 38 38 38 37 37 37 37 37 37 37 37 37 37
>FIQU8OX06G9PCS
Run Prefix: R_2008_10_15_16_11_02_
Region #: 6
XY Location: 2863_3338
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 264
Clip Qual Left: 5
Clip Qual Right: 264
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.04 0.05 1.01 0.07 0.05 0.99 0.03 1.05 0.04 1.05 0.05 0.06 2.05 1.13 0.03 1.00 0.08 1.07 0.09 0.05 1.02 1.11 3.06 0.09 0.04 1.03 0.13 1.97 1.02 1.07 0.06 2.10 0.05 0.05 2.04 0.10 0.03 1.06 1.05 1.01 0.07 0.09 2.07 1.01 0.93 2.88 1.06 1.95 1.00 0.05 0.05 2.97 0.09 0.00 0.93 1.01 0.06 0.05 0.99 0.09 0.98 1.01 0.03 1.02 1.92 0.07 0.01 1.03 1.01 0.01 0.05 0.96 0.09 0.05 0.98 1.07 0.02 2.02 2.05 0.09 1.87 0.12 2.15 0.05 0.13 0.92 1.05 1.96 3.01 0.13 0.04 1.05 0.96 0.05 0.05 0.95 0.12 0.01 1.00 2.02 0.03 0.03 0.99 1.01 0.05 0.06 0.98 0.13 0.06 0.97 0.11 1.01 0.08 0.12 1.02 0.12 1.02 2.19 1.03 1.01 0.08 0.11 0.96 0.09 0.08 1.01 0.08 0.06 2.10 2.11 0.12 1.04 0.13 0.09 0.94 1.03 0.08 0.05 3.06 0.12 1.00 0.03 0.09 0.95 0.10 0.03 2.09 0.21 0.99 0.06 0.11 4.06 0.10 1.04 0.04 1.05 1.05 1.04 1.02 0.97 0.13 0.93 0.10 0.12 1.08 0.12 0.99 1.06 0.10 0.11 0.98 0.10 0.02 2.01 0.10 1.01 0.09 0.96 0.07 0.11 2.03 4.12 1.05 0.08 1.01 0.04 0.98 0.14 0.12 2.96 0.13 1.98 0.12 2.08 0.10 0.12 1.99 0.13 0.07 0.98 0.03 0.93 0.86 4.10 0.13 0.10 3.99 1.13 0.07 0.06 1.07 0.09 0.05 1.03 1.12 0.13 0.05 2.01 0.08 0.80 0.05 0.11 0.98 0.13 0.04 1.01 0.07 1.02 0.07 0.11 1.07 2.19 0.06 0.97 0.11 1.03 0.05 0.11 1.05 0.14 0.06 1.03 0.13 0.10 0.97 0.16 0.13 1.00 0.13 0.06 1.02 2.15 0.02 0.16 0.95 0.09 2.06 2.12 0.07 0.07 2.08 0.12 0.97 1.00 0.03 0.99 1.02 1.01 0.03 0.15 0.90 0.07 0.01 2.00 1.01 1.00 0.06 0.11 1.08 1.00 0.03 1.99 0.03 1.00 0.02 1.85 1.93 0.14 1.97 0.91 1.83 0.06 0.04 1.97 0.05 2.08 0.04 0.06 1.05 0.05 2.13 0.16 0.09 1.17 0.01 1.01 1.07 0.09 0.14 0.91 0.06 0.08 1.03 1.04 0.08 0.05 1.05 1.03 1.16 0.06 0.05 1.01 0.06 2.15 0.06 1.99 0.13 0.04 1.08 0.97 0.11 0.07 1.05 0.08 0.07 2.13 0.14 0.09 1.10 0.15 0.00 1.02 0.07 1.05 0.05 0.95 0.09 1.00 0.15 0.95 0.08 0.15 1.11 0.07 0.12 1.05 1.06 0.09 1.03 0.07 0.11 1.01 0.05 0.05 1.05 0.98 0.00 0.93 0.08 0.12 1.85 1.11 0.10 0.07 1.00 0.01 0.10 1.87 0.05 2.14 1.10 0.03 1.06 0.10 0.91 0.10 0.06 1.05 1.02 1.02 0.07 0.06 0.98 0.95 1.09 0.06 0.14 0.97 
0.04 2.44
Flow Indexes: 1 3 6 8 10 13 13 14 16 18 21 22 23 23 23 26 28 28 29 30 32 32 35 35 38 39 40 43 43 44 45 46 46 46 47 48 48 49 52 52 52 55 56 59 61 62 64 65 65 68 69 72 75 76 78 78 79 79 81 81 83 83 86 87 88 88 89 89 89 92 93 96 99 100 100 103 104 107 110 112 115 117 118 118 119 120 123 126 129 129 130 130 132 135 136 139 139 139 141 144 147 147 149 152 152 152 152 154 156 157 158 159 160 162 165 167 168 171 174 174 176 178 181 181 182 182 182 182 183 185 187 190 190 190 192 192 194 194 197 197 200 202 203 204 204 204 204 207 207 207 207 208 211 214 215 218 218 220 223 226 228 231 232 232 234 236 239 242 245 248 251 252 252 255 257 257 258 258 261 261 263 264 266 267 268 271 274 274 275 276 279 280 282 282 284 286 286 287 287 289 289 290 291 291 294 294 296 296 299 301 301 304 306 307 310 313 314 317 318 319 322 324 324 326 326 329 330 333 336 336 339 342 344 346 348 350 353 356 357 359 362 365 366 368 371 371 372 375 378 378 380 380 381 383 385 388 389 390 393 394 395 398 400 400
Bases: tcagATTAGATACCCAGGTAGGCCACGCCGTAAACGGTGGGCGCTAGTTGTGCGAACCTTCCACGGTTTGTGCGGCGCAGCTAACGCATTAAGCGCCCTGCCTGGGGAGTACGATCGCAAGATTAAAACTCAAAGGAATTGACGGGGCCCCGCACAAGCAGCGGAGCATGCGGCTTAATTCGACGCAACGCGAAGAACCTTACCAAGGCTTGACATATACAGGAATATGGCAGAGATGTCATAGCCGCAAGGTCTGTATACAGG
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 40 40 38 38 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 37 37 37 37 37 37 37 37 38 38 40 40 40 40 40 38 38 38 38 38 40 40 38 38 38 38 38 40 40 40 40 38 38 38 38 38 38 31 30 30 30 32 31 32 31 32 31 31 28 25 21 20
""".split('\n')
flows, head = parse_sff(self.rec)
self.flows = list(flows)
if __name__ == "__main__":
main()
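# Illustrative sketch, not taken from cogent.parse.flowgram. The expectations
# in test_build_averaged_flowgram above are consistent with averaging each flow
# position over only the flowgrams long enough to reach it, which allows ragged
# input. For example, position 6 is the mean of 0.0 and 0.4, and position 7 is
# 2.0 from f1 alone. This sketch returns unrounded means. The expected list in
# that test agrees with them to two decimal places; position 3, for example, is
# 0.0825 before rounding. The helper name _sketch_average_ragged is hypothetical.
def _sketch_average_ragged(flowgrams):
    """Return the per-position mean of flowgrams that may differ in length."""
    longest = max(len(f) for f in flowgrams)
    means = []
    for pos in range(longest):
        # only flowgrams that extend to this position contribute to its mean
        column = [f[pos] for f in flowgrams if len(f) > pos]
        means.append(sum(column) / float(len(column)))
    return means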
# PyCogent-1.5.3/tests/test_parse/test_flowgram_collection.py
__author__ = "Julia Goodrich"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jens Reeder","Julia Goodrich"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Julia Goodrich"
__email__ = "julia.goodrich@colorado.edu"
__status__ = "Development"
from cogent.util.unit_test import TestCase, main
from types import GeneratorType
from numpy import array, transpose
from cogent.core.sequence import Sequence
from cogent.parse.flowgram_collection import FlowgramCollection, flows_from_array,\
flows_from_generic,flows_from_kv_pairs,flows_from_empty,flows_from_dict,\
flows_from_sff,assign_sequential_names,flows_from_flowCollection,\
pick_from_prob_density, seqs_to_flows
from cogent.parse.flowgram import Flowgram
from cogent.core.alignment import SequenceCollection
from tempfile import mktemp
from os import remove
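# Illustrative sketch of flowgram-to-base calling. Its assumptions are not
# taken from cogent.parse.flowgram: each flow value is rounded half up to a
# homopolymer length, and that many copies of the current flow character are
# emitted, cycling through the flow order. The flow order here is 'TACG', the
# 'Flow Chars' header used by test_toFasta and test_toSequenceCollection below.
# Under these assumptions '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0' becomes
# 'TACCCCTTGG', as those tests expect. The helper name _sketch_call_bases is
# hypothetical.
def _sketch_call_bases(flow_str, flow_chars='TACG'):
    """Translate a whitespace-separated flowgram string into a base string."""
    bases = []
    for i, value in enumerate(flow_str.split()):
        # int(x + 0.5) rounds half up for non-negative x; the built-in round()
        # differs between Python 2 and 3 for values such as 0.5 and 2.5
        run_length = int(float(value) + 0.5)
        bases.append(flow_chars[i % len(flow_chars)] * run_length)
    return ''.join(bases)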
class flowgram_tests(TestCase):
"""Tests of top-level functions."""
def test_flows_from_array(self):
"""flows_from_array should return transposed flows with no labels or info."""
a = array([[0,1,2],[2,1,0]]) #three 2-char seqs
obs_a, obs_labels, obs_info = flows_from_array(a)
#note transposition
self.assertEqual(obs_a, [array([0,2]), array([1,1]), array([2,0])])
self.assertEqual(obs_labels, None)
self.assertEqual(obs_info, None)
def test_flows_from_generic(self):
"""flows_from_generic should initialize from flowgram objects or strings"""
c = Flowgram('0.0 1.1 3.0 1.0', Name='a')
b = Flowgram('0.5 1.0 4.0 0.0', Name = 'b')
obs_a, obs_labels, obs_info = flows_from_generic([c,b])
self.assertEqual(map(str,obs_a), ['0.0\t1.1\t3.0\t1.0',
'0.5\t1.0\t4.0\t0.0'])
self.assertEqual(obs_labels, ['a','b'])
self.assertEqual(obs_info, [None,None])
f = ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0']
obs_a, obs_labels, obs_info = flows_from_generic(f)
self.assertEqual(map(str,obs_a), ['0.0 1.1 3.0 1.0',
'0.5 1.0 4.0 0.0'])
self.assertEqual(obs_labels, [None,None])
self.assertEqual(obs_info, [None,None])
def test_flows_from_flowCollection(self):
"""flows_from_flowCollection should init from existing collection"""
c = FlowgramCollection({'a':'0.0 1.1 3.0 1.0','b':'0.5 1.0 4.0 0.0'})
obs_a, obs_labels, obs_info = flows_from_flowCollection(c)
self.assertEqual(map(str,obs_a), ['0.0\t1.1\t3.0\t1.0',
'0.5\t1.0\t4.0\t0.0'])
self.assertEqual(obs_labels, ['a','b'])
self.assertEqual(obs_info, [None,None])
def test_flows_from_kv_pairs(self):
"""flows_from_kv_pairs should initialize from key-value pairs"""
c = [['a','0.0 1.1 3.0 1.0'],['b','0.5 1.0 4.0 0.0']]
obs_a, obs_labels, obs_info = flows_from_kv_pairs(c)
self.assertEqual(map(str,obs_a), ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
self.assertEqual(obs_labels, ['a','b'])
self.assertEqual(obs_info, [None,None])
c =[['a',Flowgram('0.0 1.1 3.0 1.0')],['b',Flowgram('0.5 1.0 4.0 0.0')]]
obs_a, obs_labels, obs_info = flows_from_kv_pairs(c)
self.assertEqual(map(str,obs_a), ['0.0\t1.1\t3.0\t1.0','0.5\t1.0\t4.0\t0.0'])
self.assertEqual(obs_labels, ['a','b'])
self.assertEqual(obs_info, [None,None])
def test_flows_from_empty(self):
"""flows_from_empty should always raise ValueError"""
self.assertRaises(ValueError, flows_from_empty, 'xyz')
def test_flows_from_dict(self):
"""flows_from_dict should init from dictionary"""
c = {'a':'0.0 1.1 3.0 1.0','b':'0.5 1.0 4.0 0.0'}
obs_a, obs_labels, obs_info = flows_from_dict(c)
self.assertEqual(map(str,obs_a), ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
self.assertEqual(obs_labels, ['a','b'])
self.assertEqual(obs_info, [None,None])
c ={'a':Flowgram('0.0 1.1 3.0 1.0'),'b':Flowgram('0.5 1.0 4.0 0.0')}
obs_a, obs_labels, obs_info = flows_from_dict(c)
self.assertEqual(map(str,obs_a), ['0.0\t1.1\t3.0\t1.0','0.5\t1.0\t4.0\t0.0'])
self.assertEqual(obs_labels, ['a','b'])
self.assertEqual(obs_info, [None,None])
def test_pick_from_prob_density(self):
"""pick_from_prob_density should take bin probabilities and a bin size and return a randomly picked bin value"""
i = pick_from_prob_density([0,1.0,0,0],1)
self.assertEqual(i,1)
i = pick_from_prob_density([1.0,0,0,0],.01)
self.assertEqual(i,0.0)
def test_seqs_to_flows(self):
"""seqs_to_flows should take (name, seq) pairs and optional probs and return a FlowgramCollection"""
seqs = [('a','ATCGT'), ('b','ACCCAG'), ('c','GTAATG')]
a = SequenceCollection(seqs)
flows = seqs_to_flows(a.items())
assert isinstance(flows,FlowgramCollection)
for f,i in zip(flows,['0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0',
'0.0 1.0 3.0 0.0 0.0 1.0 0.0 1.0',
'0.0 0.0 0.0 1.0 1.0 2.0 0.0 0.0 1.0 0.0 0.0 1.0']):
self.assertEqual(f,i)
probs ={0:[1.0,0,0,0,0],1:[0,1.0,0,0,0],2:[0,0,1.0,0,0],3:[0,0,0,1.0,0]}
flows = seqs_to_flows(a.items(), probs = probs, bin_size = 1.0)
assert isinstance(flows,FlowgramCollection)
for f,i in zip(flows,['0.0 1.0 0.0 0.0 1.0 0.0 1.0 1.0 1.0 0.0 0.0 0.0',
'0.0 1.0 3.0 0.0 0.0 1.0 0.0 1.0',
'0.0 0.0 0.0 1.0 1.0 2.0 0.0 0.0 1.0 0.0 0.0 1.0']):
self.assertEqual(f,i)
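# Illustrative sketches for two functions tested in flowgram_tests above.
# Neither is taken from cogent.parse.flowgram_collection, and the helper names
# _sketch_seq_to_flows and _sketch_pick_from_prob_density are hypothetical.
#
# _sketch_seq_to_flows is a noise-free sequence-to-flowgram conversion. For
# each flow character in turn (assumed flow order 'TACG'), it records the
# length of the homopolymer run at the current position. It pads with zero
# flows to a whole number of flow cycles. This reproduces the expected strings
# in test_seqs_to_flows. The real seqs_to_flows also takes probs and bin_size,
# which this sketch ignores.
#
# _sketch_pick_from_prob_density uses inverse-CDF sampling, one plausible
# reading of pick_from_prob_density. Bin i is chosen with probability
# proportional to probs[i], and i * bin_size is returned. That is consistent
# with test_pick_from_prob_density, which only uses distributions with all of
# their mass in one bin.
import random


def _sketch_seq_to_flows(seq, flow_chars='TACG'):
    """Return a space-separated flowgram string for seq (no noise)."""
    if set(seq) - set(flow_chars):
        # a base outside the flow order could never be consumed
        raise ValueError("seq contains characters not in flow_chars")
    flows = []
    i = 0
    while i < len(seq):
        # one full cycle through the flow order per pass
        for char in flow_chars:
            run = 0
            while i < len(seq) and seq[i] == char:
                run += 1
                i += 1
            flows.append(run)
    return ' '.join('%.1f' % f for f in flows)


def _sketch_pick_from_prob_density(probs, bin_size, rng=random):
    """Pick bin i with probability probs[i] / sum(probs); return i * bin_size."""
    total = float(sum(probs))
    if total <= 0:
        raise ValueError("probs must contain at least one positive value")
    r = rng.random() * total
    cumulative = 0.0
    chosen = 0
    for index, p in enumerate(probs):
        if p > 0:
            chosen = index  # last bin seen that can be selected
        cumulative += p
        if r < cumulative:
            return index * bin_size
    # only reached if floating-point rounding made r equal to the total
    return chosen * bin_size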
class FlowgramCollectionTests(TestCase):
"""Tests of the FlowgramCollection class"""
Class = FlowgramCollection
def test_guess_input_type(self):
"""_guess_input_type should figure out the input data type correctly"""
git = self.unordered._guess_input_type
self.assertEqual(git(self.unordered), 'flowcoll')
self.assertEqual(git(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0']), 'generic')
self.assertEqual(git([Flowgram('0.0 1.1 3.0 1.0'),
Flowgram('0.5 1.0 4.0 0.0')]), 'generic')
self.assertEqual(git([[1,2],[4,5]]), 'kv_pairs') #precedence over generic
self.assertEqual(git([('a',Flowgram('0.0 1.1 3.0 1.0')),
('b',Flowgram('0.5 1.0 4.0 0.0'))]), 'kv_pairs')
self.assertEqual(git([[1,2,3],[4,5,6]]), 'generic')
self.assertEqual(git(array([[1,2,3],[4,5,6]])), 'array')
self.assertEqual(git({'a':'0.0 1.1 3.0 1.0'}), 'dict')
self.assertEqual(git({'a':Flowgram('0.0 1.1 3.0 1.0')}), 'dict')
self.assertEqual(git([]), 'empty')
self.assertEqual(git('Common Header'), 'sff')
def test_init_pairs(self):
"""FlowgramCollection init from list of (key,val) should work"""
Flows = [['a','0.0 1.1 3.0 1.0'],['b','0.5 1.0 4.0 0.0']]
a = self.Class(Flows)
self.assertEqual(len(a.NamedFlows), 2)
self.assertEqual(a.NamedFlows['a'], '0.0 1.1 3.0 1.0')
self.assertEqual(a.NamedFlows['b'], '0.5 1.0 4.0 0.0')
self.assertEqual(a.Names, ['a','b'])
self.assertEqual(list(a.flows), ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
def test_init_aln(self):
"""FlowgramCollection should init from existing Collections"""
start = self.Class([['a','0.0 1.1 3.0 1.0'],['b','0.5 1.0 4.0 0.0']])
exp = self.Class([['a','0.0 1.1 3.0 1.0'],['b','0.5 1.0 4.0 0.0']])
f = self.Class(start)
self.assertEqual(f, exp)
test_init_aln.__doc__ = Class.__name__ + test_init_aln.__doc__
def test_init_dict(self):
"""FlowgramCollection init from dict should work as expected"""
d = {'a':'0.0 1.1 3.0 1.0','b':'0.5 1.0 4.0 0.0'}
a = self.Class(d)
self.assertEqual(a, d)
self.assertEqual(a.NamedFlows.items(), d.items())
def test_init_name_mapped(self):
"""FlowgramCollection init should allow name mapping function"""
d = {'a':'0.0 1.1 3.0 1.0','b':'0.5 1.0 4.0 0.0'}
f = lambda x: x.upper()
a = self.Class(d, name_conversion_f=f)
self.assertNotEqual(a, d)
self.assertNotEqual(a.NamedFlows.items(), d.items())
d_upper = {'A':'0.0 1.1 3.0 1.0','B':'0.5 1.0 4.0 0.0'}
self.assertEqual(a, d_upper)
self.assertEqual(a.NamedFlows.items(), d_upper.items())
def test_init_flow(self):
"""FlowgramCollection init from list of flowgrams should use indices
as keys"""
f1 = Flowgram('0.0 1.1 3.0 1.0')
f2 = Flowgram('0.5 1.0 4.0 0.0')
flows = [f1,f2]
a = self.Class(flows)
self.assertEqual(len(a.NamedFlows), 2)
self.assertEqual(a.NamedFlows['seq_0'], '0.0 1.1 3.0 1.0')
self.assertEqual(a.NamedFlows['seq_1'], '0.5 1.0 4.0 0.0')
self.assertEqual(a.Names, ['seq_0','seq_1'])
self.assertEqual(list(a.Flows), ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
def test_flows_from_sff(self):
"""flows_from_sff should init from an sff iterator"""
s = self.rec
f = self.Class(s)
self.assertEqual(f.NamedFlows['FIQU8OX05GCVRO'], self.flow)
def test_init_duplicate_keys(self):
"""FlowgramCollection init from kv pairs should fail on dup. keys"""
f = [['a','0.0 1.1 3.0 1.0'],['b','0.5 1.0 4.0 0.0'],
['b','1.5 2.0 0.0 0.5']]
self.assertRaises(ValueError, self.Class, f)
self.assertEqual(self.Class(f, remove_duplicate_names=True).Names,
['a','b'])
def test_init_ordered(self):
"""FlowgramCollection should iter over flows correctly, ordered too"""
first = self.ordered1
sec = self.ordered2
un = self.unordered
self.assertEqual(first.Names, ['a','b'])
self.assertEqual(sec.Names, ['b', 'a'])
self.assertEqual(un.Names, un.NamedFlows.keys())
first_list = list(first.flow_str)
sec_list = list(sec.flow_str)
un_list = list(un.flow_str)
self.assertEqual(first_list, ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
self.assertEqual(sec_list, ['0.5 1.0 4.0 0.0', '0.0 1.1 3.0 1.0'])
#check that the unordered seq matches one of the lists
self.assertTrue((un_list == first_list) or (un_list == sec_list))
self.assertNotEqual(first_list, sec_list)
def test_flow_str(self):
"""FlowgramCollection flow_str prop returns flows in correct order."""
first = self.ordered1
sec = self.ordered2
un = self.unordered
first_list = list(first.flow_str)
sec_list = list(sec.flow_str)
un_list = list(un.flow_str)
self.assertEqual(first_list, ['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0'])
self.assertEqual(sec_list, ['0.5 1.0 4.0 0.0', '0.0 1.1 3.0 1.0'])
#check that the unordered seq matches one of the lists
self.assertTrue((un_list == first_list) or (un_list == sec_list))
self.assertNotEqual(first_list, sec_list)
def test_iter(self):
"""FlowgramCollection __iter__ method should yield flows in order"""
f = self.Class(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0','1.5 0.0 2.0 1.0'], \
Names=['a','b','c'])
for i,b in zip(f,['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0',
'1.5 0.0 2.0 1.0']):
self.assertEqual(i,b)
def test_str(self):
"""FlowgramCollection __str__ should return sff format"""
a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
header_info = {'Bases':'TACCCCTTGG','Name Length':'14'}),
Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name = 'b',
header_info = {'Bases':'TTATTTACCG','Name Length':'14'})]
f = FlowgramCollection(a, header_info = {'Flow Chars':'TACG'})
self.assertEqual(str(f), """Common Header:\n Flow Chars:\tTACG\n\n>a\n Name Length:\t14\nBases:\tTACCCCTTGG\nFlowgram:\t0.5\t1.0\t4.0\t0.0\t1.5\t0.0\t0.0\t2.0\n\n>b\n Name Length:\t14\nBases:\tTTATTTACCG\nFlowgram:\t1.5\t1.0\t0.0\t0.0\t2.5\t1.0\t2.0\t1.0\n""")
def test_len(self):
"""len(FlowgramCollection) returns the number of flowgrams"""
a = [('a','0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'),
('b','1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'),
('c','2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0')]
f = FlowgramCollection(a)
self.assertEqual(len(f), 3)
def test_writeToFile(self):
"""FlowgramCollection.writeToFile should write in correct format"""
a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
header_info = {'Bases':'TACCCCTTGG','Name Length':'14'}),
Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name = 'b',
header_info = {'Bases':'TTATTTACCG','Name Length':'14'})]
f = FlowgramCollection(a, header_info = {'Flow Chars':'TACG'})
fn = mktemp(suffix='.sff')
f.writeToFile(fn)
result = open(fn, 'U').read()
self.assertEqual(result, """Common Header:\n Flow Chars:\tTACG\n\n>a\n Name Length:\t14\nBases:\tTACCCCTTGG\nFlowgram:\t0.5\t1.0\t4.0\t0.0\t1.5\t0.0\t0.0\t2.0\n\n>b\n Name Length:\t14\nBases:\tTTATTTACCG\nFlowgram:\t1.5\t1.0\t0.0\t0.0\t2.5\t1.0\t2.0\t1.0\n""")
remove(fn)
def test_createCommonHeader(self):
"""createCommonHeader should return lines for the sff common header"""
a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
header_info = {'Bases':'TACCCCTTGG','Name Length':'14'}),
Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name = 'b',
header_info = {'Bases':'TTATTTACCG','Name Length':'14'})]
f = FlowgramCollection(a, header_info = {'Flow Chars':'TACG'})
self.assertEqual('\n'.join(f.createCommonHeader()),
"""Common Header:\n Flow Chars:\tTACG""")
def test_toFasta(self):
"""FlowgramCollection should return correct FASTA string"""
f = self.Class( [ '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
'1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
'2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
'0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'
], header_info = {'Flow Chars':'TACG'})
self.assertEqual(f.toFasta(), '>seq_0\nTACCCCTTGG\n>seq_1\nTTATTTACCG\n>seq_2\nTTTCCCCTAG\n>seq_3\nAGGGTTACGG')
#NOTE THE FOLLOWING SURPRISING BEHAVIOR BECAUSE OF THE TWO-ITEM
#SEQUENCE RULE:
aln = self.Class(['0.5 1.0 0.0 0.0','0.0 1.0 1.0 0.0'],
header_info = {'Flow Chars':'TACG'})
self.assertEqual(aln.toFasta(), '>A\nC\n>T\nA')
def test_toPhylip(self):
"""FlowgramCollection should return PHYLIP string format correctly"""
f = self.Class( [ '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
'1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
'2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
'0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'
], header_info = {'Flow Chars':'TACG'})
phylip_str, id_map = f.toPhylip()
self.assertEqual(phylip_str, """4 10\nseq0000001 TACCCCTTGG\nseq0000002 TTATTTACCG\nseq0000003 TTTCCCCTAG\nseq0000004 AGGGTTACGG""")
self.assertEqual(id_map, {'seq0000004':'seq_3', 'seq0000001':'seq_0', \
'seq0000003': 'seq_2', 'seq0000002': 'seq_1'})
def test_toNexus(self):
"""FlowgramCollection should return correct Nexus string format"""
f = self.Class( [ '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
'1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
'2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
'0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'
], header_info = {'Flow Chars':'TACG'})
expect = '#NEXUS\n\nbegin data;\n dimensions ntax=4 nchar=10;\n'+\
' format datatype=dna interleave=yes missing=? gap=-;\n'+\
' matrix\n seq_1 TTATTTACCG\n seq_0'+\
' TACCCCTTGG\n seq_3 AGGGTTACGG\n '+\
' seq_2 TTTCCCCTAG\n\n ;\nend;'
self.assertEqual(f.toNexus('dna'), expect)
def test_toSequenceCollection(self):
"""toSequenceCollection should return sequence collection from flows"""
f = self.Class( [ '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
'1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0',
'2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0',
'0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0'
], header_info = {'Flow Chars':'TACG'})
s = f.toSequenceCollection()
assert isinstance(s,SequenceCollection)
for i,j in zip(s.iterSeqs(),['TACCCCTTGG','TTATTTACCG','TTTCCCCTAG',
'AGGGTTACGG']):
self.assertEqual(i,j)
a = [Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
header_info = {'Bases':'TACTTGG','Name Length':'14'}),
Flowgram('1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0', Name = 'b',
header_info = {'Bases':'TTATTTG','Name Length':'14'})]
f = self.Class(a)
s = f.toSequenceCollection(Bases = True)
assert isinstance(s,SequenceCollection)
for i,j in zip(s.iterSeqs(),['TACTTGG','TTATTTG']):
self.assertEqual(i,j)
def test_addFlows(self):
"""addFlows should return a collection with the new flowgrams appended"""
a = [('s4', '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'),
('s3', '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0')]
b = [('s1','2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0'),
('s2', '0.0 1.0 0.0 3.0 1.5 1.0 1.0 2.0')]
f1 = self.Class(a, header_info = {'Flow Chars':'TACG'})
f2 = self.Class(b, header_info = {'Flow Chars':'TACG'})
self.assertEqual(f1.addFlows(f2).toFasta(),
self.Class(a+b, header_info = {'Flow Chars':'TACG'}).toFasta())
def test_iterFlows(self):
"""FlowgramCollection iterFlows() method should support reordering"""
f = self.Class(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0','1.5 0.0 2.0 1.0'], \
Names=['a','b','c'])
flows = map(str,list(f.iterFlows()))
self.assertEqual(flows, ['0.0\t1.1\t3.0\t1.0',
'0.5\t1.0\t4.0\t0.0','1.5\t0.0\t2.0\t1.0'])
flows = list(f.iterFlows(flow_order=['b','a','a']))
self.assertEqual(map(str,flows), ['0.5\t1.0\t4.0\t0.0',
'0.0\t1.1\t3.0\t1.0',
'0.0\t1.1\t3.0\t1.0'])
self.assertSameObj(flows[1], flows[2])
self.assertSameObj(flows[0], f.NamedFlows['b'])
def test_Items(self):
"""FlowgramCollection Items should iterate over items in specified order."""
#should work if one row
self.assertEqual(list(self.one_seq.Items), [0.0, 1.1, 3.0, 1.0])
#should take order into account
self.assertEqual(list(self.ordered1.Items),
[0.0, 1.1, 3.0, 1.0] + [0.5, 1.0, 4.0, 0.0])
self.assertEqual(list(self.ordered2.Items),
[0.5, 1.0, 4.0, 0.0] + [0.0, 1.1, 3.0, 1.0])

    def test_takeFlows(self):
        """takeFlows should return new FlowgramCollection with selected seqs."""
        f = self.Class(['0.0 1.1 3.0 1.0','0.5 1.0 4.0 0.0','1.5 0.0 2.0 1.0'], \
                       Names=['a','b','c'])
        a = f.takeFlows('bc')
        self.assertTrue(isinstance(a, FlowgramCollection))
        self.assertEqual(a, {'b':'0.5 1.0 4.0 0.0','c':'1.5 0.0 2.0 1.0'})
        #should be able to negate
        a = f.takeFlows('bc', negate=True)
        self.assertEqual(a, {'a':'0.0 1.1 3.0 1.0'})

    def test_getFlowIndices(self):
        """FlowgramCollection getFlowIndices should return names of flows where f(row) is True"""
        f = self.ambiguous
        is_long = lambda x: len(x) > 10
        is_med = lambda x: len(str(x).replace('N','')) > 7 #strips N's
        is_any = lambda x: len(x) > 0
        self.assertEqual(f.getFlowIndices(is_long,Bases = True), [])
        f.Names = 'cba'
        self.assertEqual(f.getFlowIndices(is_med,Bases = True), ['c','a'])
        f.Names = 'bac'
        self.assertEqual(f.getFlowIndices(is_med,Bases = True), ['a','c'])
        self.assertEqual(f.getFlowIndices(is_any,Bases = True), ['b','a','c'])
        #should be able to negate
        self.assertEqual(f.getFlowIndices(is_med,Bases = True, negate=True),
                         ['b'])
        self.assertEqual(f.getFlowIndices(is_any, Bases = True,negate=True), [])

    def test_takeFlowsIf(self):
        """FlowgramCollection takeFlowsIf should return flows where f(row) is True"""
        is_long = lambda x: len(x) > 10
        is_med = lambda x: len(str(x).replace('N','')) > 7
        is_any = lambda x: len(x) > 0
        f = self.ambiguous
        self.assertEqual(f.takeFlowsIf(is_long, Bases = True), {})
        self.assertEqual(f.takeFlowsIf(is_med, Bases = True), \
                         {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                          'c':'1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0'})
        self.assertEqual(f.takeFlowsIf(is_any, Bases = True), f)
        self.assertTrue(isinstance(f.takeFlowsIf(is_med, Bases = True),
                                   FlowgramCollection))
        #should be able to negate
        self.assertEqual(f.takeFlowsIf(is_med, Bases = True,negate=True), \
                         {'b':'0.0 0.0 0.0 0.0 2.0 1.0 2.0 2.0'})

    def test_getFlow(self):
        """FlowgramCollection.getFlow should return specified flow"""
        a = [('a','0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0'),
             ('b','1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'),
             ('c','2.5 0.0 4.0 0.0 0.5 1.0 0.0 1.0')]
        f = FlowgramCollection(a)
        self.assertEqual(f.getFlow('a'), '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0')
        self.assertRaises(KeyError, f.getFlow, 'd')

    def test_getIntMap(self):
        """FlowgramCollection.getIntMap should return correct mapping."""
        f = self.Class({'seq1':'0.5 1.0 2.0 0.0',
                        'seq2':'1.5 0.0 0.0 2.0','seq3':'0.0 3.0 0.1 1.0'})
        int_keys = {'seq_0':'seq1','seq_1':'seq2','seq_2':'seq3'}
        int_map = {'seq_0':'0.5 1.0 2.0 0.0','seq_1':'1.5 0.0 0.0 2.0',
                   'seq_2':'0.0 3.0 0.1 1.0'}
        im,ik = f.getIntMap()
        self.assertEqual(ik,int_keys)
        self.assertEqual(im,int_map)
        #test change prefix from default 'seq_'
        prefix='seqn_'
        int_keys = {'seqn_0':'seq1','seqn_1':'seq2','seqn_2':'seq3'}
        int_map = {'seqn_0':'0.5 1.0 2.0 0.0','seqn_1':'1.5 0.0 0.0 2.0',
                   'seqn_2':'0.0 3.0 0.1 1.0'}
        im,ik = f.getIntMap(prefix=prefix)
        self.assertEqual(ik,int_keys)
        self.assertEqual(im,int_map)

    def test_toDict(self):
        """FlowgramCollection.toDict should return dict of strings (not obj)"""
        f = self.Class({'a': '0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                        'b': '1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'})
        self.assertEqual(f.toDict(), {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                                      'b':'1.5 1.0 0.0 0.0 2.5 1.0 2.0 1.0'})
        for i in f.toDict().values():
            assert isinstance(i, str)

    def test_omitAmbiguousFlows(self):
        """FlowgramCollection omitAmbiguousFlows should return flows w/o N's"""
        self.assertEqual(self.ambiguous.omitAmbiguousFlows(Bases=True),
                         {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                          'c':'1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0'})
        self.assertEqual(self.ambiguous.omitAmbiguousFlows(Bases=False),
                         {'a':'0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0',
                          'c':'1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0'})
        #check new object creation
        self.assertNotSameObj(self.ambiguous.omitAmbiguousFlows(),
                              self.ambiguous)
        self.assertTrue(isinstance(self.ambiguous.omitAmbiguousFlows(
            Bases = True), FlowgramCollection))

    def test_setBases(self):
        """FlowgramCollection setBases should set Bases property correctly"""
        f = self.Class([Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
                                 header_info = {'Bases':'TACCCCTTGG'}),
                        Flowgram('0.0 1.0 0.0 0.0 2.0 1.0 2.0 2.0', Name='b',
                                 header_info = {'Bases':'ATTACCGG'}),
                        Flowgram('1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0', Name='c',
                                 header_info = {'Bases':'TTACCTTGG'})],
                       header_info = {'Flow Chars':'TACG'})
        f.setBases()
        for i,b in zip(f,['TACCCCTTGG','ATTACCGG','TTACCTTGG']):
            self.assertEqual(i.Bases,b)

    def setUp(self):
        """Define some standard data"""
        self.rec = """Common Header:
Magic Number: 0x2E736666
Version: 0001
Index Offset: 96099976
Index Length: 1158685
# of Reads: 57902
Header Length: 440
Key Length: 4
# of Flows: 400
Flowgram Code: 1
Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG
Key Sequence: TCAG
>FIQU8OX05GCVRO
Run Prefix: R_2008_10_15_16_11_02_
Region #: 5
XY Location: 2489_3906
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 104
Clip Qual Left: 5
Clip Qual Right: 85
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04
Flow Indexes: 1 3 6 8 8 11 13 14 14 15 17 20 21 22 22 23 23 23 25 27 29 29 32 32 35 38 39 39 39 42 43 45 46 46 46 47 48 51 51 54 54 57 59 61 61 64 67 69 72 72 74 76 77 80 81 81 81 82 83 83 86 88 88 91 94 95 95 95 98 100 103 106 106 109 112 113 116 118 118 121 122 124 125 127 130 131 133 136 138 140 143 144 144 144 147 149 152 152 155 158 158 160 160 163
Bases: tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 37 37 37 37 37 39 39 39 39 24 24 24 37 34 28 24 24 24 28 34 39 39 39 39 39 39 39 39 39 39 39 39 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37
>FIQU8OX05F8ILF
Run Prefix: R_2008_10_15_16_11_02_
Region #: 5
XY Location: 2440_0913
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 206
Clip Qual Left: 5
Clip Qual Right: 187
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05
Flow Indexes: 1 3 6 8 10 12 14 15 16 19 22 23 25 27 30 30 33 33 34 37 37 37 39 39 42 45 46 48 51 53 53 56 56 56 57 58 60 61 64 65 67 70 70 73 74 74 77 80 83 85 88 91 93 94 97 100 102 102 103 106 109 112 112 112 114 116 117 118 119 122 122 122 125 126 129 129 131 133 133 135 138 138 140 142 145 146 147 149 152 154 157 159 161 163 166 169 169 169 171 171 173 173 173 174 176 178 181 182 185 186 189 190 191 191 191 194 196 198 198 200 201 204 206 206 206 209 209 211 211 213 216 216 218 221 223 226 227 230 233 234 236 237 238 240 241 241 243 245 246 249 249 249 249 249 250 253 253 253 256 258 261 264 266 268 270 270 270 271 273 273 273 274 277 278 279 281 282 285 285 285 285 285 287 290 293 294 294 295 297 300 302 304 307 308 308 308 311 313 316 316 319 322 322 324 324 327
Bases: tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 36 36 36 38 25 25 25 38 37 37 37 37 37 37 33 33 34 37 37 37 37 37 37 37 38 34 20 20 26 26 20 34 38 37 37 37 37 37 37 37 37 37 38 38 38 37 37 37 37 37 37 37 37 37 37
""".split('\n')
        self.flow = """1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04"""
        self.unordered = self.Class({'a':'0.0 1.1 3.0 1.0',
                                     'b':'0.5 1.0 4.0 0.0'})
        self.ordered1 = self.Class({'a':'0.0 1.1 3.0 1.0',\
                                    'b':'0.5 1.0 4.0 0.0'}, Names=['a','b'])
        self.ordered2 = self.Class({'a':'0.0 1.1 3.0 1.0',\
                                    'b':'0.5 1.0 4.0 0.0'}, Names=['b','a'])
        self.one_seq = self.Class({'a':'0.0 1.1 3.0 1.0'})
        self.ambiguous = self.Class([Flowgram('0.5 1.0 4.0 0.0 1.5 0.0 0.0 2.0', Name='a',
                                              header_info = {'Bases':'TACCCCTTGG'}),
                                     Flowgram('0.0 0.0 0.0 0.0 2.0 1.0 2.0 2.0', Name = 'b',
                                              header_info = {'Bases':'NTTACCGG'}),
                                     Flowgram('1.5 1.0 2.0 0.0 1.5 0.0 0.0 2.0', Name='c',
                                              header_info = {'Bases':'TTACCTTGG'})],
                                    header_info = {'Flow Chars':'TACG'})
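

# Sketch (hypothetical helper, not PyCogent's implementation): one simple way
# to call bases from a flowgram is to treat each flow value as the signal for
# a homopolymer run of the current flow character and round it half-up to a
# base count. Under that assumption it reproduces the sequences expected in
# test_toSequenceCollection above; FlowgramCollection.toSequenceCollection
# itself may use a different rule.
def sketch_flows_to_bases(flow_values, flow_chars='TACG'):
    """Hypothetical: return bases called from space-separated flow values."""
    bases = []
    for i, value in enumerate(flow_values.split()):
        # half-up rounding: 0.5 -> 1, 1.5 -> 2, 2.5 -> 3 (Python 3's built-in
        # round() would instead send 0.5 to 0 and 2.5 to 2)
        count = int(float(value) + 0.5)
        # flow characters cycle in order: T, A, C, G, T, A, ...
        bases.append(flow_chars[i % len(flow_chars)] * count)
    return ''.join(bases)
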
if __name__ == "__main__":
    main()
#!/usr/bin/env python
"""tests for sff parser"""
__author__ = "Julia Goodrich, Jens Reeder"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Julia Goodrich","Jens Reeder"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jens Reeder"
__email__ = "jreeder@colorado.edu"
__status__ = "Development"
from types import GeneratorType
from cogent.util.unit_test import TestCase, main
from cogent.parse.flowgram_parser import get_header_info, get_summaries,\
get_all_summaries, split_summary, parse_sff, lazy_parse_sff_handle
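

# Sketch (hypothetical helper, not part of cogent.parse.flowgram_parser): in
# the sffinfo-style text records used in setUp below, 'Flow Indexes' appear
# to be cumulative 1-based positions into 'Flow Chars', one per called base,
# with repeated positions marking homopolymers. Decoding them that way
# recovers each read's 'Bases' line (upper-cased), beginning with the
# 'Key Sequence' TCAG.
def sketch_bases_from_flow_indexes(flow_indexes, flow_chars):
    """Hypothetical: return the bases named by 1-based flow indexes."""
    # subtract 1 to turn each 1-based flow position into a string index
    return ''.join(flow_chars[int(i) - 1] for i in flow_indexes.split())
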

class SFFParserTests(TestCase):
    """Tests sff parser functions"""

    def setUp(self):
        """Define some standard data"""
        self.rec = """Common Header:
Magic Number: 0x2E736666
Version: 0001
Index Offset: 96099976
Index Length: 1158685
# of Reads: 57902
Header Length: 440
Key Length: 4
# of Flows: 400
Flowgram Code: 1
Flow Chars: TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG
Key Sequence: TCAG
>FIQU8OX05GCVRO
Run Prefix: R_2008_10_15_16_11_02_
Region #: 5
XY Location: 2489_3906
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 104
Clip Qual Left: 5
Clip Qual Right: 85
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.06 0.08 1.04 0.08 0.05 0.94 0.10 2.01 0.10 0.07 0.96 0.09 1.04 1.96 1.07 0.10 1.01 0.13 0.08 1.01 1.06 1.83 2.89 0.18 0.96 0.13 0.99 0.11 1.94 0.12 0.13 1.92 0.21 0.07 0.94 0.17 0.03 0.97 2.76 0.15 0.05 1.02 1.14 0.10 0.98 2.54 1.13 0.96 0.15 0.21 1.90 0.16 0.07 1.78 0.22 0.07 0.93 0.22 0.97 0.08 2.02 0.15 0.19 1.02 0.19 0.09 1.02 0.17 0.99 0.09 0.18 1.84 0.16 0.91 0.10 1.10 1.00 0.20 0.09 1.11 3.01 1.07 1.98 0.14 0.22 1.09 0.17 1.99 0.15 0.20 0.92 0.17 0.07 1.01 2.96 0.15 0.07 1.06 0.20 1.00 0.10 0.12 1.00 0.15 0.08 1.90 0.19 0.10 0.99 0.18 0.09 0.99 1.08 0.15 0.07 1.06 0.14 1.84 0.13 0.11 0.95 1.05 0.13 1.04 1.10 0.18 0.94 0.14 0.10 0.97 1.08 0.12 1.08 0.18 0.08 1.00 0.13 0.98 0.15 0.87 0.13 0.19 1.01 3.06 0.17 0.11 1.04 0.09 1.03 0.10 0.11 2.02 0.16 0.11 1.04 0.04 0.09 1.87 0.13 2.09 0.13 0.10 0.97 0.17 0.08 0.08 0.04 0.12 0.05 0.08 0.07 0.08 0.05 0.07 0.06 0.07 0.03 0.05 0.04 0.09 0.04 0.07 0.04 0.07 0.06 0.03 0.06 0.06 0.06 0.06 0.07 0.09 0.04 0.05 0.08 0.05 0.04 0.09 0.06 0.03 0.02 0.08 0.04 0.06 0.05 0.08 0.03 0.08 0.05 0.05 0.05 0.10 0.05 0.05 0.07 0.06 0.04 0.06 0.05 0.03 0.04 0.05 0.06 0.04 0.04 0.07 0.04 0.04 0.05 0.05 0.04 0.07 0.06 0.05 0.03 0.08 0.05 0.06 0.04 0.06 0.05 0.04 0.04 0.04 0.05 0.06 0.04 0.05 0.04 0.05 0.05 0.06 0.05 0.06 0.04 0.06 0.07 0.06 0.05 0.05 0.05 0.06 0.06 0.04 0.05 0.06 0.03 0.06 0.04 0.06 0.05 0.03 0.06 0.06 0.05 0.06 0.04 0.03 0.06 0.06 0.06 0.03 0.04 0.05 0.05 0.07 0.04 0.05 0.06 0.07 0.07 0.05 0.07 0.06 0.05 0.06 0.05 0.07 0.06 0.05 0.06 0.07 0.05 0.06 0.04 0.06 0.05 0.05 0.06 0.04 0.06 0.04 0.03 0.06 0.05 0.05 0.04 0.05 0.05 0.04 0.04 0.05 0.06 0.06 0.04 0.04 0.05 0.06 0.04 0.04 0.04 0.05 0.05 0.04 0.05 0.05 0.03 0.06 0.06 0.06 0.04 0.07 0.05 0.05 0.04 0.06 0.06 0.05 0.05 0.07 0.04 0.06 0.06 0.06 0.04 0.06 0.03 0.06 0.04 0.06 0.04 0.09 0.05 0.05 0.05 0.07 0.06 0.05 0.05 0.06 0.05 0.05 0.05 0.04 0.04 0.06 0.05 0.05 0.05 0.05 0.04 0.05 0.05 0.06 0.04 0.05 0.05 0.05 0.05 0.05 0.04 0.06 0.04 0.05 0.05 0.04 0.05 0.05 0.05 0.04
Flow Indexes: 1 3 6 8 8 11 13 14 14 15 17 20 21 22 22 23 23 23 25 27 29 29 32 32 35 38 39 39 39 42 43 45 46 46 46 47 48 51 51 54 54 57 59 61 61 64 67 69 72 72 74 76 77 80 81 81 81 82 83 83 86 88 88 91 94 95 95 95 98 100 103 106 106 109 112 113 116 118 118 121 122 124 125 127 130 131 133 136 138 140 143 144 144 144 147 149 152 152 155 158 158 160 160 163
Bases: tcagGCTAACTGTAACCCTCTTGGCACCCACTAAACGCCAATCTTGCTGGAGTGTTTACCAGGCACCCAGCAATGTGAATAGTCActgagcgggctggcaaggc
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 40 40 40 40 37 37 37 37 37 39 39 39 39 24 24 24 37 34 28 24 24 24 28 34 39 39 39 39 39 39 39 39 39 39 39 39 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37
>FIQU8OX05F8ILF
Run Prefix: R_2008_10_15_16_11_02_
Region #: 5
XY Location: 2440_0913
Run Name: R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford
Analysis Name: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Full Path: /data/2008_10_15/R_2008_10_15_16_11_02_FLX04070166_adminrig_1548jinnescurtisstanford/D_2008_10_15_15_12_26_FLX04070166_1548jinnescurtisstanford_FullAnalysis
Read Header Len: 32
Name Length: 14
# of Bases: 206
Clip Qual Left: 5
Clip Qual Right: 187
Clip Adap Left: 0
Clip Adap Right: 0
Flowgram: 1.04 0.00 1.01 0.00 0.00 1.00 0.00 1.00 0.00 1.05 0.00 0.91 0.10 1.07 0.95 1.01 0.00 0.06 0.93 0.02 0.03 1.06 1.18 0.09 1.00 0.05 0.90 0.11 0.07 1.99 0.11 0.02 1.96 1.04 0.13 0.01 2.83 0.10 1.97 0.06 0.11 1.04 0.13 0.03 0.98 1.15 0.07 1.00 0.07 0.08 0.98 0.11 1.92 0.05 0.04 2.96 1.02 1.02 0.04 0.93 1.00 0.13 0.04 1.00 1.03 0.08 0.97 0.13 0.11 1.88 0.09 0.05 1.02 1.89 0.07 0.11 0.98 0.05 0.07 1.01 0.08 0.05 1.01 0.13 1.00 0.07 0.10 1.04 0.10 0.04 0.98 0.12 1.03 0.96 0.11 0.07 1.00 0.09 0.03 1.03 0.11 1.95 1.06 0.13 0.05 1.00 0.13 0.11 1.00 0.09 0.03 2.89 0.08 0.95 0.09 1.03 1.02 1.05 1.07 0.08 0.12 2.81 0.08 0.08 1.00 1.07 0.07 0.05 1.86 0.12 0.98 0.06 2.00 0.11 1.02 0.11 0.08 1.88 0.13 1.03 0.13 0.98 0.15 0.11 1.03 1.03 1.04 0.18 0.98 0.13 0.15 1.04 0.11 1.01 0.13 0.06 1.01 0.06 1.02 0.08 0.99 0.14 0.99 0.09 0.05 1.09 0.04 0.07 2.96 0.09 2.03 0.13 2.96 1.13 0.08 1.03 0.07 0.99 0.11 0.05 1.05 1.04 0.09 0.07 1.00 1.03 0.09 0.06 1.06 1.04 2.94 0.18 0.06 0.93 0.10 1.10 0.11 2.02 0.17 1.00 1.03 0.06 0.11 0.96 0.04 3.00 0.11 0.07 1.99 0.10 2.03 0.12 0.97 0.16 0.01 2.09 0.14 1.04 0.16 0.06 1.03 0.14 1.12 0.12 0.05 0.96 1.01 0.10 0.14 0.94 0.03 0.12 1.10 0.92 0.09 1.10 1.04 1.02 0.12 0.97 2.00 0.15 1.08 0.04 1.03 1.04 0.03 0.09 5.16 1.02 0.09 0.13 2.66 0.09 0.05 1.06 0.07 0.89 0.05 0.12 1.10 0.16 0.06 1.01 0.13 1.00 0.14 0.98 0.09 2.92 1.28 0.03 2.95 0.98 0.16 0.08 0.95 0.96 1.09 0.08 1.07 1.01 0.16 0.06 4.52 0.12 1.03 0.07 0.09 1.03 0.14 0.03 1.01 1.99 1.05 0.14 1.03 0.13 0.03 1.10 0.10 0.96 0.11 0.99 0.12 0.05 0.94 2.83 0.14 0.12 0.96 0.00 1.00 0.11 0.14 1.98 0.08 0.11 1.04 0.01 0.11 2.03 0.15 2.05 0.10 0.03 0.93 0.01 0.08 0.12 0.00 0.16 0.05 0.07 0.08 0.11 0.07 0.05 0.04 0.10 0.05 0.05 0.03 0.07 0.03 0.04 0.04 0.06 0.03 0.05 0.04 0.09 0.03 0.08 0.03 0.07 0.02 0.05 0.02 0.06 0.01 0.05 0.04 0.06 0.02 0.04 0.04 0.04 0.03 0.03 0.06 0.06 0.03 0.02 0.02 0.08 0.03 0.01 0.01 0.06 0.03 0.01 0.03 0.04 0.02 0.00 0.02 0.05 0.00 0.02 0.02 0.03 0.00 0.02 0.02 0.04 0.01 0.00 0.01 0.05
Flow Indexes: 1 3 6 8 10 12 14 15 16 19 22 23 25 27 30 30 33 33 34 37 37 37 39 39 42 45 46 48 51 53 53 56 56 56 57 58 60 61 64 65 67 70 70 73 74 74 77 80 83 85 88 91 93 94 97 100 102 102 103 106 109 112 112 112 114 116 117 118 119 122 122 122 125 126 129 129 131 133 133 135 138 138 140 142 145 146 147 149 152 154 157 159 161 163 166 169 169 169 171 171 173 173 173 174 176 178 181 182 185 186 189 190 191 191 191 194 196 198 198 200 201 204 206 206 206 209 209 211 211 213 216 216 218 221 223 226 227 230 233 234 236 237 238 240 241 241 243 245 246 249 249 249 249 249 250 253 253 253 256 258 261 264 266 268 270 270 270 271 273 273 273 274 277 278 279 281 282 285 285 285 285 285 287 290 293 294 294 295 297 300 302 304 307 308 308 308 311 313 316 316 319 322 322 324 324 327
Bases: tcagAGACGCACTCAATTATTTCCATAGCTTGGGTAGTGTCAATAATGCTGCTATGAACATGGGAGTACAAATATTCTTCAAGATACTGATCTCATTTCCTTTAGATATATACCCAGAAGTGAAATTCCTGGATCACATAGTAGTTCTATTTTTATTTGATGAGAAACTTTATACTATTTTTCATAActgagcgggctggcaaggc
Quality Scores: 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 34 34 34 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 37 36 36 36 36 36 38 25 25 25 38 37 37 37 37 37 37 33 33 34 37 37 37 37 37 37 37 38 34 20 20 26 26 20 34 38 37 37 37 37 37 37 37 37 37 38 38 38 37 37 37 37 37 37 37 37 37 37
""".split('\n')

    def test_get_header_info(self):
        """get_header_info should return the common header of an sff file as a dict"""
        header = get_header_info(self.rec)
        self.assertEqual(len(header), 11)
        self.assertEqual(header['Key Length'], '4')
        self.assertEqual(header['Key Sequence'], 'TCAG')

    def test_get_summaries(self):
        """get_summaries should return a generator of the summaries"""
        summaries = get_summaries(self.rec,number_list = [1])
        sum_list = list(summaries)
        self.assertEqual(len(sum_list), 1)
        self.assertEqual(isinstance(summaries, GeneratorType), True)
        self.assertEqual(len(sum_list[0]), 18)
        self.assertEqual(sum_list[0][0], '>FIQU8OX05F8ILF')
        summaries = get_summaries(self.rec,name_list = ['FIQU8OX05GCVRO'])
        sum_list = list(summaries)
        self.assertEqual(len(sum_list), 1)
        self.assertEqual(isinstance(summaries, GeneratorType), True)
        self.assertEqual(len(sum_list[0]), 18)
        self.assertEqual(sum_list[0][0], '>FIQU8OX05GCVRO')
        summaries = get_summaries(self.rec,all_sums = True )
        sum_list = list(summaries)
        self.assertEqual(len(sum_list), 2)
        self.assertEqual(isinstance(summaries, GeneratorType), True)
        self.assertEqual(len(sum_list[0]), 18)
        self.assertEqual(sum_list[0][0], '>FIQU8OX05GCVRO')
        self.assertEqual(sum_list[1][0], '>FIQU8OX05F8ILF')
        summaries = get_summaries(self.rec,number_list = [0],
                                  name_list =['FIQU8OX05GCVRO'])
        self.assertRaises(AssertionError,list,summaries)
        summaries = get_summaries(self.rec)
        self.assertRaises(ValueError,list, summaries)

    def test_get_all_summaries(self):
        """get_all_summaries should return a list of the summaries"""
        summaries = get_all_summaries(self.rec)
        self.assertEqual(len(summaries), 2)
        self.assertEqual(isinstance(summaries,list), True)
        self.assertEqual(len(summaries[0]), 18)
        self.assertEqual(summaries[0][0], '>FIQU8OX05GCVRO')
        self.assertEqual(summaries[1][0], '>FIQU8OX05F8ILF')

    def test_split_summary(self):
        """split_summary should return the info of a flowgram header."""
        summaries = get_all_summaries(self.rec)
        sum_dict = split_summary(summaries[0])
        self.assertEqual(len(sum_dict), 18)
        self.assertEqual(sum_dict['Name'], 'FIQU8OX05GCVRO')
        assert 'Flowgram' in sum_dict
        assert 'Bases' in sum_dict
        sum_dict = split_summary(summaries[1])
        self.assertEqual(len(sum_dict), 18)
        self.assertEqual(sum_dict['Name'], 'FIQU8OX05F8ILF')
        assert 'Flowgram' in sum_dict
        assert 'Bases' in sum_dict

    def test_parse_sff(self):
        """parse_sff should read in the SFF file correctly."""
        flows, head = parse_sff(self.rec)
        self.assertEqual(len(flows),2)
        self.assertEqual(len(head), 11)
        self.assertEqual(head['Key Length'], '4')
        self.assertEqual(head['Key Sequence'], 'TCAG')
        self.assertEqual(flows[0].Name, 'FIQU8OX05GCVRO')
        self.assertEqual(flows[1].Name, 'FIQU8OX05F8ILF')

    def test_lazy_parse_sff_handle(self):
        """lazy_parse_sff_handle should read in the SFF file correctly."""
        flows, head = lazy_parse_sff_handle(self.rec)
        flows = list(flows)
        self.assertEqual(len(flows),2)
        self.assertEqual(len(head), 11)
        self.assertEqual(head['Key Length'], '4')
        self.assertEqual(head['Key Sequence'], 'TCAG')
        self.assertEqual(flows[0].Name, 'FIQU8OX05GCVRO')
        self.assertEqual(flows[1].Name, 'FIQU8OX05F8ILF')
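

# Sketch (hypothetical helper, not a cogent API): applying the
# 'Clip Qual Left'/'Clip Qual Right' fields, assuming they are 1-based,
# inclusive base positions. That reading fits the first record in setUp:
# bases 5..85 of its 104 bases are exactly its upper-case region, leaving the
# lower-case key 'tcag' and the lower-case trailing bases outside the clip.
def sketch_clip_bases(bases, clip_left, clip_right):
    """Hypothetical: return bases between 1-based inclusive clip points."""
    # 1-based inclusive [left, right] becomes the 0-based slice [left-1:right]
    return bases[int(clip_left) - 1:int(clip_right)]
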
if __name__ == "__main__":
    main()
#!/usr/bin/env python
import xml.dom.minidom
from cogent.util.unit_test import TestCase, main
from cogent.parse.gbseq import GbSeqXmlParser
__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"
data = """
AY286018
99
single
mRNA
linear
MAM
29-SEP-2003
01-JUN-2003
Macropus eugenii medium wave-sensitive opsin 1 (OPN1MW) mRNA, complete cds
AY286018
AY286018.1
gb|AY286018.1|
gi|31322957
Macropus eugenii (tammar wallaby)
Macropus eugenii
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Metatheria; Diprotodontia; Macropodidae; Macropus
1
1..99
Deeb,S.S.
Wakefield,M.J.
Tada,T.
Marotte,L.
Yokoyama,S.
Marshall Graves,J.A.
The cone visual pigments of an Australian marsupial, the tammar wallaby (Macropus eugenii): sequence, spectral tuning, and evolution
Mol. Biol. Evol. 20 (10), 1642-1649 (2003)
doi
10.1093/molbev/msg181
12885969
2
1..99
Deeb,S.S.
Wakefield,M.J.
Tada,T.
Marotte,L.
Yokoyama,S.
Graves,J.A.M.
Direct Submission
Submitted (29-APR-2003) RSBS, The Australian National University, Acton, ACT 0200, Australia
source
1..99
1
99
AY286018.1
organism
Macropus eugenii
mol_type
mRNA
db_xref
taxon:9315
country
Australia: Kangaroo Island
gene
1..99
1
99
AY286018.1
gene
OPN1MW
CDS
31..99
31
99
AY286018.1
gene
OPN1MW
note
cone pigments
codon_start
1
transl_table
1
product
medium wave-sensitive opsin 1
protein_id
AAP37945.1
db_xref
GI:31322958
translation
MTQAWDPAGFLAWRRDENE
ggcagggaaagggaagaaagtaaaggggccatgacacaggcatgggaccctgcagggttcttggcttggcggcgggacgagaacgaggagacgactcgg
"""
sample_seq = ">AY286018.1\nGGCAGGGAAAGGGAAGAAAGTAAAGGGGCCATGACACAGGCATGGGACCCTGCAGGGTTCTTGGCTTGGCGGCGGGACGAGAACGAGGAGACGACTCGG"
sample_annotations = '[source "[0:99]/99 of AY286018.1" at [0:99]/99, organism "Macropus eugenii" at [0:99]/99, gene "OPN1MW" at [0:99]/99, CDS "OPN1MW" at [30:99]/99]'
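
# Sketch (hypothetical helper, not cogent.parse.gbseq code), assuming only
# simple 'start..end' locations: GenBank coordinates are 1-based and
# inclusive, while the spans in sample_annotations are 0-based, half-open
# slices, which is why the CDS at '31..99' is reported as [30:99].
def sketch_genbank_span(location):
    """Hypothetical: convert a simple 'start..end' location to (start, end)."""
    start, end = location.split('..')
    # shift the start down by one; the inclusive end is already the slice end
    return int(start) - 1, int(end)
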
class ParseGBseq(TestCase):
    def test_parse(self):
        for name,seq in [GbSeqXmlParser(data).next(),
                         GbSeqXmlParser(xml.dom.minidom.parseString(data)).next()]:
            self.assertEqual(name, 'AY286018.1')
            self.assertEqual(sample_seq, seq.toFasta())
            self.assertEqual(str(seq.annotations), sample_annotations)

if __name__ == "__main__":
    main()
#!/usr/bin/env python
"""Unit tests for the GenBank database parsers.
"""
from cogent.parse.genbank import parse_locus, parse_single_line, \
indent_splitter, parse_sequence, block_consolidator, parse_organism, \
parse_feature, location_line_tokenizer, parse_simple_location_segment, \
parse_location_line, parse_reference, parse_source, \
Location, LocationList
from cogent.util.unit_test import TestCase, main
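
# Sketch (hypothetical helper, not cogent's indent_splitter): one plausible
# reading of the grouping rule exercised in test_indent_splitter is that a
# line joins the current group only if it is more indented than the group's
# first line; otherwise it starts a new group.
def sketch_indent_groups(lines):
    """Hypothetical: yield lists of lines grouped by relative indentation."""
    group, base = [], None
    for line in lines:
        indent = len(line) - len(line.lstrip())
        if group and indent > base:
            group.append(line)  # continuation of the current group
        else:
            if group:
                yield group
            group, base = [line], indent  # start a new group
    if group:
        yield group
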
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
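

# Sketch (hypothetical helper, not cogent's parse_locus): a simplified LOCUS
# parser mirroring the fields checked in test_parse_locus. It assumes
# whitespace-separated tokens in the order name, length, 'bp', molecule
# type, topology, division, date, and ignores the fixed-column layout and
# optional fields that real GenBank records may use.
def sketch_parse_locus(line):
    """Hypothetical: return a dict of the fields present on a LOCUS line."""
    tokens = line.split()[1:]  # drop the 'LOCUS' keyword
    result = {'locus': tokens[0], 'length': int(tokens[1])}
    # tokens[2] is the 'bp' unit; later tokens map onto these names in order
    names = ['mol_type', 'topology', 'db', 'date']
    for name, value in zip(names, tokens[3:]):
        result[name] = value
    return result
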

class GenBankTests(TestCase):
    """Tests of the GenBank main functions."""

    def test_parse_locus(self):
        """parse_locus should give correct results on specimen locus lines"""
        line = 'LOCUS AF108830 5313 bp mRNA linear PRI 19-MAY-1999'
        result = parse_locus(line)
        self.assertEqual(len(result), 6)
        self.assertEqual(result['locus'], 'AF108830')
        self.assertEqual(result['length'], 5313) #note: int, not str
        self.assertEqual(result['mol_type'], 'mRNA')
        self.assertEqual(result['topology'], 'linear')
        self.assertEqual(result['db'], 'PRI')
        self.assertEqual(result['date'], '19-MAY-1999')
        #should work if some of the fields are missing
        line = 'LOCUS AF108830 5313'
        result = parse_locus(line)
        self.assertEqual(len(result), 2)
        self.assertEqual(result['locus'], 'AF108830')
        self.assertEqual(result['length'], 5313) #note: int, not str

    def test_parse_single_line(self):
        """parse_single_line should split off the label and return the rest"""
        line_1 = 'VERSION AF108830.1 GI:4868112\n'
        self.assertEqual(parse_single_line(line_1), 'AF108830.1 GI:4868112')
        #should work if leading spaces
        line_2 = ' VERSION AF108830.1 GI:4868112\n'
        self.assertEqual(parse_single_line(line_2), 'AF108830.1 GI:4868112')

    def test_indent_splitter(self):
        """indent_splitter should split lines at correct locations"""
        #if lines have same indent, should not group together
        lines = [
            'abc xxx',
            'def yyy'
            ]
        self.assertEqual(list(indent_splitter(lines)),\
            [[lines[0]], [lines[1]]])
        #if second line is indented, should group with first
        lines = [
            'abc xxx',
            ' def yyy'
            ]
        self.assertEqual(list(indent_splitter(lines)),\
            [[lines[0], lines[1]]])
        #if both lines indented but second is more, should group with first
        lines = [
            ' abc xxx',
            ' def yyy'
            ]
        self.assertEqual(list(indent_splitter(lines)),\
            [[lines[0], lines[1]]])
        #if both lines indented equally, should not group
        lines = [
            ' abc xxx',
            ' def yyy'
            ]
        self.assertEqual(list(indent_splitter(lines)), \
            [[lines[0]], [lines[1]]])
        #for more complex situation, should produce correct grouping
        lines = [
            ' xyz', #0 -
            ' xxx', #1 -
            ' yyy', #2
            ' uuu', #3
            ' iii', #4
            ' qaz', #5 -
            ' wsx', #6 -
            ' az', #7
            ' sx', #8
            ' gb',#9
            ' bg', #10
            ' aaa', #11 -
            ]
        self.assertEqual(list(indent_splitter(lines)), \
            [[lines[0]], lines[1:5], [lines[5]], lines[6:11], [lines[11]]])
        #real example from genbank file
        lines = \
"""LOCUS NT_016354 92123751 bp DNA linear CON 29-AUG-2006
DEFINITION Homo sapiens chromosome 4 genomic contig, reference assembly.
ACCESSION NT_016354 NT_006109 NT_006204 NT_006245 NT_006302 NT_006371
NT_006397 NT_016393 NT_016589 NT_016599 NT_016606 NT_022752
NT_022753 NT_022755 NT_022760 NT_022774 NT_022797 NT_022803
NT_022846 NT_022960 NT_025694 NT_028147 NT_029273 NT_030643
NT_030646 NT_030662 NT_031780 NT_031781 NT_031791 NT_034703
NT_034705 NT_037628 NT_037629 NT_079512
VERSION NT_016354.18 GI:88977422
KEYWORDS .
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
Catarrhini; Hominidae; Homo.
?
REFERENCE 2 (bases 1 to 92123751)
AUTHORS International Human Genome Sequencing Consortium.
TITLE Finishing the euchromatic sequence of the human genome""".split('\n')
self.assertEqual(list(indent_splitter(lines)), \
[[lines[0]],[lines[1]],lines[2:8],[lines[8]],[lines[9]],lines[10:15],\
[lines[15]], lines[16:]])
def test_parse_sequence(self):
"""parse_sequence should strip bad chars out of sequence lines"""
lines = """
ORIGIN
1 gggagcgcgg cgcgggagcc cgaggctgag actcaccgga ggaagcggcg cgagcgcccc
61 gccatcgtcc \t\t cggctgaagt 123 \ngcagtg \n
121 cctgggctta agcagtcttc45ccacctcagc
//\n\n\n""".split('\n')
result = parse_sequence(lines)
self.assertEqual(result, 'gggagcgcggcgcgggagcccgaggctgagactcaccggaggaagcggcgcgagcgccccgccatcgtcccggctgaagtgcagtgcctgggcttaagcagtcttcccacctcagc')
def test_block_consolidator(self):
"""block_consolidator should join the block together."""
lines = """ ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
Hominidae; Homo.""".split('\n')
label, data = block_consolidator(lines)
self.assertEqual(label, 'ORGANISM')
self.assertEqual(data, ['Homo sapiens',
' Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;',
' Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;',
' Hominidae; Homo.'])
lines = r"""COMMENT
Contact: Spindel ER
Division of Neuroscience""".splitlines()
label, data = block_consolidator(lines)
self.assertEqual(label, "COMMENT")
self.assertEqual(data, ['', ' Contact: Spindel ER',
' Division of Neuroscience'])
def test_parse_organism(self):
"""parse_organism should return species, taxonomy (up to genus)"""
#note: lines modified to include the following:
# - multiword names
# - multiword names split over a line break
# - periods and other punctuation in names
lines = """ ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates \t abc. 2.; Catarrhini
Hominidae; Homo.""".split('\n')
species, taxonomy = parse_organism(lines)
self.assertEqual(species, 'Homo sapiens')
self.assertEqual(taxonomy, ['Eukaryota', 'Metazoa', \
'Chordata Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia', \
'Eutheria', 'Euarchontoglires', 'Primates abc. 2.', \
'Catarrhini Hominidae', 'Homo'])
def test_parse_feature(self):
"""parse_feature should return dict containing annotations of feature"""
example_feature=\
""" CDS complement(join(102262..102647,105026..105217,
106638..106719,152424..152682,243209..243267))
/gene="nad1"
/note="Protein sequence is in conflict with the conceptual
translation; author given translation (not conceptual
translation)
start codon is created by C to U RNA editing"
/codon_start=1
/exception="RNA editing"
/product="NADH dehydrogenase subunit 1"
/protein_id="NP_064011.1"
/db_xref="GI:9838451"
/db_xref="IPI:12345"
/translation="MYIAVPAEILGIILPLLLGVAFLVLAERKVMAFVQRRKGPDVVG
SFGLLQPLADGSKLILKEPISPSSANFSLFRMAPVTTFMLSLVARAVVPFDYGMVLSD
PNIGLLYLFAISSLGVYGIIIAGWSSNSKYAFLGALRSAAQMVPYEVSIGLILITVLI
CVGPRNSSEIVMAQKQIWSGIPLFPVLVMFFISCLAETNRAPFDLPEAERELVAGYNV
EYSSMGSALFFLGEYANMILMSGLCTSLSPGGWPPILDLPISKRIPGSIWFSIKVILF
LFLYIWVRAAFPRYRYDQLMGLGRKVFLPLSLARVVAVSGVLVTFQWLP"""
result = parse_feature(example_feature.split('\n'))
self.assertEqual(result['type'], 'CDS')
self.assertEqual(result['raw_location'], \
['complement(join(102262..102647,105026..105217,', \
' 106638..106719,152424..152682,243209..243267))'])
self.assertEqual(result['gene'], ['nad1'])
self.assertEqual(result['note'], ['Protein sequence is in conflict with the conceptual translation; author given translation (not conceptual translation) start codon is created by C to U RNA editing'])
self.assertEqual(result['codon_start'], ['1'])
self.assertEqual(result['exception'], ['RNA editing'])
self.assertEqual(result['product'], ['NADH dehydrogenase subunit 1'])
self.assertEqual(result['protein_id'],['NP_064011.1'])
self.assertEqual(result['db_xref'], ['GI:9838451','IPI:12345'])
self.assertEqual(result['translation'],['MYIAVPAEILGIILPLLLGVAFLVLAERKVMAFVQRRKGPDVVGSFGLLQPLADGSKLILKEPISPSSANFSLFRMAPVTTFMLSLVARAVVPFDYGMVLSDPNIGLLYLFAISSLGVYGIIIAGWSSNSKYAFLGALRSAAQMVPYEVSIGLILITVLICVGPRNSSEIVMAQKQIWSGIPLFPVLVMFFISCLAETNRAPFDLPEAERELVAGYNVEYSSMGSALFFLGEYANMILMSGLCTSLSPGGWPPILDLPISKRIPGSIWFSIKVILFLFLYIWVRAAFPRYRYDQLMGLGRKVFLPLSLARVVAVSGVLVTFQWLP'])
self.assertEqual(len(result), 11)
short_feature = ['D-loop 15418..16866']
result = parse_feature(short_feature)
self.assertEqual(result['type'], 'D-loop')
self.assertEqual(result['raw_location'], ['15418..16866'])
#can get more than one = in a line
#from AF260826
bad_feature = \
""" tRNA 1173..1238
/note="codon recognized: AUC; Cove score = 16.56"
/product="tRNA-Ile"
/anticodon=(pos:1203..1205,aa:Ile)"""
result = parse_feature(bad_feature.split('\n'))
self.assertEqual(result['note'], \
['codon recognized: AUC; Cove score = 16.56'])
#need not always have an = in a line
#from NC_001807
bad_feature = \
''' mRNA 556
/partial
/citation=[6]
/product="H-strand"'''
result = parse_feature(bad_feature.split('\n'))
self.assertEqual(result['partial'], [''])
def test_location_line_tokenizer(self):
"""location_line_tokenizer should tokenize location lines"""
llt = location_line_tokenizer
self.assertEqual(list(llt(['123..456'])), ['123..456'])
self.assertEqual(list(llt(['complement(123..456)'])), \
['complement(', '123..456', ')'])
self.assertEqual(list(llt(['join(1..2,3..4)'])), \
['join(', '1..2', ',', '3..4', ')'])
self.assertEqual(list(llt([\
'join(complement(1..2, join(complement( 3..4),',\
'\n5..6), 7..8\t))'])),\
['join(','complement(','1..2',',','join(','complement(','3..4',\
')', ',', '5..6',')',',','7..8',')',')'])
def test_parse_simple_location_segment(self):
"""parse_simple_location_segment should parse simple segments"""
lsp = parse_simple_location_segment
l = lsp('37')
self.assertEqual(l._data, 37)
self.assertEqual(str(l), '37')
self.assertEqual(l.Strand, 1)
l = lsp('40..50')
first, second = l._data
self.assertEqual(first._data, 40)
self.assertEqual(second._data, 50)
self.assertEqual(str(l), '40..50')
self.assertEqual(l.Strand, 1)
#should handle ambiguous starts and ends
l = lsp('>37')
self.assertEqual(l._data, 37)
self.assertEqual(str(l), '>37')
l = lsp('<37')
self.assertEqual(l._data, 37)
self.assertEqual(str(l), '<37')
l = lsp('<37..>42')
first, second = l._data
self.assertEqual(first._data, 37)
self.assertEqual(second._data, 42)
self.assertEqual(str(first), '<37')
self.assertEqual(str(second), '>42')
self.assertEqual(str(l), '<37..>42')
def test_parse_location_line(self):
"""parse_location_line should give correct list of location objects"""
llt = location_line_tokenizer
r = parse_location_line(llt(['123..456']))
self.assertEqual(str(r), '123..456')
r = parse_location_line(llt(['complement(123..456)']))
self.assertEqual(str(r), 'complement(123..456)')
r = parse_location_line(llt(['complement(123..456, 345..678)']))
self.assertEqual(str(r), \
'join(complement(345..678),complement(123..456))')
r = parse_location_line(llt(['complement(join(123..456, 345..678))']))
self.assertEqual(str(r), \
'join(complement(345..678),complement(123..456))')
r = parse_location_line(\
llt(['join(complement(123..456), complement(345..678))']))
self.assertEqual(str(r), \
'join(complement(123..456),complement(345..678))')
#try some nested joins and complements
r = parse_location_line(llt(\
['complement(join(1..2,3..4,complement(5..6),',
'join(7..8,complement(9..10))))']))
self.assertEqual(str(r), \
'join(9..10,complement(7..8),5..6,complement(3..4),complement(1..2))')
def test_parse_reference(self):
"""parse_reference should give correct fields"""
r = \
"""REFERENCE 2 (bases 1 to 2587)
AUTHORS Janzen,D.M. and Geballe,A.P.
TITLE The effect of eukaryotic release factor depletion on translation
termination in human cell lines
JOURNAL (er) Nucleic Acids Res. 32 (15), 4491-4502 (2004)
PUBMED 15326224"""
result = parse_reference(r.split('\n'))
self.assertEqual(len(result), 5)
self.assertEqual(result['reference'], '2 (bases 1 to 2587)')
self.assertEqual(result['authors'], 'Janzen,D.M. and Geballe,A.P.')
self.assertEqual(result['title'], \
'The effect of eukaryotic release factor depletion ' + \
'on translation termination in human cell lines')
self.assertEqual(result['journal'], \
'(er) Nucleic Acids Res. 32 (15), 4491-4502 (2004)')
self.assertEqual(result['pubmed'], '15326224')
def test_parse_source(self):
"""parse_source should split into source and organism"""
s = \
"""SOURCE African elephant.
ORGANISM Mitochondrion Loxodonta africana
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Proboscidea; Elephantidae; Loxodonta.""".split('\n')
r = parse_source(s)
self.assertEqual(len(r), 3)
self.assertEqual(r['source'], 'African elephant.')
self.assertEqual(r['species'], 'Mitochondrion Loxodonta africana')
self.assertEqual(r['taxonomy'], ['Eukaryota','Metazoa', 'Chordata',\
'Craniata', 'Vertebrata', 'Euteleostomi', 'Mammalia',\
'Eutheria', 'Proboscidea', 'Elephantidae', 'Loxodonta'])
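# Sketch of the grouping rule exercised by test_indent_splitter above.
# indent_splitter_sketch is a hypothetical helper written for illustration,
# not the code in cogent.parse.genbank. It assumes a line joins the current
# group whenever it is indented more deeply than the group's first line,
# which is consistent with the two-line cases described in that test's
# comments.

def _leading_whitespace(line):
    """Return the number of leading whitespace characters in line."""
    return len(line) - len(line.lstrip())

def indent_splitter_sketch(lines):
    """Yield lists of lines grouped by the indentation rule described above."""
    group = []
    for line in lines:
        if group and _leading_whitespace(line) > _leading_whitespace(group[0]):
            # deeper than the group's first line: treat as a continuation
            group.append(line)
        else:
            # same or shallower indent: close the group and start a new one
            if group:
                yield group
            group = [line]
    if group:
        yield group

# e.g. list(indent_splitter_sketch(['SOURCE x', '  ORGANISM y', 'REFERENCE 1']))
# gives [['SOURCE x', '  ORGANISM y'], ['REFERENCE 1']]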
class LocationTests(TestCase):
"""Tests of the Location class."""
def test_init(self):
"""Location should init with 1 or 2 values, plus params."""
l = Location(37)
self.assertEqual(str(l), '37')
l = Location(37, Ambiguity = '>')
self.assertEqual(str(l), '>37')
l = Location(37, Ambiguity='<')
self.assertEqual(str(l), '<37')
l = Location(37, Accession='AB123')
self.assertEqual(str(l), 'AB123:37')
l = Location(37, Accession='AB123', Db='Kegg')
self.assertEqual(str(l), 'Kegg::AB123:37')
l1 = Location(37)
l2 = Location(42)
l = Location([l1,l2])
self.assertEqual(str(l), '37..42')
l3 = Location([l1,l2], IsBounds=True)
self.assertEqual(str(l3), '(37.42)')
l4 = Location([l1,l2], IsBetween=True)
self.assertEqual(str(l4), '37^42')
l5 = Location([l4,l3])
self.assertEqual(str(l5), '37^42..(37.42)')
l5 = Location([l4,l3], Strand=-1)
self.assertEqual(str(l5), 'complement(37^42..(37.42))')
class LocationListTests(TestCase):
"""Tests of the LocationList class."""
def test_extract(self):
"""LocationList extract should return correct sequence"""
l = Location(3)
l2_a = Location(5)
l2_b = Location(7)
l2 = Location([l2_a,l2_b], Strand=-1)
l3_a = Location(10)
l3_b = Location(12)
l3 = Location([l3_a, l3_b])
ll = LocationList([l, l2, l3])
s = ll.extract('ACGTGCAGTCAGTAGCAT')
# 123456789012345678
self.assertEqual(s, 'G'+'TGC'+'CAG')
#check a case where it wraps around
l5_a = Location(16)
l5_b = Location(4)
l5 = Location([l5_a,l5_b])
ll = LocationList([l5])
s = ll.extract('ACGTGCAGTCAGTAGCAT')
self.assertEqual(s, 'CATACGT')
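# Sketch of the extraction rule checked by LocationListTests.test_extract.
# extract_sketch is a hypothetical helper, not the LocationList API. Each
# segment is a (start, end, strand) tuple in 1-based inclusive coordinates;
# strand -1 segments are reverse-complemented, and a segment whose start lies
# past its end is assumed to wrap around the end of a circular sequence (the
# 'CATACGT' case above). Only unambiguous uppercase DNA bases are handled.

_DNA_COMPLEMENT = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

def extract_sketch(seq, segments):
    """Concatenate the (start, end, strand) segments taken from seq."""
    pieces = []
    for start, end, strand in segments:
        if start <= end:
            piece = seq[start - 1:end]
        else:
            # wrap-around: tail of the sequence followed by its head
            piece = seq[start - 1:] + seq[:end]
        if strand == -1:
            # reverse complement for features on the minus strand
            piece = ''.join(_DNA_COMPLEMENT[base] for base in reversed(piece))
        pieces.append(piece)
    return ''.join(pieces)

# e.g. extract_sketch('ACGTGCAGTCAGTAGCAT', [(3, 3, 1), (5, 7, -1), (10, 12, 1)])
# gives 'GTGCCAG', the same string test_extract expects.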
if __name__ == '__main__':
from sys import argv
if len(argv) > 2 and argv[1] == 'x':
filename = argv[2]
lines = open(filename)
for i in indent_splitter(lines):
print '******'
print i[0]
for j in indent_splitter(i[1:]):
print '?????'
for line in j:
print line
else:
main()
PyCogent-1.5.3/tests/test_parse/test_gff.py
#!/usr/bin/env python
"""Unit tests for GFF and related parsers.
"""
from cogent.parse.gff import *
from cogent.util.unit_test import TestCase, main
from StringIO import StringIO
__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"
headers = [
"""##gff-version 2
##source-version
##date
##Type []
##DNA
##acggctcggattggcgctggatgatagatcagacgac
##...
##end-DNA
""",
"""##gff-version 2
""",
"",
]
# '<seqname>\t<source>\t<feature>\t<start>\t<end>\t<score>\t<strand>\t<frame>\t[attribute]\n'
data_lines = [
('seq1\tBLASTX\tsimilarity\t101\t235\t87.1\t+\t0\tTarget "HBA_HUMAN" 11 55 ; E_value 0.0003\n',
('seq1', 'BLASTX', 'similarity', 100, 235, '87.1', '+', '0', 'Target "HBA_HUMAN" 11 55 ; E_value 0.0003', None)),
('dJ102G20\tGD_mRNA\tcoding_exon\t7105\t7201\t.\t-\t2\tSequence "dJ102G20.C1.1"\n',
('dJ102G20', 'GD_mRNA', 'coding_exon', 7201, 7104, '.', '-', '2', 'Sequence "dJ102G20.C1.1"', None)),
('dJ102G20\tGD_mRNA\tcoding_exon\t7105\t7201\t.\t-\t2\t\n',
('dJ102G20', 'GD_mRNA', 'coding_exon', 7201, 7104, '.', '-', '2', '', None)),
('12345\tSource with spaces\tfeature with spaces\t-100\t3600000000\t1e-5\t-\t.\tSequence "BROADO5" ; Note "This is a \\t tab containing \\n multi line comment"\n',
('12345', 'Source with spaces', 'feature with spaces', 3600000000L, 101, '1e-5', '-', '.', 'Sequence "BROADO5" ; Note "This is a \\t tab containing \\n multi line comment"', None)),
]
class GffTest(TestCase):
"""Tests for the GFF parsers."""
def testGffParserData(self):
"""Test GffParser with valid data lines"""
for (line,canned_result) in data_lines:
result = GffParser(StringIO(line)).next()
self.assertEqual(result,canned_result)
def testGffParserHeaders(self):
"""Test GffParser with valid data headers"""
data = "".join([x[0] for x in data_lines])
for header in headers:
result = list(GffParser(StringIO(header+data)))
self.assertEqual(result,[x[1] for x in data_lines])
def test_parse_attributes(self):
"""Test parse_attributes"""
self.assertEqual([parse_attributes(x[1][8]) for x in data_lines],
['HBA_HUMAN', 'dJ102G20.C1.1', '', 'BROADO5'])
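# Sketch of the column handling implied by the canned results in data_lines.
# parse_gff_line_sketch is hypothetical, inferred from those expected tuples
# rather than taken from cogent.parse.gff. The 1-based start becomes a
# 0-based start; negative coordinates are replaced by their absolute values
# and reordered (as the fourth canned line suggests); minus-strand features
# have start and end swapped; the final comment slot is None. It assumes a
# single data line with 8 or 9 tab-separated columns and does not filter
# header or comment lines.

def parse_gff_line_sketch(line):
    """Return a 10-tuple shaped like the canned GffParser results above."""
    cols = line.strip().split('\t')  # strip also drops a trailing empty tab
    if len(cols) == 8:
        cols.append('')  # missing attribute column
    (seqname, source, feature, start, end, score,
     strand, frame, attributes) = cols
    start, end = int(start) - 1, int(end)  # 1-based inclusive -> 0-based
    if start < 0 or end < 0:
        start, end = sorted([abs(start), abs(end)])
    if strand == '-':
        # reversed indices mark a feature on the opposite strand
        start, end = end, start
    return (seqname, source, feature, start, end, score,
            strand, frame, attributes, None)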
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_parse/test_gibbs.py
#!/usr/bin/env python
"""Tests for the Gibbs parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
import string
import re
from cogent.motif.util import Motif,Module
from cogent.core.moltype import DNA,RNA,PROTEIN
from cogent.parse.record import DelimitedSplitter
from cogent.parse.record_finder import LabeledRecordFinder
from cogent.parse.gibbs import get_sequence_and_motif_blocks, get_sequence_map,\
get_motif_blocks, get_motif_sequences, get_motif_p_value, guess_alphabet,\
build_module_objects, module_ids_to_int, GibbsParser
from math import exp
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class GibbsTests(TestCase):
"""Tests for gibbs parser.
"""
def setUp(self):
"""Setup function for gibbs tests.
"""
self.gibbs_lines = GIBBS_FILE.split('\n')
self.sequence_map = {'1':'1091044',\
'10':'135765',\
'11':'1388082',\
'12':'140543',\
'13':'14286173',\
'14':'14578634',\
'15':'14600438',\
'16':'15218394',\
'17':'15597673',\
'18':'15599256',\
'19':'15602312',\
'2':'11467494',\
'20':'15605725',\
'21':'15605963',\
'22':'15609375',\
'23':'15609658',\
'24':'15613511',\
'25':'15614085',\
'26':'15614140',\
'27':'15615431',\
'28':'15643152',\
'29':'15672286',\
'3':'11499727',\
'30':'15790738',\
'31':'15791337',\
'32':'15801846',\
'33':'15805225',\
'34':'15805374',\
'35':'15807234',\
'36':'15826629',\
'37':'15899007',\
'38':'15899339',\
'39':'15964668',\
'4':'1174686',\
'40':'15966937',\
'41':'15988313',\
'42':'16078864',\
'43':'16123427',\
'44':'16125919',\
'45':'16330420',\
'46':'1633495',\
'47':'16501671',\
'48':'1651717',\
'49':'16759994',\
'5':'12044976',\
'50':'16761507',\
'51':'16803644',\
'52':'16804867',\
'53':'17229033',\
'54':'17229859',\
'55':'1729944',\
'56':'17531233',\
'57':'17537401',\
'58':'17547503',\
'59':'18309723',\
'6':'13186328',\
'60':'18313548',\
'61':'18406743',\
'62':'19173077',\
'63':'19554157',\
'64':'19705357',\
'65':'19746502',\
'66':'20092028',\
'67':'20151112',\
'68':'21112072',\
'69':'21222859',\
'7':'13358154',\
'70':'21223405',\
'71':'21227878',\
'72':'21283385',\
'73':'21674812',\
'74':'23098307',\
'75':'2649838',\
'76':'267116',\
'77':'27375582',\
'78':'2822332',\
'79':'30021713',\
'8':'13541053',\
'80':'3261501',\
'81':'3318841',\
'82':'3323237',\
'83':'4155972',\
'84':'4200327',\
'85':'4433065',\
'86':'4704732',\
'87':'4996210',\
'88':'5326864',\
'89':'6322180',\
'9':'13541117',\
'90':'6323138',\
'91':'6687568',\
'92':'6850955',\
'93':'7109697',\
'94':'7290567',\
'95':'9955016',\
'96':'15677788',\
}
self.motif_a_lines = """
10 columns
Num Motifs: 27
2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
""".split('\n')
self.motif_b_lines = """ MOTIF b
15 columns
Num Motifs: 6
2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
**** * ******* ** *
Log Motif portion of MAP for motif b = -187.76179
Log Fragmentation portion of MAP for motif b = -7.77486
-------------------------------------------------------------------------
""".split('\n')
def test_get_sequence_and_motif_blocks(self):
"""get_sequence_and_motif_blocks tests."""
seq_motif_lines = ['before line',\
'=====MAP MAXIMIZATION RESULTS=====',\
'after line'\
]
exp_seq_block=['before line']
exp_motif_block=['=====MAP MAXIMIZATION RESULTS=====','after line']
seq_block,motif_block = get_sequence_and_motif_blocks(seq_motif_lines)
self.assertEqual(seq_block,exp_seq_block)
self.assertEqual(motif_block,exp_motif_block)
def test_get_sequence_map(self):
"""get_sequence_map tests."""
sequence_map = get_sequence_map(self.gibbs_lines)
self.assertEqual(sequence_map,self.sequence_map)
def test_get_motif_blocks(self):
"""get_motif_blocks tests."""
motif_lines = ['before motifs',\
'first MOTIF a',\
' motif a data',\
'second MOTIF b',\
' motif b data',
'after motifs'
]
exp_motif_blocks = [['first MOTIF a', 'motif a data'],\
['second MOTIF b', 'motif b data', 'after motifs']]
motif_blocks = get_motif_blocks(motif_lines)
self.assertEqual(motif_blocks,exp_motif_blocks)
def test_get_motif_sequences(self):
"""get_motif_sequences tests."""
motif_list = get_motif_sequences(self.motif_a_lines)
exp_motif_list = [('2', 71, 'ILAISVDSPFSH', 1.0, '1'),\
('6', 65, 'IYAISNDSHFVQ', 1.0, '1'),\
('8', 67, 'VISVSEDTVYVH', 1.0, '1'),\
('9', 65, 'VIGISVDSPFSL', 1.0, '1')]
self.assertEqual(motif_list,exp_motif_list)
def test_get_motif_p_value(self):
"""get_motif_p_value tests."""
log_list = ['Column 8 : Sequence Description from Fast A input',\
'Log Motif portion of MAP for motif a = -469.15170',\
'Log Fragmentation portion of MAP for motif a = -3.80666',\
]
exp_p_val = exp(-469.15170)
self.assertEqual(get_motif_p_value(log_list),exp_p_val)
def test_guess_alphabet(self):
"""guess_alphabet tests."""
motif_list = [('2', 71, 'ILAISVDSPFSH', 1.0, '1'),\
('6', 65, 'IYAISNDSHFVQ', 1.0, '1'),\
('8', 67, 'VISVSEDTVYVH', 1.0, '1'),\
('9', 65, 'VIGISVDSPFSL', 1.0, '1')]
alphabet = guess_alphabet(motif_list)
self.assertEqual(alphabet,PROTEIN)
def test_build_module_objects(self):
"""build_module_objects tests."""
module = list(build_module_objects(self.motif_b_lines,\
self.sequence_map))[0]
exp_module_dict = {('20151112', 153): 'AQYVAAHPGEVCPAKWKEG',\
('9955016', 159): 'AFQYTDEHGEVCPAGWKPG',\
('16501671', 159): 'ALQFHEEHGDVCPAQWEKG',\
('11467494', 160): 'IQYVKENPGYACPVNWNFG',\
('4996210', 162): 'SLQLTNTHPVATPVNWKEG',\
('3318841', 165): 'SLQLTAEKRVATPVDWKDG',\
}
#compare keys and values without relying on dict iteration order
self.assertEqual(sorted(module.AlignedSeqs.keys()),\
sorted(exp_module_dict.keys()))
for k in exp_module_dict:
self.assertEqual(str(module.AlignedSeqs[k]), exp_module_dict[k])
def test_module_ids_to_int(self):
"""module_ids_to_int tests."""
module = list(build_module_objects(self.motif_b_lines,\
self.sequence_map))[0]
module_ids_to_int([module])
self.assertEqual(module.ID,'0')
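# Sketch of the per-site fields that test_get_motif_sequences and
# test_get_motif_p_value expect. Both helpers are hypothetical, not the
# cogent.parse.gibbs functions. parse_site_line_sketch assumes a site line is
# whitespace-separated as
#   '<seq num>, <n> <start> <left flank> <motif> <right flank> <end> <prob> <flag> <id>'
# with both flanks present, and returns the 0-based start as in
# exp_motif_list. motif_p_value_sketch exponentiates the value on the
# 'Log Motif portion of MAP' line, matching exp_p_val in the test.
import math

def parse_site_line_sketch(line):
    """Return (seq_num, start, motif, prob, n) for one motif site line."""
    fields = line.split()
    seq_num = fields[0].rstrip(',')  # '2,' -> '2'
    n = fields[1]                    # kept as a string, as in exp_motif_list
    start = int(fields[2]) - 1       # 1-based -> 0-based
    motif = fields[4]                # uppercase motif between the flanks
    prob = float(fields[7])
    return (seq_num, start, motif, prob, n)

def motif_p_value_sketch(lines):
    """Return exp(log motif MAP) from the first matching line, else None."""
    for line in lines:
        if line.startswith('Log Motif portion of MAP'):
            return math.exp(float(line.split('=')[1]))
    return None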
GIBBS_FILE = """
Gibbs.linux superfamily_aln_gis.fasta 10,15,20,25 5,5,5,5
i = 20 range = 20 high = 11 low = -9
Gibbs 2.06.024 Jul 21 2005
Data file: superfamily_aln_gis.fasta
Current directory: /home/widmannj/superfamilies
The following options are set:
Concentrated Region False Sequence type False
Collapsed Alphabet False Pseudocount weight False
Use Expectation/Maximization False Don't Xnu sequence False
Help flag False Near optimal cutoff False
Number of iterations False Don't fragment False
Don't use map maximization False Repeat regions False
Output file False Informed priors file False
Plateau periods False palindromic sequence False
Don't Reverse complement False Number of seeds False
Seed Value False Pseudosite weight False
Suboptimal sampler output False Overlap False
Allow width to vary False Wilcoxon signed rank False
Sample along length False Output Scan File False
Output prior file False Modular Sampler False
Ignore Spacing Model False Sample Background False
Bkgnd Comp Model False Init from prior False
Homologous Seq pairs False Parallel Tempering False
Group Sampler False No progress info False
Fragment from middle False Verify Mode False
Alternate sample on k False No freq. soln. False
Calc. def. pseudo wt. False Motif/Recur smpl False
Phylogenetic Sampling False Supress Near Opt. False
Nearopt display cutoff False
site_samp = 0
nMotifLen = 10, 15, 20, 25
nAlphaLen = 20
nNumMotifs = 5 ,5 ,5 ,5
dPseudoCntWt = 0.1
dPseudoSiteWt = 0.8
nMaxIterations = 500
lSeedVal = 1149743202
nPlateauPeriods = 20
nSeeds = 10
nNumMotifTypes = 4
dCutoff = 0.01
dNearOptDispCutoff = 0.5
RevComplement = 0
glOverlapParam = 0
Rcutoff factor = 0.001
Post Plateau Samples = 0
Frag/Shft Per. = 5
Frag width = 15,22,30,37
Sequences to be Searched:
_________________________
#1 1091044
#2 11467494
#3 11499727
#4 1174686
#5 12044976
#6 13186328
#7 13358154
#8 13541053
#9 13541117
#10 135765
#11 1388082
#12 140543
#13 14286173
#14 14578634
#15 14600438
#16 15218394
#17 15597673
#18 15599256
#19 15602312
#20 15605725
#21 15605963
#22 15609375
#23 15609658
#24 15613511
#25 15614085
#26 15614140
#27 15615431
#28 15643152
#29 15672286
#30 15790738
#31 15791337
#32 15801846
#33 15805225
#34 15805374
#35 15807234
#36 15826629
#37 15899007
#38 15899339
#39 15964668
#40 15966937
#41 15988313
#42 16078864
#43 16123427
#44 16125919
#45 16330420
#46 1633495
#47 16501671
#48 1651717
#49 16759994
#50 16761507
#51 16803644
#52 16804867
#53 17229033
#54 17229859
#55 1729944
#56 17531233
#57 17537401
#58 17547503
#59 18309723
#60 18313548
#61 18406743
#62 19173077
#63 19554157
#64 19705357
#65 19746502
#66 20092028
#67 20151112
#68 21112072
#69 21222859
#70 21223405
#71 21227878
#72 21283385
#73 21674812
#74 23098307
#75 2649838
#76 267116
#77 27375582
#78 2822332
#79 30021713
#80 3261501
#81 3318841
#82 3323237
#83 4155972
#84 4200327
#85 4433065
#86 4704732
#87 4996210
#88 5326864
#89 6322180
#90 6323138
#91 6687568
#92 6850955
#93 7109697
#94 7290567
#95 9955016
#96 15677788
Processed Sequence Length: 16216 Total sequence length: 16307
Seed = 1149743202
motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 1 **
1
2
3
4[] motif A cycle 4 AP 0.0 (0 sites)
[] motif B cycle 4 AP -567.7 (18 sites)
[] motif C cycle 4 AP -1161.7 (31 sites)
[] motif D cycle 4 AP -245.0 (4 sites)
Total Map : 412.899 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 53
5[] motif A cycle 5 AP -26.6 (1 sites)
[] motif B cycle 5 AP -426.5 (17 sites)
[] motif C cycle 5 AP -1245.5 (33 sites)
[------] motif D cycle 5 AP -210.4 (4 sites)
Total Map : 499.315 Prev: 412.899 Diff: 86.4157 Motifs: 55
6[] motif A cycle 6 AP -178.0 (10 sites)
[] motif B cycle 6 AP -691.2 (26 sites)
[] motif C cycle 6 AP -1189.4 (32 sites)
[------] motif D cycle 6 AP -315.4 (6 sites)
Total Map : 605.7 Prev: 499.315 Diff: 106.385 Motifs: 74
7[] motif A cycle 7 AP -260.3 (16 sites)
[] motif B cycle 7 AP -664.9 (25 sites)
[] motif C cycle 7 AP -1189.4 (32 sites)
[------] motif D cycle 7 AP -366.4 (7 sites)
Total Map : 646.637 Prev: 605.7 Diff: 40.937 Motifs: 80
8[] motif A cycle 8 AP -331.5 (20 sites)
[] motif B cycle 8 AP -725.3 (27 sites)
[] motif C cycle 8 AP -1141.6 (31 sites)
[------] motif D cycle 8 AP -366.4 (7 sites)
Total Map : 660.38 Prev: 646.637 Diff: 13.7434 Motifs: 85
9
10[] motif A cycle 10 AP -371.3 (22 sites)
[] motif B cycle 10 AP -727.7 (27 sites)
[] motif C cycle 10 AP -1245.5 (33 sites)
[] motif D cycle 10 AP -346.6 (7 sites)
Total Map : 671.561 Prev: 660.38 Diff: 11.181 Motifs: 89
11[] motif A cycle 11 AP -365.7 (22 sites)
[] motif B cycle 11 AP -760.1 (28 sites)
[] motif C cycle 11 AP -1189.4 (32 sites)
[] motif D cycle 11 AP -346.6 (7 sites)
Total Map : 685.28 Prev: 671.561 Diff: 13.719 Motifs: 89
12[] motif A cycle 12 AP -446.5 (26 sites)
[] motif B cycle 12 AP -725.3 (27 sites)
[] motif C cycle 12 AP -1141.6 (31 sites)
[] motif D cycle 12 AP -346.6 (7 sites)
Total Map : 689.5 Prev: 685.28 Diff: 4.21965 Motifs: 91
13
14
15[] motif A cycle 15 AP -422.8 (26 sites)
[] motif B cycle 15 AP -726.1 (27 sites)
[] motif C cycle 15 AP -1041.5 (29 sites)
[] motif D cycle 15 AP -348.1 (7 sites)
Total Map : 714.246 Prev: 689.5 Diff: 24.7462 Motifs: 89
16[] motif A cycle 16 AP -440.3 (27 sites)
[] motif B cycle 16 AP -725.3 (27 sites)
[] motif C cycle 16 AP -1041.5 (29 sites)
[] motif D cycle 16 AP -348.1 (7 sites)
Total Map : 717.139 Prev: 714.246 Diff: 2.89264 Motifs: 90
17[] motif A cycle 17 AP -416.4 (26 sites)
[] motif B cycle 17 AP -725.3 (27 sites)
[] motif C cycle 17 AP -1041.5 (29 sites)
[] motif D cycle 17 AP -348.1 (7 sites)
Total Map : 723.199 Prev: 717.139 Diff: 6.05975 Motifs: 89
18[] motif A cycle 18 AP -476.5 (29 sites)
[] motif B cycle 18 AP -660.6 (25 sites)
[] motif C cycle 18 AP -1041.5 (29 sites)
[] motif D cycle 18 AP -348.1 (7 sites)
Total Map : 725.194 Prev: 723.199 Diff: 1.99565 Motifs: 90
19[] motif A cycle 19 AP -387.7 (25 sites)
[] motif B cycle 19 AP -725.3 (27 sites)
[] motif C cycle 19 AP -1041.5 (29 sites)
[] motif D cycle 19 AP -348.1 (7 sites)
Total Map : 730.839 Prev: 725.194 Diff: 5.64428 Motifs: 88
20[] motif A cycle 20 AP -454.8 (28 sites)
[] motif B cycle 20 AP -691.2 (26 sites)
[] motif C cycle 20 AP -1090.2 (30 sites)
[] motif D cycle 20 AP -348.1 (7 sites)
Total Map : 740.315 Prev: 730.839 Diff: 9.47605 Motifs: 91
21
22[] motif A cycle 22 AP -347.4 (23 sites)
[] motif B cycle 22 AP -691.2 (26 sites)
[] motif C cycle 22 AP -1041.0 (29 sites)
[] motif D cycle 22 AP -348.1 (7 sites)
Total Map : 742.668 Prev: 740.315 Diff: 2.35327 Motifs: 85
23
24[] motif A cycle 24 AP -368.0 (24 sites)
[] motif B cycle 24 AP -728.8 (27 sites)
[] motif C cycle 24 AP -1041.0 (29 sites)
[] motif D cycle 24 AP -348.1 (7 sites)
Total Map : 742.863 Prev: 742.668 Diff: 0.19554 Motifs: 87
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
MAX :: 742.863379 (Seed = 1149743202, Iteration = 24 Motif A = 24 Motif B = 27 Motif C = 29 Motif D = 7 )
motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 2 **
1
2
3
4[] motif A cycle 4 AP -53.8 (3 sites)
[] motif B cycle 4 AP -751.2 (33 sites)
[] motif C cycle 4 AP -1873.4 (54 sites)
[] motif D cycle 4 AP -667.4 (12 sites)
Total Map : 1679.61 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 102
5[] motif A cycle 5 AP -51.0 (3 sites)
[] motif B cycle 5 AP -684.4 (33 sites)
[] motif C cycle 5 AP -1599.1 (54 sites)
[------] motif D cycle 5 AP -534.8 (12 sites)
Total Map : 2005.21 Prev: 1679.61 Diff: 325.602 Motifs: 102
6[] motif A cycle 6 AP -71.4 (4 sites)
[] motif B cycle 6 AP -741.8 (35 sites)
[] motif C cycle 6 AP -1599.1 (54 sites)
[------] motif D cycle 6 AP -468.9 (11 sites)
Total Map : 2020.82 Prev: 2005.21 Diff: 15.6117 Motifs: 104
7[] motif A cycle 7 AP -51.0 (3 sites)
[] motif B cycle 7 AP -741.8 (35 sites)
[] motif C cycle 7 AP -1599.1 (54 sites)
[------] motif D cycle 7 AP -468.9 (11 sites)
Total Map : 2022.05 Prev: 2020.82 Diff: 1.23593 Motifs: 103
8
9
10[] motif A cycle 10 AP -48.5 (3 sites)
[] motif B cycle 10 AP -741.8 (35 sites)
[] motif C cycle 10 AP -1599.1 (54 sites)
[] motif D cycle 10 AP -534.8 (12 sites)
Total Map : 2026.1 Prev: 2022.05 Diff: 4.04393 Motifs: 104
11
12
13
14
15[] motif A cycle 15 AP -47.4 (3 sites)
[] motif B cycle 15 AP -741.8 (35 sites)
[] motif C cycle 15 AP -1599.1 (54 sites)
[] motif D cycle 15 AP -534.8 (12 sites)
Total Map : 2026.45 Prev: 2026.1 Diff: 0.353957 Motifs: 104
16
17
18[] motif A cycle 18 AP -64.6 (4 sites)
[] motif B cycle 18 AP -741.8 (35 sites)
[] motif C cycle 18 AP -1599.1 (54 sites)
[] motif D cycle 18 AP -468.9 (11 sites)
Total Map : 2028.63 Prev: 2026.45 Diff: 2.1753 Motifs: 104
19
20
21
22[] motif A cycle 22 AP -64.6 (4 sites)
[] motif B cycle 22 AP -850.9 (38 sites)
[] motif C cycle 22 AP -1599.1 (54 sites)
[] motif D cycle 22 AP -468.9 (11 sites)
Total Map : 2029.4 Prev: 2028.63 Diff: 0.769306 Motifs: 107
23
24
25[] motif A cycle 25 AP -64.6 (4 sites)
[] motif B cycle 25 AP -850.9 (38 sites)
[] motif C cycle 25 AP -1599.1 (54 sites)
[] motif D cycle 25 AP -534.8 (12 sites)
Total Map : 2029.8 Prev: 2029.4 Diff: 0.400632 Motifs: 108
26
27
28
29
30[] motif A cycle 30 AP -47.5 (3 sites)
[] motif B cycle 30 AP -850.9 (38 sites)
[] motif C cycle 30 AP -1658.1 (55 sites)
[] motif D cycle 30 AP -467.4 (11 sites)
Total Map : 2035.48 Prev: 2029.8 Diff: 5.68003 Motifs: 107
31
32
33
34[] motif A cycle 34 AP -47.5 (3 sites)
[] motif B cycle 34 AP -816.5 (37 sites)
[] motif C cycle 34 AP -1599.1 (54 sites)
[] motif D cycle 34 AP -467.4 (11 sites)
Total Map : 2036.53 Prev: 2035.48 Diff: 1.0477 Motifs: 105
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
MAX :: 2036.525370 (Seed = 1149743202, Iteration = 34 Motif A = 3 Motif B = 37 Motif C = 54 Motif D = 11 )
motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 3 **
1
2
3
4[] motif A cycle 4 AP -408.4 (26 sites)
[] motif B cycle 4 AP -71.4 (2 sites)
[] motif C cycle 4 AP -1871.1 (53 sites)
[] motif D cycle 4 AP -174.1 (3 sites)
Total Map : 1146.38 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 84
5[] motif A cycle 5 AP -296.5 (26 sites)
[] motif B cycle 5 AP -71.4 (2 sites)
[] motif C cycle 5 AP -1665.3 (53 sites)
[] motif D cycle 5 AP -174.1 (3 sites)
Total Map : 1455.53 Prev: 1146.38 Diff: 309.149 Motifs: 84
6[] motif A cycle 6 AP -343.8 (29 sites)
[] motif B cycle 6 AP -71.4 (2 sites)
[] motif C cycle 6 AP -1696.2 (54 sites)
[] motif D cycle 6 AP -174.1 (3 sites)
Total Map : 1498.45 Prev: 1455.53 Diff: 42.9205 Motifs: 88
7
8[] motif A cycle 8 AP -384.0 (31 sites)
[] motif B cycle 8 AP -71.4 (2 sites)
[] motif C cycle 8 AP -1696.2 (54 sites)
[] motif D cycle 8 AP -174.1 (3 sites)
Total Map : 1502.55 Prev: 1498.45 Diff: 4.0994 Motifs: 90
9[] motif A cycle 9 AP -422.5 (33 sites)
[] motif B cycle 9 AP -71.4 (2 sites)
[] motif C cycle 9 AP -1696.2 (54 sites)
[] motif D cycle 9 AP -174.1 (3 sites)
Total Map : 1503.54 Prev: 1502.55 Diff: 0.99024 Motifs: 92
10
11[] motif A cycle 11 AP -419.9 (34 sites)
[] motif B cycle 11 AP -71.4 (2 sites)
[] motif C cycle 11 AP -1753.3 (55 sites)
[] motif D cycle 11 AP -174.1 (3 sites)
Total Map : 1516.48 Prev: 1503.54 Diff: 12.938 Motifs: 94
12[] motif A cycle 12 AP -419.9 (34 sites)
[] motif B cycle 12 AP -71.4 (2 sites)
[] motif C cycle 12 AP -1696.2 (54 sites)
[] motif D cycle 12 AP -174.1 (3 sites)
Total Map : 1518.67 Prev: 1516.48 Diff: 2.18498 Motifs: 93
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 4 **
1
2
3
4[] motif A cycle 4 AP 0.0 (0 sites)
[] motif B cycle 4 AP -1240.2 (53 sites)
[] motif C cycle 4 AP -192.8 (5 sites)
[] motif D cycle 4 AP -1139.0 (22 sites)
Total Map : 1140.99 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 80
5[] motif A cycle 5 AP -27.5 (1 sites)
[] motif B cycle 5 AP -1108.9 (54 sites)
[] motif C cycle 5 AP -107.7 (4 sites)
[+++] motif D cycle 5 AP -1112.4 (24 sites)
Total Map : 1548.32 Prev: 1140.99 Diff: 407.325 Motifs: 83
6
7
8
9[] motif A cycle 9 AP -370.2 (22 sites)
[] motif B cycle 9 AP -1103.5 (54 sites)
[] motif C cycle 9 AP -1257.6 (33 sites)
[+++] motif D cycle 9 AP 0.0 (0 sites)
Total Map : 1561.07 Prev: 1548.32 Diff: 12.75 Motifs: 109
10[] motif A cycle 10 AP -434.4 (26 sites)
[] motif B cycle 10 AP -1071.7 (53 sites)
[] motif C cycle 10 AP -1241.9 (34 sites)
[] motif D cycle 10 AP -75.4 (1 sites)
Total Map : 1606.41 Prev: 1561.07 Diff: 45.3433 Motifs: 114
11
12[] motif A cycle 12 AP -551.0 (33 sites)
[] motif B cycle 12 AP -1144.9 (55 sites)
[] motif C cycle 12 AP -1232.3 (34 sites)
[] motif D cycle 12 AP 0.0 (0 sites)
Total Map : 1660.34 Prev: 1606.41 Diff: 53.9249 Motifs: 122
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 5 **
1
2
3
4[] motif A cycle 4 AP -44.7 (2 sites)
[] motif B cycle 4 AP -1352.6 (53 sites)
[] motif C cycle 4 AP -98.3 (2 sites)
[] motif D cycle 4 AP -1454.8 (28 sites)
Total Map : 989.762 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 85
5[] motif A cycle 5 AP -61.7 (3 sites)
[] motif B cycle 5 AP -1093.0 (53 sites)
[] motif C cycle 5 AP -82.2 (2 sites)
[-----] motif D cycle 5 AP -1250.5 (29 sites)
Total Map : 1543.73 Prev: 989.762 Diff: 553.967 Motifs: 87
6[] motif A cycle 6 AP -117.1 (6 sites)
[] motif B cycle 6 AP -1118.4 (54 sites)
[] motif C cycle 6 AP -82.2 (2 sites)
[-----] motif D cycle 6 AP -1228.9 (29 sites)
Total Map : 1579.26 Prev: 1543.73 Diff: 35.536 Motifs: 91
7
8
9
10[] motif A cycle 10 AP -119.2 (6 sites)
[] motif B cycle 10 AP -1156.4 (55 sites)
[] motif C cycle 10 AP -81.4 (2 sites)
[] motif D cycle 10 AP -1227.7 (29 sites)
Total Map : 1584.6 Prev: 1579.26 Diff: 5.33507 Motifs: 92
11[] motif A cycle 11 AP -117.1 (6 sites)
[] motif B cycle 11 AP -1117.5 (54 sites)
[] motif C cycle 11 AP -81.4 (2 sites)
[] motif D cycle 11 AP -1227.7 (29 sites)
Total Map : 1584.81 Prev: 1584.6 Diff: 0.213828 Motifs: 91
12[] motif A cycle 12 AP -117.1 (6 sites)
[] motif B cycle 12 AP -1156.4 (55 sites)
[] motif C cycle 12 AP -81.4 (2 sites)
[] motif D cycle 12 AP -1227.7 (29 sites)
Total Map : 1587.59 Prev: 1584.81 Diff: 2.77506 Motifs: 92
13
14
15
16
17
18
19
20[] motif A cycle 20 AP -97.4 (5 sites)
[] motif B cycle 20 AP -1156.8 (55 sites)
[] motif C cycle 20 AP -79.9 (2 sites)
[] motif D cycle 20 AP -1227.7 (29 sites)
Total Map : 1587.99 Prev: 1587.59 Diff: 0.397059 Motifs: 91
21
22
23
24
25
26
27
28
29
30[] motif A cycle 30 AP -39.0 (2 sites)
[] motif B cycle 30 AP -1156.8 (55 sites)
[] motif C cycle 30 AP -80.5 (2 sites)
[] motif D cycle 30 AP -1227.7 (29 sites)
Total Map : 1588.16 Prev: 1587.99 Diff: 0.175261 Motifs: 88
31
32[] motif A cycle 32 AP -58.3 (3 sites)
[] motif B cycle 32 AP -1156.8 (55 sites)
[] motif C cycle 32 AP -80.5 (2 sites)
[] motif D cycle 32 AP -1227.7 (29 sites)
Total Map : 1588.59 Prev: 1588.16 Diff: 0.429544 Motifs: 89
33
34
35
36[] motif A cycle 36 AP -232.0 (13 sites)
[] motif B cycle 36 AP -1117.1 (54 sites)
[] motif C cycle 36 AP 0.0 (0 sites)
[] motif D cycle 36 AP -1227.7 (29 sites)
Total Map : 1601.78 Prev: 1588.59 Diff: 13.1946 Motifs: 96
37[] motif A cycle 37 AP -229.0 (13 sites)
[] motif B cycle 37 AP -1156.8 (55 sites)
[] motif C cycle 37 AP -234.4 (5 sites)
[] motif D cycle 37 AP -1227.7 (29 sites)
Total Map : 1611.86 Prev: 1601.78 Diff: 10.0767 Motifs: 102
38[] motif A cycle 38 AP -247.4 (14 sites)
[] motif B cycle 38 AP -1156.8 (55 sites)
[] motif C cycle 38 AP -544.7 (13 sites)
[] motif D cycle 38 AP -1227.7 (29 sites)
Total Map : 1688.66 Prev: 1611.86 Diff: 76.798 Motifs: 111
39
40[] motif A cycle 40 AP -199.9 (12 sites)
[] motif B cycle 40 AP -1156.8 (55 sites)
[] motif C cycle 40 AP -692.6 (17 sites)
[] motif D cycle 40 AP -1227.7 (29 sites)
Total Map : 1767.8 Prev: 1688.66 Diff: 79.137 Motifs: 113
41[] motif A cycle 41 AP -311.4 (18 sites)
[] motif B cycle 41 AP -1156.8 (55 sites)
[] motif C cycle 41 AP -784.5 (19 sites)
[] motif D cycle 41 AP -1227.7 (29 sites)
Total Map : 1785.93 Prev: 1767.8 Diff: 18.1288 Motifs: 121
42
43
44
45
46[] motif A cycle 46 AP -483.4 (27 sites)
[] motif B cycle 46 AP -1120.7 (54 sites)
[] motif C cycle 46 AP -928.7 (22 sites)
[] motif D cycle 46 AP -1227.7 (29 sites)
Total Map : 1799.6 Prev: 1785.93 Diff: 13.6793 Motifs: 132
47[] motif A cycle 47 AP -529.5 (29 sites)
[] motif B cycle 47 AP -1156.8 (55 sites)
[] motif C cycle 47 AP -880.8 (21 sites)
[] motif D cycle 47 AP -1227.7 (29 sites)
Total Map : 1804.67 Prev: 1799.6 Diff: 5.06939 Motifs: 134
48[] motif A cycle 48 AP -459.6 (26 sites)
[] motif B cycle 48 AP -1118.4 (54 sites)
[] motif C cycle 48 AP -876.6 (21 sites)
[] motif D cycle 48 AP -1227.7 (29 sites)
Total Map : 1808.9 Prev: 1804.67 Diff: 4.22417 Motifs: 130
49
50
51
52[] motif A cycle 52 AP -487.1 (27 sites)
[] motif B cycle 52 AP -1156.8 (55 sites)
[] motif C cycle 52 AP -914.8 (22 sites)
[] motif D cycle 52 AP -1227.7 (29 sites)
Total Map : 1817.39 Prev: 1808.9 Diff: 8.49225 Motifs: 133
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 6 **
1
2
3
4[] motif A cycle 4 AP -67.0 (3 sites)
[] motif B cycle 4 AP -1223.6 (54 sites)
[] motif C cycle 4 AP -145.7 (3 sites)
[] motif D cycle 4 AP -224.8 (4 sites)
Total Map : 981.16 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 64
5[] motif A cycle 5 AP -37.3 (2 sites)
[] motif B cycle 5 AP -989.0 (54 sites)
[] motif C cycle 5 AP -304.6 (8 sites)
[----] motif D cycle 5 AP -187.9 (4 sites)
Total Map : 1275.57 Prev: 981.16 Diff: 294.41 Motifs: 68
6[] motif A cycle 6 AP -81.8 (5 sites)
[] motif B cycle 6 AP -989.0 (54 sites)
[] motif C cycle 6 AP -590.1 (16 sites)
[----] motif D cycle 6 AP -358.3 (7 sites)
Total Map : 1416.85 Prev: 1275.57 Diff: 141.283 Motifs: 82
7[] motif A cycle 7 AP -96.8 (6 sites)
[] motif B cycle 7 AP -989.0 (54 sites)
[] motif C cycle 7 AP -633.2 (17 sites)
[----] motif D cycle 7 AP -358.3 (7 sites)
Total Map : 1424.61 Prev: 1416.85 Diff: 7.75982 Motifs: 84
8
9
10[] motif A cycle 10 AP -91.9 (6 sites)
[] motif B cycle 10 AP -989.0 (54 sites)
[] motif C cycle 10 AP -674.5 (18 sites)
[-] motif D cycle 10 AP -337.3 (7 sites)
Total Map : 1448.23 Prev: 1424.61 Diff: 23.6156 Motifs: 85
11[] motif A cycle 11 AP -91.9 (6 sites)
[] motif B cycle 11 AP -989.0 (54 sites)
[] motif C cycle 11 AP -667.6 (18 sites)
[-] motif D cycle 11 AP -337.3 (7 sites)
Total Map : 1456.26 Prev: 1448.23 Diff: 8.03531 Motifs: 85
12
13[] motif A cycle 13 AP -114.3 (7 sites)
[] motif B cycle 13 AP -989.0 (54 sites)
[] motif C cycle 13 AP -854.9 (22 sites)
[-] motif D cycle 13 AP -337.3 (7 sites)
Total Map : 1463.23 Prev: 1456.26 Diff: 6.96803 Motifs: 90
14[] motif A cycle 14 AP -137.2 (8 sites)
[] motif B cycle 14 AP -989.0 (54 sites)
[] motif C cycle 14 AP -803.3 (21 sites)
[-] motif D cycle 14 AP -337.3 (7 sites)
Total Map : 1465.75 Prev: 1463.23 Diff: 2.52048 Motifs: 90
15[] motif A cycle 15 AP -134.8 (8 sites)
[] motif B cycle 15 AP -989.0 (54 sites)
[] motif C cycle 15 AP -803.3 (21 sites)
[+] motif D cycle 15 AP -336.6 (7 sites)
Total Map : 1470.32 Prev: 1465.75 Diff: 4.56864 Motifs: 90
16[] motif A cycle 16 AP -108.3 (7 sites)
[] motif B cycle 16 AP -989.0 (54 sites)
[] motif C cycle 16 AP -850.5 (22 sites)
[+] motif D cycle 16 AP -411.5 (8 sites)
Total Map : 1480.76 Prev: 1470.32 Diff: 10.4373 Motifs: 91
17
18
19
20[] motif A cycle 20 AP -128.6 (8 sites)
[] motif B cycle 20 AP -989.0 (54 sites)
[] motif C cycle 20 AP -940.1 (24 sites)
[--] motif D cycle 20 AP -405.0 (8 sites)
Total Map : 1500.43 Prev: 1480.76 Diff: 19.6736 Motifs: 94
21[] motif A cycle 21 AP -108.3 (7 sites)
[] motif B cycle 21 AP -989.0 (54 sites)
[] motif C cycle 21 AP -940.1 (24 sites)
[--] motif D cycle 21 AP -405.0 (8 sites)
Total Map : 1501.82 Prev: 1500.43 Diff: 1.38679 Motifs: 93
22[] motif A cycle 22 AP -108.3 (7 sites)
[] motif B cycle 22 AP -989.0 (54 sites)
[] motif C cycle 22 AP -986.8 (25 sites)
[--] motif D cycle 22 AP -405.0 (8 sites)
Total Map : 1503.53 Prev: 1501.82 Diff: 1.70712 Motifs: 94
23
24
25
26
27[] motif A cycle 27 AP -128.6 (8 sites)
[] motif B cycle 27 AP -989.0 (54 sites)
[] motif C cycle 27 AP -939.0 (24 sites)
[] motif D cycle 27 AP -404.6 (8 sites)
Total Map : 1504 Prev: 1503.53 Diff: 0.475394 Motifs: 94
28[] motif A cycle 28 AP -108.3 (7 sites)
[] motif B cycle 28 AP -989.0 (54 sites)
[] motif C cycle 28 AP -939.0 (24 sites)
[] motif D cycle 28 AP -404.6 (8 sites)
Total Map : 1505.38 Prev: 1504 Diff: 1.37798 Motifs: 93
29
30
31
32
33
34
35[] motif A cycle 35 AP -142.7 (9 sites)
[] motif B cycle 35 AP -989.0 (54 sites)
[] motif C cycle 35 AP -889.8 (23 sites)
[] motif D cycle 35 AP -471.7 (9 sites)
Total Map : 1508.79 Prev: 1505.38 Diff: 3.40696 Motifs: 95
36
37[] motif A cycle 37 AP -159.8 (10 sites)
[] motif B cycle 37 AP -989.0 (54 sites)
[] motif C cycle 37 AP -939.0 (24 sites)
[] motif D cycle 37 AP -404.6 (8 sites)
Total Map : 1514.12 Prev: 1508.79 Diff: 5.3377 Motifs: 96
38[] motif A cycle 38 AP -159.8 (10 sites)
[] motif B cycle 38 AP -989.0 (54 sites)
[] motif C cycle 38 AP -986.4 (25 sites)
[] motif D cycle 38 AP -404.6 (8 sites)
Total Map : 1514.79 Prev: 1514.12 Diff: 0.670946 Motifs: 97
39
40[] motif A cycle 40 AP -175.3 (11 sites)
[] motif B cycle 40 AP -989.0 (54 sites)
[] motif C cycle 40 AP -939.0 (24 sites)
[] motif D cycle 40 AP -404.6 (8 sites)
Total Map : 1518.92 Prev: 1514.79 Diff: 4.12608 Motifs: 97
41[] motif A cycle 41 AP -151.0 (10 sites)
[] motif B cycle 41 AP -989.0 (54 sites)
[] motif C cycle 41 AP -937.3 (24 sites)
[] motif D cycle 41 AP -404.6 (8 sites)
Total Map : 1522.39 Prev: 1518.92 Diff: 3.46783 Motifs: 96
42[] motif A cycle 42 AP -132.3 (9 sites)
[] motif B cycle 42 AP -989.0 (54 sites)
[] motif C cycle 42 AP -986.4 (25 sites)
[] motif D cycle 42 AP -404.6 (8 sites)
Total Map : 1523.36 Prev: 1522.39 Diff: 0.973725 Motifs: 96
43
44
45
46
47
48
49
50[] motif A cycle 50 AP -151.0 (10 sites)
[] motif B cycle 50 AP -989.0 (54 sites)
[] motif C cycle 50 AP -986.4 (25 sites)
[] motif D cycle 50 AP -404.6 (8 sites)
Total Map : 1524.12 Prev: 1523.36 Diff: 0.756895 Motifs: 97
51
52
53
54[] motif A cycle 54 AP -169.9 (11 sites)
[] motif B cycle 54 AP -989.0 (54 sites)
[] motif C cycle 54 AP -939.0 (24 sites)
[] motif D cycle 54 AP -404.6 (8 sites)
Total Map : 1524.49 Prev: 1524.12 Diff: 0.371963 Motifs: 97
55
56
57
58
59
60
61
62
63
64
65[] motif A cycle 65 AP -151.0 (10 sites)
[] motif B cycle 65 AP -989.0 (54 sites)
[] motif C cycle 65 AP -986.8 (25 sites)
[] motif D cycle 65 AP -402.9 (8 sites)
Total Map : 1529.11 Prev: 1524.49 Diff: 4.61554 Motifs: 97
66[] motif A cycle 66 AP -169.9 (11 sites)
[] motif B cycle 66 AP -989.0 (54 sites)
[] motif C cycle 66 AP -986.8 (25 sites)
[] motif D cycle 66 AP -402.9 (8 sites)
Total Map : 1530.16 Prev: 1529.11 Diff: 1.05073 Motifs: 98
67
68
69
70[] motif A cycle 70 AP -169.9 (11 sites)
[] motif B cycle 70 AP -989.0 (54 sites)
[] motif C cycle 70 AP -986.4 (25 sites)
[] motif D cycle 70 AP -402.9 (8 sites)
Total Map : 1532.06 Prev: 1530.16 Diff: 1.89983 Motifs: 98
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 7 **
1
2
3
4[] motif A cycle 4 AP -201.0 (12 sites)
[] motif B cycle 4 AP -113.6 (3 sites)
[] motif C cycle 4 AP -930.1 (23 sites)
[] motif D cycle 4 AP -2959.1 (54 sites)
Total Map : 903.055 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 92
5[] motif A cycle 5 AP -150.4 (11 sites)
[] motif B cycle 5 AP -117.6 (4 sites)
[] motif C cycle 5 AP -871.9 (23 sites)
[] motif D cycle 5 AP -2452.5 (52 sites)
Total Map : 1437.59 Prev: 903.055 Diff: 534.536 Motifs: 90
6[] motif A cycle 6 AP -185.8 (13 sites)
[] motif B cycle 6 AP -63.1 (2 sites)
[] motif C cycle 6 AP -817.6 (22 sites)
[] motif D cycle 6 AP -2556.7 (54 sites)
Total Map : 1469.43 Prev: 1437.59 Diff: 31.8425 Motifs: 91
7[] motif A cycle 7 AP -185.8 (13 sites)
[] motif B cycle 7 AP -63.1 (2 sites)
[] motif C cycle 7 AP -817.6 (22 sites)
[] motif D cycle 7 AP -2469.2 (53 sites)
Total Map : 1490.24 Prev: 1469.43 Diff: 20.8077 Motifs: 90
8
9
10[] motif A cycle 10 AP -185.8 (13 sites)
[] motif B cycle 10 AP -60.9 (2 sites)
[] motif C cycle 10 AP -923.4 (24 sites)
[--] motif D cycle 10 AP -2454.7 (53 sites)
Total Map : 1505.14 Prev: 1490.24 Diff: 14.9023 Motifs: 92
11
12
13[] motif A cycle 13 AP -185.8 (13 sites)
[] motif B cycle 13 AP -60.9 (2 sites)
[] motif C cycle 13 AP -817.6 (22 sites)
[--] motif D cycle 13 AP -2454.7 (53 sites)
Total Map : 1505.52 Prev: 1505.14 Diff: 0.372181 Motifs: 90
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 8 **
1
2
3
4[] motif A cycle 4 AP -56.8 (3 sites)
[] motif B cycle 4 AP -594.9 (23 sites)
[] motif C cycle 4 AP -194.1 (4 sites)
[] motif D cycle 4 AP -367.4 (6 sites)
Total Map : 271.959 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 36
5[] motif A cycle 5 AP -46.0 (3 sites)
[] motif B cycle 5 AP -486.2 (21 sites)
[] motif C cycle 5 AP -979.1 (29 sites)
[-----] motif D cycle 5 AP -276.6 (6 sites)
Total Map : 840.305 Prev: 271.959 Diff: 568.345 Motifs: 59
6[] motif A cycle 6 AP -46.0 (3 sites)
[] motif B cycle 6 AP -481.1 (20 sites)
[] motif C cycle 6 AP -1396.9 (39 sites)
[-----] motif D cycle 6 AP -486.7 (10 sites)
Total Map : 950.671 Prev: 840.305 Diff: 110.367 Motifs: 72
7
8[] motif A cycle 8 AP -46.0 (3 sites)
[] motif B cycle 8 AP -316.3 (10 sites)
[] motif C cycle 8 AP -3004.0 (74 sites)
[-----] motif D cycle 8 AP -552.2 (11 sites)
Total Map : 954.552 Prev: 950.671 Diff: 3.88022 Motifs: 98
9
10[] motif A cycle 10 AP -46.5 (3 sites)
[] motif B cycle 10 AP -243.9 (8 sites)
[] motif C cycle 10 AP -2992.1 (74 sites)
[-] motif D cycle 10 AP -530.1 (11 sites)
Total Map : 1001.93 Prev: 954.552 Diff: 47.3832 Motifs: 96
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 9 **
1
2
3
4[] motif A cycle 4 AP -86.2 (4 sites)
[] motif B cycle 4 AP -122.1 (3 sites)
[] motif C cycle 4 AP -2149.3 (51 sites)
[] motif D cycle 4 AP -234.1 (4 sites)
Total Map : 509.201 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 62
5[] motif A cycle 5 AP -66.9 (4 sites)
[] motif B cycle 5 AP -75.7 (3 sites)
[] motif C cycle 5 AP -1697.1 (50 sites)
[] motif D cycle 5 AP -461.9 (9 sites)
Total Map : 1092.03 Prev: 509.201 Diff: 582.825 Motifs: 66
6[] motif A cycle 6 AP -311.5 (21 sites)
[] motif B cycle 6 AP -96.6 (3 sites)
[] motif C cycle 6 AP -1790.1 (52 sites)
[] motif D cycle 6 AP -742.9 (14 sites)
Total Map : 1213.73 Prev: 1092.03 Diff: 121.707 Motifs: 90
7[] motif A cycle 7 AP -368.8 (24 sites)
[] motif B cycle 7 AP -112.7 (4 sites)
[] motif C cycle 7 AP -1737.2 (51 sites)
[] motif D cycle 7 AP -1041.8 (19 sites)
Total Map : 1253.72 Prev: 1213.73 Diff: 39.9908 Motifs: 98
8
9
10[] motif A cycle 10 AP -481.8 (31 sites)
[] motif B cycle 10 AP -58.0 (2 sites)
[] motif C cycle 10 AP -1790.1 (52 sites)
[] motif D cycle 10 AP -1237.8 (23 sites)
Total Map : 1285.2 Prev: 1253.72 Diff: 31.4769 Motifs: 108
11[] motif A cycle 11 AP -489.8 (32 sites)
[] motif B cycle 11 AP -99.2 (3 sites)
[] motif C cycle 11 AP -1790.1 (52 sites)
[] motif D cycle 11 AP -1417.6 (26 sites)
Total Map : 1303.74 Prev: 1285.2 Diff: 18.5344 Motifs: 113
12[] motif A cycle 12 AP -492.2 (32 sites)
[] motif B cycle 12 AP -58.0 (2 sites)
[] motif C cycle 12 AP -1842.5 (53 sites)
[] motif D cycle 12 AP -1419.1 (26 sites)
Total Map : 1309.87 Prev: 1303.74 Diff: 6.13088 Motifs: 113
13
14
15
16
17[] motif A cycle 17 AP -469.5 (31 sites)
[] motif B cycle 17 AP -58.7 (2 sites)
[] motif C cycle 17 AP -1790.1 (52 sites)
[] motif D cycle 17 AP -1419.1 (26 sites)
Total Map : 1311.38 Prev: 1309.87 Diff: 1.50905 Motifs: 111
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37motif A: 5 (+/- 7.88) out of 15393 a = 20; b = 61552; p = 0.000323771
motif B: 5 (+/- 7.87) out of 14888 a = 20; b = 59532; p = 0.000334158
motif C: 5 (+/- 7.86) out of 14383 a = 20; b = 57512; p = 0.000345232
motif D: 5 (+/- 7.85) out of 13878 a = 20; b = 55492; p = 0.000357066
** 10 **
1
2
3
4[] motif A cycle 4 AP -374.5 (22 sites)
[] motif B cycle 4 AP -74.4 (2 sites)
[] motif C cycle 4 AP -1774.4 (54 sites)
[] motif D cycle 4 AP -1794.7 (36 sites)
Total Map : 1821.24 Prev: -1.79769e+308 Diff: 1.79769e+308 Motifs: 114
5[] motif A cycle 5 AP -364.1 (22 sites)
[] motif B cycle 5 AP -74.4 (2 sites)
[] motif C cycle 5 AP -1607.6 (54 sites)
[++] motif D cycle 5 AP -1738.3 (36 sites)
Total Map : 2005.44 Prev: 1821.24 Diff: 184.204 Motifs: 114
6[] motif A cycle 6 AP -355.0 (22 sites)
[] motif B cycle 6 AP -74.4 (2 sites)
[] motif C cycle 6 AP -1607.6 (54 sites)
[++] motif D cycle 6 AP -1800.3 (37 sites)
Total Map : 2026.99 Prev: 2005.44 Diff: 21.5427 Motifs: 115
7[] motif A cycle 7 AP -305.7 (20 sites)
[] motif B cycle 7 AP -74.4 (2 sites)
[] motif C cycle 7 AP -1607.6 (54 sites)
[++] motif D cycle 7 AP -1800.3 (37 sites)
Total Map : 2042.47 Prev: 2026.99 Diff: 15.4851 Motifs: 113
8[] motif A cycle 8 AP -360.2 (23 sites)
[] motif B cycle 8 AP -74.4 (2 sites)
[] motif C cycle 8 AP -1607.6 (54 sites)
[++] motif D cycle 8 AP -1734.6 (36 sites)
Total Map : 2049.89 Prev: 2042.47 Diff: 7.41789 Motifs: 115
9
10[] motif A cycle 10 AP -321.7 (22 sites)
[] motif B cycle 10 AP -74.4 (2 sites)
[] motif C cycle 10 AP -1607.6 (54 sites)
[] motif D cycle 10 AP -1800.3 (37 sites)
Total Map : 2065.41 Prev: 2049.89 Diff: 15.5161 Motifs: 115
11[] motif A cycle 11 AP -345.0 (23 sites)
[] motif B cycle 11 AP -74.4 (2 sites)
[] motif C cycle 11 AP -1607.6 (54 sites)
[] motif D cycle 11 AP -1800.3 (37 sites)
Total Map : 2068.72 Prev: 2065.41 Diff: 3.31307 Motifs: 116
12
13
14
15
16[] motif A cycle 16 AP -467.4 (29 sites)
[] motif B cycle 16 AP -74.4 (2 sites)
[] motif C cycle 16 AP -1607.6 (54 sites)
[] motif D cycle 16 AP -1800.3 (37 sites)
Total Map : 2074.61 Prev: 2068.72 Diff: 5.89359 Motifs: 122
17[] motif A cycle 17 AP -469.2 (29 sites)
[] motif B cycle 17 AP -74.4 (2 sites)
[] motif C cycle 17 AP -1607.6 (54 sites)
[] motif D cycle 17 AP -1800.3 (37 sites)
Total Map : 2075.76 Prev: 2074.61 Diff: 1.14881 Motifs: 122
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32[] motif A cycle 32 AP -377.5 (25 sites)
[] motif B cycle 32 AP -107.2 (3 sites)
[] motif C cycle 32 AP -1607.6 (54 sites)
[] motif D cycle 32 AP -1800.3 (37 sites)
Total Map : 2090.08 Prev: 2075.76 Diff: 14.3209 Motifs: 119
33
34[] motif A cycle 34 AP -398.0 (26 sites)
[] motif B cycle 34 AP -107.2 (3 sites)
[] motif C cycle 34 AP -1607.6 (54 sites)
[] motif D cycle 34 AP -1800.3 (37 sites)
Total Map : 2090.33 Prev: 2090.08 Diff: 0.249073 Motifs: 120
35[] motif A cycle 35 AP -377.0 (25 sites)
[] motif B cycle 35 AP -89.9 (3 sites)
[] motif C cycle 35 AP -1607.6 (54 sites)
[] motif D cycle 35 AP -1800.3 (37 sites)
Total Map : 2095.44 Prev: 2090.33 Diff: 5.10984 Motifs: 119
36[] motif A cycle 36 AP -377.0 (25 sites)
[] motif B cycle 36 AP -159.5 (5 sites)
[] motif C cycle 36 AP -1607.6 (54 sites)
[] motif D cycle 36 AP -1800.3 (37 sites)
Total Map : 2099.24 Prev: 2095.44 Diff: 3.79902 Motifs: 121
37[] motif A cycle 37 AP -377.0 (25 sites)
[] motif B cycle 37 AP -160.6 (5 sites)
[] motif C cycle 37 AP -1607.6 (54 sites)
[] motif D cycle 37 AP -1800.3 (37 sites)
Total Map : 2102.31 Prev: 2099.24 Diff: 3.07348 Motifs: 121
38
39
40
41[] motif A cycle 41 AP -356.4 (24 sites)
[] motif B cycle 41 AP -160.7 (5 sites)
[] motif C cycle 41 AP -1607.6 (54 sites)
[] motif D cycle 41 AP -1800.3 (37 sites)
Total Map : 2103.18 Prev: 2102.31 Diff: 0.864532 Motifs: 120
42
43[] motif A cycle 43 AP -377.0 (25 sites)
[] motif B cycle 43 AP -160.7 (5 sites)
[] motif C cycle 43 AP -1607.6 (54 sites)
[] motif D cycle 43 AP -1800.3 (37 sites)
Total Map : 2103.36 Prev: 2103.18 Diff: 0.182404 Motifs: 121
44
45[] motif A cycle 45 AP -422.3 (27 sites)
[] motif B cycle 45 AP -229.0 (7 sites)
[] motif C cycle 45 AP -1607.6 (54 sites)
[] motif D cycle 45 AP -1800.3 (37 sites)
Total Map : 2104.94 Prev: 2103.36 Diff: 1.57855 Motifs: 125
46[] motif A cycle 46 AP -422.3 (27 sites)
[] motif B cycle 46 AP -187.8 (6 sites)
[] motif C cycle 46 AP -1607.6 (54 sites)
[] motif D cycle 46 AP -1668.3 (35 sites)
Total Map : 2109.23 Prev: 2104.94 Diff: 4.29029 Motifs: 122
47[] motif A cycle 47 AP -446.7 (28 sites)
[] motif B cycle 47 AP -187.8 (6 sites)
[] motif C cycle 47 AP -1607.6 (54 sites)
[] motif D cycle 47 AP -1668.3 (35 sites)
Total Map : 2109.87 Prev: 2109.23 Diff: 0.644012 Motifs: 123
48
49[] motif A cycle 49 AP -490.2 (30 sites)
[] motif B cycle 49 AP -187.8 (6 sites)
[] motif C cycle 49 AP -1607.6 (54 sites)
[] motif D cycle 49 AP -1668.3 (35 sites)
Total Map : 2110.99 Prev: 2109.87 Diff: 1.11211 Motifs: 125
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
MAX :: 2110.985589 (Seed = 1149743202, Iteration = 49 Motif A = 30 Motif B = 6 Motif C = 54 Motif D = 35 )
Max subopt MAP found on seed 10
======================== NEAR OPTIMAL RESULTS ========================
======================================================================
MAP = 584 maybe = 588 discard = 64640
Max set 2111.157969 at 4
5
10
15
20
25
30
35
40
45
50
55
60
65
70
75
80
85
90
95
100
105
110
115
120
125
130
135
140
145
150
155
160
165
170
175
180
185
190
195
200
205
210
215
220
225
230
235
240
245
250
255
260
265
270
275
280
285
290
295
300
305
310
315
320
325
330
335
340
345
350
355
360
365
370
375
380
385
390
395
400
405
410
415
420
425
430
435
440
445
450
455
460
465
470
475
480
485
490
495
500
=============================================================
====== Results by Sequence =====
=============================================================
1, 1, 3 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044
1, 2, 0 79 agvda VIVLSANDPFVQ safgk 90 1.00 F 1091044
2, 1, 3 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494
2, 2, 0 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
2, 3, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
3, 1, 2 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727
4, 1, 2 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686
5, 1, 2 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976
6, 1, 3 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 1.00 F 13186328
6, 2, 0 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
7, 1, 2 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154
8, 1, 3 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053
8, 2, 0 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1, 3 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117
9, 2, 0 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
10, 1, 2 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765
11, 1, 2 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082
12, 1, 2 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543
13, 1, 3 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173
13, 2, 0 65 eldce LVGLSVDQVFSH ikwie 76 1.00 F 14286173
14, 1, 2 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634
15, 1, 3 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438
15, 2, 0 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438
16, 1, 2 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394
17, 1, 2 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673
18, 1, 2 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256
19, 1, 2 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312
20, 1, 2 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725
21, 1, 0 80 megvd VTVVSMDLPFAQ krfce 91 1.00 F 15605963
22, 1, 3 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375
23, 1, 3 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658
23, 2, 0 70 tagln VVGISPDKPEKL atfrd 81 1.00 F 15609658
24, 1, 3 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511
24, 2, 0 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511
25, 1, 2 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085
26, 1, 2 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140
27, 1, 2 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431
28, 1, 3 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152
28, 2, 0 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152
30, 1, 3 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738
30, 2, 0 101 tpgla VWGISPDSTYAH eafad 112 1.00 F 15790738
31, 1, 2 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337
32, 1, 0 80 idntv VLCISADLPFAQ srfcg 91 1.00 F 15801846
33, 1, 2 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225
34, 1, 2 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374
35, 1, 3 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234
35, 2, 0 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234
36, 1, 3 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629
37, 1, 2 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007
38, 1, 3 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339
38, 2, 0 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339
39, 1, 3 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668
40, 1, 2 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937
41, 1, 2 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313
42, 1, 2 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864
43, 1, 2 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427
44, 1, 3 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919
46, 1, 2 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495
47, 1, 3 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671
47, 2, 0 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671
47, 3, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
48, 1, 2 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717
49, 1, 2 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994
50, 1, 2 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507
51, 1, 3 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644
52, 1, 2 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867
53, 1, 3 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033
53, 2, 0 73 ngvde IVCISVNDAFVM newak 84 1.00 F 17229033
54, 1, 2 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859
55, 1, 2 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944
56, 1, 2 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233
57, 1, 2 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401
58, 1, 2 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503
59, 1, 2 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723
60, 1, 3 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548
60, 2, 0 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548
61, 1, 2 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743
61, 2, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743
62, 1, 3 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077
62, 2, 0 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077
63, 1, 2 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157
64, 1, 2 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357
66, 1, 2 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028
67, 1, 3 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112
67, 2, 0 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112
67, 3, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072
69, 1, 2 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859
70, 1, 3 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405
70, 2, 0 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405
71, 1, 3 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878
72, 1, 0 78 keegi VLTISADLPFAQ krwca 89 1.00 F 21283385
73, 1, 3 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812
73, 2, 0 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812
74, 1, 2 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307
76, 1, 2 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116
77, 1, 2 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582
78, 1, 2 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332
79, 1, 2 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713
80, 1, 2 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501
81, 1, 3 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841
81, 2, 0 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841
81, 3, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
82, 1, 2 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237
83, 1, 2 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972
84, 1, 2 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327
85, 1, 3 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065
85, 2, 0 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065
86, 1, 3 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732
86, 2, 0 74 kgide IICFSVNDPFVM kawgk 85 1.00 F 4704732
87, 1, 3 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210
87, 2, 0 68 klnck LIGFSCNSKESH dqwie 79 1.00 F 4996210
87, 3, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
88, 1, 3 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864
89, 1, 3 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180
90, 1, 3 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 1.00 F 6323138
91, 1, 2 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568
92, 1, 0 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955
93, 1, 2 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697
94, 1, 2 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567
95, 1, 3 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016
95, 2, 0 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016
95, 3, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
96, 1, 2 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788
124 motifs
Column 1 : Sequence Number, Site Number
Column 2 : Motif type
Column 3 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
-------------------------------------------------------------------------
MOTIF a
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | . 68 . . . . . . 20 . . 10 . . . . . . . . 2.4
2 | . 17 . . . . . . 34 3 . 34 . . 6 . . . . 3 1.7
3 | 13 6 10 . . . 51 . . . . 3 . . . . . . 10 3 1.9
4 | . 31 . . . 10 . . 48 . . 10 . . . . . . . . 2.1
5 | . . . . . . . . . . . . . 3 . . . . 96 . 3.8
7 | . . . 86 . . . . . . . . . 13 . . . . . . 3.2
8 | . . . 10 . . . . . . 3 10 . . . 6 3 . 58 6 2.0
9 | 3 34 . . 6 . . 6 . . 3 . . . . 37 3 . . 3 1.8
10 | . . . . 24 58 . . . . . . . . 17 . . . . . 2.9
12 | . . . . . . . 55 . . . 13 6 6 . . 17 . . . 3.4
nonsite 8 8 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5
site 1 15 1 9 3 6 5 6 10 . . 8 . 2 2 4 2 . 16 1
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.001 0.679 0.000 0.001 0.001 0.001 0.001 0.000 0.204 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001
2 | 0.001 0.171 0.000 0.001 0.001 0.001 0.001 0.000 0.340 0.034 0.001 0.341 0.000 0.001 0.068 0.001 0.001 0.001 0.001 0.035
3 | 0.137 0.069 0.102 0.001 0.001 0.001 0.510 0.000 0.001 0.000 0.001 0.035 0.000 0.001 0.000 0.001 0.001 0.001 0.103 0.035
4 | 0.001 0.306 0.000 0.001 0.001 0.103 0.001 0.000 0.476 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001
5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.035 0.000 0.001 0.001 0.001 0.950 0.001
7 | 0.001 0.001 0.000 0.849 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.136 0.000 0.001 0.001 0.001 0.001 0.001
8 | 0.001 0.001 0.000 0.103 0.001 0.001 0.001 0.000 0.001 0.000 0.035 0.103 0.000 0.001 0.000 0.069 0.034 0.001 0.577 0.069
9 | 0.035 0.340 0.000 0.001 0.069 0.001 0.001 0.068 0.001 0.000 0.035 0.002 0.000 0.001 0.000 0.374 0.034 0.001 0.001 0.035
10 | 0.001 0.001 0.000 0.001 0.238 0.577 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.001 0.170 0.001 0.001 0.001 0.001 0.001
12 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.543 0.001 0.000 0.001 0.137 0.068 0.068 0.000 0.001 0.170 0.001 0.001 0.001
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
10 columns
Num Motifs: 29
1, 1 79 agvda VIVLSANDPFVQ safgk 90 1.00 F 1091044
2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
13, 1 65 eldce LVGLSVDQVFSH ikwie 76 1.00 F 14286173
15, 1 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438
21, 1 80 megvd VTVVSMDLPFAQ krfce 91 1.00 F 15605963
23, 1 70 tagln VVGISPDKPEKL atfrd 81 1.00 F 15609658
24, 1 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511
28, 1 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152
30, 1 101 tpgla VWGISPDSTYAH eafad 112 1.00 F 15790738
32, 1 80 idntv VLCISADLPFAQ srfcg 91 1.00 F 15801846
35, 1 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234
38, 1 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339
47, 1 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671
53, 1 73 ngvde IVCISVNDAFVM newak 84 1.00 F 17229033
60, 1 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548
62, 1 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077
67, 1 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112
70, 1 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405
72, 1 78 keegi VLTISADLPFAQ krwca 89 1.00 F 21283385
73, 1 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812
81, 1 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841
85, 1 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065
86, 1 74 kgide IICFSVNDPFVM kawgk 85 1.00 F 4704732
87, 1 68 klnck LIGFSCNSKESH dqwie 79 1.00 F 4996210
92, 1 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955
95, 1 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016
***** **** *
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif a = -469.15170
Log Fragmentation portion of MAP for motif a = -3.80666
=============================================================
====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME =====
====== Motif a =====
=============================================================
Listing of those elements occurring greater than 50% of the time
in near optimal sampling using 500 iterations
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | . 74 . . . . . . 14 . . 11 . . . . . . . . 2.5
2 | . 14 . . . 3 . . 29 3 . 37 . . 7 . . . . 3 1.6
3 | 14 3 3 . . . 59 . . . . 3 . . . . . . 11 3 1.9
4 | . 33 . . . 7 . . 48 . . 11 . . . . . . . . 2.1
5 | . . . . . . . . . . . . . 3 . . . . 96 . 3.8
7 | . . . 96 . . . . . . . . . 3 . . . . . . 3.5
8 | . . . . . . . . . . 3 11 . . . 7 3 . 66 7 2.4
9 | . 40 . . 7 . . 7 . . 3 . . . . 33 3 . . 3 1.9
10 | . . . . 25 51 . . . . . . . . 18 . . . . 3 2.7
12 | . . . . . . . 59 . . . 14 . 7 . . 18 . . . 3.6
nonsite 8 8 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5
site 1 16 . 9 3 6 5 6 9 . . 8 . 1 2 4 2 . 17 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.002 0.729 0.000 0.001 0.001 0.001 0.001 0.000 0.147 0.000 0.001 0.111 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001
2 | 0.002 0.147 0.000 0.001 0.001 0.037 0.001 0.000 0.292 0.037 0.001 0.365 0.000 0.001 0.073 0.001 0.001 0.001 0.001 0.037
3 | 0.147 0.038 0.037 0.001 0.001 0.001 0.583 0.000 0.001 0.000 0.001 0.038 0.000 0.001 0.000 0.001 0.001 0.001 0.110 0.037
4 | 0.002 0.329 0.000 0.001 0.001 0.074 0.001 0.000 0.474 0.000 0.001 0.111 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001
5 | 0.002 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.037 0.000 0.001 0.001 0.001 0.946 0.001
7 | 0.002 0.001 0.000 0.947 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.037 0.000 0.001 0.001 0.001 0.001 0.001
8 | 0.002 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.038 0.111 0.000 0.001 0.000 0.074 0.037 0.001 0.655 0.074
9 | 0.002 0.401 0.000 0.001 0.074 0.001 0.001 0.073 0.001 0.000 0.038 0.002 0.000 0.001 0.000 0.328 0.037 0.001 0.001 0.037
10 | 0.002 0.001 0.000 0.001 0.256 0.510 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.001 0.182 0.001 0.001 0.001 0.001 0.037
12 | 0.002 0.001 0.000 0.001 0.001 0.001 0.001 0.582 0.001 0.000 0.001 0.147 0.000 0.073 0.000 0.001 0.182 0.001 0.001 0.001
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
10 columns
Num Motifs: 27
2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
13, 1 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173
15, 1 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438
21, 1 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963
23, 1 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658
24, 1 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511
28, 1 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152
30, 1 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738
32, 1 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846
35, 1 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234
38, 1 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339
47, 1 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671
60, 1 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548
62, 1 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077
67, 1 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112
70, 1 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405
72, 1 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385
73, 1 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812
81, 1 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841
85, 1 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065
87, 1 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210
89, 1 127 kkyaa VFGLSADSVTSQ kkfqs 138 0.51 F 6322180
92, 1 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955
95, 1 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016
***** **** *
-------------------------------------------------------------------------
MOTIF b
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | 50 . . . . . . . 16 . . . . . . . . . 33 . 1.9
2 | . . . . . 16 . . . . . 50 . . . . 33 . . . 2.1
3 | . . . . . . . . . . . . . . 33 . 66 . . . 3.4
4 | . 33 . . . 16 . . . . . 33 . . 16 . . . . . 1.7
6 | 33 . . 16 33 . . . . . . . . 16 . . . . . . 1.5
8 | . . . . . . . 50 . . 16 . . . . 33 . . . . 3.2
9 | . . . . . . 66 . . . . . . . . 16 . 16 . . 2.3
10 | . 33 . 16 33 . . . . . . . . . 16 . . . . . 1.7
11 | 50 50 . . . . . . . . . . . . . . . . . . 2.1
12 | . . 66 . . . . . . . . . . . . . . . . 33 4.4
13 | . . . . . . . . . . . . . . . 100 . . . . 3.8
14 | 50 50 . . . . . . . . . . . . . . . . . . 2.1
16 | . . . . . . . . . 100 . . . . . . . . . . 5.8
17 | . . . . 16 . . . . . 66 . . 16 . . . . . . 2.1
19 | . . . . . . 100 . . . . . . . . . . . . . 3.2
nonsite 8 7 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5
site 12 11 4 2 5 2 11 3 1 6 5 5 . 2 4 10 6 1 2 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.468 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.158 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.312 0.004
2 | 0.006 0.006 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.469 0.002 0.003 0.002 0.004 0.310 0.003 0.004 0.004
3 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.310 0.004 0.618 0.003 0.004 0.004
4 | 0.006 0.314 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.315 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004
6 | 0.314 0.006 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004
8 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.463 0.004 0.001 0.159 0.007 0.002 0.003 0.002 0.312 0.002 0.003 0.004 0.004
9 | 0.006 0.006 0.001 0.005 0.005 0.004 0.621 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.158 0.002 0.157 0.004 0.004
10 | 0.006 0.314 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004
11 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
12 | 0.006 0.006 0.617 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.312
13 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.927 0.002 0.003 0.004 0.004
14 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
16 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.924 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
17 | 0.006 0.006 0.001 0.005 0.159 0.004 0.005 0.001 0.004 0.001 0.621 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004
19 | 0.006 0.006 0.001 0.005 0.005 0.004 0.928 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
15 columns
Num Motifs: 6
2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
**** * ******* ** *
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif b = -187.76179
Log Fragmentation portion of MAP for motif b = -7.77486
=============================================================
====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME =====
====== Motif b =====
=============================================================
Listing of those elements occurring greater than 50% of the time
in near optimal sampling using 500 iterations
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | 50 . . . . . . . 16 . . . . . . . . . 33 . 1.9
2 | . . . . . 16 . . . . . 50 . . . . 33 . . . 2.1
3 | . . . . . . . . . . . . . . 33 . 66 . . . 3.4
4 | . 33 . . . 16 . . . . . 33 . . 16 . . . . . 1.7
6 | 33 . . 16 33 . . . . . . . . 16 . . . . . . 1.5
8 | . . . . . . . 50 . . 16 . . . . 33 . . . . 3.2
9 | . . . . . . 66 . . . . . . . . 16 . 16 . . 2.3
10 | . 33 . 16 33 . . . . . . . . . 16 . . . . . 1.7
11 | 50 50 . . . . . . . . . . . . . . . . . . 2.1
12 | . . 66 . . . . . . . . . . . . . . . . 33 4.4
13 | . . . . . . . . . . . . . . . 100 . . . . 3.8
14 | 50 50 . . . . . . . . . . . . . . . . . . 2.1
16 | . . . . . . . . . 100 . . . . . . . . . . 5.8
17 | . . . . 16 . . . . . 66 . . 16 . . . . . . 2.1
19 | . . . . . . 100 . . . . . . . . . . . . . 3.2
nonsite 8 7 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5
site 12 11 4 2 5 2 11 3 1 6 5 5 . 2 4 10 6 1 2 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.468 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.158 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.312 0.004
2 | 0.006 0.006 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.469 0.002 0.003 0.002 0.004 0.310 0.003 0.004 0.004
3 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.310 0.004 0.618 0.003 0.004 0.004
4 | 0.006 0.314 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.315 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004
6 | 0.314 0.006 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004
8 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.463 0.004 0.001 0.159 0.007 0.002 0.003 0.002 0.312 0.002 0.003 0.004 0.004
9 | 0.006 0.006 0.001 0.005 0.005 0.004 0.621 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.158 0.002 0.157 0.004 0.004
10 | 0.006 0.314 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004
11 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
12 | 0.006 0.006 0.617 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.312
13 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.927 0.002 0.003 0.004 0.004
14 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
16 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.924 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
17 | 0.006 0.006 0.001 0.005 0.159 0.004 0.005 0.001 0.004 0.001 0.621 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004
19 | 0.006 0.006 0.001 0.005 0.005 0.004 0.928 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
15 columns
Num Motifs: 6
2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
**** * ******* ** *
-------------------------------------------------------------------------
MOTIF c
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | . . . 5 3 . 5 . . . 53 3 . . . 3 11 7 1 3 1.6
3 | . 61 . . . . . . 22 . . 7 3 . . . . . 1 3 2.1
4 | . 38 . . . 3 3 . 11 . . 40 1 . . . . . . . 1.7
5 | 5 35 . . . . . . 24 . 1 31 1 . . . . . . . 1.7
6 | . . . 48 . . . 1 . . . . . 37 11 . 1 . . . 2.7
7 | 1 7 . . . 85 . . . . . 3 . . 1 . . . . . 3.4
8 | . . . . . 5 3 1 . 55 . . . . 18 . . . 9 5 3.5
9 | 87 . . . . 1 5 . . . . . . . . . . . 3 1 2.8
10 | 3 . . 14 9 . . 1 . . . . . 1 . 16 5 . 20 25 1.5
11 | . . . . . . 1 . . 90 . 1 . . 1 . . . 1 1 5.2
12 | . . 100 . . . . . . . . . . . . . . . . . 6.0
13 | 9 7 . . 1 . 53 . . . 1 . 1 . . 24 . . . . 2.0
14 | . 7 . 1 . . 1 . . . . 1 . . 1 83 . . 1 . 3.2
15 | . . 100 . . . . . . . . . . . . . . . . . 6.0
16 | . 5 . 1 1 . . 3 1 . 27 1 . . . . 9 46 . . 2.1
18 | . 3 . . 37 12 . . 18 . . 16 3 . . . 3 . . 3 1.4
20 | . . . 1 . . . . . . 3 . . . . 94 . . . . 3.9
22 | . 7 . . . 22 . . 9 . . 44 11 . 5 . . . . . 1.8
24 | 7 . . 3 37 . . 3 . . 27 1 . 1 . . 11 1 1 1 1.4
25 | 3 18 . 1 . 9 . . 7 . . 51 1 . . . . . 1 3 1.4
nonsite 8 7 1 6 7 4 6 1 5 1 7 9 2 4 2 4 3 4 4 4
site 5 9 10 3 4 7 3 . 4 7 5 10 1 2 2 11 2 2 2 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.001 0.001 0.000 0.056 0.037 0.000 0.056 0.000 0.001 0.000 0.533 0.038 0.000 0.000 0.000 0.037 0.110 0.074 0.019 0.037
3 | 0.001 0.606 0.000 0.001 0.001 0.000 0.001 0.000 0.221 0.000 0.001 0.074 0.037 0.000 0.000 0.000 0.000 0.000 0.019 0.037
4 | 0.001 0.386 0.000 0.001 0.001 0.037 0.037 0.000 0.111 0.000 0.001 0.405 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000
5 | 0.056 0.349 0.000 0.001 0.001 0.000 0.001 0.000 0.239 0.000 0.019 0.313 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000
6 | 0.001 0.001 0.000 0.478 0.001 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.367 0.110 0.000 0.019 0.000 0.000 0.000
7 | 0.019 0.074 0.000 0.001 0.001 0.845 0.001 0.000 0.001 0.000 0.001 0.038 0.000 0.000 0.019 0.000 0.000 0.000 0.000 0.000
8 | 0.001 0.001 0.000 0.001 0.001 0.056 0.037 0.018 0.001 0.551 0.001 0.001 0.000 0.000 0.184 0.000 0.000 0.000 0.092 0.056
9 | 0.863 0.001 0.000 0.001 0.001 0.019 0.056 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.037 0.019
10 | 0.037 0.001 0.000 0.147 0.092 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.166 0.055 0.000 0.202 0.257
11 | 0.001 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.001 0.899 0.001 0.019 0.000 0.000 0.019 0.000 0.000 0.000 0.019 0.019
12 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
13 | 0.093 0.074 0.000 0.001 0.019 0.000 0.533 0.000 0.001 0.000 0.019 0.001 0.019 0.000 0.000 0.239 0.000 0.000 0.000 0.000
14 | 0.001 0.074 0.000 0.019 0.001 0.000 0.019 0.000 0.001 0.000 0.001 0.019 0.000 0.000 0.019 0.826 0.000 0.000 0.019 0.000
15 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
16 | 0.001 0.056 0.000 0.019 0.019 0.000 0.001 0.037 0.019 0.000 0.276 0.019 0.000 0.000 0.000 0.000 0.092 0.459 0.000 0.000
18 | 0.001 0.037 0.000 0.001 0.368 0.129 0.001 0.000 0.184 0.000 0.001 0.166 0.037 0.000 0.000 0.000 0.037 0.000 0.000 0.037
20 | 0.001 0.001 0.000 0.019 0.001 0.000 0.001 0.000 0.001 0.000 0.037 0.001 0.000 0.000 0.000 0.936 0.000 0.000 0.000 0.000
22 | 0.001 0.074 0.000 0.001 0.001 0.221 0.001 0.000 0.092 0.000 0.001 0.441 0.110 0.000 0.055 0.000 0.000 0.000 0.000 0.000
24 | 0.074 0.001 0.000 0.037 0.368 0.000 0.001 0.037 0.001 0.000 0.276 0.019 0.000 0.019 0.000 0.000 0.110 0.019 0.019 0.019
25 | 0.037 0.184 0.000 0.019 0.001 0.092 0.001 0.000 0.074 0.000 0.001 0.515 0.019 0.000 0.000 0.000 0.000 0.000 0.019 0.037
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
20 columns
Num Motifs: 54
3, 1 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727
4, 1 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686
5, 1 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976
7, 1 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154
10, 1 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765
11, 1 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082
12, 1 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543
14, 1 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634
16, 1 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394
17, 1 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673
18, 1 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256
19, 1 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312
20, 1 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725
25, 1 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085
26, 1 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140
27, 1 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431
31, 1 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337
33, 1 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225
34, 1 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374
37, 1 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007
40, 1 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937
41, 1 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313
42, 1 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864
43, 1 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427
46, 1 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495
48, 1 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717
49, 1 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994
50, 1 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507
52, 1 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867
54, 1 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859
55, 1 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944
56, 1 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233
57, 1 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401
58, 1 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503
59, 1 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723
61, 1 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743
61, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743
63, 1 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157
64, 1 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357
66, 1 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028
69, 1 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859
74, 1 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307
76, 1 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116
77, 1 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582
78, 1 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332
79, 1 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713
80, 1 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501
82, 1 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237
83, 1 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972
84, 1 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327
91, 1 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568
93, 1 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697
94, 1 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567
96, 1 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788
* ************** * * * **
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif c = -1607.59351
Log Fragmentation portion of MAP for motif c = -10.42374
=============================================================
====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME =====
====== Motif c =====
=============================================================
Listing of those elements occurring greater than 50% of the time
in near optimal sampling using 500 iterations
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | . . . 5 3 . 5 . . . 53 3 . . . 3 11 7 1 3 1.6
3 | . 61 . . . . . . 22 . . 7 3 . . . . . 1 3 2.1
4 | . 38 . . . 3 3 . 11 . . 40 1 . . . . . . . 1.7
5 | 5 35 . . . . . . 24 . 1 31 1 . . . . . . . 1.7
6 | . . . 48 . . . 1 . . . . . 37 11 . 1 . . . 2.7
7 | 1 7 . . . 85 . . . . . 3 . . 1 . . . . . 3.4
8 | . . . . . 5 3 1 . 55 . . . . 18 . . . 9 5 3.5
9 | 87 . . . . 1 5 . . . . . . . . . . . 3 1 2.8
10 | 3 . . 14 9 . . 1 . . . . . 1 . 16 5 . 20 25 1.5
11 | . . . . . . 1 . . 90 . 1 . . 1 . . . 1 1 5.2
12 | . . 100 . . . . . . . . . . . . . . . . . 6.0
13 | 9 7 . . 1 . 53 . . . 1 . 1 . . 24 . . . . 2.0
14 | . 7 . 1 . . 1 . . . . 1 . . 1 83 . . 1 . 3.2
15 | . . 100 . . . . . . . . . . . . . . . . . 6.0
16 | . 5 . 1 1 . . 3 1 . 27 1 . . . . 9 46 . . 2.1
18 | . 3 . . 37 12 . . 18 . . 16 3 . . . 3 . . 3 1.4
20 | . . . 1 . . . . . . 3 . . . . 94 . . . . 3.9
22 | . 7 . . . 22 . . 9 . . 44 11 . 5 . . . . . 1.8
24 | 7 . . 3 37 . . 3 . . 27 1 . 1 . . 11 1 1 1 1.4
25 | 3 18 . 1 . 9 . . 7 . . 51 1 . . . . . 1 3 1.4
nonsite 8 7 1 6 7 4 6 1 5 1 7 9 2 4 2 4 3 4 4 4
site 5 9 10 3 4 7 3 . 4 7 5 10 1 2 2 11 2 2 2 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.001 0.001 0.000 0.056 0.037 0.000 0.056 0.000 0.001 0.000 0.533 0.038 0.000 0.000 0.000 0.037 0.110 0.074 0.019 0.037
3 | 0.001 0.606 0.000 0.001 0.001 0.000 0.001 0.000 0.221 0.000 0.001 0.074 0.037 0.000 0.000 0.000 0.000 0.000 0.019 0.037
4 | 0.001 0.386 0.000 0.001 0.001 0.037 0.037 0.000 0.111 0.000 0.001 0.405 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000
5 | 0.056 0.349 0.000 0.001 0.001 0.000 0.001 0.000 0.239 0.000 0.019 0.313 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000
6 | 0.001 0.001 0.000 0.478 0.001 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.367 0.110 0.000 0.019 0.000 0.000 0.000
7 | 0.019 0.074 0.000 0.001 0.001 0.845 0.001 0.000 0.001 0.000 0.001 0.038 0.000 0.000 0.019 0.000 0.000 0.000 0.000 0.000
8 | 0.001 0.001 0.000 0.001 0.001 0.056 0.037 0.018 0.001 0.551 0.001 0.001 0.000 0.000 0.184 0.000 0.000 0.000 0.092 0.056
9 | 0.863 0.001 0.000 0.001 0.001 0.019 0.056 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.037 0.019
10 | 0.037 0.001 0.000 0.147 0.092 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.166 0.055 0.000 0.202 0.257
11 | 0.001 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.001 0.899 0.001 0.019 0.000 0.000 0.019 0.000 0.000 0.000 0.019 0.019
12 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
13 | 0.093 0.074 0.000 0.001 0.019 0.000 0.533 0.000 0.001 0.000 0.019 0.001 0.019 0.000 0.000 0.239 0.000 0.000 0.000 0.000
14 | 0.001 0.074 0.000 0.019 0.001 0.000 0.019 0.000 0.001 0.000 0.001 0.019 0.000 0.000 0.019 0.826 0.000 0.000 0.019 0.000
15 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
16 | 0.001 0.056 0.000 0.019 0.019 0.000 0.001 0.037 0.019 0.000 0.276 0.019 0.000 0.000 0.000 0.000 0.092 0.459 0.000 0.000
18 | 0.001 0.037 0.000 0.001 0.368 0.129 0.001 0.000 0.184 0.000 0.001 0.166 0.037 0.000 0.000 0.000 0.037 0.000 0.000 0.037
20 | 0.001 0.001 0.000 0.019 0.001 0.000 0.001 0.000 0.001 0.000 0.037 0.001 0.000 0.000 0.000 0.936 0.000 0.000 0.000 0.000
22 | 0.001 0.074 0.000 0.001 0.001 0.221 0.001 0.000 0.092 0.000 0.001 0.441 0.110 0.000 0.055 0.000 0.000 0.000 0.000 0.000
24 | 0.074 0.001 0.000 0.037 0.368 0.000 0.001 0.037 0.001 0.000 0.276 0.019 0.000 0.019 0.000 0.000 0.110 0.019 0.019 0.019
25 | 0.037 0.184 0.000 0.019 0.001 0.092 0.001 0.000 0.074 0.000 0.001 0.515 0.019 0.000 0.000 0.000 0.000 0.000 0.019 0.037
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
20 columns
Num Motifs: 54
3, 1 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727
4, 1 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686
5, 1 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976
7, 1 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154
10, 1 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765
11, 1 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082
12, 1 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543
14, 1 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634
16, 1 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394
17, 1 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673
18, 1 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256
19, 1 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312
20, 1 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725
25, 1 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085
26, 1 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140
27, 1 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431
31, 1 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337
33, 1 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225
34, 1 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374
37, 1 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007
40, 1 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937
41, 1 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313
42, 1 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864
43, 1 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427
46, 1 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495
48, 1 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717
49, 1 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994
50, 1 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507
52, 1 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867
54, 1 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859
55, 1 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944
56, 1 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233
57, 1 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401
58, 1 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503
59, 1 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723
61, 1 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743
61, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743
63, 1 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157
64, 1 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357
66, 1 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028
69, 1 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859
74, 1 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307
76, 1 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116
77, 1 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582
78, 1 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332
79, 1 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713
80, 1 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501
82, 1 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237
83, 1 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972
84, 1 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327
91, 1 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568
93, 1 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697
94, 1 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567
96, 1 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788
* ************** * * * **
-------------------------------------------------------------------------
MOTIF d
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | 2 . . 28 17 2 . 2 8 . . 17 . . 11 . . . 2 5 1.1
2 | 5 2 . . 5 34 . . . . 2 14 . . 17 . . 8 2 5 1.4
3 | 8 . . 5 11 . 14 . 2 . 28 . . 5 2 . . 17 . 2 1.0
4 | 2 . . 11 . . 62 . . . 5 . . 11 . . 2 . 2 . 2.1
5 | . . . . . . 2 . . . 57 . . 5 . . 5 22 5 . 2.2
7 | 2 68 . . . 2 5 . 2 . 2 . . 2 . . . . 2 8 1.9
8 | 2 62 . . . . . . 25 . . 8 . . . . . . . . 2.3
9 | . 8 . . . 8 . . 5 . . 77 . . . . . . . . 2.3
10 | 8 8 . . . 57 . . 2 . . 5 . . 11 . . . 2 2 2.1
11 | 14 2 . . . 62 8 . . . . . . . . . . . 11 . 2.4
12 | 2 11 . . . 14 . 11 2 5 . 5 . . 45 . . . . . 2.4
13 | . . . . . . . . . . . . . . . 100 . . . . 4.3
14 | 22 . . . . 2 28 2 . . 8 17 5 . 2 . . 8 . . 1.2
15 | 51 . . 42 . . . . . . . . . 2 . . . . 2 . 2.3
16 | . . . 2 . 71 2 . . 5 . . 5 . 5 . . . 5 . 2.9
17 | . . . . . . . . . . . . . . . . . . 11 88 3.6
18 | 5 . . . . 25 2 . . . . . . . . 51 2 . 11 . 2.4
19 | . 54 . . . . 20 . 8 . . . . . . . . . . 17 2.0
20 | . . 97 . . . . . . . . . . . . . . . 2 . 6.2
21 | 5 . . . . . . . . . . . . . . 28 2 . 20 42 2.3
22 | 14 2 . . . . 2 . . . 14 5 5 . . . 2 8 5 37 1.3
23 | . . . . 62 . . . . . 2 . . 8 . . 14 . 5 5 2.2
24 | 14 2 . . . 2 2 22 11 . . 31 11 . . . . . . . 1.8
27 | 2 2 . . 2 45 17 . 8 . . 8 . . 8 2 . . . . 1.6
28 | 11 . . 2 2 8 5 . . . . . . 2 14 . 5 22 22 . 1.4
nonsite 8 7 . 6 7 4 7 1 5 . 7 9 2 4 2 4 3 4 5 5
site 7 9 3 3 4 13 7 1 3 . 4 7 1 1 4 7 1 3 4 8
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.029 0.001 0.000 0.283 0.170 0.029 0.001 0.028 0.085 0.000 0.001 0.170 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.057
2 | 0.058 0.029 0.000 0.001 0.057 0.339 0.001 0.000 0.001 0.000 0.029 0.142 0.000 0.001 0.169 0.001 0.000 0.085 0.029 0.057
3 | 0.086 0.001 0.000 0.057 0.114 0.001 0.142 0.000 0.029 0.000 0.283 0.001 0.000 0.057 0.029 0.001 0.000 0.170 0.001 0.029
4 | 0.029 0.001 0.000 0.114 0.001 0.001 0.621 0.000 0.001 0.000 0.057 0.001 0.000 0.113 0.000 0.001 0.029 0.001 0.029 0.001
5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.564 0.001 0.000 0.057 0.000 0.001 0.057 0.226 0.057 0.001
7 | 0.029 0.677 0.000 0.001 0.001 0.029 0.057 0.000 0.029 0.000 0.029 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.085
8 | 0.029 0.621 0.000 0.001 0.001 0.001 0.001 0.000 0.254 0.000 0.001 0.086 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001
9 | 0.001 0.086 0.000 0.001 0.001 0.085 0.001 0.000 0.057 0.000 0.001 0.762 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001
10 | 0.086 0.086 0.000 0.001 0.001 0.564 0.001 0.000 0.029 0.000 0.001 0.058 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.029
11 | 0.142 0.029 0.000 0.001 0.001 0.620 0.085 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.001
12 | 0.029 0.114 0.000 0.001 0.001 0.142 0.001 0.113 0.029 0.057 0.001 0.058 0.000 0.001 0.451 0.001 0.000 0.001 0.001 0.001
13 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.987 0.000 0.001 0.001 0.001
14 | 0.227 0.001 0.000 0.001 0.001 0.029 0.283 0.028 0.001 0.000 0.086 0.170 0.057 0.001 0.029 0.001 0.000 0.085 0.001 0.001
15 | 0.508 0.001 0.000 0.423 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.001
16 | 0.001 0.001 0.000 0.029 0.001 0.705 0.029 0.000 0.001 0.057 0.001 0.001 0.057 0.001 0.057 0.001 0.000 0.001 0.057 0.001
17 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.874
18 | 0.058 0.001 0.000 0.001 0.001 0.254 0.029 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.508 0.029 0.001 0.113 0.001
19 | 0.001 0.536 0.000 0.001 0.001 0.001 0.198 0.000 0.085 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.170
20 | 0.001 0.001 0.958 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.029 0.001
21 | 0.058 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.282 0.029 0.001 0.198 0.423
22 | 0.142 0.029 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.142 0.058 0.057 0.001 0.000 0.001 0.029 0.085 0.057 0.367
23 | 0.001 0.001 0.000 0.001 0.621 0.001 0.001 0.000 0.001 0.000 0.029 0.001 0.000 0.085 0.000 0.001 0.141 0.001 0.057 0.057
24 | 0.142 0.029 0.000 0.001 0.001 0.029 0.029 0.226 0.113 0.000 0.001 0.311 0.113 0.001 0.000 0.001 0.000 0.001 0.001 0.001
27 | 0.029 0.029 0.000 0.001 0.029 0.451 0.170 0.000 0.085 0.000 0.001 0.086 0.000 0.001 0.085 0.029 0.000 0.001 0.001 0.001
28 | 0.114 0.001 0.000 0.029 0.029 0.085 0.057 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.141 0.001 0.057 0.226 0.226 0.001
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
25 columns
Num Motifs: 35
1, 1 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044
2, 1 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494
6, 1 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 1.00 F 13186328
8, 1 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053
9, 1 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117
13, 1 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173
15, 1 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438
22, 1 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375
23, 1 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658
24, 1 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511
28, 1 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152
30, 1 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738
35, 1 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234
36, 1 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629
38, 1 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339
39, 1 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668
44, 1 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919
47, 1 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671
51, 1 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644
53, 1 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033
60, 1 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548
62, 1 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077
67, 1 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112
68, 1 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072
70, 1 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405
71, 1 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878
73, 1 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812
81, 1 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841
85, 1 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065
86, 1 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732
87, 1 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210
88, 1 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864
89, 1 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180
90, 1 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 1.00 F 6323138
95, 1 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016
***** ****************** **
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif d = -1668.31468
Log Fragmentation portion of MAP for motif d = -7.86327
=============================================================
====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME =====
====== Motif d =====
=============================================================
Listing of those elements occurring greater than 50% of the time
in near optimal sampling using 500 iterations
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | 2 . . 28 17 2 . 2 8 . . 17 . . 11 . . . 2 5 1.1
2 | 5 2 . . 5 34 . . . . 2 14 . . 17 . . 8 2 5 1.4
3 | 8 . . 5 11 . 14 . 2 . 28 . . 5 2 . . 17 . 2 1.0
4 | 2 . . 11 . . 62 . . . 5 . . 11 . . 2 . 2 . 2.1
5 | . . . . . . 2 . . . 57 . . 5 . . 5 22 5 . 2.2
7 | 2 68 . . . 2 5 . 2 . 2 . . 2 . . . . 2 8 1.9
8 | 2 62 . . . . . . 25 . . 8 . . . . . . . . 2.3
9 | . 8 . . . 8 . . 5 . . 77 . . . . . . . . 2.3
10 | 8 8 . . . 57 . . 2 . . 5 . . 11 . . . 2 2 2.1
11 | 14 2 . . . 62 8 . . . . . . . . . . . 11 . 2.4
12 | 2 11 . . . 14 . 11 2 5 . 5 . . 45 . . . . . 2.4
13 | . . . . . . . . . . . . . . . 100 . . . . 4.3
14 | 22 . . . . 2 28 2 . . 8 17 5 . 2 . . 8 . . 1.2
15 | 51 . . 42 . . . . . . . . . 2 . . . . 2 . 2.3
16 | . . . 2 . 71 2 . . 5 . . 5 . 5 . . . 5 . 2.9
17 | . . . . . . . . . . . . . . . . . . 11 88 3.6
18 | 5 . . . . 25 2 . . . . . . . . 51 2 . 11 . 2.4
19 | . 54 . . . . 20 . 8 . . . . . . . . . . 17 2.0
20 | . . 97 . . . . . . . . . . . . . . . 2 . 6.2
21 | 5 . . . . . . . . . . . . . . 28 2 . 20 42 2.3
22 | 14 2 . . . . 2 . . . 14 5 5 . . . 2 8 5 37 1.3
23 | . . . . 62 . . . . . 2 . . 8 . . 14 . 5 5 2.2
24 | 14 2 . . . 2 2 22 11 . . 31 11 . . . . . . . 1.8
27 | 2 2 . . 2 45 17 . 8 . . 8 . . 8 2 . . . . 1.6
28 | 11 . . 2 2 8 5 . . . . . . 2 14 . 5 22 22 . 1.4
nonsite 8 7 . 6 7 4 7 1 5 . 7 9 2 4 2 4 3 4 5 5
site 7 9 3 3 4 13 7 1 3 . 4 7 1 1 4 7 1 3 4 8
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.029 0.001 0.000 0.283 0.170 0.029 0.001 0.028 0.085 0.000 0.001 0.170 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.057
2 | 0.058 0.029 0.000 0.001 0.057 0.339 0.001 0.000 0.001 0.000 0.029 0.142 0.000 0.001 0.169 0.001 0.000 0.085 0.029 0.057
3 | 0.086 0.001 0.000 0.057 0.114 0.001 0.142 0.000 0.029 0.000 0.283 0.001 0.000 0.057 0.029 0.001 0.000 0.170 0.001 0.029
4 | 0.029 0.001 0.000 0.114 0.001 0.001 0.621 0.000 0.001 0.000 0.057 0.001 0.000 0.113 0.000 0.001 0.029 0.001 0.029 0.001
5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.564 0.001 0.000 0.057 0.000 0.001 0.057 0.226 0.057 0.001
7 | 0.029 0.677 0.000 0.001 0.001 0.029 0.057 0.000 0.029 0.000 0.029 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.085
8 | 0.029 0.621 0.000 0.001 0.001 0.001 0.001 0.000 0.254 0.000 0.001 0.086 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001
9 | 0.001 0.086 0.000 0.001 0.001 0.085 0.001 0.000 0.057 0.000 0.001 0.762 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001
10 | 0.086 0.086 0.000 0.001 0.001 0.564 0.001 0.000 0.029 0.000 0.001 0.058 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.029
11 | 0.142 0.029 0.000 0.001 0.001 0.620 0.085 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.001
12 | 0.029 0.114 0.000 0.001 0.001 0.142 0.001 0.113 0.029 0.057 0.001 0.058 0.000 0.001 0.451 0.001 0.000 0.001 0.001 0.001
13 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.987 0.000 0.001 0.001 0.001
14 | 0.227 0.001 0.000 0.001 0.001 0.029 0.283 0.028 0.001 0.000 0.086 0.170 0.057 0.001 0.029 0.001 0.000 0.085 0.001 0.001
15 | 0.508 0.001 0.000 0.423 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.001
16 | 0.001 0.001 0.000 0.029 0.001 0.705 0.029 0.000 0.001 0.057 0.001 0.001 0.057 0.001 0.057 0.001 0.000 0.001 0.057 0.001
17 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.874
18 | 0.058 0.001 0.000 0.001 0.001 0.254 0.029 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.508 0.029 0.001 0.113 0.001
19 | 0.001 0.536 0.000 0.001 0.001 0.001 0.198 0.000 0.085 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.170
20 | 0.001 0.001 0.958 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.029 0.001
21 | 0.058 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.282 0.029 0.001 0.198 0.423
22 | 0.142 0.029 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.142 0.058 0.057 0.001 0.000 0.001 0.029 0.085 0.057 0.367
23 | 0.001 0.001 0.000 0.001 0.621 0.001 0.001 0.000 0.001 0.000 0.029 0.001 0.000 0.085 0.000 0.001 0.141 0.001 0.057 0.057
24 | 0.142 0.029 0.000 0.001 0.001 0.029 0.029 0.226 0.113 0.000 0.001 0.311 0.113 0.001 0.000 0.001 0.000 0.001 0.001 0.001
27 | 0.029 0.029 0.000 0.001 0.029 0.451 0.170 0.000 0.085 0.000 0.001 0.086 0.000 0.001 0.085 0.029 0.000 0.001 0.001 0.001
28 | 0.114 0.001 0.000 0.029 0.029 0.085 0.057 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.141 0.001 0.057 0.226 0.226 0.001
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
25 columns
Num Motifs: 35
1, 1 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044
2, 1 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494
6, 1 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328
8, 1 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053
9, 1 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117
13, 1 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173
15, 1 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438
22, 1 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375
23, 1 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658
24, 1 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511
28, 1 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152
30, 1 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738
35, 1 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234
36, 1 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629
38, 1 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339
39, 1 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668
44, 1 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919
47, 1 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671
51, 1 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644
53, 1 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033
60, 1 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548
62, 1 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077
67, 1 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112
68, 1 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072
70, 1 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405
71, 1 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878
73, 1 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812
81, 1 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841
85, 1 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065
86, 1 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732
87, 1 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210
88, 1 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864
89, 1 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180
90, 1 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138
95, 1 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016
***** ****************** **
Log Background portion of Map = -39912.17887
Log Alignment portion of Map = -957.33606
Log Site/seq portion of Map = 0.00000
Log Null Map = -46943.36311
Log Map = 2111.15797
log MAP = sum of motif and fragmentation parts of MAP + background + alignment + sites/seq - Null
=============================================================
====== Results by Sequence =====
====== ELEMENTS OCCURRING GREATER THAN 50% OF THE TIME =====
=============================================================
1, 1, 3 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044
2, 1, 3 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494
2, 2, 0 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
2, 3, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
3, 1, 2 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727
4, 1, 2 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686
5, 1, 2 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976
6, 1, 3 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328
6, 2, 0 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
7, 1, 2 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154
8, 1, 3 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053
8, 2, 0 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1, 3 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117
9, 2, 0 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
10, 1, 2 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765
11, 1, 2 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082
12, 1, 2 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543
13, 1, 3 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173
13, 2, 0 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173
14, 1, 2 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634
15, 1, 3 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438
15, 2, 0 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438
16, 1, 2 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394
17, 1, 2 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673
18, 1, 2 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256
19, 1, 2 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312
20, 1, 2 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725
21, 1, 0 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963
22, 1, 3 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375
23, 1, 3 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658
23, 2, 0 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658
24, 1, 3 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511
24, 2, 0 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511
25, 1, 2 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085
26, 1, 2 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140
27, 1, 2 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431
28, 1, 3 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152
28, 2, 0 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152
30, 1, 3 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738
30, 2, 0 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738
31, 1, 2 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337
32, 1, 0 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846
33, 1, 2 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225
34, 1, 2 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374
35, 1, 3 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234
35, 2, 0 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234
36, 1, 3 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629
37, 1, 2 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007
38, 1, 3 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339
38, 2, 0 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339
39, 1, 3 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668
40, 1, 2 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937
41, 1, 2 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313
42, 1, 2 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864
43, 1, 2 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427
44, 1, 3 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919
46, 1, 2 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495
47, 1, 3 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671
47, 2, 0 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671
47, 3, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
48, 1, 2 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717
49, 1, 2 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994
50, 1, 2 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507
51, 1, 3 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644
52, 1, 2 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867
53, 1, 3 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033
54, 1, 2 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859
55, 1, 2 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944
56, 1, 2 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233
57, 1, 2 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401
58, 1, 2 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503
59, 1, 2 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723
60, 1, 3 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548
60, 2, 0 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548
61, 1, 2 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743
61, 2, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743
62, 1, 3 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077
62, 2, 0 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077
63, 1, 2 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157
64, 1, 2 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357
66, 1, 2 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028
67, 1, 3 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112
67, 2, 0 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112
67, 3, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072
69, 1, 2 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859
70, 1, 3 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405
70, 2, 0 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405
71, 1, 3 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878
72, 1, 0 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385
73, 1, 3 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812
73, 2, 0 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812
74, 1, 2 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307
76, 1, 2 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116
77, 1, 2 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582
78, 1, 2 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332
79, 1, 2 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713
80, 1, 2 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501
81, 1, 3 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841
81, 2, 0 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841
81, 3, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
82, 1, 2 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237
83, 1, 2 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972
84, 1, 2 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327
85, 1, 3 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065
85, 2, 0 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065
86, 1, 3 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732
87, 1, 3 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210
87, 2, 0 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210
87, 3, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
88, 1, 3 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864
89, 1, 3 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180
89, 2, 0 127 kkyaa VFGLSADSVTSQ kkfqs 138 0.51 F 6322180
90, 1, 3 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138
91, 1, 2 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568
92, 1, 0 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955
93, 1, 2 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697
94, 1, 2 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567
95, 1, 3 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016
95, 2, 0 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016
95, 3, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
96, 1, 2 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788
122 motifs
Column 1 : Sequence Number, Site Number
Column 2 : Motif type
Column 3 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
======================== MAP MAXIMIZATION RESULTS ====================
======================================================================
=============================================================
====== Results by Sequence =====
=============================================================
1, 1, 3 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044
1, 2, 0 79 agvda VIVLSANDPFVQ safgk 90 0.48 F 1091044
2, 1, 3 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494
2, 2, 0 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
2, 3, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
3, 1, 2 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727
4, 1, 2 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686
5, 1, 2 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976
6, 1, 3 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328
6, 2, 0 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
7, 1, 2 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154
8, 1, 3 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053
8, 2, 0 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1, 3 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117
9, 2, 0 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
10, 1, 2 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765
11, 1, 2 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082
12, 1, 2 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543
13, 1, 3 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173
13, 2, 0 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173
14, 1, 2 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634
15, 1, 3 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438
15, 2, 0 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438
16, 1, 2 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394
17, 1, 2 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673
18, 1, 2 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256
19, 1, 2 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312
20, 1, 2 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725
21, 1, 0 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963
22, 1, 3 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375
23, 1, 3 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658
23, 2, 0 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658
24, 1, 3 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511
24, 2, 0 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511
25, 1, 2 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085
26, 1, 2 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140
27, 1, 2 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431
28, 1, 3 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152
28, 2, 0 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152
30, 1, 3 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738
30, 2, 0 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738
31, 1, 2 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337
32, 1, 0 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846
33, 1, 2 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225
34, 1, 2 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374
35, 1, 3 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234
35, 2, 0 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234
36, 1, 3 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629
37, 1, 2 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007
38, 1, 3 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339
38, 2, 0 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339
39, 1, 3 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668
40, 1, 2 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937
41, 1, 2 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313
42, 1, 2 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864
43, 1, 2 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427
44, 1, 3 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919
46, 1, 2 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495
47, 1, 3 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671
47, 2, 0 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671
47, 3, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
48, 1, 2 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717
49, 1, 2 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994
50, 1, 2 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507
51, 1, 3 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644
52, 1, 2 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867
53, 1, 3 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033
53, 2, 0 73 ngvde IVCISVNDAFVM newak 84 0.32 F 17229033
54, 1, 2 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859
55, 1, 2 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944
56, 1, 2 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233
57, 1, 2 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401
58, 1, 2 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503
59, 1, 2 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723
60, 1, 3 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548
60, 2, 0 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548
61, 1, 2 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743
61, 2, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743
62, 1, 3 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077
62, 2, 0 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077
63, 1, 2 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157
64, 1, 2 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357
66, 1, 2 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028
67, 1, 3 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112
67, 2, 0 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112
67, 3, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
68, 1, 3 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072
69, 1, 2 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859
70, 1, 3 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405
70, 2, 0 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405
71, 1, 3 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878
72, 1, 0 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385
73, 1, 3 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812
73, 2, 0 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812
74, 1, 2 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307
76, 1, 2 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116
77, 1, 2 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582
78, 1, 2 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332
79, 1, 2 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713
80, 1, 2 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501
81, 1, 3 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841
81, 2, 0 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841
81, 3, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
82, 1, 2 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237
83, 1, 2 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972
84, 1, 2 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327
85, 1, 3 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065
85, 2, 0 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065
86, 1, 3 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732
86, 2, 0 74 kgide IICFSVNDPFVM kawgk 85 0.43 F 4704732
87, 1, 3 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210
87, 2, 0 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210
87, 3, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
88, 1, 3 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864
89, 1, 3 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180
90, 1, 3 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138
91, 1, 2 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568
92, 1, 0 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955
93, 1, 2 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697
94, 1, 2 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567
95, 1, 3 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016
95, 2, 0 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016
95, 3, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
96, 1, 2 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788
124 motifs
Column 1 : Sequence Number, Site Number
Column 2 : Motif type
Column 3 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
-------------------------------------------------------------------------
MOTIF a
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | . 68 . . . . . . 20 . . 10 . . . . . . . . 2.4
2 | . 17 . . . . . . 34 3 . 34 . . 6 . . . . 3 1.7
3 | 13 6 10 . . . 51 . . . . 3 . . . . . . 10 3 1.9
4 | . 31 . . . 10 . . 48 . . 10 . . . . . . . . 2.1
5 | . . . . . . . . . . . . . 3 . . . . 96 . 3.8
7 | . . . 86 . . . . . . . . . 13 . . . . . . 3.2
8 | . . . 10 . . . . . . 3 10 . . . 6 3 . 58 6 2.0
9 | 3 34 . . 6 . . 6 . . 3 . . . . 37 3 . . 3 1.8
10 | . . . . 24 58 . . . . . . . . 17 . . . . . 2.9
12 | . . . . . . . 55 . . . 13 6 6 . . 17 . . . 3.4
nonsite 8 8 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5
site 1 15 1 9 3 6 5 6 10 . . 8 . 2 2 4 2 . 16 1
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.001 0.679 0.000 0.001 0.001 0.001 0.001 0.000 0.204 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001
2 | 0.001 0.171 0.000 0.001 0.001 0.001 0.001 0.000 0.340 0.034 0.001 0.341 0.000 0.001 0.068 0.001 0.001 0.001 0.001 0.035
3 | 0.137 0.069 0.102 0.001 0.001 0.001 0.510 0.000 0.001 0.000 0.001 0.035 0.000 0.001 0.000 0.001 0.001 0.001 0.103 0.035
4 | 0.001 0.306 0.000 0.001 0.001 0.103 0.001 0.000 0.476 0.000 0.001 0.103 0.000 0.001 0.000 0.001 0.001 0.001 0.001 0.001
5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.035 0.000 0.001 0.001 0.001 0.950 0.001
7 | 0.001 0.001 0.000 0.849 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.136 0.000 0.001 0.001 0.001 0.001 0.001
8 | 0.001 0.001 0.000 0.103 0.001 0.001 0.001 0.000 0.001 0.000 0.035 0.103 0.000 0.001 0.000 0.069 0.034 0.001 0.577 0.069
9 | 0.035 0.340 0.000 0.001 0.069 0.001 0.001 0.068 0.001 0.000 0.035 0.002 0.000 0.001 0.000 0.374 0.034 0.001 0.001 0.035
10 | 0.001 0.001 0.000 0.001 0.238 0.577 0.001 0.000 0.001 0.000 0.001 0.002 0.000 0.001 0.170 0.001 0.001 0.001 0.001 0.001
12 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.543 0.001 0.000 0.001 0.137 0.068 0.068 0.000 0.001 0.170 0.001 0.001 0.001
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
10 columns
Num Motifs: 29
1, 1 79 agvda VIVLSANDPFVQ safgk 90 0.48 F 1091044
2, 1 72 klstq ILAISVDSPFSH lqyll 83 1.00 F 11467494
6, 1 66 nlntk IYAISNDSHFVQ knwie 77 1.00 F 13186328
8, 1 68 kknte VISVSEDTVYVH kawvq 79 1.00 F 13541053
9, 1 66 kfkak VIGISVDSPFSL aefak 77 1.00 F 13541117
13, 1 65 eldce LVGLSVDQVFSH ikwie 76 0.98 F 14286173
15, 1 65 klgav VIGVSTDSVEKN rkfae 76 1.00 F 14600438
21, 1 80 megvd VTVVSMDLPFAQ krfce 91 0.65 F 15605963
23, 1 70 tagln VVGISPDKPEKL atfrd 81 0.99 F 15609658
24, 1 64 glntv ILGVSPDPVERH kkfie 75 1.00 F 15613511
28, 1 56 fekaq VVGISRDSVEAL krfke 67 1.00 F 15643152
30, 1 101 tpgla VWGISPDSTYAH eafad 112 0.98 F 15790738
32, 1 80 idntv VLCISADLPFAQ srfcg 91 0.99 F 15801846
35, 1 66 eagav VLGINRDSVYAH rawaa 77 1.00 F 15807234
38, 1 66 evnav VIGISVDPPFSN kafke 77 1.00 F 15899339
47, 1 71 krgve VVGVSFDSEFVH nawrn 82 1.00 F 16501671
53, 1 73 ngvde IVCISVNDAFVM newak 84 0.32 F 17229033
60, 1 68 kanae VLAISVDSPFAL kafkd 79 1.00 F 18313548
62, 1 66 rrnav VLLISCDSVYTH kawas 77 1.00 F 19173077
67, 1 67 klgvd VYSVSTDTHFTH kawhs 78 1.00 F 20151112
70, 1 72 drdaq ILGFSGDSEFVH hawrk 83 1.00 F 21223405
72, 1 78 keegi VLTISADLPFAQ krwca 89 0.99 F 21283385
73, 1 65 srgit VIGISGDSPESH kqfae 76 1.00 F 21674812
81, 1 68 krnvk LIALSIDSVEDH lawsk 79 1.00 F 3318841
85, 1 50 eince VIGVSVDSVYCH qawce 61 1.00 F 4433065
86, 1 74 kgide IICFSVNDPFVM kawgk 85 0.43 F 4704732
87, 1 68 klnck LIGFSCNSKESH dqwie 79 0.56 F 4996210
92, 1 68 klgve VLSVSVDSVFVH kmwnd 79 1.00 F 6850955
95, 1 71 klgce VLGVSVDSQFTH lawin 82 1.00 F 9955016
***** **** *
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif a = -469.15170
Log Fragmentation portion of MAP for motif a = -3.80666
-------------------------------------------------------------------------
MOTIF b
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | 50 . . . . . . . 16 . . . . . . . . . 33 . 1.9
2 | . . . . . 16 . . . . . 50 . . . . 33 . . . 2.1
3 | . . . . . . . . . . . . . . 33 . 66 . . . 3.4
4 | . 33 . . . 16 . . . . . 33 . . 16 . . . . . 1.7
6 | 33 . . 16 33 . . . . . . . . 16 . . . . . . 1.5
8 | . . . . . . . 50 . . 16 . . . . 33 . . . . 3.2
9 | . . . . . . 66 . . . . . . . . 16 . 16 . . 2.3
10 | . 33 . 16 33 . . . . . . . . . 16 . . . . . 1.7
11 | 50 50 . . . . . . . . . . . . . . . . . . 2.1
12 | . . 66 . . . . . . . . . . . . . . . . 33 4.4
13 | . . . . . . . . . . . . . . . 100 . . . . 3.8
14 | 50 50 . . . . . . . . . . . . . . . . . . 2.1
16 | . . . . . . . . . 100 . . . . . . . . . . 5.8
17 | . . . . 16 . . . . . 66 . . 16 . . . . . . 2.1
19 | . . . . . . 100 . . . . . . . . . . . . . 3.2
nonsite 8 7 . 6 7 4 7 1 6 . 7 9 2 4 2 4 3 4 5 5
site 12 11 4 2 5 2 11 3 1 6 5 5 . 2 4 10 6 1 2 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.468 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.158 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.312 0.004
2 | 0.006 0.006 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.469 0.002 0.003 0.002 0.004 0.310 0.003 0.004 0.004
3 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.310 0.004 0.618 0.003 0.004 0.004
4 | 0.006 0.314 0.001 0.005 0.005 0.158 0.005 0.001 0.004 0.001 0.006 0.315 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004
6 | 0.314 0.006 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004
8 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.463 0.004 0.001 0.159 0.007 0.002 0.003 0.002 0.312 0.002 0.003 0.004 0.004
9 | 0.006 0.006 0.001 0.005 0.005 0.004 0.621 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.158 0.002 0.157 0.004 0.004
10 | 0.006 0.314 0.001 0.159 0.313 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.156 0.004 0.002 0.003 0.004 0.004
11 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
12 | 0.006 0.006 0.617 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.312
13 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.927 0.002 0.003 0.004 0.004
14 | 0.468 0.468 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
16 | 0.006 0.006 0.001 0.005 0.005 0.004 0.005 0.001 0.004 0.924 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
17 | 0.006 0.006 0.001 0.005 0.159 0.004 0.005 0.001 0.004 0.001 0.621 0.007 0.002 0.157 0.002 0.004 0.002 0.003 0.004 0.004
19 | 0.006 0.006 0.001 0.005 0.005 0.004 0.928 0.001 0.004 0.001 0.006 0.007 0.002 0.003 0.002 0.004 0.002 0.003 0.004 0.004
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
15 columns
Num Motifs: 6
2, 1 161 riles IQYVKENPGYACPVNWNFG dqvfy 179 1.00 F 11467494
47, 1 160 lrmvd ALQFHEEHGDVCPAQWEKG kegmn 178 1.00 F 16501671
67, 1 154 rkika AQYVAAHPGEVCPAKWKEG eatla 172 1.00 F 20151112
81, 1 166 lrvvi SLQLTAEKRVATPVDWKDG dsvmv 184 1.00 F 3318841
87, 1 163 lrvlk SLQLTNTHPVATPVNWKEG dkcci 181 1.00 F 4996210
95, 1 160 lrlvq AFQYTDEHGEVCPAGWKPG sdtik 178 1.00 F 9955016
**** * ******* ** *
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif b = -187.76179
Log Fragmentation portion of MAP for motif b = -7.77486
-------------------------------------------------------------------------
MOTIF c
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | . . . 5 3 . 5 . . . 53 3 . . . 3 11 7 1 3 1.6
3 | . 61 . . . . . . 22 . . 7 3 . . . . . 1 3 2.1
4 | . 38 . . . 3 3 . 11 . . 40 1 . . . . . . . 1.7
5 | 5 35 . . . . . . 24 . 1 31 1 . . . . . . . 1.7
6 | . . . 48 . . . 1 . . . . . 37 11 . 1 . . . 2.7
7 | 1 7 . . . 85 . . . . . 3 . . 1 . . . . . 3.4
8 | . . . . . 5 3 1 . 55 . . . . 18 . . . 9 5 3.5
9 | 87 . . . . 1 5 . . . . . . . . . . . 3 1 2.8
10 | 3 . . 14 9 . . 1 . . . . . 1 . 16 5 . 20 25 1.5
11 | . . . . . . 1 . . 90 . 1 . . 1 . . . 1 1 5.2
12 | . . 100 . . . . . . . . . . . . . . . . . 6.0
13 | 9 7 . . 1 . 53 . . . 1 . 1 . . 24 . . . . 2.0
14 | . 7 . 1 . . 1 . . . . 1 . . 1 83 . . 1 . 3.2
15 | . . 100 . . . . . . . . . . . . . . . . . 6.0
16 | . 5 . 1 1 . . 3 1 . 27 1 . . . . 9 46 . . 2.1
18 | . 3 . . 37 12 . . 18 . . 16 3 . . . 3 . . 3 1.4
20 | . . . 1 . . . . . . 3 . . . . 94 . . . . 3.9
22 | . 7 . . . 22 . . 9 . . 44 11 . 5 . . . . . 1.8
24 | 7 . . 3 37 . . 3 . . 27 1 . 1 . . 11 1 1 1 1.4
25 | 3 18 . 1 . 9 . . 7 . . 51 1 . . . . . 1 3 1.4
nonsite 8 7 1 6 7 4 6 1 5 1 7 9 2 4 2 4 3 4 4 4
site 5 9 10 3 4 7 3 . 4 7 5 10 1 2 2 11 2 2 2 2
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.001 0.001 0.000 0.056 0.037 0.000 0.056 0.000 0.001 0.000 0.533 0.038 0.000 0.000 0.000 0.037 0.110 0.074 0.019 0.037
3 | 0.001 0.606 0.000 0.001 0.001 0.000 0.001 0.000 0.221 0.000 0.001 0.074 0.037 0.000 0.000 0.000 0.000 0.000 0.019 0.037
4 | 0.001 0.386 0.000 0.001 0.001 0.037 0.037 0.000 0.111 0.000 0.001 0.405 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000
5 | 0.056 0.349 0.000 0.001 0.001 0.000 0.001 0.000 0.239 0.000 0.019 0.313 0.019 0.000 0.000 0.000 0.000 0.000 0.000 0.000
6 | 0.001 0.001 0.000 0.478 0.001 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.367 0.110 0.000 0.019 0.000 0.000 0.000
7 | 0.019 0.074 0.000 0.001 0.001 0.845 0.001 0.000 0.001 0.000 0.001 0.038 0.000 0.000 0.019 0.000 0.000 0.000 0.000 0.000
8 | 0.001 0.001 0.000 0.001 0.001 0.056 0.037 0.018 0.001 0.551 0.001 0.001 0.000 0.000 0.184 0.000 0.000 0.000 0.092 0.056
9 | 0.863 0.001 0.000 0.001 0.001 0.019 0.056 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.037 0.019
10 | 0.037 0.001 0.000 0.147 0.092 0.000 0.001 0.018 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.166 0.055 0.000 0.202 0.257
11 | 0.001 0.001 0.000 0.001 0.001 0.000 0.019 0.000 0.001 0.899 0.001 0.019 0.000 0.000 0.019 0.000 0.000 0.000 0.019 0.019
12 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
13 | 0.093 0.074 0.000 0.001 0.019 0.000 0.533 0.000 0.001 0.000 0.019 0.001 0.019 0.000 0.000 0.239 0.000 0.000 0.000 0.000
14 | 0.001 0.074 0.000 0.019 0.001 0.000 0.019 0.000 0.001 0.000 0.001 0.019 0.000 0.000 0.019 0.826 0.000 0.000 0.019 0.000
15 | 0.001 0.001 0.991 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
16 | 0.001 0.056 0.000 0.019 0.019 0.000 0.001 0.037 0.019 0.000 0.276 0.019 0.000 0.000 0.000 0.000 0.092 0.459 0.000 0.000
18 | 0.001 0.037 0.000 0.001 0.368 0.129 0.001 0.000 0.184 0.000 0.001 0.166 0.037 0.000 0.000 0.000 0.037 0.000 0.000 0.037
20 | 0.001 0.001 0.000 0.019 0.001 0.000 0.001 0.000 0.001 0.000 0.037 0.001 0.000 0.000 0.000 0.936 0.000 0.000 0.000 0.000
22 | 0.001 0.074 0.000 0.001 0.001 0.221 0.001 0.000 0.092 0.000 0.001 0.441 0.110 0.000 0.055 0.000 0.000 0.000 0.000 0.000
24 | 0.074 0.001 0.000 0.037 0.368 0.000 0.001 0.037 0.001 0.000 0.276 0.019 0.000 0.019 0.000 0.000 0.110 0.019 0.019 0.019
25 | 0.037 0.184 0.000 0.019 0.001 0.092 0.001 0.000 0.074 0.000 0.001 0.515 0.019 0.000 0.000 0.000 0.000 0.000 0.019 0.037
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
20 columns
Num Motifs: 54
3, 1 17 viqsd KLVVVDFYADWCMPCRYISPILEKL skeyn 41 1.00 F 11499727
4, 1 22 llntt QYVVADFYADWCGPCKAIAPMYAQF aktfs 46 1.00 F 1174686
5, 1 19 ifsak KNVIVDFWAAWCGPCKLTSPEFQKA adefs 43 1.00 F 12044976
7, 1 21 kenhs KPILIDFYADWCPPCRMLIPVLDSI ekkhg 45 1.00 F 13358154
10, 1 17 aetse GVVLADFWAPWCGPCKMIAPVLEEL dqemg 41 1.00 F 135765
11, 1 29 akesn KLIVIDFTASWCPPCRMIAPIFNDL akkfm 53 1.00 F 1388082
12, 1 44 likqn DKLVIDFYATWCGPCKMMQPHLTKL iqayp 68 1.00 F 140543
14, 1 80 selrg KVVMLQFTASWCGVCRKEMPFIEKD iwlkh 104 1.00 F 14578634
16, 1 23 lqnsd KPVLVDFYATWCGPCQLMVPILNEV setlk 47 1.00 F 15218394
17, 1 157 adfrg RPLVINLWASWCPPCRREMPVLQQA qaenp 181 1.00 F 15597673
18, 1 26 ensfh KPVLVDFWADWCAPCKALMPLLAQI aesyq 50 1.00 F 15599256
19, 1 67 negkg KTILLNFWSETCGVCIAELKTFEQL lqsyp 91 1.00 F 15602312
20, 1 61 eefkg KVLLINFWATWCPPCKEEIPMFKEI yekyr 85 1.00 F 15605725
25, 1 60 sdyrg DVVILNVWASWCEPCRKEMPALMEL qsdye 84 1.00 F 15614085
26, 1 63 releg KGVFLNFWGTYCPPCEREMPHMEKL ygeyk 87 1.00 F 15614140
27, 1 72 sslrg QPVILHFFATWCPVCQDEMPSLVKL dkeyr 96 1.00 F 15615431
31, 1 2 m TVTLKDFYADWCGPCKTQDPILEEL eadyd 26 1.00 F 15791337
33, 1 72 adyrg RPVVLNFWASWCGPCREEAPLFAKL aahpg 96 1.00 F 15805225
34, 1 78 taaqg KPVVINFWASWCVPCRQEAPLFSKL sqeta 102 1.00 F 15805374
37, 1 49 fitkn KIVVVDFWAEWCAPCLILAPVIEEL andyp 73 1.00 F 15899007
40, 1 61 easrq QPVLVDFWAPWCGPCKQLTPVIEKV vreaa 85 1.00 F 15966937
41, 1 61 sdfrg KTLLVNLWATWCVPCRKEMPALDEL qgkls 85 1.00 F 15988313
42, 1 60 qdakg KKVLLNFWATWCKPCRQEMPAMEKL qkeya 84 1.00 F 16078864
43, 1 53 llqdd LPMVIDFWAPWCGPCRSFAPIFAET aaera 77 1.00 F 16123427
46, 1 21 vlkad GAILVDFWAEWCGPCKMIAPILDEI adeyq 45 1.00 F 1633495
48, 1 34 vlqcp KPILVYFGAPWCGLCHFVKPLLNHL hgewq 58 1.00 F 1651717
49, 1 60 tlsee RPVLLYFWASWCGVCRFTTPAVAHL aaege 84 1.00 F 16759994
50, 1 53 llkdd LPVVIDFWAPWCGPCRNFAPIFEDV aeers 77 1.00 F 16761507
52, 1 19 iissh PKILLNFWAEWCAPCRCFWPTLEQF aemee 43 1.00 F 16804867
54, 1 22 vlsed KVVVVDFTATWCGPCRLVSPLMDQL adeyk 46 1.00 F 17229859
55, 1 18 vlegt GYVLVDYFSDGCVPCKALMPAVEEL skkye 42 1.00 F 1729944
56, 1 28 rqhpe KIIILDFYATWCGPCKAIAPLYKEL atthk 52 1.00 F 17531233
57, 1 27 ehlkg KIIGLYFSASWCPPCRAFTPKLKEF feeik 51 1.00 F 17537401
58, 1 63 safrg QPVVINFWAPWCGPCVEEMPELSAL aqeqk 87 1.00 F 17547503
59, 1 286 seykg KTIFLNFWATWCPPCRGEMPYIDEL ykeyn 310 1.00 F 18309723
61, 1 44 dsllg KKIGLYFSAAWCGPCQRFTPQLVEV ynels 68 1.00 F 18406743
61, 2 364 sdlvg KTILMYFSAHWCPPCRAFTPKLVEV ykqik 388 1.00 F 18406743
63, 1 15 sdfeg EVVVLNAWGQWCAPCRAEVDDLQLV qetld 39 1.00 F 19554157
64, 1 39 eeykg KVVVINFWATWCGYCVEEMPGFEKV ykefg 63 1.00 F 19705357
66, 1 7 agdfm KPMLLDFSATWCGPCRMQKPILEEL ekkyg 31 1.00 F 20092028
69, 1 103 adykg KVVVLNVWGSWCPPCRAEAKNFEKV yqdvk 127 1.00 F 21222859
74, 1 53 sdfkg ERVLINFWTTWCPPCRQEMPDMQRF yqdlq 77 1.00 F 23098307
76, 1 20 kylqh QRVVVDFSAEWCGPCRAIAPVFDKL sneft 44 1.00 F 267116
77, 1 81 aafkg KVSLVNVWASWCVPCHDEAPLLTEL gkdkr 105 1.00 F 27375582
78, 1 34 vtsdn DVVLADFYADWCGPCQMLEPVVETL aeqtd 58 1.00 F 2822332
79, 1 77 sdlkg KKVILNFWATWCGPCQQEMPDMEAF ykehk 101 1.00 F 30021713
80, 1 19 tisan SNVLVYFWAPLCAPCDLFTPTYEAS srkhf 43 1.00 F 3261501
82, 1 19 tietn PLVIVDFWAPWCGSCKMLGPVLEEV esevg 43 1.00 F 3323237
83, 1 17 ektah QAVVVNVGASWCPDCRKIEPIMENL aktyk 41 1.00 F 4155972
84, 1 79 vvnse TPVVVDFHAQWCGPCKILGPRLEKM vakqh 103 1.00 F 4200327
91, 1 20 nenkg RLIVVDFFAQWCGPCRNIAPKVEAL akeip 44 1.00 F 6687568
93, 1 18 llttn KKVVVDFYANWCGPCKILGPIFEEV aqdkk 42 1.00 F 7109697
94, 1 21 ilaed KLVVIDFYADWCGPCKIIAPKLDEL aqqys 45 1.00 F 7290567
96, 1 49 adlqg KVTLINFWFPSCPGCVSEMPKIIKT andyk 73 1.00 F 15677788
* ************** * * * **
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif c = -1607.59351
Log Fragmentation portion of MAP for motif c = -10.42374
-------------------------------------------------------------------------
MOTIF d
Motif model (residue frequency x 100)
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t Info
_____________________________________________________________________________________________
1 | 2 . . 28 17 2 . 2 8 . . 17 . . 11 . . . 2 5 1.1
2 | 5 2 . . 5 34 . . . . 2 14 . . 17 . . 8 2 5 1.4
3 | 8 . . 5 11 . 14 . 2 . 28 . . 5 2 . . 17 . 2 1.0
4 | 2 . . 11 . . 62 . . . 5 . . 11 . . 2 . 2 . 2.1
5 | . . . . . . 2 . . . 57 . . 5 . . 5 22 5 . 2.2
7 | 2 68 . . . 2 5 . 2 . 2 . . 2 . . . . 2 8 1.9
8 | 2 62 . . . . . . 25 . . 8 . . . . . . . . 2.3
9 | . 8 . . . 8 . . 5 . . 77 . . . . . . . . 2.3
10 | 8 8 . . . 57 . . 2 . . 5 . . 11 . . . 2 2 2.1
11 | 14 2 . . . 62 8 . . . . . . . . . . . 11 . 2.4
12 | 2 11 . . . 14 . 11 2 5 . 5 . . 45 . . . . . 2.4
13 | . . . . . . . . . . . . . . . 100 . . . . 4.3
14 | 22 . . . . 2 28 2 . . 8 17 5 . 2 . . 8 . . 1.2
15 | 51 . . 42 . . . . . . . . . 2 . . . . 2 . 2.3
16 | . . . 2 . 71 2 . . 5 . . 5 . 5 . . . 5 . 2.9
17 | . . . . . . . . . . . . . . . . . . 11 88 3.6
18 | 5 . . . . 25 2 . . . . . . . . 51 2 . 11 . 2.4
19 | . 54 . . . . 20 . 8 . . . . . . . . . . 17 2.0
20 | . . 97 . . . . . . . . . . . . . . . 2 . 6.2
21 | 5 . . . . . . . . . . . . . . 28 2 . 20 42 2.3
22 | 14 2 . . . . 2 . . . 14 5 5 . . . 2 8 5 37 1.3
23 | . . . . 62 . . . . . 2 . . 8 . . 14 . 5 5 2.2
24 | 14 2 . . . 2 2 22 11 . . 31 11 . . . . . . . 1.8
27 | 2 2 . . 2 45 17 . 8 . . 8 . . 8 2 . . . . 1.6
28 | 11 . . 2 2 8 5 . . . . . . 2 14 . 5 22 22 . 1.4
nonsite 8 7 . 6 7 4 7 1 5 . 7 9 2 4 2 4 3 4 5 5
site 7 9 3 3 4 13 7 1 3 . 4 7 1 1 4 7 1 3 4 8
Motif probability model
____________________________________________
Pos. # a v c d e f g h i w k l m n y p q r s t
____________________________________________
1 | 0.029 0.001 0.000 0.283 0.170 0.029 0.001 0.028 0.085 0.000 0.001 0.170 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.057
2 | 0.058 0.029 0.000 0.001 0.057 0.339 0.001 0.000 0.001 0.000 0.029 0.142 0.000 0.001 0.169 0.001 0.000 0.085 0.029 0.057
3 | 0.086 0.001 0.000 0.057 0.114 0.001 0.142 0.000 0.029 0.000 0.283 0.001 0.000 0.057 0.029 0.001 0.000 0.170 0.001 0.029
4 | 0.029 0.001 0.000 0.114 0.001 0.001 0.621 0.000 0.001 0.000 0.057 0.001 0.000 0.113 0.000 0.001 0.029 0.001 0.029 0.001
5 | 0.001 0.001 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.564 0.001 0.000 0.057 0.000 0.001 0.057 0.226 0.057 0.001
7 | 0.029 0.677 0.000 0.001 0.001 0.029 0.057 0.000 0.029 0.000 0.029 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.085
8 | 0.029 0.621 0.000 0.001 0.001 0.001 0.001 0.000 0.254 0.000 0.001 0.086 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001
9 | 0.001 0.086 0.000 0.001 0.001 0.085 0.001 0.000 0.057 0.000 0.001 0.762 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.001
10 | 0.086 0.086 0.000 0.001 0.001 0.564 0.001 0.000 0.029 0.000 0.001 0.058 0.000 0.001 0.113 0.001 0.000 0.001 0.029 0.029
11 | 0.142 0.029 0.000 0.001 0.001 0.620 0.085 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.001
12 | 0.029 0.114 0.000 0.001 0.001 0.142 0.001 0.113 0.029 0.057 0.001 0.058 0.000 0.001 0.451 0.001 0.000 0.001 0.001 0.001
13 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.987 0.000 0.001 0.001 0.001
14 | 0.227 0.001 0.000 0.001 0.001 0.029 0.283 0.028 0.001 0.000 0.086 0.170 0.057 0.001 0.029 0.001 0.000 0.085 0.001 0.001
15 | 0.508 0.001 0.000 0.423 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.000 0.001 0.000 0.001 0.029 0.001
16 | 0.001 0.001 0.000 0.029 0.001 0.705 0.029 0.000 0.001 0.057 0.001 0.001 0.057 0.001 0.057 0.001 0.000 0.001 0.057 0.001
17 | 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.113 0.874
18 | 0.058 0.001 0.000 0.001 0.001 0.254 0.029 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.508 0.029 0.001 0.113 0.001
19 | 0.001 0.536 0.000 0.001 0.001 0.001 0.198 0.000 0.085 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.001 0.170
20 | 0.001 0.001 0.958 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.001 0.029 0.001
21 | 0.058 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.001 0.000 0.001 0.000 0.282 0.029 0.001 0.198 0.423
22 | 0.142 0.029 0.000 0.001 0.001 0.001 0.029 0.000 0.001 0.000 0.142 0.058 0.057 0.001 0.000 0.001 0.029 0.085 0.057 0.367
23 | 0.001 0.001 0.000 0.001 0.621 0.001 0.001 0.000 0.001 0.000 0.029 0.001 0.000 0.085 0.000 0.001 0.141 0.001 0.057 0.057
24 | 0.142 0.029 0.000 0.001 0.001 0.029 0.029 0.226 0.113 0.000 0.001 0.311 0.113 0.001 0.000 0.001 0.000 0.001 0.001 0.001
27 | 0.029 0.029 0.000 0.001 0.029 0.451 0.170 0.000 0.085 0.000 0.001 0.086 0.000 0.001 0.085 0.029 0.000 0.001 0.001 0.001
28 | 0.114 0.001 0.000 0.029 0.029 0.085 0.057 0.000 0.001 0.000 0.001 0.001 0.000 0.029 0.141 0.001 0.057 0.226 0.226 0.001
Background probability model
0.089 0.079 0.008 0.067 0.076 0.044 0.071 0.013 0.061 0.009 0.076 0.094 0.023 0.043 0.027 0.045 0.034 0.044 0.052 0.052
25 columns
Num Motifs: 35
1, 1 37 ldfdk EFRDKTVVIVAIPGAFTPTCTANHIPPF vekft 64 1.00 F 1091044
2, 1 32 irlsd YRGKKYVILFFYPANFTAISPTELMLLS drise 59 1.00 F 11467494
6, 1 26 eikei DLKSNWNVFFFYPYSYSFICPLELKNIS nkike 53 0.98 F 13186328
8, 1 28 kirls SYRGKWVVLFFYPADFTFVCPTEVEGFA edyek 55 1.00 F 13541053
9, 1 26 mrkls EFRGQNVVLAFFPGAFTSVCTKEMCTFR dsman 53 1.00 F 13541117
13, 1 25 melpd EFEGKWFILFSHPADFTPVCTTEFVAFQ evype 52 1.00 F 14286173
15, 1 25 kirls DFRGRIVVLYFYPRAMTPGCTREGVRFN ellde 52 1.00 F 14600438
22, 1 26 vtlrg YRGAKNVLLVFFPLAFTGICQGELDQLR dhlpe 53 1.00 F 15609375
23, 1 30 nvsla DYRGRRVIVYFYPAASTPGCTKQACDFR dnlgd 57 1.00 F 15609658
24, 1 24 tvsls DFKGKNIVLYFYPKDMTPGCTTEACDFR drved 51 1.00 F 15613511
28, 1 20 tfthv DLYGKYTILFFFPKAGTSGCTREAVEFS renfe 47 1.00 F 15643152
30, 1 61 gltda LADNRAVVLFFYPFDFSPVCATELCAIQ narwf 88 1.00 F 15790738
35, 1 26 itlss YRGQSHVVLVFYPLDFSPVCSMQLPEYS gsqdd 53 1.00 F 15807234
36, 1 28 vnlae LFKGKKGVLFGVPGAFTPGCSKTHLPGF veqae 55 1.00 F 15826629
38, 1 26 vkips DFKGKVVVLAFYPAAFTSVCTKEMCTFR dsmak 53 1.00 F 15899339
39, 1 30 vttel LFKGKRVVLFAVPGAFTPTCSLNHLPGY lenrd 57 1.00 F 15964668
44, 1 50 fnlak ALKKGPVVLYFFPAAYTAGCTAEAREFA eatpe 77 1.00 F 16125919
47, 1 31 fnfkq HTNGKTTVLFFWPMDFTFVCPSELIAFD kryee 58 1.00 F 16501671
51, 1 33 slekn IEDDKWTILFFYPMDFTFVCPTEIVAIS arsde 60 1.00 F 16803644
53, 1 31 vttdd LFAGKTVAVFSLPGAFTPTCSSTHLPGY nelak 58 1.00 F 17229033
60, 1 28 rlsev LKRGRPVVLLFFPGAFTSVCTKELCTFR dkmal 55 1.00 F 18313548
62, 1 26 eislq DYIGKYVVLAFYPLDFTFVCPTEINRFS dlkga 53 1.00 F 19173077
67, 1 27 evtek DTEGRWSVFFFYPADFTFVCPTELGDVA dhyee 54 1.00 F 20151112
68, 1 29 vdtht LFTGRKVVLFAVPGAFTPTCSAKHLPGY veqfe 56 1.00 F 21112072
70, 1 32 qinhk TYEGQWKVVFAWPKDFTFVCPTEIAAFG klnde 59 1.00 F 21223405
71, 1 28 eihly DLKGKKVLLSFHPLAWTQVCAQQMKSLE enyel 55 1.00 F 21227878
73, 1 25 mvsls EFKGRKVLLIFYPGDDTPVCTAQLCDYR nnvaa 52 1.00 F 21674812
81, 1 28 irfhd FLGDSWGILFSHPRDFTPVCTTELGRAA klape 55 1.00 F 3318841
85, 1 10 eidin EYKGKYVVLLFYPLDWTFVCPTEMIGYS evagq 37 1.00 F 4433065
86, 1 32 vsvhs IAAGKKVILFGVPGAFTPTCSMSHVPGF igkae 59 1.00 F 4704732
87, 1 28 fdfyk YVGDNWAILFSHPHDFTPVCTTELAEFG kmhee 55 1.00 F 4996210
88, 1 41 ynask EFANKKVVLFALPGAFTPVCSANHVPEY iqklp 68 1.00 F 5326864
89, 1 88 slkki TENNRVVVFFVYPRASTPGCTRQACGFR dnyqe 115 1.00 F 6322180
90, 1 43 ewskl ISENKKVIITGAPAAFSPTCTVSHIPGY inyld 70 0.99 F 6323138
95, 1 31 evkls DYKGKYVVLFFYPLDFTFVCPTEIIAFS nraed 58 1.00 F 9955016
***** ****************** **
Column 1 : Sequence Number, Site Number
Column 2 : Left End Location
Column 4 : Motif Element
Column 5 : Right End Location
Column 6 : Probability of Element
Column 7 : Forward Motif (F) or Reverse Complement (R)
Column 8 : Sequence Description from Fast A input
Log Motif portion of MAP for motif d = -1668.31468
Log Fragmentation portion of MAP for motif d = -7.86327
Log Background portion of Map = -39912.17887
Log Alignment portion of Map = -956.36102
Log Site/seq portion of Map = 0.00000
Log Null Map = -46943.36311
Log Map = 2112.13301
log MAP = sum of motif and fragmentation parts of MAP + background + alignment + sites/seq - Null
Frequency Map = 2109.909622
Nearopt Map = 2111.157969
Maximal Map = 2111.157969
Total Time 105 sec (1.750000 min)
Elapsed time: 104.960000 secs
DOF[0] = 190
DOF[1] = 285
DOF[2] = 380
DOF[3] = 475
"""
#run if called from command-line
if __name__ == "__main__":
    main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.greengenes import MinimalGreengenesParser, make_ignore_f,\
DefaultDelimitedSplitter, SpecificGreengenesParser
__author__ = "Daniel McDonald"
__copyright__ = "Copyright 2007-2012, The Cogent Project" #consider project name
__credits__ = ["Daniel McDonald"] #remember to add yourself if you make changes
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Daniel McDonald"
__email__ = "daniel.mcdonald@colorado.edu"
__status__ = "Prototype"
class ParseGreengenesRecordsTests(TestCase):
    def setUp(self):
        pass

    def test_MinimalGreengenesParser_mock(self):
        """Test MinimalGreengenesParser against mock data"""
        res = MinimalGreengenesParser(mock_data.splitlines(), RecStart="my_starting",
                                      RecEnd="my_ending")
        records = list(res)
        exp = [{'a':'1','b':'2','c':'3','d':'','e':'5'},
               {'q':'asdasd','c':'taco'}]
        self.assertEqual(records, exp)

    def test_MinimalGreengenesParser_real(self):
        """Test MinimalGreengenesParser against real data"""
        res = MinimalGreengenesParser(real_data.splitlines())
        record1, record2 = list(res)
        self.assertEqual(record1['G2_chip_tax_string'],'Unclassified')
        self.assertEqual(record1['authors'],'Hernanandez-Eugenio,G., Silva-Rojas,H.V., Zelaya-Molina,L.X.')
        self.assertEqual(record1['bel3_div_ratio'],'')
        self.assertEqual(len(record1), 72)
        self.assertEqual(record2['ncbi_acc_w_ver'],'FJ832719.1')
        self.assertEqual(record2['timestamp'],'2010-03-23 14:08:27')
        self.assertEqual(record2['title'],'Developmental Microbial Ecology of the Crop of the Folivorous Hoatzin')

    def test_SpecificGreengenesParser_real(self):
        """Test SpecificGreengenesParser against real data"""
        fields = ['prokMSA_id','journal']
        res = SpecificGreengenesParser(real_data.splitlines(), fields)
        records = list(res)
        exp = [('604868',''),('604867','ISME J (2010) In press')]
        self.assertEqual(records, exp)
        ids = ['604867','12312312323']
        res = SpecificGreengenesParser(real_data.splitlines(), fields, ids)
        records = list(res)
        exp = [('604867','ISME J (2010) In press')]
        self.assertEqual(records, exp)

    def test_make_ignore_f(self):
        """Properly ignore empty records and the start line"""
        f = make_ignore_f('testing')
        self.assertFalse(f(['asasdasd','']))
        self.assertFalse(f(['test','']))
        self.assertFalse(f(['testing2','']))
        self.assertFalse(f(['testing','asd']))
        self.assertTrue(f(['','']))
        self.assertTrue(f(None))
        self.assertTrue(f(['','']))
        self.assertTrue(f(['testing','']))
mock_data = """my_starting
a=1
b=2
c=3
d=
e=5
my_ending
my_starting
q=asdasd
c=taco
my_ending
"""
real_data = """BEGIN
G2_chip_tax_string=Unclassified
G2_chip_tax_string_format_2=Unclassified
HOMD_tax_string=
HOMD_tax_string_format_2=
Hugenholtz_tax_string=Unclassified
Hugenholtz_tax_string_format_2=Unclassified
Ludwig_tax_string=Unclassified
Ludwig_tax_string_format_2=Unclassified
Pace_tax_string=Unclassified
Pace_tax_string_format_2=Unclassified
RDP_tax_string=Unclassified
RDP_tax_string_format_2=Unclassified
Silva_tax_string=Unclassified
Silva_tax_string_format_2=Unclassified
authors=Hernanandez-Eugenio,G., Silva-Rojas,H.V., Zelaya-Molina,L.X.
bel3_div_ratio=
bellerophon=
blast_perc_ident_to_template=
clone=51a
contact_info=Irrigacion, Universidad Autonoma Chapingo, Carretera Mexico-Texcoco Km 37.5, Texcoco, Mexico 56230, Mexico
core_set_member=
core_set_member2=
country=Mexico: Mexico City
create_date=21-NOV-2009
db_name=
decision=clone
description=Uncultured bacterium clone 51a 16S ribosomal RNA gene, partial sequence
email=
gold_id=
img_oid=
isolate=
isolation_source=mesophilic anaerobic reactor fed with effluent from the chemical industry
journal=
longest_insertion=
medline_ids=
ncbi_acc=
ncbi_acc_w_ver=FJ461956.1
ncbi_gi=213390944
ncbi_seq_length=1512
ncbi_tax_id=77133
ncbi_tax_string=Bacteria; environmental samples
ncbi_tax_string_format_2=Unclassified
non_ACGT_count=
non_ACGT_percent=
note=
organism=uncultured bacterium
perc_ident_to_invariant_core=
prokMSA_id=604868
prokMSAname=Microbial ecology industrial digestor mesophilic anaerobic reactor fed effluent chemical industry clone 51a
pubmed_ids=
remark=
replaced_by=
single_nt_runs_over_7=
small_gap_intrusions=
source=uncultured bacterium
span_aligned=1..2
specific_host=
status=0
strain=
study_id=38002
sub_species=
submit_date=24-OCT-2008
template=
timestamp=2010-03-23 14:08:27
title=Microbial ecology of industrial anaerobic digestor
unaligned_length=
update_date=21-NOV-2009
warning=
wigeon95=
wigeon99=
wigeon_std_dev=
aligned_seq=unaligned
END
BEGIN
G2_chip_tax_string=Unclassified
G2_chip_tax_string_format_2=Unclassified
HOMD_tax_string=
HOMD_tax_string_format_2=
Hugenholtz_tax_string=Unclassified
Hugenholtz_tax_string_format_2=Unclassified
Ludwig_tax_string=Unclassified
Ludwig_tax_string_format_2=Unclassified
Pace_tax_string=Unclassified
Pace_tax_string_format_2=Unclassified
RDP_tax_string=Unclassified
RDP_tax_string_format_2=Unclassified
Silva_tax_string=Unclassified
Silva_tax_string_format_2=Unclassified
authors=Brodie,E.L., Dominguez-Bello,M.G., Garcia-Amado,M.A., Godoy-Vitorino,F., Goldfarb,K.C., Michelangeli,F.
bel3_div_ratio=
bellerophon=
blast_perc_ident_to_template=
clone=J3Q101_11C02
contact_info=Biology, University of Puerto Rico, Rio Piedras Campus, PO Box 23360, San Juan, PR 00931-3360, USA
core_set_member=
core_set_member2=
country=Venezuela
create_date=10-DEC-2009
db_name=
decision=clone
description=Uncultured bacterium clone J3Q101_11C02 16S ribosomal RNA gene, partial sequence
email=
gold_id=
img_oid=
isolate=
isolation_source=crop contents
journal=ISME J (2010) In press
longest_insertion=
medline_ids=
ncbi_acc=
ncbi_acc_w_ver=FJ832719.1
ncbi_gi=226447371
ncbi_seq_length=1326
ncbi_tax_id=77133
ncbi_tax_string=Bacteria; environmental samples
ncbi_tax_string_format_2=Unclassified
non_ACGT_count=
non_ACGT_percent=
note=
organism=uncultured bacterium
perc_ident_to_invariant_core=
prokMSA_id=604867
prokMSAname=Microbial Ecology Crop Folivorous Hoatzin crop contents clone J3Q101_11C02
pubmed_ids=
remark=
replaced_by=
single_nt_runs_over_7=
small_gap_intrusions=
source=uncultured bacterium
span_aligned=1..2
specific_host=
status=0
strain=
study_id=37901
sub_species=
submit_date=16-MAR-2009
template=
timestamp=2010-03-23 14:08:27
title=Developmental Microbial Ecology of the Crop of the Folivorous Hoatzin
unaligned_length=
update_date=10-DEC-2009
warning=
wigeon95=
wigeon99=
wigeon_std_dev=
aligned_seq=unaligned
END
"""
if __name__ == '__main__':
    main()
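The `mock_data` and `real_data` fixtures share one record layout: a start marker line, one `key=value` line per field (an empty value such as `d=` is kept as `''`), then an end marker. As a minimal sketch of that layout in plain Python, the hypothetical helper `parse_begin_end_records` below collects each record into a dict; it is written for illustration only and is not the implementation of `MinimalGreengenesParser`.

```python
def parse_begin_end_records(lines, rec_start="BEGIN", rec_end="END"):
    """Yield one dict per record delimited by rec_start/rec_end lines.

    Hypothetical helper for illustration; not cogent.parse.greengenes.
    """
    record = None
    for line in lines:
        line = line.strip()
        if line == rec_start:
            record = {}  # start a fresh record
        elif line == rec_end:
            if record is not None:
                yield record
            record = None
        elif record is not None and "=" in line:
            # Split on the first '=' only, so values may themselves contain '='
            key, value = line.split("=", 1)
            record[key] = value


sample = """my_starting
a=1
d=
my_ending
my_starting
c=taco
my_ending
"""
records = list(parse_begin_end_records(sample.splitlines(),
                                       "my_starting", "my_ending"))
print(records)  # [{'a': '1', 'd': ''}, {'c': 'taco'}]
```

Splitting only on the first `=` is a deliberate choice here: free-text fields such as `title` or `contact_info` could in principle contain `=`.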
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_illumina_sequence.py
"""Tests of Illumina sequence file parser.
"""
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import remove_files
from cogent.app.util import get_tmp_filename
from cogent.parse.illumina_sequence import (MinimalIlluminaSequenceParser)
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "gregcaporaso@gmail.com"
__status__ = "Production"
class ParseIlluminaSequenceTests(TestCase):
    """ Test of top-level Illumina parsing functions """

    def setUp(self):
        """Set up fixtures and write read 1 to a temporary file"""
        self.illumina_read1 = illumina_read1
        self.illumina_read2 = illumina_read2
        self.expected_read1 = expected_read1
        self.expected_read2 = expected_read2
        self.illumina_read1_fp = get_tmp_filename(
            prefix='ParseIlluminaTest', suffix='.txt')
        # Close the file explicitly so its contents are flushed before use
        with open(self.illumina_read1_fp, 'w') as f:
            f.write('\n'.join(self.illumina_read1))
        self.files_to_remove = [self.illumina_read1_fp]

    def tearDown(self):
        """Remove temporary files created in setUp"""
        remove_files(self.files_to_remove)

    def test_MinimalIlluminaSequenceParser(self):
        """ MinimalIlluminaSequenceParser functions as expected """
        actual_read1 = list(MinimalIlluminaSequenceParser(self.illumina_read1))
        self.assertEqual(actual_read1, self.expected_read1)
        actual_read2 = list(MinimalIlluminaSequenceParser(self.illumina_read2))
        self.assertEqual(actual_read2, self.expected_read2)

    def test_MinimalIlluminaSequenceParser_handles_filepath_as_input(self):
        """ MinimalIlluminaSequenceParser functions with filepath as input
        """
        actual_read1 = list(MinimalIlluminaSequenceParser(
            self.illumina_read1_fp))
        self.assertEqual(actual_read1, self.expected_read1)

    def test_MinimalIlluminaSequenceParser_handles_file_as_input(self):
        """ MinimalIlluminaSequenceParser functions with file handle as input
        """
        with open(self.illumina_read1_fp) as f:
            actual_read1 = list(MinimalIlluminaSequenceParser(f))
        self.assertEqual(actual_read1, self.expected_read1)
illumina_read1 = """HWI-6X_9267:1:1:4:1699#ACCACCC/1:TACGGAGGGTGCGAGCGTTAATCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAAAAAAAAAAAAAAAAAAAAAAA:abbbbbbbbbb`_`bbbbbb`bb^aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaDaabbBBBBBBBBBBBBBBBBBBB
HWI-6X_9267:1:1:4:390#ACCTCCC/1:GACAGGAGGAGCAAGTGTTATTCAAATTATGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAA:aaaaaaaaaa```aa\^_aa``aVaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaBaaaaa""".split('\n')
expected_read1 = [(["HWI-6X_9267","1","1","4","1699#ACCACCC/1"],
"TACGGAGGGTGCGAGCGTTAATCGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGAAAAAAAAAAAAAAAAAAAAAAA",
"abbbbbbbbbb`_`bbbbbb`bb^aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaDaabbBBBBBBBBBBBBBBBBBBB"),
(["HWI-6X_9267","1","1","4","390#ACCTCCC/1"],
"GACAGGAGGAGCAAGTGTTATTCAAATTATGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGAAAAAAAAAAAAAAAAAAAAAAA",
"aaaaaaaaaa```aa\^_aa``aVaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaBaaaaa")]
illumina_read2 = """HWI-6X_9267:1:1:4:1699#ACCACCC/2:TTTTAAAAAAAAGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTAAAAAAAAACCCCCCCGGGGGGGGTTTTTTTAATTATTC:aaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccBcccccccccccccccc```````BBBB
HWI-6X_9267:1:1:4:390#ACCTCCC/2:ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG:aaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbb""".split('\n')
expected_read2 = [(["HWI-6X_9267","1","1","4","1699#ACCACCC/2"],
"TTTTAAAAAAAAGGGGGGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCTTTTTTTTTTTTTAAAAAAAAACCCCCCCGGGGGGGGTTTTTTTAATTATTC",
"aaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccBcccccccccccccccc```````BBBB"),
(["HWI-6X_9267","1","1","4","390#ACCTCCC/2"],
"ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG",
"aaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbb")]
if __name__ == "__main__":
    main()
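The expected results above show each colon-delimited read line being split into five header fields, the sequence and the quality string. A minimal sketch of that split, assuming exactly seven colon-separated fields and no `:` inside the sequence or quality string, is the hypothetical `split_illumina_line` below; it is not the code behind `MinimalIlluminaSequenceParser`.

```python
def split_illumina_line(line):
    """Split a colon-delimited Illumina read line into (header, seq, qual).

    Hypothetical helper; assumes five header fields followed by the
    sequence and quality string, none of which contain ':'.
    """
    fields = line.strip().split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 colon-separated fields, got %d"
                         % len(fields))
    return fields[:5], fields[5], fields[6]


header, seq, qual = split_illumina_line(
    "HWI-6X_9267:1:1:4:390#ACCTCCC/2:ACGT:abcd")
print(header)  # ['HWI-6X_9267', '1', '1', '4', '390#ACCTCCC/2']
```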
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_ilm.py
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.ilm import ilm_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class IlmParserTest(TestCase):
    """Provides tests for ILM RNA secondary structure format parsers"""

    def setUp(self):
        """Setup function"""
        # raw ILM output (1-based matching)
        self.ilm_out = ILM
        # expected 0-based base pairs
        self.ilm_exp = [[(0,13),(1,12),(2,11),(6,7)]]

    def test_ilm_output(self):
        """Test for ilm format"""
        obs = ilm_parser(self.ilm_out)
        self.assertEqual(obs, self.ilm_exp)
ILM = ['\n', 'Final Matching:\n', '1 14\n', '2 13\n', '3 12\n', '4 0\n',
'5 0\n', '6 0\n', '7 8\n', '8 7\n', '9 0\n', '10 0\n', '11 0\n', '12 3\n',
'13 2\n', '14 1\n']
if __name__ == '__main__':
    main()
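The `ILM` fixture is a 1-based "Final Matching" in which every paired position appears twice (`1 14` and `14 1`) and a partner of `0` marks an unpaired base, while `ilm_exp` holds 0-based pairs with the smaller index first. The hypothetical `matching_to_pairs` below sketches that conversion for a single structure; it is an illustration, not `cogent.parse.ilm.ilm_parser`, which returns a list of such structures.

```python
def matching_to_pairs(lines):
    """Turn ILM-style 'Final Matching' lines into 0-based base pairs.

    Hypothetical helper.  Each data line is 'i j' (1-based); j == 0 means
    position i is unpaired.  Every pair is listed twice, so only i < j is kept.
    """
    pairs = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not all(p.isdigit() for p in parts):
            continue  # skip blank lines and the 'Final Matching:' header
        i, j = int(parts[0]), int(parts[1])
        if j != 0 and i < j:
            pairs.append((i - 1, j - 1))
    return pairs


ilm_lines = ['\n', 'Final Matching:\n', '1 14\n', '2 13\n', '3 12\n',
             '4 0\n', '5 0\n', '6 0\n', '7 8\n', '8 7\n', '9 0\n', '10 0\n',
             '11 0\n', '12 3\n', '13 2\n', '14 1\n']
pairs = matching_to_pairs(ilm_lines)
print(pairs)  # [(0, 13), (1, 12), (2, 11), (6, 7)]
```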
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_infernal.py
# test_infernal.py
from cogent.util.unit_test import TestCase, main
from cogent.parse.infernal import CmsearchParser,CmalignScoreParser
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Development"
class CmsearchParserTests(TestCase):
    """Tests for CmsearchParser.
    """

    def setUp(self):
        """setup for CmsearchParserTests.
        """
        self.basic_results_empty = """# command data
# date
# CM summary
# Post search summary
"""
        self.basic_results_hits = """# command data
# date
# CM summary
Model_1 Target_1 1 10 1 10 25.25 - 50
Model_1 Target_2 3 13 1 10 14.2 - 49
# Post search summary
"""
        self.basic_res = [['Model_1','Target_1', 1, 10, 1, 10, 25.25, '-', 50],
                          ['Model_1','Target_2', 3, 13, 1, 10, 14.2, '-', 49]]
        self.search_res = [['model1.cm','seq_0', 5, 23, 1, 19, 12.85, '-', 37],
                           ['model1.cm','seq_1', 1, 19, 1, 19, 14.36, '-', 47]]

    def test_cmsearch_parser_no_data(self):
        """CmsearchParser should return correct result given no data.
        """
        parser = CmsearchParser([])
        self.assertEqual(list(parser), [])

    def test_cmsearch_parser_no_res(self):
        """CmsearchParser should return correct result given no hits in result.
        """
        parser = CmsearchParser(self.basic_results_empty.split('\n'))
        self.assertEqual(list(parser), [])

    def test_cmsearch_parser_basic(self):
        """CmsearchParser should return correct result given basic output.
        """
        parser = CmsearchParser(self.basic_results_hits.split('\n'))
        self.assertEqual(list(parser), self.basic_res)

    def test_cmsearch_parser_full(self):
        """CmsearchParser should return correct result given cmsearch output.
        """
        parser = CmsearchParser(SEARCH_DATA.split('\n'))
        self.assertEqual(list(parser), self.search_res)
class CmalignScoreParserTests(TestCase):
    """Tests for CmalignScoreParser.
    """

    def setUp(self):
        """setup for CmalignScoreParserTests.
        """
        self.basic_results_hits = """# command: data
# date:
#
# cm summary
1 Target_1 83 55.02 2.94 0.956 00:00:00.01
2 Target_2 84 53.31 4.42 0.960 00:00:00.01
# post alignment summary
"""
        self.basic_res = [[1,'Target_1',83,55.02,2.94,0.956,'00:00:00.01'],
                          [2,'Target_2',84,53.31,4.42,0.960,'00:00:00.01']]
        self.search_res = \
            [[1,'AABL01002928.1/2363-2445',83,55.02,2.94,0.956,'00:00:00.01'],
             [2,'AACV01025780.1/26051-26134',84,53.31,4.42,0.960,'00:00:00.01']]

    def test_cmalign_score_parser_no_data(self):
        """CmalignScoreParser should return correct result given no data.
        """
        parser = CmalignScoreParser([])
        self.assertEqual(list(parser), [])

    def test_cmalign_score_parser_basic(self):
        """CmalignScoreParser should return correct result given basic output.
        """
        parser = CmalignScoreParser(self.basic_results_hits.split('\n'))
        self.assertEqual(list(parser), self.basic_res)

    def test_cmalign_score_parser_full(self):
        """CmalignScoreParser should return correct result given cmalign output.
        """
        parser = CmalignScoreParser(ALIGN_DATA.split('\n'))
        self.assertEqual(list(parser), self.search_res)
SEARCH_DATA = """# command: cmsearch -T 0.0 --tabfile /tmp/tmpQGr0PGVeaEvGUkw2TM3e.txt --informat FASTA /tmp/tmp40hq0MqFPLn2lAymQeAD.txt /tmp/tmplTEQNgv0UA7sFSV0Z2RL.txt
# date: Mon Nov 8 13:51:12 2010
# num seqs: 3
# dbsize(Mb): 0.000124
#
# Pre-search info for CM 1: model1.cm
#
# rnd mod alg cfg beta bit sc cut
# --- --- --- --- ----- ----------
# 1 hmm fwd loc - 3.00
# 2 cm cyk loc 1e-10 0.00
# 3 cm ins loc 1e-15 0.00
#
# CM: model1.cm
# target coord query coord
# ---------------------- ------------
# model name target name start stop start stop bit sc E-value GC%
# ------------------------------- ----------- ---------- ---------- ----- ----- -------- -------- ---
model1.cm seq_0 5 23 1 19 12.85 - 37
model1.cm seq_1 1 19 1 19 14.36 - 47
#
# Post-search info for CM 1: /tmp/tmpWmLUo5hsKH6nyib4nGMq.cm
#
# rnd mod alg cfg beta bit sc cut num hits surv fract
# --- --- --- --- ----- ---------- -------- ----------
# 1 hmm fwd loc - 3.00 2 0.4113
# 2 cm cyk loc 1e-10 0.00 2 0.4113
# 3 cm ins loc 1e-15 0.00 2 0.3065
#
# run time
# -----------
# 00:00:00"""
ALIGN_DATA = """# cmalign :: align sequences to an RNA CM
# INFERNAL 1.0.2 (October 2009)
# Copyright (C) 2009 HHMI Janelia Farm Research Campus
# Freely distributed under the GNU General Public License (GPLv3)
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# command: cmalign orig_alignments/RF01057_3NPQ_chain_A_results.cm RF01057_two_seqs.fasta
# date: Thu May 1 13:00:32 2011
#
# cm name algorithm config sub bands tau
# ------------------------- --------- ------ --- ----- ------
# RF01057_3NPQ_chain_A_resu opt acc global no hmm 1e-07
#
# bit scores
# ------------------
# seq idx seq name len total struct avg prob elapsed
# ------- -------------------------- ----- -------- -------- -------- -----------
1 AABL01002928.1/2363-2445 83 55.02 2.94 0.956 00:00:00.01
2 AACV01025780.1/26051-26134 84 53.31 4.42 0.960 00:00:00.01
# STOCKHOLM 1.0
#=GF AU Infernal 1.0.2
AABL01002928.1/2363-2445 CGCGCCGAGGAGCGCUGCGACGGCCCG...UCGAGGGCCGCCAGGCUCGG
AACV01025780.1/26051-26134 CCUGCCGAGGGGCGCUGCGACCGGAUCcaaUGAGGCCCGGCCAGGCUCGG
#=GC SS_cons :::::::::<--<<<--<--<<_____...________>>->--------
#=GC RF CuuuCCGAGGAGCGCUGcAACgGgcuc...uuacggcccGCcAGGCUCGG
AABL01002928.1/2363-2445 CGGGG...ACAAucgguUUUCCAACGGCGSUCUGUUUAU
AACV01025780.1/26051-26134 UAAGGuggCUUU.....GUAACAACGGCGCCCGGCUAGA
#=GC SS_cons -----...----.....--------->>>-->:::::::
#=GC RF aaagG...uaaa.....ccuaCAACGGCGCUCAcuCaca
//
#
# CPU time: 0.02u 0.00s 00:00:00.02 Elapsed: 00:00:00"""
if __name__ == '__main__':
    main()
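The `basic_res` and `search_res` fixtures show what a parsed cmsearch hit looks like: names stay strings, the four coordinates and the GC% become ints, the bit score becomes a float, and the E-value column is kept as `'-'`. A minimal sketch of that conversion, using hypothetical helpers `parse_cmsearch_tab_line` and `parse_cmsearch_tab` (not `CmsearchParser` itself) and taking the column order from the header in `SEARCH_DATA`, might look like this:

```python
def parse_cmsearch_tab_line(line):
    """Convert one cmsearch tabular hit line into typed fields.

    Hypothetical helper.  Column order follows the SEARCH_DATA header:
    model name, target name, target start/stop, query start/stop,
    bit score, E-value, GC%.
    """
    fields = line.split()
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    model, target = fields[0], fields[1]
    coords = [int(x) for x in fields[2:6]]
    bit_score = float(fields[6])
    # The fixtures only contain '-' here; any other value is read as a float
    e_value = fields[7] if fields[7] == "-" else float(fields[7])
    gc_percent = int(fields[8])
    return [model, target] + coords + [bit_score, e_value, gc_percent]


def parse_cmsearch_tab(lines):
    """Return typed hits, skipping '#' comment lines and blank lines."""
    return [parse_cmsearch_tab_line(line) for line in lines
            if line.strip() and not line.lstrip().startswith("#")]


hits = parse_cmsearch_tab([
    "# CM summary",
    "Model_1 Target_1 1 10 1 10 25.25 - 50",
    "",
])
print(hits)  # [['Model_1', 'Target_1', 1, 10, 1, 10, 25.25, '-', 50]]
```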
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_kegg_fasta.py
"""
Test code for kegg_fasta.py in cogent.parse.
"""
__author__ = "Jesse Zaneveld"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Zaneveld"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Zaneveld"
__email__ = "zaneveld@gmail.com"
__status__ = "Production"
from cogent.util.unit_test import TestCase, main
from cogent.parse.kegg_fasta import kegg_label_fields, parse_fasta
class ParseKeggFastaTests(TestCase):
    def test_kegg_label_fields(self):
        """kegg_label_fields should return fields from line"""
        # Format is species:gene_id [optional gene_name]; description.
        # Note that the '>' should already be stripped by the Fasta Parser
        test1 = \
            """stm:STM0001 thrL; thr operon leader peptide ; K08278 thr operon leader peptide"""
        test2 = \
            """stm:STM0002 thrA; bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]"""
        obs = kegg_label_fields(test1)
        exp = ('stm:STM0001','stm','STM0001',
               'thrL','thr operon leader peptide ; K08278 thr operon leader peptide')
        self.assertEqual(obs, exp)
        obs = kegg_label_fields(test2)
        exp = ('stm:STM0002', 'stm', 'STM0002', 'thrA',
               'bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]')
        self.assertEqual(obs, exp)

    def test_parse_fasta(self):
        """parse_fasta should parse KEGG FASTA lines"""
        obs = parse_fasta(TEST_KEGG_FASTA_LINES)
        exp = EXP_RESULT
        for i, entry in enumerate(obs):
            self.assertEqual(entry, exp[i])
TEST_KEGG_FASTA_LINES = \
[">stm:STM0001 thrL; thr operon leader peptide; K08278 thr operon leader peptide",\
"atgaaccgcatcagcaccaccaccattaccaccatcaccattaccacaggtaacggtgcgggctga",\
">stm:STM0002 thrA; bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K12524 bifunctional aspartokinase/homoserine dehydrogenase 1 [EC:2.7.2.4 1.1.1.3]",\
"atgcgagtgttgaagttcggcggtacatcagtggcaaatgcagaacgttttctgcgtgtt",\
"gccgatattctggaaagcaatgccaggcaagggcaggtagcgaccgtactttccgccccc"]
EXP_RESULT = \
["\t".join(["stm:STM0001","stm","STM0001",\
"thrL","thr operon leader peptide; K08278 thr operon leader peptide","atgaaccgcatcagcaccaccaccattaccaccatcaccattaccacaggtaacggtgcgggctga","\n"]),\
"\t".join(["stm:STM0002","stm","STM0002",\
"thrA","bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K12524 bifunctional aspartokinase/homoserine dehydrogenase 1 [EC:2.7.2.4 1.1.1.3]",\
"atgcgagtgttgaagttcggcggtacatcagtggcaaatgcagaacgttttctgcgtgttgccgatattctggaaagcaatgccaggcaagggcaggtagcgaccgtactttccgccccc","\n"])]
if __name__=="__main__":
    main()
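The comment in `test_kegg_label_fields` gives the label format as `species:gene_id [optional gene_name]; description`, and the expected tuples show five fields with the description kept verbatim (including the space before `; K08278`). The hypothetical `split_kegg_label` below sketches one way to produce those tuples; it assumes the optional gene name is a single token ending in `;`, which holds for both test labels but may not cover every KEGG header, and it is not the implementation of `kegg_label_fields`.

```python
def split_kegg_label(label):
    """Split 'species:gene_id [gene_name;] description' into five fields.

    Hypothetical helper.  Assumes the optional gene name is a single
    whitespace-free token ending in ';'.
    """
    kegg_id, rest = label.split(None, 1)
    species, gene_id = kegg_id.split(":", 1)
    first, _, remainder = rest.partition(" ")
    if first.endswith(";"):
        gene_name, description = first[:-1], remainder.strip()
    else:
        # No gene name: the whole remainder is the description
        gene_name, description = "", rest.strip()
    return kegg_id, species, gene_id, gene_name, description


fields = split_kegg_label(
    "stm:STM0001 thrL; thr operon leader peptide ; K08278 thr operon leader peptide")
print(fields[:4])  # ('stm:STM0001', 'stm', 'STM0001', 'thrL')
```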
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_kegg_ko.py
"""
Test code for kegg_ko.py in cogent.parse.
"""
__author__ = "Jesse Zaneveld"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Zaneveld"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Zaneveld"
__email__ = "zaneveld@gmail.com"
__status__ = "Production"
from cogent.util.unit_test import TestCase, main
from cogent.parse.kegg_ko import kegg_label_fields, \
    parse_kegg_taxonomy, ko_record_iterator, ko_record_splitter, \
    ko_default_parser, ko_first_field_parser, delete_comments, \
    ko_colon_fields, ko_colon_delimited_parser, _is_new_kegg_rec_group, \
    group_by_end_char, class_lines_to_fields, ko_class_parser, parse_ko, \
    parse_ko_file, make_tab_delimited_line_parser
class ParseKOTests(TestCase):
    # Note: this method lacks the 'test_' prefix, so the unittest loader
    # does not collect or run it.
    def make_tab_delimited_line_parser(self):
        """make_tab_delimited_line_parser should generate line parser"""
        line = "good\tbad:good\tgood\tgood\tbad:good\tgood"
        parse_fn = make_tab_delimited_line_parser([0,2,3,5])
        obs = parse_fn(line)
        exp = "good\tgood\tgood\tgood\tgood\tgood"
        self.assertEqual(obs, exp)

    def test_kegg_label_fields(self):
        """kegg_label_fields should return fields from line"""
        # Format is species:gene_id [optional gene_name]; description.
        # Note that the '>' should already be stripped by the Fasta Parser
        test1 = \
            """stm:STM0001 thrL; thr operon leader peptide ; K08278 thr operon leader peptide"""
        test2 = \
            """stm:STM0002 thrA; bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]"""
        obs = kegg_label_fields(test1)
        exp = ('stm:STM0001','stm','STM0001',
               'thrL','thr operon leader peptide ; K08278 thr operon leader peptide')
        self.assertEqual(obs, exp)
        obs = kegg_label_fields(test2)
        exp = ('stm:STM0002', 'stm', 'STM0002', 'thrA',
               'bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13); K00003 homoserine dehydrogenase [EC:1.1.1.3]; K00928 aspartate kinase [EC:2.7.2.4]')
        self.assertEqual(obs, exp)

    def test_ko_record_iterator(self):
        """ko_record_iterator should iterate over KO records"""
        recs = list(ko_record_iterator(TEST_KO_LINES))
        self.assertEqual(len(recs), 3)
        self.assertEqual(len(recs[0]), 31)
        exp = 'ENTRY K01559 KO\n'
        self.assertEqual(recs[0][0], exp)
        exp = ' RCI: RCIX1162 RCIX2396\n'
        self.assertEqual(recs[0][-1], exp)
        exp = 'ENTRY K01561 KO\n'
        self.assertEqual(recs[-1][0], exp)
        exp = ' MSE: Msed_1088\n'
        self.assertEqual(recs[-1][-1], exp)

    def test_ko_record_splitter(self):
        """ko_record_splitter should split ko lines into a dict of groups"""
        recs = list(ko_record_iterator(TEST_KO_LINES))
        split_recs = ko_record_splitter(recs[0])
        exp = ['GENES AFM: AFUA_4G13070\n',
               ' PHA: PSHAa2393\n',
               ' ABO: ABO_0668\n',
               ' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n',
               ' MPT: Mpe_A2274\n',
               ' BBA: Bd0910(catD)\n',
               ' GBE: GbCGDNIH1_0998 GbCGDNIH1_1171\n',
               ' FNU: FN1345\n',
               ' RBA: RB13257\n',
               ' HMA: rrnAC1925(mhpC)\n',
               ' RCI: RCIX1162 RCIX2396\n']
        self.assertEqual(exp, split_recs["GENES"])
        # Expected CLASS group; currently built but not asserted
        exp = ['CLASS Metabolism; Biosynthesis of Secondary Metabolites; Limonene and\n',
               ' pinene degradation [PATH:ko00903]\n',
               ' Metabolism; Xenobiotics Biodegradation and Metabolism; Caprolactam\n',
               ' degradation [PATH:ko00930]\n',
               ' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
               ' 1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT) degradation\n',
               ' [PATH:ko00351]\n',
               ' Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n',
               ' degradation via CoA ligation [PATH:ko00632]\n',
               ' Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n',
               ' degradation via hydroxylation [PATH:ko00362]\n']

    def test_ko_default_parser(self):
        """ko_default parser should strip out newlines and join lines together"""
        # Applies to 'NAME' and 'DEFINITION' lines
        default_line_1 = ['NAME E3.8.1.2\n']
        obs = ko_default_parser(default_line_1)
        self.assertEqual(obs, 'E3.8.1.2')
        default_line_2 = ['DEFINITION 2-haloacid dehalogenase [EC:3.8.1.2]\n']
        obs = ko_default_parser(default_line_2)
        self.assertEqual(obs, '2-haloacid dehalogenase [EC:3.8.1.2]')

    def test_ko_first_field_parser(self):
        """ko_first_field_parser should strip out newlines and join lines
        together (first field only)"""
        obs = ko_first_field_parser(['ENTRY K01559 KO\n'])
        exp = 'K01559'
        self.assertEqual(obs, exp)

    def test_delete_comments(self):
        """delete_comments should delete parenthetical comments from lines"""
        test_line = \
            "bifunctional aspartokinase I/homeserine dehydrogenase I (EC:2.7.2.4 1.1.1.13);"
        exp = "bifunctional aspartokinase I/homeserine dehydrogenase I ;"
        obs = delete_comments(test_line)
        self.assertEqual(obs, exp)
        nested_test_line = "text(comment1(comment2));"
        exp = "text;"
        obs = delete_comments(nested_test_line)
        self.assertEqual(obs, exp)

    def test_ko_colon_fields(self):
        """ko_colon_fields should convert lines to (key, [list of values])"""
        test_lines = [' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n']
        obs = ko_colon_fields(test_lines)
        exp = ('BXE', ['Bxe_B0037', 'Bxe_C0683', 'Bxe_C1002', 'Bxe_C1023'])
        self.assertEqual(obs, exp)
        test_lines = [' HMA: rrnAC1925(mhpC)\n']
        obs = ko_colon_fields(test_lines, without_comments=True)
        exp = ('HMA', ['rrnAC1925'])
        self.assertEqual(obs, exp)
        test_lines = [' HMA: rrnAC1925(mhpC)\n']
        obs = ko_colon_fields(test_lines, without_comments=False)
        exp = ('HMA', ['rrnAC1925(mhpC)'])
        self.assertEqual(obs, exp)

    def test_ko_colon_delimited_parser(self):
        """ko_colon_delimited_parser should return a dict of id: values for
        colon delimited lines"""
        test_lines = ['GENES AFM: AFUA_4G13070\n',
                      ' PHA: PSHAa2393\n',
                      ' ABO: ABO_0668\n',
                      ' BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n',
                      ' MPT: Mpe_A2274\n',
                      ' BBA: Bd0910(catD)\n',
                      ' GBE: GbCGDNIH1_0998 GbCGDNIH1_1171\n',
                      ' FNU: FN1345\n',
                      ' RBA: RB13257\n',
                      ' HMA: rrnAC1925(mhpC)\n',
                      ' RCI: RCIX1162 RCIX2396\n']
        obs = ko_colon_delimited_parser(test_lines, without_comments=True)
        self.assertEqual(obs['BXE'], ['Bxe_B0037','Bxe_C0683', 'Bxe_C1002','Bxe_C1023'])
        self.assertEqual(obs['PHA'], ['PSHAa2393'])
        # Check that comments are stripped
        self.assertEqual(obs['BBA'], ['Bd0910'])
        obs = ko_colon_delimited_parser(test_lines, without_comments=False)
        # Lines without comments shouldn't be affected
        self.assertEqual(obs['BXE'], ['Bxe_B0037','Bxe_C0683', 'Bxe_C1002','Bxe_C1023'])
        self.assertEqual(obs['PHA'], ['PSHAa2393'])
        # Comments should be preserved
        self.assertEqual(obs['BBA'], ['Bd0910(catD)'])

    def test_is_new_kegg_rec_group(self):
        """_is_new_kegg_rec_group should check for irregular field terminators in KEGG"""
        pass
        # Handle unusual KEGG fields.

    def test_group_by_end_char(self):
        """group_by_end_char should yield successive lines that end with a given
        char, plus the last group of lines"""
        class_lines = ['CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                       ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n',
                       ' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                       ' 1,2-Dichloroethane degradation [PATH:ko00631]\n']
        exp = [['CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n'],
               [' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                ' 1,2-Dichloroethane degradation [PATH:ko00631]\n']]
        for i, group in enumerate(group_by_end_char(class_lines)):
            self.assertEqual(group, exp[i])

    def test_class_lines_to_fields(self):
        """class_lines_to_fields should split groups of lines for one KO class
        definition"""
        class_lines1 = ['CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                        ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n']
        class_lines2 = [' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                        ' 1,2-Dichloroethane degradation [PATH:ko00631]\n']
        obs = class_lines_to_fields(class_lines1)
        exp = ('PATH:ko00361', ('Metabolism', 'Xenobiotics Biodegradation and Metabolism', 'gamma-Hexachlorocyclohexane degradation'))
        self.assertEqual(obs, exp)
        obs = class_lines_to_fields(class_lines2)
        exp = ('PATH:ko00631', ('Metabolism', 'Xenobiotics Biodegradation and Metabolism', '1,2-Dichloroethane degradation'))
        self.assertEqual(obs, exp)

    def test_ko_class_parser(self):
        """ko_class_parser should return fields"""
        class_lines = ('CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                       ' gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n',
                       ' Metabolism; Xenobiotics Biodegradation and Metabolism;\n',
                       ' 1,2-Dichloroethane degradation [PATH:ko00631]\n')
        exp = [('PATH:ko00361', ('Metabolism', 'Xenobiotics Biodegradation and Metabolism',
                                 'gamma-Hexachlorocyclohexane degradation')),
               ('PATH:ko00631', ('Metabolism', 'Xenobiotics Biodegradation and Metabolism',
                                 '1,2-Dichloroethane degradation'))]
        for i, obs in enumerate(ko_class_parser(class_lines)):
            self.assertEqual(obs, exp[i])

    def test_parse_ko(self):
        """parse_ko should parse a ko record into fields """
        lines = TEST_KO_LINES
        results = list(parse_ko(lines))
        # For each entry we expect a dict
        self.assertEqual(results[0]["ENTRY"], "K01559")
        self.assertEqual(results[1]["ENTRY"], "K01560")
        self.assertEqual(results[2]["ENTRY"], "K01561")
        self.assertEqual(results[0]["NAME"], "E3.7.1.-")
        self.assertEqual(results[1]["NAME"], "E3.8.1.2")
        self.assertEqual(results[2]["NAME"], "E3.8.1.3")
        self.assertEqual(results[0].get("DEFINITION"), None)  # case 1 has no def
        self.assertEqual(results[1]["DEFINITION"],
                         "2-haloacid dehalogenase [EC:3.8.1.2]")
        self.assertEqual(results[2]["DEFINITION"],
                         "haloacetate dehalogenase [EC:3.8.1.3]")
        self.assertEqual(len(results[0]["CLASS"]), 5)
        self.assertEqual(results[0]["CLASS"][4],
                         ('PATH:ko00362', ('Metabolism',
                                           'Xenobiotics Biodegradation and Metabolism',
                                           'Benzoate degradation via hydroxylation')))
        self.assertEqual(results[0]["DBLINKS"],
                         {'RN': ['R04488', 'R05100', 'R05363',
                                 'R05365', 'R06371', 'R07515',
                                 'R07831']})
        self.assertEqual(results[1]["DBLINKS"],
                         {'GO': ['0018784'], 'RN': ['R05287'], 'COG': ['COG1011']})
        self.assertEqual(results[2]["DBLINKS"],
                         {'GO': ['0018785'], 'RN': ['R05287']})
        self.assertEqual(results[0]["GENES"],
                         {'AFM': ['AFUA_4G13070'], 'FNU': ['FN1345'],
                          'GBE': ['GbCGDNIH1_0998', 'GbCGDNIH1_1171'],
                          'PHA': ['PSHAa2393'],
                          'BBA': ['Bd0910'],
                          'ABO': ['ABO_0668'],
                          'MPT': ['Mpe_A2274'],
                          'RCI': ['RCIX1162', 'RCIX2396'],
                          'BXE': ['Bxe_B0037', 'Bxe_C0683', 'Bxe_C1002', 'Bxe_C1023'],
                          'HMA': ['rrnAC1925'],
                          'RBA': ['RB13257']})
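`test_ko_colon_fields` and `test_ko_colon_delimited_parser` expect a line such as `' BXE: Bxe_B0037 Bxe_C0683'` to become a key plus a list of values, with a trailing parenthetical gene name (`Bd0910(catD)`) dropped when `without_comments` is true. The hypothetical `colon_fields` below sketches that behaviour for one line at a time (the real `ko_colon_fields` takes a list of lines); it also assumes that a leading section name such as `GENES` should be discarded, as the `'AFM'` key in `test_parse_ko` suggests.

```python
import re


def colon_fields(line, without_comments=True):
    """Parse one KEGG 'KEY: value value ...' line into (key, values).

    Hypothetical helper, not cogent.parse.kegg_ko.ko_colon_fields.
    A leading section name such as 'GENES' is dropped.
    """
    head, _, tail = line.partition(":")
    key = head.split()[-1]  # 'GENES AFM' -> 'AFM', ' BXE' -> 'BXE'
    values = tail.split()
    if without_comments:
        # Drop a trailing parenthetical, e.g. 'Bd0910(catD)' -> 'Bd0910'
        values = [re.sub(r"\(.*\)$", "", v) for v in values]
    return key, values


print(colon_fields(' BBA: Bd0910(catD)\n'))  # ('BBA', ['Bd0910'])
```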
TEST_KO_LINES = ['ENTRY K01559 KO\n', '\
NAME E3.7.1.-\n', '\
PATHWAY ko00351 1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT)\n', '\
degradation\n', '\
ko00362 Benzoate degradation via hydroxylation\n', '\
ko00632 Benzoate degradation via CoA ligation\n', '\
ko00903 Limonene and pinene degradation\n', '\
ko00930 Caprolactam degradation\n', '\
CLASS Metabolism; Biosynthesis of Secondary Metabolites; Limonene and\n', '\
pinene degradation [PATH:ko00903]\n', '\
Metabolism; Xenobiotics Biodegradation and Metabolism; Caprolactam\n', '\
degradation [PATH:ko00930]\n', '\
Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\
1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT) degradation\n', '\
[PATH:ko00351]\n', '\
Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n', '\
degradation via CoA ligation [PATH:ko00632]\n', '\
Metabolism; Xenobiotics Biodegradation and Metabolism; Benzoate\n', '\
degradation via hydroxylation [PATH:ko00362]\n', '\
DBLINKS RN: R04488 R05100 R05363 R05365 R06371 R07515 R07831\n', '\
GENES AFM: AFUA_4G13070\n', '\
PHA: PSHAa2393\n', '\
ABO: ABO_0668\n', '\
BXE: Bxe_B0037 Bxe_C0683 Bxe_C1002 Bxe_C1023\n', '\
MPT: Mpe_A2274\n', '\
BBA: Bd0910(catD)\n', '\
GBE: GbCGDNIH1_0998 GbCGDNIH1_1171\n', '\
FNU: FN1345\n', '\
RBA: RB13257\n', '\
HMA: rrnAC1925(mhpC)\n', '\
RCI: RCIX1162 RCIX2396\n', '\
///\n', '\
ENTRY K01560 KO\n', '\
NAME E3.8.1.2\n', '\
DEFINITION 2-haloacid dehalogenase [EC:3.8.1.2]\n', '\
PATHWAY ko00361 gamma-Hexachlorocyclohexane degradation\n', '\
ko00631 1,2-Dichloroethane degradation\n', '\
CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\
gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n', '\
Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\
1,2-Dichloroethane degradation [PATH:ko00631]\n', '\
DBLINKS RN: R05287\n', '\
COG: COG1011\n', '\
GO: 0018784\n', '\
GENES NCR: NCU03617\n', '\
ANI: AN5830.2 AN7918.2\n', '\
AFM: AFUA_2G07750 AFUA_5G14640 AFUA_8G05870\n', '\
AOR: AO090001000019 AO090003001435 AO090011000921\n', '\
PST: PSPTO_0247(dehII)\n', '\
PSP: PSPPH_1747(dehII1) PSPPH_5028(dehII2)\n', '\
ATU: Atu0797 Atu3405(hadL)\n', '\
ATC: AGR_C_1458 AGR_L_2834\n', '\
RET: RHE_CH00996(ypch00330) RHE_PF00342(ypf00173)\n', '\
MSE: Msed_0732\n', '\
///\n', '\
ENTRY K01561 KO\n', '\
NAME E3.8.1.3\n', '\
DEFINITION haloacetate dehalogenase [EC:3.8.1.3]\n', '\
PATHWAY ko00361 gamma-Hexachlorocyclohexane degradation\n', '\
ko00631 1,2-Dichloroethane degradation\n', '\
CLASS Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\
gamma-Hexachlorocyclohexane degradation [PATH:ko00361]\n', '\
Metabolism; Xenobiotics Biodegradation and Metabolism;\n', '\
1,2-Dichloroethane degradation [PATH:ko00631]\n', '\
DBLINKS RN: R05287\n', '\
GO: 0018785\n', '\
GENES RSO: RSc0256(dehH)\n', '\
REH: H16_A0197\n', '\
BPS: BPSL0329\n', '\
BPM: BURPS1710b_0537(dehH)\n', '\
BPD: BURPS668_0347\n', '\
STO: ST2570\n', '\
MSE: Msed_1088\n', '\
///\n']
if __name__=="__main__":
main()
PyCogent-1.5.3/tests/test_parse/test_kegg_pos.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.kegg_pos import parse_pos_lines, parse_pos_file
__author__ = "Jesse Zaneveld"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Zaneveld", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Zaneveld"
__email__ = "zaneveld@gmail.com"
__status__ = "Production"
"""
Test code for kegg_pos.py in cogent.parse.
"""
class ParsePosTests(TestCase):
def test_parse_pos_lines(self):
"""parse_pos_lines should prefix each line with the organism name from the file name"""
test_lines = \
['YPO0021 hemN 28982 28982..30355 1374\n',\
'YPO0022 glnG, glnT, ntrC 30409 complement(30409..31821) 1413\n',\
'YPO0023 glnL, ntrB 31829 complement(31829..32878) 1050\n',\
'YPO0024 glnA 33131 complement(33131..34540) 1410\n']
obs = parse_pos_lines(test_lines, file_name = 'y.pestis.pos')
exp = ['y.pestis\tYPO0021 hemN 28982 28982..30355 1374\n',\
'y.pestis\tYPO0022 glnG, glnT, ntrC 30409 complement(30409..31821) 1413\n',\
'y.pestis\tYPO0023 glnL, ntrB 31829 complement(31829..32878) 1050\n',\
'y.pestis\tYPO0024 glnA 33131 complement(33131..34540) 1410\n']
for i,parsed_line in enumerate(obs):
self.assertEqual(parsed_line, exp[i])
def test_pos_to_fields(self):
"""parse_pos_to_fields should open files and extract fields"""
# This test deliberately does nothing: the function under test is just a
# simple open/yield wrapper, equivalent to the demo blocks of other parsers.
# It is kept as a separate function so that handlers can call it directly.
pass
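# Illustrative sketch only. sketch_prefix_pos_lines is a hypothetical helper
# invented for this note; it is NOT cogent.parse.kegg_pos.parse_pos_lines.
# It assumes the behaviour is exactly what test_parse_pos_lines checks above:
# each line is prefixed with the file name minus its final extension, then a
# tab. The real parser may differ in details (e.g. path handling).
import os

def sketch_prefix_pos_lines(lines, file_name):
    """Yield each line prefixed with '<organism>\t' (hypothetical helper)."""
    # os.path.splitext drops only the last extension, so
    # 'y.pestis.pos' -> 'y.pestis' and the inner dot survives.
    organism = os.path.splitext(os.path.basename(file_name))[0]
    for line in lines:
        yield organism + '\t' + line

# Design note: a generator keeps memory flat for large .pos files, which
# matches the test above iterating over the result with enumerate().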
if __name__=="__main__":
main()
PyCogent-1.5.3/tests/test_parse/test_kegg_taxonomy.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.parse.kegg_taxonomy import parse_kegg_taxonomy
__author__ = "Jesse Zaneveld"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jesse Zaneveld","Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jesse Zaneveld"
__email__ = "zaneveld@gmail.com"
__status__ = "Production"
"""
Test code for kegg_taxonomy.py in cogent.parse.
"""
class ParseKEGGTaxonomy(TestCase):
def test_parse_kegg_taxonomy(self):
"""parse_kegg_taxonomy should return successive taxonomy entries from
lines"""
test_lines =\
['# Eukaryotes\n',\
'## Animals\n',\
'### Vertebrates\n',\
'#### Mammals\n',\
'T01001(2000)\thsa\tH.sapiens\tHomo sapiens (human)\n',\
'#### Birds\n',\
'T01006(2005)\tgga\tG.gallus\tGallus gallus (chicken)\n',\
'### Arthropods\n',\
'#### Insects\n',\
'T00030(2000)\tdme\tD.melanogaster\tDrosophila melanogaster (fruit fly)\n']
exp =\
['Eukaryotes\tAnimals\tVertebrates\tMammals\tT01001(2000)\thsa\tH.sapiens\tHomo sapiens (human)\tHomo\tsapiens\thuman\n',\
'Eukaryotes\tAnimals\tVertebrates\tBirds\tT01006(2005)\tgga\tG.gallus\tGallus gallus (chicken)\tGallus\tgallus\tchicken\n',\
'Eukaryotes\tAnimals\tArthropods\tInsects\tT00030(2000)\tdme\tD.melanogaster\tDrosophila melanogaster (fruit fly)\tDrosophila\tmelanogaster\tfruit fly\n']
obs = parse_kegg_taxonomy(test_lines)
for i,res in enumerate(obs):
self.assertEqual(res,exp[i])
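# Illustrative sketch only. sketch_flatten_kegg_taxonomy is a hypothetical
# helper, NOT cogent.parse.kegg_taxonomy.parse_kegg_taxonomy. It assumes the
# input/output contract shown in test_parse_kegg_taxonomy above: the number
# of leading '#' characters gives a heading's depth, and each data line is
# emitted with its current heading path, its own tab-separated fields, and
# genus, species and common name split out of the last field.

def sketch_flatten_kegg_taxonomy(lines):
    """Yield one tab-joined line per organism (hypothetical helper)."""
    levels = []  # current heading path, e.g. ['Eukaryotes', 'Animals', ...]
    for line in lines:
        line = line.rstrip('\n')
        if not line.strip():
            continue
        if line.startswith('#'):
            depth = len(line) - len(line.lstrip('#'))
            # A heading at depth d replaces level d and drops anything deeper.
            levels = levels[:depth - 1] + [line[depth:].strip()]
            continue
        fields = line.split('\t')
        full_name = fields[-1]  # e.g. 'Homo sapiens (human)'
        if '(' in full_name:
            binomial, common = full_name.split('(', 1)
            common = common.rstrip(')').strip()
        else:
            binomial, common = full_name, ''
        parts = binomial.split()
        genus = parts[0] if parts else ''
        species = parts[1] if len(parts) > 1 else ''
        yield '\t'.join(levels + fields + [genus, species, common]) + '\n'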
if __name__=="__main__":
main()
PyCogent-1.5.3/tests/test_parse/test_locuslink.py
#!/usr/bin/env python
"""Unit tests for locuslink-specific classes
"""
from cogent.parse.locuslink import ll_start,LLFinder,pipes,first_pipe,commas, \
_read_accession, _read_rell, _read_accnum, \
_read_map, _read_sts, _read_comp, _read_grif, _read_pmid, _read_go, \
_read_extannot, _read_cdd, _read_contig, LocusLink, LinesToLocusLink
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class locuslinkTests(TestCase):
"""Tests toplevel functions."""
def test_read_accession(self):
"""_read_accession should perform correct conversions"""
self.assertEqual(_read_accession('NP_035835|6755985|na\n'), \
{'Accession':'NP_035835','Gi':'6755985','Strain':'na'})
#check that it ignores additional fields
self.assertEqual(_read_accession('NG_002740|30172554|na|1|1315\n'), \
{'Accession':'NG_002740','Gi':'30172554','Strain':'na'})
def test_read_rell(self):
"""_read_rell should perform correct conversions"""
self.assertEqual(_read_rell(\
'related mRNA|AK090391|n|NM_153775--AK090391\n'), \
{'Description':'related mRNA','Id':'AK090391','IdType':'n',\
'Printable':'NM_153775--AK090391'})
def test_read_accnum(self):
"""_read_accnum should perform correct conversions"""
self.assertEqual(_read_accnum('NG_002740|30172554|na|1|1315\n'), \
{'Accession':'NG_002740','Gi':'30172554','Strain':'na',\
'Start':'1','End':'1315'})
def test_read_map(self):
"""_read_map should perform correct conversions"""
self.assertEqual(_read_map('10 C1|RefSeq|C|\n'), \
{'Location':'10 C1', 'Source':'RefSeq','Type':'C'})
def test_read_sts(self):
"""_read_sts should perform correct conversions"""
self.assertEqual(_read_sts('RH35858|2|37920|na|seq_map|epcr\n'), \
{'Name':'RH35858','Chromosome':'2','StsId':'37920', 'Segment':'na',\
'SequenceKnown':'seq_map', 'Evidence':'epcr'})
def test_read_cdd(self):
"""_read_cdd should perform correct conversions"""
self.assertEqual(_read_cdd(\
'Immunoglobulin C-2 Type|smart00408|103|na|4.388540e+01\n'),
{'Name':'Immunoglobulin C-2 Type','Key':'smart00408',\
'Score':'103', 'EValue':'na', 'BitScore':'4.388540e+01'})
def test_read_comp(self):
"""_read_comp should perform correct conversions"""
self.assertEqual(_read_comp(\
'10090|Map2k6|11|11 cM|26399|17|MAP2K6|ncbi_mgd\n'), \
{'TaxonId':'10090', 'Symbol':'Map2k6', 'Chromosome':'11', \
'Position':'11 cM', 'LocusId':'26399', 'ChromosomeSelf':'17', \
'SymbolSelf':'MAP2K6','MapName':'ncbi_mgd'})
def test_read_grif(self):
"""_read_grif should perform correct conversions"""
self.assertEqual(_read_grif('12037672|interaction with pRb\n'), \
{'PubMedId':'12037672', 'Description':'interaction with pRb'})
def test_read_pmid(self):
"""_read_pmid should perform correct conversions"""
self.assertEqual(_read_pmid('12875969,12817023,12743034\n'), \
['12875969','12817023','12743034'])
def test_read_go(self):
"""_read_go should perform correct conversions"""
self.assertEqual(_read_go(\
'molecular function|zinc ion binding|IEA|GO:0008270|GOA|na\n'), \
{'Category':'molecular function', 'Term':'zinc ion binding',\
'EvidenceCode':'IEA','GoId':'GO:0008270','Source':'GOA', \
'PubMedId':'na'})
def test_read_extannot(self):
"""_read_extannot should perform correct conversions"""
self.assertEqual(_read_extannot(\
'cellular role|Pol II transcription|NR|Proteome|8760285\n'), \
{'Category':'cellular role','Term':'Pol II transcription',\
'EvidenceCode':'NR', 'Source':'Proteome', 'PubMedId':'8760285'})
def test_read_contig(self):
"""_read_contig should perform correct conversions"""
self.assertEqual(_read_contig(\
'NT_011109.15|29800594|na|31124734|31133047|-|19|reference\n'),\
{'Accession':'NT_011109.15','Gi':'29800594','Strain':'na',\
'From':'31124734','To':'31133047','Orientation':'-',\
'Chromosome':'19','Assembly':'reference'})
def test_LinesToLocusLink(self):
"""LinesToLocusLink should give expected results on sample data"""
fake_file = \
""">>1
LOCUSID: 1
LOCUS_CONFIRMED: yes
LOCUS_TYPE: gene with protein product, function known or inferred
ORGANISM: Homo sapiens
STATUS: REVIEWED
NM: NM_130786|21071029|na
NP: NP_570602|21071030
CDD: Immunoglobulin C-2 Type|smart00408|103|na|4.388540e+01
PRODUCT: alpha 1B-glycoprotein
ASSEMBLY: AF414429,AK055885,AK056201
CONTIG: NT_011109.15|29800594|na|31124734|31133047|-|19|reference
EVID: supported by alignment with mRNA
XM: NM_130786|21071029|na
XP: NP_570602|21071030|na
ACCNUM: AC010642|9929687|na|43581|41119
TYPE: g
ACCNUM: AF414429|15778555|na|na|na
TYPE: m
PROT: AAL07469|15778556
ACCNUM: AK055885|16550723|na|na|na
TYPE: m
ACCNUM: AK056201|16551539|na|na|na
TYPE: m
ACCNUM: BC035719|23273475|na|na|na
TYPE: m
PROT: AAH35719|23273476
ACCNUM: none|na|na|na|na
TYPE: p
PROT: P04217|23503038
OFFICIAL_SYMBOL: A1BG
OFFICIAL_GENE_NAME: alpha-1-B glycoprotein
ALIAS_SYMBOL: A1B
ALIAS_SYMBOL: ABG
ALIAS_SYMBOL: GAB
PREFERRED_PRODUCT: alpha 1B-glycoprotein
SUMMARY: Summary: The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins.
CHR: 19
STS: RH65092|-|10673|na|na|epcr
STS: WI-16009|-|52209|na|na|epcr
STS: G59506|-|136670|na|na|epcr
COMP: 10090|A1bg|na|na|117586|19|A1BG|ncbi_mgd
COMP: 10090|A1bg|7|7 cM|117586|19|A1BG|ncbi_mgd
BUTTON: unigene.gif
LINK: http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=390608
UNIGENE: Hs.390608
OMIM: 138670
MAP: 19q13.4|RefSeq|C|
MAPLINK: default_human_gene|A1BG
BUTTON: snp.gif
LINK: http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=1
BUTTON: homol.gif
LINK: http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=1[loc]&TAXID=9606
BUTTON: ensembl.gif
LINK: http://www.ensembl.org/Homo_sapiens/contigview?geneid=NM_130786
BUTTON: ucsc.gif
LINK: http://genome.ucsc.edu/cgi-bin/hgTracks?org=human&position=NM_130786
BUTTON: mgc.gif
LINK: http://mgc.nci.nih.gov/Genes/GeneInfo?ORG=Hs&CID=390608
PMID: 12477932,8889549,3458201,2591067
GO: molecular function|molecular_function unknown|ND|GO:0005554|GOA|3458201
GO: biological process|biological_process unknown|ND|GO:0000004|GOA|na
GO: cellular component|extracellular|IDA|GO:0005576|GOA|3458201
>>386590
LOCUSID: 386590
LOCUS_CONFIRMED: yes
LOCUS_TYPE: gene with protein product, function known or inferred
ORGANISM: Danio rerio
ACCNUM: AF510108|31323727|na|na|na
TYPE: m
PROT: AAP47138|31323728
OFFICIAL_SYMBOL: tra1
OFFICIAL_GENE_NAME: tumor rejection antigen (gp96) 1
BUTTON: zfin.gif
LINK: http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-031002-1
PMID: 14499652"""
records = list(LLFinder(fake_file.split('\n')))
self.assertEqual(len(records), 2)
first, second = map(LinesToLocusLink, records)
#test the second one first, since it's shorter
self.assertEqual(second.LOCUSID, 386590)
self.assertEqual(second.LOCUS_CONFIRMED, 'yes')
self.assertEqual(second.LOCUS_TYPE, \
'gene with protein product, function known or inferred')
self.assertEqual(second.ORGANISM, 'Danio rerio')
self.assertEqual(second.ACCNUM, [{'Accession':'AF510108', \
'Gi':'31323727', 'Strain':'na','Start':'na','End':'na'}])
self.assertEqual(second.TYPE, ['m'])
self.assertEqual(second.PROT, \
[{'Accession':'AAP47138','Gi':'31323728'}])
self.assertEqual(second.OFFICIAL_SYMBOL, 'tra1')
self.assertEqual(second.OFFICIAL_GENE_NAME, \
'tumor rejection antigen (gp96) 1')
self.assertEqual(second.BUTTON, ['zfin.gif'])
self.assertEqual(second.LINK, \
['http://zfin.org/cgi-bin/ZFIN_jump?record=ZDB-GENE-031002-1'])
self.assertEqual(second.PMID, ['14499652'])
#now for the annoying test on the longer record
self.assertEqual(first.LOCUSID, 1)
self.assertEqual(first.LOCUS_CONFIRMED, 'yes')
self.assertEqual(first.ORGANISM, 'Homo sapiens')
self.assertEqual(first.LOCUS_TYPE, \
'gene with protein product, function known or inferred')
self.assertEqual(first.STATUS, 'REVIEWED')
self.assertEqual(first.NM, [{'Accession':'NM_130786','Gi':'21071029', \
'Strain':'na'}])
self.assertEqual(first.NP, [{'Accession':'NP_570602','Gi':'21071030'}])
self.assertEqual(first.CDD, [{'Name':'Immunoglobulin C-2 Type',\
'Key':'smart00408','Score':'103', 'EValue':'na',\
'BitScore':'4.388540e+01'}])
self.assertEqual(first.PRODUCT, ['alpha 1B-glycoprotein'])
self.assertEqual(first.ASSEMBLY, [['AF414429','AK055885','AK056201']])
self.assertEqual(first.CONTIG, [{'Accession':'NT_011109.15',\
'Gi':'29800594','Strain':'na', 'From':'31124734','To':'31133047',\
'Orientation':'-','Chromosome':'19','Assembly':'reference'}])
self.assertEqual(first.EVID, ['supported by alignment with mRNA'])
self.assertEqual(first.XM, [{'Accession':'NM_130786', 'Gi':'21071029', \
'Strain':'na'}])
self.assertEqual(first.XP, [{'Accession':'NP_570602', 'Gi':'21071030', \
'Strain':'na'}])
self.assertEqual(first.ACCNUM, [ \
{'Accession':'AC010642','Gi':'9929687','Strain':'na',\
'Start':'43581', 'End':'41119'},
{'Accession':'AF414429','Gi':'15778555','Strain':'na',\
'Start':'na', 'End':'na'},
{'Accession':'AK055885','Gi':'16550723','Strain':'na',\
'Start':'na', 'End':'na'},
{'Accession':'AK056201','Gi':'16551539','Strain':'na',\
'Start':'na', 'End':'na'},
{'Accession':'BC035719','Gi':'23273475','Strain':'na',\
'Start':'na', 'End':'na'},
{'Accession':'none','Gi':'na','Strain':'na',\
'Start':'na', 'End':'na'},
])
self.assertEqual(first.TYPE, ['g','m','m','m','m','p'])
self.assertEqual(first.PROT, [ \
{'Accession':'AAL07469', 'Gi':'15778556'},
{'Accession':'AAH35719', 'Gi':'23273476'},
{'Accession':'P04217', 'Gi':'23503038'},
])
self.assertEqual(first.OFFICIAL_SYMBOL, 'A1BG')
self.assertEqual(first.OFFICIAL_GENE_NAME, 'alpha-1-B glycoprotein')
self.assertEqual(first.ALIAS_SYMBOL, ['A1B','ABG','GAB'])
self.assertEqual(first.PREFERRED_PRODUCT, ['alpha 1B-glycoprotein'])
self.assertEqual(first.SUMMARY, ["""Summary: The protein encoded by this gene is a plasma glycoprotein of unknown function. The protein shows sequence similarity to the variable regions of some immunoglobulin supergene family member proteins."""])
self.assertEqual(first.CHR, ['19'])
self.assertEqual(first.STS, [
{'Name':'RH65092','Chromosome':'-','StsId':'10673','Segment':'na',\
'SequenceKnown':'na','Evidence':'epcr'},
{'Name':'WI-16009','Chromosome':'-','StsId':'52209','Segment':'na',\
'SequenceKnown':'na','Evidence':'epcr'},
{'Name':'G59506','Chromosome':'-','StsId':'136670','Segment':'na',\
'SequenceKnown':'na','Evidence':'epcr'},
])
self.assertEqual(first.COMP, [
{'TaxonId':'10090','Symbol':'A1bg','Chromosome':'na','Position':'na',\
'LocusId':'117586', 'ChromosomeSelf':'19','SymbolSelf':'A1BG',\
'MapName':'ncbi_mgd'},
{'TaxonId':'10090','Symbol':'A1bg','Chromosome':'7','Position':'7 cM',\
'LocusId':'117586', 'ChromosomeSelf':'19','SymbolSelf':'A1BG',\
'MapName':'ncbi_mgd'},
])
self.assertEqual(first.BUTTON, ['unigene.gif','snp.gif','homol.gif', \
'ensembl.gif', 'ucsc.gif', 'mgc.gif'])
self.assertEqual(first.LINK, [ \
'http://www.ncbi.nlm.nih.gov/UniGene/clust.cgi?ORG=Hs&CID=390608',
'http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?locusId=1',
'http://www.ncbi.nlm.nih.gov/HomoloGene/homolquery.cgi?TEXT=1[loc]&TAXID=9606',
'http://www.ensembl.org/Homo_sapiens/contigview?geneid=NM_130786',
'http://genome.ucsc.edu/cgi-bin/hgTracks?org=human&position=NM_130786',
'http://mgc.nci.nih.gov/Genes/GeneInfo?ORG=Hs&CID=390608',
])
self.assertEqual(first.UNIGENE, ['Hs.390608'])
self.assertEqual(first.OMIM, ['138670'])
self.assertEqual(first.MAP, [{'Location':'19q13.4','Source':'RefSeq',\
'Type':'C'}])
self.assertEqual(first.MAPLINK, ['default_human_gene|A1BG'])
self.assertEqual(first.PMID, ['12477932','8889549','3458201','2591067'])
self.assertEqual(first.GO, [ \
{'Category':'molecular function','Term':'molecular_function unknown',\
'EvidenceCode':'ND','GoId':'GO:0005554','Source':'GOA',\
'PubMedId':'3458201'},
{'Category':'biological process','Term':'biological_process unknown',\
'EvidenceCode':'ND','GoId':'GO:0000004','Source':'GOA',\
'PubMedId':'na'},
{'Category':'cellular component','Term':'extracellular',\
'EvidenceCode':'IDA','GoId':'GO:0005576','Source':'GOA',\
'PubMedId':'3458201'},
])
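# Illustrative sketch only. sketch_pipe_reader and sketch_split_ll_records
# are hypothetical helpers, NOT the cogent.parse.locuslink implementations
# (_read_accession, LLFinder, ...). They assume only what the tests above
# show: fields are '|'-separated, surplus fields are ignored, and each
# record starts at a line beginning with '>>'. Type conversion (e.g.
# LOCUSID becoming an int) is deliberately left out.

def sketch_pipe_reader(field_names):
    """Return a function mapping 'a|b|c\n' to {name: value} (hypothetical)."""
    def read(line):
        values = line.rstrip('\n').split('|')
        # zip stops at the shorter sequence, so extra trailing fields are
        # dropped, as test_read_accession expects.
        return dict(zip(field_names, values))
    return read

def sketch_split_ll_records(lines):
    """Group lines into records that each start at a '>>' header line.

    Lines before the first '>>' header are discarded in this sketch.
    """
    record = []
    for line in lines:
        if line.startswith('>>'):
            if record:
                yield record
            record = [line]
        elif record:
            record.append(line)
    if record:
        yield record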
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_parse/test_mage.py
#!/usr/bin/env python
"""Tests of MageParser
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.mage import MageParser, MageGroupFromString,\
MageListFromString, MagePointFromString
from cogent.format.mage import MagePoint, MageList, MageGroup, Kinemage
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class MageGroupFromStringTests(TestCase):
"""Tests for the MageGroupFromString function"""
def test_MageGroupFromString(self):
"""MageGroupFromString should build a MageGroup from a string correctly"""
f = MageGroupFromString
group = f('@group {GA manifold} off')
self.assertEqual(str(group),'@group {GA manifold} off')
group = f('@group {dna} recessiveon off dominant '+\
'master= {master name} nobutton clone={clone_name}\n')
self.assertEqual(str(group),\
'@group {dna} off nobutton recessiveon dominant '+\
'master={master name} clone={clone_name}')
group = f(\
'@subgroup {max_group} recessiveon instance= {inst \tname} lens')
self.assertEqual(str(group),
'@subgroup {max_group} recessiveon lens instance={inst \tname}')
group = f('@subgroup {Pos 1} on')
self.assertEqual(str(group), '@subgroup {Pos 1}')
def test_MageGroupFromString_wrong(self):
"""MageGroupFromString should fail on wrong input"""
f = MageGroupFromString
# unknown keyword
self.assertRaises(KeyError,f,'@group {GA manifold} of')
# wrong nesting
self.assertRaises(ValueError,f,'@group {GA manifold} master={blabla')
class MageListFromStringTests(TestCase):
"""Tests for the MageListFromString function"""
def test_MageListFromString(self):
"""MageListFromString should build a MageList from a string correctly"""
f = MageListFromString
l = f('@dotlist {label} color= green radius=0.3 off \t nobutton\n' )
self.assertEqual(str(l),\
'@dotlist {label} off nobutton color=green radius=0.3')
l = f('@vectorlist')
self.assertEqual(str(l),'@vectorlist')
l = f('@balllist off angle= 4 width= 2 face=something '+\
'font=\tother size= 3')
self.assertEqual(str(l),'@balllist off angle=4 width=2 '+\
'face=something font=other size=3')
l = f('@dotlist {} on nobutton color=sky')
self.assertEqual(str(l),'@dotlist nobutton color=sky')
def test_MageListFromString_wrong(self):
"""MageListFromString should fail on wrong input"""
f = MageListFromString
self.assertRaises(ValueError,f,
'@somelist {label} color= green radius=0.3 off \t nobutton\n')
self.assertRaises(KeyError,f,
'@vectorlist {label} colors= green radius=0.3 off \t nobutton\n')
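# Illustrative sketch only. sketch_split_mage_header is a hypothetical helper,
# NOT how cogent.parse.mage is implemented. It assumes the header syntax seen
# in the tests above: '@keyword {optional label} flags key=value ...', where
# whitespace may follow '=' and values may be braced. The label is taken
# first so that '=' inside a label such as {A=U&C=G} is not read as an option.
import re

_LEADING_LABEL = re.compile(r'\s*\{([^}]*)\}')
_KEY_VALUE = re.compile(r'(\w+)=\s*(\{[^}]*\}|\S+)')

def sketch_split_mage_header(line):
    """Return (keyword, label, flags, options) for a Mage header line."""
    parts = line.strip().split(None, 1)
    keyword = parts[0]
    rest = parts[1] if len(parts) > 1 else ''
    label = None
    match = _LEADING_LABEL.match(rest)
    if match:
        label = match.group(1)
        rest = rest[match.end():]
    options = dict(_KEY_VALUE.findall(rest))
    # Whatever is left after removing key=value pairs is a bare flag.
    flags = _KEY_VALUE.sub(' ', rest).split()
    return keyword, label, flags, options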
class MagePointFromStringTests(TestCase):
"""Tests of the MagePointFromString factory function."""
def test_MagePointFromString(self):
"""MagePointFromString should build a MagePoint from a string correctly"""
m = MagePointFromString('{construction}width5 0.000 0.707 -1.225\n')
self.assertEqual(str(m), \
'{construction} width5 ' + ' '.join(map(str, [0.0,0.707,-1.225])))
m = MagePointFromString('3, 4, 5')
self.assertEqual(str(m), ' '.join(map(str, map(float, [3, 4, 5]))))
m = MagePointFromString('{b2}P 0.000 0.000 0.000')
self.assertEqual(str(m), '{b2} P ' + \
' '.join(map(str, map(float, [0,0,0]))))
m = MagePointFromString('P -2650192.000 4309510.000 3872241.000')
self.assertEqual(str(m), 'P ' + \
' '.join(map(str, map(float, [-2650192,4309510,3872241]))))
m = MagePointFromString('{"}P -2685992.000 5752262.000 535328.000')
self.assertEqual(str(m), '{"} P ' + \
' '.join(map(str, map(float, [-2685992,5752262,535328]))))
m = MagePointFromString('{ 1, d, 0 } P 1.000, 0.618, 0.000')
self.assertEqual(str(m), '{ 1, d, 0 } P ' + \
' '.join(map(str, map(float, [1.000, 0.618, 0.000]))))
m = MagePointFromString('{"}width1 -1.022 0.969 -0.131')
self.assertEqual(str(m), '{"} width1 ' + \
' '.join(map(str, map(float, [-1.022,0.969,-0.131]))))
m = MagePointFromString(\
'width3 {A label with spaces } A blue r=3.7 5, 6, 7')
self.assertEqual(m.Width, 3)
self.assertEqual(m.Label, 'A label with spaces ')
self.assertFloatEqual(m.Coordinates, [5, 6, 7])
self.assertFloatEqual(m.Radius, 3.7)
self.assertEqual(m.Color, 'blue')
self.assertEqual(m.State, 'A')
self.assertEqual(str(m),'{A label with spaces } A blue width3 r=3.7 ' +\
' '.join(map(str, map(float, [5, 6, 7]))))
class MageParserTests(TestCase):
"""Tests for the MageParser"""
def test_MageParser(self):
"""MageParser should work on valid input"""
obs = str(MageParser(EXAMPLE_1.split('\n'))).split('\n')
exp = EXP_EXAMPLE_1.split('\n')
self.assertEqual(len(obs), len(exp))
#first check per line; easier for debugging
for x in range(len(obs)):
self.assertEqual(obs[x],exp[x])
#double check to see if the whole string is the same
self.assertEqual(str(MageParser(EXAMPLE_1.split('\n'))),EXP_EXAMPLE_1)
EXAMPLE_1 = """
@text
Kinemage of ribosomal RNA SSU Bacteria
@kinemage1
@caption
SSU Bacteria secondary structure elements
@viewid {oblique}
@zoom 1.05
@zslab 467
@center 0.500 0.289 0.204
@matrix
-0.55836 -0.72046 -0.41133 0.82346 -0.42101 -0.38036 0.10085 -0.55108
0.82833
@2viewid {top}
@2zoom 0.82
@2zslab 470
@2center 0.500 0.289 0.204
@2matrix
-0.38337 0.43731 -0.81351 0.87217 -0.11840 -0.47466 -0.30389 -0.89148
-0.33602
@3viewid {side}
@3zoom 0.82
@3zslab 470
@3center 0.500 0.289 0.204
@3matrix
-0.49808 -0.81559 -0.29450 0.86714 -0.46911 -0.16738 -0.00164 -0.33875
0.94088
@4viewid {End-O-Line}
@4zoom 1.43
@4zslab 469
@4center 0.500 0.289 0.204
@4matrix
0.00348 -0.99984 -0.01766 0.57533 -0.01244 0.81784 -0.81792 -0.01301
0.57519
@perspective
@fontsizelabel 24
@onewidth
@zclipoff
@localrotation 1 0 0 .5 .866 0 .5 .289 .816
@group {Tetrahedron}
@vectorlist {Edges} color=white nobutton
P {0 0 0} 0 0 0
0.5 0 0
{1 0 0} 1 0 0
0.5 0.5 0
{0 1 0} 0 1 0
P 0 0 0
0 0.5 0
{0 1 0} 0 1 0
0 0.5 0.5
{0 0 1} 0 0 1
P 0 0 0
0 0 0.5
{0 0 1} 0 0 1
0.5 0 0.5
{1 0 0} 1 0 0
@labellist {labels} color=white nobutton
{U} 0 0 0
{A} 1.1 0 0
{C} 0 1.05 0
{G} 0 0 1.08
@group {Lines}
@vectorlist {A=U&C=G} color= green off
P 0 0.5 0.5
.1 .4 .4
.25 .25 .25
.4 .1 .1
L 0.500, 0.000, 0.000
@vectorlist {A=G&C=U} color= red off
P 0.5 0 0.5
.25 .25 .25
L 0, 0.500, 0.000
@vectorlist {A=C&G=U} color= red off
P 0.5 0.5 0
.25 .25 .25
L 0.000, 0.000, 0.500
@group {SSU Bacteria} recessiveon
@dotlist {Stem} radius=0.03 color= orange
{a} .3 .1 .4
{b} r=.2 .1 .1 .1
@balllist {Junction} radius=.04
{c} red .4 .4 0
{}\t r=.1 green .3 .2 .1
@group {empty group}
@group {Nested}
@subgroup {First}
@spherelist color=\tpurple
{e} .1 .1 .1
@subgroup {Second} master=\t {master name}
@labellist {labels}
{U} 0 0 0
{A} 1.1 0 0
{C} 0 1.05 0
{G} 0 0 1.08
"""
EXP_EXAMPLE_1 =\
"""@kinemage 1
@viewid {oblique}
@zoom 1.05
@zslab 467
@center 0.500 0.289 0.204
@matrix
-0.55836 -0.72046 -0.41133 0.82346 -0.42101 -0.38036 0.10085 -0.55108
0.82833
@2viewid {top}
@2zoom 0.82
@2zslab 470
@2center 0.500 0.289 0.204
@2matrix
-0.38337 0.43731 -0.81351 0.87217 -0.11840 -0.47466 -0.30389 -0.89148
-0.33602
@3viewid {side}
@3zoom 0.82
@3zslab 470
@3center 0.500 0.289 0.204
@3matrix
-0.49808 -0.81559 -0.29450 0.86714 -0.46911 -0.16738 -0.00164 -0.33875
0.94088
@4viewid {End-O-Line}
@4zoom 1.43
@4zslab 469
@4center 0.500 0.289 0.204
@4matrix
0.00348 -0.99984 -0.01766 0.57533 -0.01244 0.81784 -0.81792 -0.01301
0.57519
@perspective
@fontsizelabel 24
@onewidth
@zclipoff
@localrotation 1 0 0 .5 .866 0 .5 .289 .816
@text
Kinemage of ribosomal RNA SSU Bacteria
@caption
SSU Bacteria secondary structure elements
@group {Tetrahedron}
@vectorlist {Edges} nobutton color=white
{0 0 0} P 0.0 0.0 0.0
0.5 0.0 0.0
{1 0 0} 1.0 0.0 0.0
0.5 0.5 0.0
{0 1 0} 0.0 1.0 0.0
P 0.0 0.0 0.0
0.0 0.5 0.0
{0 1 0} 0.0 1.0 0.0
0.0 0.5 0.5
{0 0 1} 0.0 0.0 1.0
P 0.0 0.0 0.0
0.0 0.0 0.5
{0 0 1} 0.0 0.0 1.0
0.5 0.0 0.5
{1 0 0} 1.0 0.0 0.0
@labellist {labels} nobutton color=white
{U} 0.0 0.0 0.0
{A} 1.1 0.0 0.0
{C} 0.0 1.05 0.0
{G} 0.0 0.0 1.08
@group {Lines}
@vectorlist {A=U&C=G} off color=green
P 0.0 0.5 0.5
0.1 0.4 0.4
0.25 0.25 0.25
0.4 0.1 0.1
L 0.5 0.0 0.0
@vectorlist {A=G&C=U} off color=red
P 0.5 0.0 0.5
0.25 0.25 0.25
L 0.0 0.5 0.0
@vectorlist {A=C&G=U} off color=red
P 0.5 0.5 0.0
0.25 0.25 0.25
L 0.0 0.0 0.5
@group {SSU Bacteria} recessiveon
@dotlist {Stem} color=orange radius=0.03
{a} 0.3 0.1 0.4
{b} r=0.2 0.1 0.1 0.1
@balllist {Junction} radius=.04
{c} red 0.4 0.4 0.0
{} green r=0.1 0.3 0.2 0.1
@group {empty group}
@group {Nested}
@subgroup {First}
@spherelist color=purple
{e} 0.1 0.1 0.1
@subgroup {Second} master={master name}
@labellist {labels}
{U} 0.0 0.0 0.0
{A} 1.1 0.0 0.0
{C} 0.0 1.05 0.0
{G} 0.0 0.0 1.08"""
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_parse/test_meme.py
#!/usr/bin/env python
"""Tests for the MEME parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
import string
import re
from cogent.motif.util import Motif
from cogent.core.moltype import DNA
from cogent.parse.record import DelimitedSplitter
from cogent.parse.record_finder import LabeledRecordFinder
from cogent.parse.meme import *
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jermey.widmann@colorado.edu"
__status__ = "Production"
class MemeTests(TestCase):
"""Tests for meme module.
"""
def setUp(self):
"""Setup function for meme tests.
"""
#Meme output data:
self.meme_file = MEME_FILE.split('\n')
self.meme_main = LabeledRecordFinder(lambda x: x.startswith('COMMAND'))
self.meme_command = LabeledRecordFinder(lambda x: x.startswith('MOTIF'))
self.meme_summary = LabeledRecordFinder(lambda x: x.startswith('SUMMARY'))
self.meme_module = LabeledRecordFinder(lambda x: x.startswith('Motif'))
self.alphabet_block, self.main_block = \
list(self.meme_main(self.meme_file))
self.cmd_mod_list = list(self.meme_command(self.main_block))
self.command_block = self.cmd_mod_list[0]
self.module_blocks = self.cmd_mod_list[1:]
self.summary_block = list(self.meme_summary(self.module_blocks[-1]))[1]
self.module_data_blocks = []
for module in self.module_blocks:
self.module_data_blocks.append(\
list(self.meme_module(module)))
#List and Dict for testing dictFromList function
self.sample_list = ['key1',1,'key2',2,'key3',3,'key4',4]
self.sample_dict = {'key1':1,
'key2':2,
'key3':3,
'key4':4,
}
#List of command line data
self.command_line_list = [
'model: mod= tcm nmotifs= 3 evt= 1e+100',
'object function= E-value of product of p-values',
'width: minw= 4 maxw= 10 minic= 0.00',
'width: wg= 11 ws= 1 endgaps= yes',
'nsites: minsites= 2 maxsites= 50 wnsites= 0.8',
'theta: prob= 1 spmap= uni spfuzz= 0.5',
'em: prior= dirichlet b= 0.01 maxiter= 20',
'distance= 1e-05',
'data: n= 597 N= 15',
'strands: +',
'sample: seed= 0 seqfrac= 1',
]
#List of dicts which contain general info for each module.
self.module_info_dicts = [
{'MOTIF':'1',
'width':'10',
'sites':'11',
'llr':'131',
'E-value':'1.3e-019',
},
{'MOTIF':'2',
'width':'7',
'sites':'11',
'llr':'88',
'E-value':'2.5e-006',
},
{'MOTIF':'3',
'width':'7',
'sites':'6',
'llr':'53',
'E-value':'5.5e-001',
},
]
#Summary dict
self.summary_dict = {'CombinedP':{
'1': float(3.48e-02),
'11': float(3.78e-05),
'17': float(2.78e-08),
'28': float(3.49e-06),
'105': float(3.98e-06),
'159': float(1.08e-02),
'402-C01': float(4.22e-07),
'407-A07': float(7.32e-08),
'410-A10': float(4.23e-04),
'505-D01': float(5.72e-07),
'507-B04-1': float(1.01e-04),
'518-D12': float(2.83e-06),
'621-H01': float(8.69e-07),
'625-H05': float(8.86e-06),
'629-C08': float(5.61e-07),
}
}
self.remap_dict = {
'11':'11',
'1':'1',
'407-A07':'407-A07',
'17':'17',
'159':'159',
'505-D01':'505-D01',
'28':'28',
'507-B04-1':'507-B04-1',
'402-C01':'402-C01',
'621-H01':'621-H01',
'629-C08':'629-C08',
'410-A10':'410-A10',
'105':'105',
'625-H05':'625-H05',
'518-D12':'518-D12'
}
#ModuleInstances and Modules
self.ModuleInstances = [
[ModuleInstance('CTATTGGGGC',Location('629-C08',18,28),
float(1.95e-06)),
ModuleInstance('CTATTGGGGC',Location('621-H01',45,55),
float(1.95e-06)),
ModuleInstance('CTATTGGGGC',Location('505-D01',26,36),
float(1.95e-06)),
ModuleInstance('CTATTGGGGC',Location('407-A07',5,15),
float(1.95e-06)),
ModuleInstance('CTATTGGGGC',Location('105',0,10),
float(1.95e-06)),
ModuleInstance('CTATTGGGGC',Location('28',3,13),
float(1.95e-06)),
ModuleInstance('CTATTGGGGC',Location('17',16,26),
float(1.95e-06)),
ModuleInstance('CTATTGGGCC',Location('402-C01',24,34),
float(3.30e-06)),
ModuleInstance('CTAGTGGGGC',Location('625-H05',2,12),
float(5.11e-06)),
ModuleInstance('CTAGTGGGCC',Location('11',15,25),
float(6.37e-06)),
ModuleInstance('CTATTGGGGT',Location('518-D12',0,10),
float(9.40e-06)),
],
[ModuleInstance('CGTTACG',Location('629-C08',37,44),
float(6.82e-05)),
ModuleInstance('CGTTACG',Location('621-H01',30,37),
float(6.82e-05)),
ModuleInstance('CGTTACG',Location('507-B04-1',8,15),
float(6.82e-05)),
ModuleInstance('CGTTACG',Location('410-A10',7,14),
float(6.82e-05)),
ModuleInstance('CGTTACG',Location('407-A07',26,33),
float(6.82e-05)),
ModuleInstance('CGTTACG',Location('17',0,7),
float(6.82e-05)),
ModuleInstance('TGTTACG',Location('625-H05',32,39),
float(1.74e-04)),
ModuleInstance('TGTTACG',Location('505-D01',3,10),
float(1.74e-04)),
ModuleInstance('CATTACG',Location('518-D12',30,37),
float(2.14e-04)),
ModuleInstance('CGGTACG',Location('402-C01',1,8),
float(2.77e-04)),
ModuleInstance('TGTTCCG',Location('629-C08',5,12),
float(6.45e-04)),
],
[ModuleInstance('CTATTGG',Location('629-C08',57,64),
float(1.06e-04)),
ModuleInstance('CTATTGG',Location('507-B04-1',42,49),
float(1.06e-04)),
ModuleInstance('CTATTGG',Location('410-A10',27,34),
float(1.06e-04)),
ModuleInstance('CTATTGG',Location('159',14,21),
float(1.06e-04)),
ModuleInstance('CTATTGG',Location('1',18,25),
float(1.06e-04)),
ModuleInstance('CTAATGG',Location('507-B04-1',28,35),
float(1.63e-04)),
],
]
self.Modules = []
for module, info in zip(self.ModuleInstances, self.module_info_dicts):
curr_module_data = {}
for instance in module:
curr_module_data[(instance.Location.SeqId,
instance.Location.Start)] = instance
temp_module = Module(curr_module_data, MolType=DNA,
Evalue=float(info['E-value']),
Llr=int(info['llr']))
self.Modules.append(temp_module)
self.ConsensusSequences = ['CTATTGGGGC','CGTTACG','CTATTGG']
def test_get_data_block(self):
"""getDataBlock should return the main block and the alphabet."""
main_block, alphabet = getDataBlock(self.meme_file)
self.assertEqual(main_block,self.main_block)
self.assertEqual(alphabet, DNA)
def test_get_alphabet(self):
"""getMolType should return the correct alphabet."""
self.assertEqual(getMolType(self.alphabet_block),DNA)
def test_get_command_module_blocks(self):
"""getCommandModuleBlocks should return the command and module blocks.
"""
command_block, module_blocks = getCommandModuleBlocks(self.main_block)
self.assertEqual(command_block, self.command_block)
self.assertEqual(module_blocks, self.module_blocks)
def test_get_summary_block(self):
"""getSummaryBlock should return the MEME summary block."""
self.assertEqual(getSummaryBlock(self.module_blocks[-1]),
self.summary_block)
def test_dict_from_list(self):
"""dictFromList should return a dict given a list."""
self.assertEqual(dictFromList(self.sample_list),self.sample_dict)
def test_extract_command_line_data(self):
"""extractCommandLineData should return a dict of command line data."""
self.assertEqual(extractCommandLineData(self.command_block),
self.command_line_list)
def test_get_module_data_blocks(self):
"""getModuleDataBlocks should return a list of blocks for each module.
"""
self.assertEqual(getModuleDataBlocks(self.module_blocks),
self.module_data_blocks)
def test_extract_module_data(self):
"""extractModuleData should return a Module object."""
for data, module in zip(self.module_data_blocks,self.Modules):
ans = extractModuleData(data,DNA,self.remap_dict)
self.assertEqual(ans,module)
def test_get_consensus_sequence(self):
"""getConsensusSequence should return Module's Consensus sequence."""
for data,seq in zip(self.module_data_blocks,self.ConsensusSequences):
ans = getConsensusSequence(data[1])
self.assertEqual(ans,seq)
def test_get_module_general_info(self):
"""getModuleGeneralInfo should return a dict of Module info."""
for module, data_dict in zip(self.module_data_blocks,
self.module_info_dicts):
self.assertEqual(getModuleGeneralInfo(module[0][0]),data_dict)
#Test that getModuleGeneralInfo can parse the general info line when the
# motif ID is 100 or more. MEME drops the space between 'MOTIF' and the
# ID in this case.
data_line_special = \
'MOTIF100 width = 50 sites = 2 llr = 273 E-value = 3.1e-007'
expected = {'MOTIF':'100','width':'50','sites':'2','llr':'273',\
'E-value':'3.1e-007'}
self.assertEqual(getModuleGeneralInfo(data_line_special),expected)
def test_extract_summary_data(self):
"""extractSummaryData should return a dict of MEME summary data."""
self.assertEqual(extractSummaryData(self.summary_block),
self.summary_dict)
def test_meme_parser(self):
"""MemeParser should correctly return a MotifResults object."""
test_motif_results = MotifResults([],[],{})
test_motif_results.Results = self.summary_dict
test_motif_results.Results['Warnings']=[]
test_motif_results.Parameters = self.command_line_list
test_motif_results.Modules = self.Modules
ans_motif_results = MemeParser(self.meme_file)
self.assertEqual(ans_motif_results.Modules,test_motif_results.Modules)
self.assertEqual(ans_motif_results.Results,test_motif_results.Results)
self.assertEqual(ans_motif_results.Parameters,
test_motif_results.Parameters)
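`test_get_module_general_info` checks that the motif header line is parsed both in the usual `MOTIF 1 width = ...` layout and in the run-together `MOTIF100 width = ...` layout. A minimal regex-based sketch of such a parser follows; `parse_motif_info_line` is a hypothetical name, this is not necessarily how `getModuleGeneralInfo` is implemented, and it assumes every field after the motif ID appears as `key = value`.

```python
import re

# Motif number: "MOTIF 1" or, for IDs of 100 and above, "MOTIF100".
_MOTIF_ID = re.compile(r'^MOTIF\s*(\d+)')
# Remaining fields look like "width = 10" or "E-value = 1.3e-019".
_KEY_VALUE = re.compile(r'(\S+)\s*=\s*(\S+)')


def parse_motif_info_line(line):
    """Return a dict of string fields from a MEME motif header line."""
    line = line.strip()
    match = _MOTIF_ID.match(line)
    if match is None:
        raise ValueError('not a MEME motif header line: %r' % line)
    info = {'MOTIF': match.group(1)}
    # findall returns (key, value) tuples, which dict.update accepts.
    info.update(_KEY_VALUE.findall(line))
    return info


print(parse_motif_info_line(
    'MOTIF100 width = 50 sites = 2 llr = 273 E-value = 3.1e-007'))
```

Values are kept as strings, matching the `expected` dict in the test; numeric conversion (as done for `Evalue` and `Llr` when building `Module` objects) would be a separate step.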
MEME_FILE = """
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 3.0 (Release date: 2001/03/03 13:05:22)
For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.sdsc.edu.
This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs. MAST is available
for interactive use and downloading at http://meme.sdsc.edu.
********************************************************************************
********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:
Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************
********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= meme.16346.data
ALPHABET= ACGT
Sequence name Weight Length Sequence name Weight Length
------------- ------ ------ ------------- ------ ------
1 1.0000 26 11 1.0000 25
17 1.0000 26 28 1.0000 26
105 1.0000 21 159 1.0000 21
402-C01 1.0000 34 407-A07 1.0000 34
410-A10 1.0000 34 505-D01 1.0000 49
507-B04-1 1.0000 49 518-D12 1.0000 49
621-H01 1.0000 74 625-H05 1.0000 65
629-C08 1.0000 64
********************************************************************************
********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.
command: meme meme.16346.data -dna -mod tcm -nmotifs 3 -minw 4 -maxw 10 -evt 1e100 -time 720 -maxsize 60000 -nostatus -maxiter 20
model: mod= tcm nmotifs= 3 evt= 1e+100
object function= E-value of product of p-values
width: minw= 4 maxw= 10 minic= 0.00
width: wg= 11 ws= 1 endgaps= yes
nsites: minsites= 2 maxsites= 50 wnsites= 0.8
theta: prob= 1 spmap= uni spfuzz= 0.5
em: prior= dirichlet b= 0.01 maxiter= 20
distance= 1e-05
data: n= 597 N= 15
strands: +
sample: seed= 0 seqfrac= 1
Letter frequencies in dataset:
A 0.173 C 0.206 G 0.299 T 0.322
Background letter frequencies (from dataset with add-one prior applied):
A 0.173 C 0.207 G 0.298 T 0.322
********************************************************************************
********************************************************************************
MOTIF 1 width = 10 sites = 11 llr = 131 E-value = 1.3e-019
********************************************************************************
--------------------------------------------------------------------------------
Motif 1 Description
--------------------------------------------------------------------------------
Simplified A ::a:::::::
pos.-specific C a:::::::29
probability G :::2:aaa8:
matrix T :a:8a::::1
bits 2.5 *
2.3 * *
2.0 * *
1.8 * * *** *
Information 1.5 *** **** *
content 1.3 *** ******
(17.2 bits) 1.0 **********
0.8 **********
0.5 **********
0.3 **********
0.0 ----------
Multilevel CTATTGGGGC
consensus
sequence
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Start P-value Site
------------- ----- --------- ----------
629-C08 19 1.95e-06 TCCGTGAACA CTATTGGGGC GTGTAAGAGC
621-H01 46 1.95e-06 CGCATGCGTG CTATTGGGGC GTCATTTGTC
505-D01 27 1.95e-06 TTGATTGTTG CTATTGGGGC ATTGCCGTAC
407-A07 6 1.95e-06 CGTTA CTATTGGGGC GGGTATTTTC
105 1 1.95e-06 . CTATTGGGGC CGAAATGGTT
28 4 1.95e-06 TCC CTATTGGGGC CAAGGGCTAC
17 17 1.95e-06 GCTACTTGTG CTATTGGGGC
402-C01 25 3.30e-06 CTTAACATTC CTATTGGGCC
625-H05 3 5.11e-06 GC CTAGTGGGGC AGCTGACAGA
11 16 6.37e-06 TGTTAGACAG CTAGTGGGCC
518-D12 1 9.40e-06 . CTATTGGGGT GTTGTATTGA
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
629-C08 2e-06 18_[1]_36
621-H01 2e-06 45_[1]_19
505-D01 2e-06 26_[1]_13
407-A07 2e-06 5_[1]_19
105 2e-06 [1]_11
28 2e-06 3_[1]_13
17 2e-06 16_[1]
402-C01 3.3e-06 24_[1]
625-H05 5.1e-06 2_[1]_53
11 6.4e-06 15_[1]
518-D12 9.4e-06 [1]_39
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 1 width=10 seqs=11
629-C08 ( 19) CTATTGGGGC 1
621-H01 ( 46) CTATTGGGGC 1
505-D01 ( 27) CTATTGGGGC 1
407-A07 ( 6) CTATTGGGGC 1
105 ( 1) CTATTGGGGC 1
28 ( 4) CTATTGGGGC 1
17 ( 17) CTATTGGGGC 1
402-C01 ( 25) CTATTGGGCC 1
625-H05 ( 3) CTAGTGGGGC 1
11 ( 16) CTAGTGGGCC 1
518-D12 ( 1) CTATTGGGGT 1
//
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 10 n= 462 bayes= 6.40183 E= 1.3e-019
-1010 227 -1010 -1010
-1010 -1010 -1010 164
253 -1010 -1010 -1010
-1010 -1010 -71 135
-1010 -1010 -1010 164
-1010 -1010 174 -1010
-1010 -1010 174 -1010
-1010 -1010 174 -1010
-1010 -19 146 -1010
-1010 214 -1010 -182
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 10 nsites= 11 E= 1.3e-019
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 0.000000 1.000000
1.000000 0.000000 0.000000 0.000000
0.000000 0.000000 0.181818 0.818182
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.181818 0.818182 0.000000
0.000000 0.909091 0.000000 0.090909
--------------------------------------------------------------------------------
Time 0.54 secs.
********************************************************************************
********************************************************************************
MOTIF 2 width = 7 sites = 11 llr = 88 E-value = 2.5e-006
********************************************************************************
--------------------------------------------------------------------------------
Motif 2 Description
--------------------------------------------------------------------------------
Simplified A :1::9::
pos.-specific C 7:::1a:
probability G :91:::a
matrix T 3:9a:::
bits 2.5
2.3 *
2.0 **
1.8 ***
Information 1.5 ****
content 1.3 *******
(11.6 bits) 1.0 *******
0.8 *******
0.5 *******
0.3 *******
0.0 -------
Multilevel CGTTACG
consensus T
sequence
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Start P-value Site
------------- ----- --------- -------
629-C08 38 6.82e-05 CGTGTAAGAG CGTTACG TGTTCCGTGA
621-H01 31 6.82e-05 CGAGGGAGTA CGTTACG CATGCGTGCT
507-B04-1 9 6.82e-05 CTTGCACA CGTTACG TGTGAGCCAT
410-A10 8 6.82e-05 CTTTGCT CGTTACG TGGTTGTATG
407-A07 27 6.82e-05 GGTATTTTCC CGTTACG T
17 1 6.82e-05 . CGTTACG CTACTTGTGC
625-H05 33 1.74e-04 ATAGGTCGAC TGTTACG GTTAGCGTTC
505-D01 4 1.74e-04 GCA TGTTACG TGACTTTTGA
518-D12 31 2.14e-04 GTTATTGCGA CATTACG CGTTCTGGTT
402-C01 2 2.77e-04 C CGGTACG GTTTGTCTTA
629-C08 6 6.45e-04 TTACG TGTTCCG TGAACACTAT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
629-C08 0.00064 5_[2]_25_[2]_20
621-H01 6.8e-05 30_[2]_37
507-B04-1 6.8e-05 8_[2]_34
410-A10 6.8e-05 7_[2]_20
407-A07 6.8e-05 26_[2]_1
17 6.8e-05 [2]_19
625-H05 0.00017 32_[2]_26
505-D01 0.00017 3_[2]_39
518-D12 0.00021 30_[2]_12
402-C01 0.00028 1_[2]_26
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 2 width=7 seqs=11
629-C08 ( 38) CGTTACG 1
621-H01 ( 31) CGTTACG 1
507-B04-1 ( 9) CGTTACG 1
410-A10 ( 8) CGTTACG 1
407-A07 ( 27) CGTTACG 1
17 ( 1) CGTTACG 1
625-H05 ( 33) TGTTACG 1
505-D01 ( 4) TGTTACG 1
518-D12 ( 31) CATTACG 1
402-C01 ( 2) CGGTACG 1
629-C08 ( 6) TGTTCCG 1
//
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 7 n= 507 bayes= 6.53743 E= 2.5e-006
-1010 181 -1010 -24
-93 -1010 161 -1010
-1010 -1010 -171 150
-1010 -1010 -1010 164
239 -118 -1010 -1010
-1010 227 -1010 -1010
-1010 -1010 174 -1010
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 7 nsites= 11 E= 2.5e-006
0.000000 0.727273 0.000000 0.272727
0.090909 0.000000 0.909091 0.000000
0.000000 0.000000 0.090909 0.909091
0.000000 0.000000 0.000000 1.000000
0.909091 0.090909 0.000000 0.000000
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 1.000000 0.000000
--------------------------------------------------------------------------------
Time 0.85 secs.
********************************************************************************
********************************************************************************
MOTIF 3 width = 7 sites = 6 llr = 53 E-value = 5.5e-001
********************************************************************************
--------------------------------------------------------------------------------
Motif 3 Description
--------------------------------------------------------------------------------
Simplified A ::a2:::
pos.-specific C a::::::
probability G :::::aa
matrix T :a:8a::
bits 2.5 *
2.3 * *
2.0 * *
1.8 * * **
Information 1.5 *** ***
content 1.3 *** ***
(12.7 bits) 1.0 *******
0.8 *******
0.5 *******
0.3 *******
0.0 -------
Multilevel CTATTGG
consensus
sequence
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 3 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Start P-value Site
------------- ----- --------- -------
629-C08 58 1.06e-04 TCCGTGAACA CTATTGG
507-B04-1 43 1.06e-04 TGGTGTTGCG CTATTGG
410-A10 28 1.06e-04 TTGTATGCCG CTATTGG
159 15 1.06e-04 GACCGTTGGT CTATTGG
1 19 1.06e-04 TTGGATAGTG CTATTGG G
507-B04-1 29 1.63e-04 GAGCCATTCT CTAATGG TGTTGCGCTA
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 3 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
629-C08 0.00011 57_[3]
507-B04-1 0.00016 28_[3]_7_[3]
410-A10 0.00011 27_[3]
159 0.00011 14_[3]
1 0.00011 18_[3]_1
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 3 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 3 width=7 seqs=6
629-C08 ( 58) CTATTGG 1
507-B04-1 ( 43) CTATTGG 1
410-A10 ( 28) CTATTGG 1
159 ( 15) CTATTGG 1
1 ( 19) CTATTGG 1
507-B04-1 ( 29) CTAATGG 1
//
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 3 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 7 n= 507 bayes= 6.83576 E= 5.5e-001
-923 227 -923 -923
-923 -923 -923 164
253 -923 -923 -923
-6 -923 -923 137
-923 -923 -923 164
-923 -923 174 -923
-923 -923 174 -923
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 3 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 7 nsites= 6 E= 5.5e-001
0.000000 1.000000 0.000000 0.000000
0.000000 0.000000 0.000000 1.000000
1.000000 0.000000 0.000000 0.000000
0.166667 0.000000 0.000000 0.833333
0.000000 0.000000 0.000000 1.000000
0.000000 0.000000 1.000000 0.000000
0.000000 0.000000 1.000000 0.000000
--------------------------------------------------------------------------------
Time 1.09 secs.
********************************************************************************
********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************
--------------------------------------------------------------------------------
Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
1 3.48e-02 26
11 3.78e-05 15_[1(6.37e-06)]
17 2.78e-08 [2(6.82e-05)]_9_[1(1.95e-06)]
28 3.49e-06 3_[1(1.95e-06)]_13
105 3.98e-06 [1(1.95e-06)]_11
159 1.08e-02 21
402-C01 4.22e-07 24_[1(3.30e-06)]
407-A07 7.32e-08 5_[1(1.95e-06)]_11_[2(6.82e-05)]_1
410-A10 4.23e-04 7_[2(6.82e-05)]_20
505-D01 5.72e-07 26_[1(1.95e-06)]_13
507-B04-1 1.01e-04 8_[2(6.82e-05)]_34
518-D12 2.83e-06 [1(9.40e-06)]_39
621-H01 8.69e-07 30_[2(6.82e-05)]_8_[1(1.95e-06)]_19
625-H05 8.86e-06 2_[1(5.11e-06)]_53
629-C08 5.61e-07 18_[1(1.95e-06)]_9_[2(6.82e-05)]_20
--------------------------------------------------------------------------------
********************************************************************************
********************************************************************************
Stopped because nmotifs = 3 reached.
********************************************************************************
CPU: compute-0-2.local
********************************************************************************
"""
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
#file cogent.parse.mothur.py
__author__ = "Kyle Bittinger"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Kyle Bittinger"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Kyle Bittinger"
__email__ = "kylebittinger@gmail.com"
__status__ = "Prototype"
from cStringIO import StringIO
from cogent.util.unit_test import TestCase, main
from cogent.parse.mothur import parse_otu_list
class FunctionTests(TestCase):
def test_parse_otu_list(self):
observed = list(parse_otu_list(StringIO(mothur_output)))
expected = [
(0.0, [['cccccc'], ['bbbbbb'], ['aaaaaa']]),
(0.62, [['bbbbbb', 'cccccc'], ['aaaaaa']]),
(0.67000000000000004, [['aaaaaa', 'bbbbbb', 'cccccc']])
]
self.assertEqual(observed, expected)
mothur_output = """\
unique 3 cccccc bbbbbb aaaaaa
0.62 2 bbbbbb,cccccc aaaaaa
0.67 1 aaaaaa,bbbbbb,cccccc
"""
if __name__ == '__main__':
main()
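Each row of the mothur list output in this fixture holds a distance label (`unique` standing for distance 0.0), an OTU count, and then one comma-separated group of sequence IDs per OTU. A minimal sketch of turning such rows into the `(distance, otus)` pairs the test expects follows; `parse_otu_list_lines` is a hypothetical stand-in rather than `cogent.parse.mothur.parse_otu_list` itself, and the OTU-count check is an added assumption.

```python
def parse_otu_list_lines(lines):
    """Yield (distance, otus) pairs from mothur list-style output."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        label, num_otus, otu_fields = fields[0], int(fields[1]), fields[2:]
        # The zero-distance row is labelled "unique" instead of a number.
        distance = 0.0 if label == 'unique' else float(label)
        otus = [otu.split(',') for otu in otu_fields]
        if len(otus) != num_otus:
            raise ValueError('expected %d OTUs, found %d'
                             % (num_otus, len(otus)))
        yield distance, otus


sample = """\
unique 3 cccccc bbbbbb aaaaaa
0.62 2 bbbbbb,cccccc aaaaaa
0.67 1 aaaaaa,bbbbbb,cccccc
"""
for distance, otus in parse_otu_list_lines(sample.splitlines()):
    print(distance, otus)
```

Splitting on arbitrary whitespace accepts both tab- and space-separated rows.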
#!/usr/bin/env python
import os, tempfile
from cogent.util.unit_test import TestCase, main
from cogent.parse.msms import parse_VertFile
try:
from cStringIO import StringIO
except ImportError:
from StringIO import StringIO
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class MsmsTest(TestCase):
"""Tests for Msms application output parsers"""
def setUp(self):
vs = "1. 2. 3.\n" + \
"4. 5. 6.\n" + \
"7. 8. 9.\n"
self.vertfile = StringIO(vs)
def test_parseVertFile(self):
out_arr = parse_VertFile(self.vertfile)
assert out_arr.dtype == 'float64'
assert out_arr.shape == (3,3)
assert out_arr[0][0] == 1.
if __name__ == '__main__':
main()
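The `setUp` fixture is just three rows of three floats, and the test checks that `parse_VertFile` returns a 3x3 `float64` NumPy array. Assuming each non-blank line is one whitespace-separated row of numbers (real MSMS `.vert` files may also carry header lines and extra columns, which this sketch does not handle), the parsing step reduces to the dependency-free sketch below; `parse_vert_lines` is hypothetical and returns nested lists rather than an array.

```python
def parse_vert_lines(lines):
    """Return rows of floats, one row per non-blank input line."""
    rows = [[float(x) for x in line.split()] for line in lines if line.strip()]
    # A rectangular result is required before it can become a 2-D array.
    widths = set(len(row) for row in rows)
    if len(widths) > 1:
        raise ValueError('rows have differing column counts: %s'
                         % sorted(widths))
    return rows


rows = parse_vert_lines("1. 2. 3.\n4. 5. 6.\n7. 8. 9.\n".splitlines())
print(rows[0])
```

Passing the result to `numpy.array(rows)` would give the `(3, 3)` float array the test asserts on.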
#!/usr/bin/env python
"""Tests of parsers for dealing with NCBI Taxonomy files.
"""
from cogent.parse.ncbi_taxonomy import MissingParentError, NcbiTaxon, \
NcbiTaxonParser, NcbiTaxonLookup, NcbiName, NcbiNameParser, \
NcbiNameLookup, \
NcbiTaxonomy, NcbiTaxonNode, NcbiTaxonomyFromFiles
from cogent.util.unit_test import TestCase, main
__author__ = "Jason Carnes"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jason Carnes", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
good_nodes = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
6\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
9\t|\t7\t|\tsubspecies\t|\tBA\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
10\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
'''.split('\n')
bad_nodes = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
6\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
9\t|\t777\t|\tsubspecies\t|\tBA\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
10\t|\t666\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
'''.split('\n')
good_names = '''1\t|\tall\t|\t\t|\tsynonym\t|
1\t|\troot\t|\t\t|\tscientific name\t|
2\t|\tBacteria\t|\tBacteria \t|\tscientific name\t|
2\t|\tMonera\t|\tMonera \t|\tin-part\t|
2\t|\tProcaryotae\t|\tProcaryotae <#1>\t|\tin-part\t|
2\t|\tProkaryotae\t|\tProkaryotae <#1>\t|\tin-part\t|
2\t|\teubacteria\t|\t\t|\tgenbank common name\t|
6\t|\tAzorhizobium\t|\t\t|\tscientific name\t|
7\t|\tAzorhizobium caulinodans\t|\t\t|\tscientific name\t|
9\t|\tBuchnera aphidicola\t|\t\t|\tscientific name\t|
10\t|\tFakus namus\t|\t\t|\tscientific name\t|
'''.split('\n')
class NcbiTaxonTests(TestCase):
"""Tests proper parsing of NCBI node file, e.g. nodes.dmp"""
def test_init(self):
"""NcbiTaxon init should return object containing taxonomy data"""
good_1 = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|\n'''
good_2 = '''2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|\n'''
good_3 = '''6\t|\t2\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\n'''
good_4 = '''7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|\n'''
node_1 = NcbiTaxon(good_1) #make a NcbiTaxon object
node_2 = NcbiTaxon(good_2) #from the corresponding
node_3 = NcbiTaxon(good_3) #line.
node_4 = NcbiTaxon(good_4)
self.assertEqual(node_1.Rank, 'no rank') #confirm object holds
self.assertEqual(node_1.RankId, 28) #right data
self.assertEqual(node_1.ParentId, 1)
self.assertEqual(node_2.Rank, 'superkingdom')
self.assertEqual(node_2.RankId, 27)
self.assertEqual(node_2.ParentId, 1)
self.assertEqual(node_3.Rank, 'genus')
self.assertEqual(node_3.RankId, 8)
self.assertEqual(node_3.ParentId, 2)
self.assertEqual(node_4.Rank, 'species')
self.assertEqual(node_4.RankId, 4)
self.assertEqual(node_4.ParentId, 6)
#test some comparisons
assert node_1 > node_2
assert node_1 > node_3
assert node_1 > node_4
assert node_1 == node_1
assert node_2 < node_1
assert node_2 == node_2
assert node_4 < node_1
assert node_3 > node_4
def test_str(self):
"""NcbiTaxon str should write data in input format from nodes.dmp"""
good = '''2\t|\t1\t|\tsuperkingdom\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|\n'''
node = NcbiTaxon(good)
self.assertEqual(str(node), good)
root = '''1\t|\t1\t|\tno rank\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|'''
root_node = NcbiTaxon(root)
self.assertEqual(str(root_node), root + '\n')
def test_bad_input(self):
"""NcbiTaxon init should raise ValueError if required fields are missing"""
bad_node_taxid = '''\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|\n''' #contains no taxon_id; not valid
bad_node_parentid = '''7\t|\t\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|\n''' #contains no parent_id; not valid
self.assertRaises(ValueError, NcbiTaxon, bad_node_taxid)
self.assertRaises(ValueError, NcbiTaxon, bad_node_parentid)
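`test_bad_input` relies on a blank taxon ID or parent ID being rejected with `ValueError`. A minimal sketch of splitting one `nodes.dmp` line, assuming the `\t|\t` separator and trailing `\t|` shown in the fixtures; `parse_nodes_line` is hypothetical and extracts only the first three columns, whereas `NcbiTaxon` keeps every column.

```python
def parse_nodes_line(line):
    """Return (taxon_id, parent_id, rank) from one nodes.dmp line."""
    # Columns are delimited by '|' with tab padding on either side.
    fields = [field.strip('\t') for field in line.rstrip('\n').split('|')]
    if len(fields) < 3:
        raise ValueError('too few fields: %r' % line)
    # int('') raises ValueError, so a blank ID is rejected automatically.
    taxon_id = int(fields[0])
    parent_id = int(fields[1])
    return taxon_id, parent_id, fields[2]


line = ('7\t|\t6\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|'
        '\t1\t|\t0\t|\t0\t|\t\t|\n')
print(parse_nodes_line(line))
```

Stripping only tabs keeps multi-word ranks such as `no rank` intact.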
class NcbiNameTests(TestCase):
"""Tests proper parsing NCBI name file, e.g. names.dmp."""
def test_init(self):
"""NcbiName should init OK with well-formed name line"""
line_1 = '''1\t|\tall\t|\t\t|\tsynonym\t|\n'''
line_2 = '''1\t|\troot\t|\t\t|\tscientific name\t|\n'''
line_3 = '''2\t|\tBacteria\t|\tBacteria \t|\tscientific name\t|\n'''
line_4 = '''7\t|\tAzorhizobium caulinodans\t|\t\t|\tscientific name\t|\n'''
name_1 = NcbiName(line_1) #make an NcbiName object
name_2 = NcbiName(line_2) #from the corresponding line
name_3 = NcbiName(line_3)
name_4 = NcbiName(line_4)
self.assertEqual(name_1.TaxonId, 1) #test that the data
self.assertEqual(name_1.NameClass, 'synonym') #fields in the object
self.assertEqual(name_2.TaxonId, 1) #hold right data
self.assertEqual(name_2.NameClass, 'scientific name')
self.assertEqual(name_3.TaxonId, 2)
self.assertEqual(name_3.NameClass, 'scientific name')
self.assertEqual(name_4.TaxonId, 7)
self.assertEqual(name_4.NameClass, 'scientific name')
def test_str(self):
"""NcbiName str should return line in original format"""
line = '''1\t|\troot\t|\t\t|\tscientific name|\n'''
name = NcbiName(line)
self.assertEqual(str(name), line)
def test_bad_input(self):
"""NcbiName init should raise correct errors on bad data"""
bad_name_taxid = '''\t|\troot\t|\t\t|\tscientific name\t|\n'''#no tax_id
self.assertRaises(ValueError, NcbiName, bad_name_taxid)
class NcbiNameLookupTest(TestCase):
"""Tests of the NcbiNameLookup factory function."""
def test_init(self):
"""NcbiNameLookup should map taxon ids to scientific names"""
names = list(NcbiNameParser(good_names)) #list of objects
sci_names = NcbiNameLookup(names) #NcbiNameLookup object
root = names[1] #NcbiName object made from 2nd line of good_names
bacteria = names[2] #from 3rd line of good_names
azorhizobium = names[7]
caulinodans = names[8]
assert (sci_names[1] is root) #gets NcbiName object from the
assert (sci_names[2] is bacteria) #NcbiNameLookup object and
assert (sci_names[6] is azorhizobium) #asks if it is the original
assert (sci_names[7] is caulinodans) #NcbiName object
self.assertEqual(sci_names[1].Name, 'root')
self.assertEqual(sci_names[2].Name, 'Bacteria')
self.assertEqual(sci_names[7].Name, 'Azorhizobium caulinodans')
self.assertEqual(sci_names[9].Name, 'Buchnera aphidicola')
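`NcbiNameLookupTest` expects the lookup to index only the scientific name of each taxon, so the `all` synonym for taxon 1 and the `in-part` and common names for taxon 2 are ignored. Using plain strings instead of `NcbiName` objects (an illustrative simplification; `scientific_name_lookup` is hypothetical), the filtering amounts to:

```python
def scientific_name_lookup(name_lines):
    """Map taxon id -> scientific name from names.dmp-style lines."""
    lookup = {}
    for line in name_lines:
        if not line.strip():
            continue  # the split fixtures end with an empty string
        fields = [f.strip() for f in line.rstrip('\n').split('|')]
        taxon_id, name, name_class = int(fields[0]), fields[1], fields[3]
        # Synonyms, common names, etc. are skipped.
        if name_class == 'scientific name':
            lookup[taxon_id] = name
    return lookup


sample = [
    '1\t|\tall\t|\t\t|\tsynonym\t|',
    '1\t|\troot\t|\t\t|\tscientific name\t|',
    '2\t|\tBacteria\t|\tBacteria \t|\tscientific name\t|',
    '2\t|\teubacteria\t|\t\t|\tgenbank common name\t|',
    '',
]
print(scientific_name_lookup(sample))
```

The same function also accepts the untabbed `'3|a||scientific name|'` lines used in `NcbiTaxonNodeTests`.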
class NcbiTaxonLookupTest(TestCase):
"""Tests of the NcbiTaxonLookup factory function."""
def setUp(self):
"""Sets up the class tests"""
self.names = list(NcbiNameParser(good_names))
self.nodes = list(NcbiTaxonParser(good_nodes))
self.taxID_to_obj = NcbiTaxonLookup(self.nodes)
self.names_to_obj = NcbiNameLookup(self.names)
def test_init(self):
"""NcbiTaxonLookup should have correct fields for input NcbiTaxon"""
line1_obj = self.nodes[0] #NcbiTaxon objects made from lines of
line2_obj = self.nodes[1] #good_nodes
line3_obj = self.nodes[2]
line4_obj = self.nodes[3]
line5_obj = self.nodes[4]
assert (self.taxID_to_obj[1] is line1_obj) #gets NcbiTaxon object from
assert (self.taxID_to_obj[2] is line2_obj) #NcbiTaxonLookup object &
assert (self.taxID_to_obj[6] is line3_obj) #asks if it is the original
assert (self.taxID_to_obj[7] is line4_obj) #NcbiTaxon object
assert (self.taxID_to_obj[9] is line5_obj)
self.assertEqual(self.taxID_to_obj[1].ParentId, 1) #checking a few
self.assertEqual(self.taxID_to_obj[2].ParentId, 1) #individual
self.assertEqual(self.taxID_to_obj[6].ParentId, 2) #fields of the
self.assertEqual(self.taxID_to_obj[7].ParentId, 6) #NcbiTaxon objs
self.assertEqual(self.taxID_to_obj[9].ParentId, 7)
self.assertEqual(self.taxID_to_obj[1].Rank, 'no rank')
self.assertEqual(self.taxID_to_obj[2].Rank, 'superkingdom')
self.assertEqual(self.taxID_to_obj[6].Rank, 'genus')
self.assertEqual(self.taxID_to_obj[7].Rank, 'species')
self.assertEqual(self.taxID_to_obj[7].EmblCode, 'AC')
self.assertEqual(self.taxID_to_obj[7].DivisionId, '0')
self.assertEqual(self.taxID_to_obj[7].DivisionInherited, 1)
self.assertEqual(self.taxID_to_obj[7].TranslTable, 11)
self.assertEqual(self.taxID_to_obj[7].TranslTableInherited, 1)
self.assertEqual(self.taxID_to_obj[7].TranslTableMt, 0)
self.assertEqual(self.taxID_to_obj[7].TranslTableMtInherited, 1)
class NcbiTaxonomyTests(TestCase):
"""Tests of the NcbiTaxonomy class."""
def setUp(self):
self.tx = NcbiTaxonomyFromFiles(good_nodes, good_names)
def test_init_good(self):
"""NcbiTaxonomyFromFiles should pass spot-checks of resulting objects"""
self.assertEqual(len(self.tx.ByName), 6)
self.assertEqual(len(self.tx.ById), 6)
self.assertEqual(self.tx[10].Name, 'Fakus namus')
self.assertEqual(self.tx['1'].Name, 'root')
self.assertEqual(self.tx['root'].Parent, None)
self.assertEqual(self.tx.Deadbeats, {})
def test_init_bad(self):
"""NcbiTaxonomyFromFiles should produce deadbeats by default"""
bad_tx = NcbiTaxonomyFromFiles(bad_nodes, good_names)
self.assertEqual(len(bad_tx.Deadbeats), 2)
assert 777 in bad_tx.Deadbeats
assert 666 in bad_tx.Deadbeats
assert bad_tx.Deadbeats[777] == bad_tx[9]
def test_init_strict(self):
"""NcbiTaxonomyFromFiles should fail if strict and deadbeats exist"""
tx = NcbiTaxonomyFromFiles(good_nodes, good_names, strict=True)
self.assertRaises(MissingParentError, NcbiTaxonomyFromFiles, \
bad_nodes, good_names, strict=True)
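`test_init_bad` expects the two nodes in `bad_nodes` whose parents (777 and 666) do not exist to be reported as deadbeats, and `test_init_strict` expects the same input to raise `MissingParentError` in strict mode. The detection step can be sketched on a plain `{taxon_id: parent_id}` map; `find_missing_parents` is hypothetical, and where `NcbiTaxonomy.Deadbeats` maps each missing parent ID to a single node, this sketch collects a list of child IDs per missing parent.

```python
def find_missing_parents(parents):
    """Return {missing_parent_id: [child ids]} for parent ids absent
    from the parents map itself."""
    missing = {}
    for child, parent in sorted(parents.items()):
        if parent not in parents:
            missing.setdefault(parent, []).append(child)
    return missing


# Parent ids from the bad_nodes fixture (taxon id -> parent id).
bad_parents = {1: 1, 2: 1, 6: 2, 7: 6, 9: 777, 10: 666}
print(find_missing_parents(bad_parents))
```

A strict mode would simply raise when the returned dict is non-empty.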
def test_Ancestors(self):
"""NcbiTaxonomy should support Ancestors correctly, not incl. self"""
result = self.tx['7'].ancestors()
tax_ids = [taxon_obj.TaxonId for taxon_obj in result]
self.assertEqual(tax_ids, [6, 2, 1])
def test_Parent(self):
"""NcbiTaxonomy should support Parent correctly"""
assert self.tx[7].Parent is self.tx[6]
assert self.tx[6].Parent is self.tx[2]
assert self.tx[2].Parent is self.tx[1]
assert self.tx[1].Parent is None
def test_Siblings(self):
"""NcbiTaxonomy should support Siblings correctly"""
sibs = self.tx[7].siblings()
self.assertEqual(len(sibs), 1)
assert sibs[0] is self.tx[10]
def test_Children(self):
"""NcbiTaxonomy should support Children correctly"""
children = self.tx[6].Children
self.assertEqual(len(children), 2)
assert children[0] is self.tx[7]
assert children[1] is self.tx[10]
root_kids = self.tx['root']
self.assertEqual(len(root_kids), 1)
assert root_kids[0] is self.tx[2]
self.assertEqual(len(self.tx[10].Children), 0)
def test_Names(self):
"""NcbiTaxonomy should fill in names correctly"""
self.assertEqual(self.tx['6'].Name, 'Azorhizobium')
self.assertEqual(self.tx['1'].Name, 'root')
self.assertEqual(self.tx['2'].Name, 'Bacteria')
self.assertEqual(self.tx['7'].Name, 'Azorhizobium caulinodans')
def test_lastCommonAncestor(self):
"""NcbiTaxonomy should support lastCommonAncestor()"""
assert self.tx[9].lastCommonAncestor(self.tx[9]) is self.tx[9]
assert self.tx[9].lastCommonAncestor(self.tx[7]) is self.tx[7]
assert self.tx[9].lastCommonAncestor(self.tx[10]) is self.tx[6]
assert self.tx[9].lastCommonAncestor(self.tx[1]) is self.tx[1]
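The `ancestors()` and `lastCommonAncestor()` expectations above can be reproduced from a bare parent map. This sketch (hypothetical helper names, operating on integer IDs rather than `NcbiTaxonNode` objects) walks parent links until it reaches the root, which `nodes.dmp` marks by making a node its own parent; it does not guard against cycles other than that root self-loop.

```python
def ancestors(taxon_id, parents):
    """Return ancestor ids from parent up to root, excluding taxon_id.
    The root is the node whose parent is itself."""
    result = []
    current = taxon_id
    while parents[current] != current:
        current = parents[current]
        result.append(current)
    return result


def last_common_ancestor(a, b, parents):
    """Return the deepest id on both root paths (a node counts as its
    own ancestor here, as in the lastCommonAncestor tests)."""
    path_a = set([a] + ancestors(a, parents))
    for node in [b] + ancestors(b, parents):
        if node in path_a:
            return node
    raise ValueError('%r and %r share no ancestor' % (a, b))


# Parent ids from the good_nodes fixture (taxon id -> parent id).
parents = {1: 1, 2: 1, 6: 2, 7: 6, 9: 7, 10: 6}
print(ancestors(7, parents))                 # [6, 2, 1]
print(last_common_ancestor(9, 10, parents))  # 6
```

Walking `b`'s path from the bottom up means the first hit in `a`'s path set is the deepest shared node.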
class NcbiTaxonNodeTests(TestCase):
"""Tests of the NcbiTaxonNode class.
Note: only testing methods that differ from the TreeNode base class.
Note: nested_species is explicitly designed to test the case where the nodes
file does _not_ contain the root, and where the id of the de facto
root is not 1, to make sure there's nothing special about a node
called 'root' or with id 1.
"""
def test_getRankedDescendants(self):
"""NcbiTaxonNode getRankedDescendants should return correct list"""
nested_species = '''3\t|\t3\t|\tsuperkingdom\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
11\t|\t3\t|\tkingdom\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
22\t|\t11\t|\tclass\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
44\t|\t22\t|\torder\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
66\t|\t22\t|\torder\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
77\t|\t66\t|\tfamily\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
99\t|\t66\t|\tfamily\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
88\t|\t44\t|\tfamily\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
101\t|\t77\t|\tgenus\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
202\t|\t77\t|\tgenus\t|\t\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
606\t|\t99\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
707\t|\t88\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
909\t|\t88\t|\tgenus\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
123\t|\t909\t|\tgroup\t|\t\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
1111\t|\t123\t|\tspecies\t|\tAT\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
2222\t|\t707\t|\tspecies\t|\tTT\t|\t0\t|\t0\t|\t11\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|
6666\t|\t606\t|\tspecies\t|\tGG\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t|\t
7777\t|\t606\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
9999\t|\t202\t|\tspecies\t|\tBA\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
1010\t|\t101\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
5555\t|\t555\t|\tspecies\t|\tAC\t|\t0\t|\t1\t|\t11\t|\t1\t|\t0\t|\t1\t|\t0\t|\t0\t|\t\t|
555\t|\t3\t|\tsuperclass\t|\t\t|\t8\t|\t0\t|\t1\t|\t0\t|\t0\t|\t0\t|\t0\t|\t0\t|\t\t|'''.split('\n')
        nested_names = [
            '3|a||scientific name|',
            '11|b||scientific name|',
            '555|c||scientific name|',
            '22|d||scientific name|',
            '44|e||scientific name|',
            '66|f||scientific name|',
            '88|g||scientific name|',
            '77|h||scientific name|',
            '99|i||scientific name|',
            '707|j||scientific name|',
            '909|k||scientific name|',
            '101|l||scientific name|',
            '202|m||scientific name|',
            '606|n||scientific name|',
            '2222|o||scientific name|',
            '123|p||scientific name|',
            '1111|q||scientific name|',
            '1010|r||scientific name|',
            '9999|s||scientific name|',
            '7777|t||scientific name|',
            '6666|u||scientific name|',
            '5555|z||scientific name|',
        ]
        tx = NcbiTaxonomyFromFiles(nested_species, nested_names)
        dec = tx[3].getRankedDescendants('superclass')
        self.assertEqual(len(dec), 1)
        assert dec[0] is tx[555]
        sp = tx['f'].getRankedDescendants('species')
        self.assertSameItems(sp, [tx[1010], tx[9999], tx[7777], tx[6666]])
        empty = tx[11].getRankedDescendants('superclass')
        self.assertEqual(empty, [])
        gr = tx[3].getRankedDescendants('group')
        self.assertEqual(gr, [tx[123]])
        assert tx[3] is tx['a']
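
# Hypothetical sketch (not cogent's implementation) of what getRankedDescendants
# computes: every descendant of a node whose rank matches, found here by an
# iterative depth-first walk. Assumption: nodes.dmp-style lines whose fields are
# separated by '\t|\t' and begin with tax_id, parent_id, rank, like the
# nested_species fixture above. The helper names are illustrative only.
def sketch_children_and_ranks(lines):
    """Build parent -> [children] and id -> rank maps from nodes.dmp-style lines."""
    children, ranks = {}, {}
    for line in lines:
        fields = line.split('\t|\t')
        tax_id, parent_id = int(fields[0]), int(fields[1])
        ranks[tax_id] = fields[2].strip()
        if tax_id != parent_id:  # NCBI marks the root as its own parent
            children.setdefault(parent_id, []).append(tax_id)
    return children, ranks


def sketch_ranked_descendants(children, ranks, node, rank):
    """Return all descendants of `node` (excluding node itself) with this rank."""
    found, stack = [], list(children.get(node, []))
    while stack:
        current = stack.pop()
        if ranks[current] == rank:
            found.append(current)
        stack.extend(children.get(current, []))
    return found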

if __name__ == '__main__':
    main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_nexus.py
"""Unit tests for the Nexus Parser
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.nexus import get_tree_info, parse_nexus_tree, parse_PAUP_log, \
    split_tree_info, parse_trans_table, parse_dnd, get_BL_table, parse_taxa, \
    find_fields
__author__ = "Catherine Lozupone"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Catherine Lozupone", "Rob Knight", "Micah Hamady"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Catherine Lozupone"
__email__ = "lozupone@colorado.edu"
__status__ = "Production"
Nexus_tree = """#NEXUS
Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM]
[!
>Data file = Grassland_short.nex
>Neighbor-joining search settings:
> Ties (if encountered) will be broken systematically
> Distance measure = Jukes-Cantor
> (Tree is unrooted)
]
\tTranslate
\t\t1 outgroup25,
\t\t2 AF078391l,
\t\t3 AF078211af,
\t\t4 AF078393l,
\t\t5 AF078187af,
\t\t6 AF078320l,
\t\t7 AF078432l,
\t\t8 AF078290af,
\t\t9 AF078350l,
\t\t10 AF078356l,
\t\t11 AF078306af,
\t\t12 AF078429l,
\t\t13 AF078256af,
\t\t14 AF078443l,
\t\t15 AF078450l,
\t\t16 AF078452l,
\t\t17 AF078258af,
\t\t18 AF078380l,
\t\t19 AF078251af,
\t\t20 AF078179af,
\t\t21 outgroup258
\t\t;
tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);
tree PAUP_2 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);
End;""".split('\n')
Nexus_tree_2 = """#NEXUS
Begin trees; [Treefile saved Wednesday, June 14, 2006 11:20 AM]
[!>Neighbor-joining search settings:
> Ties (if encountered) will be broken systematically
> Distance measure = uncorrected ("p")
> (Tree is unrooted)
]
tree nj = [&U] ((((((((((YA10260L1:0.01855,SARAG06_Y:0.00367):0.01965,(((YA270L1G0:0.01095,SARAD10_Y:0.00699):0.01744,YA270L1A0:0.04329):0.00028,((YA165L1C1:0.01241,SARAA02_Y:0.02584):0.00213,((YA165L1H0:0.00092,SARAF10_Y:-0.00092):0.00250,(YA165L1A0:0.00177,SARAH10_Y:0.01226):0.00198):0.00131):0.00700):0.01111):0.11201,(YA160L1F0:0.00348,SARAG01_Y:-0.00122):0.13620):0.01202,((((YRM60L1D0:0.00357,(YRM60L1C0:0.00477,SARAE10_Y:-0.00035):0.00086):0.00092,SARAE03_Y:0.00126):0.00125,SARAC11_Y:0.00318):0.00160,YRM60L1H0:0.00593):0.09975):0.07088,SARAA01_Y:0.02880):0.00190,SARAB04_Y:0.05219):0.00563,YRM60L1E0:0.06099):0.00165,(YRM60L1H0:0.00450,SARAF11_Y:0.01839):0.00288):0.00129,YRM60L1B1:0.00713):0.00194,(YRM60L1G0:0.00990,(YA165L1G0:0.00576,(YA160L1G0:0.01226,SARAA11_Y:0.00389):0.00088):0.00300):0.00614,SARAC06_Y:0.00381);
end;""".split('\n')
Nexus_tree_3 = """#NEXUS
Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM]
[!
>Data file = Grassland_short.nex
>Neighbor-joining search settings:
> Ties (if encountered) will be broken systematically
> Distance measure = Jukes-Cantor
> (Tree is unrooted)
]
Translate
1 outgroup25,
2 AF078391l,
3 'AF078211af',
4 AF078393l,
5 AF078187af,
6 AF078320l,
7 AF078432l,
8 AF078290af,
9 AF078350l,
10 AF078356l,
11 AF078306af,
12 AF078429l,
13 AF078256af,
14 'AF078443l',
15 AF078450l,
16 AF078452l,
17 AF078258af,
18 'AF078380l',
19 AF078251af,
20 AF078179af,
21 outgroup258
;
tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);
tree PAUP_2 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);
End;""".split('\n')
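
# Hypothetical sketch (not cogent's parse_trans_table/parse_dnd) of the two
# pieces of a Nexus trees block that the tests below exercise: a Translate
# table mapping numbers to taxon names (single quotes removed, as
# test_parse_nexus_tree_sq expects), and 'tree NAME = [&R] newick;' lines
# keyed by 'tree NAME' with the bracketed rooting comment dropped.
def sketch_parse_translate(lines):
    """Map '1 outgroup25,'-style entries to {'1': 'outgroup25'}."""
    table = {}
    for line in lines:
        entry = line.strip().rstrip(',;').strip()
        if not entry:  # e.g. the closing ';' line
            continue
        number, name = entry.split(None, 1)
        table[number] = name.strip("'")
    return table


def sketch_parse_tree_lines(lines):
    """Map 'tree NAME = [&R] (...);' lines to {'tree NAME': '(...);'}."""
    trees = {}
    for line in lines:
        name, _, rest = line.strip().partition(' = ')
        rest = rest.strip()
        if rest.startswith('['):  # drop a comment such as [&R] or [&U]
            rest = rest[rest.index(']') + 1:].strip()
        trees[name] = rest
    return trees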
PAUP_log = """
P A U P *
Version 4.0b10 for Macintosh (PPC/Altivec)
Wednesday, May 5, 2004 5:03 PM
This copy registered to: Scott Dawson
UC-Berkeley
(serial number = B400784)
-----------------------------NOTICE-----------------------------
This is a beta-test version. Please report any crashes,
apparent calculation errors, or other anomalous results.
There are no restrictions on publication of results obtained
with this version, but you should check the WWW site
frequently for bug announcements and/or updated versions.
See the README file on the distribution media for details.
----------------------------------------------------------------
Tree description:
Optimality criterion = parsimony
Character-status summary:
Of 500 total characters:
All characters are of type 'unord'
All characters have equal weight
253 characters are constant
109 variable characters are parsimony-uninformative
Number of parsimony-informative characters = 138
Multistate taxa interpreted as uncertainty
Character-state optimization: Accelerated transformation (ACCTRAN)
AncStates = "standard"
Tree number 1 (rooted using user-specified outgroup)
Branch lengths and linkages for tree #1
Assigned Minimum Maximum
Connected branch possible possible
Node to node length length length
-------------------------------------------------------------------------
40 root 0 0 0
outgroup25 (1)* 40 40 24 52
39 40 57 15 72
AF078391l (2) 39 56 48 81
38 39 33 17 71
37 38 31 14 48
22 37 20 11 33
AF078211af (3) 22 4 2 7
AF078393l (4) 22 1 0 3
36 37 14 5 32
AF078187af (5) 36 18 10 28
35 36 21 16 45
34 35 10 3 23
26 34 5 3 9
24 26 4 3 13
23 24 0 0 3
AF078320l (6) 23 1 1 3
AF078356l (10) 23 2 2 2
AF078350l (9) 24 5 3 5
25 26 9 2 10
AF078306af (11) 25 6 4 10
AF078380l (18) 25 5 3 10
33 34 5 4 15
29 33 3 1 4
28 29 2 2 2
27 28 3 1 3
AF078432l (7) 27 2 2 2
AF078450l (15) 27 3 3 4
AF078251af (19) 28 6 6 7
AF078258af (17) 29 6 6 6
32 33 4 3 15
AF078290af (8) 32 9 8 11
31 32 9 6 18
AF078429l (12) 31 2 1 5
30 31 10 9 18
AF078443l (14) 30 2 1 6
AF078452l (16) 30 4 4 5
AF078256af (13) 35 4 1 6
AF078179af (20) 38 48 34 79
outgroup258 (21)* 40 45 27 67
-------------------------------------------------------------------------
Sum 509
Tree length = 509
Consistency index (CI) = 0.7151
Homoplasy index (HI) = 0.2849
""".split('\n')
line1 = " 40 root 0 0 0"
line2 = "outgroup25 (1)* 40 40 24 52"
line3 = " 39 40 57 15 72"
line4 = "AF078391l (2) 39 56 48 81"
class NexusParserTests(TestCase):
    """Tests of the Nexus Parser functions"""

    def test_parse_nexus_tree(self):
        """parse_nexus_tree returns a translation table dict and a dict of dnd strings"""
        Trans_table, dnd = parse_nexus_tree(Nexus_tree)
        #check the full dendrogram string is returned
        self.assertEqual(dnd['tree PAUP_1'],\
            "(1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);")
        #check that all taxa are returned in the Trans_table
        self.assertEqual(Trans_table['1'], 'outgroup25')
        self.assertEqual(Trans_table['2'], 'AF078391l')
        self.assertEqual(Trans_table['3'], 'AF078211af')
        self.assertEqual(Trans_table['4'], 'AF078393l')
        self.assertEqual(Trans_table['5'], 'AF078187af')
        self.assertEqual(Trans_table['6'], 'AF078320l')
        self.assertEqual(Trans_table['21'], 'outgroup258')
        self.assertEqual(Trans_table['20'], 'AF078179af')
        self.assertEqual(Trans_table['19'], 'AF078251af')
        #check that Nexus files without translation table work
        Trans_table, dnd = parse_nexus_tree(Nexus_tree_2)
        self.assertEqual(Trans_table, None)
        self.assertEqual(dnd['tree nj'], '((((((((((YA10260L1:0.01855,SARAG06_Y:0.00367):0.01965,(((YA270L1G0:0.01095,SARAD10_Y:0.00699):0.01744,YA270L1A0:0.04329):0.00028,((YA165L1C1:0.01241,SARAA02_Y:0.02584):0.00213,((YA165L1H0:0.00092,SARAF10_Y:-0.00092):0.00250,(YA165L1A0:0.00177,SARAH10_Y:0.01226):0.00198):0.00131):0.00700):0.01111):0.11201,(YA160L1F0:0.00348,SARAG01_Y:-0.00122):0.13620):0.01202,((((YRM60L1D0:0.00357,(YRM60L1C0:0.00477,SARAE10_Y:-0.00035):0.00086):0.00092,SARAE03_Y:0.00126):0.00125,SARAC11_Y:0.00318):0.00160,YRM60L1H0:0.00593):0.09975):0.07088,SARAA01_Y:0.02880):0.00190,SARAB04_Y:0.05219):0.00563,YRM60L1E0:0.06099):0.00165,(YRM60L1H0:0.00450,SARAF11_Y:0.01839):0.00288):0.00129,YRM60L1B1:0.00713):0.00194,(YRM60L1G0:0.00990,(YA165L1G0:0.00576,(YA160L1G0:0.01226,SARAA11_Y:0.00389):0.00088):0.00300):0.00614,SARAC06_Y:0.00381);')

    def test_parse_nexus_tree_sq(self):
        """remove single quotes from tree and translate tables"""
        Trans_table, dnd = parse_nexus_tree(Nexus_tree_3)
        #check the full dendrogram string is returned
        self.assertEqual(dnd['tree PAUP_1'],\
            "(1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);")
        #check that all taxa are returned in the Trans_table
        self.assertEqual(Trans_table['1'], 'outgroup25')
        self.assertEqual(Trans_table['2'], 'AF078391l')
        self.assertEqual(Trans_table['3'], 'AF078211af')
        self.assertEqual(Trans_table['4'], 'AF078393l')
        self.assertEqual(Trans_table['5'], 'AF078187af')
        self.assertEqual(Trans_table['6'], 'AF078320l')
        self.assertEqual(Trans_table['21'], 'outgroup258')
        self.assertEqual(Trans_table['20'], 'AF078179af')
        self.assertEqual(Trans_table['19'], 'AF078251af')

    def test_get_tree_info(self):
        """get_tree_info returns the Nexus file section that describes the tree"""
        result = get_tree_info(Nexus_tree)
        self.assertEqual(len(result), 33)
        self.assertEqual(result[0],\
            "Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM]")
        self.assertEqual(result[31], \
            "tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);")

    def test_split_tree_info(self):
        """split_tree_info splits lines into header, Trans_table, and dnd"""
        tree_info = get_tree_info(Nexus_tree)
        header, trans_table, dnd = split_tree_info(tree_info)
        self.assertEqual(len(header), 9)
        self.assertEqual(len(trans_table), 22)
        self.assertEqual(len(dnd), 2)
        self.assertEqual(header[0],\
            "Begin trees; [Treefile saved Wednesday, May 5, 2004 5:02 PM]")
        self.assertEqual(header[8], "\tTranslate")
        self.assertEqual(trans_table[0], "\t\t1 outgroup25,")
        self.assertEqual(trans_table[21], "\t\t;")
        self.assertEqual(dnd[0], \
            "tree PAUP_1 = [&R] (1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);")

    def test_parse_trans_table(self):
        """parse_trans_table returns a dict with the taxa names indexed by number"""
        tree_info = get_tree_info(Nexus_tree)
        header, trans_table, dnd = split_tree_info(tree_info)
        Trans_table = parse_trans_table(trans_table)
        self.assertEqual(len(Trans_table), 21)
        #check that taxa are returned in the Trans_table
        self.assertEqual(Trans_table['1'], 'outgroup25')
        self.assertEqual(Trans_table['2'], 'AF078391l')
        self.assertEqual(Trans_table['3'], 'AF078211af')
        self.assertEqual(Trans_table['4'], 'AF078393l')
        self.assertEqual(Trans_table['5'], 'AF078187af')
        self.assertEqual(Trans_table['6'], 'AF078320l')
        self.assertEqual(Trans_table['21'], 'outgroup258')
        self.assertEqual(Trans_table['20'], 'AF078179af')
        self.assertEqual(Trans_table['19'], 'AF078251af')

    def test_parse_dnd(self):
        """parse_dnd returns a dict with dnd indexed by tree name"""
        tree_info = get_tree_info(Nexus_tree)
        header, trans_table, dnd = split_tree_info(tree_info)
        dnd_dict = parse_dnd(dnd)
        self.assertEqual(dnd_dict['tree PAUP_1'],\
            "(1,(2,(((3,4),(5,(((((6,10),9),(11,18)),((((7,15),19),17),(8,(12,(14,16))))),13))),20)),21);")

    #------------------------------------------------------
    def test_get_BL_table(self):
        """get_BL_table returns the section of the log file w/ the BL table"""
        BL_table = get_BL_table(PAUP_log)
        self.assertEqual(len(BL_table), 40)
        self.assertEqual(BL_table[0], \
            " 40 root 0 0 0")
        self.assertEqual(BL_table[39], \
            "outgroup258 (21)* 40 45 27 67")

    def test_find_fields(self):
        """find_fields takes BL table line and returns field names mapped to info"""
        result = find_fields(line1)
        self.assertEqual(result['taxa'], "40")
        self.assertEqual(result['bl'], "0")
        self.assertEqual(result['parent'], "root")

    def test_parse_taxa(self):
        """parse_taxa should return the taxa # from a taxa_field from find_fields"""
        result1 = find_fields(line1)
        result2 = find_fields(line2)
        result3 = find_fields(line3)
        result4 = find_fields(line4)
        self.assertEqual(parse_taxa(result1["taxa"]), '40')
        self.assertEqual(parse_taxa(result2["taxa"]), '1')
        self.assertEqual(parse_taxa(result3["taxa"]), '39')
        self.assertEqual(parse_taxa(result4["taxa"]), '2')

    def test_parse_PAUP_log(self):
        """parse_PAUP_log extracts branch length info from a PAUP log file"""
        BL_dict = parse_PAUP_log(PAUP_log)
        self.assertEqual(len(BL_dict), 40)
        self.assertEqual(BL_dict['1'], ('40', 40))
        self.assertEqual(BL_dict['40'], ('root', 0))
        self.assertEqual(BL_dict['39'], ('40', 57))
        self.assertEqual(BL_dict['2'], ('39', 56))
        self.assertEqual(BL_dict['26'], ('34', 5))
        self.assertEqual(BL_dict['21'], ('40', 45))
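
# Hypothetical sketch (not cogent's find_fields/parse_taxa/parse_PAUP_log) of
# reading one row of PAUP's "Branch lengths and linkages" table. Assumption:
# each row ends with four whitespace-separated columns (connected node,
# assigned length, minimum, maximum), and everything before them names the
# node: either an internal node number or a taxon label followed by its
# number in parentheses, e.g. 'outgroup25 (1)*'.
import re


def sketch_parse_bl_row(line):
    """Return (node number, connected node, assigned branch length)."""
    tokens = line.split()
    node, parent, length = ' '.join(tokens[:-4]), tokens[-4], tokens[-3]
    match = re.search(r'\((\d+)\)', node)
    node_number = match.group(1) if match else node
    return node_number, parent, int(length)


def sketch_parse_bl_table(rows):
    """Map node number -> (connected node, branch length) for each row."""
    return dict((num, (parent, length))
                for num, parent, length in map(sketch_parse_bl_row, rows))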

#run if called from command line
if __name__ == '__main__':
    main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_nupack.py
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.nupack import nupack_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class NupackParserTest(TestCase):
    """Provides tests for NUPACK RNA secondary structure format parsers"""

    def setUp(self):
        """Setup function"""
        #output
        self.nupack_out = NUPACK
        #expected
        self.nupack_exp = [['GGCUAGUCCCUUCU', [(0, 9), (1, 8), (4, 13), (5, 12)], -23.30]]

    def test_nupack_output(self):
        """Test for nupack format"""
        obs = nupack_parser(self.nupack_out)
        self.assertEqual(obs, self.nupack_exp)

NUPACK = ['****************************************************************\n', 'NUPACK 1.2\n',
'Copyright 2007-2009 2003, 2004 by Robert M. Dirks & Niles A. Pierce\n',
'California Institute of Technology\n', 'Pasadena, CA 91125 USA\n', '\n',
'Last Modified: 03/18/2004\n',
'****************************************************************\n', '\n',
'\n', 'Fold.out Version 1.2: Complexity O(N^5) (pseudoknots enabled)\n',
'Reading Input File...\n', 'Sequence Read.\n', 'Energy Parameters Loaded\n',
'SeqLength = 14\n', 'Sequence and a Minimum Energy Structure:\n',
'GGCUAGUCCCUUCU\n', '((..{{..))..}}\n', 'mfe = -23.30 kcal/mol\n',
'pseudoknotted!\n']
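
# Hypothetical sketch (not cogent's nupack_parser): the expected pairs in
# NupackParserTest can be derived by matching each closing bracket with the
# most recent unmatched opening bracket of the same type. One stack per
# bracket type lets '{...}' pseudoknot pairs cross '(...)' pairs. Assumption:
# the structure is balanced (an unmatched closer raises IndexError), and the
# sequence, structure and 'mfe = ... kcal/mol' lines directly follow the
# 'Sequence and a Minimum Energy Structure:' header, as in NUPACK above.
def sketch_bracket_pairs(structure, brackets=('()', '[]', '{}', '<>')):
    """Return sorted (open, close) index pairs for a dot-bracket string."""
    closers = dict((b[1], b[0]) for b in brackets)
    stacks = dict((b[0], []) for b in brackets)
    pairs = []
    for i, char in enumerate(structure):
        if char in stacks:
            stacks[char].append(i)
        elif char in closers:
            pairs.append((stacks[closers[char]].pop(), i))
    return sorted(pairs)


def sketch_nupack_mfe(lines):
    """Return [sequence, pairs, energy] from NUPACK-style output lines."""
    lines = [line.strip() for line in lines]
    start = lines.index('Sequence and a Minimum Energy Structure:')
    seq, struct, mfe_line = lines[start + 1:start + 4]
    energy = float(mfe_line.split('=')[1].split()[0])
    return [seq, sketch_bracket_pairs(struct), energy]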

if __name__ == '__main__':
    main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_pamlmatrix.py
from StringIO import StringIO
from cogent.util.unit_test import TestCase, main
from cogent.evolve.models import DSO78_matrix, DSO78_freqs
from cogent.parse.paml_matrix import PamlMatrixParser
__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"
data = """
27
98 32
120 0 905
36 23 0 0
89 246 103 134 0
198 1 148 1153 0 716
240 9 139 125 11 28 81
23 240 535 86 28 606 43 10
65 64 77 24 44 18 61 0 7
41 15 34 0 0 73 11 7 44 257
26 464 318 71 0 153 83 27 26 46 18
72 90 1 0 0 114 30 17 0 336 527 243
18 14 14 0 0 0 0 15 48 196 157 0 92
250 103 42 13 19 153 51 34 94 12 32 33 17 11
409 154 495 95 161 56 79 234 35 24 17 96 62 46 245
371 26 229 66 16 53 34 30 22 192 33 136 104 13 78 550
0 201 23 0 0 0 0 0 27 0 46 0 0 76 0 75 0
24 8 95 0 96 0 22 0 127 37 28 13 0 698 0 34 42 61
208 24 15 18 49 35 37 54 44 889 175 10 258 12 48 30 157 0 28
0.087127 0.040904 0.040432 0.046872 0.033474 0.038255 0.049530
0.088612 0.033618 0.036886 0.085357 0.080482 0.014753 0.039772
0.050680 0.069577 0.058542 0.010494 0.029916 0.064718
Ala Arg Asn Asp Cys Gln Glu Gly His Ile Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
S_ij = S_ji and PI_i for the Dayhoff model, with the rate Q_ij=S_ij*PI_j
The rest of the file is not used.
Prepared by Z. Yang, March 1995.
See the following reference for notation used here:
Yang, Z., R. Nielsen and M. Hasegawa. 1998. Models of amino acid substitution and
applications to mitochondrial protein evolution. Mol. Biol. Evol. 15:1600-1611.
"""
class TestParsePamlMatrix(TestCase):
    def test_parse(self):
        matrix, freqs = PamlMatrixParser(StringIO(data))
        self.assertEqual(DSO78_matrix, matrix)
        self.assertEqual(DSO78_freqs, freqs)
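
# Hypothetical sketch (not cogent's PamlMatrixParser). As the notes inside
# `data` say, the file gives the lower triangle of a symmetric matrix S
# (S_ij = S_ji) followed by the frequencies PI, and the off-diagonal rates are
# Q_ij = S_ij * PI_j. Filling each diagonal entry so its row sums to zero is
# the usual rate-matrix convention; that step is an assumption here, not
# something stated in the file.
def sketch_symmetric_from_lower(rows):
    """Expand lower-triangle rows (row i holds S_i0 .. S_i,i-1) to a full matrix."""
    n = len(rows) + 1
    S = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(rows, 1):
        for j, value in enumerate(row):
            S[i][j] = S[j][i] = float(value)
    return S


def sketch_rate_matrix(S, pi):
    """Return Q with Q_ij = S_ij * pi_j off the diagonal and zero row sums."""
    n = len(pi)
    Q = [[S[i][j] * pi[j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        Q[i][i] = -sum(Q[i])
    return Q

# Because S is symmetric, pi_i * Q_ij == pi_j * Q_ji, i.e. the model is
# time-reversible.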

if __name__ == "__main__":
    main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_pdb.py
"""Unit tests for the pdb parser.
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.pdb import dict2pdb, dict2ter, pdb2dict, get_symmetry, \
    get_coords_offset, get_trailer_offset, \
    parse_header, parse_coords, parse_trailer, \
    PDBParser
from cogent.core.entity import Structure
from cogent.core.entity import StructureBuilder
from numpy import array, allclose
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Production"
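
# Hypothetical sketch (not cogent's parse_coords): ATOM and HETATM records in
# the PDB format are fixed-column, so fields are read by slicing, not by
# splitting on whitespace. Column ranges follow the PDB coordinate-section
# specification (1-based: 1-6 record name, 7-11 serial, 13-16 atom name,
# 17 altLoc, 18-20 residue name, 22 chain, 23-26 residue number, 27 insertion
# code, 31-54 x/y/z, 55-60 occupancy, 61-66 temperature factor, 77-78
# element). The function names are illustrative only.
def sketch_format_atom_record(record, serial, name, alt_loc, res_name,
                              chain_id, res_seq, i_code, xyz, occupancy,
                              temp_factor, element):
    """Build one ATOM/HETATM line; `name` must already span 4 columns, e.g. ' CA '."""
    return ("%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f"
            "          %2s" % (record, serial, name, alt_loc, res_name,
                               chain_id, res_seq, i_code, xyz[0], xyz[1],
                               xyz[2], occupancy, temp_factor, element))


def sketch_parse_atom_record(line):
    """Split one fixed-column ATOM/HETATM line into named fields."""
    return {
        'record': line[0:6].strip(),
        'serial': int(line[6:11]),
        'name': line[12:16].strip(),
        'alt_loc': line[16:17],
        'res_name': line[17:20].strip(),
        'chain_id': line[21:22],
        'res_seq': int(line[22:26]),
        'i_code': line[26:27],
        'coords': (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        'occupancy': float(line[54:60]),
        'temp_factor': float(line[60:66]),
        'element': line[76:78].strip(),
    }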
class pdbTests(TestCase):
    """Tests of cogent.parse.pdb functions."""

    def test_PDBParser(self):
        """tests the UI parsing function.
        """
        fh = open('data/2E12.pdb')
        structure = PDBParser(fh, 'JUNK')
        assert type(structure) is Structure
        assert len(structure) == 1
        assert (0,) in structure
        assert structure.header['space_group'] == 'P 21 21 21'
        assert structure.header['experiment_type'] == 'X-RAY DIFFRACTION'
        assert structure.header['r_free'] == '0.280'
        assert structure.header['dbref_acc'] == 'Q8P4R5'
        assert structure.header['cryst1'] == '49.942 51.699 82.120 90.00 90.00 90.00'
        assert structure.header['matthews'] == '2.29'
        model = structure[(0,)]
        assert len(model) == 2
        assert structure.raw_header
        assert structure.raw_trailer
        assert structure.header
        assert structure.trailer == {}
        assert structure.getId() == ('JUNK', )

    def test_parse_trailer(self):
        """testing trailer parsing dummy."""
        d = parse_trailer(None)
        assert isinstance(d, dict)

    def test_parse_coords(self):
        """testing minimal structure building and coords parsing.
        """
        builder = StructureBuilder()
        builder.initStructure('JUNK')
        # ATOM/HETATM records are fixed-column, so the spacing below matters
        atom = 'ATOM     10  CA  PRO A   2      51.588  38.262  31.417  1.00  6.58           C  \n'
        hetatm = 'HETATM 1633  O   HOH B 164      17.979  35.529  38.171  1.00  1.02           O  \n'
        lines = ['MODEL ', atom, hetatm]
        z = parse_coords(builder, lines)
        assert len(z[(0,)]) == 2
        assert len(z[(0,)][('A',)]) == 1
        assert len(z[(0,)][('B',)]) == 1
        z.setTable()
        atom1 = z.table['A'][('JUNK', 0, 'A', ('PRO', 2, ' '), ('CA', ' '))]
        hetatm1 = z.table['A'][('JUNK', 0, 'B', ('H_HOH', 164, ' '), ('O', ' '))]
        self.assertAlmostEqual([51.588, 38.262, 31.417][2], list(atom1.coords)[2])
        self.assertAlmostEqual([17.979, 35.529, 38.171][2], list(hetatm1.coords)[2])

    def test_parse_header(self):
        """testing header parsing.
        """
        header = ['HEADER TRANSLATION 17-OCT-06 2E12 \n',
'TITLE THE CRYSTAL STRUCTURE OF XC5848 FROM XANTHOMONAS CAMPESTRIS \n',
'TITLE 2 ADOPTING A NOVEL VARIANT OF SM-LIKE MOTIF \n',
'COMPND MOL_ID: 1; \n',
'COMPND 2 MOLECULE: HYPOTHETICAL PROTEIN XCC3642; \n',
'COMPND 3 CHAIN: A, B; \n',
'COMPND 4 SYNONYM: SM-LIKE MOTIF; \n',
'COMPND 5 ENGINEERED: YES \n',
'SOURCE MOL_ID: 1; \n',
'SOURCE 2 ORGANISM_SCIENTIFIC: XANTHOMONAS CAMPESTRIS PV. CAMPESTRIS; \n',
'SOURCE 3 ORGANISM_TAXID: 340; \n',
'SOURCE 4 STRAIN: PV. CAMPESTRIS; \n',
'SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI; \n',
'SOURCE 6 EXPRESSION_SYSTEM_TAXID: 562 \n',
'KEYWDS NOVEL SM-LIKE MOTIF, LSM MOTIF, XANTHOMONAS CAMPESTRIS, X- \n',
'KEYWDS 2 RAY CRYSTALLOGRAPHY, TRANSLATION \n',
'EXPDTA X-RAY DIFFRACTION \n',
'AUTHOR K.-H.CHIN,S.-K.RUAN,A.H.-J.WANG,S.-H.CHOU \n',
'REVDAT 2 24-FEB-09 2E12 1 VERSN \n',
'REVDAT 1 30-OCT-07 2E12 0 \n',
'JRNL AUTH K.-H.CHIN,S.-K.RUAN,A.H.-J.WANG,S.-H.CHOU \n',
'JRNL TITL XC5848, AN ORFAN PROTEIN FROM XANTHOMONAS \n',
'JRNL TITL 2 CAMPESTRIS, ADOPTS A NOVEL VARIANT OF SM-LIKE MOTIF \n',
'JRNL REF PROTEINS V. 68 1006 2007 \n',
'JRNL REFN ISSN 0887-3585 \n',
'JRNL PMID 17546661 \n',
'JRNL DOI 10.1002/PROT.21375 \n',
'REMARK 1 \n',
'REMARK 2 \n',
'REMARK 2 RESOLUTION. 1.70 ANGSTROMS. \n',
'REMARK 3 \n',
'REMARK 3 REFINEMENT. \n',
'REMARK 3 PROGRAM : CNS \n',
'REMARK 3 AUTHORS : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE- \n',
'REMARK 3 : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES, PANNU, \n',
'REMARK 3 : READ,RICE,SIMONSON,WARREN \n',
'REMARK 3 \n',
'REMARK 3 REFINEMENT TARGET : NULL \n',
'REMARK 3 \n',
'REMARK 3 DATA USED IN REFINEMENT. \n',
'REMARK 3 RESOLUTION RANGE HIGH (ANGSTROMS) : 1.70 \n',
'REMARK 3 RESOLUTION RANGE LOW (ANGSTROMS) : 30.00 \n',
'REMARK 3 DATA CUTOFF (SIGMA(F)) : 5.000 \n',
'REMARK 3 DATA CUTOFF HIGH (ABS(F)) : NULL \n',
'REMARK 3 DATA CUTOFF LOW (ABS(F)) : NULL \n',
'REMARK 3 COMPLETENESS (WORKING+TEST) (%) : 99.1 \n',
'REMARK 3 NUMBER OF REFLECTIONS : 6937 \n',
'REMARK 3 \n',
'REMARK 3 FIT TO DATA USED IN REFINEMENT. \n',
'REMARK 3 CROSS-VALIDATION METHOD : THROUGHOUT \n',
'REMARK 3 FREE R VALUE TEST SET SELECTION : RANDOM \n',
'REMARK 3 R VALUE (WORKING SET) : 0.220 \n',
'REMARK 3 FREE R VALUE : 0.280 \n',
'REMARK 3 FREE R VALUE TEST SET SIZE (%) : NULL \n',
'REMARK 3 FREE R VALUE TEST SET COUNT : NULL \n',
'REMARK 3 ESTIMATED ERROR OF FREE R VALUE : NULL \n',
'REMARK 3 \n',
'REMARK 3 FIT IN THE HIGHEST RESOLUTION BIN. \n',
'REMARK 3 TOTAL NUMBER OF BINS USED : NULL \n',
'REMARK 3 BIN RESOLUTION RANGE HIGH (A) : 1.70 \n',
'REMARK 3 BIN RESOLUTION RANGE LOW (A) : 1.75 \n',
'REMARK 3 BIN COMPLETENESS (WORKING+TEST) (%) : 97.00 \n',
'REMARK 3 REFLECTIONS IN BIN (WORKING SET) : NULL \n',
'REMARK 3 BIN R VALUE (WORKING SET) : 0.2400 \n',
'REMARK 3 BIN FREE R VALUE : 0.2200 \n',
'REMARK 3 BIN FREE R VALUE TEST SET SIZE (%) : NULL \n',
'REMARK 3 BIN FREE R VALUE TEST SET COUNT : NULL \n',
'REMARK 3 ESTIMATED ERROR OF BIN FREE R VALUE : 0.012 \n',
'REMARK 3 \n',
'REMARK 3 NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT. \n',
'REMARK 3 PROTEIN ATOMS : 1512 \n',
'REMARK 3 NUCLEIC ACID ATOMS : 0 \n',
'REMARK 3 HETEROGEN ATOMS : 0 \n',
'REMARK 3 SOLVENT ATOMS : 122 \n',
'REMARK 3 \n',
'REMARK 3 B VALUES. \n',
'REMARK 3 FROM WILSON PLOT (A**2) : 24.00 \n',
'REMARK 3 MEAN B VALUE (OVERALL, A**2) : NULL \n',
'REMARK 3 OVERALL ANISOTROPIC B VALUE. \n',
'REMARK 3 B11 (A**2) : NULL \n',
'REMARK 3 B22 (A**2) : NULL \n',
'REMARK 3 B33 (A**2) : NULL \n',
'REMARK 3 B12 (A**2) : NULL \n',
'REMARK 3 B13 (A**2) : NULL \n',
'REMARK 3 B23 (A**2) : NULL \n',
'REMARK 3 \n',
'REMARK 3 ESTIMATED COORDINATE ERROR. \n',
'REMARK 3 ESD FROM LUZZATI PLOT (A) : NULL \n',
'REMARK 3 ESD FROM SIGMAA (A) : NULL \n',
'REMARK 3 LOW RESOLUTION CUTOFF (A) : NULL \n',
'REMARK 3 \n',
'REMARK 3 CROSS-VALIDATED ESTIMATED COORDINATE ERROR. \n',
'REMARK 3 ESD FROM C-V LUZZATI PLOT (A) : NULL \n',
'REMARK 3 ESD FROM C-V SIGMAA (A) : NULL \n',
'REMARK 3 \n',
'REMARK 3 RMS DEVIATIONS FROM IDEAL VALUES. \n',
'REMARK 3 BOND LENGTHS (A) : 0.007 \n',
'REMARK 3 BOND ANGLES (DEGREES) : 1.32 \n',
'REMARK 3 DIHEDRAL ANGLES (DEGREES) : NULL \n',
'REMARK 3 IMPROPER ANGLES (DEGREES) : NULL \n',
'REMARK 3 \n',
'REMARK 3 ISOTROPIC THERMAL MODEL : NULL \n',
'REMARK 3 \n',
'REMARK 3 ISOTROPIC THERMAL FACTOR RESTRAINTS. RMS SIGMA \n',
'REMARK 3 MAIN-CHAIN BOND (A**2) : NULL ; NULL \n',
'REMARK 3 MAIN-CHAIN ANGLE (A**2) : NULL ; NULL \n',
'REMARK 3 SIDE-CHAIN BOND (A**2) : NULL ; NULL \n',
'REMARK 3 SIDE-CHAIN ANGLE (A**2) : NULL ; NULL \n',
'REMARK 3 \n',
'REMARK 3 BULK SOLVENT MODELING. \n',
'REMARK 3 METHOD USED : NULL \n',
'REMARK 3 KSOL : NULL \n',
'REMARK 3 BSOL : NULL \n',
'REMARK 3 \n',
'REMARK 3 NCS MODEL : NULL \n',
'REMARK 3 \n',
'REMARK 3 NCS RESTRAINTS. RMS SIGMA/WEIGHT \n',
'REMARK 3 GROUP 1 POSITIONAL (A) : NULL ; NULL \n',
'REMARK 3 GROUP 1 B-FACTOR (A**2) : NULL ; NULL \n',
'REMARK 3 \n',
'REMARK 3 PARAMETER FILE 1 : NULL \n',
'REMARK 3 TOPOLOGY FILE 1 : NULL \n',
'REMARK 3 \n',
'REMARK 3 OTHER REFINEMENT REMARKS: NULL \n',
'REMARK 4 \n',
'REMARK 4 2E12 COMPLIES WITH FORMAT V. 3.15, 01-DEC-08 \n',
'REMARK 100 \n',
'REMARK 100 THIS ENTRY HAS BEEN PROCESSED BY PDBJ ON 19-OCT-06. \n',
'REMARK 100 THE RCSB ID CODE IS RCSB026092. \n',
'REMARK 200 \n',
'REMARK 200 EXPERIMENTAL DETAILS \n',
'REMARK 200 EXPERIMENT TYPE : X-RAY DIFFRACTION \n',
'REMARK 200 DATE OF DATA COLLECTION : 28-JUL-06 \n',
'REMARK 200 TEMPERATURE (KELVIN) : 100 \n',
'REMARK 200 PH : 8.0 \n',
'REMARK 200 NUMBER OF CRYSTALS USED : 10 \n',
'REMARK 200 \n',
'REMARK 200 SYNCHROTRON (Y/N) : Y \n',
'REMARK 200 RADIATION SOURCE : NSRRC \n',
'REMARK 200 BEAMLINE : BL13B1 \n',
'REMARK 200 X-RAY GENERATOR MODEL : NULL \n',
'REMARK 200 MONOCHROMATIC OR LAUE (M/L) : M \n',
'REMARK 200 WAVELENGTH OR RANGE (A) : 0.96437, 0.97983 \n',
'REMARK 200 MONOCHROMATOR : NULL \n',
'REMARK 200 OPTICS : NULL \n',
'REMARK 200 \n',
'REMARK 200 DETECTOR TYPE : CCD \n',
'REMARK 200 DETECTOR MANUFACTURER : ADSC QUANTUM 315 \n',
'REMARK 200 INTENSITY-INTEGRATION SOFTWARE : DENZO \n',
'REMARK 200 DATA SCALING SOFTWARE : HKL-2000 \n',
'REMARK 200 \n',
'REMARK 200 NUMBER OF UNIQUE REFLECTIONS : 6937 \n',
'REMARK 200 RESOLUTION RANGE HIGH (A) : 1.700 \n',
'REMARK 200 RESOLUTION RANGE LOW (A) : 30.000 \n',
'REMARK 200 REJECTION CRITERIA (SIGMA(I)) : 2.000 \n',
'REMARK 200 \n',
'REMARK 200 OVERALL. \n',
'REMARK 200 COMPLETENESS FOR RANGE (%) : 99.7 \n',
'REMARK 200 DATA REDUNDANCY : 4.500 \n',
'REMARK 200 R MERGE (I) : 0.24000 \n',
'REMARK 200 R SYM (I) : 0.06000 \n',
'REMARK 200 FOR THE DATA SET : 8.0000 \n',
'REMARK 200 \n',
'REMARK 200 IN THE HIGHEST RESOLUTION SHELL. \n',
'REMARK 200 HIGHEST RESOLUTION SHELL, RANGE HIGH (A) : 1.70 \n',
'REMARK 200 HIGHEST RESOLUTION SHELL, RANGE LOW (A) : NULL \n',
'REMARK 200 COMPLETENESS FOR SHELL (%) : 97.5 \n',
'REMARK 200 DATA REDUNDANCY IN SHELL : 4.50 \n',
'REMARK 200 R MERGE FOR SHELL (I) : 0.06000 \n',
'REMARK 200 R SYM FOR SHELL (I) : 0.24000 \n',
'REMARK 200 FOR SHELL : 7.900 \n',
'REMARK 200 \n',
'REMARK 200 DIFFRACTION PROTOCOL: MAD \n',
'REMARK 200 METHOD USED TO DETERMINE THE STRUCTURE: MAD \n',
'REMARK 200 SOFTWARE USED: AMORE \n',
'REMARK 200 STARTING MODEL: NULL \n',
'REMARK 200 \n',
'REMARK 200 REMARK: NULL \n',
'REMARK 280 \n',
'REMARK 280 CRYSTAL \n',
'REMARK 280 SOLVENT CONTENT, VS (%): 46.26 \n',
'REMARK 280 MATTHEWS COEFFICIENT, VM (ANGSTROMS**3/DA): 2.29 \n',
'REMARK 280 \n',
'REMARK 280 CRYSTALLIZATION CONDITIONS: PH 8.0, VAPOR DIFFUSION, SITTING \n',
'REMARK 280 DROP, TEMPERATURE 298K \n',
'REMARK 290 REMARK: NULL \n',
'REMARK 300 \n',
'REMARK 300 BIOMOLECULE: 1 \n',
'REMARK 300 SEE REMARK 350 FOR THE AUTHOR PROVIDED AND/OR PROGRAM \n',
'REMARK 300 GENERATED ASSEMBLY INFORMATION FOR THE STRUCTURE IN \n',
'REMARK 300 THIS ENTRY. THE REMARK MAY ALSO PROVIDE INFORMATION ON \n',
'REMARK 300 BURIED SURFACE AREA. \n',
'REMARK 465 \n',
'REMARK 465 MISSING RESIDUES \n',
'REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE \n',
'REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN \n',
'REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) \n',
'REMARK 465 \n',
'REMARK 465 M RES C SSSEQI \n',
'REMARK 465 LEU A 94 \n',
'REMARK 465 GLY A 95 \n',
'REMARK 465 ALA A 96 \n',
'REMARK 465 PRO A 97 \n',
'REMARK 465 GLN A 98 \n',
'REMARK 465 VAL A 99 \n',
'REMARK 465 MET A 100 \n',
'REMARK 465 PRO A 101 \n',
'REMARK 465 LEU B 94 \n',
'REMARK 465 GLY B 95 \n',
'REMARK 465 ALA B 96 \n',
'REMARK 465 PRO B 97 \n',
'REMARK 465 GLN B 98 \n',
'REMARK 465 VAL B 99 \n',
'REMARK 465 MET B 100 \n',
'REMARK 465 PRO B 101 \n',
'REMARK 500 \n',
'REMARK 500 GEOMETRY AND STEREOCHEMISTRY \n',
'REMARK 500 SUBTOPIC: CLOSE CONTACTS IN SAME ASYMMETRIC UNIT \n',
'REMARK 500 \n',
'REMARK 500 THE FOLLOWING ATOMS ARE IN CLOSE CONTACT. \n',
'REMARK 500 \n',
'REMARK 500 ATM1 RES C SSEQI ATM2 RES C SSEQI DISTANCE \n',
'REMARK 500 O HOH A 127 O HOH A 149 2.05 \n',
'REMARK 500 \n',
'REMARK 500 REMARK: NULL \n',
'REMARK 500 \n',
'REMARK 500 GEOMETRY AND STEREOCHEMISTRY \n',
'REMARK 500 SUBTOPIC: TORSION ANGLES \n',
'REMARK 500 \n',
'REMARK 500 TORSION ANGLES OUTSIDE THE EXPECTED RAMACHANDRAN REGIONS: \n',
'REMARK 500 (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN IDENTIFIER; \n',
'REMARK 500 SSEQ=SEQUENCE NUMBER; I=INSERTION CODE). \n',
'REMARK 500 \n',
'REMARK 500 STANDARD TABLE: \n',
'REMARK 500 FORMAT:(10X,I3,1X,A3,1X,A1,I4,A1,4X,F7.2,3X,F7.2) \n',
'REMARK 500 \n',
'REMARK 500 EXPECTED VALUES: GJ KLEYWEGT AND TA JONES (1996). PHI/PSI- \n',
'REMARK 500 CHOLOGY: RAMACHANDRAN REVISITED. STRUCTURE 4, 1395 - 1400 \n',
'REMARK 500 \n',
'REMARK 500 M RES CSSEQI PSI PHI \n',
'REMARK 500 ASN A 64 -175.84 -178.56 \n',
'REMARK 500 HIS A 71 -156.72 -164.33 \n',
'REMARK 500 LEU A 72 -70.52 -135.73 \n',
'REMARK 500 ALA A 74 -75.47 -29.45 \n',
'REMARK 500 SER A 75 -5.54 -145.62 \n',
'REMARK 500 GLN A 76 -178.36 65.32 \n',
'REMARK 500 GLU A 77 115.33 61.52 \n',
'REMARK 500 MET A 92 -36.89 93.21 \n',
'REMARK 500 LEU B 25 37.31 -77.89 \n',
'REMARK 500 GLN B 28 37.09 32.85 \n',
'REMARK 500 ARG B 30 132.20 -36.84 \n',
'REMARK 500 ASN B 64 -172.78 -175.93 \n',
'REMARK 500 GLN B 76 67.64 34.46 \n',
'REMARK 500 PRO B 91 -156.99 -48.61 \n',
'REMARK 500 MET B 92 -37.52 -160.44 \n',
'REMARK 500 \n',
'REMARK 500 REMARK: NULL \n',
'REMARK 525 \n',
'REMARK 525 SOLVENT \n',
'REMARK 525 \n',
'REMARK 525 THE SOLVENT MOLECULES HAVE CHAIN IDENTIFIERS THAT \n',
'REMARK 525 INDICATE THE POLYMER CHAIN WITH WHICH THEY ARE MOST \n',
'REMARK 525 CLOSELY ASSOCIATED. THE REMARK LISTS ALL THE SOLVENT \n',
'REMARK 525 MOLECULES WHICH ARE MORE THAN 5A AWAY FROM THE \n',
'REMARK 525 NEAREST POLYMER CHAIN (M = MODEL NUMBER; \n',
'REMARK 525 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE \n',
'REMARK 525 NUMBER; I=INSERTION CODE): \n',
'REMARK 525 \n',
'REMARK 525 M RES CSSEQI \n',
'REMARK 525 HOH B 115 DISTANCE = 6.82 ANGSTROMS \n',
'REMARK 525 HOH A 116 DISTANCE = 6.52 ANGSTROMS \n',
'REMARK 525 HOH B 119 DISTANCE = 5.12 ANGSTROMS \n',
'REMARK 525 HOH B 121 DISTANCE = 5.21 ANGSTROMS \n',
'REMARK 525 HOH B 123 DISTANCE = 5.18 ANGSTROMS \n',
'REMARK 525 HOH A 124 DISTANCE = 6.99 ANGSTROMS \n',
'REMARK 525 HOH B 124 DISTANCE = 5.13 ANGSTROMS \n',
'REMARK 525 HOH B 134 DISTANCE = 7.25 ANGSTROMS \n',
'REMARK 525 HOH B 140 DISTANCE = 5.54 ANGSTROMS \n',
'REMARK 525 HOH B 141 DISTANCE = 5.94 ANGSTROMS \n',
'REMARK 525 HOH B 142 DISTANCE = 6.60 ANGSTROMS \n',
'REMARK 525 HOH B 143 DISTANCE = 7.39 ANGSTROMS \n',
'REMARK 525 HOH A 145 DISTANCE = 9.25 ANGSTROMS \n',
'REMARK 525 HOH A 150 DISTANCE = 6.01 ANGSTROMS \n',
'REMARK 525 HOH B 152 DISTANCE = 5.46 ANGSTROMS \n',
'REMARK 525 HOH B 153 DISTANCE = 9.74 ANGSTROMS \n',
'REMARK 525 HOH B 154 DISTANCE = 9.32 ANGSTROMS \n',
'REMARK 525 HOH B 155 DISTANCE = 5.41 ANGSTROMS \n',
'REMARK 525 HOH B 163 DISTANCE = 5.16 ANGSTROMS \n',
'DBREF 2E12 A 1 101 UNP Q8P4R5 Q8P4R5_XANCP 1 101 \n',
'DBREF 2E12 B 1 101 UNP Q8P4R5 Q8P4R5_XANCP 1 101 \n',
'SEQRES 1 A 101 MET PRO LYS TYR ALA PRO HIS VAL TYR THR GLU GLN ALA \n',
'SEQRES 2 A 101 GLN ILE ALA THR LEU GLU HIS TRP VAL LYS LEU LEU ASP \n',
'SEQRES 3 A 101 GLY GLN GLU ARG VAL ARG ILE GLU LEU ASP ASP GLY SER \n',
'SEQRES 4 A 101 MET ILE ALA GLY THR VAL ALA VAL ARG PRO THR ILE GLN \n',
'SEQRES 5 A 101 THR TYR ARG ASP GLU GLN GLU ARG GLU GLY SER ASN GLY \n',
'SEQRES 6 A 101 GLN LEU ARG ILE ASP HIS LEU ASP ALA SER GLN GLU PRO \n',
'SEQRES 7 A 101 GLN TRP ILE TRP MET ASP ARG ILE VAL ALA VAL HIS PRO \n',
'SEQRES 8 A 101 MET PRO LEU GLY ALA PRO GLN VAL MET PRO \n',
'SEQRES 1 B 101 MET PRO LYS TYR ALA PRO HIS VAL TYR THR GLU GLN ALA \n',
'SEQRES 2 B 101 GLN ILE ALA THR LEU GLU HIS TRP VAL LYS LEU LEU ASP \n',
'SEQRES 3 B 101 GLY GLN GLU ARG VAL ARG ILE GLU LEU ASP ASP GLY SER \n',
'SEQRES 4 B 101 MET ILE ALA GLY THR VAL ALA VAL ARG PRO THR ILE GLN \n',
'SEQRES 5 B 101 THR TYR ARG ASP GLU GLN GLU ARG GLU GLY SER ASN GLY \n',
'SEQRES 6 B 101 GLN LEU ARG ILE ASP HIS LEU ASP ALA SER GLN GLU PRO \n',
'SEQRES 7 B 101 GLN TRP ILE TRP MET ASP ARG ILE VAL ALA VAL HIS PRO \n',
'SEQRES 8 B 101 MET PRO LEU GLY ALA PRO GLN VAL MET PRO \n',
'FORMUL 3 HOH *122(H2 O) \n',
'HELIX 1 1 GLU A 11 LEU A 24 1 14 \n',
'HELIX 2 2 GLU B 11 LEU B 25 1 15 \n',
'SHEET 1 A 3 ILE A 51 ARG A 55 0 \n',
'SHEET 2 A 3 GLU A 61 ASP A 70 -1 O ASN A 64 N GLN A 52 \n',
'SHEET 3 A 3 GLN A 79 TRP A 82 -1 O ILE A 81 N LEU A 67 \n',
'SHEET 1 B 5 ILE A 51 ARG A 55 0 \n',
'SHEET 2 B 5 GLU A 61 ASP A 70 -1 O ASN A 64 N GLN A 52 \n',
'SHEET 3 B 5 MET A 40 VAL A 45 -1 N THR A 44 O ASP A 70 \n',
'SHEET 4 B 5 ARG A 30 LEU A 35 -1 N ILE A 33 O ILE A 41 \n',
'SHEET 5 B 5 ILE A 86 PRO A 91 -1 O VAL A 87 N GLU A 34 \n',
'SHEET 1 C 5 PRO B 78 TRP B 82 0 \n',
'SHEET 2 C 5 GLN B 66 ASP B 70 -1 N ILE B 69 O GLN B 79 \n',
'SHEET 3 C 5 MET B 40 VAL B 47 -1 N ALA B 46 O ARG B 68 \n',
'SHEET 4 C 5 VAL B 31 LEU B 35 -1 N ILE B 33 O ILE B 41 \n',
'SHEET 5 C 5 ILE B 86 HIS B 90 -1 O VAL B 87 N GLU B 34 \n',
'SHEET 1 D 2 GLN B 52 ARG B 55 0 \n',
'SHEET 2 D 2 GLU B 61 ASN B 64 -1 O ASN B 64 N GLN B 52 \n',
'CRYST1 49.942 51.699 82.120 90.00 90.00 90.00 P 21 21 21 8 \n']
        correct_header = {
'bio_cmx': [[[('A',), ('B',)], 1]],
'uc_mxs': array([[[ 1. , 0. , 0. , 0. ],\
[ 0. , 1. , 0. , 0. ],\
[ 0. , 0. , 1. , 0. ],\
[ 0. , 0. , 0. , 1. ]],\
[[ -1. , 0. , 0. , 24.971 ],\
[ 0. , -1. , 0. , 0. ],\
[ 0. , 0. , 1. , 41.06 ],\
[ 0. , 0. , 0. , 1. ]],\
[[ -1. , 0. , 0. , 0. ],\
[ 0. , 1. , 0. , 25.8495],\
[ 0. , 0. , -1. , 41.06 ],\
[ 0. , 0. , 0. , 1. ]],\
[[ 1. , 0. , 0. , 24.971 ],\
[ 0. , -1. , 0. , 25.8495],\
[ 0. , 0. , -1. , 0. ],\
[ 0. , 0. , 0. , 1. ]]]), \
'dbref_acc_full': 'Q8P4R5_XANCP', \
'name': 'TRANSLATION', \
'solvent_content': '46.26', \
'expdta': 'X-RAY', \
'bio_mxs': array([[[ 1., 0., 0., 0.],\
[ 0., 1., 0., 0.],\
[ 0., 0., 1., 0.],\
[ 0., 0., 0., 1.]]]),
'uc_omx': array([[ 49.94256605, 0. , 0. ],\
[ 0. , 51.69828879, 0. ],\
[ 0. , 0. , 82.12203334]]), \
'space_group': 'P 21 21 21', 'r_free': '0.280', \
'cryst1': '49.942 51.699 82.120 90.00 90.00 90.00', \
'experiment_type': 'X-RAY DIFFRACTION', \
'uc_fmx': array([[ 0.020023, 0. , 0. ],\
[ 0. , 0.019343, 0. ],\
[ 0. , 0. , 0.012177]]),\
'date': '17-OCT-06', \
'matthews': '2.29', \
'resolution': '1.70', \
'id': '2E12', \
'dbref_acc': 'Q8P4R5'}
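        parsed_header = parse_header(header)
        for key, val in parsed_header.items():
            try:
                assert val == correct_header[key]
            except ValueError:
                # numpy arrays compare element-wise, so compare with a tolerance
                assert allclose(val, correct_header[key])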

    def test_get_trailer_offset(self):
        lines = ['ATOM','CONNECT']
        assert get_trailer_offset(lines) == 1

    def test_get_coords_offset(self):
        lines = ['dummy','ATOM','CONNECT']
        assert get_coords_offset(lines) == 1

    def test_get_symmetry(self):
        """testing parsing of symmetry operators
        """
        lines = [
'REMARK 290 SMTRY1 1 1.000000 0.000000 0.000000 0.00000 \n',
'REMARK 290 SMTRY2 1 0.000000 1.000000 0.000000 0.00000 \n',
'REMARK 290 SMTRY3 1 0.000000 0.000000 1.000000 0.00000 \n',
'REMARK 290 SMTRY1 2 -1.000000 0.000000 0.000000 24.97100 \n',
'REMARK 290 SMTRY2 2 0.000000 -1.000000 0.000000 0.00000 \n',
'REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 41.06000 \n',
'REMARK 290 SMTRY1 3 -1.000000 0.000000 0.000000 0.00000 \n',
'REMARK 290 SMTRY2 3 0.000000 1.000000 0.000000 25.84950 \n',
'REMARK 290 SMTRY3 3 0.000000 0.000000 -1.000000 41.06000 \n',
'REMARK 290 SMTRY1 4 1.000000 0.000000 0.000000 24.97100 \n',
'REMARK 290 SMTRY2 4 0.000000 -1.000000 0.000000 25.84950 \n',
'REMARK 290 SMTRY3 4 0.000000 0.000000 -1.000000 0.00000 \n',
'REMARK 290 \n',
'REMARK 290 REMARK: NULL \n',
'REMARK 350 \n',
'REMARK 350 COORDINATES FOR A COMPLETE MULTIMER REPRESENTING THE KNOWN \n',
'REMARK 350 BIOLOGICALLY SIGNIFICANT OLIGOMERIZATION STATE OF THE \n',
'REMARK 350 MOLECULE CAN BE GENERATED BY APPLYING BIOMT TRANSFORMATIONS \n',
'REMARK 350 GIVEN BELOW. BOTH NON-CRYSTALLOGRAPHIC AND \n',
'REMARK 350 CRYSTALLOGRAPHIC OPERATIONS ARE GIVEN. \n',
'REMARK 350 \n',
'REMARK 350 BIOMOLECULE: 1 \n',
'REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: DIMERIC \n',
'REMARK 350 APPLY THE FOLLOWING TO CHAINS: A, B \n',
'REMARK 350 BIOMT1 1 1.000000 0.000000 0.000000 0.00000 \n',
'REMARK 350 BIOMT2 1 0.000000 1.000000 0.000000 0.00000 \n',
'REMARK 350 BIOMT3 1 0.000000 0.000000 1.000000 0.00000 \n',
'CRYST1 49.942 51.699 82.120 90.00 90.00 90.00 P 21 21 21 8 \n',
'ORIGX1 1.000000 0.000000 0.000000 0.00000 \n',
'ORIGX2 0.000000 1.000000 0.000000 0.00000 \n',
'ORIGX3 0.000000 0.000000 1.000000 0.00000 \n',
'SCALE1 0.020023 0.000000 0.000000 0.00000 \n',
'SCALE2 0.000000 0.019343 0.000000 0.00000 \n',
'SCALE3 0.000000 0.000000 0.012177 0.00000 \n']
        sym = get_symmetry(lines)
        correct_sym = {
'bio_cmx': [[[('A',), ('B',)], 1]],
'uc_mxs': array([[[ 1. , 0. , 0. , 0. ],\
[ 0. , 1. , 0. , 0. ],\
[ 0. , 0. , 1. , 0. ],\
[ 0. , 0. , 0. , 1. ]],\
[[ -1. , 0. , 0. , 24.971 ],\
[ 0. , -1. , 0. , 0. ],\
[ 0. , 0. , 1. , 41.06 ],\
[ 0. , 0. , 0. , 1. ]],\
[[ -1. , 0. , 0. , 0. ],\
[ 0. , 1. , 0. , 25.8495],\
[ 0. , 0. , -1. , 41.06 ],\
[ 0. , 0. , 0. , 1. ]],\
[[ 1. , 0. , 0. , 24.971 ],\
[ 0. , -1. , 0. , 25.8495],\
[ 0. , 0. , -1. , 0. ],\
[ 0. , 0. , 0. , 1. ]]]), \
'bio_mxs': array([[[ 1., 0., 0., 0.],\
[ 0., 1., 0., 0.],\
[ 0., 0., 1., 0.],\
[ 0., 0., 0., 1.]]]),
'uc_omx': array([[ 49.94256605, 0. , 0. ],\
[ 0. , 51.69828879, 0. ],\
[ 0. , 0. , 82.12203334]]), \
'uc_fmx': array([[ 0.020023, 0. , 0. ],\
[ 0. , 0.019343, 0. ],\
[ 0. , 0. , 0.012177]]),}
        for key in sym:
            try:
                assert sym[key] == correct_sym[key]
            except ValueError:
                # numpy arrays compare element-wise, so compare with a tolerance
                assert allclose(sym[key], correct_sym[key])

    def test_dict2pdb(self):
        """testing pdb dict round-trip.
        """
        line = 'ATOM 1 N MET A 1 53.045 42.225 33.724 1.00 2.75 N\n'
        d = pdb2dict(line)
        line2 = dict2pdb(d)
        assert line == line2
        d.pop('coords')
        assert d == {'ser_num': 1, 'res_long_id': ('MET', 1, ' '),
                     'h_flag': ' ',
                     'at_name': ' N ',
                     'at_long_id': ('N', ' '),
                     'bfactor': 2.75, 'chain_id': 'A',
                     'occupancy': 1.0, 'element': ' N',
                     'res_name': 'MET',
                     'seg_id': ' ', 'at_id': 'N',
                     'alt_loc': ' ',
                     'res_ic': ' ',
                     'res_id': 1,
                     'at_type': 'ATOM '}

    def test_dict2ter(self):
        d = {'ser_num': 1, 'chain_id': 'A', 'res_name': 'MET', 'res_ic': ' ',
             'res_id': 1,}
        assert dict2ter(d) == 'TER 2 MET A 1 \n'
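

# A minimal sketch (not cogent's implementation) of how the expected
# 'uc_mxs' and 'uc_omx' values used in the tests above relate to the raw
# records: each group of REMARK 290 SMTRY1/2/3 rows is one operator, padded
# to a 4x4 homogeneous matrix, and 'uc_omx' matches the inverse of the
# SCALE1-3 rotation part. smtry_to_matrices and scale_to_omx are
# hypothetical helper names introduced only for illustration.
import numpy as np


def smtry_to_matrices(lines):
    """Collect 'REMARK 290 SMTRYn op r1 r2 r3 t' rows into 4x4 matrices."""
    ops = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 8 or not fields[2].startswith('SMTRY'):
            continue
        row = int(fields[2][-1]) - 1      # SMTRY1 -> row 0, SMTRY3 -> row 2
        op = int(fields[3])               # operator serial number
        mx = ops.setdefault(op, np.identity(4))  # last row stays [0, 0, 0, 1]
        mx[row, :] = [float(v) for v in fields[4:8]]
    return np.array([ops[k] for k in sorted(ops)])


def scale_to_omx(lines):
    """Return the SCALE (fractionalization) matrix and its inverse."""
    fmx = np.array([[float(v) for v in line.split()[1:4]]
                    for line in lines if line.startswith('SCALE')])
    return fmx, np.linalg.inv(fmx)


example_records = [
    'REMARK 290 SMTRY1 2 -1.000000 0.000000 0.000000 24.97100 \n',
    'REMARK 290 SMTRY2 2 0.000000 -1.000000 0.000000 0.00000 \n',
    'REMARK 290 SMTRY3 2 0.000000 0.000000 1.000000 41.06000 \n',
    'SCALE1 0.020023 0.000000 0.000000 0.00000 \n',
    'SCALE2 0.000000 0.019343 0.000000 0.00000 \n',
    'SCALE3 0.000000 0.000000 0.012177 0.00000 \n',
]
op2 = smtry_to_matrices(example_records)[0]  # translation [24.971, 0, 41.06]
fmx, omx = scale_to_omx(example_records)     # diagonal ~ 49.94, 51.70, 82.12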

if __name__ == '__main__':
    main()
#!/usr/bin/env python
#file tests/test_parse/test_phylip.py
"""Unit tests for the phylip parser
"""
from cogent.parse.phylip import MinimalPhylipParser, get_align_for_phylip
from cogent.parse.record import RecordError
from cogent.util.unit_test import TestCase, main
from StringIO import StringIO
__author__ = "Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Micah Hamady", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Micah Hamady"
__email__ = "hamady@colorado.edu"
__status__ = "Production"

class PhylipGenericTest(TestCase):
    """Setup data for Phylip parsers."""

    def setUp(self):
        """standard files"""
        self.big_interleaved = StringIO("""10 705 I
Cow ATGGCATATCCCATACAACTAGGATTCCAAGATGCAACATCACCAATCATAGAAGAACTA
Carp ATGGCACACCCAACGCAACTAGGTTTCAAGGACGCGGCCATACCCGTTATAGAGGAACTT
Chicken ATGGCCAACCACTCCCAACTAGGCTTTCAAGACGCCTCATCCCCCATCATAGAAGAGCTC
Human ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTACTTCCCCTATCATAGAAGAGCTT
Loach ATGGCACATCCCACACAATTAGGATTCCAAGACGCGGCCTCACCCGTAATAGAAGAACTT
Mouse ATGGCCTACCCATTCCAACTTGGTCTACAAGACGCCACATCCCCTATTATAGAAGAGCTA
Rat ATGGCTTACCCATTTCAACTTGGCTTACAAGACGCTACATCACCTATCATAGAAGAACTT
Seal ATGGCATACCCCCTACAAATAGGCCTACAAGATGCAACCTCTCCCATTATAGAGGAGTTA
Whale ATGGCATATCCATTCCAACTAGGTTTCCAAGATGCAGCATCACCCATCATAGAAGAGCTC
Frog ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTA
CTTCACTTTCATGACCACACGCTAATAATTGTCTTCTTAATTAGCTCATTAGTACTTTAC
CTTCACTTCCACGACCACGCATTAATAATTGTGCTCCTAATTAGCACTTTAGTTTTATAT
GTTGAATTCCACGACCACGCCCTGATAGTCGCACTAGCAATTTGCAGCTTAGTACTCTAC
ATCACCTTTCATGATCACGCCCTCATAATCATTTTCCTTATCTGCTTCCTAGTCCTGTAT
CTTCACTTCCATGACCATGCCCTAATAATTGTATTTTTGATTAGCGCCCTAGTACTTTAT
ATAAATTTCCATGATCACACACTAATAATTGTTTTCCTAATTAGCTCCTTAGTCCTCTAT
ACAAACTTTCATGACCACACCCTAATAATTGTATTCCTCATCAGCTCCCTAGTACTTTAT
CTACACTTCCATGACCACACATTAATAATTGTGTTCCTAATTAGCTCATTAGTACTCTAC
CTACACTTTCACGATCATACACTAATAATCGTTTTTCTAATTAGCTCTTTAGTTCTCTAC
CTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTAC
ATTATTTCACTAATACTAACGACAAAGCTGACCCATACAAGCACGATAGATGCACAAGAA
ATTATTACTGCAATGGTATCAACTAAACTTACTAATAAATATATTCTAGACTCCCAAGAA
CTTCTAACTCTTATACTTATAGAAAAACTATCA---TCAAACACCGTAGATGCCCAAGAA
GCCCTTTTCCTAACACTCACAACAAAACTAACTAATACTAACATCTCAGACGCTCAGGAA
GTTATTATTACAACCGTCTCAACAAAACTCACTAACATATATATTTTGGACTCACAAGAA
ATCATCTCGCTAATATTAACAACAAAACTAACACATACAAGCACAATAGATGCACAAGAA
ATTATTTCACTAATACTAACAACAAAACTAACACACACAAGCACAATAGACGCCCAAGAA
ATTATCTCACTTATACTAACCACGAAACTCACCCACACAAGTACAATAGACGCACAAGAA
ATTATTACCCTAATGCTTACAACCAAATTAACACATACTAGTACAATAGACGCCCAAGAA
ATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAG
GTAGAGACAATCTGAACCATTCTGCCCGCCATCATCTTAATTCTAATTGCTCTTCCTTCT
ATCGAAATCGTATGAACCATTCTACCAGCCGTCATTTTAGTACTAATCGCCCTGCCCTCC
GTTGAACTAATCTGAACCATCCTACCCGCTATTGTCCTAGTCCTGCTTGCCCTCCCCTCC
ATAGAAACCGTCTGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGCCCTCCCATCC
ATTGAAATCGTATGAACTGTGCTCCCTGCCCTAATCCTCATTTTAATCGCCCTCCCCTCA
GTTGAAACCATTTGAACTATTCTACCAGCTGTAATCCTTATCATAATTGCTCTCCCCTCT
GTAGAAACAATTTGAACAATTCTCCCAGCTGTCATTCTTATTCTAATTGCCCTTCCCTCC
GTGGAAACGGTGTGAACGATCCTACCCGCTATCATTTTAATTCTCATTGCCCTACCATCA
GTAGAAACTGTCTGAACTATCCTCCCAGCCATTATCTTAATTTTAATTGCCTTGCCTTCA
ATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCC
TTACGAATTCTATACATAATAGATGAAATCAATAACCCATCTCTTACAGTAAAAACCATA
CTACGCATCCTGTACCTTATAGACGAAATTAACGACCCTCACCTGACAATTAAAGCAATA
CTCCAAATCCTCTACATAATAGACGAAATCGACGAACCTGATCTCACCCTAAAAGCCATC
CTACGCATCCTTTACATAACAGACGAGGTCAACGATCCCTCCCTTACCATCAAATCAATT
CTACGAATTCTATATCTTATAGACGAGATTAATGACCCCCACCTAACAATTAAGGCCATG
CTACGCATTCTATATATAATAGACGAAATCAACAACCCCGTATTAACCGTTAAAACCATA
CTACGAATTCTATACATAATAGACGAGATTAATAACCCAGTTCTAACAGTAAAAACTATA
TTACGAATCCTCTACATAATGGACGAGATCAATAACCCTTCCTTGACCGTAAAAACTATA
TTACGGATCCTTTACATAATAGACGAAGTCAATAACCCCTCCCTCACTGTAAAAACAATA
CTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATC
GGACATCAGTGATACTGAAGCTATGAGTATACAGATTATGAGGACTTAAGCTTCGACTCC
GGACACCAATGATACTGAAGTTACGAGTATACAGACTATGAAAATCTAGGATTCGACTCC
GGACACCAATGATACTGAACCTATGAATACACAGACTTCAAGGACCTCTCATTTGACTCC
GGCCACCAATGGTACTGAACCTACGAGTACACCGACTACGGCGGACTAATCTTCAACTCC
GGGCACCAATGATACTGAAGCTACGAGTATACTGATTATGAAAACTTAAGTTTTGACTCC
GGGCACCAATGATACTGAAGCTACGAATATACTGACTATGAAGACCTATGCTTTGATTCA
GGACACCAATGATACTGAAGCTATGAATATACTGACTATGAAGACCTATGCTTTGACTCC
GGACATCAGTGATACTGAAGCTATGAGTACACAGACTACGAAGACCTGAACTTTGACTCA
GGTCACCAATGATATTGAAGCTATGAGTATACCGACTACGAAGACCTAAGCTTCGACTCC
GGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCT
TACATAATTCCAACATCAGAATTAAAGCCAGGGGAGCTACGACTATTAGAAGTCGATAAT
TATATAGTACCAACCCAAGACCTTGCCCCCGGACAATTCCGACTTCTGGAAACAGACCAC
TACATAACCCCAACAACAGACCTCCCCCTAGGCCACTTCCGCCTACTAGAAGTCGACCAT
TACATACTTCCCCCATTATTCCTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACAAT
TACATAATCCCCACCCAGGACCTAACCCCTGGACAATTCCGGCTACTAGAGACAGACCAC
TATATAATCCCAACAAACGACCTAAAACCTGGTGAACTACGACTGCTAGAAGTTGATAAC
TACATAATCCCAACCAATGACCTAAAACCAGGTGAACTTCGTCTATTAGAAGTTGATAAT
TATATGATCCCCACACAAGAACTAAAGCCCGGAGAACTACGACTGCTAGAAGTAGACAAT
TATATAATCCCAACATCAGACCTAAAGCCAGGAGAACTACGATTATTAGAAGTAGATAAC
TATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAAT
CGAGTTGTACTACCAATAGAAATAACAATCCGAATGTTAGTCTCCTCTGAAGACGTATTA
CGAATAGTTGTTCCAATAGAATCCCCAGTCCGTGTCCTAGTATCTGCTGAAGACGTGCTA
CGCATTGTAATCCCCATAGAATCCCCCATTCGAGTAATCATCACCGCTGATGACGTCCTC
CGAGTAGTACTCCCGATTGAAGCCCCCATTCGTATAATAATTACATCACAAGACGTCTTG
CGAATGGTTGTTCCCATAGAATCCCCTATTCGCATTCTTGTTTCCGCCGAAGATGTACTA
CGAGTCGTTCTGCCAATAGAACTTCCAATCCGTATATTAATTTCATCTGAAGACGTCCTC
CGGGTAGTCTTACCAATAGAACTTCCAATTCGTATACTAATCTCATCCGAAGACGTCCTG
CGAGTAGTCCTCCCAATAGAAATAACAATCCGCATACTAATCTCATCAGAAGATGTACTC
CGAGTTGTCTTACCTATAGAAATAACAATCCGAATATTAGTCTCATCAGAAGACGTACTC
CGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTC
CACTCATGAGCTGTGCCCTCTCTAGGACTAAAAACAGACGCAATCCCAGGCCGTCTAAAC
CATTCTTGAGCTGTTCCATCCCTTGGCGTAAAAATGGACGCAGTCCCAGGACGACTAAAT
CACTCATGAGCCGTACCCGCCCTCGGGGTAAAAACAGACGCAATCCCTGGACGACTAAAT
CACTCATGAGCTGTCCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGACGTCTAAAC
CACTCCTGGGCCCTTCCAGCCATGGGGGTAAAGATAGACGCGGTCCCAGGACGCCTTAAC
CACTCATGAGCAGTCCCCTCCCTAGGACTTAAAACTGATGCCATCCCAGGCCGACTAAAT
CACTCATGAGCCATCCCTTCACTAGGGTTAAAAACCGACGCAATCCCCGGCCGCCTAAAC
CACTCATGAGCCGTACCGTCCCTAGGACTAAAAACTGATGCTATCCCAGGACGACTAAAC
CACTCATGGGCCGTACCCTCCTTGGGCCTAAAAACAGATGCAATCCCAGGACGCCTAAAC
CACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCAT
CAAACAACCCTTATATCGTCCCGTCCAGGCTTATATTACGGTCAATGCTCAGAAATTTGC
CAAGCCGCCTTTATTGCCTCACGCCCAGGGGTCTTTTACGGACAATGCTCTGAAATTTGT
CAAACCTCCTTCATCACCACTCGACCAGGAGTGTTTTACGGACAATGCTCAGAAATCTGC
CAAACCACTTTCACCGCTACACGACCGGGGGTATACTACGGTCAATGCTCTGAAATCTGT
CAAACCGCCTTTATTGCCTCCCGCCCCGGGGTATTCTATGGGCAATGCTCAGAAATCTGT
CAAGCAACAGTAACATCAAACCGACCAGGGTTATTCTATGGCCAATGCTCTGAAATTTGT
CAAGCTACAGTCACATCAAACCGACCAGGTCTATTCTATGGCCAATGCTCTGAAATTTGC
CAAACAACCCTAATAACCATACGACCAGGACTGTACTACGGTCAATGCTCAGAAATCTGT
CAAACAACCTTAATATCAACACGACCAGGCCTATTTTATGGACAATGCTCAGAGATCTGC
CAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGC
GGGTCAAACCACAGTTTCATACCCATTGTCCTTGAGTTAGTCCCACTAAAGTACTTTGAA
GGAGCTAATCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCTCTCGAACACTTCGAA
GGAGCTAACCACAGCTACATACCCATTGTAGTAGAGTCTACCCCCCTAAAACACTTTGAA
GGAGCAAACCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCCCTAAAAATCTTTGAA
GGAGCAAACCACAGCTTTATACCCATCGTAGTAGAAGCGGTCCCACTATCTCACTTCGAA
GGATCTAACCATAGCTTTATGCCCATTGTCCTAGAAATGGTTCCACTAAAATATTTCGAA
GGCTCAAATCACAGCTTCATACCCATTGTACTAGAAATAGTGCCTCTAAAATATTTCGAA
GGTTCAAACCACAGCTTCATACCTATTGTCCTCGAATTGGTCCCACTATCCCACTTCGAG
GGCTCAAACCACAGTTTCATACCAATTGTCCTAGAACTAGTACCCCTAGAAGTCTTTGAA
GGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAA
AAATGATCTGCGTCAATATTA---------------------TAA
AACTGATCCTCATTAATACTAGAAGACGCCTCGCTAGGAAGCTAA
GCCTGATCCTCACTA------------------CTGTCATCTTAA
ATA---------------------GGGCCCGTATTTACCCTATAG
AACTGGTCCACCCTTATACTAAAAGACGCCTCACTAGGAAGCTAA
AACTGATCTGCTTCAATAATT---------------------TAA
AACTGATCAGCTTCTATAATT---------------------TAA
AAATGATCTACCTCAATGCTT---------------------TAA
AAATGATCTGTATCAATACTA---------------------TAA
AACTGATCTTCATCAATACTA---GAAGCATCACTA------AGA
""")
        self.space_interleaved = StringIO(""" 5 176 I
cox2_leita MAFILSFWMI FLLDSVIVLL SFVCFVCVWI CALLFSTVLL VSKLNNIYCT
cox2_crifa MAFILSFWMI FLIDAVIVLL SFVCFVCIWI CSLFFSSFLL VSKINNVYCT
cox2_bsalt MSFIISFWML FLIDSLIVLL SGAIFVCIWI CSLFFLCILF ICKLDYIFCS
cox2_trybb MSFILTFWMI FLMDSIIVLI SFSIFLSVWI CALIIATVLT VTKINNIYCT
cox2_tborr MLFFINQLLL LLVDTFVILE IFSLFVCVFI IVMYILFINY NIFLKNINVY
WDFTASKFID VYWFTIGGMF SLGLLLRLCL LLYFGHLNFV SFDLCKVVGF
WDFTASKFID AYWFTIGGMF VLCLLLRLCL LLYFGCLNFV SFDLCKVVGF
WDFISAKFID LYWFTLGCLF IVCLLIRLCL LLYFSCLNFV CFDLCKCIGF
WDFISSKFID TYWFVLGMMF ILCLLLRLCL LLYFSCINFV SFDLCKVIGF
LDFIGSKYLD LYWFLIGIFF VIVLLIRLCL LLYYSWISLL IFDLCKIMGF
QWYWVYFIFG ETTIFSNLIL ESDYMIGDLR LLQCNHVLTL LSLVIYKLWL
QWYWVYFIFG ETTIFSNLIL ESDYLIGDLR LLQCNHVLTL LSLVIYKLWL
QWYWVYFIFG ETTIFSNLIL ESDYLIGDLR LLQCNHVLTL LSLVIYKVWL
QWYWVYFLFG ETTIFSNLIL ESDYLIGDLR ILQCNHVLTL LSLVIYKLWV
QWYWIFFVFK ENVIFSNLLI ESDYWIGDLR LLQCNNTFNL ICLVVYKIWV
SAVDVIHSFA ISSLGVKVEN LVAVMK
SAVDVIHSFA VSSLGIKVDC IPGRCN
SAIDVIHSFT LANLGIKVD? ?PGRCN
SAVDVIHSFT ISSLGIKVEN PGRCNE
TSIDVIHSFT ISTLGIKIDC IPGRCN
""")
        self.interleaved_little = StringIO(""" 6 39 I
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC
TACCGCCGAT GCTTACCGC
CGTTGTCGTT ACTCGTTGT
AATTGTTAAT GTTAATTGT
CGTTGTTAAT GTTCGTTGT
CATCATCAAA ACCCATCAT
AATCACGGCA GCCAATCAC
""")
        self.empty = []
        self.noninterleaved_little = StringIO(""" 6 20
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC
""")
        self.noninterleaved_big = StringIO("""10 297
Rhesus tgtggcacaaatactcatgccagctcattacagcatgagaac---agtttgttactcact
aaagacagaatgaatgtagaaaaggctgaattctgtaataaaagcaaacagcctggcttg
gcaaggagccaacataacagatggactggaagtaaggaaacatgtaatgataggcagact
cccagcacagagaaaaaggtagatctgaatgctaatgccctgtatgagagaaaagaatgg
aataagcaaaaactgccatgctctgagaatcctagagacactgaagatgttccttgg
Manatee tgtggcacaaatactcatgccagctcattacagcatgagaatagcagtttattactcact
aaagacagaatgaatgtagaaaaggctgaattctgtcataaaagcaaacagcctggctta
acaaggagccagcagagcagatgggctgaaagtaaggaaacatgtaatgataggcagact
cctagcacagagaaaaaggtagatatgaatgctaatccattgtatgagagaaaagaagtg
aataagcagaaacctccatgctccgagagtgttagagatacacaagatattccttgg
Pig tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcact
aaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtctta
gcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagact
cctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactg
aataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg
""")


class MinimalPhylipParserTests(PhylipGenericTest):
    """Tests of MinimalPhylipParser: returns (label, seq) tuples."""

    def test_empty(self):
        """MinimalPhylipParser should return empty list from 'file' w/o labels"""
        self.assertEqual(list(MinimalPhylipParser(self.empty)), [])

    def test_minimal_parser(self):
        """MinimalPhylipParser should read each record as a (label, seq) tuple"""
        seqs = list(MinimalPhylipParser(self.big_interleaved))
        self.assertEqual(len(seqs), 10)
        label, seq = seqs[-1]
        self.assertEqual(label, 'Frog')
        self.assertEqual(seq,
'ATGGCACACCCATCACAATTAGGTTTTCAAGACGCAGCCTCTCCAATTATAGAAGAATTACTTCACTTCCACGACCATACCCTCATAGCCGTTTTTCTTATTAGTACGCTAGTTCTTTACATTATTACTATTATAATAACTACTAAACTAACTAATACAAACCTAATGGACGCACAAGAGATCGAAATAGTGTGAACTATTATACCAGCTATTAGCCTCATCATAATTGCCCTTCCATCCCTTCGTATCCTATATTTAATAGATGAAGTTAATGATCCACACTTAACAATTAAAGCAATCGGCCACCAATGATACTGAAGCTACGAATATACTAACTATGAGGATCTCTCATTTGACTCTTATATAATTCCAACTAATGACCTTACCCCTGGACAATTCCGGCTGCTAGAAGTTGATAATCGAATAGTAGTCCCAATAGAATCTCCAACCCGACTTTTAGTTACAGCCGAAGACGTCCTCCACTCGTGAGCTGTACCCTCCTTGGGTGTCAAAACAGATGCAATCCCAGGACGACTTCATCAAACATCATTTATTGCTACTCGTCCGGGAGTATTTTACGGACAATGTTCAGAAATTTGCGGAGCAAACCACAGCTTTATACCAATTGTAGTTGAAGCAGTACCGCTAACCGACTTTGAAAACTGATCTTCATCAATACTA---GAAGCATCACTA------AGA')
        self.assertEqual(seqs[0][0], 'Cow')
        seqs = list(MinimalPhylipParser(self.space_interleaved))
        self.assertEqual(len(seqs), 5)
        self.assertEqual(seqs[0][0], 'cox2_leita')
        self.assertEqual(seqs[-1][0], 'cox2_tborr')
        self.assertEqual(len(seqs[0][1]), 176)
        self.assertEqual(len(seqs[-1][1]), 176)
        seqs = list(MinimalPhylipParser(self.interleaved_little))
        self.assertEqual(len(seqs), 6)
        self.assertEqual(seqs[1][0], 'Hesperorni')
        self.assertEqual(seqs[-1][0], 'B.subtilis')
        self.assertEqual(seqs[-1][1], 'GGCAGCCAATCACGGCAGCCAATCACGGCAGCCAATCAC')
        seqs = list(MinimalPhylipParser(self.noninterleaved_little))
        self.assertEqual(len(seqs), 6)
        self.assertEqual(seqs[0][0], 'Archaeopt')
        self.assertEqual(seqs[-1][0], 'B.subtilis')
        self.assertEqual(seqs[-1][1], 'GGCAGCCAATCACGGCAGCC')
        seqs = list(MinimalPhylipParser(self.noninterleaved_big))
        self.assertEqual(len(seqs), 3)
        self.assertEqual(seqs[0][0], 'Rhesus')
        self.assertEqual(seqs[-1][0], 'Pig')
        self.assertEqual(seqs[-1][1], 'tgtggcacagatactcatgccagctcgttacagcatgagaacagcagtttattactcactaaagacagaatgaatgtagaaaaggctgaattttgtaataaaagcaagcagcctgtcttagcaaagagccaacagagcagatgggctgaaagtaagggcacatgtaatgataggcagactcctaacacagagaaaaaggtagttctgaatactgatctcctgtatgggagaaacgaactgaataagcagaaacctgcgtgctctgacagtcctagagattcccaagatgttccttgg')

    def test_get_align(self):
        """get_align_for_phylip should return Alignment object for phylip files"""
        # parsing the large interleaved file should also succeed
        align = get_align_for_phylip(self.big_interleaved)
        align = get_align_for_phylip(self.interleaved_little)
        s = str(align)
        self.assertEqual(s, '''>Archaeopt
CGATGCTTACCGCCGATGCTTACCGCCGATGCTTACCGC
>Hesperorni
CGTTACTCGTTGTCGTTACTCGTTGTCGTTACTCGTTGT
>Baluchithe
TAATGTTAATTGTTAATGTTAATTGTTAATGTTAATTGT
>B. virgini
TAATGTTCGTTGTTAATGTTCGTTGTTAATGTTCGTTGT
>Brontosaur
CAAAACCCATCATCAAAACCCATCATCAAAACCCATCAT
>B.subtilis
GGCAGCCAATCACGGCAGCCAATCACGGCAGCCAATCAC
''')
        align = get_align_for_phylip(self.noninterleaved_little)
        s = str(align)
        self.assertEqual(s, '''>Archaeopt
CGATGCTTACCGCCGATGCT
>Hesperorni
CGTTACTCGTTGTCGTTACT
>Baluchithe
TAATGTTAATTGTTAATGTT
>B. virgini
TAATGTTCGTTGTTAATGTT
>Brontosaur
CAAAACCCATCATCAAAACC
>B.subtilis
GGCAGCCAATCACGGCAGCC
''')
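

# A minimal sketch (an assumption, not MinimalPhylipParser's actual code) of
# how an interleaved PHYLIP file such as `interleaved_little` above is put
# back together: the header gives the taxon count and length, the first
# block carries fixed-width 10-character names, and each later line is
# appended to the sequences in round-robin order. Spaces inside the data
# are ignored. parse_interleaved_phylip is a hypothetical helper name.
def parse_interleaved_phylip(lines):
    """Return a list of (label, seq) tuples from interleaved PHYLIP lines."""
    rows = [line.rstrip('\n') for line in lines if line.strip()]
    ntaxa, nchar = [int(x) for x in rows[0].split()[:2]]
    labels, pieces = [], []
    for i, line in enumerate(rows[1:]):
        if i < ntaxa:
            # first block: 10-character label followed by sequence data
            labels.append(line[:10].strip())
            pieces.append([line[10:].replace(' ', '')])
        else:
            # later blocks: data only, in the same taxon order
            pieces[i % ntaxa].append(line.replace(' ', ''))
    seqs = [''.join(p) for p in pieces]
    if any(len(s) != nchar for s in seqs):
        raise ValueError('sequence length does not match header')
    return list(zip(labels, seqs))


example_lines = """ 6 39 I
Archaeopt CGATGCTTAC CGCCGATGCT
HesperorniCGTTACTCGT TGTCGTTACT
BaluchitheTAATGTTAAT TGTTAATGTT
B. virginiTAATGTTCGT TGTTAATGTT
BrontosaurCAAAACCCAT CATCAAAACC
B.subtilisGGCAGCCAAT CACGGCAGCC
TACCGCCGAT GCTTACCGC
CGTTGTCGTT ACTCGTTGT
AATTGTTAAT GTTAATTGT
CGTTGTTAAT GTTCGTTGT
CATCATCAAA ACCCATCAT
AATCACGGCA GCCAATCAC
""".split('\n')
records = parse_interleaved_phylip(example_lines)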

if __name__ == '__main__':
    main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.pknotsrg import pknotsrg_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"

class PknotsrgParserTest(TestCase):
    """Provides tests for pknotsRG RNA secondary structure format parsers"""

    def setUp(self):
        """Setup function"""
        # raw pknotsRG output: sequence line, then structure and energy
        self.pknotsrg_out = PKNOTSRG
        # expected: [[sequence, base pairs, energy]]
        self.pknotsrg_exp = [['UGCAUAAUAGCUCC',[(0,8),(3,11),(5,13)],-22.40]]

    def test_pknotsrg_output(self):
        """Test for pknotsrg format parser"""
        obs = pknotsrg_parser(self.pknotsrg_out)
        self.assertEqual(obs,self.pknotsrg_exp)

PKNOTSRG = ['UGCAUAAUAGCUCC\n', '(..{.[..)..}.] (-22.40)\n']
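

# A minimal sketch (a hypothetical helper, not pknotsrg_parser itself) of how
# the pknotsRG output in PKNOTSRG maps onto the expected
# [sequence, pairs, energy] record: each bracket type ('()', '{}', '[]') gets
# its own stack, so crossing (pseudoknotted) pairs are still recovered.
BRACKETS = {')': '(', '}': '{', ']': '['}


def parse_pknotsrg_like(lines):
    """Parse [sequence line, 'structure (energy)' line] into one record."""
    seq = lines[0].strip()
    struct, energy = lines[1].split()
    energy = float(energy.strip('()'))
    stacks = dict((opener, []) for opener in BRACKETS.values())
    pairs = []
    for pos, char in enumerate(struct):
        if char in stacks:
            stacks[char].append(pos)                  # open a pair
        elif char in BRACKETS:
            opened = stacks[BRACKETS[char]].pop()     # close matching type
            pairs.append((opened, pos))
    return [[seq, sorted(pairs), energy]]


record = parse_pknotsrg_like(['UGCAUAAUAGCUCC\n', '(..{.[..)..}.] (-22.40)\n'])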

if __name__ == '__main__':
    main()
#!/usr/bin/env python
"""Unit tests for the PSL parser.
Compatible with blat v.34
"""
from cogent.parse.psl import make_header, MinimalPslParser, PslToTable
from cogent.util.unit_test import TestCase, main
from cogent import LoadTable
__author__ = "Gavin Huttley, Anuj Pahwa"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight","Peter Maxwell", "Gavin Huttley", "Anuj Pahwa"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Development"
fname = 'data/test.psl'


class Test(TestCase):
    def test_header(self):
        """should return correct header"""
        expect = ['match', 'mis-match', 'rep. match', "N's", 'Q gap count',
                  'Q gap bases', 'T gap count', 'T gap bases', 'strand', 'Q name',
                  'Q size', 'Q start', 'Q end', 'T name', 'T size', 'T start',
                  'T end', 'block count', 'blockSizes', 'qStarts', 'tStarts']
        parser = MinimalPslParser(fname)
        version = next(parser)  # first record holds the version info
        header = next(parser)
        self.assertEqual(header, expect)

    def test_psl_to_table(self):
        """PslToTable should load the psl file without raising"""
        table = PslToTable(fname)

    def test_getting_seq_coords(self):
        """get correct sequence coordinates to produce a trimmed sequence"""
        table = PslToTable(fname)
        for row in table:
            query_name = row["Q name"]
            query_strand = row["strand"]
            q_start = row["Q start"]
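

# A minimal sketch of turning the comma-terminated 'blockSizes', 'qStarts'
# and 'tStarts' columns listed in test_header into per-block intervals. It
# assumes the UCSC PSL convention of zero-based starts and does not handle
# the '-' strand, where UCSC defines qStarts on the reverse-complemented
# query. psl_blocks and the example values are hypothetical, not taken from
# data/test.psl.
def psl_blocks(block_sizes, q_starts, t_starts):
    """Return [(q_start, q_end, t_start, t_end), ...] for each aligned block."""
    def as_ints(field):
        # fields look like '50,30,' (note the trailing comma)
        return [int(v) for v in field.split(',') if v]
    sizes = as_ints(block_sizes)
    return [(q, q + n, t, t + n)
            for n, q, t in zip(sizes, as_ints(q_starts), as_ints(t_starts))]


blocks = psl_blocks('50,30,', '0,60,', '1000,1100,')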

if __name__ == "__main__":
    main()
#!/usr/bin/env python
#test_rdb.py
"""Unit test for RDB Parser
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.rdb import RdbParser, MinimalRdbParser,is_seq_label,\
InfoMaker, create_acceptable_sequence
from cogent.core.sequence import Sequence, DnaSequence, RnaSequence
from cogent.core.info import Info
from cogent.parse.record import RecordError
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"


class RdbTests(TestCase):
    """Tests for top-level functions in cogent.parse.rdb"""

    def test_is_seq_label(self):
        """is_seq_label should return True if a line starts with 'seq:'"""
        seq = 'seq:this is a sequence line'
        not_seq = 'this is not a sequence line'
        still_not_seq = 'this seq: is still not a sequence line'
        self.assertEqual(is_seq_label(seq),True)
        self.assertEqual(is_seq_label(not_seq),False)
        self.assertEqual(is_seq_label(still_not_seq),False)

    def test_create_acceptable_sequence(self):
        """create_acceptable_sequence: should handle 'o' and sec. struct"""
        f = create_acceptable_sequence
        # should keep any char accepted by RNA.Alphabet.DegenGapped
        s = "UCAG---NRYBDHKMNSRWVY?"
        self.assertEqual(f(s),s)
        # should replace 'o' by '?'
        s = "UCAG-oo-ACGU"
        self.assertEqual(f(s), "UCAG-??-ACGU")
        # should strip out secondary structure characters
        s = "{UC^AG-[oo]-A(CG)U}"
        self.assertEqual(f(s), "UCAG-??-ACGU")
        # should leave other chars untouched
        s = "XYZ1234"
        self.assertEqual(f(s), "XYZ1234")
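

# A minimal sketch (hypothetical, not cogent's create_acceptable_sequence) of
# the behaviour test_create_acceptable_sequence checks: secondary-structure
# markup characters are dropped and the RDB placeholder 'o' becomes '?'.
# The set of markup characters below is assumed from the test strings.
STRUCTURE_MARKUP = '{}[]()^'


def acceptable_sequence_sketch(seq):
    """Strip structure markup, then replace 'o' with '?'."""
    kept = ''.join(c for c in seq if c not in STRUCTURE_MARKUP)
    return kept.replace('o', '?')


cleaned = acceptable_sequence_sketch('{UC^AG-[oo]-A(CG)U}')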


class InfoMakerTests(TestCase):
    """Tests for the constructor InfoMaker. Should return an Info object"""

    def test_empty(self):
        """InfoMaker: should return empty Info from empty header"""
        empty_header = []
        obs = InfoMaker(empty_header)
        exp = Info()
        self.assertEqual(obs,exp)

    def test_full(self):
        """InfoMaker should return Info object with name, value pairs"""
        test_header = ['acc: X3402','abc:1','mty: ssu','seq: Mit. X3402',
                       '','nonsense',':no_name']
        obs = InfoMaker(test_header)
        exp = Info()
        exp.rRNA = 'X3402'
        exp.abc = '1'
        exp.Species = 'Mit. X3402'
        exp.Gene = 'ssu'
        self.assertEqual(obs,exp)


class GenericRdbTest(TestCase):
    """Set up data for all Rdb parsers"""

    def setUp(self):
        self.empty = []
        self.labels = 'mty:ssu\nseq:bac\n//\nttl:joe\nseq:mit\n//'.split('\n')
        self.nolabels = 'ACGUAGCUAGCUAC\nGCUGCAUCG\nAUCG\n//'.split('\n')
        self.oneseq = 'seq:H.Sapiens\nAGUCAUCUAGAUHCAUHC\n//'.split('\n')
        self.multiline = 'seq:H.Sapiens\nAGUCAUUAG\nAUHCAUHC\n//'.split('\n')
        self.threeseq = \
            'seq:bac\nAGU\n//\nseq:mit\nACU\n//\nseq:pla\nAAA\n//'.split('\n')
        self.twogood = \
            'seq:bac\n//\nseq:mit\nACU\n//\nseq:pla\nAAA\n//'.split('\n')
        self.oneX = \
            'seq:bac\nX\n//\nseq:mit\nACT\n//\nseq:pla\nAAA\n//'.split('\n')
        self.strange = 'seq:bac\nACGUXxAaKkoo---*\n//'.split('\n')


class MinimalRdbParserTests(GenericRdbTest):
    """Tests of MinimalRdbParser: returns (headerLines,sequence) tuples"""

    def test_empty(self):
        """MinimalRdbParser should return empty list from file w/o seqs"""
        self.assertEqual(list(MinimalRdbParser(self.empty)),[])
        self.assertEqual(list(MinimalRdbParser(self.nolabels, strict=False)),
                         [])
        self.assertRaises(RecordError, list, MinimalRdbParser(self.nolabels))

    def test_only_labels(self):
        """MinimalRdbParser should raise if strict, else skip records w/o seqs"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
                          MinimalRdbParser(self.labels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(MinimalRdbParser(self.labels, strict=False)),
                         [])

    def test_only_sequences(self):
        """MinimalRdbParser should raise if strict, else skip records w/o labels"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
                          MinimalRdbParser(self.nolabels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(MinimalRdbParser(self.nolabels, strict=False)),
                         [])

    def test_single(self):
        """MinimalRdbParser should read single record as (header,seq) tuple"""
        res = list(MinimalRdbParser(self.oneseq))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, (['seq:H.Sapiens'], 'AGUCAUCUAGAUHCAUHC'))
        res = list(MinimalRdbParser(self.multiline))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, (['seq:H.Sapiens'], 'AGUCAUUAGAUHCAUHC'))

    def test_multiple(self):
        """MinimalRdbParser should read multiple records correctly"""
        res = list(MinimalRdbParser(self.threeseq))
        self.assertEqual(len(res), 3)
        a, b, c = res
        self.assertEqual(a, (['seq:bac'], 'AGU'))
        self.assertEqual(b, (['seq:mit'], 'ACU'))
        self.assertEqual(c, (['seq:pla'], 'AAA'))

    def test_multiple_bad(self):
        """MinimalRdbParser should complain or skip bad records"""
        self.assertRaises(RecordError, list, MinimalRdbParser(self.twogood))
        f = list(MinimalRdbParser(self.twogood, strict=False))
        self.assertEqual(len(f), 2)
        a, b = f
        self.assertEqual(a, (['seq:mit'], 'ACU'))
        self.assertEqual(b, (['seq:pla'], 'AAA'))

    def test_strange(self):
        """MRP: handle strange char. according to constr. and strip off '*'"""
        f = list(MinimalRdbParser(self.strange))
        obs = f[0]
        exp = (['seq:bac'],'ACGUXxAaKkoo---')
        self.assertEqual(obs,exp)
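

# A minimal sketch (an assumption, not MinimalRdbParser's code) of the record
# splitting the tests above exercise: records end at '//', 'name:value'
# lines form the header, other lines are joined into the sequence, and a
# trailing '*' is dropped. Incomplete records raise when strict is True and
# are skipped otherwise; lines after the last '//' are ignored here.
# ValueError stands in for cogent's RecordError; the helper is hypothetical.
def rdb_records_sketch(lines, strict=True):
    """Yield (header_lines, sequence) tuples from RDB-style lines."""
    header, seq = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('//'):
            if header and seq:
                yield header, ''.join(seq).rstrip('*')
            elif strict and (header or seq):
                raise ValueError('record lacks a label or a sequence')
            header, seq = [], []
        elif ':' in line:
            header.append(line)
        else:
            seq.append(line)


three = 'seq:bac\nAGU\n//\nseq:mit\nACU\n//\nseq:pla\nAAA\n//'.split('\n')
two_good = 'seq:bac\n//\nseq:mit\nACU\n//\nseq:pla\nAAA\n//'.split('\n')
strange = 'seq:bac\nACGUXxAaKkoo---*\n//'.split('\n')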


class RdbParserTests(GenericRdbTest):
    """Tests for the RdbParser. Should return Sequence objects"""

    def test_empty(self):
        """RdbParser should return empty list from 'file' w/o labels"""
        self.assertEqual(list(RdbParser(self.empty)), [])
        self.assertEqual(list(RdbParser(self.nolabels, strict=False)),
                         [])
        self.assertRaises(RecordError, list, RdbParser(self.nolabels))

    def test_only_labels(self):
        """RdbParser should raise if strict, else skip records w/o seqs"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
                          RdbParser(self.labels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(RdbParser(self.labels, strict=False)), [])

    def test_only_sequences(self):
        """RdbParser should raise if strict, else skip records w/o labels"""
        #should fail if strict (the default)
        self.assertRaises(RecordError, list,
                          RdbParser(self.nolabels,strict=True))
        #if not strict, should skip the records
        self.assertEqual(list(RdbParser(self.nolabels, strict=False)),
                         [])

    def test_single(self):
        """RdbParser should read single record as a Sequence with Info"""
        res = list(RdbParser(self.oneseq))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, Sequence('AGUCAUCUAGAUHCAUHC'))
        self.assertEqual(first.Info, Info({'Species':'H.Sapiens',
                                           'OriginalSeq':'AGUCAUCUAGAUHCAUHC'}))
        res = list(RdbParser(self.multiline))
        self.assertEqual(len(res),1)
        first = res[0]
        self.assertEqual(first, Sequence('AGUCAUUAGAUHCAUHC'))
        self.assertEqual(first.Info, Info({'Species':'H.Sapiens',
                                           'OriginalSeq':'AGUCAUUAGAUHCAUHC'}))
def test_single_constructor(self):
"""RdbParser should use constructors if supplied"""
to_dna = lambda x, Info: DnaSequence(str(x).replace('U','T'), \
Info=Info)
f = list(RdbParser(self.oneseq, to_dna))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, 'AGTCATCTAGATHCATHC')
self.assertEqual(a.Info, Info({'Species':'H.Sapiens',\
'OriginalSeq':'AGUCAUCUAGAUHCAUHC'}))
def alternativeConstr(header_lines):
info = Info()
for line in header_lines:
all = line.strip().split(':',1)
#strip out empty lines, lines without name, lines without colon
if not all[0] or len(all) != 2:
continue
name = all[0].upper()
value = all[1].strip().upper()
info[name] = value
return info
f = list(RdbParser(self.oneseq, to_dna, alternativeConstr))
self.assertEqual(len(f), 1)
a = f[0]
self.assertEqual(a, 'AGTCATCTAGATHCATHC')
exp_info = Info({'OriginalSeq':'AGUCAUCUAGAUHCAUHC',\
'Refs':{}, 'SEQ':'H.SAPIENS'})
self.assertEqual(a.Info, exp_info)
def test_multiple_constructor_bad(self):
"""RdbParser should complain or skip bad records w/ constructor"""
def dnastrict(x, **kwargs):
try:
return DnaSequence(x, **kwargs)
except Exception:
raise RecordError, "Could not convert sequence"
self.assertRaises(RecordError, list, RdbParser(self.oneX,dnastrict))
f = list(RdbParser(self.oneX, dnastrict, strict=False))
self.assertEqual(len(f), 2)
a, b = f
self.assertEqual(a, 'ACT')
self.assertEqual(a.Info,Info({'Species':'mit','OriginalSeq':'ACT'}))
self.assertEqual(b, 'AAA')
self.assertEqual(b.Info,Info({'Species':'pla','OriginalSeq':'AAA'}))
def test_full(self):
"""RdbParser: full data, valid and invalid"""
# when only good records are present, should work regardless of strict
r1 = RnaSequence("-??GG-UGAA--CGCU---ACGU-N???---",\
Info=Info({'Species': "unidentified Thermus OPB AF027020",\
'Refs':{'rRNA':['AF027020']},\
'OriginalSeq':'-o[oGG-U{G}AA--C^GC]U---ACGU-Nooo---'}))
r2 = RnaSequence("---CGAUCG--UAUACG-N???-",\
Info=Info({'Species':'Thermus silvanus X84211',\
'Refs':{'rRNA':['X84211']},\
'OriginalSeq':'---CGAU[C(G){--UA}U]ACG-Nooo-'}))
obs = list(RdbParser(RDB_LINES_ONLY_GOOD.split('\n'), strict=True))
self.assertEqual(len(obs), 2)
self.assertEqual(obs[0], r1)
self.assertEqual(str(obs[0]), str(r1))
self.assertEqual(obs[0].Info, r1.Info)
self.assertEqual(obs[1], r2)
self.assertEqual(str(obs[1]), str(r2))
self.assertEqual(obs[1].Info, r2.Info)
obs = list(RdbParser(RDB_LINES_ONLY_GOOD.split('\n'), strict=False))
self.assertEqual(len(obs), 2)
self.assertEqual(obs[0], r1)
self.assertEqual(str(obs[0]), str(r1))
self.assertEqual(obs[0].Info, r1.Info)
# when strict, should raise error on invalid record
f = RdbParser(RDB_LINES_GOOD_BAD.split('\n'), strict=True)
self.assertRaises(RecordError, list, f)
# when not strict, the invalid record is skipped
obs = list(RdbParser(RDB_LINES_GOOD_BAD.split('\n'), strict=False))
self.assertEqual(len(obs), 2)
self.assertEqual(obs[0], r1)
self.assertEqual(str(obs[0]), str(r1))
self.assertEqual(obs[0].Info, r1.Info)
self.assertEqual(obs[1], r2)
self.assertEqual(str(obs[1]), str(r2))
self.assertEqual(obs[1].Info, r2.Info)
RDB_LINES_ONLY_GOOD=\
"""acc:AF027020
seq: unidentified Thermus OPB AF027020
-o[oGG-U{G}AA--C^GC]U---ACGU-Nooo---
//
acc:X84211
seq: Thermus silvanus X84211
---CGAU[C(G){--UA}U]ACG-Nooo-
//
"""
RDB_LINES_GOOD_BAD=\
"""acc:AF027020
seq: unidentified Thermus OPB AF027020
-o[oGG-U{G}AA--C^GC]U---ACGU-Nooo---
//
acc:ABC123
seq: E. coli
---ACGU-Nooo-RYXQ-
//
acc:X84211
seq: Thermus silvanus X84211
---CGAU[C(G){--UA}U]ACG-Nooo-
//
"""
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_record.py
"""Unit tests for parser support libraries dealing with records.
"""
from cogent.parse.record import FieldError, RecordError, Grouper, \
DelimitedSplitter, GenericRecord, MappedRecord, \
TypeSetter, list_adder, dict_adder, \
LineOrientedConstructor, int_setter, str_setter, bool_setter, \
string_and_strip, FieldWrapper, StrictFieldWrapper, raise_unknown_field, \
FieldMorpher, list_extender
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class recordsTests(TestCase):
"""Tests of top-level functionality in records."""
def test_string_and_strip(self):
"""string_and_strip should convert all items to strings and strip them"""
self.assertEqual(string_and_strip(), [])
self.assertEqual(string_and_strip('\t', ' ', '\n\t'), ['','',''])
self.assertEqual(string_and_strip('\ta\tb', 3, ' cde e', None), \
['a\tb', '3', 'cde e', 'None'])
def test_raise_unknown_field(self):
"""raise_unknown_field should always raise FieldError"""
self.assertRaises(FieldError, raise_unknown_field, 'xyz', 123)
class GrouperTests(TestCase):
"""Tests of the Grouper class."""
def test_call(self):
"""Grouper should return lists containing correct number of groups"""
empty = []
s3 = 'abc'
s10 = range(10)
g1 = Grouper(1)
g2 = Grouper(2)
g5 = Grouper(5)
self.assertEqual(list(g1(empty)), [])
self.assertEqual(list(g2(empty)), [])
self.assertEqual(list(g5(empty)), [])
self.assertEqual(list(g1(s3)), [['a'], ['b'], ['c']])
self.assertEqual(list(g2(s3)), [['a','b'], ['c']])
self.assertEqual(list(g5(s3)), [['a','b','c']])
self.assertEqual(list(g1(s10)), [[i] for i in range(10)])
self.assertEqual(list(g2(s10)), [[0,1],[2,3],[4,5],[6,7],[8,9]])
self.assertEqual(list(g5(s10)), [[0,1,2,3,4],[5,6,7,8,9]])
def test_call_bad(self):
"""Grouper call should raise ValueError if NumItems is not a positive int"""
g_none = Grouper(None)
g_neg = Grouper(-1)
g_zero = Grouper(0)
g_alpha = Grouper('abc')
for g in (g_none, g_neg, g_zero, g_alpha):
iterator = g('abcd')
self.assertRaises(ValueError, list, iterator)
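# Sketch (hypothetical helper, not cogent's Grouper implementation): the
# behaviour GrouperTests checks can be reproduced by a generator that
# collects items into lists of num_items and yields a shorter final group.
# Because it is a generator, a bad num_items only raises ValueError once
# iteration starts, which is why the tests above wrap the call in list().
def group_items_sketch(items, num_items):
    """Yield successive lists of num_items items; the last may be shorter."""
    # Reject None, strings, zero and negative sizes.
    if not isinstance(num_items, int) or num_items < 1:
        raise ValueError("num_items must be a positive int, got %r" % (num_items,))
    group = []
    for item in items:
        group.append(item)
        if len(group) == num_items:
            yield group
            group = []
    # Emit any partial group left over at the end.
    if group:
        yield group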
class DelimitedSplitterTests(TestCase):
"""Tests of the DelimitedSplitter factory function."""
def test_parsers(self):
"""DelimitedSplitter should return function with correct behavior"""
empty = DelimitedSplitter()
space = DelimitedSplitter(None)
semicolon = DelimitedSplitter(';')
twosplits = DelimitedSplitter(';', 2)
allsplits = DelimitedSplitter(';', None)
lastone = DelimitedSplitter(';', -1)
lasttwo = DelimitedSplitter(';', -2)
self.assertEqual(empty('a b c'), ['a', 'b c'])
self.assertEqual(empty('abc'), ['abc'])
self.assertEqual(empty(' '), [])
self.assertEqual(empty('a b c'), space('a b c'))
self.assertEqual(semicolon(' a ; b ; c d'), ['a','b ; c d'])
self.assertEqual(twosplits(' a ; b ; c d'), ['a','b', 'c d'])
self.assertEqual(allsplits(' a ; b ; c;;d;e ;'),\
['a','b','c','','d','e',''])
self.assertEqual(lastone(' a ; b ; c;;d;e ;'),\
['a ; b ; c;;d;e',''])
self.assertEqual(lasttwo(' a ; b ; c;;d;e ;'),\
['a ; b ; c;;d','e',''])
self.assertEqual(lasttwo(''), [])
self.assertEqual(lasttwo('x'), ['x'])
self.assertEqual(lasttwo('x;'), ['x', ''])
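# Sketch (hypothetical helper, not cogent's DelimitedSplitter): the rules
# exercised above map onto str.split and str.rsplit. A non-negative
# max_splits splits from the left, a negative one splits from the right,
# None splits on every delimiter, and each piece is stripped. Blank input
# gives an empty list, matching lasttwo('') == [] and empty(' ') == [].
def split_delimited_sketch(line, delimiter=None, max_splits=1):
    """Split line on delimiter the way DelimitedSplitterTests expects."""
    if not line.strip():
        return []
    if max_splits is None:
        pieces = line.split(delimiter)
    elif max_splits >= 0:
        pieces = line.split(delimiter, max_splits)
    else:
        # Negative counts split from the right-hand end of the line.
        pieces = line.rsplit(delimiter, -max_splits)
    return [piece.strip() for piece in pieces]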
class GenericRecordTests(TestCase):
"""Tests of the GenericRecord class"""
class gr(GenericRecord):
Required = {'a':'x', 'b':[], 'c':{}}
def test_init(self):
"""GenericRecord init should work OK empty or with data"""
self.assertEqual(GenericRecord(), {})
self.assertEqual(GenericRecord({'a':1}), {'a':1})
assert isinstance(GenericRecord(), GenericRecord)
def test_init_subclass(self):
"""GenericRecord subclass init should include required data"""
self.assertEqual(self.gr(), {'a':'x', 'b':[], 'c':{}})
self.assertEqual(self.gr({'a':[]}), {'a':[], 'b':[],'c':{}})
assert isinstance(self.gr(), self.gr)
assert isinstance(self.gr(), GenericRecord)
def test_delitem(self):
"""GenericRecord delitem should fail if item required"""
g = self.gr()
g['d'] = 3
self.assertEqual(g, {'a':'x','b':[],'c':{},'d':3})
del g['d']
self.assertEqual(g, {'a':'x','b':[],'c':{}})
self.assertRaises(AttributeError, g.__delitem__, 'a')
g['c'][3] = 4
self.assertEqual(g['c'], {3:4})
def test_copy(self):
"""GenericRecord copy should include attributes and set correct class"""
g = self.gr()
g['a'] = 'abc'
g.X = 'y'
h = g.copy()
self.assertEqual(g, h)
assert isinstance(h, self.gr)
self.assertEqual(h.X, 'y')
self.assertEqual(h, {'a':'abc', 'b':[], 'c':{}})
class MappedRecordTests(TestCase):
"""Tests of the MappedRecord class"""
def setUp(self):
"""Define a few standard MappedRecords"""
self.empty = MappedRecord()
self.single = MappedRecord({'a':3})
self.several = MappedRecord(a=4,b=5,c='a',d=[1,2,3])
def test_init_empty(self):
"""MappedRecord empty init should work OK"""
g = MappedRecord()
self.assertEqual(g, {})
def test_init_data(self):
"""MappedRecord should work like normal dict init"""
exp = {'a':3, 'b':4}
self.assertEqual(MappedRecord({'a':3, 'b':4}), exp)
self.assertEqual(MappedRecord(a=3, b=4), exp)
self.assertEqual(MappedRecord([['a',3],['b',4]]), exp)
def test_init_subclass(self):
"""MappedRecord subclasses should behave as expected"""
class rec(MappedRecord):
Required = {'a':{}, 'b':'xyz', 'c':3}
Aliases = {'B':'b'}
r = rec()
self.assertEqual(r, {'a':{}, 'b':'xyz', 'c':3})
#test that subclassing is correct
s = r.copy()
assert isinstance(s, rec)
#test Aliases
s.B = 0
self.assertEqual(s, {'a':{}, 'b':0, 'c':3})
#test Required
try:
del s.B
except AttributeError:
pass
else:
raise AssertionError, "Subclass failed to catch requirement"
def test_getattr(self):
"""MappedRecord getattr should look in dict after real attrs"""
s = self.several
self.assertEqual(s.Aliases, {})
self.assertEqual(s.a, 4)
self.assertEqual(s.d, [1,2,3])
for key in s:
self.assertEqual(getattr(s, key), s[key])
assert 'xyz' not in s
self.assertEqual(s.xyz, None)
self.assertEqual(s['xyz'], None)
s.Aliases = {'xyz':'a'}
self.assertEqual(s['xyz'], 4)
def test_setattr(self):
"""MappedRecord setattr should add to dict"""
s = self.single
#check that we haven't screwed up normal attribute setting
assert 'Aliases' not in s
s.Aliases = {'x':'y'}
assert 'Aliases' not in s
self.assertEqual(s.Aliases, {'x':'y'})
s.x = 5
assert 'x' in s
self.assertEqual(s['x'], 5)
self.assertEqual(s.x, 5)
s.Aliases = {'XYZ':'b'}
s.XYZ = 3
self.assertEqual(s.b, 3)
def test_delattr(self):
"""MappedRecord delattr should work for 'normal' and other attributes"""
s = self.single
s.__dict__['x'] = 'y'
assert 'x' not in s
self.assertEqual(s.x, 'y')
del s.x
self.assertEqual(s.x, None)
self.assertEqual(s, {'a':3})
#try it for an internal attribute: check it doesn't delete anything else
s.b = 4
self.assertEqual(s, {'a':3, 'b':4})
del s.a
self.assertEqual(s, {'b':4})
del s.abc
self.assertEqual(s, {'b':4})
s.Required = {'b':True}
try:
del s.b
except AttributeError:
pass
else:
raise AssertionError, "Allowed deletion of required attribute"
s.a = 3
self.assertEqual(s.a, 3)
s.Aliases = {'xyz':'a'}
del s.xyz
self.assertEqual(s.a, None)
def test_getitem(self):
"""MappedRecord getitem should work only for keys, not attributes"""
s = self.single
self.assertEqual(s['Required'], None)
self.assertEqual(s['a'], 3)
self.assertEqual(s['xyz'], None)
self.assertEqual(s[list('abc')], None)
s.Aliases = {'xyz':'a'}
self.assertEqual(s['xyz'], 3)
def test_setitem(self):
"""MappedRecord setitem should work only for keys, not attributes"""
s = self.single
s['Required'] = None
self.assertEqual(s, {'a':3, 'Required':None})
self.assertEqual(s.Required, {})
self.assertNotEqual(s.Required, None)
s['c'] = 5
self.assertEqual(s, {'a':3, 'c':5, 'Required':None})
#still not allowed unhashable objects as keys
self.assertRaises(TypeError, s.__setitem__, range(3))
s.Aliases = {'C':'c'}
s['C'] = 3
self.assertEqual(s, {'a':3, 'c':3, 'Required':None})
def test_delitem(self):
"""MappedRecord delitem should only work for keys, not attributes"""
s = self.single
del s['Required']
self.assertEqual(s.Required, {})
s.Required = {'a':True}
try:
del s['a']
except AttributeError:
pass
else:
raise AssertionError, "Allowed deletion of required item"
s.Aliases = {'B':'b'}
s.b = 5
self.assertEqual(s.b, 5)
del s.B
self.assertEqual(s.b, None)
def test_contains(self):
"""MappedRecord contains should use aliases, but not apply to attrs"""
s = self.single
assert 'a' in s
assert 'b' not in s
s.b = 5
assert 'b' in s
assert 'Required' not in s
assert 'A' not in s
s.Aliases = {'A':'a'}
assert 'A' in s
def test_get(self):
"""MappedRecord get should be typesafe against unhashables"""
s = self.single
self.assertEqual(s.get(1, 6), 6)
self.assertEqual(s.get('a', 'xyz'), 3)
self.assertEqual(s.get('ABC', 'xyz'), 'xyz')
s.Aliases = {'ABC':'a'}
self.assertEqual(s.get('ABC', 'xyz'), 3)
self.assertEqual(s.get([1,2,3], 'x'), 'x')
def test_setdefault(self):
"""MappedRecord setdefault should not be typesafe against unhashables"""
s = self.single
x = s.setdefault('X', 'xyz')
self.assertEqual(x, 'xyz')
self.assertEqual(s, {'a':3, 'X':'xyz'})
self.assertRaises(TypeError, s.setdefault, ['a','b'], 'xyz')
def test_update(self):
"""MappedRecord update should transparently convert keys"""
s = self.single
s.b = 999
s.Aliases = {'XYZ':'x', 'ABC':'a'}
d = {'ABC':111, 'CVB':222}
s.update(d)
self.assertEqual(s, {'a':111, 'b':999, 'CVB':222})
def test_copy(self):
"""MappedRecord copy should return correct class"""
s = self.single
t = s.copy()
assert isinstance(t, MappedRecord)
s.Aliases = {'XYZ':'x'}
u = s.copy()
u.Aliases['ABC'] = 'a'
self.assertEqual(s.Aliases, {'XYZ':'x'})
self.assertEqual(t.Aliases, {})
self.assertEqual(u.Aliases, {'XYZ':'x', 'ABC':'a'})
def test_subclass(self):
"""MappedRecord subclassing should work correctly"""
class ret3(MappedRecord):
DefaultValue = 3
ClassData = 'xyz'
x = ret3({'ABC':777, 'DEF':'999'})
self.assertEqual(x.ZZZ, 3)
self.assertEqual(x.ABC, 777)
self.assertEqual(x.DEF, '999')
self.assertEqual(x.ClassData, 'xyz')
x.ZZZ = 6
self.assertEqual(x.ZZZ, 6)
self.assertEqual(x.ZZ, 3)
x.ClassData = 'qwe'
self.assertEqual(x.ClassData, 'qwe')
self.assertEqual(ret3.ClassData, 'xyz')
def test_DefaultValue(self):
"""MappedRecord DefaultValue should give new copy when requested"""
class m(MappedRecord):
DefaultValue=[]
a = m()
b = m()
assert a['abc'] is not b['abc']
assert a['abc'] == b['abc']
class dummy(object):
"""Do-nothing class whose attributes can be freely abused."""
pass
class TypeSetterTests(TestCase):
"""Tests of the TypeSetter class"""
def test_setter_empty(self):
"""TypeSetter should set attrs to vals on empty init"""
d = dummy()
ident = TypeSetter()
ident(d, 'x', 'abc')
self.assertEqual(d.x, 'abc')
ident(d, 'y', 3)
self.assertEqual(d.y, 3)
ident(d, 'x', 2)
self.assertEqual(d.x, 2)
def test_setter_typed(self):
"""TypeSetter should set attrs to constructor(val) when specified"""
d = dummy()
i = TypeSetter(int)
i(d, 'zz', 3)
self.assertEqual(d.zz, 3)
i(d, 'xx', '456')
self.assertEqual(d.xx, 456)
class TypeSetterLikeTests(TestCase):
"""Tests of the functions that behave similarly to TypeSetter products"""
def test_list_adder(self):
"""list_adder should add items to list, creating if necessary"""
d = dummy()
list_adder(d, 'x', 3)
self.assertEqual(d.x, [3])
list_adder(d, 'x', 'abc')
self.assertEqual(d.x, [3, 'abc'])
list_adder(d, 'y', [2,3])
self.assertEqual(d.x, [3, 'abc'])
self.assertEqual(d.y, [[2,3]])
def test_list_extender(self):
"""list_extender should extend list with items, creating if necessary"""
d = dummy()
list_extender(d, 'x', '345')
self.assertEqual(d.x, ['3','4','5'])
list_extender(d, 'x', 'abc')
self.assertEqual(d.x, ['3','4','5','a','b','c'])
list_extender(d, 'y', [2,3])
self.assertEqual(d.x, ['3','4','5','a','b','c'])
self.assertEqual(d.y, [2,3])
list_extender(d, 'y', None)
self.assertEqual(d.y, [2,3,None])
def test_dict_adder(self):
"""dict_adder should add items to dict, creating if necessary"""
d = dummy()
dict_adder(d, 'x', 3)
self.assertEqual(d.x, {3:None})
dict_adder(d, 'x', 'ab')
self.assertEqual(d.x, {3:None, 'a':'b'})
dict_adder(d, 'x', ['a', 0])
self.assertEqual(d.x, {3:None, 'a':0})
dict_adder(d, 'y', None)
self.assertEqual(d.x, {3:None, 'a':0})
self.assertEqual(d.y, {None:None})
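# Sketch (hypothetical helper, not cogent's dict_adder): test_dict_adder is
# consistent with a rule that tries to unpack each value as a (key, value)
# pair and, when unpacking fails, stores the whole value as a key mapped to
# None. That accounts for 3 -> {3: None}, 'ab' -> {'a': 'b'},
# ['a', 0] -> {'a': 0} and None -> {None: None}. This rule is inferred from
# the tests, not taken from the library's documentation.
def add_to_dict_sketch(target, val):
    """Add val to the dict target using the pair-or-key rule; return target."""
    try:
        key, value = val
    except (TypeError, ValueError):
        # Not a 2-item sequence: use the value itself as the key.
        key, value = val, None
    target[key] = value
    return target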
class LineOrientedConstructorTests(TestCase):
"""Tests of the LineOrientedConstructor class"""
def test_init_empty(self):
"""LOC empty init should succeed with expected defaults"""
l = LineOrientedConstructor()
self.assertEqual(l.Lines, [])
self.assertEqual(l.LabelSplitter(' ab cd '), ['ab','cd'])
self.assertEqual(l.FieldMap, {})
self.assertEqual(l.Constructor, MappedRecord)
self.assertEqual(l.Strict, False)
def test_empty_LOC(self):
"""LOC empty should fail if strict, fill fields if not strict"""
data = ["abc def","3 n","\t abc \txyz\n\n", "fgh "]
l = LineOrientedConstructor()
result = l()
self.assertEqual(result, {})
result = l([])
self.assertEqual(result, {})
result = l([' ','\n\t '])
self.assertEqual(result, {})
result = l(data)
self.assertEqual(result, {'abc':'xyz', '3':'n', 'fgh':None})
def test_full_LOC(self):
"""LOC should behave as expected when initialized with rich data"""
data = ["abc\t def"," 3 \t n"," abc \txyz\n\n", "x\t5", "fgh ",
"x\t3 "]
class rec(MappedRecord):
Required = {'abc':[]}
maps = {'abc':list_adder, 'x':int_setter, 'fgh':bool_setter}
label_splitter = DelimitedSplitter('\t')
constructor = rec
strict = True
loc_bad = LineOrientedConstructor(data, label_splitter, maps, \
constructor, strict)
self.assertRaises(FieldError, loc_bad)
strict = False
loc_good = LineOrientedConstructor(data, label_splitter, maps, \
constructor, strict)
result = loc_good()
assert isinstance(result, rec)
self.assertEqual(result, \
{'abc':['def','xyz'], '3':'n','fgh':False,'x':3})
class fake_dict(dict):
"""dict subclass used to test that constructors return the correct class"""
pass
class FieldWrapperTests(TestCase):
"""Tests of the FieldWrapper factory function"""
def test_default(self):
"""Default FieldWrapper should wrap fields and labels"""
fields = list('abcde')
f = FieldWrapper(fields)
self.assertEqual(f(''), {})
self.assertEqual(f('xy za '), {'a':'xy','b':'za'})
self.assertEqual(f('1 2\t\t 3 \n4 5 6'), \
{'a':'1','b':'2','c':'3','d':'4','e':'5'})
def test_splitter(self):
"""FieldWrapper with splitter should use that splitter"""
fields = ['label', 'count']
splitter = DelimitedSplitter(':', -1)
f = FieldWrapper(fields, splitter)
self.assertEqual(f(''), {})
self.assertEqual(f('nknasd:'), {'label':'nknasd', 'count':''})
self.assertEqual(f('n:k:n:a:sd '), {'label':'n:k:n:a', 'count':'sd'})
def test_constructor(self):
"""FieldWrapper with constructor should use that constructor"""
fields = list('abc')
f = FieldWrapper(fields, constructor=fake_dict)
self.assertEqual(f('x y'), {'a':'x','b':'y'})
assert isinstance(f('x y'), fake_dict)
class StrictFieldWrapperTests(TestCase):
"""Tests of the StrictFieldWrapper factory function"""
def test_default(self):
"""Default StrictFieldWrapper should wrap fields if count correct"""
fields = list('abcde')
f = StrictFieldWrapper(fields)
self.assertEqual(f('1 2\t\t 3 \n4 5 '), \
{'a':'1','b':'2','c':'3','d':'4','e':'5'})
self.assertRaises(FieldError, f, '')
self.assertRaises(FieldError, f, 'xy za ')
def test_splitter(self):
"""StrictFieldWrapper with splitter should use that splitter"""
fields = ['label', 'count']
splitter = DelimitedSplitter(':', -1)
f = StrictFieldWrapper(fields, splitter)
self.assertEqual(f('n:k:n:a:sd '), {'label':'n:k:n:a', 'count':'sd'})
self.assertEqual(f('nknasd:'), {'label':'nknasd', 'count':''})
self.assertRaises(FieldError, f, '')
def test_constructor(self):
"""StrictFieldWrapper with constructor should use that constructor"""
fields = list('ab')
f = StrictFieldWrapper(fields, constructor=fake_dict)
self.assertEqual(f('x y'), {'a':'x','b':'y'})
assert isinstance(f('x y'), fake_dict)
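# Sketch (hypothetical helper, not cogent's FieldWrapper/StrictFieldWrapper):
# both factories tested above behave like zipping field names onto the
# pieces returned by a splitter. zip() stops at the shorter sequence, which
# is why the non-strict wrapper drops a sixth value and omits missing
# fields, while the strict variant rejects any count mismatch. ValueError
# stands in for cogent's FieldError so that imported name is not shadowed.
def field_wrapper_sketch(fields, splitter=None, constructor=dict, strict=False):
    """Return a function mapping a line onto a dict keyed by fields."""
    if splitter is None:
        splitter = lambda line: line.split()  # default: split on whitespace
    def wrap(line):
        values = splitter(line)
        if strict and len(values) != len(fields):
            raise ValueError("expected %d fields, got %d" % (len(fields), len(values)))
        return constructor(zip(fields, values))
    return wrap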
class FieldMorpherTests(TestCase):
"""Tests of the FieldMorpher class."""
def test_default(self):
"""FieldMorpher default should use correct constructors"""
fm = FieldMorpher({'a':int, 'b':str})
self.assertEqual(fm({'a':'3', 'b':456}), {'a':3,'b':'456'})
def test_default_error(self):
"""FieldMorpher default should raise FieldError on unknown fields"""
fm = FieldMorpher({'a':int, 'b':str})
self.assertRaises(FieldError, fm, {'a':'3', 'b':456, 'c':'4'})
def test_altered_default(self):
"""FieldMorpher with default set should apply it"""
func = lambda x, y: (str(x), float(y) - 0.5)
fm = FieldMorpher({'3':str,4:int}, func)
#check that recognized values aren't tampered with
self.assertEqual(fm({3:3, 4:'4'}), {'3':'3', 4:4})
#check that unrecognized values get the appropriate conversion
self.assertEqual(fm({3:3, 5:'5'}), {'3':'3', '5':4.5})
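# Sketch (hypothetical helper reconstructed from test_altered_default, not
# cogent's FieldMorpher source): known keys go through their constructor;
# unknown keys are passed to default(key, value). If the key returned by
# default is itself known, that key's constructor is applied to the
# original value, which is one way {3: 3} can become {'3': '3'}. Otherwise
# the pair returned by default is stored as-is ({5: '5'} -> {'5': 4.5}).
def field_morpher_sketch(constructors, default):
    """Return a function converting a dict's values field by field."""
    def morph(data):
        result = {}
        for key, val in data.items():
            if key in constructors:
                result[key] = constructors[key](val)
            else:
                new_key, new_val = default(key, val)
                if new_key in constructors:
                    # Renamed key is recognized: convert the original value.
                    result[new_key] = constructors[new_key](val)
                else:
                    result[new_key] = new_val
        return result
    return morph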
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_record_finder.py
"""Unit tests for recordfinders: parsers that group the lines for a record.
"""
from cogent.parse.record import RecordError
from cogent.parse.record_finder import DelimitedRecordFinder, \
LabeledRecordFinder, LineGrouper, TailedRecordFinder
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Zongzhi Liu"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class TailedRecordFinderTests(TestCase):
"""Tests of the TailedRecordFinder factory function."""
def setUp(self):
"""Define a standard TailedRecordFinder"""
self.endswith_period = lambda x: x.endswith('.')
self.period_tail_finder = TailedRecordFinder(self.endswith_period)
def test_parsers(self):
"""TailedRecordFinder should split records into lines correctly"""
lines = '>abc\ndef\nz.\n>efg\nz.'.split()
fl = self.period_tail_finder
self.assertEqual(list(fl(lines)), \
[['>abc', 'def', 'z.'], ['>efg','z.']])
def test_parsers_empty(self):
"""TailedRecordFinder should return empty list on empty lines"""
fl = self.period_tail_finder
self.assertEqual(list(fl([' ','\n'])), [])
self.assertEqual(list(fl([])), [])
def test_parsers_strip(self):
"""TailedRecordFinder should trim each line correctly"""
fl = self.period_tail_finder
lines = '>abc \n \t def\n z. \t\n>efg \nz.'.split('\n')
self.assertEqual(list(fl(lines)), \
[['>abc', ' \t def', ' z.'], ['>efg','z.']])
def test_parsers_leftover(self):
"""TailedRecordFinder should raise error or yield leftover"""
f = self.period_tail_finder
good = [ 'abc \n',
'def\n',
'.\n',
'ghi \n',
'j.',
]
blank = ['', ' ', '\t \t\n\n']
bad = ['abc']
result = [['abc', 'def','.'], ['ghi','j.']]
self.assertEqual(list(f(good)), result)
self.assertEqual(list(f(good+blank)), result)
self.assertRaises(RecordError, list, f(good+bad))
f2 = TailedRecordFinder(self.endswith_period, strict=False)
self.assertEqual(list(f2(good+bad)), result + [['abc']])
def test_parsers_ignore(self):
"""TailedRecordFinder should skip lines to ignore."""
def never(line):
return False
def ignore_labels(line):
return (not line) or line.isspace() or line.startswith('#')
lines = ['abc','\n','1.','def','#ignore','2.']
self.assertEqual(list(TailedRecordFinder(self.endswith_period)(lines)),
[['abc', '1.'],['def','#ignore','2.']])
self.assertEqual(list(TailedRecordFinder(self.endswith_period,
ignore=never)(lines)),
[['abc', '', '1.'],['def','#ignore','2.']])
self.assertEqual(list(TailedRecordFinder(self.endswith_period,
ignore=ignore_labels)(lines)),
[['abc','1.'],['def','2.']])
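# Sketch (hypothetical helper, not cogent's TailedRecordFinder): the tests
# above are consistent with a generator that right-strips each line, skips
# lines for which ignore() is true (blank lines by default), and closes a
# record whenever is_tail() matches. Leftover lines after the last tail
# raise an error when strict, or are yielded as a final record otherwise.
# ValueError stands in for cogent's RecordError so that name is not shadowed.
def _blank_line_sketch(line):
    return (not line) or line.isspace()

def tailed_records_sketch(lines, is_tail, strict=True, ignore=_blank_line_sketch):
    """Yield lists of lines, each ending with a line where is_tail is true."""
    record = []
    for line in lines:
        line = line.rstrip()  # leading whitespace is kept, as in test_parsers_strip
        if ignore(line):
            continue
        record.append(line)
        if is_tail(line):
            yield record
            record = []
    if record:
        if strict:
            raise ValueError("trailing lines without a tail: %r" % (record,))
        yield record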
class DelimitedRecordFinderTests(TestCase):
"""Tests of the DelimitedRecordFinder factory function."""
def test_parsers(self):
"""DelimitedRecordFinder should split records into lines correctly"""
lines = 'abc\ndef\n//\nefg\n//'.split()
self.assertEqual(list(DelimitedRecordFinder('//')(lines)), \
[['abc', 'def', '//'], ['efg','//']])
self.assertEqual(list(DelimitedRecordFinder('//', keep_delimiter=False)
(lines)), \
[['abc', 'def'], ['efg']])
def test_parsers_empty(self):
"""DelimitedRecordFinder should return empty list on empty lines"""
self.assertEqual(list(DelimitedRecordFinder('//')([' ','\n'])), [])
self.assertEqual(list(DelimitedRecordFinder('//')([])), [])
def test_parsers_strip(self):
"""DelimitedRecordFinder should trim each line correctly"""
lines = ' \t abc \n \t def\n // \t\n\t\t efg \n//'.split('\n')
self.assertEqual(list(DelimitedRecordFinder('//')(lines)), \
[['abc', 'def', '//'], ['efg','//']])
def test_parsers_error(self):
"""DelimitedRecordFinder should raise RecordError if trailing data"""
good = [ ' \t abc \n',
'\t def\n',
'// \t\n',
'\t\n',
'\t efg \n',
'\t\t//\n',
]
blank = ['', ' ', '\t \t\n\n']
bad = ['abc']
result = [['abc', 'def', '//'], ['efg','//']]
r = DelimitedRecordFinder('//')
self.assertEqual(list(r(good)), result)
self.assertEqual(list(r(good+blank)), result)
try:
list(r(good+bad))
except RecordError:
pass
else:
raise AssertionError, "Parser failed to raise error on bad data"
r = DelimitedRecordFinder('//', strict=False)
self.assertEqual(list(r(good+bad)), result + [['abc']])
def test_parsers_ignore(self):
"""DelimitedRecordFinder should skip lines to ignore."""
def never(line):
return False
def ignore_labels(line):
return (not line) or line.isspace() or line.startswith('#')
lines = ['>abc','\n','1', '$$', '>def','#ignore','2', '$$']
self.assertEqual(list(DelimitedRecordFinder('$$')(lines)),
[['>abc', '1', '$$'],['>def','#ignore','2', '$$']])
self.assertEqual(list(DelimitedRecordFinder('$$',
ignore=never)(lines)),
[['>abc', '', '1', '$$'],['>def','#ignore','2','$$']])
self.assertEqual(list(DelimitedRecordFinder('$$',
ignore=ignore_labels)(lines)),
[['>abc','1','$$'],['>def','2','$$']])
class LabeledRecordFinderTests(TestCase):
"""Tests of the LabeledRecordFinder factory function."""
def setUp(self):
"""Define a standard LabeledRecordFinder"""
self.FastaLike = LabeledRecordFinder(lambda x: x.startswith('>'))
def test_parsers(self):
"""LabeledRecordFinder should split records into lines correctly"""
lines = '>abc\ndef\n//\n>efg\n//'.split()
fl = self.FastaLike
self.assertEqual(list(fl(lines)), \
[['>abc', 'def', '//'], ['>efg','//']])
def test_parsers_empty(self):
"""LabeledRecordFinder should return empty list on empty lines"""
fl = self.FastaLike
self.assertEqual(list(fl([' ','\n'])), [])
self.assertEqual(list(fl([])), [])
def test_parsers_strip(self):
"""LabeledRecordFinder should trim each line correctly"""
fl = self.FastaLike
lines = ' \t >abc \n \t def\n // \t\n\t\t >efg \n//'.split('\n')
self.assertEqual(list(fl(lines)), \
[['>abc', 'def', '//'], ['>efg','//']])
def test_parsers_leftover(self):
"""LabeledRecordFinder should not raise RecordError if last line label"""
fl = self.FastaLike
good = [ ' \t >abc \n',
'\t def\n',
'\t\n',
'\t >efg \n',
'ghi',
]
blank = ['', ' ', '\t \t\n\n']
bad = ['>abc']
result = [['>abc', 'def'], ['>efg','ghi']]
self.assertEqual(list(fl(good)), result)
self.assertEqual(list(fl(good+blank)), result)
self.assertEqual(list(fl(good+bad)), result + [['>abc']])
def test_parsers_ignore(self):
"""LabeledRecordFinder should skip lines to ignore."""
def never(line):
return False
def ignore_labels(line):
return (not line) or line.isspace() or line.startswith('#')
def is_start(line):
return line.startswith('>')
lines = ['>abc','\n','1','>def','#ignore','2']
self.assertEqual(list(LabeledRecordFinder(is_start)(lines)),
[['>abc', '1'],['>def','#ignore','2']])
self.assertEqual(list(LabeledRecordFinder(is_start,
ignore=never)(lines)),
[['>abc', '', '1'],['>def','#ignore','2']])
self.assertEqual(list(LabeledRecordFinder(is_start,
ignore=ignore_labels)(lines)),
[['>abc','1'],['>def','2']])
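# Sketch (hypothetical helper, not cogent's LabeledRecordFinder): each line
# is stripped on both sides, ignored lines are skipped, and a label line
# starts a new record. Whatever follows the last label is yielded as the
# final record, so a trailing label on its own becomes a one-line record
# instead of an error, as test_parsers_leftover expects.
def labeled_records_sketch(lines, is_label, ignore=None):
    """Yield lists of stripped lines, each beginning at a label line."""
    if ignore is None:
        ignore = lambda line: (not line) or line.isspace()  # skip blank lines
    record = []
    for line in lines:
        line = line.strip()
        if ignore(line):
            continue
        if is_label(line) and record:
            yield record
            record = []
        record.append(line)
    if record:
        yield record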
class LineGrouperTests(TestCase):
"""Tests of the LineGrouper class."""
def test_parser(self):
"""LineGrouper should return n non-blank lines at a time"""
good = [ ' \t >abc \n',
'\t def\n',
'\t\n',
'\t >efg \n',
'ghi',
]
c = LineGrouper(2)
self.assertEqual(list(c(good)), [['>abc', 'def'],['>efg','ghi']])
c = LineGrouper(1)
self.assertEqual(list(c(good)), [['>abc'], ['def'],['>efg'],['ghi']])
c = LineGrouper(4)
self.assertEqual(list(c(good)), [['>abc', 'def','>efg','ghi']])
#shouldn't work if not evenly divisible
c = LineGrouper(3)
self.assertRaises(RecordError, list, c(good))
def test_parser_ignore(self):
"""LineGrouper should skip lines to ignore."""
def never(line):
return False
def ignore_labels(line):
return (not line) or line.isspace() or line.startswith('#')
lines = ['abc','\n','1','def','#ignore','2']
self.assertEqual(list(LineGrouper(1)(lines)),
[['abc'], ['1'],['def'],['#ignore'],['2']])
self.assertEqual(list(LineGrouper(1, ignore=never)(lines)),
[[i.strip()] for i in lines])
self.assertEqual(list(LineGrouper(2, ignore=ignore_labels)(lines)),
[['abc','1'],['def','2']])
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_rfam.py
"""
Provides tests for RfamParser and related classes and functions.
"""
from cogent.parse.rfam import is_header_line, is_seq_line, is_structure_line,\
HeaderToInfo, MinimalRfamParser, RfamFinder, NameToInfo, RfamParser,\
ChangedSequence, is_empty_or_html
from cogent.util.unit_test import TestCase, main
from cogent.parse.record import RecordError
from cogent.core.info import Info
from cogent.struct.rna2d import WussStructure
from cogent.core.alignment import Alignment
from cogent.core.moltype import BYTES
__author__ = "Sandra Smit and Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Greg Caporaso", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Development"
Sequence = BYTES.Sequence
class RfamParserTests(TestCase):
""" Tests components of the rfam parser, in the rfam.py file """
def setUp(self):
""" Construct some fake data for testing purposes """
self._fake_headers = []
temp = list(fake_headers.split('\n'))
for line in temp:
self._fake_headers.append(line.strip())
del temp
self._fake_record_no_headers =\
list(fake_record_no_headers.split('\n'))
self._fake_record_no_sequences =\
list(fake_record_no_sequences.split('\n'))
self._fake_record_no_structure =\
list(fake_record_no_structure.split('\n'))
self._fake_two_records =\
list(fake_two_records.split('\n'))
self._fake_record =\
list(fake_record.split('\n'))
self._fake_record_bad_header_1 =\
list(fake_record_bad_header_1.split('\n'))
self._fake_record_bad_header_2 =\
list(fake_record_bad_header_2.split('\n'))
self._fake_record_bad_sequence_1 =\
list(fake_record_bad_sequence_1.split('\n'))
self._fake_record_bad_structure_1 =\
list(fake_record_bad_structure_1.split('\n'))
self._fake_record_bad_structure_2 =\
list(fake_record_bad_structure_2.split('\n'))
self.single_family = single_family.split('\n')
def test_is_empty_or_html(self):
"""is_empty_or_html: should ignore empty and HTML line"""
line = ' '
self.assertEqual(is_empty_or_html(line), True)
line = '\n\n'
self.assertEqual(is_empty_or_html(line), True)
line = ''
self.assertEqual(is_empty_or_html(line), True)
line = ' \n\n'
self.assertEqual(is_empty_or_html(line), True)
line = '\t/\n'
self.assertEqual(is_empty_or_html(line), False)
def test_is_header_line(self):
"""is_header_line: functions correctly w/ various lines """
self.assertEqual(is_header_line('#=GF'), True)
self.assertEqual(is_header_line('#=GF AC RF00001'), True)
self.assertEqual(is_header_line('#=GF CC until it is\
required for transcription. '), True)
self.assertEqual(is_header_line(''), False)
self.assertEqual(is_header_line('X07545.1/505-619 '), False)
self.assertEqual(is_header_line('#=G'), False)
self.assertEqual(is_header_line('=GF'), False)
self.assertEqual(is_header_line('#=GC SS_cons'), False)
def test_is_seq_line(self):
"""is_seq_line: functions correctly w/ various lines """
s = 'X07545.1/505-619 .\
.ACCCGGC.CAUA...GUGGCCG.GGCAA.CAC.CCGG.U.C..UCGUU'
assert is_seq_line(s)
assert is_seq_line('X07545.1/505-619')
assert is_seq_line('M21086.1/8-123')
assert not is_seq_line('')
assert not is_seq_line('#GF=')
assert not is_seq_line('//blah')
def test_is_structure_line(self):
"""is_structure_line: functions correctly w/ various lines """
s = '#=GC SS_cons\
<<<<<<<<<........<<.<<<<.<...<.<...<<<<.<.<.......'
self.assertEqual(is_structure_line(s), True)
self.assertEqual(is_structure_line('#=GC SS_cons'), True)
self.assertEqual(is_structure_line('#=GC SS_cons '), True)
self.assertEqual(is_structure_line(''), False)
self.assertEqual(is_structure_line(' '), False)
self.assertEqual(is_structure_line('#=GF AC RF00001'), False)
self.assertEqual(is_structure_line('X07545.1/505-619'), False)
self.assertEqual(is_structure_line('=GC SS_cons'), False)
self.assertEqual(is_structure_line('#=GC'), False)
self.assertEqual(is_structure_line('#=GC RF'), False)
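The three predicates exercised above classify Stockholm lines by their prefix. A minimal sketch of that classification follows, assuming simple prefix checks are enough to reproduce the tested cases. The `_sketch` function names are hypothetical, and this is not the implementation in `cogent.parse.rfam`.

```python
def is_header_line_sketch(line):
    """True for per-file annotation lines such as '#=GF AC RF00001'."""
    return line.startswith('#=GF')


def is_structure_line_sketch(line):
    """True for the consensus structure line '#=GC SS_cons ...'."""
    return line.startswith('#=GC SS_cons')


def is_seq_line_sketch(line):
    """True for non-blank lines that are neither annotation ('#') nor a
    record terminator ('//')."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith(('#', '//'))


# Usage, mirroring a few of the cases in the tests above:
print(is_header_line_sketch('#=GF AC RF00001'))   # True
print(is_header_line_sketch('#=GC SS_cons'))      # False
print(is_structure_line_sketch('#=GC RF'))        # False
print(is_seq_line_sketch('M21086.1/8-123'))       # True
print(is_seq_line_sketch('//blah'))               # False
```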
def test_HeaderToInfo(self):
"""HeaderToInfo: correctly builds info object from header information"""
info = HeaderToInfo(self._fake_headers)
self.assertEqual(info['Identification'], '5S_rRNA')
self.assertEqual(info['RT'], None)
self.assertEqual(info['Comment'], 'This is a short comment')
self.assertEqual(info['Author'], 'Griffiths-Jones SR')
self.assertEqual(info['Sequences'], '606')
self.assertEqual(info['DatabaseReference'],\
['URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html;',\
'URL; http://rdp.cme.msu.edu/html/;'])
self.assertEqual(info['PK'],'not real')
self.assertEqual(info['Rfam'], ['RF00001'])
def test_HeaderToInfo_invalid_data(self):
"""HeaderToInfo: correctly raises error when necessary """
invalid_headers = [['#=GF ACRF00001'],['#=GFACRF00001']]
for h in invalid_headers:
self.assertRaises(RecordError, HeaderToInfo, h)
def test_MinimalRfamParser_strict_missing_fields(self):
"""MinimalRfamParser: toggle strict functions w/ missing fields"""
# strict = True
self.assertRaises(RecordError,list,\
MinimalRfamParser(self._fake_record_no_sequences))
self.assertRaises(RecordError,list,\
MinimalRfamParser(self._fake_record_no_structure))
# strict = False
# no header shouldn't be a problem
self.assertEqual(list(MinimalRfamParser(self._fake_record_no_headers,\
strict=False)), [([],{'Z11765.1/1-89':'GGUC'},'............>>>')])
# should get empty on missing sequence or missing structure
self.assertEqual(list(MinimalRfamParser(self._fake_record_no_sequences,\
strict=False)), [])
self.assertEqual(list(MinimalRfamParser(self._fake_record_no_structure,\
strict=False)), [])
def test_MinimalRfamParser_strict_invalid_sequence(self):
"""MinimalRfamParser: toggle strict functions w/ invalid seq
"""
#strict = True
self.assertRaises(RecordError,list,\
MinimalRfamParser(self._fake_record_bad_sequence_1))
# strict = False
# you expect to get back as much information as possible, including
# partial records or sequences
self.assertEqual(len(list(MinimalRfamParser(\
self._fake_record_bad_sequence_1,strict=False))[0][1].NamedSeqs), 3)
def test_MinimalRfamParser_strict_invalid_structure(self):
"""MinimalRfamParser: toggle strict functions w/ invalid structure
"""
#strict = True
self.assertRaises(RecordError,list,\
MinimalRfamParser(self._fake_record_bad_structure_1))
# strict = False
self.assertEqual(list(MinimalRfamParser(\
self._fake_record_bad_structure_1,strict=False))[0][2],None)
def test_MinimalRfamParser_w_valid_data(self):
"""MinimalRfamParser: integrity of output """
# Some ugly constructions here, but this is what the output of
# parsing fake_two_records should be
headers = ['#=GF AC RF00014','#=GF AU Mifsud W']
sequences =\
{'U17136.1/898-984':\
''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',\
'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),\
'M15749.1/155-239':\
''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',\
'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),\
'AF090431.1/222-139':\
''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',\
'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])}
structure = WussStructure(''.join(\
['...<<<<<<<.....>>>>>>>....................<<<<<...',\
'.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..']))
data = []
for r in MinimalRfamParser(self._fake_two_records, strict=False):
data.append(r)
self.assertEqual(data[0],(headers,sequences,structure))
assert isinstance(data[0][1],Alignment)
# This line tests that invalid entries are ignored when strict=False
# Note: there are two records in self._fake_two_records, but the 2nd
# is invalid
self.assertEqual(len(data),1)
def test_RfamFinder(self):
"""RfamFinder: integrity of output """
fake_record = ['a','//','b','b','//']
num_records = 0
data = []
for r in RfamFinder(fake_record):
data.append(r)
num_records += 1
self.assertEqual(num_records, 2)
self.assertEqual(data[0], ['a','//'])
self.assertEqual(data[1], ['b','b','//'])
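The `RfamFinder` test expects input lines to be grouped into records that each end with a `//` line. A minimal sketch of that grouping, assuming a record is everything up to and including a terminator line; the function name is hypothetical, and dropping unterminated trailing lines is a choice of this sketch, not documented PyCogent behaviour.

```python
def split_records_sketch(lines, terminator='//'):
    """Yield lists of lines, each list ending with a terminator line.

    Lines after the last terminator are discarded in this sketch."""
    record = []
    for line in lines:
        record.append(line)
        if line.strip() == terminator:
            yield record
            record = []


records = list(split_records_sketch(['a', '//', 'b', 'b', '//']))
print(records)  # [['a', '//'], ['b', 'b', '//']]
```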
def test_ChangedSequence(self):
"""ChangedSequence: integrity of output"""
# Made up input, based on a line that would look like:
# U17136.1/898-984 AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA
s_in = 'AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA'
s_out = 'AACA--CAU--CAGAUUUCCU--GGUGUAA-CGAA'
sequence = ChangedSequence(s_in)
self.assertEqual(sequence, s_out)
# test some extremes on the seq
# sequence of all blanks
s_in = '.' * 5
s_out = '-' * 5
sequence = ChangedSequence(s_in)
self.assertEqual(sequence, s_out)
# sequence of no blanks
s_in = 'U' * 5
s_out = 'U' * 5
sequence = ChangedSequence(s_in)
self.assertEqual(sequence, s_out)
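The `ChangedSequence` cases above amount to rewriting Stockholm's `.` gap character as `-`. A minimal sketch under that assumption (hypothetical name, not PyCogent's code):

```python
def change_gaps_sketch(seq, old_gap='.', new_gap='-'):
    """Return seq with every old_gap character replaced by new_gap."""
    return seq.replace(old_gap, new_gap)


print(change_gaps_sketch('AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA'))
# AACA--CAU--CAGAUUUCCU--GGUGUAA-CGAA
```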
def test_NameToInfo(self):
"""NameToInfo: integrity of output """
# Made up input, based on a line that would look like:
# U17136.1/898-984 AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA
s_in = 'AACA..CAU..CAGAUUUCCU..GGUGUAA.CGAA'
#s_out = 'AACA--CAU--CAGAUUUCCU--GGUGUAA-CGAA'
sequence = Sequence(s_in, Name='U17136.1/898-984')
info = NameToInfo(sequence)
#self.assertEqual(seq, s_out)
self.assertEqual(info['Start'], 897)
self.assertEqual(info['End'], 984)
self.assertEqual(info['GenBank'], ['U17136.1'])
def test_NameToInfo_invalid_label(self):
"""NameToInfo: raises error on invalid label """
s = 'AA'
invalid_labels = ['U17136.1898-984','U17136.1/898984']
for l in invalid_labels:
self.assertRaises(RecordError,NameToInfo,\
Sequence(s, Name=l))
a = 'U17136.1/' #missing start/end positions
b = '/898-984' #missing genbank id
obs_info = NameToInfo(Sequence(s,Name=a))
exp = Info({'GenBank':'U17136.1','Start':None,'End':None})
self.assertEqual(obs_info,exp)
obs_info = NameToInfo(Sequence(s,Name=b))
exp = Info({'GenBank':None,'Start':897,'End':984})
self.assertEqual(obs_info,exp)
# strict = False
# in non-strict mode you want to get back as much info as possible
lab1 = 'U17136.1898-984'
lab2 = 'U17136.1/898984'
obs_info = NameToInfo(Sequence(s,Name=lab1), strict=False)
exp = Info({'GenBank':None,'Start':None,'End':None})
self.assertEqual(obs_info,exp)
obs_info = NameToInfo(Sequence(s,Name=lab2), strict=False)
exp = Info({'GenBank':'U17136.1','Start':None,'End':None})
self.assertEqual(obs_info,exp)
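The `NameToInfo` tests expect a label such as `U17136.1/898-984` to yield the accession, a 0-based start (897) and the end position (984), with missing or malformed parts becoming `None` in non-strict mode. A minimal sketch of that lenient behaviour follows; the function name and regular expression are assumptions, and strict mode (raising `RecordError`) is not modelled.

```python
import re

# One or more digits, a hyphen, one or more digits, e.g. '898-984'.
_SPAN = re.compile(r'^(\d+)-(\d+)$')


def parse_rfam_label_sketch(label):
    """Split an 'accession/start-end' label into its parts.

    Start is shifted to a 0-based index; End is kept as-is. Missing or
    malformed parts become None (lenient, non-strict behaviour)."""
    if '/' not in label:
        return {'GenBank': None, 'Start': None, 'End': None}
    accession, _, span = label.partition('/')
    match = _SPAN.match(span)
    return {
        'GenBank': accession or None,
        'Start': int(match.group(1)) - 1 if match else None,
        'End': int(match.group(2)) if match else None,
    }


print(parse_rfam_label_sketch('U17136.1/898-984'))
# {'GenBank': 'U17136.1', 'Start': 897, 'End': 984}
```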
def test_RfamParser(self):
"""RfamParser: integrity of output """
expected_sequences =\
[''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',\
'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),\
''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',\
'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),\
''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',\
'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])]
expected_structure = ''.join(\
['...<<<<<<<.....>>>>>>>....................<<<<<...',\
'.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..'])
for r in RfamParser(self._fake_record):
headers,sequences,structure = r
self.assertEqual(headers['Refs']['Rfam'], ['RF00014'])
self.assertEqual(headers['Author'], 'Mifsud W')
self.assertEqualItems(sequences.values(), expected_sequences)
assert isinstance(sequences, Alignment)
self.assertEqualItems([s.Info.GenBank for s in sequences.Seqs],
[['U17136.1'],['M15749.1'],['AF090431.1']])
self.assertEqualItems([s.Info.Start for s in sequences.Seqs],
[897,154,221])
self.assertEqual(structure, expected_structure)
assert isinstance(structure,WussStructure)
def test_RfamParser_strict_missing_fields(self):
"""RfamParser: toggle strict functions correctly """
# strict = True
self.assertRaises(RecordError,list,\
RfamParser(self._fake_record_no_headers))
self.assertRaises(RecordError,list,\
RfamParser(self._fake_record_no_sequences))
self.assertRaises(RecordError,list,\
RfamParser(self._fake_record_no_structure))
# strict = False
self.assertEqual(list(RfamParser(self._fake_record_no_headers,\
strict=False)), [])
self.assertEqual(list(RfamParser(self._fake_record_no_sequences,\
strict=False)), [])
self.assertEqual(list(RfamParser(self._fake_record_no_structure,\
strict=False)), [])
def test_RFamParser_strict_invalid_headers(self):
"""RfamParser: functions when toggling strict w/ record w/ bad header
"""
self.assertRaises(RecordError,list,\
RfamParser(self._fake_record_bad_header_1))
self.assertRaises(RecordError,list,\
RfamParser(self._fake_record_bad_header_2))
# strict = False
obs = list(RfamParser(self._fake_record_bad_header_1,\
strict=False))[0][0].keys()
self.assertEqual(len(obs),1)
obs = list(RfamParser(self._fake_record_bad_header_2,\
strict=False))[0][0].keys()
self.assertEqual(len(obs),1)
def test_RfamParser_strict_invalid_sequences(self):
"""RfamParser: functions when toggling strict w/ record w/ bad seq
"""
self.assertRaises(RecordError,list,
RfamParser(self._fake_record_bad_sequence_1))
# strict = False
# in non-strict mode you expect to get back as much as possible,
# including partial sequences
self.assertEqual(len(list(RfamParser(self._fake_record_bad_sequence_1,\
strict=False))[0][1].NamedSeqs), 3)
def test_RfamParser_strict_invalid_structure(self):
"""RfamParser: functions when toggling strict w/ record w/ bad struct
"""
# strict
self.assertRaises(RecordError,list,\
RfamParser(self._fake_record_bad_structure_2))
#not strict
self.assertEqual(list(RfamParser(self._fake_record_bad_structure_2,\
strict=False)),[])
def test_RfamParser_single_family(self):
"""RfamParser: should work on a single family in stockholm format"""
exp_header = Info()
exp_aln = {'K02120.1/628-682':\
'AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGACUGGC',\
'D00647.1/629-683':\
'AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGACUGGC'}
exp_struct = '<<<<<<.........>>>>>>.........<<<<<<.............>>>>>>'
h, a, s = list(RfamParser(self.single_family))[0]
self.assertEqual(h,exp_header)
self.assertEqual(a,exp_aln)
self.assertEqual(s,exp_struct)
# This is an altered version of some header info from Rfam.seed modified to
# incorporate different cases for testing
fake_headers = """#=GF AC RF00001
#=GF AU Griffiths-Jones SR
#=GF ID 5S_rRNA
#=GF RT 5S Ribosomal RNA Database.
#=GF DR URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html;
#=GF DR URL; http://rdp.cme.msu.edu/html/;
#=GF CC This is a short
#=GF CC comment
#=GF SQ 606
#=GF PK not real"""
fake_record_no_headers ="""Z11765.1/1-89 GGUC
#=GC SS_cons ............>>>
//"""
fake_record_no_sequences ="""#=GF AC RF00006
#=GC SS_cons ............>
//"""
fake_record_no_structure ="""#=GF AC RF00006
Z11765.1/1-89 GGUCAGC
//"""
fake_two_records ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//
#=GF AC RF00015
//"""
fake_record ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_header_1 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AUMifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_header_2 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GFAUMifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_sequence_1 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_structure_1 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_structure_2 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
single_family=\
"""K02120.1/628-682 AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGA
D00647.1/629-683 AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGA
#=GC SS_cons <<<<<<.........>>>>>>.........<<<<<<.............>
K02120.1/628-682 CUGGC
D00647.1/629-683 CUGGC
#=GC SS_cons >>>>>
//"""
# Run tests if called from the command line
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests for the RNAfold dot plot parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.parse.rna_fold import *
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class RnaFoldParserTests(TestCase):
"""Tests for RnaFoldParser.
"""
def setUp(self):
"""Setup function for RnaFoldParser tests.
"""
self.rna_fold_parser_results = ('ACGUGCUAG',
[(1, 7, float(0.01462)),
(2, 9, float(0.11118)),
(3, 7, float(0.00985)),
(4, 8, float(0.01005)),
(4, 9, float(0.01586))])
self.sequence_lines = ['/sequence line before where sequences start\n',
' ACCUGUCUAUCGCUGC&*$#@(*\n',
'ACGGUUAUAUUAUCUCUG\\\n',
') end of sequence\n']
self.sequence_lines_empty = ['/sequence \n',
'\n',
')\n']
self.index_lines = ['unimportant line',
'1 3 0.332 ubox',
'1 4 0.003 ubox']
self.index_lines_no_ubox = ['unimportant line',
'1 2 0.432 u box',
'1 4 0.32 ubo x']
def test_getSequence(self):
self.assertEqual(getSequence(self.sequence_lines),
'ACCUGUCUAUCGCUGC&*$#@(*ACGGUUAUAUUAUCUCUG')
self.assertEqual(getSequence(self.sequence_lines_empty),'')
def test_getIndices(self):
self.assertEqual(getIndices(self.index_lines),[(1,3,float(0.332)),
(1,4,float(0.003))])
self.assertEqual(getIndices(self.index_lines_no_ubox),[])
def test_RnaFoldParser(self):
self.assertEqual(RnaFoldParser([]), ('',[]))
self.assertEqual(RnaFoldParser(RNA_FOLD_RESULTS),
self.rna_fold_parser_results)
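The `getIndices` cases above keep only lines of the form `i j value ubox` and reject near-misses such as `u box`. A minimal sketch of that filter, assuming whitespace-separated fields; the function name is hypothetical. According to the comment in the dot-plot fixture, the third field is the square root of the base-pair probability, so squaring it would recover p(i,j); this sketch returns the raw value, as the test expects.

```python
def parse_ubox_lines_sketch(lines):
    """Collect (i, j, value) triples from lines of the form 'i j value ubox'.

    Lines with any other shape (including 'u box' or 'ubo x') are skipped."""
    triples = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[3] != 'ubox':
            continue
        try:
            triples.append((int(fields[0]), int(fields[1]), float(fields[2])))
        except ValueError:
            # skip 4-field lines whose first three fields are not numeric
            continue
    return triples


lines = ['unimportant line', '1 3 0.332 ubox', '1 4 0.003 ubox', '1 2 0.432 u box']
print(parse_ubox_lines_sketch(lines))  # [(1, 3, 0.332), (1, 4, 0.003)]
```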
RNA_FOLD_RESULTS = ['%!PS-Adobe-3.0 EPSF-3.0\n',
'%%Title: RNA DotPlot\n',
'%%Creator: PS_dot.c,v 1.24 2003/08/07 09:01:00 ivo Exp $, ViennaRNA-1.5\n',
'%%CreationDate: Fri Oct 8 13:15:01 2004\n',
'%%BoundingBox: 66 211 518 662\n',
'%%DocumentFonts: Helvetica\n',
'%%Pages: 1\n',
'%%EndComments\n',
'\n',
'%Options: \n',
'%This file contains the square roots of the base pair probabilities in the form\n',
'% i j sqrt(p(i,j)) ubox\n',
'100 dict begin\n',
'\n',
'/logscale false def\n',
'\n',
'%delete next line to get rid of title\n',
'270 665 moveto /Helvetica findfont 14 scalefont setfont (dot.ps) show\n',
'\n',
'/lpmin {\n',
' 1e-05 log % log(pmin) only probs>pmin will be shown\n',
'} bind def\n',
'\n',
'/box { %size x y box - draws box centered on x,y\n',
' 2 index 0.5 mul add % x += 0.5\n',
' exch 2 index 0.5 mul add exch % x += 0.5\n',
' newpath\n',
' moveto\n',
' dup neg 0 rlineto\n',
' dup neg 0 exch rlineto\n',
' 0 rlineto\n',
' closepath\n',
' fill\n',
'} bind def\n',
'\n',
'/sequence { (\\\n',
'ACGUGCUAG\\\n',
') } def\n',
'/len { sequence length } bind def\n',
'\n',
'/ubox {\n',
' logscale {\n',
' log dup add lpmin div 1 exch sub dup 0 lt { pop 0 } if\n',
' } if\n',
' 3 1 roll\n',
' exch len exch sub 1 add box\n',
'} bind def\n',
'\n',
'/lbox {\n',
' 3 1 roll\n',
' len exch sub 1 add box\n',
'} bind def\n',
'\n',
'72 216 translate\n',
'72 6 mul len 1 add div dup scale\n',
'/Helvetica findfont 0.95 scalefont setfont\n',
'\n',
'% print sequence along all 4 sides\n',
'[ [0.7 -0.3 0 ]\n',
' [0.7 0.7 len add 0]\n',
' [0.7 -0.2 90]\n',
' [-0.3 len sub 0.7 len add -90]\n',
'] {\n',
' gsave\n',
' aload pop rotate translate\n',
' 0 1 len 1 sub {\n',
' dup 0 moveto\n',
' sequence exch 1 getinterval\n',
' show\n',
' } for\n',
' grestore\n',
'} forall\n',
'\n',
'0.5 dup translate\n',
'% draw diagonal\n',
'0.04 setlinewidth\n',
'0 len moveto len 0 lineto stroke \n',
'\n',
'%draw grid\n',
'0.01 setlinewidth\n',
'len log 0.9 sub cvi 10 exch exp % grid spacing\n',
'dup 1 gt {\n',
' dup dup 20 div dup 2 array astore exch 40 div setdash\n',
'} { [0.3 0.7] 0.1 setdash } ifelse\n',
'0 exch len {\n',
' dup dup\n',
' 0 moveto\n',
' len lineto \n',
' dup\n',
' len exch sub 0 exch moveto\n',
' len exch len exch sub lineto\n',
' stroke\n',
'} for\n',
'0.5 neg dup translate\n',
'\n',
'1 7 0.01462 ubox\n',
'2 9 0.11118 ubox\n',
'3 7 0.00985 ubox\n',
'4 8 0.01005 ubox\n',
'4 9 0.01586 ubox\n',
'showpage\n',
'end\n',
'%%EOF\n']
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
"""Tests for the RNAplot parser
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
import string
import re
from cogent.parse.rna_plot import get_sequence, get_coordinates, get_pairs,\
RnaPlotParser
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class RnaPlotParserTests(TestCase):
"""Tests for RnaPlotParser.
"""
def setUp(self):
"""Setup function for RnaPlotParser tests.
"""
self.sequence_lines = SEQUENCE_LINES.split('\n')
self.expected_seq = 'AAACCGCCUUU'
self.coordinate_lines = COORDINATE_LINES.split('\n')
self.expected_coords = [[92.500,92.500],\
[92.500,77.500],\
[92.500,62.500],\
[82.218,50.185],\
[85.577,34.497],\
[100.000,27.472],\
[114.423,34.497],\
[117.782,50.185],\
[107.500,62.500],\
[107.500,77.500],\
[107.500,92.500]]
self.pairs_lines = PAIRS_LINES.split('\n')
self.expected_pairs = [[0,10],\
[1,9],\
[2,8]]
self.rna_plot_lines = RNA_PLOT_FILE.split('\n')
def test_get_sequence(self):
"""get_sequence should properly parse out sequence.
"""
#test real data
obs_seq = get_sequence(self.sequence_lines)
self.assertEqual(obs_seq, self.expected_seq)
#test empty list
self.assertEqual(get_sequence([]),'')
def test_get_coordinates(self):
"""get_coordinates should properly parse out coordinates.
"""
obs_coords = get_coordinates(self.coordinate_lines)
for (obs1, obs2), (exp1, exp2) in zip(obs_coords,self.expected_coords):
self.assertFloatEqual(obs1,exp1)
self.assertFloatEqual(obs2,exp2)
#test empty list
self.assertEqual(get_coordinates([]),[])
def test_get_pairs(self):
"""get_pairs should properly parse out pairs.
"""
obs_pairs = get_pairs(self.pairs_lines)
self.assertEqual(obs_pairs, self.expected_pairs)
#test empty list
self.assertEqual(get_pairs([]),[])
def test_RnaPlotParser(self):
"""RnaPlotParser should properly parse full RNAplot postscript file.
"""
obs_seq, obs_coords, obs_pairs = RnaPlotParser(self.rna_plot_lines)
#test seq is correctly parsed
self.assertEqual(obs_seq, self.expected_seq)
#test coords are correctly parsed
for (obs1, obs2), (exp1, exp2) in zip(obs_coords,self.expected_coords):
self.assertFloatEqual(obs1,exp1)
self.assertFloatEqual(obs2,exp2)
#test pairs are correctly parsed
self.assertEqual(obs_pairs, self.expected_pairs)
#test empty list
self.assertEqual(RnaPlotParser([]),('',[],[]))
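The `get_pairs` expectations convert the RNAplot `/pairs` entries `[1 11]`, `[2 10]`, `[3 9]` from 1-based positions into 0-based pairs `[0, 10]`, `[1, 9]`, `[2, 8]`. A minimal sketch of that conversion, assuming the block starts with a `/pairs` line and ends at the first line beginning with `]`; the function name and regular expression are hypothetical.

```python
import re

# A pair entry such as '[1 11]' (1-based positions, whitespace-separated).
_PAIR = re.compile(r'^\s*\[(\d+)\s+(\d+)\]\s*$')


def get_pairs_sketch(lines):
    """Return 0-based [i, j] pairs from the '/pairs [ ... ] def' block."""
    pairs = []
    in_block = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith('/pairs'):
            in_block = True
        elif in_block and stripped.startswith(']'):
            break
        elif in_block:
            match = _PAIR.match(stripped)
            if match:
                pairs.append([int(match.group(1)) - 1, int(match.group(2)) - 1])
    return pairs


lines = ['/coor [', '[92.500 92.500]', '] def',
         '/pairs [', '[1 11]', '[2 10]', '[3 9]', '] def']
print(get_pairs_sketch(lines))  # [[0, 10], [1, 9], [2, 8]]
```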
SEQUENCE_LINES = """
/sequence (\
AAACCGCCUUU\
) def
/coor [
[92.500 92.500]
[92.500 77.500]
[92.500 62.500]
[82.218 50.185]
[85.577 34.497]
[100.000 27.472]
[114.423 34.497]
[117.782 50.185]
[107.500 62.500]
[107.500 77.500]
[107.500 92.500]
] def
/pairs [
[1 11]
[2 10]
[3 9]
] def
init
% switch off outline pairs or bases by removing these lines
drawoutline
drawpairs
drawbases
% show it
showpage
end
%%EOF
"""
COORDINATE_LINES = """
/coor [
[92.500 92.500]
[92.500 77.500]
[92.500 62.500]
[82.218 50.185]
[85.577 34.497]
[100.000 27.472]
[114.423 34.497]
[117.782 50.185]
[107.500 62.500]
[107.500 77.500]
[107.500 92.500]
] def
/pairs [
[1 11]
[2 10]
[3 9]
] def
init
% switch off outline pairs or bases by removing these lines
drawoutline
drawpairs
drawbases
% show it
showpage
end
%%EOF
"""
PAIRS_LINES = """
/pairs [
[1 11]
[2 10]
[3 9]
] def
init
% switch off outline pairs or bases by removing these lines
drawoutline
drawpairs
drawbases
% show it
showpage
end
%%EOF
"""
RNA_PLOT_FILE = """
%!PS-Adobe-3.0 EPSF-3.0
%%Creator: PS_dot.c,v 1.38 2007/02/02 15:18:13 ivo Exp $, ViennaRNA-1.8.2
%%CreationDate: Wed Apr 14 12:08:23 2010
%%Title: RNA Secondary Structure Plot
%%BoundingBox: 66 210 518 662
%%DocumentFonts: Helvetica
%%Pages: 1
%%EndComments
%Options:
% to switch off outline pairs of sequence comment or
% delete the appropriate line near the end of the file
%%BeginProlog
/RNAplot 100 dict def
RNAplot begin
/fsize 14 def
/outlinecolor {0.2 setgray} bind def
/paircolor {0.2 setgray} bind def
/seqcolor {0 setgray} bind def
/cshow { dup stringwidth pop -2 div fsize -3 div rmoveto show} bind def
/min { 2 copy gt { exch } if pop } bind def
/max { 2 copy lt { exch } if pop } bind def
/drawoutline {
gsave outlinecolor newpath
coor 0 get aload pop 0.8 0 360 arc % draw 5' circle of 1st sequence
currentdict /cutpoint known % check if cutpoint is defined
{coor 0 cutpoint getinterval
{aload pop lineto} forall % draw outline of 1st sequence
coor cutpoint get aload pop
2 copy moveto 0.8 0 360 arc % draw 5' circle of 2nd sequence
coor cutpoint coor length cutpoint sub getinterval
{aload pop lineto} forall} % draw outline of 2nd sequence
{coor {aload pop lineto} forall} % draw outline as a whole
ifelse
stroke grestore
} bind def
/drawpairs {
paircolor
0.7 setlinewidth
[9 3.01] 9 setdash
newpath
pairs {aload pop
coor exch 1 sub get aload pop moveto
coor exch 1 sub get aload pop lineto
} forall
stroke
} bind def
% draw bases
/drawbases {
[] 0 setdash
seqcolor
0
coor {
aload pop moveto
dup sequence exch 1 getinterval cshow
1 add
} forall
pop
} bind def
/init {
/Helvetica findfont fsize scalefont setfont
1 setlinejoin
1 setlinecap
0.8 setlinewidth
72 216 translate
% find the coordinate range
/xmax -1000 def /xmin 10000 def
/ymax -1000 def /ymin 10000 def
coor {
aload pop
dup ymin lt {dup /ymin exch def} if
dup ymax gt {/ymax exch def} {pop} ifelse
dup xmin lt {dup /xmin exch def} if
dup xmax gt {/xmax exch def} {pop} ifelse
} forall
/size {xmax xmin sub ymax ymin sub max} bind def
72 6 mul size div dup scale
size xmin sub xmax sub 2 div size ymin sub ymax sub 2 div
translate
} bind def
end
%%EndProlog
RNAplot begin
% data start here
/sequence (\
AAACCGCCUUU\
) def
/coor [
[92.500 92.500]
[92.500 77.500]
[92.500 62.500]
[82.218 50.185]
[85.577 34.497]
[100.000 27.472]
[114.423 34.497]
[117.782 50.185]
[107.500 62.500]
[107.500 77.500]
[107.500 92.500]
] def
/pairs [
[1 11]
[2 10]
[3 9]
] def
init
% switch off outline pairs or bases by removing these lines
drawoutline
drawpairs
drawbases
% show it
showpage
end
%%EOF
"""
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.rnaalifold import rnaalifold_parser, MinimalRnaalifoldParser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman","Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class RnaalifoldParserTest(TestCase):
"""Provides tests for RNAALIFOLD RNA secondary structure format parsers"""
def setUp(self):
"""Setup function """
#output
self.rnaalifold_out = RNAALIFOLD
#expected
self.rnaalifold_exp = [['GGCUAGGUAAAUCC',[(0,13),(1,12),(3,10)],-26.50]]
#output 2
self.rnaalifold_out_2 = RNAALIFOLD_2_STRUCTS
#expected
self.rnaalifold_exp_2 = \
[['GGCUAGGUAAAUCC',[(0,13),(1,12),(3,10)],\
float(-26.50)],\
['-GAUCCUAAGCGACGAAGUUYAWSCU------YGKRYARYRWWKKR-',\
[(6,21),(7,20),(8,19),(9,18),(10,17)],\
float(-0.80)]]
def test_rnaalifold_output(self):
"""Test for rnaalifold format parser"""
#Test empty lines
self.assertEqual(rnaalifold_parser(''),[])
#Test one structure
obs = rnaalifold_parser(self.rnaalifold_out)
self.assertEqual(obs,self.rnaalifold_exp)
#Test two structures
obs_2 = rnaalifold_parser(self.rnaalifold_out_2)
self.assertEqual(obs_2,self.rnaalifold_exp_2)
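The expected `rnaalifold_parser` output turns a dot-bracket line such as `((.(......).))` into 0-based pairs `[(0,13),(1,12),(3,10)]` and pulls out the first number in the parenthesised energy term. A minimal stack-based sketch of both steps follows. It assumes every structure line carries an energy in parentheses, and the function names are hypothetical rather than part of `cogent.parse.rnaalifold`.

```python
import re

# The first '(' followed (after optional spaces) by a number,
# e.g. '(-26.50 = ...' or '( -0.80 = ...'.
_ENERGY = re.compile(r'\(\s*(-?\d+(?:\.\d+)?)')


def dot_bracket_to_pairs_sketch(structure):
    """Return sorted 0-based (i, j) pairs for a dot-bracket string."""
    stack, pairs = [], []
    for pos, char in enumerate(structure):
        if char == '(':
            stack.append(pos)
        elif char == ')':
            if not stack:
                raise ValueError('unmatched ) at position %d' % pos)
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError('unmatched ( at position %d' % stack[-1])
    return sorted(pairs)


def parse_structure_line_sketch(line):
    """Split a line like '((...)) (-1.20 = ...)' into (structure, energy)."""
    structure = line.split()[0]
    energy = float(_ENERGY.search(line).group(1))
    return structure, energy


structure, energy = parse_structure_line_sketch(
    '((.(......).)) (-26.50 = -26.50 + 0.00) \n')
print(dot_bracket_to_pairs_sketch(structure), energy)
# [(0, 13), (1, 12), (3, 10)] -26.5
```

A stack is used because each `)` closes the most recently opened `(`, which is exactly how nested dot-bracket pairs are defined.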
class MinimalRnaalifoldParserTest(TestCase):
"""Provides tests for MinimalRnaalifoldParser structure format parser.
"""
def setUp(self):
"""Setup function """
#output
self.rnaalifold_out = RNAALIFOLD
#expected
self.rnaalifold_exp = [['GGCUAGGUAAAUCC','((.(......).))',\
float(-26.50)]]
#output 2
self.rnaalifold_out_2 = RNAALIFOLD_2_STRUCTS
#expected
self.rnaalifold_exp_2 = \
[['GGCUAGGUAAAUCC','((.(......).))',float(-26.50)],\
['-GAUCCUAAGCGACGAAGUUYAWSCU------YGKRYARYRWWKKR-',\
'......(((((......))))).........................',\
float(-0.80)]]
def test_rnaalifold_output(self):
"""Test for rnaalifold format parser"""
#Test empty lines
self.assertEqual(MinimalRnaalifoldParser(''),[])
#Test one structure
obs = MinimalRnaalifoldParser(self.rnaalifold_out)
self.assertEqual(obs,self.rnaalifold_exp)
#Test two structures
obs_2 = MinimalRnaalifoldParser(self.rnaalifold_out_2)
self.assertEqual(obs_2,self.rnaalifold_exp_2)
RNAALIFOLD = ['GGCUAGGUAAAUCC\n',
'((.(......).)) (-26.50 = -26.50 + 0.00) \n']
RNAALIFOLD_2_STRUCTS = ['GGCUAGGUAAAUCC\n',
'((.(......).)) (-26.50 = -26.50 + 0.00) \n',\
'-GAUCCUAAGCGACGAAGUUYAWSCU------YGKRYARYRWWKKR-\n',
'......(((((......)))))......................... ( -0.80 = -1.30 + 0.50)\n',]
if __name__ == '__main__':
main()
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.core.info import Info
from cogent.parse.rnaforester import rnaforester_parser
__author__ = "Shandy Wikman"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Shandy Wikman"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Shandy Wikman"
__email__ = "ens01svn@cs.umu.se"
__status__ = "Development"
class RnaforesterParserTest(TestCase):
"""Provides tests for RNAforester RNA secondary structure format parsers"""
def setUp(self):
"""Setup function

The application output used here is not always real output from
the application, but text in the same format as that application's
output. This saves space and time."""
#output
self.rnaforester_out = RNAFORESTER
#expected
self.rnaforester_exp = [[{'seq1':'ggccacguagcucagucgguagagcaaaggacugaaaauccuugugucguugguucaauuccaaccguggccacca',
'seq2':'-gccagauagcucagucgguagagcguucgccugaaaagugaaaggucgccgguucgaucccggcucuggccacca'},
'ggccacauagcucagucgguagagcaaacgacugaaaagccaaaggucgccgguucaaucccaacccuggccacca',
[(0, 71), (1, 70), (2, 69), (3, 68), (4, 67), (5, 66), (6, 65), (9, 24),
(10, 23), (11, 22), (12, 21), (26, 42), (27, 41), (28, 40), (29, 39),
(31, 38), (32, 37), (48, 64), (49, 63), (50, 62), (51, 61), (52, 60)]]]
def test_rnaforester_output(self):
"""Test for rnaforester format"""
obs = rnaforester_parser(self.rnaforester_out)
self.assertEqual(obs,self.rnaforester_exp)
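In the RNAforester fixture each sequence and its structure are printed in interleaved blocks (`seq1 ...`, `seq2 ...`), so a parser has to concatenate the chunks per name. A minimal sketch of that joining step follows. It assumes the caller knows the sequence names and that a chunk made only of `(`, `)`, `.` and `-` is structure; a sequence chunk consisting solely of gaps would be misclassified. The function name is hypothetical.

```python
# Characters that appear in dot-bracket structure chunks (plus '-' gaps).
STRUCTURE_CHARS = set('().-')


def join_blocks_sketch(lines, names):
    """Concatenate interleaved 'name chunk' lines into full strings.

    Returns (sequences, structures), two dicts keyed by name. A chunk is
    treated as structure if it contains only '(', ')', '.' or '-'."""
    names = set(names)
    sequences, structures = {}, {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2 or fields[0] not in names:
            continue
        name, chunk = fields
        target = structures if set(chunk) <= STRUCTURE_CHARS else sequences
        target[name] = target.get(name, '') + chunk
    return sequences, structures


lines = ['seq1 ggcc\n', 'seq2 -gcc\n', ' ****\n',
         'seq1 acca\n', 'seq2 uuca\n',
         'seq1 ((..\n', 'seq2 -(..\n',
         'seq1 ..))\n', 'seq2 ..).\n']
seqs, structs = join_blocks_sketch(lines, ['seq1', 'seq2'])
print(seqs['seq1'], structs['seq1'])  # ggccacca ((....))
```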
RNAFORESTER = ['*** Scoring parameters ***\n', '\n',
'Scoring type: similarity\n', 'Scoring parameters:\n', 'pm: 10\n',
'pd: -5\n', 'bm: 1\n', 'br: 0\n', 'bd: -10\n', '\n', '\n',
'Input string (upper or lower case); & to end for multiple alignments, @ to quit\n',
'....,....1....,....2....,....3....,....4....,....5....,....6....,....7....,....8\n',
'\n', '*** Calculation ***\n', '\n', 'clustering threshold is: 0.7\n',
'join clusters cutoff is: 0\n', '\n', 'Computing all pairwise similarities\n',
'2,1: 0.74606\n', '\n', 'joining alignments:\n', '1,2: 0.74606 -> 1\n',
'Calculate similarities to other clusters\n', '\n', '\n', '*** Results ***\n',
'\n', 'Minimum basepair probability for consensus structure (-cmin): 0.5\n',
'\n', 'RNA Structure Cluster Nr: 1\n', 'Score: 264.25\n', 'Members: 2\n', '\n', 'seq1 ggccacguagcucagucgguagagcaaaggacugaaaauccuugugucguugguu\n',
'seq2 -gccagauagcucagucgguagagcguucgccugaaaagugaaaggucgccgguu\n',
' **** ****************** * ******* **** ****\n', '\n',
'seq1 caauuccaaccguggccacca\n',
'seq2 cgaucccggcucuggccacca\n',
' * ** ** * *********\n', '\n',
'seq1 (((((((..((((........)))).(((((.......))))).....(((((..\n',
'seq2 -((((((..((((........)))).((((.((....)))))).....(((((..\n',
' ***************************** **** *****************\n', '\n', 'seq1 .....))))))))))))....\n',
'seq2 .....))))))))))).....\n',
' **************** ****\n', '\n', '\n',
'Consensus sequence/structure:\n',
' 100% **** ****************** * ******* **** ****\n', ' 90% **** ****************** * ******* **** ****\n',
' 80% **** ****************** * ******* **** ****\n', ' 70% **** ****************** * ******* **** ****\n',
' 60% **** ****************** * ******* **** ****\n', ' 50% *******************************************************\n',
' 40% *******************************************************\n', ' 30% *******************************************************\n',
' 20% *******************************************************\n', ' 10% *******************************************************\n',
' ggccacauagcucagucgguagagcaaacgacugaaaagccaaaggucgccgguu\n', ' (((((((..((((........)))).((((.((....)))))).....(((((..\n',
' 10% *******************************************************\n', ' 20% *******************************************************\n',
' 30% *******************************************************\n', ' 40% *******************************************************\n',
' 50% *******************************************************\n', ' 60% *******************************************************\n',
' 70% ****************************** **** ****************\n', ' 80% ****************************** **** ****************\n',
' 90% ****************************** **** ****************\n', ' 100% ****************************** **** ****************\n',
'\n',
' 100% * ** ** * *********\n', ' 90% * ** ** * *********\n', ' 80% * ** ** * *********\n',
' 70% * ** ** * *********\n', ' 60% * ** ** * *********\n', ' 50% *********************\n',
' 40% *********************\n', ' 30% *********************\n', ' 20% *********************\n', ' 10% *********************\n',
' caaucccaacccuggccacca\n', ' .....))))))))))))....\n',
' 10% *********************\n', ' 20% *********************\n', ' 30% *********************\n',
' 40% *********************\n', ' 50% *********************\n', ' 60% *********************\n',
' 70% **************** ****\n', ' 80% **************** ****\n', ' 90% **************** ****\n', ' 100% **************** ****\n',
'\n', '\n']
if __name__ == '__main__':
main()
#!/usr/bin/env python
#test_rnaview.py
"""Provides tests for code in the rnaview.py file.
"""
from cogent.util.unit_test import TestCase, main
from cogent.parse.rnaview import is_roman_numeral, is_edge, is_orientation,\
parse_annotation, parse_uncommon_residues, parse_base_pairs,\
parse_base_multiplets, parse_pair_counts, MinimalRnaviewParser,\
RnaviewParser, Base, BasePair, BasePairs, BaseMultiplet, BaseMultiplets,\
PairCounts, RnaViewObjectError, RnaViewParseError,\
parse_filename, parse_number_of_pairs, verify_bp_counts,\
in_chain, is_canonical, is_not_canonical, is_stacked, is_not_stacked,\
is_tertiary, is_not_stacked_or_tertiary, is_tertiary_base_base
__author__ = "Greg Caporaso and Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
#==============================================================================
# RNAVIEW OBJECTS TESTS
#==============================================================================
class BaseTests(TestCase):
"""Tests for Base class"""
def test_init(self):
"""Base __init__: should initialize on standard data"""
b = Base('A','30','G')
self.assertEqual(b.ChainId, 'A')
self.assertEqual(b.ResId, '30')
self.assertEqual(b.ResName, 'G')
#ResId or ResName can't be None or empty string
self.assertRaises(RnaViewObjectError, Base, None, None, 'G')
self.assertRaises(RnaViewObjectError, Base, '1', 'A', '')
self.assertRaises(RnaViewObjectError, Base, None, '', 'C')
#Can pass RnaViewSeqPos (str)
b = Base('C','12','A','10')
self.assertEqual(b.RnaViewSeqPos, '10')
def test_str(self):
"""Base __str__: should return correct string"""
b = Base('A','30','G')
self.assertEqual(str(b), 'A 30 G')
def test_eq(self):
"""Base ==: functions as expected """
# Define a standard to compare others
b = Base('A','30','G')
# Identical to b
b_a = Base('A','30','G')
# Differ in Chain from b
b_b = Base('B','30','G')
# Differ in ResId from b
b_c = Base('A','25','G')
# Differ in ResName from b
b_d = Base('A','30','C')
# Differ in RnaViewSeqPos
b_e = Base('A','30','G','2')
# Differ in everything from b
b_e = Base('C','12','U','1')
self.assertEqual(b == b, True)
self.assertEqual(b_a == b, True)
self.assertEqual(b_b == b, False)
self.assertEqual(b_c == b, False)
self.assertEqual(b_d == b, False)
self.assertEqual(b_e == b, False)
def test_ne(self):
"""Base !=: functions as expected"""
# Define a standard to compare others
b = Base('A','30','G')
# Identical to b
b_a = Base('A','30','G')
# Differ in Chain from b
b_b = Base('B','30','G')
# Differ in ResId from b
b_c = Base('A','25','G')
# Differ in ResName from b
b_d = Base('A','30','C')
# Differ in everything from b
b_e = Base('C','12','U')
self.assertEqual(b != b, False)
self.assertEqual(b_a != b, False)
self.assertEqual(b_b != b, True)
self.assertEqual(b_c != b, True)
self.assertEqual(b_d != b, True)
self.assertEqual(b_e != b, True)
class BasePairTests(TestCase):
"""Tests for BasePair object"""
def setUp(self):
"""setUp method for all tests in BasePairTests"""
self.b1 = Base('A','30','G')
self.b2 = Base('A','36','C')
self.bp = BasePair(self.b1, self.b2, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
def test_init(self):
"""BasePair __init__: should initialize on standard data"""
self.failUnless(self.bp.Up is self.b1)
self.failUnless(self.bp.Down is self.b2)
self.failUnless(self.bp.Conformation is None)
self.assertEqual(self.bp.Edges, 'H/W')
self.assertEqual(self.bp.Orientation, 'cis')
self.assertEqual(self.bp.Saenger, 'XI')
def test_str(self):
"""BasePair __str__: should return correct string"""
self.assertEqual(str(self.bp),
"Bases: A 30 G -- A 36 C; Annotation: H/W -- cis -- "+\
"None -- XI;")
def test_eq(self):
"BasePair ==: should function as expected"
# identical
up = Base('A','30','G')
down = Base('A','36','C')
bp = BasePair(up, down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp == self.bp, True)
# diff up base
diff_up = Base('C','12','A')
bp = BasePair(diff_up, down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp == self.bp, False)
# diff down base
diff_down = Base('D','13','U')
bp = BasePair(up, diff_down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp == self.bp, False)
# diff edges
bp = BasePair(up, down, Edges='W/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp == self.bp, False)
# diff orientation
bp = BasePair(up, down, Edges='H/W', Saenger='XI',\
Orientation='tran', Conformation=None)
self.assertEqual(bp == self.bp, False)
# diff conformation
bp = BasePair(up, down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation='syn')
self.assertEqual(bp == self.bp, False)
# diff saenger
bp = BasePair(up, down, Edges='H/W', Saenger='XIX',\
Orientation='cis', Conformation=None)
self.assertEqual(bp == self.bp, False)
def test_ne(self):
"BasePair !=: should function as expected"
# identical
up = Base('A','30','G')
down = Base('A','36','C')
bp = BasePair(up, down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp != self.bp, False)
# diff up base
diff_up = Base('C','12','A')
bp = BasePair(diff_up, down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp != self.bp, True)
# diff down base
diff_down = Base('D','13','U')
bp = BasePair(up, diff_down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp != self.bp, True)
# diff edges
bp = BasePair(up, down, Edges='W/W', Saenger='XI',\
Orientation='cis', Conformation=None)
self.assertEqual(bp != self.bp, True)
# diff orientation
bp = BasePair(up, down, Edges='H/W', Saenger='XI',\
Orientation='tran', Conformation=None)
self.assertEqual(bp != self.bp, True)
# diff conformation
bp = BasePair(up, down, Edges='H/W', Saenger='XI',\
Orientation='cis', Conformation='syn')
self.assertEqual(bp != self.bp, True)
# diff saenger
bp = BasePair(up, down, Edges='H/W', Saenger='XIX',\
Orientation='cis', Conformation=None)
self.assertEqual(bp != self.bp, True)
def test_isWC(self):
"""BasePair isWC: should return True for GC or AU pair"""
bp = BasePair(Base('A','30','G'), Base('A','36','C'))
self.assertEqual(bp.isWC(), True)
bp = BasePair(Base('A','30','g'), Base('A','36','C'))
self.assertEqual(bp.isWC(), True)
bp = BasePair(Base('A','30','C'), Base('A','36','G'))
self.assertEqual(bp.isWC(), True)
bp = BasePair(Base('A','30','U'), Base('A','36','a'))
self.assertEqual(bp.isWC(), True)
bp = BasePair(Base('A','30','G'), Base('A','36','U'))
self.assertEqual(bp.isWC(), False)
def test_isWobble(self):
"""BasePair isWobble: should return True for GU pair"""
bp = BasePair(Base('A','30','G'), Base('A','36','U'))
self.assertEqual(bp.isWobble(), True)
bp = BasePair(Base('A','30','g'), Base('A','36','U'))
self.assertEqual(bp.isWobble(), True)
bp = BasePair(Base('A','30','u'), Base('A','36','g'))
self.assertEqual(bp.isWobble(), True)
bp = BasePair(Base('A','30','A'), Base('A','36','U'))
self.assertEqual(bp.isWobble(), False)
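The isWC and isWobble tests above treat residue names case-insensitively and ignore which base is Up and which is Down. The following is a minimal sketch of that check. It assumes bases are identified by single-letter names, and `is_wc`/`is_wobble` are hypothetical helpers, not the `cogent.parse.rnaview` implementation.

```python
# Hypothetical helpers illustrating the pairing rules the tests expect;
# not the cogent.parse.rnaview implementation.
WATSON_CRICK = {frozenset("GC"), frozenset("AU")}
WOBBLE = frozenset("GU")


def is_wc(up_name, down_name):
    """Return True if the two residue names form a G-C or A-U pair.

    Names are upper-cased first, so lowercase (modified) bases such as
    'g' or 'a' still count, and a frozenset makes the order irrelevant.
    """
    return frozenset((up_name.upper(), down_name.upper())) in WATSON_CRICK


def is_wobble(up_name, down_name):
    """Return True if the two residue names form a G-U wobble pair."""
    return frozenset((up_name.upper(), down_name.upper())) == WOBBLE
```

A G-G pair collapses to `frozenset({'G'})`, so it is correctly rejected by both checks.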
class BasePairsTests(TestCase):
"""Tests for BasePairs object"""
def setUp(self):
"""setUp method for all BasePairs tests"""
self.a1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX')
self.a2 = BasePair(Base('A','31','A'), Base('A','35','U'), Saenger='XX')
self.a3 = BasePair(Base('A','40','G'), Base('A','60','U'), Saenger='V')
self.a4 = BasePair(Base('A','41','A'), Base('A','58','U'),\
Saenger=None)
self.ab1 = BasePair(Base('A','41','A'), Base('B','58','U'))
self.ac1 = BasePair(Base('A','10','C'), Base('C','3','G'))
self.bc1 = BasePair(Base('B','41','A'), Base('C','1','U'))
self.bn1 = BasePair(Base('B','41','A'), Base(None,'1','U'))
self.cd1 = BasePair(Base('C','41','A'), Base('D','1','U'))
self.bp1 = BasePair(Base('A','34','U'), Base('A','40','A'))
self.bp2 = BasePair(Base('A','35','C'), Base('A','39','G'))
self.bp3 = BasePair(Base('B','32','G'), Base('B','38','U'))
self.bp4 = BasePair(Base('B','33','G'), Base('B','37','C'))
self.bp5 = BasePair(Base('A','31','C'), Base('B','41','G'))
self.bp6 = BasePair(Base('A','32','U'), Base('B','40','A'))
self.bp7 = BasePair(Base('A','37','U'), Base('B','35','A'))
self.pairs = BasePairs([self.bp1, self.bp2, self.bp3, self.bp4,\
self.bp5, self.bp6, self.bp7])
def test_init(self):
"""BasePairs __init__: should initialize from a list or a tuple"""
# init from list
bps = BasePairs([self.a1, self.a2])
self.failUnless(bps[0] is self.a1)
self.failUnless(bps[1] is self.a2)
# init from tuple
bps = BasePairs((self.a1, self.a2))
self.failUnless(bps[0] is self.a1)
self.failUnless(bps[1] is self.a2)
def test_str(self):
"""BasePairs __str__: should produce expected string"""
b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX')
b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\
Orientation='cis',Edges='W/W')
bps = BasePairs([b1, b2])
exp_lines = [
"===================================================================",\
"Bases: Up -- Down; Annotation: Edges -- Orient. -- Conf. -- Saenger",\
"===================================================================",\
"Bases: A 30 G -- A 36 C; Annotation: None -- None -- None -- XX;",\
"Bases: A 31 A -- A 35 U; Annotation: W/W -- cis -- None -- None;"]
self.assertEqual(str(bps), '\n'.join(exp_lines))
def test_select(self):
"""BasePairs select: should keep only pairs for which the function is True"""
def xx(bp):
if bp.Saenger == 'XX':
return True
return False
bps = BasePairs([self.a1, self.a2, self.a3, self.a4])
obs = bps.select(xx)
self.assertEqual(len(obs), 2)
self.failUnless(obs[0] is self.a1)
self.failUnless(obs[1] is self.a2)
for i in obs:
self.assertEqual(i.Saenger, 'XX')
def test_PresentChains(self):
"""BasePairs PresentChains: should work on single/multiple chain(s)"""
bps = BasePairs([self.a1, self.a2, self.a3, self.a4])
self.assertEqual(bps.PresentChains, ['A'])
bps = BasePairs([self.a1, self.ab1])
self.assertEqualItems(bps.PresentChains, ['A','B'])
bps = BasePairs([self.a1, self.ab1, self.ac1, self.bc1])
self.assertEqualItems(bps.PresentChains, ['A','B', 'C'])
bps = BasePairs([self.a1, self.ab1, self.bn1])
self.assertEqualItems(bps.PresentChains, [None, 'A','B'])
def test_cliques(self):
"""BasePairs cliques: single/multiple chains and cliques"""
#one chain, one clique
bps = BasePairs([self.a1, self.a2, self.a3, self.a4])
obs_cl = list(bps.cliques())
self.assertEqual(len(obs_cl), 1)
#3 chains, 2 cliques
bps = BasePairs([self.a1, self.a2, self.cd1])
obs_cl = list(bps.cliques())
self.assertEqual(len(obs_cl), 2)
self.assertEqual(len(obs_cl[0]), 2)
self.assertEqual(len(obs_cl[1]), 1)
self.failUnless(obs_cl[1][0] is self.cd1)
self.assertEqual(obs_cl[1].PresentChains, ['C','D'])
#5 chains, 1 clique
bps = BasePairs([self.a1, self.ab1, self.bc1, self.bn1, self.cd1])
obs_cl = list(bps.cliques())
self.assertEqual(len(obs_cl), 1)
self.assertEqual(len(obs_cl[0]), 5)
self.failUnless(obs_cl[0][0] is self.a1)
self.assertEqualItems(obs_cl[0].PresentChains, ['A','B','C','D', None])
def test_hasConflicts(self):
"""BasePairs hasConflicts: handle chains and residue IDs"""
# no conflict
b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX')
b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\
Orientation='cis',Edges='W/W')
b3 = BasePair(Base('A','15','G'), Base('A','42','C'))
bps = BasePairs([b1, b2, b3])
self.assertEqual(bps.hasConflicts(), False)
self.assertEqual(bps.hasConflicts(return_conflict=True), (False, None))
# conflict within chain
b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX')
b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\
Orientation='cis',Edges='W/W')
b3 = BasePair(Base('A','30','G'), Base('A','42','C'))
bps = BasePairs([b1, b2, b3])
self.assertEqual(bps.hasConflicts(), True)
# conflict within chain -- return conflict
b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX')
b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\
Orientation='cis',Edges='W/W')
b3 = BasePair(Base('A','30','G'), Base('A','42','C'))
bps = BasePairs([b1, b2, b3])
self.assertEqual(bps.hasConflicts(return_conflict=True),\
(True, "A 30 G"))
# no conflict, same residue ID, different chain
b1 = BasePair(Base('A','30','G'), Base('A','36','C'), Saenger='XX')
b2 = BasePair(Base('A','31','A'), Base('A','35','U'),\
Orientation='cis',Edges='W/W')
b3 = BasePair(Base('C','30','G'), Base('A','42','C'))
bps = BasePairs([b1, b2, b3])
self.assertEqual(bps.hasConflicts(), False)
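The cliques tests expect base pairs to be grouped whenever a path of shared chain IDs connects them. For example, pairs in chains A-A and C-D form two cliques, while A-B, B-C and C-D form one. Below is a sketch using union-find over chain IDs. It assumes each pair has been reduced to an `(up_chain, down_chain)` tuple. `chain_cliques` is hypothetical, and `BasePairs.cliques` may be implemented differently.

```python
def chain_cliques(pairs):
    """Group (up_chain, down_chain) tuples into chain-connected cliques.

    Cliques are returned in order of first appearance, and each clique
    keeps the input order of its pairs. None is a valid chain ID.
    """
    parent = {}

    def find(chain):
        parent.setdefault(chain, chain)
        while parent[chain] != chain:
            parent[chain] = parent[parent[chain]]  # path halving
            chain = parent[chain]
        return chain

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every base pair links the chains of its two bases.
    for up, down in pairs:
        union(up, down)

    groups, order = {}, []
    for pair in pairs:
        root = find(pair[0])
        if root not in groups:
            groups[root] = []
            order.append(root)
        groups[root].append(pair)
    return [groups[root] for root in order]
```

Working on chain IDs rather than on the pairs themselves keeps the union-find small: its size is the number of distinct chains, not the number of pairs.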
class BaseMultipletTests(TestCase):
"""Tests for BaseMultiplet object"""
def test_init(self):
"""BaseMultiplet __init__: should work as expected"""
b1 = Base('A','30','A')
b2 = Base('A','35','G')
b3 = Base('A','360','U')
bm = BaseMultiplet([b1, b2, b3])
self.failUnless(bm[0] is b1)
self.failUnless(bm[2] is b3)
#should work from tuple also
bm = BaseMultiplet((b1, b2, b3))
self.failUnless(bm[0] is b1)
self.failUnless(bm[2] is b3)
def test_str(self):
"""BaseMultiplet __str__: should give expected string"""
b1 = Base('A','30','A')
b2 = Base('A','35','G')
b3 = Base('A','360','U')
bm = BaseMultiplet([b1, b2, b3])
exp = "A 30 A -- A 35 G -- A 360 U;"
self.assertEqual(str(bm), exp)
class BaseMultipletsTests(TestCase):
"""Tests for BaseMultiplets object"""
def test_init(self):
"""BaseMultiplets __init__: from list and tuple"""
b1 = Base('A','30','A')
b2 = Base('A','35','G')
b3 = Base('A','360','U')
bm1 = BaseMultiplet([b1, b2, b3])
b4 = Base('B','12','C')
b5 = Base('B','42','U')
b6 = Base('C','2','A')
bm2 = BaseMultiplet([b4, b5, b6])
bms = BaseMultiplets([bm1, bm2])
self.failUnless(bms[0] is bm1)
self.failUnless(bms[1] is bm2)
self.assertEqual(bms[1][2].ResId, '2')
#should work from tuple also
bms = BaseMultiplets((bm1, bm2))
self.failUnless(bms[0] is bm1)
self.failUnless(bms[1] is bm2)
self.assertEqual(bms[1][2].ResId, '2')
def test_str(self):
"""BaseMultiplets __str__: should give expected string"""
b1 = Base('A','30','A')
b2 = Base('A','35','G')
b3 = Base('A','360','U')
bm1 = BaseMultiplet([b1, b2, b3])
b4 = Base('B','12','C')
b5 = Base('B','42','U')
b6 = Base('C','2','A')
bm2 = BaseMultiplet([b4, b5, b6])
bms = BaseMultiplets([bm1, bm2])
exp_lines = [\
"A 30 A -- A 35 G -- A 360 U;",\
"B 12 C -- B 42 U -- C 2 A;"]
self.assertEqual(str(bms), '\n'.join(exp_lines))
class TestPairCounts(TestCase):
"""Tests for PairCounts object.
Contains only a test for __init__. Otherwise it should function as a dict.
"""
def test_init(self):
"""PairCounts __init__: should work as dict"""
res = PairCounts(\
{'Standard':1, 'WS--cis':300, 'Bifurcated': 2, 'HS-tran':0})
self.assertEqual(res['Standard'], 1)
self.assertEqual(res['WS--cis'], 300)
self.assertEqual(res['Bifurcated'], 2)
self.assertEqual(res['HS-tran'], 0)
#==============================================================================
# SELECTION FUNCTIONS TESTS
#==============================================================================
class SelectionFunctionTests(TestCase):
def test_in_chain(self):
b1, b2 = Base('A','30','C'), Base('A','40','G')
bp1 = BasePair(b1, b2)
self.assertEqual(in_chain("A")(bp1), True)
self.assertEqual(in_chain(["A",'B'])(bp1), True)
self.assertEqual(in_chain("B")(bp1), False)
b3, b4 = Base('A','30','C'), Base('B','40','G')
bp2 = BasePair(b3,b4)
self.assertEqual(in_chain("A")(bp2), False)
self.assertEqual(in_chain(["A",'B'])(bp2), True)
self.assertEqual(in_chain("AB")(bp2), True)
self.assertEqual(in_chain("AC")(bp2), False)
b5, b6 = Base('A','30','C'), Base('C','40','G')
bp3 = BasePair(b5,b6)
self.assertEqual(in_chain("A")(bp3), False)
self.assertEqual(in_chain("C")(bp3), False)
self.assertEqual(in_chain(["A",'B'])(bp3), False)
self.assertEqual(in_chain("AC")(bp3), True)
def test_is_canonical(self):
"""is_canonical: work on annotation, not base identity"""
b1, b2 = Base('A','30','C'), Base('A','40','G')
bp = BasePair(b1, b2, Edges='+/+')
self.assertEqual(is_canonical(bp), True)
bp = BasePair(b1, b2, Edges='-/-')
self.assertEqual(is_canonical(bp), True)
bp = BasePair(b1, b2, Edges='W/W')
self.assertEqual(is_canonical(bp), False)
bp = BasePair(b1, b2, Edges='W/W', Orientation='cis',Saenger='XXVIII')
self.assertEqual(is_canonical(bp), True)
def test_is_not_canonical(self):
"""is_not_canonical: opposite of is_canonical"""
b1, b2 = Base('A','30','C'), Base('A','40','G')
bp = BasePair(b1, b2, Edges='+/+')
self.assertEqual(is_not_canonical(bp), False)
bp = BasePair(b1, b2, Edges='-/-')
self.assertEqual(is_not_canonical(bp), False)
bp = BasePair(b1, b2, Edges='W/W')
self.assertEqual(is_not_canonical(bp), True)
bp = BasePair(b1, b2, Edges='W/W', Orientation='cis',Saenger='XXVIII')
self.assertEqual(is_not_canonical(bp), False)
def test_is_stacked(self):
"""is_stacked: checks annotation, not base identity"""
b1, b2 = Base('A','30','C'), Base('A','40','A')
bp = BasePair(b1, b2, Edges='stacked')
self.assertEqual(is_stacked(bp), True)
bp = BasePair(b1, b2, Edges='H/?')
self.assertEqual(is_stacked(bp), False)
def test_is_not_stacked(self):
"""is_not_stacked: opposite of is_stacked"""
b1, b2 = Base('A','30','C'), Base('A','40','A')
bp = BasePair(b1, b2, Edges='stacked')
self.assertEqual(is_not_stacked(bp), False)
bp = BasePair(b1, b2, Edges='H/?')
self.assertEqual(is_not_stacked(bp), True)
def test_is_tertiary(self):
"""is_tertiary: checks annotation, not base identity"""
b1, b2 = Base('A','30','C'), Base('A','40','U')
bp = BasePair(b1, b2, Saenger='!1H(b_b)')
self.assertEqual(is_tertiary(bp), True)
bp = BasePair(b1, b2, Edges='H/?', Saenger='XX')
self.assertEqual(is_tertiary(bp), False)
bp = BasePair(b1,b2, Edges='stacked')
self.assertEqual(is_tertiary(bp), False)
def test_is_not_stacked_or_tertiary(self):
"""is_not_stacked_or_tertiary: checks annotation, not base identity"""
b1, b2 = Base('A','30','C'), Base('A','40','U')
bp = BasePair(b1, b2, Saenger='!1H(b_b)')
self.assertEqual(is_not_stacked_or_tertiary(bp), False)
bp = BasePair(b1, b2, Edges='stacked')
self.assertEqual(is_not_stacked_or_tertiary(bp), False)
bp = BasePair(b1, b2, Edges='W/W', Saenger='XX')
self.assertEqual(is_not_stacked_or_tertiary(bp), True)
def test_is_tertiary_base_base(self):
"""is_tertiary_base_base: checks annotation, not base identity"""
b1, b2 = Base('A','30','C'), Base('A','40','U')
bp = BasePair(b1, b2, Saenger='!1H(b_b)')
self.assertEqual(is_tertiary_base_base(bp), True)
bp = BasePair(b1, b2, Edges='H/?', Saenger='!(s_s)')
self.assertEqual(is_tertiary_base_base(bp), False)
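The test_in_chain expectations are consistent with a predicate factory that returns True only when both bases lie in the given chains. Under that reading, `"AB"` and `["A", "B"]` behave the same because membership is tested with `in`. The sketch below is hypothetical: `BaseStub` and `PairStub` are minimal stand-ins for cogent's `Base` and `BasePair`, and `in_chain` here is an illustration rather than cogent's source.

```python
from collections import namedtuple

# Minimal hypothetical stand-ins for cogent's Base and BasePair objects.
BaseStub = namedtuple("BaseStub", "ChainId ResId ResName")
PairStub = namedtuple("PairStub", "Up Down")


def in_chain(chains):
    """Return a predicate that is True when both bases lie in `chains`.

    `chains` may be a string of one-letter chain IDs (e.g. "AB") or a
    list of IDs; both support the `in` membership test used below.
    """
    def predicate(bp):
        return bp.Up.ChainId in chains and bp.Down.ChainId in chains
    return predicate
```

Because the result is a plain one-argument function, it can be passed anywhere a selection function is expected, e.g. `filter(in_chain("AB"), pairs)`.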
#==============================================================================
# RNAVIEW PARSER TESTS
#==============================================================================
class RnaviewParserTests(TestCase):
"""Tests for RnaviewParser and related code"""
def test_is_roman_numeral(self):
"""is_roman_numeral: should work for all, including comma"""
self.assertEqual(is_roman_numeral('XIII'),True)
self.assertEqual(is_roman_numeral('Xiii'),False)
self.assertEqual(is_roman_numeral('MMCDXXVIII'),True)
self.assertEqual(is_roman_numeral('XII,XIII'),True)
self.assertEqual(is_roman_numeral('n/a'),False)
def test_is_edge(self):
"""is_edge: should identify valid edges correctly"""
self.assertEqual(is_edge('H/W'),True)
self.assertEqual(is_edge('./W'),True)
self.assertEqual(is_edge('+/+'),True)
self.assertEqual(is_edge(' '),False)
self.assertEqual(is_edge('P/W'),False)
self.assertEqual(is_edge('X/W'),True)
self.assertEqual(is_edge('X/X'),True)
def test_is_orientation(self):
"""is_orientation: should return False for anything but 'cis' or 'tran'"""
self.assertEqual(is_orientation('cis'),True)
self.assertEqual(is_orientation('tran'),True)
self.assertEqual(is_orientation('tranxxx'),False)
def test_parse_annotation(self):
"""parse_annotation: should return correct tuple of 4 or raise error
"""
self.assertEqual(parse_annotation(['W/S', 'tran', 'syn', 'syn',\
'n/a']), ('W/S','tran','syn syn','n/a'))
self.assertEqual(parse_annotation(['syn','stacked']),\
('stacked', None,'syn',None))
self.assertEqual(parse_annotation(['W/W','tran','syn','XII,XIII']),\
('W/W', 'tran','syn','XII,XIII'))
self.assertEqual(parse_annotation(['./W','cis','!1H(b_b)']),\
('./W', 'cis',None,'!1H(b_b)'))
self.assertEqual(parse_annotation([]),\
(None, None, None, None))
self.assertRaises(RnaViewParseError, parse_annotation, ['X--X'])
def test_parse_filename(self):
"""parse_filename: should return name of file"""
lines = ["PDB data file name: pdb1t4l.ent_nmr.pdb"]
self.assertEqual(parse_filename(lines), 'pdb1t4l.ent_nmr.pdb')
lines = ["PDB data file name: pdb1t4l.ent_nmr.pdb","other line"]
self.assertRaises(RnaViewParseError, parse_filename, lines)
def test_parse_uncommon_residues(self):
"""parse_uncommon_residues: should raise an error on missing residue info
"""
lines = UC_LINES.split('\n')
self.assertEqual(parse_uncommon_residues(lines),\
{('D','16','TLN'):'u',('D','17','LCG'):'g',\
('0','2588','OMG'):'g',(' ','2621','PSU'):'P'})
for l in UC_LINES_WRONG.split('\n'):
self.assertRaises(RnaViewParseError, parse_uncommon_residues, [l])
def test_parse_base_pairs_basic(self):
"""parse_base_pairs: basic input"""
basic_lines =\
['25_437, 0: 34 C-G 448 0: +/+ cis XIX',\
'26_436, 0: 35 U-A 447 0: -/- cis XX']
bp1 = BasePair(Up=Base('0','34','C','25'),\
Down=Base('0','448','G','437'),\
Edges='+/+', Orientation='cis',Conformation=None,Saenger='XIX')
bp2 = BasePair(Up=Base('0','35','U','26'),\
Down=Base('0','447','A','436'),\
Edges='-/-', Orientation='cis',Conformation=None,Saenger='XX')
bps = BasePairs([bp1,bp2])
obs = parse_base_pairs(basic_lines)
for o,e in zip(obs,[bp1,bp2]):
self.assertEqual(o,e)
self.assertEqual(len(obs), 2)
basic_lines =\
['25_437, 0: 34 c-P 448 0: +/+ cis XIX',\
'26_436, 0: 35 U-X 447 0: -/- cis XX']
self.assertRaises(RnaViewParseError, parse_base_pairs, basic_lines)
basic_lines =\
['25_437, 0: 34 c-P 448 0: +/+ cis XIX',\
'26_436, 0: 35 I-A 447 0: -/- cis XX']
bp1 = BasePair(Up=Base('0','34','c','25'),\
Down=Base('0','448','P','437'),\
Edges='+/+', Orientation='cis',Conformation=None,Saenger='XIX')
bp2 = BasePair(Up=Base('0','35','I','26'),\
Down=Base('0','447','A','436'),\
Edges='-/-', Orientation='cis',Conformation=None,Saenger='XX')
bps = BasePairs([bp1,bp2])
obs = parse_base_pairs(basic_lines)
for o,e in zip(obs,[bp1,bp2]):
self.assertEqual(o,e)
self.assertEqual(len(obs), 2)
lines = ['1_2, : 6 G-G 7 : stacked',\
'1_16, : 6 G-C 35 : +/+ cis XIX']
bp1 = BasePair(Up=Base(' ','6','G','1'),\
Down=Base(' ','7','G','2'), Edges='stacked')
bp2 = BasePair(Up=Base(' ','6','G','1'),\
Down=Base(' ','35','C','16'),\
Edges='+/+', Orientation='cis',Conformation=None,Saenger='XIX')
obs = parse_base_pairs(lines)
for o,e in zip(obs,[bp1,bp2]):
self.assertEqual(o,e)
self.assertEqual(len(obs), 2)
def test_parse_base_multiplets_basic(self):
"""parse_base_multiplets: basic input"""
basic_lines =\
['235_237_254_| [20 3] 0: 246 G + 0: 248 A + 0: 265 U',\
'273_274_356_| [21 3] 0: 284 C + 0: 285 A + 0: 367 G']
bm1 = BaseMultiplet([Base('0','246','G','235'),\
Base('0','248','A','237'), Base('0','265','U','254')])
bm2 = BaseMultiplet([Base('0','284','C','273'),\
Base('0','285','A','274'), Base('0','367','G','356')])
bms = BaseMultiplets([bm1,bm2])
obs = parse_base_multiplets(basic_lines)
for o,e in zip(obs,bms):
for base_x, base_y in zip(o,e):
self.assertEqual(base_x,base_y)
self.assertEqual(len(obs), 2)
self.assertEqual(len(obs[0]), 3)
basic_lines =\
['235_237_254_| [20 3] 0: 246 G + 0: 248 A + 0: 265 I',\
'273_274_356_| [21 3] 0: 284 P + 0: 285 a + 0: 367 G']
bm1 = BaseMultiplet([Base('0','246','G','235'),\
Base('0','248','A','237'), Base('0','265','I','254')])
bm2 = BaseMultiplet([Base('0','284','P','273'),\
Base('0','285','a','274'), Base('0','367','G','356')])
bms = BaseMultiplets([bm1,bm2])
obs = parse_base_multiplets(basic_lines)
for o,e in zip(obs,bms):
for base_x, base_y in zip(o,e):
self.assertEqual(base_x,base_y)
self.assertEqual(len(obs), 2)
self.assertEqual(len(obs[0]), 3)
def test_parse_base_multiplets_errors(self):
"""parse_base_multiplets: error checking"""
# Unknown base
basic_lines =\
['235_237_254_| [20 3] 0: 246 X + 0: 248 A + 0: 265 U',\
'273_274_356_| [21 3] 0: 284 C + 0: 285 A + 0: 367 G']
self.assertRaises(RnaViewParseError, parse_base_multiplets,\
basic_lines)
# number of rnaview_seqpos doesn't match number of bases
basic_lines =\
['235_237_| [20 3] 0: 246 X + 0: 248 A + 0: 265 U',\
'273_274_356_| [21 3] 0: 284 C + 0: 285 A + 0: 367 G']
self.assertRaises(RnaViewParseError, parse_base_multiplets,\
basic_lines)
# Number of reported bases incorrect
basic_lines =\
['235_237_254_| [20 3] 0: 246 X + 0: 248 A + 0: 265 U',\
'273_274_356_| [21 5] 0: 284 C + 0: 285 A + 0: 367 G']
self.assertRaises(RnaViewParseError, parse_base_multiplets,\
basic_lines)
def test_parse_number_of_pairs(self):
"""parse_number_of_pairs: good/bad input"""
lines = ["The total base pairs = 31 (from 65 bases)"]
exp = {'NUM_PAIRS':31, 'NUM_BASES':65}
self.assertEqual(parse_number_of_pairs(lines), exp)
lines = ["The total base pairs = 31 (from 65 bases)","XXX"]
self.assertRaises(RnaViewParseError, parse_number_of_pairs, lines)
lines = ["The total base pairs = 31(from 65 bases)"]
self.assertRaises(RnaViewParseError, parse_number_of_pairs, lines)
def test_parse_pair_counts(self):
"""parse_pair_counts: should work for even number of lines"""
lines = PC_COUNTS1.split('\n')
res = parse_pair_counts(lines)
self.assertEqual(res['Standard'], 1)
self.assertEqual(res['WS--cis'], 300)
self.assertEqual(res['Bifurcated'], 2)
self.assertEqual(res['HS-tran'], 0)
lines = PC_COUNTS2.split('\n')
res = parse_pair_counts(lines)
self.assertEqual(res['Standard'], 19)
self.assertEqual(res['WW-tran'], 1)
self.assertEqual(res['HS-tran'], 0)
self.failIf('Bifurcated' in res)
self.assertRaises(RnaViewParseError, parse_pair_counts,\
PC_COUNTS2.split('\n')[:-1])
self.assertEqual(parse_pair_counts([]),{})
def test_verify_bp_counts(self):
"""verify_bp_counts: should raise an error if bp counts are wrong"""
lines = RNAVIEW_PDB_REAL.split('\n')
obs = RnaviewParser(lines)
# this shouldn't raise an error
verify_bp_counts(obs['BP'],11,obs['PC'])
# reported number isn't right
self.assertRaises(RnaViewParseError,\
verify_bp_counts, obs['BP'], 12, obs['PC'])
# No longer checks the base pair counts reported in the
# dictionary, because this number doesn't match the total when
# modified bases are present.
## PREVIOUS TEST:
# pair_counts isn't right
#obs['PC']['Standard'] = 14
#self.assertRaises(RnaViewParseError,\
# verify_bp_counts, obs['BP'], 11, obs['PC'])
## NEW TEST
obs['PC']['Standard'] = 14
verify_bp_counts(obs['BP'],11,obs['PC'])
def test_MinimalRnaviewParser(self):
"""MinimalRnaviewParser: should divide lines into right classes"""
exp = {'FN': ['PDB data file name: 1EHZ.pdb'], 'UC':\
['uncommon residue I 1 on chain A [#1] assigned to: I',
'uncommon residue 2MG 10 on chain A [#10] assigned to: g'],
'BP':['1_72, A: 1 I-C 72 A: X/X cis n/a',
'58_60, A: 58 a-C 60 A: S/S tran syn !(s_s)'],
'BM':['9_12_23_| [1 3] A: 9 A + A: 12 U + A: 23 A',
'13_22_46_| [2 3] A: 13 C + A: 22 G + A: 46 g'],
'PC':['Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran',
'19 3 1 0 1 0 0',
'WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran',
'0 3 0 2 0 0'],
'NP':['The total base pairs = 30 (from 76 bases)']}
obs = MinimalRnaviewParser(RNAVIEW_LINES.split('\n'))
self.assertEqual(len(obs), len(exp))
self.assertEqual(obs, exp)
def test_MinimalRnaviewParser_short(self):
"""MinimalRnaviewParser: should leave lists empty if no lines found"""
lines = RNAVIEW_LINES_SHORT.split('\n')
res = MinimalRnaviewParser(lines)
self.assertEqual(len(res['FN']), 1)
self.assertEqual(res['UC'], [])
self.assertEqual(res['BM'], [])
self.assertEqual(len(res['PC']), 4)
self.assertEqual(len(res['BP']), 11)
self.assertEqual(len(res['NP']), 1)
def test_RnaviewParser(self):
"""RnaviewParser: should parse a complete RNAView output file
"""
rnaview_lines = RNAVIEW_PDB_REAL.split('\n')
obs = RnaviewParser(rnaview_lines)
self.assertEqual(obs['FN'], 'pdb430d.ent')
self.assertEqual(len(obs['UC']),1)
self.assertEqual(len(obs['BP']),19)
self.assertEqual(len(obs['BM']),0)
self.assertEqual(obs['BM'],BaseMultiplets())
self.assertEqual(obs['PC']['Standard'],7)
self.assertEqual(obs['BP'][2].Down.ResName,'c')
self.assertEqual(obs['BP'][6].Edges,'stacked')
self.assertEqual(obs['NP']['NUM_PAIRS'], 11)
self.assertEqual(obs['NP']['NUM_BASES'], 29)
def test_RnaviewParser_error(self):
"""RnaviewParser: should raise on malformed input only when strict"""
lines = RNAVIEW_LINES_ERROR.split('\n')
self.assertRaises(RnaViewParseError, RnaviewParser, lines, strict=True)
obs = RnaviewParser(lines, strict=False)
self.assertEqual(obs['NP'], None)
self.assertEqual(obs['BP'][1].Up.ResId, '2')
self.assertEqual(obs['PC']['Standard'], 6)
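test_is_roman_numeral expects uppercase numerals, including comma-separated lists such as 'XII,XIII', to be accepted, and mixed case or 'n/a' to be rejected. The sketch below is a permissive character-class check consistent with those cases. `looks_like_roman_numeral` is a hypothetical name, and it does not validate numeral ordering, which cogent's `is_roman_numeral` may or may not do.

```python
ROMAN_DIGITS = set("IVXLCDM")


def looks_like_roman_numeral(text):
    """Return True if every comma-separated part uses only Roman digits.

    Only the character set is checked: uppercase I, V, X, L, C, D, M in
    any order. Empty parts (e.g. '' or 'XII,') are rejected.
    """
    return all(part and set(part) <= ROMAN_DIGITS for part in text.split(","))
```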
RNAVIEW_LINES=\
"""PDB data file name: 1EHZ.pdb
uncommon residue I 1 on chain A [#1] assigned to: I
uncommon residue 2MG 10 on chain A [#10] assigned to: g
BEGIN_base-pair
1_72, A: 1 I-C 72 A: X/X cis n/a
58_60, A: 58 a-C 60 A: S/S tran syn !(s_s)
END_base-pair
Summary of triplets and higher multiplets
BEGIN_multiplets
9_12_23_| [1 3] A: 9 A + A: 12 U + A: 23 A
13_22_46_| [2 3] A: 13 C + A: 22 G + A: 46 g
END_multiplets
The total base pairs = 30 (from 76 bases)
------------------------------------------------
Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
19 3 1 0 1 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
0 3 0 2 0 0
------------------------------------------------
"""
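The summary line "The total base pairs = 30 (from 76 bases)" in the data above is what test_parse_number_of_pairs exercises. It expects exactly one line, and a missing space before "(from" is an error. The following is a regex-based sketch of that contract. The exact spacing RNAView emits is an assumption, and `parse_total_pairs` and `ParseError` are hypothetical stand-ins for `parse_number_of_pairs` and `RnaViewParseError`.

```python
import re

# Hypothetical pattern; RNAView's exact spacing is assumed here.
NUM_PAIRS_RE = re.compile(
    r"^The total base pairs =\s*(\d+) \(from (\d+) bases\)\s*$")


class ParseError(Exception):
    """Raised when the summary block does not have the expected shape."""


def parse_total_pairs(lines):
    """Parse the single 'The total base pairs = N (from M bases)' line."""
    if len(lines) != 1:
        raise ParseError("expected exactly one line, got %d" % len(lines))
    match = NUM_PAIRS_RE.match(lines[0])
    if match is None:
        raise ParseError("unrecognized line: %r" % lines[0])
    return {"NUM_PAIRS": int(match.group(1)),
            "NUM_BASES": int(match.group(2))}
```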
RNAVIEW_LINES_SHORT=\
"""PDB data file name: pdb17ra.ent_nmr.pdb
BEGIN_base-pair
1_21, : 1 G-C 21 : +/+ cis XIX
2_20, : 2 G-C 20 : +/+ cis XIX
3_19, : 3 C-G 19 : +/+ cis XIX
4_18, : 4 G-U 18 : W/W cis XXVIII
5_6, : 5 U-A 6 : stacked
5_17, : 5 U-A 17 : -/- cis XX
7_16, : 7 A-U 16 : W/W cis n/a
8_15, : 8 G-C 15 : +/+ cis XIX
9_14, : 9 G-C 14 : +/+ cis XIX
10_14, : 10 A-C 14 : stacked
11_13, : 11 U-A 13 : S/W tran n/a
END_base-pair
The total base pairs = 9 (from 21 bases)
------------------------------------------------
Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
6 2 0 0 0 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
0 0 0 1 0 0
------------------------------------------------"""
PC_COUNTS1=\
""" Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
1 0 0 0 0 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
12 0 300 0 0 0
Single-bond Bifurcated
0 2"""
PC_COUNTS2=\
""" Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
19 3 1 0 1 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
0 3 0 2 0 0"""
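PC_COUNTS1 and PC_COUNTS2 alternate between a line of category names and a line of integer counts. test_parse_pair_counts expects an odd number of lines to be rejected and an empty input to give an empty dict. The sketch below pairs the lines up that way, assuming whitespace-separated columns. `parse_pair_count_lines` is hypothetical and raises `ValueError` where cogent raises `RnaViewParseError`.

```python
def parse_pair_count_lines(lines):
    """Map each pair-type category to its count.

    Lines alternate: a header of category names, then a line of integer
    counts in the same column order.
    """
    if len(lines) % 2:
        raise ValueError("expected an even number of lines, got %d" % len(lines))
    counts = {}
    # lines[0::2] are headers, lines[1::2] the matching value lines.
    for header, values in zip(lines[0::2], lines[1::2]):
        names, numbers = header.split(), values.split()
        if len(names) != len(numbers):
            raise ValueError("header/value length mismatch: %r" % header)
        counts.update(zip(names, map(int, numbers)))
    return counts
```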
UC_LINES=\
"""uncommon residue TLN 16 on chain D [#16] assigned to: u
uncommon residue LCG 17 on chain D [#17] assigned to: g
uncommon residue OMG 2588 on chain 0 [#2430] assigned to: g
uncommon residue PSU 2621 on chain [#2463] assigned to: P"""
UC_LINES_WRONG=\
"""uncommon residue 16 on chain D [#16] assigned to: u
uncommon residue LCG on chain D [#17] assigned to: g
uncommon residue OMG 2588 on chain 0 [#2430] assigned to: """
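The UC_LINES and UC_LINES_WRONG data above show what test_parse_uncommon_residues expects:
- each complete line maps `(chain, residue ID, residue name)` to the base RNAView assigned;
- a blank chain becomes `' '`;
- any missing field is an error.

The sketch below implements that contract with a regular expression. The pattern is an assumption about the line layout, and `parse_uncommon_lines` is hypothetical. It raises `ValueError` rather than `RnaViewParseError`.

```python
import re

# Hypothetical pattern for lines such as
# "uncommon residue TLN 16 on chain D [#16] assigned to: u".
UNCOMMON_RE = re.compile(
    r"^uncommon residue (\S+)\s+(\S+)\s+on chain\s?(\S?)\s*\[#\d+\]"
    r"\s+assigned to:\s+(\S+)\s*$")


def parse_uncommon_lines(lines):
    """Map (chain, res_id, res_name) to the base RNAView assigned it.

    A blank chain ID is stored as a single space. Raises ValueError if a
    line lacks the residue name, residue ID, or assigned base.
    """
    result = {}
    for line in lines:
        match = UNCOMMON_RE.match(line)
        if match is None:
            raise ValueError("cannot parse: %r" % line)
        name, res_id, chain, assigned = match.groups()
        result[(chain or " ", res_id, name)] = assigned
    return result
```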
RNAVIEW_LINES_TOTAL=\
"""PDB data file name: 1EHZ.pdb
uncommon residue PSU 1 on chain A [#1] assigned to: P
uncommon residue 2MG 10 on chain A [#10] assigned to: g
BEGIN_base-pair
1_72, A: 1 P-C 20 A: +/+ cis n/a
1_72, A: 2 A-U 19 A: H/W cis n/a
58_60, A: 10 a-U 60 A: S/S tran syn !(s_s)
END_base-pair
Summary of triplets and higher multiplets
BEGIN_multiplets
9_12_23_| [1 3] A: 1 P + A: 2 U + A: 60 U
13_22_46_| [2 3] A: 20 C + A: 19 U + A: 60 U
END_multiplets
The total base pairs = 30 (from 76 bases)
------------------------------------------------
Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
19 3 1 0 1 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
0 3 0 2 0 0
------------------------------------------------
"""
RNAVIEW_PDB_REAL=\
"""PDB data file name: pdb430d.ent
uncommon residue +C 27 on chain A [#27] assigned to: c
BEGIN_base-pair
1_29, A: 1 G-C 29 A: +/+ cis XIX
2_28, A: 2 G-C 28 A: +/+ cis XIX
3_27, A: 3 G-c 27 A: +/+ cis XIX
4_26, A: 4 U-A 26 A: -/- cis XX
5_25, A: 5 G-C 25 A: +/+ cis XIX
6_24, A: 6 C-G 24 A: +/+ cis XIX
7_8, A: 7 U-C 8 A: stacked
8_9, A: 8 C-A 9 A: stacked
9_21, A: 9 A-A 21 A: H/H tran II
11_20, A: 11 U-A 20 A: W/H tran XXIV
12_19, A: 12 A-G 19 A: H/S tran XI
13_18, A: 13 C-G 18 A: +/+ cis XIX
14_17, A: 14 G-A 17 A: S/H tran XI
25_26, A: 25 C-A 26 A: stacked
7_23, A: 7 U-C 23 A: W/W tran !1H(b_b)
8_23, A: 8 C-C 23 A: S/H cis !1H(b_b).
10_11, A: 10 G-U 11 A: S/H cis !1H(b_b)
11_21, A: 11 U-A 21 A: W/H tran !1H(b_b)
10_20, A: 10 G-A 20 A: W/. tran !(s_s)
END_base-pair
The total base pairs = 11 (from 29 bases)
------------------------------------------------
Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
7 0 0 0 1 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
0 1 0 0 0 2
------------------------------------------------"""
RNAVIEW_LINES_ERROR=\
"""PDB data file name: pdb17ra.ent_nmr.pdb
BEGIN_base-pair
1_21, : 1 G-C 21 : +/+ cis XIX
2_20, : 2 G-C 20 : +/+ cis XIX
3_19, : 3 C-G 19 : +/+ cis XIX
4_18, : 4 G-U 18 : W/W cis XXVIII
5_6, : 5 U-A 6 : stacked
5_17, : 5 U-A 17 : -/- cis XX
7_16, : 7 A-U 16 : W/W cis n/a
8_15, : 8 G-C 15 : +/+ cis XIX
9_14, : 9 G-C 14 : +/+ cis XIX
10_14, : 10 A-C 14 : stacked
11_13, : 11 U-A 13 : S/W tran n/a
END_base-pair
The total base pairs = 9(from 21 bases)
------------------------------------------------
Standard WW--cis WW-tran HH--cis HH-tran SS--cis SS-tran
6 2 0 0 0 0 0
WH--cis WH-tran WS--cis WS-tran HS--cis HS-tran
0 0 0 1 0 0
------------------------------------------------"""
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
#file tests/test_parse/test_sprinzl.py
"""Unit tests for the Sprinzl tRNA database parser.
"""
from string import strip
from cogent.parse.sprinzl import OneLineSprinzlParser, GenomicSprinzlParser,\
    _fix_sequence, get_pieces, get_counts, sprinzl_to_vienna
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Jeremy Widmann", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
sample_file = """Accession@@@AA@@@Anticodon@@@Species@@@Strain@@@0@@@1@@@2@@@3@@@4@@@5@@@6@@@7@@@8@@@9@@@10@@@11@@@12@@@13@@@14@@@15@@@16@@@17@@@17A@@@18@@@19@@@20@@@20A@@@20B@@@21@@@22@@@23@@@24@@@25@@@26@@@27@@@28@@@29@@@30@@@31@@@32@@@33@@@34@@@35@@@36@@@37@@@38@@@39@@@40@@@41@@@42@@@43@@@44@@@45@@@e11@@@e12@@@e13@@@e14@@@e15@@@e16@@@e17@@@e1@@@e2@@@e3@@@e4@@@e5@@@e27@@@e26@@@e25@@@e24@@@e23@@@e22@@@e21@@@46@@@47@@@48@@@49@@@5.@@@51@@@52@@@53@@@54@@@55@@@56@@@57@@@58@@@59@@@60@@@61@@@62@@@63@@@64@@@65@@@66@@@67@@@68@@@69@@@7.@@@71@@@72@@@73@@@74@@@75@@@76
GA0000001@@@Ala@@@TGC@@@Haemophilus influenzae@@@Rd KW20@@@-@@@G@@@G@@@G@@@G@@@C@@@C@@@T@@@T@@@A@@@G@@@C@@@T@@@C@@@A@@@G@@@C@@@T@@@-@@@G@@@G@@@G@@@-@@@-@@@A@@@G@@@A@@@G@@@C@@@G@@@C@@@C@@@T@@@G@@@C@@@T@@@T@@@T@@@G@@@C@@@A@@@C@@@G@@@C@@@A@@@G@@@G@@@A@@@G@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@G@@@T@@@C@@@A@@@G@@@C@@@G@@@G@@@T@@@T@@@C@@@G@@@A@@@T@@@C@@@C@@@C@@@G@@@C@@@T@@@A@@@G@@@G@@@C@@@T@@@C@@@C@@@A@@@-@@@-@@@-
GA0000002@@@Ala@@@GGC@@@Chlamydia pneumoniae @@@AR39@@@-@@@G@@@G@@@G@@@G@@@T@@@A@@@T@@@T@@@A@@@G@@@C@@@T@@@C@@@A@@@G@@@T@@@T@@@-@@@G@@@G@@@T@@@-@@@-@@@A@@@G@@@A@@@G@@@C@@@G@@@C@@@A@@@A@@@C@@@A@@@A@@@T@@@G@@@G@@@C@@@A@@@T@@@T@@@G@@@T@@@T@@@G@@@A@@@G@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@G@@@T@@@C@@@A@@@G@@@C@@@G@@@G@@@T@@@T@@@C@@@G@@@A@@@C@@@C@@@C@@@C@@@G@@@C@@@T@@@A@@@T@@@G@@@C@@@T@@@C@@@C@@@-@@@-@@@-@@@-
GA0000003@@@Ala@@@TGC@@@Chlamydia pneumoniae @@@AR39@@@-@@@G@@@G@@@G@@@G@@@A@@@C@@@T@@@T@@@A@@@G@@@C@@@T@@@T@@@A@@@G@@@T@@@T@@@-@@@G@@@G@@@T@@@-@@@-@@@A@@@G@@@A@@@G@@@C@@@G@@@T@@@C@@@T@@@G@@@A@@@T@@@T@@@T@@@G@@@C@@@A@@@T@@@T@@@C@@@A@@@G@@@A@@@A@@@G@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@-@@@G@@@T@@@C@@@A@@@G@@@G@@@A@@@G@@@T@@@T@@@C@@@G@@@A@@@A@@@T@@@C@@@T@@@C@@@C@@@T@@@A@@@G@@@T@@@C@@@T@@@C@@@C@@@-@@@-@@@-@@@-"""
sample_lines = ['\t'.join(i.split('@@@')) for i in sample_file.split('\n')]
class OneLineSprinzlParserTests(TestCase):
    """Tests of OneLineSprinzlParser"""

    def setUp(self):
        """standard tRNA file"""
        self.tRNAs = sample_lines  #open('data_sprinzl.txt').read().split('\n')

    def test_minimal(self):
        """OneLineSprinzlParser should work on a minimal 'file'"""
        small = ['acc\taa\tac\tsp\tst\ta\tb\tc','q\tw\te\tr\tt\tA\tC\tG']
        p = OneLineSprinzlParser(small)
        result = list(p)
        self.assertEqual(len(result), 1)
        self.assertEqual(result[0], 'ACG')
        self.assertEqual(result[0].Info.Accession, 'q')
        self.assertEqual(result[0].Info.AA, 'w')
        self.assertEqual(result[0].Info.Anticodon, 'e')
        self.assertEqual(result[0].Info.Species, 'r')
        self.assertEqual(result[0].Info.Strain, 't')

    def test_init(self):
        """OneLineSprinzlParser should read small file correctly"""
        p = OneLineSprinzlParser(self.tRNAs)
        recs = list(p)
        self.assertEqual(len(recs), 3)
        first, second, third = recs
        assert first.Info.Labels is second.Info.Labels
        assert first.Info.Labels is third.Info.Labels
        expected_label_list = "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 17A 18 19 20 20A 20B 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 e11 e12 e13 e14 e15 e16 e17 e1 e2 e3 e4 e5 e27 e26 e25 e24 e23 e22 e21 46 47 48 49 5. 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 7. 71 72 73 74 75 76".split()
        exp_labels = {}
        for i, label in enumerate(expected_label_list):
            exp_labels[label] = i
        self.assertEqual(first.Info.Labels, exp_labels)
        self.assertEqual(first.Info.Accession, 'GA0000001')
        self.assertEqual(first.Info.AA, 'Ala')
        self.assertEqual(first.Info.Anticodon, 'TGC')
        self.assertEqual(first.Info.Species, 'Haemophilus influenzae')
        self.assertEqual(first.Info.Strain, 'Rd KW20')
        self.assertEqual(first, '-GGGGCCTTAGCTCAGCT-GGG--AGAGCGCCTGCTTTGCACGCAGGAG-------------------GTCAGCGGTTCGATCCCGCTAGGCTCCA---'.replace('T','U'))
        self.assertEqual(third.Info.Accession, 'GA0000003')
        self.assertEqual(third.Info.AA, 'Ala')
        self.assertEqual(third.Info.Anticodon, 'TGC')
        self.assertEqual(third.Info.Species, 'Chlamydia pneumoniae')
        self.assertEqual(third.Info.Strain, 'AR39')
        self.assertEqual(third, '-GGGGACTTAGCTTAGTT-GGT--AGAGCGTCTGATTTGCATTCAGAAG-------------------GTCAGGAGTTCGAATCTCCTAGTCTCC----'.replace('T','U'))
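
# The tests above imply a simple one-line layout. The first five
# tab-separated columns are Accession, AA, Anticodon, Species and Strain.
# Every later column holds one aligned position, and T is reported as U.
# This is a minimal sketch under that assumption. _sketch_one_line_records is
# a hypothetical helper for illustration only. It is not
# cogent.parse.sprinzl.OneLineSprinzlParser, and it yields plain
# (sequence, info_dict) tuples rather than annotated sequence objects.
def _sketch_one_line_records(lines):
    """Yield (sequence, info) pairs from tab-delimited one-line records."""
    header = lines[0].split('\t')
    # one label -> column-index dict shared by every record (cf. Info.Labels)
    labels = dict((label, i) for i, label in enumerate(header[5:]))
    fields = ('Accession', 'AA', 'Anticodon', 'Species', 'Strain')
    for line in lines[1:]:
        if not line.strip():
            continue
        cols = line.split('\t')
        # strip() removes trailing spaces such as in 'Chlamydia pneumoniae '
        info = dict(zip(fields, [c.strip() for c in cols[:5]]))
        info['Labels'] = labels
        yield ''.join(cols[5:]).replace('T', 'U'), info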
genomic_sample = """3\t5950\tsequences\t\t0\t1\t2\t3\t4\t5\t6\t7\t8\t9\t10\t11\t12\t13\t14\t15\t16\t17\t17A\t18\t19\t20\t20A\t20B\t21\t22\t23\t24\t25\t26\t27\t28\t29\t30\t31\t32\t33\t34\t35\t36\t37\t38\t39\t40\t41\t42\t43\t44\t45\te11\te12\te13\te14\te15\te16\te17\te1\te2\te3\te4\te5\te27\te26\te25\te24\te23\te22\te21\t46\t47\t48\t49\t50\t51\t52\t53\t54\t55\t56\t57\t58\t59\t60\t61\t62\t63\t64\t65\t66\t67\t68\t69\t70\t71\t72\t73\t74\t75\t76\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
2\tGA0000001\tAla\t\tTGC\t\tHaemophilus influenzae\t\t\t\t\t\t\t\t\t\tRd KW20\t\t\t\tBacteria; Proteobacteria; gamma subdivision; Pasteurellaceae; Haemophilus\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t-\tG\tG\tG\tG\tC\tC\tT\tT\tA\tG\tC\tT\tC\tA\tG\tC\tT\t-\tG\tG\tG\t-\t-\tA\tG\tA\tG\tC\tG\tC\tC\tT\tG\tC\tT\tT\tT\tG\tC\tA\tC\tG\tC\tA\tG\tG\tA\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\tG\tT\tC\tA\tG\tC\tG\tG\tT\tT\tC\tG\tA\tT\tC\tC\tC\tG\tC\tT\tA\tG\tG\tC\tT\tC\tC\tA\t-\t-\t-\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t=\t=\t*\t=\t=\t=\t=\t\t\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t=\t=\t=\t=\t*\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
3\tGA0000002\tAla\t\tGGC\t\tChlamydia pneumoniae \t\t\t\t\t\t\t\t\t\tAR39\t\t\t\tBacteria; Chlamydiales; Chlamydiaceae; Chlamydophila; Chlamydophila pneumoniae\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t-\tG\tG\tG\tG\tT\tA\tT\tT\tA\tG\tC\tT\tC\tA\tG\tT\tT\t-\tG\tG\tT\t-\t-\tA\tG\tA\tG\tC\tG\tC\tA\tA\tC\tA\tA\tT\tG\tG\tC\tA\tT\tT\tG\tT\tT\tG\tA\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\tG\tT\tC\tA\tG\tC\tG\tG\tT\tT\tC\tG\tA\tC\tC\tC\tC\tG\tC\tT\tA\tT\tG\tC\tT\tC\tC\t-\t-\t-\t-\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t=\t=\t*\t=\t*\t=\t=\t\t\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t=\t=\t*\t=\t*\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
4\tGA0000003\tAla\t\tTGC\t\tChlamydia pneumoniae \t\t\t\t\t\t\t\t\t\tAR39\t\t\t\tBacteria; Chlamydiales; Chlamydiaceae; Chlamydophila; Chlamydophila pneumoniae\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t-\tG\tG\tG\tG\tA\tC\tT\tT\tA\tG\tC\tT\tT\tA\tG\tT\tT\t-\tG\tG\tT\t-\t-\tA\tG\tA\tG\tC\tG\tT\tC\tT\tG\tA\tT\tT\tT\tG\tC\tA\tT\tT\tC\tA\tG\tA\tA\tG\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\t-\tG\tT\tC\tA\tG\tG\tA\tG\tT\tT\tC\tG\tA\tA\tT\tC\tT\tC\tC\tT\tA\tG\tT\tC\tT\tC\tC\t-\t-\t-\t-\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t
\t\t\t\t\t=\t=\t*\t=\t=\t=\t=\t\t\t=\t=\t=\t*\t\t\t\t\t\t\t\t\t\t\t\t*\t=\t=\t=\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t\t\t\t\t\t\t\t=\t=\t=\t=\t=\t=\t=\t=\t=\t*\t=\t=\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t""".split('\n')
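
# In genomic_sample above, each record spans three lines: an info line, a
# sequence line and a pairing line. In the last two, the aligned positions
# start at tab-separated column 4. This is a minimal sketch, assuming that
# layout, of turning a pairing line into a string like the Pairing value
# checked in test_single. Empty cells become '.', '=' and '*' are kept, and
# padding cells beyond the alignment length are dropped.
# _sketch_pairing_string is hypothetical and not part of cogent.parse.sprinzl.
def _sketch_pairing_string(pairing_line, num_positions, offset=4):
    """Return the pairing annotation for num_positions aligned columns."""
    cells = pairing_line.split('\t')[offset:offset + num_positions]
    return ''.join(cell or '.' for cell in cells)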
class GenomicSprinzlParserTests(TestCase):
    """Tests of the GenomicSprinzlParser class."""

    def test_single(self):
        """GenomicSprinzlParser should work with single sequence"""
        seqs = list(GenomicSprinzlParser(genomic_sample[0:4]))
        self.assertEqual(len(seqs), 1)
        s = seqs[0]
        self.assertEqual(s, '-GGGGCCTTAGCTCAGCT-GGG--AGAGCGCCTGCTTTGCACGCAGGAG-------------------GTCAGCGGTTCGATCCCGCTAGGCTCCA---'.replace('T','U'))
        self.assertEqual(s.Info.Accession, 'GA0000001')
        self.assertEqual(s.Info.AA, 'Ala')
        self.assertEqual(s.Info.Anticodon, 'UGC')
        self.assertEqual(s.Info.Species, 'Haemophilus influenzae')
        self.assertEqual(s.Info.Strain, 'Rd KW20')
        self.assertEqual(s.Info.Taxonomy, ['Bacteria', 'Proteobacteria',
            'gamma subdivision', 'Pasteurellaceae', 'Haemophilus'])
        self.assertEqual(s.Pairing, '.==*====..====...........====.=====.......=====........................=====.......=========*==....')

    def test_multi(self):
        """GenomicSprinzlParser should work with multiple sequences"""
        seqs = list(GenomicSprinzlParser(genomic_sample))
        self.assertEqual(len(seqs), 3)
        self.assertEqual([s.Info.Accession for s in seqs],
            ['GA0000001', 'GA0000002', 'GA0000003'])
        self.assertEqual(seqs[2].Info.Anticodon, 'UGC')
        self.assertEqual(seqs[0].Info.Order, seqs[2].Info.Order)
class FixSequenceTests(TestCase):
    """Tests that _fix_sequence functions properly."""

    def test_fix_sequence(self):
        """_fix_sequence should properly replace terminal gaps with CCA"""
        seqs = ['','ACGUUCC-','ACGUUC--','ACGUU---','ACGU----']
        results = ['','ACGUUCCA','ACGUUCCA','ACGUUCCA','ACGU-CCA']
        for s, r in zip(seqs, results):
            self.assertEqual(_fix_sequence(s), r)
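
# The cases in test_fix_sequence are consistent with a simple rule: when a
# sequence ends in a gap, its last three positions are overwritten with the
# CCA tail. This is a minimal sketch under that assumption.
# _sketch_fix_sequence is hypothetical, and the real _fix_sequence may handle
# other inputs differently. This sketch leaves sequences shorter than the
# tail unchanged.
def _sketch_fix_sequence(seq, tail='CCA'):
    """Replace a gapped 3' end with the CCA tail."""
    if len(seq) >= len(tail) and seq.endswith('-'):
        return seq[:-len(tail)] + tail
    return seq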
class SprinzlToViennaTests(TestCase):
    """Tests of sprinzl_to_vienna and its helper functions."""

    def setUp(self):
        """setUp function for SprinzlToViennaTests"""
        self.structures = map(strip, STRUCTURES.split('\n'))
        self.vienna_structs = map(strip, VIENNA.split('\n'))
        self.short_struct = '...===...===.'
        # structure too long
        self.incorrect1 = ''.join(['..=====*..*==.............==*...=.=...',
            '....=.=..........................=====.......=====*=====......'])
        # two halves don't match
        self.incorrect2 = ''.join(['..=====*..*===............==*...=.=...',
            '....=.=..........................=====.......=====*=====.....'])

    def test_get_pieces(self):
        """get_pieces: should return the correct pieces"""
        splits = [0,3,7,-1,13]
        self.assertEqual(get_pieces(self.short_struct, splits),
            ['...','===.','..===','.'])
        # will include empty strings for indices outside of the structure
        self.assertEqual(get_pieces(self.short_struct, [2,10,20,30]),
            ['.===...=','==.',''])
        # will return empty list if no break-positions are given
        self.assertEqual(get_pieces(self.short_struct, []), [])

    def test_get_counts(self):
        """get_counts: should return list of lengths of paired regions"""
        self.assertEqual(get_counts('.===.=..'), [3,1])
        self.assertEqual(get_counts('...====..'), [4])
        self.assertEqual(get_counts('...'), [])

    def test_sprinzl_to_vienna(self):
        """sprinzl_to_vienna: should give expected output"""
        # Correctly formed structures should convert to the expected output
        for sprinzl, vienna in zip(self.structures, self.vienna_structs):
            self.assertEqual(sprinzl_to_vienna(sprinzl), vienna)
        # Check two obvious errors
        self.assertRaises(AssertionError, sprinzl_to_vienna, self.incorrect1)
        self.assertRaises(AssertionError, sprinzl_to_vienna, self.incorrect2)
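
# These are sketches of the two helpers exercised in test_get_pieces and
# test_get_counts. They assume only the behaviour those tests check. Both
# names are hypothetical stand-ins, not the cogent implementations.
# get_pieces is modelled as plain slicing between consecutive break
# positions. get_counts is modelled as the run lengths of non-'.' characters,
# assuming '=' and '*' both mark paired positions.
from itertools import groupby

def _sketch_get_pieces(struct, splits):
    """Slice struct between consecutive break positions.

    Negative or out-of-range positions follow normal Python slice rules,
    so positions past the end yield empty strings.
    """
    return [struct[start:end] for start, end in zip(splits, splits[1:])]

def _sketch_get_counts(struct):
    """Return the lengths of runs of non-'.' characters (paired regions)."""
    return [len(list(run)) for is_paired, run in
            groupby(struct, lambda c: c != '.') if is_paired]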
STRUCTURES="""\
.===*===..====...........====.=====.......=====........................=====.......========*===....
.=======..=*=.............=*=.=====.......=====........................=====.......============....
.=======..====...........====.*====.......====*........................=.===.......===.========....
.=====.=..====...........====..===.........===.........................**===.......===**=.=====....
....====..====...........====.*====.......====*........................=====.......=========.......
..=====*..*==.............==*...=.=.......=.=..........................=====.......=====*=====.....
.=.=.==*..*.=*...........*=.*.*=.==.......==.=*........................=====.......=====*==.=.=....
.====*.=..**.=...........=.**.*====.......====*........................=.*.=.......=.*.==.*====....
.====*.=..**.=............=**.*====.......====*........................=.*.=.......=.*.==.*====...."""
VIENNA="""\
.(((((((..((((...........)))).(((((.......)))))........................(((((.......))))))))))))....
.(((((((..(((.............))).(((((.......)))))........................(((((.......))))))))))))....
.(((((((..((((...........)))).(((((.......)))))........................(.(((.......))).))))))))....
.(((((.(..((((...........))))..(((.........))).........................(((((.......)))))).)))))....
....((((..((((...........)))).(((((.......)))))........................(((((.......))))))))).......
..((((((..(((.............)))...(.(.......).)..........................(((((.......))))))))))).....
.(.(.(((..(.((...........)).).((.((.......)).))........................(((((.......)))))))).).)....
.(((((.(..((.(...........).)).(((((.......)))))........................(.(.(.......).).)).)))))....
.(((((.(..((.(............))).(((((.......)))))........................(.(.(.......).).)).)))))...."""
if __name__ == '__main__':
    main()
#!/usr/bin/env python
#file tests/test_parse/test_stockholm.py
"""
Provides tests for StockholmParser and related classes and functions.
"""
from cogent.parse.stockholm import is_gf_line, is_gc_line, is_gs_line, \
    is_gr_line, is_seq_line, is_structure_line, GfToInfo, GcToInfo, GsToInfo, \
    GrToInfo, MinimalStockholmParser, StockholmFinder, \
    StockholmParser, Sequence, is_empty_or_html
from cogent.util.unit_test import TestCase, main
from cogent.parse.record import RecordError
from cogent.core.info import Info
from cogent.struct.rna2d import WussStructure
from cogent.core.alignment import Alignment
from cogent.core.moltype import BYTES
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Development"
Sequence = BYTES.Sequence
class StockholmParserTests(TestCase):
    """ Tests components of the stockholm parser, in the stockholm.py file """

    def setUp(self):
        """ Construct some fake data for testing purposes """
        self._fake_headers = \
            [line.strip() for line in fake_headers.split('\n')]
        self._fake_gc_annotation = \
            [line.strip() for line in fake_gc_annotation.split('\n')]
        self._fake_gs_annotation = \
            [line.strip() for line in fake_gs_annotation.split('\n')]
        self._fake_gr_annotation = \
            [line.strip() for line in fake_gr_annotation.split('\n')]
        self._fake_record_no_headers = fake_record_no_headers.split('\n')
        self._fake_record_no_sequences = fake_record_no_sequences.split('\n')
        self._fake_record_no_structure = fake_record_no_structure.split('\n')
        self._fake_two_records = fake_two_records.split('\n')
        self._fake_record = fake_record.split('\n')
        self._fake_record_bad_header_1 = fake_record_bad_header_1.split('\n')
        self._fake_record_bad_header_2 = fake_record_bad_header_2.split('\n')
        self._fake_record_bad_sequence_1 = \
            fake_record_bad_sequence_1.split('\n')
        self._fake_record_bad_structure_1 = \
            fake_record_bad_structure_1.split('\n')
        self._fake_record_bad_structure_2 = \
            fake_record_bad_structure_2.split('\n')
        self.single_family = single_family.split('\n')

    def test_is_empty_or_html(self):
        """is_empty_or_html: should identify empty and HTML lines"""
        line = ' '
        self.assertEqual(is_empty_or_html(line), True)
        line = '\n\n'
        self.assertEqual(is_empty_or_html(line), True)
        line = ''
        self.assertEqual(is_empty_or_html(line), True)
        line = ' \n\n'
        self.assertEqual(is_empty_or_html(line), True)
        line = '\t/\n'
        self.assertEqual(is_empty_or_html(line), False)

    def test_is_gf_line(self):
        """is_gf_line: functions correctly w/ various lines """
        self.assertEqual(is_gf_line('#=GF'), True)
        self.assertEqual(is_gf_line('#=GF AC RF00001'), True)
        self.assertEqual(is_gf_line('#=GF CC until it is '
            'required for transcription. '), True)
        self.assertEqual(is_gf_line(''), False)
        self.assertEqual(is_gf_line('X07545.1/505-619 '), False)
        self.assertEqual(is_gf_line('#=G'), False)
        self.assertEqual(is_gf_line('=GF'), False)
        self.assertEqual(is_gf_line('#=GC SS_cons'), False)

    def test_is_gc_line(self):
        """is_gc_line: functions correctly w/ various lines """
        self.assertEqual(is_gc_line('#=GC'), True)
        self.assertEqual(is_gc_line('#=GC SS_cons'), True)
        self.assertEqual(is_gc_line('#=GC RF'), True)
        self.assertEqual(is_gc_line(''), False)
        self.assertEqual(is_gc_line('X07545.1/505-619 '), False)
        self.assertEqual(is_gc_line('#=G'), False)
        self.assertEqual(is_gc_line('=GF'), False)
        self.assertEqual(is_gc_line('#=GR SS'), False)

    def test_is_gs_line(self):
        """is_gs_line: functions correctly w/ various lines """
        self.assertEqual(is_gs_line('#=GS'), True)
        self.assertEqual(is_gs_line('#=GS Seq1 AC'), True)
        self.assertEqual(is_gs_line('#=GS Seq1 DE'), True)
        self.assertEqual(is_gs_line(''), False)
        self.assertEqual(is_gs_line('X07545.1/505-619 '), False)
        self.assertEqual(is_gs_line('#=G'), False)
        self.assertEqual(is_gs_line('=GF'), False)
        self.assertEqual(is_gs_line('#=GC SS_cons'), False)

    def test_is_gr_line(self):
        """is_gr_line: functions correctly w/ various lines """
        self.assertEqual(is_gr_line('#=GR'), True)
        self.assertEqual(is_gr_line('#=GR SS ..<<..>>..'), True)
        self.assertEqual(is_gr_line('#=GR RF cGGacG'), True)
        self.assertEqual(is_gr_line(''), False)
        self.assertEqual(is_gr_line('X07545.1/505-619 '), False)
        self.assertEqual(is_gr_line('#=G'), False)
        self.assertEqual(is_gr_line('=GF'), False)
        self.assertEqual(is_gr_line('#=GC SS_cons'), False)

    def test_is_seq_line(self):
        """is_seq_line: functions correctly w/ various lines """
        s = ('X07545.1/505-619 '
            '..ACCCGGC.CAUA...GUGGCCG.GGCAA.CAC.CCGG.U.C..UCGUU')
        assert is_seq_line(s)
        assert is_seq_line('X07545.1/505-619')
        assert is_seq_line('M21086.1/8-123')
        assert not is_seq_line('')
        assert not is_seq_line('#GF=')
        assert not is_seq_line('//blah')

    def test_is_structure_line(self):
        """is_structure_line: functions correctly w/ various lines """
        s = ('#=GC SS_cons '
            '<<<<<<<<<........<<.<<<<.<...<.<...<<<<.<.<.......')
        self.assertEqual(is_structure_line(s), True)
        self.assertEqual(is_structure_line('#=GC SS_cons'), False)
        self.assertEqual(is_structure_line('#=GC SS_cons2'), False)
        self.assertEqual(is_structure_line('#=GC SS_cons '), True)
        self.assertEqual(is_structure_line(''), False)
        self.assertEqual(is_structure_line(' '), False)
        self.assertEqual(is_structure_line('#=GF AC RF00001'), False)
        self.assertEqual(is_structure_line('X07545.1/505-619'), False)
        self.assertEqual(is_structure_line('=GC SS_cons'), False)
        self.assertEqual(is_structure_line('#=GC'), False)
        self.assertEqual(is_structure_line('#=GC RF'), False)

    def test_GfToInfo(self):
        """GfToInfo: correctly builds info object from header information"""
        info = GfToInfo(self._fake_headers)
        self.assertEqual(info['AccessionNumber'], 'RF00001')
        self.assertEqual(info['Identification'], '5S_rRNA')
        self.assertEqual(info['Comment'], 'This is a short comment')
        self.assertEqual(info['Author'], 'Griffiths-Jones SR')
        self.assertEqual(info['Sequences'], '606')
        self.assertEqual(info['DatabaseReference'],
            ['URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html;',
             'URL; http://rdp.cme.msu.edu/html/;'])
        self.assertEqual(info['PK'], 'not real')

    def test_GfToInfo_invalid_data(self):
        """GfToInfo: correctly raises error when necessary """
        invalid_headers = [['#=GF ACRF00001'], ['#=GFACRF00001']]
        for h in invalid_headers:
            self.assertRaises(RecordError, GfToInfo, h)

    def test_GcToInfo(self):
        """GcToInfo: correctly builds info object from header information"""
        info = GcToInfo(self._fake_gc_annotation)
        self.assertEqual(info['ConsensusSecondaryStructure'],
            '..........<<<<<<<<<<.....>>>>>>>>>>..')
        self.assertEqual(info['ReferenceAnnotation'],
            'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')

    def test_GcToInfo_invalid_data(self):
        """GcToInfo: correctly raises error when necessary """
        invalid_headers = [['#=GCSS_cons ..<<..>>..'], ['#=GCSAxxxxxxx']]
        for h in invalid_headers:
            self.assertRaises(RecordError, GcToInfo, h)

    def test_GsToInfo(self):
        """GsToInfo: correctly builds info object from header information"""
        info = GsToInfo(self._fake_gs_annotation)
        self.assertEqual(info['BasePair'],
            {'1N77_C':['0 70 cWW CC', '1 69 cWW CC', '2 68 cWW CC',
                       '3 67 cWW CC']})

    def test_GsToInfo_invalid_data(self):
        """GsToInfo: correctly raises error when necessary """
        invalid_headers = [['#=GSBPS 0 10 cwW CC'], ['#=GSACRF00001']]
        for h in invalid_headers:
            self.assertRaises(RecordError, GsToInfo, h)

    def test_GrToInfo(self):
        """GrToInfo: correctly builds info object from header information"""
        info = GrToInfo(self._fake_gr_annotation)
        self.assertEqual(info['SecondaryStructure'],
            {'1N77_C':'..........<<<<<<<<<<.....>>>>>>>>>>..'})
        self.assertEqual(info['ReferenceAnnotation'],
            {'1N77_C':'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'})

    def test_GrToInfo_invalid_data(self):
        """GrToInfo: correctly raises error when necessary """
        invalid_headers = [['#=GRSS ..<<..>>..'], ['#=GRSAxxxxxx']]
        for h in invalid_headers:
            self.assertRaises(RecordError, GrToInfo, h)

    def test_MinimalStockholmParser_strict_missing_fields(self):
        """MinimalStockholmParser: toggle strict functions w/ missing fields"""
        # strict = True
        self.assertRaises(RecordError, list,
            MinimalStockholmParser(self._fake_record_no_sequences))
        # strict = False
        # no header shouldn't be a problem
        headers, aln, struct = \
            list(MinimalStockholmParser(self._fake_record_no_headers,
                strict=False))[0]
        self.assertEqual((headers, aln.todict(), str(struct)),
            ({'GS':[], 'GF':[], 'GR':[],
              'GC':['#=GC SS_cons ............>>>']},
             {'Z11765.1/1-89':'GGUC'}, '............>>>'))
        # should get empty on missing sequence or missing structure
        self.assertEqual(list(MinimalStockholmParser(
            self._fake_record_no_sequences, strict=False)), [])

    def test_MinimalStockholmParser_strict_invalid_sequence(self):
        """MinimalStockholmParser: toggle strict functions w/ invalid seq
        """
        # strict = True
        self.assertRaises(RecordError, list,
            MinimalStockholmParser(self._fake_record_bad_sequence_1))
        # strict = False
        # you expect to get back as much information as possible, also
        # half records or sequences
        result = list(MinimalStockholmParser(
            self._fake_record_bad_sequence_1, strict=False))
        self.assertEqual(len(result[0][1].NamedSeqs), 3)

    def test_MinimalStockholmParser_strict_invalid_structure(self):
        """StockholmParser (strict) and MinimalStockholmParser (non-strict):
        handle a record w/ invalid structure
        """
        # strict = True
        self.assertRaises(RecordError, list,
            StockholmParser(self._fake_record_bad_structure_1))
        # strict = False
        self.assertEqual(list(MinimalStockholmParser(
            self._fake_record_bad_structure_1, strict=False))[0][2], None)

    def test_MinimalStockholmParser_w_valid_data(self):
        """MinimalStockholmParser: integrity of output """
        # Some ugly constructions here, but this is what the output of
        # parsing fake_two_records should be
        headers = ['#=GF AC RF00014', '#=GF AU Mifsud W']
        sequences = {
            'U17136.1/898-984':
            ''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',
                     'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),
            'M15749.1/155-239':
            ''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',
                     'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),
            'AF090431.1/222-139':
            ''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',
                     'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])}
        structure = WussStructure(''.join(
            ['...<<<<<<<.....>>>>>>>....................<<<<<...',
             '.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..']))
        data = list(MinimalStockholmParser(self._fake_two_records,
            strict=False))
        self.assertEqual(
            (data[0][0]['GF'], data[0][1].todict(), str(data[0][2])),
            (headers, sequences, structure))
        assert isinstance(data[0][1], Alignment)
        # This line tests that invalid entries are ignored when strict=False
        # Note, there are two records in self._fake_two_records, but 2nd is
        # invalid
        self.assertEqual(len(data), 1)

    def test_StockholmFinder(self):
        """StockholmFinder: integrity of output """
        fake_record = ['a', '//', 'b', 'b', '//']
        data = list(StockholmFinder(fake_record))
        self.assertEqual(len(data), 2)
        self.assertEqual(data[0], ['a', '//'])
        self.assertEqual(data[1], ['b', 'b', '//'])

    def test_StockholmParser(self):
        """StockholmParser: integrity of output """
        expected_sequences = \
            [''.join(['AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA',
                      'AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU']),
             ''.join(['AACGCAUCGGAUUUCCCGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUU',
                      'AGCAAGUUUGAUCCCGACUCCUG-CGAGUCGGGAUUU']),
             ''.join(['CUCACAUCAGAUUUCCUGGUGUAACGAA-UUUUCAAGUGCUUCUUGCAUA',
                      'AGCAAGUUUGAUCCCGACCCGU--AGGGCCGGGAUUU'])]
        expected_structure = ''.join(
            ['...<<<<<<<.....>>>>>>>....................<<<<<...',
             '.>>>>>....<<<<<<<<<<.....>>>>>>>>>>..'])
        for r in StockholmParser(self._fake_record):
            headers = r.Info
            sequences = r
            structure = r.Info['Struct']
            self.assertEqual(headers['GF']['AccessionNumber'], 'RF00014')
            self.assertEqual(headers['GF']['Author'], 'Mifsud W')
            self.assertEqualItems(sequences.values(), expected_sequences)
            assert isinstance(sequences, Alignment)
            self.assertEqual(structure, expected_structure)
            assert isinstance(structure, WussStructure)

    def test_StockholmParser_strict_missing_fields(self):
        """StockholmParser: toggle strict functions correctly """
        # strict = True
        self.assertRaises(RecordError, list,
            StockholmParser(self._fake_record_no_headers))
        # strict = False
        self.assertEqual(list(StockholmParser(self._fake_record_no_headers,
            strict=False)), [])
        self.assertEqual(list(StockholmParser(self._fake_record_no_sequences,
            strict=False)), [])

    def test_StockholmParser_strict_invalid_headers(self):
        """StockholmParser: functions when toggling strict record w/ bad header
        """
        self.assertRaises(RecordError, list,
            StockholmParser(self._fake_record_bad_header_1))
        self.assertRaises(RecordError, list,
            StockholmParser(self._fake_record_bad_header_2))
        # strict = False
        obs = list(StockholmParser(self._fake_record_bad_header_1,
            strict=False))[0].Info.GF.keys()
        self.assertEqual(len(obs), 1)
        obs = list(StockholmParser(self._fake_record_bad_header_2,
            strict=False))[0].Info.GF.keys()
        self.assertEqual(len(obs), 1)

    def test_StockholmParser_strict_invalid_sequences(self):
        """StockholmParser: functions when toggling strict w/ record w/ bad seq
        """
        self.assertRaises(RecordError, list,
            MinimalStockholmParser(self._fake_record_bad_sequence_1))
        # strict = False
        # in 'False' mode you expect to get back as much as possible, also
        # parts of sequences
        self.assertEqual(len(list(StockholmParser(
            self._fake_record_bad_sequence_1,
            strict=False))[0].NamedSeqs), 3)

    def test_StockholmParser_strict_invalid_structure(self):
        """StockholmParser: functions when toggling strict record w/ bad struct
        """
        # strict
        self.assertRaises(RecordError, list,
            StockholmParser(self._fake_record_bad_structure_2))
        # not strict
        self.assertEqual(list(StockholmParser(
            self._fake_record_bad_structure_2,
            strict=False)), [])

    def test_StockholmParser_single_family(self):
        """StockholmParser: should work on a family in stockholm format"""
        exp_header = {}
        exp_aln = {'K02120.1/628-682':
            'AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGACUGGC',
            'D00647.1/629-683':
            'AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGACUGGC'}
        exp_struct = '<<<<<<.........>>>>>>.........<<<<<<.............>>>>>>'
        aln = list(StockholmParser(self.single_family))[0]
        h = aln.Info['GF']
        a = aln
        s = aln.Info['Struct']
        self.assertEqual(h, exp_header)
        self.assertEqual(a, exp_aln)
        self.assertEqual(s, exp_struct)
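
# The is_*_line tests above are consistent with a simple prefix rule for
# Stockholm markup lines. This is a minimal sketch, assuming prefix matching
# is all that is needed. The _sketch_* names are hypothetical and are not the
# functions in cogent.parse.stockholm. A structure line also needs whitespace
# right after the SS_cons tag, which is why '#=GC SS_cons' and
# '#=GC SS_cons2' are rejected while '#=GC SS_cons ' is accepted.
_SKETCH_SS_CONS_TAG = '#=GC SS_cons'

def _sketch_is_gf_line(line):
    return line.startswith('#=GF')

def _sketch_is_gc_line(line):
    return line.startswith('#=GC')

def _sketch_is_gs_line(line):
    return line.startswith('#=GS')

def _sketch_is_gr_line(line):
    return line.startswith('#=GR')

def _sketch_is_structure_line(line):
    rest = line[len(_SKETCH_SS_CONS_TAG):]
    # the first character after the tag must be a space or a tab
    return line.startswith(_SKETCH_SS_CONS_TAG) and rest[:1].isspace()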
# This is an altered version of some header info from Rfam.seed modified to
# incorporate different cases for testing
fake_headers = """#=GF AC RF00001
#=GF AU Griffiths-Jones SR
#=GF ID 5S_rRNA
#=GF RT 5S Ribosomal RNA Database.
#=GF DR URL; http://oberon.fvms.ugent.be:8080/rRNA/ssu/index.html;
#=GF DR URL; http://rdp.cme.msu.edu/html/;
#=GF CC This is a short
#=GF CC comment
#=GF SQ 606
#=GF PK not real"""
fake_gc_annotation = """#=GC SS_cons ..........<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"""
fake_gs_annotation = """#=GS 1N77_C BP 0 70 cWW CC
#=GS 1N77_C BP 1 69 cWW CC
#=GS 1N77_C BP 2 68 cWW CC
#=GS 1N77_C BP 3 67 cWW CC
"""
fake_gr_annotation = """#=GR 1N77_C SS ..........<<<<<<<<<<.....>>>>>>>>>>..
#=GR 1N77_C RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"""
fake_record_no_headers ="""Z11765.1/1-89 GGUC
#=GC SS_cons ............>>>
//"""
fake_record_no_sequences ="""#=GF AC RF00006
#=GC SS_cons ............>
//"""
fake_record_no_structure ="""#=GF AC RF00006
Z11765.1/1-89 GGUCAGC
//"""
fake_two_records ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//
#=GF AC RF00015
//"""
fake_record ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_header_1 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AUMifsudW
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_header_2 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GFAUMifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_sequence_1 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_structure_1 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons...<<<<<<<.....>>>>>>>....................<<<<<...
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
U17136.1/898-984 AGCAAGUUUCAUCCCGACCCCCUCAGGGUCGGGAUUU
M15749.1/155-239 AGCAAGUUUGAUCCCGACUCCUG.CGAGUCGGGAUUU
AF090431.1/222-139 AGCAAGUUUGAUCCCGACCCGU..AGGGCCGGGAUUU
#=GC SS_cons .>>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
fake_record_bad_structure_2 ="""# STOCKHOLM 1.0
#=GF AC RF00014
#=GF AU Mifsud W
U17136.1/898-984 AACACAUCAGAUUUCCUGGUGUAACGAAUUUUUUAAGUGCUUCUUGCUUA
M15749.1/155-239 AACGCAUCGGAUUUCCCGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUU
AF090431.1/222-139 CUCACAUCAGAUUUCCUGGUGUAACGAA.UUUUCAAGUGCUUCUUGCAUA
#=GC SS_cons ...<<<<<<<.....>>>>>>>....................<<<<>>>>....<<<<<<<<<<.....>>>>>>>>>>..
#=GC RF xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
//"""
single_family=\
"""K02120.1/628-682 AUGGGAAAUUCCCCCUCCUAUAACCCCCCCGCUGGUAUCUCCCCCUCAGA
D00647.1/629-683 AUGGGAAACUCCCCCUCCUAUAACCCCCCCGCUGGCAUCUCCCCCUCAGA
#=GC SS_cons <<<<<<.........>>>>>>.........<<<<<<.............>
K02120.1/628-682 CUGGC
D00647.1/629-683 CUGGC
#=GC SS_cons >>>>>
//"""
# Run tests if called from the command line
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_stride.py
import os
try:
from cogent.util.unit_test import TestCase, main
from cogent.struct.selection import einput
from cogent.parse.pdb import PDBParser
from cogent.parse.stride import stride_parser
from cogent.app.stride import Stride, stride_xtra
except ImportError:
from zenpdb.cogent.util.unit_test import TestCase, main
from zenpdb.cogent.struct.selection import einput
from zenpdb.cogent.parse.pdb import PDBParser
from zenpdb.cogent.parse.stride import stride_parser
from zenpdb.cogent.app.stride import Stride, stride_xtra
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
class StrideParseTest(TestCase):
"""Tests for Stride application controller."""
def setUp(self):
input_file = os.path.join('data', '2E12.pdb')
self.input_structure = PDBParser(open(input_file))
stride_app = Stride()
res = stride_app(self.input_structure)
self.lines = res['StdOut'].readlines()
def test_stride_parser(self):
"""tests if output is parsed fully"""
id_xtra = stride_parser(self.lines)
assert len(id_xtra) < len(self.input_structure[(0,)][('A',)]) + \
len(self.input_structure[(0,)][('B',)])
self.input_structure[(0,)][('A',)].remove_hetero()
self.input_structure[(0,)][('B',)].remove_hetero()
assert len(id_xtra) == len(self.input_structure[(0,)][('A',)]) + \
len(self.input_structure[(0,)][('B',)])
def test_stride_xtra(self):
"""tests if residues get annotated with parsed data."""
stride_xtra(self.input_structure)
self.assertEqual(\
self.input_structure[(0,)][('A',)][(('H_HOH', 138, ' '),)].xtra, {})
self.assertAlmostEqual(\
self.input_structure[(0,)][('A',)][(('ILE', 86, ' '),)].xtra['STRIDE_ASA'], 13.9)
self.input_structure[(0,)][('A',)].remove_hetero()
self.input_structure[(0,)][('B',)].remove_hetero()
all_residues = einput(self.input_structure, 'R')
a = all_residues.data_children('STRIDE_ASA', xtra=True, forgiving=False)
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_structure.py
"""Unit tests for the structure parser front-ends in cogent.parse.structure.
"""
from cogent.util.unit_test import TestCase, main
from cogent.core.entity import Structure
from cogent.parse.structure import FromFilenameStructureParser, FromFileStructureParser
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Production"
class structuresTests(TestCase):
"""Tests of cogent.parse.structure UI functions."""
def test_FromFilenameStructureParser(self):
structure = FromFilenameStructureParser('data/1LJO.pdb', 'pdb')
self.assertRaises(TypeError, FromFilenameStructureParser, open('data/1LJO.pdb'), 'pdb')
assert isinstance(structure, Structure)
def test_FromFileStructureParser(self):
structure = FromFileStructureParser(open('data/1LJO.pdb'), 'pdb')
assert isinstance(structure, Structure)
self.assertRaises(TypeError, FromFileStructureParser, 'data/1LJO.pdb', 'pdb')
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_tinyseq.py
from StringIO import StringIO
import xml.dom.minidom
from cogent.util.unit_test import TestCase, main
from cogent.parse.tinyseq import TinyseqParser
__author__ = "Matthew Wakefield"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Matthew Wakefield"
__email__ = "wakefield@wehi.edu.au"
__status__ = "Production"
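data = """<?xml version="1.0"?>
<TSeqSet>
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>31322957</TSeq_gi>
<TSeq_accver>AY286018.1</TSeq_accver>
<TSeq_taxid>9315</TSeq_taxid>
<TSeq_orgname>Macropus eugenii</TSeq_orgname>
<TSeq_defline>Macropus eugenii medium wave-sensitive opsin 1 (OPN1MW) mRNA, complete cds</TSeq_defline>
<TSeq_length>99</TSeq_length>
<TSeq_sequence>GGCAGGGAAAGGGAAGAAAGTAAAGGGGCCATGACACAGGCATGGGACCCTGCAGGGTTCTTGGCTTGGCGGCGGGACGAGAACGAGGAGACGACTCGG</TSeq_sequence>
</TSeq>
</TSeqSet>
"""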
sample_seq = ">AY286018.1\nGGCAGGGAAAGGGAAGAAAGTAAAGGGGCCATGACACAGGCATGGGACCCTGCAGGGTTCTTGGCTTGGCGGCGGGACGAGAACGAGGAGACGACTCGG"
sample_annotations = '[genbank_id "AY286018.1" at [0:99]/99, organism "Macropus eugenii" at [0:99]/99]'
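# A minimal sketch, using only the standard library's xml.dom.minidom, of
# pulling the accession and sequence out of a TinySeq record. It assumes the
# NCBI TinySeq element names TSeq_accver and TSeq_sequence; the helper name
# tseq_fields is hypothetical and is not how cogent.parse.tinyseq works.
import xml.dom.minidom

def tseq_fields(xml_text):
    """Hypothetical helper: return (accession, sequence) of the first TSeq."""
    doc = xml.dom.minidom.parseString(xml_text)

    def text_of(tag):
        # join the text children of the first element with this tag name
        node = doc.getElementsByTagName(tag)[0]
        return ''.join(c.data for c in node.childNodes
                       if c.nodeType == c.TEXT_NODE)

    return text_of('TSeq_accver'), text_of('TSeq_sequence')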
class ParseTinyseq(TestCase):
def test_parse(self):
for name,seq in [TinyseqParser(data).next(),TinyseqParser(xml.dom.minidom.parseString(data)).next()]:
self.assertEqual(name, 'AY286018.1')
self.assertEqual(sample_seq, seq.toFasta())
self.assertEqual(str(seq.annotations), sample_annotations)
pass
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_tree.py
"""Unit tests for tree parsers.
"""
from cogent.parse.tree import DndParser, DndTokenizer, RecordError
from cogent.core.tree import PhyloNode
from cogent.util.unit_test import TestCase, main
#from cogent.parse.newick import parse_string, TreeParseError as RecordError
#def DndParser(data, NodeClass=PhyloNode, unescape_name=True):
# if not unescape_name:
# raise NotImplementedError
# def constructor(children, name, attribs):
# return NodeClass(Children = list(children or []), Name=name, Params=attribs)
# return parse_string(data, constructor)
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
sample = """
(
(
xyz:0.28124,
(
def:0.24498,
mno:0.03627)
:0.17710)
:0.04870,
abc:0.05925,
(
ghi:0.06914,
jkl:0.13776)
:0.09853);
"""
node_data_sample = """
(
(
xyz:0.28124,
(
def:0.24498,
mno:0.03627)
'A':0.17710)
B:0.04870,
abc:0.05925,
(
ghi:0.06914,
jkl:0.13776)
C:0.09853);
"""
minimal = "();"
no_names = "((,),(,));"
missing_tip_name = "((a,b),(c,));"
empty = '();'
single = '(abc:3);'
double = '(abc:3, def:4);'
onenest = '(abc:3, (def:4, ghi:5):6 );'
nodedata = '(abc:3, (def:4, ghi:5)jkl:6 );'
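# A minimal sketch of Newick tokenization of the kind DndTokenizer is tested
# for below, assuming no quoted labels and no bracketed comments: each of the
# delimiters ( ) , : ; is its own token, whitespace only separates tokens, and
# any other run of characters is a label or branch length. tokenize_newick is
# a hypothetical helper, not cogent's DndTokenizer.
def tokenize_newick(text):
    """Hypothetical helper: return the list of tokens in a Newick string."""
    tokens, current = [], []
    for ch in text:
        if ch in '(),:;' or ch.isspace():
            if current:                  # flush a pending label or length
                tokens.append(''.join(current))
                current = []
            if not ch.isspace():
                tokens.append(ch)        # delimiters are tokens themselves
        else:
            current.append(ch)
    if current:
        tokens.append(''.join(current))
    return tokens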
class DndTokenizerTests(TestCase):
"""Tests of the DndTokenizer factory function."""
def test_gdata(self):
"""DndTokenizer should work as expected on real data"""
exp = \
['(', '(', 'xyz', ':', '0.28124',',', '(', 'def', ':', '0.24498',\
',', 'mno', ':', '0.03627', ')', ':', '0.17710', ')', ':', '0.04870', \
',', 'abc', ':', '0.05925', ',', '(', 'ghi', ':', '0.06914', ',', \
'jkl', ':', '0.13776', ')', ':', '0.09853', ')', ';']
#split it up for debugging on an item-by-item basis
obs = list(DndTokenizer(sample))
self.assertEqual(len(obs), len(exp))
for i, j in zip(obs, exp):
self.assertEqual(i, j)
#try it all in one go
self.assertEqual(list(DndTokenizer(sample)), exp)
def test_nonames(self):
"""DndTokenizer should work as expected on trees with no names"""
exp = ['(','(',',',')',',','(',',',')',')',';']
obs = list(DndTokenizer(no_names))
self.assertEqual(obs, exp)
def test_missing_tip_name(self):
"""DndTokenizer should work as expected on trees with a missing name"""
exp = ['(','(','a',',','b',')',',','(','c',',',')',')',';']
obs = list(DndTokenizer(missing_tip_name))
self.assertEqual(obs, exp)
def test_minimal(self):
"""DndTokenizer should work as expected on a minimal tree without names"""
exp = ['(',')',';']
obs = list(DndTokenizer(minimal))
self.assertEqual(obs, exp)
class DndParserTests(TestCase):
"""Tests of the DndParser factory function."""
def test_nonames(self):
"""DndParser should produce the correct tree when there are no names"""
obs = DndParser(no_names)
exp = PhyloNode()
exp.append(PhyloNode())
exp.append(PhyloNode())
exp.Children[0].append(PhyloNode())
exp.Children[0].append(PhyloNode())
exp.Children[1].append(PhyloNode())
exp.Children[1].append(PhyloNode())
self.assertEqual(str(obs), str(exp))
def test_minimal(self):
"""DndParser should produce the correct minimal tree"""
obs = DndParser(minimal)
exp = PhyloNode()
exp.append(PhyloNode())
self.assertEqual(str(obs), str(exp))
def test_missing_tip_name(self):
"""DndParser should produce the correct tree when missing a name"""
obs = DndParser(missing_tip_name)
exp = PhyloNode()
exp.append(PhyloNode())
exp.append(PhyloNode())
exp.Children[0].append(PhyloNode(Name='a'))
exp.Children[0].append(PhyloNode(Name='b'))
exp.Children[1].append(PhyloNode(Name='c'))
exp.Children[1].append(PhyloNode())
self.assertEqual(str(obs), str(exp))
def test_gsingle(self):
"""DndParser should produce a single-child PhyloNode on minimal data"""
t = DndParser(single)
self.assertEqual(len(t), 1)
child = t[0]
self.assertEqual(child.Name, 'abc')
self.assertEqual(child.Length, 3)
self.assertEqual(str(t), '(abc:3.0);')
def test_gdouble(self):
"""DndParser should produce a double-child PhyloNode from data"""
t = DndParser(double)
self.assertEqual(len(t), 2)
self.assertEqual(str(t), '(abc:3.0,def:4.0);')
def test_gonenest(self):
"""DndParser should work correctly with nested data"""
t = DndParser(onenest)
self.assertEqual(len(t), 2)
self.assertEqual(len(t[0]), 0) #first child is terminal
self.assertEqual(len(t[1]), 2) #second child has two children
self.assertEqual(str(t), '(abc:3.0,(def:4.0,ghi:5.0):6.0);')
def test_gnodedata(self):
"""DndParser should assign Name to internal nodes correctly"""
t = DndParser(nodedata)
self.assertEqual(len(t), 2)
self.assertEqual(len(t[0]), 0) #first child is terminal
self.assertEqual(len(t[1]), 2) #second child has two children
self.assertEqual(str(t), '(abc:3.0,(def:4.0,ghi:5.0)jkl:6.0);')
info_dict = {}
for node in t.traverse():
info_dict[node.Name] = node.Length
self.assertEqual(info_dict['abc'], 3.0)
self.assertEqual(info_dict['def'], 4.0)
self.assertEqual(info_dict['ghi'], 5.0)
self.assertEqual(info_dict['jkl'], 6.0)
def test_data(self):
"""DndParser should work as expected on real data"""
t = DndParser(sample)
self.assertEqual(str(t), '((xyz:0.28124,(def:0.24498,mno:0.03627):0.1771):0.0487,abc:0.05925,(ghi:0.06914,jkl:0.13776):0.09853);')
tdata = DndParser(node_data_sample, unescape_name=True)
self.assertEqual(str(tdata), "((xyz:0.28124,(def:0.24498,mno:0.03627)A:0.1771)B:0.0487,abc:0.05925,(ghi:0.06914,jkl:0.13776)C:0.09853);")
def test_gbad(self):
"""DndParser should fail if parens unbalanced"""
left = '((abc:3)'
right = '(abc:3))'
self.assertRaises(RecordError, DndParser, left)
self.assertRaises(RecordError, DndParser, right)
def test_DndParser(self):
"""DndParser tests"""
t_str = "(A_a,(B:1.0,C),'D_e':0.5)E;"
tree_unesc = DndParser(t_str, PhyloNode, unescape_name=True)
tree_esc = DndParser(t_str, PhyloNode, unescape_name=False)
self.assertEqual(tree_unesc.Name, 'E')
self.assertEqual(tree_unesc.Children[0].Name, 'A a')
self.assertEqual(tree_unesc.Children[1].Children[0].Name, 'B')
self.assertEqual(tree_unesc.Children[1].Children[0].Length, 1.0)
self.assertEqual(tree_unesc.Children[1].Children[1].Name, 'C')
self.assertEqual(tree_unesc.Children[2].Name, 'D_e')
self.assertEqual(tree_unesc.Children[2].Length, 0.5)
self.assertEqual(tree_esc.Name, 'E')
self.assertEqual(tree_esc.Children[0].Name, 'A_a')
self.assertEqual(tree_esc.Children[1].Children[0].Name, 'B')
self.assertEqual(tree_esc.Children[1].Children[0].Length, 1.0)
self.assertEqual(tree_esc.Children[1].Children[1].Name, 'C')
self.assertEqual(tree_esc.Children[2].Name, "'D_e'")
self.assertEqual(tree_esc.Children[2].Length, 0.5)
reload_test = tree_esc.getNewick(with_distances=True, \
escape_name=False)
obs = DndParser(reload_test, unescape_name=False)
self.assertEqual(obs.getNewick(with_distances=True), \
tree_esc.getNewick(with_distances=True))
reload_test = tree_unesc.getNewick(with_distances=True, \
escape_name=False)
obs = DndParser(reload_test, unescape_name=False)
self.assertEqual(obs.getNewick(with_distances=True), \
tree_unesc.getNewick(with_distances=True))
class PhyloNodeTests(TestCase):
"""Check that basic PhyloNode operations behave as expected"""
def test_gops(self):
"""Basic PhyloNode operations should work as expected"""
p = PhyloNode()
self.assertEqual(str(p), ';')
p.Name = 'abc'
self.assertEqual(str(p), 'abc;')
p.Length = 3
self.assertEqual(str(p), 'abc:3;') #don't suppress branch from root
q = PhyloNode()
p.append(q)
self.assertEqual(str(p), '()abc:3;')
r = PhyloNode()
q.append(r)
self.assertEqual(str(p), '(())abc:3;')
r.Name = 'xyz'
self.assertEqual(str(p), '((xyz))abc:3;')
q.Length = 2
self.assertEqual(str(p), '((xyz):2)abc:3;')
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_parse/test_unigene.py
"""Unit tests for unigene-specific classes
"""
from cogent.parse.unigene import _read_sts, _read_expression, UniGeneSeqRecord,\
UniGeneProtSimRecord, _read_seq, LinesToUniGene
from cogent.parse.record_finder import GbFinder
from cogent.util.unit_test import TestCase, main
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
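# A minimal sketch of splitting a UniGene SEQUENCE or PROTSIM value such as
# "ACC=BC025044.1; NID=g19263549; SEQTYPE=mRNA" into a dict. It assumes fields
# are separated by ';' and that each field splits on its first '='; the helper
# name parse_semicolon_fields is hypothetical and is not the cogent parser.
def parse_semicolon_fields(line):
    """Hypothetical helper: return {KEY: value} for 'KEY=value; ...' text."""
    fields = {}
    for item in line.strip().split(';'):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' or empty fields
        key, _, value = item.partition('=')
        fields[key.strip()] = value.strip()
    return fields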
class unigeneTests(TestCase):
"""Tests toplevel functions."""
def test_read_sts(self):
"""_read_sts should perform correct conversions"""
self.assertEqual(_read_sts('ACC=RH128467 UNISTS=211775\n'), \
{'ACC':'RH128467', 'UNISTS':'211775'})
def test_read_expression(self):
"""_read_expression should perform correct conversions"""
self.assertEqual(_read_expression(\
'embryo ; whole body ; mammary gland ; brain\n'),
['embryo', 'whole body', 'mammary gland', 'brain'])
def test_read_seq(self):
"""_read_seq should perform correct conversions"""
#reset the found fields, since we can't guarantee order of test
#execution and it's persistent class data
UniGeneSeqRecord.found_fields = {}
self.assertEqual(_read_seq('ACC=BC025044.1\n'), \
UniGeneSeqRecord({'ACC':'BC025044.1'}))
self.assertEqual(_read_seq(\
"ACC=AI842963.1; NID=g5477176; CLONE=UI-M-AO1-aem-f-10-0-UI; END=3'; LID=1944; SEQTYPE=EST; TRACE=158501677\n"), \
UniGeneSeqRecord({ 'ACC':'AI842963.1','NID':'g5477176',
'CLONE':'UI-M-AO1-aem-f-10-0-UI', 'END':"3'",
'LID':'1944', 'SEQTYPE':'EST',
'TRACE':'158501677'}) )
def test_LinesToUniGene(self):
"""LinesToUniGene should give expected results on sample data"""
fake_file = \
"""ID Mm.1
TITLE S100 calcium binder
GENE S100a10
CYTOBAND 3 41.7 cM
LOCUSLINK 20194
EXPRESS embryo ; whole body ; mammary gland ; brain
CHROMOSOME 3
STS ACC=RH128467 UNISTS=211775
STS ACC=M16465 UNISTS= 178878
PROTSIM ORG=Homo sapiens; PROTGI=107251; PROTID=pir:JC1139; PCT=91; ALN=97
PROTSIM ORG=Mus musculus; PROTGI=116487; PROTID=sp:P08207; PCT=100; ALN=97
PROTSIM ORG=Rattus norvegicus; PROTGI=116489; PROTID=sp:P05943; PCT=94; ALN=94
SCOUNT 5
SEQUENCE ACC=BC025044.1; NID=g19263549; PID=g19263550; SEQTYPE=mRNA
SEQUENCE ACC=AA471893.1; NID=g2199884; CLONE=IMAGE:872193; END=5'; LID=539; SEQTYPE=EST
SEQUENCE ACC=AI842963.1; NID=g5477176; CLONE=UI-M-AO1-aem-f-10-0-UI; END=3'; LID=1944; SEQTYPE=EST; TRACE=158501677
SEQUENCE ACC=CB595147.1; NID=g29513003; CLONE=IMAGE:30300703; END=5'; LID=12885; MGC=6677832; SEQTYPE=EST
SEQUENCE ACC=BY144053.1; NID=g26280109; CLONE=L930184D22; END=5'; LID=12267; SEQTYPE=EST
//
ID Mm.5
TITLE homeo box A10
GENE Hoxa10
CYTOBAND 6 26.33 cM
LOCUSLINK 15395
EXPRESS kidney ; colon ; mammary gland
CHROMOSOME 6
PROTSIM ORG=Caenorhabditis elegans; PROTGI=7510074; PROTID=pir:T31611; PCT=30; ALN=326
SCOUNT 1
SEQUENCE ACC=AW990320.1; NID=g8185938; CLONE=IMAGE:1513482; END=5'; LID=1043; SEQTYPE=EST; TRACE=94472873
//
"""
records = list(GbFinder(fake_file.split('\n')))
self.assertEqual(len(records), 2)
first, second = map(LinesToUniGene, records)
self.assertEqual(first.ID, 'Mm.1')
self.assertEqual(first.TITLE, 'S100 calcium binder')
self.assertEqual(first.GENE, 'S100a10')
self.assertEqual(first.CYTOBAND, '3 41.7 cM')
self.assertEqual(first.CHROMOSOME, '3')
self.assertEqual(first.LOCUSLINK, 20194)
self.assertEqual(first.EXPRESS, ['embryo', 'whole body', \
'mammary gland', 'brain'])
self.assertEqual(first.STS, [{'ACC':'RH128467','UNISTS':'211775'},
{'ACC':'M16465', 'UNISTS':'178878'}])
exp_prot_sim = map(UniGeneProtSimRecord, [
{'ORG':'Homo sapiens','PROTGI':'107251',
'PROTID':'pir:JC1139','PCT':'91','ALN':'97'},
{'ORG':'Mus musculus','PROTGI':'116487',
'PROTID':'sp:P08207','PCT':'100','ALN':'97'},
{'ORG':'Rattus norvegicus','PROTGI':'116489',
'PROTID':'sp:P05943','PCT':'94','ALN':'94'},])
for obs, exp in zip(first.PROTSIM, exp_prot_sim):
self.assertEqual(obs, exp)
self.assertEqual(first.SCOUNT, 5)
exp_seqs = map(UniGeneSeqRecord, [
{'ACC':'BC025044.1', 'NID':'g19263549','PID':'g19263550',
'SEQTYPE':'mRNA'},
{'ACC':'AA471893.1','NID':'g2199884','END':"5'",
'CLONE':'IMAGE:872193','LID':'539', 'SEQTYPE':'EST'},
{'ACC':'AI842963.1','NID':'g5477176',
'CLONE':'UI-M-AO1-aem-f-10-0-UI','END':"3'",'LID':'1944',
'SEQTYPE':'EST','TRACE':'158501677'},
{'ACC':'CB595147.1','NID':'g29513003',
'CLONE':'IMAGE:30300703','END':"5'",'LID':'12885',
'MGC':'6677832', 'SEQTYPE':'EST'},
{'ACC':'BY144053.1','NID':'g26280109',
'CLONE':'L930184D22','END':"5'",'LID':'12267',
'SEQTYPE':'EST'}])
for obs, exp in zip(first.SEQUENCE, exp_seqs):
self.assertEqual(obs, exp)
self.assertEqual(second.ID, 'Mm.5')
self.assertEqual(second.TITLE, 'homeo box A10')
self.assertEqual(second.GENE, 'Hoxa10')
self.assertEqual(second.CYTOBAND, '6 26.33 cM')
self.assertEqual(second.LOCUSLINK, 15395)
self.assertEqual(second.EXPRESS,['kidney','colon','mammary gland'])
self.assertEqual(second.CHROMOSOME, '6')
self.assertEqual(second.PROTSIM, map(UniGeneProtSimRecord, [
{'ORG':'Caenorhabditis elegans', 'PROTGI':'7510074',
'PROTID':'pir:T31611','PCT':'30',
'ALN':'326'}]))
self.assertEqual(second.SCOUNT, 1)
self.assertEqual(second.STS, [])
self.assertEqual(second.SEQUENCE, map(UniGeneSeqRecord, [
{'ACC':'AW990320.1','NID':'g8185938',
'CLONE':'IMAGE:1513482','END':"5'",'LID':'1043',
'SEQTYPE':'EST','TRACE':'94472873'}]))
#test that the synonym mapping works OK
self.assertEqual(second.SequenceIds[0].NucleotideId, 'g8185938')
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_motif/__init__.py
__all__ = ['test_util']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_motif/test_util.py
#file cogent_tests/motif/test_util.py
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.motif.util import Location, ModuleInstance, Module, Motif,\
MotifResults, MotifFormatter, html_color_to_rgb
from cogent.core.moltype import ASCII
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class LocationTests(TestCase):
"""Tests of Location class for holding module location.
"""
def setUp(self):
"""Setup for Location tests."""
self.location_no_end = Location('seq1',1)
self.locations = [
Location('seq1',1,5),
Location('seq2',3,54),
Location('seq1',5,3),
Location('seq1',2,3),
Location('seq2',54,2),
Location('seq0',1,3),
]
self.locations_sorted = [
Location('seq0',1,3),
Location('seq1',1,5),
Location('seq1',5,3),
Location('seq1',2,3),
Location('seq2',3,54),
Location('seq2',54,2),
]
def test_init_no_end(self):
"""__init__ should properly initialize Location object"""
self.assertEqual(self.location_no_end.SeqId, 'seq1')
self.assertEqual(self.location_no_end.Start, 1)
self.assertEqual(self.location_no_end.End, 2)
def test_init_complete(self):
"""__init__ should properly initialize Location object"""
self.assertEqual(self.locations[0].SeqId, 'seq1')
self.assertEqual(self.locations[0].Start, 1)
self.assertEqual(self.locations[0].End, 5)
def test_cmp(self):
"""Location objects should sort properly with __cmp__ overridden."""
self.locations.sort()
self.assertEqual(self.locations, self.locations_sorted)
class ModuleInstanceTests(TestCase):
"""Tests for ModuleInstance class."""
def setUp(self):
"""Setup function for ModuleInstance tests."""
self.sequences = [
'accucua',
'caucguu',
'accucua',
'cgacucg',
'cgaucag',
'cuguacc',
'cgcauca',
]
self.locations = [
Location('seq0',1,3),
Location('seq1',2,3),
Location('seq1',1,5),
Location('seq1',5,3),
Location('seq2',3,54),
Location('seq2',54,2),
Location('seq3',4,0),
]
self.Pvalues = [
.1,
.002,
.0000000003,
.6,
.0094,
.6,
.00201,
]
self.Evalues = [
.006,
.02,
.9,
.0200000001,
.09,
.0000003,
.900001,
]
self.modules_no_e = []
for i in xrange(7):
self.modules_no_e.append(ModuleInstance(self.sequences[i],
self.locations[i],
self.Pvalues[i]))
self.modules_p_and_e = []
for i in xrange(7):
self.modules_p_and_e.append(ModuleInstance(self.sequences[i],
self.locations[i],
self.Pvalues[i],
self.Evalues[i]))
self.modules_no_e_sorted = [
ModuleInstance(self.sequences[2],self.locations[2],self.Pvalues[2]),
ModuleInstance(self.sequences[1],self.locations[1],self.Pvalues[1]),
ModuleInstance(self.sequences[6],self.locations[6],self.Pvalues[6]),
ModuleInstance(self.sequences[4],self.locations[4],self.Pvalues[4]),
ModuleInstance(self.sequences[0],self.locations[0],self.Pvalues[0]),
ModuleInstance(self.sequences[3],self.locations[3],self.Pvalues[3]),
ModuleInstance(self.sequences[5],self.locations[5],self.Pvalues[5]),
]
self.modules_p_and_e_sorted = [
ModuleInstance(self.sequences[2],self.locations[2],self.Pvalues[2]),
ModuleInstance(self.sequences[1],self.locations[1],self.Pvalues[1]),
ModuleInstance(self.sequences[6],self.locations[6],self.Pvalues[6]),
ModuleInstance(self.sequences[4],self.locations[4],self.Pvalues[4]),
ModuleInstance(self.sequences[0],self.locations[0],self.Pvalues[0]),
ModuleInstance(self.sequences[5],self.locations[5],self.Pvalues[5]),
ModuleInstance(self.sequences[3],self.locations[3],self.Pvalues[3]),
]
def test_init_no_p_e_values(self):
"""Init should properly initialize ModuleInstance objects."""
module1 = ModuleInstance(self.sequences[0], self.locations[0])
module2 = ModuleInstance(self.sequences[1], self.locations[1])
self.assertEqual(module1.Sequence, 'accucua')
self.assertEqual(module1.Location.SeqId, 'seq0')
self.assertEqual(module2.Sequence, 'caucguu')
self.assertEqual(module2.Location.SeqId, 'seq1')
def test_init_no_e_values(self):
"""Init should properly initialize ModuleInstance objects."""
self.modules_no_e.sort()
self.assertEqual(self.modules_no_e, self.modules_no_e_sorted)
def test_len(self):
"""len() should return correct length of the ModuleInstance sequence."""
for module in self.modules_no_e:
self.assertEqual(len(module), 7)
def test_str(self):
"""str() should return the correct string for each ModuleInstance."""
for module, seq in zip(self.modules_no_e, self.sequences):
self.assertEqual(str(module), seq)
def test_cmp(self):
"""ModuleInstances should sort properly with __cmp__ overridden."""
self.modules_no_e.sort()
self.modules_p_and_e.sort()
self.assertEqual(map(str,self.modules_no_e),
map(str,self.modules_no_e_sorted))
self.assertEqual(map(str,self.modules_p_and_e),
map(str,self.modules_p_and_e_sorted))
class ModuleTests(TestCase):
"""Tests for Module class."""
def setUp(self):
"""SetUp for Module class tests."""
self.sequences = [
'accucua',
'caucguu',
'accucua',
'cgacucg',
'cgaucag',
'cuguacc',
'cgcauca',
]
self.locations = [
Location('seq0',1,3),
Location('seq1',2,3),
Location('seq1',1,5),
Location('seq1',5,3),
Location('seq2',3,54),
Location('seq2',54,2),
Location('seq3',4,0),
]
self.Pvalues = [
.1,
.002,
.0000000003,
.6,
.0094,
.6,
.00201,
]
self.Evalues = [
.006,
.02,
.9,
.0200000001,
.09,
.0000003,
.900001,
]
self.modules_no_e = []
for i in xrange(7):
self.modules_no_e.append(ModuleInstance(self.sequences[i],
self.locations[i],
self.Pvalues[i]))
self.modules_p_and_e = []
for i in xrange(7):
self.modules_p_and_e.append(ModuleInstance(self.sequences[i],
self.locations[i],
self.Pvalues[i],
self.Evalues[i]))
self.module_no_template = Module(
{
(self.modules_no_e[0].Location.SeqId,
self.modules_no_e[0].Location.Start):self.modules_no_e[0],
(self.modules_no_e[1].Location.SeqId,
self.modules_no_e[1].Location.Start):self.modules_no_e[1],
(self.modules_no_e[2].Location.SeqId,
self.modules_no_e[2].Location.Start):self.modules_no_e[2],
(self.modules_no_e[3].Location.SeqId,
self.modules_no_e[3].Location.Start):self.modules_no_e[3],
(self.modules_no_e[4].Location.SeqId,
self.modules_no_e[4].Location.Start):self.modules_no_e[4],
(self.modules_no_e[5].Location.SeqId,
self.modules_no_e[5].Location.Start):self.modules_no_e[5],
(self.modules_no_e[6].Location.SeqId,
self.modules_no_e[6].Location.Start):self.modules_no_e[6],
}
)
self.module_with_template = Module(
{
(self.modules_no_e[0].Location.SeqId,
self.modules_no_e[0].Location.Start):self.modules_no_e[0],
(self.modules_no_e[1].Location.SeqId,
self.modules_no_e[1].Location.Start):self.modules_no_e[1],
(self.modules_no_e[2].Location.SeqId,
self.modules_no_e[2].Location.Start):self.modules_no_e[2],
(self.modules_no_e[3].Location.SeqId,
self.modules_no_e[3].Location.Start):self.modules_no_e[3],
(self.modules_no_e[4].Location.SeqId,
self.modules_no_e[4].Location.Start):self.modules_no_e[4],
(self.modules_no_e[5].Location.SeqId,
self.modules_no_e[5].Location.Start):self.modules_no_e[5],
(self.modules_no_e[6].Location.SeqId,
self.modules_no_e[6].Location.Start):self.modules_no_e[6],
},
Template = 'accgucg'
)
def test_init(self):
"""Init should properly initialize Module object."""
module = Module(data={(self.modules_no_e[0].Location.SeqId,
self.modules_no_e[0].Location.Start): \
self.modules_no_e[0]})
self.assertEqual(module.Template, None)
self.assertEqual(module.Alphabet, ASCII.Alphabet)
self.assertEqual(module.Pvalue, None)
self.assertEqual(module.Evalue, None)
self.assertEqual(module.keys(),[('seq0',1)])
self.assertEqual(module.values(),[ModuleInstance(self.sequences[0],
self.locations[0],
self.Pvalues[0])])
def test_cmp(self):
"""Module objects should sort properly with __cmp__ overridden."""
pvals_sorted = [3e-10, 0.002,
0.00201,
0.0094,
0.1,
0.6,
0.6]
evals_sorted = [.9,
.02,
.900001,
.09,
.006,
.0000003,
.0200000001,
]
modules = []
for instance, pvalue, evalue in zip(self.modules_no_e,
self.Pvalues,
self.Evalues):
modules.append(Module({(instance.Location.SeqId,
instance.Location.Start):instance},
Pvalue=pvalue,
Evalue=evalue))
modules.sort()
for ans, p, e in zip(modules, pvals_sorted, evals_sorted):
self.assertEqual(ans.Pvalue, p)
self.assertEqual(ans.Evalue, e)
def test_LocationDict(self):
"""LocationDict should return correct dictionary of locations."""
location_dict_ans = {
'seq0':[1],
'seq1':[1,2,3],
'seq2':[2,3],
'seq3':[0],
}
location_dict = self.module_no_template.LocationDict
self.assertEqual(location_dict,location_dict_ans)
class MotifTests(TestCase):
"""Tests for Motif class."""
def test_init(self):
"""Init should properly initialize Motif object."""
module = Module({
('a',3): ModuleInstance('guc', Location('a',3,5)),
('b',3): ModuleInstance('guc', Location('b',3,5)),
('c',8): ModuleInstance('guc', Location('c',8,10)),
})
m = Motif(module)
self.assertEqual(m.Modules,[module])
self.assertEqual(m.Info,None)
class MotifResultsTests(TestCase):
"""Tests for MotifResults class."""
def test_init(self):
"""Init should properly initialize MotifResults object."""
module = Module({
('a',3): ModuleInstance('guc', Location('a',3,5)),
('b',3): ModuleInstance('guc', Location('b',3,5)),
('c',8): ModuleInstance('guc', Location('c',8,10)),
})
motif = Motif([module])
results = {'key1':'value1','key2':'value2'}
parameters = {'parameter1':1,'parameter2':2}
mr = MotifResults([module],[motif],results,parameters)
self.assertEqual(mr.Modules,[module])
self.assertEqual(mr.Motifs,[motif])
self.assertEqual(mr.Results,results)
self.assertEqual(mr.parameter1,1)
self.assertEqual(mr.parameter2,2)
class UtilTests(TestCase):
"""Tests for utility functions."""
def test_html_color_to_rgb(self):
"""Tests for html_color_to_rgb."""
html_colors = ['#FF0000','#00FF00','#0000FF','545454']
rgb_colors = [(1.0,0.0,0.0),(0.0,1.0,0.0),(0.0,0.0,1.0),\
(0.32941176470588235,0.32941176470588235,0.32941176470588235)]
for html, rgb in zip(html_colors, rgb_colors):
self.assertEqual(html_color_to_rgb(html),rgb)
def test_make_remap_dict(self):
"""Tests for make_remap_dict."""
pass
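# A minimal sketch of the conversion that test_html_color_to_rgb checks: an
# HTML hex colour ('#RRGGBB', with the '#' optional as in '545454') becomes a
# tuple of floats in [0, 1]. hex_to_rgb_fraction is a hypothetical helper,
# not cogent's html_color_to_rgb, whose implementation is not shown here.
def hex_to_rgb_fraction(html_color):
    """Return (r, g, b) with each channel scaled from 0-255 to 0.0-1.0."""
    digits = html_color.lstrip('#')
    if len(digits) != 6:
        raise ValueError("expected six hex digits, got %r" % (html_color,))
    # Each channel is two hex digits; dividing by 255 maps FF to 1.0.
    return tuple(int(digits[i:i + 2], 16) / 255.0 for i in (0, 2, 4))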
class MotifFormatterTests(TestCase):
"""Tests for MotifFormatter class."""
def setUp(self):
"""SetUp for MotifFormatter class tests."""
self.sequences = [
'accucua',
'caucguu',
'accucua',
'cgacucg',
'cgaucag',
'cuguacc',
'cgcauca',
]
self.locations = [
Location('seq0',1,3),
Location('seq1',2,3),
Location('seq1',1,5),
Location('seq1',5,3),
Location('seq2',3,54),
Location('seq2',54,2),
Location('seq3',4,0),
]
self.Pvalues = [
.1,
.002,
.0000000003,
.6,
.0094,
.6,
.00201,
]
self.Evalues = [
.006,
.02,
.9,
.0200000001,
.09,
.0000003,
.900001,
]
self.modules_no_e = []
for i in xrange(7):
self.modules_no_e.append(ModuleInstance(self.sequences[i],
self.locations[i],
self.Pvalues[i]))
self.module_with_template = Module(
{
(self.modules_no_e[0].Location.SeqId,
self.modules_no_e[0].Location.Start):self.modules_no_e[0],
(self.modules_no_e[1].Location.SeqId,
self.modules_no_e[1].Location.Start):self.modules_no_e[1],
(self.modules_no_e[2].Location.SeqId,
self.modules_no_e[2].Location.Start):self.modules_no_e[2],
(self.modules_no_e[3].Location.SeqId,
self.modules_no_e[3].Location.Start):self.modules_no_e[3],
(self.modules_no_e[4].Location.SeqId,
self.modules_no_e[4].Location.Start):self.modules_no_e[4],
(self.modules_no_e[5].Location.SeqId,
self.modules_no_e[5].Location.Start):self.modules_no_e[5],
(self.modules_no_e[6].Location.SeqId,
self.modules_no_e[6].Location.Start):self.modules_no_e[6],
},
Template = 'accgucg', ID='1'
)
self.modules_with_ids =\
[Module({
('a',3): ModuleInstance('guc', Location('a',3,5)),
('b',3): ModuleInstance('guc', Location('b',3,5)),
('c',8): ModuleInstance('guc', Location('c',8,10)),
},ID='1'),
Module({
('a',7): ModuleInstance('cca', Location('a',7,9)),
('b',7): ModuleInstance('cca', Location('b',7,9)),
('c',11): ModuleInstance('cca',Location('c',11,13)),
},ID='2'),
Module({
('a',10): ModuleInstance('gca',Location('a',10,12)),
('b',10): ModuleInstance('gca',Location('b',10,12)),
('c',14): ModuleInstance('gca',Location('c',14,12)),
},ID='3'),
Module({
('a',13): ModuleInstance('ggg',Location('a',13,15)),
('b',13): ModuleInstance('ggg',Location('b',13,15)),
('c',18): ModuleInstance('ggg',Location('c',18,20)),
},ID='4'),
]
self.motifs_with_ids = map(Motif,self.modules_with_ids)
self.motif_results = MotifResults(Modules=self.modules_with_ids,\
Motifs=self.motifs_with_ids)
self.color_map = {'1':"""background-color: #0000FF; ; font-family: 'Courier New', Courier""",
'2':"""background-color: #FFFF00; ; font-family: 'Courier New', Courier""",
'3':"""background-color: #00FFFF; ; font-family: 'Courier New', Courier""",
'4':"""background-color: #FF00FF; ; font-family: 'Courier New', Courier""",
}
self.color_map_rgb = {
'color_1':(0.0,0.0,1.0),
'color_2':(1.0,1.0,0.0),
'color_3':(0.0,1.0,1.0),
'color_4':(1.0,0.0,1.0),
}
def test_getColorMapS0(self):
"""tests for getColorMapS0"""
mf = MotifFormatter()
module_ids = ['1','2','3','4']
self.assertEqual(mf.getColorMapS0(module_ids),self.color_map)
def test_getColorMap(self):
"""tests for getColorMap"""
mf = MotifFormatter()
self.assertEqual(mf.getColorMap(self.motif_results),self.color_map)
def test_getColorMapRgb(self):
"""tests for getColorMapRgb"""
mf = MotifFormatter()
self.assertEqual(mf.getColorMapRgb(self.motif_results),\
self.color_map_rgb)
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
__all__ = ['test_fit_function',
'test_geometry',
'test_matrix',
'test_matrix_logarithm',
'test_optimisers',
'test_stats',
'test_unifrac',
'test_spatial']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Catherine Lozupone",
"Gavin Huttley", "Sandra Smit", "Marcin Cieslik", "Antonio Gonzalez Pena"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
#!/usr/bin/env python
"""Unit tests for distance_transform.py functions.
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.maths.distance_transform import *
from numpy import array, sqrt, shape, ones, diag, zeros, asmatrix
import numpy
__author__ = "Justin Kuczynski"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Justin Kuczynski",
"Zongzhi Liu",
"Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Justin Kuczynski"
__email__ = "justinak@gmail.com"
__status__ = "Prototype"
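# A minimal NumPy sketch of the Canberra distance as the hand calculations in
# test_dist_canberra imply it: for each pair of rows, sum |x - y| / (x + y)
# over the columns where at least one row is non-zero, then divide by the
# number of such columns. It assumes non-negative abundance data.
# canberra_sketch is a hypothetical helper, not
# cogent.maths.distance_transform.dist_canberra itself.
import numpy as np

def canberra_sketch(data):
    """Return a symmetric matrix of normalised Canberra distances between rows."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    dists = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            x, y = data[i], data[j]
            # Skip double zeros: columns where both rows are 0 carry no information.
            shared = (x != 0) | (y != 0)
            if not shared.any():
                continue  # two all-zero rows stay at distance 0
            terms = np.abs(x - y)[shared] / (x + y)[shared]
            dists[i, j] = dists[j, i] = terms.sum() / shared.sum()
    return dists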
class functionTests(TestCase):
"""Tests of top-level functions."""
def setUp(self):
self.mat_test = asmatrix([[10, 10, 20],
[10, 15, 10],
[15, 5, 5]], 'float')
self.emptyarray = array([], 'd')
self.mtx1 = array([[1, 3],
[0.0, 23.1]],'d')
self.dense1 = array([[1, 3],
[5, 2],
[0.1, 22]],'d')
self.zeromtx = array([[ 0.0, 0.0, 0.0],
[ 0.0, 0.0 , 0.0],
[ 0.0, 0.0, 0.0 ],
[ 0.0, 0.0, 0.0 ]],'d')
self.sparse1 = array([[ 0.0, 0.0, 5.33],
[ 0.0, 0.0 , 0.4],
[ 1.0, 0.0, 0.0 ],
[ 0.0, 0.0, 0.0 ]],'d')
self.input_binary_dist_otu_gain1 = array([[2,1,0,0],
[1,0,0,1],
[0,0,3,0],
[0,0,0,1]])
def get_sym_mtx_from_uptri(self, mtx):
"""helper fn, only for square matrices"""
numrows, numcols = shape(mtx)
for i in range(numrows):
for j in range(i):
mtx[i,j] = mtx[j,i] # j < i: copy the upper-triangle value into the lower triangle
return mtx
def test_dist_canberra(self):
"""tests dist_canberra
tests inputs of empty mtx, zeros, and results compared with calcs done
by hand"""
self.assertFloatEqual(dist_canberra(self.zeromtx), zeros((4,4),'d'))
mtx1expected = array([[ 0.0, 46.2/52.2],
[ 46.2/52.2, 0.0 ]],'d')
self.assertFloatEqual(dist_canberra(self.mtx1), mtx1expected)
sparse1exp = ones((self.sparse1.shape[0],self.sparse1.shape[0]))
# remove diagonal
sparse1exp[0,0] = sparse1exp[1,1] = sparse1exp[2,2] = sparse1exp[3,3]\
= 0.0
sparse1exp[0,1] = sparse1exp[1,0] = ( (5.33-.4) / (5.33 + .4) )
self.assertFloatEqual(dist_canberra(self.sparse1), sparse1exp)
def test_dist_euclidean(self):
"""tests dist_euclidean
tests inputs of empty mtx, zeros, and dense1 compared with calcs done
by hand"""
self.assertFloatEqual(dist_euclidean(self.zeromtx), zeros((4,4),'d'))
dense1expected = array([[ 0.0, sqrt(17.), sqrt(.9**2 + 19**2)],
[ sqrt(17.), 0.0 , sqrt(4.9**2 + 20**2)],
[ sqrt(.9**2 + 19**2), sqrt(4.9**2 + 20**2), 0.0 ]],'d')
self.assertFloatEqual(dist_euclidean(self.dense1), dense1expected)
def test_dist_gower(self):
"""tests dist_gower
tests inputs of empty mtx, zeros, and results compared with calcs done
by hand"""
self.assertFloatEqual(dist_gower(self.zeromtx), zeros((4,4),'d'))
mtx1expected = array([[ 0.0, 2.],
[ 2., 0.0 ]],'d')
self.assertFloatEqual(dist_gower(self.mtx1), mtx1expected)
sparse1expected = array([[ 0.0, 4.93/5.33, 2, 1],
[ 4.93/5.33 , 0.0 , 1 + .4/5.33, .4/5.33],
[ 2, 1 + .4/5.33, 0,1],
[1, .4/5.33, 1, 0.0]],'d')
self.assertFloatEqual(dist_gower(self.sparse1), sparse1expected)
def test_dist_manhattan(self):
"""tests dist_manhattan
tests inputs of empty mtx, zeros, and dense1 compared with calcs done
by hand"""
self.assertFloatEqual(dist_manhattan(self.zeromtx), zeros((4,4),'d'))
dense1expected = array([[ 0.0, 5.0, 19.9],
[ 5.0, 0.0 , 24.9],
[ 19.9, 24.9, 0.0 ]],'d')
self.assertFloatEqual(dist_manhattan(self.dense1), dense1expected)
def test_dist_abund_jaccard(self):
"""dist_abund_jaccard should compute distances for dense1 and mtx1"""
mtx1_expected = array([[0, 0.25], [0.25, 0]], 'd')
self.assertEqual(dist_abund_jaccard(self.mtx1), mtx1_expected)
dense1_expected = zeros((3,3), 'd')
self.assertEqual(dist_abund_jaccard(self.dense1), dense1_expected)
sparse1_expected = array([
[0.0, 0.0, 1.0, 1.0],
[0.0, 0.0, 1.0, 1.0],
[1.0, 1.0, 0.0, 1.0],
[1.0, 1.0, 1.0, 0.0]], 'd')
self.assertEqual(dist_abund_jaccard(self.sparse1), sparse1_expected)
def test_dist_morisita_horn(self):
"""tests dist_morisita_horn
tests inputs of empty mtx, zeros, and dense1 compared with calcs done
by hand"""
self.assertFloatEqual(dist_morisita_horn(self.zeromtx),
zeros((4,4),'d'))
a = 1 - 2*69.3/(26/16. * 23.1 * 4)
mtx1expected = array([[0, a],
[a,0]],'d')
self.assertFloatEqual(dist_morisita_horn(self.mtx1),
mtx1expected)
def test_dist_bray_curtis(self):
"""tests dist_bray_curtis
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(dist_bray_curtis(self.zeromtx), zeros((4,4),'d'))
mtx1expected = array([[0, 21.1/27.1],
[21.1/27.1, 0]],'d')
self.assertFloatEqual(dist_bray_curtis(self.mtx1), mtx1expected)
def test_dist_bray_curtis_faith(self):
"""tests dist_bray_curtis_faith
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(dist_bray_curtis_faith(self.zeromtx), zeros((4,4),'d'))
mtx1expected = array([[0, 21.1/27.1],
[21.1/27.1, 0]],'d')
self.assertFloatEqual(dist_bray_curtis_faith(self.mtx1), mtx1expected)
def test_dist_soergel(self):
"""tests dist_soergel
tests inputs of empty mtx, zeros, and dense1 compared with calcs done
by hand/manhattan dist"""
self.assertFloatEqual(dist_soergel(self.zeromtx), zeros((4,4),'d'))
dense1expected = dist_manhattan(self.dense1)
dense1norm = array([[ 1, 8, 23],
[8,1,27],
[23,27,1]],'d')
dense1expected /= dense1norm
self.assertFloatEqual(dist_soergel(self.dense1), dense1expected)
def test_dist_kulczynski(self):
"""tests dist_kulczynski
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(dist_kulczynski(self.zeromtx),
zeros((4,4),'d'))
mtx1expected = array([[0, 1.-1./2.*(3./4. + 3./23.1)],
[1.-1./2.*(3./4. + 3./23.1), 0]],'d')
self.assertFloatEqual(dist_kulczynski(self.mtx1), mtx1expected)
def test_dist_pearson(self):
"""tests dist_pearson
tests inputs of empty mtx, zeros, mtx compared with calcs done
by hand, and an example from
http://davidmlane.com/hyperstat/A56626.html
"""
self.assertFloatEqual(dist_pearson(self.zeromtx), zeros((4,4),'d'))
mtx1expected = array([[0, 0],
[0, 0]],'d')
self.assertFloatEqual(dist_pearson(self.mtx1), mtx1expected)
# example 1 from http://davidmlane.com/hyperstat/A56626.html
ex1 = array([[1, 2, 3, ],
[2,5,6]],'d')
ex1res = 1 - 4./sqrt(2.*(8+2./3.))
ex1expected = array([[0, ex1res],
[ex1res, 0]],'d')
self.assertFloatEqual(dist_pearson(ex1), ex1expected)
def test_dist_spearman_approx(self):
"""tests dist_spearman_approx
tests inputs of empty mtx, zeros, and an example from wikipedia
"""
self.assertFloatEqual(dist_spearman_approx(self.zeromtx),
zeros((4,4),'d'))
# ex1 from wikipedia Spearman's_rank_correlation_coefficient 20jan2009
ex1 = array([[106 ,86 ,100 ,101 ,99 ,103 ,97 ,113 ,112 ,110],
[7,0,27,50,28,29,20,12,6,17]],'d')
ex1res = 6.*194./(10.*99.)
ex1expected = array([[0, ex1res],
[ex1res, 0]],'d')
self.assertFloatEqual(dist_spearman_approx(ex1), ex1expected)
# now binary fns
def test_binary_dist_otu_gain(self):
""" binary OTU gain functions as expected """
actual = binary_dist_otu_gain(self.input_binary_dist_otu_gain1)
expected = array([[0, 1, 2, 2],
[1, 0, 2, 1],
[1, 1, 0, 1],
[1, 0, 1, 0]])
self.assertEqual(actual,expected)
def test_binary_dist_chisq(self):
"""tests binary_dist_chisq
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_chisq(self.zeromtx),
zeros((4,4),'d'))
mtx1expected = array([[0,sqrt(9/8.)],
[ sqrt(9/8.),0]],'d')
self.assertFloatEqual(binary_dist_chisq(self.mtx1),
mtx1expected)
def test_binary_dist_chord(self):
"""tests binary_dist_chord
tests inputs of empty mtx, zeros, and results compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_chord(self.zeromtx),
zeros((4,4),'d'))
mtx1expected = array([[0,sqrt( 1/2. + (1./sqrt(2.) -1.)**2)],
[ sqrt( 1/2. + (1./sqrt(2.) -1.)**2),0]],'d')
self.assertFloatEqual(binary_dist_chord(self.mtx1),
mtx1expected)
def test_binary_dist_lennon(self):
"""tests binary_dist_lennon
tests inputs of empty mtx, zeros, and results compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_lennon(self.zeromtx),
zeros((4,4),'d'))
mtxa = array([[5.2,9,0.2],
[0,99,1],
[0,0.0,8233.1]],'d')
self.assertFloatEqual(binary_dist_lennon(mtxa),
zeros((3,3),'d') )
mtxb = array([[5.2,0,0.2, 9.2],
[0,0,0,1],
[0,3.2,0,8233.1]],'d')
mtxbexpected = array([[0,0,0.5],
[0,0,0],
[0.5,0,0]],'d')
self.assertFloatEqual(binary_dist_lennon(mtxb),
mtxbexpected)
def test_binary_dist_pearson(self):
"""tests binary_dist_pearson
tests inputs of empty mtx, zeros, and dense1 compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_pearson(self.zeromtx),
zeros((4,4),'d'))
self.assertFloatEqual(binary_dist_pearson(self.dense1), zeros((3,3)))
def test_binary_dist_jaccard(self):
"""tests binary_dist_jaccard
tests inputs of empty mtx, zeros, and sparse1 compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_jaccard(self.zeromtx),
zeros((4,4),'d'))
sparse1expected = array([[0, 0, 1., 1.],
[0, 0, 1, 1],
[1,1,0,1],
[1,1,1,0]],'d')
self.assertFloatEqual(binary_dist_jaccard(self.sparse1),
sparse1expected)
sparse1expected = dist_manhattan(self.sparse1.astype(bool))
sparse1norm = array([[ 1, 1,2,1],
[1,1,2,1],
[2,2,1,1],
[1,1,1,100]],'d')
sparse1expected /= sparse1norm
self.assertFloatEqual(binary_dist_jaccard(self.sparse1),
sparse1expected)
def test_binary_dist_ochiai(self):
"""tests binary_dist_ochiai
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_ochiai(self.zeromtx),
zeros((4,4),'d'))
mtx1expected = array([[0,1-1/sqrt(2.)],
[1-1/sqrt(2.), 0,]],'d')
self.assertFloatEqual(binary_dist_ochiai(self.mtx1),mtx1expected)
def test_binary_dist_hamming(self):
"""tests binary_dist_hamming
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_hamming(self.zeromtx),
zeros((4,4),'d'))
mtx1expected = array([[0,1],
[1, 0,]],'d')
self.assertFloatEqual(binary_dist_hamming(self.mtx1),mtx1expected)
def test_binary_dist_sorensen_dice(self):
"""tests binary_dist_sorensen_dice
tests inputs of empty mtx, zeros, and mtx1 compared with calcs done
by hand"""
self.assertFloatEqual(binary_dist_sorensen_dice(self.zeromtx),
zeros((4,4),'d'))
mtx1expected = array([[0,1/3.],
[1/3., 0,]],'d')
self.assertFloatEqual(binary_dist_sorensen_dice(self.mtx1),
mtx1expected)
sparse1expected = array([[0, 0, 1., 1.],
[0, 0, 1, 1],
[1,1,0,1],
[1,1,1,0]],'d')
self.assertFloatEqual(binary_dist_sorensen_dice(self.sparse1),
sparse1expected)
def test_binary_dist_euclidean(self):
"""tests binary_dist_euclidean
tests two inputs compared with calculations by hand, and runs zeros
and an empty input"""
dense1expected = array([[ 0.0, 0.0, 0.0],
[ 0.0, 0.0 , 0.0],
[ 0.0, 0.0, 0.0 ]],'d')
sparse1expected = zeros((4,4),'d')
sparse1expected[0,2] = sqrt(2)
sparse1expected[0,3] = 1.0
sparse1expected[1,2] = sqrt(2)
sparse1expected[1,3] = 1.0
sparse1expected[2,3] = 1.0
sparse1expected = self.get_sym_mtx_from_uptri(sparse1expected)
self.assertFloatEqual(binary_dist_euclidean(self.dense1),
dense1expected)
self.assertFloatEqual(binary_dist_euclidean(self.sparse1),
sparse1expected)
self.assertFloatEqual(binary_dist_euclidean(self.zeromtx),
zeros((4,4),'d'))
#zj's stuff
def test_chord_transform(self):
"""trans_chord should return the exp result in the ref paper."""
exp = [[ 0.40824829, 0.40824829, 0.81649658],
[ 0.48507125, 0.72760688, 0.48507125],
[ 0.90453403, 0.30151134, 0.30151134]]
res = trans_chord(self.mat_test)
self.assertFloatEqual(res, exp)
def test_chord_dist(self):
"""dist_chord should return the exp result."""
self.assertFloatEqual(dist_chord(self.zeromtx), zeros((4,4),'d'))
exp = [[ 0. , 0.46662021, 0.72311971],
[ 0.46662021, 0. , 0.62546036],
[ 0.72311971, 0.62546036, 0. ]]
dist = dist_chord(self.mat_test)
self.assertFloatEqual(dist, exp)
def test_chisq_transform(self):
"""trans_chisq should return the exp result in the ref paper."""
exp_m = [[ 0.42257713, 0.45643546, 0.84515425],
[ 0.48294529, 0.7824608 , 0.48294529],
[ 1.01418511, 0.36514837, 0.3380617 ]]
res_m = trans_chisq(self.mat_test)
self.assertFloatEqual(res_m, exp_m)
def test_chisq_distance(self):
"""dist_chisq should return the exp result."""
self.assertFloatEqual(dist_chisq(self.zeromtx), zeros((4,4),'d'))
exp_d = [[ 0. , 0.4910521 , 0.78452291],
[ 0.4910521 , 0. , 0.69091002],
[ 0.78452291, 0.69091002, 0. ]]
res_d = dist_chisq(self.mat_test)
self.assertFloatEqual(res_d, exp_d)
def test_hellinger_transform(self):
"""trans_hellinger should return the exp result in the ref paper."""
exp = [[ 0.5 , 0.5 , 0.70710678],
[ 0.53452248, 0.65465367, 0.53452248],
[ 0.77459667, 0.4472136 , 0.4472136 ]]
res = trans_hellinger(self.mat_test)
self.assertFloatEqual(res, exp)
def test_hellinger_distance(self):
"""dist_hellinger should return the exp result."""
self.assertFloatEqual(dist_hellinger(self.zeromtx), zeros((4,4),'d'))
exp = [[ 0. , 0.23429661, 0.38175149],
[ 0.23429661, 0. , 0.32907422],
[ 0.38175149, 0.32907422, 0. ]]
dist = dist_hellinger(self.mat_test)
self.assertFloatEqual(dist, exp)
def test_species_profile_transform(self):
"""trans_specprof should return the exp result."""
exp = [[ 0.25 , 0.25 , 0.5 ],
[ 0.28571429, 0.42857143, 0.28571429],
[ 0.6 , 0.2 , 0.2 ]]
res = trans_specprof(self.mat_test)
self.assertFloatEqual(res, exp)
def test_species_profile_distance(self):
"""dist_specprof should return the exp result."""
self.assertFloatEqual(dist_specprof(self.zeromtx), zeros((4,4),'d'))
exp = [[ 0. , 0.28121457, 0.46368092],
[ 0.28121457, 0. , 0.39795395],
[ 0.46368092, 0.39795395, 0. ]]
dist = dist_specprof(self.mat_test)
self.assertFloatEqual(dist, exp)
def test_dist_bray_curtis_magurran1(self):
""" zero values should return zero dist, or 1 with nonzero samples"""
res = dist_bray_curtis_magurran(
numpy.array([[0,0,0],
[0,0,0],
[1,1,1],
]))
self.assertFloatEqual(res,numpy.array([
[0,0,1],
[0,0,1],
[1,1,0],
]))
def test_dist_bray_curtis_magurran2(self):
""" should match hand-calculated values"""
res = dist_bray_curtis_magurran(
numpy.array([[1,4,3],
[1,3,5],
[0,2,0],
]))
self.assertFloatEqual(res,numpy.array([
[0,1-14/17,1-(.4)],
[1-14/17,0,1-4/11],
[1-.4,1-4/11,0],
]))
#def test_no_dupes(self):
#""" here we check all distance functions in distance_transform for
#duplicate
#results. Uses an unsafe hack to get all distance functions,
#thus disabled by default
#The dataset is from Legendre 2001, Ecologically Meaningful...
#also, doesn't actually raise an error on failing, just prints to
#stdout
#"""
#import distance_transform
## L19 dataset
#L19data = array(
#[[7,1,0,0,0,0,0,0,0],
#[4,2,0,0,0,1,0,0,0],
#[2,4,0,0,0,1,0,0,0],
#[1,7,0,0,0,0,0,0,0],
#[0,8,0,0,0,0,0,0,0],
#[0,7,1,0,0,0,0,0,0],
#[0,4,2,0,0,0,2,0,0],
#[0,2,4,0,0,0,1,0,0],
#[0,1,7,0,0,0,0,0,0],
#[0,0,8,0,0,0,0,0,0],
#[0,0,7,1,0,0,0,0,0],
#[0,0,4,2,0,0,0,3,0],
#[0,0,2,4,0,0,0,1,0],
#[0,0,1,7,0,0,0,0,0],
#[0,0,0,8,0,0,0,0,0],
#[0,0,0,7,1,0,0,0,0],
#[0,0,0,4,2,0,0,0,4],
#[0,0,0,2,4,0,0,0,1],
#[0,0,0,1,7,0,0,0,0]], 'd')
#distfns = []
#distfn_strs = dir(distance_transform)
## warning: dangerous eval, and may also pick up names that are not functions
#for fnstr in distfn_strs:
#if fnstr.find('dist') != -1:
#distfns.append(eval('%s' % fnstr))
#dist_results = []
#for distfn in distfns:
#dist_results.append(distfn(L19data))
#for i in range(len(dist_results)):
#for j in range(i):
#try:
#self.assertFloatEqual(dist_results[i], dist_results[j])
#except:
#pass # should not be equal, so catch error and proceed
#else:
#print "duplicates found: ", distfns[i], distfns[j]
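# A minimal sketch of the Hellinger transformation and distance exercised by
# test_hellinger_transform and test_hellinger_distance: each row is divided
# by its total and square-rooted, and the distance is the Euclidean distance
# between transformed rows. The helper names are hypothetical, and mapping
# all-zero rows to all-zero output is an assumption chosen to mirror the
# zero-matrix checks in those tests.
import numpy as np

def hellinger_transform_sketch(data):
    """Return sqrt(row / row_total) for every row of a non-negative matrix."""
    data = np.asarray(data, dtype=float)
    totals = data.sum(axis=1)[:, np.newaxis]
    # Avoid 0/0 for empty rows: dividing by 1 leaves them as zeros.
    safe = np.where(totals == 0, 1.0, totals)
    return np.sqrt(data / safe)

def hellinger_distance_sketch(data):
    """Return the matrix of Euclidean distances between Hellinger-transformed rows."""
    t = hellinger_transform_sketch(data)
    diff = t[:, np.newaxis, :] - t[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))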
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Unit tests for fit function.
"""
from numpy import array, arange, exp
from numpy.random import rand
from cogent.util.unit_test import TestCase, main
from cogent.maths.fit_function import fit_function
__author__ = "Antonio Gonzalez Pena"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Antonio Gonzalez Pena"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Antonio Gonzalez Pena"
__email__ = "antgonza@gmail.com"
__status__ = "Prototype"
class fit_function_test(TestCase):
"""Tests of top-level fit functions."""
def test_constant(self):
"""test constant approximation"""
# defining our fitting function
def f(x,a):
return a[0]
exp_params = [2]
x = arange(-1,1,.01)
y = f(x, exp_params)
y_noise = y + rand(len(x))
params = fit_function(x, y_noise, f, 1, 5)
self.assertFloatEqual(params, exp_params , .5)
def test_linear(self):
"""test linear approximation"""
# defining our fitting function
def f(x,a):
return (a[0]+x*a[1])
exp_params = [2, 10]
x = arange(-1,1,.01)
y = f(x, exp_params)
y_noise = y + rand(len(y))
params = fit_function(x, y_noise, f, 2, 5)
self.assertFloatEqual(params, exp_params , .5)
def test_exponential(self):
"""test exponential approximation"""
# defining our fitting function
def f(x,a):
return exp(a[0]+x*a[1])
exp_params = [2, 10]
x = arange(-1,1,.01)
y = f(x, exp_params)
y_noise = y + rand(len(y))
params = fit_function(x, y_noise, f, 2, 5)
self.assertFloatEqual(params, exp_params , .5)
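# The tests above only check that fit_function recovers parameters to within
# a tolerance. As a point of comparison (not cogent's fit_function, whose
# algorithm is not shown here), the linear case can be solved in closed form
# by ordinary least squares with numpy.linalg.lstsq. Note that
# numpy.random.rand draws noise from [0, 1), which has mean 0.5, so it shifts
# the fitted intercept up by roughly 0.5 while leaving the slope essentially
# unbiased. fit_linear_lstsq is a hypothetical helper.
import numpy as np

def fit_linear_lstsq(x, y):
    """Return [intercept, slope] minimising the squared error of a[0] + a[1]*x."""
    x = np.asarray(x, dtype=float)
    # Design matrix: a column of ones for a[0] and the x values for a[1].
    design = np.column_stack([np.ones_like(x), x])
    params, _, _, _ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return params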
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests for optimisation functions"""
from cogent.maths.function_optimisation import great_deluge, ga_evolve, \
_simple_breed, _simple_score, _simple_init, _simple_select
from cogent.util.unit_test import TestCase, main
from numpy import array
__author__ = "Daniel McDonald"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Daniel McDonald"
__email__ = "mcdonadt@colorado.edu"
__status__ = "Production"
class OptimisationFunctionsTestCase(TestCase):
"""Tests for great_deluge and ga_evolve"""
def test_great_deluge(self):
"""great_deluge should return expected values from foo() obj"""
class foo:
def __init__(self, x): self.x = x
def cost(self): return self.x
def perturb(self): return self.__class__(self.x - 1)
observed = [i for i in great_deluge(foo(5), max_total_iters=6)]
self.assertEqual(observed[0][1].x, 4)
self.assertEqual(observed[1][1].x, 3)
self.assertEqual(observed[2][1].x, 2)
self.assertEqual(observed[3][1].x, 1)
self.assertEqual(observed[4][1].x, 0)
self.assertEqual(observed[5][1].x, -1)
def test_ga_evolve(self):
"""ga_evolve should return expected values when using overloaded funcs"""
init_f = lambda x,y: [1,1,1]
score_f = lambda x,y: 5
breed_f = lambda w,x,y,z: [1,1,1]
select_f = lambda x,y: 2
expected = [(0, 2), (1, 2), (2, 2)]
observed = [i for i in ga_evolve(1, 2, 3, 0.5, score_f, breed_f, \
select_f, init_f, None, 3)]
self.assertEqual(observed, expected)
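# A minimal sketch of the great deluge idea that test_great_deluge relies on,
# assuming a minimisation setting where candidates expose cost() and
# perturb() like the foo class above. A candidate is accepted whenever its
# cost is at or below a "water level" that starts at the initial cost and is
# lowered by a fixed amount each iteration. great_deluge_sketch, its
# parameters and the fixed decay schedule are illustrative assumptions, not
# the signature or schedule of cogent's great_deluge.
def great_deluge_sketch(initial, max_iters=100, decay=0.5):
    """Yield (iteration, current_solution) after each step."""
    current = initial
    level = current.cost()
    for iteration in range(max_iters):
        candidate = current.perturb()
        # Accept any candidate that stays at or below the water level.
        if candidate.cost() <= level:
            current = candidate
        level -= decay  # the water level falls whether or not we accepted
        yield iteration, current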
class PrivateFunctionsTestCase(TestCase):
"""Tests of the private support functions for ga_evolve"""
def test_simple_breed(self):
"""simple_breed should return expected values with a modified parent"""
f = lambda: 0.5
obj = lambda: 0
obj.mutate = lambda: 1
expected = [1,1,1,1,1]
observed = _simple_breed([0, obj], 5, 1.0, f)
self.assertEqual(observed, expected)
def test_simple_score(self):
"""simple_score should return choosen value with overloaded obj"""
bar = lambda: 5
bar.score = lambda x: x
self.assertEqual(_simple_score(bar,6), 6)
def test_simple_init(self):
"""simple_init should return a simple list"""
expected = [array([0]), array([0]), array([0])]
self.assertEqual(_simple_init(array([0]), 3), expected)
def test_simple_select(self):
"""simple_select should return our hand picked selection"""
pop = ['a','b','c','d','e']
scores = [5,3,8,6,1]
best_expected = (1, 'e')
self.assertEqual(_simple_select(pop, scores), best_expected)
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests of the geometry package."""
from numpy import array, take, newaxis
from math import sqrt
from cogent.util.unit_test import TestCase, main
from cogent.maths.geometry import center_of_mass_one_array, \
center_of_mass_two_array, center_of_mass, distance, sphere_points
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class CenterOfMassTests(TestCase):
"""Tests for the center of mass functions"""
def setUp(self):
"""setUp for all CenterOfMass tests"""
self.simple = array([[1, 1, 1], [3, 1, 1], [2, 3, 2]])
self.simple_list = [[1, 1, 1], [3, 1, 1], [2, 3, 2]]
self.more_weight = array([[1, 1, 3], [3, 1, 3], [2, 3, 50]])
self.square = array([[1, 1, 25], [3, 1, 25], [3, 3, 25], [1, 3, 25]])
self.square_odd = array([[1, 1, 25], [3, 1, 4], [3, 3, 25], [1, 3, 4]])
self.sec_weight = array([[1, 25, 1], [3, 25, 1], [3, 25, 3], [1, 25, 3]])
def test_center_of_mass_one_array(self):
"""center_of_mass_one_array should behave correctly"""
com1 = center_of_mass_one_array
self.assertEqual(com1(self.simple), array([2, 2]))
self.assertEqual(com1(self.simple_list), array([2, 2]))
self.assertFloatEqual(com1(self.more_weight), array([2, 2.785714]))
self.assertEqual(com1(self.square), array([2, 2]))
self.assertEqual(com1(self.square_odd), array([2, 2]))
self.assertEqual(com1(self.sec_weight, 1), array([2, 2]))
def test_CoM_one_array_wrong(self):
"""center_of_mass_one_array should fail on wrong input"""
com1 = center_of_mass_one_array
self.assertRaises(TypeError, com1, self.simple, 'a') #weight_idx wrong
self.assertRaises(IndexError, com1, self.simple, 100) #w_idx out of range
self.assertRaises(IndexError, com1, [1, 2, 3], 2) #shape[1] out of range
def test_center_of_mass_two_array(self):
"""center_of_mass_two_array should behave correctly"""
com2 = center_of_mass_two_array
coor = take(self.square_odd, (0, 1), 1)
weights = take(self.square_odd, (2,), 1)
self.assertEqual(com2(coor, weights), array([2, 2]))
weights = weights.ravel()
self.assertEqual(com2(coor, weights), array([2, 2]))
def test_CoM_two_array_wrong(self):
"""center_of_mass_two_array should fail on wrong input"""
com2 = center_of_mass_two_array
weights = [1, 2]
self.assertRaises(TypeError, com2, self.simple, 'a') #weight_idx wrong
self.assertRaises(ValueError, com2, self.simple, weights) #not aligned
def test_center_of_mass(self):
"""center_of_mass should make right choice between functional methods
"""
com = center_of_mass
com1 = center_of_mass_one_array
com2 = center_of_mass_two_array
self.assertEqual(com(self.simple), com1(self.simple))
self.assertFloatEqual(com(self.more_weight), com1(self.more_weight))
self.assertEqual(com(self.sec_weight, 1), com1(self.sec_weight, 1))
coor = take(self.square_odd, (0, 1), 1)
weights = take(self.square_odd, (2,), 1)
self.assertEqual(com(coor, weights), com2(coor, weights))
weights = weights.ravel()
self.assertEqual(com(coor, weights), com2(coor, weights))
def test_distance(self):
"""distance should return Euclidean distance correctly."""
#for single dimension, should return difference
a1 = array([3])
a2 = array([-1])
self.assertEqual(distance(a1, a2), 4)
#for two dimensions, should work e.g. for 3, 4, 5 triangle
a1 = array([0, 0])
a2 = array([3, 4])
self.assertEqual(distance(a1, a2), 5)
#vector should be the same as itself for any dimensions
a1 = array([1.3, 23, 5.4, 2.6, -1.2])
self.assertEqual(distance(a1, a1), 0)
#should match hand-calculated case for an array
a1 = array([[1, -2], [3, 4]])
a2 = array([[1, 0], [-1, 2.5]])
self.assertEqual(distance(a1, a1), 0)
self.assertEqual(distance(a2, a2), 0)
self.assertEqual(distance(a1, a2), distance(a2, a1))
self.assertFloatEqual(distance(a1, a2), sqrt(22.25))
def test_sphere_points(self):
"""tests sphere points"""
self.assertEqual(sphere_points(1), array([[ 1., 0., 0.]]))
# def test_coords_to_symmetry(self):
# """tests symmetry expansion (TODO)"""
# pass
#
# def test_coords_to_crystal(self):
# """tests crystal expansion (TODO)"""
# pass
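# A minimal sketch of the weighted center of mass that
# center_of_mass_one_array is expected to compute in the tests above: one
# column of the input holds the weights (the last column by default, as the
# tests suggest) and the remaining columns are coordinates, averaged with
# those weights. center_of_mass_sketch is a hypothetical helper, not the
# cogent implementation.
import numpy as np

def center_of_mass_sketch(points, weight_idx=-1):
    """Return the weighted mean of the non-weight columns of points."""
    points = np.asarray(points, dtype=float)
    idx = weight_idx % points.shape[1]  # normalise negative column indices
    weights = points[:, idx]
    coords = np.delete(points, idx, axis=1)
    return np.average(coords, axis=0, weights=weights)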
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Unit tests for matrix logarithm."""
from numpy import array
from cogent.util.unit_test import TestCase, main
from cogent.maths.matrix_logarithm import logm, logm_taylor
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class logarithm_tests(TestCase):
"""Tests of top-level matrix logarithm functions."""
def test_logm(self):
"""logm results should match scipy's"""
p = array([[ 0.86758487, 0.05575623, 0.0196798 , 0.0569791 ],
[ 0.01827347, 0.93312148, 0.02109664, 0.02750842],
[ 0.04782582, 0.1375742 , 0.80046869, 0.01413129],
[ 0.23022035, 0.22306947, 0.06995306, 0.47675713]])
q = logm(p)
self.assertFloatEqual(q, \
array([[-0.15572053, 0.04947485, 0.01918653, 0.08705915],
[ 0.01405019, -0.07652296, 0.02252941, 0.03994336],
[ 0.05365208, 0.15569116, -0.22588966, 0.01654642],
[ 0.35144866, 0.31279003, 0.10478999, -0.76902868]]))
def test_logm_taylor(self):
"""logm_taylor should return same result as logm"""
q_eig = logm([[ 0.86758487, 0.05575623, 0.0196798 , 0.0569791 ],
[ 0.01827347, 0.93312148, 0.02109664, 0.02750842],
[ 0.04782582, 0.1375742 , 0.80046869, 0.01413129],
[ 0.23022035, 0.22306947, 0.06995306, 0.47675713]])
q_taylor = logm_taylor([[0.86758487, 0.05575623, 0.0196798, 0.0569791],
[ 0.01827347, 0.93312148, 0.02109664, 0.02750842],
[ 0.04782582, 0.1375742 , 0.80046869, 0.01413129],
[ 0.23022035, 0.22306947, 0.06995306, 0.47675713]])
self.assertFloatEqual(q_taylor, q_eig)
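# One standard way to compute a matrix logarithm, shown as a sketch for
# comparison with logm and logm_taylor (whose internals are not shown here):
# if M = V diag(lambda) V^-1 is diagonalisable with no eigenvalues on the
# closed negative real axis, the principal logarithm is
# V diag(log lambda) V^-1. logm_eig_sketch is a hypothetical helper.
import numpy as np

def logm_eig_sketch(m):
    """Principal matrix logarithm of a diagonalisable real matrix."""
    m = np.asarray(m, dtype=float)
    eigvals, eigvecs = np.linalg.eig(m)
    # Work in complex arithmetic so complex-conjugate eigenvalue pairs are handled.
    log_diag = np.diag(np.log(eigvals.astype(complex)))
    result = eigvecs.dot(log_diag).dot(np.linalg.inv(eigvecs))
    # For a real input matrix the imaginary parts cancel up to rounding error.
    return result.real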
if __name__ == '__main__':
main()
#!/usr/bin/env python
from __future__ import division
import time, sys, os, numpy
from cogent.util.unit_test import TestCase, main
from cogent.maths.optimisers import maximise, MaximumEvaluationsReached
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
def quartic(x):
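# Has its global minimum at x = -4 and a local minimum at x = 2 (with a local
# maximum at x = 0); the negated, scaled objective built in MakeF therefore
# has its global maximum at -4 and a secondary maximum at 2.
# http://www.wolframalpha.com/input/?i=x**2*%283*x**2%2B8*x-48%29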
return x**2*(3*x**2+8*x-48)
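# A quick check of the comment on quartic, as a sketch using NumPy's
# polynomial helpers: quartic(x) = 3x**4 + 8x**3 - 48x**2, so its derivative
# 12x**3 + 24x**2 - 96x = 12x(x + 4)(x - 2) vanishes at -4, 0 and 2, where
# quartic takes the values -512, 0 and -80. quartic_critical_points is a
# hypothetical helper added for illustration.
import numpy as np

def quartic_critical_points():
    """Return (sorted critical points, quartic values at those points)."""
    coeffs = [3, 8, -48, 0, 0]  # 3x^4 + 8x^3 - 48x^2 + 0x + 0
    roots = np.roots(np.polyder(coeffs))
    points = np.sort(roots.real)
    return points, np.polyval(coeffs, points)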
class NullFile(object):
def write(self, x):
pass
def isatty(self):
return False
def quiet(f, *args, **kw):
# Checkpointer still has print statements
orig = sys.stdout
try:
sys.stdout = NullFile()
result = f(*args, **kw)
finally:
sys.stdout = orig
return result
def MakeF():
evals = [0]
last = [0]
def f(x):
evals[0] += 1
last[0] = x
# Scaled down 10-fold to avoid having to change init_temp
return -0.1 * quartic(x)
return f, last, evals
class OptimiserTestCase(TestCase):
def _test_optimisation(self, target=-4, xinit=1.0, bounds=([-10,10]), **kw):
local = kw.get('local', None)
max_evaluations = kw.get('max_evaluations', None)
f, last, evals = MakeF()
x = quiet(maximise, f, [xinit], bounds, **kw)
self.assertEqual(x, last[0]) # important for Calculator
error = abs(x[0] - target)
self.assertTrue(error < .0001, (kw, x, target, error))
def test_global(self):
# Should find the global maximum (x = -4)
self._test_optimisation(local=False, seed=1)
def test_bounded(self):
# Global maximum (x = -4) is out of bounds, so find the secondary one at x = 2
# numpy.seterr('raise')
self._test_optimisation(bounds=([0.0],[10.0]), target=2, seed=1)
def test_local(self):
# Global maximum is not the nearest one to xinit=1.0, so a local search stops at x = 2
self._test_optimisation(local=True, target=2)
def test_limited(self):
self.assertRaises(MaximumEvaluationsReached,
self._test_optimisation, max_evaluations=5)
# def test_limited_warning(self):
# """optimiser warning if max_evaluations exceeded"""
# self._test_optimisation(max_evaluations=5, limit_action='warn')
def test_get_max_eval_count(self):
"""return the evaluation count from optimisation"""
f, last, evals = MakeF()
x, e = quiet(maximise, f, xinit=[1.0], bounds=([-10,10]),
return_eval_count=True)
self.assertTrue(e > 500)
def test_checkpointing(self):
filename = 'checkpoint.tmp.pickle'
if os.path.exists(filename):
os.remove(filename)
self._test_optimisation(filename=filename, seed=1, init_temp=10)
self._test_optimisation(filename=filename, seed=1, init_temp=10)
self.assertRaises(Exception, self._test_optimisation,
filename=filename, seed=1, init_temp=3.21)
if os.path.exists(filename):
os.remove(filename)
if __name__ == '__main__':
main()
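The optimiser tests rely on -0.1 * quartic(x) having its global maximum at x = -4 and a local maximum at x = 2, since quartic'(x) = 12x(x + 4)(x - 2). The sketch below is a hypothetical stand-in, not cogent's maximise. It uses a plain grid scan plus golden-section search to show the two outcomes the tests expect. A search restricted to [0, 10] lands on 2, where the objective is unimodal. A global scan of [-10, 10] lands on -4.

```python
import math

def quartic(x):
    # Critical points at -4 (global minimum), 0 (local maximum), 2 (local minimum)
    return x**2 * (3*x**2 + 8*x - 48)

def objective(x):
    # Same scaling as MakeF: maxima at -4 (value 51.2) and 2 (value 8.0)
    return -0.1 * quartic(x)

def golden_section_max(f, lo, hi, tol=1e-9):
    """Maximise f, assumed unimodal on [lo, hi] (hypothetical helper)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2

def grid_then_golden_max(f, lo, hi, n=201):
    """Coarse grid scan to bracket the best point, then golden-section refinement."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    best = max(range(n), key=lambda i: f(xs[i]))
    return golden_section_max(f, xs[max(best - 1, 0)], xs[min(best + 1, n - 1)])
```

The grid scan stands in for a global search stage. The golden-section step only works inside a bracket that holds a single maximum.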
# PyCogent-1.5.3/tests/test_maths/test_period.py
from numpy import arange, convolve, random, sin, pi, exp, array, zeros, float64
from cogent.util.unit_test import TestCase, main
from cogent.maths.period import ipdft, dft, auto_corr, hybrid, goertzel
# because we'll be comparing python and pyrexed implementations of the same
# algorithms I'm separating out those imports to make it clear
from cogent.maths.period import _ipdft_inner2 as py_ipdft_inner, \
_goertzel_inner as py_goertzel_inner, _autocorr_inner2 as py_autocorr_inner
try:
from cogent.maths._period import ipdft_inner as pyx_ipdft_inner, \
goertzel_inner as pyx_goertzel_inner, \
autocorr_inner as pyx_autocorr_inner
pyrex_available = True
except ImportError:
pyrex_available = False
__author__ = "Hua Ying, Julien Epps and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Julien Epps", "Hua Ying", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "Production"
class TestPeriod(TestCase):
def setUp(self):
t = arange(0, 10, 0.1)
n = random.randn(len(t))
nse = convolve(n, exp(-t/0.05))*0.1
nse = nse[:len(t)]
self.sig = sin(2*pi*t) + nse
self.p = 10
def test_inner_funcs(self):
"""python and pyrexed implementation should be the same"""
if pyrex_available is not True:
return
x = array([0.04874203, 0.56831373, 0.94267804, 0.95664485, 0.60719478,
-0.09037356, -0.69897319, -1.11239811, -0.84127485, -0.56281126,
0.02301213, 0.56250284, 1.0258557 , 1.03906527, 0.69885916,
0.10103556, -0.43248024, -1.03160503, -0.84901545, -0.84934356,
0.00323728, 0.44344594, 0.97736748, 1.01635433, 0.38538423,
0.09869918, -0.60441861, -0.90175391, -1.00166887, -0.66303249,
-0.02070569, 0.76520328, 0.93462426, 0.97011673, 0.63199999,
0.0764678 , -0.55680168, -0.92028808, -0.98481451, -0.57600588,
0.0482667 , 0.57572519, 1.02077883, 0.93271663, 0.41581696,
-0.07639671, -0.71426286, -0.97730119, -1.0370596 , -0.67919572,
0.03779302, 0.60408759, 0.87826068, 0.79126442, 0.69769622,
0.01419442, -0.42917556, -1.00100485, -0.83945546, -0.55746313,
0.12730859, 0.60057659, 0.98059721, 0.83275501, 0.69031804,
0.02277554, -0.63982729, -1.23680355, -0.79477887, -0.67773375,
-0.05204714, 0.51765381, 0.77691955, 0.8996709 , 0.5153137 ,
0.01840839, -0.65124866, -1.13269058, -0.92342177, -0.45673709,
0.11212881, 0.50153941, 1.09329507, 0.96457193, 0.80271578,
-0.0041043 , -0.81750772, -0.99259986, -0.92343788, -0.57694955,
0.13982059, 0.56653375, 0.82217563, 0.85162513, 0.3984116 ,
-0.18937514, -0.65304629, -1.0067146 , -1.0037422 , -0.68011283])
N = 100
period = 10
self.assertFloatEqual(py_goertzel_inner(x, N, period),
pyx_goertzel_inner(x, N, period))
ulim = 8
N = 8
x = array([ 0., 1., 0., -1., 0., 1., 0., -1.])
X = zeros(8, dtype='complex128')
W = array([1.00000000e+00 +2.44929360e-16j,
-1.00000000e+00 -1.22464680e-16j,
-5.00000000e-01 -8.66025404e-01j,
6.12323400e-17 -1.00000000e+00j,
3.09016994e-01 -9.51056516e-01j,
5.00000000e-01 -8.66025404e-01j,
6.23489802e-01 -7.81831482e-01j,
7.07106781e-01 -7.07106781e-01j])
py_result = py_ipdft_inner(x, X, W, ulim, N)
pyx_result = pyx_ipdft_inner(x, X, W, ulim, N)
for i, j in zip(py_result, pyx_result):
self.assertFloatEqual(abs(i), abs(j))
x = array([-0.07827614, 0.56637551, 1.01320526, 1.01536245, 0.63548361,
0.08560101, -0.46094955, -0.78065656, -0.8893556 , -0.56514145,
0.02325272, 0.63660719, 0.86291302, 0.82953598, 0.5706848 ,
0.11655242, -0.6472655 , -0.86178218, -0.96495057, -0.76098445,
-0.18911517, 0.59280646, 1.00248693, 0.89241423, 0.52475111,
-0.01620599, -0.60199278, -0.98279829, -1.12469771, -0.61355799,
0.04321191, 0.52784788, 0.68508784, 0.86015123, 0.66825756,
-0.0802846 , -0.63626753, -0.93023345, -0.99129547, -0.46891033,
0.04145813, 0.71226518, 1.01499246, 0.94726778, 0.63598143,
-0.21920589, -0.48071702, -0.86041579, -0.9046141 , -0.55714746,
-0.10052384, 0.69708969, 1.02575789, 1.16524031, 0.49895282,
-0.13068573, -0.45770419, -0.86155787, -0.9230734 , -0.6590525 ,
-0.05072955, 0.52380317, 1.02674335, 0.87778499, 0.4303284 ,
-0.01855665, -0.62858193, -0.93954774, -0.94257301, -0.49692951,
0.00699347, 0.69049074, 0.93906549, 1.06339809, 0.69337543,
0.00252569, -0.57825881, -0.88460603, -0.99259672, -0.73535697,
0.12064751, 0.91159174, 0.88966993, 1.02159917, 0.43479926,
-0.06159005, -0.61782651, -0.95284676, -0.8218889 , -0.52166419,
0.021961 , 0.52268762, 0.79428288, 1.01642697, 0.49060377,
-0.02183994, -0.52743836, -0.99363909, -1.02963821, -0.64249996])
py_xc = zeros(2*len(x)-1, dtype=float64)
pyx_xc = py_xc.copy()
N = 100
py_autocorr_inner(x, py_xc, N)
pyx_autocorr_inner(x, pyx_xc, N)
for i, j in zip(py_xc, pyx_xc):
self.assertFloatEqual(i, j)
def test_autocorr(self):
"""correctly compute autocorrelation"""
s = [1,1,1,1]
X, periods = auto_corr(s, llim=-3, ulim=None)
exp_X = array([1,2,3,4,3,2,1], dtype=float)
self.assertEqual(X, exp_X)
auto_x, auto_periods = auto_corr(self.sig, llim=2, ulim=50)
max_idx = list(auto_x).index(max(auto_x))
auto_p = auto_periods[max_idx]
self.assertEqual(auto_p, self.p)
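test_autocorr expects the full autocorrelation of [1, 1, 1, 1] to be the triangle [1, 2, 3, 4, 3, 2, 1]. In other words, the value at lag k sums x[i]*x[i+k] over the overlapping samples. The test also expects the largest value between lags 2 and 50 of a period-10 signal to sit at lag 10. Below is a direct, hypothetical NumPy sketch of that definition. It is not cogent's auto_corr, and it does not reproduce that function's llim/ulim handling.

```python
import numpy as np

def autocorr_full_sketch(x):
    """Full autocorrelation: the value at lag k is sum_i x[i] * x[i + k]."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lags = np.arange(-(n - 1), n)
    out = np.empty(len(lags))
    for j, k in enumerate(lags):
        if k >= 0:
            out[j] = np.dot(x[:n - k], x[k:])  # overlap of x with itself shifted by k
        else:
            out[j] = np.dot(x[-k:], x[:n + k])  # negative lags mirror positive ones
    return out, lags
```

For real input, this matches numpy.correlate(x, x, mode='full'). The explicit loop only makes the lag definition visible.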
def test_dft(self):
"""correctly compute discrete fourier transform"""
dft_x, dft_periods = dft(self.sig)
dft_x = abs(dft_x)
max_idx = list(dft_x).index(max(dft_x))
dft_p = dft_periods[max_idx]
self.assertEqual(int(dft_p), self.p)
def test_ipdft(self):
"""correctly compute integer discrete fourier transform"""
s = [0, 1, 0, -1, 0, 1, 0, -1]
X, periods = ipdft(s, llim=1, ulim=len(s))
exp_X = abs(array([0, 0, -1.5+0.866j, -4j, 2.927-0.951j, 1.5+0.866j,
0.302+0.627j, 0]))
X = abs(X)
self.assertFloatEqual(X, exp_X, eps=1e-3)
ipdft_x, ipdft_periods = ipdft(self.sig, llim=2, ulim=50)
ipdft_x = abs(ipdft_x)
max_idx = list(ipdft_x).index(max(ipdft_x))
ipdft_p = ipdft_periods[max_idx]
self.assertEqual(ipdft_p, self.p)
def test_goertzel(self):
"""goertzel and ipdft should be the same"""
ipdft_pwr, ipdft_prd = ipdft(self.sig, llim=10, ulim=10)
self.assertFloatEqual(goertzel(self.sig, 10), ipdft_pwr)
def test_hybrid(self):
"""correctly compute hybrid statistic"""
hybrid_x, hybrid_periods = hybrid(self.sig, llim=None, ulim=50)
hybrid_x = abs(hybrid_x)
max_idx = list(hybrid_x).index(max(hybrid_x))
hybrid_p = hybrid_periods[max_idx]
self.assertEqual(hybrid_p, self.p)
def test_hybrid_returns_all(self):
"""correctly returns hybrid, ipdft and autocorr statistics"""
ipdft_pwr, ipdft_prd = ipdft(self.sig, llim=2, ulim=50)
auto_x, auto_periods = auto_corr(self.sig, llim=2, ulim=50)
hybrid_x, hybrid_periods = hybrid(self.sig, llim=None, ulim=50)
hybrid_ipdft_autocorr_stats, hybrid_periods = hybrid(self.sig,
llim=None, ulim=50, return_all=True)
self.assertEqual(hybrid_ipdft_autocorr_stats[0], hybrid_x)
self.assertEqual(hybrid_ipdft_autocorr_stats[1], ipdft_pwr)
self.assertEqual(hybrid_ipdft_autocorr_stats[2], auto_x)
ipdft_pwr, ipdft_prd = ipdft(self.sig, llim=10, ulim=10)
auto_x, auto_periods = auto_corr(self.sig, llim=10, ulim=10)
hybrid_x, hybrid_periods = hybrid(self.sig, llim=10, ulim=10)
hybrid_ipdft_autocorr_stats, hybrid_periods = hybrid(self.sig,
llim=10, ulim=10, return_all=True)
self.assertEqual(hybrid_ipdft_autocorr_stats[0], hybrid_x)
self.assertEqual(hybrid_ipdft_autocorr_stats[1], ipdft_pwr)
self.assertEqual(hybrid_ipdft_autocorr_stats[2], auto_x)
if __name__ == '__main__':
main()
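test_goertzel expects the Goertzel statistic at period 10 to agree with the integer-period DFT term for the same period. Below is the textbook Goertzel recurrence as a hypothetical sketch. It returns the squared magnitude |X|^2 of X = sum_n x[n] exp(-2*pi*i*n/period). Whether cogent's goertzel uses this exact normalization is an assumption, not something the tests here confirm.

```python
import math
import numpy as np

def goertzel_power_sketch(x, period):
    """|sum_n x[n] * exp(-2j*pi*n/period)|**2 via the Goertzel recurrence."""
    w = 2.0 * math.pi / period
    coeff = 2.0 * math.cos(w)
    s_prev, s_prev2 = 0.0, 0.0
    for value in x:
        s = value + coeff * s_prev - s_prev2  # s[n] = x[n] + 2cos(w) s[n-1] - s[n-2]
        s_prev2, s_prev = s_prev, s
    # After the loop s_prev = s[N-1] and s_prev2 = s[N-2]
    return s_prev**2 + s_prev2**2 - coeff * s_prev * s_prev2

def direct_power_sketch(x, period):
    """The same single DFT term computed directly, for comparison."""
    n = np.arange(len(x))
    return abs(np.sum(np.asarray(x) * np.exp(-2j * np.pi * n / period))) ** 2
```

The recurrence needs one real multiply per sample for a single period. This is why it is attractive when only a few periods are of interest rather than a full transform.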
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_maths/test_svd.py
"""Unit tests for the svd-supporting functionality."""
from cogent.util.unit_test import TestCase, main
from cogent.maths.svd import ratio_two_best, ratio_best_to_sum, \
euclidean_distance, euclidean_norm, _dists_from_mean_slow, \
dists_from_v, weiss, three_item_combos, two_item_combos
from numpy import array, sqrt
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Rob Knight", "Daniel McDonald"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class functionTests(TestCase):
"""Tests of top-level functions."""
def test_ratio_two_best(self):
"""ratio_two_best should return ratio of two biggest items in list"""
v = array([3, 2, 5, 2, 4, 10, 3])
self.assertEqual(ratio_two_best(v), 2)
#should return 1 if items the same
v = array([2,2,2,2,2])
self.assertEqual(ratio_two_best(v), 1)
#check that it works on floating-point
v = array([3,2,1])
self.assertEqual(ratio_two_best(v), 1.5)
def test_ratio_best_to_sum(self):
"""ratio_best_to_sum should return ratio of biggest item to sum"""
v = [3, 2, 5, 2, 4, 10, 3]
self.assertFloatEqual(ratio_best_to_sum(v), 10/29.0)
v = [2,2,2,2,2]
self.assertEqual(ratio_best_to_sum(v), 2/10.0)
#check that it works on floating-point
v = [3,2,1]
self.assertEqual(ratio_best_to_sum(v), 0.5)
def test_euclidean_distance(self):
"""euclidean_distance should return distance between two points"""
first = array([2, 3, 4])
second = array([4, 8, 10])
self.assertEqual(euclidean_distance(first, first), 0)
self.assertEqual(euclidean_distance(second, second), 0)
self.assertFloatEqual(euclidean_distance(first, second), sqrt(65))
self.assertFloatEqual(euclidean_distance(second, first), sqrt(65))
def test_euclidean_norm(self):
"""euclidean_norm should match hand-calculated results"""
first = array([3,4])
self.assertEqual(euclidean_norm(first), 5)
def test_dists_from_mean_slow(self):
"""_dists_from_mean_slow should return distance of each item from mean"""
m = [[1,2,3,4],[2,3,4,5],[0,1,2,3]]
self.assertEqual(_dists_from_mean_slow(m), array([0.0,2.0,2.0]))
def test_dists_from_v(self):
"""dists_from_v should return distance of each item from v, or mean"""
m = [[1,2,3,4],[2,3,4,5],[0,1,2,3]]
#should calculate distances from mean by default
self.assertEqual(dists_from_v(m), array([0.0,2.0,2.0]))
#should calculate distances from vector if supplied
v = array([2,2,2,3])
self.assertEqual(dists_from_v(m, v), sqrt(array([3,9,5])))
def test_weiss(self):
"""weiss should perform weiss calculation correctly"""
e = array([12.0, 5.0, 0.1, 1e-3, 1e-15])
self.assertFloatEqual(weiss(e), 4.453018506827001)
def test_three_item_combos(self):
"""three_item_combos should return items in correct order"""
items = list(three_item_combos('abcde'))
self.assertEqual(items, map(tuple, \
['abc','abd','abe','acd','ace','ade','bcd','bce','bde','cde']))
def test_two_item_combos(self):
"""two_item_combos should return items in correct order"""
items = list(two_item_combos('abcd'))
self.assertEqual(items, map(tuple, ['ab','ac','ad','bc','bd','cd']))
def test_pca_qs(self):
"""pca_qs not tested b/c it just wraps eigenvalues(corrcoef(qs))"""
pass
def test_pca_cov_qs(self):
"""pca_cov_qs not tested b/c it just wraps eigenvalues(cov(qs))"""
pass
def test_svd_qs(self):
"""svd_qs not tested b/c it just wraps singular_value_decomposition(qs)"""
pass
if __name__ == '__main__':
main()
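The svd tests define several small helpers through examples. The hypothetical NumPy/itertools sketch below reproduces the expected values in those tests. It is not cogent.maths.svd itself. weiss is left out because its formula is not stated here. itertools.combinations yields items in the same lexicographic order that the two_item_combos and three_item_combos tests expect.

```python
from itertools import combinations
import numpy as np

def ratio_two_best_sketch(v):
    """Largest item divided by the second largest."""
    top_two = np.sort(np.asarray(v, dtype=float))[-2:]
    return top_two[1] / top_two[0]

def ratio_best_to_sum_sketch(v):
    """Largest item divided by the sum of all items."""
    v = np.asarray(v, dtype=float)
    return v.max() / v.sum()

def dists_from_v_sketch(m, v=None):
    """Euclidean distance of each row of m from v (the column mean by default)."""
    m = np.asarray(m, dtype=float)
    if v is None:
        v = m.mean(axis=0)
    return np.sqrt(((m - np.asarray(v, dtype=float)) ** 2).sum(axis=1))

def item_combos_sketch(items, k):
    """k-item combinations in lexicographic order of position."""
    return combinations(items, k)
```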
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_maths/test_unifrac/__init__.py
__all__ = ['test_fast_tree',
'test_fast_unifrac'
]
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady"]
__license__ = "All rights reserved"
__version__ = "1.5.3"
__maintainer__ = "Micah Hamady"
__email__ = "hamady@colorado.edu"
__status__ = "Prototype"
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_maths/test_unifrac/test_fast_tree.py
"""Unit tests for fast tree."""
from cogent.util.unit_test import TestCase, main
from cogent.parse.tree import DndParser
from cogent.maths.unifrac.fast_tree import (count_envs, sum_env_dict,
index_envs, get_branch_lengths, index_tree, bind_to_array,
bind_to_parent_array, _is_parent_empty, delete_empty_parents,
traverse_reduce, bool_descendants, sum_descendants, fitch_descendants,
tip_distances, UniFracTreeNode, FitchCounter, FitchCounterDense,
permute_selected_rows, prep_items_for_jackknife, jackknife_bool,
jackknife_int, unifrac, unnormalized_unifrac, PD, G, unnormalized_G,
unifrac_matrix, unifrac_vector, PD_vector, weighted_unifrac,
weighted_unifrac_matrix, weighted_unifrac_vector, jackknife_array,
env_unique_fraction, unifrac_one_sample, weighted_one_sample)
from numpy import (arange, reshape, zeros, logical_or, array, sum, nonzero,
flatnonzero, newaxis)
from numpy.random import permutation
__author__ = "Rob Knight and Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight, Micah Hamady"
__email__ = "rob@spot.colorado.edu, hamady@colorado.edu"
__status__ = "Prototype"
class fast_tree_tests(TestCase):
"""Tests of top-level functions"""
def setUp(self):
"""Define a couple of standard trees"""
self.t1 = DndParser('(((a,b),c),(d,e))', UniFracTreeNode)
self.t2 = DndParser('(((a,b),(c,d)),(e,f))', UniFracTreeNode)
self.t3 = DndParser('(((a,b,c),(d)),(e,f))', UniFracTreeNode)
self.t4 = DndParser('((c)b,((f,g,h)e,i)d)', UniFracTreeNode)
self.t4.Name = 'a'
self.t_str = '((a:1,b:2):4,(c:3,(d:1,e:1):2):3)'
self.t = DndParser(self.t_str, UniFracTreeNode)
self.env_str = """
a A 1
a C 2
b A 1
b B 1
c B 1
d B 3
e C 1"""
self.env_counts = count_envs(self.env_str.splitlines())
self.node_index, self.nodes = index_tree(self.t)
self.count_array, self.unique_envs, self.env_to_index, \
self.node_to_index = index_envs(self.env_counts, self.node_index)
self.branch_lengths = get_branch_lengths(self.node_index)
self.old_t_str = '((org1:0.11,org2:0.22,(org3:0.12,org4:0.23)g:0.33)b:0.2,(org5:0.44,org6:0.55)c:0.3,org7:0.4)'
self.old_t = DndParser(self.old_t_str, UniFracTreeNode)
self.old_env_str = """
org1 env1 1
org1 env2 1
org2 env2 1
org3 env2 1
org4 env3 1
org5 env1 1
org6 env1 1
org7 env3 1
"""
self.old_env_counts = count_envs(self.old_env_str.splitlines())
self.old_node_index, self.old_nodes = index_tree(self.old_t)
self.old_count_array, self.old_unique_envs, self.old_env_to_index, \
self.old_node_to_index = index_envs(self.old_env_counts, self.old_node_index)
self.old_branch_lengths = get_branch_lengths(self.old_node_index)
def test_traverse(self):
"""traverse should work iterative or recursive"""
stti = self.t4.traverse
stt = self.t4.traverse_recursive
obs = [i.Name for i in stt(self_before=False, self_after=False)]
exp = [i.Name for i in stti(self_before=False, self_after=False)]
self.assertEqual(obs, exp)
obs = [i.Name for i in stt(self_before=True, self_after=False)]
exp = [i.Name for i in stti(self_before=True, self_after=False)]
self.assertEqual(obs, exp)
obs = [i.Name for i in stt(self_before=False, self_after=True)]
exp = [i.Name for i in stti(self_before=False, self_after=True)]
self.assertEqual(obs, exp)
obs = [i.Name for i in stt(self_before=True, self_after=True)]
exp = [i.Name for i in stti(self_before=True, self_after=True)]
self.assertEqual(obs, exp)
def test_count_envs(self):
"""count_envs should return correct counts from lines"""
envs = """
a A 3 some other junk
a B
a C 1
b A 2
skip
c B
d
b A 99
"""
result = count_envs(envs.splitlines())
self.assertEqual(result, \
{'a':{'A':3,'B':1,'C':1},'b':{'A':99},'c':{'B':1}})
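test_count_envs defines the parsing rules through an example. Each line holds a whitespace-separated taxon, an environment and an optional count, which defaults to 1. Anything after the third field is ignored, and lines with fewer than two fields are skipped. A repeated taxon/environment pair overwrites the earlier count rather than adding to it, so b/A ends up as 99, not 101. Below is a hypothetical sketch of those rules, not cogent's count_envs or sum_env_dict.

```python
def count_envs_sketch(lines):
    """Parse 'taxon env [count] [ignored...]' lines into {taxon: {env: count}}."""
    result = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and lines like 'skip' or 'd' are ignored
        taxon, env = fields[0], fields[1]
        count = int(fields[2]) if len(fields) > 2 else 1
        result.setdefault(taxon, {})[env] = count  # a later line overwrites an earlier one
    return result

def sum_env_dict_sketch(env_dict):
    """Total of all counts in a {taxon: {env: count}} dict."""
    return sum(sum(envs.values()) for envs in env_dict.values())
```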
def test_sum_env_dict(self):
"""sum_env_dict should return correct counts from env_dict"""
envs = """
a A 3 some other junk
a B
a C 1
b A 2
skip
c B
d
b A 99
"""
result = count_envs(envs.splitlines())
sum_ = sum_env_dict(result)
self.assertEqual(sum_, 105)
def test_index_envs(self):
"""index_envs should map envs and taxa onto indices"""
self.assertEqual(self.unique_envs, ['A','B','C'])
self.assertEqual(self.env_to_index, {'A':0, 'B':1, 'C':2})
self.assertEqual(self.node_to_index,{'a':0, 'b':1, 'c':4, 'd':2, 'e':3})
self.assertEqual(self.count_array, \
array([[1,0,2],[1,1,0],[0,3,0],[0,0,1], \
[0,1,0],[0,0,0],[0,0,0],[0,0,0],[0,0,0]]))
def test_get_branch_lengths(self):
"""get_branch_lengths should make array of branch lengths from index"""
result = get_branch_lengths(self.node_index)
self.assertEqual(result, array([1,2,1,1,3,2,4,3,0]))
def test_env_unique_fraction(self):
"""should report unique fraction of bl in each env """
# testing old unique fraction
cur_count_array = self.count_array.copy()
bound_indices = bind_to_array(self.nodes, cur_count_array)
total_bl = sum(self.branch_lengths)
bool_descendants(bound_indices)
env_bl_sums, env_bl_ufracs = env_unique_fraction(self.branch_lengths, cur_count_array)
# env A has 0 unique bl, B has 4, C has 1
self.assertEqual(env_bl_sums, [0,4,1])
self.assertEqual(env_bl_ufracs, [0,4/17.0,1/17.0])
cur_count_array = self.old_count_array.copy()
bound_indices = bind_to_array(self.old_nodes, cur_count_array)
total_bl = sum(self.old_branch_lengths)
bool_descendants(bound_indices)
env_bl_sums, env_bl_ufracs = env_unique_fraction(self.old_branch_lengths, cur_count_array)
# env1 has 1.29 unique bl, env2 has 0.34, env3 has 0.63 (total bl is 2.9)
self.assertEqual(env_bl_sums, [1.29, 0.33999999999999997, 0.63])
self.assertEqual(env_bl_ufracs, [1.29/2.9,0.33999999999999997/2.9, 0.63/2.9])
def test_index_tree(self):
"""index_tree should produce correct index and node map"""
#test for first tree: contains singleton outgroup
t1 = self.t1
id_1, child_1 = index_tree(t1)
nodes_1 = [n._leaf_index for n in t1.traverse(self_before=False, \
self_after=True)]
self.assertEqual(nodes_1, [0,1,2,3,6,4,5,7,8])
self.assertEqual(child_1, [(2,0,1),(6,2,3),(7,4,5),(8,6,7)])
#test for second tree: strictly bifurcating
t2 = self.t2
id_2, child_2 = index_tree(t2)
nodes_2 = [n._leaf_index for n in t2.traverse(self_before=False, \
self_after=True)]
self.assertEqual(nodes_2, [0,1,4,2,3,5,8,6,7,9,10])
self.assertEqual(child_2, [(4,0,1),(5,2,3),(8,4,5),(9,6,7),(10,8,9)])
#test for third tree: contains trifurcation and single-child parent
t3 = self.t3
id_3, child_3 = index_tree(t3)
nodes_3 = [n._leaf_index for n in t3.traverse(self_before=False, \
self_after=True)]
self.assertEqual(nodes_3, [0,1,2,4,3,5,8,6,7,9,10])
self.assertEqual(child_3, [(4,0,2),(5,3,3),(8,4,5),(9,6,7),(10,8,9)])
def test_bind_to_array(self):
"""bind_to_array should return correct array ranges"""
a = reshape(arange(33), (11,3))
id_, child = index_tree(self.t3)
bindings = bind_to_array(child, a)
self.assertEqual(len(bindings), 5)
self.assertEqual(bindings[0][0], a[4])
self.assertEqual(bindings[0][1], a[0:3])
self.assertEqual(bindings[0][1].shape, (3,3))
self.assertEqual(bindings[1][0], a[5])
self.assertEqual(bindings[1][1], a[3:4])
self.assertEqual(bindings[1][1].shape, (1,3))
self.assertEqual(bindings[2][0], a[8])
self.assertEqual(bindings[2][1], a[4:6])
self.assertEqual(bindings[2][1].shape, (2,3))
self.assertEqual(bindings[3][0], a[9])
self.assertEqual(bindings[3][1], a[6:8])
self.assertEqual(bindings[3][1].shape, (2,3))
self.assertEqual(bindings[4][0], a[10])
self.assertEqual(bindings[4][1], a[8:10])
self.assertEqual(bindings[4][1].shape, (2,3))
def test_bind_to_parent_array(self):
"""bind_to_parent_array should bind tree to array correctly"""
a = reshape(arange(33), (11,3))
index_tree(self.t3)
bindings = bind_to_parent_array(self.t3, a)
self.assertEqual(len(bindings), 10)
self.assertEqual(bindings[0][0], a[8])
self.assertEqual(bindings[0][1], a[10])
self.assertEqual(bindings[1][0], a[4])
self.assertEqual(bindings[1][1], a[8])
self.assertEqual(bindings[2][0], a[0])
self.assertEqual(bindings[2][1], a[4])
self.assertEqual(bindings[3][0], a[1])
self.assertEqual(bindings[3][1], a[4])
self.assertEqual(bindings[4][0], a[2])
self.assertEqual(bindings[4][1], a[4])
self.assertEqual(bindings[5][0], a[5])
self.assertEqual(bindings[5][1], a[8])
self.assertEqual(bindings[6][0], a[3])
self.assertEqual(bindings[6][1], a[5])
self.assertEqual(bindings[7][0], a[9])
self.assertEqual(bindings[7][1], a[10])
self.assertEqual(bindings[8][0], a[6])
self.assertEqual(bindings[8][1], a[9])
self.assertEqual(bindings[9][0], a[7])
self.assertEqual(bindings[9][1], a[9])
def test_delete_empty_parents(self):
"""delete_empty_parents should remove empty parents from bound indices"""
id_to_node, node_first_last = index_tree(self.t)
bound_indices = bind_to_array(node_first_last, self.count_array[:,0:1])
bool_descendants(bound_indices)
self.assertEqual(len(bound_indices), 4)
deleted = delete_empty_parents(bound_indices)
self.assertEqual(len(deleted), 2)
for d in deleted:
self.assertEqual(d[0][0], 1)
def test_traverse_reduce(self):
"""traverse_reduce should reduce array in traversal order."""
id_, child = index_tree(self.t3)
a = zeros((11,3)) + 99 #fill with junk
bindings = bind_to_array(child, a)
#load in leaf envs
a[0] = a[1] = a[2] = a[7] = [0,1,0]
a[3] = [1,0,0]
a[6] = [0,0,1]
f = logical_or.reduce
traverse_reduce(bindings, f)
self.assertEqual(a,\
array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,1,0],[1,0,0],\
[0,0,1],[0,1,0],[1,1,0],[0,1,1],[1,1,1]])
)
f = sum
traverse_reduce(bindings, f)
self.assertEqual( a, \
array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,3,0],[1,0,0],\
[0,0,1],[0,1,0],[1,3,0],[0,1,1],[1,4,1]])
)
def test_bool_descendants(self):
"""bool_descendants should be true if any descendant true"""
#self.t3 = DndParser('(((a,b,c),(d)),(e,f))', UniFracTreeNode)
id_, child = index_tree(self.t3)
a = zeros((11,3)) + 99 #fill with junk
bindings = bind_to_array(child, a)
#load in leaf envs
a[0] = a[1] = a[2] = a[7] = [0,1,0]
a[3] = [1,0,0]
a[6] = [0,0,1]
bool_descendants(bindings)
self.assertEqual(a, \
array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,1,0],[1,0,0],\
[0,0,1],[0,1,0],[1,1,0],[0,1,1],[1,1,1]])
)
def test_sum_descendants(self):
"""sum_descendants should sum total descendants w/ each state"""
id_, child = index_tree(self.t3)
a = zeros((11,3)) + 99 #fill with junk
bindings = bind_to_array(child, a)
#load in leaf envs
a[0] = a[1] = a[2] = a[7] = [0,1,0]
a[3] = [1,0,0]
a[6] = [0,0,1]
sum_descendants(bindings)
self.assertEqual(a, \
array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,3,0],[1,0,0],\
[0,0,1],[0,1,0],[1,3,0],[0,1,1],[1,4,1]])
)
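The reduction tests walk a list of (parent row, first child row, last child row) ranges from the leaves up. These are the same tuples that test_index_tree expects for self.t3. For each range, a reduction of the child rows is written into the parent row. Logical OR gives presence, as in bool_descendants, and sum gives counts, as in sum_descendants. The hypothetical sketch below works on explicit row ranges instead of cogent's bound array views.

```python
import numpy as np

# (parent_row, first_child_row, last_child_row) for '(((a,b,c),(d)),(e,f))',
# copied from the child_3 list in test_index_tree; children precede parents.
T3_CHILD_RANGES = [(4, 0, 2), (5, 3, 3), (8, 4, 5), (9, 6, 7), (10, 8, 9)]

def traverse_reduce_sketch(a, child_ranges, reduce_f):
    """Overwrite each parent row with reduce_f over its child rows, in place."""
    for parent, first, last in child_ranges:
        a[parent] = reduce_f(a[first:last + 1], axis=0)
    return a

def leaf_states():
    """Leaf rows as in the tests; internal rows start as junk (99)."""
    a = np.zeros((11, 3)) + 99
    a[0] = a[1] = a[2] = a[7] = [0, 1, 0]
    a[3] = [1, 0, 0]
    a[6] = [0, 0, 1]
    return a
```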
def test_fitch_descendants(self):
"""fitch_descendants should assign states by fitch parsimony, ret. #"""
id_, child = index_tree(self.t3)
a = zeros((11,3)) + 99 #fill with junk
bindings = bind_to_array(child, a)
#load in leaf envs
a[0] = a[1] = a[2] = a[7] = [0,1,0]
a[3] = [1,0,0]
a[6] = [0,0,1]
changes = fitch_descendants(bindings)
self.assertEqual(changes, 2)
self.assertEqual(a, \
array([[0,1,0],[0,1,0],[0,1,0],[1,0,0],[0,1,0],[1,0,0],\
[0,0,1],[0,1,0],[1,1,0],[0,1,1],[0,1,0]])
)
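test_fitch_descendants expects two state changes on self.t3 and a root state set of {B}. The classic Fitch rule produces that count. A parent takes the intersection of its children's state sets when that intersection is non-empty. Otherwise it takes their union, at the cost of one change. Below is a hypothetical sketch on the (parent, first child, last child) ranges that test_index_tree lists for self.t3. At the multifurcating (a,b,c) node it simply intersects all three children, which is an assumption about how multifurcations are handled. The FitchCounter and FitchCounterDense variants are not reproduced.

```python
import numpy as np

T3_CHILD_RANGES = [(4, 0, 2), (5, 3, 3), (8, 4, 5), (9, 6, 7), (10, 8, 9)]

def fitch_sketch(a, child_ranges):
    """Bottom-up Fitch pass over boolean state sets; returns (changes, states)."""
    a = np.asarray(a, dtype=bool).copy()
    changes = 0
    for parent, first, last in child_ranges:
        children = a[first:last + 1]
        common = np.logical_and.reduce(children, axis=0)
        if common.any():
            a[parent] = common  # children share a state: no change needed
        else:
            a[parent] = np.logical_or.reduce(children, axis=0)
            changes += 1  # disjoint sets: one state change on this node
    return changes, a
```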
def test_fitch_descendants_missing_data(self):
"""fitch_descendants should work with missing data"""
#tree and envs for testing missing values
t_str = '(((a:1,b:2):4,(c:3,d:1):2):1,(e:2,f:1):3);'
env_str = """a A
b B
c D
d C
e C
f D"""
t = DndParser(t_str, UniFracTreeNode)
node_index, nodes = index_tree(t)
env_counts = count_envs(env_str.split('\n'))
count_array, unique_envs, env_to_index, node_to_index = \
index_envs(env_counts, node_index)
branch_lengths = get_branch_lengths(node_index)
#test just the AB pair
ab_counts = count_array[:, 0:2]
bindings = bind_to_array(nodes, ab_counts)
changes = fitch_descendants(bindings, counter=FitchCounter)
self.assertEqual(changes, 1)
orig_result = ab_counts.copy()
#check that the original Fitch counter gives the expected
#incorrect parsimony result
changes = fitch_descendants(bindings, counter=FitchCounterDense)
self.assertEqual(changes, 5)
new_result = ab_counts.copy()
#check that the two versions fill the array with the same values
self.assertEqual(orig_result, new_result)
def test_tip_distances(self):
"""tip_distances should set tips to correct distances."""
t = self.t
bl = self.branch_lengths.copy()[:,newaxis]
bindings = bind_to_parent_array(t, bl)
tips = []
for n in t.traverse(self_before=False, self_after=True):
if not n.Children:
tips.append(n._leaf_index)
tip_distances(bl, bindings, tips)
self.assertEqual(bl, array([5,6,6,6,6,0,0,0,0])[:,newaxis])
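test_tip_distances expects each tip's value to become its root-to-tip path length. For a, b, d, e and c in '((a:1,b:2):4,(c:3,(d:1,e:1):2):3)', that gives [5, 6, 6, 6, 6]. The hypothetical sketch below walks an explicit parent map rather than cogent's bound parent arrays. The internal node names are made up for illustration.

```python
# Branch length above each node and its parent, transcribed by hand from
# '((a:1,b:2):4,(c:3,(d:1,e:1):2):3)'; 'ab', 'de', 'cde', 'root' are made-up names.
BRANCH_LENGTH = {'a': 1, 'b': 2, 'ab': 4, 'c': 3, 'd': 1, 'e': 1, 'de': 2, 'cde': 3}
PARENT = {'a': 'ab', 'b': 'ab', 'ab': 'root', 'c': 'cde',
          'd': 'de', 'e': 'de', 'de': 'cde', 'cde': 'root'}

def root_to_tip_distance(node, parent=PARENT, length=BRANCH_LENGTH):
    """Sum branch lengths from node up to the root, which has no parent entry."""
    total = 0
    while node in parent:
        total += length[node]
        node = parent[node]
    return total
```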
def test_permute_selected_rows(self):
"""permute_selected_rows should switch just the selected rows in a"""
orig = reshape(arange(8),(4,2))
new = orig.copy()
fake_permutation = lambda a: range(a)[::-1] #reverse order
permute_selected_rows([0,2], orig, new, fake_permutation)
self.assertEqual(new, array([[4,5],[2,3],[0,1],[6,7]]))
#make sure we didn't change orig
self.assertEqual(orig, reshape(arange(8), (4,2)))
def test_prep_items_for_jackknife(self):
"""prep_items_for_jackknife should expand indices of repeated counts"""
a = array([0,1,0,1,2,0,3])
# 0 1 2 3 4 5 6
result = prep_items_for_jackknife(a)
exp = array([1,3,4,4,6,6,6])
self.assertEqual(result, exp)
def test_jackknife_bool(self):
"""jackknife_bool should make a vector with right number of nonzeros"""
fake_permutation = lambda a: range(a)[::-1] #reverse order
orig_vec = array([0,0,1,0,1,1,0,1,1])
orig_items = flatnonzero(orig_vec)
length = len(orig_vec)
result = jackknife_bool(orig_items, 3, len(orig_vec), fake_permutation)
self.assertEqual(result, array([0,0,0,0,0,1,0,1,1]))
#returns the original if trying to take too many
self.assertEqual(jackknife_bool(orig_items, 20, len(orig_vec)), \
orig_vec)
def test_jackknife_int(self):
"""jackknife_int should make a vector with right counts"""
orig_vec = array([0,2,1,0,3,1])
orig_items = array([1,1,2,4,4,4,5])
# 0 1 2 3 4 5 6
fake_permutation = lambda a: a == 7 and array([4,6,3,1,2,6,5])
result = jackknife_int(orig_items, 4, len(orig_vec), fake_permutation)
self.assertEqual(result, array([0,1,0,0,2,1]))
#returns the original if trying to take too many
self.assertEqual(jackknife_int(orig_items, 20, len(orig_vec)), \
orig_vec)
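prep_items_for_jackknife expands a count vector into a list of indices, each repeated as many times as its count. jackknife_int then draws a subsample of those items without replacement and counts them back into a vector. When asked for more items than exist, it returns the original counts. Below is a hypothetical NumPy sketch of that behaviour, checked against the fake-permutation example in test_jackknife_int. It is not cogent's implementation.

```python
import numpy as np

def prep_items_sketch(counts):
    """[0, 1, 0, 1, 2, 0, 3] -> [1, 3, 4, 4, 6, 6, 6]: index i repeated counts[i] times."""
    counts = np.asarray(counts)
    return np.repeat(np.arange(len(counts)), counts)

def jackknife_int_sketch(items, n, length, permutation_f=np.random.permutation):
    """Keep n of the expanded items (chosen by permutation_f) and recount them."""
    items = np.asarray(items)
    if n >= len(items):
        return np.bincount(items, minlength=length)  # too many requested: original counts
    keep = items[np.asarray(permutation_f(len(items)))[:n]]
    return np.bincount(keep, minlength=length)
```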
def test_jackknife_array(self):
"""jackknife_array should make a new array with right counts"""
orig_vec1 = array([0,2,2,3,1])
orig_vec2 = array([2,2,1,2,2])
test_array = array([orig_vec1, orig_vec2])
# TODO: also test with a deterministic fake permutation; for now only the
# marginal sums of the resampled arrays are checked
new_mat1 = jackknife_array(test_array, 1, axis=1, jackknife_f=jackknife_int, permutation_f=permutation)
self.assertEqual(new_mat1.sum(axis=0), [1,1,1,1,1])
new_mat2 = jackknife_array(test_array, 2, axis=1, jackknife_f=jackknife_int, permutation_f=permutation)
self.assertEqual(new_mat2.sum(axis=0), [2,2,2,2,2])
new_mat3 = jackknife_array(test_array, 2, axis=0, jackknife_f=jackknife_int, permutation_f=permutation)
self.assertEqual(new_mat3.sum(axis=1), [2,2])
# test that you get orig mat back if too many
self.assertEqual(jackknife_array(test_array, 20, axis=1), test_array)
def test_unifrac(self):
"""unifrac should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
self.assertEqual(unifrac(bl, m[:,0], m[:,1]), 10/16.0)
self.assertEqual(unifrac(bl, m[:,0], m[:,2]), 8/13.0)
self.assertEqual(unifrac(bl, m[:,1], m[:,2]), 8/17.0)
def test_unnormalized_unifrac(self):
"""unnormalized unifrac should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
self.assertEqual(unnormalized_unifrac(bl, m[:,0], m[:,1]), 10/17.)
self.assertEqual(unnormalized_unifrac(bl, m[:,0], m[:,2]), 8/17.)
self.assertEqual(unnormalized_unifrac(bl, m[:,1], m[:,2]), 8/17.)
def test_PD(self):
"""PD should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
self.assertEqual(PD(bl, m[:,0]), 7)
self.assertEqual(PD(bl, m[:,1]), 15)
self.assertEqual(PD(bl, m[:,2]), 11)
def test_G(self):
"""G should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
self.assertEqual(G(bl, m[:,0], m[:,0]), 0)
self.assertEqual(G(bl, m[:,0], m[:,1]), 1/16.0)
self.assertEqual(G(bl, m[:,1], m[:,0]), 9/16.0)
def test_unnormalized_G(self):
"""unnormalized_G should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
self.assertEqual(unnormalized_G(bl, m[:,0], m[:,0]), 0/17.)
self.assertEqual(unnormalized_G(bl, m[:,0], m[:,1]), 1/17.)
self.assertEqual(unnormalized_G(bl, m[:,1], m[:,0]), 9/17.)
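The expected values in the unifrac, PD and G tests all follow from branch-length sums over boolean presence vectors:

- UniFrac is the length of branches present in exactly one sample divided by the length present in either.
- The unnormalized variants divide by the whole tree's length instead, which is 17 here.
- PD is the branch length present in a single sample.
- G counts branches present in the first sample but not the second, which makes it asymmetric.

Below is a hypothetical sketch that reproduces those numbers, plus a pairwise matrix in the spirit of unifrac_matrix. It is not cogent's fast_tree functions.

```python
import numpy as np

def _as_bool(*vectors):
    return [np.asarray(v, dtype=bool) for v in vectors]

def unifrac_sketch(bl, i, j):
    i, j = _as_bool(i, j)
    return bl[i ^ j].sum() / bl[i | j].sum()  # unique length / shared-or-unique length

def unnormalized_unifrac_sketch(bl, i, j):
    i, j = _as_bool(i, j)
    return bl[i ^ j].sum() / bl.sum()

def pd_sketch(bl, i):
    (i,) = _as_bool(i)
    return bl[i].sum()

def g_sketch(bl, i, j):
    i, j = _as_bool(i, j)
    return bl[i & ~j].sum() / bl[i | j].sum()  # length only in i

def unnormalized_g_sketch(bl, i, j):
    i, j = _as_bool(i, j)
    return bl[i & ~j].sum() / bl.sum()

def pairwise_matrix_sketch(bl, m, metric=unifrac_sketch):
    """metric(column r, column c) for every ordered pair of sample columns."""
    n = m.shape[1]
    result = np.zeros((n, n))
    for r in range(n):
        for c in range(n):
            if r != c:
                result[r, c] = metric(bl, m[:, r], m[:, c])
    return result

# Branch lengths and presence matrix from the tests above
bl = np.array([1, 2, 1, 1, 3, 2, 4, 3, 0], dtype=float)
m = np.array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],[0,1,1],[1,1,1]])
```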
def test_unifrac_matrix(self):
"""unifrac_matrix should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
result = unifrac_matrix(bl, m)
self.assertEqual(result, array([[0, 10/16.,8/13.],[10/16.,0,8/17.],\
[8/13.,8/17.,0]]))
#should work if we tell it the measure is asymmetric
result = unifrac_matrix(bl, m, is_symmetric=False)
self.assertEqual(result, array([[0, 10/16.,8/13.],[10/16.,0,8/17.],\
[8/13.,8/17.,0]]))
#should work if the measure really is asymmetric
result = unifrac_matrix(bl,m,metric=unnormalized_G,is_symmetric=False)
self.assertEqual(result, array([[0, 1/17.,2/17.],[9/17.,0,6/17.],\
[6/17.,2/17.,0]]))
#should also match web site calculations
envs = self.count_array
bound_indices = bind_to_array(self.nodes, envs)
bool_descendants(bound_indices)
result = unifrac_matrix(bl, envs)
exp = array([[0, 0.6250, 0.6154], [0.6250, 0, \
0.4706], [0.6154, 0.4706, 0]])
assert (abs(result - exp)).max() < 0.001
def test_unifrac_one_sample(self):
"""unifrac_one_sample should match unifrac_matrix"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
result = unifrac_matrix(bl, m)
for i in range(len(result)):
one_sam_res = unifrac_one_sample(i, bl, m)
self.assertEqual(result[i], one_sam_res)
self.assertEqual(result[:,i], one_sam_res)
#should work ok on asymmetric metrics
result = unifrac_matrix(bl,m,metric=unnormalized_G,is_symmetric=False)
for i in range(len(result)):
one_sam_res = unifrac_one_sample(i, bl, m, metric=unnormalized_G)
self.assertEqual(result[i], one_sam_res)
# only require row for asym
# self.assertEqual(result[:,i], one_sam_res)
def test_unifrac_vector(self):
"""unifrac_vector should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
result = unifrac_vector(bl, m)
self.assertFloatEqual(result, array([10./17,6./17,7./17]))
def test_PD_vector(self):
"""PD_vector should return correct results for model tree"""
m = array([[1,0,1],[1,1,0],[0,1,0],[0,0,1],[0,1,0],[0,1,1],[1,1,1],\
[0,1,1],[1,1,1]])
bl = self.branch_lengths
result = PD_vector(bl, m)
self.assertFloatEqual(result, array([7,15,11]))
def test_weighted_unifrac_matrix(self):
"""weighted unifrac matrix should ret correct results for model tree"""
#should match web site calculations
envs = self.count_array
bound_indices = bind_to_array(self.nodes, envs)
sum_descendants(bound_indices)
bl = self.branch_lengths
tip_indices = [n._leaf_index for n in self.t.tips()]
result = weighted_unifrac_matrix(bl, envs, tip_indices)
exp = array([[0, 9.1, 4.5], [9.1, 0, \
6.4], [4.5, 6.4, 0]])
assert (abs(result - exp)).max() < 0.001
#should work with branch length corrections
td = bl.copy()[:,newaxis]
tip_bindings = bind_to_parent_array(self.t, td)
tips = [n._leaf_index for n in self.t.tips()]
tip_distances(td, tip_bindings, tips)
result = weighted_unifrac_matrix(bl, envs, tip_indices, bl_correct=True,
tip_distances=td)
exp = array([[0, 9.1/11.5, 4.5/(10.5+1./3)], [9.1/11.5, 0, \
6.4/(11+1./3)], [4.5/(10.5+1./3), 6.4/(11+1./3), 0]])
assert (abs(result - exp)).max() < 0.001
def test_weighted_one_sample(self):
"""weighted one sample should match weighted matrix"""
#should match web site calculations
envs = self.count_array
bound_indices = bind_to_array(self.nodes, envs)
sum_descendants(bound_indices)
bl = self.branch_lengths
tip_indices = [n._leaf_index for n in self.t.tips()]
result = weighted_unifrac_matrix(bl, envs, tip_indices)
for i in range(len(result)):
one_sam_res = weighted_one_sample(i, bl, envs, tip_indices)
self.assertEqual(result[i], one_sam_res)
self.assertEqual(result[:,i], one_sam_res)
#should work with branch length corrections
td = bl.copy()[:,newaxis]
tip_bindings = bind_to_parent_array(self.t, td)
tips = [n._leaf_index for n in self.t.tips()]
tip_distances(td, tip_bindings, tips)
result = weighted_unifrac_matrix(bl, envs, tip_indices, bl_correct=True,
tip_distances=td)
for i in range(len(result)):
one_sam_res = weighted_one_sample(i, bl, envs, tip_indices,
bl_correct=True, tip_distances=td)
self.assertEqual(result[i], one_sam_res)
self.assertEqual(result[:,i], one_sam_res)
def test_weighted_unifrac_vector(self):
"""weighted_unifrac_vector should ret correct results for model tree"""
envs = self.count_array
bound_indices = bind_to_array(self.nodes, envs)
sum_descendants(bound_indices)
bl = self.branch_lengths
tip_indices = [n._leaf_index for n in self.t.tips()]
result = weighted_unifrac_vector(bl, envs, tip_indices)
self.assertFloatEqual(result[0], sum([
abs(1./2 - 2./8)*1,
abs(1./2 - 1./8)*2,
abs(0 - 1./8)*3,
abs(0 - 3./8)*1,
abs(0 - 1./8)*1,
abs(0 - 4./8)*2,
abs(2./2 - 3./8)*4,
abs(0. - 5./8)*3.]))
self.assertFloatEqual(result[1], sum([
abs(0-.6)*1,
abs(.2-.2)*2,
abs(.2-0)*3,
abs(.6-0)*1,
abs(0-.2)*1,
abs(.6-.2)*2,
abs(.2-.8)*4,
abs(.8-.2)*3]))
self.assertFloatEqual(result[2], sum([
abs(2./3-1./7)*1,
abs(0-2./7)*2,
abs(0-1./7)*3,
abs(0-3./7)*1,
abs(1./3-0)*1,
abs(1./3-3./7)*2,
abs(2./3-3./7)*4,
abs(1./3-4./7)*3]))
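# Hand-check sketch (not part of cogent; all sketch_*/SKETCH_* names are
# hypothetical): the expected values 9.1, 4.5 and 6.4 in
# test_weighted_unifrac_matrix can be reproduced from first principles,
# assuming self.count_array encodes the same model tree and counts that
# test_fast_unifrac.py uses:
#   tree ((a:1,b:2):4,(c:3,(d:1,e:1):2):3)
#   A = {a:1, b:1}, B = {b:1, c:1, d:3}, C = {a:2, e:1}
# Raw weighted UniFrac sums branch_length * |p_x - p_y| over all branches,
# where p is the fraction of an environment's sequences below the branch.
# The bl_correct=True values divide this by the sum over tips of
# root_to_tip_distance * (p_x + p_y).
from fractions import Fraction

# (branch length, tips below the branch) for ((a:1,b:2):4,(c:3,(d:1,e:1):2):3)
SKETCH_BRANCHES = [(1, "a"), (2, "b"), (4, "ab"), (3, "c"), (1, "d"),
                   (1, "e"), (2, "de"), (3, "cde")]
# distance from the root to each tip
SKETCH_TIP_DISTANCES = {"a": 5, "b": 6, "c": 6, "d": 6, "e": 6}
# sequence counts per environment and tip
SKETCH_COUNTS = {"A": {"a": 1, "b": 1},
                 "B": {"b": 1, "c": 1, "d": 3},
                 "C": {"a": 2, "e": 1}}


def sketch_weighted_unifrac(x, y, normalize=False):
    """Weighted UniFrac between environments x and y, as an exact Fraction."""
    x_total = sum(SKETCH_COUNTS[x].values())
    y_total = sum(SKETCH_COUNTS[y].values())

    def share(env, total, tips):
        # fraction of env's sequences found on the given tips
        return Fraction(sum(SKETCH_COUNTS[env].get(t, 0) for t in tips), total)

    raw = sum(length * abs(share(x, x_total, tips) - share(y, y_total, tips))
              for length, tips in SKETCH_BRANCHES)
    if not normalize:
        return raw
    # branch-length correction: root-to-tip distances weighted by both
    # environments' tip proportions
    denom = sum(dist * (share(x, x_total, tip) + share(y, y_total, tip))
                for tip, dist in SKETCH_TIP_DISTANCES.items())
    return raw / denom

# sketch_weighted_unifrac("A", "B") returns Fraction(91, 10), i.e. 9.1, and
# with normalize=True returns Fraction(91, 115), i.e. 9.1/11.5.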
if __name__ == '__main__': #run if called from command-line
main()
PyCogent-1.5.3/tests/test_maths/test_unifrac/test_fast_unifrac.py
#!/usr/bin/env python
"""Unit tests for fast unifrac."""
from __future__ import division
from numpy import array, logical_not, argsort
from cogent.util.unit_test import TestCase, main
from cogent.parse.tree import DndParser
from cogent.maths.unifrac.fast_tree import (count_envs, index_tree, index_envs,
get_branch_lengths)
from cogent.maths.unifrac.fast_unifrac import (reshape_by_name,
meta_unifrac, shuffle_tipnames, weight_equally, weight_by_num_tips,
weight_by_branch_length, weight_by_num_seqs, get_all_env_names,
consolidate_skipping_missing_matrices, consolidate_missing_zero,
consolidate_missing_one, consolidate_skipping_missing_values,
UniFracTreeNode, mcarlo_sig, num_comps, fast_unifrac,
fast_unifrac_whole_tree, PD_whole_tree, PD_generic_whole_tree,
TEST_ON_TREE, TEST_ON_ENVS, TEST_ON_PAIRWISE, shared_branch_length,
shared_branch_length_to_root, fast_unifrac_one_sample)
from numpy.random import permutation
__author__ = "Rob Knight and Micah Hamady"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Micah Hamady", "Daniel McDonald",
"Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight, Micah Hamady"
__email__ = "rob@spot.colorado.edu, hamady@colorado.edu"
__status__ = "Prototype"
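# Hand-check sketch (not part of cogent; all sketch_*/SKETCH_* names are
# hypothetical): one way to reproduce the "calculated by hand" values used in
# the tests below (UniFrac 10/16, 8/13, 8/17 and PD 7, 15, 11) from self.t
# and self.env_str. PD is the total length of branches with at least one of
# an environment's tips beneath them; unweighted UniFrac is the length of
# branches covered by exactly one of two environments divided by the length
# covered by either. The tree is hard-coded as (branch length, tips below)
# pairs rather than parsed.
from fractions import Fraction

SKETCH_TREE = [(1, "a"), (2, "b"), (4, "ab"), (3, "c"), (1, "d"), (1, "e"),
               (2, "de"), (3, "cde")]
SKETCH_ENV_TIPS = {"A": set("ab"), "B": set("bcd"), "C": set("ae")}


def sketch_covered(env):
    """Indices of branches that have at least one tip of env below them."""
    return set(i for i, (_, tips) in enumerate(SKETCH_TREE)
               if SKETCH_ENV_TIPS[env] & set(tips))


def sketch_length(indices):
    """Total length of the given branches."""
    return sum(SKETCH_TREE[i][0] for i in indices)


def sketch_PD(env):
    """Phylogenetic diversity: branch length covered by one environment."""
    return sketch_length(sketch_covered(env))


def sketch_unweighted_unifrac(x, y):
    """Branch length unique to x or y over branch length covered by either."""
    cx, cy = sketch_covered(x), sketch_covered(y)
    return Fraction(sketch_length(cx ^ cy), sketch_length(cx | cy))

# [sketch_PD(e) for e in "ABC"] == [7, 15, 11]
# sketch_unweighted_unifrac("B", "C") == Fraction(8, 17)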
class unifrac_tests(TestCase):
"""Tests of top-level functions."""
def setUp(self):
"""Define some standard trees."""
self.t_str = '((a:1,b:2):4,(c:3,(d:1,e:1):2):3)'
self.t = DndParser(self.t_str, UniFracTreeNode)
self.env_str = """
a A 1
a C 2
b A 1
b B 1
c B 1
d B 3
e C 1"""
self.env_counts = count_envs(self.env_str.splitlines())
self.missing_env_str = """
a A 1
a C 2
e C 1"""
self.missing_env_counts = count_envs(self.missing_env_str.splitlines())
self.extra_tip_str = """
q A 1
w C 2
e A 1
r B 1
t B 1
y B 3
u C 1"""
self.extra_tip_counts = count_envs(self.extra_tip_str.splitlines())
self.wrong_tip_str = """
q A 1
w C 2
r B 1
t B 1
y B 3
u C 1"""
self.wrong_tip_counts = count_envs(self.wrong_tip_str.splitlines())
self.t2_str = '(((a:1,b:1):1,c:5):2,d:4)'
self.t2 = DndParser(self.t2_str, UniFracTreeNode)
self.env2_str = """
a B 1
b A 1
c A 2
c C 2
d B 1
d C 1"""
self.env2_counts = count_envs(self.env2_str.splitlines())
self.trees = [self.t, self.t2]
self.envs = [self.env_counts, self.env2_counts]
self.mc_1 = array([.5, .4, .3, .2, .1, .6, .7, .8, .9, 1.0])
# from old EnvsNode tests
self.old_t_str = '((org1:0.11,org2:0.22,(org3:0.12,org4:0.23)g:0.33)b:0.2,(org5:0.44,org6:0.55)c:0.3,org7:0.4)'
self.old_t = DndParser(self.old_t_str, UniFracTreeNode)
self.old_env_str = """
org1 env1 1
org1 env2 1
org2 env2 1
org3 env2 1
org4 env3 1
org5 env1 1
org6 env1 1
org7 env3 1
"""
self.old_env_counts = count_envs(self.old_env_str.splitlines())
self.old_node_index, self.old_nodes = index_tree(self.old_t)
self.old_count_array, self.old_unique_envs, self.old_env_to_index, \
self.old_node_to_index = index_envs(self.old_env_counts, self.old_node_index)
self.old_branch_lengths = get_branch_lengths(self.old_node_index)
def test_shared_branch_length(self):
"""Should return the correct shared branch length by env"""
t_str = "(((a:1,b:2):3,c:4),(d:5,e:6,f:7):8);"
envs = """
a A 1
b A 1
c A 1
d A 1
e A 1
f B 1
"""
env_counts = count_envs(envs.splitlines())
t = DndParser(t_str, UniFracTreeNode)
exp = {('A',):21.0,('B',):7.0}
obs = shared_branch_length(t, env_counts, 1)
self.assertEqual(obs, exp)
exp = {('A','B'):8.0}
obs = shared_branch_length(t, env_counts, 2)
self.assertEqual(obs, exp)
self.assertRaises(ValueError, shared_branch_length, t, env_counts, 3)
def test_shared_branch_length_to_root(self):
"""Should return the correct shared branch length by env to root"""
t_str = "(((a:1,b:2):3,c:4),(d:5,e:6,f:7):8);"
envs = """
a A 1
b A 1
c A 1
d A 1
e A 1
f B 1
"""
env_counts = count_envs(envs.splitlines())
t = DndParser(t_str, UniFracTreeNode)
exp = {'A':29.0,'B':15.0}
obs = shared_branch_length_to_root(t, env_counts)
self.assertEqual(obs, exp)
def test_fast_unifrac(self):
"""Should calc unifrac values for whole tree."""
#Note: results are not checked for correctness here; detailed tests
#are in the fast_tree module.
res = fast_unifrac(self.t, self.env_counts)
res = fast_unifrac(self.t, self.missing_env_counts)
res = fast_unifrac(self.t, self.extra_tip_counts)
self.assertRaises(ValueError, fast_unifrac, self.t, \
self.wrong_tip_counts)
def test_fast_unifrac_one_sample(self):
""" fu one sample should match whole unifrac result, for env 'B'"""
# first get full unifrac matrix
res = fast_unifrac(self.t, self.env_counts)
dmtx, env_order = res['distance_matrix']
dmtx_vec = dmtx[env_order.index('B')]
dmtx_vec = dmtx_vec[argsort(env_order)]
# then get one sample unifrac vector
one_sam_dvec, one_sam_env_order = \
fast_unifrac_one_sample('B', self.t, self.env_counts)
one_sam_dvec = one_sam_dvec[argsort(one_sam_env_order)]
self.assertFloatEqual(one_sam_dvec, dmtx_vec)
def test_fast_unifrac_one_sample2(self):
"""fu one sam should match whole weighted unifrac result, for env 'B'"""
# first get full unifrac matrix
res = fast_unifrac(self.t, self.env_counts, weighted=True)
dmtx, env_order = res['distance_matrix']
dmtx_vec = dmtx[env_order.index('B')]
dmtx_vec = dmtx_vec[argsort(env_order)]
# then get one sample unifrac vector
one_sam_dvec, one_sam_env_order = \
fast_unifrac_one_sample('B', self.t, self.env_counts,weighted=True)
one_sam_dvec = one_sam_dvec[argsort(one_sam_env_order)]
self.assertFloatEqual(one_sam_dvec, dmtx_vec)
def test_fast_unifrac_one_sample3(self):
"""fu one sam should match missing env unifrac result, for env 'C'"""
# first get full unifrac matrix
res = fast_unifrac(self.t, self.missing_env_counts, weighted=False)
dmtx, env_order = res['distance_matrix']
dmtx_vec = dmtx[env_order.index('C')]
dmtx_vec = dmtx_vec[argsort(env_order)]
# then get one sample unifrac vector
one_sam_dvec, one_sam_env_order = \
fast_unifrac_one_sample('C', self.t,
self.missing_env_counts,weighted=False)
one_sam_dvec = one_sam_dvec[argsort(one_sam_env_order)]
self.assertFloatEqual(one_sam_dvec, dmtx_vec)
# and should raise ValueError for 'B', which has no counts here
self.assertRaises(ValueError, fast_unifrac_one_sample, 'B', self.t,
self.missing_env_counts,weighted=False)
def test_fast_unifrac_whole_tree(self):
""" should correctly compute one p-val for whole tree """
# should test with a fake permutation, but for now uses the same
# data as the old EnvsNode tests
result = []
num_to_do = 10
for i in range(num_to_do):
real_ufracs, sim_ufracs = fast_unifrac_whole_tree(self.old_t, \
self.old_env_counts, 1000, permutation_f=permutation)
rawp, corp = mcarlo_sig(sum(real_ufracs), [sum(x) for x in \
sim_ufracs], 1, tail='high')
result.append(rawp)
self.assertSimilarMeans(result, 0.047)
def test_unifrac_explicit(self):
"""unifrac should correctly compute correct values.
environment M contains only tips not in tree, tip j is in no envs
values were calculated by hand
"""
t1 = DndParser('((a:1,b:2):4,((c:3, j:17),(d:1,e:1):2):3)', \
UniFracTreeNode) # note c,j is len 0 node
# /-------- /-a
# ---------| \-b
# | /-------- /-c
# \--------| \-j
# \-------- /-d
# \-e
env_str = """
a A 1
a C 2
b A 1
b B 1
c B 1
d B 3
e C 1
m M 88"""
env_counts = count_envs(env_str.splitlines())
self.assertFloatEqual(fast_unifrac(t1,env_counts)['distance_matrix'], \
(array(
[[0,10/16, 8/13],
[10/16,0,8/17],
[8/13,8/17,0]]),['A','B','C']))
# changing tree topology relative to c,j tips shouldn't change
# anything
t2 = DndParser('((a:1,b:2):4,((c:2, j:16):1,(d:1,e:1):2):3)', \
UniFracTreeNode)
self.assertFloatEqual(fast_unifrac(t2,env_counts)['distance_matrix'], \
(array(
[[0,10/16, 8/13],
[10/16,0,8/17],
[8/13,8/17,0]]),['A','B','C']))
def test_unifrac_make_subtree(self):
"""unifrac result should not depend on make_subtree
environment M contains only tips not in tree, tips j and k are in no envs
one clade is missing entirely
values were calculated by hand
we also test that we still have a valid tree at the end
"""
t1 = DndParser('((a:1,b:2):4,((c:3, (j:1,k:2)mt:17),(d:1,e:1):2):3)',\
UniFracTreeNode) # note c,j is len 0 node
# /-------- /-a
# ---------| \-b
# | /-------- /-c
# \--------| \mt------ /-j
# | \-k
# \-------- /-d
# \-e
#
env_str = """
a A 1
a C 2
b A 1
b B 1
c B 1
d B 3
e C 1
m M 88"""
env_counts = count_envs(env_str.splitlines())
self.assertFloatEqual(fast_unifrac(t1,env_counts,make_subtree=False)['distance_matrix'], \
(array(
[[0,10/16, 8/13],
[10/16,0,8/17],
[8/13,8/17,0]]),['A','B','C']))
self.assertFloatEqual(fast_unifrac(t1,env_counts,make_subtree=True)['distance_matrix'], \
(array(
[[0,10/16, 8/13],
[10/16,0,8/17],
[8/13,8/17,0]]),['A','B','C']))
# changing tree topology relative to c,j tips shouldn't change anything
t2 = DndParser('((a:1,b:2):4,((c:2, (j:1,k:2)mt:17):1,(d:1,e:1):2):3)', \
UniFracTreeNode)
self.assertFloatEqual(fast_unifrac(t2,env_counts,make_subtree=False)['distance_matrix'], \
(array(
[[0,10/16, 8/13],
[10/16,0,8/17],
[8/13,8/17,0]]),['A','B','C']))
self.assertFloatEqual(fast_unifrac(t2,env_counts,make_subtree=True)['distance_matrix'], \
(array(
[[0,10/16, 8/13],
[10/16,0,8/17],
[8/13,8/17,0]]),['A','B','C']))
# ensure we haven't meaningfully changed the tree
# by passing it to unifrac
t3 = DndParser('((a:1,b:2):4,((c:3, (j:1,k:2)mt:17),(d:1,e:1):2):3)',\
UniFracTreeNode) # note c,j is len 0 node
t1_tips = [tip.Name for tip in t1.tips()]
t1_tips.sort()
t3_tips = [tip.Name for tip in t3.tips()]
t3_tips.sort()
self.assertEqual(t1_tips, t3_tips)
tipj3 = t3.getNodeMatchingName('j')
tipb3 = t3.getNodeMatchingName('b')
tipj1 = t1.getNodeMatchingName('j')
tipb1 = t1.getNodeMatchingName('b')
self.assertFloatEqual(tipj1.distance(tipb1), tipj3.distance(tipb3))
def test_PD_whole_tree(self):
"""PD_whole_tree should correctly compute PD for test tree.
environment M contains only tips not in tree, tip j is in no envs
"""
t1 = DndParser('((a:1,b:2):4,((c:3, j:17),(d:1,e:1):2):3)', \
UniFracTreeNode)
env_str = """
a A 1
a C 2
b A 1
b B 1
c B 1
d B 3
e C 1
m M 88"""
env_counts = count_envs(env_str.splitlines())
self.assertEqual(PD_whole_tree(t1,env_counts), \
(['A','B','C'], array([7.,15.,11.])))
def test_PD_generic_whole_tree(self):
"""PD_generic_whole_tree should correctly compute PD for test tree."""
t1 = DndParser('((a:1,b:2):4,(c:3,(d:1,e:1):2):3)', \
UniFracTreeNode)
env_str = """
a A 1
a C 2
b A 1
b B 1
c B 1
d B 3
e C 1"""
env_counts = count_envs(env_str.splitlines())
self.assertEqual(PD_generic_whole_tree(t1,env_counts), \
(['A','B','C'], array([7.,15.,11.])))
def test_mcarlo_sig(self):
"""mcarlo_sig should calculate Monte Carlo significance for high and low tails"""
self.assertEqual(mcarlo_sig(.5, self.mc_1, 1, 'high'), (5.0/10, 5.0/10))
self.assertEqual(mcarlo_sig(.5, self.mc_1, 1, 'low'), (4.0/10, 4.0/10))
self.assertEqual(mcarlo_sig(.5, self.mc_1, 5, 'high'), (5.0/10, 1.0))
self.assertEqual(mcarlo_sig(.5, self.mc_1, 5, 'low'), (4.0/10, 1.0))
self.assertEqual(mcarlo_sig(0, self.mc_1, 1, 'low'), (0.0, "<=%.1e" % (1.0/10)))
self.assertEqual(mcarlo_sig(100, self.mc_1, 10, 'high'), (0.0, "<=%.1e" % (1.0/10)))
def test_num_comps(self):
"""num_comps should return the number of pairwise comparisons"""
self.assertEqual(num_comps(5), sum([i for i in range(1, 5)]))
self.assertEqual(num_comps(15), sum([i for i in range(1, 15)]))
self.assertEqual(num_comps(10000), sum([i for i in range(1, 10000)]))
self.assertEqual(num_comps(1833), sum([i for i in range(1, 1833)]))
def test_shuffle_tipnames(self):
"""shuffle_tipnames should return copy of tree w/ labels permuted"""
#Note: this should never fail but is technically still stochastic.
#5! is 120, so five identical permutations in a row should occur about
#once in 120**5 (roughly 2.5*10^10) runs.
for i in range(5):
try:
t = DndParser(self.t_str)
result = shuffle_tipnames(t)
orig_names = [n.Name for n in t.tips()]
new_names = [n.Name for n in result.tips()]
self.assertIsPermutation(orig_names, new_names)
return
except AssertionError:
continue
raise AssertionError("Produced same permutation in 5 tries: broken?")
def test_weight_equally(self):
"""weight_equally should return unit weight per tree"""
self.assertEqual(weight_equally(self.trees, self.envs),
array([1,1]))
def test_weight_by_num_tips(self):
"""weight_by_num_tips should return tips per tree"""
self.assertEqual(weight_by_num_tips(self.trees, self.envs),
array([5, 4]))
def test_weight_by_branch_length(self):
"""weight_by_branch_length should return branch length per tree"""
self.assertEqual(weight_by_branch_length(self.trees, self.envs),
array([17, 14]))
def test_weight_by_num_seqs(self):
"""weight_by_num_seqs should return num seqs per tree"""
self.assertEqual(weight_by_num_seqs(self.trees, self.envs),
array([10, 8]))
def test_get_all_env_names(self):
"""get_all_env_names should get all names from counts"""
self.assertEqual(get_all_env_names(self.env_counts),
set('ABC'))
def test_consolidate_skipping_missing_matrices(self):
"""consolidate_skipping_missing_matrices should skip those missing data"""
m1 = array([[1,2],[3,4]])
m2 = array([[1,2,3],[4,5,6],[7,8,9]])
m3 = array([[2,2,2],[3,3,3],[4,4,4]])
matrices = [m1,m2, m3]
env_names = map(list, ['AB', 'ABC', 'ABC'])
weights = [1, 2, 3]
all_names =list('ABC')
result = consolidate_skipping_missing_matrices(matrices, env_names, weights,
all_names)
self.assertFloatEqual(result, .4*m2 + .6*m3)
def test_consolidate_missing_zero(self):
"""consolidate_missing_zero should fill missing values to zero"""
m1 = array([[1,2],[3,4]])
m2 = array([[1,2,3],[4,5,6],[7,8,9]])
m3 = array([[2,2,2],[3,3,3],[4,4,4]])
matrices = [m1,m2, m3]
env_names = map(list, ['AB', 'ABC', 'ABC'])
weights = [1, 2, 3]
weights = array(weights, float)
weights/=weights.sum()
all_names =list('ABC')
transformed_m1 = array([[1,2,0],[3,4,0],[0,0,0]])
result = consolidate_missing_zero(matrices, env_names, weights,
all_names)
self.assertFloatEqual(result, (1/6.)*transformed_m1 + (2/6.)*m2 + (3/6.)*m3)
def test_consolidate_missing_one(self):
"""consolidate_missing_one should fill missing off-diags to one"""
m1 = array([[1,2],[3,4]])
m2 = array([[1,2,3],[4,5,6],[7,8,9]])
m3 = array([[2,2,2],[3,3,3],[4,4,4]])
matrices = [m1,m2, m3]
env_names = map(list, ['AB', 'ABC', 'ABC'])
weights = [1, 2, 3]
weights = array(weights, float)
weights/=weights.sum()
all_names =list('ABC')
transformed_m1 = array([[1,2,1],[3,4,1],[1,1,0]])
result = consolidate_missing_one(matrices, env_names, weights,
all_names)
self.assertFloatEqual(result, (1/6.)*transformed_m1 + (2/6.)*m2 + (3/6.)*m3)
def test_consolidate_skipping_missing_values(self):
"""consolidate_skipping_missing_values should average over filled values"""
m1 = array([[1,2],[3,4]])
m2 = array([[1,2,3],[4,5,6],[7,8,9]])
m3 = array([[2,2,2],[3,3,3],[4,4,4]])
matrices = [m1,m2, m3]
env_names = map(list, ['AB', 'ABC', 'ABC'])
weights = [1., 2, 3]
weights = array(weights)
weights/=weights.sum()
all_names =list('ABC')
expected = array([[ 1/6.*1 + 2/6.*1 + 3/6.*2,
1/6.*2 + 2/6.*2 + 3/6.*2,
2/5.*3 + 3/5.*2],
[ 1/6.*3 + 2/6.*4 + 3/6.*3,
1/6.*4 + 2/6.*5 + 3/6.*3,
2/5.*6 + 3/5.*3],
[ 2/5.*7 + 3/5.*4,
2/5.*8 + 3/5.*4,
2/5.*9 + 3/5.*4]])
result = consolidate_skipping_missing_values(matrices, env_names, weights,
all_names)
self.assertFloatEqual(result, expected)
def test_reshape_by_name(self):
"""reshape_by_name should reshape matrix from old to new names"""
old = array([[0,1,2],[3,4,5],[6,7,8]])
old_names = 'ABC'
new_names = 'xCyBA'
exp = array([[0,0,0,0,0],[0,8,0,7,6],[0,0,0,0,0],\
[0,5,0,4,3],[0,2,0,1,0]])
self.assertEqual(reshape_by_name(old, old_names, new_names), exp)
result = reshape_by_name(old, old_names, new_names, masked=True)
result.fill_value=0
self.assertEqual(result._data * logical_not(result._mask), exp)
def test_meta_unifrac(self):
"""meta_unifrac should give correct result on sample trees"""
tree_list = [self.t, self.t2]
envs_list = [self.env_counts, self.env2_counts]
result = meta_unifrac(tree_list, envs_list, weight_equally,
modes=["distance_matrix"])
u1_distances = array([[0, 10/16.,8/13.],[10/16.,0,8/17.],\
[8/13.,8/17.,0]])
u2_distances = array([[0,11/14.,6/13.],[11/14.,0,7/13.],[6/13.,7/13., 0]])
exp = (u1_distances + u2_distances)/2
self.assertFloatEqual(result['distance_matrix'], (exp, list('ABC')))
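# Sketch (not cogent's implementation; sketch_* names are hypothetical) of a
# Monte Carlo significance helper whose behaviour matches the expectations in
# test_mcarlo_sig, plus the pairwise-comparison count checked by
# test_num_comps. The rules here are inferred from those tests only: strict
# comparison against the simulated values, a Bonferroni correction capped at
# 1, and a raw p of 0 reported as the string "<=1/N" for N simulations.


def sketch_num_comps(n):
    """Number of unordered pairs among n environments: n*(n-1)/2."""
    return n * (n - 1) // 2


def sketch_mcarlo_sig(real, sims, num_comps, tail="high"):
    """Return (raw_p, corrected_p) for an observed value vs simulations."""
    if tail == "high":
        hits = sum(1 for s in sims if s > real)
    elif tail == "low":
        hits = sum(1 for s in sims if s < real)
    else:
        raise ValueError("tail must be 'high' or 'low'")
    raw = float(hits) / len(sims)
    if hits == 0:
        # no simulated value is as extreme: report an upper bound of 1/N
        return raw, "<=%.1e" % (1.0 / len(sims))
    return raw, min(raw * num_comps, 1.0)

# sketch_mcarlo_sig(.5, self.mc_1, 5, 'low') returns (0.4, 1.0)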
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_maths/test_stats/__init__.py
#!/usr/bin/env python
__all__ = ['test_distribution','test_histogram', 'test_special',
'test_ks', 'test_test']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Catherine Lozupone", "Gavin Huttley",
"Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_maths/test_stats/test_alpha_diversity.py
#!/usr/bin/env python
#file test_alpha_diversity.py
from __future__ import division
from numpy import array, log, sqrt, exp
from math import e
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.alpha_diversity import expand_counts, counts, observed_species, singles, \
doubles, osd, margalef, menhinick, dominance, simpson, \
simpson_reciprocal, reciprocal_simpson,\
shannon, equitability, berger_parker_d, mcintosh_d, brillouin_d, \
strong, kempton_taylor_q, fisher_alpha, \
mcintosh_e, heip_e, simpson_e, robbins, robbins_confidence, \
chao1_uncorrected, chao1_bias_corrected, chao1, chao1_var, \
chao1_confidence, ACE, michaelis_menten_fit
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight","Justin Kuczynski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
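# Sketch (not cogent's implementation; sketch_* names are hypothetical) of
# the Chao1 arithmetic behind test_chao1_uncorrected and
# test_chao1_bias_corrected. With S observed species, F1 singletons and F2
# doubletons:
#   uncorrected:    S + F1**2 / (2*F2)
#   bias-corrected: S + F1*(F1-1) / (2*(F2+1))
# For self.TestData (S=9, F1=3, F2=3) these give 10.5 and 9.75. The
# test_chao1 cases with no doubletons expect the bias-corrected value even
# when bias_corrected=False; the fallback below is an assumption based on
# those expectations.


def sketch_osd(counts):
    """(observed, singletons, doubletons) for a sequence of abundances."""
    observed = sum(1 for c in counts if c > 0)
    singles = sum(1 for c in counts if c == 1)
    doubles = sum(1 for c in counts if c == 2)
    return observed, singles, doubles


def sketch_chao1(counts, bias_corrected=True):
    s, f1, f2 = sketch_osd(counts)
    if bias_corrected or f2 == 0:
        # the bias-corrected form stays defined when there are no doubletons
        return s + f1 * (f1 - 1) / (2.0 * (f2 + 1))
    return s + f1 * f1 / (2.0 * f2)

# sketch_chao1([0, 1, 1, 4, 2, 5, 2, 4, 1, 2]) returns 9.75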
class diversity_tests(TestCase):
"""Tests of top-level functions"""
def setUp(self):
"""Set up shared variables"""
self.TestData = array([0,1,1,4,2,5,2,4,1,2])
self.NoSingles = array([0,2,2,4,5,0,0,0,0,0])
self.NoDoubles = array([0,1,1,4,5,0,0,0,0,0])
def test_expand_counts(self):
"""expand_counts should return correct expanded array"""
c = array([2,0,1,2])
self.assertEqual(expand_counts(c), array([0,0,2,3,3]))
def test_counts(self):
"""counts should return correct array"""
c = array([5,0,1,1,5,5])
obs = counts(c)
exp = array([1,2,0,0,0,3])
self.assertEqual(obs, exp)
d = array([2,2,1,0])
obs = counts(d, obs)
exp = array([2,3,2,0,0,3])
self.assertEqual(obs, exp)
def test_singles(self):
"""singles should return correct # of singles"""
self.assertEqual(singles(self.TestData), 3)
self.assertEqual(singles(array([0,3,4])), 0)
self.assertEqual(singles(array([1])), 1)
def test_doubles(self):
"""doubles should return correct # of doubles"""
self.assertEqual(doubles(self.TestData), 3)
self.assertEqual(doubles(array([0,3,4])), 0)
self.assertEqual(doubles(array([2])), 1)
def test_osd(self):
"""osd should return correct # of observeds, singles, doubles"""
self.assertEqual(osd(self.TestData), (9,3,3))
def test_margalef(self):
"""margalef should match hand-calculated values"""
self.assertEqual(margalef(self.TestData), 8/log(22))
def test_menhinick(self):
"""menhinick should match hand-calculated values"""
self.assertEqual(menhinick(self.TestData), 9/sqrt(22))
def test_dominance(self):
"""dominance should match hand-calculated values"""
c = array([1,0,2,5,2])
self.assertFloatEqual(dominance(c), .34)
d = array([5])
self.assertEqual(dominance(d), 1)
def test_simpson(self):
"""simpson should match hand-calculated values"""
c = array([1,0,2,5,2])
self.assertFloatEqual(simpson(c), .66)
d = array([5])
self.assertFloatEqual(simpson(d), 0)
def test_reciprocal_simpson(self):
"""reciprocal_simpson should match hand-calculated results"""
c = array([1,0,2,5,2])
self.assertFloatEqual(reciprocal_simpson(c), 1/.66)
def test_simpson_reciprocal(self):
"""simpson_reciprocal should match 1/D results"""
c = array([1,0,2,5,2])
self.assertFloatEqual(simpson_reciprocal(c), 1./dominance(c))
def test_shannon(self):
"""shannon should match hand-calculated values"""
c = array([5])
self.assertFloatEqual(shannon(c), 0)
c = array([5,5])
self.assertFloatEqual(shannon(c), 1)
c = array([1,1,1,1,0])
self.assertEqual(shannon(c), 2)
def test_equitability(self):
"""equitability should match hand-calculated values"""
c = array([5])
self.assertFloatEqual(equitability(c), 0)
c = array([5,5])
self.assertFloatEqual(equitability(c), 1)
c = array([1,1,1,1,0])
self.assertEqual(equitability(c), 1)
def test_berger_parker_d(self):
"""berger-parker_d should match hand-calculated values"""
c = array([5])
self.assertFloatEqual(berger_parker_d(c), 1)
c = array([5,5])
self.assertFloatEqual(berger_parker_d(c), 0.5)
c = array([1,1,1,1,0])
self.assertEqual(berger_parker_d(c), 0.25)
def test_mcintosh_d(self):
"""mcintosh_d should match hand-calculated values"""
c = array([1,2,3])
self.assertFloatEqual(mcintosh_d(c), 0.636061424871458)
def test_brillouin_d(self):
"""brillouin_d should match hand-calculated values"""
c = array([1,2,3,1])
self.assertFloatEqual(brillouin_d(c), 0.86289353018248782)
def test_strong(self):
"""strong's dominance index should match hand-calculated values"""
c = array([1,2,3,1])
self.assertFloatEqual(strong(c), 0.214285714)
def test_kempton_taylor_q(self):
"""kempton_taylor_q should approximate Magurran 1998 calculation p143"""
c = array([2,3,3,3,3,3,4,4,4,6,6,7,7,9,9,11,14,15,15,20,29,33,34,
36,37,53,57,138,146,170])
self.assertFloatEqual(kempton_taylor_q(c), 14/log(34/4))
def test_fisher_alpha(self):
"""fisher alpha should match hand-calculated value."""
c = array([4,3,4,0,1,0,2])
obs = fisher_alpha(c)
self.assertFloatEqual(obs, 2.7823795367398798)
def test_mcintosh_e(self):
"""mcintosh e should match hand-calculated value."""
c = array([1,2,3,1])
num = sqrt(15)
den = sqrt(19)
exp = num/den
self.assertEqual(mcintosh_e(c), exp)
def test_heip_e(self):
"""heip e should match hand-calculated value"""
c = array([1,2,3,1])
h = shannon(c, base=e)
expected = exp(h-1)/3
self.assertEqual(heip_e(c), expected)
def test_simpson_e(self):
"""simpson e should match hand-calculated value"""
c = array([1,2,3,1])
s = simpson(c)
self.assertEqual((1/s)/4, simpson_e(c))
def test_robbins(self):
"""robbins metric should match hand-calculated value"""
c = array([1,2,3,0,1])
r = robbins(c)
self.assertEqual(r,2./7)
def test_robbins_confidence(self):
"""robbins CI should match hand-calculated value"""
c = array([1,2,3,0,1])
r = robbins_confidence(c, 0.05)
n = 7
s = 2
k = sqrt(8/0.05)
self.assertEqual(r, ((s-k)/(n+1), (s+k)/(n+1)))
def test_observed_species(self):
"""observed_species should return # observed species"""
c = array([4,3,4,0,1,0,2])
obs = observed_species(c)
exp = 5
self.assertEqual(obs, exp)
c = array([0,0,0])
obs = observed_species(c)
exp = 0
self.assertEqual(obs, exp)
self.assertEqual(observed_species(self.TestData), 9)
def test_chao1_bias_corrected(self):
"""chao1_bias_corrected should return same result as EstimateS"""
obs = chao1_bias_corrected(*osd(self.TestData))
self.assertEqual(obs, 9.75)
def test_chao1_uncorrected(self):
"""chao1_uncorrected should return same result as EstimateS"""
obs = chao1_uncorrected(*osd(self.TestData))
self.assertEqual(obs, 10.5)
def test_chao1(self):
"""chao1 should use right decision rules"""
self.assertEqual(chao1(self.TestData), 9.75)
self.assertEqual(chao1(self.TestData,bias_corrected=False),10.5)
self.assertEqual(chao1(self.NoSingles), 4)
self.assertEqual(chao1(self.NoSingles,bias_corrected=False),4)
self.assertEqual(chao1(self.NoDoubles), 5)
self.assertEqual(chao1(self.NoDoubles,bias_corrected=False),5)
def test_chao1_var(self):
"""chao1_var should match observed results from EstimateS"""
#NOTE: EstimateS reports sd, not var, and rounds to 2 dp
self.assertFloatEqual(chao1_var(self.TestData), 1.42**2, eps=0.01)
self.assertFloatEqual(chao1_var(self.TestData,bias_corrected=False),\
2.29**2, eps=0.01)
self.assertFloatEqualAbs(chao1_var(self.NoSingles), 0.39**2, eps=0.01)
self.assertFloatEqualAbs(chao1_var(self.NoSingles, \
bias_corrected=False), 0.39**2, eps=0.01)
self.assertFloatEqualAbs(chao1_var(self.NoDoubles), 2.17**2, eps=0.01)
self.assertFloatEqualAbs(chao1_var(self.NoDoubles, \
bias_corrected=False), 2.17**2, eps=0.01)
def test_chao1_confidence(self):
"""chao1_confidence should match observed results from EstimateS"""
#NOTE: EstimateS rounds to 2 dp
self.assertFloatEqual(chao1_confidence(self.TestData), (9.07,17.45), \
eps=0.01)
self.assertFloatEqual(chao1_confidence(self.TestData, \
bias_corrected=False), (9.17,21.89), eps=0.01)
self.assertFloatEqualAbs(chao1_confidence(self.NoSingles),\
(4, 4.95), eps=0.01)
self.assertFloatEqualAbs(chao1_confidence(self.NoSingles, \
bias_corrected=False), (4,4.95), eps=0.01)
self.assertFloatEqualAbs(chao1_confidence(self.NoDoubles), \
(4.08,17.27), eps=0.01)
self.assertFloatEqualAbs(chao1_confidence(self.NoDoubles, \
bias_corrected=False), (4.08,17.27), eps=0.01)
def test_ACE(self):
"""ACE should match values calculated by hand"""
self.assertFloatEqual(ACE(array([2,0])), 1.0, eps=0.001)
# next: just returns the number of species when all are abundant
self.assertFloatEqual(ACE(array([12,0,9])), 2.0, eps=0.001)
self.assertFloatEqual(ACE(array([12,2,8])), 3.0, eps=0.001)
self.assertFloatEqual(ACE(array([12,2,1])), 4.0, eps=0.001)
self.assertFloatEqual(ACE(array([12,1,2,1])), 7.0, eps=0.001)
self.assertFloatEqual(ACE(array([12,3,2,1])), 4.6, eps=0.001)
self.assertFloatEqual(ACE(array([12,3,6,1,10])), 5.62749672, eps=0.001)
def test_michaelis_menten_fit(self):
""" michaelis_menten_fit should match hand values in limiting cases"""
res = michaelis_menten_fit([22])
self.assertFloatEqual(res,1.0,eps=.01)
res = michaelis_menten_fit([42])
self.assertFloatEqual(res,1.0,eps=.01)
res = michaelis_menten_fit([34],num_repeats=3,params_guess=[13,13])
self.assertFloatEqual(res,1.0,eps=.01)
res = michaelis_menten_fit([70,70],num_repeats=5)
self.assertFloatEqual(res,2.0,eps=.01)
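# Sketch (not cogent's implementation; sketch_* names are hypothetical) of
# textbook formulas that the hand-calculated values in test_dominance,
# test_simpson, test_shannon, test_equitability and test_berger_parker_d are
# consistent with. With p_i = n_i / N over the non-zero counts:
#   dominance D = sum(p_i**2);  simpson = 1 - D
#   shannon H = -sum(p_i * log2(p_i))   (base 2 assumed from [5, 5] -> 1)
#   equitability = H / log2(S);  berger_parker_d = max(n_i) / N
# "import math" is used instead of "from math import log" so the module's
# numpy log is not shadowed.
import math


def sketch_proportions(counts):
    """Relative abundances p_i of the non-zero counts."""
    total = float(sum(counts))
    return [c / total for c in counts if c > 0]


def sketch_dominance(counts):
    return sum(p * p for p in sketch_proportions(counts))


def sketch_simpson(counts):
    return 1 - sketch_dominance(counts)


def sketch_shannon(counts, base=2):
    return -sum(p * math.log(p, base) for p in sketch_proportions(counts))


def sketch_equitability(counts):
    s = len(sketch_proportions(counts))
    # a single species has no evenness to measure; the tests expect 0 for [5]
    if s < 2:
        return 0.0
    return sketch_shannon(counts) / math.log(s, 2)


def sketch_berger_parker_d(counts):
    return max(counts) / float(sum(counts))

# sketch_berger_parker_d([5, 5]) returns 0.5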
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/
PyCogent-1.5.3/tests/test_maths/test_stats/test_distribution.py
#!/usr/bin/env python
"""Tests of statistical probability distribution integrals.
Expected values are calculated in R, since spreadsheets proved unreliable.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.distribution import z_low, z_high, zprob, chi_low, \
chi_high, t_low, t_high, tprob, poisson_high, poisson_low, poisson_exact, \
binomial_high, binomial_low, binomial_exact, f_low, f_high, fprob, \
stdtr, bdtr, bdtrc, pdtr, pdtrc, fdtr, fdtrc, gdtr, gdtrc, chdtri, stdtri,\
pdtri, bdtri, fdtri, gdtri
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
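# Sketch (not cogent's implementation; sketch_* names are hypothetical) of
# the identities the z and chi-square tests below rely on, written with the
# standard library's error functions:
#   z_low(z)  = Phi(z) = 0.5 * erfc(-z / sqrt(2))
#   z_high(z) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
#   zprob(z)  = 2 * z_high(abs(z))          (two-tailed)
#   chi_low(x, 1) = erf(sqrt(x / 2))        (one degree of freedom only)
# Computing the upper tail with erfc avoids the cancellation in 1 - Phi(z)
# for large z, which matters for expected values such as 7.619853e-24 at
# z = 10. "import math" keeps module-level names from being shadowed.
import math


def sketch_z_low(z):
    """Standard normal lower tail, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def sketch_z_high(z):
    """Standard normal upper tail, computed directly, not as 1 - Phi(z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def sketch_zprob(z):
    """Two-tailed normal probability for abs(z)."""
    return 2 * sketch_z_high(abs(z))


def sketch_chi_low_df1(x):
    """Chi-square lower tail for one degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))

# sketch_z_high(10) is about 7.62e-24, whereas 1 - sketch_z_low(10) is 0.0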
class DistributionsTests(TestCase):
"""Tests of particular statistical distributions."""
def setUp(self):
self.values = [0, 0.01, 0.1, 0.5, 1, 2, 5, 10, 20, 30, 50, 200]
self.negvalues = [-i for i in self.values]
self.df = [1, 10, 100]
def test_z_low(self):
"""z_low should match R's pnorm() function"""
probs = [
0.5000000, 0.5039894, 0.5398278, 0.6914625, 0.8413447,
0.9772499, 0.9999997, 1.0000000, 1.0000000, 1.0000000,
1.0000000, 1.0000000,
]
negprobs = [
5.000000e-01, 4.960106e-01, 4.601722e-01, 3.085375e-01,
1.586553e-01, 2.275013e-02, 2.866516e-07, 7.619853e-24,
2.753624e-89, 4.906714e-198, 0.000000e+00, 0.000000e+00]
for z, p in zip(self.values, probs):
self.assertFloatEqual(z_low(z), p)
for z, p in zip(self.negvalues, negprobs):
self.assertFloatEqual(z_low(z), p)
def test_z_high(self):
"""z_high should match R's pnorm(lower.tail=FALSE) function"""
negprobs = [
0.5000000, 0.5039894, 0.5398278, 0.6914625, 0.8413447,
0.9772499, 0.9999997, 1.0000000, 1.0000000, 1.0000000,
1.0000000, 1.0000000,
        ]
        probs = [
5.000000e-01, 4.960106e-01, 4.601722e-01, 3.085375e-01,
1.586553e-01, 2.275013e-02, 2.866516e-07, 7.619853e-24,
2.753624e-89, 4.906714e-198, 0.000000e+00, 0.000000e+00]
        for z, p in zip(self.values, probs):
            self.assertFloatEqual(z_high(z), p)
        for z, p in zip(self.negvalues, negprobs):
            self.assertFloatEqual(z_high(z), p)
    def test_zprob(self):
        """zprob should match twice the z_high probability for abs(z)"""
        probs = [2*i for i in [
5.000000e-01, 4.960106e-01, 4.601722e-01, 3.085375e-01,
1.586553e-01, 2.275013e-02, 2.866516e-07, 7.619853e-24,
2.753624e-89, 4.906714e-198, 0.000000e+00, 0.000000e+00]]
        for z, p in zip(self.values, probs):
            self.assertFloatEqual(zprob(z), p)
        for z, p in zip(self.negvalues, probs):
            self.assertFloatEqual(zprob(z), p)
    def test_chi_low(self):
        """chi_low should match R's pchisq() function"""
        probs = {
1: [ 0.00000000, 0.07965567, 0.24817037, 0.52049988, 0.68268949,
0.84270079, 0.97465268, 0.99843460, 0.99999226, 0.99999996,
1.00000000, 1.00000000
],
10: [ 0.000000e+00, 2.593339e-14, 2.497951e-09, 6.611711e-06,
1.721156e-04, 3.659847e-03, 1.088220e-01, 5.595067e-01,
9.707473e-01, 9.991434e-01, 9.999997e-01, 1.000000e+00,
],
100: [ 0.000000e+00, 2.906006e-180, 2.780588e-130, 2.029952e-95,
1.788777e-80, 1.233751e-65, 2.238699e-46, 2.181059e-32,
1.854727e-19, 9.056126e-13, 6.953305e-06, 1.000000e-00
],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqual(chi_low(x, df), p)
    def test_chi_high(self):
        """chi_high should match R's pchisq(lower.tail=FALSE) function"""
        probs = {
1: [ 1.000000e+00, 9.203443e-01, 7.518296e-01, 4.795001e-01,
3.173105e-01, 1.572992e-01, 2.534732e-02, 1.565402e-03,
7.744216e-06, 4.320463e-08, 1.537460e-12, 2.088488e-45,
],
10: [ 1.000000e+00, 1.000000e-00, 1.000000e-00, 9.999934e-01,
9.998279e-01, 9.963402e-01, 8.911780e-01, 4.404933e-01,
2.925269e-02, 8.566412e-04, 2.669083e-07, 1.613931e-37,
],
100:[ 1.00000e+00, 1.00000e+00, 1.00000e+00, 1.00000e+00,
1.00000e+00, 1.00000e+00, 1.00000e+00, 1.00000e+00,
1.00000e+00, 1.00000e+00, 9.99993e-01, 1.17845e-08,
],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqual(chi_high(x, df), p)
    def test_t_low(self):
        """t_low should match R's pt() function"""
        probs = {
1: [ 0.5000000, 0.5031830, 0.5317255, 0.6475836, 0.7500000,
0.8524164, 0.9371670, 0.9682745, 0.9840977, 0.9893936,
0.9936347, 0.9984085,
],
10: [ 0.5000000, 0.5038910, 0.5388396, 0.6860532, 0.8295534,
0.9633060, 0.9997313, 0.9999992, 1.0000000, 1.0000000,
1.0000000, 1.0000000,
],
100:[ 0.5000000, 0.5039794, 0.5397277, 0.6909132, 0.8401379,
0.9758939, 0.9999988, 1.0000000, 1.0000000, 1.0000000,
1.0000000, 1.0000000,
],
        }
        negprobs = {
1: [ 0.500000000, 0.496817007, 0.468274483, 0.352416382,
0.250000000, 0.147583618, 0.062832958, 0.031725517,
0.015902251, 0.010606402, 0.006365349, 0.001591536,
],
10: [ 5.000000e-01, 4.961090e-01, 4.611604e-01, 3.139468e-01,
1.704466e-01, 3.669402e-02, 2.686668e-04, 7.947766e-07,
1.073031e-09, 1.980896e-11, 1.237155e-13, 1.200254e-19,
],
100:[ 5.000000e-01, 4.960206e-01, 4.602723e-01, 3.090868e-01,
1.598621e-01, 2.410609e-02, 1.225087e-06, 4.950844e-17,
4.997134e-37, 4.190166e-52, 7.236082e-73, 2.774197e-132,
],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqualRel(t_low(x, df), p, eps=1e-4)
            for x, p in zip(self.negvalues, negprobs[df]):
                self.assertFloatEqualRel(t_low(x, df), p, eps=1e-4)
    def test_t_high(self):
        """t_high should match R's pt(lower.tail=FALSE) function"""
        negprobs = {
1: [ 0.5000000, 0.5031830, 0.5317255, 0.6475836, 0.7500000,
0.8524164, 0.9371670, 0.9682745, 0.9840977, 0.9893936,
0.9936347, 0.9984085,
],
10: [ 0.5000000, 0.5038910, 0.5388396, 0.6860532, 0.8295534,
0.9633060, 0.9997313, 0.9999992, 1.0000000, 1.0000000,
1.0000000, 1.0000000,
],
100:[ 0.5000000, 0.5039794, 0.5397277, 0.6909132, 0.8401379,
0.9758939, 0.9999988, 1.0000000, 1.0000000, 1.0000000,
1.0000000, 1.0000000,
],
        }
        probs = {
1: [ 0.500000000, 0.496817007, 0.468274483, 0.352416382,
0.250000000, 0.147583618, 0.062832958, 0.031725517,
0.015902251, 0.010606402, 0.006365349, 0.001591536,
],
10: [ 5.000000e-01, 4.961090e-01, 4.611604e-01, 3.139468e-01,
1.704466e-01, 3.669402e-02, 2.686668e-04, 7.947766e-07,
1.073031e-09, 1.980896e-11, 1.237155e-13, 1.200254e-19,
],
100:[ 5.000000e-01, 4.960206e-01, 4.602723e-01, 3.090868e-01,
1.598621e-01, 2.410609e-02, 1.225087e-06, 4.950844e-17,
4.997134e-37, 4.190166e-52, 7.236082e-73, 2.774197e-132,
],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqualRel(t_high(x, df), p, eps=1e-4)
            for x, p in zip(self.negvalues, negprobs[df]):
                self.assertFloatEqualRel(t_high(x, df), p, eps=1e-4)
    def test_tprob(self):
        """tprob should match twice the t_high probability for abs(t)"""
        probs = {
1: [ 2*i for i in
[ 0.500000000, 0.496817007, 0.468274483, 0.352416382,
0.250000000, 0.147583618, 0.062832958, 0.031725517,
0.015902251, 0.010606402, 0.006365349, 0.001591536,
]],
10: [ 2*i for i in
[ 5.000000e-01, 4.961090e-01, 4.611604e-01, 3.139468e-01,
1.704466e-01, 3.669402e-02, 2.686668e-04, 7.947766e-07,
1.073031e-09, 1.980896e-11, 1.237155e-13, 1.200254e-19,
]],
100:[ 2*i for i in
[ 5.000000e-01, 4.960206e-01, 4.602723e-01, 3.090868e-01,
1.598621e-01, 2.410609e-02, 1.225087e-06, 4.950844e-17,
4.997134e-37, 4.190166e-52, 7.236082e-73, 2.774197e-132,
]],
        }
        for df in self.df:
            for x, p in zip(self.values, probs[df]):
                self.assertFloatEqualRel(tprob(x, df), p, eps=1e-4)
    def test_poisson_low(self):
        """Lower tail of Poisson should match R for integer successes"""
        # WARNING: results are only guaranteed for integer successes; floating
        # point values _should_ yield reasonable results, but R rounds to int.
        expected = {
(0, 0): 1,
(0, 0.75): 0.4723666,
(0, 1): 0.3678794,
(0, 5): 0.006737947,
(0, 113.7): 4.175586e-50,
(2, 0): 1,
(2, 3): 0.4231901,
(2, 17.8): 3.296636e-06,
(17, 29.6): 0.008753318,
(180, 0): 1,
(180, 137.4):0.999784,
(180, 318):2.436995e-17,
(180, 1024):8.266457e-233,
        }
        for (key, value) in expected.items():
            self.assertFloatEqual(poisson_low(*key), value)
    def test_poisson_high(self):
        """Upper tail of Poisson should match R for integer successes"""
        # WARNING: results are only guaranteed for integer successes; floating
        # point values _should_ yield reasonable results, but R rounds to int.
        expected = {
(0, 0): 0,
(0, 0.75): 0.5276334,
(0, 1): 0.6321206,
(0, 5): 0.993262,
(0, 113.7): 1,
(2, 0): 0,
(2, 3): 0.5768099,
(2, 17.8): 0.9999967,
(17, 29.6): 0.9912467,
(180, 0): 0,
(180, 137.4):0.0002159856,
(180, 318):1,
(180, 1024):1,
        }
        for (key, value) in expected.items():
            self.assertFloatEqual(poisson_high(*key), value)
    def test_poisson_exact(self):
        """Poisson exact should match expected values from R"""
        expected = {
(0, 0): 1,
(0, 0.75): 0.4723666,
(0, 1): 0.3678794,
(0, 5): 0.006737947,
(0, 113.7): 4.175586e-50,
(2, 0): 0,
(2, 3): 0.2240418,
(2, 17.8): 2.946919e-06,
(17, 29.6): 0.004034353,
(180, 0): 0,
(180, 137.4):7.287501e-05,
(180, 318):1.067247e-17,
(180, 1024):6.815085e-233,
        }
        for (key, value) in expected.items():
            self.assertFloatEqual(poisson_exact(*key), value)
    def test_binomial_high(self):
        """Binomial high should match values from R for integer successes"""
        expected = {
(0, 1, 0.5): 0.5,
(1, 1, 0.5): 0,
(1, 1, 0.0000001): 0,
(1, 1, 0.9999999): 0,
(3, 5, 0.75):0.6328125,
(0, 60, 0.5): 1,
(129, 130, 0.5):7.34684e-40,
(299, 300, 0.099): 4.904089e-302,
(9, 27, 0.0003): 4.958496e-29,
(1032, 2050, 0.5): 0.3702155,
(-1, 3, 0.1): 1, #if successes less than 0, return 1
(-0.5, 3, 0.1):1,
        }
        for (key, value) in expected.items():
            self.assertFloatEqualRel(binomial_high(*key), value, 1e-4)
        # should reject if successes > trials or successes < -1
        self.assertRaises(ValueError, binomial_high, 7, 5, 0.5)
    def test_binomial_low(self):
        """Binomial low should match values from R for integer successes"""
        expected = {
(0, 1, 0.5): 0.5,
(1, 1, 0.5): 1,
(1, 1, 0.0000001): 1,
(1, 1, 0.9999999): 1,
(26, 50, .5): 0.6641,
(3, 5, 0.75):0.3671875,
(0, 60, 0.5): 8.673617e-19,
(129, 130, 0.5):1,
(299, 300, 0.099): 1,
(9, 27, 0.0003): 1,
(1032, 2050, 0.5): 0.6297845,
        }
        for (key, value) in expected.items():
            self.assertFloatEqualRel(binomial_low(*key), value, 1e-4)
    def test_binomial_series(self):
        """binomial_exact should match values from R on a whole series"""
        expected = list(map(float, "0.0282475249 0.1210608210 0.2334744405 0.2668279320 0.2001209490 0.1029193452 0.0367569090 0.0090016920 0.0014467005 0.0001377810 0.0000059049".split()))
        for i, p in enumerate(expected):
            self.assertFloatEqual(binomial_exact(i, 10, 0.3), p)
    def test_binomial_exact(self):
        """binomial_exact should match values from R for integer successes"""
        expected = {
(0, 1, 0.5): 0.5,
(1, 1, 0.5): 0.5,
(1, 1, 0.0000001): 1e-07,
(1, 1, 0.9999999): 0.9999999,
(3, 5, 0.75):0.2636719,
(0, 60, 0.5): 8.673617e-19,
(129, 130, 0.5):9.550892e-38,
(299, 300, 0.099): 1.338965e-298,
(9, 27, 0.0003): 9.175389e-26,
(1032, 2050, 0.5): 0.01679804,
        }
        for (key, value) in expected.items():
            self.assertFloatEqualRel(binomial_exact(*key), value, 1e-4)
    def test_binomial_exact_floats(self):
        """binomial_exact should be within limits for floating point numbers
        """
        expected = {
(18.3, 100, 0.2): (0.09089812, 0.09807429),
(2.7,1050,0.006): (0.03615498, 0.07623827),
(2.7,1050,0.06): (1.365299e-25, 3.044327e-24),
(2,100.5,0.6): (7.303533e-37, 1.789727e-36),
(10,100.5,.5):(7.578011e-18,1.365543e-17),
(0.2, 60, 0.5): (8.673617e-19, 5.20417e-17),
(.5,5,.3):(0.16807,0.36015),
        }
        for (key, value) in expected.items():
            min_val, max_val = value
            assert min_val < binomial_exact(*key) < max_val
    def test_binomial_exact_errors(self):
        """binomial_exact should raise errors on invalid input"""
        self.assertRaises(ValueError, binomial_exact,10.2, 5, 0.33)
        self.assertRaises(ValueError, binomial_exact,-2, 5, 0.33)
        self.assertRaises(ValueError, binomial_exact, 10, 50, -2)
        self.assertRaises(ValueError, binomial_exact, 10, 50, 3)
    def test_f_high(self):
        """F high should match values from R"""
        expected = {
(1, 1, 0): 1,
(1, 1, 1): 0.5,
(1, 1, 20): 0.1400487,
(1, 1, 1000000): 0.0006366196,
(1, 10, 0): 1,
(1,10, 5): 0.0493322,
(1, 10, 20): 0.001193467,
(10, 1, 0):1,
(10, 10, 14.7): 0.0001062585,
(13.7, 11.9, 3.8): 0.01340347, #test non-integer degrees of freedom
# the following series was used to track down a bug after a test case failed
(28, 29, 2): 0.03424088,
(28, 29, 10): 1.053019e-08,
(28, 29, 20): 1.628245e-12,
(28, 29, 300): 5.038791e-29,
(28, 35, 1): 0.4946777,
(28, 37, 1): 0.4934486,
(28, 38, 1): 0.4928721,
(28, 38.001, 1): 0.4928716,
(28, 38.5, 1): 0.4925927,
(28, 39, 1): 0.492319,
(28, 39, 10): 1.431901e-10,
(28, 39, 20): 1.432014e-15,
(28, 39, 30): 1.059964e-18,
(28, 39, 50): 8.846678e-23,
(28, 39, 300): 1.226935e-37,
(28,39,304.7): 9.08154e-38,
(28.4, 39.2, 304.7): 5.573927e-38,
(1032, 2050, 0): 1,
(1032, 2050, 4.15): 1.23535e-165,
(1032, 2050, 0.5): 1,
(1032, 2050, 0.1): 1,
        }
        for (key, value) in sorted(expected.items()):
            self.assertFloatEqualRel(f_high(*key), value)
    def test_f_low(self):
        """F low should match values from R"""
        expected = {
(1, 1, 0): 0,
(1, 1, 1): 0.5,
(1, 1, 20): 0.8599513,
(1, 1, 1000000): 0.9993634,
(1, 10, 0): 0,
(1,10, 5): 0.9506678,
(1, 10, 20): 0.9988065,
(10, 1, 0):0,
(10, 10, 14.7): 0.9998937,
(28.4, 39.2, 304.7): 1,
(1032, 2050, 0): 0,
(1032, 2050, 4.15): 1,
(1032, 2050, 0.5): 7.032663e-35,
(1032, 2050, 0.1): 1.70204e-278,
        }
        for (key, value) in expected.items():
            self.assertFloatEqualRel(f_low(*key), value)
    def test_fprob(self):
        """fprob should return twice the tail on a particular side"""
        error = 1e-4
        # right-hand side
        self.assertFloatEqualAbs(fprob(10,10,1.2), 0.7788, eps=error)
        # left-hand side
        self.assertFloatEqualAbs(fprob(10,10,1.2, side='left'), 1.2212,
            eps=error)
        self.assertRaises(ValueError, fprob, 10,10,-3)
        self.assertRaises(ValueError, fprob, 10, 10, 1, 'non_valid_side')
    def test_stdtr(self):
        """stdtr should match cephes results"""
        t = [-10, -3.1, -0.5, -0.01, 0, 1, 0.5, 10]
        k = [2, 10, 100]
        exp = [
0.00492622851166,
7.94776587798e-07,
4.9508444923e-17,
0.0451003650651,
0.00562532860804,
0.00125696358826,
0.333333333333,
0.313946802871,
0.309086782915,
0.496464554479,
0.496108987495,
0.496020605117,
0.5,
0.5,
0.5,
0.788675134595,
0.829553433849,
0.840137922108,
0.666666666667,
0.686053197129,
0.690913217085,
0.995073771488,
0.999999205223,
1.0,
        ]
        index = 0
        for i in t:
            for j in k:
                self.assertFloatEqual(stdtr(j,i), exp[index])
                index += 1
    def test_bdtr(self):
        """bdtr should match cephes results"""
        k_s = [0,1,2,3,5]
        n_s = [5,10,1000]
        p_s = [1e-10, .1, .5, .9, .999999]
        exp = [
0.9999999995,
0.59049,
0.03125,
1e-05,
1.00000000014e-30,
0.999999999,
0.3486784401,
0.0009765625,
1e-10,
1.00000000029e-60,
0.9999999,
1.74787125172e-46,
9.33263618503e-302,
0.0,
0.0,
1.0,
0.91854,
0.1875,
0.00046,
4.99999600058e-24,
1.0,
0.7360989291,
0.0107421875,
9.1e-09,
9.99999100259e-54,
1.0,
1.9595578811e-44,
9.34196882121e-299,
0.0,
0.0,
1.0,
0.99144,
0.5,
0.00856,
9.99998500087e-18,
1.0,
0.9298091736,
0.0546875,
3.736e-07,
4.49999200104e-47,
1.0,
1.09744951737e-42,
4.67099374325e-296,
0.0,
0.0,
1.0,
0.99954,
0.8125,
0.08146,
9.99998000059e-12,
1.0,
0.9872048016,
0.171875,
9.1216e-06,
1.19999685024e-40,
1.0,
4.09381247279e-41,
1.5554471507e-293,
0.0,
0.0,
1.0,
1.0,
1.0,
1.0,
1.0,
1.0,
0.9998530974,
0.623046875,
0.0016349374,
2.51998950038e-28,
1.0,
2.55654569306e-38,
7.7385053063e-289,
0.0,
0.0,
        ]
        index = 0
        for k in k_s:
            for n in n_s:
                for p in p_s:
                    self.assertFloatEqual(bdtr(k,n,p), exp[index])
                    index += 1
    def test_bdtrc(self):
        """bdtrc should give the same results as cephes"""
        k_s = [0,1,2,3,5]
        n_s = [5,10,1000]
        p_s = [1e-10, .1, .5, .9, .999999]
        exp = [
4.999999999e-10,
0.40951,
0.96875,
0.99999,
1.0,
9.9999999955e-10,
0.6513215599,
0.9990234375,
0.9999999999,
1.0,
9.9999995005e-08,
1.0,
1.0,
1.0,
1.0,
9.999999998e-20,
0.08146,
0.8125,
0.99954,
1.0,
4.4999999976e-19,
0.2639010709,
0.9892578125,
0.9999999909,
1.0,
4.99499966766e-15,
1.0,
1.0,
1.0,
1.0,
9.9999999985e-30,
0.00856,
0.5,
0.99144,
1.0,
1.19999999937e-28,
0.0701908264,
0.9453125,
0.9999996264,
1.0,
1.66166987575e-22,
1.0,
1.0,
1.0,
1.0,
4.9999999996e-40,
0.00046,
0.1875,
0.91854,
0.99999999999,
2.09999999899e-38,
0.0127951984,
0.828125,
0.9999908784,
1.0,
4.14171214499e-30,
1.0,
1.0,
1.0,
1.0,
0.0,
0.0,
0.0,
0.0,
0.0,
2.09999999928e-58,
0.0001469026,
0.376953125,
0.9983650626,
1.0,
1.36817318242e-45,
1.0,
1.0,
1.0,
1.0,
        ]
        index = 0
        for k in k_s:
            for n in n_s:
                for p in p_s:
                    self.assertFloatEqual(bdtrc(k,n,p), exp[index])
                    index += 1
    def test_pdtr(self):
        """pdtr should match cephes results"""
        k_s = [0,1,2,5,10]
        m_s = [1e-9, 0.1,0.5,1,2,31]
        exp = [
0.999999999 ,
0.904837418036 ,
0.606530659713 ,
0.367879441171 ,
0.135335283237 ,
3.44247710847e-14 ,
1.0 ,
0.99532115984 ,
0.909795989569 ,
0.735758882343 ,
0.40600584971 ,
1.10159267471e-12 ,
1.0 ,
0.99984534693 ,
0.985612322033 ,
0.919698602929 ,
0.676676416183 ,
1.76426951809e-11 ,
1.0 ,
0.999999998725 ,
0.999985835063 ,
0.999405815182 ,
0.983436391519 ,
9.72616712615e-09 ,
1.0 ,
1.0 ,
0.999999999992 ,
0.999999989952 ,
0.999991691776 ,
1.12519146046e-05 ,
        ]
        index = 0
        for k in k_s:
            for m in m_s:
                self.assertFloatEqual(pdtr(k,m), exp[index])
                index += 1
    def test_pdtrc(self):
        """pdtrc should match cephes results"""
        k_s = [0,1,2,5,10]
        m_s = [1e-9, 0.1,0.5,1,2,31]
        exp = [
9.999999995e-10 ,
0.095162581964 ,
0.393469340287 ,
0.632120558829 ,
0.864664716763 ,
1.0 ,
4.99999999667e-19 ,
0.00467884016044 ,
0.090204010431 ,
0.264241117657 ,
0.59399415029 ,
0.999999999999 ,
1.66666666542e-28 ,
0.000154653070265 ,
0.014387677967 ,
0.0803013970714 ,
0.323323583817 ,
0.999999999982 ,
1.3888888877e-57 ,
1.27489869223e-09 ,
1.41649373223e-05 ,
0.000594184817582 ,
0.0165636084806 ,
0.999999990274 ,
2.50521083625e-107 ,
2.28584493079e-19 ,
7.74084073923e-12 ,
1.00477663757e-08 ,
8.30822436848e-06 ,
0.999988748085 ,
        ]
        index = 0
        for k in k_s:
            for m in m_s:
                self.assertFloatEqual(pdtrc(k,m), exp[index])
                index += 1
    def test_fdtr(self):
        """fdtr should match cephes results"""
        a_s = [1, 2, 10, 1000]
        b_s = a_s
        x_s = [0, 0.01, 0.5, 10, 521.4]
        exp = [
0.0,
0.0634510348611,
0.391826552031,
0.805017770958,
0.972137685271,
0.0,
0.0705345615859,
0.4472135955,
0.912870929175,
0.998087586699,
0.0,
0.0776792814356,
0.504352495617,
0.989880440265,
0.999999999415,
0.0,
0.0796356309764,
0.520335137562,
0.998387447605,
1.0,
0.0,
0.00985245702333,
0.292893218813,
0.781782109764,
0.96904781206,
0.0,
0.00990099009901,
0.333333333333,
0.909090909091,
0.99808575804,
0.0,
0.00994027888402,
0.379078676941,
0.995884773663,
0.999999999923,
0.0,
0.00995006724716,
0.393317789705,
0.999949891187,
1.0,
0.0,
1.5895531756e-06,
0.18766987087,
0.758331535711,
0.965930763936,
0.0,
2.44851927021e-07,
0.185934432082,
0.90573080983,
0.998084291751,
0.0,
1.15978163168e-08,
0.144845806026,
0.999428447457,
0.999999999997,
0.0,
2.54720538101e-09,
0.109321108726,
1.0,
1.0,
0.0,
1.66707029586e-22,
0.157610464133,
0.751895627261,
0.965077362955,
0.0,
2.56671102571e-40,
0.135876263477,
0.904846465249,
0.998083928382,
0.0,
5.1131107462e-143,
0.030392376141,
0.999825108037,
0.999999999999,
0.0,
0.0,
9.96002853277e-28,
1.0,
1.0,
        ]
        index = 0
        for a in a_s:
            for b in b_s:
                for x in x_s:
                    self.assertFloatEqual(fdtr(a,b,x), exp[index])
                    index += 1
    def test_fdtrc(self):
        """fdtrc should match cephes results"""
        a_s = [1, 2, 10, 1000]
        b_s = a_s
        x_s = [0, 0.01, 0.5, 10, 521.4]
        exp = [
1.0,
0.936548965139,
0.608173447969,
0.194982229042,
0.0278623147287,
1.0,
0.929465438414,
0.5527864045,
0.0871290708247,
0.00191241330122,
1.0,
0.922320718564,
0.495647504383,
0.0101195597354,
5.85364343244e-10,
1.0,
0.920364369024,
0.479664862438,
0.00161255239482,
3.24963344513e-93,
1.0,
0.990147542977,
0.707106781187,
0.218217890236,
0.0309521879405,
1.0,
0.990099009901,
0.666666666667,
0.0909090909091,
0.00191424196018,
1.0,
0.990059721116,
0.620921323059,
0.00411522633745,
7.73162209771e-11,
1.0,
0.990049932753,
0.606682210295,
5.01088134545e-05,
7.71037335669e-156,
1.0,
0.999998410447,
0.81233012913,
0.241668464289,
0.0340692360638,
1.0,
0.999999755148,
0.814065567918,
0.0942691901701,
0.00191570824928,
1.0,
0.999999988402,
0.855154193974,
0.000571552543402,
3.21796660031e-12,
1.0,
0.999999997453,
0.890678891274,
3.96065609687e-16,
0.0,
1.0,
1.0,
0.842389535866,
0.248104372739,
0.0349226370457,
1.0,
1.0,
0.864123736523,
0.0951535347509,
0.00191607161849,
1.0,
1.0,
0.969607623859,
0.00017489196271,
6.83862415869e-13,
1.0,
1.0,
1.0,
6.68418402018e-243,
0.0,
        ]
        index = 0
        for a in a_s:
            for b in b_s:
                for x in x_s:
                    self.assertFloatEqual(fdtrc(a,b,x), exp[index])
                    index += 1
    def test_gdtr(self):
        """gdtr should match cephes results"""
        a_s = [1, 2, 10, 1000]
        b_s = a_s
        x_s = [0, 0.01, 0.5, 10, 521.4]
        exp = [
0.0,
0.00995016625083,
0.393469340287,
0.99995460007,
1.0,
0.0,
4.96679133403e-05,
0.090204010431,
0.999500600773,
1.0,
0.0,
2.7307942837e-27,
1.70967002935e-10,
0.542070285528,
1.0,
0.0,
0.0,
0.0,
0.0,
2.78154480191e-77,
0.0,
0.0198013266932,
0.632120558829,
0.999999997939,
1.0,
0.0,
0.00019735322711,
0.264241117657,
0.999999956716,
1.0,
0.0,
2.77103020131e-24,
1.11425478339e-07,
0.995004587692,
1.0,
0.0,
0.0,
0.0,
0.0,
0.91070640569,
0.0,
0.095162581964,
0.993262053001,
1.0,
1.0,
0.0,
0.00467884016044,
0.959572318005,
1.0,
1.0,
0.0,
2.51634780677e-17,
0.0318280573062,
1.0,
1.0,
0.0,
0.0,
0.0,
0.0,
1.0,
0.0,
0.99995460007,
1.0,
1.0,
1.0,
0.0,
0.999500600773,
1.0,
1.0,
1.0,
0.0,
0.542070285528,
1.0,
1.0,
1.0,
0.0,
0.0,
3.29827279707e-86,
1.0,
1.0,
        ]
        index = 0
        for a in a_s:
            for b in b_s:
                for x in x_s:
                    self.assertFloatEqual(gdtr(a,b,x), exp[index])
                    index += 1
    def test_gdtrc(self):
        """gdtrc should match cephes results"""
        a_s = [1, 2, 10, 1000]
        b_s = a_s
        x_s = [0, 0.01, 0.5, 10, 521.4]
        exp = [
1.0,
0.990049833749,
0.606530659713,
4.53999297625e-05,
3.62123855523e-227,
1.0,
0.999950332087,
0.909795989569,
0.000499399227387,
1.89173502125e-224,
1.0,
1.0,
0.999999999829,
0.457929714472,
2.89188102723e-208,
1.0,
1.0,
1.0,
1.0,
1.0,
1.0,
0.980198673307,
0.367879441171,
2.06115362244e-09,
0.0,
1.0,
0.999802646773,
0.735758882343,
4.32842260712e-08,
0.0,
1.0,
1.0,
0.999999888575,
0.00499541230831,
0.0,
1.0,
1.0,
1.0,
1.0,
0.0892935943104,
1.0,
0.904837418036,
0.00673794699909,
3.72007597602e-44,
0.0,
1.0,
0.99532115984,
0.0404276819945,
3.75727673578e-42,
0.0,
1.0,
1.0,
0.968171942694,
1.12534739608e-31,
0.0,
1.0,
1.0,
1.0,
1.0,
0.0,
1.0,
4.53999297625e-05,
7.12457640674e-218,
0.0,
0.0,
1.0,
0.000499399227387,
3.56941277978e-215,
0.0,
0.0,
1.0,
0.457929714472,
3.90479663912e-199,
0.0,
0.0,
1.0,
1.0,
1.0,
0.0,
0.0,
        ]
        index = 0
        for a in a_s:
            for b in b_s:
                for x in x_s:
                    self.assertFloatEqual(gdtrc(a,b,x), exp[index])
                    index += 1
    def test_chdtri(self):
        """chdtri should match cephes results"""
        k_s = [1,2,5,10,100]
        p_s = [1e-50, 1e-9, .02, .5, .8, .99]
        exp = [
224.384748319,
37.3248930514,
5.41189443105,
0.45493642312,
0.0641847546673,
0.00015708785791,
230.258509299,
41.4465316739,
7.82404601086,
1.38629436112,
0.446287102628,
0.020100671707,
244.127298027,
50.6921937015,
13.388222599,
4.3514601911,
2.34253430584,
0.554298076728,
262.995620961,
62.9454574206,
21.1607675413,
9.34181776559,
6.17907925604,
2.55821216019,
478.347499744,
209.317598707,
131.141676866,
99.334129236,
87.9453359228,
70.0648949254,
        ]
        index = 0
        for k in k_s:
            for p in p_s:
                self.assertFloatEqual(chdtri(k,p), exp[index])
                index += 1
    def test_stdtri(self):
        """stdtri should match cephes results"""
        k_s = [1,2,5,10,100]
        p_s = [1e-50, 1e-9, .02, .5, .8, .99]
        exp = [
-3.18309886184e+49,
-318309886.184,
-15.8945448441,
8.1775627727e-17,
1.37638192049,
31.8205159538,
-7.07106781216e+24,
-22360.6797414,
-4.84873221444,
7.48293180888e-17,
1.06066017178,
6.96455671876,
-15683925591.1,
-98.9372246484,
-2.75650852191,
6.976003623e-17,
0.919543780236,
3.36492999891,
-256452.571877,
-20.1446977667,
-2.35931462368,
6.80574793291e-17,
0.879057828551,
2.76376945745,
-28.9584072963,
-6.59893982023,
-2.08088390123,
6.6546053747e-17,
0.845230424487,
2.3642173659,
        ]
        index = 0
        for k in k_s:
            for p in p_s:
                self.assertFloatEqual(stdtri(k,p), exp[index])
                index += 1
    def test_pdtri(self):
        """pdtri should match cephes results"""
        k_s = [1,2,5,10,100]
        p_s = [1e-50, 1e-9, .02, .5, .8, .99]
        exp = [
119.924420375,
23.9397278656,
5.83392170192,
1.67834699002,
0.824388309033,
0.148554740253,
124.094307191,
26.6722865587,
7.51660387561,
2.67406031372,
1.53504420264,
0.436045165078,
134.901981814,
33.6746016741,
12.0269783451,
5.67016118871,
3.90366383933,
1.7852844853,
150.2138305,
43.627975401,
18.8297496417,
10.6685224038,
8.15701989758,
4.77124616939,
332.371212972,
173.368244558,
122.695978128,
100.666862949,
92.4593447729,
79.0999186597,
        ]
        index = 0
        for k in k_s:
            for p in p_s:
                self.assertFloatEqual(pdtri(k,p), exp[index])
                index += 1
    def test_bdtri(self):
        """bdtri should match cephes results"""
        k_s = [0,1,2,3]
        n_s = [5,10,1000]
        p_s = [1e-10, .1, .5, .9, .999999]
        exp = [
0.99,
0.36904265552,
0.129449436704,
0.020851637639,
2.00000080006e-07,
0.9,
0.205671765276,
0.0669670084632,
0.0104807417938,
1.00000045003e-07,
0.0227627790442,
0.00229993617745,
0.000692907009547,
0.000105354965434,
1.00000049953e-09,
0.997884361719,
0.58389037462,
0.313810170456,
0.112234958546,
0.000316327821398,
0.939678616058,
0.336847723307,
0.162262728195,
0.0545286199977,
0.00014913049349,
0.0260030189545,
0.0038841043984,
0.00167777786542,
0.000531936197341,
1.415587631e-06,
0.999784533318,
0.753363546712,
0.5,
0.246636453288,
0.00465241636163,
0.964779035441,
0.449603888674,
0.258574723285,
0.11582527803,
0.00203463563411,
0.0287538329681,
0.00531348536403,
0.00267315927217,
0.00110256069953,
1.82723947076e-05,
0.999996837712,
0.887765041454,
0.686189829544,
0.41610962538,
0.0212382182007,
0.981054188003,
0.551730832384,
0.355099967912,
0.187562296647,
0.00839131408953,
0.0312483560212,
0.00666849533707,
0.00367082709364,
0.00174586632568,
7.10965576424e-05,
        ]
        index = 0
        for k in k_s:
            for n in n_s:
                for p in p_s:
                    self.assertFloatEqual(bdtri(k,n,p), exp[index])
                    index += 1
    def test_gdtri(self):
        """gdtri should match cephes results"""
        k_s = [1,2,4,10,100]
        n_s = k_s
        p_s = [1e-9, .02, .5, .8, .99]
        exp = [
1.0000000005e-09,
0.0202027073175,
0.69314718056,
1.60943791243,
4.60517018599,
4.47220262303e-05,
0.214699095008,
1.67834699002,
2.994308347,
6.63835206799,
0.0124777531242,
1.01623845904,
3.67206074885,
5.51504571515,
10.0451175148,
0.602134838869,
4.61834927756,
9.66871461471,
12.5187528198,
18.7831173933,
51.1433022288,
80.5501391278,
99.6668649193,
108.304391619,
124.722561491,
5.0000000025e-10,
0.0101013536588,
0.34657359028,
0.804718956217,
2.30258509299,
2.23610131152e-05,
0.107349547504,
0.839173495008,
1.4971541735,
3.319176034,
0.00623887656209,
0.50811922952,
1.83603037443,
2.75752285758,
5.02255875742,
0.301067419435,
2.30917463878,
4.83435730736,
6.25937640991,
9.39155869666,
25.5716511144,
40.2750695639,
49.8334324597,
54.1521958095,
62.3612807454,
2.50000000125e-10,
0.00505067682938,
0.17328679514,
0.402359478109,
1.1512925465,
1.11805065576e-05,
0.053674773752,
0.419586747504,
0.748577086751,
1.659588017,
0.00311943828105,
0.25405961476,
0.918015187213,
1.37876142879,
2.51127937871,
0.150533709717,
1.15458731939,
2.41717865368,
3.12968820495,
4.69577934833,
12.7858255572,
20.1375347819,
24.9167162298,
27.0760979048,
31.1806403727,
1.0000000005e-10,
0.00202027073175,
0.069314718056,
0.160943791243,
0.460517018599,
4.47220262303e-06,
0.0214699095008,
0.167834699002,
0.2994308347,
0.663835206799,
0.00124777531242,
0.101623845904,
0.367206074885,
0.551504571515,
1.00451175148,
0.0602134838869,
0.461834927756,
0.966871461471,
1.25187528198,
1.87831173933,
5.11433022288,
8.05501391278,
9.96668649193,
10.8304391619,
12.4722561491,
1.0000000005e-11,
0.000202027073175,
0.0069314718056,
0.0160943791243,
0.0460517018599,
4.47220262303e-07,
0.00214699095008,
0.0167834699002,
0.02994308347,
0.0663835206799,
0.000124777531242,
0.0101623845904,
0.0367206074885,
0.0551504571515,
0.100451175148,
0.00602134838869,
0.0461834927756,
0.0966871461471,
0.125187528198,
0.187831173933,
0.511433022288,
0.805501391278,
0.996668649193,
1.08304391619,
1.24722561491,
        ]
        index = 0
        for k in k_s:
            for n in n_s:
                for p in p_s:
                    self.assertFloatEqual(gdtri(k,n,p), exp[index])
                    index += 1
    def test_fdtri(self):
        """fdtri should match cephes results"""
        k_s = [1,2,4,10,100]
        n_s = k_s
        p_s = [1e-50, 1e-9, .02, .5, .8, .99]
        exp = [
0.0,
2.46740096071e-18,
0.000987610197427,
1.0,
9.472135955,
4052.18069548,
0.0,
1.99999988687e-18,
0.000800320128051,
0.666666666667,
3.55555555556,
98.5025125628,
0.0,
1.77777767722e-18,
0.000711321880645,
0.548632170413,
2.35072147881,
21.1976895844,
0.0,
1.65119668161e-18,
0.000660638708985,
0.489736921158,
1.88288794493,
10.0442892734,
0.0,
1.57866975531e-18,
0.000631602221127,
0.458262714634,
1.66429288986,
6.89530103058,
0.0,
9.99999973218e-10,
0.0206164098292,
1.5,
12.0,
4999.5,
0.0,
9.99999972718e-10,
0.0204081632653,
1.0,
4.0,
99.0,
0.0,
9.99999972468e-10,
0.0203050891044,
0.828427124746,
2.472135955,
18.0,
0.0,
9.99999972318e-10,
0.0202435772829,
0.743491774985,
1.89864830731,
7.55943215755,
0.0,
9.99999972228e-10,
0.0202067893611,
0.697973989501,
1.63562099482,
4.82390980716,
0.0,
1.29104998825e-05,
0.0712270257663,
1.82271484235,
13.6443218387,
5624.58332963,
0.0,
1.58118880931e-05,
0.082357834815,
1.20710678119,
4.2360679775,
99.2493718553,
0.0,
1.82578627816e-05,
0.0917479893415,
1.0,
2.48261291932,
15.9770248526,
0.0,
2.04128031324e-05,
0.0999726146531,
0.898817134423,
1.82861100515,
5.99433866163,
0.0,
2.21407117017e-05,
0.106518545067,
0.844891468084,
1.5273126184,
3.5126840636,
0.0,
0.00213897888638,
0.130917099116,
2.04191262042,
14.7718897826,
6055.8467074,
0.0,
0.00322083313175,
0.168531162323,
1.34500479177,
4.38216390487,
99.3991959745,
0.0,
0.00448830777955,
0.207656634378,
1.11257336081,
2.45957986729,
14.5459008033,
0.0,
0.00608578074458,
0.251574092492,
1.0,
1.73159473193,
4.84914680208,
0.0,
0.00800159033308,
0.298648905106,
0.940477156977,
1.38089597558,
2.50331112688,
0.0,
3.09672866088e-11,
0.178906118636,
2.18215440197,
15.4973240414,
6334.110036,
0.0,
5.2776234633e-11,
0.2457526061,
1.43271814572,
4.47142755584,
99.4891628084,
0.0,
0.0659164713677,
0.326585865322,
1.18358397235,
2.43020291912,
13.5769915067,
0.0,
0.119865858243,
0.442669184276,
1.06329004653,
1.63265061785,
4.01371941549,
0.0,
0.289673110482,
0.661509869668,
1.0,
1.1839371445,
1.59766912303,
        ]
        index = 0
        for k in k_s:
            for n in n_s:
                for p in p_s:
                    self.assertFloatEqual(fdtri(k,n,p), exp[index])
                    index += 1
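# Illustrative sketch, not cogent's implementation: the tail conventions that
# the binomial and Poisson expected values above encode, for integer successes
# only. Reading the tables, *_low(k, ...) is P(X <= k), *_high(k, ...) is
# P(X > k) (strictly greater, so low + high == 1), and *_exact is P(X == k).
# The sketch_* names are hypothetical; math.comb needs Python 3.8+, and these
# naive sums underflow for extreme arguments such as poisson_low(180, 1024).
import math

def sketch_binomial_exact(k, n, p):
    """P(X == k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def sketch_binomial_low(k, n, p):
    """P(X <= k): sum of the exact probabilities for 0..k."""
    return sum(sketch_binomial_exact(i, n, p) for i in range(k + 1))

def sketch_binomial_high(k, n, p):
    """P(X > k), the complement of the lower tail."""
    return 1 - sketch_binomial_low(k, n, p)

def sketch_poisson_exact(k, mean):
    """P(X == k) for X ~ Poisson(mean)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def sketch_poisson_low(k, mean):
    """P(X <= k): sum of the exact probabilities for 0..k."""
    return sum(sketch_poisson_exact(i, mean) for i in range(k + 1))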
if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_histogram.py
#!/usr/bin/env python
"""Provides tests for Histogram.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.histogram import Histogram
from cogent.core.location import Span
__author__ = "Sandra Smit"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Sandra Smit"
__email__ = "sandra.smit@colorado.edu"
__status__ = "Production"
class HistogramTests(TestCase):
"""Tests for Histogram class"""
def test_init_no_bins(self):
"""Histogram should raise error if initialized without bins"""
# you deserve an Error if you initialize your histogram
# without providing Bins
self.assertRaises(AttributeError, Histogram)
def test_init_bins(self):
"""Histogram should set _bins property correctly"""
bins = [Span(0,2),Span(2,4),Span(4,6)]
bins_only = Histogram(bins=bins)
self.assertEqual(bins_only._bins, bins)
def test_init_bins_data(self):
"""Histogram should fill bins with data if supplied"""
# most basic histogram, bins and data
data = [1,3,5,'A']
bins = [Span(0,2),Span(2,4),Span(4,6)]
data_and_bins = Histogram(data=data,bins=bins)
self.assertEqual(data_and_bins._bins,bins)
self.assertEqual(data_and_bins._values,[[1],[3],[5]])
self.assertEqual(data_and_bins.Other,['A'])
def test_call(self):
"""Histogram __call__ should update with new data"""
data = [1,3,5,'A']
bins = [Span(0,2),Span(2,4),Span(4,6)]
data_and_bins = Histogram(data=data,bins=bins)
#update the histogram
data_and_bins([4,5,6,7])
self.assertEqual(data_and_bins._values,[[1],[3],[5,4,5]])
self.assertEqual(data_and_bins.Other,['A',6,7])
def test_mapping(self):
"""Histogram Mapping should apply correct function to values"""
# bins, data, mapping
data = ['A','AAA','CCCCC','GGGGGGGGGGGGGG']
bins = [Span(0,2),Span(2,4),Span(4,6)]
mapping = Histogram(data=data,bins=bins,Mapping=len)
self.assertEqual(mapping._values, [['A'],['AAA'],['CCCCC']])
self.assertEqual(mapping.Other,['GGGGGGGGGGGGGG'])
def test_multi(self):
"""Histogram Multi should allow values to match multiple bins"""
#bins, data, multi=True
bins2 = [Span(0,5),Span(3,8),Span(6,10)]
data2 = [0,1,2,3,4,5,6,7,8,9,10]
not_multi = Histogram(data2,bins2)
self.assertEqual(not_multi._values,[[0,1,2,3,4],[5,6,7],[8,9]])
self.assertEqual(not_multi.Other,[10])
multi = Histogram(data2,bins2,Multi=True)
self.assertEqual(multi._values,[[0,1,2,3,4],[3,4,5,6,7],[6,7,8,9]])
self.assertEqual(multi.Other,[10])
def test_toFreqs(self):
"""Histogram toFreqs() should return a Freqs object"""
h = Histogram(range(0,20),bins=[Span(0,3),Span(3,10),
Span(10,18),Span(18,20)])
constructor=str
f = h.toFreqs()
self.assertEqual(f[constructor(Span(0,3))],3)
self.assertEqual(f[constructor(Span(3,10))],7)
self.assertEqual(f[constructor(Span(10,18))],8)
self.assertEqual(f[constructor(Span(18,20))],2)
def test_clear(self):
"""Histogram clear should reset all data"""
data = [1,3,5,'A']
bins = [Span(0,2),Span(2,4),Span(4,6)]
data_and_bins = Histogram(data=data,bins=bins)
self.assertEqual(data_and_bins._bins,bins)
self.assertEqual(data_and_bins._values,[[1],[3],[5]])
self.assertEqual(data_and_bins.Other,['A'])
data_and_bins.clear()
self.assertEqual(data_and_bins._values,[[],[],[]])
self.assertEqual(data_and_bins.Other,[])
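# A minimal sketch, for illustration only, of the binning behaviour the
# Histogram tests above expect. It is NOT cogent's Histogram implementation.
# Assumptions: bins are plain (start, end) tuples treated as half-open
# intervals [start, end), which is consistent with the Span-based expectations
# in these tests. bin_values is a hypothetical helper name.
def bin_values(data, bins, multi=False, mapping=None):
    """Return (list of per-bin value lists, list of unbinned values)."""
    values = [[] for _ in bins]
    other = []
    for item in data:
        # mapping only changes the key used for binning; the item itself is stored
        key = mapping(item) if mapping is not None else item
        placed = False
        for i, (start, end) in enumerate(bins):
            try:
                inside = start <= key < end
            except TypeError:
                # non-comparable data (e.g. 'A' against ints) never fits a bin
                inside = False
            if inside:
                values[i].append(item)
                placed = True
                if not multi:
                    break  # without multi, the first matching bin wins
        if not placed:
            other.append(item)
    return values, other
# e.g. bin_values(range(11), [(0, 5), (3, 8), (6, 10)], multi=True) puts 3 and 4
# in both of the first two bins, mirroring test_multi.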
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_information_criteria.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.information_criteria import aic, bic
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
class InformationCriteria(TestCase):
"""Tests calculation of AIC and BIC measures."""
def test_aic(self):
"""correctly compute AIC from Burnham & Anderson 2002, p102"""
self.assertFloatEqual(aic(-9.7039, 4), 27.4078)
def test_aic_corrected(self):
"""correctly compute AIC corrected for small sample size"""
# from Burnham & Anderson 2002, p102
self.assertFloatEqual(aic(-9.7039, 4, sample_size=13), 32.4078)
def test_bic(self):
"""correctly compute BIC"""
# against hand calculated
self.assertFloatEqual(bic(-9.7039, 4, 13), 29.6675974298)
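# A cross-check sketch of the textbook formulas that the expected values above
# agree with. These are assumed here and were not read from cogent's source.
#   AIC  = -2*lnL + 2*k
#   AICc = AIC + 2*k*(k + 1)/(n - k - 1)   (small-sample correction)
#   BIC  = -2*lnL + k*ln(n)
# aic_sketch and bic_sketch are hypothetical names.
import math

def aic_sketch(lnL, nfp, sample_size=None):
    value = -2 * lnL + 2 * nfp
    if sample_size is not None:
        # small-sample correction term
        value += 2.0 * nfp * (nfp + 1) / (sample_size - nfp - 1)
    return value

def bic_sketch(lnL, nfp, sample_size):
    return -2 * lnL + nfp * math.log(sample_size)
# With lnL=-9.7039 and k=4 these give approximately 27.4078 (AIC),
# 32.4078 (AICc, n=13) and 29.6676 (BIC, n=13).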
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_jackknife.py
import numpy as np
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.jackknife import JackknifeStats
__author__ = "Anuj Pahwa, Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Anuj Pahwa", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "Production"
def pmcc(data, axis=1):
"""Compute the product-moment correlation coefficient and return its
Fisher z-transform (arctanh).
Expression 15.3 from Biometry by Sokal/Rohlf.
This implementation assumes the data provided is two-dimensional,
[[Y1], [Y2]], i.e. we are determining the correlation coefficient
between data sets Y1 and Y2."""
if axis == 0:
data = data.transpose()
axis = 1
other_axis = 0
mean = data.mean(axis=axis)
data_less_mean = np.array([data[0] - mean[0],
data[1] - mean[1]])
sum_squares = np.sum(np.square(data_less_mean), axis=axis)
sum_products = np.sum(np.prod(data_less_mean, axis=other_axis))
pmcc = np.divide(sum_products, np.sqrt(np.prod(sum_squares)))
z_trans = np.arctanh(pmcc)
return z_trans
# test data from Box 15.2; Biometry by Sokal/Rohlf
data = np.array([[159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210],
[14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19, \
15.39, 17.25, 9.52]])
# factory function generator for the statistical function of interest
def stat_maker(func, data, axis):
def calc_stat(coords):
subset_data = data.take(coords, axis)
return func(subset_data, axis)
return calc_stat
# function to compute mean of a np array
def mean(data, axis):
return data.mean(axis=axis)
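# A minimal sketch of the delete-one jackknife using the standard textbook
# definitions. This is an assumption: it is not JackknifeStats' code, and
# jackknife_sketch is a hypothetical name. For n observations and statistic theta:
#   pseudovalue_i   = n*theta_all - (n - 1)*theta_without_i
#   jackknifed stat = mean of the pseudovalues
#   standard error  = sqrt(var(pseudovalues, ddof=1) / n)
# For the mean, the pseudovalues reproduce the data exactly, which is why
# test_tables expects data.transpose() as the vector pseudovalues.
import numpy as np

def jackknife_sketch(values, stat):
    values = np.asarray(values, dtype=float)
    n = len(values)
    theta_all = stat(values)
    # recompute the statistic with each observation left out in turn
    subset_stats = np.array([stat(np.delete(values, i)) for i in range(n)])
    pseudovalues = n * theta_all - (n - 1) * subset_stats
    jk_stat = pseudovalues.mean()
    std_err = np.sqrt(pseudovalues.var(ddof=1) / n)
    return jk_stat, std_err, pseudovalues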
class JackknifeTests(TestCase):
def test_proper_initialise(self):
"""jackknife should initialise correctly"""
# Scalar
pmcc_stat = stat_maker(pmcc, data, 1)
test_knife = JackknifeStats(data.shape[1], pmcc_stat)
self.assertEqual(test_knife.n, data.shape[1])
self.assertEqual(test_knife._jackknifed_stat, None)
# Vector
mean_stat = stat_maker(mean, data, 1)
test_knife = JackknifeStats(data.shape[1], mean_stat)
self.assertEqual(test_knife.n, data.shape[1])
self.assertEqual(test_knife._jackknifed_stat, None)
def test_jackknife_stats(self):
"""jackknife results should match Sokal & Rohlf example"""
# Scalar
pmcc_stat = stat_maker(pmcc, data, 1)
test_knife = JackknifeStats(data.shape[1], pmcc_stat)
self.assertAlmostEqual(test_knife.JackknifedStat, 1.2905845)
self.assertAlmostEqual(test_knife.StandardError, 0.2884490)
self.assertTrue(test_knife._jackknifed_stat is not None)
# Vector
mean_stat = stat_maker(mean, data, 1)
test_knife = JackknifeStats(data.shape[1], mean_stat)
expected_jk_stat = data.mean(axis=1)
got_jk_stat = test_knife.JackknifedStat
expected_standard_err = [30.69509346, 1.87179671]
got_standard_err = test_knife.StandardError
for index in [0,1]:
self.assertAlmostEqual(got_jk_stat[index], expected_jk_stat[index])
self.assertAlmostEqual(got_standard_err[index],
expected_standard_err[index])
def test_tables(self):
"""jackknife should work for calculators that return scalars or vectors"""
# Scalar
pmcc_stat = stat_maker(pmcc, data, 1)
test_knife = JackknifeStats(data.shape[1], pmcc_stat)
expected_subsample_stats = [1.4151, 1.3946, 1.4314, 1.1889, 1.1323, \
1.3083, 1.3561, 1.3453, 1.2412, 1.3216, \
1.2871, 1.3664]
expected_pseudovalues = [0.1968, 0.4224, 0.0176, 2.6852, 3.3084, \
1.3718, 0.8461, 0.9650, 2.1103, 1.2253, \
1.6049, 0.7333]
test_knife.jackknife()
got_subsample_stats = test_knife._subset_statistics
got_pseudovalues = test_knife._pseudovalues
for index in range(data.shape[1]):
self.assertAlmostEqual(got_subsample_stats[index],
expected_subsample_stats[index], places=4)
self.assertAlmostEqual(got_pseudovalues[index],
expected_pseudovalues[index], places=4)
# Vector
mean_stat = stat_maker(mean, data, 1)
test_knife = JackknifeStats(data.shape[1], mean_stat)
test_knife.jackknife()
expected_pseudovalues = data.transpose()
expected_subsample_stats = [[ 198.9091, 11.8336],
[ 197.0909, 11.7609],
[ 204.2727, 12.1155],
[ 209.2727, 12.9155],
[ 178.4545, 11.0791],
[ 192.4545, 11.7882],
[ 204.2727, 13.0145],
[ 184.2727, 11.7055],
[ 206.0909, 12.7618],
[ 193.3636, 11.7436],
[ 184.2727, 11.5745],
[ 194.2727, 12.2773]]
got_subsample_stats = test_knife._subset_statistics
got_pseudovalues = test_knife._pseudovalues
for index1 in range(data.shape[1]):
for index2 in range(data.shape[0]):
self.assertAlmostEqual(got_subsample_stats[index1][index2],
expected_subsample_stats[index1][index2],
places=4)
self.assertAlmostEqual(got_pseudovalues[index1][index2],
expected_pseudovalues[index1][index2],
places=4)
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_ks.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.ks import pkolmogorov1x, pkolmogorov2x, pkstwo,\
psmirnov2x
from cogent.maths.stats.test import ks_test, ks_boot
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
class KSTests(TestCase):
"""Tests Kolmogorov-Smirnov."""
def setUp(self):
self.x1 = [0.09916191, 0.29732882, 0.41475044, 0.68816838, 0.20841367,
0.46129887, 0.22074544, 0.06889561, 0.88264852, 0.87726406, 0.76905072,
0.86178033, 0.42596777, 0.59443782, 0.68852176, 0.66032130, 0.72683791,
0.02363118, 0.82384762, 0.32759965, 0.69231127, 0.50848596, 0.67500888,
0.84919139, 0.70774136, 0.97847465, 0.59784714, 0.82033663, 0.45640039,
0.13054766, 0.01227875, 0.21229238, 0.37054602, 0.80905622, 0.26056527,
0.01662457, 0.76277188, 0.76892495, 0.39186350, 0.61468789, 0.83247770,
0.69946238, 0.80550609, 0.22336814, 0.62491296, 0.03413056, 0.74500251,
0.36008309, 0.19443889, 0.06808133]
self.x2 = [1.1177760, 0.9984325, 0.8113576, 0.7247507, 0.9473543, 1.1192222,
1.2577115, 0.6168244, 0.9616475, 1.0677138, 0.5106196, 1.2334833,
0.3750225, 0.9788191, 1.1366872, 0.8212352, 0.7665240, 0.4409294,
0.4447418, 1.1381901, 0.7299300, 1.1307991, 0.5356031, 0.3193794,
1.2476867, 0.7909454, 0.7781800, 0.8438637, 1.1814135, 1.0117055,
0.7433708, 0.7917239, 0.5080752, 0.9014003, 0.5960710, 0.9646521,
0.9263595, 0.7969784, 1.2847108, 0.6393015, 0.6828791, 1.0817340,
0.6586887, 0.7314203, 0.3998812, 0.9988478, 1.0225579, 1.2721428,
0.6465969, 0.9133413]
def test_pk1x(self):
"""1 sample 1-sided should match answers from R"""
self.assertFloatEqual(pkolmogorov1x(0.06, 30), 0.2248113)
def test_pk2x(self):
"""1 sample 2-sided should match answers from R"""
self.assertFloatEqual(pkolmogorov2x(0.7199, 50), (1-6.661e-16))
self.assertFloatEqual(pkolmogorov2x(0.08, 30), 0.01754027)
self.assertFloatEqual(pkolmogorov2x(0.03, 300), 0.05753413)
def test_ps2x(self):
"""2 sample 2-sided smirnov should match answers from R"""
self.assertFloatEqual(psmirnov2x(0.48, 20, 50), 0.9982277)
self.assertFloatEqual(psmirnov2x(0.28, 20, 50), 0.8161612)
self.assertFloatEqual(psmirnov2x(0.28, 50, 20), 0.8161612)
def tes_pk2x(self):
"""2 sample 2-sided kolmogorov should match answers from R"""
self.assertFloatEqual(pkolmogorov1x(0.058, 50), 0.007530237)
self.assertFloatEqual(pkolmogorov1x(0.018, 50), 4.887356e-26)
self.assertFloatEqual(pkolmogorov1x(0.018, 5000), 0.922618)
def test_pkstwo(self):
"""kolmogorov asymptotic should match answers from R"""
self.assertFloatEqual(pkstwo(2.3),[1-5.084e-05],eps=1e-5)
def test_ks2x(self):
"""KS two-sample, 2-sided should match answers from R"""
D, Pval = ks_test(self.x1, self.x2)
self.assertFloatEqual((D, Pval), (0.46, 3.801e-05), eps=1e-4)
D, Pval = ks_test(self.x1, self.x2, exact=False)
self.assertFloatEqual((D, Pval), (0.46, 5.084e-05), eps=1e-4)
D, Pval = ks_test(self.x1, self.x2[:20])
self.assertFloatEqual((D,Pval), (0.53, 0.0003576), eps=1e-4)
D, Pval = ks_test(self.x2[:20], self.x1)
self.assertFloatEqual((D,Pval), (0.53, 0.0003576), eps=1e-4)
D, Pval = ks_test(self.x1[:20], self.x2)
self.assertFloatEqual((D,Pval), (0.48, 0.001772), eps=1e-4)
D, Pval = ks_test(self.x1, self.x2, alt="greater")
self.assertFloatEqual((D,Pval), (0.46, 2.542e-05), eps=1e-4)
D, Pval = ks_test(self.x1, self.x2, alt="g")
self.assertFloatEqual((D,Pval), (0.46, 2.542e-05), eps=1e-4)
D, Pval = ks_test(self.x1, self.x2, alt="less")
self.assertFloatEqual((D,Pval), (6.9388939039072284e-18, 1.), eps=1e-4)
D, Pval = ks_test(self.x2, self.x1, alt="l")
self.assertFloatEqual((D,Pval), (0.46, 2.542e-05), eps=1e-4)
def test_ks_boot(self):
"""exercising the bootstrapped version of KS"""
D, Pval = ks_boot(self.x1[:10], self.x2[:10], num_reps=10)
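# A minimal sketch of the two-sample, two-sided Kolmogorov-Smirnov statistic:
# D = max |F1(x) - F2(x)| over the pooled observations, where F1 and F2 are
# the empirical CDFs. ks_statistic_sketch is a hypothetical name. It is not
# cogent's ks_test and computes D only, without a p-value.
import bisect

def ks_statistic_sketch(x, y):
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for value in xs + ys:
        # empirical CDF: fraction of each sample <= value
        f1 = bisect.bisect_right(xs, value) / float(len(xs))
        f2 = bisect.bisect_right(ys, value) / float(len(ys))
        d = max(d, abs(f1 - f2))
    return d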
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_period.py
import numpy
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.period import chi_square, factorial, g_statistic, \
circular_indices, _seq_to_symbols, seq_to_symbols, blockwise_bootstrap, \
SeqToSymbols
from cogent.maths.period import ipdft, hybrid, auto_corr, Hybrid, Ipdft, \
AutoCorrelation
__author__ = "Hua Ying, Julien Epps and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Julien Epps", "Hua Ying", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "Production"
class TestPeriodStat(TestCase):
def setUp(self):
x = [1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1,
0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0,
1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1,
1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1,
1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1,
0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1,
0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 1, 1, 0, 0, 0]
self.x = numpy.array(x)
self.sig = numpy.array(self.x, numpy.float64)
self.motifs = ['AA', 'TT', 'TA']
def test_chi_square(self):
D, cs_p_val = chi_square(self.x, 10)
self.assertEqual('%.4f'%D, '0.4786')
self.assertEqual('%.4f'%cs_p_val, '0.4891')
def test_factorial(self):
self.assertEqual(factorial(1), 1)
self.assertEqual(factorial(4), 24)
self.assertEqual(factorial(0), 1)
def test_g_statistic(self):
"""calc g-stat correctly"""
X, periods = ipdft(self.sig, llim=2, ulim=39)
g_obs, p_val = g_statistic(X)
self.assertFloatEqual(p_val, 0.9997, eps=1e-3)
self.assertFloatEqual(g_obs, 0.0577, eps=1e-3)
def test_circular_indices(self):
v = range(10)
self.assertEqual(circular_indices(v, 8, 10, 4), [8,9,0,1])
self.assertEqual(circular_indices(v, 9, 10, 4), [9,0,1,2])
self.assertEqual(circular_indices(v, 4, 10, 4), [4,5,6,7])
def test_seq_to_symbol(self):
"""both py and pyx seq_to_symbol versions correctly convert a sequence"""
motifs = ['AA', 'AT', 'TT']
symbols = _seq_to_symbols('AATGGTTA', motifs, 2)
self.assertEqual(symbols, numpy.array([1,1,0,0,0,1,0,0]))
symbols = seq_to_symbols('AAGATT', motifs, 2, numpy.zeros(6, numpy.uint8))
self.assertEqual(symbols, numpy.array([1,0,0,1,1,0]))
def test_seq_to_symbol_factory(self):
"""checks factory function for conversion works"""
motifs = ['AA', 'AT', 'TT']
seq_to_symbols = SeqToSymbols(motifs)
self.assertEqual(seq_to_symbols('AATGGTTA'),
numpy.array([1,1,0,0,0,1,0,0]))
self.assertEqual(seq_to_symbols('AAGATT'),
numpy.array([1,0, 0, 1, 1, 0], numpy.uint8))
def test_permutation(self):
s = 'ATCGTTGGGACCGGTTCAAGTTTTGGAACTCGCAAGGGGTGAATGGTCTTCGTCTAACGCTGG'\
'GGAACCCTGAATCGTTGTAACGCTGGGGTCTTTAACCGTTCTAATTTAACGCTGGGGGGTTCT'\
'AATTTTTAACCGCGGAATTGCGTC'
seq_to_symbol = SeqToSymbols(self.motifs, length=len(s))
hybrid_calc = Hybrid(len(s), llim=2, period = 4)
ipdft_calc = Ipdft(len(s), llim=2, period = 4)
stat, p = blockwise_bootstrap(s, hybrid_calc, block_size=10,
num_reps=1000, seq_to_symbols=seq_to_symbol)
# print 's=%.4f; p=%.3f' % (stat, p)
stat, p = blockwise_bootstrap(s, ipdft_calc, block_size=10,
num_reps=1000, seq_to_symbols=seq_to_symbol)
# print 's=%.4f; p=%.3f' % (stat, p)
def test_permutation_all(self):
"""performs permutation test of Hybrid, but considers all stats"""
s = 'ATCGTTGGGACCGGTTCAAGTTTTGGAACTCGCAAGGGGTGAATGGTCTTCGTCTAACGCTGG'\
'GGAACCCTGAATCGTTGTAACGCTGGGGTCTTTAACCGTTCTAATTTAACGCTGGGGGGTTCT'\
'AATTTTTAACCGCGGAATTGCGTC'
seq_to_symbol = SeqToSymbols(self.motifs, length=len(s))
hybrid_calc = Hybrid(len(s), period = 4, return_all=True)
stat, p = blockwise_bootstrap(s, hybrid_calc, block_size=10,
num_reps=1000, seq_to_symbols=seq_to_symbol)
# print 's=%s; p=%s' % (stat, p)
def test_get_num_stats(self):
"""calculators should return correct num stats"""
hybrid_calc = Hybrid(150, llim=2, period = 4)
ipdft_calc = Ipdft(150, llim=2, period = 4)
autocorr_calc = AutoCorrelation(150, llim=2, period = 4)
self.assertEqual(hybrid_calc.getNumStats(), 1)
self.assertEqual(ipdft_calc.getNumStats(), 1)
self.assertEqual(autocorr_calc.getNumStats(), 1)
hybrid_calc = Hybrid(150, llim=2, period = 4, return_all=True)
self.assertEqual(hybrid_calc.getNumStats(), 3)
def test_permutation_skips(self):
"""permutation test correctly handles data without symbols"""
s = 'N' * 150
seq_to_symbol = SeqToSymbols(self.motifs, length=len(s))
ipdft_calc = Ipdft(len(s), llim=2, period = 4)
stat, p = blockwise_bootstrap(s, ipdft_calc, block_size=10,
num_reps=1000, seq_to_symbols=seq_to_symbol, num_stats=1)
self.assertEqual(stat, 0.0)
self.assertEqual(p, 1.0)
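# Minimal sketches of the behaviour exercised by test_circular_indices and
# test_seq_to_symbol above. The names are hypothetical, and these are not
# cogent's implementations.
import numpy

def circular_indices_sketch(start, length, num):
    # num consecutive indices beginning at start, wrapping around at length
    return [(start + i) % length for i in range(num)]

def seq_to_symbols_sketch(seq, motifs, motif_length):
    # 1 wherever one of the motifs begins at that position, else 0; positions
    # too close to the end to hold a full motif stay 0
    symbols = numpy.zeros(len(seq), numpy.uint8)
    for i in range(len(seq) - motif_length + 1):
        if seq[i:i + motif_length] in motifs:
            symbols[i] = 1
    return symbols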
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_rarefaction.py
#!/usr/bin/env python
#file test_rarefaction.py
from numpy import array
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.rarefaction import (subsample,
naive_histogram,
wrap_numpy_histogram,
rarefaction,
subsample_freq_dist_nonzero,
subsample_random,
subsample_multinomial)
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class TopLevelTests(TestCase):
"""Tests of top-level functions"""
def test_subsample(self):
"""subsample should return a random subsample of a vector"""
a = array([0,5,0])
self.assertEqual(subsample(a,5), array([0,5,0]))
self.assertEqual(subsample(a,2), array([0,2,0]))
# selecting 2 counts from the vector 1000 times yields each of the
# two possible results at least once
b = array([2,0,1])
actual = {}
for i in range(1000):
e = subsample(b,2)
actual[tuple(e)] = None
self.assertEqual(actual, {(1,0,1):None,(2,0,0):None})
obs = subsample(b,2)
assert (obs == array([1,0,1])).all() or (obs == array([2,0,0])).all()
def test_subsample_freq_dist_nonzero(self):
"""subsample_freq_dist_nonzero should return a random subsample of a vector
"""
a = array([0,5,0])
self.assertEqual(subsample_freq_dist_nonzero(a,5), array([0,5,0]))
self.assertEqual(subsample_freq_dist_nonzero(a,2), array([0,2,0]))
# selecting 35 counts from the vector 100 times yields at least
# two different results
b = array([2,0,1,2,1,8,6,0,3,3,5,0,0,0,5])
actual = {}
for i in range(100):
e = subsample_freq_dist_nonzero(b,35)
self.assertEqual(e.sum(),35)
actual[tuple(e)] = None
self.assertTrue(len(actual) > 1)
# selecting 2 counts from the vector 1000 times yields each of the
# two possible results at least once (note that an issue with an
# initial buggy version of subsample_freq_dist_nonzero was detected with
# this test, so don't remove it)
b = array([2,0,1])
actual = {}
for i in range(1000):
e = subsample_freq_dist_nonzero(b,2)
actual[tuple(e)] = None
self.assertTrue(e.sum() == 2)
self.assertEqual(actual, {(1,0,1):None,(2,0,0):None})
def test_subsample_random(self):
"""subsample_random should return a random subsample of a vector
"""
a = array([0,5,0])
self.assertEqual(subsample_random(a,5), array([0,5,0]))
self.assertEqual(subsample_random(a,2), array([0,2,0]))
# selecting 35 counts from the vector 100 times yields at least
# two different results
b = array([2,0,1,2,1,8,6,0,3,3,5,0,0,0,5])
actual = {}
for i in range(100):
e = subsample_random(b,35)
self.assertEqual(e.sum(),35)
actual[tuple(e)] = None
self.assertTrue(len(actual) > 1)
# selecting 2 counts from the vector 1000 times yields each of the
# two possible results at least once
b = array([2,0,1])
actual = {}
for i in range(1000):
e = subsample_random(b,2)
actual[tuple(e)] = None
self.assertTrue(e.sum() == 2)
self.assertEqual(actual, {(1,0,1):None,(2,0,0):None})
def test_subsample_multinomial(self):
"""subsample_multinomial should return a random subsample of a vector
"""
# selecting 35 counts from the vector 100 times yields at least
# two different results
actual = {}
for i in range(100):
b = array([2,0,1,2,1,8,6,0,3,3,5,0,0,0,5])
e = subsample_multinomial(b,35)
self.assertEqual(e.sum(),35)
actual[tuple(e)] = None
self.assertTrue(len(actual) > 1)
def test_naive_histogram(self):
"""naive_histogram should produce expected result"""
vals = array([1,0,0,3])
self.assertEqual(naive_histogram(vals), array([2,1,0,1]))
self.assertEqual(naive_histogram(vals, 4), array([2,1,0,1,0]))
def test_wrap_numpy_histogram(self):
"""wrap_numpy_histogram should provide expected result"""
vals = array([1,0,0,3])
h_f = wrap_numpy_histogram(3)
self.assertEqual(h_f(vals), array([2,1,0,1]))
h_f = wrap_numpy_histogram(4)
self.assertEqual(h_f(vals, 4), array([2,1,0,1,0]))
def test_rarefaction(self):
"""rarefaction should produce expected curve"""
vals = array([5,0,0,3,0,10], dtype=int)
res = [r.copy() for r in rarefaction(vals, stride=1)]
self.assertEqual(len(res), 18)
for i, r in enumerate(res):
self.assertEqual(r.sum(), i+1)
#make sure we didn't add any bad counts
for pos in [1,2,4]:
self.assertEqual(r[pos], 0)
#when we get to end should recapture orig vals
self.assertEqual(r, vals)
res = [r.copy() for r in rarefaction(vals, stride=3)]
self.assertEqual(len(res), 6)
for i, r in enumerate(res):
self.assertEqual(r.sum(), 3*(i+1))
#make sure we didn't add any bad counts
for pos in [1,2,4]:
self.assertEqual(r[pos], 0)
#when we get to end should recapture orig vals
self.assertEqual(r, vals)
#repeat everything above using alt. input format
orig_vals = vals.copy()
vals = array([0,0,0,0,0,3,3,3,5,5,5,5,5,5,5,5,5,5], dtype=int)
res = [r.copy() for r in rarefaction(vals, stride=1, is_counts=False)]
self.assertEqual(len(res), 18)
for i, r in enumerate(res):
self.assertEqual(r.sum(), i+1)
#make sure we didn't add any bad counts
for pos in [1,2,4]:
self.assertEqual(r[pos], 0)
#when we get to end should recapture orig vals
self.assertEqual(r, orig_vals)
res = [r.copy() for r in rarefaction(vals, stride=3, is_counts=False)]
self.assertEqual(len(res), 6)
for i, r in enumerate(res):
self.assertEqual(r.sum(), 3*(i+1))
#make sure we didn't add any bad counts
for pos in [1,2,4]:
self.assertEqual(r[pos], 0)
#when we get to end should recapture orig vals
self.assertEqual(r, orig_vals)
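# A minimal sketch of subsampling a count vector without replacement. This is
# the behaviour test_subsample expects: drawing 2 from [2,0,1] gives [1,0,1]
# or [2,0,0]. subsample_sketch is a hypothetical name, not cogent's subsample,
# and it assumes numpy's random Generator API (numpy >= 1.17). The same idea,
# applied with a growing draw size, traces out a rarefaction curve.
import numpy as np

def subsample_sketch(counts, n, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=int)
    # expand counts into one entry per individual, e.g. [2,0,1] -> [0,0,2]
    individuals = np.repeat(np.arange(len(counts)), counts)
    chosen = rng.choice(individuals, size=n, replace=False)
    # re-count how many drawn individuals fall in each category
    return np.bincount(chosen, minlength=len(counts))
# np.bincount(array([1,0,0,3]), minlength=5) gives [2,1,0,1,0], the same
# counts test_naive_histogram expects from naive_histogram(vals, 4).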
if __name__ =='__main__':
main()
PyCogent-1.5.3/tests/test_maths/test_stats/test_special.py
#!/usr/bin/env python
"""Unit tests for special functions used in statistics.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.special import permutations, permutations_exact, \
ln_permutations, combinations, combinations_exact, \
ln_combinations, ln_binomial, log_one_minus, one_minus_exp, igami,\
ndtri, incbi, log1p
import math
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
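# A minimal sketch of how permutations, combinations and ln_binomial can
# accept float arguments, assuming the standard gamma-function extensions:
#   ln P(n, k)        = lnGamma(n + 1) - lnGamma(n - k + 1)
#   ln C(n, k)        = ln P(n, k) - lnGamma(k + 1)
#   ln Binom(k; n, p) = ln C(n, k) + k*ln(p) + (n - k)*ln(1 - p)
# The *_sketch names are hypothetical. cogent's own functions may differ in
# details such as error handling for invalid input.
from math import lgamma, log

def ln_permutations_sketch(n, k):
    return lgamma(n + 1) - lgamma(n - k + 1)

def ln_combinations_sketch(n, k):
    return ln_permutations_sketch(n, k) - lgamma(k + 1)

def ln_binomial_sketch(successes, trials, prob):
    # log of the binomial probability mass, valid for non-integer arguments too
    return (ln_combinations_sketch(trials, successes)
            + successes * log(prob) + (trials - successes) * log(1 - prob))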
class SpecialTests(TestCase):
"""Tests miscellaneous functions."""
def test_permutations(self):
"""permutations should return expected results"""
self.assertEqual(permutations(1,1), 1)
self.assertEqual(permutations(2,1), 2)
self.assertEqual(permutations(3,1), 3)
self.assertEqual(permutations(4,1), 4)
self.assertEqual(permutations(4,2), 12)
self.assertEqual(permutations(4,3), 24)
self.assertEqual(permutations(4,4), 24)
self.assertFloatEqual(permutations(300, 100), 3.8807387193009318e+239)
def test_permutations_errors(self):
"""permutations should raise errors on invalid input"""
self.assertRaises(IndexError,permutations,10,50)
self.assertRaises(IndexError,permutations,-1,50)
self.assertRaises(IndexError,permutations,10,-5)
def test_permutations_float(self):
"""permutations should use gamma function when floats as input"""
self.assertFloatEqual(permutations(1.0,1), 1)
self.assertFloatEqual(permutations(2,1.0), 2)
self.assertFloatEqual(permutations(3.0,1.0), 3)
self.assertFloatEqual(permutations(4.0,1), 4)
self.assertFloatEqual(permutations(4.0,2.0), 12)
self.assertFloatEqual(permutations(4.0,3.0), 24)
self.assertFloatEqual(permutations(4,4.0), 24)
self.assertFloatEqual(permutations(300, 100), 3.8807387193009318e+239)
def test_permutations_range(self):
"""permutations should increase gradually with increasing k"""
start = 5 #permutations(10,5) = 30240
end = 6 #permutations(10,6) = 151200
step = 0.1
lower_lim = 30240
upper_lim = 151200
previous_value = 30239.9999
while start <= end:
obs = permutations(10,start)
assert lower_lim <= obs <= upper_lim
assert obs > previous_value
previous_value = obs
start += step
def test_permutations_exact(self):
"""permutations_exact should return expected results"""
self.assertFloatEqual(permutations_exact(1,1), 1)
self.assertFloatEqual(permutations_exact(2,1), 2)
self.assertFloatEqual(permutations_exact(3,1), 3)
self.assertFloatEqual(permutations_exact(4,1), 4)
self.assertFloatEqual(permutations_exact(4,2), 12)
self.assertFloatEqual(permutations_exact(4,3), 24)
self.assertFloatEqual(permutations_exact(4,4), 24)
self.assertFloatEqual(permutations_exact(300,100),\
3.8807387193009318e239)
def test_ln_permutations(self):
"""ln_permutations should return expected results"""
self.assertFloatEqual(ln_permutations(1,1), math.log(1))
self.assertFloatEqual(ln_permutations(2,1), math.log(2))
self.assertFloatEqual(ln_permutations(3,1.0), math.log(3))
self.assertFloatEqual(ln_permutations(4,1), math.log(4))
self.assertFloatEqual(ln_permutations(4.0,2), math.log(12))
self.assertFloatEqual(ln_permutations(4,3.0), math.log(24))
self.assertFloatEqual(ln_permutations(4,4), math.log(24))
self.assertFloatEqual(ln_permutations(300.0,100),\
math.log(3.8807387193009318e239))
def test_combinations(self):
"""combinations should return expected results when int as input"""
self.assertEqual(combinations(1,1), 1)
self.assertEqual(combinations(2,1), 2)
self.assertEqual(combinations(3,1), 3)
self.assertEqual(combinations(4,1), 4)
self.assertEqual(combinations(4,2), 6)
self.assertEqual(combinations(4,3), 4)
self.assertEqual(combinations(4,4), 1)
self.assertEqual(combinations(20,4), 19*17*15)
self.assertFloatEqual(combinations(300, 100), 4.1582514632578812e+81)
def test_combinations_errors(self):
"""combinations should raise errors on invalid input"""
self.assertRaises(IndexError,combinations,10,50)
self.assertRaises(IndexError,combinations,-1,50)
self.assertRaises(IndexError,combinations,10,-5)
def test_combinations_float(self):
"""combinations should use gamma function when floats as input"""
self.assertFloatEqual(combinations(1.0,1.0), 1)
self.assertFloatEqual(combinations(2.0,1.0), 2)
self.assertFloatEqual(combinations(3.0,1.0), 3)
self.assertFloatEqual(combinations(4.0,1.0), 4)
self.assertFloatEqual(combinations(4.0,2), 6)
self.assertFloatEqual(combinations(4,3.0), 4)
self.assertFloatEqual(combinations(4.0,4.0), 1)
self.assertFloatEqual(combinations(20.0,4.0), 19*17*15)
self.assertFloatEqual(combinations(300,100.0),4.1582514632578812e81)
def test_combinations_range(self):
"""combinations should decrease gradually with increasing k"""
start = 5 #combinations(10,5) = 252
end = 6 #combinations(10,6) = 210
step = 0.1
lower_lim = 210
upper_lim = 252
previous_value = 252.00001
while start <= end:
obs = combinations(10,start)
assert lower_lim <= obs <= upper_lim
assert obs < previous_value
previous_value = obs
start += step
def test_combinations_exact(self):
"""combinations_exact should return expected results"""
self.assertEqual(combinations_exact(1,1), 1)
self.assertEqual(combinations_exact(2,1), 2)
self.assertEqual(combinations_exact(3,1), 3)
self.assertEqual(combinations_exact(4,1), 4)
self.assertEqual(combinations_exact(4,2), 6)
self.assertEqual(combinations_exact(4,3), 4)
self.assertEqual(combinations_exact(4,4), 1)
self.assertEqual(combinations_exact(20,4), 19*17*15)
self.assertFloatEqual(combinations_exact(300,100),4.1582514632578812e81)
def test_ln_combinations(self):
"""ln_combinations should return expected results"""
self.assertFloatEqual(ln_combinations(1,1), math.log(1))
self.assertFloatEqual(ln_combinations(2,1), math.log(2))
self.assertFloatEqual(ln_combinations(3,1), math.log(3))
self.assertFloatEqual(ln_combinations(4.0,1), math.log(4))
self.assertFloatEqual(ln_combinations(4,2.0), math.log(6))
self.assertFloatEqual(ln_combinations(4,3), math.log(4))
self.assertFloatEqual(ln_combinations(4,4.0), math.log(1))
self.assertFloatEqual(ln_combinations(20,4), math.log(19*17*15))
self.assertFloatEqual(ln_combinations(300,100),\
math.log(4.1582514632578812e+81))
def test_ln_binomial_integer(self):
"""ln_binomial should match R results for integer values"""
expected = {
(10,60,0.1): -3.247883,
(1, 1, 0.5): math.log(0.5),
(1, 1, 0.0000001): math.log(1e-07),
(1, 1, 0.9999999): math.log(0.9999999),
(3, 5, 0.75): math.log(0.2636719),
(0, 60, 0.5): math.log(8.673617e-19),
(129, 130, 0.5): math.log(9.550892e-38),
(299, 300, 0.099): math.log(1.338965e-298),
(9, 27, 0.0003): math.log(9.175389e-26),
(1032, 2050, 0.5): math.log(0.01679804),
}
for (key, value) in expected.items():
self.assertFloatEqualRel(ln_binomial(*key), value, 1e-4)
def test_ln_binomial_floats(self):
"""ln_binomial with float inputs should lie between R results for the neighbouring integer inputs"""
expected = {
(18.3, 100, 0.2): (math.log(0.09089812), math.log(0.09807429)),
(2.7,1050,0.006): (math.log(0.03615498), math.log(0.07623827)),
(2.7,1050,0.06): (math.log(1.365299e-25), math.log(3.044327e-24)),
(2,100.5,0.6): (math.log(7.303533e-37), math.log(1.789727e-36)),
(0.2, 60, 0.5): (math.log(8.673617e-19), math.log(5.20417e-17)),
(.5,5,.3):(math.log(0.16807),math.log(0.36015)),
(10,100.5,.5):(math.log(7.578011e-18),math.log(1.365543e-17)),
}
for (key, value) in expected.items():
min_val, max_val = value
assert min_val < ln_binomial(*key) < max_val
#self.assertFloatEqualRel(binomial_exact(*key), value, 1e-4)
def test_ln_binomial_range(self):
"""ln_binomial should increase in a monotonically increasing region.
"""
start=0
end=1
step = 0.1
lower_lim = -1.783375-1e-4
upper_lim = -1.021235+1e-4
previous_value = -1.784
while start <= end:
obs = ln_binomial(start,5,.3)
assert lower_lim <= obs <= upper_lim
assert obs > previous_value
previous_value = obs
start += step
def test_log_one_minus_large(self):
"""log_one_minus_x should return math.log(1-x) if x is large"""
self.assertFloatEqual(log_one_minus(0.2), math.log(1-0.2))
def test_log_one_minus_small(self):
"""log_one_minus_x should return -x if x is small"""
self.assertFloatEqualRel(log_one_minus(1e-30), -1e-30)
def test_one_minus_exp_large(self):
"""one_minus_exp_x should return 1 - math.exp(x) if x is large"""
self.assertFloatEqual(one_minus_exp(0.2), 1-(math.exp(0.2)))
def test_one_minus_exp_small(self):
"""one_minus_exp_x should return -x if x is small"""
self.assertFloatEqual(one_minus_exp(1e-30), -1e-30)
def test_log1p(self):
"""log1p should give same results as cephes"""
p_s = [1e-10, 1e-5, 0.1, 0.8, 0.9, 0.95, 0.999, 0.9999999, 1, \
1.000000001, 1.01, 2]
exp = [
9.9999999995e-11,
9.99995000033e-06,
0.0953101798043,
0.587786664902,
0.641853886172,
0.667829372576,
0.692647055518,
0.69314713056,
0.69314718056,
0.69314718106,
0.698134722071,
1.09861228867,]
for p, e in zip(p_s, exp):
self.assertFloatEqual(log1p(p), e)
def test_igami(self):
"""igami should give same result as cephes implementation"""
a_vals = [1e-10, 1e-5, 0.5, 1, 10, 200]
y_vals = range(0,10,2)
obs = [igami(a, y/10.0) for a in a_vals for y in y_vals]
exp=[1.79769313486e+308,
0.0,
0.0,
0.0,
0.0,
1.79769313486e+308,
0.0,
0.0,
0.0,
0.0,
1.79769313486e+308,
0.821187207575,
0.3541631504,
0.137497948864,
0.0320923773337,
1.79769313486e+308,
1.60943791243,
0.916290731874,
0.510825623766,
0.223143551314,
1.79769313486e+308,
12.5187528198,
10.4756841889,
8.9044147366,
7.28921960854,
1.79769313486e+308,
211.794753362,
203.267574402,
196.108740945,
188.010915412,
]
for o, e in zip(obs, exp):
self.assertFloatEqual(o,e)
def test_ndtri(self):
"""ndtri should give same result as implementation in cephes"""
exp=[-1.79769313486e+308,
-2.32634787404,
-2.05374891063,
-1.88079360815,
-1.75068607125,
-1.64485362695,
-1.5547735946,
-1.47579102818,
-1.40507156031,
-1.34075503369,
-1.28155156554,
-1.22652812004,
-1.17498679207,
-1.12639112904,
-1.08031934081,
-1.03643338949,
-0.99445788321,
-0.954165253146,
-0.915365087843,
-0.877896295051,
-0.841621233573,
-0.806421247018,
-0.772193214189,
-0.738846849185,
-0.70630256284,
-0.674489750196,
-0.643345405393,
-0.612812991017,
-0.582841507271,
-0.553384719556,
-0.524400512708,
-0.495850347347,
-0.467698799115,
-0.439913165673,
-0.412463129441,
-0.385320466408,
-0.358458793251,
-0.331853346437,
-0.305480788099,
-0.279319034447,
-0.253347103136,
-0.227544976641,
-0.201893479142,
-0.176374164781,
-0.150969215497,
-0.125661346855,
-0.100433720511,
-0.0752698620998,
-0.0501535834647,
-0.0250689082587,
0.0,
0.0250689082587,
0.0501535834647,
0.0752698620998,
0.100433720511,
0.125661346855,
0.150969215497,
0.176374164781,
0.201893479142,
0.227544976641,
0.253347103136,
0.279319034447,
0.305480788099,
0.331853346437,
0.358458793251,
0.385320466408,
0.412463129441,
0.439913165673,
0.467698799115,
0.495850347347,
0.524400512708,
0.553384719556,
0.582841507271,
0.612812991017,
0.643345405393,
0.674489750196,
0.70630256284,
0.738846849185,
0.772193214189,
0.806421247018,
0.841621233573,
0.877896295051,
0.915365087843,
0.954165253146,
0.99445788321,
1.03643338949,
1.08031934081,
1.12639112904,
1.17498679207,
1.22652812004,
1.28155156554,
1.34075503369,
1.40507156031,
1.47579102818,
1.5547735946,
1.64485362695,
1.75068607125,
1.88079360815,
2.05374891063,
2.32634787404,
]
obs = [ndtri(i/100.0) for i in range(100)]
self.assertFloatEqual(obs, exp)
def test_incbi(self):
"""incbi results should match cephes libraries"""
aa_range = [0.1, 0.2, 0.5, 1, 2, 5]
bb_range = aa_range
yy_range = [0.1, 0.2, 0.5, 0.9]
exp = [
8.86928001193e-08,
9.08146855855e-05,
0.5,
0.999999911307,
4.39887474012e-09,
4.50443299194e-06,
0.0416524955556,
0.997881005025,
3.46456275553e-10,
3.54771169012e-07,
0.00337816430373,
0.732777808689,
1e-10,
1.024e-07,
0.0009765625,
0.3486784401,
3.85543289443e-11,
3.94796342545e-08,
0.000376636057552,
0.154915841005,
1.33210087225e-11,
1.36407136078e-08,
0.000130149552409,
0.056682323296,
0.00211899497509,
0.0646097657259,
0.958347504444,
0.999999995601,
0.000247764691908,
0.00788804962659,
0.5,
0.999752235308,
3.09753032747e-05,
0.000990813218262,
0.092990311753,
0.906714634947,
1e-05,
0.00032,
0.03125,
0.59049,
4.01878917904e-06,
0.000128614607219,
0.0126923538971,
0.309157452156,
1.41593162013e-06,
4.5316442592e-05,
0.00449136140034,
0.122896698096,
0.267222191311,
0.684264602461,
0.996621835696,
0.999999999654,
0.0932853650529,
0.321847764104,
0.907009688247,
0.999969024697,
0.0244717418524,
0.0954915028125,
0.5,
0.975528258148,
0.01,
0.04,
0.25,
0.81,
0.00445768188762,
0.0179929616503,
0.120614758428,
0.531877433474,
0.00165851285512,
0.00672409501831,
0.046687245337,
0.247272226803,
0.6513215599,
0.8926258176,
0.9990234375,
0.9999999999,
0.40951,
0.67232,
0.96875,
0.99999,
0.19,
0.36,
0.75,
0.99,
0.1,
0.2,
0.5,
0.9,
0.0513167019495,
0.105572809,
0.292893218813,
0.683772233983,
0.020851637639,
0.04364750021,
0.129449436704,
0.36904265552,
0.845084158995,
0.956946913164,
0.999623363942,
0.999999999961,
0.690842547844,
0.850620771098,
0.987307646103,
0.999995981211,
0.468122566526,
0.629849697132,
0.879385241572,
0.995542318112,
0.316227766017,
0.4472135955,
0.707106781187,
0.948683298051,
0.195800105659,
0.287140725417,
0.5,
0.804199894341,
0.0925952589131,
0.13988068827,
0.264449983296,
0.510316306551,
0.943317676704,
0.984896695084,
0.999869850448,
0.999999999987,
0.877103301904,
0.944441767096,
0.9955086386,
0.999998584068,
0.752727773197,
0.841546267738,
0.953312754663,
0.998341487145,
0.63095734448,
0.724779663678,
0.870550563296,
0.979148362361,
0.489683693449,
0.577552475154,
0.735550016704,
0.907404741087,
0.300968763593,
0.366086516536,
0.5,
0.699031236407,
]
i = 0
for a in aa_range:
for b in bb_range:
for y in yy_range:
result = incbi(a,b,y)
e = exp[i]
self.assertFloatEqual(e, result)
i += 1
#specific cases that failed elsewhere
self.assertFloatEqual(incbi(999,2,1e-10), 0.97399698104554944)
#execute tests if called from command line
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Unit tests for statistical tests and utility functions.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.test import tail, G_2_by_2,G_fit, likelihoods,\
posteriors, bayes_updates, t_paired, t_one_sample, t_two_sample, \
mc_t_two_sample, _permute_observations, t_one_observation, correlation, \
correlation_test, correlation_matrix, z_test, z_tailed_prob, \
t_tailed_prob, sign_test, reverse_tails, ZeroExpectedError, combinations, \
multiple_comparisons, multiple_inverse, multiple_n, fisher, regress, \
regress_major, f_value, f_two_sample, calc_contingency_expected, \
G_fit_from_Dict2D, chi_square_from_Dict2D, MonteCarloP, \
regress_residuals, safe_sum_p_log_p, G_ind, regress_origin, stdev_from_mean, \
regress_R2, permute_2d, mantel, mantel_test, _flatten_lower_triangle, \
pearson, spearman, _get_rank, kendall_correlation, std, median, \
get_values_from_matrix, get_ltm_cells, distance_matrix_permutation_test, \
ANOVA_one_way, mw_test, mw_boot, is_symmetric_and_hollow
from numpy import array, concatenate, fill_diagonal, reshape, arange, matrix, \
ones, testing, tril, cov, sqrt
from cogent.util.dict2d import Dict2D
import math
from cogent.maths.stats.util import Numbers
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2011, The Cogent Project"
__credits__ = ["Rob Knight", "Catherine Lozupone", "Gavin Huttley",
"Sandra Smit", "Daniel McDonald", "Jai Ram Rideout",
"Michael Dwan"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class TestsHelper(TestCase):
"""Class with utility methods useful for other tests."""
def setUp(self):
"""Sets up variables used in the tests."""
# How many times a p-value should be tested to fall in a given range
# before failing the test.
self.p_val_tests = 10
def assertCorrectPValue(self, exp_min, exp_max, fn, args=None,
kwargs=None, p_val_idx=0):
"""Tests that the stochastic p-value falls in the specified range.
Performs the test self.p_val_tests times and fails if the observed
p-value does not fall into the specified range at least once. Each
p-value is also tested that it falls in the range 0.0 to 1.0.
This method assumes that fn is callable, and will unpack and pass args
and kwargs to fn if they are provided. It also assumes that fn returns
a single value (the p-value to be tested) or a tuple of results (any
length greater than or equal to 1), with the p-value at position
p_val_idx.
This is primarily used for testing the Mantel and correlation_test
functions.
"""
found_match = False
for i in range(self.p_val_tests):
if args is not None and kwargs is not None:
obs = fn(*args, **kwargs)
elif args is not None:
obs = fn(*args)
elif kwargs is not None:
obs = fn(**kwargs)
else:
obs = fn()
try:
p_val = float(obs)
except TypeError:
p_val = obs[p_val_idx]
self.assertIsProb(p_val)
if p_val >= exp_min and p_val <= exp_max:
found_match = True
break
self.assertTrue(found_match)
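# The std tests below expect the sample standard deviation, i.e. the n - 1
# denominator: for [1, 2, 3, 4, 5] that is sqrt(10 / 4), about 1.5811. As an
# independent cross-check that assumes only NumPy (not cogent's own std), the
# hypothetical helper _sample_std_sketch passes ddof=1 to numpy.std and
# reproduces the expected values used in test_std and test_std_2d.
import numpy

def _sample_std_sketch(values, axis=None):
    """Sample standard deviation (n - 1 denominator) along axis."""
    return numpy.std(values, axis=axis, ddof=1)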
class TestsTests(TestCase):
"""Tests miscellaneous functions."""
def test_std(self):
"""std should return the sample standard deviation (n - 1 denominator)"""
expected = 1.58113883008
self.assertFloatEqual(std(array([1,2,3,4,5])), expected)
expected_a = array([expected, expected, expected, expected, expected])
a = array([[1,2,3,4,5],[5,1,2,3,4],[4,5,1,2,3],[3,4,5,1,2],[2,3,4,5,1]])
self.assertFloatEqual(std(a,axis=0), expected_a)
self.assertFloatEqual(std(a,axis=1), expected_a)
self.assertRaises(ValueError, std, a, 5)
def test_std_2d(self):
"""std on a 2D array should match scipy.stats.std for each axis"""
inp = array([[1,2,3],[4,5,6]])
exps = ( #tuple(scipy_std(inp, ax) for ax in [None, 0, 1])
1.8708286933869707,
array([ 2.12132034, 2.12132034, 2.12132034]),
array([ 1., 1.]))
results = tuple(std(inp, ax) for ax in [None, 0, 1])
for obs, exp in zip(results, exps):
testing.assert_almost_equal(obs, exp)
def test_std_3d(self):
"""std on a 3D array should match scipy.stats.std for each axis"""
inp3d = array(#2,2,3
[[[ 0, 2, 2],
[ 3, 4, 5]],
[[ 1, 9, 0],
[ 9, 10, 1]]])
exp3d = (#for axis None, 0, 1, 2: calc from scipy.stats.std
3.63901418552,
array([[ 0.70710678, 4.94974747, 1.41421356],
[ 4.24264069, 4.24264069, 2.82842712]]),
array([[ 2.12132034, 1.41421356, 2.12132034],
[ 5.65685425, 0.70710678, 0.70710678]]),
array([[ 1.15470054, 1. ],
[ 4.93288286, 4.93288286]]))
res = tuple(std(inp3d, ax) for ax in [None, 0, 1, 2])
for obs, exp in zip(res, exp3d):
testing.assert_almost_equal(obs, exp)
def test_median(self):
"""median should handle the axis argument the way numpy.mean does"""
m = array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
expected = 6.5
observed = median(m, axis=None)
self.assertEqual(observed, expected)
expected = array([5.5, 6.5, 7.5])
observed = median(m, axis=0)
self.assertEqual(observed, expected)
expected = array([2.0, 5.0, 8.0, 11.0])
observed = median(m, axis=1)
self.assertEqual(observed, expected)
self.assertRaises(ValueError, median, m, 10)
def test_tail(self):
"""tail should return x/2 if test is true; 1-(x/2) otherwise"""
self.assertFloatEqual(tail(0.25, 'a'=='a'), 0.25/2)
self.assertFloatEqual(tail(0.25, 'a'!='a'), 1-(0.25/2))
def test_combinations(self):
"""combinations should return correct binomial coefficient"""
self.assertFloatEqual(combinations(5,3), 10)
self.assertFloatEqual(combinations(5,2), 10)
#only one way to pick no items or the same number of items
self.assertFloatEqual(combinations(123456789, 0), 1)
self.assertFloatEqual(combinations(123456789, 123456789), 1)
#n ways to pick one item
self.assertFloatEqual(combinations(123456789, 1), 123456789)
#n(n-1)/2 ways to pick 2 items
self.assertFloatEqual(combinations(123456789, 2), 123456789*123456788/2)
#check an arbitrary value against R
self.assertFloatEqual(combinations(1234567, 12), 2.617073e64)
def test_multiple_comparisons(self):
"""multiple_comparisons should match values from R"""
self.assertFloatEqual(multiple_comparisons(1e-7, 10000), 1-0.9990005)
self.assertFloatEqual(multiple_comparisons(0.05, 10), 0.4012631)
self.assertFloatEqual(multiple_comparisons(1e-20, 1), 1e-20)
self.assertFloatEqual(multiple_comparisons(1e-300, 1), 1e-300)
self.assertFloatEqual(multiple_comparisons(0.95, 3),0.99987499999999996)
self.assertFloatEqual(multiple_comparisons(0.75, 100),0.999999999999679)
self.assertFloatEqual(multiple_comparisons(0.5, 1000),1)
self.assertFloatEqual(multiple_comparisons(0.01, 1000),0.99995682875259)
self.assertFloatEqual(multiple_comparisons(0.5, 5), 0.96875)
self.assertFloatEqual(multiple_comparisons(1e-20, 10), 1e-19)
def test_multiple_inverse(self):
"""multiple_inverse should invert multiple_comparisons results"""
#NOTE: multiple_inverse not very accurate close to 1
self.assertFloatEqual(multiple_inverse(1-0.9990005, 10000), 1e-7)
self.assertFloatEqual(multiple_inverse(0.4012631 , 10), 0.05)
self.assertFloatEqual(multiple_inverse(1e-20, 1), 1e-20)
self.assertFloatEqual(multiple_inverse(1e-300, 1), 1e-300)
self.assertFloatEqual(multiple_inverse(0.96875, 5), 0.5)
self.assertFloatEqual(multiple_inverse(1e-19, 10), 1e-20)
def test_multiple_n(self):
"""multiple_n should swap parameters in multiple_comparisons"""
self.assertFloatEqual(multiple_n(1e-7, 1-0.9990005), 10000)
self.assertFloatEqual(multiple_n(0.05, 0.4012631), 10)
self.assertFloatEqual(multiple_n(1e-20, 1e-20), 1)
self.assertFloatEqual(multiple_n(1e-300, 1e-300), 1)
self.assertFloatEqual(multiple_n(0.95,0.99987499999999996),3)
self.assertFloatEqual(multiple_n(0.5,0.96875),5)
self.assertFloatEqual(multiple_n(1e-20, 1e-19), 10)
def test_fisher(self):
"""fisher results should match p 795 Sokal and Rohlf"""
self.assertFloatEqual(fisher([0.073,0.086,0.10,0.080,0.060]),
0.0045957946540917905)
def test_regress(self):
"""regression slope, intercept should match p 459 Sokal and Rohlf"""
x = [0, 12, 29.5,43,53,62.5,75.5,85,93]
y = [8.98, 8.14, 6.67, 6.08, 5.90, 5.83, 4.68, 4.20, 3.72]
self.assertFloatEqual(regress(x, y), (-0.05322, 8.7038), 0.001)
#higher precision from OpenOffice
self.assertFloatEqual(regress(x, y), (-0.05322215,8.70402730))
#add test to confirm no overflow error with large numbers
x = [32119,33831]
y = [2.28,2.43]
exp = (8.761682243E-05, -5.341209112E-01)
self.assertFloatEqual(regress(x,y),exp,0.001)
def test_regress_origin(self):
"""regression slope constrained through origin should match Excel"""
x = array([1,2,3,4])
y = array([4,2,6,8])
self.assertFloatEqual(regress_origin(x, y), (1.9333333,0))
#add test to confirm no overflow error with large numbers
x = [32119,33831]
y = [2.28,2.43]
exp = (7.1428649481939822e-05, 0)
self.assertFloatEqual(regress_origin(x,y),exp,0.001)
def test_regress_R2(self):
"""regress_R2 returns the R^2 value of a regression"""
x = [1.0,2.0,3.0,4.0,5.0]
y = [2.1,4.2,5.9,8.4,9.6]
result = regress_R2(x, y)
self.assertFloatEqual(result, 0.99171419347896)
def test_regress_residuals(self):
"""regress_residuals reports the residual for each point in a linear regression"""
x = [1.0,2.0,3.0,4.0,5.0]
y = [2.1,4.2,5.9,8.4,9.6]
result = regress_residuals(x, y)
self.assertFloatEqual(result, [-0.1, 0.08, -0.14, 0.44, -0.28])
def test_stdev_from_mean(self):
"""stdev_from_mean returns num std devs from mean for each val in x"""
x = [2.1, 4.2, 5.9, 8.4, 9.6]
result = stdev_from_mean(x)
self.assertFloatEqual(result, [-1.292463399014413, -0.60358696806764478, -0.045925095396451399, 0.77416589382589174, 1.1678095686526162])
def test_regress_major(self):
"""major axis regression should match p 589 Sokal and Rohlf"""
#Note that the Sokal and Rohlf example flips the axes, such that the
#equation is for explaining x in terms of y, not y in terms of x.
#Behavior here is the reverse, for easy comparison with regress.
y = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
x = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19, 15.39,
17.25, 9.52]
self.assertFloatEqual(regress_major(x, y), (18.93633,-32.55208))
def test_sign_test(self):
"""sign_test should match values from R"""
v = [("two sided", 26, 50, 0.88772482734078251),
("less", 26, 50, 0.6641),
("l", 10, 50, 1.193066583837777e-05),
("hi", 30, 50, 0.1013193755322703),
("h", 0, 50, 1.0),
("2", 30, 50, 0.20263875106454063),
("h", 49, 50, 4.5297099404706387e-14),
("h", 50, 50, 8.8817841970012543e-16)
]
for alt, success, trials, p in v:
result = sign_test(success, trials, alt=alt)
self.assertFloatEqual(result, p, eps=1e-5)
def test_permute_2d(self):
"""permute_2d permutes rows and cols of a matrix."""
a = reshape(arange(9), (3,3))
self.assertEqual(permute_2d(a, [0,1,2]), a)
self.assertEqual(permute_2d(a, [2,1,0]), \
array([[8,7,6],[5,4,3],[2,1,0]]))
self.assertEqual(permute_2d(a, [1,2,0]), \
array([[4,5,3],[7,8,6],[1,2,0]]))
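# test_multiple_comparisons, test_multiple_inverse and test_multiple_n above
# are consistent with the Sidak relation P = 1 - (1 - p)**n between a
# per-test p-value p and the family-wise probability P over n independent
# tests. The sketch below illustrates that relation under this assumption; it
# is not cogent's implementation, and the *_sketch names are hypothetical.
# log1p/expm1 keep tiny p-values such as 1e-20 from rounding to zero, which
# the naive 1 - (1 - p)**n would do.
import math

def sidak_sketch(p, n):
    """Probability of at least one p-value <= p among n independent tests."""
    return -math.expm1(n * math.log1p(-p))

def sidak_inverse_sketch(P, n):
    """Per-test p-value that gives family-wise probability P over n tests."""
    return -math.expm1(math.log1p(-P) / n)

def sidak_n_sketch(p, P):
    """Number of tests at per-test level p that yields family-wise P."""
    return math.log1p(-P) / math.log1p(-p)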
class GTests(TestCase):
"""Tests implementation of the G tests for fit and independence."""
def test_G_2_by_2_2tailed_equal(self):
"""G_2_by_2 should return 0 if all cell counts are equal"""
self.assertFloatEqual(0, G_2_by_2(1, 1, 1, 1, False, False)[0])
self.assertFloatEqual(0, G_2_by_2(100, 100, 100, 100, False, False)[0])
self.assertFloatEqual(0, G_2_by_2(100, 100, 100, 100, True, False)[0])
def test_G_2_by_2_bad_data(self):
"""G_2_by_2 should raise ValueError if any counts are negative"""
self.assertRaises(ValueError, G_2_by_2, 1, -1, 1, 1)
def test_G_2_by_2_2tailed_examples(self):
"""G_2_by_2 values should match examples in Sokal & Rohlf"""
#example from p 731, Sokal and Rohlf (1995)
#without correction
self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, False, False)[0],
1.33249, 0.0001)
self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, False, False)[1],
0.24836, 0.0001)
#with correction
self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, True, False)[0],
1.30277, 0.0001)
self.assertFloatEqual(G_2_by_2(12, 22, 16, 50, True, False)[1],
0.25371, 0.0001)
def test_G_2_by_2_1tailed_examples(self):
"""G_2_by_2 values should match values from codon_binding program"""
#first up...the famous arginine case
self.assertFloatEqualAbs(G_2_by_2(36, 16, 38, 106), (29.111609, 0),
0.00001)
#then some other miscellaneous positive and negative values
self.assertFloatEqualAbs(G_2_by_2(0,52,12,132), (-7.259930, 0.996474),
0.00001)
self.assertFloatEqualAbs(G_2_by_2(5,47,14,130), (-0.000481, 0.508751),
0.00001)
self.assertFloatEqualAbs(G_2_by_2(5,47,36,108), (-6.065167, 0.993106),
0.00001)
def test_calc_contingency_expected(self):
"""calc_contingency_expected returns new matrix with expected freqs"""
matrix = Dict2D({'rest_of_tree': {'env1': 2, 'env3': 1, 'env2': 0},
'b': {'env1': 1, 'env3': 1, 'env2': 3}})
result = calc_contingency_expected(matrix)
self.assertFloatEqual(result['rest_of_tree']['env1'], [2, 1.125])
self.assertFloatEqual(result['rest_of_tree']['env3'], [1, 0.75])
self.assertFloatEqual(result['rest_of_tree']['env2'], [0, 1.125])
self.assertFloatEqual(result['b']['env1'], [1, 1.875])
self.assertFloatEqual(result['b']['env3'], [1, 1.25])
self.assertFloatEqual(result['b']['env2'], [3, 1.875])
def test_Gfit_unequal_lists(self):
"""G_fit should raise ValueError if lists have unequal lengths"""
#lists must be equal
self.assertRaises(ValueError, G_fit, [1, 2, 3], [1, 2])
def test_Gfit_negative_observeds(self):
"""Gfit should raise ValueError if any observeds are negative."""
self.assertRaises(ValueError, G_fit, [-1, 2, 3], [1, 2, 3])
def test_Gfit_nonpositive_expecteds(self):
"""Gfit should raise ZeroExpectedError if expecteds are zero/negative"""
self.assertRaises(ZeroExpectedError, G_fit, [1, 2, 3], [0, 1, 2])
self.assertRaises(ZeroExpectedError, G_fit, [1, 2, 3], [-1, 1, 2])
def test_Gfit_good_data(self):
"""Gfit tests for fit should match examples in Sokal and Rohlf"""
#example from p. 699, Sokal and Rohlf (1995)
obs = [63, 31, 28, 12, 39, 16, 40, 12]
exp = [ 67.78125, 22.59375, 22.59375, 7.53125, 45.18750,
15.06250, 45.18750, 15.06250]
#without correction
self.assertFloatEqualAbs(G_fit(obs, exp, False)[0], 8.82397, 0.00002)
self.assertFloatEqualAbs(G_fit(obs, exp, False)[1], 0.26554, 0.00002)
#with correction
self.assertFloatEqualAbs(G_fit(obs, exp)[0], 8.76938, 0.00002)
self.assertFloatEqualAbs(G_fit(obs, exp)[1], 0.26964, 0.00002)
#example from p. 700, Sokal and Rohlf (1995)
obs = [130, 46]
exp = [132, 44]
#without correction
self.assertFloatEqualAbs(G_fit(obs, exp, False)[0], 0.12002, 0.00002)
self.assertFloatEqualAbs(G_fit(obs, exp, False)[1], 0.72901, 0.00002)
#with correction
self.assertFloatEqualAbs(G_fit(obs, exp)[0], 0.11968, 0.00002)
self.assertFloatEqualAbs(G_fit(obs, exp)[1], 0.72938, 0.00002)
def test_safe_sum_p_log_p(self):
"""safe_sum_p_log_p should ignore zero elements, not raise error"""
m = array([2,4,0,8])
self.assertEqual(safe_sum_p_log_p(m,2), 2*1+4*2+8*3)
def test_G_ind(self):
"""G test for independence should match Sokal and Rohlf p 738 values"""
a = array([[29,11],[273,191],[8,31],[64,64]])
self.assertFloatEqual(G_ind(a)[0], 28.59642)
self.assertFloatEqual(G_ind(a, True)[0], 28.31244)
def test_G_fit_from_Dict2D(self):
"""G_fit_from_Dict2D runs G-fit on data in a Dict2D
"""
matrix = Dict2D({'Marl': {'val':[2, 5.2]},
'Chalk': {'val':[10, 5.2]},
'Sandstone':{'val':[8, 5.2]},
'Clay':{'val':[2, 5.2]},
'Limestone':{'val':[4, 5.2]}
})
g_val, prob = G_fit_from_Dict2D(matrix)
self.assertFloatEqual(g_val, 9.84923)
self.assertFloatEqual(prob, 0.04304536)
def test_chi_square_from_Dict2D(self):
"""chi_square_from_Dict2D calcs a Chi-Square and p value from Dict2D"""
#test1
obs_matrix = Dict2D({'rest_of_tree': {'env1': 2, 'env3': 1, 'env2': 0},
'b': {'env1': 1, 'env3': 1, 'env2': 3}})
input_matrix = calc_contingency_expected(obs_matrix)
test, csp = chi_square_from_Dict2D(input_matrix)
self.assertFloatEqual(test, 3.0222222222222221)
#test2
test_matrix_2 = Dict2D({'Marl': {'val':[2, 5.2]},
'Chalk': {'val':[10, 5.2]},
'Sandstone':{'val':[8, 5.2]},
'Clay':{'val':[2, 5.2]},
'Limestone':{'val':[4, 5.2]}
})
test2, csp2 = chi_square_from_Dict2D(test_matrix_2)
self.assertFloatEqual(test2, 10.1538461538)
self.assertFloatEqual(csp2, 0.0379143890013)
#test3
matrix3_obs = Dict2D({'AIDS':{'Males':4, 'Females':2, 'Both':3},
'No_AIDS':{'Males':3, 'Females':16, 'Both':2}
})
matrix3 = calc_contingency_expected(matrix3_obs)
test3, csp3 = chi_square_from_Dict2D(matrix3)
self.assertFloatEqual(test3, 7.6568405139833722)
self.assertFloatEqual(csp3, 0.0217439383468)
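# The uncorrected G_fit results in test_Gfit_good_data (the False case) are
# consistent with the log-likelihood ratio statistic
#     G = 2 * sum(obs * ln(obs / exp))
# compared against a chi-square distribution with k - 1 degrees of freedom.
# A minimal sketch, restricted to two categories (1 degree of freedom) so the
# tail probability can use math.erfc instead of a chi-square library. The
# name g_fit_two_categories is hypothetical and no correction is applied.
import math

def g_fit_two_categories(obs, exp):
    """Return (G, p) for a 2-category goodness-of-fit test, no correction."""
    if len(obs) != 2 or len(exp) != 2:
        raise ValueError("this sketch handles exactly two categories")
    # Zero observed counts contribute nothing (x * ln x -> 0 as x -> 0).
    g = 2.0 * sum(o * math.log(float(o) / e) for o, e in zip(obs, exp) if o)
    # Upper tail of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p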
class LikelihoodTests(TestCase):
"""Tests implementations of likelihood calculations."""
def test_likelihoods_unequal_list_lengths(self):
"""likelihoods should raise ValueError if input lists unequal length"""
self.assertRaises(ValueError, likelihoods, [1, 2], [1])
def test_likelihoods_equal_priors(self):
"""likelihoods should equal Pr(D|H) if priors the same"""
equal = [0.25, 0.25, 0.25,0.25]
unequal = [0.5, 0.25, 0.125, 0.125]
equal_answer = [1, 1, 1, 1]
unequal_answer = [2, 1, 0.5, 0.5]
for obs, exp in zip(likelihoods(equal, equal), equal_answer):
self.assertFloatEqual(obs, exp)
for obs, exp in zip(likelihoods(unequal, equal), unequal_answer):
self.assertFloatEqual(obs, exp)
def test_likelihoods_equal_evidence(self):
"""likelihoods should return vector of 1's if evidence equal for all"""
equal = [0.25, 0.25, 0.25,0.25]
unequal = [0.5, 0.25, 0.125, 0.125]
equal_answer = [1, 1, 1, 1]
unequal_answer = [2, 1, 0.5, 0.5]
not_unity = [0.7, 0.7, 0.7, 0.7]
for obs, exp in zip(likelihoods(equal, unequal), equal_answer):
self.assertFloatEqual(obs, exp)
#should be the same if evidences don't sum to 1
for obs, exp in zip(likelihoods(not_unity, unequal), equal_answer):
self.assertFloatEqual(obs, exp)
def test_likelihoods_unequal_evidence(self):
"""likelihoods should update based on weighted sum if evidence unequal"""
not_unity = [1, 0.5, 0.25, 0.25]
unequal = [0.5, 0.25, 0.125, 0.125]
products = [1.4545455, 0.7272727, 0.3636364, 0.3636364]
#if priors and evidence both unequal, likelihoods should change
#(calculated using StarCalc)
for obs, exp in zip(likelihoods(not_unity, unequal), products):
self.assertFloatEqual(obs, exp)
def test_posteriors_unequal_lists(self):
"""posteriors should raise ValueError if input lists unequal lengths"""
self.assertRaises(ValueError, posteriors, [1, 2, 3], [1])
def test_posteriors_good_data(self):
"""posteriors should return products of paired list elements"""
first = [0, 0.25, 0.5, 1, 0.25]
second = [0.25, 0.5, 0, 0.1, 1]
product = [0, 0.125, 0, 0.1, 0.25]
for obs, exp in zip(posteriors(first, second), product):
self.assertFloatEqual(obs, exp)
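# The expected values in LikelihoodTests are consistent with
#     likelihood_i = Pr(D|H_i) / sum_j(Pr(D|H_j) * Pr(H_j))
# i.e. each hypothesis's evidence divided by the total probability of the
# data, and with posteriors being element-wise products of two lists. The
# sketch below assumes that reading of the tests; the *_sketch names are
# hypothetical and not part of cogent.

def likelihoods_sketch(evidence, priors):
    """Scale each Pr(D|H_i) by the total probability of the data."""
    if len(evidence) != len(priors):
        raise ValueError("evidence and priors must have the same length")
    total = float(sum(e * p for e, p in zip(evidence, priors)))
    return [e / total for e in evidence]

def posteriors_sketch(first, second):
    """Element-wise product of two equal-length lists."""
    if len(first) != len(second):
        raise ValueError("lists must have the same length")
    return [a * b for a, b in zip(first, second)]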
class BayesUpdateTests(TestCase):
"""Tests implementation of Bayes calculations"""
def setUp(self):
first = [0.25, 0.25, 0.25]
second = [0.1, 0.75, 0.3]
third = [0.95, 1e-10, 0.2]
fourth = [0.01, 0.9, 0.1]
bad = [1, 2, 1, 1, 1]
self.bad = [first, bad, second, third]
self.test = [first, second, third, fourth]
self.permuted = [fourth, first, third, second]
self.deleted = [second, fourth, third]
self.extra = [first, second, first, third, first, fourth, first]
#BEWARE: low precision in second item, so need to adjust threshold
#for assertFloatEqual accordingly (and use assertFloatEqualAbs).
self.result = [0.136690646154, 0.000000009712, 0.863309344133]
def test_bayes_updates_bad_data(self):
"""bayes_updates should raise ValueError on unequal-length lists"""
self.assertRaises(ValueError, bayes_updates, self.bad)
def test_bayes_updates_good_data(self):
"""bayes_updates should match hand calculations of probability updates"""
#result for first -> fourth calculated by hand
for obs, exp in zip(bayes_updates(self.test), self.result):
self.assertFloatEqualAbs(obs, exp, 1e-11)
def test_bayes_updates_permuted(self):
"""bayes_updates should not be affected by order of inputs"""
for obs, exp in zip(bayes_updates(self.permuted), self.result):
self.assertFloatEqualAbs(obs, exp, 1e-11)
def test_bayes_update_nondiscriminating(self):
"""bayes_updates should be unaffected by extra nondiscriminating data"""
#deletion of non-discriminating evidence should not affect result
for obs, exp in zip(bayes_updates(self.deleted), self.result):
self.assertFloatEqualAbs(obs, exp, 1e-11)
#additional non-discriminating evidence should not affect result
for obs, exp in zip(bayes_updates(self.extra), self.result):
self.assertFloatEqualAbs(obs, exp, 1e-11)
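# BayesUpdateTests expects the same answer regardless of the order of the
# evidence lists, and an answer unchanged by uniform (non-discriminating)
# lists. Both properties hold if each list is treated as Pr(D|H_i) and the
# update is posterior_i proportional to prior_i * Pr(D|H_i), renormalized
# after each step, starting from a uniform prior. This is a sketch under that
# assumption (bayes_updates_sketch is hypothetical); it reproduces
# self.result from the four lists defined in setUp.

def bayes_updates_sketch(evidence_lists):
    """Sequentially apply Bayes' rule to a list of evidence vectors."""
    n = len(evidence_lists[0])
    if any(len(e) != n for e in evidence_lists):
        raise ValueError("all evidence lists must have the same length")
    posterior = [1.0 / n] * n  # uniform starting prior
    for evidence in evidence_lists:
        unnormalized = [p * e for p, e in zip(posterior, evidence)]
        total = sum(unnormalized)
        posterior = [u / total for u in unnormalized]
    return posterior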
class StatTests(TestsHelper):
"""Tests that the t and z tests are implemented correctly"""
def setUp(self):
super(StatTests, self).setUp()
self.x = [
7.33, 7.49, 7.27, 7.93, 7.56,
7.81, 7.46, 6.94, 7.49, 7.44,
7.95, 7.47, 7.04, 7.10, 7.64,
]
self.y = [
7.53, 7.70, 7.46, 8.21, 7.81,
8.01, 7.72, 7.13, 7.68, 7.66,
8.11, 7.66, 7.20, 7.25, 7.79,
]
def test_t_paired_2tailed(self):
"""t_paired should match values from Sokal & Rohlf p 353"""
x, y = self.x, self.y
#check value of t and the probability for 2-tailed
self.assertFloatEqual(t_paired(y, x)[0], 19.7203, 1e-4)
self.assertFloatEqual(t_paired(y, x)[1], 1.301439e-11, 1e-4)
def test_t_paired_no_variance(self):
"""t_paired should return (None, None) if lists are invariant"""
x = [1, 1, 1]
y = [0, 0, 0]
self.assertEqual(t_paired(x,x), (None, None))
self.assertEqual(t_paired(x,y), (None, None))
def test_t_paired_1tailed(self):
"""t_paired should match pre-calculated 1-tailed values"""
x, y = self.x, self.y
#check probability for 1-tailed low and high
self.assertFloatEqual(
t_paired(y, x, "low")[1], 1-(1.301439e-11/2), 1e-4)
self.assertFloatEqual(
t_paired(x, y, "high")[1], 1-(1.301439e-11/2), 1e-4)
self.assertFloatEqual(
t_paired(y, x, "high")[1], 1.301439e-11/2, 1e-4)
self.assertFloatEqual(
t_paired(x, y, "low")[1], 1.301439e-11/2, 1e-4)
def test_t_paired_specific_difference(self):
"""t_paired should allow a specific difference to be passed"""
x, y = self.x, self.y
#difference is 0.2, so test should be non-significant if 0.2 passed
self.failIf(t_paired(y, x, exp_diff=0.2)[0] > 1e-10)
#same, except that reversing list order reverses sign of difference
self.failIf(t_paired(x, y, exp_diff=-0.2)[0] > 1e-10)
#check that there's no significant difference from the true mean
self.assertFloatEqual(
t_paired(y, x,exp_diff=0.2)[1], 1, 1e-4)
def test_t_paired_bad_data(self):
"""t_paired should raise ValueError on lists of different lengths"""
self.assertRaises(ValueError, t_paired, self.y, [1, 2, 3])
def test_t_two_sample(self):
"""t_two_sample should match example on p.225 of Sokal and Rohlf"""
I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5])
II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
self.assertFloatEqual(t_two_sample(I, II), (-0.1184, 0.45385 * 2),
0.001)
def test_t_two_sample_no_variance(self):
"""t_two_sample should return (None, None) if lists are invariant"""
x = array([1, 1, 1])
y = array([0, 0, 0])
self.assertEqual(t_two_sample(x,x), (None, None))
self.assertEqual(t_two_sample(x,y), (None, None))
def test_t_one_sample(self):
"""t_one_sample results should match those from R"""
x = array(range(-5,5))
y = array(range(-1,10))
self.assertFloatEqualAbs(t_one_sample(x), (-0.5222, 0.6141), 1e-4)
self.assertFloatEqualAbs(t_one_sample(y), (4, 0.002518), 1e-4)
#do some one-tailed tests as well
self.assertFloatEqualAbs(t_one_sample(y, tails='low'),(4, 0.9987),1e-4)
self.assertFloatEqualAbs(t_one_sample(y,tails='high'),(4,0.001259),1e-4)
def test_t_two_sample_switch(self):
"""t_two_sample should call t_one_observation if 1 item in sample."""
sample = array([4.02, 3.88, 3.34, 3.87, 3.18])
x = array([3.02])
self.assertFloatEqual(t_two_sample(x,sample),(-1.5637254,0.1929248))
self.assertFloatEqual(t_two_sample(sample, x),(-1.5637254,0.1929248))
#can't do the test if both samples have single item
self.assertEqual(t_two_sample(x,x), (None, None))
def test_t_one_observation(self):
"""t_one_observation should match p. 228 of Sokal and Rohlf"""
sample = array([4.02, 3.88, 3.34, 3.87, 3.18])
x = 3.02
#note that this differs after the 3rd decimal place from what's in the
#book, because Sokal and Rohlf round their intermediate steps...
self.assertFloatEqual(t_one_observation(x,sample),\
(-1.5637254,0.1929248))
def test_mc_t_two_sample(self):
"""Test gives correct results with valid input data."""
# Verified against R's t.test() and perm.t.test().
# With numpy array as input.
exp = (-0.11858541225631833, 0.90756579317867436)
I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5])
II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
obs = mc_t_two_sample(I, II)
self.assertFloatEqual(obs[:2], exp)
self.assertEqual(len(obs[2]), 999)
self.assertCorrectPValue(0.8, 0.9, mc_t_two_sample, [I, II],
p_val_idx=3)
# With python list as input.
exp = (-0.11858541225631833, 0.90756579317867436)
I = [7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5]
II = [8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2]
obs = mc_t_two_sample(I, II)
self.assertFloatEqual(obs[:2], exp)
self.assertEqual(len(obs[2]), 999)
self.assertCorrectPValue(0.8, 0.9, mc_t_two_sample, [I, II],
p_val_idx=3)
exp = (-0.11858541225631833, 0.45378289658933718)
obs = mc_t_two_sample(I, II, tails='low')
self.assertFloatEqual(obs[:2], exp)
self.assertEqual(len(obs[2]), 999)
self.assertCorrectPValue(0.4, 0.47, mc_t_two_sample, [I, II],
{'tails':'low'}, p_val_idx=3)
exp = (-0.11858541225631833, 0.54621710341066287)
obs = mc_t_two_sample(I, II, tails='high', permutations=99)
self.assertFloatEqual(obs[:2], exp)
self.assertEqual(len(obs[2]), 99)
self.assertCorrectPValue(0.4, 0.62, mc_t_two_sample, [I, II],
{'tails':'high', 'permutations':99}, p_val_idx=3)
exp = (-2.8855783649036986, 0.99315596652421401)
obs = mc_t_two_sample(I, II, tails='high', permutations=99, exp_diff=1)
self.assertFloatEqual(obs[:2], exp)
self.assertEqual(len(obs[2]), 99)
self.assertCorrectPValue(0.55, 0.99, mc_t_two_sample, [I, II],
{'tails':'high', 'permutations':99, 'exp_diff':1}, p_val_idx=3)
def test_mc_t_two_sample_unbalanced_obs(self):
"""Test gives correct results with unequal number of obs per sample."""
# Verified against R's t.test() and perm.t.test().
exp = (-0.10302479888889175, 0.91979753020527177)
I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2])
II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
obs = mc_t_two_sample(I, II)
self.assertFloatEqual(obs[:2], exp)
self.assertEqual(len(obs[2]), 999)
self.assertCorrectPValue(0.8, 0.9, mc_t_two_sample, [I, II],
p_val_idx=3)
def test_mc_t_two_sample_single_obs_sample(self):
"""Test works correctly with one sample having a single observation."""
sample = array([4.02, 3.88, 3.34, 3.87, 3.18])
x = array([3.02])
exp = (-1.5637254,0.1929248)
obs = mc_t_two_sample(x, sample)
self.assertFloatEqual(obs[:2], exp)
self.assertFloatEqual(len(obs[2]), 999)
self.assertIsProb(obs[3])
obs = mc_t_two_sample(sample, x)
self.assertFloatEqual(obs[:2], exp)
self.assertFloatEqual(len(obs[2]), 999)
self.assertIsProb(obs[3])
def test_mc_t_two_sample_no_perms(self):
"""Test gives empty permutation results if no perms are given."""
exp = (-0.11858541225631833, 0.90756579317867436, [], None)
I = array([7.2, 7.1, 9.1, 7.2, 7.3, 7.2, 7.5])
II = array([8.8, 7.5, 7.7, 7.6, 7.4, 6.7, 7.2])
obs = mc_t_two_sample(I, II, permutations=0)
self.assertFloatEqual(obs, exp)
def test_mc_t_two_sample_no_mc(self):
"""Test no MC stats if initial t-test is bad."""
x = array([1, 1, 1])
y = array([0, 0, 0])
self.assertEqual(mc_t_two_sample(x,x), (None, None, [], None))
self.assertEqual(mc_t_two_sample(x,y), (None, None, [], None))
def test_mc_t_two_sample_invalid_input(self):
"""Test fails on various invalid input."""
self.assertRaises(ValueError, mc_t_two_sample, [1, 2, 3], [4., 5., 4.],
tails='foo')
self.assertRaises(ValueError, mc_t_two_sample, [1, 2, 3], [4., 5., 4.],
permutations=-1)
self.assertRaises(ValueError, mc_t_two_sample, [1], [4.])
self.assertRaises(ValueError, mc_t_two_sample, [1, 2], [])
def test_permute_observations(self):
"""Test works correctly on small input dataset."""
I = [10, 20., 1]
II = [2, 4, 5, 7]
obs = _permute_observations(I, II, 1)
self.assertEqual(len(obs[0]), 1)
self.assertEqual(len(obs[1]), 1)
self.assertEqual(len(obs[0][0]), len(I))
self.assertEqual(len(obs[1][0]), len(II))
self.assertFloatEqual(sorted(concatenate((obs[0][0], obs[1][0]))),
sorted(I + II))
def test_reverse_tails(self):
"""reverse_tails should return 'high' if tails was 'low' or vice versa"""
self.assertEqual(reverse_tails('high'), 'low')
self.assertEqual(reverse_tails('low'), 'high')
self.assertEqual(reverse_tails(None), None)
self.assertEqual(reverse_tails(3), 3)
def test_tail(self):
"""tail should return prob/2 if test is true, or 1-(prob/2) if false"""
self.assertFloatEqual(tail(0.25, True), 0.125)
self.assertFloatEqual(tail(0.25, False), 0.875)
self.assertFloatEqual(tail(1, True), 0.5)
self.assertFloatEqual(tail(1, False), 0.5)
self.assertFloatEqual(tail(0, True), 0)
self.assertFloatEqual(tail(0, False), 1)
def test_z_test(self):
"""z_test should give correct values"""
sample = array([1,2,3,4,5])
self.assertFloatEqual(z_test(sample, 3, 1), (0,1))
self.assertFloatEqual(z_test(sample, 3, 2, 'high'), (0,0.5))
self.assertFloatEqual(z_test(sample, 3, 2, 'low'), (0,0.5))
#check that population mean and variance, and tails, can be set OK.
self.assertFloatEqual(z_test(sample, 0, 1), (6.7082039324993694, \
1.9703444711798951e-11))
self.assertFloatEqual(z_test(sample, 1, 10), (0.44721359549995793, \
0.65472084601857694))
self.assertFloatEqual(z_test(sample, 1, 10, 'high'), \
(0.44721359549995793, 0.65472084601857694/2))
self.assertFloatEqual(z_test(sample, 1, 10, 'low'), \
(0.44721359549995793, 1-(0.65472084601857694/2)))
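# The t_paired tests above are consistent with testing the mean of the paired
# differences d_i = a_i - b_i against exp_diff:
#     t = (mean(d) - exp_diff) / (sd(d) / sqrt(n))
# where sd uses the n - 1 denominator. This sketch computes only the
# statistic; the p-value needs a Student's t CDF, omitted here to stay within
# the standard library. paired_t_statistic_sketch is a hypothetical name.
# Like t_paired, it returns None when the differences have no variance.
# With the y and x lists from StatTests.setUp it gives t of about 19.72.
import math

def paired_t_statistic_sketch(a, b, exp_diff=0.0):
    """t statistic for the mean of a[i] - b[i] against exp_diff."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [ai - bi for ai, bi in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / float(n)
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return None  # invariant differences: statistic is undefined
    return (mean - exp_diff) / math.sqrt(var / n)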
class CorrelationTests(TestsHelper):
"""Tests of correlation coefficients and Mantel test."""
def setUp(self):
"""Sets up variables used in the tests."""
super(CorrelationTests, self).setUp()
# For testing spearman and correlation_test using method='spearman'.
# Taken from the Spearman wikipedia article. Also used for testing
# Pearson (verified with R).
self.data1 = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
self.data2 = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
# For testing spearman.
self.a = [1,2,4,3,1,6,7,8,10,4]
self.b = [2,10,20,1,3,7,5,11,6,13]
self.c = [7,1,20,13,3,57,5,121,2,9]
self.r = (1.7,10,20,1.7,3,7,5,11,6.5,13)
self.x = (1, 2, 4, 3, 1, 6, 7, 8, 10, 4, 100, 2, 3, 77)
# Ranked copies for testing spearman.
self.b_ranked = [2, 7, 10, 1, 3, 6, 4, 8, 5, 9]
self.c_ranked = [5, 1, 8, 7, 3, 9, 4, 10, 2, 6]
def test_mantel(self):
"""mantel should be significant for same matrix, not for random"""
a = reshape(arange(25), (5,5))
a = tril(a) + tril(a).T
fill_diagonal(a, 0)
b = a.copy()
#closely related -- should be significant
self.assertCorrectPValue(0.0, 0.049, mantel, (a, b, 1000))
c = reshape(ones(25), (5,5))
c[0, 1] = 3.0
c[1, 0] = 3.0
fill_diagonal(c, 0)
#not related -- should not be significant
self.assertCorrectPValue(0.06, 1.0, mantel, (a, c, 1000))
def test_mantel_test_one_sided_greater(self):
"""Test one-sided mantel test (greater)."""
# This test output was verified by R (their mantel function does a
# one-sided greater test).
m1 = array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
m2 = array([[0, 2, 7], [2, 0, 6], [7, 6, 0]])
p, stat, perms = mantel_test(m1, m1, 999, alt='greater')
self.assertFloatEqual(stat, 1.0)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.09, 0.25, mantel_test, (m1, m1, 999),
{'alt':'greater'})
p, stat, perms = mantel_test(m1, m2, 999, alt='greater')
self.assertFloatEqual(stat, 0.755928946018)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.2, 0.5, mantel_test, (m1, m2, 999),
{'alt':'greater'})
def test_mantel_test_one_sided_less(self):
"""Test one-sided mantel test (less)."""
# This test output was verified by R (their mantel function does a
# one-sided greater test, but I modified their output to do a one-sided
# less test).
m1 = array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
m2 = array([[0, 2, 7], [2, 0, 6], [7, 6, 0]])
m3 = array([[0, 0.5, 0.25], [0.5, 0, 0.1], [0.25, 0.1, 0]])
p, stat, perms = mantel_test(m1, m1, 999, alt='less')
self.assertFloatEqual(p, 1.0)
self.assertFloatEqual(stat, 1.0)
self.assertEqual(len(perms), 999)
p, stat, perms = mantel_test(m1, m2, 999, alt='less')
self.assertFloatEqual(stat, 0.755928946018)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.6, 1.0, mantel_test, (m1, m2, 999),
{'alt':'less'})
p, stat, perms = mantel_test(m1, m3, 999, alt='less')
self.assertFloatEqual(stat, -0.989743318611)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.1, 0.25, mantel_test, (m1, m3, 999),
{'alt':'less'})
def test_mantel_test_two_sided(self):
"""Test two-sided mantel test."""
# This test output was verified by R (their mantel function does a
# one-sided greater test, but I modified their output to do a two-sided
# test).
m1 = array([[0, 1, 2], [1, 0, 3], [2, 3, 0]])
m2 = array([[0, 2, 7], [2, 0, 6], [7, 6, 0]])
m3 = array([[0, 0.5, 0.25], [0.5, 0, 0.1], [0.25, 0.1, 0]])
p, stat, perms = mantel_test(m1, m1, 999, alt='two sided')
self.assertFloatEqual(stat, 1.0)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.20, 0.45, mantel_test, (m1, m1, 999),
{'alt':'two sided'})
p, stat, perms = mantel_test(m1, m2, 999, alt='two sided')
self.assertFloatEqual(stat, 0.755928946018)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.6, 0.75, mantel_test, (m1, m2, 999),
{'alt':'two sided'})
p, stat, perms = mantel_test(m1, m3, 999, alt='two sided')
self.assertFloatEqual(stat, -0.989743318611)
self.assertEqual(len(perms), 999)
self.assertCorrectPValue(0.2, 0.45, mantel_test, (m1, m3, 999),
{'alt':'two sided'})
def test_mantel_test_invalid_distance_matrix(self):
"""Test mantel test with invalid distance matrix."""
# Single asymmetric, non-hollow distance matrix.
self.assertRaises(ValueError, mantel_test, array([[1, 2], [3, 4]]),
array([[0, 0], [0, 0]]), 999)
# Two asymmetric distance matrices.
self.assertRaises(ValueError, mantel_test, array([[0, 2], [3, 0]]),
array([[0, 1], [0, 0]]), 999)
def test_mantel_test_invalid_input(self):
"""Test mantel test with invalid input."""
self.assertRaises(ValueError, mantel_test, array([[1]]), array([[1]]),
999, alt='foo')
self.assertRaises(ValueError, mantel_test, array([[1]]),
array([[1, 2], [3, 4]]), 999)
self.assertRaises(ValueError, mantel_test, array([[1]]),
array([[1]]), 0)
self.assertRaises(ValueError, mantel_test, array([[1]]),
array([[1]]), -1)
def test_is_symmetric_and_hollow(self):
"""Should correctly test for symmetry and hollowness of dist mats."""
self.assertTrue(is_symmetric_and_hollow(array([[0, 1], [1, 0]])))
self.assertTrue(is_symmetric_and_hollow(matrix([[0, 1], [1, 0]])))
self.assertTrue(is_symmetric_and_hollow(matrix([[0.0, 0], [0.0, 0]])))
self.assertTrue(not is_symmetric_and_hollow(
array([[0.001, 1], [1, 0]])))
self.assertTrue(not is_symmetric_and_hollow(
array([[0, 1.1], [1, 0]])))
self.assertTrue(not is_symmetric_and_hollow(
array([[0.5, 1.1], [1, 0]])))
def test_flatten_lower_triangle(self):
"""Test flattening the lower triangles of various distance matrices."""
self.assertEqual(_flatten_lower_triangle(array([[8]])), [])
self.assertEqual(_flatten_lower_triangle(array([[1, 2], [3, 4]])), [3])
self.assertEqual(_flatten_lower_triangle(array([[1, 2, 3], [4, 5, 6],
[7, 8, 9]])), [4, 7, 8])
def test_pearson(self):
"""Test pearson correlation method on valid data."""
# This test output was verified by R.
self.assertFloatEqual(pearson([1, 2], [1, 2]), 1.0)
self.assertFloatEqual(pearson([1, 2, 3], [1, 2, 3]), 1.0)
self.assertFloatEqual(pearson([1, 2, 3], [1, 2, 4]), 0.9819805)
def test_pearson_invalid_input(self):
"""Test running pearson on bad input."""
self.assertRaises(ValueError, pearson, [1.4, 2.5], [5.6, 8.8, 9.0])
self.assertRaises(ValueError, pearson, [1.4], [5.6])
def test_spearman(self):
"""Test the spearman function with valid input."""
# One vector has no ties.
exp = 0.3719581
obs = spearman(self.a, self.b)
self.assertFloatEqual(obs, exp)
# Both vectors have no ties.
exp = 0.2969697
obs = spearman(self.b, self.c)
self.assertFloatEqual(obs, exp)
# Both vectors have ties.
exp = 0.388381
obs = spearman(self.a, self.r)
self.assertFloatEqual(obs, exp)
exp = -0.17575757575757578
obs = spearman(self.data1, self.data2)
self.assertFloatEqual(obs, exp)
def test_spearman_no_variation(self):
"""Test the spearman function with a vector having no variation."""
exp = 0.0
obs = spearman([1, 1, 1], [1, 2, 3])
self.assertFloatEqual(obs, exp)
def test_spearman_ranked(self):
"""Test the spearman function with a vector that is already ranked."""
exp = 0.2969697
obs = spearman(self.b_ranked, self.c_ranked)
self.assertFloatEqual(obs, exp)
def test_spearman_one_obs(self):
"""Test running spearman on a single observation."""
self.assertRaises(ValueError, spearman, [1.0], [5.0])
def test_spearman_invalid_input(self):
"""Test the spearman function with invalid input."""
self.assertRaises(ValueError, spearman, [],[])
self.assertRaises(ValueError, spearman, self.a, [])
self.assertRaises(TypeError, spearman, {0:2}, [1,2,3])
def test_get_rank(self):
"""Test the _get_rank function with valid input."""
exp = ([1.5,3.5,7.5,5.5,1.5,9.0,10.0,11.0,12.0,7.5,14.0,3.5,5.5,13.0],
4)
obs = _get_rank(self.x)
self.assertFloatEqual(exp,obs)
exp = ([1.5,3.0,5.5,4.0,1.5,7.0,8.0,9.0,10.0,5.5],2)
obs = _get_rank(self.a)
self.assertFloatEqual(exp,obs)
exp = ([2,7,10,1,3,6,4,8,5,9],0)
obs = _get_rank(self.b)
self.assertFloatEqual(exp,obs)
exp = ([1.5,7.0,10.0,1.5,3.0,6.0,4.0,8.0,5.0,9.0], 1)
obs = _get_rank(self.r)
self.assertFloatEqual(exp,obs)
exp = ([],0)
obs = _get_rank([])
self.assertEqual(exp,obs)
def test_get_rank_invalid_input(self):
"""Test the _get_rank function with invalid input."""
vec = [1, 'a', 3, 2.5, 3, 1]
self.assertRaises(TypeError, _get_rank, vec)
vec = [1, 2, {1:2}, 2.5, 3, 1]
self.assertRaises(TypeError, _get_rank, vec)
vec = [1, 2, [23,1], 2.5, 3, 1]
self.assertRaises(TypeError, _get_rank, vec)
vec = [1, 2, (1,), 2.5, 3, 1]
self.assertRaises(TypeError, _get_rank, vec)
def test_correlation(self):
"""Correlations and significance should match R's cor.test()"""
x = [1,2,3,5]
y = [0,0,0,0]
z = [1,1,1,1]
a = [2,4,6,8]
b = [1.5, 1.4, 1.2, 1.1]
c = [15, 10, 5, 20]
bad = [1,2,3] #originally gave r = 1.0000000002
self.assertFloatEqual(correlation(x,x), (1, 0))
self.assertFloatEqual(correlation(x,y), (0,1))
self.assertFloatEqual(correlation(y,z), (0,1))
self.assertFloatEqualAbs(correlation(x,a), (0.9827076, 0.01729), 1e-5)
self.assertFloatEqualAbs(correlation(x,b), (-0.9621405, 0.03786), 1e-5)
self.assertFloatEqualAbs(correlation(x,c), (0.3779645, 0.622), 1e-3)
self.assertEqual(correlation(bad,bad), (1, 0))
def test_correlation_test_pearson(self):
"""Test correlation_test using pearson on valid input."""
# These results were verified with R.
# Test with non-default confidence level and permutations.
obs = correlation_test(self.data1, self.data2, method='pearson',
confidence_level=0.90, permutations=990)
self.assertFloatEqual(obs[:2], (-0.03760147, 0.91786297277172868))
self.assertEqual(len(obs[2]), 990)
for r in obs[2]:
self.assertTrue(r >= -1.0 and r <= 1.0)
self.assertCorrectPValue(0.9, 0.93, correlation_test,
(self.data1, self.data2),
{'method':'pearson', 'confidence_level':0.90,
'permutations':990}, p_val_idx=3)
self.assertFloatEqual(obs[4], (-0.5779077, 0.5256224))
# Test with non-default tail type.
obs = correlation_test(self.data1, self.data2, method='pearson',
confidence_level=0.90, permutations=990,
tails='low')
self.assertFloatEqual(obs[:2], (-0.03760147, 0.45893148638586434))
self.assertEqual(len(obs[2]), 990)
for r in obs[2]:
self.assertTrue(r >= -1.0 and r <= 1.0)
self.assertCorrectPValue(0.41, 0.46, correlation_test,
(self.data1, self.data2),
{'method':'pearson', 'confidence_level':0.90,
'permutations':990, 'tails':'low'}, p_val_idx=3)
self.assertFloatEqual(obs[4], (-0.5779077, 0.5256224))
def test_correlation_test_spearman(self):
"""Test correlation_test using spearman on valid input."""
# This example taken from Wikipedia page:
# http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient
obs = correlation_test(self.data1, self.data2, method='spearman',
tails='high')
self.assertFloatEqual(obs[:2], (-0.17575757575757578, 0.686405827612))
self.assertEqual(len(obs[2]), 999)
for rho in obs[2]:
self.assertTrue(rho >= -1.0 and rho <= 1.0)
self.assertCorrectPValue(0.67, 0.7, correlation_test,
(self.data1, self.data2),
{'method':'spearman', 'tails':'high'}, p_val_idx=3)
self.assertFloatEqual(obs[4],
(-0.7251388558041697, 0.51034422964834503))
# The p-value below differs from the Wikipedia example because the example
# uses a one-tailed test, while here we use a two-tailed test. The
# two-tailed answer we get is confirmed at:
# http://stats.stackexchange.com/questions/22816/calculating-p-value-for-spearmans-rank-correlation-coefficient-example-on-wikip
obs = correlation_test(self.data1, self.data2, method='spearman',
tails=None)
self.assertFloatEqual(obs[:2],
(-0.17575757575757578, 0.62718834477648433))
self.assertEqual(len(obs[2]), 999)
for rho in obs[2]:
self.assertTrue(rho >= -1.0 and rho <= 1.0)
self.assertCorrectPValue(0.60, 0.64, correlation_test,
(self.data1, self.data2),
{'method':'spearman', 'tails':None}, p_val_idx=3)
self.assertFloatEqual(obs[4],
(-0.7251388558041697, 0.51034422964834503))
def test_correlation_test_invalid_input(self):
"""Test correlation_test using invalid input."""
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
method='foo')
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
tails='foo')
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
permutations=-1)
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
confidence_level=-1)
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
confidence_level=1.1)
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
confidence_level=0)
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
confidence_level=0.0)
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
confidence_level=1)
self.assertRaises(ValueError, correlation_test, self.data1, self.data2,
confidence_level=1.0)
def test_correlation_test_no_permutations(self):
"""Test correlation_test with no permutations."""
# These results were verified with R.
exp = (-0.2581988897471611, 0.7418011102528389, [], None,
(-0.97687328610475876, 0.93488023560400879))
obs = correlation_test([1, 2, 3, 4], [1, 2, 1, 1], permutations=0)
self.assertFloatEqual(obs, exp)
def test_correlation_test_perfect_correlation(self):
"""Test correlation_test with perfectly-correlated input vectors."""
# These results were verified with R.
obs = correlation_test([1, 2, 3, 4], [1, 2, 3, 4])
self.assertFloatEqual(obs[:2],
(0.99999999999999978, 2.2204460492503131e-16))
self.assertEqual(len(obs[2]), 999)
for r in obs[2]:
self.assertTrue(r >= -1.0 and r <= 1.0)
self.assertCorrectPValue(0.06, 0.09, correlation_test,
([1, 2, 3, 4], [1, 2, 3, 4]), p_val_idx=3)
self.assertFloatEqual(obs[4], (0.99999999999998879, 1.0))
def test_correlation_test_small_obs(self):
"""Test correlation_test with a small number of observations."""
# These results were verified with R.
obs = correlation_test([1, 2, 3], [1, 2, 3])
self.assertFloatEqual(obs[:2], (1.0, 0))
self.assertEqual(len(obs[2]), 999)
for r in obs[2]:
self.assertTrue(r >= -1.0 and r <= 1.0)
self.assertCorrectPValue(0.3, 0.4, correlation_test,
([1, 2, 3], [1, 2, 3]), p_val_idx=3)
self.assertFloatEqual(obs[4], (None, None))
obs = correlation_test([1, 2, 3], [1, 2, 3], method='spearman')
self.assertFloatEqual(obs[:2], (1.0, 0))
self.assertEqual(len(obs[2]), 999)
for r in obs[2]:
self.assertTrue(r >= -1.0 and r <= 1.0)
self.assertCorrectPValue(0.3, 0.4, correlation_test,
([1, 2, 3], [1, 2, 3]), {'method':'spearman'}, p_val_idx=3)
self.assertFloatEqual(obs[4], (None, None))
def test_correlation_matrix(self):
"""Correlations in matrix should match values from R"""
a = [2,4,6,8]
b = [1.5, 1.4, 1.2, 1.1]
c = [15, 10, 5, 20]
m = correlation_matrix([a,b,c])
self.assertFloatEqual(m[0,0], [1.0])
self.assertFloatEqual([m[1,0], m[1,1]], [correlation(b,a)[0], 1.0])
self.assertFloatEqual(m[2], [correlation(c,a)[0], correlation(c,b)[0], \
1.0])
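# Illustrative sketch (hypothetical helpers, not cogent's implementation) of
# the statistics exercised by CorrelationTests:
#  * Pearson's r from centred sums of products;
#  * Spearman's rho as Pearson's r on average ("mid") ranks, so tied values
#    share the mean of the ranks they occupy (cf. test_get_rank);
#  * a Mantel statistic as Pearson's r between the lower triangles of two
#    distance matrices, with a one-sided ('greater') permutation p-value.
# The (hits + 1) / (permutations + 1) p-value convention is an assumption
# here; cogent's mantel_test may count differently.
import math
import random

def _sketch_average_ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next value ties with the current one
        while (end + 1 < len(order) and
               values[order[end + 1]] == values[order[start]]):
            end += 1
        mid_rank = (start + end) / 2.0 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mid_rank
        start = end + 1
    return ranks

def _sketch_pearson(x, y):
    """Pearson correlation; zero-variance input raises ZeroDivisionError."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / float(n)
    my = sum(y) / float(n)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def _sketch_spearman(x, y):
    """Spearman's rho computed as Pearson's r on average ranks."""
    return _sketch_pearson(_sketch_average_ranks(x), _sketch_average_ranks(y))

def _sketch_lower_triangle(m):
    """Flatten the strictly lower triangle of a square matrix, row by row."""
    return [m[i][j] for i in range(len(m)) for j in range(i)]

def _sketch_mantel(m1, m2, permutations=999, seed=None):
    """Return (p, stat) for a one-sided ('greater') Mantel test."""
    rng = random.Random(seed)
    x = _sketch_lower_triangle(m1)
    stat = _sketch_pearson(x, _sketch_lower_triangle(m2))
    size = len(m2)
    hits = 0
    for _ in range(permutations):
        # permute rows and columns of m2 together so it stays a distance matrix
        perm = list(range(size))
        rng.shuffle(perm)
        permuted = [[m2[perm[i]][perm[j]] for j in range(size)]
                    for i in range(size)]
        if _sketch_pearson(x, _sketch_lower_triangle(permuted)) >= stat:
            hits += 1
    return (hits + 1) / float(permutations + 1), stat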
class Ftest(TestCase):
"""Tests for the F test"""
def test_f_value(self):
"""f_value: should calculate the correct F value if possible"""
a = array([1,3,5,7,9,8,6,4,2])
b = array([5,4,6,3,7,6,4,5])
self.assertEqual(f_value(a,b), (8,7,4.375))
self.assertFloatEqual(f_value(b,a), (7,8,0.2285714))
too_short = array([4])
self.assertRaises(ValueError, f_value, too_short, b)
def test_f_two_sample(self):
"""f_two_sample should match values from R"""
#The expected values in this test are obtained through R.
#In R the F test is var.test(x,y); different alternative hypotheses
#can be specified (two sided, less, or greater).
#The vectors are random samples from a particular normal distribution
#(mean and sd specified).
#a: 50 elem, mean=0 sd=1
a = [-0.70701689, -1.24788845, -1.65516470, 0.10443876, -0.48526915,
-0.71820656, -1.02603596, 0.03975982, -2.23404324, -0.21509363,
0.08438468, -0.01970062, -0.67907971, -0.89853667, 1.11137131,
0.05960496, -1.51172084, -0.79733957, -1.60040659, 0.80530639,
-0.81715836, -0.69233474, 0.95750665, 0.99576429, -1.61340216,
-0.43572590, -1.50862327, 0.92847551, -0.68382338, -1.12523522,
-0.09147488, 0.66756023, -0.87277588, -1.36539039, -0.11748707,
-1.63632578, -0.31343078, -0.28176086, 0.33854483, -0.51785630,
2.25360559, -0.80761191, 1.18983499, 0.57080342, -1.44601700,
-0.53906955, -0.01975266, -1.37147915, -0.31537616, 0.26877544]
#b: 50 elem, mean=0, sd=1.2
b=[0.081418743, 0.276571612, -1.864316504, 0.675213612, -0.769202643,
0.140372825, -1.426250184, 0.058617884, -0.819287409, -0.007701916,
-0.782722020, -0.285891593, 0.661980419, 0.383225191, 0.622444946,
-0.192446150, 0.297150571, 0.408896059, -0.167359383, -0.552381362,
0.982168338, 1.439730446, 1.967616101, -0.579607307, 1.095590943,
0.240591302, -1.566937143, -0.199091349, -1.232983905, 0.362378169,
1.166061081, -0.604676222, -0.536560206, -0.303117595, 1.519222792,
-0.319146503, 2.206220810, -0.566351124, -0.720397392, -0.452001377,
0.250890097, 0.320685395, -1.014632725, -3.010346273, -1.703955054,
0.592587381, -1.237451255, 0.172243366, -0.452641122, -0.982148581]
#c: 60 elem, mean=5, sd=1
c=[4.654329, 5.242129, 6.272640, 5.781779, 4.391241, 3.800752,
4.559463, 4.318922, 3.243020, 5.121280, 4.126385, 5.541131,
4.777480, 5.646913, 6.972584, 3.817172, 6.128700, 4.731467,
6.762068, 5.082983, 5.298511, 5.491125, 4.532369, 4.265552,
5.697317, 5.509730, 2.935704, 4.507456, 3.786794, 5.548383,
3.674487, 5.536556, 5.297847, 2.439642, 4.759836, 5.114649,
5.986774, 4.517485, 4.579208, 4.579374, 2.502890, 5.190955,
5.983194, 6.766645, 4.905079, 4.214273, 3.950364, 6.262393,
8.122084, 6.330007, 4.767943, 5.194029, 3.503136, 6.039079,
4.485647, 6.116235, 6.302268, 3.596693, 5.743316, 6.860152]
#d: 30 elem, mean=0, sd =0.05
d=[ 0.104517366, 0.023039678, 0.005579091, 0.052928250, 0.020724823,
-0.060823243, -0.019000890, -0.064133996, -0.016321594, -0.008898334,
-0.027626992, -0.051946186, 0.085269587, -0.031190678, 0.065172938,
-0.054628573, 0.019257306, -0.032427056, -0.058767356, 0.030927400,
0.052247357, -0.042954937, 0.031842104, 0.094130522, -0.024828465,
0.011320453, -0.016195062, 0.015631245, -0.050335598, -0.031658335]
a,b,c,d = map(array,[a,b,c,d])
self.assertEqual(map(len,[a,b,c,d]), [50, 50, 60, 30])
#allowed error. This is large because results from R
#are rounded to 4 decimal places
error = 1e-4
self.assertFloatEqual(f_two_sample(a,a), (49, 49, 1, 1), eps=error)
self.assertFloatEqual(f_two_sample(a,b), (49, 49, 0.8575, 0.5925),
eps=error)
self.assertFloatEqual(f_two_sample(b,a), (49, 49, 1.1662, 0.5925),
eps=error)
self.assertFloatEqual(f_two_sample(a,b, tails='low'),
(49, 49, 0.8575, 0.2963), eps=error)
self.assertFloatEqual(f_two_sample(a,b, tails='high'),
(49, 49, 0.8575, 0.7037), eps=error)
self.assertFloatEqual(f_two_sample(a,c),
(49, 59, 0.6587, 0.1345), eps=error)
#p value very small, so first check df's and F value
self.assertFloatEqualAbs(f_two_sample(d,a, tails='low')[0:3],
(29, 49, 0.0028), eps=error)
assert f_two_sample(d,a, tails='low')[3] < 2.2e-16 #p value
def test_MonteCarloP(self):
"""MonteCarloP calcs a p-value from a val and list of random vals"""
val = 3.0
random_vals = [0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,8.0,9.0]
#test for "high" tail (larger values than expected by chance)
p_val = MonteCarloP(val, random_vals, 'high')
self.assertEqual(p_val, 0.7)
#test for "low" tail (smaller values than expected by chance)
p_val = MonteCarloP(val, random_vals, 'low')
self.assertEqual(p_val, 0.4)
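# Illustrative sketch (hypothetical helpers, not cogent's implementation) of
# two quantities checked in Ftest:
#  * the variance-ratio F statistic, F = var(a) / var(b), using sample
#    variances (n - 1 denominator), returned with its degrees of freedom;
#  * a Monte Carlo p-value as the fraction of random values at least as
#    extreme as the observed one, counting ties as extreme, which reproduces
#    the 0.7 ('high') and 0.4 ('low') expected in test_MonteCarloP.
def _sketch_sample_variance(values):
    """Unbiased sample variance (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / float(n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def _sketch_f_value(a, b):
    """Return (df_a, df_b, F) for the ratio of sample variances of a and b."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each sample needs at least two values")
    f = _sketch_sample_variance(a) / _sketch_sample_variance(b)
    return len(a) - 1, len(b) - 1, f

def _sketch_monte_carlo_p(value, random_values, tail='high'):
    """Fraction of random_values >= value ('high') or <= value ('low')."""
    if tail == 'high':
        hits = sum(1 for r in random_values if r >= value)
    elif tail == 'low':
        hits = sum(1 for r in random_values if r <= value)
    else:
        raise ValueError("tail must be 'high' or 'low'")
    return hits / float(len(random_values))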
class MannWhitneyTests(TestCase):
"""check accuracy of Mann-Whitney implementation"""
x = map(int, "104 109 112 114 116 118 118 119 121 123 125 126"\
" 126 128 128 128".split())
y = map(int, "100 105 107 107 108 111 116 120 121 123".split())
def test_mw_test(self):
"""mann-whitney test results should match Sokal & Rohlf"""
U, p = mw_test(self.x, self.y)
self.assertFloatEqual(U, 123.5)
self.assertTrue(0.02 <= p <= 0.05)
def test_mw_boot(self):
"""exercising the Monte-carlo variant of mann-whitney"""
U, p = mw_boot(self.x, self.y, 10)
self.assertFloatEqual(U, 123.5)
self.assertTrue(0 <= p <= 0.5)
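# Illustrative sketch (hypothetical helper, not cogent's implementation): the
# Mann-Whitney U statistic for x, counted as the number of (x, y) pairs in
# which the x value is larger, with ties contributing one half. This equals
# R_x - n_x(n_x + 1)/2, where R_x is the sum of x's average ranks in the
# pooled sample; for the Sokal & Rohlf data in MannWhitneyTests it gives
# 123.5. No p-value is computed here.
def _sketch_mann_whitney_u(x, y):
    """Return U for sample x relative to sample y."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5  # a tie counts as half a "win"
    return u

# U for y is len(x) * len(y) - U for x, i.e. 160 - 123.5 = 36.5 here.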
class KendallTests(TestCase):
"""check accuracy of Kendall tests against values from R"""
def do_test(self, x, y, alt_expecteds):
"""conducts the tests for each alternate hypothesis against expecteds"""
for alt, exp_p, exp_tau in alt_expecteds:
tau, p_val = kendall_correlation(x, y, alt=alt, warn=False)
self.assertFloatEqual(tau, exp_tau, eps=1e-3)
self.assertFloatEqual(p_val, exp_p, eps=1e-3)
def test_exact_calcs(self):
"""calculations of exact probabilities should match R"""
x = (44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y = ( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
expecteds = [["gt", 0.05972, 0.4444444],
["lt", 0.9624, 0.4444444],
["ts", 0.1194, 0.4444444]]
self.do_test(x,y,expecteds)
def test_with_ties(self):
"""tied values calculated from normal approx"""
# R example with ties in x
x = (44.4, 45.9, 41.9, 53.3, 44.4, 44.1, 50.7, 45.2, 60.1)
y = ( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
expecteds = [#["gt", 0.05793, 0.4225771],
["lt", 0.942, 0.4225771],
["ts", 0.1159, 0.4225771]]
self.do_test(x,y,expecteds)
# R example with ties in y
x = (44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
y = ( 2.6, 3.1, 2.5, 5.0, 3.1, 4.0, 5.2, 2.8, 3.8)
expecteds = [["gt", 0.03737, 0.4789207],
["lt", 0.9626, 0.4789207],
["ts", 0.07474, 0.4789207]]
self.do_test(x,y,expecteds)
# R example with ties in x and y
x = (44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 44.4, 60.1)
y = ( 2.6, 3.6, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
expecteds=[["gt", 0.02891, 0.5142857],
["lt", 0.971, 0.5142857],
["ts", 0.05782, 0.5142857]]
self.do_test(x,y,expecteds)
def test_bigger_vectors(self):
"""kendall_correlation should match R for larger (n=13) vectors"""
# q < expansion
x= (0.118583104633, 0.227860069338, 0.143856130991, 0.935362617582,
0.0471303856799, 0.659819202174, 0.739247965907, 0.268929000278,
0.848250568194, 0.307764819102, 0.733949480141, 0.271662210481,
0.155903098872)
y= (0.749762144455, 0.407571703468, 0.934176427266, 0.188638794706,
0.184844781493, 0.391485553856, 0.735504815302, 0.363655952442,
0.18489971978, 0.851075466765, 0.139932273818, 0.333675110224,
0.570250937033)
expecteds = [["gt", 0.9183, -0.2820513],
["lt", 0.1022, -0.2820513],
["ts", 0.2044, -0.2820513]]
self.do_test(x,y,expecteds)
# q > expansion
x= (0.2602556958, 0.441506392849, 0.930624643531, 0.728461775775,
0.234341774892, 0.725677256368, 0.354788882728, 0.475882541956,
0.347533553428, 0.608578046857, 0.144697962102, 0.784502692164,
0.872607603407)
y= (0.753056395718, 0.454332072011, 0.791882395707, 0.622853579015,
0.127030232518, 0.232086215578, 0.586604349918, 0.0139051260749,
0.579079370051, 0.0550643809812, 0.94798878249, 0.318410679439,
0.86725134615)
expecteds = [["gt", 0.4762, 0.02564103],
["lt", 0.5711, 0.02564103],
["ts", 0.9524, 0.02564103]]
self.do_test(x,y,expecteds)
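# Illustrative sketch (hypothetical helper, not cogent's implementation) of
# Kendall's tau-b, which reduces to tau-a when there are no ties:
#   tau_b = (C - D) / sqrt((n0 - n1) * (n0 - n2))
# where C and D count concordant and discordant pairs, n0 = n(n - 1)/2, and
# n1 and n2 count pairs tied in x and in y respectively. For the x-ties data
# in test_with_ties this gives 15 / sqrt(35 * 36), about 0.4225771, the tau
# expected there. p-values (exact or normal approximation) are not computed.
import math

def _sketch_kendall_tau_b(x, y):
    """Return Kendall's tau-b for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                tied_x += 1  # counted even if the pair is also tied in y
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    pairs = n * (n - 1) / 2.0
    return (concordant - discordant) / math.sqrt((pairs - tied_x) * (pairs - tied_y))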
class TestDistMatrixPermutationTest(TestCase):
"""Tests of distance_matrix_permutation_test"""
def setUp(self):
"""sets up variables for testing"""
self.matrix = array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
self.cells = [(0,1), (1,3)]
self.cells2 = [(0,2), (2,3)]
def test_get_ltm_cells(self):
"get_ltm_cells converts indices to be below the diagonal"
cells = [(0,0),(0,1),(0,2),(1,0),(1,1),(1,2),(2,0),(2,1),(2,2)]
result = get_ltm_cells(cells)
self.assertEqual(result, [(2, 0), (1, 0), (2, 1)])
cells = [(0,1),(0,2)]
result = get_ltm_cells(cells)
self.assertEqual(result, [(2, 0), (1, 0)])
def test_get_values_from_matrix(self):
"""get_values_from_matrix returns the special and other values from matrix"""
matrix = self.matrix
cells = [(1,0),(0,1),(2,0),(2,1)]
#test that works for a symmetric matrix
cells_sym = get_ltm_cells(cells)
special_vals, other_vals = get_values_from_matrix(matrix, cells_sym,\
cells2=None, is_symmetric=True)
special_vals.sort()
other_vals.sort()
self.assertEqual(special_vals, [5,9,10])
self.assertEqual(other_vals, [13,14,15])
#test that works for a non-symmetric matrix
special_vals, other_vals = get_values_from_matrix(matrix, cells,\
cells2=None, is_symmetric=False)
special_vals.sort()
other_vals.sort()
self.assertEqual(special_vals, [2,5,9,10])
self.assertEqual(other_vals, [1,3,4,6,7,8,11,12,13,14,15,16])
#test that works on a symmetric matrix when cells2 is defined
cells2 = [(3,0),(3,2),(0,3)]
cells2_sym = get_ltm_cells(cells2)
special_vals, other_vals = get_values_from_matrix(matrix, cells_sym,\
cells2=cells2_sym, is_symmetric=True)
special_vals.sort()
other_vals.sort()
self.assertEqual(special_vals, [5,9,10])
self.assertEqual(other_vals, [13,15])
#test that works when cells2 is defined and not symmetric
special_vals, other_vals = get_values_from_matrix(matrix, cells, cells2=cells2,\
is_symmetric=False)
special_vals.sort()
other_vals.sort()
self.assertEqual(special_vals, [2,5,9,10])
self.assertEqual(other_vals, [4,13,15])
def test_distance_matrix_permutation_test_non_symmetric(self):
""" evaluate empirical p-values for a non symmetric matrix
To test the empirical p-values, we look at a simple 3x3 matrix
because it is easy to see what t score every permutation will
generate -- there are only 6 permutations.
Running distance_matrix_permutation_test with n=1000, we expect
each permutation to show up about 167 times (1000/6), so we know how
many times to expect to see more extreme t scores. We therefore
know what the empirical p-values will be. (n=1000 was chosen
empirically -- smaller values seem to lead to much more frequent
random failures.)
"""
def make_result_list(*args, **kwargs):
return [distance_matrix_permutation_test(*args,**kwargs)[2] \
for i in range(10)]
m = arange(9).reshape((3,3))
n = 100
# looks at each possible permutation n times --
# compare first row to rest
r = make_result_list(m, [(0,0),(0,1),(0,2)],n=n,is_symmetric=False)
self.assertSimilarMeans(r, 0./6.)
r = make_result_list(m, [(0,0),(0,1),(0,2)],n=n,is_symmetric=False,\
tails='high')
self.assertSimilarMeans(r, 4./6.)
r = make_result_list(m, [(0,0),(0,1),(0,2)],n=n,is_symmetric=False,\
tails='low')
self.assertSimilarMeans(r, 0./6.)
# looks at each possible permutation n times --
# compare last row to rest
r = make_result_list(m, [(2,0),(2,1),(2,2)],n=n,is_symmetric=False)
self.assertSimilarMeans(r, 0./6.)
r = make_result_list(m, [(2,0),(2,1),(2,2)],n=n,is_symmetric=False,\
tails='high')
self.assertSimilarMeans(r, 0./6.)
r = make_result_list(m, [(2,0),(2,1),(2,2)],n=n,is_symmetric=False,\
tails='low')
self.assertSimilarMeans(r, 4./6.)
def test_distance_matrix_permutation_test_symmetric(self):
""" evaluate empirical p-values for symmetric matrix
See test_distance_matrix_permutation_test_non_symmetric
docstring for a description of how this test works.
"""
def make_result_list(*args, **kwargs):
return [distance_matrix_permutation_test(*args)[2] for i in range(10)]
m = array([[0,1,3],[1,2,4],[3,4,5]])
n = 100
# looks at each possible permutation n times --
# compare first row to rest
r = make_result_list(m, [(0,0),(0,1),(0,2)],n=n)
self.assertSimilarMeans(r, 0./6.)
r = make_result_list(m, [(0,0),(0,1),(0,2)],n=n,tails='high')
self.assertSimilarMeans(r, 0.77281447417149496,0)
r = make_result_list(m, [(0,0),(0,1),(0,2)],n=n,tails='low')
self.assertSimilarMeans(r, 4./6.)
## The following lines are not part of the test code, but are useful in
## figuring out what t-scores all of the permutations will yield.
#permutes = [[0, 1, 2], [0, 2, 1], [1, 0, 2],\
# [1, 2, 0], [2, 0, 1], [2, 1, 0]]
#results = []
#for p in permutes:
# p_m = permute_2d(m,p)
# results.append(t_two_sample(\
# [p_m[0,1],p_m[0,2]],[p_m[2,1]],tails='high'))
#print results
def test_distance_matrix_permutation_test_alt_stat(self):
""" an alternative statistical test can be supplied via f """
def fake_stat_test(a,b,tails=None):
return 42.,42.
m = array([[0,1,3],[1,2,4],[3,4,5]])
self.assertEqual(distance_matrix_permutation_test(m,\
[(0,0),(0,1),(0,2)],n=5,f=fake_stat_test),(42.,42.,0.))
def test_distance_matrix_permutation_test_return_scores(self):
""" return_scores=True functions as expected """
# use alt statistical test to make results simple
def fake_stat_test(a,b,tails=None):
return 42.,42.
m = array([[0,1,3],[1,2,4],[3,4,5]])
self.assertEqual(distance_matrix_permutation_test(\
m,[(0,0),(0,1),(0,2)],\
n=5,f=fake_stat_test,return_scores=True),(42.,42.,0.,[42.]*5))
def test_ANOVA_one_way(self):
"""ANOVA one way returns same values as ANOVA on a stats package
"""
g1 = Numbers([10.0, 11.0, 10.0, 5.0, 6.0])
g2 = Numbers([1.0, 2.0, 3.0, 4.0, 1.0, 2.0])
g3 = Numbers([6.0, 7.0, 5.0, 6.0, 7.0])
i = [g1, g2, g3]
dfn, dfd, F, between_MS, within_MS, group_means, prob = ANOVA_one_way(i)
self.assertEqual(dfn, 2)
self.assertEqual(dfd, 13)
self.assertFloatEqual(F, 18.565450643776831)
self.assertFloatEqual(between_MS, 55.458333333333343)
self.assertFloatEqual(within_MS, 2.9871794871794868)
self.assertFloatEqual(group_means, [8.4000000000000004, 2.1666666666666665, 6.2000000000000002])
self.assertFloatEqual(prob, 0.00015486238993089464)
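# Illustrative sketch (hypothetical helper, not cogent's implementation) of a
# one-way ANOVA: the between-group mean square is
# sum(n_i * (mean_i - grand_mean)**2) / (k - 1), the within-group mean square
# is the pooled squared deviation from each group mean divided by (N - k),
# and F is their ratio. The p-value, which needs the F distribution, is left
# out. For the three groups in test_ANOVA_one_way this yields dfn=2, dfd=13
# and F close to 18.5654506.
def _sketch_anova_one_way(groups):
    """Return (dfn, dfd, F, between_MS, within_MS, group_means)."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    total = sum(sizes)
    means = [sum(g) / float(len(g)) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / float(total)
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    dfn, dfd = k - 1, total - k
    between_ms = ss_between / dfn
    within_ms = ss_within / dfd
    return dfn, dfd, between_ms / within_ms, between_ms, within_ms, means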
#execute tests if called from command line
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests Numbers and Freqs objects, and their Unsafe versions.
"""
from math import sqrt
import numpy
from cogent.util.unit_test import TestCase, main
from cogent.maths.stats.util import SummaryStatistics, SummaryStatisticsError,\
Numbers, UnsafeNumbers, Freqs, UnsafeFreqs, NumberFreqs, \
UnsafeNumberFreqs
from cogent.util.misc import ConstraintError
from operator import add, sub, mul
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Sandra Smit"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class SummaryStatisticsTests(TestCase):
"""Tests of summary stats functions."""
def test_init(self):
"""SummaryStatistics should initialize correctly."""
#check empty init -- can access private vars, but can't get
#properties.
s = SummaryStatistics()
self.assertEqual(s._count, None)
self.assertRaises(SummaryStatisticsError, getattr, s, 'Count')
#check init with one positional parameter
s = SummaryStatistics(1)
self.assertEqual(s.Count, 1)
#check init with all positional parameters.
#note that inconsistent data can sneak in (cf. sd vs var)
s = SummaryStatistics(1,2,3,4,5,6)
self.assertEqual(s.Count, 1)
self.assertEqual(s.Sum, 2)
self.assertEqual(s.Mean, 3)
self.assertEqual(s.StandardDeviation, 4)
self.assertEqual(s.Variance, 5)
self.assertEqual(s.SumSquares, 6)
def test_str(self):
"""SummaryStatistics str should print known fields."""
s = SummaryStatistics()
self.assertEqual(str(s), '')
#note that additional fields will fill in if they can be calculated.
s = SummaryStatistics(Mean=3, StandardDeviation=2)
#now expect to print as table
self.assertEqual(str(s), '==========================\n Statistic Value\n--------------------------\n Mean 3\nStandardDeviation 2\n Variance 4\n--------------------------')
def test_Count(self):
"""SummaryStatistics Count should work if Count or Sum and Mean ok"""
s = SummaryStatistics(Count=3)
self.assertEqual(s.Count, 3)
s = SummaryStatistics(Sum=10, Mean=5)
self.assertEqual(s.Count, 2)
#if inconsistent, believes Count
s = SummaryStatistics(Count=3, Sum=2, Mean=5)
self.assertEqual(s.Count, 3)
#doesn't work with just sum or mean
s = SummaryStatistics(Mean=3)
self.assertRaises(SummaryStatisticsError, getattr, s, 'Count')
def test_Sum(self):
"""SummaryStatistics Sum should work if Sum or Count and Mean ok"""
s = SummaryStatistics(Sum=3)
self.assertEqual(s.Sum, 3)
s = SummaryStatistics(Count=3, Mean=5)
self.assertEqual(s.Sum, 15)
def test_Mean(self):
"""SummaryStatistics Mean should work if Mean or Count and Sum ok"""
s = SummaryStatistics(Mean=3)
self.assertEqual(s.Mean, 3)
s = SummaryStatistics(Count=3, Sum=15)
self.assertEqual(s.Mean, 5)
def test_StandardDeviation(self):
"""SummaryStatistics StandardDeviation should work if it or variance ok"""
s = SummaryStatistics(StandardDeviation=3)
self.assertEqual(s.StandardDeviation, 3)
self.assertEqual(s.Variance, 9)
s = SummaryStatistics(Variance=9)
self.assertEqual(s.StandardDeviation, 3)
def test_Variance(self):
"""SummaryStatistics Variance should work if it or std dev ok"""
s = SummaryStatistics(Variance=9)
self.assertEqual(s.Variance, 9)
s = SummaryStatistics(StandardDeviation=3)
self.assertEqual(s.Variance, 9)
def test_SumSquares(self):
"""SummaryStatistics SumSquares should work if set"""
s = SummaryStatistics(SumSquares=3)
self.assertEqual(s.SumSquares, 3)
s = SummaryStatistics(Sum=3)
self.assertRaises(SummaryStatisticsError, getattr, s, 'SumSquares')
def test_cmp(self):
"""SummaryStatistics should sort by count, then sum, then variance"""
a = SummaryStatistics(Count=3)
b = SummaryStatistics(Count=4)
c = SummaryStatistics(Count=3, Sum=5)
d = SummaryStatistics(Count=3, Sum=10)
e = SummaryStatistics(Sum=10)
assert a < b
assert b > a
assert a == a
assert c < d
assert e < a
items = [c,a,d,b,e]
items.sort()
self.assertEqual(items, [e,a,c,d,b])
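# Illustrative sketch only: hypothetical helpers, not PyCogent's
# SummaryStatistics and not used by any test in this module. They restate
# the derivation rules the SummaryStatistics tests above rely on:
# Count = Sum / Mean, Sum = Count * Mean, Mean = Sum / Count, and
# Variance = StandardDeviation ** 2 (and back via sqrt). A field supplied
# explicitly is never overwritten, so inconsistent input keeps the given
# Count, as test_Count expects. The sort key is an assumption that simply
# reproduces the ordering checked in test_cmp.
import math

def _sketch_derive_summary(**fields):
    """Return a copy of fields with derivable statistics filled in."""
    d = dict(fields)
    if 'Count' not in d and 'Sum' in d and d.get('Mean'):
        d['Count'] = d['Sum'] / float(d['Mean'])
    if 'Sum' not in d and 'Count' in d and 'Mean' in d:
        d['Sum'] = d['Count'] * d['Mean']
    if 'Mean' not in d and 'Sum' in d and d.get('Count'):
        d['Mean'] = d['Sum'] / float(d['Count'])
    if 'Variance' not in d and 'StandardDeviation' in d:
        d['Variance'] = d['StandardDeviation'] ** 2
    if 'StandardDeviation' not in d and 'Variance' in d:
        d['StandardDeviation'] = math.sqrt(d['Variance'])
    return d

def _sketch_summary_sort_key(fields):
    """Sort key: Count, then Sum, then Variance; a missing field sorts first."""
    # Assumption: an unknown field compares as smaller than any known value.
    return tuple((0, 0) if name not in fields else (1, fields[name])
                 for name in ('Count', 'Sum', 'Variance'))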
class NumbersTestsI(object):
"""Abstract class with tests for Numbers objects.
Inherited by safe and unsafe versions to test polymorphism.
"""
ClassToTest = None
def test_init_empty(self):
"""Numbers should initialize OK with empty list"""
self.assertEqual(self.ClassToTest([]), [])
def test_init_single(self):
"""Numbers should initialize OK with single number"""
self.assertEqual(self.ClassToTest([5.0]), [5.0])
def test_init_list(self):
"""Numbers should initialize OK with list of numbers"""
self.assertEqual(self.ClassToTest([1, 5.0, 3.2]), [1, 5.0, 3.2])
def test_init_bad_type(self):
"""Numbers should fail with TypeError if input not iterable"""
self.assertRaises(TypeError, self.ClassToTest, 34)
def test_add_nonempty(self):
"""Numbers should allow addition of two nonempty Numbers"""
#test that addition works in the right direction
self.assertFloatEqual(self.integers + self.floats,
Numbers([1, 2, 3, 4, 5, 1.5, 2.7]))
#test that neither of the things that were added was changed
self.assertFloatEqual(self.integers, [1,2,3,4,5])
self.assertFloatEqual(self.floats, [1.5, 2.7])
def test_add_empty(self):
"""Numbers should be unchanged on addition of empty list"""
#test that addition of an empty list works
self.assertFloatEqual(self.integers + self.empty, self.integers)
self.assertFloatEqual(self.empty + self.floats, self.floats)
def test_add_repeated(self):
"""Numbers should support repeated addition, a+b+c"""
self.assertFloatEqual(self.floats + self.floats + self.floats,
[1.5, 2.7]*3)
def test_iadd(self):
"""Numbers should support in-place addition"""
self.floats += [4]
self.assertFloatEqual(self.floats, [1.5, 2.7, 4.0])
def test_setitem(self):
"""Numbers should support assignment to positive index"""
self.floats[0] = 1
self.assertFloatEqual(self.floats, [1.0, 2.7])
def test_setitem_negative_index(self):
"""Numbers should support assignment to negative index"""
self.floats[-1] = 2
self.assertFloatEqual(self.floats, [1.5, 2.0])
def test_setslice(self):
"""Numbers should support slice assignment"""
self.floats[0:1] = [1, 2, 3]
self.assertFloatEqual(self.floats, [1, 2, 3, 2.7])
def test_append_good(self):
"""Numbers should support append of a number"""
self.floats.append(1)
self.assertFloatEqual(self.floats,
[1.5, 2.7, 1.0])
def test_extend(self):
"""Numbers should support extend with a sequence"""
self.floats.extend([5,5,5])
self.assertFloatEqual(self.floats,
[1.5, 2.7, 5.0, 5.0, 5.0])
def test_items(self):
"""Numbers should support items() method"""
self.assertFloatEqual(self.floats.items()[0], (1.5, 1))
self.assertFloatEqual(self.floats.items()[1], (2.7, 1))
def test_isValid(self):
"""Numbers isValid should return True if all items numbers"""
for i in [self.empty, self.integers, self.floats, self.mixed]:
assert i.isValid()
def test_toFixedWidth(self):
"""Numbers should be able to convert items to fixed-width string"""
self.assertEqual(self.floats.toFixedWidth(), " +1.50e+00 +2.70e+00")
def test_toFixedWidth_empty(self):
"""Numbers should return empty string when converting no items"""
self.assertEqual(self.empty.toFixedWidth(), '')
def test_toFixedWidth_mixed(self):
"""Numbers should convert all kinds of floats to fixed precision"""
self.assertEqual(self.mixed.toFixedWidth(), ''.join([
' +0.00e+00',
' +1.00e+00',
' -1.00e+00',
' +1.23e+00',
' -1.24e+00',
'+1.23e+302',
'+1.23e-298',
'-1.23e+302',
'-1.23e-298',
]))
def test_toFixedWidth_specified_precision(self):
"""Numbers should convert all kinds of floats to specified precision"""
self.assertEqual(self.mixed.toFixedWidth(7), ''.join([
' +0e+00',
' +1e+00',
' -1e+00',
' +1e+00',
' -1e+00',
'+1e+302',
'+1e-298',
'-1e+302',
'-1e-298',
]))
self.assertEqual(self.mixed.toFixedWidth(8), ''.join([
' +0e+00',
' +1e+00',
' -1e+00',
' +1e+00',
' -1e+00',
' +1e+302',
' +1e-298',
' -1e+302',
' -1e-298',
]))
self.assertEqual(self.mixed.toFixedWidth(12), ''.join([
' +0.0000e+00',
' +1.0000e+00',
' -1.0000e+00',
' +1.2346e+00',
' -1.2368e+00',
'+1.2340e+302',
'+1.2340e-298',
'-1.2340e+302',
'-1.2340e-298',
]))
def test_normalize(self):
"""Numbers normalize should return items summing to 1 by default"""
first = self.ints
second = self.fracs
first.normalize()
second.normalize()
self.assertFloatEqual(first, second)
self.assertFloatEqual(first.Sum, 1)
self.assertFloatEqual(second.Sum, 1)
empty = self.empty
empty.normalize()
self.assertEqual(empty, [])
zero = self.zero
zero.normalize()
self.assertEqual(zero, [0,0,0,0,0])
def test_normalize_parameter(self):
"""Numbers normalize(x) should divide items by x"""
first = self.ClassToTest([0, 1, 2, 3, 4])
first.normalize(max(first))
self.assertFloatEqual(first, [0, 1.0/4, 2.0/4, 3.0/4, 4.0/4])
second = self.ClassToTest([0, 1, 2])
second.normalize(0.5)
self.assertFloatEqual(second, [0, 2, 4])
def test_accumulate(self):
"""Numbers accumulate should do cumulative sum in place"""
nl = self.ClassToTest([0, 1, 2, 3, 4])
nl.accumulate()
self.assertEqual(nl, [0, 1, 3, 6, 10])
nl = self.ClassToTest()
nl.accumulate()
self.assertEqual(nl, [])
def test_firstIndexLessThan(self):
"""Numbers firstIndexLessThan should return first index less than val"""
nl = self.ints
f = nl.firstIndexLessThan
self.assertEqual(f(-50), None)
self.assertEqual(f(100), 0)
self.assertEqual(f(3), 0)
self.assertEqual(f(1), None)
self.assertEqual(f(1, inclusive=True), 0)
self.assertEqual(f(-50, stop_at_ends=True), 4)
def test_firstIndexGreaterThan(self):
"""Numbers firstIndexGreaterThan should return first index greater than val"""
nl = self.ints
f = nl.firstIndexGreaterThan
self.assertEqual(f(-50), 0)
self.assertEqual(f(100), None)
self.assertEqual(f(3), 3)
self.assertEqual(f(1), 1)
self.assertEqual(f(1, inclusive=True), 0)
self.assertEqual(f(2), 2)
self.assertEqual(f(2, inclusive=True), 1)
self.assertEqual(f(100, stop_at_ends=True), 4)
#compatibility tests with old choose(): Numbers choose should return
#correct index
nl = self.ClassToTest([1, 2, 3, 4, 5])
nl.normalize()
nl.accumulate()
known_values = [
(-50, 0),
(0, 0),
(0.001, 0),
(1/15.0 - 0.001, 0),
(1/15.0 + 0.001, 1),
(3/15.0 + 0.001, 2),
(1, 4),
(10, 4),
]
for test, result in known_values:
self.assertFloatEqual(nl.firstIndexGreaterThan(test, inclusive=True, stop_at_ends=True), result)
def test_lastIndexGreaterThan(self):
"""Numbers lastIndexGreaterThan should return last index > val"""
nl = self.ints
f = nl.lastIndexGreaterThan
self.assertEqual(f(-50), 4)
self.assertEqual(f(100), None)
self.assertEqual(f(3), 4)
self.assertEqual(f(1), 4)
self.assertEqual(f(1, inclusive=True), 4)
self.assertEqual(f(100, stop_at_ends=True), 0)
def test_lastIndexLessThan(self):
"""Numbers lastIndexLessThan should return last index < val"""
nl = self.ints
f = nl.lastIndexLessThan
self.assertEqual(f(-50), None)
self.assertEqual(f(100), 4)
self.assertEqual(f(3), 1)
self.assertEqual(f(1), None)
self.assertEqual(f(1, inclusive=True), 0)
self.assertEqual(f(-50, stop_at_ends=True), 0)
def test_Sum(self):
"""Numbers Sum should be the same as sum()"""
self.assertEqual(self.ints.Sum, 15)
self.assertEqual(self.empty.Sum, 0)
def test_Count(self):
"""Numbers Count should be the same as len()"""
self.assertEqual(self.ints.Count, 5)
self.assertEqual(self.empty.Count, 0)
def test_SumSquares(self):
"""Numbers SumSquares should be sum of squares"""
self.assertEqual(self.ints.SumSquares, (1*1+2*2+3*3+4*4+5*5))
self.assertEqual(self.empty.SumSquares, 0)
def test_Variance(self):
"""Numbers Variance should be variance of individual numbers"""
self.assertEqual(self.empty.Variance, None)
self.assertEqual(self.zero.Variance, 0)
self.assertFloatEqual(self.ints.Variance, 2.5)
def test_StandardDeviation(self):
"""Numbers StandardDeviation should be sd of individual numbers"""
self.assertEqual(self.empty.StandardDeviation, None)
self.assertEqual(self.zero.StandardDeviation, 0)
self.assertFloatEqual(self.ints.StandardDeviation, sqrt(2.5))
def test_Mean(self):
"""Numbers Mean should be mean of individual numbers"""
self.assertEqual(self.empty.Mean, None)
self.assertEqual(self.zero.Mean, 0)
self.assertEqual(self.ints.Mean, 3)
def test_NumberQuantiles(self):
"""quantiles should be correct"""
num = self.ClassToTest(range(1,11))
self.assertFloatEqual(num.quantile(.1), 1.9)
self.assertFloatEqual(num.quantile(.2), 2.8)
self.assertFloatEqual(num.quantile(.25), 3.25)
self.assertFloatEqual(num.Median, 5.5)
self.assertFloatEqual(num.quantile(.75), 7.75)
self.assertFloatEqual(num.quantile(.77), 7.93)
def test_summarize(self):
"""Numbers summarize should return SummaryStatistics object"""
self.assertEqual(self.ints.summarize(), SummaryStatistics(Mean=3,\
Variance=2.5, Count=5))
def test_choice(self):
"""Numbers choice should return random element from self"""
nums = [self.ints.choice() for i in range(10)]
self.assertEqual(len(nums), 10)
for n in nums:
assert n in self.ints
v = Numbers(nums).Variance
self.assertNotEqual(v, 0)
def test_randomSequence(self):
"""Numbers randomSequence should return random sequence from self"""
nums = self.ints.randomSequence(10)
self.assertEqual(len(nums), 10)
for n in nums:
assert n in self.ints
v = Numbers(nums).Variance
self.assertNotEqual(v, 0)
def test_subset(self):
"""Numbers subset should delete (or keep) selected items"""
odd = [5,1,3]
nums = self.ints
nums.extend([1,1,1])
new_nums = nums.copy()
new_nums.subset(odd)
self.assertEqual(new_nums, [1,3,5,1,1,1])
new_nums = nums.copy()
new_nums.subset(odd, keep=False)
self.assertEqual(new_nums, [2,4])
def test_copy(self):
"""Numbers copy should leave class intact (unlike slice)"""
c = self.ints.copy()
self.assertEqual(c, self.ints)
self.assertEqual(c.__class__, self.ints.__class__)
def test_round(self):
"""Numbers round should round numbers in-place"""
self.floats.round()
self.assertEqual(self.floats, [2.0,3.0])
for i, f in enumerate(self.floats):
self.floats[i] = f + 0.101
self.assertNotEqual(self.floats, [2.0,3.0])
self.assertNotEqual(self.floats, [2.1,3.1])
self.floats.round(1)
self.assertEqual(self.floats, [2.1,3.1])
def test_Uncertainty(self):
"""Numbers Uncertainty should act via Freqs"""
self.assertEqual(self.floats.Uncertainty, \
Freqs(self.floats).Uncertainty)
self.assertNotEqual(self.floats.Uncertainty, None)
def test_Mode(self):
"""Numbers Mode should return most common element"""
self.assertEqual(self.empty.Mode, None)
self.assertEqual(self.zero.Mode, 0)
self.ints.extend([1,2,2,3,3,3])
self.assertEqual(self.ints.Mode, 3)
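# Illustrative sketch only: a hypothetical helper, not the implementation
# behind Numbers.quantile and not used by these tests. Linear interpolation
# between order statistics, at fractional position q * (n - 1) in the sorted
# data, reproduces every value test_NumberQuantiles above expects for
# range(1, 11), e.g. q=0.1 -> 1.9 and q=0.77 -> 7.93. Assumes 0 <= q <= 1.
def _sketch_quantile(values, q):
    """Return the q-th quantile of values by linear interpolation."""
    data = sorted(values)
    if not data:
        raise ValueError("quantile of empty data is undefined")
    pos = q * (len(data) - 1)  # fractional index into the sorted data
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + (data[upper] - data[lower]) * frac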
class NumbersTests(TestCase, NumbersTestsI):
"""Tests of the (safe) Numbers class."""
ClassToTest = Numbers
def setUp(self):
"""define some standard lists"""
self.empty = self.ClassToTest([])
self.integers = self.ClassToTest([1,2,3,4,5])
self.floats = self.ClassToTest([1.5, 2.7])
self.mixed = self.ClassToTest([
0,
1,
-1,
1.234567890,
-1.2367890,
123.4e300,
123.4e-300,
-123.4e300,
-123.4e-300,
])
self.zero = self.ClassToTest([0,0,0,0,0])
self.ints = self.ClassToTest([1,2,3,4,5])
self.fracs = self.ClassToTest([0.1,0.2,0.3,0.4,0.5])
def test_init_string(self):
"""Numbers should initialize by treating string as list of digits"""
self.assertEqual(self.ClassToTest('102'), [1.0, 0.0, 2.0])
def test_init_bad_string(self):
"""Numbers should raise ValueError if float() can't convert string"""
self.assertRaises(ValueError, self.ClassToTest, '102a')
def test_append_bad(self):
"""Numbers should reject append of a non-number"""
self.assertRaises(ValueError, self.floats.append, "abc")
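# Illustrative sketch only: a hypothetical class, not PyCogent's Numbers and
# not used by these tests. The "safe" behaviour tested above can be had by
# routing each incoming item through float(): the string '102' becomes
# [1.0, 0.0, 2.0], an unconvertible item raises ValueError, and a
# non-iterable initializer raises TypeError. UnsafeNumbers, by contrast,
# stores items unchanged. Only __init__, append and extend are covered here;
# item/slice assignment and += would need the same treatment.
class _SketchSafeFloatList(list):
    """List that coerces every stored item to float."""

    def __init__(self, data=()):
        list.__init__(self, [float(item) for item in data])

    def append(self, item):
        list.append(self, float(item))

    def extend(self, items):
        list.extend(self, [float(item) for item in items])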
class UnsafeNumbersTests(TestCase, NumbersTestsI):
"""Tests of the UnsafeNumbers class."""
ClassToTest = UnsafeNumbers
def setUp(self):
"""define some standard lists"""
self.empty = self.ClassToTest([])
self.integers = self.ClassToTest([1,2,3,4,5])
self.floats = self.ClassToTest([1.5, 2.7])
self.mixed = self.ClassToTest([
0,
1,
-1,
1.234567890,
-1.2367890,
123.4e300,
123.4e-300,
-123.4e300,
-123.4e-300,
])
self.zero = self.ClassToTest([0,0,0,0,0])
self.ints = self.ClassToTest([1,2,3,4,5])
self.fracs = self.ClassToTest([0.1,0.2,0.3,0.4,0.5])
def test_init_string(self):
"""UnsafeNumbers should treat string as list of chars"""
self.assertEqual(self.ClassToTest('102'), ['1','0','2'])
def test_init_bad_string(self):
"""UnsafeNumbers should silently incorporate unfloatable string"""
self.assertEqual(self.ClassToTest('102a'), ['1','0','2','a'])
def test_append_bad(self):
"""UnsafeNumbers should allow append of a non-number"""
self.empty.append('abc')
self.assertEqual(self.empty, ['abc'])
def test_isValid_bad(self):
"""UnsafeNumbers should return False if invalid"""
assert self.mixed.isValid()
self.mixed.append('abc')
assert not self.mixed.isValid()
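# Illustrative sketch only: hypothetical helpers, not used by these tests.
# The statistics tests in NumbersTestsI, run against both classes above,
# expect the sample (n - 1) variance: [1, 2, 3, 4, 5] gives 2.5, an
# all-zero list gives 0, and an empty list gives None. Returning 0.0 for a
# single value is an assumption; the real classes may behave differently.
import math

def _sketch_sample_variance(values):
    """Return the n - 1 (sample) variance, or None for empty input."""
    n = len(values)
    if n == 0:
        return None
    mean = sum(values) / float(n)
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1) if n > 1 else 0.0

def _sketch_sample_sd(values):
    """Return the sample standard deviation, or None for empty input."""
    var = _sketch_sample_variance(values)
    return None if var is None else math.sqrt(var)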
class StaticFreqsTestsI(object):
"""Tests of the interface shared by Freqs and UnsafeFreqs (static keys).
All of these tests assume keys on the alphabet 'abcde'.
These tests were added to ensure that array-based objects that implement
a fixed set of keys maintain the appropriate portion of the freqs interface.
"""
ClassToTest = None
def setUp(self):
"""Standard cases to test."""
self.Alphabetic = self.ClassToTest({'a':3,'b':2,'c':1,'d':1,'e':1})
self.Empty = self.ClassToTest({})
self.Constant = self.ClassToTest({'a':5})
#The following test various ways of constructing the objects
def test_fromTuples(self):
"""Freqs fromTuples should add from key, count pairs w/ repeated keys"""
ct = self.ClassToTest
f = ct()
self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2)]), \
ct({'a':6,'b':3}))
#note: should be allowed to subtract, as long as freq doesn't go
#negative.
f.fromTuples([('b',-1),('c',4.5)])
self.assertEqual(f, ct({'a':6,'b':2,'c':4.5}))
#should work with a different operator
f.fromTuples([('b',7)], op=mul)
self.assertEqual(f, ct({'a':6, 'b':14, 'c':4.5}))
#check that it works with something that depends on the key
def func(key, first, second):
if key == 'a':
return first + second
else:
return max(second, first * second)
f = ct()
self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2), ('b',4)], \
func,uses_key=True), ct({'a':6,'b':12}))
def test_newFromTuples(self):
"""Freqs newFromTuples should work as expected."""
ct = self.ClassToTest
self.assertEqual(ct.newFromTuples([('a',4),('b',3),('a',2)]), \
ct({'a':6,'b':3}))
def test_fromDict(self):
"""Freqs fromDict should add from dict of {key:count}"""
ct = self.ClassToTest
f = ct()
self.assertEqual(f.fromDict({'a':6,'b':3}), ct({'a':6,'b':3}))
#note: should be allowed to subtract, as long as freq doesn't go
#negative.
f.fromDict({'b':-1, 'c':4.5})
self.assertEqual(f, ct({'a':6,'b':2,'c':4.5}))
#should work with a different operator
f.fromDict({'b':7}, op=mul)
self.assertEqual(f, ct({'a':6, 'b':14, 'c':4.5}))
def test_newFromDict(self):
"""Freqs newFromDict should work as expected."""
ct = self.ClassToTest
self.assertEqual(ct.newFromDict({'a':6,'b':3}), ct({'a':6,'b':3}))
def test_fromDicts(self):
"""Freqs fromDicts should add from list of dicts of {key:count}"""
ct = self.ClassToTest
f = ct()
self.assertEqual(f.fromDicts([{'a':6},{'b':3}]), ct({'a':6,'b':3}))
#note: should be allowed to subtract, as long as freq doesn't go
#negative. Also tests add of 1-item dict (note: must be in list)
f.fromDicts([{'b':-1, 'c':4.5}])
self.assertEqual(f, ct({'a':6,'b':2,'c':4.5}))
#should work with a different operator
f.fromDicts([{'b':2},{'b':3}], op=mul)
self.assertEqual(f, ct({'a':6, 'b':12, 'c':4.5}))
def test_newFromDicts(self):
"""Freqs newFromDicts should work as expected."""
ct = self.ClassToTest
self.assertEqual(ct.newFromDicts([{'a':6},{'b':3}]), ct({'a':6,'b':3}))
def test_fromSeq(self):
"""Freqs fromSeq should add items from sequence, according to weight"""
ct = self.ClassToTest
f = self.ClassToTest()
self.assertEqual(f.fromSeq('aaabbbaaa'), ct({'a':6,'b':3}))
#should be able to change the operator...
self.assertEqual(f.fromSeq('aab', sub), ct({'a':4,'b':2}))
#...or change the weight
self.assertEqual(f.fromSeq('acc', weight=3.5),\
ct({'a':7.5,'b':2,'c':7}))
def test_newFromSeq(self):
"""Freqs newFromSeq should work as expected."""
ct = self.ClassToTest
self.assertEqual(ct.newFromSeq('aaabbbaaa'), ct({'a':6,'b':3}))
def test_fromSeqs(self):
"""Freqs fromSeqs should add items from sequences, according to weight"""
ct = self.ClassToTest
f = ct()
self.assertEqual(f.fromSeqs(['aaa','bbbaaa']), ct({'a':6,'b':3}))
#should be able to change the operator...
self.assertEqual(f.fromSeqs(list('aab'), sub), ct({'a':4,'b':2}))
#...or change the weight. Note that a string counts as a seq of seqs.
self.assertEqual(f.fromSeqs('acc', weight=3.5), \
ct({'a':7.5,'b':2,'c':7}))
def test_newFromSeqs(self):
"""Freqs newFromSeqs should work as expected."""
ct = self.ClassToTest
self.assertEqual(ct.newFromSeqs(['aaa','bbbaaa']), ct({'a':6,'b':3}))
def test_isValid(self):
"""Freqs isValid should return True if valid"""
d = self.ClassToTest()
assert d.isValid()
d.fromSeq('aaaaaaaaaaaaabb')
assert d.isValid()
def test_find_conversion_function(self):
"""Freqs _find_conversion_function should return correct value."""
d = self.ClassToTest()
f = d._find_conversion_function
#should always return None if data empty
for i in [None, 0, False, {}, [], tuple()]:
self.assertEqual(f(i), None)
#should return fromDict for non-empty dict
self.assertEqual(f({3:4}), d.fromDict)
#should return fromSeq for string or list of scalars or strings
for i in ['abc', [1,2,3], (1,2,3), ['ab','bb','cb']]:
self.assertEqual(f(i), d.fromSeq)
#should return fromSeqs for sequence of sequences
for i in [[[1,2,3],[3,4,4]], ([1,2,4],[3,4,4]), [(1,2),[3],[], [4]]]:
self.assertEqual(f(i), d.fromSeqs)
#should return fromTuples if possibly key-value pairs
for i in [[('a',3),('b',-1)], [(1,2),(3,4)]]:
self.assertEqual(f(i), d.fromTuples)
#should not be fooled by 2-item seqs that can't be key-value pairs
self.assertEqual(f(['ab','cd']), d.fromSeq)
#The following test inheritance of dict properties/methods
def test_setitem_good(self):
"""Freqs should allow non-negative values to be set"""
ct = self.ClassToTest
self.Empty['a'] = 0
self.assertEqual(self.Empty, ct({'a':0}))
self.Empty['b'] = 5
self.assertEqual(self.Empty, ct({'a':0, 'b':5}))
def test_delitem(self):
"""delitem not applicable to freqs w/ constant keys: not tested"""
pass
def test_setdefault_good(self):
"""Freqs setdefault should work with positive values if key present"""
a = self.Alphabetic.setdefault('a', 200)
self.assertEqual(a, 3)
self.assertEqual(self.Alphabetic['a'], 3)
def test_iadd(self):
"""Freqs += should add in place from any known data type"""
ct = self.ClassToTest
f = ct({'a':3, 'b':4})
f += 'aca'
self.assertEqual(f, ct({'a':5, 'b':4, 'c':1}))
f += ['b','b']
self.assertEqual(f, ct({'a':5, 'b':6, 'c':1}))
f += {'c':10, 'a':-3}
self.assertEqual(f, ct({'a':2, 'b':6, 'c':11}))
f += (('a',3),('b',-2))
self.assertEqual(f, ct({'a':5, 'b':4, 'c':11}))
f += [['a', 'b', 'b'],['c', 'c', 'c']]
self.assertEqual(f, ct({'a':6, 'b':6, 'c':14}))
#note that list of strings will give implementation-dependent result
def test_add(self):
"""Freqs + should make new object, adding from any known data type"""
ct = self.ClassToTest
orig = {'a':3, 'b':4}
f = ct(orig)
r = f + 'aca'
self.assertEqual(r, ct({'a':5, 'b':4, 'c':1}))
self.assertEqual(f, orig)
r = f + ['b','b']
self.assertEqual(r, ct({'a':3, 'b':6}))
self.assertEqual(f, orig)
r = f + {'c':10, 'a':-3}
self.assertEqual(r, ct({'a':0, 'b':4, 'c':10}))
self.assertEqual(f, orig)
r = f + (('a',3),('b',-2))
self.assertEqual(r, ct({'a':6, 'b':2}))
self.assertEqual(f, orig)
r = f + [['a', 'b', 'b'],['c', 'c', 'c']]
self.assertEqual(r, ct({'a':4, 'b':6, 'c':3}))
self.assertEqual(f, orig)
#note that list of strings will give implementation-dependent result
def test_isub(self):
"""Freqs -= should subtract in place using any known data type"""
ct = self.ClassToTest
f = ct({'a':5, 'b':4})
f -= 'aba'
self.assertEqual(f, ct({'a':3, 'b':3}))
f -= ['b','b']
self.assertEqual(f, ct({'a':3, 'b':1}))
f -= {'c':-2, 'a':-3}
self.assertEqual(f, ct({'a':6, 'b':1, 'c':2}))
f -= (('a',3),('b',-2))
self.assertEqual(f, ct({'a':3, 'b':3, 'c':2}))
f -= [['a', 'b', 'b'],['c', 'c']]
self.assertEqual(f, ct({'a':2, 'b':1, 'c':0}))
#note that list of strings will give implementation-dependent result
def test_sub(self):
"""Freqs - should make new object, subtracting using any known data type"""
orig = {'a':3, 'b':4}
ct = self.ClassToTest
f = self.ClassToTest(orig)
r = f - 'aba'
self.assertEqual(r, ct({'a':1, 'b':3}))
self.assertEqual(f, orig)
r = f - ['b','b']
self.assertEqual(r, ct({'a':3, 'b':2}))
self.assertEqual(f, orig)
r = f - {'c':-10, 'a':3}
self.assertEqual(r, ct({'a':0, 'b':4, 'c':10}))
self.assertEqual(f, orig)
r = f - (('a',3),('b',-2))
self.assertEqual(r, ct({'a':0, 'b':6}))
self.assertEqual(f, orig)
r = f - [['a', 'b', 'b'],['a','a']]
self.assertEqual(r, ct({'a':0, 'b':2}))
self.assertEqual(f, orig)
#note that list of strings will give implementation-dependent results
def test_copy(self):
"""Freqs copy should preserve class of original"""
d = {'a':4, 'b':3, 'c':6}
f = self.ClassToTest(d)
g = f.copy()
self.assertEqual(f, g)
self.assertEqual(f.__class__, g.__class__)
def test_str(self):
"""Freqs abstract interface doesn't specify string result"""
pass
#The following test custom methods
def test_rekey(self):
"""Freqs rekey should map the results onto new keys"""
#note that what happens to unmapped keys is implementation-dependent
ct = self.ClassToTest
d = ct({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
#should work with simple rekeying
f = d.rekey({'a':'d', 'b':'e'})
self.assertEqual(f['d'], d['a'])
self.assertEqual(f['e'], d['b'])
#remaining keys might be absent or 0
for i in 'abc':
if i in f:
self.assertEqual(f[i], 0)
#should work if many old keys map to the same new key
f = d.rekey({'a':'d', 'b':'e', 'c':'e'})
self.assertEqual(f['d'], d['a'])
self.assertEqual(f['e'], d['b'] + d['c'])
#remaining keys might be absent or 0
for i in 'abc':
if i in f:
self.assertEqual(f[i], 0)
#check with explicit constructor and default
d = self.ClassToTest({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
f = d.rekey({'a':'+', 'b':'-', 'c':'+'}, default='x', constructor=dict)
self.assertEqual(f, {'+':9, '-':5, 'x':8})
self.assertEqual(f.__class__, dict)
def test_purge(self):
"""Freqs purge should have no effect if keys are fixed"""
ct = self.ClassToTest
orig = {'a':3, 'b':2, 'c':1, 'd':3, 'e':4}
d = ct(orig)
d.purge()
self.assertEqual(d, ct(orig))
def test_normalize(self):
"""Freqs should allow normalization"""
ct = self.ClassToTest
self.Empty.normalize()
self.assertEqual(self.Empty, ct({}))
a = self.Alphabetic.copy()
a.normalize()
expected = {'a':0.375, 'b':0.25, 'c':0.125, 'd':0.125, 'e':0.125}
for key, val in expected.items():
self.assertFloatEqual(a[key], val)
def test_choice(self):
"""Freqs choice should work as expected"""
self.Alphabetic.normalize()
keys = self.Alphabetic.keys()
vals = Numbers(self.Alphabetic.values())
vals.accumulate()
#test first item
self.assertEqual(self.Alphabetic.choice(-1), keys[0])
self.assertEqual(self.Alphabetic.choice(-0.0001), keys[0])
self.assertEqual(self.Alphabetic.choice(-1e300), keys[0])
self.assertEqual(self.Alphabetic.choice(0), keys[0])
#test last item
last_val = vals.pop()
self.assertEqual(self.Alphabetic.choice(last_val), keys[-1])
self.assertEqual(self.Alphabetic.choice(1000), keys[-1])
#test remaining items
for index, value in enumerate(vals):
self.assertEqual(self.Alphabetic.choice(value-0.01),keys[index])
self.assertEqual(self.Alphabetic.choice(value+0.01), keys[index+1])
def test_randomSequence_good(self):
"""Freqs randomSequence should give correct counts"""
self.Alphabetic.normalize()
total = self.Alphabetic.Sum
keys = self.Alphabetic.keys()
probs = [float(i)/total for i in self.Alphabetic.values()]
rand_seq = self.Alphabetic.randomSequence(10000)
observed = [rand_seq.count(key) for key in keys]
expected = [prob*10000 for prob in probs]
self.assertSimilarFreqs(observed, expected)
def test_randomSequence_bad(self):
"""Empty Freqs should raise error on randomSequence"""
self.assertRaises(IndexError, self.Empty.randomSequence, 5)
def test_randomSequence_one_item(self):
"""Freqs randomSequence should work with one key"""
self.Constant.normalize()
rand = self.Constant.randomSequence(1000)
self.assertEqual(rand.count('a'), 1000)
self.assertEqual(len(rand), 1000)
def test_subset_preserve(self):
"""Freqs subset should preserve wanted items"""
ct = self.ClassToTest
self.Constant.subset('bc')
self.assertEqual(self.Constant, self.Empty)
self.Alphabetic.subset('abx')
self.assertEqual(self.Alphabetic, ct({'a':3,'b':2}))
def test_subset_remove(self):
"""Freqs subset should delete unwanted items if asked"""
ct = self.ClassToTest
self.Alphabetic.subset('abx', keep=False)
self.assertEqual(self.Alphabetic, ct({'c':1,'d':1,'e':1}))
self.Constant.subset('bx', keep=False)
self.assertEqual(self.Constant, ct({'a':5}))
def test_scale(self):
"""Freqs scale should multiply all values with the given scale"""
ct = self.ClassToTest
f = ct({'a':0.25,'b':0.25})
f.scale(10)
self.assertEqual(f,ct({'a':2.5,'b':2.5}))
f.scale(100)
self.assertEqual(f,ct({'a':250, 'b':250}))
f.scale(0.001)
self.assertEqual(f,ct({'a':0.25,'b':0.25}))
def test_round(self):
"""Freqs round should round all frequencies to integers"""
ct = self.ClassToTest
f = ct({'a':23.1, 'b':12.5, 'c':56.7})
f.round()
self.assertEqual(f,ct({'a':23, 'b':13, 'c':57}))
g = ct({'a':23.1356, 'b':12.5731})
g.round(3)
self.assertEqual(g,ct({'a':23.136, 'b':12.573}))
def test_expand(self):
"""Freqs expand should give expected results"""
ct = self.ClassToTest
f = ct({'a':3, 'c':5, 'b':2})
self.assertEqual(f.expand(order='acb'), list('aaacccccbb'))
self.assertEqual(f.expand(order='dba'), list('bbaaa'))
self.assertEqual(f.expand(order='cba',convert_to=''.join),'cccccbbaaa')
f['c'] = 0
self.assertEqual(f.expand(order='acb'), list('aaabb'))
f.normalize()
self.assertEqual(f.expand(order='cba'), ['a'])
self.assertEqual(f.expand(convert_to=''.join), 'a')
f.normalize(total=1.0/20)
self.assertEqual(f.expand(order='abc'), list('a'*12 + 'b'*8))
#test expand with scaling
g = ct({'c':0.5,'d':0.5})
self.assertEqual(g.expand(order='cd'),['c','d'])
self.assertEqual(g.expand(order='cd',scale=10),list(5*'c'+5*'d'))
self.assertRaises(ValueError,g.expand,scale=33)
def test_Count(self):
"""Freqs Count should return correct count (number of categories)"""
self.assertEqual(self.Alphabetic.Count, 5)
#WARNING: Count of empty categories is implementation-dependent
def test_Sum(self):
"""Freqs Sum should return sum of item counts in all categories"""
self.assertEqual(self.Alphabetic.Sum, 8)
self.assertEqual(self.Empty.Sum, 0)
self.assertEqual(self.Constant.Sum, 5)
def test_SumSquares(self):
"""Freqs SumSquares should return sum of squared freq of each category"""
self.assertEqual(self.Alphabetic.SumSquares, 16)
self.assertEqual(self.Empty.SumSquares, 0)
self.assertEqual(self.Constant.SumSquares, 25)
def test_Variance(self):
"""Freqs Variance should return variance of counts in categories"""
self.assertFloatEqual(self.Alphabetic.Variance, 0.8)
self.assertFloatEqual(self.Empty.Variance, None)
#WARNING: Variance with empty categories is implementation-dependent
def test_StandardDeviation(self):
"""Freqs StandardDeviation should return stdev of counts in categories"""
self.assertFloatEqual(self.Alphabetic.StandardDeviation, sqrt(0.8))
self.assertFloatEqual(self.Empty.StandardDeviation, None)
#WARNING: Standard deviation with empty categories is implementation-
#dependent
def test_Mean(self):
"""Freqs Mean should return mean of counts in categories"""
self.assertEqual(self.Alphabetic.Mean, 8/5.0)
self.assertEqual(self.Empty.Mean, None)
#WARNING: Mean with empty categories is implementation-dependent
def test_Uncertainty(self):
"""Freqs Shannon uncertainty values should match spreadsheet"""
self.assertEqual(self.Empty.Uncertainty, 0)
self.assertFloatEqual(self.Alphabetic.Uncertainty, 2.1556, eps=1e-4)
#WARNING: Uncertainty with empty categories is implementation-dependent
def test_mode(self):
"""Freqs mode should return most frequent item"""
self.assertEqual(self.Empty.Mode, None)
self.assertEqual(self.Alphabetic.Mode, 'a')
self.assertEqual(self.Constant.Mode, 'a')
def test_summarize(self):
"""Freqs summarize should return Summary: Count, Sum, SumSquares, Var"""
s = self.Alphabetic.summarize()
self.assertEqual(s.Sum, 8)
self.assertEqual(s.Count, 5)
self.assertEqual(s.SumSquares, 16)
self.assertFloatEqual(s.Variance, 0.8)
self.assertFloatEqual(s.StandardDeviation, sqrt(0.8))
self.assertFloatEqual(s.Mean, 8.0/5)
def test_getSortedList(self):
"""Freqs getSortedList should return sorted list of key, val tuples"""
#behavior is implementation-defined with empty list, so skip tests.
a = self.Alphabetic
a['b'] = 5
self.assertEqual(a.getSortedList(), \
[('b',5),('a',3),('e',1),('d',1),('c',1)])
self.assertEqual(a.getSortedList(descending=True), \
[('b',5),('a',3),('e',1),('d',1),('c',1)])
self.assertEqual(a.getSortedList(descending=False), \
[('c',1),('d',1),('e',1),('a',3),('b',5)])
self.assertEqual(a.getSortedList(by_val=False), \
[('e',1),('d',1),('c',1),('b',5),('a',3)])
self.assertEqual(a.getSortedList(by_val=False, descending=False), \
[('a',3),('b',5),('c',1),('d',1),('e',1)])
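# Illustrative sketch only: a hypothetical helper, not Freqs.choice and not
# used by these tests. test_choice above, like the choose() compatibility
# checks in NumbersTestsI, relies on normalizing the counts, accumulating
# them into a cumulative distribution, and returning the first key whose
# cumulative value is >= the draw r. Clamping the index mirrors
# stop_at_ends=True for draws below 0 or above 1. Assumes at least one key
# and a positive total weight.
import bisect

def _sketch_weighted_choice(keys, weights, r):
    """Return the key selected by draw r from the cumulative distribution."""
    total = float(sum(weights))
    cumulative = []
    running = 0.0
    for w in weights:
        running += w / total
        cumulative.append(running)
    # bisect_left finds the first index whose cumulative value is >= r
    index = bisect.bisect_left(cumulative, r)
    return keys[min(index, len(keys) - 1)]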
class FreqsStaticTests(StaticFreqsTestsI, TestCase):
ClassToTest = Freqs
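# Illustrative sketch only: a hypothetical helper, not Freqs.fromTuples and
# not used by these tests. test_fromTuples in StaticFreqsTestsI accumulates
# (key, value) pairs with a binary operator op (addition by default); with
# uses_key=True the operator is called as op(key, current, value). Treating
# an unseen key as 0 is an assumption, and the real class's checks against
# negative counts are not modelled.
from operator import add

def _sketch_from_tuples(pairs, op=add, uses_key=False, counts=None):
    """Accumulate (key, value) pairs into a {key: count} dict."""
    counts = {} if counts is None else counts
    for key, value in pairs:
        current = counts.get(key, 0)  # assumption: unseen keys start at 0
        if uses_key:
            counts[key] = op(key, current, value)
        else:
            counts[key] = op(current, value)
    return counts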
class UnsafeFreqsStaticTests(StaticFreqsTestsI, TestCase):
ClassToTest = UnsafeFreqs
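# Illustrative sketch only: a hypothetical helper, not Freqs.Uncertainty and
# not used by these tests. test_Uncertainty, run by the two classes above,
# expects the Shannon uncertainty in bits of the normalized counts,
# H = -sum(p * log2(p)): about 2.1556 for {'a':3,'b':2,'c':1,'d':1,'e':1},
# and 0 for an empty Freqs. Zero-count categories are skipped, following
# the usual convention 0 * log(0) = 0.
import math

def _sketch_shannon_uncertainty(counts):
    """Return the Shannon entropy in bits of a {key: count} mapping."""
    total = float(sum(counts.values()))
    if total == 0:
        return 0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log(p, 2)
    return h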
class FreqsTestsI(object):
"""Tests of the interface shared by Freqs and UnsafeFreqs."""
ClassToTest = None
#The following test various ways of constructing the objects
def test_fromTuples(self):
"""Freqs fromTuples should add from key, count pairs w/ repeated keys"""
f = self.ClassToTest()
self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2)]), {'a':6,'b':3})
#note: should be allowed to subtract, as long as freq doesn't go
#negative.
f.fromTuples([('b',-1),('c',4.5)])
self.assertEqual(f, {'a':6,'b':2,'c':4.5})
#should work with a different operator
f.fromTuples([('b',7)], op=mul)
self.assertEqual(f, {'a':6, 'b':14, 'c':4.5})
#check that it works with something that depends on the key
def func(key, first, second):
if key == 'a':
return first + second
else:
return max(second, first * second)
f = self.ClassToTest()
self.assertEqual(f.fromTuples([('a',4),('b',3),('a',2), ('b',4)], \
func,uses_key=True), {'a':6,'b':12})
def test_fromDict(self):
"""Freqs fromDict should add from dict of {key:count}"""
f = self.ClassToTest()
self.assertEqual(f.fromDict({'a':6,'b':3}), {'a':6,'b':3})
#note: should be allowed to subtract, as long as freq doesn't go
#negative.
f.fromDict({'b':-1, 'c':4.5})
self.assertEqual(f, {'a':6,'b':2,'c':4.5})
#should work with a different operator
f.fromDict({'b':7}, op=mul)
self.assertEqual(f, {'a':6, 'b':14, 'c':4.5})
def test_fromDicts(self):
"""Freqs fromDicts should add from list of dicts of {key:count}"""
f = self.ClassToTest()
self.assertEqual(f.fromDicts([{'a':6},{'b':3}]), {'a':6,'b':3})
#note: should be allowed to subtract, as long as freq doesn't go
#negative. Also tests add of 1-item dict (note: must be in list)
f.fromDicts([{'b':-1, 'c':4.5}])
self.assertEqual(f, {'a':6,'b':2,'c':4.5})
#should work with a different operator
f.fromDicts([{'b':2},{'b':3}], op=mul)
self.assertEqual(f, {'a':6, 'b':12, 'c':4.5})
def test_fromSeq(self):
"""Freqs fromSeq should add items from sequence, according to weight"""
f = self.ClassToTest()
self.assertEqual(f.fromSeq('aaabbbaaa'), {'a':6,'b':3})
#should be able to change the operator...
self.assertEqual(f.fromSeq('aab', sub), {'a':4,'b':2})
#...or change the weight
self.assertEqual(f.fromSeq('acc', weight=3.5), {'a':7.5,'b':2,'c':7})
def test_fromSeqs(self):
"""Freqs fromSeqs should add items from sequences, according to weight"""
f = self.ClassToTest()
self.assertEqual(f.fromSeqs(['aaa','bbbaaa']), {'a':6,'b':3})
#should be able to change the operator...
self.assertEqual(f.fromSeqs(list('aab'), sub), {'a':4,'b':2})
#...or change the weight. Note that a string counts as a seq of seqs.
self.assertEqual(f.fromSeqs('acc', weight=3.5), {'a':7.5,'b':2,'c':7})
def test_isValid(self):
"""Freqs isValid should return True if valid"""
d = self.ClassToTest()
assert d.isValid()
d.fromSeq('aaaaaaaaaaaaabb')
assert d.isValid()
def test_find_conversion_function(self):
"""Freqs _find_conversion_function should return correct value."""
d = self.ClassToTest()
f = d._find_conversion_function
#should always return None if data empty
for i in [None, 0, False, {}, [], tuple()]:
self.assertEqual(f(i), None)
#should return fromDict for non-empty dict
self.assertEqual(f({3:4}), d.fromDict)
#should return fromSeq for string or list of scalars or strings
for i in ['abc', [1,2,3], (1,2,3), ['ab','bb','cb']]:
self.assertEqual(f(i), d.fromSeq)
#should return fromSeqs for sequence of sequences
for i in [[[1,2,3],[3,4,4]], ([1,2,4],[3,4,4]), [(1,2),[3],[], [4]]]:
self.assertEqual(f(i), d.fromSeqs)
#should return fromTuples if possibly key-value pairs
for i in [[('a',3),('b',-1)], [(1,2),(3,4)]]:
self.assertEqual(f(i), d.fromTuples)
#should not be fooled by 2-item seqs that can't be key-value pairs
self.assertEqual(f(['ab','cd']), d.fromSeq)
#The following test inheritance of dict properties/methods
def test_setitem_good(self):
"""Freqs should allow non-negative values to be set"""
self.Empty[3] = 0
self.assertEqual(self.Empty, {3:0})
self.Empty['xyz'] = 5
self.assertEqual(self.Empty, {3:0, 'xyz':5})
def test_delitem(self):
"""Freqs should delete all counts of item with del"""
del self.Alphabetic['a']
del self.Alphabetic['b']
del self.Alphabetic['c']
self.assertEqual(self.Alphabetic, {'d':1,'e':1})
def test_setdefault_good(self):
"""Freqs setdefault should work with positive values"""
a = self.Alphabetic.setdefault('a', 200)
self.assertEqual(a, 3)
self.assertEqual(self.Alphabetic['a'], 3)
f = self.Alphabetic.setdefault('f', 0)
self.assertEqual(f, 0)
self.assertEqual(self.Alphabetic['f'], 0)
g = self.Alphabetic.setdefault('g', 1000)
self.assertEqual(g, 1000)
self.assertEqual(self.Alphabetic['g'], 1000)
#The following test overridden operators and methods
def test_iadd(self):
"""Freqs += should add in place from any known data type"""
f = self.ClassToTest({'a':3, 'b':4})
f += 'aca'
self.assertEqual(f, {'a':5, 'b':4, 'c':1})
f += ['b','b']
self.assertEqual(f, {'a':5, 'b':6, 'c':1})
f += {'c':10, 'a':-3}
self.assertEqual(f, {'a':2, 'b':6, 'c':11})
f += (('a',3),('b',-2))
self.assertEqual(f, {'a':5, 'b':4, 'c':11})
f += [['a', 'b', 'b'],['c', 'c', 'c']]
self.assertEqual(f, {'a':6, 'b':6, 'c':14})
#note that list of strings will use the strings as keys
f += ['abc', 'def', 'abc']
self.assertEqual(f, {'a':6, 'b':6, 'c':14, 'abc':2, 'def':1})
def test_add(self):
"""Freqs + should make new object, adding from any known data type"""
orig = {'a':3, 'b':4}
f = self.ClassToTest(orig)
self.assertEqual(f, orig)
r = f + 'aca'
self.assertEqual(r, {'a':5, 'b':4, 'c':1})
self.assertEqual(f, orig)
r = f + ['b','b']
self.assertEqual(r, {'a':3, 'b':6})
self.assertEqual(f, orig)
r = f + {'c':10, 'a':-3}
self.assertEqual(r, {'a':0, 'b':4, 'c':10})
self.assertEqual(f, orig)
r = f + (('a',3),('b',-2))
self.assertEqual(r, {'a':6, 'b':2})
self.assertEqual(f, orig)
r = f + [['a', 'b', 'b'],['c', 'c', 'c']]
self.assertEqual(r, {'a':4, 'b':6, 'c':3})
self.assertEqual(f, orig)
#note that list of strings will use the strings as keys
r = f + ['abc', 'def', 'abc']
self.assertEqual(r, {'a':3, 'b':4, 'abc':2, 'def':1})
self.assertEqual(f, orig)
def test_isub(self):
"""Freqs -= should subtract in place using any known data type"""
f = self.ClassToTest({'a':5, 'b':4})
f -= 'aba'
self.assertEqual(f, {'a':3, 'b':3,})
f -= ['b','b']
self.assertEqual(f, {'a':3, 'b':1})
f -= {'c':-2, 'a':-3}
self.assertEqual(f, {'a':6, 'b':1, 'c':2})
f -= (('a',3),('b',-2))
self.assertEqual(f, {'a':3, 'b':3, 'c':2})
f -= [['a', 'b', 'b'],['c', 'c']]
self.assertEqual(f, {'a':2, 'b':1, 'c':0})
f['abc'] = 3
f['def'] = 10
#note that list of strings will use the strings as keys
f -= ['abc', 'def', 'abc']
self.assertEqual(f, {'a':2, 'b':1, 'c':0, 'abc':1, 'def':9})
def test_sub(self):
"""Freqs - should make new object, subtracting using any known data type"""
orig = {'a':3, 'b':4}
f = self.ClassToTest(orig)
self.assertEqual(f, orig)
r = f - 'aba'
self.assertEqual(r, {'a':1, 'b':3})
self.assertEqual(f, orig)
r = f - ['b','b']
self.assertEqual(r, {'a':3, 'b':2})
self.assertEqual(f, orig)
r = f - {'c':-10, 'a':3}
self.assertEqual(r, {'a':0, 'b':4, 'c':10})
self.assertEqual(f, orig)
r = f - (('a',3),('b',-2))
self.assertEqual(r, {'a':0, 'b':6})
self.assertEqual(f, orig)
r = f - [['a', 'b', 'b'],['a','a']]
self.assertEqual(r, {'a':0, 'b':2})
self.assertEqual(f, orig)
#note that list of strings will use the strings as keys
orig['abc'] = 5
orig['def'] = 10
f['abc'] = 5
f['def'] = 10
r = f - ['abc', 'def', 'abc']
self.assertEqual(r, {'a':3, 'b':4, 'abc':3, 'def':9})
self.assertEqual(f, orig)
def test_copy(self):
"""Freqs copy should preserve class of original"""
d = {'a':4, 'b':3, 'c':6}
f = self.ClassToTest(d)
g = f.copy()
self.assertEqual(d, f)
self.assertEqual(d, g)
self.assertEqual(f, g)
self.assertEqual(f.__class__, g.__class__)
def test_str(self):
"""Freqs should print as tab-delimited table, or 'Empty'"""
#should work with empty freq distribution
self.assertEqual(str(self.ClassToTest([])), \
"Empty frequency distribution")
#should work with single element
self.assertEqual(str(self.ClassToTest({'X':1.0})), \
"Value\tCount\nX\t1.0")
#should work with multiples of same key
self.assertEqual(str(self.ClassToTest({1.0:5.0})), \
"Value\tCount\n1.0\t5.0")
#should work with different keys
self.assertEqual(str(self.ClassToTest({0:3.0,1:2.0})), \
"Value\tCount\n0\t3.0\n1\t2.0")
def test_delitem_required_keys(self):
"""Freqs delitem should refuse to delete a required key"""
a = self.Alphabetic
del a['a']
self.assertEqual(a, {'b':2, 'c':1, 'd':1, 'e':1})
#can't delete RequiredKeys once set
a.RequiredKeys = 'bcd'
del a['e']
self.assertEqual(a, {'b':2,'c':1,'d':1})
for k in 'bcd':
self.assertRaises(KeyError, a.__delitem__, k)
#when RequiredKeys is removed, can delete them again
a.RequiredKeys = None
del a['b']
self.assertEqual(a, {'c':1,'d':1})
#The following test custom methods
def test_rekey(self):
"""Freqs rekey should map the results onto new keys."""
d = self.ClassToTest({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
f = d.rekey({'a':'+', 'b':'-', 'c':'+'})
self.assertEqual(f, {'+':9, '-':5, None:8})
self.assertEqual(f.__class__, d.__class__)
#check with explicit constructor and default
d = self.ClassToTest({'a':3, 'b':5, 'c':6, 'd':7, 'e':1})
f = d.rekey({'a':'+', 'b':'-', 'c':'+'}, default='x', constructor=dict)
self.assertEqual(f, {'+':9, '-':5, 'x':8})
self.assertEqual(f.__class__, dict)
def test_purge(self):
"""Freqs purge should have no effect unless RequiredKeys set."""
working = self.PosNeg.copy()
self.assertEqual(working, self.PosNeg)
working.purge()
self.assertEqual(working, self.PosNeg)
working.RequiredKeys=(-2,-1)
working[-2] = 3
working.purge()
self.assertEqual(working, {-2:3,-1:1})
#should have no effect if repeated
working.purge()
self.assertEqual(working, {-2:3,-1:1})
def test_normalize(self):
"""Freqs should allow normalization on any type"""
self.Empty.normalize()
self.assertEqual(self.Empty, {})
a = self.Alphabetic.copy()
a.normalize()
expected = {'a':0.375, 'b':0.25, 'c':0.125, 'd':0.125, 'e':0.125}
for key, val in expected.items():
self.assertFloatEqual(a[key], val)
self.PosNeg.normalize()
expected = {-2:0.25, -1:0.25, 1:0.25, 2:0.25}
for key, val in expected.items():
self.assertFloatEqual(self.PosNeg[key], val)
#check that it works when we specify a total
self.PosNeg.normalize(total=0.2)
expected = {-2:1.25, -1:1.25, 1:1.25, 2:1.25}
for key, val in expected.items():
self.assertFloatEqual(self.PosNeg[key], val)
#check that purging works
a = self.Alphabetic.copy()
a.RequiredKeys = 'ac'
a.normalize()
self.assertEqual(a, {'a':0.75, 'c':0.25})
a = self.Alphabetic.copy()
a.RequiredKeys = 'ac'
a.normalize(purge=False)
self.assertEqual(a, \
{'a':0.375, 'b':0.25, 'c':0.125, 'd':0.125, 'e':0.125})
#normalize should also create keys when necessary
a.RequiredKeys = 'bdex'
a.normalize(purge=True)
self.assertEqual(a, {'b':0.5, 'd':0.25, 'e':0.25, 'x':0})
def test_choice(self):
"""Freqs choice should work as expected"""
self.Alphabetic.normalize()
keys = self.Alphabetic.keys()
vals = Numbers(self.Alphabetic.values())
vals.accumulate()
#test first item
self.assertEqual(self.Alphabetic.choice(-1), keys[0])
self.assertEqual(self.Alphabetic.choice(-0.0001), keys[0])
self.assertEqual(self.Alphabetic.choice(-1e300), keys[0])
self.assertEqual(self.Alphabetic.choice(0), keys[0])
#test last item
last_val = vals.pop()
self.assertEqual(self.Alphabetic.choice(last_val), keys[-1])
self.assertEqual(self.Alphabetic.choice(1000), keys[-1])
#test remaining items
for index, value in enumerate(vals):
self.assertEqual(self.Alphabetic.choice(value-0.01),keys[index])
self.assertEqual(self.Alphabetic.choice(value+0.01), keys[index+1])
def test_randomSequence_good(self):
"""Freqs randomSequence should give correct counts"""
self.Alphabetic.normalize()
total = self.Alphabetic.Sum
keys = self.Alphabetic.keys()
probs = [float(i)/total for i in self.Alphabetic.values()]
rand_seq = self.Alphabetic.randomSequence(10000)
observed = [rand_seq.count(key) for key in keys]
expected = [prob*10000 for prob in probs]
self.assertSimilarFreqs(observed, expected)
def test_randomSequence_bad(self):
"""Empty Freqs should raise error on randomSequence"""
self.assertRaises(IndexError, self.Empty.randomSequence, 5)
def test_randomSequence_one_item(self):
"""Freqs randomSequence should work with one key"""
self.Constant.normalize()
rand = self.Constant.randomSequence(1000)
self.assertEqual(rand.count(1), 1000)
self.assertEqual(len(rand), 1000)
def test_subset_preserve(self):
"""Freqs subset should keep only the specified items"""
self.Constant.subset('abc')
self.assertEqual(self.Constant, self.Empty)
self.Alphabetic.subset('abx')
self.assertEqual(self.Alphabetic, Freqs('aaabb'))
def test_subset_remove(self):
"""Freqs subset should delete unwanted items if asked"""
self.Alphabetic.subset('abx', keep=False)
self.assertEqual(self.Alphabetic, Freqs('cde'))
self.Constant.subset('abx', keep=False)
self.assertEqual(self.Constant, Freqs([1]*5))
def test_scale(self):
"""Freqs scale should multiply all values with the given scale"""
f = self.ClassToTest({'a':0.25,'b':0.25})
f.scale(10)
self.assertEqual(f,{'a':2.5,'b':2.5})
f.scale(100)
self.assertEqual(f,{'a':250, 'b':250})
f.scale(0.001)
self.assertEqual(f,{'a':0.25,'b':0.25})
def test_round(self):
"""Freqs round should round all frequencies to integers"""
f = self.ClassToTest({'a':23.1, 'b':12.5, 'c':56.7})
f.round()
self.assertEqual(f,{'a':23, 'b':13, 'c':57})
g = Freqs({'a':23.1356, 'b':12.5731})
g.round(3)
self.assertEqual(g,{'a':23.136, 'b':12.573})
def test_expand(self):
"""Freqs expand should give expected results"""
f = self.ClassToTest({'U':3, 'A':5, 'C':2})
self.assertEqual(f.expand(order='UAC'), list('UUUAAAAACC'))
self.assertEqual(f.expand(order='GCU'), list('CCUUU'))
self.assertEqual(f.expand(order='ACU',convert_to=''.join),'AAAAACCUUU')
del f['A']
self.assertEqual(f.expand(order='UAC'), list('UUUCC'))
f.normalize()
self.assertEqual(f.expand(order='ACU'), ['U'])
self.assertEqual(f.expand(convert_to=''.join), 'U')
f.normalize(total=1.0/20)
self.assertEqual(f.expand(order='UCA'), list('U'*12 + 'C'*8))
#test expand with scaling
g = self.ClassToTest({'A':0.5,'G':0.5})
self.assertEqual(g.expand(order='AG'),['A','G'])
self.assertEqual(g.expand(order='AG',scale=10),list(5*'A'+5*'G'))
self.assertRaises(ValueError,g.expand,scale=33)
def test_Count(self):
"""Freqs Count should return correct count (number of categories)"""
self.assertEqual(self.Alphabetic.Count, 5)
self.assertEqual(self.NumericDuplicated.Count, 3)
self.assertEqual(self.Empty.Count, 0)
self.assertEqual(self.Constant.Count, 1)
def test_Sum(self):
"""Freqs Sum should return sum of item counts in all categories"""
self.assertEqual(self.Alphabetic.Sum, 8)
self.assertEqual(self.NumericUnique.Sum, 5)
self.assertEqual(self.NumericDuplicated.Sum, 4)
self.assertEqual(self.Empty.Sum, 0)
# WARNING: For numeric keys, the value of the key is not taken into
# account (i.e. each key counts as a separate category)
self.assertEqual(self.PosNeg.Sum, 4)
self.assertEqual(self.Constant.Sum, 5)
def test_SumSquares(self):
"""Freqs SumSquares should return sum of squared freq of each category"""
self.assertEqual(self.Alphabetic.SumSquares, 16)
self.assertEqual(self.NumericUnique.SumSquares, 5)
self.assertEqual(self.NumericDuplicated.SumSquares, 6)
self.assertEqual(self.Empty.SumSquares, 0)
self.assertEqual(self.PosNeg.SumSquares, 4)
self.assertEqual(self.Constant.SumSquares, 25)
def test_Variance(self):
"""Freqs Variance should return variance of counts in categories"""
self.assertFloatEqual(self.Alphabetic.Variance, 0.8)
self.assertFloatEqual(self.NumericUnique.Variance, 0)
self.assertFloatEqual(self.NumericDuplicated.Variance, 1.0/3)
self.assertFloatEqual(self.Empty.Variance, None)
self.assertFloatEqual(self.PosNeg.Variance, 0)
self.assertEqual(self.Constant.Variance, 0)
def test_StandardDeviation(self):
"""Freqs StandardDeviation should return stdev of counts in categories"""
self.assertFloatEqual(self.Alphabetic.StandardDeviation, sqrt(0.8))
self.assertFloatEqual(self.NumericUnique.StandardDeviation, 0)
self.assertFloatEqual(self.NumericDuplicated.StandardDeviation, \
sqrt(1.0/3))
self.assertFloatEqual(self.Empty.StandardDeviation, None)
self.assertFloatEqual(self.PosNeg.StandardDeviation, 0)
self.assertEqual(self.Constant.StandardDeviation, 0)
def test_Mean(self):
"""Freqs Mean should return mean of counts in categories"""
self.assertEqual(self.Alphabetic.Mean, 8/5.0)
self.assertEqual(self.NumericUnique.Mean, 1)
self.assertEqual(self.NumericDuplicated.Mean, 4/3.0)
self.assertEqual(self.Empty.Mean, None)
self.assertEqual(self.PosNeg.Mean, 1)
self.assertEqual(self.Constant.Mean, 5)
def test_Uncertainty(self):
"""Freqs Shannon uncertainty values should match spreadsheet"""
self.assertEqual(self.Empty.Uncertainty, 0)
self.assertFloatEqual(self.Alphabetic.Uncertainty, 2.1556, eps=1e-4)
self.assertFloatEqual(self.PosNeg.Uncertainty, 2)
self.assertFloatEqual(self.NumericDuplicated.Uncertainty, 1.5)
self.assertFloatEqual(self.NumericUnique.Uncertainty, 2.3219, eps=1e-4)
self.assertFloatEqual(self.Constant.Uncertainty, 0)
def test_mode(self):
"""Freqs mode should return most frequent item"""
self.assertEqual(self.Empty.Mode, None)
self.assertEqual(self.Alphabetic.Mode, 'a')
assert(self.PosNeg.Mode in self.PosNeg)
assert(self.NumericUnique.Mode in self.NumericUnique)
self.assertEqual(self.NumericDuplicated.Mode, 1.5)
self.assertEqual(self.Constant.Mode, 1)
def test_summarize(self):
"""Freqs summarize should return Summary: Count, Sum, SumSquares, Var"""
self.assertEqual(self.Empty.summarize(), SummaryStatistics())
s = self.Alphabetic.summarize()
self.assertEqual(s.Sum, 8)
self.assertEqual(s.Count, 5)
self.assertEqual(s.SumSquares, 16)
self.assertFloatEqual(s.Variance, 0.8)
self.assertFloatEqual(s.StandardDeviation, sqrt(0.8))
self.assertFloatEqual(s.Mean, 8.0/5)
def test_getSortedList(self):
"""Freqs getSortedList should return sorted list of key, val tuples"""
e = self.Empty
self.assertEqual(e.getSortedList(), [])
self.assertEqual(e.getSortedList(descending=True), [])
self.assertEqual(e.getSortedList(descending=False), [])
self.assertEqual(e.getSortedList(by_val=True), [])
self.assertEqual(e.getSortedList(by_val=False), [])
a = self.Alphabetic
a['b'] = 5
self.assertEqual(a.getSortedList(), \
[('b',5),('a',3),('e',1),('d',1),('c',1)])
self.assertEqual(a.getSortedList(descending=True), \
[('b',5),('a',3),('e',1),('d',1),('c',1)])
self.assertEqual(a.getSortedList(descending=False), \
[('c',1),('d',1),('e',1),('a',3),('b',5)])
self.assertEqual(a.getSortedList(by_val=False), \
[('e',1),('d',1),('c',1),('b',5),('a',3)])
self.assertEqual(a.getSortedList(by_val=False, descending=False), \
[('a',3),('b',5),('c',1),('d',1),('e',1)])
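# Sketch (not PyCogent's implementation) of the count-based statistics that
# FreqsTestsI checks: each key is a separate category, and only the counts
# are summarized. `category_count_stats` is a hypothetical helper; it assumes
# a plain dict of non-negative counts, a sample (n - 1) variance of the
# counts, and Shannon uncertainty measured in bits.
import math


def category_count_stats(counts):
    """Return (count, total, sum_squares, variance, uncertainty) for counts."""
    values = list(counts.values())
    n = len(values)
    total = float(sum(values))
    sum_squares = sum(v * v for v in values)
    if n > 1:
        mean = total / n
        # sample variance of the per-category counts
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    else:
        # one category has no spread; no categories has no variance at all
        variance = 0.0 if n == 1 else None
    # Shannon uncertainty (bits) of the normalized distribution; 0 if empty
    uncertainty = 0.0
    if total > 0:
        for v in values:
            if v > 0:
                p = v / total
                uncertainty -= p * math.log(p, 2)
    return n, total, sum_squares, variance, uncertainty

# For the Alphabetic counts {'a':3,'b':2,'c':1,'d':1,'e':1} this gives a
# variance of 0.8 and an uncertainty of about 2.1556 bits, matching the
# expectations in test_Variance and test_Uncertainty.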
class FreqsTests(FreqsTestsI, TestCase):
"""Tests of Freqs-specific behavior, mostly validation."""
ClassToTest = Freqs
def setUp(self):
"""defines some standard frequency distributions to check"""
self.Alphabetic = self.ClassToTest('abcdeaab')
self.NumericUnique = self.ClassToTest([1,2,3,4,5])
self.NumericDuplicated = self.ClassToTest([1, 1.5, 1.5, 3.5])
self.Empty = self.ClassToTest('')
self.PosNeg = self.ClassToTest([-2, -1, 1, 2])
self.Constant = self.ClassToTest([1]*5)
def test_isValid_bad(self):
"""Freqs should reject invalid data, so isValid() always True"""
self.assertRaises(ConstraintError, self.ClassToTest, {'a':3, 'b':-10})
def test_init_empty(self):
"""Freqs should initialize OK with empty list"""
self.assertEqual(self.ClassToTest([]), {})
def test_init_single(self):
"""Freqs should initialize OK with single item"""
self.assertEqual(self.ClassToTest(['X']), {'X':1.0})
def test_init_same_key(self):
"""Freqs should initialize OK with duplicate items"""
self.assertEqual(self.ClassToTest([1]*5), {1:5})
def test_init_two_keys(self):
"""Freqs should initialize OK with distinct items"""
self.assertEqual(self.ClassToTest([0,1,0,0,1]), {1:2,0:3})
def test_init_strings(self):
"""Freqs should initialize OK with characters in string"""
self.assertEqual(self.ClassToTest('zabcz'), {'z':2,'a':1,'b':1,'c':1})
def test_init_fails_negative(self):
"""Freqs init should fail on negative frequencies"""
self.assertRaises(ConstraintError, self.ClassToTest, {'a':3, 'b':-3})
def test_init_from_dict(self):
"""Freqs should init OK from dictionary"""
self.assertEqual(self.ClassToTest({'a':3,'b':2}), {'a':3, 'b':2})
def test_init_from_dicts(self):
"""Freqs should init OK from list of dicts"""
self.assertEqual(self.ClassToTest([{'a':1,'b':1}, {'a':2,'b':1}]), \
{'a':3, 'b':2})
def test_init_from_strings(self):
"""Freqs should init OK from list of strings"""
self.assertEqual(self.ClassToTest(['abc','def','abc']), \
{'abc':2,'def':1})
def test_init_from_tuples(self):
"""Freqs should init OK from list of key-value pairs"""
self.assertEqual(self.ClassToTest([('a',3),('b',10),('a',2)]), \
{'a':5,'b':10})
def test_init_alphabet_success(self):
"""Freqs should init ok with keys matching alphabet"""
fd = self.ClassToTest('abc', Constraint='abcd')
self.assertEqual(fd, {'a':1,'b':1,'c':1})
self.assertRaises(ConstraintError, fd.setdefault, 'x', 1)
self.assertRaises(ConstraintError, fd.__setitem__, 'x', 1)
def test_init_alphabet_failure(self):
"""Freqs should fail if keys don't match alphabet"""
try:
f = Freqs('abcd', Constraint='abc')
except ConstraintError:
pass
else:
self.fail()
def test_setitem_bad(self):
"""Freqs should not allow negative values"""
self.assertRaises(ConstraintError, self.Empty.__setitem__, 'xyz', -0.01)
def test_setdefault_bad(self):
"""Freqs setdefault should fail if default < 0"""
self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', -1)
self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', -0.00001)
self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', "-1")
self.assertRaises(ConstraintError, self.Empty.setdefault, 'a', "xxxx")
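# Sketch of the cumulative-probability lookup that test_choice in FreqsTestsI
# exercises: walk the running total of the normalized probabilities and
# return the first key whose cumulative probability exceeds x. Values below
# 0 select the first key and values at or past the total select the last key.
# `choose_by_cumulative` is a hypothetical helper built on the standard
# `bisect` module; it is not taken from Freqs.choice.
import bisect


def choose_by_cumulative(keys, probs, x):
    """Return the key selected by threshold x from parallel keys/probs lists."""
    cumulative = []
    running = 0.0
    for p in probs:
        running += p
        cumulative.append(running)
    # index of the first cumulative value strictly greater than x
    index = bisect.bisect_right(cumulative, x)
    # clamp so that x >= total still returns the last key
    return keys[min(index, len(keys) - 1)]

# With keys ['a', 'b', 'c'] and probs [0.5, 0.25, 0.25], thresholds just
# below 0.5 pick 'a', just above 0.5 pick 'b', and anything above 0.75 picks
# 'c', mirroring the value - 0.01 / value + 0.01 checks in test_choice.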
class UnsafeFreqsTests(FreqsTestsI, TestCase):
"""Tests of UnsafeFreqs-specific behavior, mostly validation."""
ClassToTest = UnsafeFreqs
def setUp(self):
"""defines some standard frequency distributions to check"""
self.Alphabetic = self.ClassToTest({'a':3,'b':2,'c':1,'d':1,'e':1})
self.NumericUnique = self.ClassToTest({'1':1,'2':1,'3':1,'4':1,'5':1})
self.NumericDuplicated = self.ClassToTest({1:1, 1.5:2, 3.5:1})
self.Empty = self.ClassToTest({})
self.PosNeg = self.ClassToTest({-2:1, -1:1, 1:1, 2:1})
self.Constant = self.ClassToTest({1:5})
def test_isValid_bad(self):
"""UnsafeFreqs should allow invalid data, returning False for isValid"""
d = self.ClassToTest({'a':3, 'b':'x'})
assert not d.isValid()
def test_init_empty(self):
"""UnsafeFreqs should initialize OK with empty list"""
self.assertEqual(self.ClassToTest([]), {})
def test_init_single(self):
"""UnsafeFreqs init FAILS with single item"""
self.assertRaises(ValueError, self.ClassToTest, ['X'])
def test_init_same_key(self):
"""UnsafeFreqs init FAILS with list of items"""
self.assertRaises(TypeError, self.ClassToTest, [1]*5)
def test_init_strings(self):
"""UnsafeFreqs init FAILS with string"""
self.assertRaises(ValueError, self.ClassToTest, 'zabcz')
def test_init_negative(self):
"""UnsafeFreqs init should SUCCEED on negative frequencies"""
self.assertEqual(self.ClassToTest({'a':3, 'b':-3}), {'a':3,'b':-3})
def test_init_from_dict(self):
"""UnsafeFreqs should init OK from dictionary"""
self.assertEqual(self.ClassToTest({'a':3,'b':2}), {'a':3, 'b':2})
def test_init_from_dicts(self):
"""UnsafeFreqs init should init LIKE A DICT from list of dicts"""
# WARNING: Note the difference between this and Freqs init!
self.assertEqual(self.ClassToTest([{'a':1,'b':1}, {'a':2,'b':1}]), \
{'a':'b'})
def test_init_from_strings(self):
"""UnsafeFreqs init should FAIL from list of strings"""
self.assertRaises(ValueError, self.ClassToTest, ['abc','def','abc'])
def test_init_from_tuples(self):
"""UnsafeFreqs should init LIKE A DICT from list of key-value pairs"""
# WARNING: Note the difference between this and Freqs init!
self.assertEqual(self.ClassToTest([('a',3),('b',10),('a',2)]), \
{'a':2,'b':10})
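# Illustration of the WARNING in the UnsafeFreqs init tests: a plain dict
# built from key-value pairs keeps only the last value for a repeated key,
# whereas Freqs-style initialization adds the counts together.
# `accumulate_pairs` is a hypothetical stand-in written for this comparison,
# not part of PyCogent.
def accumulate_pairs(pairs):
    """Sum the values of repeated keys in an iterable of (key, count) pairs."""
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    return totals

# accumulate_pairs([('a',3),('b',10),('a',2)]) -> {'a': 5, 'b': 10}
# dict([('a',3),('b',10),('a',2)])             -> {'a': 2, 'b': 10}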
class FreqsSubclassTests(TestCase):
"""Freqs subclassing should work correctly, esp. with RequiredKeys."""
class BaseFreqs(Freqs):
RequiredKeys = 'UCAG'
def test_init(self):
"""Freqs subclass init should add RequiredKeys"""
b = self.BaseFreqs()
self.assertEqual(b, {'U':0.0,'C':0.0,'A':0.0,'G':0.0})
self.assertEqual(self.BaseFreqs('UUCCCCAAAabc'), \
{'U':2, 'C':4, 'A':3, 'a':1, 'b':1, 'c':1, 'G':0})
def test_delitem(self):
"""Freqs subclass delitem shouldn't allow deletion of RequiredKeys"""
b = self.BaseFreqs('AAGCg')
self.assertEqual(b, {'A':2,'G':1,'C':1,'U':0,'g':1})
del b['g']
self.assertEqual(b, {'A':2,'G':1,'C':1,'U':0})
self.assertRaises(KeyError, b.__delitem__, 'A')
def test_purge(self):
"""Freqs subclass purge should eliminate anything not in RequiredKeys"""
b = self.BaseFreqs('AjaknadjkAjnjndfjndCnjdjsfnfdsjkC32478737^&@GGGG')
b.purge()
self.assertEqual(b, {'A':2,'C':2,'G':4, 'U':0})
b.purge()
self.assertEqual(b, {'A':2,'C':2,'G':4, 'U':0})
def test_normalize(self):
"""Freqs subclass normalize should optionally eliminate non-required keys"""
b = self.BaseFreqs('UUUCX')
b.normalize(purge=False)
self.assertEqual(b, {'U':0.6, 'C':0.2, 'X':0.2, 'A':0, 'G':0})
b.normalize(purge=True)
self.assertFloatEqual(b, {'U':0.75, 'C':0.25, 'A':0, 'G':0})
b = self.BaseFreqs()
b.normalize()
self.assertEqual(b, {'U':0, 'C':0, 'A':0, 'G':0})
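# Sketch of the RequiredKeys behavior tested above, written as a stand-alone
# function over a plain dict: purging drops keys outside the required set and
# adds any missing required key with a count of 0; normalizing then divides
# by the total, leaving an all-zero distribution unchanged. The function
# `purge_and_normalize` is hypothetical and is not PyCogent's code.
def purge_and_normalize(counts, required):
    """Return a new dict restricted to `required`, scaled to sum to 1."""
    purged = dict((key, counts.get(key, 0)) for key in required)
    total = float(sum(purged.values()))
    if total == 0:
        # nothing to scale; keep the zero counts as they are
        return purged
    return dict((key, value / total) for key, value in purged.items())

# Counts from 'UUUCX' with required keys 'UCAG' become
# {'U': 0.75, 'C': 0.25, 'A': 0, 'G': 0}, as in test_normalize above.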
class NumberFreqsTestsI(object):
"""Interface for tests of safe and unsafe NumberFreqs classes."""
ClassToTest = None
def setUp(self):
"""defines some standard frequency distributions to check"""
self.NumericUnique = self.ClassToTest([1,2,3,4,5])
self.NumericDuplicated = self.ClassToTest([1, 1.5, 1.5, 3.5])
self.Empty = self.ClassToTest('')
self.PosNeg = self.ClassToTest([-2, -1, 1, 2])
self.Constant = self.ClassToTest([1]*5)
def test_setitem_good(self):
"""NumberFreqs should allow non-negative values to be set"""
self.Empty[3] = 0
self.assertEqual(self.Empty, {3:0})
def test_add_good(self):
"""NumberFreqs should allow addition of counts or items"""
self.Empty += [1]*5
self.assertEqual(self.Empty, {1:5})
def test_Mean(self):
"""NumberFreqs means should match hand-calculated values"""
self.assertEqual(self.Empty.Mean, None)
self.assertFloatEqual(self.NumericUnique.Mean, 15.0/5)
self.assertFloatEqual(self.NumericDuplicated.Mean, 7.5/4)
self.assertFloatEqual(self.PosNeg.Mean, 0.0)
self.assertFloatEqual(self.Constant.Mean, 1.0)
def test_Variance(self):
"""NumberFreqs variance should match values from R."""
self.assertEqual(None, self.Empty.Variance)
self.assertFloatEqual(2.5, self.NumericUnique.Variance)
self.assertFloatEqual(1.229167, self.NumericDuplicated.Variance)
self.assertFloatEqual(10/3.0, self.PosNeg.Variance)
self.assertFloatEqual(0, self.Constant.Variance)
def test_Sum(self):
"""NumberFreqs sums should match hand-calculated values"""
self.assertEqual(self.Empty.Sum, None)
self.assertFloatEqual(self.NumericUnique.Sum, 15)
self.assertFloatEqual(self.NumericDuplicated.Sum, 7.5)
self.assertFloatEqual(self.PosNeg.Sum, 0.0)
self.assertFloatEqual(self.Constant.Sum, 5.0)
def test_Count(self):
"""NumberFreqs counts should match hand-calculated values"""
self.assertEqual(self.NumericUnique.Count, 5)
self.assertEqual(self.NumericDuplicated.Count, 4)
self.assertEqual(self.Empty.Count, 0)
self.assertEqual(self.PosNeg.Count, 4)
self.assertEqual(self.Constant.Count, 5)
def test_Sumsq(self):
"""NumberFreqs sum of squares should match spreadsheet"""
self.assertEqual(self.Empty.SumSquares, None)
self.assertFloatEqual(self.NumericUnique.SumSquares, 55.0)
self.assertFloatEqual(self.NumericDuplicated.SumSquares, 17.75)
self.assertFloatEqual(self.PosNeg.SumSquares, 10.0)
self.assertFloatEqual(self.Constant.SumSquares, 5.0)
def test_Stdev(self):
"""NumberFreqs stdev should match spreadsheet"""
self.assertEqual(self.Empty.StandardDeviation, None)
self.assertFloatEqual(self.NumericUnique.StandardDeviation,1.581139)
self.assertFloatEqual(self.NumericDuplicated.StandardDeviation,1.108678)
self.assertFloatEqual(self.PosNeg.StandardDeviation, 1.825742)
self.assertFloatEqual(self.Constant.StandardDeviation, 0.0)
def test_NumberFreqsQuantiles(self):
"""quantiles should match Numbers, including Median"""
data={32: 60L, 33: 211L, 34: 141L, 35: 70L,
36: 26L, 10: 30L, 11: 5L, 18: 43L, 19: 10L,
21: 1L, 22: 1L, 23: 58L, 24: 12L, 25: 3L,
26: 74L, 27: 10L, 28: 77L, 29: 20L, 30: 102L,
31: 47L}
nums = Numbers(NumberFreqs(data=data).expand())
number_freqs = self.ClassToTest()
number_freqs.update(data)
for quantile in numpy.arange(0.05, 0.96, 0.05):
num_q = nums.quantile(quantile)
num_f = number_freqs.quantile(quantile)
self.assertFloatEqual(num_f, num_q)
self.assertFloatEqual(number_freqs.Median, nums.Median)
def test_normalize(self):
"""NumberFreqs should allow normalization on any type"""
self.Empty.normalize()
self.assertEqual(self.Empty, {})
# will refuse to normalize if sum is 0
orig = self.PosNeg.copy()
self.PosNeg.normalize()
self.assertEqual(self.PosNeg, orig)
# will normalize OK if total passed in
self.PosNeg.normalize(4) # total equals self.PosNeg.Count here
expected = {-2:0.25, -1:0.25, 1:0.25, 2:0.25}
for key, val in expected.items():
self.assertFloatEqual(self.PosNeg[key], val)
def test_Uncertainty(self):
"""NumberFreqs Shannon entropy values should match spreadsheet"""
self.assertEqual(self.Empty.Uncertainty, 0)
self.assertEqual(self.PosNeg.Uncertainty, 2)
self.assertEqual(self.NumericDuplicated.Uncertainty, 1.5)
self.assertEqual("%6.4f" % self.NumericUnique.Uncertainty, '2.3219')
self.assertEqual(self.Constant.Uncertainty, 0)
def test_Mode(self):
"""NumberFreqs mode should return most frequent item"""
self.assertEqual(self.Empty.Mode, None)
assert(self.PosNeg.Mode in self.PosNeg)
assert(self.NumericUnique.Mode in self.NumericUnique)
self.assertEqual(self.NumericDuplicated.Mode, 1.5)
self.assertEqual(self.Constant.Mode, 1)
def test_randomSequence_one_item(self):
"""NumberFreqs randomSequence should work with one key"""
self.Constant.normalize()
rand = self.Constant.randomSequence(1000)
self.assertEqual(rand.count(1), 1000)
self.assertEqual(len(rand), 1000)
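# For NumberFreqs the keys are numeric observations and the values are how
# often each occurs, so the mean and variance are those of the expanded data.
# A minimal sketch, assuming the sample (n - 1) variance that the R-derived
# expectations in test_Variance imply; `weighted_mean_variance` is a
# hypothetical helper, not PyCogent's implementation.
def weighted_mean_variance(freqs):
    """Return (mean, sample variance) of a {value: count} table, or (None, None)."""
    n = float(sum(freqs.values()))
    if n == 0:
        return None, None
    mean = sum(value * count for value, count in freqs.items()) / n
    if n == 1:
        return mean, 0.0
    squared_dev = sum(count * (value - mean) ** 2
                      for value, count in freqs.items())
    return mean, squared_dev / (n - 1)

# {1: 1, 1.5: 2, 3.5: 1} gives a mean of 1.875 and a variance of about
# 1.229167, the values expected for NumericDuplicated.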
class NumberFreqsTests(NumberFreqsTestsI, TestCase):
"""Tests of (safe) NumberFreqs classes."""
ClassToTest = NumberFreqs
def setUp(self):
"""defines some standard frequency distributions to check"""
self.NumericUnique = self.ClassToTest([1,2,3,4,5])
self.NumericDuplicated = self.ClassToTest([1, 1.5, 1.5, 3.5])
self.Empty = self.ClassToTest()
self.PosNeg = self.ClassToTest([-2, -1, 1, 2])
self.Constant = self.ClassToTest([1]*5)
def test_setitem_bad(self):
"""NumberFreqs should not allow non-numeric keys"""
self.assertRaises(ValueError, self.Empty.__setitem__, 'xyz', -0.01)
def test_add_bad(self):
"""NumberFreqs add should fail if key not numeric"""
self.assertRaises(ValueError, self.Empty.__iadd__, {'a':-1})
class UnsafeNumberFreqsTests(NumberFreqsTestsI, TestCase):
"""Tests of UnsafeNumberFreqs classes."""
ClassToTest = UnsafeNumberFreqs
def setUp(self):
"""defines some standard frequency distributions to check"""
self.NumericUnique = self.ClassToTest({1:1,2:1,3:1,4:1,5:1})
self.NumericDuplicated = self.ClassToTest({1:1,1.5:2,3.5:1})
self.Empty = self.ClassToTest()
self.PosNeg = self.ClassToTest({-2:1,-1:1,1:1,2:1})
self.Constant = self.ClassToTest({1:5})
#execute tests if called from command line
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/__init__.py
#!/usr/bin/env python
__all__ = ['test_util',
'test_get_by_cai',
'test_adaptor',
]
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
# PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/test_adaptor.py
#!/usr/bin/env python
"""Unit tests of the basic CAI adaptors."""
from cogent.util.unit_test import TestCase, main
import cogent.maths.stats.cai.adaptor
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Prototype"
class adaptor_tests(TestCase):
"""Tests of top-level functionality.
NOTE: The adaptors are currently tested in an integration test with the
drawing modules in test_draw/test_matplotlib/test_codon_usage. There are
no individual unit tests at present, although these should possibly be
added later.
"""
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/test_get_by_cai.py
#!/usr/bin/env python
"""Unit tests of the get_by_cai filter classes."""
from cogent.util.unit_test import TestCase, main
import cogent.maths.stats.cai.get_by_cai
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Prototype"
class get_by_cai_tests(TestCase):
"""Tests of top-level functionality.
NOTE: The adaptors are currently tested in an integration test with the
drawing modules in test_draw/test_matplotlib/test_codon_usage. There are
no individual unit tests at present, although these should possibly be
added later.
"""
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_maths/test_stats/test_cai/test_util.py
#!/usr/bin/env python
"""Unit tests of the basic CAI calculations."""
from cogent.util.unit_test import TestCase, main
from math import log, exp
from operator import mul
from cogent.maths.stats.cai.util import cu, as_rna, synonyms_to_rna, \
get_synonyms, sum_codon_freqs, norm_to_max, arithmetic_mean, \
geometric_mean, codon_adaptiveness_all, codon_adaptiveness_blocks, \
valid_codons, set_min, cai_1, cai_2, cai_3
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
def product(x): return reduce(mul, x)
def amean(x): return sum(x)/float(len(x))
def gmean(x): return (product(x))**(1./len(x))
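
# Illustrative sketch only (hypothetical helper, not part of cogent and not
# necessarily how cai_2 is implemented): reproduces the arithmetic of the
# Gang Wu worked example checked in test_cai_2. Each codon's weight is its
# reference count divided by the largest reference count among its
# synonymous codons (its "block"); the CAI is the geometric mean of those
# weights over every codon occurrence in the gene. The `blocks` mapping of
# amino acid -> synonymous codons is supplied by the caller (an assumption
# of this sketch), and every codon in the gene is assumed to have a nonzero
# reference count, since log(0) is undefined.
import math

def _cai_sketch(ref_counts, gene_counts, blocks):
    """Count-weighted geometric mean of per-block relative adaptiveness."""
    weights = {}
    for codons in blocks.values():
        best = float(max(ref_counts[c] for c in codons))
        for c in codons:
            # relative adaptiveness: count / best count in the same block
            weights[c] = ref_counts[c] / best
    log_sum = 0.0
    total = 0
    for codon, n in gene_counts.items():
        # each occurrence of a codon contributes one log-weight
        log_sum += n * math.log(weights[codon])
        total += n
    return math.exp(log_sum / total)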

class cai_tests(TestCase):
    """Tests of top-level functionality."""

    def test_as_rna(self):
        """as_rna should do correct conversion to RNA"""
        self.assertEqual(as_rna('TCGT'), 'UCGU')

    def test_synonyms_to_rna(self):
        """synonyms_to_rna should convert as expected"""
        s = {'*':['TAA','TAG'], 'F':['TTT','TTC']}
        self.assertEqual(synonyms_to_rna(s), {'*':['UAA','UAG'],'F':['UUU','UUC']})

    def test_synonyms(self):
        """synonyms should produce correct results for standard genetic code.
        NOTE: for standard genetic code, expect the following:
        - Stop codons are UGA, UAA, UAG
        - Single-codon blocks are M = 'AUG', W = 'UGG'
        """
        result = get_synonyms()
        self.assertEqual(len(result), 18)
        self.assertEqual(''.join(sorted(result)), 'ACDEFGHIKLNPQRSTVY')
        self.assertEqual(sorted(result['I']), ['AUA','AUC','AUU'])
        # check that we can do it without eliminating single-codon blocks
        result = get_synonyms(singles_removed=False)
        self.assertEqual(len(result), 20)
        self.assertEqual(''.join(sorted(result)), 'ACDEFGHIKLMNPQRSTVWY')
        self.assertEqual(sorted(result['I']), ['AUA','AUC','AUU'])
        self.assertEqual(result['W'], ['UGG'])

    def test_sum_codon_freqs(self):
        """sum_codon_freqs should add list of codon freqs together, including missing keys"""
        d = {'x':3, 'UUU':5, 'UAC':3}
        d2 = {'y':5, 'UUU':1, 'AGG':2}
        result = sum_codon_freqs([d,d2])
        self.assertEqual(len(result), 64)
        assert 'x' not in result #should exclude bad keys
        self.assertEqual(result['UUU'], 6.0)
        self.assertEqual(result['AGG'], 2.0)
        self.assertEqual(result['UAC'], 3.0)
        self.assertEqual(sum(result.values()), 11.0)

    def test_norm_to_max(self):
        """norm_to_max should normalize vals in list to best val"""
        a = [1,2,3,4]
        self.assertEqual(norm_to_max(a), [.25,.5,.75,1])

    def test_arithmetic_mean(self):
        """arithmetic_mean should average a list of means with freqs"""
        obs = arithmetic_mean([1,2,3],[2,1,3])
        # named 'expected' so the module-level math.exp is not shadowed
        expected = sum([1,1,2,3,3,3])/6.0
        self.assertEqual(obs, expected)
        # should also work without freqs
        self.assertFloatEqual(arithmetic_mean([1,3,7]), 11/3.)

    def test_geometric_mean(self):
        """geometric_mean should average a list of means with freqs"""
        obs = geometric_mean([1,2,3],[2,1,3])
        expected = (1*1*2*3*3*3)**(1/6.)
        self.assertEqual(obs, expected)
        obs = geometric_mean([0.01, 0.2, 0.5], [5, 2, 3])
        expected = (.01*.01*.01*.01*.01*.2*.2*.5*.5*.5)**(0.1)
        self.assertFloatEqual(obs, expected)
        # should also work without freqs
        self.assertFloatEqual(geometric_mean([0.01,0.2,0.5]), (.01*.2*.5)**(1/3.))

    def test_codon_adaptiveness_all(self):
        """codon_adaptiveness_all should normalize all codons relative to the best one."""
        codons = {'x':4, 'y':3, 'z':2, 'zz':0, 'zzz':4}
        result = codon_adaptiveness_all(codons)
        self.assertEqual(result, {'x':1., 'y':.75, 'z':.5, 'zz':0, 'zzz':1.})

    def test_codon_adaptiveness_blocks(self):
        """codon_adaptiveness_blocks should normalize codons by the best in each block"""
        codons = {'x':4, 'y':1, 'z':2, 'zz':0, 'zzz':2, 'zzzz':1}
        blocks = {'A': ['x','y','z'], 'B':['zz','zzz','zzzz']}
        result = codon_adaptiveness_blocks(codons, blocks)
        self.assertEqual(result, {'x':1., 'y':.25, 'z':.5, 'zz':0, 'zzz':1., 'zzzz':.5})

    def test_set_min(self):
        """set_min should set minimum value to specified threshold."""
        codons = {'x':4, 'y':1e-5, 'z':0}
        set_min(codons, 1)
        self.assertEqual(codons, {'x':4, 'y':1, 'z':1})

    def test_valid_codons(self):
        """valid_codons should extract all valid codons from blocks"""
        blocks = {'A':['GCA','GCG'], 'C':['UGU','UGC']}
        self.assertEqual(list(sorted(valid_codons(blocks))), ['GCA','GCG','UGC','UGU'])

    def test_cai_1(self):
        """cai_1 should produce expected results"""
        ref_freqs = cu.copy()
        ref_freqs.update({'AGA':4, 'AGG':2, 'CCC':4, 'CCA':1, 'UGG':1})
        #tests with arithmetic mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 0.75)
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=arithmetic_mean), 0.625)
        ref_freqs_2 = cu.copy()
        ref_freqs_2.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1})
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_1(ref_freqs_2, gene_freqs, average=arithmetic_mean)
        vals = [.8,.8,.8,.4,1,1,.2,.4,.2,.2]
        expect = sum(vals)/len(vals)
        self.assertFloatEqual(obs, expect)
        #tests with geometric mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertFloatEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), \
            (1**5 * 0.5**5)**(0.1))
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertFloatEqual(cai_1(ref_freqs, gene_freqs, average=geometric_mean), \
            (1**5 * 0.25**5)**0.1)
        ref_freqs_2 = cu.copy()
        ref_freqs_2.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1})
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_1(ref_freqs_2, gene_freqs, average=geometric_mean)
        vals = [.8,.8,.8,.4,1,1,.2,.4,.2,.2]
        expect = (product(vals))**(1./len(vals))
        self.assertFloatEqual(obs, expect)

    def test_cai_2(self):
        """cai_2 should produce expected results"""
        ref_freqs = cu.copy()
        ref_freqs.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1})
        #tests with arithmetic mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 0.75)
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=arithmetic_mean), 0.6)
        ref_freqs_2 = ref_freqs.copy()
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_2(ref_freqs_2, gene_freqs, average=arithmetic_mean)
        vals = [1,1,1,.5,1,1,.2,1,.5,.5]
        expect = sum(vals)/len(vals)
        self.assertEqual(obs, expect)
        #tests with geometric mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertFloatEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), \
            (1**5 * 0.5**5)**(0.1))
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertFloatEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), \
            (1**5 * 0.2**5)**0.1)
        ref_freqs_2 = ref_freqs.copy()
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_2(ref_freqs_2, gene_freqs, average=geometric_mean)
        vals = [1,1,1,.5,1,1,.2,1,.5,.5]
        expect = (product(vals))**(1./len(vals))
        self.assertEqual(obs, expect)
        #test that results match example on Gang Wu's CAI calculator page
        ref_freqs = cu.copy()
        ref_freqs.update({'UUU':78743, 'UUC':56591, 'UUA':51320, 'UUG':45581, \
            'CUU':42704, 'CUC':35873, 'CUA':15275, 'CUG':168885})
        gene_freqs={'UUU':6, 'UUC':3, 'CUU':3, 'CUC':2, 'CUG':8}
        self.assertFloatEqual(cai_2(ref_freqs, gene_freqs, average=geometric_mean), \
            exp((6*log(1) + 3*log(56591./78743) + 3*log(42704./168885) + \
            2*log(35873./168885)+8*log(1))/22.))

    def test_cai_3(self):
        """cai_3 should produce expected results"""
        ref_freqs = cu.copy()
        ref_freqs.update({'AGA':4, 'AGG':2, 'CCC':5, 'CCA':1, 'UGG':1})
        #tests with arithmetic mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 0.75)
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=arithmetic_mean), 0.6)
        ref_freqs_2 = ref_freqs.copy()
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_3(ref_freqs_2, gene_freqs, average=arithmetic_mean)
        family_vals = [[1,1,1,.5],[1,1,.2],[1,.5,.5]]
        family_averages = map(amean, family_vals)
        expect = amean(family_averages)
        self.assertEqual(obs, expect)
        #tests with geometric mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), \
            (1**5 * 0.5**5)**(0.1))
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average=geometric_mean), \
            (1**5 * 0.2**5)**0.1)
        ref_freqs_2 = ref_freqs.copy()
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_3(ref_freqs_2, gene_freqs, average=geometric_mean)
        family_vals = [[1,1,1,.5],[1,1,.2],[1,.5,.5]]
        family_averages = map(gmean, family_vals)
        expect = gmean(family_averages)
        self.assertEqual(obs, expect)
        #tests with Eyre-Walker's variant -- should be same as geometric mean
        gene_freqs = {'AGA':1}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 1)
        gene_freqs = {'AGA':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 1)
        gene_freqs = {'AGG':5}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 0.5)
        gene_freqs = {'AGG':5,'AGA':5}
        self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), \
            (1**5 * 0.5**5)**(0.1))
        gene_freqs={'AGA':5,'CCC':1}
        self.assertEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), 1)
        gene_freqs={'AGA':5,'CCA':5}
        self.assertFloatEqual(cai_3(ref_freqs, gene_freqs, average='eyre_walker'), \
            (1**5 * 0.2**5)**0.1)
        ref_freqs_2 = ref_freqs.copy()
        ref_freqs_2.update({'UUU':2,'UUC':1})
        gene_freqs = {'AGA':3,'AGG':1,'CCC':2,'CCA':1,'UUU':1, 'UUC':2}
        obs = cai_3(ref_freqs_2, gene_freqs, average='eyre_walker')
        family_vals = [[1,1,1,.5],[1,1,.2],[1,.5,.5]]
        family_averages = map(gmean, family_vals)
        expect = gmean(family_averages)
        self.assertEqual(obs, expect)
        #test results for Gang Wu's example (unfortunately, no worked example for
        #this model)
        ref_freqs = cu.copy()
        ref_freqs.update({'UUU':78743, 'UUC':56591, 'UUA':51320, 'UUG':45581, \
            'CUU':42704, 'CUC':35873, 'CUA':15275, 'CUG':168885})
        gene_freqs={'UUU':6, 'UUC':3, 'CUU':3, 'CUC':2, 'CUG':8}
        obs = cai_3(ref_freqs, gene_freqs, average=geometric_mean)
        family_vals = [6*[1]+3*[56591./78743],\
            3*[42704./168885] + 2*[35873./168885]+8*[1]]
        family_averages = map(gmean, family_vals)
        expect = gmean(family_averages)
        self.assertFloatEqual(obs, expect)

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_maths/test_spatial/__init__.py
#!/usr/bin/env python
__all__ = ['test_ckd3']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"
PyCogent-1.5.3/tests/test_maths/test_spatial/test_ckd3.py
#!/usr/bin/env python
import os, tempfile
import numpy as np
try:
    from cogent.util.unit_test import TestCase, main
except ImportError:
    from zenpdb.cogent.util.unit_test import TestCase, main
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"

class KDTreeTest(TestCase):
    """Tests KD-Trees"""

    def setUp(self):
        self.arr = np.random.random(3000).reshape((1000, 3))
        self.point = np.random.random(3)
        self.center = np.array([0.5, 0.5, 0.5])

    # named test_0import so it sorts, and therefore runs, before the other
    # tests; it binds the module-level name ckd3 that they rely on
    def test_0import(self):
        """tests whether the ckd3 Cython extension can be imported."""
        global ckd3
        from cogent.maths.spatial import ckd3
        assert 'KDTree' in dir(ckd3)

    def test_instance(self):
        """check if KDTree instance behaves correctly."""
        kdt = ckd3.KDTree(self.arr)
        self.assertEquals(kdt.dims, 3)
        def assig():
            kdt.dims = 4
        self.assertRaises(AttributeError, assig)
        self.assertEquals(kdt.dims, 3)
        self.assertEquals(kdt.pnts, 1000)

    def test_knn(self):
        """testing k-nearest neighbors.
        """
        sqd = np.sum(np.power((self.arr - self.point), 2), axis=1)
        sorted_idx = sqd.argsort()
        kdt = ckd3.KDTree(self.arr)
        points, dists = kdt.knn(self.point, 5)
        self.assertEqualItems(sorted_idx[:5], points)

    def test_rn(self):
        """testing neighbors within radius.
        """
        sqd = np.sum(np.power((self.arr - self.point), 2), axis=1)
        sqd = sqd[sqd <= 0.05]
        sqd.sort()
        kdt = ckd3.KDTree(self.arr)
        points, dists = kdt.rn(self.point, 0.05)
        dists.sort()
        self.assertFloatEqual(dists, sqd)

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_maths/test_matrix/__init__.py
#!/usr/bin/env python
__all__ = ['test_distance']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_maths/test_matrix/test_distance.py
#!/usr/bin/env python
"""Unit tests for distance matrices.
"""
from cogent.util.unit_test import TestCase, main
from cogent.maths.matrix.distance import DistanceMatrix
from cogent.util.dict2d import largest, Dict2DError, Dict2DSparseError
from cogent.parse.aaindex import AAIndex1Record
from cogent.maths.stats.util import Freqs
from copy import deepcopy
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "caporaso@colorado.edu"
__status__ = "Production"
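
# Illustrative sketch only (hypothetical helpers, not DistanceMatrix's own
# code): test_init_data_types builds the same matrix, {'a': {'b': 1}}, from a
# dict of dicts, from a list of lists plus RowOrder/ColOrder, and from
# (row, col, value) "Indices" triples. These helpers show how the two
# non-dict forms map onto a dict of dicts; padding and Default handling,
# which DistanceMatrix also applies, are deliberately ignored here.
def _rows_to_dict2d(rows, row_order, col_order):
    """List of lists -> dict of dicts keyed by row_order and col_order."""
    result = {}
    for row_key, row in zip(row_order, rows):
        # pair each value in the row with its column label
        result[row_key] = dict(zip(col_order, row))
    return result

def _indices_to_dict2d(triples):
    """Iterable of (row, col, value) triples -> dict of dicts."""
    result = {}
    for row_key, col_key, value in triples:
        # create the row dict on first sight, then set the cell
        result.setdefault(row_key, {})[col_key] = value
    return result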

class DistanceMatrixTests(TestCase):

    def setUp(self):
        # v : vector
        # m : matrix
        self.default_keys = list('ACDEFGHIKLMNPQRSTVWY')
        # Set up some matrices
        v1 = {'A':1, 'B':2, 'C':3}
        v2 = {'A':4, 'B':5, 'C':6}
        v3 = {'A':7, 'B':8, 'C':9}
        self.m1 = {'A':dict(v1),\
                   'B':dict(v2),\
                   'C':dict(v3)}
        v4 = {'A':0, 'B':1, 'C':5}
        v5 = {'A':5, 'B':0, 'C':4, 'X':99}
        v6 = {'A':5, 'B':8, 'C':0}
        self.m2 = {'A':dict(v4),\
                   'B':dict(v5),\
                   'C':dict(v6)}
        self.matrices = [self.m1,self.m2]
        aar_data = dict(zip(self.default_keys, [i*.15 for i in range(20)]))
        # Set up an AAIndex1Record for testing purposes
        self.aar = AAIndex1Record("5", "Some Info",\
                                  "25", "Greg", "A test",\
                                  "something", "This is a test, this is only a test",\
                                  [0.987, 0.783, 1., 0], aar_data)
        # From test_Dict2D, used in tests at end of this file for
        # inheritance testing
        self.empty = {}
        self.single_same = {'a':{'a':2}}
        self.single_diff = {'a':{'b':3}}
        self.square = {
            'a':{'a':1,'b':2,'c':3},
            'b':{'a':2,'b':4,'c':6},
            'c':{'a':3,'b':6,'c':9},
        }
        self.top_triangle = {
            'a':{'a':1, 'b':2, 'c':3},
            'b':{'b':4, 'c':6},
            'c':{'c':9}
        }
        self.bottom_triangle = {
            'b':{'a':2},
            'c':{'a':3, 'b':6}
        }
        self.sparse = {
            'a':{'a':1, 'c':3},
            'd':{'b':2},
        }
        self.dense = {
            'a':{'a':1,'b':2,'c':3},
            'b':{'a':2,'b':4,'c':6},
        }

    def test_all_init_parameters(self):
        """ All parameters to init are handled correctly """
        # will fail if any parameters are not recognized
        d = DistanceMatrix()
        d = DistanceMatrix(data={})
        d = DistanceMatrix(RowOrder=[])
        d = DistanceMatrix(ColOrder=[])
        d = DistanceMatrix(Pad=True)
        d = DistanceMatrix(Default=42)
        d = DistanceMatrix(data={},RowOrder=[],ColOrder=[],Pad=True,Default=42)

    def test_attribute_init(self):
        """ Proper initialization of all attributes """
        # proper setting to defaults
        d = DistanceMatrix(data={'a':{'a':1}})
        self.assertEqual(d.RowOrder, self.default_keys)
        self.assertEqual(d.ColOrder, self.default_keys)
        self.assertEqual(d.Pad, True)
        self.assertEqual(d.Default, None)
        self.assertEqual(d.RowConstructor, dict)
        # differ from defaults
        d = DistanceMatrix(data={'a':{'b':1}},RowOrder=['a'],\
            ColOrder=['b'],Pad=False,Default=42,RowConstructor=Freqs)
        self.assertEqual(d.RowOrder, ['a'])
        self.assertEqual(d.ColOrder, ['b'])
        self.assertEqual(d.Pad, False)
        self.assertEqual(d.Default, 42)
        self.assertEqual(d.RowConstructor, Freqs)
        # differ from defaults and no data
        d = DistanceMatrix(RowOrder=['a'],\
            ColOrder=['b'],Pad=False,Default=42,RowConstructor=Freqs)
        self.assertEqual(d.RowOrder, ['a'])
        self.assertEqual(d.ColOrder, ['b'])
        self.assertEqual(d.Pad, False)
        self.assertEqual(d.Default, 42)
        self.assertEqual(d.RowConstructor, Freqs)

    def test_Order_defaults(self):
        """ RowOrder and ColOrder are set to default as expected """
        for m in self.matrices:
            dm = DistanceMatrix(data=m)
            self.assertEqual(dm.RowOrder, self.default_keys)
            self.assertEqual(dm.ColOrder, self.default_keys)

    def test_Order_parameters(self):
        """ RowOrder and ColOrder are set to parameters as expected """
        row_order = ['a']
        col_order = ['b']
        for m in self.matrices:
            dm = DistanceMatrix(data=m, RowOrder=row_order, ColOrder=col_order)
            self.assertEqual(dm.RowOrder, row_order)
            self.assertEqual(dm.ColOrder, col_order)

    def test_rowKeys(self):
        """ rowKeys functions properly """
        dm = DistanceMatrix(data={'a':{'b':1}})
        goal = self.default_keys + ['a']
        goal.sort()
        actual = dm.rowKeys()
        actual.sort()
        self.assertEqual(actual,goal)

    def test_colKeys(self):
        """ colKeys functions properly """
        dm = DistanceMatrix(data={'a':{'b':1}})
        goal = self.default_keys + ['b']
        goal.sort()
        actual = dm.colKeys()
        actual.sort()
        self.assertEqual(actual,goal)

    def test_sharedColKeys(self):
        """ sharedColKeys functions properly """
        # no shared keys b/c a is not in RowOrder and therefore not padded
        dm = DistanceMatrix(data={'a':{'b':1}})
        self.assertEqual(dm.sharedColKeys(),[])
        # shared should be only self.default_keys b/c 'b' not in ColOrder
        dm = DistanceMatrix(data={'a':{'b':1}},\
            RowOrder=self.default_keys + ['a'])
        actual = dm.sharedColKeys()
        actual.sort()
        self.assertEqual(actual, self.default_keys)
        # shared should be self.default_keys + 'b'
        dm = DistanceMatrix(data={'a':{'b':1}},\
            RowOrder=self.default_keys + ['a'],\
            ColOrder=self.default_keys + ['b'])
        actual = dm.sharedColKeys()
        actual.sort()
        self.assertEqual(actual, self.default_keys + ['b'])

    def test_default_padding(self):
        """ Default padding functions as expected """
        for m in self.matrices:
            dm = DistanceMatrix(data=m)
            for r in self.default_keys:
                for c in self.default_keys:
                    dm[r][c]

    def test_init_data_types(self):
        """ Correct init from varying data types """
        # No data
        goal = {}.fromkeys(self.default_keys)
        for r in goal:
            goal[r] = {}.fromkeys(self.default_keys)
        dm = DistanceMatrix()
        self.assertEqual(dm,goal)
        # data is dict of dicts
        dm = DistanceMatrix(data={'a':{'b':1}}, Pad=False)
        self.assertEqual(dm,{'a':{'b':1}})
        # data is list of lists
        dm = DistanceMatrix(data=[[1]],RowOrder=['a'],ColOrder=['b'], Pad=False)
        self.assertEqual(dm,{'a':{'b':1}})
        # data is in Indices form
        dm = DistanceMatrix(data=[('a','b',1)], Pad=False)
        self.assertEqual(dm,{'a':{'b':1}})

    def test_sparse_init(self):
        """ Init correctly from a sparse dict """
        d = DistanceMatrix(data={'A':{'C':0.}})
        for r in self.default_keys:
            for c in self.default_keys:
                if (r == 'A') and (c == 'C'):
                    self.assertEqual(d[r][c],0.)
                else:
                    self.assertEqual(d[r][c],None)

    def test_dict_integrity(self):
        """ Integrity of key -> value pairs """
        for m in self.matrices:
            dm = DistanceMatrix(data=m)
            self.assertEqual(dm['A']['A'], m['A']['A'])
            self.assertEqual(dm['B']['C'], m['B']['C'])

    def test_attribute_forwarder_integrity(self):
        """ Integrity of attribute forwarding """
        dm = DistanceMatrix(data=self.m2,info=self.aar)
        self.assertEqual(dm.ID, '5')
        self.assertEqual(dm.Correlating, [0.987, 0.783, 1., 0])
        self.assertEqual(dm.Data['C'], 0.15)

    def test_copy(self):
        """ Copy functions as expected"""
        dm = DistanceMatrix(data=self.m2, RowOrder=self.m2.keys(), info=self.aar)
        c = dm.copy()
        self.assertEqual(c['A']['A'],dm['A']['A'])
        self.assertEqual(c.RowOrder,dm.RowOrder)
        self.assertEqual(c.ColOrder,dm.ColOrder)
        self.assertEqual(c.Pad,dm.Pad)
        self.assertEqual(c.Power,dm.Power)
        # Make sure it's a separate object
        c['A']['A'] = 999
        self.assertNotEqual(c['A']['A'],dm['A']['A'])

    def test_attribute_forwarder_integrity_after_copy(self):
        """ Integrity of attribute forwarding following a copy()"""
        dm = DistanceMatrix(data=self.m2, RowOrder=self.m2.keys(), info=self.aar)
        c = dm.copy()
        # dm.ID == '5'
        self.assertEqual(c.ID, dm.ID)
        self.assertEqual(c.Correlating, dm.Correlating)
        self.assertEqual(c.Data['R'], dm.Data['R'])
        c.ID = '0'
        self.assertNotEqual(c.ID,dm.ID)

    def test_setDiag(self):
        """ setDiag works as expected """
        for m in self.matrices:
            # create a deep copy so we can test against original
            # matrix without it being affected by altering the object
            # based on it
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys())
            # set diag to 42
            dm.setDiag(42)
            # test that diag is 42
            for k in dm:
                self.assertEqual(dm[k][k],42)
            # test that off-diagonal values are unchanged
            self.assertEqual(dm['B']['A'], m['B']['A'])
            self.assertEqual(dm['B']['C'], m['B']['C'])

    def test_scale(self):
        """ Scale correctly applies function to all elements """
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys(), Pad=False)
            dm.scale(lambda x: x**2)
            self.assertEqual(dm['A']['A'],m['A']['A']**2)
            self.assertEqual(dm['B']['A'],m['B']['A']**2)
            self.assertEqual(dm['B']['C'],m['B']['C']**2)
            # Test cube all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys(), Pad=False)
            dm.scale(lambda x: x**3)
            self.assertEqual(dm['A']['A'],m['A']['A']**3)
            self.assertEqual(dm['B']['A'],m['B']['A']**3)
            self.assertEqual(dm['B']['C'],m['B']['C']**3)
            # Test linearize all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=m.keys(), Pad=False)
            dm.scale(lambda x: 10**-(x/10.0))
            self.assertFloatEqual(dm['A']['A'],10**-(m['A']['A']/10.))
            self.assertFloatEqual(dm['B']['A'],10**-(m['B']['A']/10.))
            self.assertFloatEqual(dm['B']['C'],10**-(m['B']['C']/10.))

    def test_elementPow_valid(self):
        """ elementPow correctly scales all elements and updates self.Power"""
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(),ColOrder=n.keys(),\
                Pad=False)
            dm.elementPow(2)
            self.assertEqual(dm.Power, 2)
            self.assertEqual(dm['A']['A'],m['A']['A']**2)
            self.assertEqual(dm['B']['A'],m['B']['A']**2)
            self.assertEqual(dm['B']['C'],m['B']['C']**2)
            # Test cube, then square root, of all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(),ColOrder=n.keys(),\
                Pad=False)
            dm.elementPow(3)
            dm.elementPow(1./2.)
            self.assertEqual(dm.Power, 3./2.)
            self.assertEqual(dm['A']['A'],m['A']['A']**(3./2.))
            self.assertEqual(dm['B']['A'],m['B']['A']**(3./2.))
            self.assertEqual(dm['B']['C'],m['B']['C']**(3./2.))

    def test_elementPow_ignore_invalid(self):
        """ elementPow correctly detects and ignores invalid data"""
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(),ColOrder=n.keys(),\
                Pad=False)
            dm['A']['A'] = 'p'
            dm.elementPow(2)
            self.assertEqual(dm.Power, 2.)
            self.assertEqual(dm['A']['A'],'p')
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(),ColOrder=n.keys(),\
                Pad=False)
            dm['A']['A'] = None
            dm.elementPow(2)
            self.assertEqual(dm.Power, 2.)
            self.assertEqual(dm['A']['A'],None)

    def test_elementPow_error_on_invalid(self):
        """ elementPow correctly raises error on invalid data"""
        for m in self.matrices:
            # Test square all elements
            # explicit tests
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(),ColOrder=n.keys(),\
                Pad=False)
            dm['A']['A'] = 'p'
            self.assertRaises(TypeError,dm.elementPow,2,ignore_invalid=False)
            dm['A']['A'] = None
            self.assertRaises(TypeError,dm.elementPow,2,ignore_invalid=False)

    def test_elementPow_invalid_pow(self):
        """ elementPow correctly raises error on invalid power """
        for m in self.matrices:
            n = deepcopy(m)
            dm = DistanceMatrix(data=n, RowOrder=n.keys(),ColOrder=n.keys(),\
                Pad=False)
            self.assertRaises(TypeError,dm.elementPow,None,ignore_invalid=False)
            self.assertRaises(TypeError,dm.elementPow,'a',ignore_invalid=False)

    def test_transpose(self):
        """ transpose functions as expected """
        for m in self.matrices:
            d = DistanceMatrix(data=m)
            t = d.copy()
            t.transpose()
            # Note, this line will fail on a matrix where transpose = original
            self.assertNotEqual(t,d)
            for r in t:
                for c in t[r]:
                    self.assertEqual(t[r][c],d[c][r])
            t.transpose()
            self.assertEqual(t,d)

    def test_reflect(self):
        """ reflect functions as expected """
        for m in self.matrices:
            d = DistanceMatrix(data=m)
            n = d.copy()
            # Only testing one method, all others are tested in superclass, so
            # redundant testing is probably not necessary
            n.reflect(method=largest)
            for r in d.RowOrder:
                for c in d.ColOrder:
                    if d[r][c] > d[c][r]:
                        goal = d[r][c]
                    else:
                        goal = d[c][r]
                    self.assertEqual(n[r][c],goal)
                    self.assertEqual(n[c][r],goal)

    ######
    # Following tests copied (and slightly modified) from test_DistanceMatrix and
    # written by Rob Knight. Intended to test inheritance
    #####
    def test_toDelimited(self):
        """DistanceMatrix toDelimited functions as expected"""
        d = DistanceMatrix(self.square,Pad=False)
        d.RowOrder = d.ColOrder = 'abc'
        self.assertEqual(d.toDelimited(), \
            '-\ta\tb\tc\na\t1\t2\t3\nb\t2\t4\t6\nc\t3\t6\t9')
        self.assertEqual(d.toDelimited(headers=False), \
            '1\t2\t3\n2\t4\t6\n3\t6\t9')
        #set up a custom formatter...
        def my_formatter(x):
            try:
                return '%1.1f' % x
            except:
                return str(x)
        #...and use it
        self.assertEqual(d.toDelimited(headers=True, item_delimiter='x', \
            row_delimiter='y', formatter=my_formatter), \
            '-xaxbxcyax1.0x2.0x3.0ybx2.0x4.0x6.0ycx3.0x6.0x9.0')

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_format/__init__.py
#!/usr/bin/env python
__all__ = ['test_mage', 'test_fasta', 'test_pdb_color', 'test_xyzrn']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Gavin Huttley", "Sandra Smit",
"Marcin Cieslik", "Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
PyCogent-1.5.3/tests/test_format/test_bedgraph.py
#!/usr/bin/env python
"""Unit tests for bedgraph format writer.
"""
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.util.table import Table
from cogent.format.bedgraph import get_header
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
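
# Illustrative sketch only (hypothetical helper, not cogent's bedgraph
# writer): the merged-span tests below feed in runs of adjacent 1-bp rows
# that share a chromosome and value. This shows one way to collapse such
# runs, assuming rows are already sorted and intervals are half-open, so a
# row is merged only when its start equals the previous row's end. The end
# coordinates it yields follow from that assumption, not from cogent.
def _merge_spans_sketch(rows):
    """Collapse sorted (chrom, start, end, value) rows into maximal runs."""
    merged = []
    for chrom, start, end, value in rows:
        if merged:
            last = merged[-1]
            if last[0] == chrom and last[2] == start and last[3] == value:
                last[2] = end  # extend the current run in place
                continue
        # new chromosome, gap, or value change: start a new run
        merged.append([chrom, start, end, value])
    return merged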
class FormatBedgraph(TestCase):
def test_only_required_columns(self):
"""generate bedgraph from minimal data"""
table = Table(header=['chrom', 'start', 'end', 'value'],
rows=[['1', 100, i, 0] for i in range(101,111)] + \
[['1', 150, i, 10] for i in range(151,161)])
bgraph = table.tostring(format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0))
self.assertTrue(bgraph,
'\n'.join(['track type=bedGraph name="test track" '\
+'description="test of bedgraph" color=255,0,0',
'1\t100\t110\t0', '1\t150\t160\t10']))
def test_merged_overlapping_spans(self):
"""bedgraph merged overlapping spans, one chrom"""
rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
[['1', i, i+1, 10] for i in range(150, 161)]
table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
bgraph = table.tostring(format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0))
self.assertTrue(bgraph,
'\n'.join(['track type=bedGraph name="test track" '\
+'description="test of bedgraph" color=255,0,0',
'1\t100\t120\t0', '1\t150\t160\t10']))
def test_merged_overlapping_spans_multichrom(self):
"""bedgraph merged overlapping spans, two crhoms"""
rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
[['1', i, i+1, 10] for i in range(150, 161)]
rows += [['2', i, i+1, 0] for i in range(100, 121)]
table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
bgraph = table.tostring(format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0))
self.assertTrue(bgraph,
'\n'.join(['track type=bedGraph name="test track" '\
+'description="test of bedgraph" color=255,0,0',
'1\t100\t120\t1', '1\t150\t160\t10', '2\t105\t120\t1',]))
def test_invalid_args_fail(self):
"""incorrect bedgraph args causes RuntimeError"""
rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
[['1', i, i+1, 10] for i in range(150, 161)]
table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
self.assertRaises(RuntimeError, table.tostring,
format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0), abc=None)
def test_invalid_table_fails(self):
"""assertion error if table has > 4 columns"""
rows = [['1', i, i+1, 0, 1] for i in range(100, 121)] +\
[['1', i, i+1, 10, 1] for i in range(150, 161)]
table = Table(header=['chrom', 'start', 'end', 'value', 'blah'],
rows=rows)
self.assertRaises(AssertionError, table.tostring,
format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0), abc=None)
def test_boolean_correctly_formatted(self):
"""boolean setting correctly formatted"""
rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
[['1', i, i+1, 10] for i in range(150, 161)]
table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
bgraph = table.tostring(format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0), autoScale=True)
self.assertTrue(bgraph,
'\n'.join(['track type=bedGraph name="test track" '\
+'description="test of bedgraph" color=255,0,0 autoScale=on',
'1\t100\t110\t1', '1\t150\t160\t10']))
def test_int_correctly_formatted(self):
"""int should be correctly formatted"""
rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
[['1', i, i+1, 10] for i in range(150, 161)]
table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
bgraph = table.tostring(format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0), smoothingWindow=10)
self.assertTrue(bgraph,
'\n'.join(['track type=bedGraph name="test track" '\
+'description="test of bedgraph" color=255,0,0 smoothingWindow=10',
'1\t100\t110\t1', '1\t150\t160\t10']))
def test_raises_on_incorrect_format_val(self):
"""should raise AssertionError when given an incorrect format value"""
rows = [['1', i, i+1, 0] for i in range(100, 121)] +\
[['1', i, i+1, 10] for i in range(150, 161)]
table = Table(header=['chrom', 'start', 'end', 'value'], rows=rows)
self.assertRaises(AssertionError, table.tostring,
format='bedgraph', name='test track',
description='test of bedgraph', color=(255,0,0),
windowingFunction='sqrt')
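# Sketch only (hypothetical helper, not cogent's bedgraph writer): the
# fixtures above build one row per base with runs of equal values, which
# suggests the writer collapses adjacent rows that share chrom and value
# into a single span. Whether Table.tostring(format='bedgraph') merges runs
# exactly this way is an assumption.
def _sketch_merge_runs(rows):
    """Merge consecutive [chrom, start, end, value] rows into spans."""
    merged = []
    for chrom, start, end, value in rows:
        last = merged[-1] if merged else None
        # extend the previous span when it is contiguous and has the same value
        if last and last[0] == chrom and last[2] == start and last[3] == value:
            last[2] = end
        else:
            merged.append([chrom, start, end, value])
    return merged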
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_format/test_clustal.py
#!/usr/bin/env python
"""Tests for Clustal sequence format writer.
"""
from cogent.util.unit_test import TestCase, main
from cogent.format.clustal import clustal_from_alignment
from cogent.core.alignment import Alignment
from cogent.core.sequence import Sequence
from cogent.core.info import Info
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class ClustalTests(TestCase):
"""Tests for Clustal writer.
"""
def setUp(self):
"""Setup for Clustal tests."""
self.unaligned_dict = {'1st':'AAA','2nd':'CCCC','3rd':'GGGG',
'4th':'UUUU'}
self.alignment_dict = {'1st':'AAAA','2nd':'CCCC','3rd':'GGGG',
'4th':'UUUU'}
#create alignment and change its row order.
self.alignment_object = Alignment(self.alignment_dict)
self.alignment_order = ['2nd','4th','3rd','1st']
self.alignment_object.RowOrder=self.alignment_order
self.clustal_with_label=\
"""CLUSTAL
1st AAAA
2nd CCCC
3rd GGGG
4th UUUU
"""
self.clustal_with_label_lw2=\
"""CLUSTAL
1st AA
2nd CC
3rd GG
4th UU
1st AA
2nd CC
3rd GG
4th UU
"""
self.clustal_with_label_reordered=\
"""CLUSTAL
2nd CCCC
4th UUUU
3rd GGGG
1st AAAA
"""
self.clustal_with_label_lw2_reordered=\
"""CLUSTAL
2nd CC
4th UU
3rd GG
1st AA
2nd CC
4th UU
3rd GG
1st AA
"""
def test_clustal_from_alignment_unaligned(self):
"""should raise error with unaligned seqs."""
self.assertRaises(ValueError,\
clustal_from_alignment,self.unaligned_dict)
def test_clustal_from_alignment(self):
"""should return correct clustal string."""
self.assertEqual(clustal_from_alignment({}),'')
self.assertEqual(clustal_from_alignment(self.alignment_dict),\
self.clustal_with_label)
self.assertEqual(clustal_from_alignment(self.alignment_dict,
interleave_len=2),self.clustal_with_label_lw2)
def test_clustal_from_alignment_reordered(self):
"""should return correct clustal string."""
self.assertEqual(clustal_from_alignment(self.alignment_object),\
self.clustal_with_label_reordered)
self.assertEqual(clustal_from_alignment(self.alignment_object,
interleave_len=2),self.clustal_with_label_lw2_reordered)
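# Sketch only (hypothetical helper, not cogent's clustal_from_alignment):
# the *_lw2 fixtures describe cutting the alignment into column blocks of
# interleave_len and listing every sequence, in row order, within each block.
# Header text and label padding are omitted because the exact spacing cannot
# be recovered from these fixtures.
def _sketch_clustal_blocks(seqs_by_label, order, interleave_len):
    """Return a list of blocks, each a list of (label, segment) pairs."""
    if not order:
        return []
    lengths = set(len(seqs_by_label[label]) for label in order)
    if len(lengths) != 1:
        # mirrors the ValueError the tests expect for unaligned input
        raise ValueError("sequences are not all the same length")
    length = lengths.pop()
    return [[(label, seqs_by_label[label][start:start + interleave_len])
             for label in order]
            for start in range(0, length, interleave_len)]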
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_format/test_fasta.py
#!/usr/bin/env python
"""Tests for FASTA sequence format writer.
"""
from cogent.util.unit_test import TestCase, main
from cogent.format.fasta import fasta_from_sequences, fasta_from_alignment
from cogent.core.alignment import Alignment
from cogent.core.sequence import Sequence
from cogent.core.info import Info
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Gavin Huttley", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class FastaTests(TestCase):
"""Tests for Fasta writer.
"""
def setUp(self):
"""Setup for Fasta tests."""
self.strings = ['AAAA','CCCC','gggg','uuuu']
self.labels = ['1st','2nd','3rd','4th']
self.infos = ["Dog", "Cat", "Mouse", "Rat"]
self.sequences_with_labels = map(Sequence, self.strings)
self.sequences_with_names = map(Sequence, self.strings)
for l,sl,sn in zip(self.labels,self.sequences_with_labels,\
self.sequences_with_names):
sl.Label = l
sn.Name = l
self.fasta_no_label='>0\nAAAA\n>1\nCCCC\n>2\ngggg\n>3\nuuuu'
self.fasta_with_label=\
'>1st\nAAAA\n>2nd\nCCCC\n>3rd\nGGGG\n>4th\nUUUU'
self.fasta_with_label_lw2=\
'>1st\nAA\nAA\n>2nd\nCC\nCC\n>3rd\nGG\nGG\n>4th\nUU\nUU'
self.alignment_dict = {'1st':'AAAA','2nd':'CCCC','3rd':'GGGG',
'4th':'UUUU'}
self.alignment_object = Alignment(self.alignment_dict)
for label, info in zip(self.labels, self.infos):
self.alignment_object.NamedSeqs[label].Info = Info(species=info)
self.fasta_with_label_species=\
'>1st:Dog\nAAAA\n>2nd:Cat\nCCCC\n>3rd:Mouse\nGGGG\n>4th:Rat\nUUUU'
self.alignment_object.RowOrder = ['1st','2nd','3rd','4th']
def test_fastaFromSequence(self):
"""should return correct fasta string."""
self.assertEqual(fasta_from_sequences(''),'')
self.assertEqual(fasta_from_sequences(self.strings),\
self.fasta_no_label)
self.assertEqual(fasta_from_sequences(self.sequences_with_labels),\
self.fasta_with_label)
self.assertEqual(fasta_from_sequences(self.sequences_with_names),\
self.fasta_with_label)
make_seqlabel = lambda seq: "%s:%s" % (seq.Name, seq.Info.species)
seqs = [self.alignment_object.NamedSeqs[label] for label in self.labels]
self.assertEqual(fasta_from_sequences(seqs,
make_seqlabel=make_seqlabel), self.fasta_with_label_species)
def test_fasta_from_alignment(self):
"""should return correct fasta string."""
self.assertEqual(fasta_from_alignment({}),'')
self.assertEqual(fasta_from_alignment(self.alignment_dict),\
self.fasta_with_label)
self.assertEqual(fasta_from_alignment(self.alignment_dict,
line_wrap=2),self.fasta_with_label_lw2)
self.assertEqual(fasta_from_alignment(self.alignment_object),\
self.fasta_with_label)
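# Sketch only (hypothetical helper, not cogent's fasta_from_alignment): the
# expected strings above are '>label' lines followed by the sequence, split
# into lines of at most line_wrap characters, joined with newlines and with
# no trailing newline. Sorting by label is an assumption that happens to
# match the dict fixtures.
def _sketch_fasta(seqs_by_label, line_wrap=None):
    """Format a {label: sequence} mapping as a FASTA string."""
    lines = []
    for label in sorted(seqs_by_label):
        seq = seqs_by_label[label]
        lines.append('>' + label)
        if line_wrap:
            # wrap the sequence into fixed-width chunks
            lines.extend(seq[i:i + line_wrap]
                         for i in range(0, len(seq), line_wrap))
        else:
            lines.append(seq)
    return '\n'.join(lines)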
if __name__ == "__main__":
main()
PyCogent-1.5.3/tests/test_format/test_mage.py
#!/usr/bin/env python
"""Unit tests for Mage format writer.
"""
from __future__ import division
from numpy import array
from copy import deepcopy
from cogent.util.unit_test import TestCase, main
from cogent.format.mage import MagePoint, MageList, MageGroup, MageHeader, \
Kinemage, MagePointFromBaseFreqs
from cogent.core.usage import BaseUsage
from cogent.util.misc import Delegator
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Sandra Smit", "Gavin Huttley", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
class MagePointTests(TestCase):
"""Tests of the MagePoint class, holding information about points."""
def test_init_empty(self):
"""MagePoint should init correctly with no data"""
m = MagePoint()
self.assertEqual(str(m), ' '.join(map(str,[0,0,0])))
def test_init(self):
"""MagePoint should init correctly with normal cases"""
#label and coords
m = MagePoint([0.200,0.000,0.800], '0.800')
self.assertEqual(str(m), '{0.800} ' + \
' '.join(map(str, ([0.200,0.000,0.800]))))
#coords only
m = MagePoint([0.200,0.000,0.800])
self.assertEqual(str(m), ' '.join(map(str, ([0.200,0.000,0.800]))))
#label only
m = MagePoint(Label='abc')
self.assertEqual(str(m), '{abc} '+' '.join(map(str,[0,0,0])))
#all fields occupied
m = MagePoint(Label='abc', Coordinates=[3, 6, 1.5], Radius=0.5, \
Width=2, State='P', Color='green')
self.assertEqual(str(m), \
'{abc} P green width2 r=0.5 3 6 1.5')
def test_cmp(self):
"""MagePoint cmp should compare all fields"""
self.assertEqual(MagePoint([0,0,0]), MagePoint([0,0,0]))
self.assertNotEqual(MagePoint([0,0,0]), MagePoint([0,0,0], Color='red'))
def test_get_coord(self):
"""MagePoint _get_coord should return coordinate that is asked for"""
m = MagePoint([0,1,2])
self.assertEqual(m.X,m.Coordinates[0])
self.assertEqual(m.Y,m.Coordinates[1])
self.assertEqual(m.Z,m.Coordinates[2])
m = MagePoint()
self.assertEqual(m.X,m.Coordinates[0])
self.assertEqual(m.Y,m.Coordinates[1])
self.assertEqual(m.Z,m.Coordinates[2])
def test_set_coord(self):
"""MagePoint X, Y and Z setters should update Coordinates"""
m = MagePoint([0,1,2])
m.X, m.Y, m.Z = 2,3,4
self.assertEqual(m.Coordinates,[2,3,4])
m = MagePoint()
m.X, m.Y, m.Z = 5,4,3
self.assertEqual(m.Coordinates,[5,4,3])
m = MagePoint()
m.X = 5
self.assertEqual(m.Coordinates,[5,0,0])
def test_toCartesian(self):
"""MagePoint toCartesian() should transform coordinates correctly"""
m = MagePoint([.1,.2,.3])
self.assertEqual(m.toCartesian().Coordinates,[.6,.7,.5])
m = MagePoint()
self.assertEqual(m.toCartesian().Coordinates,[1,1,1])
m = MagePoint([.25,.25,.25],Color='red',Label='label',State='L')
self.assertEqual(m.toCartesian().Coordinates,[.5,.5,.5])
self.assertEqual(m.toCartesian().Color,m.Color)
self.assertEqual(m.toCartesian().Label,m.Label)
self.assertEqual(m.toCartesian().State,m.State)
m = MagePoint([1/3.0,1/3.0,0])
self.assertFloatEqual(m.toCartesian().Coordinates,
[2/3.0,1/3.0,2/3.0])
m = MagePoint([1/3.0,1/3.0,1/3.0])
self.assertFloatEqual(m.toCartesian().Coordinates,
[1/3.0,1/3.0,1/3.0])
m = MagePoint([3,4,5])
self.assertRaises(ValueError,m.toCartesian)
def test_fromCartesian(self):
"""MagePoint fromCartesian should transform coordinates correctly"""
mp = MagePoint([2/3.0,1/3.0,2/3.0])
self.assertFloatEqual(mp.fromCartesian().Coordinates,[1/3.0,1/3.0,0])
points = [MagePoint([.1,.2,.3]),MagePoint([.25,.25,.25],Color='red',
Label='label',State='L'),MagePoint([1/3,1/3,0]),
MagePoint([0,0,0]),MagePoint([1/7,2/7,3/7])]
for m in points:
b = m.toCartesian().fromCartesian()
self.assertFloatEqual(m.Coordinates,b.Coordinates)
self.assertEqual(m.Color,b.Color)
self.assertEqual(m.Label,b.Label)
self.assertEqual(m.State,b.State)
#even after multiple iterations?
mutant = deepcopy(m)
for x in range(10):
mutant = mutant.toCartesian().fromCartesian()
self.assertFloatEqual(m.Coordinates,mutant.Coordinates)
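# Sketch only (hypothetical helpers inferred from the expected values in
# MagePointTests, not taken from cogent.format.mage): toCartesian() behaves
# as if a point holds (A, C, G) frequencies with U = 1 - A - C - G, and the
# bases sit at tetrahedron corners A=(0,0,1), C=(1,0,0), G=(0,1,0),
# U=(1,1,1). That gives x = C + U, y = G + U, z = A + U.
def _sketch_acg_to_cartesian(a, c, g):
    """Map (A, C, G) frequencies to (x, y, z); U is the remainder."""
    u = 1.0 - a - c - g
    if u < -1e-9:
        # mirrors the ValueError expected for MagePoint([3,4,5])
        raise ValueError("frequencies sum to more than 1")
    return (c + u, g + u, a + u)
def _sketch_cartesian_to_acg(x, y, z):
    """Invert the mapping: x + y + z = 1 + 2U, so U = (x + y + z - 1) / 2."""
    u = (x + y + z - 1.0) / 2.0
    return (z - u, x - u, y - u)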
class freqs_label(dict):
"""dict with Label and Id, for testing MagePointFromBaseFreqs"""
def __init__(self, Label, Id, freqs):
self.Label = Label
self.Id = Id
self.update(freqs)
class freqs_display(dict):
"""dict with display properties, for testing MagePointFromBaseFreqs"""
def __init__(self, Color, Radius, Id, freqs):
self.Color = Color
self.Radius = Radius
self.Id = Id
self.update(freqs)
class MagePointFromBaseFreqsTests(TestCase):
"""Tests of the MagePointFromBaseFreqs factory function."""
def setUp(self):
"""Define a few standard frequencies"""
self.empty = freqs_label(None, None, {})
self.dna = freqs_label('dna', None, {'A':4, 'T':1, 'G':2, 'C':3})
self.rna = freqs_label(None, 'rna', {'U':2, 'A':1, 'G':2})
self.display = freqs_display('green', '0.25', 'xxx', {'A':2})
def test_MagePointFromBaseFreqs(self):
"""MagePoint should fill itself from base freqs correctly"""
e = MagePointFromBaseFreqs(self.empty)
self.assertEqual(str(e), '0.0 0.0 0.0')
dna = MagePointFromBaseFreqs(self.dna)
self.assertEqual(str(dna), '{dna} 0.4 0.3 0.2')
rna = MagePointFromBaseFreqs(self.rna)
self.assertEqual(str(rna), '{rna} 0.2 0.0 0.4')
display = MagePointFromBaseFreqs(self.display)
self.assertEqual(str(display), \
'{xxx} green r=0.25 1.0 0.0 0.0')
def test_MagePointFromBaseFreqs_usage(self):
"""MagePoint should init correctly from base freqs"""
class fake_seq(str, Delegator):
def __new__(cls, data, *args):
return str.__new__(cls, data)
def __init__(self, data, *args):
Delegator.__init__(self, *args)
self.__dict__['Info'] = self._handler
str.__init__(data)
class has_species(object):
def __init__(self, sp):
self.Species = sp
s = fake_seq('AAAAACCCTG', has_species('Homo sapiens'))
b = BaseUsage(s)
p = MagePointFromBaseFreqs(b)
self.assertEqual(str(p), '{Homo sapiens} 0.5 0.3 0.1')
def test_MagePointFromBaseFreqs_functions(self):
"""MagePointFromBaseFreqs should apply functions correctly"""
def set_color(x):
if x.Label == 'dna':
return 'green'
else:
return 'blue'
def set_radius(x):
if x.Label == 'dna':
return 0.25
else:
return 0.5
def set_label(x):
if x.Id is not None:
return 'xxx'
else:
return 'yyy'
self.assertEqual(str(MagePointFromBaseFreqs(self.dna,
get_label=set_label)),
'{yyy} 0.4 0.3 0.2')
self.assertEqual(str(MagePointFromBaseFreqs(self.rna,
get_label=set_label)),
'{xxx} 0.2 0.0 0.4')
self.assertEqual(str(MagePointFromBaseFreqs(self.dna,
get_radius=set_radius)),
'{dna} r=0.25 0.4 0.3 0.2')
self.assertEqual(str(MagePointFromBaseFreqs(self.rna,
get_radius=set_radius)),
'{rna} r=0.5 0.2 0.0 0.4')
self.assertEqual(str(MagePointFromBaseFreqs(self.dna,
get_color=set_color)),
'{dna} green 0.4 0.3 0.2')
self.assertEqual(str(MagePointFromBaseFreqs(self.rna,
get_color=set_color)),
'{rna} blue 0.2 0.0 0.4')
self.assertEqual(str(MagePointFromBaseFreqs(self.rna,
get_label=set_label, get_radius=set_radius,get_color=set_color)),
'{xxx} blue r=0.5 0.2 0.0 0.4')
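# Sketch only (hypothetical helper inferred from MagePointFromBaseFreqsTests,
# not cogent's MagePointFromBaseFreqs): the point coordinates are the A, C
# and G fractions of the total count, T/U being the implied remainder, and
# an empty count table gives (0.0, 0.0, 0.0).
def _sketch_freqs_to_acg(freqs):
    """Return (A, C, G) fractions from a {base: count} mapping."""
    total = float(sum(freqs.values()))
    if not total:
        return (0.0, 0.0, 0.0)
    return tuple(freqs.get(base, 0) / total for base in 'ACG')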
class MageListTests(TestCase):
"""Tests of the MageList class, holding a collection of points."""
def setUp(self):
"""Define a few standard points and lists of points."""
self.null = MagePoint([0,0,0])
self.label = MagePoint([1, 1, 1], 'test')
self.properties = MagePoint(Width=1, Label='point', State='L',\
Color='blue', Coordinates=[2.0,4.0,6.0])
self.radius1 = MagePoint([2,2,2],Radius=.1)
self.radius2 = MagePoint([3,3,3],Radius=.5)
self.first_list = [self.null, self.properties]
self.empty_list = []
self.minimal_list = [self.null]
self.single_list = [self.label]
self.multi_list = [self.properties] * 10
self.radii = [self.radius1,self.radius2]
def test_init_empty(self):
"""MageList should init correctly with no data"""
m = MageList()
self.assertEqual(str(m), "@dotlist")
m = MageList(self.empty_list)
self.assertEqual(str(m), "@dotlist")
def test_init(self):
"""MageList should init correctly with data"""
m = MageList(self.minimal_list)
self.assertEqual(str(m), "@dotlist\n" + str(self.null))
m = MageList(self.multi_list,'x',Off=True,Color='green',NoButton=True)
self.assertEqual(str(m), "@dotlist {x} off nobutton color=green\n" + \
'\n'.join(10 * [str(self.properties)]))
m = MageList(self.first_list,NoButton=True,Color='red', \
Style='vector', Radius=0.03, Width=3, Label='test')
self.assertEqual(str(m), "@vectorlist {test} nobutton color=red " + \
"radius=0.03 width=3\n" + str(self.null) + '\n' + str(self.properties))
def test_toArray_radii(self):
"""MageList toArray should return the correct array"""
m = MageList(self.empty_list)
self.assertEqual(m.toArray(),array(()))
m = MageList(self.first_list,Radius=.3)
self.assertEqual(m.toArray(),array([[0,0,0,0.3],[2.0,4.0,6.0,0.3]]))
m = MageList(self.radii)
self.assertEqual(m.toArray(), array([[2,2,2,.1],[3,3,3,.5]]))
m = MageList(self.radii,Radius=.4)
self.assertEqual(m.toArray(), array([[2,2,2,.1],[3,3,3,.5]]))
m = MageList(self.single_list) #radius = None
self.assertRaises(ValueError,m.toArray)
def test_toArray_coords_only(self):
"""MageList toArray should return the correct array"""
m = MageList(self.empty_list)
self.assertEqual(m.toArray(include_radius=False),array(()))
m = MageList(self.first_list,Radius=.3)
self.assertEqual(m.toArray(include_radius=False),
array([[0,0,0],[2.0,4.0,6.0]]))
m = MageList(self.radii)
self.assertEqual(m.toArray(include_radius=False),
array([[2,2,2],[3,3,3]]))
m = MageList(self.radii,Radius=.4)
self.assertEqual(m.toArray(include_radius=False),
array([[2,2,2],[3,3,3]]))
m = MageList(self.single_list) #radius = None
self.assertEqual(m.toArray(include_radius=False),array([[1,1,1]]))
def test_iterPoints(self):
"""MageList iterPoints should yield all points in self"""
m = MageList(self.single_list)
for point in m.iterPoints():
assert isinstance(point,MagePoint)
self.assertEqual(len(list(m.iterPoints())),1)
m = MageList(self.multi_list)
for point in m.iterPoints():
assert isinstance(point,MagePoint)
self.assertEqual(len(list(m.iterPoints())),10)
def test_toCartesian(self):
"""MageList toCartesian should return new list"""
m = MageList([self.null],Color='green')
res = m.toCartesian()
self.assertEqual(len(m), len(res))
self.assertEqual(m.Color,res.Color)
self.assertEqual(res[0].Coordinates,[1,1,1])
m.Color='red'
self.assertEqual(res.Color,'green')
m = MageList([self.properties])
self.assertRaises(ValueError,m.toCartesian)
def test_fromCartesian(self):
"""MageList fromCartesian() should return new list with ACG coordinates
"""
point = MagePoint([.1,.2,.3])
m = MageList([point]*5,Color='green')
res = m.toCartesian().fromCartesian()
self.assertEqual(str(m),str(res))
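# Sketch only (hypothetical helper inferred from the toArray tests, not
# cogent's MageList.toArray): each row is a point's coordinates, optionally
# followed by a radius. A point's own Radius wins over the list's Radius,
# and a missing radius on both raises ValueError. Plain (coords, radius)
# tuples stand in for MagePoint objects here.
def _sketch_to_rows(points, list_radius=None, include_radius=True):
    """Build [x, y, z(, r)] rows from (coords, point_radius) pairs."""
    rows = []
    for coords, point_radius in points:
        row = list(coords)
        if include_radius:
            radius = point_radius if point_radius is not None else list_radius
            if radius is None:
                raise ValueError("no radius set on point or list")
            row.append(radius)
        rows.append(row)
    return rows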
class MageGroupTests(TestCase):
"""Test cases for the MageGroup class."""
def setUp(self):
"""Define some standard lists and groups."""
self.p1 = MagePoint([0, 1, 0], Color='green', Label='x')
self.p0 = MagePoint([0,0,0])
self.min_list = MageList([self.p0]*2,'y')
self.max_list = MageList([self.p1]*5,'z',Color='blue',Off=True, \
Style='ball')
self.min_group = MageGroup([self.min_list], Label="min_group")
self.max_group = MageGroup([self.min_list, self.max_list], Color='red',
Label="max_group", Style='dot')
self.nested = MageGroup([self.min_group, self.max_group], Label='nest',
Color='orange', Radius=0.3, Style='vector')
self.empty = MageGroup(Label='empty',Color='orange', NoButton=True,
Style='vector',RecessiveOn=False)
def test_init(self):
"""Nested MageGroups should set subgroup and cascades correctly."""
exp_lines = [
'@group {nest} recessiveon',
'@subgroup {min_group} recessiveon',
'@vectorlist {y} color=orange radius=0.3',
str(self.p0),
str(self.p0),
'@subgroup {max_group} recessiveon',
'@dotlist {y} color=red radius=0.3',
str(self.p0),
str(self.p0),
'@balllist {z} off color=blue radius=0.3',
str(self.p1),
str(self.p1),
str(self.p1),
str(self.p1),
str(self.p1),
]
s = str(self.nested).split('\n')
self.assertEqual(str(self.nested), '\n'.join(exp_lines))
#check that resetting the cascaded values works OK
nested = self.nested
str(nested)
self.assertEqual(nested,self.nested)
self.assertEqual(nested[0][0].Color,None)
def test_str(self):
"""MageGroup str should print correctly"""
m = self.empty
self.assertEqual(str(self.empty),'@group {empty} nobutton')
m = MageGroup(Label='label',Clone='clone_name',Off=True)
self.assertEqual(str(m),
'@group {label} off recessiveon clone={clone_name}')
m = MageGroup()
self.assertEqual(str(m),'@group recessiveon')
def test_iterGroups(self):
"""MageGroup iterGroups should behave as expected"""
groups = list(self.nested.iterGroups())
self.assertEqual(groups[0],self.min_group)
self.assertEqual(groups[1],self.max_group)
self.assertEqual(len(groups),2)
def test_iterLists(self):
"""MageGroup iterLists should behave as expected"""
lists = list(self.nested.iterLists())
self.assertEqual(len(lists),3)
self.assertEqual(lists[0],self.min_list)
self.assertEqual(lists[1],self.min_list)
self.assertEqual(lists[2],self.max_list)
def test_iterGroupsAndLists(self):
"""MageGroup iterGroupsAndLists should behave as expected"""
all = list(self.nested.iterGroupsAndLists())
self.assertEqual(len(all),5)
self.assertEqual(all[0],self.min_group)
self.assertEqual(all[4],self.max_list)
def test_iterPoints(self):
"""MageGroup iterPoints should behave as expected"""
points = list(self.nested.iterPoints())
self.assertEqual(len(points),9)
self.assertEqual(points[1],self.p0)
self.assertEqual(points[6],self.p1)
def test_toCartesian(self):
"""MageGroup toCartesian should return a new MageGroup"""
m = self.nested
res = m.toCartesian()
self.assertEqual(len(m),len(res))
self.assertEqual(m.RecessiveOn,res.RecessiveOn)
self.assertEqual(m[1][1].Color, res[1][1].Color)
self.assertEqual(res[1][1][1].Coordinates,[1,0,0])
def test_fromCartesian(self):
"""MageGroup fromCartesian should return a new MageGroup"""
point = MagePoint([.1,.2,.3])
l = MageList([point]*5,Color='red')
m = MageGroup([l],Radius=0.02,Subgroup=True)
mg = MageGroup([m])
res = mg.toCartesian().fromCartesian()
self.assertEqual(str(mg),str(res))
class MageHeaderTests(TestCase):
"""Tests of the MageHeader class.
For now, MageHeader does nothing, so just verify that it gets the string.
"""
def test_init(self):
"""MageHeader should keep the string it was initialized with."""
m = MageHeader('@perspective')
self.assertEqual(str(m), '@perspective')
class KinemageTests(TestCase):
"""Tests of the overall Kinemage class."""
def setUp(self):
self.point = MagePoint([0,0,0],'x')
self.ml = MageList([self.point], Label='y',Color='green')
self.mg1 = MageGroup([self.ml],Label='z')
self.mg2 = MageGroup([self.ml,self.ml],Label='b')
self.kin = Kinemage(1)
self.kin.Groups = [self.mg1,self.mg2]
def test_init_empty(self):
"""Kinemage empty init should work, but refuse to print"""
k = Kinemage()
self.assertEqual(k.Count, None)
self.assertRaises(ValueError, k.__str__)
def test_init(self):
"""Kinemage should init with any of its usual fields"""
k = Kinemage(1)
self.assertEqual(str(k), '@kinemage 1')
k.Header = '@perspective'
self.assertEqual(str(k), '@kinemage 1\n@perspective')
k.Count = 2
self.assertEqual(str(k), '@kinemage 2\n@perspective')
k.Header = ''
k.Caption = 'test caption'
self.assertEqual(str(k), '@kinemage 2\n@caption\ntest caption')
k.Caption = None
k.Text = 'some text'
self.assertEqual(str(k), '@kinemage 2\n@text\nsome text')
k.Groups = [self.mg1]
k.Header = '@test_header'
k.Caption = 'This is\nThe caption'
k.Text = 'some text here'
self.assertEqual(str(k), '@kinemage 2\n@test_header\n@text\n' +\
'some text here\n' + \
'@caption\nThis is\nThe caption\n@group {z} recessiveon\n' + \
'@dotlist {y} color=green\n{x} 0 0 0')
def test_iterGroups(self):
"""Kinemage iterGroups should behave as expected"""
k = self.kin
groups = list(k.iterGroups())
self.assertEqual(len(groups),2)
self.assertEqual(groups[0],self.mg1)
self.assertEqual(groups[1],self.mg2)
def test_iterLists(self):
"""Kinemage iterLists should behave as expected"""
k = self.kin
lists = list(k.iterLists())
self.assertEqual(len(lists),3)
self.assertEqual(lists[0],self.ml)
def test_iterPoints(self):
"""Kinemage iterPoints should behave as expected"""
k = self.kin
points = list(k.iterPoints())
self.assertEqual(len(points),3)
self.assertEqual(points[0],self.point)
def test_iterGroupAndLists(self):
"""Kinemage iterGroupsAndLists should behave as expected"""
all = list(self.kin.iterGroupsAndLists())
self.assertEqual(len(all),5)
self.assertEqual(all[0],self.mg1)
self.assertEqual(all[4],self.ml)
def test_toCartesian(self):
"""Kinemage toCartesian should return new Kinemage with UC,UG,UA coords
"""
k = self.kin
res = k.toCartesian()
self.assertEqual(len(k.Groups),len(res.Groups))
self.assertEqual(k.Text,res.Text)
self.assertEqual(k.Groups[1].RecessiveOn,res.Groups[1].RecessiveOn)
self.assertEqual(res.Groups[0][0][0].Coordinates,[1,1,1])
def test_fromCartesian(self):
"""Kinemage fromCartesian should return Kinemage with A,C,G(,U) coords
"""
point = MagePoint([.1,.2,.3])
l = MageList([point]*5,Color='red')
m1 = MageGroup([l],Radius=0.02,Subgroup=True)
m2 = MageGroup([l],Radius=0.02)
mg = MageGroup([m1])
k = Kinemage(Count=1,Groups=[mg,m2])
res = k.toCartesian().fromCartesian()
self.assertEqual(str(k),str(res))
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_format/test_pdb_color.py
#!/usr/bin/env python
from __future__ import division
from cogent.util.unit_test import TestCase, main
from cogent.util.misc import app_path
from cogent.format.pdb_color import get_aligned_muscle, make_color_list, \
ungapped_to_pdb_numbers, get_matching_chains, get_chains, \
get_best_muscle_hits, chains_to_seqs, align_subject_to_pdb
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
"""Tests of the pdb_color module.
Owner: Jeremy Widmann jeremy.widmann@colorado.edu
Revision History:
October 2006 Jeremy Widmann: File created
"""
MUSCLE_PATH = app_path('muscle')
class PdbColorTests(TestCase):
"""Tests for pdb_color functions.
"""
def setUp(self):
"""Setup for pdb_color tests."""
#Nucleotide test data results
self.test_pdb_chains_1 = {'A': [(1, 'G'), (2, 'C'), (3, 'C'), (4, 'A'),
(5, 'C'), (6, 'C'), (7, 'C'), (8, 'U'),
(9, 'G')],
'B': [(10, 'C'), (11, 'A'), (12, 'G'),
(13, 'G'), (14, 'G'), (15, 'U'),
(16, 'C'), (17, 'G'), (18, 'G'),
(19, 'C')]}
self.ungapped_to_pdb_1 = {'A': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6,
6: 7, 7: 8, 8: 9},
'B': {0: 10, 1: 11, 2: 12, 3: 13, 4: 14,
5: 15, 6: 16, 7: 17, 8: 18, 9: 19}}
self.test_pdb_seqs_1 = {'A': 'GCCACCCUG', 'B': 'CAGGGUCGGC'}
self.test_pdb_types_1 = {'A': 'Nucleotide', 'B': 'Nucleotide'}
#Protein test data results
self.test_pdb_chains_2 = {'A': [(1, 'ALA'), (2, 'PRO'), (3, 'ILE'),
(4, 'LYS'), (5, 'VAL'), (6, 'GLY'),
(7, 'ASP'), (8, 'ALA')]}
self.ungapped_to_pdb_2 = {'A': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6,
6: 7, 7: 8}}
self.test_pdb_seqs_2 = {'A': 'APIKVGDA'}
self.test_pdb_types_2 = {'A': 'Protein'}
def test_get_aligned_muscle(self):
"""Tests for get_aligned_muscle function.
"""
if not MUSCLE_PATH:
return 'skipping test'
seq1 = 'ACCUG'
seq2 = 'ACGGUG'
seq1_aligned_known = 'AC-CUG'
seq2_aligned_known = 'ACGGUG'
frac_same_known = 4/5.0
seq1_aln, seq2_aln, frac_same = get_aligned_muscle(seq1,seq2)
self.assertEqual(seq1_aln,seq1_aligned_known)
self.assertEqual(seq2_aln,seq2_aligned_known)
self.assertEqual(frac_same,frac_same_known)
def test_get_chains_nucleotide(self):
"""Tests for get_chains function using nucleotide pdb lines.
"""
chains_nuc = get_chains(TEST_PDB_STRING_1.split('\n'))
self.assertEqual(chains_nuc, self.test_pdb_chains_1)
def test_get_chains_protein(self):
"""Tests for get_chains function using protein pdb lines.
"""
chains_prot = get_chains(TEST_PDB_STRING_2.split('\n'))
self.assertEqual(chains_prot, self.test_pdb_chains_2)
def test_ungapped_to_pdb_nucleotide(self):
"""Tests for ungapped_to_pdb function using nucleotide pdb chains.
"""
for k,v in self.test_pdb_chains_1.items():
self.assertEqual(ungapped_to_pdb_numbers(v),\
self.ungapped_to_pdb_1[k])
def test_ungapped_to_pdb_protein(self):
"""Tests for ungapped_to_pdb function using protein pdb chains.
"""
for k,v in self.test_pdb_chains_2.items():
self.assertEqual(ungapped_to_pdb_numbers(v),\
self.ungapped_to_pdb_2[k])
def test_chains_to_seqs_nucleotide(self):
"""Tests for chains_to_seqs function using nucleotide pdb chains.
"""
seqs, seqtypes = chains_to_seqs(self.test_pdb_chains_1)
self.assertEqual(seqs, self.test_pdb_seqs_1)
self.assertEqual(seqtypes, self.test_pdb_types_1)
def test_chains_to_seqs_protein(self):
"""Tests for chains_to_seqs function using protein pdb chains.
"""
seqs, seqtypes = chains_to_seqs(self.test_pdb_chains_2)
self.assertEqual(seqs, self.test_pdb_seqs_2)
self.assertEqual(seqtypes, self.test_pdb_types_2)
def test_get_best_muscle_hits(self):
"""Tests for get_best_muscle_hits function.
"""
if not MUSCLE_PATH:
return 'skipping test'
subject_seq = 'AACCGGUU'
query_aln = {1:'CCCCCCCC',
2:'GGGGGGGG',
3:'AAGGGGUU',
4:'AACCGGGU'}
res_20 = {1:'CCCCCCCC',
2:'GGGGGGGG',
3:'AAGGGGUU',
4:'AACCGGGU'}
res_50 = {3:'AAGGGGUU',
4:'AACCGGGU'}
res_80 = {4:'AACCGGGU'}
res_100 = {}
self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,.2),res_20)
self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,.5),res_50)
self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,.8),res_80)
self.assertEqual(get_best_muscle_hits(subject_seq,query_aln,1),res_100)
def test_get_matching_chains(self):
"""Tests for get_matching_chains function.
"""
if not MUSCLE_PATH:
return 'skipping test'
subject_seq = 'GCGACCCUG'
res_30 = {'A': 'GCCACCCUG', 'B': 'CAGGGUCGGC'}
res_80 = {'A': 'GCCACCCUG'}
res_100 = {}
#Threshold of .3
test_30, ungapped_to_pdb = get_matching_chains(subject_seq, \
TEST_PDB_STRING_1.split('\n'),\
subject_type='Nucleotide',\
threshold=.3)
#Threshold of .8
test_80, ungapped_to_pdb = get_matching_chains(subject_seq, \
TEST_PDB_STRING_1.split('\n'),\
subject_type='Nucleotide',\
threshold=.8)
#Threshold of 1
test_100, ungapped_to_pdb = get_matching_chains(subject_seq, \
TEST_PDB_STRING_1.split('\n'),\
subject_type='Nucleotide',\
threshold=1)
#Incorrect subject_type
#Threshold of .3
test_wrong_subject, ungapped_to_pdb = get_matching_chains(subject_seq, \
TEST_PDB_STRING_1.split('\n'),\
subject_type='Protein',\
threshold=.3)
self.assertEqual(test_30,res_30)
self.assertEqual(test_80,res_80)
self.assertEqual(test_100,res_100)
self.assertEqual(test_wrong_subject,{})
def test_align_subject_to_pdb(self):
"""Tests for align_subject_to_pdb function.
"""
if not MUSCLE_PATH:
return 'skipping test'
subject_seq = 'GCGACCCUG'
pdb_matching = {'A': 'GCCACCCUG', 'B': 'CAGGGUCGGC'}
result = {'A':('GCGACCCUG', 'GCCACCCUG'), \
'B':('GCGACCCUG-', 'CAGGGUCGGC')}
self.assertEqual(align_subject_to_pdb(subject_seq,pdb_matching),result)
def test_make_color_list(self):
"""Tests for make_color_list function.
"""
colors = [(1.0,1.0,1.0),(1.0,0.0,1.0),(.5,.5,.5)]
res = [('color_1',(1.0,1.0,1.0)),\
('color_2',(1.0,0.0,1.0)),\
('color_3',(.5,.5,.5))]
self.assertEqual(make_color_list(colors),res)
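# Sketch only (hypothetical helpers inferred from the setUp fixtures, not
# cogent.format.pdb_color): ungapped_to_pdb_numbers maps each residue's
# 0-based position in a chain to its PDB residue number, and chains_to_seqs
# joins residue names into a sequence. Treating one-letter residue names as
# nucleotides and three-letter names as amino acids is an assumption.
_SKETCH_THREE_TO_ONE = {
    'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C',
    'GLN': 'Q', 'GLU': 'E', 'GLY': 'G', 'HIS': 'H', 'ILE': 'I',
    'LEU': 'L', 'LYS': 'K', 'MET': 'M', 'PHE': 'F', 'PRO': 'P',
    'SER': 'S', 'THR': 'T', 'TRP': 'W', 'TYR': 'Y', 'VAL': 'V'}
def _sketch_ungapped_to_pdb(chain):
    """Map 0-based chain position to PDB residue number."""
    return dict((i, resnum) for i, (resnum, _) in enumerate(chain))
def _sketch_chain_to_seq(chain):
    """Return (sequence, 'Nucleotide' or 'Protein') for a residue list."""
    names = [name for _, name in chain]
    if all(len(name) == 1 for name in names):
        return ''.join(names), 'Nucleotide'
    return ''.join(_SKETCH_THREE_TO_ONE[name] for name in names), 'Protein'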
TEST_PDB_STRING_1 = """
HEADER RIBONUCLEIC ACID 04-JAN-00 1DQH
TITLE CRYSTAL STRUCTURE OF HELIX II OF THE X. LAEVIS SOMATIC 5S
TITLE 2 RRNA WITH A CYTOSINE BULGE IN TWO CONFORMATIONS
CRYST1 32.780 32.780 102.500 90.00 90.00 90.00 P 43 21 2 8
ATOM 1 O5* G A 1 38.612 13.536 39.204 1.00 37.41 O
ATOM 2 C5* G A 1 39.496 12.419 39.356 1.00 34.43 C
ATOM 3 C4* G A 1 38.750 11.165 39.729 1.00 33.33 C
ATOM 4 O4* G A 1 38.129 11.341 41.036 1.00 33.01 O
ATOM 5 C3* G A 1 37.614 10.780 38.799 1.00 33.57 C
ATOM 6 O3* G A 1 38.110 9.968 37.740 1.00 34.48 O
ATOM 7 C2* G A 1 36.722 9.952 39.715 1.00 32.60 C
ATOM 8 O2* G A 1 37.155 8.606 39.837 1.00 31.59 O
ATOM 9 C1* G A 1 36.871 10.693 41.048 1.00 31.18 C
ATOM 10 N9 G A 1 35.863 11.709 41.312 1.00 29.41 N
ATOM 11 C8 G A 1 36.087 13.049 41.490 1.00 28.46 C
ATOM 12 N7 G A 1 34.996 13.725 41.725 1.00 28.77 N
ATOM 13 C5 G A 1 33.986 12.775 41.711 1.00 28.42 C
ATOM 14 C6 G A 1 32.586 12.918 41.929 1.00 28.30 C
ATOM 15 O6 G A 1 31.954 13.943 42.190 1.00 27.86 O
ATOM 16 N1 G A 1 31.921 11.699 41.818 1.00 27.81 N
ATOM 17 C2 G A 1 32.533 10.496 41.534 1.00 27.95 C
ATOM 18 N2 G A 1 31.731 9.428 41.426 1.00 27.06 N
ATOM 19 N3 G A 1 33.844 10.357 41.352 1.00 28.40 N
ATOM 20 C4 G A 1 34.498 11.525 41.450 1.00 28.58 C
ATOM 21 P C A 2 37.584 10.166 36.231 1.00 35.40 P
ATOM 22 O1P C A 2 38.502 9.371 35.370 1.00 37.17 O
ATOM 23 O2P C A 2 37.371 11.599 35.951 1.00 34.58 O
ATOM 24 O5* C A 2 36.188 9.406 36.193 1.00 33.54 O
ATOM 25 C5* C A 2 36.118 7.997 36.285 1.00 32.07 C
ATOM 26 C4* C A 2 34.683 7.567 36.457 1.00 30.69 C
ATOM 27 O4* C A 2 34.156 8.055 37.727 1.00 29.94 O
ATOM 28 C3* C A 2 33.728 8.137 35.431 1.00 30.66 C
ATOM 29 O3* C A 2 33.789 7.366 34.249 1.00 31.59 O
ATOM 30 C2* C A 2 32.391 7.983 36.148 1.00 28.79 C
ATOM 31 O2* C A 2 31.937 6.639 36.197 1.00 29.01 O
ATOM 32 C1* C A 2 32.779 8.394 37.563 1.00 27.55 C
ATOM 33 N1 C A 2 32.625 9.836 37.823 1.00 25.64 N
ATOM 34 C2 C A 2 31.353 10.331 38.113 1.00 24.78 C
ATOM 35 O2 C A 2 30.382 9.563 38.052 1.00 23.25 O
ATOM 36 N3 C A 2 31.210 11.641 38.434 1.00 24.08 N
ATOM 37 C4 C A 2 32.265 12.446 38.446 1.00 23.49 C
ATOM 38 N4 C A 2 32.070 13.718 38.781 1.00 22.55 N
ATOM 39 C5 C A 2 33.573 11.981 38.110 1.00 24.21 C
ATOM 40 C6 C A 2 33.702 10.676 37.806 1.00 24.76 C
ATOM 41 P C A 3 33.724 8.080 32.829 1.00 33.04 P
ATOM 42 O1P C A 3 34.035 6.997 31.842 1.00 33.90 O
ATOM 43 O2P C A 3 34.513 9.331 32.832 1.00 32.17 O
ATOM 44 O5* C A 3 32.184 8.423 32.636 1.00 31.52 O
ATOM 45 C5* C A 3 31.244 7.380 32.516 1.00 29.58 C
ATOM 46 C4* C A 3 29.856 7.885 32.841 1.00 27.91 C
ATOM 47 O4* C A 3 29.809 8.390 34.201 1.00 26.77 O
ATOM 48 C3* C A 3 29.352 9.042 32.005 1.00 28.23 C
ATOM 49 O3* C A 3 28.917 8.548 30.744 1.00 30.13 O
ATOM 50 C2* C A 3 28.212 9.549 32.887 1.00 26.35 C
ATOM 51 O2* C A 3 27.079 8.690 32.854 1.00 23.86 O
ATOM 52 C1* C A 3 28.855 9.454 34.273 1.00 25.79 C
ATOM 53 N1 C A 3 29.575 10.688 34.648 1.00 24.80 N
ATOM 54 C2 C A 3 28.832 11.781 35.132 1.00 24.14 C
ATOM 55 O2 C A 3 27.577 11.677 35.209 1.00 23.22 O
ATOM 56 N3 C A 3 29.482 12.906 35.489 1.00 23.57 N
ATOM 57 C4 C A 3 30.820 12.983 35.357 1.00 23.98 C
ATOM 58 N4 C A 3 31.421 14.109 35.709 1.00 23.48 N
ATOM 59 C5 C A 3 31.596 11.888 34.855 1.00 24.07 C
ATOM 60 C6 C A 3 30.938 10.773 34.525 1.00 24.08 C
ATOM 61 P A A 4 28.825 9.524 29.474 1.00 32.73 P
ATOM 62 O1P A A 4 28.402 8.666 28.346 1.00 34.23 O
ATOM 63 O2P A A 4 30.032 10.358 29.329 1.00 32.79 O
ATOM 64 O5* A A 4 27.664 10.569 29.796 1.00 32.19 O
ATOM 65 C5* A A 4 26.305 10.158 29.955 1.00 30.34 C
ATOM 66 C4* A A 4 25.469 11.306 30.483 1.00 29.48 C
ATOM 67 O4* A A 4 25.962 11.694 31.789 1.00 27.46 O
ATOM 68 C3* A A 4 25.495 12.610 29.694 1.00 29.92 C
ATOM 69 O3* A A 4 24.603 12.575 28.598 1.00 32.61 O
ATOM 70 C2* A A 4 25.037 13.609 30.748 1.00 28.81 C
ATOM 71 O2* A A 4 23.640 13.574 30.987 1.00 29.76 O
ATOM 72 C1* A A 4 25.761 13.085 31.989 1.00 26.51 C
ATOM 73 N9 A A 4 27.073 13.686 32.195 1.00 25.35 N
ATOM 74 C8 A A 4 28.277 13.137 31.857 1.00 24.29 C
ATOM 75 N7 A A 4 29.306 13.870 32.184 1.00 23.88 N
ATOM 76 C5 A A 4 28.741 14.980 32.793 1.00 24.49 C
ATOM 77 C6 A A 4 29.305 16.106 33.385 1.00 23.89 C
ATOM 78 N6 A A 4 30.621 16.294 33.487 1.00 23.21 N
ATOM 79 N1 A A 4 28.468 17.036 33.883 1.00 23.83 N
ATOM 80 C2 A A 4 27.145 16.832 33.779 1.00 24.11 C
ATOM 81 N3 A A 4 26.497 15.805 33.254 1.00 23.77 N
ATOM 82 C4 A A 4 27.365 14.897 32.777 1.00 24.53 C
ATOM 83 P C A 5 24.910 13.447 27.286 1.00 35.53 P
ATOM 84 O1P C A 5 23.998 12.939 26.228 1.00 37.30 O
ATOM 85 O2P C A 5 26.366 13.448 27.050 1.00 36.57 O
ATOM 86 O5* C A 5 24.512 14.941 27.678 1.00 34.43 O
ATOM 87 C5* C A 5 23.191 15.257 28.063 1.00 33.62 C
ATOM 88 C4* C A 5 23.144 16.642 28.633 1.00 33.30 C
ATOM 89 O4* C A 5 23.865 16.674 29.889 1.00 32.25 O
ATOM 90 C3* C A 5 23.834 17.712 27.809 1.00 33.29 C
ATOM 91 O3* C A 5 23.016 18.137 26.731 1.00 35.25 O
ATOM 92 C2* C A 5 24.027 18.781 28.860 1.00 32.42 C
ATOM 93 O2* C A 5 22.811 19.399 29.229 1.00 32.98 O
ATOM 94 C1* C A 5 24.494 17.930 30.040 1.00 31.25 C
ATOM 95 N1 C A 5 25.956 17.721 30.055 1.00 29.62 N
ATOM 96 C2 C A 5 26.739 18.692 30.649 1.00 28.72 C
ATOM 97 O2 C A 5 26.182 19.698 31.113 1.00 29.29 O
ATOM 98 N3 C A 5 28.074 18.527 30.707 1.00 28.00 N
ATOM 99 C4 C A 5 28.632 17.446 30.193 1.00 27.58 C
ATOM 100 N4 C A 5 29.955 17.322 30.311 1.00 27.48 N
ATOM 101 C5 C A 5 27.862 16.439 29.546 1.00 27.81 C
ATOM 102 C6 C A 5 26.532 16.614 29.508 1.00 28.20 C
ATOM 103 P C A 6 23.692 18.693 25.380 1.00 36.34 P
ATOM 104 O1P C A 6 22.530 18.995 24.495 1.00 37.64 O
ATOM 105 O2P C A 6 24.762 17.807 24.897 1.00 35.91 O
ATOM 106 O5* C A 6 24.353 20.071 25.823 1.00 34.59 O
ATOM 107 C5* C A 6 23.528 21.138 26.251 1.00 33.84 C
ATOM 108 C4* C A 6 24.359 22.254 26.818 1.00 33.26 C
ATOM 109 O4* C A 6 25.110 21.734 27.937 1.00 32.57 O
ATOM 110 C3* C A 6 25.439 22.842 25.933 1.00 33.36 C
ATOM 111 O3* C A 6 24.918 23.787 25.011 1.00 34.78 O
ATOM 112 C2* C A 6 26.314 23.522 26.974 1.00 32.29 C
ATOM 113 O2* C A 6 25.695 24.687 27.482 1.00 33.47 O
ATOM 114 C1* C A 6 26.331 22.447 28.060 1.00 31.49 C
ATOM 115 N1 C A 6 27.434 21.495 27.881 1.00 29.26 N
ATOM 116 C2 C A 6 28.720 21.876 28.301 1.00 29.08 C
ATOM 117 O2 C A 6 28.868 22.999 28.797 1.00 29.07 O
ATOM 118 N3 C A 6 29.752 21.012 28.149 1.00 28.04 N
ATOM 119 C4 C A 6 29.534 19.814 27.609 1.00 28.05 C
ATOM 120 N4 C A 6 30.564 18.975 27.495 1.00 28.92 N
ATOM 121 C5 C A 6 28.230 19.407 27.162 1.00 28.53 C
ATOM 122 C6 C A 6 27.226 20.270 27.318 1.00 28.47 C
ATOM 123 P C A 7 25.633 23.980 23.581 1.00 34.67 P
ATOM 124 O1P C A 7 24.754 24.860 22.776 1.00 34.93 O
ATOM 125 O2P C A 7 26.052 22.667 23.033 1.00 33.74 O
ATOM 126 O5* C A 7 26.947 24.804 23.919 1.00 33.62 O
ATOM 127 C5* C A 7 26.834 26.089 24.487 1.00 32.39 C
ATOM 128 C4* C A 7 28.195 26.619 24.841 1.00 32.43 C
ATOM 129 O4* C A 7 28.796 25.780 25.861 1.00 31.40 O
ATOM 130 C3* C A 7 29.216 26.582 23.727 1.00 32.59 C
ATOM 131 O3* C A 7 29.039 27.661 22.832 1.00 33.34 O
ATOM 132 C2* C A 7 30.510 26.706 24.503 1.00 30.81 C
ATOM 133 O2* C A 7 30.730 28.021 24.964 1.00 31.51 O
ATOM 134 C1* C A 7 30.207 25.796 25.698 1.00 30.43 C
ATOM 135 N1 C A 7 30.670 24.427 25.465 1.00 28.25 N
ATOM 136 C2 C A 7 32.021 24.155 25.622 1.00 27.81 C
ATOM 137 O2 C A 7 32.798 25.101 25.915 1.00 26.72 O
ATOM 138 N3 C A 7 32.458 22.883 25.457 1.00 26.97 N
ATOM 139 C4 C A 7 31.591 21.912 25.152 1.00 26.73 C
ATOM 140 N4 C A 7 32.056 20.670 25.049 1.00 25.90 N
ATOM 141 C5 C A 7 30.206 22.176 24.955 1.00 26.90 C
ATOM 142 C6 C A 7 29.796 23.434 25.113 1.00 28.38 C
ATOM 143 P U A 8 29.356 27.448 21.284 1.00 34.70 P
ATOM 144 O1P U A 8 28.951 28.708 20.571 1.00 35.62 O
ATOM 145 O2P U A 8 28.821 26.145 20.836 1.00 34.46 O
ATOM 146 O5* U A 8 30.942 27.354 21.220 1.00 32.95 O
ATOM 147 C5* U A 8 31.739 28.423 21.710 1.00 32.04 C
ATOM 148 C4* U A 8 33.182 28.009 21.784 1.00 32.28 C
ATOM 149 O4* U A 8 33.347 26.969 22.780 1.00 31.32 O
ATOM 150 C3* U A 8 33.778 27.374 20.544 1.00 32.35 C
ATOM 151 O3* U A 8 34.101 28.349 19.563 1.00 34.03 O
ATOM 152 C2* U A 8 35.026 26.728 21.124 1.00 31.88 C
ATOM 153 O2* U A 8 36.029 27.682 21.404 1.00 31.73 O
ATOM 154 C1* U A 8 34.481 26.181 22.445 1.00 31.32 C
ATOM 155 N1 U A 8 34.022 24.800 22.296 1.00 30.13 N
ATOM 156 C2 U A 8 34.954 23.806 22.406 1.00 29.56 C
ATOM 157 O2 U A 8 36.151 24.027 22.587 1.00 30.22 O
ATOM 158 N3 U A 8 34.454 22.540 22.298 1.00 28.90 N
ATOM 159 C4 U A 8 33.147 22.177 22.079 1.00 28.56 C
ATOM 160 O4 U A 8 32.873 20.991 21.974 1.00 28.38 O
ATOM 161 C5 U A 8 32.230 23.269 21.949 1.00 29.20 C
ATOM 162 C6 U A 8 32.691 24.519 22.059 1.00 29.91 C
ATOM 163 P G A 9 34.056 27.954 18.027 1.00 35.48 P
ATOM 164 O1P G A 9 34.253 29.197 17.239 1.00 36.23 O
ATOM 165 O2P G A 9 32.850 27.108 17.775 1.00 36.21 O
ATOM 166 O5* G A 9 35.348 27.048 17.835 1.00 32.95 O
ATOM 167 C5* G A 9 36.637 27.612 18.017 1.00 31.54 C
ATOM 168 C4* G A 9 37.696 26.545 17.935 1.00 29.88 C
ATOM 169 O4* G A 9 37.533 25.628 19.041 1.00 29.65 O
ATOM 170 C3* G A 9 37.696 25.633 16.719 1.00 29.94 C
ATOM 171 O3* G A 9 38.321 26.196 15.566 1.00 30.72 O
ATOM 172 C2* G A 9 38.513 24.452 17.217 1.00 29.09 C
ATOM 173 O2* G A 9 39.906 24.698 17.166 1.00 30.51 O
ATOM 174 C1* G A 9 38.035 24.354 18.672 1.00 28.52 C
ATOM 175 N9 G A 9 36.951 23.383 18.827 1.00 27.70 N
ATOM 176 C8 G A 9 35.600 23.640 18.864 1.00 27.66 C
ATOM 177 N7 G A 9 34.873 22.551 18.981 1.00 27.66 N
ATOM 178 C5 G A 9 35.807 21.518 19.032 1.00 27.08 C
ATOM 179 C6 G A 9 35.622 20.108 19.171 1.00 26.99 C
ATOM 180 O6 G A 9 34.557 19.479 19.295 1.00 27.32 O
ATOM 181 N1 G A 9 36.834 19.430 19.166 1.00 25.88 N
ATOM 182 C2 G A 9 38.067 20.024 19.072 1.00 26.62 C
ATOM 183 N2 G A 9 39.121 19.203 19.094 1.00 26.35 N
ATOM 184 N3 G A 9 38.253 21.334 18.960 1.00 26.19 N
ATOM 185 C4 G A 9 37.092 22.013 18.942 1.00 26.94 C
TER 186 G A 9
ATOM 187 O5* C B 10 37.876 10.866 21.876 1.00 38.18 O
ATOM 188 C5* C B 10 39.087 10.527 21.197 1.00 34.41 C
ATOM 189 C4* C B 10 39.746 11.780 20.669 1.00 34.90 C
ATOM 190 O4* C B 10 38.931 12.392 19.627 1.00 33.24 O
ATOM 191 C3* C B 10 39.904 12.927 21.657 1.00 34.06 C
ATOM 192 O3* C B 10 40.989 12.675 22.550 1.00 35.88 O
ATOM 193 C2* C B 10 40.214 14.067 20.695 1.00 32.72 C
ATOM 194 O2* C B 10 41.499 13.919 20.131 1.00 33.32 O
ATOM 195 C1* C B 10 39.202 13.792 19.582 1.00 31.82 C
ATOM 196 N1 C B 10 37.945 14.536 19.750 1.00 30.40 N
ATOM 197 C2 C B 10 37.944 15.911 19.492 1.00 29.71 C
ATOM 198 O2 C B 10 39.014 16.463 19.179 1.00 29.35 O
ATOM 199 N3 C B 10 36.795 16.602 19.605 1.00 29.04 N
ATOM 200 C4 C B 10 35.677 15.980 19.976 1.00 29.04 C
ATOM 201 N4 C B 10 34.555 16.703 20.064 1.00 28.12 N
ATOM 202 C5 C B 10 35.656 14.584 20.270 1.00 29.17 C
ATOM 203 C6 C B 10 36.802 13.911 20.146 1.00 29.79 C
ATOM 204 P A B 11 40.901 13.173 24.071 1.00 38.04 P
ATOM 205 O1P A B 11 42.040 12.552 24.800 1.00 37.87 O
ATOM 206 O2P A B 11 39.521 13.016 24.582 1.00 36.68 O
ATOM 207 O5* A B 11 41.192 14.729 23.921 1.00 35.30 O
ATOM 208 C5* A B 11 42.473 15.172 23.503 1.00 33.64 C
ATOM 209 C4* A B 11 42.479 16.676 23.341 1.00 31.60 C
ATOM 210 O4* A B 11 41.587 17.051 22.255 1.00 30.20 O
ATOM 211 C3* A B 11 41.947 17.462 24.525 1.00 30.78 C
ATOM 212 O3* A B 11 42.937 17.594 25.538 1.00 30.99 O
ATOM 213 C2* A B 11 41.593 18.790 23.867 1.00 29.71 C
ATOM 214 O2* A B 11 42.736 19.561 23.571 1.00 29.00 O
ATOM 215 C1* A B 11 40.990 18.307 22.547 1.00 29.66 C
ATOM 216 N9 A B 11 39.547 18.127 22.638 1.00 28.32 N
ATOM 217 C8 A B 11 38.844 16.967 22.817 1.00 28.32 C
ATOM 218 N7 A B 11 37.548 17.134 22.849 1.00 28.39 N
ATOM 219 C5 A B 11 37.382 18.501 22.678 1.00 27.65 C
ATOM 220 C6 A B 11 36.236 19.324 22.601 1.00 27.42 C
ATOM 221 N6 A B 11 34.973 18.877 22.692 1.00 27.87 N
ATOM 222 N1 A B 11 36.433 20.651 22.423 1.00 26.83 N
ATOM 223 C2 A B 11 37.687 21.105 22.330 1.00 27.23 C
ATOM 224 N3 A B 11 38.837 20.433 22.378 1.00 27.13 N
ATOM 225 C4 A B 11 38.610 19.123 22.553 1.00 27.90 C
ATOM 226 P G B 12 42.493 17.715 27.072 1.00 32.74 P
ATOM 227 O1P G B 12 43.728 17.509 27.884 1.00 34.58 O
ATOM 228 O2P G B 12 41.277 16.922 27.381 1.00 32.04 O
ATOM 229 O5* G B 12 42.027 19.234 27.207 1.00 30.99 O
ATOM 230 C5* G B 12 42.962 20.289 27.056 1.00 30.37 C
ATOM 231 C4* G B 12 42.254 21.614 27.115 1.00 29.14 C
ATOM 232 O4* G B 12 41.457 21.794 25.913 1.00 28.24 O
ATOM 233 C3* G B 12 41.237 21.787 28.229 1.00 28.75 C
ATOM 234 O3* G B 12 41.830 22.073 29.500 1.00 30.19 O
ATOM 235 C2* G B 12 40.436 22.958 27.690 1.00 27.60 C
ATOM 236 O2* G B 12 41.166 24.177 27.802 1.00 27.01 O
ATOM 237 C1* G B 12 40.309 22.557 26.218 1.00 27.19 C
ATOM 238 N9 G B 12 39.139 21.706 26.021 1.00 26.10 N
ATOM 239 C8 G B 12 39.095 20.333 25.971 1.00 25.61 C
ATOM 240 N7 G B 12 37.878 19.869 25.854 1.00 25.39 N
ATOM 241 C5 G B 12 37.080 21.003 25.805 1.00 24.51 C
ATOM 242 C6 G B 12 35.677 21.130 25.709 1.00 25.12 C
ATOM 243 O6 G B 12 34.819 20.224 25.646 1.00 24.39 O
ATOM 244 N1 G B 12 35.282 22.463 25.710 1.00 23.14 N
ATOM 245 C2 G B 12 36.136 23.544 25.801 1.00 25.12 C
ATOM 246 N2 G B 12 35.566 24.781 25.809 1.00 23.32 N
ATOM 247 N3 G B 12 37.447 23.430 25.888 1.00 23.89 N
ATOM 248 C4 G B 12 37.847 22.143 25.892 1.00 24.92 C
ATOM 249 P G B 13 41.050 21.646 30.847 1.00 31.79 P
ATOM 250 O1P G B 13 41.913 21.970 32.013 1.00 33.13 O
ATOM 251 O2P G B 13 40.502 20.266 30.697 1.00 30.96 O
ATOM 252 O5* G B 13 39.754 22.570 30.921 1.00 30.06 O
ATOM 253 C5* G B 13 39.862 23.958 31.171 1.00 29.46 C
ATOM 254 C4* G B 13 38.543 24.649 30.914 1.00 28.17 C
ATOM 255 O4* G B 13 38.049 24.334 29.586 1.00 27.88 O
ATOM 256 C3* G B 13 37.356 24.322 31.810 1.00 27.17 C
ATOM 257 O3* G B 13 37.506 24.978 33.065 1.00 27.57 O
ATOM 258 C2* G B 13 36.240 24.941 30.983 1.00 27.19 C
ATOM 259 O2* G B 13 36.251 26.358 31.018 1.00 26.53 O
ATOM 260 C1* G B 13 36.629 24.474 29.573 1.00 26.87 C
ATOM 261 N9 G B 13 36.031 23.162 29.330 1.00 26.99 N
ATOM 262 C8 G B 13 36.660 21.949 29.286 1.00 25.89 C
ATOM 263 N7 G B 13 35.829 20.951 29.099 1.00 26.13 N
ATOM 264 C5 G B 13 34.583 21.549 28.997 1.00 25.76 C
ATOM 265 C6 G B 13 33.304 20.970 28.786 1.00 25.44 C
ATOM 266 O6 G B 13 33.020 19.783 28.643 1.00 26.51 O
ATOM 267 N1 G B 13 32.308 21.927 28.756 1.00 26.07 N
ATOM 268 C2 G B 13 32.506 23.276 28.917 1.00 26.27 C
ATOM 269 N2 G B 13 31.394 24.054 28.887 1.00 26.49 N
ATOM 270 N3 G B 13 33.702 23.829 29.104 1.00 25.76 N
ATOM 271 C4 G B 13 34.683 22.911 29.131 1.00 25.78 C
ATOM 272 P G B 14 36.688 24.480 34.364 1.00 28.04 P
ATOM 273 O1P G B 14 37.232 25.265 35.498 1.00 28.24 O
ATOM 274 O2P G B 14 36.700 23.014 34.418 1.00 27.78 O
ATOM 275 O5* G B 14 35.189 24.948 34.089 1.00 26.64 O
ATOM 276 C5* G B 14 34.821 26.309 34.234 1.00 26.48 C
ATOM 277 C4* G B 14 33.329 26.470 34.021 1.00 26.73 C
ATOM 278 O4* G B 14 32.973 26.023 32.691 1.00 27.34 O
ATOM 279 C3* G B 14 32.455 25.631 34.927 1.00 26.32 C
ATOM 280 O3* G B 14 32.289 26.283 36.163 1.00 25.18 O
ATOM 281 C2* G B 14 31.150 25.594 34.155 1.00 26.28 C
ATOM 282 O2* G B 14 30.462 26.832 34.292 1.00 28.24 O
ATOM 283 C1* G B 14 31.685 25.398 32.734 1.00 27.44 C
ATOM 284 N9 G B 14 31.902 23.979 32.496 1.00 26.19 N
ATOM 285 C8 G B 14 33.102 23.325 32.481 1.00 25.54 C
ATOM 286 N7 G B 14 32.989 22.052 32.226 1.00 25.02 N
ATOM 287 C5 G B 14 31.622 21.853 32.065 1.00 24.97 C
ATOM 288 C6 G B 14 30.885 20.664 31.768 1.00 25.65 C
ATOM 289 O6 G B 14 31.314 19.514 31.556 1.00 24.40 O
ATOM 290 N1 G B 14 29.512 20.911 31.724 1.00 23.81 N
ATOM 291 C2 G B 14 28.927 22.133 31.942 1.00 25.97 C
ATOM 292 N2 G B 14 27.586 22.167 31.875 1.00 26.21 N
ATOM 293 N3 G B 14 29.603 23.248 32.211 1.00 25.92 N
ATOM 294 C4 G B 14 30.936 23.030 32.250 1.00 25.92 C
ATOM 295 P U B 15 32.149 25.416 37.480 1.00 25.35 P
ATOM 296 O1P U B 15 32.057 26.334 38.616 1.00 27.54 O
ATOM 297 O2P U B 15 33.208 24.374 37.459 1.00 26.82 O
ATOM 298 O5* U B 15 30.740 24.685 37.321 1.00 24.63 O
ATOM 299 C5* U B 15 29.549 25.454 37.213 1.00 25.46 C
ATOM 300 C4* U B 15 28.349 24.546 37.085 1.00 24.23 C
ATOM 301 O4* U B 15 28.377 23.938 35.774 1.00 23.55 O
ATOM 302 C3* U B 15 28.321 23.355 38.026 1.00 24.30 C
ATOM 303 O3* U B 15 27.841 23.737 39.309 1.00 25.05 O
ATOM 304 C2* U B 15 27.338 22.453 37.289 1.00 23.86 C
ATOM 305 O2* U B 15 25.979 22.899 37.359 1.00 23.23 O
ATOM 306 C1* U B 15 27.826 22.628 35.851 1.00 24.23 C
ATOM 307 N1 U B 15 28.871 21.650 35.545 1.00 23.38 N
ATOM 308 C2 U B 15 28.448 20.398 35.155 1.00 24.01 C
ATOM 309 O2 U B 15 27.270 20.112 35.029 1.00 22.67 O
ATOM 310 N3 U B 15 29.458 19.491 34.928 1.00 24.00 N
ATOM 311 C4 U B 15 30.813 19.710 35.044 1.00 24.09 C
ATOM 312 O4 U B 15 31.590 18.793 34.769 1.00 24.56 O
ATOM 313 C5 U B 15 31.178 21.033 35.445 1.00 24.19 C
ATOM 314 C6 U B 15 30.217 21.944 35.673 1.00 23.69 C
ATOM 315 P C B 16 28.396 23.004 40.622 1.00 24.91 P
ATOM 316 O1P C B 16 29.881 23.085 40.622 1.00 24.84 O
ATOM 317 O2P C B 16 27.730 21.661 40.668 1.00 24.89 O
ATOM 318 O5* C B 16 27.946 23.943 41.822 1.00 25.30 O
ATOM 319 C5* C B 16 26.884 23.591 42.712 1.00 25.79 C
ATOM 320 C4* C B 16 25.915 24.750 42.836 1.00 25.83 C
ATOM 321 O4* C B 16 26.638 25.986 43.088 1.00 25.96 O
ATOM 322 C3* C B 16 25.135 25.023 41.571 1.00 25.48 C
ATOM 323 O3* C B 16 23.978 24.188 41.595 1.00 24.15 O
ATOM 324 C2* C B 16 24.736 26.489 41.738 1.00 25.72 C
ATOM 325 O2* C B 16 23.575 26.576 42.539 1.00 25.74 O
ATOM 326 C1* C B 16 25.947 27.061 42.477 1.00 25.87 C
ATOM 327 N1 C B 16 26.880 27.811 41.611 1.00 26.24 N
ATOM 328 C2 C B 16 26.585 29.152 41.350 1.00 26.56 C
ATOM 329 O2 C B 16 25.575 29.642 41.883 1.00 25.85 O
ATOM 330 N3 C B 16 27.407 29.874 40.549 1.00 25.78 N
ATOM 331 C4 C B 16 28.502 29.306 40.033 1.00 27.00 C
ATOM 332 N4 C B 16 29.310 30.069 39.261 1.00 26.12 N
ATOM 333 C5 C B 16 28.832 27.943 40.289 1.00 26.61 C
ATOM 334 C6 C B 16 27.998 27.237 41.079 1.00 26.28 C
ATOM 335 P G B 17 23.384 23.642 40.245 1.00 23.95 P
ATOM 336 O1P G B 17 23.436 24.574 39.098 1.00 25.14 O
ATOM 337 O2P G B 17 22.082 23.017 40.615 1.00 27.22 O
ATOM 338 O5* G B 17 24.421 22.475 39.840 1.00 26.25 O
ATOM 339 C5* G B 17 24.386 21.204 40.495 1.00 26.12 C
ATOM 340 C4* G B 17 23.742 20.170 39.592 1.00 24.76 C
ATOM 341 O4* G B 17 24.528 20.039 38.380 1.00 24.76 O
ATOM 342 C3* G B 17 23.732 18.759 40.174 1.00 25.57 C
ATOM 343 O3* G B 17 22.581 18.586 40.985 1.00 25.04 O
ATOM 344 C2* G B 17 23.674 17.891 38.931 1.00 23.99 C
ATOM 345 O2* G B 17 22.367 17.810 38.389 1.00 26.46 O
ATOM 346 C1* G B 17 24.591 18.671 37.983 1.00 24.16 C
ATOM 347 N9 G B 17 26.001 18.255 38.013 1.00 22.61 N
ATOM 348 C8 G B 17 27.064 18.970 38.508 1.00 22.62 C
ATOM 349 N7 G B 17 28.216 18.388 38.321 1.00 22.00 N
ATOM 350 C5 G B 17 27.898 17.197 37.680 1.00 22.60 C
ATOM 351 C6 G B 17 28.740 16.163 37.189 1.00 22.09 C
ATOM 352 O6 G B 17 29.987 16.125 37.154 1.00 22.31 O
ATOM 353 N1 G B 17 28.003 15.117 36.663 1.00 20.99 N
ATOM 354 C2 G B 17 26.634 15.088 36.560 1.00 21.61 C
ATOM 355 N2 G B 17 26.130 13.961 36.055 1.00 21.54 N
ATOM 356 N3 G B 17 25.834 16.080 36.938 1.00 20.35 N
ATOM 357 C4 G B 17 26.525 17.085 37.502 1.00 22.36 C
ATOM 358 P G B 18 22.687 17.714 42.302 1.00 26.50 P
ATOM 359 O1P G B 18 21.410 17.993 43.015 1.00 27.27 O
ATOM 360 O2P G B 18 23.973 17.912 42.983 1.00 23.81 O
ATOM 361 O5* G B 18 22.672 16.216 41.755 1.00 23.89 O
ATOM 362 C5* G B 18 21.485 15.647 41.200 1.00 25.00 C
ATOM 363 C4* G B 18 21.782 14.282 40.588 1.00 24.30 C
ATOM 364 O4* G B 18 22.684 14.431 39.468 1.00 22.31 O
ATOM 365 C3* G B 18 22.476 13.275 41.487 1.00 24.94 C
ATOM 366 O3* G B 18 21.499 12.593 42.250 1.00 26.00 O
ATOM 367 C2* G B 18 23.118 12.339 40.468 1.00 22.26 C
ATOM 368 O2* G B 18 22.152 11.549 39.777 1.00 22.69 O
ATOM 369 C1* G B 18 23.616 13.361 39.455 1.00 22.64 C
ATOM 370 N9 G B 18 24.946 13.869 39.792 1.00 20.90 N
ATOM 371 C8 G B 18 25.300 15.055 40.391 1.00 21.43 C
ATOM 372 N7 G B 18 26.600 15.220 40.446 1.00 20.70 N
ATOM 373 C5 G B 18 27.116 14.070 39.870 1.00 20.39 C
ATOM 374 C6 G B 18 28.475 13.671 39.631 1.00 21.70 C
ATOM 375 O6 G B 18 29.522 14.295 39.877 1.00 21.10 O
ATOM 376 N1 G B 18 28.545 12.408 39.051 1.00 22.16 N
ATOM 377 C2 G B 18 27.475 11.625 38.743 1.00 21.99 C
ATOM 378 N2 G B 18 27.770 10.418 38.218 1.00 23.07 N
ATOM 379 N3 G B 18 26.205 11.986 38.939 1.00 21.12 N
ATOM 380 C4 G B 18 26.110 13.214 39.499 1.00 21.28 C
ATOM 381 P C B 19 21.883 12.026 43.669 1.00 27.44 P
ATOM 382 O1P C B 19 20.671 11.313 44.167 1.00 28.17 O
ATOM 383 O2P C B 19 22.528 13.044 44.526 1.00 26.74 O
ATOM 384 O5* C B 19 23.028 10.969 43.369 1.00 27.50 O
ATOM 385 C5* C B 19 22.755 9.790 42.650 1.00 28.82 C
ATOM 386 C4* C B 19 24.029 8.991 42.502 1.00 29.58 C
ATOM 387 O4* C B 19 24.980 9.694 41.662 1.00 29.75 O
ATOM 388 C3* C B 19 24.790 8.765 43.794 1.00 29.78 C
ATOM 389 O3* C B 19 24.156 7.716 44.580 1.00 32.33 O
ATOM 390 C2* C B 19 26.182 8.418 43.280 1.00 29.77 C
ATOM 391 O2* C B 19 26.195 7.085 42.797 1.00 31.42 O
ATOM 392 C1* C B 19 26.302 9.364 42.073 1.00 28.44 C
ATOM 393 N1 C B 19 27.024 10.602 42.393 1.00 27.04 N
ATOM 394 C2 C B 19 28.387 10.641 42.164 1.00 26.22 C
ATOM 395 O2 C B 19 28.935 9.617 41.731 1.00 25.20 O
ATOM 396 N3 C B 19 29.078 11.788 42.418 1.00 25.55 N
ATOM 397 C4 C B 19 28.436 12.863 42.885 1.00 25.98 C
ATOM 398 N4 C B 19 29.141 14.002 43.089 1.00 25.03 N
ATOM 399 C5 C B 19 27.039 12.836 43.162 1.00 26.00 C
ATOM 400 C6 C B 19 26.375 11.697 42.903 1.00 26.97 C
TER 401 C B 19
MASTER 238 0 0 0 0 0 0 6 456 2 0 2
END
"""
TEST_PDB_STRING_2 = """
HEADER ANTIOXIDANT ENZYME 06-NOV-00 1HD2
TITLE HUMAN PEROXIREDOXIN 5
ATOM 1 N ALA A 1 -7.101 53.135 16.957 1.00 88.42 N
ANISOU 1 N ALA A 1 12714 7605 13277 523 -2633 3491 N
ATOM 2 CA ALA A 1 -8.014 52.075 17.450 1.00 63.39 C
ANISOU 2 CA ALA A 1 8225 7477 8383 2990 -789 1435 C
ATOM 3 C ALA A 1 -7.241 50.817 17.757 1.00 46.53 C
ANISOU 3 C ALA A 1 5793 6074 5811 2042 989 16 C
ATOM 4 O ALA A 1 -6.073 50.698 17.346 1.00 53.54 O
ANISOU 4 O ALA A 1 5678 8269 6398 1327 1320 1080 O
ATOM 5 CB ALA A 1 -9.119 51.791 16.443 1.00 77.54 C
ANISOU 5 CB ALA A 1 9616 10505 9342 2125 -2228 2932 C
ATOM 6 N PRO A 2 -7.796 49.873 18.488 1.00 34.26 N
ANISOU 6 N PRO A 2 4045 4756 4215 1116 109 -2030 N
ATOM 7 CA PRO A 2 -6.966 48.670 18.750 1.00 29.30 C
ANISOU 7 CA PRO A 2 3413 4041 3677 360 -116 -2345 C
ATOM 8 C PRO A 2 -6.707 47.922 17.451 1.00 23.66 C
ANISOU 8 C PRO A 2 1982 4034 2972 261 -662 -1762 C
ATOM 9 O PRO A 2 -7.549 47.657 16.601 1.00 23.73 O
ANISOU 9 O PRO A 2 1855 3960 3202 209 -836 -1320 O
ATOM 10 CB PRO A 2 -7.774 47.860 19.732 1.00 30.97 C
ANISOU 10 CB PRO A 2 3871 4866 3032 146 -165 -2287 C
ATOM 11 CG PRO A 2 -8.779 48.807 20.281 1.00 35.77 C
ANISOU 11 CG PRO A 2 3448 6110 4032 639 23 -2099 C
ATOM 12 CD PRO A 2 -9.080 49.777 19.155 1.00 36.34 C
ANISOU 12 CD PRO A 2 3188 6479 4142 982 -759 -2220 C
ATOM 13 N ILE A 3 -5.409 47.566 17.357 1.00 19.45 N
ANISOU 13 N ILE A 3 1760 3495 2134 -75 -484 -1027 N
ATOM 14 CA ILE A 3 -5.040 46.822 16.152 1.00 18.02 C
ANISOU 14 CA ILE A 3 1703 3171 1973 10 -766 -936 C
ATOM 15 C ILE A 3 -5.689 45.452 16.192 1.00 18.71 C
ANISOU 15 C ILE A 3 1903 3233 1974 -124 -474 -787 C
ATOM 16 O ILE A 3 -5.915 44.901 17.260 1.00 20.34 O
ANISOU 16 O ILE A 3 2262 3409 2057 315 -565 -507 O
ATOM 17 CB ILE A 3 -3.513 46.734 16.025 1.00 17.09 C
ANISOU 17 CB ILE A 3 1740 2849 1903 104 -657 -844 C
ATOM 18 CG1 ILE A 3 -3.034 46.332 14.628 1.00 20.89 C
ANISOU 18 CG1 ILE A 3 2079 3804 2056 -100 -461 -1173 C
ATOM 19 CG2 ILE A 3 -2.939 45.866 17.110 1.00 18.41 C
ANISOU 19 CG2 ILE A 3 1664 3016 2316 -146 -718 -494 C
ATOM 20 CD1 ILE A 3 -1.553 46.546 14.371 1.00 22.47 C
ANISOU 20 CD1 ILE A 3 2492 3346 2701 -814 225 -681 C
ATOM 21 N LYS A 4 -6.016 44.945 14.979 1.00 18.15 N
ANISOU 21 N LYS A 4 2109 2759 2028 -146 -425 -709 N
ATOM 22 CA LYS A 4 -6.669 43.672 14.871 1.00 19.21 C
ANISOU 22 CA LYS A 4 1734 2945 2619 -166 -240 -1004 C
ATOM 23 C LYS A 4 -6.136 42.905 13.666 1.00 18.35 C
ANISOU 23 C LYS A 4 1621 2920 2432 -469 -242 -951 C
ATOM 24 O LYS A 4 -5.523 43.516 12.787 1.00 18.36 O
ANISOU 24 O LYS A 4 1727 2866 2383 -290 -380 -717 O
ATOM 25 CB LYS A 4 -8.179 43.880 14.717 1.00 22.82 C
ANISOU 25 CB LYS A 4 1797 3115 3760 78 -321 -1132 C
ATOM 26 CG LYS A 4 -8.515 44.601 13.433 1.00 36.33 C
ANISOU 26 CG LYS A 4 3420 5560 4825 996 -1411 -178 C
ATOM 27 CD LYS A 4 -9.973 44.994 13.306 1.00 47.94 C
ANISOU 27 CD LYS A 4 3683 7877 6655 1611 -1898 19 C
ATOM 28 CE LYS A 4 -10.268 45.683 11.970 1.00 55.24 C
ANISOU 28 CE LYS A 4 4630 8886 7472 2301 -2486 712 C
ATOM 29 NZ LYS A 4 -9.697 47.057 11.880 1.00 64.74 N
ANISOU 29 NZ LYS A 4 6567 9962 8070 776 -3006 2044 N
ATOM 30 N VAL A 5 -6.390 41.610 13.638 1.00 17.46 N
ANISOU 30 N VAL A 5 1965 2729 1939 -44 -514 -643 N
ATOM 31 CA VAL A 5 -6.062 40.803 12.447 1.00 17.65 C
ANISOU 31 CA VAL A 5 2357 2607 1741 11 -466 -478 C
ATOM 32 C VAL A 5 -6.706 41.431 11.221 1.00 16.38 C
ANISOU 32 C VAL A 5 1801 2477 1947 27 -578 -599 C
ATOM 33 O VAL A 5 -7.842 41.860 11.225 1.00 20.73 O
ANISOU 33 O VAL A 5 1855 3251 2769 339 -647 -999 O
ATOM 34 CB VAL A 5 -6.540 39.355 12.621 1.00 18.16 C
ANISOU 34 CB VAL A 5 2548 2687 1666 -199 -700 -506 C
ATOM 35 CG1 VAL A 5 -6.490 38.556 11.331 1.00 20.68 C
ANISOU 35 CG1 VAL A 5 3654 2669 1532 -286 -788 -390 C
ATOM 36 CG2 VAL A 5 -5.643 38.711 13.693 1.00 21.32 C
ANISOU 36 CG2 VAL A 5 3412 3031 1657 -320 -951 -136 C
ATOM 37 N GLY A 6 -5.884 41.470 10.169 1.00 17.57 N
ANISOU 37 N GLY A 6 1818 3021 1838 -5 -684 -127 N
ATOM 38 CA GLY A 6 -6.293 42.101 8.926 1.00 17.93 C
ANISOU 38 CA GLY A 6 2091 2771 1951 266 -1009 -287 C
ATOM 39 C GLY A 6 -5.787 43.509 8.756 1.00 19.23 C
ANISOU 39 C GLY A 6 2774 2651 1880 339 -804 -321 C
ATOM 40 O GLY A 6 -5.730 44.041 7.631 1.00 18.94 O
ANISOU 40 O GLY A 6 2552 2787 1858 397 -657 -317 O
ATOM 41 N ASP A 7 -5.412 44.192 9.821 1.00 16.89 N
ANISOU 41 N ASP A 7 2075 2493 1851 359 -716 -218 N
ATOM 42 CA ASP A 7 -4.884 45.527 9.687 1.00 17.92 C
ANISOU 42 CA ASP A 7 2197 2589 2023 267 -381 -346 C
ATOM 43 C ASP A 7 -3.441 45.489 9.193 1.00 15.85 C
ANISOU 43 C ASP A 7 2048 2223 1750 430 -734 -281 C
ATOM 44 O ASP A 7 -2.691 44.572 9.409 1.00 18.38 O
ANISOU 44 O ASP A 7 2451 2196 2337 656 -1042 -611 O
ATOM 45 CB ASP A 7 -4.884 46.242 11.037 1.00 19.75 C
ANISOU 45 CB ASP A 7 2401 2759 2345 469 -423 -702 C
ATOM 46 CG ASP A 7 -6.256 46.558 11.580 1.00 19.89 C
ANISOU 46 CG ASP A 7 2495 3124 1940 591 -278 -147 C
ATOM 47 OD1 ASP A 7 -7.246 46.486 10.836 1.00 22.27 O
ANISOU 47 OD1 ASP A 7 2413 3699 2348 528 -394 -258 O
ATOM 48 OD2 ASP A 7 -6.286 46.895 12.814 1.00 23.63 O
ANISOU 48 OD2 ASP A 7 3176 3849 1952 665 -161 -245 O
ATOM 49 N ALA A 8 -3.094 46.594 8.534 1.00 16.73 N
ANISOU 49 N ALA A 8 2285 2228 1845 255 -469 -376 N
ATOM 50 CA ALA A 8 -1.686 46.796 8.224 1.00 19.26 C
ANISOU 50 CA ALA A 8 2282 3100 1936 119 -555 -155 C
ATOM 51 C ALA A 8 -0.940 47.209 9.477 1.00 18.08 C
ANISOU 51 C ALA A 8 2249 2719 1900 181 -453 -242 C
ATOM 52 O ALA A 8 -1.418 47.960 10.308 1.00 20.61 O
ANISOU 52 O ALA A 8 2470 3061 2299 280 -308 -530 O
ATOM 53 CB ALA A 8 -1.558 47.881 7.175 1.00 22.45 C
ANISOU 53 CB ALA A 8 2906 3904 1718 16 -78 87 C
MASTER 245 0 6 6 7 0 3 6 1429 1 9 13
END
"""
#run if called from command-line
if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_format/test_stockholm.py
#!/usr/bin/env python
"""Tests for Stockholm sequence format writer.
"""
from cogent.util.unit_test import TestCase, main
from cogent.format.stockholm import stockholm_from_alignment
from cogent.core.alignment import Alignment
from cogent.core.sequence import Sequence
from cogent.core.info import Info
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"

class StockholmTests(TestCase):
    """Tests for Stockholm writer.
    """

    def setUp(self):
        """Setup for Stockholm tests."""
        self.unaligned_dict = {'1st':'AAA','2nd':'CCCC','3rd':'GGGG',
            '4th':'UUUU'}
        self.alignment_dict = {'1st':'AAAA','2nd':'CCCC','3rd':'GGGG',
            '4th':'UUUU'}
        # Create an alignment and change its row order.
        self.alignment_object = Alignment(self.alignment_dict)
        self.alignment_order = ['2nd','4th','3rd','1st']
        self.alignment_object.RowOrder=self.alignment_order
        self.gc_annotation = {'SS_cons':'....'}
        self.stockholm_with_label=\
"""# STOCKHOLM 1.0
1st AAAA
2nd CCCC
3rd GGGG
4th UUUU
//"""
        self.stockholm_with_label_lw2=\
"""# STOCKHOLM 1.0
1st AA
2nd CC
3rd GG
4th UU
1st AA
2nd CC
3rd GG
4th UU
//"""
        self.stockholm_with_label_struct=\
"""# STOCKHOLM 1.0
1st AAAA
2nd CCCC
3rd GGGG
4th UUUU
#=GC SS_cons ....
//"""
        self.stockholm_with_label_struct_lw2=\
"""# STOCKHOLM 1.0
1st AA
2nd CC
3rd GG
4th UU
#=GC SS_cons ..
1st AA
2nd CC
3rd GG
4th UU
#=GC SS_cons ..
//"""
        self.stockholm_with_label_reordered=\
"""# STOCKHOLM 1.0
2nd CCCC
4th UUUU
3rd GGGG
1st AAAA
//"""
        self.stockholm_with_label_lw2_reordered=\
"""# STOCKHOLM 1.0
2nd CC
4th UU
3rd GG
1st AA
2nd CC
4th UU
3rd GG
1st AA
//"""

    def test_stockholm_from_alignment_unaligned(self):
        """should raise error with unaligned seqs."""
        self.assertRaises(ValueError,
            stockholm_from_alignment, self.unaligned_dict)

    def test_stockholm_from_alignment(self):
        """should return correct stockholm string."""
        self.assertEqual(stockholm_from_alignment({}), '')
        self.assertEqual(stockholm_from_alignment(self.alignment_dict),
            self.stockholm_with_label)
        self.assertEqual(stockholm_from_alignment(self.alignment_dict,
            interleave_len=2), self.stockholm_with_label_lw2)

    def test_stockholm_from_alignment_struct(self):
        """should return correct stockholm string."""
        self.assertEqual(stockholm_from_alignment({},
            GC_annotation=self.gc_annotation), '')
        self.assertEqual(stockholm_from_alignment(self.alignment_dict,
            GC_annotation=self.gc_annotation),
            self.stockholm_with_label_struct)
        self.assertEqual(stockholm_from_alignment(self.alignment_dict,
            GC_annotation=self.gc_annotation,
            interleave_len=2), self.stockholm_with_label_struct_lw2)

    def test_stockholm_from_alignment_reordered(self):
        """should return correct stockholm string."""
        self.assertEqual(stockholm_from_alignment(self.alignment_object),
            self.stockholm_with_label_reordered)
        self.assertEqual(stockholm_from_alignment(self.alignment_object,
            interleave_len=2), self.stockholm_with_label_lw2_reordered)

if __name__ == "__main__":
    main()
PyCogent-1.5.3/tests/test_format/test_xyzrn.py
#!/usr/bin/env python
import os, tempfile
from unittest import main
from cogent.util.unit_test import TestCase
from cogent import FromFilenameStructureParser
from cogent.struct.selection import einput
from cogent.format import xyzrn
__author__ = "Marcin Cieslik"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__contributors__ = ["Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Marcin Cieslik"
__email__ = "mpc4p@virginia.edu"
__status__ = "Development"

class XyzrnTest(TestCase):
    """Tests conversion of PDB files into the informal xyzrn format."""

    def setUp(self):
        self.structure = FromFilenameStructureParser('data/1A1X.pdb')
        self.residues = einput(self.structure, 'R')
        self.atoms = einput(self.structure, 'A')
        self.residue8 = self.residues.values()[8]
        self.atom17 = self.atoms.values()[17]
        self.atom23 = self.atoms.values()[23]

    def test_write_atom(self):
        # Write a single atom to a temporary file.
        fd, fn = tempfile.mkstemp()
        os.close(fd)
        handle = open(fn, 'wb')
        xyzrn.XYZRNWriter(handle, [self.atom17])
        handle.close()
        # Read back the first four fields (x, y, z, radius), then clean up.
        handle = open(fn, 'rb')
        coords_radius = [float(n) for n in handle.read().split()[:4]]
        handle.close()
        os.remove(fn)
        self.atom17.setRadius()
        radius = self.atom17.getRadius()
        self.assertFloatEqualRel(self.atom17.coords, coords_radius[:3])
        self.assertFloatEqualRel(radius, coords_radius[3])

if __name__ == '__main__':
    main()
PyCogent-1.5.3/tests/test_evolve/__init__.py
#!/usr/bin/env python
__all__ = ["test_best_likelihood",
           "test_bootstrap",
           "test_likelihood_function",
           "test_motifchange",
           "test_parameter_controller",
           "test_scale_rules",
           "test_simulation",
           "test_substitution_model",
           "test_coevolution",
           "test_models"]
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell","Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
PyCogent-1.5.3/tests/test_evolve/test_best_likelihood.py
#!/usr/bin/env python
from cogent.util.unit_test import TestCase, main
from cogent import LoadSeqs, DNA
from cogent.evolve.best_likelihood import aligned_columns_to_rows, count_column_freqs, get_ML_probs, \
get_G93_lnL_from_array, BestLogLikelihood, _transpose, _take
import math
__author__ = "Helen Lindsay"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Helen Lindsay"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Helen Lindsay"
__email__ = "helen.lindsay@anu.edu.au"
__status__ = "Production"
IUPAC_DNA_ambiguities = 'NRYWSKMBDHV'
def makeSampleAlignment(gaps = False, ambiguities = False):
if gaps:
seqs_list = ['AAA--CTTTGG-T','CCCCC-TATG-GT','-AACCCTTTGGGT']
elif ambiguities:
seqs_list = ['AARNCCTTTGGC','CCNYCCTTTGSG','CAACCCTGWGGG']
else:
seqs_list = ['AAACCCGGGTTTA','CCCGGGTTTAAAC','GGGTTTAAACCCG']
seqs = zip('abc', seqs_list)
return LoadSeqs(data = seqs)
class TestGoldman93(TestCase):
def setUp(self):
self.aln = makeSampleAlignment()
self.gapped_aln = makeSampleAlignment(gaps = True)
self.ambig_aln = makeSampleAlignment(ambiguities = True)
def test_aligned_columns_to_rows(self):
obs = aligned_columns_to_rows(self.aln[:-1], 3)
expect = [['AAA','CCC','GGG'],['CCC','GGG','TTT'],
['GGG','TTT','AAA'], ['TTT','AAA','CCC']]
assert obs == expect, (obs, expect)
obs = aligned_columns_to_rows(self.aln, 1)
expect = [['A','C','G'],['A','C','G'],['A','C','G'],
['C','G','T'],['C','G','T'],['C','G','T'],
['G','T','A'],['G','T','A'],['G','T','A'],
['T','A','C'],['T','A','C'],['T','A','C'],
['A','C','G']]
self.assertEqual(obs, expect)
obs = aligned_columns_to_rows(self.gapped_aln[:-1], 3, allowed_chars='ACGT')
expect = [['TTT','TAT','TTT']]
self.assertEqual(obs, expect)
obs = aligned_columns_to_rows(self.ambig_aln, 2, exclude_chars=IUPAC_DNA_ambiguities)
expect = [['AA','CC','CA'],['CC','CC','CC'],['TT','TT','TG']]
self.assertEqual(obs, expect)
def test_count_column_freqs(self):
columns = aligned_columns_to_rows(self.aln, 1)
obs = count_column_freqs(columns)
expect = {'A C G' : 4, 'C G T' : 3, 'G T A' : 3, 'T A C' : 3}
self.assertEqual(obs, expect)
columns = aligned_columns_to_rows(self.aln[:-1], 2)
obs = count_column_freqs(columns)
expect = {'AA CC GG': 1, 'AC CG GT': 1, 'CC GG TT':1, 'GG TT AA':1,
'GT TA AC':1, 'TT AA CC':1}
self.assertEqual(obs, expect)
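`count_column_freqs` keys each row by its motifs joined with single spaces (for example `'A C G'`) and counts repeats. A sketch of that counting step with `collections.Counter`, assuming rows are lists of strings; `count_patterns_sketch` is a hypothetical helper.

```python
from collections import Counter


def count_patterns_sketch(rows):
    """Count identical rows, keyed by the row's motifs joined with spaces.

    Hypothetical helper; the key format ('A C G') matches the expected
    dictionaries in test_count_column_freqs.
    """
    return dict(Counter(" ".join(row) for row in rows))


seqs = ["AAACCCGGGTTTA", "CCCGGGTTTAAAC", "GGGTTTAAACCCG"]
# single-column rows: transpose the alignment with zip
rows = [list(column) for column in zip(*seqs)]
print(count_patterns_sketch(rows))
```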
def test__transpose(self):
"""test transposing an array"""
a = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]
e = [[0,3,6,9],[1,4,7,10],[2,5,8,11]]
self.assertEqual(_transpose(a), e)
def test__take(self):
"""test taking selected rows from an array"""
e = [[0,3,6,9],[1,4,7,10],[2,5,8,11]]
self.assertEqual(_take(e, [0,1]), [[0,3,6,9],[1,4,7,10]])
self.assertEqual(_take(e, [1,2]), [[1,4,7,10],[2,5,8,11]])
self.assertEqual(_take(e, [0,2]), [[0,3,6,9],[2,5,8,11]])
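For reference, the private helpers `_transpose` and `_take` behave like the following list-of-lists sketch (hypothetical names; the cogent versions may differ internally).

```python
def transpose_sketch(matrix):
    """Swap rows and columns of a rectangular list of lists."""
    return [list(column) for column in zip(*matrix)]


def take_sketch(matrix, indices):
    """Return the rows of matrix at the given indices, in that order."""
    return [matrix[i] for i in indices]


a = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
e = transpose_sketch(a)
print(take_sketch(e, [0, 2]))
```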
def test_get_ML_probs(self):
columns = aligned_columns_to_rows(self.aln, 1)
obs = get_ML_probs(columns, with_patterns=True)
expect = {'A C G' : 4/13.0, 'C G T' : 3/13.0, 'G T A' : 3/13.0, 'T A C' : 3/13.0}
total = 0
for pattern, prob, freq in obs:
self.assertFloatEqual(prob, expect[pattern])
total += prob
self.assertTrue(prob >= 0)
self.assertFloatEqual(total, 1)
def test_get_G93_lnL_from_array(self):
columns = aligned_columns_to_rows(self.aln, 1)
obs = get_G93_lnL_from_array(columns)
expect = math.log(math.pow(4/13.0, 4)) + 3*math.log(math.pow(3/13.0, 3))
self.assertFloatEqual(obs, expect)
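The expected value in the tests above is the log-likelihood of the unconstrained (multinomial) model: each distinct column pattern p observed n_p times among N columns gets its maximum likelihood probability n_p/N, as checked in `test_get_ML_probs`, so lnL = sum over p of n_p log(n_p/N). For the sample alignment N = 13, one pattern occurs 4 times and three patterns occur 3 times each, giving 4 log(4/13) + 9 log(3/13), which is exactly what `math.log(math.pow(4/13.0, 4)) + 3*math.log(math.pow(3/13.0, 3))` evaluates. A sketch of that computation with hypothetical helper names:

```python
import math
from collections import Counter


def ml_pattern_probs(rows):
    """Maximum likelihood probability of each distinct pattern: n_p / N."""
    counts = Counter(tuple(row) for row in rows)
    total = sum(counts.values())
    return {pattern: n / total for pattern, n in counts.items()}


def best_lnL_sketch(rows):
    """Sum of log P(pattern) over all columns, i.e. sum_p n_p * log(n_p / N)."""
    probs = ml_pattern_probs(rows)
    return sum(math.log(probs[tuple(row)]) for row in rows)


seqs = ["AAACCCGGGTTTA", "CCCGGGTTTAAAC", "GGGTTTAAACCCG"]
rows = list(zip(*seqs))  # one row per alignment column
print(best_lnL_sketch(rows))
```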
def test_BestLogLikelihood(self):
obs = BestLogLikelihood(self.aln, DNA.Alphabet)
expect = math.log(math.pow(4/13.0, 4)) + 3*math.log(math.pow(3/13.0, 3))
self.assertFloatEqual(obs,expect)
lnL, l = BestLogLikelihood(self.aln, DNA.Alphabet, return_length=True)
self.assertEqual(l, len(self.aln))
if __name__ == '__main__':
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_evolve/test_bootstrap.py
import sys
import unittest
from cogent.evolve import likelihood_function, \
parameter_controller, substitution_model, bootstrap
from cogent import LoadSeqs, LoadTree
import os
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield",
"Helen Lindsay", "Andrew Butterfield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')
seqnames = ['Chimpanzee', 'Rhesus', 'Orangutan',
'Human']
REPLICATES = 2
def float_ge_zero(num, epsilon=1e-6):
"""compare whether a floating point value is >= zero with epsilon
tolerance."""
if num >= 0.0:
return True
elif abs(num - 0.0) < epsilon:
return True
else:
return False
class BootstrapTests(unittest.TestCase):
def gettree(self):
treeobj = LoadTree(filename=os.path.join(data_path,"murphy.tree"))
return treeobj.getSubTree(seqnames)
def getsubmod(self,choice = 'F81'):
if choice == 'F81':
return substitution_model.Nucleotide(model_gaps=True)
else:
return substitution_model.Nucleotide(
model_gaps=True,
predicates = {'kappa':'transition'})
def getalignmentobj(self):
moltype = self.getsubmod().MolType
alignmentobj = LoadSeqs(
filename = os.path.join(data_path, "brca1.fasta"),
moltype = moltype)
return alignmentobj.takeSeqs(seqnames)[:1000]
def getcontroller(self,treeobj, submodobj):
return submodobj.makeParamController(treeobj)
def create_null_controller(self, alignobj):
"""A null model controller creator.
We constrain the human chimp branches to be equal."""
treeobj = self.gettree()
submodobj = self.getsubmod()
controller = self.getcontroller(treeobj, submodobj)
# we are setting a local molecular clock for human/chimp
controller.setLocalClock('Human', 'Chimpanzee')
return controller
def create_alt_controller(self,alignobj):
"""An alternative model controller. Chimp/Human
branches are free to vary."""
treeobj = self.gettree()
submodobj = self.getsubmod()
controller = self.getcontroller(treeobj, submodobj)
return controller
def calclength(self, likelihood_function):
"""This extracts the length of the human branch and returns it."""
return likelihood_function.getParamValue("length", 'Human')
def test_conf_int(self):
"""testing estimation of confidence intervals."""
alignobj = self.getalignmentobj()
bstrap = bootstrap.EstimateConfidenceIntervals(
self.create_null_controller(alignobj), self.calclength, alignobj)
bstrap.setNumReplicates(REPLICATES)
bstrap.setSeed(1984)
bstrap.run(local=True)
samplelnL = bstrap.getSamplelnL()
for lnL in samplelnL:
assert lnL < 0.0, lnL
observed_stat = bstrap.getObservedStats()
assert float_ge_zero(observed_stat)
samplestats = bstrap.getSampleStats()
for stat in samplestats:
assert float_ge_zero(stat)
self.assertEqual(len(samplelnL), REPLICATES)
self.assertEqual(len(samplestats), REPLICATES)
def test_prob(self):
"""testing estimation of probability."""
alignobj = self.getalignmentobj()
prob_bstrap = bootstrap.EstimateProbability(
self.create_null_controller(alignobj),
self.create_alt_controller(alignobj),
alignobj)
prob_bstrap.setNumReplicates(REPLICATES)
prob_bstrap.setSeed(1984)
prob_bstrap.run(local=True)
self.assertEqual(len(prob_bstrap.getSampleLRList()), REPLICATES)
assert float_ge_zero(prob_bstrap.getObservedLR())
# every bootstrap likelihood ratio should be >= 0.0
for sample_LR in prob_bstrap.getSampleLRList():
assert float_ge_zero(sample_LR), sample_LR
# the observed lnLs should pass the same check, which effectively
# tests that they are returned in (null, alt) order
null, alt = prob_bstrap.getObservedlnL()
assert float_ge_zero(2 * (alt - null))
# now check the structure of the returned sample
for snull, salt in prob_bstrap.getSamplelnL():
assert float_ge_zero(2 * (salt - snull))
# be sure we get something back from getEstimatedProb if proc rank is 0
assert float_ge_zero(prob_bstrap.getEstimatedProb())
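`test_prob` relies on the likelihood ratio LR = 2(lnL_alt - lnL_null) being non-negative for nested models. A common way to turn bootstrap replicates into a probability is to report the fraction of simulated LRs at least as large as the observed one. Whether `EstimateProbability.getEstimatedProb` uses exactly this rule is an assumption here; the sketch below (hypothetical helper names, made-up numbers) only illustrates the idea.

```python
def likelihood_ratio(null_lnL, alt_lnL):
    """LR statistic 2 * (lnL_alt - lnL_null); >= 0 when the models are nested."""
    return 2 * (alt_lnL - null_lnL)


def bootstrap_p_value_sketch(observed_lr, sample_lrs):
    """Fraction of bootstrap LRs that are >= the observed LR (hypothetical rule)."""
    if not sample_lrs:
        raise ValueError("need at least one bootstrap replicate")
    hits = sum(1 for lr in sample_lrs if lr >= observed_lr)
    return hits / len(sample_lrs)


# illustrative values only, not output from cogent
observed = likelihood_ratio(-1000.0, -997.5)
print(bootstrap_p_value_sketch(observed, [0.3, 1.2, 6.1, 0.0]))
```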
if __name__ == "__main__":
unittest.main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_evolve/test_coevolution.py
# Authors: Greg Caporaso (gregcaporaso@gmail.com), Brett Easton, Gavin Huttley
# test_coevolution.py
"""Tests for cogent.evolve.coevolution.
File created on 22 May 2007.
"""
from __future__ import division
from tempfile import mktemp
from os import remove, environ
from os.path import exists
from numpy import zeros, ones, array, transpose, arange, nan, log, e, sqrt,\
greater_equal, less_equal
from cogent.util.unit_test import TestCase, main
from cogent import DNA, RNA, PROTEIN, LoadTree, LoadSeqs
from cogent.maths.stats.util import Freqs
from cogent.core.profile import Profile
from cogent.core.alphabet import CharAlphabet, Alphabet
from cogent.maths.stats.distribution import binomial_exact
from cogent.core.alignment import DenseAlignment
from cogent.seqsim.tree import RandomTree
from cogent.app.util import get_tmp_filename
from cogent.evolve.models import DSO78_matrix, DSO78_freqs
from cogent.evolve.substitution_model import SubstitutionModel, Empirical
from cogent.app.gctmpca import gctmpca_aa_order,\
default_gctmpca_aa_sub_matrix
from cogent.util.misc import app_path
from cogent.evolve.coevolution import mi_alignment, nmi_alignment,\
resampled_mi_alignment, sca_alignment, make_weights,\
parse_gctmpca_result_line, gDefaultNullValue, create_gctmpca_input,\
build_rate_matrix, coevolve_pair, validate_position, validate_alphabet,\
validate_alignment, unpickle_coevolution_result, mi,\
parse_gctmpca_result, sca_pair, csv_to_coevolution_matrix, sca_position,\
coevolve_position, sca_input_validation, coevolve_alignment, \
probs_from_dict, pickle_coevolution_result, \
parse_coevolution_matrix_filepath, normalized_mi, n_random_seqs, \
mi_position, mi_pair, calc_pair_scale, coevolve_alignments, protein_dict,\
ignore_excludes, merge_alignments, ltm_to_symmetric, join_positions,\
is_parsimony_informative, identify_aln_positions_above_threshold, \
get_subalignments, get_positional_probabilities, \
get_positional_frequencies, get_dgg, get_dg, get_allowed_perturbations, \
freqs_to_array, freqs_from_aln, \
filter_threshold_based_multiple_interdependency, \
filter_non_parsimony_informative, filter_exclude_positions, \
coevolution_matrix_to_csv, count_le_threshold, count_ge_threshold, \
nmi_position, nmi_pair, AAGapless, ancestral_state_position, \
ancestral_state_pair, coevolve_alignments_validation, \
ancestral_state_alignment, nmi, build_coevolution_matrix_filepath,\
aln_position_pairs_cmp_threshold, validate_tree, validate_ancestral_seqs,\
get_ancestral_seqs, ancestral_states_input_validation, gctmpca_alignment,\
aln_position_pairs_ge_threshold, aln_position_pairs_le_threshold, gctmpca_pair
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Greg Caporaso"
__email__ = "gregcaporaso@gmail.com"
__status__ = "Beta"
class CoevolutionTests(TestCase):
""" Tests of coevolution.py """
def setUp(self):
"""Set up variables for use in tests"""
self.run_slow_tests = int(environ.get('TEST_SLOW_APPC',0))
self.run_gctmpca_tests = app_path('calculate_likelihood')
## Data used in SCA tests
self.dna_aln = DenseAlignment(data=zip(\
range(4),['ACGT','AGCT','ACCC','TAGG']),MolType=DNA)
self.rna_aln = DenseAlignment(data=zip(\
range(4),['ACGU','AGCU','ACCC','UAGG']),MolType=RNA)
self.protein_aln = DenseAlignment(data=zip(\
range(4),['ACGP','AGCT','ACCC','TAGG']),MolType=PROTEIN)
self.dna_aln_gapped = DenseAlignment(data=zip(range(4),\
['A-CGT','AGC-T','-ACCC','TAGG-']),MolType=DNA)
self.freq = DenseAlignment(data=zip(range(20),\
['TCT', 'CCT', 'CCC', 'CCC',\
'CCG', 'CC-', 'AC-', 'AC-', 'AA-', 'AA-', 'GA-', 'GA-', 'GA-', 'GA-',\
'GA-', 'G--', 'G--', 'G--', 'G--', 'G--',]),MolType=PROTEIN)
self.two_pos = DenseAlignment(data=zip(map(str,range(20)),\
['TC', 'CC', 'CC', 'CC', 'CC', 'CC', 'AC', 'AC', \
'AA', 'AA', 'GA', 'GA', 'GA', 'GA', 'GA', 'GT', \
'GT', 'GT', 'GT', 'GT']),MolType=PROTEIN)
self.tree20 = LoadTree(treestring=tree20_string)
self.gpcr_aln = gpcr_aln
self.myos_aln = myos_aln
# a made-up dict of base frequencies to use as the natural freqs
# for SCA calcs on DNA seqs
self.dna_base_freqs = dict(zip('ACGT',[0.25]*4))
self.rna_base_freqs = dict(zip('ACGU',[0.25]*4))
self.run_slow_gctmpca_tests = False
self.protein_aln4 = DenseAlignment([('A1','AACF'),('A12','AADF'),\
('A123','ADCF'),('A111','AAD-')],\
MolType=PROTEIN)
self.rna_aln4 = DenseAlignment([('A1','AAUU'),('A12','ACGU'),\
('A123','UUAA'),('A111','AAA-')],\
MolType=RNA)
self.dna_aln4 = DenseAlignment([('A1','AATT'),('A12','ACGT'),\
('A123','TTAA'),('A111','AAA?')],\
MolType=DNA)
self.tree4 = LoadTree(treestring=\
"((A1:0.5,A111:0.5):0.5,(A12:0.5,A123:0.5):0.5);")
def test_alignment_analyses_moltype_protein(self):
""" alignment methods work with moltype = PROTEIN """
r = mi_alignment(self.protein_aln4)
self.assertEqual(r.shape,(4,4))
r = nmi_alignment(self.protein_aln4)
self.assertEqual(r.shape,(4,4))
r = sca_alignment(self.protein_aln4,cutoff=0.75)
self.assertEqual(r.shape,(4,4))
r = ancestral_state_alignment(self.protein_aln4,self.tree4)
self.assertEqual(r.shape,(4,4))
# check if we're running the GCTMPCA tests and the slow tests
if self.run_slow_tests and self.run_gctmpca_tests:
r = gctmpca_alignment(self.protein_aln4,self.tree4,epsilon=0.7)
self.assertEqual(r.shape,(4,4))
def test_alignment_analyses_moltype_rna(self):
""" alignment methods work with moltype = RNA """
r = mi_alignment(self.rna_aln4)
self.assertEqual(r.shape,(4,4))
r = nmi_alignment(self.rna_aln4)
self.assertEqual(r.shape,(4,4))
r = sca_alignment(self.rna_aln4,cutoff=0.75,alphabet='ACGU',\
background_freqs=self.rna_base_freqs)
self.assertEqual(r.shape,(4,4))
r = ancestral_state_alignment(self.rna_aln4,self.tree4)
self.assertEqual(r.shape,(4,4))
# check if we're running the GCTMPCA tests and the slow tests
if self.run_slow_tests and self.run_gctmpca_tests:
r = gctmpca_alignment(self.rna_aln4,self.tree4,epsilon=0.7)
self.assertEqual(r.shape,(4,4))
def test_alignment_analyses_moltype_dna(self):
""" alignment methods work with moltype = DNA """
r = mi_alignment(self.dna_aln4)
self.assertEqual(r.shape,(4,4))
r = nmi_alignment(self.dna_aln4)
self.assertEqual(r.shape,(4,4))
r = sca_alignment(self.dna_aln4,cutoff=0.75,alphabet='ACGT',\
background_freqs=self.dna_base_freqs)
self.assertEqual(r.shape,(4,4))
r = ancestral_state_alignment(self.dna_aln4,self.tree4)
self.assertEqual(r.shape,(4,4))
# check if we're running the GCTMPCA tests and the slow tests
if self.run_slow_tests and self.run_gctmpca_tests:
# Gctmpca method doesn't support DNA alignments.
self.assertRaises(ValueError,gctmpca_alignment,self.dna_aln4,\
self.tree4,epsilon=0.7)
def test_join_positions(self):
""" join_positions functions as expected """
self.assertEqual(join_positions(list('ABCD'),list('WXYZ')),\
['AW','BX','CY','DZ'])
self.assertEqual(join_positions(list('AAA'),list('BBB')),\
['AB','AB','AB'])
self.assertEqual(join_positions([],[]),[])
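`join_positions` pairs the residues of each sequence at two positions, producing the joint states whose entropy enters the mutual information calculation. An equivalent sketch (hypothetical name; the length check is an added assumption, not taken from cogent):

```python
def join_positions_sketch(pos1, pos2):
    """Concatenate the residues of each sequence at two positions."""
    if len(pos1) != len(pos2):
        raise ValueError("positions must have the same number of sequences")
    return [a + b for a, b in zip(pos1, pos2)]


print(join_positions_sketch(list("ABCD"), list("WXYZ")))
```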
def test_mi(self):
""" mi calculations function as expected with valid data"""
self.assertFloatEqual(mi(1.0,1.0,1.0),1.0)
self.assertFloatEqual(mi(1.0,1.0,2.0),0.0)
self.assertFloatEqual(mi(1.0,1.0,1.5),0.5)
def test_normalized_mi(self):
""" normalized mi calculations function as expected with valid data"""
self.assertFloatEqual(normalized_mi(1.0,1.0,1.0),1.0)
self.assertFloatEqual(normalized_mi(1.0,1.0,2.0),0.0)
self.assertFloatEqual(normalized_mi(1.0,1.0,1.5),1.0/3.0)
def test_mi_pair(self):
""" mi_pair calculates mi from a pair of columns """
aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), 0.0)
aln = DenseAlignment(data={'1':'AB','2':'BA'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), 1.0)
# the order of the two positions should not affect the result
aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1),\
mi_pair(aln,pos1=1,pos2=0))
aln = DenseAlignment(data={'1':'AB','2':'BA'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), \
mi_pair(aln,pos1=1,pos2=0))
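The `mi` and `normalized_mi` expectations in `test_mi` and `test_normalized_mi` are consistent with MI = H1 + H2 - H12 and NMI = MI / H12, where H12 is the joint entropy of the two positions. A sketch of both formulas; returning `float('nan')` when H12 is zero is an assumption standing in for `gDefaultNullValue`, which `test_mi_pair_alt_calculator` expects when both columns are constant.

```python
import math


def mi_sketch(h1, h2, joint_h):
    """Mutual information from two positional entropies and their joint entropy."""
    return h1 + h2 - joint_h


def normalized_mi_sketch(h1, h2, joint_h, null_value=float("nan")):
    """MI divided by the joint entropy; undefined (null_value) when joint_h is 0."""
    if joint_h == 0:
        return null_value
    return mi_sketch(h1, h2, joint_h) / joint_h


print(mi_sketch(1.0, 1.0, 1.5), normalized_mi_sketch(1.0, 1.0, 1.5))
```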
def test_wrapper_functions_handle_invalid_parameters(self):
"""coevolve_*: functions error on missing parameters"""
# missing cutoff
aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN)
self.assertRaises(ValueError,coevolve_pair,sca_pair,aln,0,1)
self.assertRaises(ValueError,coevolve_position,sca_position,aln,0)
self.assertRaises(ValueError,coevolve_alignment,sca_alignment,aln)
self.assertRaises(ValueError,coevolve_alignments,sca_alignment,aln,aln)
def test_coevolve_pair(self):
"""coevolve_pair: returns same as pair methods called directly """
aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
cutoff = 0.50
# mi_pair == coevolve_pair(mi_pair,...)
self.assertFloatEqual(coevolve_pair(mi_pair,aln,pos1=0,pos2=1),\
mi_pair(aln,pos1=0,pos2=1))
self.assertFloatEqual(coevolve_pair(nmi_pair,aln,pos1=0,pos2=1),\
nmi_pair(aln,pos1=0,pos2=1))
self.assertFloatEqual(coevolve_pair(ancestral_state_pair,aln,pos1=0,\
pos2=1,tree=t),ancestral_state_pair(aln,pos1=0,pos2=1,tree=t))
self.assertFloatEqual(coevolve_pair(sca_pair,aln,pos1=0,\
pos2=1,cutoff=cutoff),sca_pair(aln,pos1=0,pos2=1,cutoff=cutoff))
def test_coevolve_position(self):
"""coevolve_position: returns same as position methods called directly
"""
aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
cutoff = 0.50
# mi_position == coevolve_position(mi_position,...)
self.assertFloatEqual(coevolve_position(mi_position,aln,position=0),\
mi_position(aln,position=0))
self.assertFloatEqual(coevolve_position(nmi_position,aln,position=0),\
nmi_position(aln,position=0))
self.assertFloatEqual(coevolve_position(\
ancestral_state_position,aln,position=0,\
tree=t),ancestral_state_position(aln,position=0,tree=t))
self.assertFloatEqual(coevolve_position(sca_position,aln,position=0,\
cutoff=cutoff),sca_position(aln,position=0,cutoff=cutoff))
def test_coevolve_alignment_methods(self):
"""coevolve_alignment: returns same as alignment methods"""
aln = DenseAlignment(data={'1':'AC','2':'AC'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
cutoff = 0.50
# mi_alignment == coevolve_alignment(mi_alignment,...)
self.assertFloatEqual(coevolve_alignment(mi_alignment,aln),\
mi_alignment(aln))
self.assertFloatEqual(coevolve_alignment(nmi_alignment,aln),\
nmi_alignment(aln))
self.assertFloatEqual(coevolve_alignment(ancestral_state_alignment,aln,\
tree=t),ancestral_state_alignment(aln,tree=t))
self.assertFloatEqual(coevolve_alignment(sca_alignment,aln,\
cutoff=cutoff),sca_alignment(aln,cutoff=cutoff))
def test_coevolve_alignments_validation_identifiers(self):
"""coevolve_alignments_validation: seq/tree validation functions
"""
method = sca_alignment
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
# OK w/ no tree
coevolve_alignments_validation(method,aln1,aln2,2,None)
# OK w/ tree
coevolve_alignments_validation(method,aln1,aln2,2,None,tree=t)
# If there is a plus present in identifiers, we only care about the
# text before the plus
aln1 = DenseAlignment(data={'1+a':'AC','2+b':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(\
data={'1 + c':'EFW','2 + d':'EGY'},MolType=PROTEIN)
t = LoadTree(treestring='(1+e:0.5,2 + f:0.5);')
# OK w/ no tree
coevolve_alignments_validation(method,aln1,aln2,2,None)
# OK w/ tree
coevolve_alignments_validation(method,aln1,aln2,2,None,tree=t)
# mismatch b/w alignments seq names
aln1 = DenseAlignment(data={'3':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
self.assertRaises(AssertionError,coevolve_alignments_validation,\
method,aln1,aln2,2,None,tree=t)
# mismatch b/w alignments and tree seq names
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
t = LoadTree(treestring='(3:0.5,2:0.5);')
self.assertRaises(AssertionError,\
coevolve_alignments_validation,method,\
aln1,aln2,2,None,tree=t)
# mismatch b/w alignments in number of seqs
aln1 = DenseAlignment(\
data={'1':'AC','2':'AD','3':'AA'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
self.assertRaises(AssertionError,coevolve_alignments_validation,\
method,aln1,aln2,2,None)
self.assertRaises(AssertionError,coevolve_alignments_validation,\
method,aln1,aln2,2,None,tree=t)
# mismatch b/w alignments & tree in number of seqs
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,(2:0.5,3:0.25));')
self.assertRaises(AssertionError,coevolve_alignments_validation,\
method,aln1,aln2,2,None,tree=t)
def test_coevolve_alignments_validation_min_num_seqs(self):
"""coevolve_alignments_validation: ValueError on fewer than min_num_seqs """
method = mi_alignment
# too few sequences -> ValueError
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
coevolve_alignments_validation(method,aln1,aln2,1,None)
coevolve_alignments_validation(method,aln1,aln2,2,None)
self.assertRaises(ValueError,\
coevolve_alignments_validation,method,aln1,aln2,3,None)
def test_coevolve_alignments_validation_max_num_seqs(self):
"""coevolve_alignments_validation: min_num_seqs <= max_num_seqs
"""
method = mi_alignment
# min_num_seqs > max_num_seqs -> ValueError
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
coevolve_alignments_validation(method,aln1,aln2,1,None)
coevolve_alignments_validation(method,aln1,aln2,1,3)
coevolve_alignments_validation(method,aln1,aln2,2,3)
self.assertRaises(ValueError,\
coevolve_alignments_validation,method,aln1,aln2,3,2)
def test_coevolve_alignments_validation_moltypes(self):
"""coevolve_alignments_validation: valid for acceptable MolTypes
"""
aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
# different MolType
coevolve_alignments_validation(mi_alignment,aln1,aln2,2,None)
coevolve_alignments_validation(nmi_alignment,aln1,aln2,2,None)
coevolve_alignments_validation(\
resampled_mi_alignment,aln1,aln2,2,None)
self.assertRaises(AssertionError,coevolve_alignments_validation,\
sca_alignment,aln1,aln2,2,None)
self.assertRaises(AssertionError,coevolve_alignments_validation,\
ancestral_state_alignment,aln1,aln2,2,None)
self.assertRaises(AssertionError,coevolve_alignments_validation,\
gctmpca_alignment,aln1,aln2,2,None)
def test_coevolve_alignments(self):
""" coevolve_alignments: returns correct len(aln1) x len(aln2) matrix
"""
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
combined_aln =\
DenseAlignment(data={'1':'ACEFW','2':'ADEGY'},MolType=PROTEIN)
t = LoadTree(treestring='(1:0.5,2:0.5);')
cutoff = 0.50
# MI
m = mi_alignment(combined_aln)
expected = array([[m[2,0],m[2,1]],\
[m[3,0],m[3,1]],[m[4,0],m[4,1]]])
self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2),\
expected)
# MI (return_full=True)
self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2,\
return_full=True),m)
# NMI
m = nmi_alignment(combined_aln)
expected = array([[m[2,0],m[2,1]],\
[m[3,0],m[3,1]],[m[4,0],m[4,1]]])
self.assertFloatEqual(coevolve_alignments(nmi_alignment,aln1,aln2),\
expected)
# AS
m = ancestral_state_alignment(combined_aln,tree=t)
expected = array([[m[2,0],m[2,1]],\
[m[3,0],m[3,1]],[m[4,0],m[4,1]]])
self.assertFloatEqual(\
coevolve_alignments(ancestral_state_alignment,aln1,aln2,\
tree=t),expected)
# SCA
m = sca_alignment(combined_aln,cutoff=cutoff)
expected = array([[m[2,0],m[2,1]],\
[m[3,0],m[3,1]],[m[4,0],m[4,1]]])
self.assertFloatEqual(coevolve_alignments(sca_alignment,aln1,aln2,\
cutoff=cutoff),expected)
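The expected matrices show that `coevolve_alignments` scores the concatenated alignment and keeps one off-diagonal block: rows for the positions of `aln2` (indices `len(aln1)` onward) and columns for the positions of `aln1`. That slicing, sketched on a plain list-of-lists matrix with a hypothetical helper name:

```python
def inter_alignment_block(full_matrix, n_positions_aln1):
    """Rows for aln2 positions, columns for aln1 positions of a combined matrix."""
    n1 = n_positions_aln1
    return [row[:n1] for row in full_matrix[n1:]]


# a 5 x 5 stand-in for mi_alignment(combined_aln): aln1 has 2 positions, aln2 has 3
m = [[10 * i + j for j in range(5)] for i in range(5)]
print(inter_alignment_block(m, 2))
```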
def test_coevolve_alignments_watches_min_num_seqs(self):
""" coevolve_alignments: error on too few sequences """
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
coevolve_alignments(mi_alignment,aln1,aln2)
coevolve_alignments(mi_alignment,aln1,aln2,min_num_seqs=0)
coevolve_alignments(mi_alignment,aln1,aln2,min_num_seqs=1)
coevolve_alignments(mi_alignment,aln1,aln2,min_num_seqs=2)
self.assertRaises(ValueError,\
coevolve_alignments,mi_alignment,aln1,aln2,min_num_seqs=3)
self.assertRaises(ValueError,\
coevolve_alignments,mi_alignment,aln1,aln2,min_num_seqs=50)
def test_coevolve_alignments_watches_max_num_seqs(self):
""" coevolve_alignments: filtering or error on too many sequences """
aln1 = DenseAlignment(data={'1':'AC','2':'AD','3':'YP'},\
MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'ACP','2':'EAD','3':'PYP'},\
MolType=PROTEIN)
# keep all seqs
tmp_filepath = get_tmp_filename(\
prefix='tmp_test_coevolution',suffix='.fasta')
coevolve_alignments(mi_alignment,aln1,aln2,max_num_seqs=3,\
merged_aln_filepath=tmp_filepath)
self.assertEqual(LoadSeqs(tmp_filepath).getNumSeqs(),3)
# keep 2 seqs
coevolve_alignments(mi_alignment,aln1,aln2,max_num_seqs=2,\
merged_aln_filepath=tmp_filepath)
self.assertEqual(LoadSeqs(tmp_filepath).getNumSeqs(),2)
# error if no sequence filter
self.assertRaises(ValueError,\
coevolve_alignments,mi_alignment,aln1,aln2,max_num_seqs=2,\
merged_aln_filepath=tmp_filepath,sequence_filter=None)
# clean up the temporary file
remove(tmp_filepath)
def test_coevolve_alignments_different_MolType(self):
""" coevolve_alignments: different MolTypes supported """
aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA)
aln2 = DenseAlignment(data={'1':'EFW','2':'EGY'},MolType=PROTEIN)
combined_aln = DenseAlignment(data={'1':'ACEFW','2':'AUEGY'})
t = LoadTree(treestring='(1:0.5,2:0.5);')
cutoff = 0.50
# MI
m = mi_alignment(combined_aln)
expected = array([[m[2,0],m[2,1]],\
[m[3,0],m[3,1]],[m[4,0],m[4,1]]])
self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2),\
expected)
# MI (return_full=True)
self.assertFloatEqual(coevolve_alignments(mi_alignment,aln1,aln2,\
return_full=True),m)
# NMI
m = nmi_alignment(combined_aln)
expected = array([[m[2,0],m[2,1]],\
[m[3,0],m[3,1]],[m[4,0],m[4,1]]])
self.assertFloatEqual(coevolve_alignments(nmi_alignment,aln1,aln2),\
expected)
def test_mi_pair_cols_default_exclude_handling(self):
""" mi_pair returns null_value on excluded by default """
aln = DenseAlignment(data={'1':'AB','2':'-B'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue)
aln = DenseAlignment(data={'1':'-B','2':'-B'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue)
aln = DenseAlignment(data={'1':'AA','2':'-B'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue)
aln = DenseAlignment(data={'1':'AA','2':'PB'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,excludes='P'),\
gDefaultNullValue)
def test_mi_pair_cols_non_default_exclude_handling(self):
""" mi_pair uses non-default exclude_handler when provided"""
aln = DenseAlignment(data={'1':'A-','2':'A-'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1), gDefaultNullValue)
self.assertFloatEqual(\
mi_pair(aln,pos1=0,pos2=1,exclude_handler=ignore_excludes),0.0)
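Putting the pieces together, the mutual information of two columns can be computed from base-2 Shannon entropies (base 2 is what gives two equally frequent states an entropy of 1.0, as these tests assume). The sketch also mimics the default exclude handling: any excluded character makes the pair undefined, while `keep_excluded=True` plays the role of `exclude_handler=ignore_excludes`. All names are hypothetical, NaN stands in for `gDefaultNullValue` by assumption, and unlike cogent the sketch does not add the gap character automatically when a custom `excludes` string is given.

```python
import math
from collections import Counter


def shannon_entropy(states):
    """Base-2 Shannon entropy of a list of observed states."""
    counts = Counter(states)
    total = len(states)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mi_pair_sketch(col1, col2, excludes="-", null_value=float("nan"),
                   keep_excluded=False):
    """MI between two alignment columns given as strings of residues."""
    if not keep_excluded and any(c in excludes for c in col1 + col2):
        return null_value  # default: a gap or excluded char makes MI undefined
    h1 = shannon_entropy(list(col1))
    h2 = shannon_entropy(list(col2))
    joint = shannon_entropy([a + b for a, b in zip(col1, col2)])
    return h1 + h2 - joint


# columns of {'1':'AB','2':'BA'}: position 0 is 'AB', position 1 is 'BA'
print(mi_pair_sketch("AB", "BA"))
```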
def test_mi_pair_cols_and_entropies(self):
""" mi_pair calculates mi from a pair of columns and precalc entropies
"""
aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,h1=0.0,h2=0.0), 0.0)
aln = DenseAlignment(data={'1':'AB','2':'BA'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,h1=1.0,h2=1.0), 1.0)
# incorrect positional entropies are provided to ensure that the
# precalculated values are used, and that entropies are not
# calculated on-the-fly.
aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,h1=1.0,h2=1.0), 2.0)
def test_mi_pair_alt_calculator(self):
""" mi_pair uses alternate mi_calculator when provided """
aln = DenseAlignment(data={'1':'AB','2':'AB'},MolType=PROTEIN)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1),0.0)
self.assertFloatEqual(mi_pair(aln,pos1=0,pos2=1,\
mi_calculator=normalized_mi),gDefaultNullValue)
def test_mi_position_valid_input(self):
""" mi_position functions with varied valid input """
aln = DenseAlignment(data={'1':'ACG','2':'GAC'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,0),array([1.0,1.0,1.0]))
aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,0),array([0.0,0.0,0.0]))
aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,2),array([0.0,0.0,0.0]))
def test_mi_position_from_alignment_nmi(self):
"""mi_position functions w/ alternate mi_calculator """
aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,0),array([0.0,0.0,0.0]))
aln = DenseAlignment(data={'1':'ACG','2':'ACG'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,0,mi_calculator=normalized_mi),\
array([gDefaultNullValue,gDefaultNullValue,gDefaultNullValue]))
def test_mi_position_from_alignment_default_exclude_handling(self):
""" mi_position handles excludes by setting to null_value"""
aln = DenseAlignment(data={'1':'ACG','2':'G-C'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,0),\
array([1.0,gDefaultNullValue,1.0]))
aln = DenseAlignment(data={'1':'ACG','2':'GPC'},MolType=PROTEIN)
self.assertFloatEqual(mi_position(aln,0,excludes='P'),\
array([1.0,gDefaultNullValue,1.0]))
def test_mi_position_from_alignment_non_default_exclude_handling(self):
""" mi_position handles excludes w/ non-default method"""
aln = DenseAlignment(data={'1':'ACG','2':'G-C'},MolType=PROTEIN)
self.assertFloatEqual(\
mi_position(aln,0,exclude_handler=ignore_excludes),\
array([1.0,1.0,1.0]))
def test_mi_alignment_excludes(self):
""" mi_alignment handles excludes properly """
expected = array([[0.0, gDefaultNullValue, 0.0],\
[gDefaultNullValue,gDefaultNullValue,gDefaultNullValue],\
[0.0,gDefaultNullValue,0.0]])
# gap in second column
aln = DenseAlignment(data={'1':'ACG','2':'A-G'},MolType=PROTEIN)
self.assertFloatEqual(mi_alignment(aln),expected)
# excludes = 'P'
aln = DenseAlignment(data={'1':'ACG','2':'APG'},MolType=PROTEIN)
self.assertFloatEqual(mi_alignment(aln,excludes='P'),\
expected)
# gap in first column
expected = array([\
[gDefaultNullValue, gDefaultNullValue, gDefaultNullValue],\
[gDefaultNullValue,0.0,0.0], [gDefaultNullValue,0.0,0.0]])
aln = DenseAlignment(data={'1':'-CG','2':'ACG'},MolType=PROTEIN)
self.assertFloatEqual(mi_alignment(aln),expected)
def test_mi_alignment_high(self):
""" mi_alignment detects perfectly correlated columns """
expected = [[1.0, 1.0],[1.0,1.0]]
aln = DenseAlignment(data={'1':'AG','2':'GA'},MolType=PROTEIN)
self.assertFloatEqual(mi_alignment(aln),expected)
def test_mi_alignment_low(self):
""" mi_alignment detects perfectly uncorrelated columns """
expected = [[0.0, 0.0],[0.0,1.0]]
aln = DenseAlignment(data={'1':'AG','2':'AC'},MolType=PROTEIN)
self.assertFloatEqual(mi_alignment(aln),expected)
def test_resampled_mi_alignment(self):
""" resampled_mi_alignment returns without error """
aln = DenseAlignment(data={'1':'ACDEF','2':'ACFEF','3':'ACGEF'},\
MolType=PROTEIN)
resampled_mi_alignment(aln)
aln = DenseAlignment(data={'1':'ACDEF','2':'ACF-F','3':'ACGEF'},\
MolType=PROTEIN)
resampled_mi_alignment(aln)
def test_coevolve_alignment(self):
""" coevolve_alignment functions as expected with varied input """
aln1 = DenseAlignment(data={'1':'ACDEF','2':'ACFEF','3':'ACGEF'},\
MolType=PROTEIN)
# no kwargs passed
self.assertFloatEqual(coevolve_alignment(mi_alignment,aln1),\
mi_alignment(aln1))
# different method passed
self.assertFloatEqual(coevolve_alignment(nmi_alignment,aln1),\
nmi_alignment(aln1))
# kwargs passed
self.assertFloatEqual(coevolve_alignment(mi_alignment,aln1,\
mi_calculator=nmi),nmi_alignment(aln1))
def test_build_coevolution_matrix_filepath(self):
""" build_coevolution_matrix_filepath functions w/ varied input """
self.assertEqual(build_coevolution_matrix_filepath(\
'./blah.fasta'),'./blah')
self.assertEqual(build_coevolution_matrix_filepath(\
'blah.fasta'),'./blah')
self.assertEqual(build_coevolution_matrix_filepath('blah'),'./blah')
self.assertEqual(build_coevolution_matrix_filepath('./blah'),'./blah')
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh/',method='xx',alphabet='yyy'),\
'./duh/blah.yyy.xx')
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh/',method='xx',alphabet='yyy',\
parameter=0.25),\
'./duh/blah.yyy.xx')
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh/',method='xx'),'./duh/blah.xx')
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh/',method='sca',parameter=0.25),\
'./duh/blah.sca_25')
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh/',method='sca',parameter=0.25,\
alphabet='xx'),'./duh/blah.xx.sca_25')
# no trailing / to output_dir
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh',method='sca',parameter=0.25,\
alphabet='xx'),'./duh/blah.xx.sca_25')
self.assertEqual(build_coevolution_matrix_filepath('./blah.fasta',\
output_dir='./duh/',method='gctmpca',parameter=0.25),\
'./duh/blah.gctmpca_25')
self.assertRaises(ValueError,build_coevolution_matrix_filepath,\
'./blah.fasta','./duh/','gctmpca')
self.assertRaises(ValueError,build_coevolution_matrix_filepath,\
'./blah.fasta','./duh/','sca')
self.assertRaises(ValueError,build_coevolution_matrix_filepath,\
'./blah.fasta','./duh/','sca','xx')
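The expectations above imply a naming scheme of `<output_dir>/<input basename without extension>[.<alphabet>][.<method>]`, where the parameterised methods (`sca`, `gctmpca`) append their parameter as a percentage (0.25 becomes `_25`) and raise `ValueError` without one, while other methods ignore it. A sketch of that rule; the function name is hypothetical and the exact parameter formatting (`round(parameter * 100)`) is an assumption. `posixpath` keeps the `/` separators the tests expect on any platform.

```python
import posixpath


def build_matrix_filepath_sketch(input_filepath, output_dir="./", method=None,
                                 alphabet=None, parameter=None):
    """Hypothetical reimplementation of the naming rule exercised in the tests."""
    parameterised = ("sca", "gctmpca")
    if method in parameterised and parameter is None:
        raise ValueError("method %r requires a parameter" % method)
    base = posixpath.splitext(posixpath.basename(input_filepath))[0]
    path = posixpath.join(output_dir, base)
    if alphabet:
        path += "." + alphabet
    if method:
        path += "." + method
        if method in parameterised:
            path += "_%d" % round(parameter * 100)  # 0.25 -> '_25'
    return path


print(build_matrix_filepath_sketch("./blah.fasta", output_dir="./duh",
                                   method="sca", parameter=0.25, alphabet="xx"))
```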
def test_pickle_coevolution_result_error(self):
"""pickle matrix: IOError handled correctly"""
m = array([[1,2],[3,4]])
self.assertRaises(IOError,pickle_coevolution_result,m,'')
def test_unpickle_coevolution_result_error(self):
"""unpickle matrix: IOError handled correctly"""
self.assertRaises(IOError,unpickle_coevolution_result,\
'invalid/file/path.pkl')
def test_pickle_and_unpickle(self):
"""unpickle(pickle(matrix)) == matrix"""
for expected in [4.5,array([1.2,4.3,5.5]),\
array([[1.4,2.2],[3.0,0.4]])]:
filepath = mktemp()
pickle_coevolution_result(expected,filepath)
actual = unpickle_coevolution_result(filepath)
self.assertFloatEqual(actual,expected)
remove(filepath)
def test_csv_coevolution_result_error(self):
"""matrix -> csv: IOError handled correctly"""
m = array([[1,2],[3,4]])
self.assertRaises(IOError,coevolution_matrix_to_csv,m,'')
def test_uncsv_coevolution_result_error(self):
"""csv -> matrix: IOError handled correctly"""
self.assertRaises(IOError,csv_to_coevolution_matrix,\
'invalid/file/path.pkl')
def test_csv_and_uncsv(self):
"""converting to/from csv matrix results in correct coevolution matrix
"""
expected = array([[1.4,2.2],[gDefaultNullValue,0.4]])
filepath = mktemp()
coevolution_matrix_to_csv(expected,filepath)
actual = csv_to_coevolution_matrix(filepath)
self.assertFloatEqual(actual,expected)
remove(filepath)
def test_parse_coevolution_matrix_filepath(self):
"""Parsing matrix filepaths works as expected. """
expected = ('myosin_995', 'a1_4', 'nmi')
self.assertEqual(\
parse_coevolution_matrix_filepath('pkls/myosin_995.a1_4.nmi.pkl'),\
expected)
self.assertEqual(\
parse_coevolution_matrix_filepath('pkls/myosin_995.a1_4.nmi.csv'),\
expected)
expected = ('p53','orig','mi')
self.assertEqual(\
parse_coevolution_matrix_filepath('p53.orig.mi.pkl'),\
expected)
self.assertEqual(\
parse_coevolution_matrix_filepath('p53.orig.mi.csv'),\
expected)
def test_parse_coevolution_matrix_filepath_error(self):
"""Parsing matrix file paths handles invalid filepaths """
self.assertRaises(ValueError,\
parse_coevolution_matrix_filepath,'pkls/myosin_995.nmi.pkl')
self.assertRaises(ValueError,\
parse_coevolution_matrix_filepath,'pkls/myosin_995.pkl')
self.assertRaises(ValueError,\
parse_coevolution_matrix_filepath,'pkls/myosin_995')
self.assertRaises(ValueError,\
parse_coevolution_matrix_filepath,'')
def test_identify_aln_positions_above_threshold(self):
"""Extracting scores above threshold works as expected """
m = array([\
[gDefaultNullValue,gDefaultNullValue,\
gDefaultNullValue,gDefaultNullValue],\
[0.3, 1.0,gDefaultNullValue,gDefaultNullValue],\
[0.25,0.75,1.0,gDefaultNullValue],
[0.9,0.751,0.8,1.0]])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,0),[])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,1),\
[1])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,2),\
[1,2])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,3),\
[0,1,2,3])
m = ltm_to_symmetric(m)
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,0),\
[3])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,1),\
[1,2,3])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,2),\
[1,2,3])
self.assertEqual(identify_aln_positions_above_threshold(m,0.75,3),\
[0,1,2,3])
self.assertEqual(identify_aln_positions_above_threshold(m,1.1,0),\
[])
self.assertEqual(identify_aln_positions_above_threshold(m,-5.,0),\
[1,2,3])
self.assertEqual(identify_aln_positions_above_threshold(m,-5.,1),\
[0,1,2,3])
def test_count_ge_threshold(self):
"""count_ge_threshold works as expected """
m = array([[gDefaultNullValue]*3]*3)
self.assertEqual(count_ge_threshold(m,1.0),(0,0))
self.assertEqual(count_ge_threshold(m,\
gDefaultNullValue,gDefaultNullValue),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,42),(0,9))
m = array([[0,1,2],[3,4,5],[6,7,8]])
self.assertEqual(count_ge_threshold(m,4),(5,9))
self.assertEqual(count_ge_threshold(m,8),(1,9))
self.assertEqual(count_ge_threshold(m,9),(0,9))
m = array([[0,gDefaultNullValue,gDefaultNullValue],\
[gDefaultNullValue,4,5],[6,7,8]])
self.assertEqual(count_ge_threshold(m,4),(5,6))
self.assertEqual(count_ge_threshold(m,8),(1,6))
self.assertEqual(count_ge_threshold(m,9),(0,6))
def test_count_le_threshold(self):
"""count_le_threshold works as expected """
m = array([[gDefaultNullValue]*3]*3)
self.assertEqual(count_le_threshold(m,1.0),(0,0))
self.assertEqual(count_le_threshold(m,\
gDefaultNullValue,gDefaultNullValue),(0,0))
self.assertEqual(count_le_threshold(m,1.0,42),(0,9))
m = array([[0,1,2],[3,4,5],[6,7,8]])
self.assertEqual(count_le_threshold(m,4),(5,9))
self.assertEqual(count_le_threshold(m,8),(9,9))
self.assertEqual(count_le_threshold(m,9),(9,9))
m = array([[0,gDefaultNullValue,gDefaultNullValue],\
[gDefaultNullValue,4,5],[6,7,8]])
self.assertEqual(count_le_threshold(m,4),(2,6))
self.assertEqual(count_le_threshold(m,8),(6,6))
self.assertEqual(count_le_threshold(m,9),(6,6))
def test_count_ge_threshold_symmetric_ignore_diagonal(self):
"""count_ge_threshold works with symmetric and/or ignoring diag = True
"""
# no good scores, varied null value
m = array([[gDefaultNullValue]*3]*3)
self.assertEqual(count_ge_threshold(m,1.0,symmetric=True),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,symmetric=True),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,42,symmetric=True),(0,6))
self.assertEqual(count_ge_threshold(m,1.0,ignore_diagonal=True),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,ignore_diagonal=True),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,42,\
ignore_diagonal=True),(0,6))
self.assertEqual(count_ge_threshold(m,1.0,\
ignore_diagonal=True,symmetric=True),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,\
ignore_diagonal=True,symmetric=True),(0,0))
self.assertEqual(count_ge_threshold(m,1.0,42,\
ignore_diagonal=True,symmetric=True),(0,3))
# no null values, varied other values
m = array([[0,1,2],[3,4,5],[6,7,8]])
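# Worked example (inferred from the expected tuples below, not from the
# implementation): each result is (non-null scores >= threshold, non-null
# scores considered), and symmetric=True appears to keep only the lower
# triangle plus the diagonal.
#   all 9 cells: 4,5,6,7,8 pass -> (5,9)
#   symmetric=True (0,3,4,6,7,8): 4,6,7,8 pass -> (4,6)
#   ignore_diagonal=True (1,2,3,5,6,7): 5,6,7 pass -> (3,6)
#   both (3,6,7): 6,7 pass -> (2,3)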
self.assertEqual(count_ge_threshold(m,4),(5,9))
self.assertEqual(count_ge_threshold(m,4,symmetric=True),(4,6))
self.assertEqual(count_ge_threshold(m,4,ignore_diagonal=True),(3,6))
self.assertEqual(count_ge_threshold(m,4,symmetric=True,\
ignore_diagonal=True),(2,3))
# null and mixed values
m = array([\
[0,gDefaultNullValue,gDefaultNullValue],\
[3,4,gDefaultNullValue],\
[gDefaultNullValue,7,8]])
self.assertEqual(count_ge_threshold(m,4),(3,5))
self.assertEqual(count_ge_threshold(m,4,symmetric=True),(3,5))
self.assertEqual(count_ge_threshold(m,4,ignore_diagonal=True),(1,2))
self.assertEqual(count_ge_threshold(m,4,symmetric=True,\
ignore_diagonal=True),(1,2))
def test_count_le_threshold_symmetric_ignore_diagonal(self):
"""count_le_threshold works with symmetric and/or ignoring diag = True
"""
# varied null value
m = array([[gDefaultNullValue]*3]*3)
self.assertEqual(count_le_threshold(m,1.0,symmetric=True),(0,0))
self.assertEqual(count_le_threshold(m,1.0,symmetric=True),(0,0))
self.assertEqual(count_le_threshold(m,1.0,42,symmetric=True),(0,6))
self.assertEqual(count_le_threshold(m,1.0,ignore_diagonal=True),(0,0))
self.assertEqual(count_le_threshold(m,1.0,ignore_diagonal=True),(0,0))
self.assertEqual(count_le_threshold(m,1.0,42,\
ignore_diagonal=True),(0,6))
self.assertEqual(count_le_threshold(m,1.0,\
ignore_diagonal=True,symmetric=True),(0,0))
self.assertEqual(count_le_threshold(m,1.0,\
ignore_diagonal=True,symmetric=True),(0,0))
self.assertEqual(count_le_threshold(m,1.0,42,\
ignore_diagonal=True,symmetric=True),(0,3))
# no null values, varied other values
m = array([[0,1,2],[3,4,5],[6,7,8]])
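# Worked example (inferred from the expected tuples below, not from the
# implementation): each result is (non-null scores <= threshold, non-null
# scores considered), and symmetric=True appears to keep only the lower
# triangle plus the diagonal.
#   all 9 cells: 0,1,2,3,4 pass -> (5,9)
#   symmetric=True (0,3,4,6,7,8): 0,3,4 pass -> (3,6)
#   ignore_diagonal=True (1,2,3,5,6,7): 1,2,3 pass -> (3,6)
#   both (3,6,7): 3 passes -> (1,3)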
self.assertEqual(count_le_threshold(m,4),(5,9))
self.assertEqual(count_le_threshold(m,4,symmetric=True),(3,6))
self.assertEqual(count_le_threshold(m,4,ignore_diagonal=True),(3,6))
self.assertEqual(count_le_threshold(m,4,symmetric=True,\
ignore_diagonal=True),(1,3))
# null and mixed values
m = array([\
[0,gDefaultNullValue,gDefaultNullValue],\
[3,4,gDefaultNullValue],\
[gDefaultNullValue,7,8]])
self.assertEqual(count_le_threshold(m,4),(3,5))
self.assertEqual(count_le_threshold(m,4,symmetric=True),(3,5))
self.assertEqual(count_le_threshold(m,4,ignore_diagonal=True),(1,2))
self.assertEqual(count_le_threshold(m,4,symmetric=True,\
ignore_diagonal=True),(1,2))
def test_aln_position_pairs_cmp_threshold_intramolecular(self):
"""aln_position_pairs_cmp_threshold: intramolecular matrix
"""
m = array([\
[0,gDefaultNullValue,gDefaultNullValue],\
[3,4,gDefaultNullValue],\
[gDefaultNullValue,7,8]])
# cmp_function = ge
self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,greater_equal),\
[(1,1),(2,1),(2,2)])
# cmp_function = greater_equal, alt null_value
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,greater_equal,null_value=4),\
[(2,1),(2,2)])
# cmp_function = le
self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,less_equal),\
[(0,0),(1,0)])
# results equal results with wrapper functions
self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,greater_equal),\
aln_position_pairs_ge_threshold(m,3.5))
self.assertEqual(aln_position_pairs_cmp_threshold(m,3.5,less_equal),\
aln_position_pairs_le_threshold(m,3.5))
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,greater_equal,null_value=4),\
aln_position_pairs_ge_threshold(m,3.5,null_value=4))
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,less_equal,null_value=0),\
aln_position_pairs_le_threshold(m,3.5,null_value=0))
def test_aln_position_pairs_ge_threshold_intermolecular(self):
"""aln_position_pairs_ge_threshold: intermolecular matrix
"""
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
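# Inferred from the expected pairs below: each pair is (column index,
# row index + number of columns), i.e. an aln1 position followed by an
# aln2 position offset by len(aln1) = 4. Row 0 has 10. and 4. >= 3.5 in
# columns 1 and 2, giving (1,4) and (2,4).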
# error if intermolecular_data_only=True is not specified
self.assertRaises(AssertionError,aln_position_pairs_cmp_threshold,\
m,3.5,greater_equal)
# cmp_function = ge
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,greater_equal,intermolecular_data_only=True),\
[(1,4),(2,4),(0,5),(1,5),(2,5),(3,5)])
# cmp_function = greater_equal, alt null_value
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,greater_equal,null_value=18.,intermolecular_data_only=True),\
[(1,4),(2,4),(0,5),(2,5),(3,5)])
# cmp_function = le
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,less_equal,intermolecular_data_only=True),\
[(0,4),(3,4)])
# results equal results with wrapper functions
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,greater_equal,intermolecular_data_only=True),\
aln_position_pairs_ge_threshold(m,3.5,intermolecular_data_only=True))
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,less_equal,intermolecular_data_only=True),\
aln_position_pairs_le_threshold(m,3.5,intermolecular_data_only=True))
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,greater_equal,null_value=4.,intermolecular_data_only=True),\
aln_position_pairs_ge_threshold(m,3.5,null_value=4.,\
intermolecular_data_only=True))
self.assertEqual(aln_position_pairs_cmp_threshold(\
m,3.5,less_equal,null_value=18.,intermolecular_data_only=True),\
aln_position_pairs_le_threshold(m,3.5,null_value=18.,\
intermolecular_data_only=True))
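# Reading of the parsimony-informative cases below, inferred from the
# assertions rather than from the implementation: characters in ignored
# (apparently '-' and '?' by default) are dropped first. With strict=False
# a site is informative when at least minimum_differences characters
# (default apparently 2) each occur at least minimum_count times (default
# apparently 2); strict=True additionally requires every remaining
# character to reach minimum_count, so {'A':2,'B':2,'C':1,'D':1,'E':1}
# passes only the non-strict test.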
def test_is_parsimony_informative_strict(self):
""" is_parsimony_informative functions as expected with strict=True
"""
freqs = {'A':25}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':25,'-':25}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':25,'?':25}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':25,'B':1}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':1,'B':1,'C':1,'D':1,'E':1}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':2,'B':1,'C':1,'D':1,'E':1}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':2,'B':2,'C':1,'D':1,'E':1}
self.assertFalse(is_parsimony_informative(freqs,strict=True))
freqs = {'A':25,'B':2}
self.assertTrue(is_parsimony_informative(freqs,strict=True))
freqs = {'A':2,'B':2,'C':2,'D':2,'E':2}
self.assertTrue(is_parsimony_informative(freqs,strict=True))
def test_is_parsimony_informative_non_strict(self):
""" is_parsimony_informative functions as expected with strict=False
"""
freqs = {'A':25}
self.assertFalse(is_parsimony_informative(freqs,strict=False))
freqs = {'A':25,'-':25}
self.assertFalse(is_parsimony_informative(freqs,strict=False))
freqs = {'A':25,'?':25}
self.assertFalse(is_parsimony_informative(freqs,strict=False))
freqs = {'A':25,'B':1}
self.assertFalse(is_parsimony_informative(freqs,strict=False))
freqs = {'A':1,'B':1,'C':1,'D':1,'E':1}
self.assertFalse(is_parsimony_informative(freqs,strict=False))
freqs = {'A':2,'B':1,'C':1,'D':1,'E':1}
self.assertFalse(is_parsimony_informative(freqs,strict=False))
freqs = {'A':2,'B':2,'C':1,'D':1,'E':1}
self.assertTrue(is_parsimony_informative(freqs,strict=False))
freqs = {'A':25,'B':2}
self.assertTrue(is_parsimony_informative(freqs,strict=False))
freqs = {'A':2,'B':2,'C':2,'D':2,'E':2}
self.assertTrue(is_parsimony_informative(freqs,strict=False))
def test_is_parsimony_informative_non_default(self):
""" is_parsimony_informative functions with non-default parameters
"""
## NEED TO UPDATE THESE TESTS BASED ON MY ERROR IN THE
## DEFINITION OF PARSIMONY INFORMATIVE.
# changed minimum_count
freqs = {'A':25,'B':2}
self.assertFalse(is_parsimony_informative(freqs,\
minimum_count=3,strict=False))
freqs = {'A':25,'B':1}
self.assertTrue(is_parsimony_informative(freqs,\
minimum_count=1,strict=False))
# different value of strict yields different results
freqs = {'A':25,'B':2,'C':3}
self.assertTrue(is_parsimony_informative(freqs,\
minimum_count=3,strict=False))
self.assertFalse(is_parsimony_informative(freqs,\
minimum_count=3,strict=True))
# changed minimum_differences
freqs = {'A':25,'B':25}
self.assertFalse(is_parsimony_informative(\
freqs,minimum_differences=3,strict=False))
freqs = {'A':25}
self.assertTrue(is_parsimony_informative(\
freqs,minimum_differences=1,strict=False))
# changed ignored
freqs = {'A':25,'-':25,'?':25}
self.assertTrue(is_parsimony_informative(freqs,ignored=None,\
strict=False))
freqs = {'A':25,'?':25}
self.assertTrue(is_parsimony_informative(freqs,ignored='',\
strict=False))
freqs = {'A':25,'-':25}
self.assertTrue(is_parsimony_informative(freqs,ignored=None,\
strict=False))
freqs = {'A':25,'C':25}
self.assertFalse(is_parsimony_informative(freqs,ignored='A',\
strict=False))
def test_filter_non_parsimony_informative_intramolecular(self):
""" non-parsimony informative sites in intramolecular matrix -> null
"""
aln = LoadSeqs(data={'1':'ACDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[gDefaultNullValue]*4]*4)
filter_non_parsimony_informative(aln,m)
self.assertFloatEqual(m,expected)
aln = LoadSeqs(data={'1':'ACDE','2':'FCDE','3':'ACDE','4':'FCDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[42.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[gDefaultNullValue]*4]*4)
expected[0,0] = 42.
filter_non_parsimony_informative(aln,m)
self.assertFloatEqual(m,expected)
def test_filter_non_parsimony_informative_intermolecular(self):
""" non-parsimony informative sites in intermolecular matrix -> null
"""
# all non-parsimony informative
aln = LoadSeqs(data={'1':'ACDEWQ','2':'ACDEWQ','3':'ACDEWQ','4':'ACDEWQ'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[gDefaultNullValue]*4]*2)
filter_non_parsimony_informative(aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
# one non-parsimony informative pair of positions
aln = LoadSeqs(data={'1':'FCDEWD','2':'ACDEWQ','3':'ACDEWD','4':'FCDEWQ'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[gDefaultNullValue]*4]*2)
expected[1,0] = 9.
filter_non_parsimony_informative(aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
# all parsimony informative
aln = LoadSeqs(data={'1':'FFFFFF','2':'FFFFFF','3':'GGGGGG','4':'GGGGGG'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
filter_non_parsimony_informative(aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
def test_filter_exclude_positions_intramolecular(self):
"""filter_exclude_positions: functions for intramolecular data
"""
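# Inferred from the cases below: a position is filtered (its row and
# column set to null_value) when the fraction of sequences carrying an
# excluded character there is greater than max_exclude_percent. One '-'
# in four sequences (0.25) survives max_exclude_percent=0.25 but is
# filtered at the default setting.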
# filter zero positions (no excludes)
aln = LoadSeqs(data={'1':'WCDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
filter_exclude_positions(aln,m)
self.assertFloatEqual(m,expected)
# filter zero positions (max_exclude_percent equals the excluded fraction)
aln = LoadSeqs(data={'1':'-CDE','2':'A-DE','3':'AC-E','4':'ACD-'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
filter_exclude_positions(aln,m,max_exclude_percent=0.25)
self.assertFloatEqual(m,expected)
# filter zero positions (max_exclude_percent too high to filter anything)
aln = LoadSeqs(data={'1':'-CDE','2':'A-DE','3':'AC-E','4':'ACD-'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
filter_exclude_positions(aln,m,max_exclude_percent=0.5)
self.assertFloatEqual(m,expected)
# filter one position (default max_exclude_percent)
aln = LoadSeqs(data={'1':'-CDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[gDefaultNullValue]*4,[gDefaultNullValue,18.,5.,6.],
[gDefaultNullValue,1.,3.,2.],[gDefaultNullValue,0.,1.,33.]])
filter_exclude_positions(aln,m)
self.assertFloatEqual(m,expected)
# filter one position (non-default max_exclude_percent)
aln = LoadSeqs(data={'1':'-CDE','2':'ACDE','3':'ACDE','4':'-CDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[gDefaultNullValue]*4,[gDefaultNullValue,18.,5.,6.],
[gDefaultNullValue,1.,3.,2.],[gDefaultNullValue,0.,1.,33.]])
filter_exclude_positions(aln,m,max_exclude_percent=0.49)
self.assertFloatEqual(m,expected)
# filter all positions (default max_exclude_percent)
aln = LoadSeqs(data={'1':'----','2':'ACDE','3':'ACDE','4':'ACDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[gDefaultNullValue]*4]*4)
filter_exclude_positions(aln,m)
self.assertFloatEqual(m,expected)
# filter all positions (non-default max_exclude_percent)
aln = LoadSeqs(data={'1':'----','2':'A-DE','3':'AC--','4':'-CDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,3.]])
expected = array([[gDefaultNullValue]*4]*4)
filter_exclude_positions(aln,m,max_exclude_percent=0.49)
self.assertFloatEqual(m,expected)
# filter one position (default max_exclude_percent,
# non-default excludes)
aln = LoadSeqs(data={'1':'WCDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[gDefaultNullValue]*4,[gDefaultNullValue,18.,5.,6.],
[gDefaultNullValue,1.,3.,2.],[gDefaultNullValue,0.,1.,33.]])
filter_exclude_positions(aln,m,excludes='W')
self.assertFloatEqual(m,expected)
# filter one position (default max_exclude_percent,
# non-default null_value)
aln = LoadSeqs(data={'1':'-CDE','2':'ACDE','3':'ACDE','4':'ACDE'},\
moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.],
[4.,1.,3.,2.],[21.,0.,1.,33.]])
expected = array([[999.]*4,[999.,18.,5.,6.],
[999.,1.,3.,2.],[999.,0.,1.,33.]])
filter_exclude_positions(aln,m,null_value=999.)
self.assertFloatEqual(m,expected)
def test_filter_exclude_positions_intermolecular(self):
"""filter_exclude_positions: functions for intermolecular data
"""
# these tests correspond to alignments of length 4 and 2 positions
# respectively, hence a coevolution_matrix with shape = (2,4)
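# Columns 0-3 of the merged alignment (aln1) index the matrix columns and
# columns 4-5 (aln2) index its rows, so a gap at merged column 5 nulls
# matrix row 1.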
# filter zero positions (no excludes)
merged_aln = LoadSeqs(data={'1':'WCDEDE','2':'ACDEDE',\
'3':'ACDEDE','4':'ACDEDE'},moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
# filter one position (aln1)
merged_aln = LoadSeqs(data={'1':'WC-EDE','2':'ACDEDE',\
'3':'ACDEDE','4':'ACDEDE'},moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[1.,10.,gDefaultNullValue,3.],\
[9.,18.,gDefaultNullValue,6.]])
filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
# filter one position (aln2)
merged_aln = LoadSeqs(data={'1':'WCEEDE','2':'ACDEDE',\
'3':'ACDEDE','4':'ACDED-'},moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[1.,10.,4.,3.],\
[gDefaultNullValue]*4])
filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
# filter two positions (aln1 & aln2)
merged_aln = LoadSeqs(data={'1':'-CEEDE','2':'ACDEDE',\
'3':'ACDEDE','4':'ACDED-'},moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[gDefaultNullValue,10.,4.,3.],\
[gDefaultNullValue]*4])
filter_exclude_positions(merged_aln,m,intermolecular_data_only=True)
self.assertFloatEqual(m,expected)
# filter two positions (aln1 & aln2, alt excludes)
merged_aln = LoadSeqs(data={'1':'WCEEDE','2':'ACDEDE',\
'3':'ACDEDE','4':'ACDEDW'},moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[gDefaultNullValue,10.,4.,3.],\
[gDefaultNullValue]*4])
filter_exclude_positions(merged_aln,m,intermolecular_data_only=True,\
excludes='W')
self.assertFloatEqual(m,expected)
# filter two positions (aln1 & aln2, alt null_value)
merged_aln = LoadSeqs(data={'1':'-CEEDE','2':'ACDEDE',\
'3':'ACDEDE','4':'ACDED-'},moltype=PROTEIN,aligned=DenseAlignment)
m = array([[1.,10.,4.,3.],[9.,18.,5.,6.]])
expected = array([[999.,10.,4.,3.],\
[999.]*4])
filter_exclude_positions(merged_aln,m,intermolecular_data_only=True,\
null_value=999.)
self.assertFloatEqual(m,expected)
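# Inferred from the expected matrices below (not from the implementation):
# the fourth argument acts as a maximum count. Any position with more than
# that many scores passing cmp_function(score, threshold) is filtered, i.e.
# its row or column is set to the null value. In the intermolecular case
# with a count of 1, row 3 (1.00, 0.95) and column 0 (0.95, 1.00) each have
# two scores >= 0.95, so only they are nulled.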
def test_filter_threshold_based_multiple_interdependency_intermolecular(self):
"multiple interdependency filter functions with intermolecular data "
## cmp_function = ge
# lower boundary
null = gDefaultNullValue
m = array([[0.63,0.00,null],\
[0.75,0.10,0.45],\
[0.95,0.32,0.33],\
[1.00,0.95,0.11]])
expected = array([[null,null,null],\
[null,null,0.45],\
[null,null,null],\
[null,null,null]])
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.95,0,greater_equal,True)
self.assertFloatEqual(actual,expected)
# realistic test case
m = array([[0.63,0.00,null],\
[0.75,0.10,0.45],\
[0.95,0.32,0.33],\
[1.00,0.95,0.11]])
expected = array([[null,0.00,null],\
[null,0.10,0.45],\
[null,0.32,0.33],\
[null,null,null]])
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.95,1,greater_equal,True)
self.assertFloatEqual(actual,expected)
# upper boundary, nothing filtered
null = gDefaultNullValue
m = array([[0.63,0.00,null],\
[0.75,0.10,0.45],\
[0.95,0.32,0.33],\
[1.00,0.95,0.11]])
expected = m
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.95,5,greater_equal,True)
self.assertFloatEqual(actual,expected)
# cmp_function = less_equal, realistic test case
m = array([[0.63,0.00,null],\
[0.75,0.10,0.45],\
[0.95,0.32,0.33],\
[1.00,0.95,0.11]])
expected = array([[0.63,null,null],\
[0.75,null,null],\
[null,null,null],\
[1.00,null,null]])
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.35,1,less_equal,True)
self.assertFloatEqual(actual,expected)
def test_filter_threshold_based_multiple_interdependency_intramolecular(self):
"multiple interdependency filter functions with intramolecular data "
null = gDefaultNullValue
## cmp_function = ge
# lower bound, everything filtered
m = array([[0.63,0.75,0.95,1.00],\
[0.75,0.10,null,0.95],\
[0.95,null,0.33,0.11],\
[1.00,0.95,0.11,1.00]])
expected = array([[null,null,null,null],\
[null,null,null,null],\
[null,null,null,null],\
[null,null,null,null]])
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.95,0,greater_equal)
self.assertFloatEqual(actual,expected)
# realistic test case
m = array([[0.63,0.75,0.95,1.00],\
[0.75,0.10,null,0.95],\
[0.95,null,0.33,0.11],\
[1.00,0.95,0.11,1.00]])
expected = array([[null,null,null,null],\
[null,0.10,null,null],\
[null,null,0.33,null],\
[null,null,null,null]])
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.95,1,greater_equal)
self.assertFloatEqual(actual,expected)
# upper boundary, nothing filtered
m = array([[0.63,0.75,0.95,1.00],\
[0.75,0.10,null,0.95],\
[0.95,null,0.33,0.11],\
[1.00,0.95,0.11,1.00]])
expected = m
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.95,5,greater_equal)
self.assertFloatEqual(actual,expected)
## cmp_function = le
# realistic test case
m = array([[0.63,0.75,0.95,1.00],\
[0.75,0.10,null,0.95],\
[0.95,null,0.33,0.11],\
[1.00,0.95,0.11,1.00]])
expected = array([[0.63,0.75,null,1.00],\
[0.75,0.10,null,0.95],\
[null,null,null,null],\
[1.00,0.95,null,1.00]])
actual = filter_threshold_based_multiple_interdependency(\
None,m,0.33,1,less_equal)
self.assertFloatEqual(actual,expected)
def test_probs_from_dict(self):
"""probs_from_dict: dict of probs -> list of probs in alphabet's order
"""
d = {'A':0.25,'D':0.52,'C':0.23}
a = list('ACD')
self.assertFloatEqual(probs_from_dict(d,a),[0.25,0.23,0.52])
a = list('ADC')
self.assertFloatEqual(probs_from_dict(d,a),[0.25,0.52,0.23])
a = list('DCA')
self.assertFloatEqual(probs_from_dict(d,a),[0.52,0.23,0.25])
a = CharAlphabet('DCA')
self.assertFloatEqual(probs_from_dict(d,a),[0.52,0.23,0.25])
# protein natural probs
l = probs_from_dict(protein_dict,AAGapless)
for i in range(20):
self.assertFloatEqual(l[i],protein_dict[AAGapless[i]],0.001)
def test_freqs_from_aln(self):
"""freqs_from_aln: freqs of alphabet chars in aln is calc'ed correctly
"""
# non-default scaled_aln_size
aln = DenseAlignment(data=zip(range(4),['ACGT','AGCT','ACCC','TAGG']),\
MolType=PROTEIN)
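# Worked example (inferred from the expected values below): the 16
# characters contain A=4, C=5, G=4, T=3. With scaled_aln_size=16 these are
# returned as-is; with the default of 100 they become 4/16*100 = 25,
# 5/16*100 = 31.25, 25 and 18.75. In the 'TWGG' case the W is not counted
# but the denominator stays 16, so A drops to 3/16*100 = 18.75.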
alphabet = 'ACGT'
expected = [4,5,4,3]
self.assertEqual(freqs_from_aln(aln,alphabet,16),expected)
# change the order of the alphabet
alphabet = 'TGCA'
expected = [3,4,5,4]
self.assertEqual(freqs_from_aln(aln,alphabet,16),expected)
# default scaled_aln_size, sums of freqs == 100
alphabet = 'ACGT'
expected = [25.,31.25,25,18.75]
self.assertEqual(freqs_from_aln(aln,alphabet),expected)
# alphabet char absent from the alignment (W) gets zero freq
alphabet = 'ACGTW'
expected = [25.,31.25,25,18.75,0]
self.assertEqual(freqs_from_aln(aln,alphabet),expected)
# alignment char not in the alphabet (W) is silently ignored
aln = DenseAlignment(data=zip(range(4),['ACGT','AGCT','ACCC','TWGG']),\
MolType=PROTEIN)
alphabet = 'ACGT'
expected = [18.75,31.25,25,18.75]
self.assertEqual(freqs_from_aln(aln,alphabet),expected)
def test_freqs_to_array(self):
"""freqs_to_array: should convert Freqs object to array"""
#should work with empty object
f = Freqs()
f2a = freqs_to_array
self.assertFloatEqual(f2a(f, AAGapless), zeros(20))
#should work with full object, omitting unwanted keys
f = Freqs({'A':20, 'Q':30, 'X':20})
expected = zeros(20)
expected[AAGapless.index('A')] = 20
expected[AAGapless.index('Q')] = 30
self.assertFloatEqual(expected, f2a(f, AAGapless))
#should work for normal dict and any alphabet
d = {'A':3,'D':1,'C':5,'E':2}
alpha = "ABCD"
exp = array([3,0,5,1])
self.assertFloatEqual(f2a(d,alpha),exp)
def test_get_allowed_perturbations(self):
"""get_allowed_perturbations: should work for different cutoff values
"""
counts = [50,40,10,0]
a = list('ACGT')
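# Worked example (inferred from the assertions below): the counts sum to
# 100, giving fractions A=0.5, C=0.4, G=0.1, T=0.0, and a character seems
# to be allowed when its fraction is >= the cutoff, e.g. cutoff 0.40 ->
# ['A','C'] and cutoff 0.0 -> all four.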
self.assertEqual(get_allowed_perturbations(counts,1.0,a),[])
self.assertEqual(get_allowed_perturbations(counts,0.51,a),[])
self.assertEqual(get_allowed_perturbations(counts,0.5,a),['A'])
self.assertEqual(get_allowed_perturbations(counts,0.49,a),['A'])
self.assertEqual(get_allowed_perturbations(counts,0.401,a),['A'])
self.assertEqual(get_allowed_perturbations(counts,0.40,a),['A','C'])
self.assertEqual(get_allowed_perturbations(counts,0.399,a),['A','C'])
self.assertEqual(get_allowed_perturbations(counts,0.10,a),\
['A','C','G'])
self.assertEqual(get_allowed_perturbations(counts,0.0,a),a)
def test_get_subalignments(self):
"""get_subalignments: works with different alignment sizes and cutoffs
"""
aln = DenseAlignment(\
data={1:'AAAA',2:'AAAC',3:'AACG',4:'ACCT',5:'ACG-'},\
MolType=PROTEIN)
sub_aln_0A = DenseAlignment(\
data={1:'AAAA',2:'AAAC',3:'AACG',4:'ACCT',5:'ACG-'},\
MolType=PROTEIN)
sub_aln_0C = {}
sub_aln_1A = DenseAlignment(data={1:'AAAA',2:'AAAC',3:'AACG'},\
MolType=PROTEIN)
sub_aln_1C = DenseAlignment(data={4:'ACCT',5:'ACG-'},MolType=PROTEIN)
sub_aln_2G = DenseAlignment(data={5:'ACG-'},MolType=PROTEIN)
self.assertEqual(get_subalignments(aln,0,['A']),[sub_aln_0A])
self.assertEqual(get_subalignments(aln,0,['C']),[sub_aln_0C])
self.assertEqual(get_subalignments(aln,1,['A']),[sub_aln_1A])
self.assertEqual(get_subalignments(aln,1,['C']),[sub_aln_1C])
self.assertEqual(get_subalignments(aln,1,['A','C']),\
[sub_aln_1A,sub_aln_1C])
self.assertEqual(get_subalignments(aln,2,['G']),[sub_aln_2G])
self.assertEqual(get_subalignments(aln,3,['-']),[sub_aln_2G])
def test_get_positional_frequencies_w_scale(self):
"""get_positional_frequencies: works with default scaled_aln_size"""
aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
MolType=PROTEIN)
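# Worked example: column 3 reads E,E,D,F across the four sequences, so
# over 'ACDEF' the counts are [0,0,1,2,1], which the default scaling (to
# 100, per the expected values) turns into [0,0,25,50,25].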
expected_0 = array([100.,0.,0.,0.,0.])
expected_1 = array([0.,25.,25.,25.,25.])
expected_2 = array([0.,0.,50.,50.,0.])
expected_3 = array([0.,0.,25.,50.,25.])
self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF'),expected_0)
self.assertFloatEqual(get_positional_frequencies(aln,1,'ACDEF'),expected_1)
self.assertFloatEqual(get_positional_frequencies(aln,2,'ACDEF'),expected_2)
self.assertFloatEqual(get_positional_frequencies(aln,3,'ACDEF'),expected_3)
# extra characters (W) are silently ignored -- is this the desired
# behavior?
aln = DenseAlignment(data={1:'WCDE',2:'ADDE',3:'AEED',4:'AFEF'},\
MolType=PROTEIN)
expected_0 = array([75.,0.,0.,0.,0.])
self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF'),expected_0)
# 20 residue amino acid alphabet
aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
MolType=PROTEIN)
expected = array([100.] + [0.]*19)
self.assertFloatEqual(get_positional_frequencies(aln,0,AAGapless),expected)
def test_get_positional_frequencies(self):
"""get_positional_frequencies: works with non-default scaled_aln_size
"""
aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
MolType=PROTEIN)
expected_0 = array([4.,0.,0.,0.,0.])
expected_1 = array([0.,1.,1.,1.,1.])
expected_2 = array([0.,0.,2.,2.,0.])
expected_3 = array([0.,0.,1.,2.,1.])
self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF',4),\
expected_0)
self.assertFloatEqual(get_positional_frequencies(aln,1,'ACDEF',4),\
expected_1)
self.assertFloatEqual(get_positional_frequencies(aln,2,'ACDEF',4),\
expected_2)
self.assertFloatEqual(get_positional_frequencies(aln,3,'ACDEF',4),\
expected_3)
# extra characters (W) are silently ignored -- is this the desired
# behavior?
aln = DenseAlignment(data={1:'WCDE',2:'ADDE',3:'AEED',4:'AFEF'},\
MolType=PROTEIN)
expected_0 = array([3.,0.,0.,0.,0.])
self.assertFloatEqual(get_positional_frequencies(aln,0,'ACDEF',4),\
expected_0)
# 20 residue amino acid alphabet
aln = DenseAlignment(data={1:'ACDE',2:'ADDE',3:'AEED',4:'AFEF'},\
MolType=PROTEIN)
expected = array([4.] + [0.]*19)
self.assertFloatEqual(get_positional_frequencies(aln,0,AAGapless,4),\
expected)
def test_validate_alphabet_invalid(self):
"""validate_alphabet: raises error on incompatible alphabet and freqs
"""
# len(alpha) > len(freqs)
self.assertRaises(ValueError,validate_alphabet,\
'ABC',{'A':0.5,'B':0.5})
self.assertRaises(ValueError,validate_alphabet,\
'ABCD',{'A':0.5,'B':0.5})
# len(alpha) == len(freqs)
self.assertRaises(ValueError,validate_alphabet,\
'AC',{'A':0.5,'B':0.5})
# len(alpha) < len(freqs)
self.assertRaises(ValueError,validate_alphabet,\
'A',{'A':0.5,'B':0.5})
self.assertRaises(ValueError,validate_alphabet,'',\
{'A':0.5,'B':0.5})
# different values, len(alpha) > len(freqs)
self.assertRaises(ValueError,validate_alphabet,[1,42,3],\
{42:0.5,1:0.5})
self.assertRaises(ValueError,validate_alphabet,CharAlphabet('ABC'),\
{'A':0.5,'C':0.5})
def test_validate_alphabet_valid(self):
"""validate_alphabet: does nothing on compatible alphabet and freqs
"""
validate_alphabet('AB',{'A':0.5,'B':0.5})
validate_alphabet(CharAlphabet('AB'),{'A':0.5,'B':0.5})
validate_alphabet([1,42,8],{1:0.5,42:0.25,8:0.25})
def test_validate_position_invalid(self):
"""validate_position: raises error on invalid position """
self.assertRaises(ValueError,validate_position,self.dna_aln,4)
self.assertRaises(ValueError,validate_position,self.dna_aln,42)
self.assertRaises(ValueError,validate_position,self.dna_aln,-1)
self.assertRaises(ValueError,validate_position,self.dna_aln,-199)
def test_validate_position_valid(self):
"""validate_position: does nothing on valid position """
validate_position(self.dna_aln,0)
validate_position(self.dna_aln,1)
validate_position(self.dna_aln,2)
validate_position(self.dna_aln,3)
def test_validate_alignment(self):
"""validate_alignment: ValueError on bad alignment characters"""
# ambiguous characters
aln = DenseAlignment(data={0:'BA',1:'AC',2:'CG',3:'CT',4:'TA'},\
MolType=PROTEIN)
self.assertRaises(ValueError,validate_alignment,aln)
aln = DenseAlignment(data={0:'NA',1:'AC',2:'CG',3:'CT',4:'TA'},\
MolType=DNA)
self.assertRaises(ValueError,validate_alignment,aln)
aln = DenseAlignment(data={0:'YA',1:'AC',2:'CG',3:'CU',4:'UA'},\
MolType=RNA)
self.assertRaises(ValueError,validate_alignment,aln)
aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CT',4:'TA'},\
MolType=PROTEIN)
validate_alignment(aln)
aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CT',4:'TA'},\
MolType=DNA)
validate_alignment(aln)
aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CU',4:'UA'},\
MolType=RNA)
validate_alignment(aln)
def test_coevolve_functions_validate_alignment(self):
"""coevolve_*: functions run validate_alignment"""
aln = DenseAlignment(\
data={'0':'BA','1':'AC','2':'CG','3':'CT','4':'TA'},\
MolType=PROTEIN)
self.assertRaises(ValueError,coevolve_pair,mi_pair,aln,0,1)
self.assertRaises(ValueError,coevolve_position,mi_position,aln,0)
self.assertRaises(ValueError,coevolve_alignment,mi_alignment,aln)
self.assertRaises(ValueError,coevolve_alignments,mi_alignment,aln,aln)
def test_get_positional_probabilities_w_non_def_num_seqs(self):
"""get_positional_probabilities: works w/ non-def num_seqs"""
freqs = [1.,2.,0.]
probs = [0.33,0.33,0.33]
expected = array([0.444411,0.218889,0.300763])
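# These expected values are consistent with binomial probabilities of
# observing freqs[i] occurrences in num_seqs=3 sequences with
# probability probs[i]:
#   3 * 0.33**1 * 0.67**2 = 0.444411   (freq 1)
#   3 * 0.33**2 * 0.67**1 = 0.218889   (freq 2)
#   1 * 0.33**0 * 0.67**3 = 0.300763   (freq 0)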
self.assertFloatEqual(get_positional_probabilities(freqs,probs,3),\
expected)
def test_get_dg(self):
"""get_dg: returns delta_g vector"""
p = [0.1,0.2,0.3]
a = [0.5,0.6,0.7]
expected = [log(0.1/0.5),log(0.2/0.6),log(0.3/0.7)]
self.assertFloatEqual(get_dg(p,a),expected)
def test_get_dgg(self):
"""get_dgg: returns delta_delta_g value given two delta_g vectors """
v1 = array([0.05,0.5,0.1])
v2 = array([0.03,0.05,0.1])
expected = sqrt(sum((v1 - v2) * (v1 - v2)))/100 * e
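# i.e. the Euclidean distance between v1 and v2 scaled by e/100:
# v1 - v2 = [0.02, 0.45, 0.0], sqrt(0.02**2 + 0.45**2) = sqrt(0.2029)
# ~= 0.45044, and 0.45044 / 100 * e ~= 0.012244.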
self.assertFloatEqual(get_dgg(v1,v2),expected)
def test_get_positional_probabilities_w_def_num_seqs(self):
"""get_positional_probabilities: works w/ num_seqs scaled to 100 (def)
"""
freqs = [15.,33.,52.]
probs = [0.33,0.33,0.33]
expected = array([2.4990e-5,0.0846,3.8350e-5])
self.assertFloatEqual(get_positional_probabilities(freqs,probs),\
expected,0.001)
def test_get_positional_probs_handles_rounding_error_in_freqs(self):
"""get_positional_probabilities: works w/ rounding error in freqs"""
# Since freqs are scaled to scaled_aln_size, rounding error can push the
# frequency at a perfectly conserved position slightly above
# scaled_aln_size. Test that this is tolerated rather than raising a
# ValueError.
# default scaled_aln_size
freqs = [100.0000000001,0.,0.]
probs = [0.33,0.33,0.33]
expected = array([7.102218e-49,4.05024e-18,4.05024e-18])
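# With 100 sequences these match the binomial probabilities of the extreme
# counts: 0.33**100 ~= 7.10e-49 (freq 100) and 0.67**100 ~= 4.05e-18
# (freq 0).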
self.assertFloatEqual(get_positional_probabilities(freqs,probs),\
expected)
# value that is truly over raises an error
freqs = [101.0000000001,0.,0.]
probs = [0.33,0.33,0.33]
self.assertRaises(ValueError,get_positional_probabilities,freqs,probs)
# non-default scaled_aln_size
freqs = [50.0000000001,0.,0.]
probs = [0.33,0.33,0.33]
expected = array([8.42747e-25,2.01252e-9,2.01252e-9])
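# With scaled_aln_size=50: 0.33**50 ~= 8.43e-25 (freq 50) and
# 0.67**50 ~= 2.01e-9 (freq 0).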
self.assertFloatEqual(get_positional_probabilities(freqs,probs,50),\
expected)
# value that is truly over raises an error
freqs = [51.0000000001,0.,0.]
probs = [0.33,0.33,0.33]
self.assertRaises(ValueError,get_positional_probabilities,\
freqs,probs,50)
def test_sca_input_validation(self):
"""sca_input_validation: handles sca-specific validation steps """
# MolType != PROTEIN makes background freqs required
self.assertRaises(ValueError,sca_input_validation,\
self.dna_aln,cutoff=0.4)
self.assertRaises(ValueError,sca_input_validation,\
self.rna_aln,cutoff=0.4)
# no cutoff -> ValueError
self.assertRaises(ValueError,sca_input_validation,self.protein_aln)
# low cutoff -> ValueError
self.assertRaises(ValueError,sca_input_validation,\
self.protein_aln,cutoff=-0.001)
# high cutoff -> ValueError
self.assertRaises(ValueError,sca_input_validation,\
self.protein_aln,cutoff=1.001)
# good cut-off -> no error
sca_input_validation(self.protein_aln,cutoff=0.50)
sca_input_validation(self.protein_aln,cutoff=0.0)
sca_input_validation(self.protein_aln,cutoff=1.0)
# only bad alphabet -> ValueError
self.assertRaises(ValueError,sca_input_validation,\
self.dna_aln,cutoff=0.5,alphabet='ABC')
# only bad background_freqs -> ValueError
self.assertRaises(ValueError,sca_input_validation,\
self.dna_aln,cutoff=0.5, background_freqs={'A':0.25, 'C':0.75})
# incompatible background_freqs & alphabet provided -> ValueError
self.assertRaises(ValueError,sca_input_validation,\
self.dna_aln,cutoff=0.5, alphabet='ABC', \
background_freqs={'A':0.25, 'C':0.75})
# default alphabet, background_freqs -> no error
sca_input_validation(self.protein_aln,cutoff=0.50)
# compatible non-default alphabet, background_freqs -> no error
sca_input_validation(self.dna_aln,cutoff=0.50,alphabet='A',\
background_freqs={'A':1.0})
## Note: a full set of validate_alphabet tests isn't needed here --
## it's tested on its own.
def test_sca_pair_no_error(self):
"""sca_pair: returns w/o error """
r = sca_pair(self.dna_aln,1,0,cutoff=0.50,alphabet='ACGT',\
background_freqs=self.dna_base_freqs)
r = coevolve_pair(sca_pair,self.dna_aln,1,0,cutoff=0.50,\
alphabet='ACGT',background_freqs=self.dna_base_freqs)
def test_sca_pair_return_all(self):
"""sca_pair: handles return_all by returning lists of proper length
"""
# two allowed_perturbations
a = 'ACGT'
aln = DenseAlignment(data={0:'AA',1:'AC',2:'CG',3:'CT',4:'TA'},\
MolType=DNA)
actual = sca_pair(aln,0,1,cutoff=0.33,return_all=True,alphabet=a,\
background_freqs=self.dna_base_freqs)
self.assertEqual(len(actual),2)
self.assertEqual(actual[0][0], 'A')
self.assertEqual(actual[1][0], 'C')
# one allowed_perturbations
a = 'ACGT'
aln = DenseAlignment(data={0:'AA',1:'AC',2:'AG',3:'CT',4:'TA'},\
MolType=DNA)
actual = sca_pair(aln,0,1,0.33,return_all=True,alphabet=a,\
background_freqs=self.dna_base_freqs)
self.assertEqual(len(actual),1)
self.assertEqual(actual[0][0], 'A')
# zero allowed_perturbations
actual = sca_pair(aln,0,1,1.0,return_all=True,alphabet=a,\
background_freqs=self.dna_base_freqs)
#expected = [('A',-1),('C',-1)]
expected = gDefaultNullValue
self.assertFloatEqual(actual,expected)
# pos1 == pos2
actual = sca_pair(aln,0,0,0.33,return_all=True,alphabet=a,\
background_freqs=self.dna_base_freqs)
#expected = [('A',-1),('C',-1)]
expected = [('A', 2.40381185618)]
self.assertFloatEqual(actual,expected)
def test_sca_pair_error(self):
"""sca_pair: returns w/ error when appropriate """
a = 'ACGT'
# pos1 out of range
self.assertRaises(ValueError,coevolve_pair,sca_pair,self.dna_aln,\
100,1,cutoff=0.50,alphabet=a,background_freqs=self.dna_base_freqs)
# pos2 out of range
self.assertRaises(ValueError,coevolve_pair,sca_pair,self.dna_aln,\
0,100,cutoff=0.50,alphabet=a,background_freqs=self.dna_base_freqs)
# pos1 & pos2 out of range
self.assertRaises(ValueError,coevolve_pair,sca_pair,self.dna_aln,\
100,100,cutoff=0.50,alphabet=a,\
background_freqs=self.dna_base_freqs)
# bad cut-off
self.assertRaises(ValueError,coevolve_pair,sca_pair,\
self.dna_aln,0,1,cutoff=1.2,\
alphabet=a,background_freqs=self.dna_base_freqs)
# incompatible alphabet and background freqs
self.assertRaises(ValueError,coevolve_pair,sca_pair,\
self.dna_aln,0,1,cutoff=0.2,alphabet=a)
self.assertRaises(ValueError,coevolve_pair,sca_pair,\
self.dna_aln,0,1,cutoff=0.2,alphabet='ACGTBC',\
background_freqs=self.dna_base_freqs)
def test_sca_position_no_error(self):
"""sca_position: returns w/o error """
r = sca_position(self.dna_aln,1,0.50,alphabet='ACGT',\
background_freqs=self.dna_base_freqs)
# sanity check -- coupling w/ self
self.assertFloatEqual(r[1],3.087,0.01)
r = sca_position(self.dna_aln_gapped,1,0.50,\
alphabet='ACGT',background_freqs=self.dna_base_freqs)
self.assertFloatEqual(r[1],3.387,0.01)
## same tests, but called via coevolve_position
r = coevolve_position(sca_position,self.dna_aln,1,cutoff=0.50,\
alphabet='ACGT',background_freqs=self.dna_base_freqs)
# sanity check -- coupling w/ self
self.assertFloatEqual(r[1],3.087,0.01)
r = coevolve_position(sca_position,self.dna_aln_gapped,1,cutoff=0.50,\
alphabet='ACGT',background_freqs=self.dna_base_freqs)
# sanity check -- coupling w/ self
self.assertFloatEqual(r[1],3.387,0.01)
def test_sca_position_error(self):
"""sca_position: returns w/ error when appropriate """
a = 'ACGT'
# position out of range
self.assertRaises(ValueError,coevolve_position,sca_position,\
self.dna_aln,100,cutoff=0.50,alphabet=a,\
background_freqs=self.dna_base_freqs)
# bad cutoff
self.assertRaises(\
ValueError,coevolve_position,sca_position,self.dna_aln,\
1,cutoff=-8.2,alphabet=a,background_freqs=self.dna_base_freqs)
# incompatible alphabet and background freqs
self.assertRaises(ValueError,coevolve_position,sca_position,\
self.dna_aln,0,cutoff=0.2,alphabet=a)
self.assertRaises(ValueError,coevolve_position,sca_position,\
self.dna_aln,0,cutoff=0.2,alphabet='ACGTBC',\
background_freqs=self.dna_base_freqs)
def test_sca_position_returns_same_as_sca_pair(self):
"""sca_position: returns same as sca_pair called on each pos """
expected = []
for i in range(len(self.dna_aln)):
expected.append(sca_pair(self.dna_aln,1,i,0.50,\
alphabet='ACGT',background_freqs=self.dna_base_freqs))
actual = sca_position(self.dna_aln,1,0.50,\
alphabet='ACGT',background_freqs=self.dna_base_freqs)
self.assertFloatEqual(actual,expected)
# change some of the defaults to make sure they make it through
bg_freqs = {'A':0.50,'C':0.50}
expected = []
for i in range(len(self.dna_aln)):
expected.append(sca_pair(self.dna_aln,1,i,0.50,\
alphabet='AC',null_value=52.,scaled_aln_size=20,\
background_freqs=bg_freqs))
actual = sca_position(self.dna_aln,1,0.50,alphabet='AC',\
null_value=52.,scaled_aln_size=20,background_freqs=bg_freqs)
self.assertFloatEqual(actual,expected)
def test_sca_alignment_no_error(self):
"""sca_alignment: returns w/o error """
r = sca_alignment(self.dna_aln,0.50,alphabet='ACGT',\
background_freqs=self.dna_base_freqs)
# sanity check -- coupling w/ self
self.assertFloatEqual(r[0][0],2.32222608171)
## same test, but called via coevolve_alignment
r = coevolve_alignment(sca_alignment,self.dna_aln,\
cutoff=0.50,alphabet='ACGT',\
background_freqs=self.dna_base_freqs)
# sanity check -- coupling w/ self
self.assertFloatEqual(r[0][0],2.32222608171)
def test_sca_alignment_error(self):
"""sca_alignment: returns w/ error when appropriate """
a = 'ACGT'
# incompatible alphabet and background freqs
self.assertRaises(ValueError,coevolve_alignment,sca_alignment,\
self.dna_aln,cutoff=0.2,alphabet=a)
self.assertRaises(ValueError,coevolve_alignment,sca_alignment,\
self.dna_aln,cutoff=0.2,alphabet='ACGTBC',\
background_freqs=self.dna_base_freqs)
def test_sca_alignment_returns_same_as_sca_position(self):
"""sca_alignment: returns same as sca_position on every position"""
expected = []
for i in range(len(self.dna_aln)):
expected.append(\
sca_position(self.dna_aln,i,0.50,alphabet='ACGT',\
background_freqs=self.dna_base_freqs))
actual = sca_alignment(self.dna_aln,0.50,alphabet='ACGT',\
background_freqs=self.dna_base_freqs)
self.assertFloatEqual(actual,expected)
# change some of the defaults to make sure they make it through
bg_freqs = {'A':0.50,'C':0.50}
expected = []
for i in range(len(self.dna_aln)):
expected.append(\
sca_position(self.dna_aln,i,0.50,alphabet='AC',\
null_value=52.0,scaled_aln_size=20,background_freqs=bg_freqs))
actual = sca_alignment(self.dna_aln,0.50,alphabet='AC',\
null_value=52.0,scaled_aln_size=20,background_freqs=bg_freqs)
self.assertFloatEqual(actual,expected)
def test_sca_pair_gpcr(self):
"""sca_pair: reproduces several GPCR data from Suel et al., 2003
"""
self.assertFloatEqual(sca_pair(self.gpcr_aln,295,18,0.32),0.12,0.1)
self.assertFloatEqual(sca_pair(self.gpcr_aln,295,124,0.32),1.86,0.1)
self.assertFloatEqual(sca_pair(self.gpcr_aln,295,304,0.32),0.3,0.1)
# covariation w/ self
self.assertFloatEqual(sca_pair(self.gpcr_aln,295,295,0.32),7.70358628)
def test_sca_position_gpcr(self):
"""sca_position: reproduces several GPCR data from Suel et al., 2003
"""
if not self.run_slow_tests: return
vector = sca_position(self.gpcr_aln,295,0.32)
self.assertFloatEqual(vector[18],0.12,0.1)
self.assertFloatEqual(vector[124],1.86,0.1)
self.assertFloatEqual(vector[304],0.3,0.1)
# covariation w/ self == null_value
self.assertFloatEqual(vector[295],nan)
def test_ltm_to_symmetric(self):
"""ltm_to_symmetric: makes lower-triangular (ltm) matrices symmetric"""
m = arange(9).reshape((3,3))
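# m is [[0,1,2],[3,4,5],[6,7,8]]; the expected result copies the lower
# triangle entries (3, 6, 7) into the mirrored upper-triangle positions.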
expected = [[0,3,6],[3,4,7],[6,7,8]]
self.assertEqual(ltm_to_symmetric(m),expected)
# non-square matrices not supported
self.assertRaises(AssertionError,\
ltm_to_symmetric,arange(10).reshape(5,2))
self.assertRaises(AssertionError,\
ltm_to_symmetric,arange(10).reshape(2,5))
def test_merge_alignments(self):
""" merging alignments of same moltype functions as expected"""
# PROTEIN
aln1 = DenseAlignment(data={'1':'AC','2':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1':'EF','2':'EG'},MolType=PROTEIN)
combined_aln = DenseAlignment(\
data={'1':'ACEF','2':'ADEG'},MolType=PROTEIN)
actual = merge_alignments(aln1,aln2)
self.assertEqual(actual,combined_aln)
self.assertEqual(actual.MolType,PROTEIN)
# RNA
aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA)
aln2 = DenseAlignment(data={'1':'GG','2':'UG'},MolType=RNA)
combined_aln = DenseAlignment(data={'1':'ACGG','2':'AUUG'},MolType=RNA)
actual = merge_alignments(aln1,aln2)
self.assertEqual(actual,combined_aln)
self.assertEqual(actual.MolType,RNA)
# DNA
aln1 = DenseAlignment(data={'1':'AC','2':'AT'},MolType=DNA)
aln2 = DenseAlignment(data={'1':'GG','2':'TG'},MolType=DNA)
combined_aln = DenseAlignment(data={'1':'ACGG','2':'ATTG'},MolType=DNA)
actual = merge_alignments(aln1,aln2)
self.assertEqual(actual,combined_aln)
self.assertEqual(actual.MolType,DNA)
def test_merge_alignments_ignores_id_following_plus(self):
""" merge_alignments ignores all seq id characters after '+' """
aln1 = DenseAlignment(data={'1+a':'AC','2+b':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(\
data={'1 + c':'EFW','2 + d':'EGY'},MolType=PROTEIN)
combined_aln = DenseAlignment(\
data={'1':'ACEFW','2':'ADEGY'},MolType=PROTEIN)
self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
# not all ids have a +
aln1 = DenseAlignment(data={'1':'AC','2+b':'AD'},MolType=PROTEIN)
aln2 = DenseAlignment(data={'1+c':'EFW','2':'EGY'},MolType=PROTEIN)
combined_aln = DenseAlignment(\
data={'1':'ACEFW','2':'ADEGY'},MolType=PROTEIN)
self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
def test_merge_alignments_different_moltype(self):
""" merging alignments of different moltype functions as expected"""
aln1 = DenseAlignment(data={'1':'AC','2':'AU'},MolType=RNA)
aln2 = DenseAlignment(data={'1':'EF','2':'EG'},MolType=PROTEIN)
combined_aln = DenseAlignment(data={'1':'ACEF','2':'AUEG'})
self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
aln1 = DenseAlignment(data={'1':'AC','2':'AT'},MolType=DNA)
aln2 = DenseAlignment(data={'1':'EF','2':'EG'},MolType=PROTEIN)
combined_aln = DenseAlignment(data={'1':'ACEF','2':'ATEG'})
self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
aln1 = DenseAlignment(data={'1':'AC','2':'AT'},MolType=DNA)
aln2 = DenseAlignment(data={'1':'UC','2':'UG'},MolType=RNA)
combined_aln = DenseAlignment(data={'1':'ACUC','2':'ATUG'})
self.assertEqual(merge_alignments(aln1,aln2),combined_aln)
def test_n_random_seqs(self):
"""n_random_seqs: functions as expected"""
aln1 = LoadSeqs(data=zip(list('abcd'),['AA','AC','DD','GG']),\
moltype=PROTEIN,aligned=DenseAlignment)
# Number of returned sequences correct
self.assertEqual(n_random_seqs(aln1,1).getNumSeqs(),1)
self.assertEqual(n_random_seqs(aln1,2).getNumSeqs(),2)
self.assertEqual(n_random_seqs(aln1,3).getNumSeqs(),3)
self.assertEqual(n_random_seqs(aln1,4).getNumSeqs(),4)
# Sequences are correct
new_aln = n_random_seqs(aln1,3)
self.assertEqual(new_aln.getNumSeqs(),3)
for n in new_aln.Names:
self.assertEqual(new_aln.getSeq(n),aln1.getSeq(n))
# Objects are equal when all are requested
self.assertEqual(n_random_seqs(aln1,4),aln1)
# Objects are not equal when subset are requested
self.assertNotEqual(n_random_seqs(aln1,3),aln1)
# In 1000 iterations, we get at least one different alignment --
# this tests the random selection
different = False
new_aln = n_random_seqs(aln1,2)
for i in range(1000):
new_aln2 = n_random_seqs(aln1,2)
if new_aln != new_aln2:
different = True
break
self.assertTrue(different)
class AncestorCoevolve(TestCase):
""" Tests of the ancestral state method for detecting coevolution """
def setUp(self):
"""Set up trees, ancestral states, and alignments shared by the tests"""
# t1, ancestral_states1, and aln1_* are used to test that when
# alternate seqs are used with the same tree and ancestral_states,
# the results vary when appropriate
self.t1 = LoadTree(treestring=\
'((A:0.5,B:0.5):0.5,(C:0.5,(D:0.5,E:0.5):0.5):0.5);')
self.ancestral_states1 = DenseAlignment(data={'root':'AAA',\
'edge.0':'AAA','edge.1':'AAA','edge.2':'AAA'},MolType=PROTEIN)
self.ancestral_states1_w_gaps = DenseAlignment(data={'root':'AAA',\
'edge.0':'AAA','edge.1':'A-A','edge.2':'AA-'},MolType=PROTEIN)
# no correlated changes count
self.aln1_1 = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
'D':'AAE','E':'AFA'},MolType=PROTEIN)
# 1 correlated change count
self.aln1_2 = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
'D':'AEE','E':'AFF'},MolType=PROTEIN)
# 1 different correlated change count
self.aln1_3 = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
'D':'AGE','E':'AFH'},MolType=PROTEIN)
# 3 correlated change counts
self.aln1_4 = DenseAlignment(data={'A':'AAC','B':'AGD','C':'AAA',\
'D':'AGE','E':'AFH'},MolType=PROTEIN)
# 8 correlated change counts
self.aln1_5 = DenseAlignment(data={'A':'YYC','B':'HGD','C':'AAA',\
'D':'AGE','E':'AFH'},MolType=PROTEIN)
self.aln1_w_gaps = DenseAlignment(data={'A':'AAC','B':'AAD','C':'AAA',\
'D':'AG-','E':'A-H'},MolType=PROTEIN)
# t2, ancestral_states2_*, and aln2 are used to test that when
# alternate ancestral states are used with the same aln and tree,
# the results vary when appropriate
self.t2 = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);')
self.ancestral_states2_1 = DenseAlignment(data={'root':'AA'},\
MolType=PROTEIN)
self.ancestral_states2_2 = DenseAlignment(data={'root':'CC'},\
MolType=PROTEIN)
self.ancestral_states2_3 = DenseAlignment(data={'root':'EF'},\
MolType=PROTEIN)
self.aln2 = DenseAlignment(data={'A':'AA','B':'CC','C':'CA'},\
MolType=PROTEIN)
# t3_*, ancestral_states3, and aln3 are used to test that when
# alternate trees are used with the same aln and ancestral_states,
# the results vary when appropriate
self.t3_1 = LoadTree(treestring='(A:0.5,(B:0.5,C:0.5):0.5);')
self.t3_2 = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
self.ancestral_states3 = DenseAlignment(\
data={'root':'CC','edge.0':'AD'},MolType=PROTEIN)
self.aln3 = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},\
MolType=PROTEIN)
def test_validate_ancestral_seqs_invalid(self):
"""validate_ancestral_seqs: ValueError on incompatible anc. seqs & tree
"""
# edge missing
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
ancestral_seqs=DenseAlignment(data={'root':'AA'},MolType=PROTEIN))
# root missing
self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
ancestral_seqs=DenseAlignment(data={'edge.0':'AA'},MolType=PROTEIN))
# correct numSeqs but wrong names
self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
ancestral_seqs=DenseAlignment(data={'root':'AA','edge.1':'AA'},\
MolType=PROTEIN))
self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
ancestral_seqs=DenseAlignment(data={'r':'AA','edge.0':'AA'},\
MolType=PROTEIN))
self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
ancestral_seqs=DenseAlignment(data={'r':'AA','e':'AA'},\
MolType=PROTEIN))
# different tree: invalid
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
MolType=PROTEIN)
self.assertRaises(ValueError,validate_ancestral_seqs,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'),\
ancestral_seqs=DenseAlignment(\
data={'root':'AA','e':'AA','edge.1':'AA'},MolType=PROTEIN))
def test_validate_ancestral_seqs_valid(self):
"""validate_ancestral_seqs: does nothing on compatible anc. seqs & tree
"""
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
# valid data -> no error
validate_ancestral_seqs(aln,tree=LoadTree(\
treestring='((A:0.5,B:0.5):0.5,C:0.5);'),\
ancestral_seqs=DenseAlignment(data={'root':'AA','edge.0':'AA'},\
MolType=PROTEIN))
# different tree: valid
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
MolType=PROTEIN)
validate_ancestral_seqs(aln,tree=LoadTree(\
treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'),\
ancestral_seqs=DenseAlignment(data=\
{'root':'AA','edge.0':'AA','edge.1':'AA'},MolType=PROTEIN))
def test_ancestral_states_input_validation(self):
"""ancestral_states_input_validation: all validation steps performed"""
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
MolType=PROTEIN)
# incompatible tree and ancestral states (more thorough testing in
# test_validate_ancestral_seqs)
self.assertRaises(ValueError,ancestral_states_input_validation,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'),\
ancestral_seqs=DenseAlignment(data={'root':'AA','e':'AA',\
'edge.1':'AA'},MolType=PROTEIN))
# no tree provided
self.assertRaises(ValueError,ancestral_states_input_validation,aln,\
ancestral_seqs=DenseAlignment(data={'root':'AA','e':'AA',\
'edge.1':'AA'},MolType=PROTEIN))
# incompatible tree and alignment (more tests in test_validate_tree)
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
self.assertRaises(ValueError,ancestral_states_input_validation,aln,\
tree=LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);'))
def test_validate_tree_valid(self):
"""validate_tree: does nothing on compatible tree and aln """
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);')
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
MolType=PROTEIN)
validate_tree(aln,t)
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
aln = DenseAlignment(\
data={'A':'AC','B':'CA','C':'CC'},MolType=PROTEIN)
validate_tree(aln,t)
def test_validate_tree_invalid(self):
"""validate_tree: raises ValueError on incompatible tree and aln """
# tree and aln contain different numbers of sequences
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
MolType=PROTEIN)
self.assertRaises(ValueError,validate_tree,aln,t)
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,D:0.5):0.5);')
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC'},\
MolType=PROTEIN)
self.assertRaises(ValueError,validate_tree,aln,t)
# same number of sequences in tree and aln, but different names
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,(C:0.5,Dee:0.5):0.5);')
aln = DenseAlignment(data={'A':'AC','B':'CA','C':'CC','D':'DD'},\
MolType=PROTEIN)
self.assertRaises(ValueError,validate_tree,aln,t)
def test_get_ancestral_seqs(self):
"""get_ancestral_seqs: returns valid collection of ancestral seqs """
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN)
expected = DenseAlignment(data={'root':'AA','edge.0':'AA'},\
MolType=PROTEIN)
self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected)
t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},\
MolType=PROTEIN)
expected = DenseAlignment(data={'root':'AA'},MolType=PROTEIN)
self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected)
t = LoadTree(treestring='(((A1:0.5,A2:0.5):0.5,B:0.5):0.5,\
(C:0.5,D:0.5):0.5);')
aln = DenseAlignment(data={'A1':'AD','A2':'AD','B':'AC',\
'C':'AC','D':'AC'},MolType=PROTEIN)
expected = DenseAlignment(data={'root':'AC','edge.0':'AD',\
'edge.1':'AC','edge.2':'AC'},MolType=PROTEIN)
self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected)
def test_get_ancestral_seqs_handles_gaps(self):
"""get_ancestral_seqs: handles gaps """
# Gaps handled OK
t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);')
aln = DenseAlignment(data={'A':'A-','B':'AA','C':'AA'},MolType=PROTEIN)
expected = DenseAlignment(data={'root':'AA'},MolType=PROTEIN)
self.assertEqual(get_ancestral_seqs(aln,t, optimise=False),expected)
def test_get_ancestral_seqs_handles_ambiguous_residues(self):
"""get_ancestral_seqs: handles ambiguous residues """
# Non-canonical residues handled OK
t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AX','B':'Z-','C':'BC'},MolType=PROTEIN)
actual = get_ancestral_seqs(aln,t, optimise=False)
self.assertEqual(len(actual),2)
self.assertEqual(actual.getNumSeqs(),1)
def test_ancestral_state_alignment_handles_ancestral_state_calc(self):
"""ancestral_state_alignment: functions when calc'ing ancestral states
"""
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN)
self.assertEqual(ancestral_state_alignment(aln,t),[[0,0],[0,2]])
# non-bifurcating tree
t = LoadTree(treestring='(A:0.5,B:0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN)
self.assertEqual(ancestral_state_alignment(aln,t),[[0,0],[0,2]])
def test_ancestral_state_position_handles_ancestral_state_calc(self):
"""ancestral_state_position: functions when calc'ing ancestral states
"""
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN)
self.assertEqual(ancestral_state_position(aln,t,0),[0,0])
self.assertEqual(ancestral_state_position(aln,t,1),[0,2])
def test_ancestral_state_pair_handles_ancestral_state_calc(self):
"""ancestral_state_pair: functions when calc'ing ancestral states
"""
t = LoadTree(treestring='((A:0.5,B:0.5):0.5,C:0.5);')
aln = DenseAlignment(data={'A':'AA','B':'AA','C':'AC'},MolType=PROTEIN)
self.assertEqual(ancestral_state_pair(aln,t,0,0),0)
self.assertEqual(ancestral_state_pair(aln,t,0,1),0)
self.assertEqual(ancestral_state_pair(aln,t,1,1),2)
self.assertEqual(ancestral_state_pair(aln,t,1,0),0)
def test_ancestral_state_alignment_no_error_on_gap(self):
"""ancestral_state_alignment: return w/o error with gapped seqs """
ancestral_state_alignment(self.aln1_w_gaps,self.t1,\
self.ancestral_states1)
ancestral_state_alignment(self.aln1_1,self.t1,\
self.ancestral_states1_w_gaps)
def test_ancestral_state_methods_handle_bad_ancestor_aln(self):
"""ancestral state methods raise error on bad ancestor alignment """
# bad length and seq names
self.assertRaises(ValueError,coevolve_alignment,\
ancestral_state_alignment,self.aln1_2,\
tree=self.t1,ancestral_seqs=self.ancestral_states2_1)
self.assertRaises(ValueError,coevolve_position,\
ancestral_state_position,self.aln1_2,0,\
tree=self.t1,ancestral_seqs=self.ancestral_states2_1)
self.assertRaises(ValueError,coevolve_pair,\
ancestral_state_pair,self.aln1_2,0,1,\
tree=self.t1,ancestral_seqs=self.ancestral_states2_1)
# bad seq names
self.assertRaises(ValueError,coevolve_alignment,\
ancestral_state_alignment,self.aln1_2,\
tree=self.t1,ancestral_seqs=self.aln1_2)
self.assertRaises(ValueError,coevolve_position,\
ancestral_state_position,self.aln1_2,0,\
tree=self.t1,ancestral_seqs=self.aln1_2)
self.assertRaises(ValueError,coevolve_pair,\
ancestral_state_pair,self.aln1_2,0,1,\
tree=self.t1,ancestral_seqs=self.aln1_2)
# bad length
a = DenseAlignment(data={'root':'AC','edge.0':'AD','edge.1':'AA',\
'edge.2':'EE'})
self.assertRaises(ValueError,coevolve_alignment,\
ancestral_state_alignment,self.aln1_2,\
tree=self.t1,ancestral_seqs=a)
self.assertRaises(ValueError,coevolve_position,\
ancestral_state_position,self.aln1_2,0,\
tree=self.t1,ancestral_seqs=a)
self.assertRaises(ValueError,coevolve_pair,\
ancestral_state_pair,self.aln1_2,0,1,\
tree=self.t1,ancestral_seqs=a)
def test_ancestral_states_methods_handle_bad_position_numbers(self):
"""coevolve_* w/ ancestral_states raise ValueError on bad position
"""
self.assertRaises(ValueError,coevolve_position,\
ancestral_state_position,self.aln1_2,\
42,tree=self.t1,ancestral_states=self.ancestral_states2_1)
self.assertRaises(ValueError,coevolve_pair,\
ancestral_state_pair,self.aln1_2,\
0,42,tree=self.t1,ancestral_states=self.ancestral_states2_1)
self.assertRaises(ValueError,coevolve_pair,\
ancestral_state_pair,self.aln1_2,\
42,0,tree=self.t1,ancestral_states=self.ancestral_states2_1)
def test_ancestral_state_alignment_non_bifurcating_tree(self):
"""ancestral_state_alignment: handles non-bifurcating tree correctly
"""
self.assertEqual(ancestral_state_alignment(self.aln2,\
self.t2,self.ancestral_states2_3),[[9,9],[9,9]])
def test_ancestral_state_alignment_bifurcating_tree(self):
"""ancestral_state_alignment: handles bifurcating tree correctly """
self.assertFloatEqual(ancestral_state_alignment(self.aln1_5,\
self.t1,self.ancestral_states1),\
[[5,5,5],[5,11.6,11.6],[5,11.6,11.6]])
def test_ancestral_state_alignment_ancestor_difference(self):
"""ancestral_state_alignment: different ancestor -> different result
"""
# ancestral_states2_1
self.assertEqual(ancestral_state_alignment(self.aln2,\
self.t2,self.ancestral_states2_1),[[5,2],[2,2]])
# ancestral_states2_2
self.assertEqual(ancestral_state_alignment(self.aln2,\
self.t2,self.ancestral_states2_2),[[2,2],[2,5]])
# ancestral_states2_3
self.assertEqual(ancestral_state_alignment(self.aln2,\
self.t2,self.ancestral_states2_3),[[9,9],[9,9]])
def test_ancestral_state_position_ancestor_difference(self):
"""ancestral_state_position: different ancestor -> different result
"""
# ancestral_states2_1
self.assertEqual(ancestral_state_position(self.aln2,\
self.t2,0,self.ancestral_states2_1),[5,2])
self.assertEqual(ancestral_state_position(self.aln2,\
self.t2,1,self.ancestral_states2_1),[2,2])
# ancestral_states2_2
self.assertEqual(ancestral_state_position(self.aln2,\
self.t2,0,self.ancestral_states2_2),[2,2])
self.assertEqual(ancestral_state_position(self.aln2,\
self.t2,1,self.ancestral_states2_2),[2,5])
# ancestral_states2_3
self.assertEqual(ancestral_state_position(self.aln2,\
self.t2,0,self.ancestral_states2_3),[9,9])
self.assertEqual(ancestral_state_position(self.aln2,\
self.t2,1,self.ancestral_states2_3),[9,9])
def test_ancestral_state_pair_ancestor_difference(self):
"""ancestral_state_pair: different ancestor -> different result
"""
# ancestral_states2_1
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,0,0,self.ancestral_states2_1),5)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,0,1,self.ancestral_states2_1),2)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,1,1,self.ancestral_states2_1),2)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,1,0,self.ancestral_states2_1),2)
# ancestral_states2_2
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,0,0,self.ancestral_states2_2),2)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,0,1,self.ancestral_states2_2),2)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,1,1,self.ancestral_states2_2),5)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,1,0,self.ancestral_states2_2),2)
# ancestral_states2_3
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,0,0,self.ancestral_states2_3),9)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,0,1,self.ancestral_states2_3),9)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,1,1,self.ancestral_states2_3),9)
self.assertEqual(ancestral_state_pair(self.aln2,\
self.t2,1,0,self.ancestral_states2_3),9)
def test_ancestral_state_alignment_tree_difference(self):
"""ancestral_state_alignment: different result on different tree
"""
# tree: t3_1
self.assertEqual(ancestral_state_alignment(self.aln3,\
self.t3_1,self.ancestral_states3),[[7,5],[5,5]])
# tree: t3_2
self.assertEqual(ancestral_state_alignment(self.aln3,\
self.t3_2,self.ancestral_states3),[[2,2],[2,5]])
def test_ancestral_state_position_tree_difference(self):
"""ancestral_state_position: different result on different tree
"""
# tree: t3_1
self.assertEqual(ancestral_state_position(self.aln3,\
self.t3_1,0,self.ancestral_states3),[7,5])
self.assertEqual(ancestral_state_position(self.aln3,\
self.t3_1,1,self.ancestral_states3),[5,5])
# tree: t3_2
self.assertEqual(ancestral_state_position(self.aln3,\
self.t3_2,0,self.ancestral_states3),[2,2])
self.assertEqual(ancestral_state_position(self.aln3,\
self.t3_2,1,self.ancestral_states3),[2,5])
def test_ancestral_state_pair_tree_difference(self):
"""ancestral_state_pair: different result on different tree
"""
# tree: t3_1
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_1,0,1,self.ancestral_states3),5)
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_1,1,0,self.ancestral_states3),5)
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_1,0,0,self.ancestral_states3),7)
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_1,1,1,self.ancestral_states3),5)
# tree: t3_2
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_2,0,1,self.ancestral_states3),2)
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_2,1,0,self.ancestral_states3),2)
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_2,0,0,self.ancestral_states3),2)
self.assertFloatEqual(ancestral_state_pair(self.aln3,\
self.t3_2,1,1,self.ancestral_states3),5)
def test_ancestral_state_alignment_aln_difference(self):
"""ancestral_state_alignment: different aln -> different result
"""
expected = [[0,0,0],[0,2,0],[0,0,7.8]]
actual = ancestral_state_alignment(self.aln1_1,\
self.t1,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
expected = [[5,5,5],[5,11.6,11.6],[5,11.6,11.6]]
actual = ancestral_state_alignment(self.aln1_5,\
self.t1,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
def test_ancestral_state_position_aln_difference(self):
"""ancestral_state_position: different aln -> different result
"""
expected = [0,0,0]
actual = ancestral_state_position(self.aln1_1,\
self.t1,0,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
expected = [0,2,0]
actual = ancestral_state_position(self.aln1_1,\
self.t1,1,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
expected = [0,0,7.8]
actual = ancestral_state_position(self.aln1_1,\
self.t1,2,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
expected = [5,5,5]
actual = ancestral_state_position(self.aln1_5,\
self.t1,0,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
expected = [5,11.6,11.6]
actual = ancestral_state_position(self.aln1_5,\
self.t1,1,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
expected = [5,11.6,11.6]
actual = ancestral_state_position(self.aln1_5,\
self.t1,2,self.ancestral_states1)
self.assertFloatEqual(actual,expected)
def test_ancestral_state_pair_aln_difference(self):
"""ancestral_state_pair: different aln -> different result """
self.assertFloatEqual(ancestral_state_pair(self.aln1_1,self.t1,0,0,\
self.ancestral_states1),0)
self.assertFloatEqual(ancestral_state_pair(self.aln1_1,self.t1,1,1,\
self.ancestral_states1),2)
self.assertFloatEqual(ancestral_state_pair(self.aln1_1,self.t1,2,2,\
self.ancestral_states1),7.8)
self.assertFloatEqual(ancestral_state_pair(self.aln1_5,self.t1,0,1,\
self.ancestral_states1),5)
self.assertFloatEqual(ancestral_state_pair(self.aln1_5,self.t1,0,2,\
self.ancestral_states1),5)
self.assertFloatEqual(ancestral_state_pair(self.aln1_5,self.t1,1,2,\
self.ancestral_states1),11.6)
def test_ancestral_state_pair_symmetry(self):
"""ancestral_state_pair: value[i,j] == value[j,i] """
self.assertFloatEqual(ancestral_state_pair(self.aln1_5,self.t1,0,1,\
self.ancestral_states1),ancestral_state_pair(\
self.aln1_5,self.t1,1,0,self.ancestral_states1))
self.assertFloatEqual(ancestral_state_pair(self.aln1_5,self.t1,0,2,\
self.ancestral_states1),ancestral_state_pair(\
self.aln1_5,self.t1,2,0,self.ancestral_states1))
self.assertFloatEqual(ancestral_state_pair(self.aln1_5,self.t1,1,2,\
self.ancestral_states1),ancestral_state_pair(\
self.aln1_5,self.t1,2,1,self.ancestral_states1))
def est_ancestral_state_methods_handle_alt_null_value(self):
"""ancestral state methods handle non-default null value """
# Disabled: the 'est_' prefix keeps the test loader from collecting it.
# This test needs to be rewritten. Right now there is no way to get
# null values into the ancestral states result, but that will change
# once the exclude handling is fixed.
pass
class GctmpcaTests(TestCase):
def setUp(self):
self.run_slow_tests = int(environ.get('TEST_SLOW_APPC',0))
self.run_gctmpca_tests = app_path('calculate_likelihood')
# Data used by Gctmpca tests
self.l1 = "1\t2\t42.60\n"
self.l2 = "2\t3\t0.60"
self.lines = ["Position 1\tPosition 2\tScore\n",self.l1,self.l2]
self.aln = DenseAlignment(\
[('A1','AACF'),('A12','AADF'),('A123','ADCF')],\
MolType=PROTEIN)
self.rna_aln = DenseAlignment(\
[('A1','AACU'),('A12','AAGG'),('A123','ADCA')],\
MolType=RNA)
self.dna_aln = DenseAlignment(\
[('A1','AACT'),('A12','AAGG'),('A123','ADCA')],\
MolType=DNA)
self.tree = LoadTree(treestring="(A1:0.5,(A12:0.5,A123:0.5):0.5);")
self.aln4 = DenseAlignment([('A','AACF'),('AB','AADF'),\
('ABC','ADCF'),('AAA','AADE')],\
MolType=PROTEIN)
self.tree4 = LoadTree(treestring=\
"((A:0.5,AAA:0.5):0.5,(AB:0.5,ABC:0.5):0.5);")
def test_parse_gctmpca_result_line(self):
"""Gctmpca: result line parsing functions as expected """
exp1 = (0,1,42.60)
exp2 = (1,2,0.60)
self.assertFloatEqual(parse_gctmpca_result_line(self.l1),exp1)
self.assertFloatEqual(parse_gctmpca_result_line(self.l2),exp2)
def test_parse_gctmpca_result(self):
"""Gctmpca: result (as list) yields the correct matrix """
exp1 = array([\
[gDefaultNullValue,42.60,gDefaultNullValue],\
[42.60,gDefaultNullValue,0.60],\
[gDefaultNullValue,0.60,gDefaultNullValue]])
self.assertFloatEqual(parse_gctmpca_result(self.lines,3),exp1)
exp2 = array([\
[gDefaultNullValue,42.60,gDefaultNullValue,gDefaultNullValue],\
[42.60,gDefaultNullValue,0.60,gDefaultNullValue],\
[gDefaultNullValue,0.60,gDefaultNullValue,gDefaultNullValue],\
[gDefaultNullValue,gDefaultNullValue,\
gDefaultNullValue,gDefaultNullValue]])
self.assertFloatEqual(parse_gctmpca_result(self.lines,4),exp2)
self.assertRaises(ValueError,parse_gctmpca_result,self.lines,2)
def test_create_gctmpca_input(self):
"""Gctmpca: create_gctmpca_input generates proper data """
seqs1, tree1, seq_names, seq_to_species1 = \
create_gctmpca_input(self.aln,self.tree)
exp_seqs1 = ['3 4','A1.. AACF', 'A12. AADF','A123 ADCF','\n']
exp_tree1 = ["(A1..:0.5,(A12.:0.5,A123:0.5):0.5);",'\n']
exp_seq_names = ["A1..","A12.","A123",'\n']
exp_seq_to_species1 = ["A1..\tA1..","A12.\tA12.","A123\tA123",'\n']
self.assertEqual(seqs1,exp_seqs1)
self.assertEqual(tree1,exp_tree1)
self.assertEqual(seq_names,exp_seq_names)
self.assertEqual(seq_to_species1,exp_seq_to_species1)
def test_gctmpca_pair(self):
"""Gctmpca: pair method works on trivial data """
if not self.run_slow_tests: return
if not self.run_gctmpca_tests: return
# Note: the values in here are derived from the results of running
# this on the example data from the command line.
# More extensive tests are performed
# in the app controller test -- here I just want to make sure we're
# getting the values out.
actual = gctmpca_pair(self.aln,self.tree,2,3)
expected = 0.483244
self.assertFloatEqual(actual,expected)
actual = gctmpca_pair(self.aln4,self.tree4,2,3)
expected = 0.164630
self.assertFloatEqual(actual,expected)
# mol type = RNA
actual = gctmpca_pair(self.rna_aln,self.tree,3,4)
expected = 0.237138
self.assertFloatEqual(actual,expected)
# mol type = DNA => error
self.assertRaises(ValueError,gctmpca_pair,\
self.dna_aln,self.tree,2,3)
def test_gctmpca_alignment(self):
"""Gctmpca: alignment method works on trivial data """
if not self.run_slow_tests: return
if not self.run_gctmpca_tests: return
# Note: the values in here are derived from the results of running
# this on the example data from the command line.
# More extensive tests are performed
# in the app controller test -- here I just want to make sure we're
# getting the values out.
actual = gctmpca_alignment(self.aln,self.tree)
expected = array([\
[gDefaultNullValue,gDefaultNullValue,\
gDefaultNullValue,gDefaultNullValue],\
[gDefaultNullValue,gDefaultNullValue,0.483244,gDefaultNullValue],\
[gDefaultNullValue,0.483244,gDefaultNullValue,1.373131],\
[gDefaultNullValue,gDefaultNullValue,1.373131,gDefaultNullValue]])
self.assertFloatEqual(actual,expected)
def test_build_q_yields_roughly_gctmpca_default(self):
"""build_rate_matrix: from DSO78 data yields Yeang's default Q
"""
# Note: This doesn't reproduce the exact values, which I expect is
# due to the two Dayhoff matrices in the PAML data. I think they're
# using the second one (which is from Dayhoff et al., 1978) and we're
# using the first one. What is the difference between these two?
aa_order = 'ACDEFGHIKLMNPQRSTVWY'
q = build_rate_matrix(DSO78_matrix,DSO78_freqs,aa_order=aa_order)
expected = []
for row_aa in aa_order:
expected.append([default_gctmpca_aa_sub_matrix[row_aa][col_aa] \
for col_aa in aa_order])
self.assertFloatEqual(q,expected,3)
def test_build_q_ignores_zero_counts(self):
"""build_rate_matrix: recoded counts (i.e., w/ many 0s) yields the right Q
"""
# Test that when working with reduced counts, counts and freqs
# that equal 0.0 don't affect the calculation of Q.
aa_order_3 = 'ACD'
count_matrix_3 = [[0,3,9],[3,0,6],[9,6,0]]
aa_freqs_3 = {'A':0.5,'C':0.3,'D':0.2}
sm = Empirical(\
rate_matrix=array(count_matrix_3),\
motif_probs=aa_freqs_3,\
alphabet=Alphabet(aa_order_3),recode_gaps=True,do_scaling=True,\
name="",optimise_motif_probs=False)
wprobs = array([aa_freqs_3[aa] for aa in aa_order_3])
mprobs_matrix=ones((wprobs.shape[0],wprobs.shape[0]),float)*wprobs
q3 = sm.calcQ(wprobs, mprobs_matrix)
aa_freqs_20 = {}.fromkeys('ACDEFGHIKLMNPQRSTVWY',0.0)
aa_freqs_20['A'] = 0.5
aa_freqs_20['C'] = 0.3
aa_freqs_20['D'] = 0.2
count_matrix_20 = zeros(400).reshape(20,20)
count_matrix_20[0,1] = count_matrix_20[1,0] = 3
count_matrix_20[0,2] = count_matrix_20[2,0] = 9
count_matrix_20[1,2] = count_matrix_20[2,1] = 6
q_20 = build_rate_matrix(array(count_matrix_20),aa_freqs_20)
for i in range(20):
for j in range(20):
try:
# rates present in both q3 and q_20 are identical
self.assertEqual(q3[i,j],q_20[i,j])
except IndexError:
# and every entry of q_20 outside q3 is zero
self.assertEqual(q_20[i,j],0.)
# The following are support functions for ResampledMiTests
def make_freqs(c12):
c1,c2=Freqs(),Freqs()
for a,b in c12.expand():
c1 += a
c2 += b
return c1, c2
def make_sample(freqs):
d=[]
for i, s in enumerate(freqs.expand()):
d += [("s%d" % i, s)]
return LoadSeqs(data=d)
def _calc_mi():
"""Hand calculation of MI = H(i) + H(j) - H(i,j), in bits, for the test_scaled_mi data"""
from math import log
i = 37/42 * -log(37/42,2) - (5/42 * log(5/42,2))
j = 39/42 * -log(39/42,2) - (3/42 * log(3/42,2))
k = 34/42 * -log(34/42,2) - (3/42 * log(3/42,2)) - (5/42 * log(5/42,2))
return i+j-k
class ResampledMiTests(TestCase):
def setUp(self):
self.c12 = Freqs()
self.c12 += ['AA']*2
self.c12 += ['BB']*2
self.c12 += ['BC']
self.c1, self.c2 = make_freqs(self.c12)
self.aln = make_sample(self.c12)
def test_calc_weights(self):
"""resampled mi weights should be correctly computed"""
w1 = make_weights(self.c1, 5)
w2 = make_weights(self.c2, 5)
e = [('A', {'C': 0.033333333333333333, 'B': 0.066666666666666666}),
('C', {'A': 0.050000000000000003, 'B': 0.050000000000000003}),
('B', {'A': 0.066666666666666666, 'C': 0.033333333333333333})]
weights = []
for w in w1,w2:
for k, d in w:
weights += d.values()
self.assertFloatEqual(sum(weights), 0.5)
self.assertEqual(w2, e)
def test_scaled_mi(self):
"""resampled mi should match hand calc"""
def calc_scaled(data, expected_smi):
col_i, col_j = Freqs(), Freqs()
for i, j in data:
col_i += i
col_j += j
pair_freqs = Freqs(data)
weights_i = make_weights(col_i.copy(), col_i.Sum)
weights_j = make_weights(col_j.copy(), col_j.Sum)
entropy = mi(col_i.Uncertainty, col_j.Uncertainty,
pair_freqs.Uncertainty)
self.assertFloatEqual(entropy, _calc_mi())
scales = calc_pair_scale(data, col_i, col_j, weights_i, weights_j)
scaled_mi = 1-sum([w*pair_freqs[pr] for pr, e, w in scales\
if entropy <= e])
self.assertFloatEqual(scaled_mi, expected_smi)
data = ['BN', 'BN', 'BP', 'BN', 'PN', 'BN', 'BN', 'BN', 'BN', 'BN',
'BN', 'BN', 'BN', 'PN', 'BN', 'PN', 'BN', 'BN', 'BN', 'BN',
'BN', 'BP', 'BN', 'BN', 'BN', 'BN', 'BP', 'BN', 'BN', 'BN',
'BN', 'PN', 'PN', 'BN', 'BN', 'BN', 'BN', 'BN', 'BN', 'BN',
'BN', 'BN']
calc_scaled(data, 8/42)
def test_resampled_mi_interface(self):
"""resampled_mi_alignment should correctly compute statistic from
alignment"""
arr = resampled_mi_alignment(self.aln)
# expected value from hand calculation
self.assertFloatEqual(arr.tolist(), [[1.,0.78333333],[0.78333333,1.]])
ALN_FILE=\
"""Seq_1 ACDEFG
Seq_2 STVWY-
Seq_3 WY.ZBX"""
# contains 'J', which is not a valid key
ALN_FILE_WRONG_KEY=\
"""Seq_1 ACDKLM
Seq_2 JINCK-
Seq_3 VX.MAB"""
# last sequence is too long
ALN_FILE_INC_SHAPE=\
"""Seq_1 ACDKLM
Seq_2 LINCK-
Seq_3 VX.MABN"""
gpcr_ungapped = """>OPSD_SPAAU
MNGTEGPFFYVPMVNTSGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWIMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFICHFSIPLTIVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIMMVIAFLVCWLPYAGVAWWIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>B1AR_CANFA
LPDGAATAARLLVPASPSASPLAPTSEGPAPLSQQWTAGIGLLMALIVLLIVAGNVLVIAAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVMRGRWEYGSFLCELWTSVDVLCVTASIETLCVIALDRYLAITAPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRRAFQRLLCCARRAARGSHGAAG------PPPSPG
>OPSB_HUMAN
---MRKMSEEEFYLFKNISSVGPWDGPQYHIAPVWAFYLQAAFMGTVFLIGFPLNAMVLVATLRYKKLRQPLNYILVNVSFGGFLLCIFSVFPVFVASCNGYFVFGRHVCALEGFLGTVAGLVTGWSLAFLAFERYIVICKPFGNFRFSSKHALTVVLATWTIGIGVSIPPFFGWSRFIPEGLQCSCGPDWYTVGTKYRSESYTWFLFIFCFIVPLSLICFSYTQLLRALKAVAAQQQESATTQKAEREVSRMVVVMVGSFCVCYVPYAAFAMYMVNNRNHGLDLRLVTIPSFFSKSACIYNPIIYCFMNKQFQACIMKMVCGKAMTD---ESDTCSTVSSTQVGPN-
>OPSD_PROCL
NP-Y-GNFTVVDMAPKDILHMIHPHWYQYPPMNPMMYPLLLIFMLFTGILCLAGNFVTIWVFMNTKSLRTPANLLVVNLAMSDFLMMFTMFPPMMVTCYYHTWTLGPTFCQVYAFLGNLCGCASIWTMVFITFDRYNVIVKGVAGEPLSTKKASLWILTIWVLSITWCIAPFFGWNRYVPEGNLTGCGTD--YLSEDILSRSYLYDYSTWVYYLPLLP-IYCYVSIIKAVAAHMGIRNEEAQKTSAECRLAKIAMTTVALWFIAWTPYLLINWVGMFARSY-LSPVYTIWGYVFAKANAVYNPIVYAISHPKYRAAMEKKLPCLSCKTESDDVSESAEEKAESA----
>BRB1_RABIT
ASQGPLELQPSNQSQLAPPNATSC--SGAPDAWDLLHRLLPTFIIAIFTLGLLGNSFVLSVFLLARRRLSVAEIYLANLAASDLVFVLGLPFWAENVRNQFDWPFGAALCRIVNGVIKANLFISIFLVVAISQDRYSVLVHPMSRRGRRRRQAQATCALIWLAGGLLSTPTFVLRSVRA-LN--SACILL---LPHEAWHWLRMVELNLLGFLLPLAAILFFNCHILASLRRR---RVPSRCGGPRDSKSTALILTLVASFLVCWAPYHFFAFLECLWQVHEFTDLGLQLSNFSAFVNSCLNPVIYVFVGRLFRTKVWELCQQCSPR---------LAPV-------S
>5H7_.ENLA
NLLPSEFMTERPLNTTEQDLTKPDCGKELLLYGDTEKIVIGVVLSIITLFTIAGNALVIISVCIVKKLRQPSNYLVVSLAAADLSVAVAVMPFVIITDLVGGWLFGKVFCNVFIAMDVMCCTASIMTLCVISVDRYLGITRPLYPARQNGKLMAKMVFIVWLLSASITLPPLFGWAK--N-V--RVCLIS--------QDFGYTVYSTAVAFYIPMTVMLVMYQRIFVAAKISSKLDRKNISIFKREQKAARTLGIIVGAFTFCWLPFFLLSTARPFICGICMPLRLERTLLWLGYTNSLINPLIYAFFNRDLRTTFWNLLRCKYTNINRRLSAASTERHEGIL----
>B3AR_FELCA
MAPWPHGNGSLASWPDAPTLTPNTANTSGLPGVPWAVALAGALLALAVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--HCCAFA--------SNIPYALLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALRTLGLIMGTFSLCWLPFFVANVVRALGGPS-VPSPAFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCRLEERHAAASGAAALTRPAESGLP
>TLR1_DROME
IIDNRDNLESINEAKDFLTECLFPSPTRPYELPWEQKTIWAIIFGLMMFVAIAGNGIVLWIVTGHRSMRTVTNYFLLNLSIADLLMSSLNCVFNFIFMLNSDWPFGSIYCTINNFVANVTVSTSVFTLVAISFDRYIAIVDPL-KRRTSRRKVRIILVLIWALSCVLSAPCLLYSS----SR--TVCFMMDGRYPTSMADYAYNLIILVLTTGIPMIVMLICYSLMGRVPGGSSIGTDRQMESMKSKRKVVRMFIAIVSIFAICWLPYHLFFIYAYHNNQVKYVQHMYLGFYWLAMSNAMVNPLIYYWMNKRFRMYFQRIICCCCVGLTRHRFDSPNSSNRHTRAETK
>ITR_CATCO
EQDFWSFNESSRNSTVGNETF-GNQTVNPLKRNEEVAKVEVTVLALVLFLALAGNLCVLIAIYTAKHTQSRMYYLMKHLSIADLVVAVFQVLPQLIWDITFRFYGPDFLCRLVKYLQTVGMFASTYMLVLMSIDRCIAICQPL--RSLHKRKDRCYVIVSWALSLVFSVPQVYIFSLRE----VYDCWGD---FVQPWGAKAYITWISLTIYIIPVAILGGCYGLISFKIWQNANAVSSVKLVSKAKITTVKMTFVIVLAYIVCWTPFFFVQMWSAWDPEA-REAMPFIISMLLASLNSCCNPWIYMFFAGHLFHDLKQSLLCCSTLYLKSSQCRCKSNCSTYVIKST
>NK1R_RANCA
----MNSNISAQNDSALNSTIQNGTKINQFIQPPWQIALWSVAYSIIVIVSLVGNIIVMWIIIAHKRMRTVTNYFLVNLAFAEASMSAFNTVINFTYAIHNHWYYGLIYCKFHNFFPISAVFTSIYSMTAIALDRYMAIIHPL-KPRLSATATKIVICVIWSFSFCMAFPLGYYAD---GG---DICYLNPDSEENRKYEQVYQVLVFCLIYILPLLVIGCAYTFIGMTLWAS--PSDRYHEQVVAKRKVVKMMIVVVCTFAICWLPFHIFFLLQTLHEMTKFYQQFYLAIMWLAMSSTMYNPIIYCCLNDRFRIGFKHVFRWCPFIRAGEY----STRYLQTQSSMY
>A2AA_MOUSE
----MGSLQPDAGNSSWNGTEAPGGGTRATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKVWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISIEKKGQ-P--PSCKIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRRGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLIAVGCP--VPSQLFNFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------
>MSHR_VULVU
SGQGPQRRLLGSPNATSPTTPHFKLAANQTGPRCLEVSIPNGLFLSLGLVSVVENVLVVAAIAKNRNLHSPMYYFIGCLAVSDLLVSVTNVLETAVMLLVEAAAVVQQLDDIIDVLICGSMVSSLCFLGAIAVDRYLSIFYALYHSIVTLPRAWRAISAIWVASVLSSTLFIAYYY----------------------NNHTAVLLCLVSFFVAMLVLMAVLYVHMLARARQHIARKRQHSVHQGFGLKGAATLTILLGIFFLCWGPFFLHLSLMVLCPQHGCVFQNFNLFLTLIICNSIIDPFIYAFRSQELRKTLQEVVLCSW-----------------------
>OPSD_BOVIN
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTPHEETNNESFVIYMFVVHFIIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKTSAVYNPVIYIMMNKQFRNCMVTTLCCGKNPLGDDEASTTVSKTETSQVAPA
>OPSG_MOUSE
DHYEDSTHASIFTYTNSNSTKGPFEGPNYHIAPRWVYHLTSTWMILVVVASVFTNGLVLAATMRFKKLRHPLNWILVNLAVADLAETIIASTISVVNQIYGYFVLGHPLCVIEGYIVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLATVGIVFSWVWAAIWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIFPLSIIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATAHPGYAFHPLVASLPSYFAKSATIYNPIIYVFMNRQFRNCILHLFGKKVDDS-----SELVSSV--SSVSPA
>TA2R_RAT
--MWLNS-----------TSLGACFRPVNITLQERRAIASPWFAASFCALGLGSNLLALSVLAGARPPRSSFLALLCGLVLTDFLGLLVTGAVVASQHAALLTDPGCRLCHFMGAAMVFFGLCPLLLGAAMAAERFVGITRPFSRPAATSRRAWATVGLVWVGAGTLGLLPLLGLGRYSVQYPGSWCFLT----LGAERGDVAFGLMFALLGSVSVGLSLLLNTVSVATLCRVYHAREATQRPRDCEVEMMVQLVGIMVVATVCWMPLLVFILQTLLQTLPRTTERQLLIYLRVATWNQILDPWVYILFRRSVLRRLHPRFTSQLQAVSLHSPPTQ------------
>CCR3_HUMAN
LENFSSSYDYGENESDSCCTSPPC---PQDFSLNFDRAFLPALYSLLFLLGLLGNGAVAAVLLSRRTALSSTDTFLLHLAVADTLLVLTLPLWAVDAAVQ--WVFGSGLCKVAGALFNINFYAGALLLACISFDRYLNIVHATLYRRGPPARVTLTCLAVWGLCLLFALPDFIFLSAHH-RL-ATHCQYN----FPQVGRTALRVLQLVAGFLLPLLVMAYCYAHILAVL---------LVSRGQRRLRAMRLVVVVVVAFALCWTPYHLVVLVDILMDLGSRVDVAKSVTSGLGYMHCCLNPLLYAFVGVKFRERMWMLLLRLGCPN----QRGL--------PSSS
>THRR_CRILO
LPEGRAIYLNKSHSPPAPLAPFISEDASGYLTSPWLRLFIPSVYTFVFVVSLPLNILAIAVFVLKMKVKKPAVVYMLHLAMADVLFVSVLPLKISYYFSGSDWQFGSGMCRFATAAFYCNMYASIMLMTVISIDRFLAVVYPISLSWRTLGRANFTCLVIWVMAIMGVVPLLLKEQTTR--N--TTCHDVLNETLLQGFYSYYFSAFSAVFFLVPLIISTICYMSIIRCL------SSSSVANRSKKSRALFLSAAVFCVFIVCFGPTNVLLIMHYLLLSDEKAYFAYLLCVCVSSVSCCIDPLIYYYASSECQRHLYGILCCKESSDPNSYNSTGDTCS--------
>AA2A_CAVPO
----------------------------------MSSSVYITVELVIAVLAILGNVLVCWAVWINSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFFACFVLVLTQSSIFSLLTITIDRYIAIRIPLYNGLVTCTRAKGIIAICWVLSFAIGLTPMLGWNNCSS-E-QVTCLFE-----DVVPMNYMVYYNFFAFVLVPLLLMLGIYLRIFLAARRQESQGERTRSTLQKEVHPAKSLAIIVGLFALCCLPLNIINCFTFFCPECHAPPWLMYLTIILSHGNSVVNPLIYAYRIREFRQTFRKIIRSHILRRRELFKAGGAHSPEGEQVSLR
>C3AR_MOUSE
--------------MESFDADTNSTDLHSRPLFQPQDIASMVILGLTCLLGLLGNGLVLWVAGVKMK-TTVNTVWFLHLTLADFLCCLSLPFSLAHLILQGHWPYGLFLCKLIPSIIILNMFASVFLLTAISLDRCLIVHKPICQNHRNVRTAFAICGCVWVVAFVMCVPVFVYRDLFI-ED-DYVDQFT-YDNHVPTPLMAITITRLVVGFLVPFFIMVICYSLIVFRM--------RKTNFTKSRNKTFRVAVAVVTVFFICWTPYHLVGVLLLITDPEEAVMSWDHMSIALASANSCFNPFLYALLGKDFRKKARQSIKGILEAAFSEELTHSASS---------
>TA2R_BOVIN
--MWPNA-----------SSLGPCFRPMNITLEERRLIASPWFAASFCLVGLASNLLALSVLMGARQSRSSFLTFLCGLVLTDFMGLLVTGAIVVTQHFVLFVDPGCSLCHFMGVIMVFFGLCPLLLGAAMASERFLGITRPFRPATASQRRAWTTVGLVWASALALGLLPLLGVGHYTVQYPGSWCFLT----LGTDPGDVAFGLLFALLGSISVGMSFLLNTISVATLCHVYHGATAQQRPRDCEVEMMVQLMGIMVVASICWMPLLVFIAQTVLQSPPRLTERQLLIYLRVATWNQILDPWVYILFRRAVIQRFYPRLSTRSRSLSLQPQLTR------------
>OPSD_RABIT
MNGTEGPDFYIPMSNQTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTTTLYTSLHGYFVFGPTGCNVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWIMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATASKTETSQVAPA
>OPSD_PHOVI
MNGTEGPNFYVPFSNKTGVVRSPFEFPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVGFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKAASIYNPVIYIMMNKQFRTCMITTLCCGKNPLGDDEVSASASKTETSQVAPA
>D3DR_RAT
--------MAPLSQISTHLNSTCGAENSTGVNRARPHAYYALSYCALILAIIFGNGLVCAAVLRERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRICCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFN---D-P--SICSI---------SNPDFVIYSSVVSFYVPFGVTVLVYARIYIVLRQRTSLPLQPRGVPLREKKATQMVVIVLGAFIVCWLPFFLTHVLNTHCQACHVSPELYRATTWLGYVNSALNPVIYTTFNVEFRKAFLKILSC-------------------------
>DADR_.ENLA
---------------MTFNITSMDEDVLLTERESSFRVLTGCFLSVLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-TFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKVAFIMIGVAWTLSVLISFIPVQLNWHKALN-TMDNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAAKQLDCESSLKTSFKRETKVLKTLSVIMGVFVCCWLPFFILNCIVPFCDPSCISSTTFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSNLLGCYRLCPTSNNIIETAVVYSCQ-----
>PAR2_HUMAN
---VDGTSHVTGKGVTVETVFSVDEFSASVLTGKLTTVFLPIVYTIVFVVGLPSNGMALWVFLFRTKKKHPAVIYMANLALADLLSVIWFPLKIAYHIHGNNWIYGEALCNVLIGFFYGNMYCSILFMTCLSVQRYWVIVNPMGHSRKKANIAIGISLAIWLLILLVTIPLYVVKQTIF--N--TTCHDVLPEQLLVGDMFNYFLSLAIGVFLFPAFLTASAYVLMIRML------SAMDENSEKKRKRAIKLIVTVLAMYLICFTPSNLLLVVHYFLIKSSHVYALYIVALCLSTLNSCIDPFVYYFVSHDFRDHAKNALLCRSVRTVKQMQVSLKSSS--------
>GPRA_RAT
TPANQSAEASESNVSATVPRAAAVTPFQSLQLVHQLKGLIVMLYSIVVVVGLVGNCLLVLVIARVRRLHNVTNFLIGNLALSDVLMCAACVPLTLAYAFEPRWVFGGGLCHLVFFLQPVTVYVSVFTLTTIAVDRYVVLVHPL-RRRISLKLSAYAVLGIWALSAVLALPAAVHTYHVEHD--VRLCEEF--WGSQERQRQIYAWGLLLGTYLLPLLAILLSYVRVSVKLRNRPGS-SQADWDRARRRRTFCLLVVVVVVFALCWLPLHIFNLLRDLDPRAYAFGLVQLLCHWLAMSSACYNPFIYAWLHDSFREELRKMLLSWPRKIVPHGQNMT------------
>ET1R_PIG
EFSLVVTTHRPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRHDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPKT--HKTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPWKNHEQNNHNTE
>NTR2_MOUSE
-----METSSLWPPRPSPSAGLSLEARLGVDTRLWAKVLFTALYSLIFALGTAGNALSVHVVLKARTRPGRLRYHVLSLALSALLLLLISVPMELYNFVWSHWVFGDLGCRGYYFVRELCAYATVLSVASLSAERCLAVCQPLARRLLTPRRTCRLLSLVWVASLGLALPMAVIMGQKH--AASRVCTVL---VSRASSRSTFQVKRAGLLRSPLWELTAILNGITVNHLVALVQARHKDASQIRSLQHSAQVLRAIVAVYVICWLPYHARRLMYCYIPDDDFYHYFYMVTNTLFYVSSAVTPVLYNAVSSSFRKLFLESLSSLCGEQRSVVPLPQSTYSFRLWGSPR
>CKR7_MOUSE
QDEVTDDYIGENTTVDYTLYESVC---FKKDVRNFKAWFLPLMYSVICFVGLLGNGLVILTYIYFKRLKTMTDTYLLNLAVADILFLLILPFWAYSEAKS--WIFGVYLCKGIFGIYKLSFFSGMLLLLCISIDRYVAIVQAVRHRARVLLISKLSCVGIWMLALFLSIPELLYSGLQK-GE-TLRCSLV---SAQVEALITIQVAQMVFGFLVPMLAMSFCYLIIIRTL---------LQARNFERNKAIKVIIAVVVVFIVFQLPYNGVVLAQTVANFNKQLNIAYDVTYSLASVRCCVNPFLYAFIGVKFRSDLFKLFKDLGCLSQERLRHWS--------HVRN
>NY5R_HUMAN
-------------NTAATRNSDFPVWDDYKSSVDDLQYFLIGLYTFVSLLGFMGNLLILMALMKKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKVMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPLPVFHSLVESS--RYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGVHEKRSVTRIKKRSRSVFYRLTILILVFAVSWMPLHLFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLVSLIHCLHM----------------------
>VG74_HSVSA
VKLDFSSEDFSNYSYNYSGDIYYGDVAPCVVNFLISESALAFIYVLMFLCNAIGNSLVLRTFLKYRA-QAQSFDYLMMGFCLNSLFLAGYLLMRLLRMFE--IFMNTELCKLEAFFLNLSIYWSPFILVFISVLRCLLIFCATRLWVKKTLIGQVFLCCSFVLACFGALPHVMVTSYYE----PSSCIEE---VLTEQLRTKLNTFHTWYSFAGPLFITVICYSMSCYKL---------FKTKLSKRAEVVTIITMTTLLFIVFCIPYYIMESIDTLLRVGSAIVYGIQCTYMLLVLYYCMLPLMFAMFGSLFRQRMAAWCKTICHC---------------------
>CCKR_HUMAN
DSLLVNGSNITPPCELGLENETLFCLDQPRPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTANMCRFL---LPNDVMQQSWHTFLLLILFLIPGIVMMVAYGLISLELYQGRANSNSSAANLMAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTASRLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGPPGARGEVTTGASLSRFSYS
>OPSR_CARAU
GD--ETTRESMFVYTNSNNTRDPFEGPNYHIAPRWVYNLATVWMFFVVVASTFTNGLVLVATAKFKKLRHPLNWILVNLAVADLAETLLASTISVTNQFFGYFILGHPMCIFEGFTVSVCGIAGLWSLTVISWERWVVVCKPFGNVKFDAKWASAGIIFSWVWSAIWCAPPIFGWSRFWPHGLKTSCGPDVFSGSEDPGVQSYMIVLMITCCIIPLAIIILCYIAVWLAIRTVAQQQKDSESTQKAEKEVSRMVVVMIFAYCFCWGPYTFCACFAAANPGYAFHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRVCIMQLFGKKVDDG-----SEVSS------VAPA
>OPS2_LIMPO
PN-----ASVVDTMPKEMLYMIHEHWYAFPPMNPLWYSILGVAMIILGIICVLGNGMVIYLMMTTKSLRTPTNLLVVNLAFSDFCMMAFMMPTMASNCFAETWILGPFMCEVYGMAGSLFGCASIWSMVMITLDRYNVIVRGMAAAPLTHKKATLLLLFVWIWSGGWTILPFFGWSRYVPEGNLTSCTVD--YLTKDWSSASYVIIYGLAVYFLPLITMIYCYFFIVHAVAEHNVAANADQQKQSAECRLAKVAMMTVGLWFMAWTPYLIIAWAGVFSSGTRLTPLATIWGSVFAKANSCYNPIVYGISHPRYKAALYQRFPSLACGSGESGSDVKTMEEKPKSPEA-
>O5I1_HUMAN
-----------MEFTDRNYTLVTEFILLGFPTRPELQIVLFLMFLTLYAIILIGNIGLMLLIRIDPHLQTPMYFFLSNLSFVDLCYFSDIVPKMLVNFLSENKSISYYGCALQFYFFCTFADTESFILAAMAYDRYVAICNPLYTVVMSRGICMRLIVLSYLGGNMSSLVHTSFAFIL---KNHFFCDLPKLSCTDTTINEWLLSTYGSSVEIICFIIIIISYFFILLSV--------LKIRSFSGRKKTFSTCASHLTSVTIYQGTLLFIYSRPSYLY---SPNTDKIISVFYTIFIPVLNPLIYSLRNKDVKDAAEKVLRSKVDSS--------------------
>OAR_HELVI
--TEEVIEDDRDACAVADDPKYPSSFGITLAVPEWEAICTAIVLTLIIISTIVGNILVILSVFTYKPLRIVQNFFIVSLAVADLTVAILVLPLNVAYSILGQWVFGIYVCKMWLTCDIMCCTSSILNLCAIALDRYWAITDPIYAQKRTLERVLLMIGVVWVLSLIISSPPLLGWNDW-E-P--TPCRLT--------SQPGFVIFSSSGSFYIPLVIMTVVYFEIYLATKKRAVYEEKQRISLTRERRAARTLGIIMGVFVVCWLPFFVIYLVIPFCASCCLSNKFINFITWLGYCNSALNPLIYTIFNMDFRRAFKKLLCMKP-----------------------
>O1D4_HUMAN
-------------MDGDNQSENSQFLLLGISESPEQQQILFWMFLSMYLVTVLGNVLIILAISSDSHLHTPMYFFLANLSFTDLFFVTNTIPKMLVNFQSQNKAISYAGCLTQLYFLVSLVTLDNLILAVMAYDRYVAICCPLYVTAMSPGLCVLLLSLCWGLSVLYGLLLTFLLTRV---THYLFCDMYWLACSNTHIIHTALIATGWFIFLTLLGFMTTSYVRIVRTI--------LQMPSASKKYKTFSTCASHLGVVSLFYGTLAMVYLQPLHTY----SMKDSVATVMYAVLTPMMNPFIYSLRNKDMHGAPGRVLWRPFQRP--------------------
>OPSD_APIME
AR-F-NNQTVVDKVPPDMLHLIDANWYQYPPLNPMWHGILGFVIGMLGFVSAMGNGMVVYIFLSTKSLRTPSNLFVINLAISNFLMMFCMSPPMVINCYYETWVLGPLFCQIYAMLGSLFGCGSIWTMTMIAFDRYNVIVKGLSGKPLSINGALIRIIAIWLFSLGWTIAPMFGWNRYVPEGNMTACGTD--YFNRGLLSASYLVCYGIWVYFVPLFLIIYSYWFIIQAVAAHMNVRSSENQNTSAECKLAKVALMTISLWFMAWTPYLVINFSGIFNLVK-ISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFAKFPSLACAAEPSSDAVSVTDNEKSNA---
>TRFR_CHICK
----------MENGTGDEQNHTGLLLSSQEFVTAEYQVVTILLVLLICGLGIVGNIMVVLVVLRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITESLYKSWVYGYVGCLCITYLQYLGINASSFSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWSFASVYCMLWFFLLDLN--DT-VVSCGYK------RSYYSPIYMMDFGIFYVLPMVLATVLYGLIARILFLNVNSNKSFNSTIASRRQVTKMLAVVVVLFAFLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCHLKRDKKPANYSVKESDHFSSEIED
>OPSR_HUMAN
DSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVTASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISIVNQVSGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCIIPLAIIMLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMIFAYCVCWGPYTFFACFAAANPGYAFHPLMAALPAYFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDG-----SELVSSV--SSVSPA
>CB2R_MOUSE
----MEGCRETEVTNGSNGGLEFNPMKEYMILSSGQQIAVAVLCTLMGLLSALENMAVLYIILSSRRRRKPSYLFISSLAGADFLASVIFACNFVIFHVFHG-VDSNAIFLLKIGSVTMTFTASVGSLLLTAVDRYLCLCYPPYKALVTRGRALVALCVMWVLSALISYLPLMGWTC-----CPSPCSEL------FPLIPNDYLLGWLLFIAILFSGIIYTYGYVLWKAHRHAEHQVPGIARMRLDVRLAKTLGLVLAVLLICWFPALALMGHSLVTTLSDQVKEAFAFCSMLCLVNSMVNPIIYALRSGEIRSAAQHCLIGWKKYLQGLGPEGKVTETEADVKTT-
>PE21_HUMAN
--MSPCGPLNLSLAGEATTCAA---PWVPNTSAVPPSGASPALPIFSMTLGAVSNLLALALLAQAAGSATTFLLFVASLLATDLAGHVIPGALVLRLYTAG-RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTRPLHAARVSVARARLALAAVAAVALAVALLPLARVGRYELQYPGTWCFIG--LGPPGGWRQALLAGLFASLGLVALLAALVCNTLSGLALHRSRRRAHGPRRARAHDVEMVGQLVGIMVVSCICWSPMLVLVALAVGGWSSTSLQRPLFLAVRLASWNQILDPWVYILLRQAVLRQLLRLLPPRAGAKGGPAGLGLSSLRSSRHSGLS
>OPR._CAVPO
GSHLQGNLSLLSPNHSGLPPHLLLNASHSAFLPLGLKVTIVGLYLAVCIGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQATDILLGF-WPFGNTLCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALALVVGVPVAIMGSAQVEE---IECLVE-IPDPQDYWGPVFAVSIFLFSFIIPVLIISVCYSLMIRRLHGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETTVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASALHREMQVSDRVALGCKTTETVPR
>OPSV_CHICK
-----MSSDDDFYLFTNGSVPGPWDGPQYHIAPPWAFYLQTAFMGIVFAVGTPLNAVVLWVTVRYKRLRQPLNYILVNISASGFVSCVLSVFVVFVASARGYFVFGKRVCELEAFVGTHGGLVTGWSLAFLAFERYIVICKPFGNFRFSSRHALLVVVATWLIGVGVGLPPFFGWSRYMPEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIVPLSLIIFSYSQLLSALRAVAAQQQESATTQKAEREVSRMVVVMVGSFCLCYVPYAALAMYMVNNRDHGLDLRLVTIPAFFSKSACVYNPIIYCFMNKQFRACIMETVCGKPLTD--DSDASTSSVSSSQVGPT-
>YKR5_CAEEL
MNSENGLDSVTQIMYDMKKYNIVNDVLPPPNHEDLHVVIMAVSYLLLFLLGTCGNVAVLTTIYHVIRTLDNTLIYVIVLSCVDFGVCLSLPITVIDQILGF-WMFGKIPCKLHAVFENFGKILSALILTAMSFDRYAGVC---------HPQRKRLRSRNFAITILLAPGMLTRM-------KIEKCTVD----IDSQMFTAFTIYQFILCYCTPLVLIAFFYTKLLSKLRE----TRTFKSSQIPFLHISLYTLAVACFYFLCWTPFWMATLFAVYLENSPVFVYIMYFIHALPFTNSAINWILYGRVFLETVS---------------------------------
>5H1B_MOUSE
CAPPPPAASQTGVPLTNLSHNSADGYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPKRAAIMIVLVWVFSISISLPPFFWR----E-E--LDCFVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCAG---------------------
>P2YR_HUMAN
AAFLAGPGSSWGNSTVASTAAVSSSFKCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAICISVLVWLIVVVAISPILFYSGTG----KTITCYDT-TSDEYLRSYFIYSMCTTVAMFCVPLVLILGCYGLIVRALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLPEFKQNGDTSL
>5H4_CAVPO
------------------MDKLDANVSSKEGFGSVEKVVLLTFLSAVILMAILGNLLVMVAVCRDRQRKIKTNYFIVSLAFADLLVSVLVMPFGAIELVQDIWVYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVIPMFISFLPIMQGWNNIRK-NSTYCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHGRPDQHSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNIVDPFIDYT-VPGQLWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYRRPSILGQTINGSTHVLR--
>MC4R_PIG
GMHTSLHFWNRSTYGLHSNASEPLGKGYSEGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIVITLLNSQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALYHNIMTVKRVGIIISCIWAVCTVSGVLFIIYYS----------------------DDSSAVIICLITVFFTMLALMASLYVHMFLMARLH-RIPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY------------
>SSR1_MOUSE
GEGACSRGPGSGAADGMEEPGRNASQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRH-WPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAADG--TVACNML-MPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALK-AGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQ--DDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLS-----WMDNAAETALKSRAYSVED
>CML1_HUMAN
EDEDYNTSISYGDEYPDYLDSIVVLEDLSPLEARVTRIFLVVVYSIVCFLGILGNGLVIIIATFKMK-KTVNMVWFLNLAVADFLFNVFLPIHITYAAMDYHWVFGTAMCKISNFLLIHNMFTSVFLLTIISSDRCISVLLPVSQNHRSVRLAYMACMVIWVLAFFLSSPSLVFRDTAN-SS--WPTHSQ-MDPVGYSRHMVVTVTRFLCGFLVPVLIITACYLTIVCKL---------HRNRLAKTKKPFKIIVTIIITFFLCWCPYHTLNLLELHHTAMSVFSLGLPLATALAIANSCMNPILYVFMGQDFKK-FKVALFSRLVNALSEDTGHSFTKMSSMNERTS
>ACM1_RAT
------------MNTSVPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTVNPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC-------
>MRG_HUMAN
QNPNLVSQLCGVFLQNETNETIHMQMSMAVGQQALPLNIIAPKAVLVSLCGVLLNGTVFWLLCCGAT--NPYMVYILHLVAADVIYLCCSAVGFLQVTLLTYHGVVFFIPDFLAILSPFSFEVCLCLLVAISTERCVCVLFPIYRCHRPKYTSNVVCTLIWGLPFCINIVKSLFLTYWK-------------------KACVIFLKLSGLFHAILSLVMCVSSLTLLIRFL--------CCSQQQKATRVYAVVQISAPMFLLWALPLSVAPLITDF----KMFVTTSYLISLFLIINSSANPIIYFFVGSLRKKRLKESLRVILQRALADKPEVGIDPMEQPHSTQH
>A2AC_RAT
AEGPNGSDAGEWGSGGGANASGTDWGPPPGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSFYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQLPEPLFKFFFWIGYCNSSLNPVIYTVFNQDFRRSFKHILFRRRRRGFRQ-----------------
>OPSD_SEPOF
---MGRDIPDNETWWYNPTMEVHPHWKQFNQVPDAVYYSLGIFIGICGIIGCTGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFIKWVFGMAACKVYGFIGGIFGLMSIMTMSMISIDRYNVIGRPMASKKMSHRRAFLMIIFVWMWSTLWSIGPIFGWGAYVLEGVLCNCSFD--YITRDSATRSNIVCMYIFAFCFPILIIFFCYFNIVMAVSNHRLNLRKAQAGASAEMKLAKISIVIVTQFLLSWSPYAVVALLAQFGPIEWVTPYAAQLPVMFAKASAIHNPLIYSVSHPKFREAIAENFPWIITCCQFDEKEVEEIPATEQS-GGE
>SSR3_HUMAN
SVSTTSEPENASSAWPPDATLGNVSAGPSPAGLAVSGVLIPLVYLVVCVVGLLGNSLVIYVVLRHTASPSVTNVYILNLALADELFMLGLPFLAAQNALSY-WPFGSLMCRLVMAVDGINQFTSIFCLTVMSVDRYLAVVHPTSARWRTAPVARTVSAAVWVASAVVVLPVVVFSGVPR-----STCHMQ-WPEPAAAWRAGFIIYTAALGFFGPLLVICLCYLLIVVKVRSARVWAPSCQRRRRSERRVTRMVVAVVALFVLCWMPFYVLNIVNVVCPLPPAFFGLYFLVVALPYANSCANPILYGFLSYRFKQGFRRVLLRPSRRVRSQEPTVGEDEEEEDG---E
>O.1R_HUMAN
MGVPPGSREPSPVPPDYED-EFLRYLWRDYLYPKQYEWVLIAAYVAVFVVALVGNTLVCLAVWRNHHMRTVTNYFIVNLSLADVLVTAICLPASLLVDITESWLFGHALCKVIPYLQAVSVSVAVLTLSFIALDRWYAICHPL-LFKSTARRARGSILGIWAVSLAIMVPQAAVMECSSRTRLFSVCDER---WADDLYPKIYHSCFFIVTYLAPLGLMAMAYFQIFRKLWGRQPRFLAEVKQMRARRKTAKMLMVVLLVFALCYLPISVLNVLKRVFGMFEAVYACFTFSHWLVYANSAANPIIYNFLSGKFREQFKAAFSCCLPGLGPCGSLKASHKS---LSLQS
>O2C1_HUMAN
-------------MDGVNDSSLQGFVLMSISDHPQLEMIFFIAILFSYLLTLLGNSTIILLSRLEARLHTPMYFFLSNLSSLDLAFATSSVPQMLINLWGPGKTISYGGCITQLYVFLWLGATECILLVVMAFDRYVAVCRPLYTAIMNPQLCWLLAVIAWLGGLGNSVIQSTFTLQL---PEGFLCEVPKLACGDTSLNQAVLNGVCTFFTAVPLSIIVISYCLIAQAV--------LKIHSAEGRRKAFNTCLSHLLVVFLFYGSASYGYLLPAKNS---KQDQGKFISLFYSLVTPMVNPLIYTLRNMEVKGALRRLLGKGREVG--------------------
>OPSD_MOUSE
MNGTEGPNFYVPFSNVTGVGRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVVFTWIMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWLPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMLNKQFRNCMLTTLCCGKNPLGDDDASATASKTETSQVAPA
>CB1R_TARGR
EFFNRSVSTFKENDDNLKCGENFMDMECFMILTASQQLIIAVLSLTLGTFTVLENFLVLCVILQSRTRCRPSYHFIGSLAVADLLGSVIFVYSFLDFHVFHR-KDSSNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRTKAVIAFCVMWTIAIIIAVLPLLGWNCK-K--LKSVCSDI------FPLIDENYLMFWIGVTSILLLFIVYAYVYILWKAHSHSEDQITRPEQTRMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNNPIKTVFAFCSMLCLMDSTVNPIIYALRSQDLRHAFLEQCPPCEGTSQPLDNSMEGNN-AGNVHRAA
>B2AR_HUMAN
MGQPGNGSAFLLAPNRSHAPDH----DVTQQRDEVWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQGRTLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IRKEVYILLNWIGYVNSGFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGYSGNTGEQSGYHVE
>OPSD_ZOSOP
MNGTEGPFFYIPMVNTTGIVRSPYEYPQYYLVNPAAYACLGAYMFFLILVGFPVNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMASACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFCIPLAVVGFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVIGFLVCWLPYASVAWYIFTHQGSEFGPPFMTVPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTVSSSSVSPAA--
>GP37_HUMAN
LAQNGSLGEGIHEPGGPRRGNRLKNPFYPLTQESYGAYAVMCLSVVIFGTGIIGNLAVMCIVCHNYYMRSISNSLLANLAFWDFLIIFFCLPLVIFHELTKKWLLEDFSCKIVPYIEVASLGVTTFTLCALCIDRFRAATNVQYEMIENCSSTTAKLAVIWVGALLLALPEVVLRQLSKI-K--KISPDLTIYVLALTYDSARLWWYFGCYFCLPTLFTITCSLVTARKIRKAEKATRGNKRQIQLESQMNCTVVALTILYGFCIIPENICNIVTAYMATGQTMDLLNIISQFLLFFKSCVTPVLLFCLCKPFSRAFMECCCCCCEECIQKSSTVTYTTELELSPFST
>OPSR_CHICK
HEEEDTTRDSVFTYTNSNNTRGPFEGPNYHIAPRWVYNLTSVWMIFVVAASVFTNGLVLVATWKFKKLRHPLNWILVNLAVADLGETVIASTISVINQISGYFILGHPMCVVEGYTVSACGITALWSLAIISWERWFVVCKPFGNIKFDGKLAVAGILFSWLWSCAWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSDPGVQSYMVVLMVTCCFFPLAIIILCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIVAYCFCWGPYTFFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDG-----SEVVSSVSNSSVSPA
>IL8B_BOVIN
EGFEDEFGNYSGTPPTEDYDYSPC----EISTETLNKYAVVVIDALVFLLSLLGNSLVMLVILYSRIGRSVTDVYLLNLAMADLLFAMTLPIWTASKAKG--WVFGTPLCKVVSLLKEVNFYSGILLLACISMDRYLAIVHATRTLTQKWHWVKFICLGIWALSVILALPIFIFREAYQ-YS-DLVCYED-LGANTTKWRMIMRVLPQTFGFLLPLLVMLFCYGFTLRTL---------FSAQMGHKHRAMRVIFAVVLVFLLCWLPYNLVLIADTLMRAHNDIGRALDATEILGFLHSCLNPLIYVFIGQKFRHGLLKIMAIHGLISKEFLAKDG------------
>OPSD_MACFA
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNAEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLFGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEARAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSASIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
>5H2B_RAT
ILQKTCDHLILTDRSGLKAESAAEEMKQTAENQGNTVHWAALLIFAVIIPTIGGNILVILAVSLEKRLQYATNYFLMSLAVADLLVGLFVMPIALLTIMFEAWPLPLALCPAWLFLDVLFSTASIMHLCAISLDRYIAIKKPIANQCNSRTTAFVKITVVWLISIGIAIPVPIKGIEA-NA---ITCELT------KDRFGSFMLFGSLAAFFAPLTIMIVTYFLTIHALRKKRRMGKKPAQTISNEQRASKVLGIVFLFFLLMWCPFFITNVTLALCDSCTTLKTLLQIFVWVGYVSSGVNPLIYTLFNKTFREAFGRYITCNYQATKSVKVLRKGNSMVENSKFFT
>OPSR_ANOCA
NDEDDTTRDSLFTYTNSNNTRGPFEGPNYHIAPRWVYNITSVWMIFVVIASIFTNGLVLVATAKFKKLRHPLNWILVNLAIADLGETVIASTISVINQISGYFILGHPMCVLEGYTVSTCGISALWSLAVISWERWVVVCKPFGNVKFDAKLAVAGIVFSWVWSAVWTAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVLSYMIVLMITCCFIPLAVILLCYLQVWLAIRAVAAQQKESESTQKAEKEVSRMVVVMIIAYCFCWGPYTVFACFAAANPGYAFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDG-----SELVSSVSNSSVSPA
>5H1A_MOUSE
-MDMFSLGQGNNTTTSLEPFGTGGNDTGLSNVTFSYQVITSLLLGTLIFCAVLGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQVTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAALISLTWLIGFLISIPPMLGWRA---NP--NECTIS--------KDHGYTIYSTFGAFYIPLLLMLVLYGRIFRAARFRKNEEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSHMPELLGAIINWLGYSNSLLNPVIYAYFNKDFQNAFKKIIKCKFCR---------------------
>CCR4_HUMAN
IYTSDNYTEEMG-SGDYDSMKEPC---FREENANFNKIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>CKR3_MACMU
MTTSLDTVETFGPTSYDDDMGLLC---EKADVGALIAQFVPPLYSLVFMVGLLGNVVVVMILIKYRRLRIMTNIYLLNLAISDLLFLFTLPFWIHYVRERN-WVFSHGMCKVLSGFYHTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIVTWGLAVLAALPEFIFYGTEK-LF-KTLCSAIYPQDTVYSWRHFHTLKMTILCLALPLLVMAICYTGIIKTL---------LRCPSKKKYKAIRLIFVIMAVFFIFWTPYNVAILISTYQSVLKHLDLFVLATEVIAYSHCCVNPVIYAFVGERFRKYLRHFFHRHVLMHLGKYIPFLT-------SSVS
>OPSD_GLOME
MNGTEGLNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYIPLNLAVANLFMVFGGFTTTLYTSLHAYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTSRQEVNNESFVIYMFVVHFTIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFTHQGSDFGPIFMTIPSFFAKSSSIYNPVIYIMMNKQLRNCMLTTLCCGRNPLGDDEASTTASKTETSQVAPA
>FMLR_MOUSE
---MDTNMSLLMNKSAVNLMNVSGSTQSVSAGYIVLDVFSYLIFAVTFVLGVLGNGLVIWVAGFRMK-HTVTTISYLNLAIADFCFTSTLPFYIASMVMGGHWPFGWFMCKFIYTVIDINLFGSVFLIALIALDRCICVLHPVAQNHRTVSLAKKVIIVPWICAFLLTLPVIIRLTTVPSPWPVEKRKVA------VTMLTVRGIIRFIIGFSTPMSIVAICYGLITTKI---------HRQGLIKSSRPLRVLSFVVAAFFLCWCPFQVVALISTIQVREPGIVTALKITSPLAFFNSCLNPMLYVFMGQDFRERLIHSLPASLERALTEDSAQTGT----------
>OPSD_LAMJA
MNGTEGDNFYVPFSNKTGLARSPYEYPQYYLAEPWKYSALAAYMFFLILVGFPVNFLTLFVTVQHKKLRTPLNYILLNLAMANLFMVLFGFTVTMYTSMNGYFVFGPTMCSIEGFFATLGGEVALWSLVVLAIERYIVICKPMGNFRFGNTHAIMGVAFTWIMALACAAPPLVGWSRYIPEGMQCSCGPDYYTLNPNFNNESYVVYMFVVHFLVPFVIIFFCYGRLLCTVKEAAAAQQESASTQKAEKEVTRMVVLMVIGFLVCWVPYASVAFYIFTHQGSDFGATFMTLPAFFAKSSALYNPVIYILMNKQFRNCMITTLCCGKNPLGDDESGASVSSVSTSPVSPA
>C5AR_CAVPO
---MMVTVSYDYDYNSTFLPDGFVD--NYVERLSFGDLVAVVIMVVVFLVGVPGNALVVWVTACEAR-RHINAIWFLNLAAADLLSCLALPILLVSTVHLNHWYFGDTACKVLPSLILLNMYTSILLLATISADRLLLVLSPICQRFRGGCLAWTACGLAWVLALLLSSPSFLYRRTHN-SF--VYCVTD-YG-RDISKERAVALVRLLVGFIVPLITLTACYTFLLLRT---------WSRKATRSAKTVKVVVAVVSSFFVFWLPYQVTGILLAWHSPNRNTKALDAVCVAFAYINCCINPIIYVVAGHGFQGRLLKSLPSVLRNVLTEESLDKSTVD--------
>P2Y8_.ENLA
ATSYPTFLTTPYLPMKLLMNLTNDTEDICVFDEGFKFLLLPVSYSAVFMVGLPLNIAAMWIFIAKMRPWNPTTVYMFNLALSDTLYVLSLPTLVYYYADKNNWPFGEVLCKLVRFLFYANLYSSILFLTCISVHRYRGVCHPISLRRMNAKHAYVICALVWLSVTLCLVPNLIFVTVS-----NTICHDT-TRPEDFARYVEYSTAIMCLLFGIPCLIIAGCYGLMTRELMKP---SGNQQTLPSYKKRSIKTIIFVMIAFAICFMPFHITRTLYYYARLLNVINVTYKVTRPLASANSCIDPILYFLANDRYRRRLIRTVRRRSSVPNRRCMHTNMTAGPLPVISAE
>5H1D_FUGRU
ELDNNSLDYFSSNFTDIPSN--TTVAHWTEATLLGLQISVSVVLAIVTLATMLSNAFVIATIFLTRKLHTPANFLIGSLAVTDMLVSILVMPISIVYTVSKTWSLGQIVCDIWLSSDITFCTASILHLCVIALDRYWAITDALYSKRRTMRRAAVMVAVVWVISISISMPPLFWR----H-E--KECMVN-------TDQISYTLYSTFGAFYVPTVLLIILYGRIYVAARSRKLALERKRLCAARERKATKTLGIILGAFIICWLPFFVVTLVWAICKEC-FDPLLFDVFTWLGYLNSLINPVIYTVFNDEFKQAFQKLIKFRR-----------------------
>5HT_LYMST
TGQFINGSHSSRSRDNASANDTDDRYWSLTVYSHEHLVLTSVILGLFVLCCIIGNCFVIAAVMLERSLHNVANYLILSLAVADLMVAVLVMPLSVVSEISKVWFLHSEVCDMWISVDVLCCTASILHLVAIAMDRYWAVTSI-YIRRRSARRILLMIMVVWIVALFISIPPLFGWRD-PD-K--GTCIIS--------QDKGYTIFSTVGAFYLPMLVMMIIYIRIWLVARSRNDTRTREKLELKRERKAARTLAIITGAFLICWLPFFIIALIGPFVDPEGIPPFARSFVLWLGYFNSLLNPIIYTIFSPEFRSAFQKILFGKYRRGHR------------------
>5H6_HUMAN
-----------MVPEPGPTANSTPAWGAGPPSAPGGSGWVAAALCVVIALTAAANSLLIALICTQPARNTS-NFFLVSLFTSDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLYKLRMTPLRALALVLGAWSLAALASFLPLLLGWH---E-VPGQCRLL--------ASLPFVLVASGLTFFLPSGAICFTYCRILLAARKQVESRRLATKHSRKALKASLTLGILLGMFFVTWLPFFVANIVQAVCDC--ISPGLFDVLTWLGYCNSTMNPIIYPLFMRDFKRALGRFLPCPRCPRERQASLASSGPRPG-LSLQQ
>AG2R_MERUN
------MALNSSADDGIKRIQDDC---PKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPVWAVYTAMEYRWPFGNHLCKIASAGISFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCVVIWLLAGLASLPAVIHRNVYF-TN--TVCAFH-YESQNSTLPVGLGLTKNILGFMFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFRIIMAIVLFFFFSWIPHQIFTFLDVLIQLGDVVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------N
>GRHR_HORSE
--MANSDSLEQDPNHCSAINNSIPLIQGKLPTLTVSGKIRVTVTFFLFLLSTAFNASFLLKLQKWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITQPL-AVQSNSKLEQSMISLAWILSIVFAGPQLYIFRMIYTV--FSQCVTH--SFPQWWHQAFYNFFTFGCLFIIPLLIMLICNAKIIFALTRVPRKNQSKNNIPRARLRTLKMTVAFATSFVVCWTPYYVLGIWYWFDPEMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>REIS_TODPA
---------------------MFGNPAMTGLHQFTMWEHYFTGSIYLVLGCVVFSLCGMCIIFLARQKPRRKYAILIHVLITAMAVNGGDPAHASSSIVGR-WLYGSVGCQLMGFWGFFGGMSHIWMLFAFAMERYMAVCHREFYQQMPSVYYSIIVGLMYTFGTFWATMPLLGWASYGLEVHGTSCTIN--YSVSDESYQSYVFFLAIFSFIFPMVSGWYAISKAWSGLSAIPD-EKEKDKDILSEEQLTALAGAFILISLISWSGFGYVAIYSALTHGGQLSHLRGHVPPIMSKTGCALFPLLIFLLTARSLPKSDTKKP--------------------------
>GRPR_RAT
NCSHLNLEVDPFLSCNNTFNQTLSPPKMDNWFHPGIIYVIPAVYGLIIVIGLIGNITLIKIFCTVKSMRNVPNLFISSLALGDLLLLVTCAPVDASKYLADRWLFGRIGCKLIPFIQLTSVGVSVFTLTALSADRYKAIVRPMIQASHALMKICLKAALIWIVSMLLAIPEAVFSDLHPQT--FISCAPY---HSNELHPKIHSMASFLVFYIIPLSIISVYYYFIARNLIQSLPVNIHVKKQIESRKRLAKTVLVFVGLFAFCWLPNHVIYLYRSYHYSEMLHFITSICARLLAFTNSCVNPFALYLLSKSFRKQFNTQLLCCQPSLLNR--SHSMTSFKSTNP-SA
>CKR5_MACMU
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKMVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>B4AR_MELGA
--------------MTPLPAGNGSVPNCSWAAVLSRQWAVGAALSITILVIVAGNLLVIVAIAKTPRLQTMTNVFVTSLACADLVMGLLVVPPGATILLSGHWPYGTVVCELWTSLDVLCVTASIETLCAIAVDRYLAITAPLYEALVTKGRAWAVVCMVWAISAFISFLPIMNHWWRDV-R--RCCDFV--------TNMTYAIVSSTVSFYVPLLVMIFVYVRVFAVATRHSRGRRPSRLLAIKEHKALKTLGIIMGTFTLCWLPFFVANIIKVFCRPL-VPDQLFLFLNWLGYVNSAFNPIIYCRSP-DFRSAFRKLLCCPRRADRRLHAAPQAFSPRGDPMEDS
>OPSV_.ENLA
-----MLEEEDFYLFKNVSNVSPFDGPQYHIAPKWAFTLQAIFMGMVFLIGTPLNFIVLLVTIKYKKLRQPLNYILVNITVGGFLMCIFSIFPVFVSSSQGYFFFGRIACSIDAFVGTLTGLVTGWSLAFLAFERYIVICKPMGNFNFSSSHALAVVICTWIIGIVVSVPPFLGWSRYMPEGLQCSCGPDWYTVGTKYRSEYYTWFIFIFCFVIPLSLICFSYGRLLGALRAVAAQQQESASTQKAEREVSRMVIFMVGSFCLCYVPYAAMAMYMVTNRNHGLDLRLVTIPAFFSKSSCVYNPIIYSFMNKQFRGCIMETVCGRPMSD--DSSVSSSTVSSSQVSPA-
>OPSG_ORYLA
ENGTEGKNFYIPMNNRTGLVRSPYEYPQYYLADPWQFKLLGIYMFFLILTGFPINALTLVVTAQNKKLRQPLNFILVNLAVAGLIMVCFGFTVCIYSCMVGYFSLGPLGCTIEGFMATLGGQVSLWSLVVLAIERYIVVCKPMGSFKFTATHSAAGCAFTWIMASSCAVPPLVGWSRYIPEGIQVSCGPDYYTLAPGFNNESFVMYMFSCHFCVPVFTIFFTYGSLVMTVKAAAAQQQDSASTQKAEKEVTRMCFLMVLGFLLAWVPYASYAAWIFFNRGAAFSAMSMAIPSFFSKSSALFNPIIYILLNKQFRNCMLATIGMGG-----------VSTSKTEVSTAA
>NY6R_RABIT
--MEVSLNDPASNKTSAKSNSSAFFYFESCQSPSLALLLLLIAYTVVLIMGICGNLSLITIIFKKQRAQNVTNILIANLSLSDILVCVMCIPFTAIYTLMDRWIFGNTMCKLTSYVQSVSISVSIFSLVLIAIERYQLIVNPR-GWKPSASHAYWGIMLIWLFSLLLSIPLLLSYHLTDSH--HVVCVEH---WPSKTNQLLYSTSLIMLQYFVPLGFMFICYLKIVICLHKRNSKRRENESRLTENKRINTMLISIVVTFAACWLPLNTFNVIFDWYHEVCHHDLVFAICHLVAMVSTCINPLFYGFLNRNFQKDLVVLIHHCLCFALRERY---TLHTDESKGSLR
>CCR4_BOVIN
IFTSDNYTEDDLGSGDYDSMKEPC---FREENAHFNRIFLPTVYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVLTLPFWAVDAVAN--WYFGKFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQKPRKLLAEKVVYVGVWLPAVLLTIPDLIFADIKE-DE-RYICDRF---YPSDLWLVVFQFQHIVVGLLLPGIVILSCYCIIISKL---------SHSKGYQKRKALKTTVILILTFFACWLPYYIGISIDSFILLESTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>OPS2_SCHGR
YESSVGLPLLGWNVPTEHLDLVHPHWRSFQVPNKYWHFGLAFVYFMLMCMSSLGNGIVLWIYATTKSIRTPSNMFIVNLALFDVLMLLEMPMLVVSSLFYQR-PVWELGCDIYAALGSVAGIGSAINNAAIAFDRYRTISCPI-DGRLTQGQVLALIAGTWVWTLPFTLMPLLRIWSRFAEGFLTTCSFD--YLTDDEDTKVFVGCIFAWSYAFPLCLICCFYYRLIGAVREHNVKSNADTEAQSAEIRIAKVALTIFFLFLCSWTPYAVVAMIGAFGNRAALTPLSTMIPAVTAKIVSCIDPWVYAINHPRFRAEVQKRMKWLHLGEDARSSKSDRTVGNVSASA--
>DADR_HUMAN
--------------MRTLNTSAMDGTGLVVERDFSVRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-TIDNCDSS--------LSRTYAISSSVISFYIPVAIMIVTYTRIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCILPFCGSGCIDSNTFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPATNNAIETAAMFSSH-----
>THRR_HUMAN
LTEYRLVSINKSSPLQKQLPAFISEDASGYLTSSWLTLFVPSVYTGVFVVSLPLNIMAIVVFILKMKVKKPAVVYMLHLATADVLFVSVLPFKISYYFSGSDWQFGSELCRFVTAAFYCNMYASILLMTVISIDRFLAVVYPMSLSWRTLGRASFTCLAIWALAIAGVVPLVLKEQTIQ--N--TTCHDVLNETLLEGYYAYYFSAFSAVFFFVPLIISTVCYVSIIRCL------SSSAVANRSKKSRALFLSAAVFCIFIICFGPTNVLLIAHYSFLSHEAAYFAYLLCVCVSSISSCIDPLIYYYASSECQRYVYSILCCKESSDPSSYNSSGDTCS--------
>V2R_HUMAN
MLMASTTSAVPGHPSLPSLPSNSSQERPLDTRDPLLARAELALLSIVFVAVALSNGLVLAALARRGRHWAPIHVFIGHLCLADLAVALFQVLPQLAWKATDRFRGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAYRHGSGAHWNRPVLVAWAFSLLLSLPQLFIFAQRNSG--VTDCWAC---FAEPWGRRTYVTWIALMVFVAPTLGIAACQVLIFREIHASGRRPGEGAHVSAAVAKTVRMTLVIVVVYVLCWAPFFLVQLWAAWDPEA-LEGAPFVLLMLLASLNSCTNPWIYASFSSSVSSELRSLLCCARGRTPPSLGPQDSSLAKDTSS---
>GALR_HUMAN
----MELAVGNLSEGNASCPEPPAPEPGPLFGIGVENFVTLVVFGLIFALGVLGNSLVITVLARSKPPRSTTNLFILNLSIADLAYLLFCIPFQATVYALPTWVLGAFICKFIHYFFTVSMLVSIFTLAAMSVDRYVAIVHSRSSSLRVSRNALLGVGCIWALSIAMASPVAYHQGLF-SN--QTFCWEQ---WPDPRHKKAYVVCTFVFGYLLPLLLICFCYAKVLNHLHKKLK--NMSKKSEASKKKTAQTVLVVVVVFGISWLPHHIIHLWAEFGVFPPASFLFRITAHCLAYSNSSVNPIIYAFLSENFRKAYKQVFKCHIRKDSHLSDTKEPPSTNCTHV---
>5H7_RAT
SSWMPHLLSGFLEVTASPAPTNVSGCGEQINYGRVEKVVIGSILTLITLLTIAGNCLVVISVCFVKKLRQPSNYLIVSLALADLSVAVAVMPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMAKMILSVWLLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYQIYKAARKSSRLERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTCLWLGYANSLINPFIYAFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERSEFVLQ
>UL33_RCMVM
----MDVLLGTEELEDELHQLHFNYTCVPSLGLSVARDAETAVNFLIVLVGGPMNFLVLATQMLSNRSVSTPTLYMTNLYLANLLTVATLPFLMLSNRGL--VGSSPEGCKIAALAYYATCTAGFATLMLIAINRYR-VIHQRRSGAGSKRQTYAVLAVTWLASLMCASPAPLYATVMAA-DAFETCIIYSYDQVK-TVLATFKILITMIWGITPVVMMSWFYVFFYRRL---------KLTSYRRRSQTLTFVTTLMLSFLVVQTPFVAIMSYDSYGVLNNKRDAVSMLARVVPNFHCLLNPVLYAFLGRDFNKRFILCISGKLFSRRRALRERAGPVCALP---SK
>O.YR_MOUSE
GTPAANWSIELDLGSGVPPGAEGNLTAGPPRRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASVPQVHIFSLRE----VFDCWAV---FIQPWGPKAYVTWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDVNA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSARYLKGSRPGENSSTFVLSRCSS
>PF2R_RAT
--MSINS---------SKQPASSAAGLIANTTCQTENRLSVFFSIIFMTVGIVSNSLAIAILMKAYQSKASFLLLASGLVITDFFGHLINGGIAVFVYASDKFDQSNILCSVFGISMVFSGLCPLFLGSTMAIERCIGVTNPLHSTKITSKHVKMILSGVCMFAVFVALLPILGHRDYQIQASRTWCFYN--TEHIEDWEDRFYLLFFSSLGLLALGISFSCNAVTGVTLLRVKFRSQQHRQGRSHHLEMVIQLLAIMCVSCVCWSPFLVTMANIAINGNNPVTCETTLFALRMATWNQILDPWVYILLRKAVLRNLYKLASRCCGVNIISLHIWELKVAAISESPAA
>D1DR_FUGRU
--------------MAQNFSTVGDGKQMLLERDSSKRVLTGCFLSLLIFTTLLGNTLVCVAVTKFRHRSKVTNFFVISLAISDLLVAILVMPWKAATEIMGFWPFG-EFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKVACLMISVAWTLSVLISFIPVQLNWHKALN-PPDNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQKQSLSECSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCEADCISSTTFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSILLGCHRLCPGNS-AIEIAPLSNPSCQYQP
>CKR2_RAT
HSLFPRSIQELDEGATTPYDYDDGEPCHKTSVKQIGAWILPPLYSLVFIFGFVGNMLVIIILISCKKLKSMTDIYLFNLAISDLLFLLTLPFWAHYAANE--WVFGNIMCKLFTGLYHIGYFGGIFFIILLTIDRYLAIVHAVALKARTVTFGVITSVVTWVVAVFASLPGIIFTKSEQ-ED-QHTCGPY----FPTIWKNFQTIMRNILSLILPLLVMVICYSGILHTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLFLTTFQEFLMHLDQAMQVTETLGMTHCCVNPIIYAFVGEKFRRYLSIFFRKHIAKNLCKQCPVFVSSTFTPSTGEQ
>OPRK_HUMAN
PPNSSAWFPGWAEPDSNGSAGSEDAQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLSSSVGISAIVLGGTKVDVD-VIECSLQFPDDDYSWWDLFMKICVFIFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITRLVLVVVAVFVVCWTPIHIFILVEALGSTSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPLKMRMERQSTSRVAYLRDIDGMNKP
>ACM3_BOVIN
N---------ISRAAGNLSSPNGTTSDPLGGHTIWQVVFIAFLTGVLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFILWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTYWNLGYWLCYINSTVNPVCYALCNKTFRNTFKMLLLCQCDKRKRRKQQYQHKRVPEQAL---
>C3.1_HUMAN
---MDQFPESVTENFEYDDLAEAC---YIGDIVVFGTVFLSIFYSVIFAIGLVGNLLVVFALTNSKKPKSVTDIYLLNLALSDLLFVATLPFWTHYLINEKG--LHNAMCKFTTAFFFIGFFGSIFFITVISIDRYLAIVLAASMNNRTVQHGVTISLGVWAAAILVAAPQFMFTKQKE-----NECLGDYPEVLQEIWPVLRNVETNFLGFLLPLLIMSYCYFRIIQTL---------FSCKNHKKAKAIKLILLVVIVFFLFWTPYNVMIFLETLKLYDKDLRLALSVTETVAFSHCCLNPLIYAFAGEKFRRYLYHLYGKCLAVLCGRSVHVDRSRHGSVLSSNF
>5H4_HUMAN
------------------MDKLDANVSSEEGFGSVEKVVLLTFLSTVILMAILGNLLVMVAVCWDRQRKIKTNYFIVSLAFADLLVSVLVMPFGAIELVQDIWIYGEVFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVIPTFISFLPIMQGWNNIRK-NSTYCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHSRPDQHSTHRMRTETKAAKTLCIIMGCFCLCWAPFFVTNIVDPFIDYT-VPGQVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYRRPSILGQTINGSTHVLRDA
>5H2A_PIG
NTSDAFNWTVDSENRTNLSCEGCLSPPCFSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHRRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEREPGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILAYKSSQLQTGQK
>OPSD_DIPVU
MNGTEGPYFYVPMVNTSGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWLMALACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFICHFSIPLLVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIMMVIAFLVCWLPYASVAWWIFTHQGSDFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>HH1R_HUMAN
----------MSLPNSSCLLEDKMCEGNKTTMASPQLMPLVVVLSTICLVTVGLNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGAVVMPMNILYLLMSKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLKYRTKTRASATILGAWFLSFLWVIPILGWNHFM--VR-EDKCETD------FYDVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHLRSQYVSGLHMNRERKAAKQLGFIMAAFILCWIPYFIFFMVIAFCKNC-CNEHLHMFTIWLGYINSTLNPLIYPLCNENFKKTFKRILHIRS-----------------------
>OPSD_DELDE
MNGTEGLNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSLHAYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWIMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTLSPEVNNESFVIYMFVVHFTIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFTHQGSDFGPIFMTIPSFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGRNPLGDDEASTTASKTETSQVAPA
>OPS1_DROME
PLS---NGSVVDKVTPDMAHLISPYWNQFPAMDPIWAKILTAYMIMIGMISWCGNGVVIYIFATTKSLRTPANLLVINLAISDFGIMITNTPMMGINLYFETWVLGPMMCDIYAGLGSAFGCSSIWSMCMISLDRYQVIVKGMAGRPMTIPLALGKIAYIWFMSSIWCLAPAFGWSRYVPEGNLTSCGID--YLERDWNPRSYLIFYSIFVYYIPLFLICYSYWFIIAAVSAHMNVRSSEDAEKSAEGKLAKVALVTITLWFMAWTPYLVINCMGLFKFEG-LTPLNTIWGACFAKSAACYNPIVYGISHPKYRLALKEKCPCCVFGKVDDGKSSDSEAESKA-----
>RGR_BOVIN
---------------------MAESGTLPTGFGELEVLAVGTVLLVEALSGLSLNILTILSFCKTPELRTPSHLLVLSLALADSGIS-LNALVAATSSLLRRWPYGSEGCQAHGFQGFVTALASICSSAAVAWGRYHHFCTRS---RLDWNTAVSLVFFVWLSSAFWAALPLLGWGHYDYEPLGTCCTLD--YSRGDRNFTSFLFTMAFFNFLLPLFITVVSYRLME--------------QKLGKTSRPPVNTVLPARTLLLGWGPYALLYLYATIADATSISPKLQMVPALIAKAVPTVNAMNYALGSEMVHRGIWQCLSPQRREHSREQ----------------
>VU51_HSV6U
----------------MEKETKSLAWPATAEFYGWVFIFSSIQLCTMVLLTVRFNSFKVGR-E--------YAVFTFAGMSFNCFLLPIKMGLLSGH-----WSLPRDFCAILLYIDDFSIYFSSWSLVFMAIERINHFCYSTLLNENSKALAKVCFPIVWIISGVQALQMLNNYKATA---ETPQCFLA--------FRSGYDMWLMLVYSVMIPVMLVFIYIYSKNFM-----------LLKDELSTVTTYLCIYLLLGTIAHLPKAGLSEIESD----KIFYGLRDIFMALPVLKVYYIPVMAYCMACDDHTVPVRLCSIWLVNLCKKCFSCTLEVGIKMLK---
>A1AA_ORYLA
-----------MTPSSVTLNCSNCSHVLAPELNTVKAVVLGMVLGIFILFGVIGNILVILSVVCHRHLQTVTYYFIVNLAVADLLLSSTVLPFSAIFEILDRWVFGRVFCNIWAAVDVLCCTASIMSLCVISVDRYIGVSYPLYPAIMTKRRALLAVMLLWVLSVIISIGPLFGWKEP--AP--TVCKIT--------EEPGYAIFSAVGSFYLPLAIILAMYCRVYVVAQKELRSFALRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVLPIGSIFPAYRPSDTVFKITFWLGYFNSCINPIIYLCSNQEFKKAFQSLLGVHCLRMTPRAHHHHTQGHSLT-----
>B1AR_BOVIN
VPDGAATAARLLVPASPPASLLTSASEGPPLPSQQWTAGMGLLMAFIVLLIVVGNVLVLVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPIFMQWWGDS-R--ECCDFI--------INEGYAITSSVVSFYVPLCIMAFVYLRVFREAQKQPGRRRPPRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACGSHAAAGCLAVARPSPSPG
>OPSH_ASTFA
NE--DTTRESAFVYTNANNTRDPFEGPNYHIAPRWVYNVSSLWMIFVVIASVFTNGLVIVATAKFKKLRHPLNWILVNLAIADLGETVLASTISVINQIFGYFILGHPMCVFEGWTVSVCGITALWSLTIISWERWVVVCKPFGNVKFDGKWAAGGIIFSWVWAIIWCTPPIFGWSRYWPHGLKTSCGPDVFSGSEDPGVASYMITLMLTCCILPLSIIIICYIFVWSAIHQVAQQQKDSESTQKAEKEVSRMVVVMILAFIVCWGPYASFATFSAVNPGYAWHPLAAAMPAYFAKSATIYNPIIYVFMNRQFRSCIMQLFGKKVEDA-----SEVSTAS--------
>OL15_MOUSE
-------------MEVDSNSSSGTFILMGVSDHPHLEIIFFAVILASYLLTLVGNLTIILLSRLDARLHTPMYFFLSNLSSLDLAFTTSSVPQMLKNLWGPDKTISYGGCVTQLYVFLWLGATECILLVVMAFDRYVAVCRPLYMTVMNPRLCWGLAAISWLGGLGNSVIQSTFTLQL---PDNFLCEVPKLACGDTSLNEAVLNGVCTFFTVVPVSVILVSYCFIAQAV--------MKIRSVEGRRKAFNTCVSHLVVVFLFYGSAIYGYLLPAKSS---NQSQGKFISLFYSVVTPMVNPLIYTLRNKEVKGALGRLLGKGRGAS--------------------
>ETBR_HUMAN
SLAPAEVPKGDRTAGSPPRTISPPPCQGPIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIVIDIPINVYKLLAEDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDIITRI--LRICLLHQKTAFMQFYKTAKDWWLFSFYFCLPLAITAFFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYNQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFE-EKQSLEFKANDHGYDNFR
>PE23_MOUSE
---------MASMWAPEHSAEAHSNLS---STTDDCGSVSVAFPITMMVTGFVGNALAMLLVSRSYRRKKSFLLCIGWLALTDLVGQLLTSPVVILVYLSQRLDPSGRLCTFFGLTMTVFGLSSLLVASAMAVERALAIRAPHYASHMKTRAT-PVLLGVWLSVLAFALLPVLGVG----RYSGTWCFISNETDPAREPGSVAFASAFACLGLLALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQMSEKECNSFLIAVRLASLNQILDPWVYLLLRKILLRKFCQIRDHTN-YASSSTSLPCWSDQLER-----
>TA2R_MOUSE
--MWPNG-----------TSLGACFRPVNITLQERRAIASPWFAASFCALGLGSNLLALSVLAGARPPRSSFLALLCGLVLTDFLGLLVTGAIVASQHAALLTDPSCRLCYFMGVAMVFFGLCPLLLGAAMASERFVGITRPFSRPTATSRRAWATVGLVWVAAGALGLLPLLGLGRYSVQYPGSWCFLT----LGTQRGDVVFGLIFALLGSASVGLSLLLNTVSVATLCRVYHTREATQRPRDCEVEMMVQLVGIMVVATVCWMPLLVFIMQTLLQTPPRATEHQLLIYLRVATWNQILDPWVYILFRRSVLRRLHPRFSSQLQAVSLRRPPAQ------------
>ACM3_HUMAN
N---------VSRAAGNFSSPDGTTDDPLGGHTVWQVVFIAFLTGILALVTIIGNILVIVSFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLAIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLVKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTFWNLGYWLCYINSTVNPVCYALCNKTFRTTFKMLLLCQCDKKKRRKQQYQHKRAPEQAL---
>CCR4_FELCA
IYPSDNYTEDDLGSGDYDSMKEPC---FREENAHFNRIFLPTVYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVLTLPFWAVDAVAN--WYFGKFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFANVRE-DG-RYICDRF---YPSDSWLVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGYQKRKALKTTVILILAFFACWLPYYIGISIDSFILLESTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>GPRJ_MOUSE
AEAAEALLPHGLMGLHEEHSWMSNRTELQYELNPGEVATASIFFGALWLFSIFGNSLVCLVIHRSRRTQSTTNYFVVSMACADLLISVASTPFVVLQFTTGRWTLGSAMCKVVRYFQYLTPGVQIYVLLSICIDRFYTIVYPL-SFKVSREKAKKMIAASWILDAAFVTPVFFFYG---SNW-HCNYFLP-----PSWEGTAYTVIHFLVGFVIPSILIILFYQKVIKYIWRIDGR-RTMNIVPRTKVKTVKMFLLLNLVFLFSWLPFHVAQLWHPHEQDYKKSSLVFTAVTWVSFSSSASKPTLYSIYNANFRRGMKETFCMSSMKCYRSNAYTIKRNYVGISEIPP
>AA2B_MOUSE
------------------------------MQLETQDALYVALELVIAALAVAGNVLVCAAVGASSALQTPTNYFLVSLATADVAVGLFAIPFAITISLG--FCTDFHGCLFLACFVLVLTQSSIFSLLAVAVDRYLAIRVPLYKGLVTGTRARGIIAVLWVLAFGIGLTPFLGWNSKDI-A-PLTCLFE-----NVVPMSYMVYFNFFGCVLPPLLIMLVIYIKIFMVACKQ---MDHSRTTLQREIHAAKSLAMIVGIFALCWLPVHAINCITLFHPALDKPKWVMNVAILLSHANSVVNPIVYAYRNRDFRYSFHKIISRYVLCQAETKGGSGLSLGL-------
>MAS_HUMAN
-----MDGSNVTSFVVEEPTNISTGRNASVGNAHRQIPIVHWVIMSISPVGFVENGILLWFLCFRMR-RNPFTVYITHLSIADISLLFCIFILSIDYALDYESSGHYYTIVTLSVTFLFGYNTGLYLLTAISVERCLSVLYPIYRCHRPKYQSALVCALLWALSCLVTTMEYVMCI-DRHS--RNDC-----------RAVIIFIAILSFLVFTPLMLVSSTILVVKIRK----------NTWASHSSKLYIVIMVTIIIFLIFAMPMRLLYLLYYEYW--STFGNLHHISLLFSTINSSANPFIYFFVGSSKKKRFKESLKVVLTRAFKDEMQPRTVTVETVV----
>TSHR_SHEEP
LQAFDNHYDYTVCGGSEEMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLVILLTSHYKLTVPRFLMCNLAFADFCMGLYLLLIASVDLYTQSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLWHAYVIMLGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIILVLLLNIIAFIIVCACYVKIYITVRNP------HYNPGDKDTRIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFMLLSKFGICKRQAQAYRGSTGIRVQKVPPD
>NK2R_BOVIN
----MGACVVMTDINISSGLDSNATGITAFSMPGWQLALWTAAYLALVLVAVMGNATVIWIILAHQRMRTVTNYFIVNLALADLCMAAFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPGTRAVIAGIWLVALALAFPQCFYST---GA---TKCVVAWPEDSGGKMLLLYHLIVIALIYFLPLVVMFVAYSVIGLTLWRRGHQHGANLRHLQAKKKFVKTMVLVVVTFAICWLPYHLYFILGTFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----YTPSLSTRVNRC
>5H2C_MOUSE
LLVWQFDISISPVAAIVTDTFNSSDGGRLFQFPDGVQNWPALSIVVIIIMTIGGNILVIMAVSMEKKLHNATNYFLMSLAIADMLVGLLVMPLSLLAILYDYWPLPRYLCPVWISLDVLFSTASIMHLCAISLDRYVAVRSPVHSRFNSRTKAIMKIAIVWAISIGVSVPIPVIGLRD-VF--NTTCVL---------NDPNFVLIGSFVAFFIPLTIMVITYFLTIYVLRRQKKKPRGTMQAINNEKKASKVLGIVFFVFLIMWCPFFITNILSVLCGKAKLMEKLLNVFVWIGYVCSGINPLVYTLFNKIYRRAFSKYLRCDYKPDKKPPVRQIALSGRELNVNIY
>OLF1_CHICK
-------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIALISVDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERKTISYVGCILQYFSFVLLTVTESLLLAVMAYDRYVAICKPLYPSIMTKAVCWRLVESLYFLAFLNSLVHTSGLLKL---SNHFFCDISQISSSSIAISELLVIISGSLFVMSSIIIILISYVFIILTV--------VMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRLTATTFGFIDSKAVQ--------------
>O.2R_RAT
ASELNETQEPFLNPTDYDDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVALIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVSVSVLTLSCIALDRWYAICHPL-MFKSTAKRARNSIVVIWIVSCIIMIPQAIVMERSSKTTLFTVCDER---WGGEVYPKMYHICFFLVTYMAPLCLMVLAYLQIFRKLWCRKARVAAEIKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFETVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSCCLG-VHRRQGDRLESRKSLTTQISN
>CKR8_MOUSE
DYTMEPNVTMT--DYYPDFFTAPC---DAEFLLRGSMLYLAILYCVLFVLGLLGNSLVILVLVGCKKLRSITDIYLLNLAASDLLFVLSIPFQTHNLLDQ--WVFGTAMCKVVSGLYYIGFFSSMFFITLMSVDRYLAIVHAVAIKVRTASVGTALSLTVWLAAVTATIPLMVFYQVAS-ED-MLQCFQF-YEEQSLRWKLFTHFEINALGLLLPFAILLFCYVRILQQL---------RGCLNHNRTRAIKLVLTVVIVSLLFWVPFNVALFLTSLHDLHQRLALAIHVTEVISFTHCCVNPVIYAFIGEKFKKHLMDVFQKS-CSHIFLYLGRQRQ------LSSN
>GPRF_MACNE
----MDPEETSVYLDYYYATSPNPDIRETHSHVPYTSVFLPVFYTAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVFLLTCMSVDRYLAIVCPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPLKLIWSLVALIFTFFVPLLSIVTCYCCIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTSKLLAIVSGLQAILQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFT
>MC3R_MOUSE
NSSCCLSSVSPMLPNLSEHPAAPPASNRSGSGFCEQVFIKPEVFLALGIVSLMENILVILAVVRNGNLHSPMYFFLCSLAAADMLVSLSNSLETIMIAVINSDQFIQHMDNIFDSMICISLVASICNLLAIAIDRYVTIFYALYHSIMTVRKALTLIGVIWVCCGICGVMFIIYYS----------------------EESKMVIVCLITMFFAMVLLMGTLYIHMFLFARLHIAVAGVVAPQQHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTNICYTAHFNTYLVLIMCNSVIDPLIYAFRSLELRNTFKEILCGCNSMNLG------------------
>CCR5_RAT
DDLYKELAIYSNSTEIPLQDSIFCSTEEGPLLTSFKTIFMPVAYSLIFLLGMMGNILVLVILERHRHTRSSTETFLFHLAVADLLLVFILPFAVAEGSVG--WVLGTFLCKTVIALHKINFYCSSLLLACIAVDRYLAIVHAVAYRRRRLLSIHITCSTIWLAGFLFALPELLFAKVVQ-NE-LPQCIFSQENEAETRAWFASRFLYHTGGFLLPMLVMAWCYVGVVHRL--------LQAQRRPQRQKAVRVAILVTSIFLLCWSPYHIVIFLDTLERLKGYLSVAITLCEFLGLAHCCLNPMLYTFAGVKFRSDLSRLLTKLGCAG---PASLC--------PGWR
>OPSD_LOLFO
---MGRDIPDNETWWYNPYMDIHPHWKQFDQVPAAVYYSLGIFIAICGIIGCVGNGVVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFMKWVFGNAACKVYGLIGGIFGLMSIMTMTMISIDRYNVIGRPMASKKMSHRKAFIMIIFVWIWSTIWAIGPIFGWGAYTLEGVLCNCSFD--YITRDTTTRSNILCMYIFAFMCPIVVIFFCYFNIVMSVSNHRLNLRKAQAGANAEMKLAKISIVIVTQFLLSWSPYAVVALLAQFGPIEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFRERIASNFPWILTCCQYDEKEIEEIPAGEQS-GGE
>OPSD_TODPA
----GRDLRDNETWWYNPSIVVHPHWREFDQVPDAVYYSLGIFIGICGIIGCGGNGIVIYLFTKTKSLQTPANMFIINLAFSDFTFSLVNGFPLMTISCFLKWIFGFAACKVYGFIGGIFGFMSIMTMAMISIDRYNVIGRPMASKKMSHRRAFIMIIFVWLWSVLWAIGPIFGWGAYTLEGVLCNCSFD--YISRDSTTRSNILCMFILGFFGPILIIFFCYFNIVMSVSNHRLNLRKAQAGANAEMRLAKISIVIVSQFLLSWSPYAVVALLAQFGPLEWVTPYAAQLPVMFAKASAIHNPMIYSVSHPKFREAISQTFPWVLTCCQFDDKETEEIPAGESSDAAP
>OPRK_RAT
LPNSSSWFPNWAESDSNGSVGSEDQQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSAVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLASSVGISAIVLGGTKVDVD-VIECSLQFPDDEYSWWDLFMKICVFVFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITKLVLVVVAVFIICWTPIHIFILVEALGSTSTAVLSSYYFCIALGYTNSSLNPVLYAFLDENFKRCFRDFCFPIKMRMERQSTNRVASMRDVGGMNKP
>ACM4_RAT
------M-NFTPVNGSSANQSVRLVTAAHNHLETVEMVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLGCADLIIGAFSMNLYTLYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV-VP-DNQCFIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAARERKVTRTIFAILLAFILTWTPYNVMVLVNTFCQSC-IPERVWSIGYWLCYVNSTINPACYALCNATFKKTFRHLLLCQYRNIGTAR----------------
>SSR3_RAT
SVPTTLDPGNASSAWPLDTSLGNASAGTSLAGLAVSGILISLVYLVVCVVGLLGNSLVIYVVLRHTSSPSVTSVYILNLALADELFMLGLPFLAAQNALSY-WPFGSLMCRLVMAVDGINQFTSIFCLTVMSVDRYLAVVHPTSARWRTAPVARMVSAAVWVASAVVVLPVVVFSGVPR-----STCHMQ-WPEPAAAWRTAFIIYTAALGFFGPLLVICLCYLLIVVKVRSTSCQAPACQRRRRSERRVTRMVVAVVALFVLCWMPFYLLNIVNVVCPLPPAFFGLYFLVVALPYANSCANPILYGFLSYRFKQGFRRILLRPSRRVRSQEPGSGEEDEEEEERREE
>OPSD_HUMAN
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVLGGFTSTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPAFFAKSAAIYNPVIYIMMNKQFRNCMLTTICCGKNPLGDDEASATVSKTETSQVAPA
>OPSG_GECGE
RDDDDTTRGSVFTYTNTNNTRGPFEGPNYHIAPRWVYNLVSFFMIIVVIASCFTNGLVLVATAKFKKLRHPLNWILVNLAFVDLVETLVASTISVFNQIFGYFILGHPLCVIEGYVVSSCGITGLWSLAIISWERWFVVCKPFGNIKFDSKLAIIGIVFSWVWAWGWSAPPIFGWSRYWPHGLKTSCGPDVFSGSVELGCQSFMLTLMITCCFLPLFIIIVCYLQVWMAIRAVAAQQKESESTQKAEREVSRMVVVMIVAFCICWGPYASFVSFAAANPGYAFHPLAAALPAYFAKSATIYNPVIYVFMNRQFRNCIMQLFGKKVDDG-----SEAVSSVSNSSVAPA
>OAR_BOMMO
TEIYDVIEDEKDVCAVADEPNIPCSFGISLAVPEWEAICTAIILTMIIISTVVGNILVILSVFTYKPLRIVQNFFIVSLAVADLTVAILVLPLNVAYSILGQWVFGIYVCKMWLTCDIMCCTSSILNLCAIALDRYWAITDPIYAQKRTLERVLFMIGIVWILSLVISSPPLLGWNDW-E-P--TPCRLT--------SQPGFVIFSSSGSFYIPLVIMTVVYFEIYLATKKRAVYEEKQRISLTRERRAARTLGIIMGVFVVCWLPFFVIYLVIPFCVSCCLSNKFINFITWLGYVNSALNPLIYTIFNMDFRRAFKKLLFIKC-----------------------
>YQH2_CAEEL
EQSTPARENLPNREIYQIFQFTLVYALPLSNHDNSSLMLIAGFYALLFMFGTCGNAAILAVVHHVKGRHNTTLTYICILSIVDFLSMLPIPMTIIDQILGF-WMFDTFACKLFRLLEHIGKIFSTFILVAFSIDRYCAVCHPLQVRVRNQRTVFVFLGIMFFVTCVMLSPILLYAHSKVTRMHLYKCVDD----LGRELFVVFTLYSFVLAYLMPLLFMIYFYYEMLIRLFKQLVGGGEEKKLTIPVGHIAIYTLAICSFHFICWTPYWISILYSLYEELYYAFIYFMYGVHALPYINSASNFILYGLLNRQLHNAPERKYTRNGVGGRQMSHALTSELIAIPSSSCR
>UL33_HSV7J
----MICYSFAKNVTFAFLIILQNFFSQHDEEYKYNYTCITPTVRKAQRLESVINGIMLTLILPVSTKQTITSPYLITLFISDSLHSLTVLLLTLNREAL--TNLNQALCQCVLFVYSASCTYSLCMLAVISTIRYR-TLQRRTLNDKNNNHIKRNVGILFLSSAMCAIPAVLYVQVEKK-KNYGKCNIHSTQKAY-DLFIGIKIVYCFLWGIFPTVIFSYFYVIFGKTL---------RALTQSKHNKTLSFISLLILSFLCIQIPNLLVMSVEIFFLYIIQREIVQIISRLMPEIHCLSNPLVYAFTRTDFRLRFYDFIKCNLCNSSLKRKRNP------------
>C5AR_RAT
DPISNDSSEITYDYSDGTPNPDMPADGVYIPKMEPGDIAALIIYLAVFLVGVTGNALVVWVTAFEAK-RTVNAIWFLNLAVADLLSCLALPILFTSIVKHNHWPFGDQACIVLPSLILLNMYSSILLLATISADRFLLVFKPICQKFRRPGLAWMACGVTWVLALLLTIPSFVFRRIHK-SD--ILCNID-YSKGPFFIEKAIAILRLMVGFVLPLLTLNICYTFLLIRT---------WSRKATRSTKTLKVVMAVVTCFFVFWLPYQVTGVILAWLPRSQSVERLNSLCVSLAYINCCVNPIIYVMAGQGFHGRLRRSLPSIIRNVLSEDSLGRSTMD--------
>CKR6_HUMAN
DSSEDYFVSVNTSYYSVDSEMLLC---SLQEVRQFSRLFVPIAYSLICVFGLLGNILVVITFAFYKKARSMTDVYLLNMAIADILFVLTLPFWAVSHATGA-WVFSNATCKLLKGIYAINFNCGMLLLTCISMDRYIAIVQATRLRSRTLPRSKIICLVVWGLSVIISSSTFVFNQKYN-TQ-SDVCEPKQTVSEPIRWKLLMLGLELLFGFFIPLMFMIFCYTFIVKTL---------VQAQNSKRHKAIRVIIAVVLVFLACQIPHNMVLLVTAANLGKKLIGYTKTVTEVLAFLHCCLNPVLYAFIGQKFRNYFLKILKDLWCVRRKYKSSGFEN------ISRQ
>B2AR_MOUSE
MGPHGNDSDFLLAPNGSRAPDH----DVTQERDEAWVVGMAILMSVIVLAIVFGNVLVITAIAKFERLQTVTNYFIISLACADLVMGLAVVPFGASHILMKMWNFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYVAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-D--TCCDFF--------TNQAYAIASSIVSFYVPLCVMVFVYSRVFQVAKRQGRSLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIRDNL-IPKEVYILLNWLGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSSKTYGNGYSDYTGEPNTCQLG
>DADR_DIDMA
---------------MPLNDTTMDRRGLVVERDFSFRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLNWHKAGN-TMDNCDSS--------LSRTYAISSSLISFYIPVAIMIVTYTRIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCESDCIDSITFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPTANNAIETGAVFSSH-----
>O.YR_MACMU
GELAANWSTEAVNSSAAPPGAEGNCTAGPPRRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLAACYGLISFKIWQNRMAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDANA-KEASAFIIVMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSASYLKGNRLGENSSSFVLSHRSS
>AG2R_MELGA
------MVPNYSTEETVKRIHVDC---PVSGRHSYIYIMVPTVYSIIFIIGIFGNSLVVIVIYCYMKLKTVASIFLLNLALADLCFLITLPLWAAYTAMEYQWPFGNCLCKLASAGISFNLYASVFLLTCLSIDRYLAIVHPVSRIRRTMFVARVTCIVIWLLAGVASLPVIIHRNIFF-LN--TVCGFR-YDNNNTTLRVGLGLSKNLLGFLIPFLIILTSYTLIWKTLKKA----YQIQRNKTRNDDIFKMIVAIVFFFFFSWIPHQVFTFLDVLIQLHDIVDTAMPFTICIAYFNNCLNPFFYVFFGKNFKKYFLQLIKYIPPNVSTHPSLTTRPPE-------N
>TSHR_MOUSE
LQAFESHYDYTVCGDNEDMVCTPKSDEFNPCEDIMGYRFLRIVVWFVSLLALLGNIFVLLILLTSHYKLTVPRFLMCNLAFADFCMGVYLLLIASVDLYTHSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHAYTIMAGGWVSCFLLALLPMVGISSY----KVSICLPM-----TDTPLALAYIVLVLLLNVVAFVVVCSCYVKIYITVRNP------QYNPRDKDTKIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGICKRQAQAYQGSTGIQIQKIPQD
>CCR3_MOUSE
FAFLLENSTSPYDYGENESDFSDSPPCPQDFSLNFDRTFLPALYSLLFLLGLLGNGAVAAVLLSQRTALSSTDTFLLHLAVADVLLVLTLPLWAVDAAVQ--WVFGPGLCKVAGALFNINFYAGAFLLACISFDRYLSIVHATIYRRDPRVRVALTCIVVWGLCLLFALPDFIYLSANY-RL-ATHCQYN----FPQVGRTALRVLQLVAGFLLPLLVMAYCYAHILAVL---------LVSRGQRRFRAMRLVVVVVAAFAVCWTPYHLVVLVDILMDVGSHVDVAKSVTSGMGYMHCCLNPLLYAFVGVKFREKMWMLFTRLGRSDQRGPQRQPSWSETTEASYLG
>AA2B_RAT
------------------------------MQLETQDALYVALELVIAALAVAGNVLVCAAVGASSALQTPTNYFLVSLATADVAVGLFAIPFAITISLG--FCTDFHSCLFLACFVLVLTQSSIFSLLAVAVDRYLAIRVPLYKGLVTGTRARGIIAVLWVLAFGIGLTPFLGWNSKDI-T-PVKCLFE-----NVVPMSYMVYFNFFGCVLPPLLIMMVIYIKIFMVACKQ---MEHSRTTLQREIHAAKSLAMIVGIFALCWLPVHAINCITLFHPALDKPKWVMNVAILLSHANSVVNPIVYAYRNRDFRYSFHRIISRYVLCQTDTKGGSGFSLSL-------
>MSHR_RANTA
PVLGSQRRLLGSLNCTPPATFPLMLAPNRTGPQCLEVSIPNGLFLSLGLVSLVENVLVVAAIAKNSNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHRGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>CKR3_RAT
EEELKTVVETFETTPYEYEWAPPC---EKVSIRELGSWLLPPLYSLVFIVGLLGNMMVVLILIKYRKLQIMTNIYLLNLAISDLLFLFTVPFWIHYVLWNE-WGFGHCMCKMLSGLYYLALYSEIFFIILLTIDRYLAIVHAVALRARTVTFATITSIITWGFAVLAALPEFIFHESQD-NF-DLSCSPRYPEGEEDSWKRFHALRMNIFGLALPLLIMVICYSGIIKTL---------LRCPNKKKHKAIQLIFVVMIVFFIFWTPYNLVLLLSAFHSTFIHLDLAMQVTEVITHTHCCINPIIYAFVGERFRKHLRLFFHRNVAIYLRKYISFLT-------SSVS
>OPS2_HEMSA
DFGYPEGVSIVDFVRPEIKPYVHQHWYNYPPVNPMWHYLLGVIYLFLGTVSIFGNGLVIYLFNKSAALRTPANILVVNLALSDLIMLTTNVPFFTYNCFSGGWMFSPQYCEIYACLGAITGVCSIWLLCMISFDRYNIICNGFNGPKLTTGKAVVFALISWVIAIGCALPPFFGWGNYILEGILDSCSYD--YLTQDFNTFSYNIFIFVFDYFLPAAIIVFSYVFIVKAIFAHMNVRSNEADAQRAEIRIAKTALVNVSLWFICWTPYALISLKGVMGDTSGITPLVSTLPALLAKSCSCYNPFVYAISHPKYRLAITQHLPWFCVHETETKSNDDAQDKA-------
>AG2R_RABIT
------MMLNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLAVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSNLSTRPSD-------N
>5H2A_HUMAN
NTSDAFNWTVDSENRTNLSCEGCLSPSCLSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILAYKSSQLQMGQK
>AA2A_HUMAN
-------------------------------MPIMGSSVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFIACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGS-Q-QVACLFE-----DVVPMNYMVYFNFFACVLVPLLLMLGVYLRIFLAARRQESQGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPDCHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFRQTFRKIIRSHVLRQQEPFKAAGAHGSDGEQVSLR
>HH2R_CAVPO
-------------------MAFNGTVPSFCMDFTVYKVTISVILIILILVTVAGNVVVCLAVGLNRRLRSLTNCFIVSLAVTDLLLGLLVLPFSAIYQLSCKWSFSKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLITPARVAISLVFIWVISITLSFLSIHLGWN--RN-TIVKCKVQ--------VNEVYGLVDGLVTFYLPLLIMCITYFRIFKIAREQR--IGSWKAATIREHKATVTLAAVMGAFIICWFPYFTVFVYRGLKGDD-VNEVFEDVVLWLGYANSALNPILYAALNRDFRTAYHQLFCCRLASHNSHETSLRRSQCQEPRW---
>IL8B_HUMAN
KGEDLSNYSYSSTLPPFLLDAAPC----EPESLEINKYFVVIIYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRYLVKFICLSIWGLSLLLALPVLLFRRTVY-NV-SPACYED-MGNNTANWRMLLRILPQSFGFIVPLLIMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNHIDRALDATEILGILHSCLNPLIYAFIGQKFRHGLLKILAIHGLISKDSLPKDS------------
>OLF5_RAT
-------------MSSTNQSSVTEFLLLGLSRQPQQQQLLFLLFLIMYLATVLGNLLIILAIGTDSRLHTPMYFFLSNLSFVDVCFSSTTVPKVLANHILGSQAISFSGCLTQLYFLAVFGNMDNFLLAVMSYDRFVAICHPLYTTKMTRQLCVLLVVGSWVVANMNCLLHILLMARL---SPHFFCDGTKLSCSDTHLNELMILTEGAVVMVTPFVCILISYIHITCAV--------LRVSSPRGGWKSFSTCGSHLAVVCLFYGTVIAVYFNPSSSH---LAGRDMAAAVMYAVVTPMLNPFIYSLRNSDMKAALRKVLAMRFPSKQ-------------------
>OPSB_CHICK
TDLPEDFYIPMALDAPNITALSPFLVPQTHLGSPGLFRAMAAFMFLLIALGVPINTLTIFCTARFRKLRSHLNYILVNLALANLLVILVGSTTACYSFSQMYFALGPTACKIEGFAATLGGMVSLWSLAVVAFERFLVICKPLGNFTFRGSHAVLGCVATWVLGFVASAPPLFGWSRYIPEGLQCSCGPDWYTTDNKWHNESYVLFLFTFCFGVPLAIIVFSYGRLLITLRAVARQQEQSATTQKADREVTKMVVVMVLGFLVCWAPYTAFALWVVTHRGRSFEVGLASIPSVFSKSSTVYNPVIYVLMNKQFRSCMLKLLFCGRSPFGDDEDVSGSSVSSSHVAPA-
>AA2A_RAT
----------------------------------MGSSVYITVELAIAVLAILGNVLVCWAVWINSNLQNVTNFFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFFACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGVRAKGIIAICWVLSFAIGLTPMLGWNNCST-K-RVTCLFE-----DVVPMNYMVYYNFFAFVLLPLLLMLAIYLRIFLAARRQESQGERTRSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCSTCHAPPWLMYLAIILSHSNSVVNPFIYAYRIREFRQTFRKIIRTHVLRRQEPFQAGGAHSTEGEQVSLR
>OLF6_CHICK
-------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIALISIDLQLQTPMYIFLQNLSFTDAVYSTVITPKMLATFLEETKTISYVGCILQYFSFVLLTVRECLLLAVMAYDRYAAICKPLYPAIMTKAVCWRLVKGLYSLAFLNFLVHTSGLLKL---SNHFFCDNSQISSSSTALNELLVFIFGSLFVMSSIITILISYVFIILTV--------VRIRSKERKYKAFSTCTSHLMAVSLFHGTIVFMYFQPANNF---SLDKDKIMSLFYTVVIPMLNPLIYSWRNKEVKDALHRAIATAVLFH--------------------
>P2Y6_HUMAN
-----------MEWDNGTGQALGLPPTTCVYRENFKQLLLPPVYSAVLAAGLPLNICVITQICTSRRALTRTAVYTLNLALADLLYACSLPLLIYNYAQGDHWPFGDFACRLVRFLFYANLHGSILFLTCISFQRYLGICHPLWHKRGGRRAAWLVCVAVWLAVTTQCLPTAIFAATG-----RTVCYDL-SPPALATHYMPYGMALTVIGFLLPFAALLACYCLLACRLCRQ---GPAEPVAQERRGKAARMAVVVAAAFAISFLPFHITKTAYLAVRSTEAFAAAYKGTRPFASANSVLDPILFYFTQKKFRRRPHELLQKLTAKWQRQGR---------------
>OPS5_DROME
SLGDGSVFPMGHGYPAEYQHMVHAHWRGFREAPIYYHAGFYIAFIVLMLSSIFGNGLVIWIFSTSKSLRTPSNLLILNLAIFDLFMCTN-MPHYLINATVGYIVGGDLGCDIYALNGGISGMGASITNAFIAFDRYKTISNPI-DGRLSYGQIVLLILFTWLWATPFSVLPLFQIWGRYPEGFLTTCSFD--YLTNTDENRLFVRTIFVWSYVIPMTMILVSYYKLFTHVRVHNVKANANADNMSVELRIAKAALIIYMLFILAWTPYSVVALIGCFGEQQLITPFVSMLPCLACKSVSCLDPWVYATSHPKYRLELERRLPWLGIREKHATSGTSSVSGDTLALSVQ
>YR13_CAEEL
--------------------MSNIFSVPLDPMSVAVGIPYVCFFIILSVVGIIGNVIVIYAIAGDRNRKSVMNILLLNLAVADLANLIFTIPEWIPPVFFGSWLFPSFLCPVCRYLECVFLFASISTQMIVCIERYIAIVLPMARQLCSRRNVLITVLVDWIFVACFASPYAVWHSVK-LFQLSATCSNT---VGKSTWWQGYKLTEFLAFYFVPCFIITVVYTKVAKCLWCKCLDSSRSSDALRTRRNVVKMLIACVAVYFVCYSPIQVIFLSKAVLNVTHPPYDFILLMNALAMTCSASNPLLYTLFSQKFRRRLRDVLYCPSDVENETKTYYSGPRASF------
>5H1B_CRIGR
CAPPPPAASQTGVPLVNLSHNSAESHIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPVSTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYAAKRTPKRAAIMIALVWVFSISISLPPFFWR----E-E--LTCLVN-------TDHVLYTVYSTGGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMATLDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCAG---------------------
>D2DR_BOVIN
---MDPLNLSWYDDDPESRNWSRPFNGSEGKADRPPYNYYAMLLTLLIFVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMIAIVWVLSFTISCPMLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFLKILHC-------------------------
>VU51_HSV6Z
----------------MEKETKSLAWPATAEFYGWVFIFSSIQLCTVVFLTVRFNGFKVGR-E--------YAVFTFAGMSFNCFLLPIKMGLLSGH-----WTLPRDFCAILLYIDDFSAYFSSWSLVFMAIERINYFCYSTLLNENSKALAKVCFPIVWVVSGVQALQMLNNYKATA---ETGQCFLA--------FRSGHDMWLMLVYSVVIPVMLVFFYLYSKNFM-----------LLKDELSSVTTYLCIYLLLGTIAHLPKAALSEIESD----KIFYGLRDIFMALPVLKVYYISAMAYCMACDDHTVPVRLCSIWLVNLCKKCFSCTLEVGIKMLK---
>D2DR_MOUSE
---MDPLNLSWYDDDLERQNWSRPFNGSEGKADRPHYNYYAMLLTLLIFIIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMIAIVWVLSFTISCPLLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRKRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFMKILHC-------------------------
>CCR4_MACFA
IYTSDNYTEEMG-SGDYDSIKEPC---FREENAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLYVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFASVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALGFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>HH1R_BOVIN
---------MTCPNSSCVFEDKMCQGNKTAPANDAQLTPLVVVLSTISLVTVGLNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGVVVMPMNILYLLMSRWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASITILAAWFLSFLWIIPILGWRHFQ--EP-EDKCETD------FYNVTWFKVMTAIINFYLPTLLMLWFYAKIYKAVRQHLRSQYVSGLHMNRERKAAKQLGFIMAAFIICWIPYFIFFMVIAFCESC-CNQHVHMFTIWLGYINSTLNPLIYPLCNENFKKTFKKILHIRS-----------------------
>OPS2_DROPS
AQTG-GNRSVLDNVLPDMAPLVNPHWSRFAPMDPTMSKILGLFTLVILIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFYYETWVLGPLWCDIYAACGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKIAFIWMMAVFWTIMPLIGWSSYVPEGNLTACSID--YMTRQWNPRSYLITYSLFVYYTPLFMICYSYWFIIATVAAHMNVRSSEDCDKSAENKLAKVALTTISLWFMAWTPYLIICYFGLFKIDG-LTPLTTIWGATFAKTSAVYNPIVYGISHPNDRLVLKEKCPMCVCGTTDEPKPDATSEAESKD----
>OPSD_SCYCA
MNGTEGENFYIPMSNKTGVVRSPFDYPQYYLAEPWKFSVLAAYMFFLIIAGFPVNFLTLYVTIQHKKLRQPLNYILLNLAVADLFMIFGGFPSTMITSMNGYFVFGPSGCNFEGFFATLGGEIGLWSLVVLAIERYVVVCKPMSNFRFGSQHAFMGVGLTWIMAMACAFPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFSIPLTIIFFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVIIMVIAFLICWLPYASVAFFIFCNQGSEFGPIFMTIPAFFAKAASLYNPLIYILMNKQFRNCMITTICCGKNPFEEEESTSASSVSSSQVAPAA
>OPSD_BUFBU
MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSILCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFILGATGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAVMGVAFTWIMALSCAVPPLLGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFSNQGSEFGPIFMTVPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGEDDASSAASSVSSSQVSPA
>ACM4_CHICK
AQPWQAKMANLTYDNVTLSNRSEVAIQPPTNYKTVELVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGVFSMNLYTVYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWILSFILWAPAILFWQFIV-VH-ERECYIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAAREKKVTRTIFAILLAFILTWTPYNVMVLINTFCETC-VPETVWSIGYWLCYVNSTINPACYALCNATFKKTFKHLLMCQYRNIGTAR----------------
>GRHR_RAT
--MANNASLEQDQNHCSAINNSIPLTQGKLPTLTLSGKIRVTVTFFLFLLSTAFNASFLVKLQRWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAVTQPL-AVQSKSKLERSMTSLAWILSIVFAGPQLYIFRMIYAV--FSQCVTH--SFPQWWHEAFYNFFTFSCLFIIPLLIMLICNAKIIFALTRVPRKNQSKNNIPRARLRTLKMTVAFGTSFVICWTPYYVLGIWYWFDPEMRVSEPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>C5AR_MOUSE
---MNSSFEINYDHY-GTMDPNIPADGIHLPKRQPGDVAALIIYSVVFLVGVPGNALVVWVTAFEPD-GPSNAIWFLNLAVADLLSCLAMPVLFTTVLNHNYWYFDATACIVLPSLILLNMYASILLLATISADRFLLVFKPICQKVRGTGLAWMACGVAWVLALLLTIPSFVYREAYK-SE--TVCGIN-YGGGSFPKEKAVAILRLMVGFVLPLLTLNICYTFLLLRT---------WSRKATRSTKTLKVVMAVVICFFIFWLPYQVTGVMIAWLPPSKRVEKLNSLCVSLAYINCCVNPIIYVMAGQGFHGRLLRSLPSIIRNALSEDSVGRSTDD--------
>5H2A_MACMU
NTSDAFNWTVESENRTNLSCEGCLSPSCLSLLHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVSFFIPLTIMVITYFLTIKSLQKEDPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESDVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENKKPLQLILAYKSSQLQMGQK
>LSHR_BOVIN
ESELSGWDYDYGFCLPKTLQCAPEPDAFNPCEDIMGYNFLRVLIWLINILAITGNVTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDAQTKGWQTG-SGCSAAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLKHAIPVMLGGWLFSTLIAVLPLVGVSNY----KVSICLPM-----VESTLSQVYILTILILNVMAFIIICACYIKIYFAVQNP------ELMATNKDTKIAKKMAVLIFTDFTCMAPISFFAISAAFKVPLITVTNSKVLLVLFYPVNSCANPFLYAIFTKAFQRDFFLLLSKFGCCKYRAELYRRSNCKNGFTGSNK
>SSR5_HUMAN
LFPASTPSWNASSPGAASGGGDNRTLVGPAPSAGARAVLVPVLYLLVCAAGLGGNTLVIYVVLRFAKMKTVTNIYILNLAVADVLYMLGLPFLATQNAASF-WPFGPVLCRLVMTLDGVNQFTSVFCLTVMSVDRYLAVVHPLSARWRRPRVAKLASAAAWVLSLCMSLPLLVFADVQE-----GTCNAS-WPEPVGLWGAVFIIYTAVLGFFAPLLVICLCYLLIVVKVRAAGV--RVGCVRRRSERKVTRMVLVVVLVFAGCWLPFFTVNIVNLAVALPPASAGLYFFVVILSYANSCANPVLYGFLSDNFRQSFQKVLCLRKGSGAKDADATEQQQEATRPRTAA
>B2AR_PIG
MGQPGNRSVFLLAPNGSHAPDQ----DVPQERDEAWVVGMAIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWTFGSFWCEFWISIDVLCVTASIETLCVIAVDRYLAITSPFYQCLLTKNKARVVILMVWVVSGLISFLPIKMHWYQAL-N--ACCDFF--------TNQPYAIASSIVSFYLPLVVMVFVYSRVFQVARRQGRSHRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHGIHDNL-IPKEVYILLNWVGYVNSAFNPLIYCRSP-DFRMAFQELLCLHRSSLKAYGNGCSDYTGEQSGCYLG
>FSHR_RAT
SDMMYNEFDYDLCNEVVDVTCSPKPDAFNPCEDIMGYNILRVLIWFISILAITGNTTVLVVLTTSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVQLRHAASVMVLGWTFAFAAALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMALLVLNVLAFVVICGCYTHIYLTVRNP------TIVSSSSDTKIAKRMATLIFTDFLCMAPISFFAISASLKVPLITVSKAKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQIYRTNFHARKSHCSSA
>D4DR_MOUSE
---MGNSSATEDGGLLAGRGP---ESLGTGAGLGGAGAAALVGGVLLIGLVLAGNSLVCVSVASERTLQTPTNYFIVSLAAADLLLAVLVLPLFVYSEVQGGWLLSPRLCDTLMAMDVMLCTASIFNLCAISVDRFVAVTVPL-RYNQQGQCQLLLIAATWLLSAAVASPVVCGLN---G-R--AVCCL---------ENRDYVVYSSVCSFFLPCPLMLLLYWATFRGLRRWPEPRRRGAKITGRERKAMRVLPVVVGAFLVCWTPFFVVHITRALCPACFVSPRLVSAVTWLGYVNSALNPIIYTIFNAEFRSVFRKTLRLRC-----------------------
>GALT_MOUSE
--------------------MADIQNISLDSPGSVGAVAVPVVFALIFLLGMVGNGLVLAVLLQPGPPGSTTDLFILNLAVADLCFILCCVPFQAAIYTLDAWLFGAFVCKTVHLLIYLTMYASSFTLAAVSVDRYLAVRHPLSRALRTPRNARAAVGLVWLLAALFSAPYLSYYG---GA--LELCVPA----WEDARRRALDVATFAAGYLLPVTVVSLAYGRTLCFLWAAGP-AAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAPATYACRLASHCLAYANSCLNPLVYSLASRHFRARFRRLWPCGHRRHRHHHHRLHPASSGPAGYPGD
>AG2R_RAT
------MALNSSAEDGIKRIQDDC---PKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIHRNVYF-TN--TVCAFH-YESRNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFRIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------N
>NMBR_RAT
PNLSLPTEASESELEPEVWENDFLPDSDGTTAELVIRCVIPSLYLIIISVGLLGNIMLVKIFLTNSTMRSVPNIFISNLAAGDLLLLLTCVPVDASRYFFDEWVFGKLGCKLIPAIQLTSVGVSVFTLTALSADRYRAIVNPMMQTSGVVLWTSLKAVGIWVVSVLLAVPEAVFSEVARSS--FTACIPY---QTDELHPKIHSVLIFLVYFLIPLVIISIYYYHIAKTLIRSLPGNEHTKKQMETRKRLAKIVLVFVGCFVFCWFPNHILYLYRSFNYKELGHMIVTLVARVLSFSNSCVNPFALYLLSESFRKHFNSQLCCGQKSYPERSTSYLMTSLKSNAKNVV
>NY1R_CANFA
TSFSQVENHSIFCNFSE-NSQFLAFESDDCHLPLAMIFTLALAYGAVIILGVTGNLALIMIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWVFGEAMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYVGIAVIWVLAVVSSLPFLIYQVLTDKD--KYVCFDK---FPSDSHRLSYTTLLLMLQYFGPLCFIFICYFKIYIRLKRRNMMMRDNKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK
>A2AD_HUMAN
AAGPNASGAGERGSGGVANASGASWGPPRGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSLYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQVPGPLFKFFFWIGYCNSSLNPVIYTVFNQDFRRSFKHILFRRRRRGFRQ-----------------
>V1AR_SHEEP
SSRWWPLDAGDANTSGDLAGLGEDGGPQADTRNEELAKLEIAVLAVIFVVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQLGWDITYRFRGPDGLCRVVKHMQVFAMFASAYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIAAAWVLSFVLSTPQYFVFSMVETK--TYDCWAN---FIHPWGLPAYVTWMTGSVFVAPVVILGTCYGFICYHIWRKVLHVSSVKTISRAKIRTVKMTFVIVTAYIVCWAPFFIIQMWSAWDKNFESENPATAIPALLASLNSCCNPWIYMFFSGHLLQDCAQSFPCCQNVKRTFTREGSTSFTNNRSPTNS
>FML1_MOUSE
-----------MESNYSIHLNGSEVVVYDSTISRVLWILSMVVVSITFFLGVLGNGLVIWVAGFRMP-HTVTTIWYLNLALADFSFTATLPFLLVEMAMKEKWPFGWFLCKLVHIAVDVNLFGSVFLIAVIALDRCICVLHPVAQNHRTVSLARNVVVGSWIFALILTLPLFLFLTTVRVSWVEERLNTA------ITFVTTRGIIRFIVSFSLPMSFVAICYGLITTKI---------HKKAFVNSSRPFRVLTGVVASFFICWFPFQLVALLGTVWLKEKIIGRLVNPTSSLAFFNSCLNPILYVFMGQDFQERLIHSLSSRLQRALSEDSGHIAS----------
>OPSD_PETMA
MNGTEGENFYIPFSNKTGLARSPFEYPQYYLAEPWKYSVLAAYMFFLILVGFPVNFLTLFVTVQHKKLRTPLNYILLNLAVANLFMVLFGFTLTMYSSMNGYFVFGPTMCNFEGFFATLGGEMSLWSLVVLAIERYIVICKPMGNFRFGSTHAYMGVAFTWFMALSCAAPPLVGWSRYLPEGMQCSCGPDYYTLNPNFNNESFVIYMFLVHFIIPFIVIFFCYGRLLCTVKEAAAAQQESASTQKAEKEVTRMVVLMVIGFLVCWVPYASVAFYIFTHQGSDFGATFMTVPAFFAKTSALYNPIIYILMNKQFRNCMITTLCCGKNPLGDEDSGASVSSVSTSQVSPA
>B2AR_MACMU
MGQPGNGSAFLLAPNGSHAPDH----DVTQERDEAWVVGMGIVMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKRQGRTLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWVGYVNSGFNPLIYCRSP-DFRIAFQELLCLRRSSLKACGNGYSGNTGEQSGYHLE
>AG2R_SHEEP
------MILNSSTEDGIKRIQDDC---PKAGRHNYIFIMIPTLYSIIFVVGLFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASGSVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFHVYESQNSTLPVGLGLTKNILGFLFPFLIILTSYTLIWKTLKKA----YEIQKNKPRKDDIFKIILAIVLFFFFSWVPHQIFTFMDVLIQLGDIVDTAMPITICLAYFNNCLNPPFYGFLGKKFKKYFLQLLKYIPPKAKSHSNLSTRPSE-------N
>AG2R_BOVIN
------MILNSSTEDGIKRIQDDC---PKAGRHNYIFIMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFH-YESQNSTLPVGLGLTKNILGFLFPFLIILTSYTLIWKTLKKA----YEIQKNKPRKDDIFKIILAIVLFFFFSWVPHQIFTFMDVLIQLGDIVDTAMPITICLAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSNLSTRPSE-------N
>ACM2_HUMAN
--------------MNNSTNSSNNSLALTSPYKTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VE-DGECYIQ------FFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPC-IPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR----------------
>PF2R_HUMAN
--MSMNN---------SKQLVSPAAALLSNTTCQTENRLSVFFSVIFMTVGILSNSLAIAILMKAYQSKASFLLLASGLVITDFFGHLINGAIAVFVYASDKFDQSNVLCSIFGICMVFSGLCPLLLGSVMAIERCIGVTKPIHSTKITSKHVKMMLSGVCLFAVFIALLPILGHRDYKIQASRTWCFYN--TEDIKDWEDRFYLLLFSFLGLLALGVSLLCNAITGITLLRVKFKSQQHRQGRSHHLEMVIQLLAIMCVSCICWSPFLVTMANIGINGNHLETCETTLFALRMATWNQILDPWVYILLRKAVLKNLYKLASQCCGVHVISLHIWELKVAAISESPVA
>US28_HCMVA
----MTPTTTTAELTTEFDYDEDATPCVFTDVLNQSKPVTLFLYGVVFLFGSIGNFLVIFTITWRRRIQCSGDVYFINLAAADLLFVCTLPLWMQYLLDH--NSLASVPCTLLTACFYVAMFASLCFITEIALDRYYAIVYMR---YRPVKQACLFSIFWWIFAVIIAIPHFMVVTKK-----DNQCMTD-YDYLEVSYPIILNVELMLGAFVIPLSVISYCYYRISRIV---------AVSQSRHKGRIVRVLIAVVLVFIIFWLPYHLTLFVDTLKLLKRSLKRALILTESLAFCHCCLNPLLYVFVGTKFRQELHCLLAEFRQRLFSRDVSWYRSSPSRRETSSD
>NK4R_HUMAN
NLTSSPAPTASPSPAPSWTPSPRPGPAHPFLQPPWAVALWSLAYGAVVAVAVLGNLVVIWIVLAHKRMRTVTNSFLVNLAFADAAMAALNALVNFIYALHE-WYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATRIVIGSIWILAFLLAFPQCLYSK---GR---TLCYVQ--WPEGSRQHFTYHMIVIVLVYCFPLLIMGITYTIVGITLWGG--PCDKYQEQLKAKRKVVKMMIIVVVTFAICWLPYHIYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIHVSSY----ATRLHPMRQSSL
>MC3R_RAT
NSSCCPSSSYPTLPNLSQHPAAPSASNRSGSGFCEQVFIKPEVFLALGIVSLMENILVILAVVRNGNLHSPMYFFLLSLLQADMLVSLSNSLETIMIVVINSDQFIQHMDNIFDSMICISLVASICNLLAIAVDRYVTIFYALYHSIMTVRKALSLIVAIWVCCGICGVMFIVYYS----------------------EESKMVIVCLITMFFAMVLLMGTLYIHMFLFARLHIAAADGVAPQQHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTNICYTAHFNTYLVLIMCNSVIDPLIYAFRSLELRNTFKEILCGCNGMNVG------------------
>OPS1_HEMSA
TFGYPEGMTVADFVPDRVKHMVLDHWYNYPPVNPMWHYLLGVVYLFLGVISIAGNGLVIYLYMKSQALKTPANMLIVNLALSDLIMLTTNFPPFCYNCFSGGWMFSGTYCEIYAALGAITGVCSIWTLCMISFDRYNIICNGFNGPKLTQGKATFMCGLAWVISVGWSLPPFFGWGSYTLEGILDSCSYD--YFTRDMNTITYNICIFIFDFFLPASVIVFSYVFIVKAIFAHMNVRSNEAETQRAEIRIAKTALVNVSLWFICWTPYAAITIQGLLGNAEGITPLLTTLPALLAKSCSCYNPFVYAISHPKFRLAITQHLPWFCVHEKDPNDVEETQEKS-------
>AA3R_RAT
-----------------------MKANNTTTSALWLQITYITMEAAIGLCAVVGNMLVIWVVKLNRTLRTTTFYFIVSLALADIAVGVLVIPLAIAVSLE--VQMHFYACLFMSCVLLVFTHASIMSLLAIAVDRYLRVKLTVYRTVTTQRRIWLFLGLCWLVSFLVGLTPMFGWN-RKE-L-TLSCHFR-----SVVGLDYMVFFSFITWILIPLVVMCIIYLDIFYIIRNK--NFRETRAFYGREFKTAKSLFLVLFLFALCWLPLSIINFVSYFNVK--IPEIAMCLGILLSHANSMMNPIVYACKIKKFKETYFVILRACRLCQTSDSLDSN------------
>SSR4_RAT
GED----TTWTPGINASWAPDEEEDAVRSDGTGTAGMVTIQCIYALVCLVGLVGNALVIFVILRYAKMKTATNIYLLNLAVADELFMLSVPFVASAAALRH-WPFGAVLCRAVLSVDGLNMFTSVFCLTVLSVDRYVAVVHPLAATYRRPSVAKLINLGVWLASLLVTLPIAVFADTRPGGE-AVACNLH---WPHPAWSAVFVIYTFLLGFLLPVLAIGLCYLLIVGKMRAVALR-AGWQQRRRSEKKITRLVLMVVTVFVLCWMPFYVVQLLNLFVTS--LDATVNHVSLILSYANSCANPILYGFLSDNFRRSFQRVLCLRCCLLETTGGAEETALKSRGGPGCI
>NK1R_RAT
-----MDNVLPMDSDLFPNISTNTSESNQFVQPTWQIVLWAAAYTVIVVTSVVGNVVVIWIILAHKRMRTVTNYFLVNLAFAEACMAAFNTVVNFTYAVHNVWYYGLFYCKFHNFFPIAALFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVIFVIWVLALLLAFPQGYYST---SR---VVCMIEWPEHPNRTYEKAYHICVTVLIYFLPLLVIGYAYTVVGITLWAS-IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHVFFLLPYINPDLKFIQQVYLASMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAGDY----STRYLQTQSSVY
>OPSD_SARDI
MNGTEGPFFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAILGAYMFFLIIVGFPVNFMTLYVTLEHKKLRTPLNYILLNLAVADLFMVIGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGMISLWSLAVLAIERWVVVCKPISNFRFGENHAIMGVSLTWGMALACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVLYMFFCHFTIPLTIIFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIIMVIGFLVCWLPYASVAWFIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLFCGKNPF---EGEEETEASSASSVSPA
>FSHR_HORSE
FDMMYSEFEYDLCNEVVDVTCSPKPDAFNPCEDIMGYDILRVLIWFISILAITGNIIVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVQLRHAASVMLVGWIFAFAVALLPIFGISTY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYIHIYLTVRNP------NIVSSSSDTKIAKRMAILIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQLYRTISHPRNGHCPPT
>OPSG_CARAU
MNGTEGKNFYVPMSNRTGLVRSPFEYPQYYLAEPWQFKILALYLFFLMSMGLPINGLTLVVTAQHKKLRQPLNFILVNLAVAGTIMVCFGFTVTFYTAINGYFVLGPTGCAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSSSHAFAGIAFTWVMALACAAPPLFGWSRYIPEGMQCSCGPDYYTLNPDYNNESYVIYMFVCHFILPVAVIFFTYGRLVCTVKAAAAQQQDSASTQKAEREVTKMVILMVFGFLIAWTPYATVAAWIFFNKGADFSAKFMAIPAFFSKSSALYNPVIYVLLNKQFRNCMLTTIFCGKNPLGDDESS--TSKTEVSSVSPA
>AA3R_RABIT
------------------------MPDNSTTLFLAIRASYIVFEIVIGVCAVVGNVLVIWVIKLNPSLKTTTFYFIFSLALADIAVGFLVMPLAIVISLG--ITIGFYSCLVMSCLLLVFTHASIMSLLAIAVDRYLRVKLTVYRRVTTQRRIWLALGLCWVVSLLVGFTPMFGWN-MKE-S-DFQCKFD-----SVIPMEYMVFFSFFTWILIPLLLMCALYVYIFYIIRNK--SFKETGAFYRREFKTAKSLFLVLALFAGCWLPLSIINCVTYFKCK--VPDVVLLVGILLSHANSMMNPIVYACKIQKFKETYLLIFKARVTCQPSDSLDPS------------
>5H1D_HUMAN
SPLNQSAEGLPQEASNRSLNATETSEAWDPRTLQALKISLAVVLSVITLATVLSNAFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTITHTWNFGQILCDIWLSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAATMIAIVWAISICISIPPLFWR----Q-E--SDCLVN-------TSQISYTIYSTCGAFYIPSVLLIILYGRIYRAARNRKLALERKRISAARERKATKILGIILGAFIICWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPIIYTVFNEEFRQAFQKIVPFRKAS---------------------
>CCR4_MACMU
IYTSDNYTEEMG-SGDYDSIKEPC---FREENAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQKPRKLLAEKVVYVGVWIPALLLTIPDFIFASVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIDILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>OPSP_CHICK
------MSSNSSQAPPNGTPGPFDGPQWPYQAPQSTYVGVAVLMGTVVACASVVNGLVIVVSICYKKLRSPLNYILVNLAVADLLVTLCGSSVSLSNNINGFFVFGRRMCELEGFMVSLTGIVGLWSLAILALERYVVVCKPLGDFQFQRRHAVSGCAFTWGWALLWSAPPLLGWSSYVPEGLRTSCGPN--WYTGGSNNNSYILSLFVTCFVLPLSLILFSYTNLLLTLRAAAAQQKEADTTQRAEREVTRMVIVMVMAFLLCWLPYSTFALVVATHKGIIIQPVLASLPSYFSKTATVYNPIIYVFMNKQFQSCLLEMLCCGYQPQRTGKASPGVTAAGLRNKVMP
>O7C1_HUMAN
-------------METGNQTHAQEFLLLGFSATSEIQFILFGLFLSMYLVTFTGNLLIILAICSDSHLHTPMYFFLSNLSFADLCFTSTTVPKMLLNILTQNKFITYAGCLSQIFFFTSFGCLDNLLLTVMAYDRFVAVCHPLYTVIMNPQLCGLLVLGSWCISVMGSLLETLTVLRL---SPHFFCDLLKLACSDTFINNVVIYFATGVLGVISFTGIFFSYYKIVFSI--------LRISSAGRKHKAFSTCGSHLSVVTLFYGTGFGVYLSSAATP---SSRTSLVASVMYTMVTPMLNPFIYSLRNTDMKRALGRLLSRATFFNGDITAGLS------------
>OPR._PIG
GSPLQGNLSLLSPNHSLLPPHLLLNASHGAFLPLGLKVTIVGLYLAVCVGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTAVLLTLPFQGTDVLLGF-WPFGNALCKAVIAIDYYNMFTSAFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASIVGVPVAIMGSAQVEE---IECLVE-IPAPQDYWGPVFAVCIFLFSFVIPVLIISVCYSLMVRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETAVAVLRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCAPTRRREMQVSDRVALACKTSETVPR
>OPSD_RAT
MNGTEGPNFYVPFSNITGVVRSPFEQPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIGLWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWLPYASVAMYIFTHQGSNFGPIFMTLPAFFAKTASIYNPIIYIMMNKQFRNCMLTSLCCGKNPLGDDEASATASKTETSQVAPA
>P2UR_HUMAN
----MAADLGPWNDTINGTWDGDELGYRCRFNEDFKYVLLPVSYGVVCVLGLCLNAVALYIFLCRLKTWNASTTYMFHLAVSDALYAASLPLLVYYYARGDHWPFSTVLCKLVRFLFYTNLYCSILFLTCISVHRCLGVLRPLSLRWGRARYARRVAGAVWVLVLACQAPVLYFVTTS-----RVTCHDT-SAPELFSRFVAYSSVMLGLLFAVPFAVILVCYVLMARRLLKP---YGTSGGLPRAKRKSVRTIAVVLAVFALCFLPFHVTRTLYYSFRSLNAINMAYKVTRPLASANSCLDPVLYFLAGQRLVRFARDAKPPTGPSPATPARRRLTDMQRIGDVLGS
>D1DR_OREMO
-MEIFTTTRGTSAGPEPAPGGHGGTDSPR-TSDLSLRALTGCVLCILIVSTLLGNALVCAAVIKFRHRSKVTNAFVISLAVSDLFVAVLVMPWRAVSEVAGVWLFG-AFCDTWVAFDIMCSTASILHLCIISMDRYWAISSPFYERRMTPRFGCVMIGVAWTLSVLISFIPVQLNWHARR--DPGDCNAS--------LNRTYAISSSLISFYIPVLIMVGTYTRIFRIGRTQPALESSLKTSFRRETKVLKTLSVIMGVFVFCWLPFFVLNCMVPFCRLECVSDTTFSVFVWFGWANSSLNPVIYAFNA-DFRKAFSTILGCSRYCRTSAVEAVDYHHDTTLQK---
>OPSP_PETMA
HHSLPSALPSATGGNGTVATMHNPFERPLEGIAPWNFTMLAALMGTITALSLGENFAVIVVTARFRQLRQPLNYVLVNLAAADLLVSAIGGSVSFFTNIKGYFFLGVHACVLEGFAVTYFGVVALWSLALLAFERYFVICRPLGNFRLQSKHAVLGLAVVWVFSLACTLPPVLGWSSYRPSMIGTTCEPN--WYSGELHDHTFILMFFSTCFIFPLAVIFFSYGKLIQKLKKASETQRGLESTRRAEQQVTRMVVVMILAFLVCWMPYATFSIVVTACPTI---PLLAAVPAFFSKTATVYNPVIYIFMNKQFRDCFVQVLPCKGLKKVSATQTAGASVNTQSPGNRH
>DADR_RAT
---------------MAPNTSTMDEAGLPAERDFSFRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPLG-PFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-EDDNCDTR--------LSRTYAISSSLISFYIPVAIMIVTYTSIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFISNCMVPFCGSECIDSITFDVFVWFGWANSSLNPIIYAFNA-DFQKAFSTLLGCYRLCPTTNNAIETAVVFSSH-----
>GALS_RAT
------------MNGSGSQGAENTSQEGGSGGWQPEAVLVPLFFALIFLVGTVGNALVLAVLLRGGQAVSTTNLFILNLGVADLCFILCCVPFQATIYTLDDWVFGSLLCKAVHFLIFLTMHASSFTLAAVSLDRYLAIRYPLSRELRTPRNALAAIGLIWGLALLFSGPYLSYYR---AN--LTVCHPA----WSAPRRRAMDLCTFVFSYLLPVLVLSLTYARTLRYLWRTDP-VTAGSGSQRAKRKVTRMIIIVAVLFCLCWMPHHALILCVWFGRFPRATYALRILSHLVSYANSCVNPIVYALVSKHFRKGFRKICAGLLRPAPRRASGRVHSGSMLEQESTD
>5H7_MOUSE
SSWMPHLLSGFPEVTASPAPTNVSGCGEQINYGRVEKVVIGSILTLITLLTIAGNCLVVISVCFVKNVRQPSNYLIVSLALADLSVAVAVMPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMAKMILSVWPLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYQIYKAARKSSRLERKNISSFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTCLWLGYANSLINPFIYSFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERSEFVLQ
>EBI2_HUMAN
------MDIQMANNFTPPSATPQGNDCDLYAHHSTARIVMPLHYSLVFIIGLVGNLLALVVIVQNRKKINSTTLYSTNLVISDILFTTALPTRIAYYAMGFDWRIGDALCRITALVFYINTYAGVNFMTCLSIDRFIAVVHPLYNKIKRIEHAKGVCIFVWILVFAQTLPLLINPMS--KQEERITCMEY--NFEETKSLPWILLGACFIGYVLPLIIILICYSQICCKLFRTAK-QNPLTEKSGVNKKALNTIILIIVVFVLCFTPYHVAIIQHMIKKLRHSFQISLHFTVCLMNFNCCMDPFIYFFACKGYKRKVMRMLKRQVSVSISSAVKSAMTETQMMIHSKS
>TSHR_CANFA
LQAFDSHYDYTVCGGNEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLIVLLTSHYKLTVPRFLMCNLAFADFCMGMYLLLIASVDLYTHSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHAYAIMVGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIILVLLLNIVAFIIVCSCYVKIYITVRNP------QYNPGDKDTKIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGICKRQAQAYRGSAGIQIQKVTRD
>NY4R_RAT
HLMASLSPAFLQGKNGTNPLDSLYNLSDGCQDSADLLAFIITTYSVETVLGVLGNLCLIFVTTRQKEKSNVTNLLIANLAFSDFLMCLICQPLTVTYTIMDYWIFGEVLCKMLTFIQCMSVTVSILSLVLVALERHQLIINPT-GWKPSISQAYLGIVVIWFISCFLSLPFLANSILNDED--KVVCFVS---WSSDHHRLIYTTFLLLFQYCVPLAFILVCYMRIYQRLQRQRA-THTCSSRVGQMKRINGMLMAMVTAFAVLWLPLHVFNTLEDWYQEACHGNLIFLMCHLFAMASTCVNPFIYGFLNINFKKDIKALVLTCRCRPPQGEP---TVHTDLSKGSMR
>CKR5_PAPHA
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>OLF2_RAT
------------MESGNSTRRFSSFFLLGFTENPQLHFLIFALFLSMYLVTVLGNLLIIMAIITQSHLHTPMYFFLANLSFVDICFTSTTIPKMLVNIYTQSKSITYEDCISQMCVFLVFAELGNFLLAVMAYDRYVA-CHPLYTVIVNHRLCILLLLLSWVISIFHAFIQSLIVLQL---TPHFFCELNQLTCSDNFPSHLIMNLVPVMLAAISFSGILYSYFKIVSSI--------HSISTVQGKYKAFSTCASHLSIVSLFYSTGLGVYVSSAVVQ---SSHSAASASVMYTVVTPMLNPFIYSLRNKDVKRALERLLEGNCKVHHWTG----------------
>HH2R_HUMAN
-------------------MAPNGTASSFCLDSTACKITITVVLAVLILITVAGNVVVCLAVGLNRRLRNLTNCFIVSLAITDLLLGLLVLPFSAIYQLSCKWSFGKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVMDPLYPVLVTPVRVAISLVLIWVISITLSFLSIHLGWN--RN-TTSKCKVQ--------VNEVYGLVDGLVTFYLPLLIMCITYYRIFKVARDQR--ISSWKAATIREHKATVTLAAVMGAFIICWFPYFTAFVYRGLRGDD-INEVLEAIVLWLGYANSALNPILYAALNRDFRTGYQQLFCCRLANRNSHKTSLRRTQSREPRQ---
>CKR5_HYLLE
----MDYQVSSPTYDIDYDTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILVLINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKHFCKCCSIFA-------SSVY
>B3AR_HUMAN
MAPWPHENSSLAPWPDLPTLAPNTANTSGLPGVPWEAALAGALLALAVLATVGGNLLVIVAIAWTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRCARTAVVLVWVVSAAVSFAPIMSQWWRVQ-R--RCCAFA--------SNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALCTLGLIMGTFTLCWLPFFLANVLRALGGPS-VPGPAFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCGRRLPPEPCAAAGVPAARSSPAQP
>NY6R_MOUSE
--MEVLTNQPTPNKTSGKSNNSAFFYFESCQPPFLAILLLLIAYTVILIMGIFGNLSLIIIIFKKQRAQNVTNILIANLSLSDILVCVMCIPFTVIYTLMDHWVFGNTMCKLTSYVQSVSVSVSIFSLVLIAIERYQLIVNPR-GWKPRVAHAYWGIILIWLISLTLSIPLFLSYHLTNTH--QVACVEI---WPSKLNQLLFSTSLFMLQYFVPLGFILICYLKIVLCLRKRTRQRKENKSRLNENKRVNVMLISIVVTFGACWLPLNIFNVIFDWYHEMCHHDLVFVVCHLIAMVSTCINPLFYGFLNKNFQKDLMMLIHHCWCGEPQESY---TMHTDESKGSLK
>OLF4_RAT
-------------MTGNNQTLILEFLLLGLPIPSEYHLLFYALFLAMYLTIILGNLLIIVLVRLDSHLHMPMYLFLSNLSFSDLCFSSVTMPKLLQNMQSQVPSISYTGCLTQLYFFMVFGDMESFLLVVMAYDRYVAICFPLYTTIMSTKFCASLVLLLWMLTMTHALLHTLLIARL---SLHFFCDISKLSCSDIYVNELMIYILGGLIIIIPFLLIVMSYVRIFFSI--------LKFPSIQDIYKVFSTCGSHLSVVTLFYGTIFGIYLCPSGNN---STVKEIAMAMMYTVVTPMLNPFIYSLRNRDMKRALIRVICTKKISL--------------------
>OPSD_ZEUFA
MNGTEGPDFYVPMVNTTGIVRSPYDYPQYYLVNPAAFSMLAAYMFFLILVGFPVNFLTLYVTMEHKKLRTPLNYILLNLAVANLFMVIGGFTTTMYTSMHGYFVLGRTGCNLEGFFATLGGEIALWSLVVLAVERWVVVCKPISNFRFGENHAVMGVSFTWLMACACSVPPLFGWSRYIPEGMQCSCGIDYYTRAPGYNNESFVIYMFVCHFSIPLTIIFFCYGRLLCAVKDAAAAQQESETTQRAEREVSRMVVIMVIGFLICWLPYASVAWFIFTHQGSEFGPVFMTIPAFFAKSSAIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSHVSPA
>OPSD_CYPCA
MNGTEGPMFYVPMSNATGVVKSPYDYPQYYLVAPWAYGCLAAYMFFLIITGFPINFLTLYVTIEHKKLRTPLNYILLNLAISDLFMVFGGFTTTMYTSLHGYFVFGRIGCNLEGFFATLGGEMGLWSLVVLAFERWMVVCKPVSNFRFGENHAIMGVVFTWFMACTCAVPPLVGWSRYIPEGMQCSCGVDYYTRAPGYNNESFVIYMFLVHFIIPLIVIFFCYGRLVCTVKDAAAQQQESETTQRAEREVTRMVVIMVIGFLICWIPYASVAWYIFTHQGSEFGPVFMTVPAFFAKSAAVYNPCIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>A2AR_LABOS
-----MDPLNATGMDAFTAIHLNASWSADSGYSLAAIASIAALVSFLILFTVVGNILVVIAVLTSRALKAPQNLFLVSLATADILVATLVMPFSLANELMGYWYFGKVWCGIYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPKRVKCIIVIVWLISAFISSPPLLSIDS--I-S--PQCMLN--------DDTWYILSSSMASFFAPCLIMILVYIRIYQVAKTRKRRIAEKKVSQAREKRFTFVLAVVMGVFVVCWFPFFFSYSLHAVCRDYKIPDTLFK-FFWIGYCNSSLNPAIYTIFNRDFRRAFQKILCKSWKKSF-------------------
>D3DR_HUMAN
--------MASLSQLSSHLNYTCGAENSTGASQARPHAYYALSYCALILAIVFGNGLVCMAVLKERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRICCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFN---D-P--TVCSI---------SNPDFVIYSSVVSFYLPFGVTVLVYARIYVVLKQRTSLPLQPRGVPLREKKATQMVAIVLGAFIVCWLPFFLTHVLNTHCQTCHVSPELYSATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC-------------------------
>D2DR_FUGRU
-----MDVFTQYAYNDSIFDNGTWSANETTKDETHPYNYYAMLLTLLIFVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWRFSKIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSRRRVTVMISVVWVLSFAISCPLLFGLN---T-R--SLCFI---------ANPAFVVYSSIVSFYVPFIVTLLVYVQIYVVLRKRQTSLSKRKISQQKEKKATQMLAIVLGVFIICWLPFFITHILNTHCTRCKVPAEMYNAFTWLGYVNSAVNPIIYTTFNVEFRKAFIKILHC-------------------------
>P2Y4_HUMAN
ASTESSLLRSLGLSPGPGS---SEVELDCWFDEDFKFILLPVSYAVVFVLGLGLNAPTLWLFIFRLRPWDATATYMFHLALSDTLYVLSLPTLIYYYAAHNHWPFGTEICKFVRFLFYWNLYCSVLFLTCISVHRYLGICHPLALRWGRPRLAGLLCLAVWLVVAGCLVPNLFFVTTS-----TVLCHDT-TRPEEFDHYVHFSSAVMGLLFGVPCLVTLVCYGLMARRLYQP----LPGSAQSSSRLRSLRTIAVVLTVFAVCFVPFHITRTIYYLARLLNIVNVVYKVTRPLASANSCLDPVLYLLTGDKYRRQLRQLCGGGKPQPRTAASSLASSCRWAATPQDS
>GALR_MOUSE
----MELAMVNLSEGNGSDPEPPAPESRPLFGIGVENFITLVVFGLIFAMGVLGNSLVITVLARSKPPRSTTNLFILNLSIADLAYLLFCIPFQATVYALPTWVLGAFICKFIHYFFTVSMLVSIFTLAAMSVDRYVAIVHSRSSSLRVSRNALLGVGFIWALSIAMASPVAYHQRLF-SN--QTFCWEQ---WPNKLHKKAYVVCTFVFGYLLPLLLICFCYAKVLNHLHKKLK--NMSKKSEASKKKTAQTVLVVVVVFGISWLPHHVVHLWAEFGAFPPASFFFRITAHCLAYSNSSVNPIIYAFLSENFRKAYKQVFKCHVCDESPRSETKEPPSTNCTHV---
>OAR1_LOCMI
SSAAEEPQDALVGGDACGGRRPPSVLGVRLAVPEWEVAVTAVSLSLIILITIVGNVLVVLSVFTYKPLRIVQNFFIVSLAVADLTVAVLVMPFNVAYSLIQRWVFGIVVCKMWLTCDVLCCTASILNLCAIALDRYWAITDPIYAQKRTLRRVLAMIAGVWLLSGVISSPPLIGWNDW-N-D--TPCQLT--------EEQGYVIYSSLGSFFIPLFIMTIVYVEIFIATKRRPVYEEKQRISLSKERRAARTLGIIMGVFVVCWLPFFLMYVIVPFCNPSKPSPKLVNFITWLGYINSALNPIIYTIFNLDFRRAFKKLLHFKT-----------------------
>ML1A_SHEEP
GSPGGTPKGNGSSALLNVSQAAPGAGDGVRPRPSWLAATLASILIFTIVVDIVGNLLVVLSVYRNKKLRNAGNVFVVSLAVADLLVAVYPYPLALASIVNNGWSLSSLHCQLSGFLMGLSVIGSVFSITGIAINRYCCICHSLYGKLYSGTNSLCYVFLIWTLTLVAIVPNLCVGT-LQYDP-IYSCTFT------QSVSSAYTIAVVVFHFIVPMLVVVFCYLRIWALVLQVWK-PDNKPKLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLVVASDPASRIPEWLFVASYYMAYFNSCLNAIIYGLLNQNFRQEYRKIIVSLCTTKMFFVDSSNRKPSPLIANHNL
>OPSD_SOLSO
MNGTEGPYFYIPMLNTTGIVRSPYEYPQYYLVNPAAYAALCAYMFLLILLGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIGLWSLVVLAVERWMVVCKPISNFRFTENHAIMGLGFTWFAASACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFVCHFLIPLIVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIAFLICWCPYAGVAWYIFSNQGSEFGPLFMTIPAFFAKSSSIYNPLIYIFMNKQFRHCMITTLCCGKNPFEEEEGSTTSASSSSVSPAA-
>ML1A_MOUSE
-------MKGNVSELLNATQQAPGGGEGGRPRPSWLASTLAFILIFTIVVDILGNLLVILSVYRNKKLRNSGNIFVVSLAVADLVVAVYPYPLVLTSILNNGWNLGYLHCQVSAFLMGLSVIGSIFNITGIAMNRYCYICHSLYDKIYSNKNSLCYVFLIWMLTLIAIMPNLQTGT-LQYDP-IYSCTFT------QSVSSAYTIAVVVFHFIVPMIIVIFCYLRIWVLVLQVRR-PDNKPKLKPQDFRNFVTMFVVFVLFAICWAPLNLIGLIVASDPATRIPEWLFVASYYLAYFNSCLNAIIYGLLNQNFRKEYKKIIVSLCTAKMFFVESSNCKPSPLIPNNNL
>OPSD_POERE
MNGTEGPYFYVPMVNTTGIVRSPYEYPQYYLVSPAAYACLGAYMFFLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNLEGYFATLGGEIGLWSLVVLAVERWLVVCKPISNFRFSENHAIMGLVFTWIMANSCAAPPLLGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFICHFCIPLIVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIGFLVCWIPYASVAWYIFTHQGSEFGPLFMTVPAFFAKSASIYNPLIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>PE22_HUMAN
----------------MGNASNDSQSEDCETRQWLPPGESPAISSVMFSAGVLGNLIALALLARRWRSLSLFHVLVTELVFTDLLGTCLISPVVLASYARNQLAPESRACTYFAFAMTFFSLATMLMLFAMALERYLSIGHPYYQRRVSASGGLAVLPVIYAVSLLFCSLPLLDYGQYVQYCPGTWCFIR--------HGRTAYLQLYATLLLLLIVSVLACNFSVILNLIRMGGPRRGERVSMAEETDHLILLAIMTITFAVCSLPFTIFAYMNETSS---RKEKWDLQALRFLSINSIIDPWVFAILRPPVLRLMRSVLCCRISLRTQDATQTSSKQADL------
>CKR3_HUMAN
MTTSLDTVETFGTTSYYDDVGLLC---EKADTRALMAQFVPPLYSLVFTVGLLGNVVVVMILIKYRRLRIMTNIYLLNLAISDLLFLVTLPFWIHYVRGHN-WVFGHGMCKLLSGFYHTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIVTWGLAVLAALPEFIFYETEE-LF-ETLCSALYPEDTVYSWRHFHTLRMTIFCLVLPLLVMAICYTGIIKTL---------LRCPSKKKYKAIRLIFVIMAVFFIFWTPYNVAILLSSYQSILKHLDLVMLVTEVIAYSHCCMNPVIYAFVGERFRKYLRHFFHRHLLMHLGRYIPFLT-------SSVS
>GPRA_HUMAN
TPANQSAEASAGNGSVAGADAPAVTPFQSLQLVHQLKGLIVLLYSVVVVVGLVGNCLLVLVIARVPRLHNVTNFLIGNLALSDVLMCTACVPLTLAYAFEPRWVFGGGLCHLVFFLQPVTVYVSVFTLTTIAVDRYVVLVHPL--RRASRCASAYAVLAIWALSAVLALPPAVHTYHVEHD--VRLCEEF--WGSQERQRQLYAWGLLLVTYLLPLLVILLSYVRVSVKLRNRPGC-SQADWDRARRRRTFCLLVVVVVVFAVCWLPLHVFNLLRDLDPHAYAFGLVQLLCHWLAMSSACYNPFIYAWLHDSFREELRKLLVAWPRKIAPHGQNMT------------
>OPSD_LIZAU
MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILIGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAVERWMVVCKPISNFRFGEDHAIMGLAFTWVMAAACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLVCWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRQCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>PF2R_BOVIN
--MSTNS---------SIQPVSPESELLSNTTCQLEEDLSISFSIIFMTVGILSNSLAIAILMKAYQYKSSFLLLASALVITDFFGHLINGTIAVFVYASDKFDKSNILCSIFGICMVFSGLCPLFLGSLMAIERCIGVTKPIHSTKITTKHVKMMLSGVCFFAVFVALLPILGHRDYKIQASRTWCFYK--TDEIKDWEDRFYLLLFAFLGLLALGISFVCNAITGISLLKVKFRSQQHRQGRSHHFEMVIQLLGIMCVSCICWSPFLVTMASIGMNIQDKDSCERTLFTLRMATWNQILDPWVYILLRKAVLRNLYVCTRRCCGVHVISLHVWELKVAAISDLPVT
>GALR_RAT
----MELAPVNLSEGNGSDPEPP-AEPRPLFGIGVENFITLVVFGLIFAMGVLGNSLVITVLARSKPPRSTTNLFILNLSIADLAYLLFCIPFQATVYALPTWVLGAFICKFIHYFFTVSMLVSIFTLAAMSVDRYVAIVHSRSSSLRVSRNALLGVGFIWALSIAMASPVAYYQRLF-SN--QTFCWEH---WPNQLHKKAYVVCTFVFGYLLPLLLICFCYAKVLNHLHKKLK--NMSKKSEASKKKTAQTVLVVVVVFGISWLPHHVIHLWAEFGAFPPASFFFRITAHCLAYSNSSVNPIIYAFLSENFRKAYKQVFKCRVCNESPHGDAKEPPSTNCTHV---
>OPSU_BRARE
MNGTEGPAFYVPMSNATGVVRSPYEYPQYYLVAPWAYGFVAAYMFFLIITGFPVNFLTLYVTIEHKKLRTPLNYILLNLAIADLFMVFGGFTTTMYTSLHGYFVFGRLGCNLEGFFATLGGEMGLKSLVVLAIERWMVVCKPVSNFRFGENHAIMGVAFTWVMACSCAVPPLVGWSRYIPEGMQCSCGVDYYTRTPGVNNESFVIYMFIVHFFIPLIVIFFCYGRLVCTVKEAARQQQESETTQRAEREVTRMVIIMVIAFLICWLPYAGVAWYIFTHQGSEFGPVFMTLPAFFAKTSAVYNPCIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>GPR6_RAT
GPPAASAALGGGGGPNGSLELSSQLPAGPSGLLLSAVNPWDVLLCVSGTVIAGENALVVALIASTPALRTPMFVLVGSLATADLLAG-CGLILHFVFQ-Y--VVPSETVSLLMVGFLVASFAASVSSLLAITVDRYLSLYNALYYSRRTLLGVHLLLAATWTVSLGLGLLPVLGWNCL-A--DRASCSVV-------RPLTRSHVALLSTSFFVVFGIMLHLYVRICQVVWRHIALHCLAPPHLAATRKGVGTLAVVLGTFGASWLPFAIYCVVGSQ----EDPAIYTYATLLPATYNSMINPIIYAFRNQEIQRALWLLFCGCFQSKVPFRSRSP------------
>ACM3_CHICK
DSPETTESFPFSTVETTNSSLNATIKDPLGGHAVWQVVLIAFLTGIIALVTIIGNILVIVSFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMGHWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWIISFVLWAPAILFWQYFV-VP-LDECFIQ------FLSEPIITFGTAIAAFYLPVTIMSILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILFAFIITWTPYNIMVLVNTFCDC--VPKTVWNLGYWLCYINSTVNPVCYALCNKMFRNTFKMLLLCQCDKRKRRKQQYQHKRIPREAS---
>NY5R_CANFA
-------------NTAATRNSDFPVWDDYKSSVDDLQYFLIGLYTFVSLLGFMGNLLILMALMRKRNQKTMVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKVMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPLPVFHSLVESS--RYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGVHDNRSIMRIKKRSRSVFYRLTILILVFAVSWMPLHLFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLISLIQCLHMS---------------------
>MC5R_SHEEP
-MNSSFHLHFLDLGLNATEGNLSGLSVRNASSPCEDMGIAVEVFLALGLISLLENILVIGAIVRNRNLHIPMYFFVGSLAVADMLVSLSNFWETITIYLLTNDASVRHLDNVFDSMICISVVASMCSLLAIAVDRYVTIFCRLYQRIMTGRRSGAIIAGIWAFCTSCGTVFIVYYY----------------------EESTYVVVCLIAMFLTMLLLMASLYTHMFLLARTH-RIPGHSSVRQRTGVKGAITLAMLLGVFIICWAPFFLHLILMISCPQNSCFMSHFNMYLILIMCNSVIDPLIYAFRSQEMRKTFKEIVCFQGFRTPCRFPSTY------------
>V2R_RAT
MLLVSTVSAVPGLFSPPSSPSNSSQEELLDDRDPLLVRAELALLSTIFVAVALSNGLVLGALIRRGRRWAPMHVFISHLCLADLAVALFQVLPQLAWDATDRFHGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAYRHGGGARWNRPVLVAWAFSLLLSLPQLFIFAQRDSG--VFDCWAR---FAEPWGLRAYVTWIALMVFVAPALGIAACQVLIFREIHASGRRPSEGAHVSAAMAKTVRMTLVIVIVYVLCWAPFFLVQLWAAWDPEA-LERPPFVLLMLLASLNSCTNPWIYASFSSSVSSELRSLLCCAQRHTTHSLGPQDSSLMKDTPS---
>EDG2_MOUSE
QPQFTAMNEQQCFYNESIAFFYNRSGKYLATEWNTVSKLVMGLGITVCVFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIDHCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIVCWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRNENPNGPTEGSNHTILAGVHSND
>EDG1_RAT
VKALRSQVSDYGNYDIIVRHYNYTGKLNIGVEKDHGIKLTSVVFILICCLIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAG-VAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNSSRSFLLISACWVISLILGGLPIMGWNCI-S--SLSSCSTV------LPLYHKHYILFCTTVFTLLLLSIVILYCRIYSLVRTRLTFISKASRSSEKSLALLKTVIIVLSVFIACWAPLFILLLLDVGCKAKCDILYKAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIISCCKCPNGDSAGKFKEFSRSKSDNSSH
>CKR8_HUMAN
DYTLDLSVTTVTDYYYPDIFSSPC---DAELIQTNGKLLLAVFYCLLFVFSLLGNSLVILVLVVCKKLRSITDVYLLNLALSDLLFVFSFPFQTYYLLDQ--WVFGTVMCKVVSGFYYIGFYSSMFFITLMSVDRYLAVVHAVALKVRTIRMGTTLCLAVWLTAIMATIPLLVFYQVAS-ED-VLQCYSF-YNQQTLKWKIFTNFKMNILGLLIPFTIFMFCYIKILHQL---------KRCQNHNKTKAIRLVLIVVIASLLFWVPFNVVLFLTSLHSMHQQLTYATHVTEIISFTHCCVNPVIYAFVGEKFKKHLSEIFQKS-CSQIFNYLGRQKS------SSCQ
>B1AR_MACMU
LPDGVATAARLLVPASPPASLLPPASEGPEPLSQQWTAGMGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARGLVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHREL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRNAFQRLLCCARRAARRRHAAHGCLARPGPPPSPG
>DBDR_.ENLA
FQHLDSDQVASWQSPEMLMNKSVSRESQRRKELVAGQIVTGSLLLLLIFWTLFGNILVCTAVMRFRHRSRVTNIFIVSLAVSDLLVALLVMPWKAVAEVAGHWPFG-AFCDIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTQRVALLMISTAWALSVLISFIPVQLSWHKSDH-STGNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQIQSCSQTSLRTSIKKETKVLKTLSIIMGVFVCCWLPFFILNCMVPFCDRSCVSETTFDIFVWFGWANSSLNPIIYAFNA-DFRKVFSSLLGCGHWCSTTPVETVNYNQDTLFHK---
>O00155
PTEPWSPSPGSAPWDYSGLDGLEELELCPAGDLPYGYVYIPALYLAAFAVGLLGNAFVVWLLAGRRGPRRLVDTFVLHLAAADLGFVLTLPLWAAAAARRP-WPFGDGLCKLSTFALAGTRSAGALLLAGMSVDRYLAVVKLLARPLRTPRCAVASCCGVWAVALLAGLPSLVYRGLQPLPG-DSQCGE-----EPSHAFQGLSLLLLLLTFVLPLVVTLFCYCRISRRL-------RRPPHVGRARRNSLRIIFAIESTFVGSWLPFSALRAVFHLARLGLALRWGLTIATCLAFVNSCANPLIYLLLDRSFRARALDGACGRTGRLARRISSASSVFRCRAQAANT
>NY4R_HUMAN
HLLALLLPKSPQGENRSKPLGTPYNFSEHCQDSVDVMVFIVTSYSIETVVGVLGNLCLMCVTVRQKEKANVTNLLIANLAFSDFLMCLLCQPLTAVYTIMDYWIFGETLCKMSAFIQCMSVTVSILSLVLVALERHQLIINPT-GWKPSISQAYLGIVLIWVIACVLSLPFLANSILENAD--KVVCTES---WPLAHHRTIYTTFLLLFQYCLPLGFILVCYARIYRRLQRQGRVKGTYSLRAGHMKQVNVVLVVMVVAFAVLWLPLHVFNSLEDWHHEACHGNLIFLVCHLLAMASTCVNPFIYGFLNTNFKKEIKALVLTCQQSAPLEES---TVHTEVSKGSLR
>SSR2_MOUSE
QLNGSQVWVSSPFDLNGSLGPSNGSNQTEPYYDMTSNAVLTFIYFVVCVVGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWCVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVILTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTEDGERSDLNETTETQRTLL
>EDG2_SHEEP
QPQFTAMNEPQCFYNESIAFFYNRSGKYLATEWNTVSKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTTVTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIENCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRSENTSGPTEGSNHTILAGVHSND
>5H6_MOUSE
-----------MVPEPGPVNSSTPAWGPGPPPAPGGSGWVAAALCVVIVLTAAANSLLIALICTQPALRNTSNFFLVSLFTSDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLYKLRMTAPRALALILGAWSLAALASFLPLLLGWH---E-APGQCRLL--------ASLPYVLVASGVTFFLPSGAICFTYCRILLAARKQMESRRLTTKHSRKALKASLTLGILLSMFFVTWLPFFVASIAQAVCDC--ISPGLFDVLTWLGYCNSTMNPIIYPLFMRDFKRALGRFVPCVHCPPEHRASPASSGARPGLSLQQV
>ACTR_CAVPO
-------------MKHIIHASGNVNGTARNNSDCPHVALPEEIFFIISITGVLENLIIILAVIKNKNLQFPMYFFICSLAISDMLGSLYKILESILIMFRNMGSFETTTDDIIDTMFILSLLGSIFSLLAIAVDRYITIFHALYHSIVTMHRTIAVLSIIWTFCIGSGITMVLFFS----------------------HHHVPTVLTFTSLFPLMLVFILCLYVHMFLMARSH---ARNISTLPRGNMRGAITLTILLGVFIFCWAPFILHILLVTFCPNNTCYISLFHVNGMLIMCNAVIDPFIYAFRSPELRSAFRRMISYSKCL---------------------
>OPSD_ALLMI
MNGTEGPDFYIPFSNKTGVVRSPFEYPQYYLAEPWKYSALAAYMFMLIILGFPINFLTLYVTVQHKKLRSPLNYILLNLAVADLFMVLGGFTTTLYTSMNGYFVFGVTGCYFEGFFATLGGEVALWCLVVLAIERYIVVCKPMSNFRFGENHAIMGVVFTWIMALTCAAPPLVGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFAIPLAVIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVSFLICWVPYASVAFYIFSNQGSDFGPVFMTIPAFFAKSSAIYNPVIYIVMNKQFRNCMITTLCCGKNPLGDDETATGTSSVSTSQVSPA
>OPSD_RANTE
MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWKYSILAAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTITLYTSLHGYFVFGQSGCYFEGFFATLGGEIALWSLVALAIERYIVVCKPMSNFRFGENHAMMGVAFTWIMALACAVPPLFGWSRYIPEGMQCSCGVDYYTLKPEINNESFVIYMFVVHFLIPLIIITFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWVPYAYVAFYIFCNQGSEFGPIFMTVPAFFAKSSAIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDDDASSAATSVSTSQVSPA
>A2AC_CAVPO
GPNASGAGEG----GGGVNASGAVWGPPPSQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAIISFPPLVSFYR-------PRCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQLPTPLFKFFFWIGYCNSSLNPVIYTIFNQDFRRSFKHILFRRRRRGFRQ-----------------
>PE23_RAT
---------MAGVWAPEHSVEAHSNQS---SAADGCGSVSVAFPITMMVTGFVGNALAMLLVVRSYRRKKSFLLCIGWLALTDLVGQLLTSPVVILVYLSQRLDPSGRLCTFFGLTMTVFGLSSLLVASAMAVERALAIRAPHYASHMKTRAT-PVLLGVWLSVLAFALLPVLGVG----RYSGTWCFISNETDSAREPGSVAFASAFACLGLLALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQMSEKECNSFLIAVRLASLNQILDPWVYLLLRKILLRKFCQIRDHTN-YASSSTSLPCWSDQLER-----
>OPSF_ANGAN
MNGTEGPNFYVPMSNVTGVVRSPFEYPQYYLAEPWAYSALAAYMFFLIIAGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGPTGCNIEGFFATLGGEIALWCLVVLAVERWMVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLFGWSRYIPEGMQCSCGMDHYAPNPETYNESFVIYMFICHFTIPLTVISFCYGRLVCTVKEATAQQQESETTQRAEREVTRMVIIMVISFLVCWVPYASVAWYIFTHQGSSFGPIFMTIPAFFAKSSSLYNPLIYICMNKQSRNCMITTLCCGKNPFEEEEGASTASSVSS--VSPA
>OPSR_ORYLA
NE--DTTRGSAFTYTNSNHTRDPFEGPNYHIAPRWVYNLATLWMFFVVVLSVFTNGLVLVATAKFKKLRHPLNWILSNLAIADLGETVFASTISVCNQFFGYFILGHPMCVFEGYVVSTCGIAALWSLTIISWERWVVVCKPFGNVKFDAKWAIGGIVFSWVWSAVWCAPPVFGWSRYWPHGLKTSCGPDVFSGSDDPGVQSYMIVLMITCCIIPLAIIILCYLAVWLAIRAVAMQQKESESTQKAEREVSRMVVVMIVAYCVCWGPYTFFACFAAANPGYAFHPLAAAMPAYFAKSATIYNPVIYVFMNRQFRTCIMQLFGKQVDDG-----SEVSS------VAPA
>CKR5_CERAE
----MDYQVSSPTYDIDNYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPRIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>D2DR_CERAE
---MDPLNLSWYDDDLERQNWSRPFNGSDGKADRPHYNYYATLLTLLIAVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSKIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMIAIVWVLSFTISCPLLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFLKILHC-------------------------
>AA3R_SHEEP
-------------------------MPVNSTAVSWTSVTYITVEILIGLCAIVGNVLVIWVVKLNPSLQTTTFYFIVSLALADIAVGVLVMPLAIVISLG--VTIHFYSCLFMTCLMLIFTHASIMSLLAIAVDRYLRVKLTVYRRVTTQRRIWLALGLCWLVSFLVGLTPMFGWN-MKA-D-FLPCRFR-----SVMRMDYMVYFSFFLWILVPLVVMCAIYFDIFYIIRNR--SSRETGAFYGREFKTAKSLLLVLFLFALCWLPLSIINCILYFDGQ--VPQTVLYLGILLSHANSMMNPIVYAYKIKKFKETYLLILKACVMCQPSKSMDPS------------
>IL8B_RAT
SGDIDSYN-YSSDPPFTLSDAAPC----PSANLDINRYAVVVIYVLVTLLSLVGNSLVMLVILYNRSTCSVTDVYLLNLAIADLFFALTLPVWAASKVNG--WIFGSFLCKVFSFLQEITFYSSVLLLACISMDRYLAIVHATSTLIQKRHLVKFVCITMWFLSLVLSLPIFILRTTVK-PS-TVVCYEN-IGNNTSKWRVVLRILPQTYGFLLPLLIMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLVFLLCWLPYNIVLFTDTLMRTKNEINKALEATEILGFLHSCLNPIIYAFIGQKFRHGLLKIMANYGLVSKEFLAKEG------------
>NY5R_MOUSE
ASPAWEDYRGTENNTSAARNTAFPVWEDYRGSVDDLQYFLIGLYTFVSLLGFMGNLLILMAVMKKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKAMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPFPVFHSLVESS--KYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGAQEKRSLTRIKKRSRSVFYRLTILILVFAVSWMPLHVFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLRALIHCLHMS---------------------
>OPS3_DROPS
ARLSAESRLLGWNVPPDELRHIPEHWLIYPEPPESMNYLLGTLYIFFTVISMIGNGLVMWVFSAAKSLRTPSNILVINLAFCDFMMMIKTPIFIYNSFHQG-YALGHLGCQIFGVIGSYTGIAAGATNAFIAYDRYNVITRPM-EGKMTHGKAIAMIIFIYLYATPWVVACYTESWGRFPEGYLTSCTFD--YLTDNFDTRLFVACIFFFSFVCPTTMITYYYSQIVGHVFSHNVDSNVDKSKEAAEIRIAKAAITICFLFFASWTPYGVMSLIGAFGDKTLLTPGATMIPACTCKMVACIDPFVYAISHPRYRMELQKRCPWLAISEKAPESRAAEQQQTTAA----
>B1AR_MOUSE
LPDGAATAARLLVLASPPASLLPPASEGSAPLSQQWTAGMGLLVALIVLLIVVGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACRRRAAHGCLARAGPPPSPG
>O6A1_HUMAN
------------MEWRNHSGRVSEFVLLGFPAPVPIQVILFALLLLAYVLVLTENTLIIMAIRNHSTLHKPMYFFLANMSFLEIWYVTVTIPKMLAGFVGSKQLISFEGCMTQLYFFLGLGCTECVLLAVMANDRYMAICYLLNPVIVSGRLCVQMAAGSWAGGFGISMVKVFLISGL---SNHFFCDVSNLSCTDMSTAELTDFILAIFILLGPLSVTGASYVAITGAV--------MHIPSAAGRYKAFSTCASHFNVVIIFYAASIFIYARPKALS---AFDTNKLVSVLYAVIVPLLNPIIYCLRNQEVKRALCCILHLYQHQDPDPKKGSR------------
>GPR8_HUMAN
PLDSRGSFSLPTMGANVSQDNGTGHNATFSEPLPFLYVLLPAVYSGICAVGLTGNTAVILVILRAPKMKTVTNVFILNLAVADGLFTLVLPVNIAEHLLQY-WPFGELLCKLVLAVDHYNIFSSIYFLAVMSVDRYLVVLATVHMPWRTYRGAKVASLCVWLGVTVLVLPFFSFAGVYS-LQ-VPSCGLS-FPWPERVWFKASRVYTLVLGFVLPVCTICVLYTDLLRRLRAVR--RSGAKALGKARRKVTVLVLVVLAVCLLCWTPFHLASVVALTTDLPPLVISMSYVITSLTYANSCLNPFLYAFLDDNFRKNFRSILRC-------------------------
>CCKR_.ENLA
SSTNGTHNLTTANWPPWNLNCTPILDRKKPSPSDLNLWVRIVMYSVIFLLSVFGNTLIIIVLVMNKRLRTITNSFLLSLALSDLMVAVLCMPFTLIPNLMENFIFGEVICRAAAYFMGLSVSVSTFNLVAISIERYSAICNPLSRVWQTRSHAYRVIAATWVLSSIIMIPYLVYK----DRRVGHQCRLV---WPSKQVQQAWYVLLLTILFFIPGVVMIVAYGLISRELYRGKMDINNSEAKLMAKKRVIRMLIVIVAMFFICWMPIFVANTWKAFDELSTLTGAPISFIHLLSYTSACVNPLIYCFMNKRFRKAFLGTFSSCIKP----CRNFRATGASLSKFSYT
>ETBR_RAT
SSAPAEVTKGGRVAGVPPRS-FPPPCQRKIEINKTFKYINTIVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINAYKLLAGDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDVITRV--LRVCMLNQKTAFMQFYKTAKDWWLFSFYFCLPLAITAIFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQTFE-EKQSLEFKANDHGYDNFR
>EDG2_HUMAN
QPQFTAMNEPQCFYNESIAFFYNRSGKHLATEWNTVSKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTASVANLLAIAIERHITVFRMQLHTRMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIENCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRSENPTGPTESSNHTILAGVHSND
>5H1D_RAT
SLPNQSLEGLPQEASNRSLNAT---GAWDPEVLQALRISLVVVLSIITLATVLSNAFVLTTILLTKKLHTPANYLIGSLATTDLLVSILVMPISIAYTTTRTWNFGQILCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAAAMIAAVWAISICISIPPLFWR----H-E--SDCLVN-------TSQISYTIYSTCGAFYIPSILLIILYGRIYVAARSRKLALERKRISAARERKATKTLGIILGAFIICWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPVIYTVFNEDFRQAFQRVVHFRKAS---------------------
>ET1R_BOVIN
ELSFVVTTHQPTNLALPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPRT--HRTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPWKNHEQNNHNTE
>GRE2_BALAM
----MSGGEASITGRTAPELN-ASAAPLDDERELGETVAATALLLAIILVTIVGNSLVIISVFTYRPLRSVQNFFVVSLAVADLTVALFVLPLNVAYRLLNQWLLGSYLCQMWLTCDILCCTSSILNLCVIALDRYWAITDPIYAQKRTIRRVNTMIAAVWALSLVISVPPLLGWNDW-T-E--TPCTLT--------QR-LFVVYSSSGSFFIPLIIMSVVYAKIFFATKRRSVHEEKQRISLSKERKAARVLGVIMGVFVVCWLPFFLMYAIVPFCTNCPPSQRVVDFVTWLGYVNSSLNPIIYTIYNKDFRTAFSRLLRCDRRMSA-------------------
>LSHR_MOUSE
ENELSGWDYDYDFCSPKTLQCTPEPDAFNPCEDIMGYAFLRVLIWLINILAIFGNLTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCSAAGFFTVFASELSVYTLTVITLERWHTITYAVLDQKLRLRHAIPIMLGGWIFSTLMATLPLVGVSSY----KVSICLPM-----VESTLSQVYILSILLLNAVAFVVICACYVRIYFAVQNP------ELTAPNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKVPLITVTNSKVLLVLFYPVNSCANPFLYAVFTKAFQRDFFLLLSRFGCCKHRAELYRRFNSKNGFPRSSK
>MSHR_MOUSE
STQEPQKSLLGSLNSN--ATSHLGLATNQSEPWCLYVSIPDGLFLSLGLVSLVENVLVVIAITKNRNLHSPMYYFICCLALSDLMVSVSIVLETTIILLLEVVALVQQLDNLIDVLICGSMVSSLCFLGIIAIDRYISIFYALYHSIVTLPRARRAVVGIWMVSIVSSTLFITYYY----------------------KKHTAVLLCLVTFFLAMLALMAILYAHMFTRACQHIAQKRRRSIRQGFCLKGAATLTILLGIFFLCWGPFFLHLLLIVLCPQHSCIFKNFNLFLLLIVLSSTVDPLIYAFRSQELRMTLKEVLLCSW-----------------------
>5H5A_RAT
LPINLTSFSLSTPSTLEPNRSDTEALRTSQSFLSAFRVLVLTLLGFLAAATFTWNLLVLATILRVRTFHRVPHNLVASMAISDVLVAVLVMPLSLVHELSGRWQLGRRLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHLYTLRARKRVSNVMILLTWALSAVISLAPLLFGWGE-S-E-SEECQVS--------REPSYTVFSTVGAFYLPLCVVLFVYWKIYKAAKFRATVTEGDTWREQKEQRAALMVGILIGVFVLCWFPFFVTELISPLCSW-DIPALWKSIFLWLGYSNSFFNPLIYTAFNRSYSSAFKVFFSKQQ-----------------------
>BRS3_HUMAN
QTLISITNDTESSSSVVSNDNTNKGWSGDNSPGIEALCAIYITYAVIISVGILGNAILIKVFFKTKSMQTVPNIFITSLAFGDLLLLLTCVPVDATHYLAEGWLFGRIGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPSNAILKTCVKAGCVWIVSMIFALPEAIFSNVYTMT--FESCTSY---VSKKLLQEIHSLLCFLVFYIIPLSIISVYYSLIARTLYKSIPTQSHARKQIESRKRIARTVLVLVALFALCWLPNHLLYLYHSFTSQTAMHFIFTIFSRVLAFSNSCVNPFALYWLSKSFQKHFKAQLFCCKAERPEPPVADTMGTVPGTGSIQM
>OPSD_CATBO
GGGF-GNQTVVDKVPPEMLHLVDAHWYQFPPMNPLWHAILGFVIGILGMISVIGNGMVIYIFTTTKSLRTPSNLLVINLAISDFLMMLSMSPAMVINCYYETWVLGPLVCELYGLTGSLFGCGSIWTMTMIAFDRYNVIVKGLSAKPMTINGALLRILGIWFFSLGWTIAPMFGWNRYVPEGNMTACGTD--YLTKDLLSRSYILVYSFFCYFLPLFLIIYSYFFIIQAVAAHMNVRSAENQSTSAECKLAKVALMTISLWFMAWTPYLVINYAGIFETVK-INPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFQRFPSLACSSGPAG-ADTTEGTEKPAA---
>GASR_PRANA
GSSLCHPGVSLLNSSSAGNLSCEPPRIRGTGTRELELAIRITLYAVIFLMSIGGNMLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLNLVAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTVVQP---V-LQCMHR---WPSARVRQTWSVLLLMLLFFIPGVVMAVAYGLISRELYLGTPGASANQAKLLAKKRVVRMLLVIVLLFFLCWLPIYSANTWCAFDGPGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLDTCARCCPRPPRARPRPLPSIASLSRLSYT
>P2Y3_CHICK
----------------MSMANFTGGRNSCTFHEEFKQVLLPLVYSVVFLLGLPLNAVVIGQIWLARKALTRTTIYMLNLAMADLLYVCSLPLLIYNYTQKDYWPFGDFTCKFVRFQFYTNLHGSILFLTCISVQRYMGICHPLWHKKKGKKLTWLVCAAVWFIVIAQCLPTFVFASTG-----RTVCYDL-SPPDRSTSYFPYGITLTITGFLLPFAAILACYCSMARILCQK---ELIGLAVHKKKDKAVRMIIIVVIVFSISFFPFHLTKTIYLIVRSSQAFAIAYKCTRPFASMNSVLDPILFYFTQRKFRESTRYLLDKMSSKWRQDHCISY------------
>AG2R_.ENLA
-----MSNASTVETSDVERIAVNC---SKSGMHNYIFIAIPIIYSTIFVVGVFGNSMVVIVIYSYMKMKTVASIFLMNLALSDLCFVITLPLWAAYTAMHYHWPFGNFLCKVASTAITLNLYTTVFLLTCLSIDRYSAIVHPMSRIWRTAMVARLTCVGIWLVAFLASMPSIIYRQIYL-TN--TVCAIV-YDSGHIYFMVGMSLAKNIVGFLIPFLIILTSYTLIGKTLKEV------YRAQRARNDDIFKMIVAVVLLFFFCWIPYQVFTFLDVLIQMDDIVDTGMPITICIAYFNSCLNPFLYGFFGKNFRKHFLQLIKYIPPKMRTHASVNTSLSD-------T
>FSHR_PIG
FDTMYSEFDYDLCNEVVDVICSPEPDTFNPCEDIMGHDILRVLIWFISILAITGNIIVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKTWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLQCKVQLRHAASIMLVGWIFAFTVALFPIFGISSY----KVSICLPM-----IDSPLSQLYVVSLLVLNVLAFVVICGCYTHIYLTVRNP------NIMSSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDVFILLSKFGCYEMQAQTYRTNIHPRNGHCPPA
>MSHR_DAMDA
PVLGSQRRLLGSLNCTPPATFPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>ETBR_HORSE
LSAPPQMPKAGRTAGAQRRTLPPPPCERTIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINVYKLLAEDWPFGVEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAVGFDMITRI--LRICLLHQKTAFMQFYKNAKDWWLFSFYFCLPLAITAFFYTLETCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKHTLYDQSFLLVLEYIGINMASLNSCINPIALYLVSKRFKNCFKWCLCCWCQSFE-EKQSLEFKANDHGYDNFR
>MSHR_CAPHI
PALGSPRRLLGSLNCTPPATLPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICSSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSVLSITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>AG2S_RAT
------MTLNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIYRNVYF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFVFPFLIILTSYTLIWKALKKA----YKIQKNTPRNDDIFRIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPTAKSHAGLSTRPSD-------N
>D5DR_FUGRU
NFYNETEPTEPRGGVDPLRVVTAAEDVPAPVGGVSVRALTGCVLCALIVSTLLGNTLVCAAVIKFRHRSKVTNAFVVSLAVSDLFVAVLVMPWRAVSEVAGVWLFG-RFCDTWVAFDIMCSTASILNLCVISMDRYWAISNPFYERRMTRRFAFLMIAVAWTLSVLISFIPVQLNWHRASS-EQGDCNAS--------LNRTYAISSSLISFYIPVLIMVGTYTRIFRIAQTQRASESALKTSFKRETKVLKTLSVIMGVFVFCWLPFFVLNCVVPFCDVDCVSDTTFNIFVWFGWANSSLNPVIYAFNA-DFRKAFTTILGCSKFCSSSAVQAVDYHHDTTLQK---
>PI2R_BOVIN
------------------------MADSCRNLTYVRDSVGPATSTLMFVAGVVGNGLALGILGARRHRPSAFAVLVTGLGVTDLLGTCFLSPAVFAAYARNSARGRPALCDAFAFAMTFFGLASTLILFAMAVERCLALSHPYYAQLDGPRRARLALPAIYAFCTIFCSLPFLGLGQHQQYCPGSWCFIR---MRSAEPGGCAFLLAYASLVALLVAAIVLCNGSVTLSLCRMQRRRCPRPRAGEDEVDHLILLALMTGIMAVCSLPLTPQIRGFTQAIAPDSSEMGDLLAFRFNAFNPILDPWVFILFRKSVFQRLKLWFCCLYSRPAQGDSRTSRKDSSAPPALEG
>A1AA_RABIT
----------MVFLSGNASDSSNCT-HPPAPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEILGYWAFGRVFCNIWAAVDVLCCTASIISLCVISIDRYIGVSYPLYPTIVTQRRGLRALLCVWAFSLVISVGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYVPLTIILAMYCRVYVVAKREAKNFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFKPPETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLKIQCLRRKQSSKHALSQ----------
>OPS4_DROPS
SSGSDELQFLGWNVPPDQIQYIPEHWLTQLEPPASMHYMLGVFYIFLFFASTLGNGMVIWIFSTSKSLRTPSNMFVLNLAVFDLIMCLKAPIFIYNSFHRG-FALGNTWCQIFASIGSYSGIGAGMTNAAIGYDRYNVITKPM-NRNMTFTKAVIMNIIIWLYCTPWVVLPLTQFWDRFPEGYLTSCSFD--YLSDNFDTRLFVGTIFLFSFVVPTLMILYYYSQIVGHVFNHNVESNVDKSKETAEIRIAKAAITICFLFFVSWTPYGVMSLIGAFGDKSLLTPGATMIPACTCKLVACIEPFVYAISHPRYRMELQKRCPWLGVNEKSGEASSAQTQQTSAA----
>5H1F_RAT
-------------MDFLNSSD-QNLTSEELLNRMPSKILVSLTLSGLALMTTTINCLVITAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQGLCDLWLSVDIICCTCSILHLSAIALDRYRAITDAVYARKRTPRHAGITITTVWVISVFISVPPLFWR----S-R--DQCIIK-------HDHIVSTIYSTFGAFYIPLVLILILYYKIYRAARTLLKHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNICEKCKISEEMSNFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCRN-----------------------
>NK2R_CAVPO
----MGACVIVTNTNISSGLESNTTGITAFSMPTWQLALWATAYLALVLVAVTGNATVTWIILAHQRMRTVTNYFIVNLALADLCMAAFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAIDRYMAIVHPF-QPRLSAPSTKAVIGGIWLVALALAFPQCFYST---GA---TKCVVAWPEDSRDKSLLLYHLVVIVLIYLLPLTVMFVAYSIIGLTLWRRRHQHGANLRHLQAKKKFVKTMVLVVVTFAICWLPYHLYFILGSFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNRRFRSGFRLAFRCCPWVTPTEE----HTPSFSLRVNRC
>5H1A_RAT
-MDVFSFGQGNNTTASQEPFGTGGNVTSISDVTFSYQVITSLLLGTLIFCAVLGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQVTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAALISLTWLIGFLISIPPMLGWRT---DP--DACTIS--------KDHGYTIYSTFGAFYIPLLLMLVLYGRIFRAARFRKNEEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSHMPALLGAIINWLGYSNSLLNPVIYAYFNKDFQNAFKKIIKCKFCRR--------------------
>ACM2_PIG
--------------MNNSTNSSNSGLALTSPYKTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VE-DGECYIQ------FFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPC-IPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR----------------
>GPRV_HUMAN
-----------------------MPFPNCSAPSTVVATAVGVLLGLECGLGLLGNAVALWTFLFRVRVWKPYAVYLLNLALADLLLAACLPFLAAFYLSLQAWHLGRVGCWALRFLLDLSRSVGMAFLAAVALDRYLRVVHPRKVNLLSPQAALGVSGLVWLLMVALTCPGLLISEAAQ--S--TRCHSF-YSRADGSFSIIWQEALSCLQFVLPFGLIVFCNAGIIRALQKR----LREPEKQPKLQRAQALVTLVVVLFALCFLPCFLARVLMHIFQNLCAVAHTSDVTGSLTYLHSVVNPVVYCFSSPTFRSSYRRVFHTLRGKGQAAEPPDF------------
>YT66_CAEEL
----------------------------------------MGVTFHPGIVGNITNLMVLASRR-------LRAMYLRALAVADLLCMLFVLVFVSTEYLAKNKLYQIYQCHLMLTLINWALGAGVYVVVALSLERYISIVFPMFRTWNSPQRATRAIVIAFLIPAIFYVPYAITRYKGK---VTIYSMDD---IYTTFYWQIYKWTREAILRFLPIIILTVLNIQIMIAFRKRMFQNKRKEQGTQKDDTLMYMLGGTVLMSLVCNIPAAINLLLIDETLKKLDYQIFRAVANILEITNHASQFYVFCACSTDYRTTFLQKFPCFKTDYANRDRLRSVIQKQGSVEHTT
>RGR_MOUSE
---------------------MAATRALPAGLGELEVLAVGTVLLMEALSGISLNGLTIFSFCKTPDLRTPSNLLVLSLALADTGISLNALVAAVSSLLRR-WPHGSEGCQVHGFQGFATALASICGSAAVAWGRYHHYCTRR---QLAWDTAIPLVLFVWMSSAFWASLPLMGWGHYDYEPVGTCCTLD--YSRGDRNFISFLFTMAFFNFLVPLFITHTSYRFME--------------QKFSRSGHLPVNTTLPGRMLLLGWGPYALLYLYAAIADVSFISPKLQMVPALIAKTMPTINAINYALHREMVCRGTWQCLSPQKSKKDRTQA---------------
>A2AA_PIG
----MGSLQPEAGNASWNGTEAPGGGARATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISIEKKAQ-P--PRCEIN--------DQKWYVISSCIGSFFAPCLIMILVYVRIYQIAKRRRGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAVGCS--VPPTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------
>PI2R_HUMAN
------------------------MADSCRNLTYVRGSVGPATSTLMFVAGVVGNGLALGILSARRPRPSAFAVLVTGLAATDLLGTSFLSPAVFVAYARNSARGGPALCDAFAFAMTFFGLASMLILFAMAVERCLALSHPYYAQLDGPRCARLALPAIYAFCVLFCALPLLGLGQHQQYCPGSWCFLR---MRWAQPGGAAFSLAYAGLVALLVAAIFLCNGSVTLSLCRMKRHLGPRPRTGEDEVDHLILLALMTVVMAVCSLPLTIRCFTQAVAPDS-SSEMGDLLAFRFYAFNPILDPWVFILFRKAVFQRLKLWVCCLCLGPAHGDSQTPRRDPRAPSAPVG
>AA3R_CANFA
-------------------------MAVNGTALLLANVTYITVEILIGLCAIVGNVLVIWVVKLNPSLQTTTFYFIVSLALADIAVGVLVMPLAIVISLG--ITIQFYNCLFMTCLLLIFTHASIMSLLAIAVDRYLRVKLTVYRRVTTQRRIWLALGLCWLVSFLVGLTPMFGWN-MKE-H-FLSCQFS-----SVMRMDYMVYFSFFTWILIPLVVMCAIYLDIFYVIRNK--NSKETGAFYGREFKTAKSLFLVLFLFAFSWLPLSIINCITYFHGE--VPQIILYLGILLSHANSMMNPIVYAYKIKKFKETYLLIFKTYMICQSSDSLDSS------------
>CB1R_MOUSE
EFYNKSLSSFKENEDNIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCK-K--LQSVCSDI------FPLIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGHANNTASMHRAA
>OPSD_CANFA
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEINNESFVIYMFVVHFAIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSDFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMITTLCCGKNPLGDDEASASASKTETSQVAPA
>BRS3_CAVPO
QTLISITNDTESSSSVVSNDTTNKGWTGDNSPGIEALCAIYITYAVIISVGILGNAILIKVFFKTKSMQTVPNIFITSLALGDLLLLLTCVPVDATHYLAEGWLFGRIGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPSNAILKTCAKAGCIWIMSMIFALPEAIFSNVHTMT--SEWCAFY---VSEKLLQEIHALLSFLVFYIIPLSIISVYYSLIARTLYKSIPTQSHARKQVESRKRIAKTVLVLVALFALCWLPNHLLNLYHSFTHKAAIHFIVTIFSRVLAFSNSCVNPFALYWLSKTFQKQFKAQLFCCKGELPEPPLAATMGRVSGTENTHI
>OPSD_DIPAN
MNGTEGPFFYVPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWTMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFICHFTIPLTVVFFCYGRLLCAVKEAAAAQQESETTQRAEKEVTRMVIMMVIAFLVCWLPYASVAWYIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>AG2R_MOUSE
------MALNSSTEDGIKRIQDDC---PRAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIHRNVYF-TN--TVCAFH-YESRNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFRIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------N
>YMJC_CAEEL
NLRLNESPYKYVMSNNTTIPSCLTDRQMSLSVSSTEGVLIGTIIPILVLFGISGNILNLTVLLAPNL-RTRSNQLLACLAVADIVSLVVILPHSMAHYETFERKFYGKYKFQIIAMTNWSIATATWLVFVICLERLIIIKYPLPRNVVTIIVVTTFILTSYNHVSHACAEKLFCNGTQY--S-RWFRNEPPNSEFMKSVVRVAPQVNAIFVVLIPVVLVIIFNVMLILTLRQRKTISQFTQLQSKTEHKVTITVTAIVTCFTITQSPSAFVTFLSSYVH--RDWVTLSAICTILVVLGKALNFVLFCLSSASFRQRLLMQTKQGILRKSTRYTSVA------------
>OPSG_ASTFA
NE--ETTRESAFVYTNANNTRDPFEGPNYHIAPRWVYNLASLWMIIVVIASIFTNSLVIVATAKFKKLRHPLNWILVNLAIADLGETVLASTISVFNQVFGYFVLGHPMCIFEGWTVSVCGITALWSLTIISWERWVVVCKPFGNVKFDGKWAAGGIIFAWTWAIIWCTPPIFGWSRYWPHGLKTSCGPDVFSGSEDPGVASYMVTLLLTCCILPLSVIIICYIFVWNAIHQVAQQQKDSESTQKAEKEVSRMVVVMILAFILCWGPYASFATFSALNPGYAWHPLAAALPAYFAKSATIYNPIIYVFMNRQFRSCIMQLFGKKVEDA-----SEVSTAS--------
>DUFF_HUMAN
FEDVWNSSYGVNDSFPDGDYDANLEAAAPCHSCNLLDDSALPFFILTSVLGILASSTVLFMLFRPLFQLCPGWPVLAQLAVGSALFSIVVPVLAPGLGST--RSSALCSLGYCVWYGSAFAQALLLGCHASLGHRLGAGQ--------VPGLTLGLTVGIWGVAALLTLPVTLASG------SGGLCTLI-YSTELKALQATHTVACLAIFVLLPLGLFGAKGLKKA-------------------LGMGPGPWMNILWAWFIFWWPHGVVLGLDFLVRSKQALDLLLNLAEALAILHCVATPLLLALFCHQATRTLLPSLPLPEGWSSHLDTLGS------------
>OPR._RAT
GSHFQGNLSLLN---ETVPHHLLLNASHSAFLPLGLKVTIVGLYLAVCIGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGF-WPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEE---IECLVE-IPAPQDYWGPVFAICIFLFSFIIPVLIISVCYSLMIRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASSLHREMQVSDRVGLGCKTSETVPR
>TRFR_SHEEP
------------MENETGSELNQTQLQPRAVVALEYQVVTILLVLIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLSLNSNRYFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPVEKPANYSVKESDHFSTELDD
>ACTR_MESAU
-------------MKHIITPYEHTNDTARNNSDCPDVVLPEEIFFTISIIGVLENLIVLLAVVKNKNLQCPMYFFICSLAISDMLGSLYKILENILIMFRNRGNFESTADDIIDCMFILSLLGSIFSLSVIAADRYITIFHALYHSIVTMRRTIITLTVIWIFCTGSGIAMVIFFS----------------------HHHVPTVLTFTSLFPLMLVFILCLYIHMFLLARSH---ARKISTLPRANMKGAITLTILLGVFIFCWAPFILHVLLMTFCPNNVCYMSLFQINGMLIMCNAVIDPFIYAFRSPELRDAFKKMFSCHRYQ---------------------
>CKR7_HUMAN
QDEVTDDYIGDNTTVDYTLFESLC---SKKDVRNFKAWFLPIMYSIICFVGLLGNGLVVLTYIYFKRLKTMTDTYLLNLAVADILFLLTLPFWAYSAAKS--WVFGVHFCKLIFAIYKMSFFSGMLLLLCISIDRYVAIVQAVRHRARVLLISKLSCVGIWILATVLSIPELLYSDLQR-SE-AMRCSLI---TEHVEAFITIQVAQMVIGFLVPLLAMSFCYLVIIRTL---------LQARNFERNKAIKVIIAVVVVFIVFQLPYNGVVLAQTVANFNKQLNIAYDVTYSLACVRCCVNPFLYAFIGVKFRNDLFKLFKDLGCLSQEQLRQWS--------HIRR
>O.YR_SHEEP
GAFAANWSAEAVNGSAAPPGTEGNRTAGPPQRNEALARVEVAVLSLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLAVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSNVKLISKAKIRTVKMTFIVVLAFIVCWTPFFFKQMWSVWDADA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFQDLVQRFLCCSFRRLKGSQLGEHSYTFVLSRHSS
>GPR7_HUMAN
MDNASFSEPWPANASGPDPALSCSNASTLAPLPAPLAVAVPVVYAVICAVGLAGNSAVLYVLLRAPRMKTVTNLFILNLAIADELFTLVLPINIADFLLRQ-WPFGELMCKLIVAIDQYNTFSSLYFLTVMSADRYLVVLATARVAGRTYSAARAVSLAVWGIVTLVVLPFAVFARLD--QG-RRQCVLV-FPQPEAFWWRASRLYTLVLGFAIPVSTICVLYTTLLCRLHAMR--DSHAKALERAKKRVTFLVVAILAVCLLCWTPYHLSTVVALTTDLPPLVIAISYFITSLTYANSCLNPFLYAFLDASFRRNLRQLITCRAAA---------------------
>GPRJ_HUMAN
TETATPLPSQYLMELSEEHSWMSNQTDLHYVLKPGEVATASIFFGILWLFSIFGNSLVCLVIHRSRRTQSTTNYFVVSMACADLLISVASTPFVLLQFTTGRWTLGSATCKVVRYFQYLTPGVQIYVLLSICIDRFYTIVYPL-SFKVSREKAKKMIAASWIFDAGFVTPVLFFYG---SNW-HCNYFLP-----SSWEGTAYTVIHFLVGFVIPSVLIILFYQKVIKYIWRIDGR-RTMNIVPRTKVKTIKMFLILNLLFLLSWLPFHVAQLWHPHEQDYKKSSLVFTAITWISFSSSASKPTLYSIYNANFRRGMKETFCMSSMKCYRSNAYTIKKNYVGISEIPS
>OAJ1_HUMAN
--MLLCFRFGNQSMKRENFTLITDFVFQGFSSFHEQQITLFGVFLALYILTLAGNIIIVTIIRIDLHLHTPMYFFLSMLSTSETVYTLVILPRMLSSLVGMSQPMSLAGCATQMFFFVTFGITNCFLLTAMGYDRYVAICNPLYMVIMNKRLRIQLVLGACSIGLIVAITQVTSVFRL---PPHFFCDIRKLSCIDTTVNEILTLIISVLVLVVPMGLVFISYVLIISTI--------LKIASVEGRKKAFATCASHLTVVIVHYSCASIAYLKPKSEN---TREHDQLISVTYTVITPLLNPVVYTLRNKEVKDALCRAVGGKFS----------------------
>RDC1_CANFA
YAEPGNFSDISWPCNSSDCIVVDTVLCPNMPNKSVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVVTIPVWVVSLVQHNQWPMGELTCKITHLIFSINLFGSIFFLTCMSVDRYLSITYFATSSRRKKVVRRAVCVLVWLLAFCVSLPDTYYLKTVTNNE--TYCRSFYPEHSVKEWLISMELVSVVLGFAIPFCVIAVFYCLLARAI---------SASSDQEKQSSRKIIFSYVVVFLVCWLPYHVVVLLDIFSILHNFLFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNAK--
>CML1_RAT
EYEGYNDSSIYGEEYSDGSDYIVDLEEAGPLEAKVAEVFLVVIYSLVCFLGILGNGLVIVIATFKMK-KTVNTVWFVNLAVADFLFNIFLPIHITYAAMDYHWVFGKAMCKISSFLLSHNMYTSVFLLTVISFDRCISVLLPVSQNHRSVRLAYMTCVVVWVWLSSESPPSLVFGHVST-SF--HSTHPR-TDPVGYSRHVAVTVTRFLCGFLIPVFIITACYLTIVFKL---------QRNRQAKTKKPFKIIITIIITFFLCWCPYHTLYLLELHHTAVSVFSLGLPLATAVAIANSCMNPILYVFMGHDFKK-FKVALFSRLVNALSEDTGPSFTKMSSLIEKAS
>FSHR_EQUAS
--MMYSEFDYDLCNEVVDVTCSPKPDAFNPCEDIMGYDILRVLIWFISILAITGNIIVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFGSELSVYTLTAITLERWHTITHAMLECKVQLRHAASVMLVGWIFGFGVGLLPIFGISTY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NIVSSSSDTKIAKRMGILIFTDFLCMAPISFFGISASLKVALITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQTYRTISHPKNGPCPPT
>OPSD_MULSU
MNGTEGPYFYIPMVNTTGIVRSPYDYPQYYLVNPAAYAALGAYMFFLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAVERWMVVCKPISNFRFGENHAIMGLAMTWLMASACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFCCHFMIPLIIVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIAFLVCWLPYASVAWWIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICMNKQFRNCMITTLCCGKNPFEEEEGASSSSVSSSSVSPAA
>NK3R_HUMAN
GNLSSSPSALGLPVASPAPSQPWANLTNQFVQPSWRIALWSLAYGVVVAVAVLGNLIVIWIILAHKRMRTVTNYFLVNLAFSDASMAAFNTLVNFIYALHSEWYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATKIVIGSIWILAFLLAFPQCLYSK---GR---TLCFVQ--WPEGPKQHFTYHIIVIILVYCFPLLIMGITYTIVGITLWGG--PCDKYHEQLKAKRKVVKMMIIVVMTFAICWLPYHIYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIKVSSY----TTRFHPNRQSSM
>CKR5_CERTO
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLLFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSPHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>PI2R_MOUSE
DGHPGPPSVTPGSPLSAGGREWQGMAGSCWNITYVQDSVGPATSTLMFVAGVVGNGLALGILGARRRHPSAFAVLVTGLAVTDLLGTCFLSPAVFVAYARNSAHGGTMLCDTFAFAMTFFGLASTLILFAMAVERCLALSHPYYAQLDGPRCARFALPSIYAFCCLFCSLPLLGLGEHQQYCPGSWCFIR---MRSAQPGGCAFSLAYASLMALLVTSIFFCNGSVTLSLYHMRRHFVPTSRAREDEVYHLILLALMTVIMAVCSLPLMIRGFTQAIAPDS--REMGDLLAFRFNAFNPILDPWVFILFRKAVFQRLKFWLCCLCARSVHGDLQAPRRDPPAPTSLQA
>FSHR_CHICK
FGPVENEFDYGLCNEVVDFVCSPKPDAFNPCEDIMGYNVLRVLIWFINILAITGNTTVLIILISSQYKLTVPRFLMCNLAFADLCIGIYLLFIASVDIQTKSWQTG-AGCNAAGFFTVFASELSVYTLTVITLERWHTITYAMLNRKVRLRHAVIIMVFGWMFAFTVALLPIFGISSY----KVSICLPM-----IETPFSQAYVIFLLVLNVLAFVIICICYICIYFTVRNP------NVISSNSDTKIAKRMAILIFTDFLCMAPISFFAISASLRVPLITVSKSKILLVLFYPINSCANPFLYAIFTKTFRRDFFILLSKFGCCEMQAQIYRTNFHTRNGHYPTA
>OPRM_BOVIN
FSHLEGNLSDPCGPNRTELGGSDRLCPSAGSPSMITAIIIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDLRTPRNAKIINICNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPILIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSTRIPSTANTVDRTNH
>P2Y5_HUMAN
--------------------MVSVNSSHCFYNDSFKYTLYGCMFSMVFVLGLVSNCVAIYIFICVLKVRNETTTYMINLAMSDLLFVFTLPFRIFYFTTRN-WPFGDLLCKISVMLFYTNMYGSILFLTCISVDRFLAIVYPFSKTLRTKRNAKIVCTGVWLTVIGGSAPAVFVQSTHS---ASEACFENFPEATWKTYLSRIVIFIEIVGFFIPLILNVTCSSMVLKTLTKP----VTLSRSKINKTKVLKMIFVHLIIFCFCFVPYNINLILYSLVRTQAAVRTMYPITLCIAVSNCCFDPIVYYFTSDTIQNSIKMKNWSVRRSDFRFSEVHGNLQTLKSKIFDN
>HM74_HUMAN
----------MNRHHLQDHFLEIDKKNCCVFRDDFIAKVLPPVLGLEFIFGLLGNGLALWIFCFHLKSWKSSRIFLFNLAVADFLLIICLPFVMDYYVRRSDWNFGDIPCRLVLFMFAMNRQGSIIFLTVVAVDRYFRVVHPHALNKISNWTAAIISCLLWGITVGLTVHLLKKKLLIQ-PA--NVCISF-----SICHTFRWHEAMFLLEFLLPLGIILFCSARIIWSLRQR------QMDRHAKIKRAITFIMVVAIVFVICFLPSVVVRIRIFWLLHTRSVDLAFFITLSFTYMNSMLDPVVYYFSSPSFPNFFSTLINRCLQRKMTGEPDNNTGDPNKTRGAPE
>OPSD_DICLA
MNGTEGPFFYVPMVNTTGIVRSPYDYPQYYLVSPAAYAALGAYMFLLILLGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNMEGFFATLGGEIGLWSLVVLAVERWLVVCKPISNFRFGENHAIMGLAFTWVMACSCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFACHFIIPMCVVFFCYGRLLCAVKEAAAAQQESETTQRAEKEVTRMVVIMGIAFLICWCPYASVAWYIFTHQGSEFGPVFMTLPAFFAKTSSVYNPLIYILMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>GPRW_HUMAN
GTRGCSDRQPGVLTRDRSCSRKMNSSGCLSEEVGSLRPLTVVILSASIVVGVLGNGLVLWMTVFRMA-RTVSTVCFFHLALADFMLSLSLPIAMYYIVSRQ-WLLGEWACKLYITFVFLSYFASNCLLVFISVDRCISVLYPVALNHRTVQRASWLAFGVWLLAAALCSAHLKFRTTR-FNSNETAQIWI-----VVEGHIIGTIGHFLLGFLGPLAIIGTCAHLIRAKL---------LREGWVHANRPKRLLLVLVSAFFIFWSPFNVVLLVHLWRRV-PRMLLILQASFALGCVNSSLNPFLYVFVGRDFQEKFFQSLTSALARAFGEEEFLSPRE---------
>UL33_HSV6U
--------------------MDTVIELSKLQFKGNASCTSTPTLKTARIMESAVTGITLTTSIPMIIKHNATSFYVITLFASDFVLMWCVFFMTVNRKQL--FSFNRFFCQLVYFIYHAVCSYSISMLAIIATIRYK-TLHRRKKTESKTSSTGRNIGILLLASSMCAIPTALFVKTNGM-KKTGKCVVYSSKKAY-ELFLAVKIVFSFIWGVLPTMVFSFFYVIFCKAL---------HDVTEKKYKKTLFFIRILLLSFLLIQIPYIAILICEIAFLYMARVEILQLIIRLMPQVHCFSNPLVYAFTGGELRNRFTACFQSFFPKTLCSTQKRKDQNSKSKASVEK
>A1AB_HUMAN
HNTSAPAHWGELKNANFTGPNQTSSNSTLPQLDITRAISVGLVLGAFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAMADLLLSFTVLPFSAALEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFVRILGCQCRGRRRRRRRRRTYRPWTRGGSLE
>OPSD_SPHSP
GG-Y-GNQTVVDKVLPEMLHLIDPHWYQFPPMNPLWHGLLGFVIGCLGFVSVVGNGMVIYIFSTTKGLRTPSNLLVVNLAFSDFLMMLSMSPPMVINCYYETWVLGPFMCELYALLGSLFGCGSIWTMVMIALDRYNVIVKGLAAKPMTNKTAMLRILGIWAMSIAWTVFPLFGWNRYVPEGNMTACGTD--YLNKEWVSRSYILVYSVFVYFLPLATIIYSYWFIVQAVSAHMNVRSAENANTSAECKLAKVALMTISLWFFAWTPYLVTDFSGIFEWGK-ISPLATIWCSLFAKANAVYNPIVYGISHPKYRAALNKKFPSLACASEPDDTASQSDEKSASA----
>ET1R_RAT
EFNFLGTTLQPPNLALPSNGSMHGYCPQQTKITTAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLITAIEIVSIWILSFILAIPEAIGFVMVPRT--HRTCMLNATTKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYDESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCHQSKSLMTSVPWKNQEQN-HNTE
>B1AR_PIG
LPDGAATAARLLVPASPPASLLTPASEGSVQLSQQWTAGMGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRAAR-ALVCTVWAISALVSFLPILMHWWRDR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRVARGSCAAAGCLAVARPPPSPG
>OLF3_RAT
-------------MDSSNRTRVSEFLLLGFVENKDLQPLIYGLFLSMYLVTVIGNISIIVAIISDPCLHTPMYFFLSNLSFVDICFISTTVPKMLVNIQTQNNVITYAGCITQIYFFLLFVELDNFLLTIMAYDRYVAICHPMYTVIMNYKLCGFLVLVSWIVSVLHALFQSLMMLAL---PPHYFCEPNQLTCSDAFLNDLVIYFTLVLLATVPLAGIFYSYFKIVSSI--------CAISSVHGKYKAFSTCASHLSVVSLFYCTGLGVYLSSAANN---SSQASATASVMYTVVTPMVNPFIYSLRNKDVKSVLKKTLCEEVIRSPPSLLHFFCFIFCY------
>5H5A_HUMAN
LPVNLTSFSLSTPSPLETNHSGKDDLRPSSPLLSVFGVLILTLLGFLVAATFAWNLLVLATILRVRTFHRVPHNLVASMAVSDVLVAALVMPLSLVHELSGRWQLGRRLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHMYTLRTRKCVSNVMIALTWALSAVISLAPLLFGWGE-S-E-SEECQVS--------REPSYAVFSTVGAFYLPLCVVLFVYWKIYKAAKFRATVPEGDTWREQKEQRAALMVGILIGVFVLCWIPFFLTELISPLCSC-DIPAIWKSIFLWLGYSNSFFNPLIYTAFNKNYNSAFKNFFSRQH-----------------------
>OPRD_RAT
FSLLANVSDTFPSAFPSASANASGSPGARSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKLKTATNIYIFNLALADALATSTLPFQSAKYLMET-WPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTQPGA---VVCTLQ-FPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLL-SGSKEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDINPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRCFRQLCRAPCGGQEPGSLRRPRVTACTPSDGPG
>CKRV_MOUSE
EIPAVTEPSYNTVAKNDFMSGFLC---FSINVRAFGITVPTPLYSLVFIIGVIGHVLVVLVLIQHKRLRNMTSIYLFNLAISDLVFLSTLPFWVDYIMKGD-WIFGNAMCKFVSGFYYLGLYSDMFFITLLTIDRYLAVVHVVALRARTVTFGIISSIITWVLAALVSIPCLYVFKSQM-EF-YHTCRAILPRKSLIRFLRFQALTMNILGLILPLLAMIICYTRIINVL---------HRRPNKKKAKVMRLIFVITLLFFLLLAPYYLAAFVSAFEDVLQQVDLSLMITEALAYTHCCVNPVIYVFVGKRFRKYLWQLFRRHTAITLPQWLPFLA-------SARL
>THRR_RAT
PLEGRAVYLNKSRFPPMPPPPFISEDASGYLTSPWLTLFIPSVYTFVFIVSLPLNILAIAVFVFRMKVKKPAVVYMLHLAMADVLFVSVLPFKISYYFSGTDWQFGSGMCRFATAACYCNMYASIMLMTVISIDRFLAVVYPISLSWRTLGRANFTCVVIWVMAIMGVVPLLLKEQTTQ--N--TTCHDVLNETLLHGFYSYYFSAFSAIFFLVPLIISTVCYTSIIRCL------SSSAVANRSKKSRALFLSAAVFCIFIVCFGPTNVLLIVHYLLLSDETAYFAYLLCVCVTSVASCIDPLIYYYASSECQKHLYSILCCRESSDSNSCNSTGDTCS--------
>DADR_PIG
--------------MRTLNTSTMDGTGLVVERDFSFRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-TTHNCDSS--------LSRTYAISSSLISFYIPVAIMIVTYTRIYRIAQKQAECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCGSGCIDSITFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPTSTNAIETAVVFSSH-----
>GRHR_BOVIN
--MANSDSPEQNENHCSAINSSIPLTPGSLPTLTLSGKIRVTVTFFLFLLSTIFNTSFLLKLQNWTQKLSRMKLLLKHLTLANLLETLIVMPLDGMWNITVQWYAGELLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITKPL-AVKSNSKLGQFMIGLAWLLSSIFAGPQLYIFGMIHEG--FSQCVTH--SFPQWWHQAFYNFFTFSCLFIIPLLIMVICNAKIIFTLTRVPHKNQSKNNIPRARLRTLKMTVAFATSFTVCWTPYYVLGIWYWFDPDMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>OLF3_CHICK
-------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIRLISVDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERKTISYVGCILQYFSFVLLTTSECLLLAVMAYDRYVAICKPLYPAIMTKAVCWRLVESLYFLAFLNSLVHTCGLLKL---SNHFFCDISQISSSSIAISELLVIISGSLFVMSSIIIILISYVFIILTV--------VMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRLTATTFGFIDSKAVQ--------------
>ML1A_PHOSU
-------MKGNGSTLLNASQQAPGVGEGGGPRPSWLASTLAFILIFTIVVDILGNLLVILSVYRNKKLRNAGNIFVVSLAIADLVVAIYPYPLVLTSIFNNGWNLGYLHCQISAFLMGLSVIGSIFNITGIAINRYCYICHSLYDRLYSNKNSLCYVFLIWVLTLVAIMPNLQTGT-LQYDP-IYSCTFT------QSVSSAYTIAVVVFHFIVPMIIVIFCYLRIWILVLQVRR-PDSKPRLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLIVASDPATRIPEWLFVASYYMAYFNSCLNAIIYGLLNQNFRQEYKRILVSLFTAKMCFVDSSNCKPAPLIANNNL
>NY2R_HUMAN
EEMKVEQYGP-QTTPRGELVPDPEPELIDSTKLIEVQVVLILAYCSIILLGVIGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKRISFLIIGLAWGISALLASPLAIFREYSLFE--IVACTEKWPGEEKSIYGTVYSLSSLLILYVLPLGIISFSYTRIWSKLKNHSPG-AANDHYHQRRQKTTKMLVCVVVVFAVSWLPLHAFQLAVDIDSQVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKNLEVRKNSG
>FMLR_HUMAN
-----------METNSSLPTNISGGTPAVSAGYLFLDIITYLVFAVTFVLGVLGNGLVIWVAGFRMT-HTVTTISYLNLAVADFCFTSTLPFFMVRKAMGGHWPFGWFLCKFLFTIVDINLFGSVFLIALIALDRCVCVLHPVTQNHRTVSLAKKVIIGPWVMALLLTLPVIIRVTTVPSPWPKERINVA------VAMLTVRGIIRFIIGFSAPMSIVAVSYGLIATKI---------HKQGLIKSSRPLRVLSFVAAAFFLCWSPYQVVALIATVRIREKEIGIAVDVTSALAFFNSCLNPMLYVFMGQDFRERLIHALPASLERALTEDSTQTTL----------
>NK1R_CAVPO
-----MDNVLPVDSDLFPNISTNTSEPNQFVQPAWQIVLWAAAYTVIVVTSVVGNVVVMWIILAHKRMRTVTNYFLVNLAFAEASMAAFNTVVNFTYAVHNEWYYGLFYCKFHNFFPIAAVFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVICVIWVLALLLAFPQGYYST---GR---VVCMIEWPSHPDKIYEKVYHICVTVLIYFLPLLVIGYAYTVVGITLE---IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLKFIQQVYLAIMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAADY----STRYFQTQGSVY
>ML1B_HUMAN
NGSFANCCEAGGWAVRPGWSGAGSARPSRTPRPPWVAPALSAVLIVTTAVDVVGNLLVILSVLRNRKLRNAGNLFLVSLALADLVVAFYPYPLILVAIFYDGWALGEEHCKASAFVMGLSVIGSVFNITAIAINRYCYICHSMYHRIYRRWHTPLHICLIWLLTVVALLPNFFVGS-LEYDP-IYSCTFI------QTASTQYTAAVVVIHFLLPIAVVSFCYLRIWVLVLQARK-PESRLCLKPSDLRSFLTMFVVFVIFAICWAPLNCIGLAVAINPQEQIPEGLFVTSYLLAYFNSCLNAIVYGLLNQNFRREYKRILLALWNPRHCIQDASKQSPAPPIIGVQH
>C3AR_RAT
--------------MESFTADTNSTDLHSRPLFKPQDIASMVILSLTCLLGLPGNGLVLWVAGVKMK-RTVNTVWFLHLTLADFLCCLSLPFSVAHLILRGHWPYGLFLCKLIPSVIILNMFASVFLLTAISLDRCLMVHKPICQNHRSVRTAFAVCGCVWVVTFVMCIPVFVYRDLLV-ED-DYFDQLM-YGNHAWTPQVAITISRLVVGFLVPFFIMITCYSLIVFRM--------RKTNLTKSRNKTLRVAVAVVTVFFVCWIPYHIVGILLVITDQEEVVLPWDHMSIALASANSCFNPFLYALLGKDFRKKARQSVKGILEAAFSEELTHSAPS---------
>OPSB_GECGE
MNGTEGINFYVPLSNKTGLVRSPFEYPQYYLADPWKFKVLSFYMFFLIAAGMPLNGLTLFVTFQHKKLRQPLNYILVNLAAANLVTVCCGFTVTFYASWYAYFVFGPIGCAIEGFFATIGGQVALWSLVVLAIERYIVICKPMGNFRFSATHAIMGIAFTWFMALACAGPPLFGWSRFIPEGMQCSCGPDYYTLNPDFHNESYVIYMFIVHFTVPMVVIFFSYGRLVCKVREAAAQQQESATTQKAEKEVTRMVILMVLGFLLAWTPYAATAIWIFTNRGAAFSVTFMTIPAFFSKSSSIYNPIIYVLLNKQFRNCMVTTICCGKNPFGDEDVSSSVSSVSSSQVAPA
>SSR5_RAT
LSLASTPSWNAS---AASSGNHNWSLVGSASPMGARAVLVPVLYLLVCTVGLSGNTLVIYVVLRHAKMKTVTNVYILNLAVADVLFMLGLPFLATQNAVVSYWPFGSFLCRLVMTLDGINQFTSIFCLMVMSVDRYLAVVHPLSARWRRPRVAKMASAAVWVFSLLMSLPLLVFADVQE-----GTCNLS-WPEPVGLWGAAFITYTSVLGFFGPLLVICLCYLLIVVKVKAAMR--VGSSRRRRSEPKVTRMVVVVVLVFVGCWLPFFIVNIVNLAFTLPPTSAGLYFFVVVLSYANSCANPLLYGFLSDNFRQSFRKVLCLRRGYGMEDADAIERPQATLPTRSCE
>B2AR_CANFA
MGQPANRSVFLLAPNGSHAPDQ----GDSQERSEAWVVGMGIVMSLIVLAIVFGNVLVITAIARFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYLPLVVMVFVYSRVFQVAQRQGRSHRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWVGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGYSDYAGEHSGCHLG
>CKR6_MOUSE
FGTDDYDN---TEYYSIPPDHGPC---SLEEVRNFTKVFVPIAYSLICVFGLLGNIMVVMTFAFYKKARSMTDVYLLNMAITDILFVLTLPFWAVTHATNT-WVFSDALCKLMKGTYAVNFNCGMLLLACISMDRYIAIVQATRVRSRTLTHSKVICVAVWFISIIISSPTFIFNKKYE-LQ-RDVCEPRRSVSEPITWKLLGMGLELFFGFFTPLLFMVFCYLFIIKTL---------VQAQNSKRHRAIRVVIAVVLVFLACQIPHNMVLLVTAVNTGKKVLAYTRNVAEVLAFLHCCLNPVLYAFIGQKFRNYFMKIMKDVWCMRRKNKMPGFESY-----ISRQ
>NYR_DROME
TLSGLQFETYNITVMMNFSCDDYDLLSEDMWSSAYFKIIVYMLYIPIFIFALIGNGTVCYIVYSTPRMRTVTNYFIASLAIGDILMSFFCEPSSFISLFILNWPFGLALCHFVNYSQAVSVLVSAYTLVAISIDRYIAIMWPL-KPRITKRYATFIIAGVWFIALATALPIPIVSGLDI---WHTKCEKYREMWPSRSQEYYYTLSLFALQFVVPLGVLIFTYARITIRVWAKPGETNRDQRMARSKRKMVKMMLTVVIVFTCCWLPFNILQLLLNDEEFADPLPYVWFAFHWLAMSHCCYNPIIYCYMNARFRSGFVQLMHRMPGLRRWCCLRSVSGTGPALPLNRM
>HH2R_MOUSE
-------------------MEPNGTVHSCCLDSIALKVTISVVLTTLIFITVAGNVVVCLAVSLNRRLRSLTNCFIVSLAATDLLLGLLVMPFSAIYQLSFKWRFGQVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLVTPVRVAISLVFIWVISITLSFLSIHLGWN--RN-TF-KCKVQ--------VNEVYGLVDGMVTFYLPLLIMCVTYYRIFKIAREQR--ISSWKAATIREHKATVTLAAVMGAFIVCWFPYFTAFVYRGLRGDD-VNEVVEGIVLWLGYANSALNPILYATLNRDFRMAYQQLFHCKLASHNSHKTSLRRSQSREGRW---
>TRFR_BOVIN
-----------MENETGSELN-QTQLQPRAVVALEYQVVTILLVLIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNLNSNRYFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPVEKPANYSVKESDRFSTELDD
>OPSP_ICTPU
-------MASIILINFSETDTLHLGSVNDHIMPRIGYTILSIIMALSSTFGIILNMVVIIVTVRYKQLRQPLNYALVNLAVADLGCPVFGGLLTAVTNAMGYFSLGRVGCVLEGFAVAFFGIAGLCSVAVIAVDRYMVVCRPLGAVMFQTKHALAGVVFSWVWSFIWNTPPLFGWGSYQLEGVMTSCAPN--WYRRDPVNVSYILCYFMLCFALPFATIIFSYMHLLHTLWQVAKLVADSGSTAKVEVQVARMVVIMVMAFLLTWLPYAAFALTVIIDSNIYINPVIGTIPAYLAKSSTVFNPIIYIFMNRQFRDYALPCLLCGKNPWAAKEGRDSTVSKNTSVSPL-
>OPSR_.ENLA
NDDDDTTRSSVFTYTNSNNTRGPFEGPNYHIAPRWVYNLTSIWMIFVVFASVFTNGLVIVATLKFKKLRHPLNWILVNMAIADLGETVIASTISVFNQIFGYFILGHPMCVLEGFTVSTCGITALWSLTVIAWERWFVVCKPFGNIKFDEKLAATGIIFSWVWSAGWCAPPMFGWSRFWPHGLKTSCGPDVFSGSSDPGVQSYMLVLMITCCIIPLAIIILCYLHVWWTIRQVAQQQKESESTQKAEREVSRMVVVMIVAYIFCWGPYTFFACFAAFSPGYSFHPLAAALPAYFAKSATIYNPIIYVFMNRQFRNCIYQMFGKKVDDG-----SEVVSSVSNSSVSPA
>ML1._MOUSE
---------MGPTKAVPTPFGCIGCKLPKPDYPPALIIFMFCAMVITVVVDLIGNSMVILAVTKNKKLRNSGNIFVASLSVADMLVAIYPYPLMLYAMSVGGWDLSQLQCQMVGLVTGLSVVGSIFNITAIAINRYCYICHSLYKRIFSLRNTCIYLVVTWVMTVLAVLPNMYIGT-IEYDP-TYTCIFN------YVNNPAFTVTIVCIHFVLPLIIVGYCYTKIWIKVLAARD-AGQNPDNQFAEVRNFLTMFVIFLLFAVCWCPVNVLTVLVAVIPKEKIPNWLYLAAYCIAYFNSCLNAIIYGILNESFRREYWTIFHAMRHPILFISHLISTRALTRARVRAR
>OPSG_CHICK
MNGTEGINFYVPMSNKTGVVRSPFEYPQYYLAEPWKYRLVCCYIFFLISTGLPINLLTLLVTFKHKKLRQPLNYILVNLAVADLFMACFGFTVTFYTAWNGYFVFGPVGCAVEGFFATLGGQVALWSLVVLAIERYIVVCKPMGNFRFSATHAMMGIAFTWVMAFSCAAPPLFGWSRYMPEGMQCSCGPDYYTHNPDYHNESYVLYMFVIHFIIPVVVIFFSYGRLICKVREAAAQQQESATTQKAEKEVTRMVILMVLGFMLAWTPYAVVAFWIFTNKGADFTATLMAVPAFFSKSSSLYNPIIYVLMNKQFRNCMITTICCGKNPFGDEDVSSTVSSVSSSQVSPA
>GRHR_HUMAN
--MANSASPEQNQNHCSAINNSIPLMQGNLPTLTLSGKIRVTVTFFLFLLSATFNASFLLKLQKWTQKLSRMKLLLKHLTLANLLETLIVMPLDGMWNITVQWYAGELLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITRPL-ALKSNSKVGQSMVGLAWILSSVFAGPQLYIFRMIHKV--FSQCVTH--SFSQWWHQAFYNFFTFSCLFIIPLFIMLICNAKIIFTLTRVPHENQSKNNIPRARLKTLKMTVAFATSFTVCWTPYYVLGIWYWFDPEMRLSDPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>5HT2_APLCA
----------------MLCGRLRHTMNSTTCFFSHRTVLIGIVGSLIIAVSVVGNVLVCLAIFTEPISHSKSKFFIVSLAVADLLLALLVMTFALVNSLYGYWLFGETFCFIWMSADVMCETASIFSICVISYNRLKQVQKPLYEEFMTTTRALLIIASLWICSFVVSFVPFFLEWHELGD-PKPECLFD--------VHFIYSVIYSLFCFYIPCTLMLRNYLRLFLIAKKH-RIHRLHRNQGTQGSKAARTLTIITGTFLACWLPFFIINPIEAVDEHL-IPLECFMVTIWLGYFNSCVNPIIYGTSNSKFRAAFQRLLRCRSVKSTVSSISPVSWIRPSLLDGP-
>OPSD_ASTFA
MNGTEGPYFYVPMSNATGVVRSPYEYPQYYLAPPWAYACLAAYMFFLILVGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSLNGYFVFGRLGCNLEGFFATFGGINSLWCLVVLSIERWVVVCKPMSNFRFGENHAIMGVAFTWFMALACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVVHFLTPLFVITFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVILMFIAYLVCWLPYASVSWWIFTNQGSEFGPIFMTVPAFFAKSSSIYNPVIYICLNKQFRHCMITTLCCGKNPF--EEEEGATEASSVSSVSPA
>5H2C_HUMAN
GLLVWQCDISVSPVAAIVTDIFNTSDGGRFKFPDGVQNWPALSIVIIIIMTIGGNILVIMAVSMEKKLHNATNYFLMSLAIADMLVGLLVMPLSLLAILYDYWPLPRYLCPVWISLDVLFSTASIMHLCAISLDRYVAIRNPIHSRFNSRTKAIMKIAIVWAISIGVSVPIPVIGLRD-VF--NTTCVL---------NDPNFVLIGSFVAFFIPLTIMVITYCLTIYVLRRQKKKPRGTMQAINNERKASKVLGIVFFVFLIMWCPFFITNILSVLCEKSKLMEKLLNVFVWIGYVCSGINPLVYTLFNKIYRRAFSNYLRCNYKVEKKPPVRQIALSGRELNVNIY
>DBDR_RAT
PGRNRTAQPARLGLQRQLAQVDAPAG--SATPLGPAQVVTAGLLTLLIVWTLLGNVLVCAAIVRSRHRAKMTNIFIVSLAVSDLFVALLVMPWKAVAEVAGYWPFG-TFCDIWVAFDIMCSTASILNLCIISVDRYWAISRPFYERKMTQRVALVMVGLAWTLSILISFIPVQLNWHRDEG-RTENCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQVQRGADPSLRASIKKETKVFKTLSMIMGVFVCCWLPFFILNCMVPFCSSGCVSETTFDIFVWFGWANSSLNPIIYAFNA-DFRKVFAQLLGCSHFCFRTPVQTVNYNQDTVFHK---
>EDG1_MOUSE
VKALRSSVSDYGNYDIIVRHYNYTGKLNIGAEKDHGIKLTSVVFILICCFIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAG-VAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNSSRSFLLISACWVISLILGGLPSMGWNCI-S--SLSSCSTV------LPLYHKHYILFCTTVFTLLLLSIAILYCRIYSLVRTRLTFISKGSRSSEKSLALLKTVIIVLSVFIACWAPLFILLLLDVGCKAKCDILYKAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIVSCCKCPNGDSAGKFKEFSRSKSDNSSH
>B3AR_BOVIN
MAPWPPGNSSLTPWPDIPTLAPNTANASGLPGVPWAVALAGALLALAVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGVTGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRALAAVVLVWVVSAAVSFAPIMSKWWRIQ-R--RCCTFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALRTLGLIMGTFTLCWLPFFVVNVVRALGGPS-VSGPTFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCR---PEEHLAAAGAPTALTSPAGP
>NY2R_CAVPO
EEIKVEPYGPGHTTPRGELAPDPEPELIDSTKLTEVRVVLILAYCSIILLGVVGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTVTLTVIALDRHRCIVYHL-DSKISKQNSFLIIGLAWGISALLASPLAIFREYSLFE--IVACTEKWPGEEKSIYGTVYSLSSLLILYVLPLGIISVSYVRIWSKLKNHSPG-AANDHYHQRRQKTTKMLVFVVVVFAVSWLPLHAFQLAVDIDSQVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCQQRLDAIQSE---AKTNVEVEKNHG
>PE22_RAT
---------------MDNSFNDSRRVENCESRQYLLSDESPAISSVMFTAGVLGNLIALALLARRWRSISLFHVLVTELVLTDLLGTCLISPVVLASYSRNQLAPESRACTYFAFTMTFFSLATMLMLFAMALERYLAIGHPYYRRRVSRRGGLAVLPAIYGVSLLFCSLPLLNYGEYVQYCPGTWCFIQ--------HGRTAYLQLYATVLLLLIVAVLGCNISVILNLIRMRGPRRGERTSMAEETDHLILLAIMTITFAVCSLPFTIFAYMDETSS---RKEKWDLRALRFLSVNSIIDPWVFVILRPPVLRLMRSVLCCRTSLRAPEAPGAS--QTDLCGQL--
>A2AB_MOUSE
--------------------MSGPAMVHQEPYSVQATAAIASAITFLILFTIFGNALVILAVLTSRSLRAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFWRAWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPRRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCELN--------QEAWYILASSIGSFFAPCLIMILVYLRIYVIAKRSGVAWWRRRTQLSREKRFTFVLAVVIGVFVVCWFPFFFSYSLGAICPQHKVPHGLFQFFFWIGYCNSSLNPVIYTIFNQDFRRAFRRILCRQWTQTGW------------------
>AG2S_HUMAN
------MILNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFF-TN--TVCAFH-YESRNSTLPIGLGLTKNILGSCFPFLIILTSYTLIWKALKKA----YEIQKNNPRNDDIFRIIMAIVLFFFFSWIPHQIFTFLDVLIQQGDIVDTAMPITIWIAYFNNCLNPLFYGFLGKKFKKDILQLLKYIPPKAKSHSNLSTRPSD-------N
>B2AR_BOVIN
MGQPGNRSVFLLAPNASHAPDQ----NVTLERDEAWVVGMGILMSLIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGACHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYLAITSPFYQCLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQPYAIASSIVSFYLPLVVMVFVYSRVFQVAKRQGRSQRRTSKFYLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIKDNL-IRKEIYILLNWLGYINSAFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGCSDYTGEQSGYHLG
>IL8B_RABIT
G--DFSNYSYSTDLPPTLLDSAPC----RSESLETNSYVVLITYILVFLLSLLGNSLVMLVILYSRSTCSVTDVYLLNLAIADLLFATTLPIWAASKVHG--WTFGTPLCKVVSLVKEVNFYSGILLLACISVDRYLAIVHATRTMIQKRHLVKFICLSMWGVSLILSLPILLFRNAIF-NS-SPVCYED-MGNSTAKWRMVLRILPQTFGFILPLLVMLFCYVFTLRTL---------FQAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLTDTLMRTHNDIDRALDATEILGFLHSCLNPIIYAFIGQKFRYGLLKILAAHGLISKEFLAKES------------
>GASR_RAT
GSSLCRPGVSLLNSSSAGNLSCDPPRIRGTGTRELEMAIRITLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAISYLMGVSVSVSTLNLVAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTMVQP---V-LQCMHR---WPSARVQQTWSVLLLLLLFFIPGVVIAVAYGLISRELYLGPGPPRPNQAKLLAKKRVVRMLLVIVLLFFLCWLPVYSVNTWRAFDGPGALSGAPISFIHLLSYVSACVNPLVYCFMHRRFRQACLDTCARCCPRPPRARPQPLPSIASLSRLSYT
>A1AB_MOUSE
HNTSAPAHWGELKDANFTGPNQTSSNSTLPQLDVTRAISVGCLG-AFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAIADLLLSFTDLPFSATLEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFMRILGCQCRGGRRRRRRRRTYRPWTRGGSLE
>OPSR_FELCA
AGLEDSTRASIFTYTNSNATRGPFEGPNYHIAPRWVYHVTSAWMIFVVIASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETIIASTISVVNQIYGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIAGIAFSWIWAAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMITCCIIPLSVIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVMVMIFAYCVCWGPYTFFACFAAAHPGYAFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCIMQLFGKKVDDG-----SELASSV--SSVSPA
>P2Y7_HUMAN
-------------------MNTTSSAAPPSLGVEFISLLAIILLSVALAVGLPGNSFVVWSILKRMQKRSVTALMVLNLALADLAVLLTAPFFLHFLAQGT-WSFGLAGCRLCHYVCGVSMYASVLLITAMSLDRSLAVARPFSQKLRTKAMARRVLAGIWVLSFLLATPVLAYRTVVPK--NMSLCFPR---YPSEGHRAFHLIFEAVTGFLLPFLAVVASYSDIGRRL---------QARRFRRSRRTGRLVVLIILTFAAFWLPYHVVNLAEAGRALAKRLSLARNVLIALAFLSSSVNPVLYACAGGGLLRSAGVGFVAKLLEGTGSEASSTQTARSGPAALEP
>GPRL_HUMAN
---------MNSTLDGNQSSHPFCLLAFGYLETVNFCLLEVLIIVFLTVLIISGNIIVIFVFHCAPLNHHTTSYFIQTMAYADLFVGVSCVVPSLSLLHHPLPVEESLTCQIFGFVVSVLKSVSMASLACISIDRYIAITKPLYNTLVTPWRLRLCIFLIWLYSTLVFLPSFFHWG---K-P-VFQWCAE-----SWHTDSYFTLFIVMMLYAPAALIVCFTYFNIFRICQQHRFSGETGEVQACPDKRYAMVLFRITSVFYILWLPYIIYFLLESSTGHS--NRFASFLTTWLAISNSFCNCVIYSLSNSVFQRGLKRLSGAMCTSCASQTTANDGPLNGCHI----
>CKR3_MOUSE
TDEIKTVVESFETTPYEYEWAPPC---EKVRIKELGSWLLPPLYSLVFIIGLLGNMMVVLILIKYRKLQIMTNIYLFNLAISDLLFLFTVPFWIHYVLWNE-WGFGHYMCKMLSGFYYLALYSEIFFIILLTIDRYLAIVHAVALRARTVTFATITSIITWGLAGLAALPEFIFHESQD-SF-EFSCSPRYPEGEEDSWKRFHALRMNIFGLALPLLVMVICYSGIIKTL---------LRCPNKKKHKAIRLIFVVMIVFFIFWTPYNLVLLFSAFHRTFKHLDLAMQVTEVIAYTHCCVNPVIYAFVGERFRKHLRLFFHRNVAVYLGKYIPFLT-------SSVS
>HH1R_MOUSE
----------MRLPNTSSASEDKMCEGNRTAMASPQLLPLVVVLSSISLVTVGLNLGVLYAVRSERKLHTVGNLYIVSLSVADLIVGAIVMPMNILYLIMTKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASATILGAWFLSFLWVIPILGWHHFT--EL-EDKCETD------FYNVTWFKIMTAIINFYLPTLLMLWFYVKIYNGVRRHLRSQYVSGLHLNRERKAAKQLGCIMAAFILCWIPYFIFFMVIAFCNSC-CSEPVHMFTIWLGYINSTLNPLIYPLCNENFKKTFKKILHIRS-----------------------
>SSR1_HUMAN
GEGGGSRGPGAGAADGMEEPGRNASQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRH-WPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAADG--TVACNML-MPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALK-AGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQ--DDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLS-----WMDNAAETALKSRAYSVED
>MC5R_MOUSE
-MNSSSTLTVLNLTLNASEDGILGSNVKNKSLACEEMGIAVEVFLTLGLVSLLENILVIGAIVKNKNLHSPMYFFVGSLAVADMLVSMSNAWETVTIYLLNNDTFVRHIDNVFDSMICISVVASMCSLLAIAVDRYITIFYALYHHIMTARRSGVIIACIWTFCISCGIVFIIYYY----------------------EESKYVIICLISMFFTMLFFMVSLYIHMFLLARNH-RIPRYNSVRQRTSMKGAITLTMLLGIFIVCWSPFFLHLILMISCPQNSCFMSYFNMYLILIMCNSVIDPLIYALRSQEMRRTFKEIVCCHGFRRPCRLLGGY------------
>OPSD_CHELA
MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPVNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVLGGFTTTMYTSMHGYFVLGRLGCNVEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFSEDHAIMGLAFTWVMASACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLVCWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>OLF6_RAT
----------MAWSTGQNLSTPGPFILLGFPGPRSMRIGLFLLFLVMYLLTVVGNLAIISLVGAHRCLQTPMYFFLCNLSFLEIWFTTACVPKTLATFAPRGGVISLAGCATQMYFVFSLGCTEYFLLAVMAYDRYLAICLPLYGGIMTPGLAMRLALGSWLCGFSAITVPATLIARL---SNHFFCDISVLSCTDTQVVELVSFGIAFCVILGSCGITLVSYAYIITTI--------IKIPSARGRHRAFSTCSSHLTVVLIWYGSTIFLHVRTSVES---SLDLTKAITVLNTIVTPVLNPFIYTLRNKDVKEALRRTVKGK------------------------
>D4DR_HUMAN
---MGNRSTADADGLLAGRGPAAGASAGASAGLAGQGAAALVGGVLLIGAVLAGNSLVCVSVATERALQTPTNSFIVSLAAADLLLALLVLPLFVYSEVQGGWLLSPRLCDALMAMDVMLCTASIFNLCAISVDRFVAVAVPLYNRQGGSRRQLLLIGATWLLSAAVAAPVLCGLN---G-R--AVCRL---------EDRDYVVYSSVCSFFLPCPLMLLLYWATFRGLQRWPPPRRRRAKITGRERKAMRVLPVVVGAFLLCWTPFFVVHITQALCPACSVPPRLVSAVTWLGYVNSALNPVIYTVFNAEFRNVFRKALRACC-----------------------
>NY2R_MOUSE
VEVKVEPYGPGHTTPRGELPPDPEPELIDSTKLVEVQVILILAYCSIILLGVVGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKRISFLIIGLAWGISALLASPLAIFREYSLFE--IVACTEKWPGEEKSVYGTVYSLSTLLILYVLPLGIISFSYTRIWSKLRNHSPG-AASDHYHQRRHKMTKMLVCVVVVFAVSWLPLHAFQLAVDIDSHVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKNLEVKKNNG
>ETBR_MOUSE
SSAPAEVTKGGRGAGVPPRS-FPPPCQRNIEISKTFKYINTIVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINTYKLLAEDWPFGAEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAIGFDMITRV--LRVCMLNQKTAFMQFYKTAKDWWLFSFYFCLPLAITAVFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQTFE-EKQSLEFKANDHGYDNFR
>NK2R_RAT
----MGTRAIVSDANILSGLESNATGVTAFSMPGWQLALWATAYLALVLVAVTGNATVIWIILAHERMRTVTNYFIINLALADLCMAAFNATFNFIYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPSTKAIIAGIWLVALALASPQCFYST---GA---TKCVVAWPNDNGGKMLLLYHLVVFVLIYFLPLLVMFGAYSVIGLTLWKRPRHHGANLRHLQAKKKFVKAMVLVVLTFAICWLPYHLYFILGTFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----HTPSLSRRVNRC
>OPR._MOUSE
GSHFQGNLSLLN---ETVPHHLLLNASHSAFLPLGLKVTIVGLYLAVCIGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGF-WPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEE---IECLVE-IPAPQDYWGPVFAICIFLFSFIIPVLIISVCYSLMIRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLVQGLGVQPETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASALHREMQVSDRVGLGCKTSETVPR
>YLD1_CAEEL
--------------------------MIIFYLYVATQVFVAIAFVLLMATAIIGNSVVMWIIYQHKVMHYGFNYFLFNMAFADLLIALFNVGTSWTYNLYYDWWYG-DLCTLTSFFGIAPTTVSVCSMMALSWDRCQAVVNPLQKRPLSRKRSVIAILIIWVVSTVTALPFAIAASVNSVTSKAHVCSAP--------VNTFFEKVLFGIQYALPIIILGSTFTRIAVAFRATEATSSLKNNHTRAKSKAVKMLFLMVVAFVVCWLPYHIYHAFALEEFFDARGKYAYLLIYWIAMSSCAYNPIIYCFANERFRIGFRYVFRWIPVIDCKKEQYEYMRSMAISLQKGR
>5HT_BOMMO
LPLQNCSWNSTGWEPNWNVTVWQASAPFDTPAALVRAAAKAVVLGLLILATVVGNVFVIAAILLERHLRSAANNLILSLAVADLLVACLVMPLGAVYEVVQRWTLGPELCDMWTSGDVLCCTASILHLVAIALDRYWAVTNI-YIHASTAKRVGMMIACVWTVSFFVCIAQLLGWKDPDS-E--LRCVVS--------QDVGYQIFATASSFYVPVLIILILYWRIYQTARKRPSLKPKEAADSKRERKAAKTLAIITGAFVACWLPFFVLAILVPTCDC-EVSPVLTSLSLWLGYFNSTLNPVIYTVFSPEFRHAFQRLLCGRRVRRRRAPQ---------------
>OPSD_SALPV
MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFFLILLGFPINFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIGLWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWIMACACAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFTCHFCIPLTIIGFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVILMVVGFLVCWLPYASVAWYIFSNQGSQFGPLFMTIPAFFAKSSSVYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSSVSSSSVSPAA
>OPSD_LITMO
MNGTEGPYFYVPMVNTSGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWLMAMACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLMVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIAFLICWCPYAGVAWWIFTHQGSDFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>P2UR_RAT
----MAAGLDSWNSTINGTWEGDELGYKCRFNEDFKYVLLPVSYGVVCVLGLCLNVVALYIFLCRLKTWNASTTYMFHLAVSDSLYAASLPLLVYYYAQGDHWPFSTVLCKLVRFLFYTNLYCSILFLTCISVHRCLGVLRPLSLSWGHARYARRVAAVVWVLVLACQAPVLYFVTTS-----RITCHDT-SARELFSHFVAYSSVMLGLLFAVPFSIILVCYVLMARRLLKP---AYGTTGLPRAKRKSVRTIALVLAVFALCFLPFHVTRTLYYSFRSLNAINMAYKITRPLASANSCLDPVLYFLAGQRLVRFARDAKPATEPTPSPQARRKLTDTVRKDLSISS
>OLF4_CHICK
-------------MASGNCTTPTTFILSGLTDNPGLQMPLFMVFLAIYTITLLTNLGLIALISVDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERKTISYIGCILQYFSFVLLTVTESLLLAVMAYDRYVAICKPLYPSIMTKAVCWRLVKGLYSLAFLNSLVHTSGLLKL---SNHFFCDNSQISSSSTTLNELLVFIFGSLFAMSSIITILISYVFIILTV--------VRIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRVIATNVWIH--------------------
>LSHR_RAT
ENELSGWDYDYGFCSPKTLQCAPEPDAFNPCEDIMGYAFLRVLIWLINILAIFGNLTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCGAAGFFTVFASELSVYTLTVITLERWHTITYAVLDQKLRLRHAIPIMLGGWLFSTLIATMPLVGISNY----KVSICLPM-----VESTLSQVYILSILILNVVAFVVICACYIRIYFAVQNP------ELTAPNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKVPLITVTNSKILLVLFYPVNSCANPFLYAIFTKAFQRDFLLLLSRFGCCKRRAELYRRSNCKNGFPGASK
>MSHR_BOVIN
PALGSQRRLLGSLNCTPPATLPFTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHKVILLCLVGLFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>PE24_RAT
--------------------MSIPGVNASFSSTPERLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGDQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGRSERQYPGTWCFID---WTTNVTAYAAFSYMYAGFSSFLILATVLCNVLVCGALLRMAAAVASFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPSDISRNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSGRDGSRRTSSAMSGHSR
>5H1B_FUGRU
----MEGTNNTTGWTHFDSTSNRTSKSFDEEVKLSYQVVTSFLLGALILCSIFGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNRWTLGQIPCDIFISLDMLCCTSSILHLCVIALDRYWAITEPIYMKKRTPRRAAVLISVTWLVGFSISIPPMLIMRSQPA-N--KQCKIT--------QDPWYTIYSTFGAFYIPLTLMLVLYGRIFKAARFRRHEETKRKIALARERKTVKTLGIIMGTFILCWLPFFIVALVMPFCQESFMPHWLKDVINWLGYSNSLLNPIIYAYFNKDFQSAFKKIIKCHFCRA--------------------
>GP39_HUMAN
-------MASPSLPGSDCSQIIDHSHVPEFEVATWIKITLILVYLIIFVMGLLGNSATIRVTQVLQKLQKEVTDHMVSLACSDILVFLIGMPMEFYSIIWNPTSSYTLSCKLHTFLFEACSYATLLHVLTLSFERYIAICHPFYKAVSGPCQVKLLIGFVWVTSALVALPLLFAMGTEYETSNMSICTNL-----SRWTVFQSSIFGAFVVYLVVLLSVAFMCWNMMQVLMKSRPPKSESEESRTARRQTIIFLRLIVVTLAVCWMPNQIRRIMAAAKPKHRAYMILLPFSETFFYLSSVINPLLYTVSSQQFRRVFVQVLCCRLSLQHANHEKRLTDSARFVQRPLL
>BRB2_HUMAN
FSADMLNVTLQGPTLNGTFAQSKC---PQVEWLGWLNTIQPPFLWVLFVLATLENIFVLSVFCLHKSSCTVAEIYLGNLAAADLILACGLPFWAITISNNFDWLFGETLCRVVNAIISMNLYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWGCTLLLSSPMLVFRTMKE-HN--TACVIS---YPSLIWEVFTNMLLNVVGFLLPLSVITFCTMQIMQVLRNN---EMQKFKEIQTERRATVLVLVVLLLFIICWLPFQISTFLDTLHRLGRIIDVITQIASFMAYSNSCLNPLVYVIVGKRFRKKSWEVYQGVCQKGGCRSEPIQLRTS-------I
>RDC1_HUMAN
YAEPGNFSDISWPCNSSDCIVVDTVMCPNMPNKSVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVLTIPVWVVSLVQHNQWPMGELTCKVTHLIFSINLFSGIFFLTCMSVDRYLSITYFTTPSSRKKMVRRVVCILVWLLAFCVSLPDTYYLKTVTNNE--TYCRSFYPEHSIKEWLIGMELVSVVLGFAVPFSIIAVFYFLLARAI---------SASSDQEKHSSRKIIFSYVVVFLVCWLPYHVAVLLDIFSILHHALFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNAK--
>OPSB_ASTFA
QEFQEDFYIPIPLDTNNITALSPFLVPQDHLGGSGIFMIMTVFMLFLFIGGTSINVLTIVCTVQYKKLRSHLNYILVNLAISNLLVSTVGSFTAFVSFLNRYFIFGPTACKIEGFVATLGGMVSLWSLSVVAFERWLVICKPVGNFSFKGTHAIIGCALTWFFALLASTPPLFGWSRYIPEGLQCSCGPDWYTTENKYNNESYVMFLFCFCFGFPFTVILFCYGQLLFTLKSAAKAQADSASTQKAEREVTKMVVVMVMGFLVCWLPYASFALWVVFNRGQSFDLRLGTIPSCFSKASTVYNPVIYVFMNKQFRSCMMKLIFCGKSPFGDDEEASSSSVGPEK-----
>CB1B_FUGRU
TNASDFPLSNGSGEATQCGEDIVDNMECFMILTPAQQLVIVILAITLGTFTVLENFVVLCVILHSHTRSRPSYHFIGSLAVADLIGSIIFVYSFLDFHVLHR-KDSPSIFLFKLAGVIASFTASVGSLFLTAIDRYVSIHRPMYKRIITKTKAVIAFSVMWAISIEFSLLPLLGWNCK-R--LHSVCSDI------FPLIDEKYLMFWIGMTTVLLLFIIYAYMFILWKSHHHSEGQTVRPEQARMDLRLAKTLVLILVALIICWGPLLAIMVYDLFGRVNDFIKTVFAFCSMLCLLNSTINPVIYAMRSKDLRRAFVNICHMCRGTTQSLDSSAEVRSTGGRAGKDR
>5H2B_HUMAN
ILQSTFVHVISSNWSGLQTESIPEEMKQIVEEQGNKLHWAALLILMVIIPTIGGNTLVILAVSLEKKLQYATNYFLMSLAVADLLVGLFVMPIALLTIMFEAWPLPLVLCPAWLFLDVLFSTASIMHLCAISVDRYIAIKKPIANQYNSRATAFIKITVVWLISIGIAIPVPIKGIET-NP---ITCVLT------KERFGDFMLFGSLAAFFTPLAIMIVTYFLTIHALQKKRRTGKKSVQTISNEQRASKVLGIVFFLFLLMWCPFFITNITLVLCDSCTTLQMLLEIFVWIGYVSSGVNPLVYTLFNKTFRDAFGRYITCNYRATKSVKTLRKRNPMAENSKFFK
>OAR2_LYMST
RNFSVSADVWLCGANFSQEWQLMQPVCSTKYDSITIFITVAVVLTLITLWTILGNFFVLMALYRYGTLRTMSNCLIGNLAISDLLLAVTVLPISTVHDLLGYWVFGEFTCTLWLCMDVLYCTASIWGLCTVAFDRYLATVYPVYHDQRSVRKAVGCIVFVWIFSIVISFAPFIGWQHM-S-F--YQCILF--------TSSSYVLYSSMGSFVIPAILMAFMYVRIFVVLHNQRNKLSMKRRFELREQRATKRMLLIMACFCVCWMPFLFMYILRSVCDTCHMNQHFVAAIIWLGYVNSSLNPVLYTLFNDDFKVAFKRLIGARSPSAYRSPGPRR------------
>GRPR_HUMAN
DCFLLNLEVDHFMHCN-ISSHSADLPVNDDWSHPGILYVIPAVYGVIILIGLIGNITLIKIFCTVKSMRNVPNLFISSLALGDLLLLITCAPVDASRYLADRWLFGRIGCKLIPFIQLTSVGVSVFTLTALSADRYKAIVRPMIQASHALMKICLKAAFIWIISMLLAIPEAVFSDLHPQT--FISCAPY---HSNELHPKIHSMASFLVFYVIPLSIISVYYYFIAKNLIQSLPVNIHVKKQIESRKRLAKTVLVFVGLFAFCWLPNHVIYLYRSYHYSEMLHFVTSICARLLAFTNSCVNPFALYLLSKSFRKQFNTQLLCCQPGLIIR--SHSMTSLKSTNPSVA
>DBDR_HUMAN
PGSNGTAYPGQFALYQQLAQGNAVGGSAGAPPLGPSQVVTACLLTLLIIWTLLGNVLVCAAIVRSRHRANMTNVFIVSLAVSDLFVALLVMPWKAVAEVAGYWPFG-AFCDVWVAFDIMCSTASILNLCVISVDRYWAISRPFYKRKMTQRMALVMVGLAWTLSILISFIPVQLNWHRDED-NAENCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTRIYRIAQVQSAADTSLRASIKKETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCSGHCVSETTFDVFVWFGWANSSLNPVIYAFNA-DFQKVFAQLLGCSHFCSRTPVETVNYNQDIVFHK---
>B2AR_MESAU
MGPPGNDSDFLLTTNGSHVPDH----DVTEERDEAWVVGMAILMSVIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWNFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYIAITSPFYQSLLTKNKARMVILMVWIVSGLTSFLPIQMHWYRAI-D--TCCDFF--------TNQAYAIASSIVSFYVPLVVMVFVYSRVFQVAKRQGRSLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWLGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSSKAYGNGYSDYMGEASGCQLG
>5H1B_HUMAN
PAGSETWVPQANLSSAPSQNCSAKDYIYQDSISLPWKVLLVMLLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASILHLCVIALDRYWAITDAVYSAKRTPKRAAVMIALVWVFSISISLPPFFWR----E-E--SECVVN-------TDHILYTVYSTVGAFYFPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHLAIFDFFTWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTS---------------------
>GPRH_HUMAN
------MNGLEVAPPGLITNFSLATAEQCGQETPLENMLFASFYLLDFILALVGNTLALWLFIRDHKSGTPANVFLMHLAVADLSCVLVLPTRLVYHFSGNHWPFGEIACRLTGFLFYLNMYASIYFLTCISADRFLAIVHPVSLKLRRPLYAHLACAFLWVVVAVAMAPLLVSPQTVQVCL-QLYRE----------KASHHALVSLAVAFTFPFITTVTCYLLIIRSLR------QGLRVEKRLKTKAVRMIAIVLAIFLVCFVPYHVNRSVYVLHYRSRILALANRITSCLTSLNGALDPIMYFFVAEKFRHALCNLLCGKRLKGPPPSFEGKAKSEL-------
>ACM1_MOUSE
------------MNTSVPPAVSPNITVLAPGKGPWQVAFIGSTTGLLSLATVTGNLLVLISIKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTVNPMCYASCNKAFRDHFRLLLLCRWDKRRWRKIPKRPSRQC-------
>B3AR_SHEEP
MAPWPPGNSFLTPWPDIPTLAPNTANASGLPGVPWAVALAGALLALAVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGVTGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--RCCTFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVDTRQGVPRRPARLLPLREHRALRTLGLIMGTFTLCWLPFFVVNVVRALGGPS-VSGPTFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCPPEEHLAAASPPTVLTSPAGPRQP
>A1AD_MOUSE
TGSGEDNQSSTAEAGAAASGEVNGSAAVGGLVVSAQGVGVGVFLAAFILTAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSAAVLPFSATMEVLGFWPFGRTFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWAVALVVSVGPLLGWKEP--VP--RFCGIT--------EEVGYAIFSSVCSFYLPMAVIVVMYCRVYVVARSTHTLLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRR--LWPSLDRR-PALRLC
>C5AR_CANFA
SMNFSPPEYPDYG-TATLDPNIFVDESLNTPKLSVPDMIALVIFVMVFLVGVPGNFLVVWVTGFEVR-RTINAIWFLNLAVADLLSCLALPILFSSIVQQGYWPFGNAACRILPSLILLNMYASILLLTTISADRFVLVFNPICQNYRGPQLAWAACSVAWAVALLLTVPSFIFRGVHT-PF--MTCGVD-YSGVGVLVERGVAILRLLMGFLGPLVILSICYTFLLIRT---------WSRKATRSTKTLKVVVAVVVSFFVLWLPYQVTGMMMALFYKHRRVSRLDSLCVAVAYINCCINPIIYVLAAQGFHSRFLKSLPARLRQVLAEESVGRSTVD--------
>CKRA_HUMAN
QVSWGHYSGDEEDAYSAEPLPELC---YKADVQAFSRAFQPSVSLTVAALGLAGNGLVLATHLAARRARSPTSAHLLQLALADLLLALTLPFAAAGALQG--WSLGSATCRTISGLYSASFHAGFLFLACISADRYVAIARALGPRPSTPGRAHLVSVIVWLLSLLLALPALLFS--QD-RE-QRRCRLIFPEGLTQTVKGASAVAQVALGFALPLGVMVACYALLGRTL---------LAARGPERRRALRVVVALVAAFVVLQLPYSLALLLDTADLLAKRKDVALLVTSGLALARCGLNPVLYAFLGLRFRQDLRRLLRGGSSPSGPQPRRGC--------PRLS
>MTR_BUFMA
LNLDCSELPNSSWVNSSMENQSSNSTRDPLKRNEEVAKVEVTVLALILFLALAGNICVLLGIYINRHKHSRMYFFMKHLSIADLVVAIFQVLPQLIWDITFRFYAPDLVCRLVTYLQVVGMFASTYMLLLMSLDRCLAICQPL--RSLHRRSDCVYVLFTWILSFLLSTPQTVIFSLTE----VYDCRAD---FIQPWGPKAYITWITLAVYIIPVMILSVCYGLISYKIWQNRATVSSVRLISKAKIRTVKMTFIIVLAYIVCWTPFFFVQMWSVWDPNP-KEASLFIIAMLLGSLNSCCNPWIYMLFTGHLFHDLLQSFLCCSARYLKTQQQGSSNSSTFVLSRKS
>B1AR_HUMAN
LPDGAATAARLLVPASPPASLLPPASESPEPLSQQWTAGMGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARGLVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHREL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQGLLCCARRAARRRHATHGCLARPGPPPSPG
>ACM2_RAT
--------------MNNSTNSSNNGLAITSPYKTFEVVFIVLVAGSLSLVTIIGNILVMVSIKVSRHLQTVNNYFLFSLACADLIIGVFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VE-DGECYIQ------FFSNAAVTFGTAIAAFYLPVIIMTVLYWHISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWAPYNVMVLINTFCAPC-IPNTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR----------------
>AG22_MERUN
AATSRNITSSLPFVNLNMSGTNDLIFNCSHKPSDKHLEAIPVLYYLIFVIGFAVNIIVVSLFCCQKGPKKVSSIYIFNLAVADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYVVPLVWCMACLSSLPTFYFRDVRT-LG--NACVMAFPPEKYAQWSAGIALMKNVLGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALSWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSMFRVPITWLQGKRETMSREMDTFVS----
>NY5R_PIG
-------------NTVATRNSGFPVWEDYKGSVDDLQYFLIGLYTFVSLLGFMGNLLILMAVMRKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKVMCHIMPFLQCVTVLVSTLILISIAIVRYHMIKHPV-SNNLTANHGYFLIATVWTLGLAICSPLPVFHSLVESS--RYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRTISCGVPESRSIMRLRKRSRSVFYRLTVLILVFAVSWMPLHLFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLMSLIHCLHVS---------------------
>CB1R_HUMAN
EFYNKSLSSFKENEENIQCGENFMDIECFMVLNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFIDFHVFHR-KDSRNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCE-K--LQSVCSDI------FPHIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGKHANNAASVHRA
>D3DR_MOUSE
--------MAPLSQISSHINSTCGAENSTGVNRARPHAYYALSYCALILAIIFGNGLVCAAVLRERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRICCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVALMITAVWVLAFAVSCPLLFGFN---D-P--SICSI---------SNPDFVIYSSVVSFYVPFGVTVLVYARIYMVLRQRTSLPLQPRGVPLREKKATQMVVIVLGAFIVCWLPFFLTHVLNTHCQACHVSPELYRATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC-------------------------
>GPRF_MACMU
----MDPEETSVYLDYYYATSPNPDIRETHSHVPYTSVFLPVFYTAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVFLLTCMSVDRYLAIVCPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPLKLIWSLVALIFTFFVPLLSIVTCYCCIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTFKLLAIVSGLQAMLQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFT
>CB1R_RAT
EFYNKSLSSFKENEENIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCK-K--LQSVCSDI------FPLIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHSHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGHANNTASMHRAA
>CKR4_HUMAN
IADTTLDESIYSNYYLYESIPKPC---TKEGIKAFGELFLPPLYSLVFVFGLLGNSVVVLVLFKYKRLRSMTDVYLLNLAISDLLFVFSLPFWGYYAADQ--WVFGLGLCKMISWMYLVGFYSGIFFVMLMSIDRYLAIVHAVSLRARTLTYGVITSLATWSVAVFASLPGFLFSTCYT-ER-HTYCKTK-YSLNSTTWKVLSSLEINILGLVIPLGIMLFCYSMIIRTL---------QHCKNEKKNKAVKMIFAVVVLFLGFWTPYNIVLFLETLVELERYLDYAIQATETLAFVHCCLNPIIYFFLGEKFRKYILQLFKTCRGLFVLCQYCGLTP------SSSY
>PD2R_MOUSE
----------------------MNESYRCQTSTWVERGSSATMGAVLFGAGLLGNLLALVLLARSGLPPSVFYVLVCGLTVTDLLGKCLISPMVLAAYAQNQPASGNQLCETFAFLMSFFGLASTLQLLAMAVECWLSLGHPFYQRHVTLRRGVLVAPVVAAFCLAFCALPFAGFGKFVQYCPGTWCFIQ-MIHKERSFSVIGFSVLYSSLMALLVLATVVCNLGAMYNLYDMAQSYRHGSLHPLEELDHFVLLALMTVLFTMCSLPLIYRAYYGAFKL--AEGDSEDLQALRFLSVISIVDPWIFIIFRTSVFRMLFHKVFTRPLIYRNWSSHSQL-----------
>OPSD_BUFMA
MNGTEGPNFYIPMSNKTGVVRSPFEYPQYYLAEPWQYSVLCAYMFLLILLGFPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTVTMYSSMNGYFVFGQTGCYVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFSENHAIMGVAFTWIMALACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVFFLICWVPYASVAFFIFTHQGSEFGPVFMTIPAFFAKSSSIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDEDASSAASSVSSSQVSPA
>5H1F_MOUSE
-------------MDFLNASD-QNLTSEELLNRMPSKILVSLTLSGLALMTTTINSLVIAAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQVLCDIWLSVDIICCTCSILHLSAIALDRYRAITDAVYARKRTPRHAGIMITIVWVISVFISMPPLFWR----S-R--DECVIK-------HDHIVSTIYSTFGAFYIPLVLILILYYKIYRAARTLLKHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNVCEKCKISEEMSNFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCRY-----------------------
>B2AR_RAT
MEPHGNDSDFLLAPNGSRAPGH----DITQERDEAWVVGMAILMSVIVLAIVFGNVLVITAIAKFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWNFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYVAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-D--TCCDFF--------TNQAYAIASSIVSFYVPLVVMVFVYSRVFQVAKRQGRSLRSSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIRANL-IPKEVYILLNWLGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSSKTYGNGYSDYTGEQSAYQLG
>O.YR_PIG
GVLAANWSAEAVNSSAAPPEAEGNRTAGPPQRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RALRRPADRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDADA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSSSHLKTSRPGENSSTFVLSQHSS
>OPSB_CARAU
PEFHEDFYIPIPLDINNLSAYSPFLVPQDHLGNQGIFMAMSVFMFFIFIGGASINILTILCTIQFKKLRSHLNYILVNLSIANLFVAIFGSPLSFYSFFNRYFIFGATACKIEGFLATLGGMVGLWSLAVVAFERWLVICKPLGNFTFKTPHAIAGCILPWISALAASLPPLFGWSRYIPEGLQCSCGPDWYTTNNKYNNESYVMFLFCFCFAVPFGTIVFCYGQLLITLKLAAKAQADSASTQKAEREVTKMVVVMVLGFLVCWAPYASFSLWIVSHRGEEFDLRMATIPSCLSKASTVYNPVIYVLMNKQFRSCMMKMVCGKNIEE---DEASTSSVAPEK-----
>5H1B_RAT
CAPPPPATSQTGVPLANLSHNSADDYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPKRAAIMIVLVWVFSISISLPPFFWR----E-E--LDCFVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFNWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTG---------------------
>H963_HUMAN
-----------------------MTNSSFFCPVYKDLEPFTYFFYLVFLVGIIGSCFATWAFIQKNTNHRCVSIYLINLLTADFLLTLALPVKIVVDLGVAPWKLKIFHCQVTACLIYINMYLSIIFLAFVSIDRCLQLTHSCIYRIQEPGFAKMISTVVWLMVLLIMVPNMMIPIKD-SN---VGCMEF--KKEFGRNWHLLTNFICVAIFLNFSAIILISNCLVIRQLYR-----NKDNENYPNVKKALINILLVTTGYIICFVPYHIVRIPYTLSQTEISLFKAKEATLLLAVSNLCFDPILYYHLSKAFRSKVTETFASPKETKAQKEKLRC------------
>OPS3_DROME
ARLSAETRLLGWNVPPEELRHIPEHWLTYPEPPESMNYLLGTLYIFFTLMSMLGNGLVIWVFSAAKSLRTPSNILVINLAFCDFMMMVKTPIFIYNSFHQG-YALGHLGCQIFGIIGSYTGIAAGATNAFIAYDRFNVITRPM-EGKMTHGKAIAMIIFIYMYATPWVVACYTETWGRFPEGYLTSCTFD--YLTDNFDTRLFVACIFFFSFVCPTTMITYYYSQIVGHVFSHNVESNVDKNKETAEIRIAKAAITICFLFFCSWTPYGVMSLIGAFGDKTLLTPGATMIPACACKMVACIDPFVYAISHPRYRMELQKRCPWLALNEKAPESSAVEPQQTTAA----
>CCR5_MOUSE
DDLYKELAFYSNSTEIPLQDSNFCSTVEGPLLTSFKAVFMPVAYSLIFLLGMMGNILVLVILERHRHTRSSTETFLFHLAVADLLLVFILPFAVAEGSVG--WVLGTFLCKTVIALHKINFYCSSLLVACIAVDRYLAIVHAVAYRRRRLLSIHITCTAIWLAGFLFALPELLFAKVGQ-ND-LPQCTFSQENEAETRAWFTSRFLYHIGGFLLPMLVMGWCYVGVVHRL--------LQAQRRPQRQKAVRVAILVTSIFFLCWSPYHIVIFLDTLERLKGYLSVAITLCEFLGLAHCCLNPMLYTFAGVKFRSDLSRLLTKLGCAG---PASLC--------PNWR
>5H1F_CAVPO
-------------MDFLNSSD-QNLTSEELLHRMPSKILVSLTLSGLALMTTTINSLVIAAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQVLCDIWLSVDIICCTCSILHLSAIALDRYRAITDAVYARKRTPKQAGIMITIVWIISVFISMPPLFWR----S-R--DECIIK-------HDHIVSTIYSTFGAFYIPLVLILILYYKIYKAAKTLLRHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNVCEKCKISEEMANFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCQY-----------------------
>OPSD_MESBI
MNGTEGLNFYVPFSNHTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIMLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVLGGFTTTLYTSMHAYFIFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWIMALACAAPPLVGWSRYIPEGMQCSCGVDYYTPSPEVNNESFVVYMFVVHFSIPMVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVVIMVVAFLICWVPYASVAFYIFTHQGSNFGPIFMTIPSFFAKSSAIYNPVIYIMMNKQFRNCMLTTLCCGRNPLGDDEVSTTASKTETSQVAPA
>PI2R_RAT
GRPDGPPSITPESPLIVGGREWQGMAGSCWNITYVQDSVGPATSTLMFVAGVVGNGLALGILGARRRHPSAFAVLVTGLAVTDLLGTCFLSPAVFVAYARNSAHGGTMLCDTFAFAMTFFGLASTLILFAMAVERCLALSHPYYAQLDGPRCARLALPAIYAFCCLFCSLPLLGLGEHQQYCPGSWCFIR---MRSPQPGGCAFSLAYASLMALLVTSIFFCNGSVTLSLCHMRRHFVPTSRAREDEVYHLILLALMTGIMAVCSLPLTIRGFTQAIAPDS--REMGDLHAFRFNAFNPILDPWVFILFRKAVFQRLKFWLCCLCARSVHGDLQTPRRDTLAPDSLQA
>IL8A_PANTR
PQMWDFDDLNFTGMPPTDEGYSPC----RLETETLNKYVVIITYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMNLSLPFFLFRQAYH-NS-SPVCYEV-LGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNNIGRALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHR------------
>CCR5_HUMAN
EDLFWELDRLDNYNDTSLVENHLCPATEGPLMASFKAVFVPVAYSLIFLLGVIGNVLVLVILERHRQTRSSTETFLFHLAVADLLLVFILPFAVAEGSVG--WVLGTFLCKTVIALHKVNFYCSSLLLACIAVDRYLAIVHAVAYRHRRLLSIHITCGTIWLVGFLLALPEILFAKVSQ-NN-LPRCTFSQENQAETHAWFTSRFLYHVAGFLLPMLVMGWCYVGVVHRL--------RQAQRRPQRQKAVRVAILVTSIFFLCWSPYHIVIFLDTLARLKGSLPVAITMCEFLGLAHCCLNPMLYTFAGVKFRSDLSRLLTKLGCTG---PASLC--------PSWR
>CB2R_HUMAN
----MEECWVTEIANGSKDGLDSNPMKDYMILSGPQKTAVAVLCTLLGLLSALENVAVLYLILSSHQRRKPSYLFIGSLAGADFLASVVFACSFVNFHVFHG-VDSKAVFLLKIGSVTMTFTASVGSLLLTAIDRYLCLRYPPYKALLTRGRALVTLGIMWVLSALVSYLPLMGWTC-----CPRPCSEL------FPLIPNDYLLSWLLFIAFLFSGIIYTYGHVLWKAHQHSGHQVPGMARMRLDVRLAKTLGLVLAVLLICWFPVLALMAHSLATTLSDQVKKAFAFCSMLCLINSMVNPVIYALRSGEIRSSAHHCLAHWKKCVRGLGSEAKVTETEADGKITP
>AA2B_HUMAN
------------------------------MLLETQDALYVALELVIAALSVAGNVLVCAAVGTANTLQTPTNYFLVSLAAADVAVGLFAIPFAITISLG--FCTDFYGCLFLACFVLVLTQSSIFSLLAVAVDRYLAICVPLYKSLVTGTRARGVIAVLWVLAFGIGLTPFLGWNSKDT-T-LVKCLFE-----NVVPMSYMVYFNFFGCVLPPLLIMLVIYIKIFLVACRQ---MDHSRTTLQREIHAAKSLAMIVGIFALCWLPVHAVNCVTLFQPAQNKPKWAMNMAILLSHANSVVNPIVYAYRNRDFRYTFHKIISRYLLCQADVKSGNGLGVGL-------
>MSHR_HUMAN
AVQGSQRRLLGSLNSTPTAIPQLGLAANQTGARCLEVSISDGLFLSLGLVSLVENALVVATIAKNRNLHSPMYCFICCLALSDLLVSGTNVLETAVILLLEAAAVLQQLDNVIDVITCSSMLSSLCFLGAIAVDRYISIFYALYHSIVTLPRAPRAVAAIWVASVVFSTLFIAYYY----------------------DDHVAVLLCLVVFFLAMLVLMAVLYVHMLARACQHIARKRQRPVHQGFGLKGAVTLTILLGIFFLCWGPFFLHLTLIVLCPEHGCIFKNFNLFLALIICNAIIDPLIYAFHSQELRRTLKEVLTCSW-----------------------
>ETBR_PIG
SSSPPQMPKGGRMAGPPARTLTPPPCEGPIEIKDTFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINVYKLLAEDWPFGVEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEALGFDMITRI--LRICLLHQKTAFMQFYKTAKDWWLFSFYFCLPLAITAFFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFE-EKQSLEFKANDHGYDNFR
>OAR2_LOCMI
SSAAEEPQDALVGGDACGGRRPPSVLGVRLAVPEWEVAVTAVSLSLIILITIVGNVLVVLSVFTYKPLRIVQNFFIVSLAVADLTVAVLVMPFNVAYSLIQRWVFGIVVCKMWLTCDVLCCTASILNLCAIALDRYWAITDPIYAQKRTLRRVLAMIAGVWLLSGVISSPPLIGWNDW-N-D--TPCQLT--------EEQGYVIYSSLGSFFIPLFIMTIVYVEIFIATKRRPVYEEKQRISLSKERRAARTLGIIMGVFVVCWLPFFLMYVIVPFCNPSKPSPKLVNFITWLGYINSALNPIIYTIFNLDFRRAFKKLLHFKT-----------------------
>MC5R_HUMAN
-MNSSFHLHFLDLNLNATEGNLSGPNVKNKSSPCEDMGIAVEVFLTLGVISLLENILVIGAIVKNKNLHSPMYFFVCSLAVADMLVSMSSAWETITIYLLNNDAFVRHIDNVFDSMICISVVASMCSLLAIAVDRYVTIFYALYHHIMTARRSGAIIAGIWAFCTGCGIVFILYYS----------------------EESTYVILCLISMFFAMLFLLVSLYIHMFLLARTH-RIPGASSARQRTSMQGAVTVTMLLGVFTVCWAPFFLHLTLMLSCPQNSRFMSHFNMYLILIMCNSVMDPLIYAFRSQEMRKTFKEIICCRGFRIACSFPRRD------------
>5H6_RAT
-----------MVPEPGPVNSSTPAWGPGPPPAPGGSGWVAAALCVVIVLTAAANSLLIVLICTQPARNTS-NFFLVSLFTSDLMVGLVVMPPAMLNALYGRWVLARGLCLLWTAFDVMCCSASILNLCLISLDRYLLILSPLYKLRMTAPRALALILGAWSLAALASFLPLLLGWH---E-APGQCRLL--------ASLPFVLVASGVTFFLPSGAICFTYCRILLAARKQMESRRLATKHSRKALKASLTLGILLGMFFVTWLPFFVANIAQAVCDC--ISPGLFDVLTWLGYCNSTMNPIIYPLFMRDFKRALGRFLPCVHCPPEHRPALPPAVPDQASACSRC
>5H1D_CAVPO
SPPNQSEEGLPQEASNRSLNATETPGDWDPGLLQALKVSLVVVLSIITLATVLSNAFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTTTRTWNFGQILCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAGAMIAAVWVISICISIPPLFWR----Q-E--SDCLVN-------TSQISYTIYSTCGAFYIPSVLLIILYSRIYRAARSRKLALERKRISAARERKATKTLGIILGAFIVCWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPIIYTVFNEDFRQAFQKVVHFRKAS---------------------
>NMBR_MOUSE
SNLSFPTEANESELVPEVWEKDFLPDSDGTTAELVIRCVIPSLYLIIISVGLLGNIMLVKIFLTNSAMRNVPNIFISNLAAGDLLLLLTCVPVDASRYFFDEWVFGKLGCKLIPAIQLTSVGVSVFTLTALSADRYRAIVNPMMQTSGVLLWTSLKAVGIWVVSVLLAVPEAVFSEVARSS--FTACIPY---QTDELHPKIHSVLIFLVYFLIPLVIISIYYYHIAKTLIKSLPGNEHTKKQMETRKRLAKIVLVFVGCFVFCWFPNHVLYLYRSFNYKELGHMIVTLVARVLSFSNSCVNPFALYLLSESFRKHFNSQLCCGRKSYPERSTSYLMTSLKSNTKNVV
>OPSD_SHEEP
MNGTEGPNFYVPFSNKTGVVRSPFEAPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLVGWSRYIPQGMQCSCGALYFTLKPEINNESFVIYMFVVHFSIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWLPYAGVAFYIFTHQGSDFGPIFMTIPAFFAKSSSVYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTVSKTETSQVAPA
>OPSD_GALML
MNGTEGENFYVPMSNKTGVVRNPFEYPQYYLADHWMFAVLAAYMFFLIITGFPVNFLTLFVTIQNKKLRQPLNYILLNLAVANLFMVFGGFTTTLITSMNGYFVFGSTGCNLEGFFATLGGEISLWSLVVLAIERYVVVCKPMSNFRFGSQHAIAGVSLTWVMAMACAAPPLVGWSRYIPEGLQCSCGIDYYTPKPEINNVSFVIYMFVVHFSIPLTIIFFCYGRLVCTVKAAAAQQQESETTQRAEREVTRMVVIMVIGFLICWLPYASVALYIFNNQGSEFGPVFMTIPSFFAKSSALYNPLIYILMNKQFRNCMITTLCCGKNPFEEEESTSASSVSSSQVSPAA
>A1AA_RAT
----------MVLLSENASEGSNCT-HPPAPVNISKAILLGVILGGLIIFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEILGYWAFGRVFCNIWAAVDVLCCTASIMGLCIISIDRYIGVSYPLYPTIVTQRRGVRALLCVWVLSLVISIGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYVPLAIILVMYCRVYVVAKREAKNFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFKPSETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLRRRQSSKHALSQ----------
>GHSR_RAT
SEEPEP-NVTLDLDWDASPGNDSLPDELLPLFPAPLLAGVTATCVALFVVGISGNLLTMLVVSRFRELRTTTNLYLSSMAFSDLLIFLCMPLDLVRLWQYRPWNFGDLLCKLFQFVSESCTYATVLTITALSVERYFAICFPLAKVVVTKGRVKLVILVIWAVAFCSAGPIFVLVG---T-D-TNECRAT---FAVRSGLLTVMVWVSSVFFFLPVFCLTVLYSLIGRKLWRRD--AVGASLRDQNHKQTVKMLAVVVFAFILCWLPFHVGRYLFSKSFEPQISQYCNLVSFVLFYLSAAINPILYNIMSKKYRVAVFKLLGFESFSQRKLSTLKDKSSINT------
>O1E2_HUMAN
-------------MMGQNQTSISDFLLLGLPIQPEQQNLCYALFLAMYLTTLLGNLLIIVLIRLDSHLHTPVYLFLSNLSFSDLCFSSVTMPKLLQNMQNQDPSIPYADCLTQMYFFLYFSDLESFLLVAMAYDRYVAICFPMYTAIMSPMLCLSVVALSWVLTTFHAMLHTLLMARL---CPHFFCDMSKLACSDTRVNEWVIFIMGGLILVIPFLLILGSYARIVSSI--------LKVPSSKGICKAFSTCGSHLSVVSLFYGTVIGLYLCPSANS---STLKDTVMAMMYTVVTPMLTPFIYSLRNRDMKGALERVICKRKNPFLL------------------
>OPSD_SARSL
MNGTEGPYFYVPMVNTSGIVRSPYEYPQYYLVNPAAYARLGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGLAFTWLMALACAAPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFTVPLMVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIMMVVAFLVCWLPYASVAWWIFTHQGSEFGPVFMTIPAFFAKSSSIYNPMIYICLNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>DOP2_DROME
SATSATLSPAMVATGGGGTTTPEPDLSEFLEALPNDRVGLLAFLFLFSFATVFGNSLVILAVIRERYLHTATNYFITSLAVADCLVGLVVMPFSALYEVLENWFFGTDWCDIWRSLDVLFSTASILNLCVISLDRYWAITDPFYPMRMTVKRAAGLIAAVWICSSAISFPAIVWWRAARP----YKCTFT--------EHLGYLVFSSTISFYLPLLVMVFTYCRIYRAAVIQMGKLSRKLAKFAKEKKAAKTLGIVMGVFIICWLPFFVVNLLSGFCIECEHEEIVSAIVTWLGWINSCMNPVIYACWSRDFRRAFVRLLCMCCPRKIRRKYQPTFATRRCYSTCSL
>GASR_RABIT
VASLCRPGGPLLNNSGTGNLSCEPPRIRGAGTRELELAIRVTLYAVIFLMSVGGNILIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTAVQP---V-LQCVHR---WPSARVRQTWSVLLLLLLFFVPGVVMAVAYGLISRELYLGSGPPRPAQAKLLAKKRVVRMLLVIVVLFFMCWLPVYSANTWRAFDGPGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLDTCARCCPRPPRARPRPLPSIASLSRLSYT
>IL8B_CANFA
G--DIDNYTYNTEMPIIPADSAPC----RPESLDINKYAVVVIYVLVFVLNLLGNSLVIMVVLYSRVSHSVTDVYLLNLAIADLLFALTLPIWAVSKVKG--WIFGTPLCKIVSLLKEVNFYSGILLLASISMDRYLAIVHATRRLTQKKHWVKFICLGIWALSLILSLPIFVFRRAIN-YS-SPVCYED-MGTNTTKLRIVMRALPQTFGFIVPLMIMLFCYGLTLRTL---------FEAHMGQKHRAMRVIFAVVLVFLLCWLPYNLVADTLMRLQAINDIGRALDATEILGFFHSCLNPLIYAFIGQKFRHGLLKIMAFHGLISKEYLPKDS------------
>KI01_RAT
---------------MDNTTTTEPPKQPCTRNTLITQQIIPMLYCVVFITGVLLNGISGWIFFYVPS-SKSFIIYLKNIVVADFLMGLTFPFKVLSDSGLGPWQLNVFVFRVSAVIFYVNMYVSIAFFGLISFDRYYKIVKPLVSIVQSVNYSKVLSVLVWVLMLLLAVPNIILTNQS-TN---IQCMEL--KNELGRKWHKASNYVFVSIFWIVFLLLTVFYMAITRKIFK----LKSRKNSISVKRKSSRNIFSIVLAFVACFAPYHVARIPYTKSQTEETLLYTKEFTLLLSAANVCLDPISISSYASRLEKS--------------------------------
>O18793
RSRFIRNTNGSGEEVTTFFDYDYGAPCHKFDVKQIGAQLLPPLYSLVFIFGFVGNMLVVLILINCKKLKSLTDIYLLNLAISDLLFLITLPLWAHSAANE--WVFGNAMCKLFTGLYHIGYLGGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWLVAVFASVPGIIFTKCQE-ED-VYICGPY----FPRGWNNFHTIMRNILGLVLPLLIMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWTPYNIVILLNTFQEFFRQLDQATQVTETLGMTHCCINPIIYAFVGEKFRRYLSMFFRKYITKRFCKQCPVFVTSTNTPSTAEQ
>NY1R_HUMAN
TLFSQVENHSVHSNFSEKNAQLLAFENDDCHLPLAMIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWVFGEAMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYVGIAVIWVLAVASSLPFLIYQVMTDKD--KYVCFDQ---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMMRDNKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK
>B2AR_FELCA
----MGQPGNRSVFLLAPNGSHAPDQDGTQERNDAWVVGMGIVMSLIVLAIVFGNVLVITAIARFERLQTVTNYFITSLACADLVMGLAVVPFGASHILMKMWTFGNFWCEFWTSIDVLCVTASIETLCVIAVDRYFAITSPFYQSLLTKNKARVVILMVWIVSGLTSFLPIQMHWYRAI-N--TCCDFF--------TNQAYAIASSIVSFYLPLVVMVFVYSRVFQVAQRQDGRHRRASKFCLKEHKALKTLGIIMGTFTLCWLPFFIVNIVHVIQDNL-IPKEVYILLNWVGYVNSAFNPLIYCRSP-DFRIAFQELLCLRRSSLKAYGNGYSDYAGEHSGGPLG
>OLF1_RAT
-------------MTEENQTVISQFLLLFLPIPSEHQHVFYALFLSMYLTTVLGNLIIIILIHLDSHLHTPMYLFLSNLSFSDLCFSSVTMPKLLQNMQSQVPSIPFAGCLTQLYFYLYFADLESFLLVAMAYDRYVAICFPLYMSIMSPKLCVSLVVLSWVLTTFHAMLHTLLMARL---SPHFFCDISKLSCSDTHVNELVIFVMGGLVIVIPFVLIIVSYARVVASI--------LKVPSVRGIHKIFSTCGSHLSVVSLFYGTIIGLYLCPSANN---STVKETVMAMMYTVVTPMLNPFIYSLRNRDMKEALIRVLCKKKITFCL------------------
>GASR_HUMAN
GASLCRPGAPLLNSSSVGNLSCEPPRIRGAGTRELELAIRITLYAVIFLMSVGGNMLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVIVATWLLSGLLMVPYPVYTVVQP---V-LQCVHR---WPSARVRQTWSVLLLLLLFFIPGVVMAVAYGLISRELYLGPGPSRPTQAKLLAKKRVVRMLLVIVVLFFLCWLPVYSANTWRAFDGPGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLETCARCCPRPPRARPRALPSIASLSRLSYT
>CKR2_MOUSE
FTRSIQELDEGATTPYDYDDGEPC---HKTSVKQIGAWILPPLYSLVFIFGFVGNMLVIIILIGCKKLKSMTDIYLLNLAISDLLFLLTLPFWAHYAANE--WVFGNIMCKVFTGLYHIGYFGGIFFIILLTIDRYLAIVHAVALKARTVTFGVITSVVTWVVAVFASLPGIIFTKSKQ-DD-HYTCGPY----FTQLWKNFQTIMRNILSLILPLLVMVICYSGILHTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLFLTTFQESLKHLDQAMQVTETLGMTHCCINPVIYAFVGEKFRRYLSIFFRKHIAKRLCKQCPVFV-------SSTF
>CKRA_MOUSE
PTEQVSWGLYSGYDEEAYSVGPLPELCYKADVQAFSRAFQPSVSLMVAVLGLAGNGLVLATHLAARRTRSPTSVHLLQLALADLLLALTLPFAAAGALQG--WNLGSTTCRAISGLYSASFHAGFLFLACINADRYVAIARALGQRPSTPSRAHLVSVFVWLLSLFLALPALLFS--RD-RE-QRRCRLIFPESLTQTVKGASAVAQVVLGFALPLGVMAACYALLGRTL---------LAARGPERRRALRVVVALVVAFVVLQLPYSLALLLDTADLLAKRKDLALLVTGGLTLVRCSLNPVLYAFLGLRFRRDLRRLLQGGGCSPKPNPRGRCSCSAPTETHSLS
>OPSD_GOBNI
MNGTEGPFFYIPMVNTTGVVRSPYEYPQYYLVNPAAYACLGAYMFFLILVGFPVNFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNIEGFFATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMASACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFTVHFCIPLAVVGFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVIGFLVCWLPYASVAWYIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTVSSSSVSPAA--
>AG2R_PIG
------MILNSSTEDSIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFH-YESQNSTLPVGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICLAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSE-------N
>BONZ_MACMU
-----MAEYDHYEDDGFLNSFNDSSQEEHQDFLQFRKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WIFGQVMCKTLLGVYTINFYTSMLILTCITVDRFIVVVKATNQQAKRMTWGKVICLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DEEISTVVLATQMTLGFFLPLLAMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQTPFNLVKLIRSTHWEYTSFHYTIIVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT
>PE24_RABIT
--------------------MSTPVANASASSMPELLNNPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGGQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGRSRLQFPDTWCFID---WRTNVTAHAAFSYMYAGFSSFLILATVLCNVLVCGALLRMAAAAASFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPDEISQNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSRRDRSRRTSSAMSTHSR
>FSHR_MACFA
FDMTYAEFDYDLCNEVVDVTCSPKPDAFNPCEDILGYNILRVLIWFISILAITGNIIVLVTLTTSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLDCKVHVRHAASVMVMGWIFAFAAALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NIVSSSSDTRIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKAKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEMQAQIYRTNSHPRNGHCSSA
>ACM1_DROME
-------------MYGNQTNGTIGFETKGPRYSLASMVVMGFVAAILSTVTVAGNVMVMISFKIDKQLQTISNYFLFSLAIADFAIGTISMPLFAVTTILGYWPLGPIVCDTWLALDYLASNASVLNLLIINFDRYFSVTRPLYRAKGTTNRAAVMIG-AWGISLLLWPPWIYSWPYIE-VP-KDECYIQ-----FIETNQYITFGTALAAFYFPVTIMCFLYWRIWRETKKRNPNKKKKSQEKRQESKAAKTLSAILLSFIITWTPYNILVLIKPLTTCSCIPTELWDFFYALCYINSTINPMSYALCNATFRRTYVRILTCKWHTRNREGMVRG------------
>BRS3_MOUSE
QTLISITNDTETSSSVVSNDTTHKGWTGDNSPGIEALCAIYITYAGIISVGILGNAILIKVFFKTKSMQTVPNIFITSLAFGDLLLLLTCVPVDATHYLAEGWLFGKVGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPPNAILKTCAKAGGIWIVSMIFALPEAIFSNVYTVT--FESCNSY---ISERLLQEIHSLLCFLVFYIIPLSIISVYYSLIARTLYKSIPTQSHARKQIESRKRIAKTVLVLVALFALCWLPNHLLYLYHSFTYESDVPFVIIIFSRVLAFSNSCVNPFALYWLSKTFQQHFKAQLCCLKAEQPEPPLGDIMGRVPATGSAHV
>OPSD_MUGCE
MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILLGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAVERWMVVCKPISNFRFGENHAIMGLAFTWVMASACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLICWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>C5AR_HUMAN
SFNYTTPDYGHYDDKDTLDLNTPVD--KTSNTLRVPDILALVIFAVVFLVGVLGNALVVWVTAFEAK-RTINAIWFLNLAVADFLSCLALPILFTSIVQHHHWPFGGAACSILPSLILLNMYASILLLATISADRFLLVFKPICQNFRGAGLAWIACAVAWGLALLLTIPSFLYRVVRE-PP--VLCGVD-YS-HDKRRERAVAIVRLVLGFLWPLLTLTICYTFILLRT---------WSRRATRSTKTLKVVVAVVASFFIFWLPYQVTGIMMSFLEPSLLLNKLDSLCVSFAYINCCINPIIYVVAGQGFQGRLRKSLPSLLRNVLTEESVVRSTVD--------
>MAS_RAT
------MDQSNMTSFAEEKAMNTSSRNASLGTSHPPIPIVHWVIMSISPLGFVENGILLWFLCFRMR-RNPFTVYITHLSIADISLLFCIFILSIDYALDYESSGHYYTIVTLSVTFLFGYNTGLYLLTAISVERCLSVLYPIYRCHRPKHQSAFVCALLWALSCLVTTMEYVMCI-DSHS--QSDC-----------RAVIIFIAILSFLVFTPLMLVSSTILVVKIRK----------NTWASHSSKLYIVIMVTIIIFLIFAMPMRVLYLLYYEYW--STFGNLHNISLLFSTINSSANPFIYFFVGSSKKKRFRESLKVVLTRAFKDEMQPRTVSIETVV----
>GPRC_RAT
NLSGLPRDCIEAGTPENISAAVPSQGSVVESEPELVVNPWDIVLCSSGTLICCENAVVVLIIFHSPSLRAPMFLLIGSLALADLLAG-LGLIINFVFA-Y--LLQSEATKLVTIGLIVASFSASVCSLLAITVDRYLSLYYALYHSERTVTFTYVMLVMLWGTSTCLGLLPVMGWNCL-R--DESTCSVV-------RPLTKNNAAILSISFLFMFALMLQLYIQICKIVMRHIALHFLATSHYVTTRKGISTLALILGTFAACWMPFTLYSLIADY----TYPSIYTYATLLPATYNSIINPVIYAFRNQEIQKALCLICCGCIPNTLSQRARSP------------
>V1AR_HUMAN
SSPWWPLATGAGNTSREAEALGEGNGPPRDVRNEELAKLEIAVLAVTFAVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQMCWDITYRFRGPDWLCRVVKHLQVFGMFASAYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIAAAWVLSFVLSTPQYFVFSMIETK--ARDCWAT---FIQPWGSRAYVTWMTGGIFVAPVVILGTCYGFICYNIWCNFLLVSSVKSISRAKIRTVKMTFVIVTAYIVCWAPFFIIQMWSVWDPMSESENPTITITALLGSLNSCCNPWIYMFFSGHLLQDCVQSFPCCQNMKEKFNKEDTTFYSNNRSPTNS
>A1AA_HUMAN
----------MVFLSGNASDSSNCT-QPPAPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEVLGYWAFGRVFCNIWAAVDVLCCTASIMGLCIISIDRYIGVSYPLYPTIVTQRRGLMALLCVWALSLVISIGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYLPLAIILVMYCRVYVVAKREAKTFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFKPSETVFKIVFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLCRKQSSKHALSQ----------
>CML2_RAT
PSNSTPLALNLSLALREDAPGNLTGDLSEHQQYVIALFLSCLYTIFLFPIGFVGNILILVVNISFREKMTIPDLYFINLAAADLILVADSLIEVFNLDEQ--YYDIAVLCTFMSLFLQINMYSSVFFLTWMSFDRYLALAKAMCGLFRTKHHARLSCGLIWMASVSATLVPFTAVHLR-A----CFCFA---------DVREVQWLEVTLGFIVPFAIIGLCYSLIVRALIR----AHRHRGLRPRRQKALRMIFAVVLVFFICWLPENVFISVHLLQWAQHAYPLTGHIVNLAAFSNSCLSPLIYSFLGETFRDKLRLYVAQKTSLPALNRFCHADSTEQSDVKFSS
>GPRF_HUMAN
----MDPEETSVYLDYYYATSPNSDIRETHSHVPYTSVFLPVFYTAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVLLLTCMSVDRYLAIVWPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPIKLIWSLVALIFTFFVPLLSIVTCYCCIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTFKFLAIVSGLRAILQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFA
>CKR5_TRAFR
----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>GP68_HUMAN
----------------MGNITADNSSMSCTIDHTIHQTLAPVVYVTVLVVGFPANCLSLYFGYLQIKARNELGVYLCNLTVADLFYICSLPFWLQYVLQHDNWSHGDLSCQVCGILLYENIYISVGFLCCISVDRYLAVAHPFFHQFRTLKAAVGVSVVIWAKELLTSIYFLMHEEV--QH---RVCFEHPIQAWQRAIQRAINYYRFLVGFLFPICLLLASYQGILRAVRR------SHGTQKSRKDQIQRLVLSTVVIFLACFLPYHVLLLVRSVWEASKGVFNAYHFSLLLTSFNCVADPVLYCFVSETTHRDLARLRGACLAFLTCSRTGRAAPEASGKSGAQG
>YYI3_CAEEL
SDPNAEDLYITMTPSVSTENDTTVWATEEPAAIVWRHPLLAIALFSICLLTVAGNCLVVIAVCTKKYIWVTRLYLIISLAIADLIVGVIVMPMNSLFEIANHWLFGLMMCDVFHAMDILASTASIWNLCVISLDRYMAGQDPIYRDKVSKRRILMAILSVWVLSAILSFPGIIWWWRTSP-H--SQCLFT--------DSKMYVSFSSLVSFYIPLFLILFAYGKVYIIATRHKLNKSRQMMRYVHEQRAARTLSIVVGAFILCWTPFFVFTPLTAFCESCSNKETIFTFVTWAGHLNSMLNPLIYSRFSRDFRRAFKQILTCQRQQKVKTAFKTPLISVTQMAPRFS
>CKR1_MACMU
METPNTTEDYDMITEFDYGDATPC---HKVNERAILAQLLPPLYSLVFVIGVVGNLLVVLVLVQYKRLKNMTNIYLLNLAISDLLFLFTLPFLIYYKSTDD-WIFGDAMCKILSGFYYTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIIIWALAILASSPLMYFSKTQW-NI-RHSCNIHFPYESFQQWKLFQALKLNLFGLVLPLLVMIVCYTGIIKIL---------LRRPNEKKSKAVRLIFVIMIIFFLFWTPYNLTELISVFQEFLRQLDLAMEVTEVIANMHCCVNPVIYAFAGERFRKYLRQLFHRRVAVHLVKWLPFLV-------SSTS
>BRB2_RAT
IEMFNITTQALGSAHNGTFSEVNC---PDTEWWSWLNAIQAPFLWVLFLLAALENIFVLSVFCLHKTNCTVAEIYLGNLAAADLILACGLPFWAITIANNFDWLFGEVLCRVVNTMIYMNLYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWSCTLLLSSPMLVFRTMKD-HN--TACVIV---YPSRSWEVFTNMLLNLVGFLLPLSIITFCTVRIMQVLRNN---EMKKFKEVQTEKKATVLVLAVLGLFVLCWFPFQISTFLDTLLRLGRAVDIVTQISSYVAYSNSCLNPLVYVIVGKRFRKKSREVYQAICRKGGCMGESVQLRTS-------I
>A1AD_RAT
TGSGEDNQSSTGEPGAAASGEVNGSAAVGGLVVSAQGVGVGVFLAAFILTAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSAAVLPFSATMEVLGFWAFGRTFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWAVALVVSVGPLLGWKEP--VP--RFCGIT--------EEVGYAIFSSVCSFYLPMAVIVVMYCRVYVVARSTHTLLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRR--LWAASTGD-ARSDCA
>TSHR_BOVIN
LQAFDSHYDYTVCGGSEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLVILLTSHYKLTVPRFLMCNLAFADFCMGLYLLLIASVDLYTQSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWHAITFAMLDRKIRLWHAYVIMLGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIILVLLLNIIAFIIVCACYVKIYITVRNP------HYNPGDKDTRIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFMLLSKFGICKRQAQAYRGSTGIRVQKVPPD
>MC3R_HUMAN
NASCCLPSVQPTLPNGSEHLQAPFFSNQSSSAFCEQVFIKPEIFLSLGIVSLLENILVILAVVRNGNLHSPMYFFLCSLAVADMLVSVSNALETIMIAIVHSDQFIQHMDNIFDSMICISLVASICNLLAIAVDRYVTIFYALYHSIMTVRKALTLIVAIWVCCGVCGVVFIVYYS----------------------EESKMVIVCLITMFFAMMLLMGTLYVHMFLFARLHIAAADGVAPQQHSCMKGAVTITILLGVFIFCWAPFFLHLVLIITCPTNICYTAHFNTYLVLIMCNSVIDPLIYAFRSLELRNTFREILCGCNGMNLG------------------
>MSHR_SHEEP
PVLGSQRRLLGSLNCTPPATLPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICSSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSVLSITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>CKR1_MOUSE
MEISDFTEAYPTTTEFDYGDSTPC---QKTAVRAFGAGLLPPLYSLVFIIGVVGNVLMILVLMQHRRLQSMTSIYLFNLAVSDLVFLFTLPFWIDYKLKDD-WIFGDAMCKLLSGFYYLGLYSEIFFIILLTIDRYLAIVHAVALRARTVTLGIITSIITWALAILASMPALYFFKAQW-EF-HRTCSPHFPYKSLKQWKRFQALKLNLLGLILPLLVMIICYAGIIRIL---------LRRPSEKKVKAVRLIFAITLLFFLLWTPYNLSVFVSAFQDVLKHLDLAMQVTEVIAYTHCCVNPIIYVFVGERFWKYLRQLFQRHVAIPLAKWLPFLT-------SSIS
>A2AA_BOVIN
----MGSLQPDAGNASWNGTEAPGGGARATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISFEKKRP-S--PRCEIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRSGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAIGCP--VPPTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------
>SSR3_MOUSE
SEPMTLDPGNTSSTWPLDTTLGNTSAGASLTGLAVSGILISLVYLVVCVVGLLGNSLVIYVVLRHTSSPSVTSVYILNLALADELFMLGLPFLAAQNALSY-WPFGSLMCRLVMAVDGINQFTSIFCLTVMSVDRYLAVVHPTSARWRTAPVARTVSRAVWVASAVVVLPVVVFSGVPR-----STCHMQ-WPEPAAAWRTAFIIYMAALGFFGPLLVICLCYLLIVVKVRSTSCQAPACQRRRRSERRVTRMVVAVVALFVLCWMPFYLLNIVNVVCPLPPAFFGLYFLVVALPYANSCANPILYGFLSYRFKQGFRRILLRPSRRIRSQEPGSGEEDEEEEERREE
>P2YR_BOVIN
TAFLADPGSPWGNSTVTSTAAVASPFKCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAVYISVLVWLIVVVGISPILFYSGTG----KTITCYDT-TSDEYLRSYFIYSMCTTVAMFCVPLVLILGCYGLIVRALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLSEFKQNGDTSL
>5H5B_RAT
TPGIAFPPGPESCSDSPSSGRGGLILSGREPPFSAFTVLVVTLLVLLIAATFLWNLLVLVTILRVRAFHRVPHNLVASTAVSDVLVAALVMPLSLVSELSAGWQLGRSLCHVWISFDVLCCTASIWNVAAIALDRYWTITRHLYTLRTRRRASALMIAITWALSALIALAPLLFGWGE-D-A-LQRCQVS--------QEPSYAVFSTCGAFYVPLAVVLFVYWKIYKAAKFRATVTSGDSWREQKEKRAAMMVGILIGVFVLCWIPFFLTELVSPLCAC-SLPPIWKSIFLWLGYSNSFFNPLIYTAFNKNYNNAFKSLFTKQR-----------------------
>OPSD_CHICK
MNGTEGQDFYVPMSNKTGVVRSPFEYPQYYLAEPWKFSALAAYMFMLILLGFPVNFLTLYVTIQHKKLRTPLNYILLNLVVADLFMVFGGFTTTMYTSMNGYFVFGVTGCYIEGFFATLGGEIALWSLVVLAVERYVVVCKPMSNFRFGENHAIMGVAFSWIMAMACAAPPLFGWSRYIPEGMQCSCGIDYYTLKPEINNESFVIYMFVVHFMIPLAVIFFCYGNLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTNQGSDFGPIFMTIPAFFAKSSAIYNPVIYIVMNKQFRNCMITTLCCGKNPLGDEDTSAGTSSVSTSQVSPA
>B3AR_MOUSE
MAPWPHRNGSLALWSDAPTLDPSAAN---TSGLPGVPWAAALAGALLALATVGGNLLVIIAIARTPRLQTITNVFVTSLAAADLVVGLLVMPPGATLALTGHWPLGETGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGTLVTKRRARAAVVLVWIVSAAVSFAPIMSQWWRVQ-E--RCCSFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVAKRQGVPRRPARLLPLREHRALRTLGLIMGIFSLCWLPFFLANVLRALAGPS-VPSGVFIALNWLGYANSAFNPVIYCRSP-DFRDAFRRLLCSYGGRGPEEPRAVTARQSPPLNRFDG
>SSR1_RAT
GEGVCSRGPGSGAADGMEEPGRNSSQNGTLSEGQGSAILISFIYSVVCLVGLCGNSMVIYVILRYAKMKTATNIYILNLAIADELLMLSVPFLVTSTLLRH-WPFGALLCRLVLSVDAVNMFTSIYCLTVLSVDRYVAVVHPIAARYRRPTVAKVVNLGVWVLSLLVILPIVVFSRTAADG--TVACNML-MPEPAQRWLVGFVLYTFLMGFLLPVGAICLCYVLIIAKMRMVALK-AGWQQRKRSERKITLMVMMVVMVFVICWMPFYVVQLVNVFAEQ--DDATVSQLSVILGYANSCANPILYGFLSDNFKRSFQRILCLS-----WMDNAAETALKSRAYSVED
>AA1R_HUMAN
----------------------------MPPSISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PQTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKMVVTPRRAAVAIAGCWILSFVVGLTPMFGWNNLSN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCHKPSILTYIAIFLTHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPAPPID--PDD---------
>PE23_HUMAN
PFCTRLNHSYTGMWAPERSAEARGNLTRPPGSGEDCGSVSVAFPITMLLTGFVGNALAMLLVSRSYRRKKSFLLCIGWLALTDLVGQLLTTPVVIVVYLSKQIDPSGRLCTFFGLTMTVFGLSSLFIASAMAVERALAIRAPHYASHMKTRATRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNGTSSSHNWGNLFFASAFAFLGLLALTVTFSCNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQTSQKECNFFLIAVRLASLNQILDPWVYLLLRKILLRKFCQIRYHTNNYASSSTSLPCWSDHLER-----
>OLF0_RAT
---------------MNNQTFITQFLLLGLPIPEEHQHLFYALFLVMYLTTILGNLLIIVLVQLDSQLHTPMYLFLSNLSFSDLCFSSVTMPKLLQNMRSQDTSIPYGGCLAQTYFFMVFGDMESFLLVAMAYDRYVAICFPLYTSIMSPKLCTCLVLLLWMLTTSHAMMHTLLAARL---SLNFFCDLFKLACSDTYINELMIFIMSTLLIIIPFFLIVMSYARIISSI--------LKVPSTQGICKVFSTCGSHLSVVSLFYGTIIGLYLCPAGNN---STVKEMVMAMMYTVVTPMLNPFIYSLRNRDMKRALIRVICSMKITL--------------------
>BRB2_RABIT
-MLNITSQVLAPALNGSVSQSSGC---PNTEWSGWLNVIQAPFLWVLFVLATLENLFVLSVFCLHKSSCTVAEVYLGNLAAADLILACGLPFWAVTIANHFDWLFGEALCRVVNTMIYMNLYSSICFLMLVSIDRYLALVKTMIGRMRRVRWAKLYSLVIWGCTLLLSSPMLVFRTMKD-YN--TACIID---YPSRSWEVFTNVLLNLVGFLLPLSVITFCTVQILQVLRNN---EMQKFKEIQTERRATVLVLAVLLLFVVCWLPFQVSTFLDTLLKLGHVIDVITQVGSFMGYSNSCLNPLVYVIVGKRFRKKSREVYRAACPKAGCVLEPVQLRTS-------I
>CCKR_MOUSE
DSLLMNGSNITPPCELGLENETLFCLDQPQPSKEWQSAVQILLYSFIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICRPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTANMCRFL---LPSDAMQQSWQTFLLLILFLIPGVVMVVAYGLISLELYQGRINSSGSAANLIAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTVSHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGPTGVRGEVTIRASLSRYSYS
>ACM3_RAT
N---------ISQETGNFSS-NDTSSDPLGGHTIWQVVFIAFLTGFLALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFVLWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKTLLLCQCDKRKRRKQQYQHKRVPEQAL---
>5H1E_HUMAN
-------------MNITNCT--TEASMAIRPKTITEKMLICMTLVVITTLTTLLNLAVIMAIGTTKKLHQPANYLICSLAVTDLLVAVLVMPLSIIYIVMDRWKLGYFLCEVWLSVDMTCCTCSILHLCVIALDRYWAITNAIYARKRTAKRAALMILTVWTISIFISMPPLFWRS---S-P--SQCTIQ-------HDHVIYTIYSTLGAFYIPLTLILILYYRIYHAAKSLNDLGERQQISSTRERKAARILGLILGAFILSWLPFFIKELIVGLSIYT-VSSEVADFLTWLGYVNSLINPLLYTSFNEDFKLAFKKLIRCREHT---------------------
>OPS1_CALVI
ALT---NGSVTDKVTPDMAHLVHPYWNQFPAMEPKWAKFLAAYMVLIATISWCGNGVVIYIFSTTKSLRTPANLLVINLAISDFGIMITNTPMMGINLFYETWVLGPLMCDIYGGLGSAFGCSSILSMCMISLDRYNVIVKGMAGQPMTIKLAIMKIALIWFMASIWTLAPVFGWSRYVPEGNLTSCGID--YLERDWNPRSYLIFYSIFVYYLPLFLICYSYWFIIAAVSAHMNVRSSEDADKSAEGKLAKVALVTISLWFMAWTPYTIINTLGLFKYEG-LTPLNTIWGACFAKSAACYNPIVYGISHPKYGIALKEKCPCCVFGKVDDGK-ASNNESETKA----
>V1BR_RAT
---MNSEPSWTATPSPGGTLPVPNATTPWLGRDEELAKVEIGILATVLVLATGGNLAVLLTLGRHGHKRSRMHLFVLHLALTDLGVALFQVLPQLLWDITYRFQGSDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLRQPSQSTYPLIAAPWLLAAILSLPQVFIFSLRESG--VLDCWAD---FYFSWGPRAYITWTTMAIFVLPVAVLSACYGLICHEIYKNRGLVSSISTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDENADSTNVAFTISMLLGNLSSCCNPWIYMGFNSRLLPRSLSHHACCTGSKPQVHRQLSRTTLLTHACGSP
>OPSD_SARPI
MNGTEGPFFYIPMSNATGLVRSPYDYPQYYLVPPWGYACLAAYMFLLILTGFPVNFLTLYVTIEHKKLRSPLNYILLNLAVADLFMVIGGFTTTMWTSLNGYFVFGRMGCNIEGFFATLGGEIALWSLVVLSMERWIVVCKPISNFRFGENHAVMGVAFSWFMAAACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVVHFTCPLTIITFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVIIMFVAFLACWVPYASVAWYIFTHQGSEFGPVFMTIPAFFAKSSAVYNPVIYICLNKQFRHCMITTLCCGKNPFEEEEGSTTSVCSVSPAA---
>V1BR_MOUSE
---MDSEPSWTATPSPGGTLFVPNTTTPWLGRDEELAKVEIGILATVLVLATGGNLAVLLILGLQGHKRSRMHLFVLHLALTDLGVALFQVLPQLLWDITYRFQGSDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLQQPSQSTYPLIAAPWLLAAILSLPQVFIFSLRESG--VLDCWAD---FYFSWGPRAYITWTTMAIFVLPVVVLTACYGLICHEIYKNATLVSSISTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDENADSTNVAFTISMLLGNLSSCCNPWIYMGFNSHLLPRSLSHRACCRGSKPRVHRQLSRTTLLTHTCGPS
>CCKR_CAVPO
DSLFVNGSNITSACELGFENETLFCLDRPRPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPSLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICKPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTGNMCRFL---LPNDVMQQTWHTFLLLILFLIPGIVMMVAYGLISLELYQGRINSSSSTANLMAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTVSHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGTPGVRGEMTTGASLSRYSYS
>OPRM_RAT
LSHVDGNQSDPCGLNRTGLGGNDSLCPQTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIVNVCNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSTRVPSTANTVDRTNH
>FMLR_RABIT
-----------MDSNASLPLNVSGGTQATPAGLVVLDVFSYLILVVTFVLGVLGNGLVIWVTGFRMT-HTVTTISYLNLALADFSFTSTLPFFIVTKALGGHWPFGWFLCKFVFTIVDINLFGSVFLIALIALDRCICVLHPVAQNHRNVSLAKKVIVGPWICALLLTLPVIIRVTTLSSPWPAEKLKVA------ISMFMVRGIIRFIIGFSTPMSIVAVCYGLIATKI---------HRQGLIKSSRPLRVLSFVVASFLLCWSPYQIAALIATVRIREKDLRIVLDVTSFVAFFNSCLNPMLYVFMGQDFRERLIHSLPASLERALSEDSAQTTS----------
>OPSP_COLLI
----MDPTNSPQEPPHTSTPGPFDGPQWPHQAPRGMYLSVAVLMGIVVISASVVNGLVIVVSIRYKKLRSPLNYILVNLAMADLLVTLCGSSVSFSNNINGFFVFGKRLCELEGFMVSLTGIVGLWSLAILALERYVVVCRPLGDFRFQHRHAVTGCAFTWVWSLLWTTPPLLGWSSYVPEGLRTSCGPN--WYTGGSNNNSYILTLFVTCFVMPLSLILFSYANLLMTLRAAAAQQQESDTTQQAERQVTRMVVAMVMAFLICWLPYTTFALVVATNKDIAIQPALASLPSYFSKTATVYNPIIYVFMNKQFQSCLLKMLCCGHHPRGTGRTAPAGLRNKVTPSHPV
>O1D2_HUMAN
-------------MDGGNQSEGSEFLLLGMSESPEQQQILFWMFLSMYLVTVVGNVLIILAISSDSRLHTPVYFFLANLSFTDLFFVTNTIPKMLVNLQSHNKAISYAGCLTQLYFLVSLVALDNLILAVMAYDRYVAICCPLYTTAMSPKLCILLLSLCWVLSVLYGLIHTLLMTRV---THYIFCEMYRMACSNIQINHTVLIATGCFIFLIPFGFVIISYVLIIRAI--------LRIPSVSKKYKAFSTCASHLGAVSLFYGTLCMVYLKPLHTY----SVKDSVATVMYAVVTPMMNPFIYSLRNKDMHGALGRLLDKHFKRLT-------------------
>IL8A_HUMAN
PQMWDFDDLNFTGMPPADEDYSPC----MLETETLNKYVVIIAYALVFLLSLLGNSLVMLVILYSRVGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMNLSLPFFLFRQAYH-NS-SPVCYEV-LGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNNIGRALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHR------------
>EBP2_HUMAN
NPDKDGGTPDSGQELRGNLTGQIQNPLYPVTESSYSAYAIMLLALVVFAVGIVGNLSVMCIVWHSYYLKSAWNSILASLALWDFLVLFFCLPIVIFNEITKQRLLGDVSCRAVPFMEVSSLGVTTFSLCALGIDRFHVATSTLVRPIERCQSILAKLAVIWVGSMTLAVPELLLWQLAQM----KPSASLSLYSLVMTYQNARMWWYFGCYFCLPILFTVTCQLVTWRVRGPPRKS-CRASKHEQCESQLNSTVVGLTVVYAFCTLPENVCNIVVAYLSTEQTLDLLGLINQFSTFFKGAITPVLLLCICRPLGQAFLDCCCCCCCEECGGASEASKLKTEVSSSIYF
>5H5B_MOUSE
TPGLAFPPGPESCSDSPSSGRGGLILPGREPPFSAFTVLVVTLLVLLIAATFLWNLLVLVTILRVRAFHRVPHNLVASTAVSDVLVAVLVMPLSLVSELSAGWQLGRSLCHVWISFDVLCCTASIWNVAAIALDRYWTITRHLYTLRTRSRASALMIAITWALSALIALAPLLFGWGE-D-A-LQRCQVS--------QEPSYAVFSTCGAFYLPLAVVLFVYWKIYKAAKFRATVTSGDSWREQKEKRAAMMVGILIGVFVLCWIPFFLTELISPLCAC-SLPPIWKSIFLWLGYSNSFFNPLIYTAFNKNYNNAFKSLFTKQR-----------------------
>V2R_BOVIN
MFMASTTSAVPWHLSQPTPAGNGSEGELLTARDPLLAQAELALLSTVFVAVALSNGLVLGALVRRGRRWAPMHVFIGHLCLADLAVALFQVLPQLAWDATDRFRGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAHRHGGGTHWNRPVLLAWAFSLLFSLPQLFIFAQRDSG--VLDCWAR---FAEPWGLRAYVTWIALMVFVAPALGIAACQVLIFREIHASGCRPAEGARVSAAVAKTVKMTLVIVIVYVLCWAPFFLVQLWAAWDPEA-REGPPFVLLMLLASLNSCTNPWIYASFSSSISSELRSLLCCTWRRAPPSPGPQESFLAKDTPS---
>GRHR_MOUSE
--MANNASLEQDPNHCSAINNSIPLIQGKLPTLTVSGKIRVTVTFFLFLLSTAFNASFLLKLQKWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITQPL-AVQSNSKLEQSMISLAWILSIVFAGPQLYIFRMIYTV--FSQCVTH--SFPQWWHQAFYNFFTFGCLFIIPLLIMLICNAKIIFALTRVPRKNQSKNNIPRARLRTLKMTVAFATSFVVCWTPYYVLGIWYWFDPEMRVSEPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>OLF1_CANFA
--------------MDGNYTLVTEFILLGFPTRPELQIVLFLVFLTLYGIILTGNIGLMMLIRTDPHLQTPMYFFLSNLSFADLCFSSAIVPKMLVNFLSENKSISLYGCALQFYFSCAFADTESFILAAMAYDRYVAICNPLYTVVMSRGICVWLIVLSYIGGNMSSLVHTSFAFIL---KNHFFCDLPKLSCTDTSVNEWLLSTYGSSVEIFCFIVIVISYYFILRSV--------LRIRSSSGRKKTFSTCASHLTSVAIYQGTLLFIYSRPTYLY---TPNTDKIISVFYTIIIPVLNPLIYSLRNKDVKDAAKRAVRLKVDSS--------------------
>GPRY_HUMAN
SVSSWPYSSHRMRFITNHSDQATPNVTTCPMDEKLLSTVLTTSYSVIFIVGLVGNIIALYVFLGIHRKRNSIQIYLLNVAIADLLLIFCLPFRIMYHINQNKWTLGVILCKVVGTLFYMNMYISIILLGFISLDRYIKINRSIQRKAITTKQSIYVCCIVWMLALGGFLTMIILTLKK-----STMCFHY--RDKHNAKGEAIFNFILVVMFWLIFLLIILSYIKIGKNLLRISKR--SKFPNSGKYATTARNSFIVLIIFTICFVPYHAFRFIYISSQLNEIVHKTNEIMLVLSSFNSCLDPVMYFLMSSNIRKIMCQLLFRRFQGEPSRSESTSLHDTSVAVKIQS
>OPSD_RANPI
MNGTEGPNFYIPMSNKTGVVRSPFDYPQYYLAEPWKYSVLAAYMFLLILLGLPINFMTLYVTIQHKKLRTPLNYILLNLGVCNHFMVLCGFTITMYTSLHGYFVFGQTGCYFEGFFATLGGEIALWSLVVLAIERYIVVCKPMSNFRFGENHAMMGVAFTWIMALACAVPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIISFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVIFFLICWVPYAYVAFYIFTHQGSEFGPIFMTVPAFFAKSSAIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDDDASSAATSVSTSQVSPA
>GP38_HUMAN
GSPWNGSDGPEGAREPPWPALPPCDERRCSPFPLGALVPVTAVCLCLFVVGVSGNVVTVMLIGRYRDMRTTTNLYLGSMAVSDLLILLGLPFDLYRLWRSRPWVFGPLLCRLSLYVGEGCTYATLLHMTALSVERYLAICRPLARVLVTRRRVRALIAVLWAVALLSAGPFLFLVG---AEAFSRECRPS----PAQLGALRVMLWVTTAYFFLPFLCLSILYGLIGRELWSSPLR--AASGRERGHRQTVRVLLVVVLAFIICWLPFHVGRIIYINTEDSYFSQYFNIVALQLFYLSASINPILYNLISKKYRAAAFKLLLARKSRPRGFHRSRDDTGGDTVGYTET
>GPRM_HUMAN
PILEINMQSESNITVRDDIDDINTNMYQPLSYPLSFQVSLTGFLMLEIVLGLGSNLTVLVLYCMKSNINSVSNIITMNLHVLDVIICVGCIPLTIVILLLSLESNTALICCFHEACVSFASVSTAINVFAITLDRYDISVKPA-NRILTMGRAVMLMISIWIFSFFSFLIPFIEVNFFS--TKTLLCVST---EYYTELGMYYHLLVQIPIFFFTVVVMLITYTKILQALNIRTISQHEARERRERQKRVFRMSLLIISTFLLCWTPISVLNTTILCLGPSDLLVKLRLCFLVMAYGTTIFHPLLYAFTRQKFQKVLKSKMKKRVVSIVEADPLPNWIDPKRNKKITF
>GPR3_HUMAN
GAGSPLAWLSAGSGNVNVSSVGPAEGPTGPAAPLPSPKAWDVVLCISGTLVSCENALVVAIIVGTPAFRAPMFLLVGSLAVADLLAG-LGLVLHFAAV-F--CIGSAEMSLVLVGVLAMAFTASIGSLLAITVDRYLSLYNALYYSETTVTRTYVMLALVWGGALGLGLLPVLAWNCL-D--GLTTCGVV-------YPLSKNHLVVLAIAFFMVFGIMLQLYAQICRIVCRHIALHLLPASHYVATRKGIATLAVVLGAFAACWLPFTVYCLLGDA----HSPPLYTYLTLLPATYNSMINPIIYAFRNQDVQKVLWAVCCCCSSSKIPFRSRSP------------
>CCKR_RAT
DSLLMNGSNITPPCELGLENETLFCLDQPQPSKEWQSALQILLYSIIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAVSDLMLCLFCMPFNLIPNLLKDFIFGSAVCKTTTYFMGTSVSVSTFNLVAISLERYGAICRPLSRVWQTKSHALKVIAATWCLSFTIMTPYPIYN----NNQTANMCRFL---LPSDAMQQSWQTFLLLILFLLPGIVMVVAYGLISLELYQGRLNSSSSAANLIAKKRVIRMLIVIVVLFFLCWMPIFSANAWRAYDTVSHLSGTPISFILLLSYTSSCVNPIIYCFMNKRFRLGFMATFPCCPNPGPPGVRGEVTIRALLSRYSYS
>RDC1_MOUSE
YAEPGNYSDINWPCNSSDCIVVDTVQCPTMPNKNVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVITTPVWVVSLVQHNQWPMGELTCKITHLIFSINLFGSIFFLACMSVDRYLSITYFTTSSYKKKMVRRVVCILVWLLAFFVSLPDTYYLKAVTNNE--TYCRSFYPEHSIKEWLIGMELVSVILGFAVPFTIIAIFYFLLARAM---------SASGDQEKHSSRKIIFSYVVVFLVCWLPYHFVVLLDIFSILHNVLFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNTK--
>OLF9_RAT
-------------MTRRNQTAISQFFLLGLPFPPEYQHLFYALFLAMYLTTLLGNLIIIILILLDSHLHTPMYLFLSNLSFADLCFSSVTMPKLLQNMQSQVPSIPYAGCLAQIYFFLFFGDLGNFLLVAMAYDRYVAICFPLYMSIMSPKLCVSLVVLSWVLTTFHAMLHTLLMARL---SPHYFCDMSKVACSDTHDNELAIFILGGPIVVLPFLLIIVSYARIVSSI--------FKVPSSQSIHKAFSTCGSHLSVVSLFYGTVIGLYLCPSANN---STVKETVMSLMYTMVTPMLNPFIYSLRNRDIKDALEKIMCKKQIPSFL------------------
>5H1D_RABIT
SPSNQSAEGLPQEAANRSLNATGTPEAWDPGTLQALKISLAVVLSIITVATVLSNTFVLTTILLTRKLHTPANYLIGSLATTDLLVSILVMPISIAYTITHTWNFGQVLCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAAAMIAVVWAISICISIPPLFWR----H-E--SDCLVN-------TSQISYTIYSTCGAFYIPSVLLIVLYGRIYMAARNRKLALERKRISAARERKATKTLGIILGAFIGCWLPFFVASLVLPICRDSWMPPGLFDFFTWLGYLNSLINPIIYTVFNEDFRQAFQRVIHFRKAF---------------------
>ML1C_.ENLA
-----MMEVNSTCLDCRTPGTIRTEQDAQDSASQGLTSALAVVLIFTIVVDVLGNILVILSVLRNKKLQNAGNLFVVSLSIADLVVAVYPYPVILIAIFQNGWTLGNIHCQISGFLMGLSVIGSVFNITAIAINRYCYICHSLYDKLYNQRSTWCYLGLTWILTIIAIVPNFFVGS-LQYDP-IFSCTFA------QTVSSSYTITVVVVHFIVPLSVVTFCYLRIWVLVIQVHR-QDFKQKLTQTDLRNFLTMFVVFVLFAVCWAPLNFIGLAVAINPFHKIPEWLFVLSYFMAYFNSCLNAVIYGVLNQNFRKEYKRILMSLLTPRLLFLDTSRSKPSPAVTNNNQ
>SSR2_PIG
LLNGSQPWLSSPFDLNGSVATANSSNQTEPYYDLTSNAVLTFIYFVVCIIGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVVLTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTDDGERSDLNETTETQRTLL
>NTR1_HUMAN
EEALLAPGFGNASGNASERVLAAPSSELDVNTDIYSKVLVTAVYLALFVVGTVGNTVTAFTLARKKSLQSTVHYHLGSLALSDLLTLLLAMPVELYNFIWVHWAFGDAGCRGYYFLRDACTYATALNVASLSVERYLAICHPFAKTLMSRSRTKKFISAIWLASALLTVPMLFTMGEQN--SGGLVCTPT---IHTATVKVVIQVN-TFMSFIFPMVVISVLNTIIANKLTVMEHSMAIEPGRVQALRHGVRVLRAVVIAFVVCWLPYHVRRLMFCYISDEDFYHYFYMVTNALFYVSSTINPILYNLVSANFRHIFLATLACLCPVWRRRRK-RPSVSSNHTLSSNA
>AG2R_CAVPO
------MILNSSTEDGIKRIQDDC---PKAGRHSYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADICFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCVIIWLMAGLASLPAVIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFMFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSTLSTRPSD-------N
>PE23_RABIT
PFCTRLNHSYPGMWAP----EARGNLTRPPGPGEDCGSVSVAFPITMLITGFVGNALAMLLVSRSYRRKKSFLLCIGWLALTDLVGQLLTSPVVILVYLSKQLDPSGRLCTFFGLTMTVFGLSSLFIASAMAVERALAIRAPHYASHMKTRATRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNGTSSSHNWGNLFFASTFAFLGLLALAITFTCNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNQTSQKECNFFLIAVRLASLNQILDPWVYLLLRKILLRKFCQVIHENNEQKDEIQRENRHEEARDSEKSKT
>CB1A_FUGRU
TDLFGNRNTTRD-ENSIQCGENFMDMECFMILTPSQQLAVAVLSLTLGTFTVLENLVVLCVIFQSRTRCRPSYHFIGSLAVADLLGSVIFVYSFLDFHVFHK-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYRRIVTRTKAVIAFCMMWTISIIIAVLPLLGWNCK-R--LNSVCSDI------FPLIDENYLMFWIGVTSVLVLFIIYAYIYILWKAHHHAEGQTTRPEQTRMDIRLAKTLVLILAVLVICWGPLLAIMVYDLFWKMDDNIKTVFAFCSMLCLLNSTVNPIIYALRSRDLRHAFLSSCHACRGSAQQLDNSLENVN--ISANRAA
>CKR9_HUMAN
DDYGSESTSSMEDYVNFNFTDFYC---EKNNVRQFASHFLPPLYWLVFIVGALGNSLVILVYWYCTRVKTMTDMFLLNLAIADLLFLVTLPFWAIAAADQ--WKFQTFMCKVVNSMYKMNFYSCVLLIMCISVDRYIAIAQAMTWREKRLLYSKMVCFTIWVLAAALCIPEILYSQIKE-G--IAICTMVYPSDESTKLKSAVLTLKVILGFFLPFVVMACCYTIIIHTL---------IQAKKSSKHKALKVTITVLTVFVLSQFPYNCILLVQTIDAYATNIDICFQVTQTIAFFHSCLNPVLYVFVGERFRRDLVKTLKNLGCISQAQWVSFT--------GSLK
>V1BR_HUMAN
---MDSGPLWDANPTPRGTLSAPNATTPWLGRDEELAKVEIGVLATVLVLATGGNLAVLLTLGQLGRKRSRMHLFVLHLALTDLAVALFQVLPQLLWDITYRFQGPDLLCRAVKYLQVLSMFASTYMLLAMTLDRYLAVCHPLRSLQQPGQSTYLLIAAPWLLAAIFSLPQVFIFSLRESG--VLDCWAD---FGFPWGPRAYLTWTTLAIFVLPVTMLTACYSLICHEICKNRGLVSSINTISRAKIRTVKMTFVIVLAYIACWAPFFSVQMWSVWDKNADSTNVAFTISMLLGNLNSCCNPWIYMGFNSHLLPRPLRHLACCGGPQPRMRRRLSHTTLLTRSSCPA
>5H4_RAT
------------------MDRLDANVSSNEGFGSVEKVVLLTFFAMVILMAILGNLLVMVAVCRDRQRKIKTNYFIVSLAFADLLVSVLVNAFGAIELVQDIWFYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVIPMFISFLPIMQGWNNIRK-NSTFCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHSRPDQHSTHRMRTETKAAKTLCVIMGCFCFCWAPFFVTNIVDPFIDYT-VPEKVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYKRPPILGQTINGSTHVLR--
>5HTA_DROME
DSSLFGEMLANRSGHLDLINGTTSKVAEDDFTQLLRMAVTSVLLGLMILVTIIGNVFVIAAIILERNLQNVANYLVASLAVADLFVACLVMPLGAVYEISQGWILGPELCDIWTSCDVLCCTASILHLVAIAVDRYWAVTNI-YIHSRTSNRVFMMIFCVWTAAVIVSLAPQFGWKDPDE-Q--QKCMVS--------QDVSYQVFATCCTFYVPLMVILALYWKIYQTARKRPMQKRKETLEAKRERKAAKTLAIITGAFVVCWLPFFVMALTMPLCAACQISDSVASLFLWLGYFNSTLNPVIYTIFSPEFRQAFKRILFGGHRPVHYRSGKL-------------
>HH1R_CAVPO
-MSFLPGMTPVTLSNFSWALEDRMLEGNSTTTPTRQLMPLVVVLSSVSLVTVALNLLVLYAVRSERKLHTVGNLYIVSLSVADLIVGAVVMPMSILYLHRSAWILGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASATILGAWLLSFLWVIPILGWHHFM--EP-EKKCETD------FYDVTWFKVMTAIINFYLPTLLMLWFYIRIYKAVRRHLRSQYTSGLHLNRERKAAKQLGCIMAAFILCWIPYFVFFMVIAFCKSC-SNEPVHMFTIWLGYLNSTLNPLIYPLCNENFRKTFKRILRIPP-----------------------
>OPSD_PHOGR
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVGLTWVMALACAAPPLVGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGFNFGPIFMTLPAFFAKAAAIYNPVIYIMMNKQFRTCMITTLCCGKNPLGDDEVSASASKTETSQVAPA
>MSHR_CANFA
VWQGPQRRLLGSLNGTSPATPHFELAANQTGPRCLEVSIPNGLFLSLGLVSVVENVLVVAAIAKNRNLHSPMYYFIGCLAVSDLLVSVTNVLETAVMLLVEAAAVVQQLDDIIDVLICGSMVSSLCFLGAIAVDRYLSIFYALYHSIVTLPRAWRAISAIWVASVLSSTLFIAYYY----------------------NNHTAVLLCLVSFFVAMLVLMAVLYVHMLARARQH-IAKRQHSVHQGFGLKGAATLTILLGIFFLCWGPFFLHLSLMVLCPQHGCVFQNFNLFLTLIICNSIIDPFIYAFRSQELRKTLQEVVLCSWA----------------------
>A2AC_HUMAN
AAGPNASGAGERGSGGVANASGASWGPPRGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSLYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKRRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFIYSLYGICREAQVPGPLFKFFFWIGYCNSSLNPVIYTVFNQDFRPSFKHILFRRRRRGFRQ-----------------
>B1AR_SHEEP
VPDGAATAARLLVP-SPLRLAADLGQRGTPLLSQQWTVGMGLLMAFIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPIFMQWWGDS-R--ECCDFI--------INEGYAITSSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACGSHGAAGCLAVARPSPSPG
>OPS4_DROME
S-GNGDLQFLGWNVPPDQIQYIPEHWLTQLEPPASMHYMLGVFYIFLFCASTVGNGMVIWIFSTSKSLRTPSNMFVLNLAVFDLIMCLKAPIFNSFHRGFA-IYLGNTWCQIFASIGSYSGIGAGMTNAAIGYDRYNVITKPM-NRNMTFTKAVIMNIIIWLYCTPWVVLPLTQFWDRFPEGYLTSCSFD--YLSDNFDTRLFVGTIFFFSFVCPTLMILYYYSQIVGHVFSHNVESNVDKSKETAEIRIAKAAITICFLFFVSWTPYGVMSLIGAFGDKSLLTQGATMIPACTCKLVACIDPFVYAISHPRYRLELQKRCPWLGVNEKSGEISSAQQQTTAA-----
>O1E1_HUMAN
-------------MMGQNQTSISDFLLLGLPIQPEQQNLCYALFLAMYLTTLLGNLLIIVLIRLDSHLHTPMYLFLSNLSFSDLCFSSVTIPKLLQNMQNQDPSIPYADCLTQMYFFLLFGDLESFLLVAMAYDRYVAICFPLYTAIMSPMLCLALVALSWVLTTFHAMLHTLLMARL---CPHFFCDMSKLAFSDTRVNEWVIFIMGGLILVIPFLLILGSYARIVSSI--------LKVPSSKGICKAFSTCGSHLSVVSLFYGTVIGLYLCSSANS---STLKDTVMAMMYTVVTPMLNPFIYSLRNRDMKGALSRVIHQKKTFFSL------------------
>MSHR_CHICK
-MSMLAPLRLVREPWNASEGNQSNATAGAGGAWCQGLDIPNELFLTLGLVSLVENLLVVAAILKNRNLHSPTYYFICCLAVSDMLVSVSNLAKTLFMLLMEHASIVRHMDNVIDMLICSSVVSSLSFLGVIAVDRYITIFYALYHSIMTLQRAVVTMASVWLASTVSSTVLITYYY----------------------RRNNAILLCLIGFFLFMLVLMLVLYIHMFALACHHSISQKQPTIYRTSSLKGAVTLTILLGVFFICWGPFFFHLILIVTCPTNTCFFSYFNLFLILIICNSVVDPLIYAFRSQELRRTLREVVLCSW-----------------------
>A1AA_BOVIN
----------MVFLSGNASDSSNCT-HPPPPVNISKAILLGVILGGLILFGVLGNILVILSVACHRHLHSVTHYYIVNLAVADLLLTSTVLPFSAIFEILGYWAFGRVFCNVWAAVDVLCCTASIMGLCIISIDRYIGVSYPLYPTIVTQKRGLMALLCVWALSLVISIGPLFGWRQP--AP--TICQIN--------EEPGYVLFSALGSFYVPLTIILVMYCRVYVVAKREAKNFSVRLLKFSREKKAAKTLGIVVGCFVLCWLPFFLVMPIGSFFPDFRPSETVFKIAFWLGYLNSCINPIIYPCSSQEFKKAFQNVLRIQCLRRKQSSKHTLSH----------
>CKR5_PYGBI
----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCYIFA-------SSVY
>OPSO_SALSA
TLRIAVNGVSYNEASEIYKPHADPFTGPITNLAPWNFAVLATLMFVITSLSLFENFTVMLATYKFKQLRQPLNYIIVNLSLADFLVSLTGGTISFLTNARGYFFLGNWACVLEGFAVTYFGIVAMWSLAVLSFERYFVICRPLGNVRLRGKHAALGLLFVWTFSFIWTIPPVFGWCSYTVSKIGTTCEPN--WYSNNIWNHTYIITFFVTCFIMPLGMIIYCYGKLLQKLRKVSHD--RLGNAKKPERQVSRMVVVMIVAYLVGWTPYAAFSIIVTACPTIYLDPRLAAAPAFFSKTAAVYNPVIYVFMNKQVSTQLNWGFWSRA-----------------------
>NY1R_RAT
TLFSRVENYSVHYNVSE-NSPFLAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAVMCLPFTFVYTLMDHWVFGETMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYIGITVIWVLAVASSLPFVIYQILTDKD--KYVCFDK---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMIRDSKYRSSETKRINVMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK
>OLF3_CANFA
-------------MGTGNQTWVREFVLLGLSSDWDTEVSLFVLFLITYMVTVLGNFLIILLIRLDSRLHTPMYFFLTNLSLVDVSYATSIIPQMLAHLLAAHKAIPFVSCAAQLFFSLGLGGIEFVLLAVMAYDRYVAVCDPLYSVIMHGGLCTRLAITSWVSGSMNSLMQTVITFQL---PDHISCELLRLACVDTSSNEIAIMVSSIVLLMTPFCLVLLSYIQIISTI--------LKIQSTEGRKKAFHTCASHLTVVVLCYGMAIFTYIQPRSSP---SVLQEKLISLFYSVLTPMLNPMIYSVRNKEVKGAWQKLLGQLTGITSKLAT---------------
>P2Y9_HUMAN
DRRFIDFQFQDSNSSLRPRLGNATANNTCIVDDSFKYNLNGAVYSVVFILGLITNSVSLFVFCFRMKMRSETAIFITNLAVSDLLFVCTLPFKIFYNFNRH-WPFGDTLCKISGTAFLTNIYGSMLFLTCISVDRFLAIVYPFSRTIRTRRNSAIVCAGVWILVLSGGISASLFSTTN----ATTTCFEGFSKRVWKTYLSKITIFIEVVGFIIPLILNVSCSSVVLRTLRKP----ATLSQIGTNKKKVLKMITVHMAVFVVCFVPYNSVLFLYALVRSQRFAKIMYPITLCLATLNCCFDPFIYYFTLESFQKSFYINAHIRMESLFKTETPLTIQEEVSDQTTNN
>MC5R_BOVIN
-MNSSFHLHFLDLGLNTTDGNLSGLSVQNASSLCEDMGIAVEVFLALGLISLLENILVIGAIVRNRNLHTPMYFFVGSLAVADMLVSLSNSWETITIYLLTNDASVRHLDNVFDSMICISVVASMCSLLAIAVDRYVTIFCALYQRIMTGRRSGAIIGGIWAFCASCGTVFIVYYY----------------------EESTYVVICLIAMFLTMLLLMASLYTHMFLLARTH-RIPGHSSVRQRTGVKGAITLAMLLGVFIVCWAPFFLHLILMISCPHNSCFMSHFNMYLILIMCNSVIDPLIYAFRSQEMRKTFKEIVCFQSFRTPCRFPSRY------------
>CKR5_TRAPH
----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>OPSB_SAIBB
--MSKMPEEEEFYLFKNISSVGPWDGPQYHIAPVWAFQLQAAFMGIVFLAGLPLNSMVLVATVRYKKLRHPLNYVLVNVSVGGFLLCIFSVLPVFVNSCNGYFVFGRHVCALEGFLGTVAGLVTGWSLAFLAFERYIVICKPFGNFRFSSKHALMVVLTTWTIGIGVSIPPFFGWSRYIAEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIVPLSLICFSYAQLLRALKAVAAQQQESATTQKAEREVSRMVVVMVGSFCVCYVPYAALAMYMVNNRNHGLDLRLVSIPAFFSKSSCIYNPIIYCFMNKQFRACIMEMVCGKAMTD---ESDISSTVSSSQVGPN-
>OAR_DROME
LSTAQADKDSAGECEGAVEELHASILGLQLAVPEWEALLTALVLSVIIVLTIIGNILVILSVFTYKPLRIVQNFFIVSLAVADLTVALLVLPFNVAYSILGRWEFGIHLCKLWLTCDVLCCTSSILNLCAIALDRYWAITDPIYAQKRTVGRVLLLISGVWLLSLLISSPPLIGWNDW-T-S--TPCELT--------SQRGYVIYSSLGSFFIPLAIMTIVYIEIFVATRRRGVNEEKQKISLSKERRAARTLGIIMGVFVICWLPFFLMYVILPFCQTCCPTNKFKNFITWLGYINSGLNPVIYTIFNLDYRRAFKRLLGLN------------------------
>PAR3_HUMAN
-WTGATITVKIKCPEESASHLHVKNATMGYLTSSLSTKLIPAIYLLVFVVGVPANAVTLWMLFFRTR-SICTTVFYTNLAIADFLFCVTLPFKIAYHLNGNNWVFGEVLCRATTVIFYGNMYCSILLLACISINRYLAIVHPFYRGLPKHTYALVTCGLVWATVFLYMLPFFILKQEYY--D--TTCHDVNTCESSSPFQLYYFISLAFFGFLIPFVLIIYCYAAIIRTL----------NAYDHRWLWYVKASLLILVIFTICFAPSNIILIIHHANYYYDGLYFIYLIALCLGSLNSCLDPFLYFLMSKTRNHSTAYLTK--------------------------
>NMBR_HUMAN
SNLSVTTGANESGSVPEGWERDFLPASDGTTTELVIRCVIPSLYLLIITVGLLGNIMLVKIFITNSAMRSVPNIFISNLAAGDLLLLLTCVPVDASRYFFDEWMFGKVGCKLIPVIQLTSVGVSVFTLTALSADRYRAIVNPMMQTSGALLRTCVKAMGIWVVSVLLAVPEAVFSEVARSS--FTACIPY---QTDELHPKIHSVLIFLVYFLIPLAIISIYYYHIAKTLIKSLPGNEHTKKQMETRKRLAKIVLVFVGCFIFCWFPNHILYMYRSFNYNELGHMIVTLVARVLSFGNSCVNPFALYLLSESFRRHFNSQLCCGRKSYQERGTSYLMTSLKSNAKNMV
>AA1R_BOVIN
----------------------------MPPSISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PRTYFHTCLKVACPVLILTQSSILALLAMAVDRYLRVKIPLYKTVVTPRRAVVAITGCWILSFVVGLTPMFGWNNLSN-G-VIECQFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYMEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCHMPRILIYIAIFLSHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPAPPID--PDD---------
>C.C1_MOUSE
----------MESSTAFYDYHDKLSLLCENNVIFFSTISTIVLYSLVFLLSLVGNSLVLWVLVKYENLESLTNIFILNLCLSDLMFSCLLPVLISAQWS---WFLGDFFCKFFNMIFGISLYSSIFFLTIMTIHRYLSVVSPITLGIHTLRCRVLVTSCVWAASILFSIPDAVFHKVIS-----LNCKYS------EHHGFLASVYQHNIFFLLSMGIILFCYVQILRTL---------FRTRSRQRHRTVRLIFTVVVAYFLSWAPYNLTLFLKTGIIQQQQLDIAMIICRHLAFSHCCFNPVLYVFVGIKFRRHLKHLFQQVWLCRKTSSTVPCEGPSFY------
>OPS1_PATYE
NGTLNRSMTPNTGWEGPYDMSVHLHWTQFPPVTEEWHYIIGVYITIVGLLGIMGNTTVVYIFSNTKSLRSPSNLFVVNLAVSDLIFSAVNGFPLLTVSSFHQWIFGSLFCQLYGFVGGVFGLMSINTLTAISIDRYVVITKPLASQTMTRRKVHLMIVIVWVLSILLSIPPFFGWGAYIPEGFQTSCTFD--YLTKTARTRTYIVVLYLFGFLIPLIIIGVCYVLIIRGVRRHSMKARANNKRARSELRISKIAMTVTCLFIISWSPYAIIALIAQFGPAHWITPLVSELPMMLAKSSSMHNPVVYALSHPKFRKALYQRVPWLFCCCKPKEKADFRSVTRTESVNSD
>5H2B_MOUSE
ILQKTCDHLILTNRSGLETDSVAEEMKQTVEGQGHTVHWAALLILAVIIPTIGGNILVILAVALEKRLQYATNYFLMSLAIADLLVGLFVMPIALLTIMFEAWPLPLALCPAWLFLDVLFSTASIMHLCAISLDRYIAIKKPIANQCNTRATAFIKITVVWLISIGIAIPVPIKGIET-NP---VTCELT------KDRFGSFMVFGSLAAFFVPLTIMVVTYFLTIHTLQKKRRMGKRSAQTISNEQRASKALGVVFFLFLLMWCPFFITNLTLALCDSCTTLKTLLEIFVWIGYVSSGVNPLIYTLFNKTFREAFGRYITCNYRATKSVKALRKGNSMVENSKFFT
>OPSG_HUMAN
DSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVIASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISVVNQVYGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWMVVCKPFGNVRFDAKLAIVGIAFSWIWAAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCITPLSIIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVLAFCFCWGPYAFFACFAAANPGYPFHPLMAALPAFFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDG-----SELVSSV--SSVSPA
>ACM4_MOUSE
------MANFTPVNGSSANQSVRLVTTAHNHLETVEMVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGAFSMNLYTLYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV-VP-DNQCFIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAARERKVTRTIFAILLAFILTWTPYNVMVLVNTFCQSC-IPERVWSIGYWLCYVNSTINPACYALCNATFKKTFRHLLLCQYRNIGTAR----------------
>AA1R_RAT
----------------------------MPPYISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PQTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKTVVTQRRAAVAIAGCWILSLVVGLTPMFGWNNLSN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPTCQKPSILIYIAIFLTHGNSAMNPIVYAFRIHKFRVTFLKIWNDHFRCQPKPPID--AED---------
>AA1R_CHICK
----------------------------MAQSVTAFQAAYISIEVLIALVSVPGNILVIWAVKMNQALRDATFCFIVSLAVADVAVGALVIPLAIIINIG--PQTEFYSCLMMACPVLILTESSILALLAIAVDRYLRVKIPVYKSVVTPRRAAVAIACCWIVSFLVGLTPMFGWNNLNN-V-VIKCQFE-----TVISMEYMVYFNFFVWVLPPLLLMLLIYLEVFNLIRTQK-VSNDPQKYYGKELKIAKSLALVLFLFALSWLPLHILNCITLFCPSCKTPHILTYIAIFLTHGNSAMNPIVYAFRIKKFRTAFLQIWNQYFCCKTNKSSS--VN----------
>P2YR_MELGA
PELLAG-----------GWAAGNASTKCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISVHRYTGVVHPLSLGRLKKKNAVYVSSLVWALVVAVIAPILFYSGTG----KTITCYDT-TADEYLRSYFVYSMCTTVFMFCIPFIVILGCYGLIVKALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYLPFHVMKTLNLRARLDDKVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKSSRRSEPNVQSKSLTEYKQNGDTSL
>GRE1_BALAM
----MEGPPLSPAPADNVTLNVSCGRPATLFDWADHRLISLLALAFLNLMVVAGNLLVVMAVFVHSKLRTVTNLFIVSLACADLLVGMLVLPFSATLEVLDVWLYGDVWCSVWLAVDVWMCTSSILNLCAISLDRYLAVSQPIYPSLMSTRRAKQLIAAVWVLSFVICFPPLVGWNDR-G-S--LTCELT--------NERGYVIYSALGSFFLPSTVMLFFYGRIYRTAVSTRVSVRHQARRFRMETKAAKTVGIIVGLFILCWLPFFVCYLVRGFCADC-VPPLLFSVFFWLGYCNSAVNPCVYALCSRDFRFAFSSILCKCVCRRGAMERRFRRSQTEEDCEVAD
>TDA8_MOUSE
---------------------MAMNSMCIEEQRHLEHYLFPVVYIIVFIVSVPANIGSLCVSFLQAKKENELGIYLFSLSLSDLLYALTLPLWINYTWNKDNWTFSPTLCKGSVFFTYMNFYSSTAFLTCIALDRYLAVVYPLFSFLRTRRFAFITSLSIWILESFFNSMLLWKDETS-DKSNFTLCYDK---YPLEKWQINLNLFRTCMGYAIPLITIMICNHKVYRAV------RHNQATENSEKRRIIKLLASITLTFVLCFTPFHVMVLIRCVLERDWQTFTVYRVTVALTSLNCVADPILYCFVTETGRADMWNILKLCTRKHNRHQGKKRRDAVELEIID--
>MSHR_CEREL
PVLGSQRRLLGSLNCTPPATFPLTLAPNRTGPQCLEVAIPDGLFLSLGLVSLVENVLVVAAIAKNRNLQSPMYYFICCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>NY1R_MOUSE
TLFSKVENHSIHYNASE-NSPLLAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAVMCLPFTFVYTLMDHWVFGETMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYIGITVIWVLAVASSLPFVIYQILTDKD--KYVCFDK---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMIRDSKYRSSETKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK
>GPR1_MACMU
EDLEETLFEEFENYSYALDYYSLESDLEEKVQLGVVHWVSLVLYCLSFVLGIPGNAIVIWFTGFKWKKTVS-TLWFLNLAIADFIFLLFLPLYISYVVMNFHWPFGIWLCKANSFTAQLNMFASVFFLTVISLDHYIHLIHPVSHRHRTLKNSLIVIIFIWLLASLIGGPALYFR--D--NN-HTLCYNNHDPDLTVIRHHVLTWVKFIVGYLFPLLTMSICYLCLIFKV---------KKRSILISSRHFWTILAVVVAFVVCWTPYHLFSIWELTIHHNHVMQAGIPLSTGLAFLNSCLNPILYVLISKKFQARFRSSVAEILKYTLWEVSCSGNSETKNLCLLET
>ACM1_PIG
------------MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFIVTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC-------
>RTA_RAT
EAHSTNQNKMCPGMSEALELYSRGFLTIEQIATLPPPAVTNYIFLLLCLCGLVGNGLVLWFFGFSIK-RTPFSIYFLHLASADGIYLFSKAVIALLNMGTFLGSFPDYVRRVSRIVGLCTFFAGVSLLPAISIERCVSVIFPMYWRRRPKRLSAGVCALLWLLSFLVTSIHNYFCM-FL---SGTACL-----------NMDISLGILLFFLFCPLMVLPCLALILHVE---------CRARRRQRSAKLNHVVLAIVSVFLVSSIYLGIDWFLFWVFQ--IPAPFPEYVTDLCICINSSAKPIVYFLAGRDKSQRLWEPLRVVFQRALRDGAEPGNTVTMEMQCPSG
>GP43_HUMAN
------------------------------MLPDWKSSLILMAYIIIFLTGLPANLLALRAFVGRIRQPAPVHILLLSLTLADLLLLLLLPFKIIEAASNFRWYLPKVVCALTSFGFYSSIYCSTWLLAGISIERYLGVAFPVYKLSRRPLYGVIAALVAWVMSFGHCTIVIIVQYLNT--N--ITCYEN-FTDNQLDVVLPVRLELCLVLFFIPMAVTIFCYWRFVWIMLSQP------LVGAQRRRRAVGLAVVTLLNFLVCFGPYNVSHLVGYHQR---KSPWWRSIAVVFSSLNASLDPLLFYFSSSVVRRAFGRGLQVLRNQGSSLLGRRG---------TAE
>CKR1_HUMAN
METPNTTEDYDTTTEFDYGDATPC---QKVNERAFGAQLLPPLYSLVFVIGLVGNILVVLVLVQYKRLKNMTSIYLLNLAISDLLFLFTLPFWIDYKLKDD-WVFGDAMCKILSGFYYTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIIIWALAILASMPGLYFSKTQW-EF-HHTCSLHFPHESLREWKLFQALKLNLFGLVLPLLVMIICYTGIIKIL---------LRRPNEKKSKAVRLIFVIMIIFFLFWTPYNLTILISVFQDFLRHLDLAVQVTEVIAYTHCCVNPVIYAFVGERFRKYLRQLFHRRVAVHLVKWLPFLV-------SSTS
>VK02_SPVKA
YEYSTITDYYNTINNDITSSSVIKAFDNNCTFLEDTKYHIIVIHIILFLLGSIGNIFVVSLIAFKRN-KSITDIYILNLSMSDCIFVFQIPFIVYSKLDQ--WIFGNILCKIMSVLYYVGFFSNMFIITLMSIDRYFAIVHPIRQPYRTKRIGILMCCSAWLLSLILSSPVSKLYENIP--MDIYQCTLTENDSIIAFIKRLMQIEITILGFLIPIIIFVYCYYRIFTTV---------VRLRNRRKYKSIKIVLMIVVCSLICWIPLYIVLMIATIVSLYLNLAYAITFSETISLARCCINPIIYTLIGEHVRSRISSICSCIYRDNRIRKKLFSNII---------
>AA1R_CAVPO
----------------------------MPHSVSAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIASLAVADVAVGALVIPLAILINIG--PQTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKTVVTPRRAAVAIAGCWILSLVVGLTPMFGWNNLSN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRKQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPTCHKPTILTYIAIFLTHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPEPPID--VDD---------
>P2Y6_RAT
-----------MERDNGTIQAPGLPPTTCVYREDFKRLLLPPVYSVVLVVGLPLNVCVIAQICASRRTLTRSAVYTLNLALADLLYACSLPLLIYNYARGDHWPFGDLACRLVRFLFYANLHGSILFLTCISFQRYLGICHPLWHKRGGRRAAWVVCGVVWLVVTAQCLPTAVFAATG-----RTVCYDL-SPPILSTRYLPYGMALTVIGFLLPFTALLACYCRMARRLCRQ---GPAGPVAQERRSKAARMAVVVAAVFVISFLPFHITKTAYLAVRSTETFAAAYKGTRPFASANSVLDPILFYFTQQKFRRQPHDLLQKLTAKWQRQRV---------------
>O2H3_HUMAN
---------------MDNQSSTPGFLLLGFSEHPGLGRTLFVDVITSYLLTLVGNTLIILLSALDTKLHSPMYFFLSNLSFLDLCFTTSCVPQMLANLWGPKKTISFLDCSVQIFIFLSLGTTECILMKVMAFDRYVAVCQPLYATIIHPRLCWQLASVAWVIGLVGSVVQTPSTLHL---PDDFVCEVPRLSCEDTSYNEIQVAVASVFILVVPLSLILVSYGAITWAV--------LRINSATAWRKAFGTCSSHLTVVTLFYSSVIAVYLQPKNPY---AQGRGKFFGLFYAVGTPSLNPLVYTLRNKEIKRALRRLLGKERDSRESWRAA--------------
>OPRK_MOUSE
LPNSSSWFPNWAESDSNGSVGSEDQQLESAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSAVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLASSVGISAIVLGGTKVDVD-VIECSLQFPDDEYSWWDLFMKICVFVFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITKLVLVVVAVFIICWTPIHIFILVEALGSTSTAALSSYYFCIALGYTNSSLNPVLYAFLDENFKRCFRDFCFPIKMRMERQSTNRVASMRDVGGMNKP
>5HTB_DROME
TTSNLSQIVWNRSVNGNGNSNDEQERAAVEFWLLVKMIAMAVVLGLMILVTIIGNVFVIAAIILERNLQNVANYLVASLAVADLFVACLVMPLGAVYEISNGWILGPELCDIWTSCDVLCCTASILHLVAIAADRYWTVTNI-YNNLRTPRRVFLMIFCVWFAALIVSLAPQFGWKDPDE-E--QHCMVS--------QDVGYQIFATCCTFYVPLLVILFLYWKIYIIARKRPHQKRRQLLEAKRERKAAQTLAIITGAFVICWLPFFVMALTMSLCKECEIHTAVASLFLWLGYFNSTLNPVIYTIFNPEFRRAFKRILFGRKAAARARSAKI-------------
>OPR._HUMAN
GSHLQGNLSLLSPNHSLLPPHLLLNASHGAFLPLGLKVTIVGLYLAVCVGGLLGNCLVMYVILRHTKMKTATNIYIFNLALADTLVLLTLPFQGTDILLGF-WPFGNALCKTVIAIDYYNMFTSTFTLTAMSVDRYVAICHPIALDVRTSSKAQAVNVAIWALASVVGVPVAIMGSAQVEE---IECLVE-IPTPQDYWGPVFAICIFLFSFIVPVLVISVCYSLMIRRLRGVRLL-SGSREKDRNLRRITRLVLVVVAVFVGCWTPVQVFVLAQGLGVQPETAVAILRFCTALGYVNSCLNPILYAFLDENFKACFRKFCCASALRRDVQVSDRVALACKTSETVPR
>VG74_KSHV
LDDDESWNETLNMSGYDYSGNFSLEVSVCEMTTVVPYTWNVGILSLIFLINVLGNGLVTYIFCKHRS-RAGAIDILLLGICLNSLCLSISLLAEVLM--FLFNIISTGLCRLEIFFYYLYVYLDIFSVVCVSLVRYLLVAYSTSWPKKQSLGWVLTSAALLIALVLSGDACRHRSRVVD-PVKQAMCYEN---NMTADWRLHVRTVSVTAGFLLPLALLILFYALTWCVV---------RRTKLQARRKVRGVIVAVVLLFFVFCFPYHVLNLLDTLLRRRGLINVGLAVTSLLQALYSAVVPLIYSCLGSLFRQRMYGLFQSLRQSFMSGATT--------------
>PE22_MOUSE
---------------MDNFLNDSKLMEDCKSRQWLLSGESPAISSVMFSAGVLGNLIALALLARRWRSISLFHVLVTELVLTDLLGTCLISPVVLASYSRNQLAPESHACTYFAFTMTFFSLATMLMLFAMALERYLSIGYPYYRRHLSRRGGLAVLPVIYGASLLFCSLPLLNYGEYVQYCPGTWCFIR--------HGRTAYLQLYATMLLLLIVAVLACNISVILNLIRMRGPRRGERTSMAEETDHLILLAIMTITFAICSLPFTIFAYMDETSS---LKEKWDLRALRFLSVNSIIDPWVFAILRPPVLRLMRSVLCCRTSLRTQEAQQTSSKQTDLCGQL--
>A1AD_RABIT
GSGEDNRSSAGEPGGAGGGGEVNGTAAVGGLVVSAQSVGVGVFLAAFILTAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSATVLPFSATMEVLGFWAFGRAFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWAVALVVSMGPLLGWKEP--VP--RFCGIT--------EEVGYAVFSSLCSFYLPMAVIVVMYCRVYVVARSTHTFLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRRRPLWRASAGGGPHPDCA
>OPSD_CRIGR
MNGTEGPNFYVPFSNATGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVICKPMSNFRFGENHAIMGVVFTWIMALACAAPPLVGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFTIPLIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVILMVVFFLICWFPYAGVAFYIFTHQGSNFGPIFMTLPAFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGKNILGDDEASATASKTETSQVAPA
>AA1R_CANFA
----------------------------MPPAISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PRTYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKTVVTPRRAAVAIAGCWILSFVVGLTPLFGWNRLGN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRRQK-VSGDPQKYYGKELKIAKSLALILFLFALSWLPLHILNCITLFCPSCRKPSILMYIAIFLTHGNSAMNPIVYAFRIQKFRVTFLKIWNDHFRCQPTPPVD--PHD---------
>DCDR_.ENLA
---------MENFSIFNVTVNVWHADLDVGNSDLSLRALTGLLLSLLILSTLLGNTLVCLAVIKFRHRSKVTNFFVISLAVSDLFVALLVMPWKAVTEVAGFWVFG-DFCDTWVAFDIMCSTASILNLCIISLDRYWAIASPFYERKMTQRVAFIMIGVAWTLSILISFIPVQLSWHKSEE-HTENCDSS--------LNRTYAISSSLISFYIPVVIMIGTYTRIYRIAQTQSSRENSLKTSFRKETKVLKTLSIIMGVFVFCWLPFFVLNCMIPFCHMNCVSETTFNIFVWFGWANSSLNPVIYAFNA-DFRKAFTTILGCNRFCSSNNVEAVNYHHDTTFQK---
>O.YR_BOVIN
GAFAANWSAEAVNGSAAPPGTEGNRTAGPPQRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLSRRTDRLAVLVTWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLATCYGLISFKIWQNRAIVSNVKLISKAKIRTVKMTFIVVLAFIVCWTPFFFVQMWSVWDADA-KEASPFIIAMLLASLNSCCNPWIYMLFTGHLFQELVQRFLCCSFRRLKGSRPGENSSTFVLSQYSS
>YQNJ_CAEEL
RDSVINASSAVSTTTLPPLDIPMTSMKPPSIIPTVELVLGTITYLVIIAMTVVGNTLVVVAVFSYRPLKKVQNYFLVSLAASDLAVAIFVMPLHVVTFLAGQWLLGVTVCQFFTTADILLCTSSILNLCAIALDRYWAIHNPIYAQKRTTKFVCIVIVIVWILSMLISVPPIIGWNNW-M---EDSCGLS--------TEKAFVVFSAAGSFFLPLLVMVVVYVKIFISARQRNPTRKREKISVAKEKRAAKTIAVIIFVFSFCWLPFFVAYVIRPFCETCTLVQQVEQAFTWLGYINSSLNPFLYGILNLEFRRAFKKILCPKAVLEQRRRRMSA------------
>OPS2_PATYE
------------------MPFPLNRTDTALVISPSEFRIIGIFISICCIIGVLGNLLIIIVFAKRRSVRRPINFFVLNLAVSDLIVALLGYPMTAASAFSNRWIFDNIGCKIYAFLCFNSGVISIMTHAALSFCRYIIICQYGYRKKITQTTVLRTLFSIWSFAMFWTLSPLFGWSSYVIEVVPVSCSVN--WYGHGLGDVSYTISVIVAVYVFPLSIIVFSYGMILQEKVCGIRARYTPRFIQDIEQRVTFISFLMMAAFMVAWTPYAIMSALAIGSFN--VENSFAALPTLFAKASCAYNPFIYAFTNANFRDTVVEIMAPWTTRRVGVSTLPWRRRTSAVNTTDI
>RGR_HUMAN
---------------------MAETSALPTGFGELEVLAVGMVLLVEALSGLSLNTLTIFSFCKTPELRTPCHLLVLSLALADSGIS-LNALVAATSSLLRRWPYGSDGCQAHGFQGFVTALASICSSAAIAWGRYHHYCTRS---QLAWNSAVSLVLFVWLSSAFWAALPLLGWGHYDYEPLGTCCTLD--YSKGDRNFTSFLFTMSFFNFAMPLFITITSYSLME--------------QKLGKSGHLQVNTTLPARTLLLGWGPYAILYLYAVIADVTSISPKLQMVPALIAKMVPTINAINYALGNEMVCRGIWQCLSPQKREKDRTK----------------
>PE23_PIG
------------MWAPERSAEEQGNLTRSLGSSEDCGSVSVVFPMTMLITGFVGNALAMLLVSQSYRRKKSFLLCIGWLALTDMVGQLLTSPVVIVLYLSHQLDPSGRLCTFFGLTMTAFGLSSLFIASAMAVERALAIRAPHYSSHMKTSATRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNETSSENNWGNIFFASAFSFLGLSALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKTIFNQTSQNECNFFLIAVRLASLNQILDPWVYLLLRKILLQKFCQAVSQKQREEAATLIFTHPGEARVLFSKSK
>ACM1_MACMU
------------MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL--GQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC-------
>GPRF_CERAE
----MDPEETSVYLDYYYATSPNPDIRETHSHVPYTSVFLPVFYIAVFLTGVLGNLVLMGALHFKPGSRRLIDIFIINLAASDFIFLVTLPLWVDKEASLGLWRTGSFLCKGSSYMISVNMHCSVFLLTCMSVDRYLAIVCPVSRKFRRTDCAYVVCASIWFISCLLGLPTLLSRELT-IDD-KPYCAEK----KATPLKLIWSLVALIFTFFVPLLSIVTCYCRIARKLCAH---YQQSGKHNKKLKKSIKIIFIVVAAFLVSWLPFNTSKLLAIVSGLQAILQLGMEVSGPLAFANSCVNPFIYYIFDSYIRRAIVHCLCPCLKNYDFGSSTETALSTFIHAEDFT
>NK3R_RAT
-G--NFSSALGLPATTQAPSQVRANLTNQFVQPSWRIALWSLAYGLVVAVAVFGNLIVIWIILAHKRMRTVTNYFLVNLAFSDASVAAFNTLINFIYGLHSEWYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATKIVIGSIWILAFLLAFPQCLYSK---GR---TLCYVQ--WPEGPKQHFTYHIIVIILVYCFPLLIMGVTYTIVGITLWGG--PCDKYHEQLKAKRKVVKMMIIVVVTFAICWLPYHVYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIQVSSY----TTRFHPTRQSSL
>CKR3_CERAE
MTTSLYTVETFGPTSYDDDMGLLC---EKADVGALIAQFVPPLYSLVFTVGLLGNVVVVMILIKYRRLRIMTNIYLLNLAISDLLFLFTLPFWIHYVREHN-WVFSHGMCKVLSGFYHTGLYSEIFFIILLTIDRYLAIVHAVALRARTVTFGVITSIVTWGLAVLVALPEFIFYGTEE-LF-ETLCSAIYPQDTVYSWRHFHTLKMTILCLALPLLVMAICYTGIIKTL---------LKCPSKKKYKAIRLIFVIMAVFFIFWTPYNVAILISTYQSILKHVDLVVLVTEVIAYSHCCVNPVIYAFVGERFRKYLRHFFHRHVLMHLGRYIPFLT-------SSVS
>PE24_HUMAN
--------------------MSTPGVNSSASLSPDRLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGGQPLCEYSTFILLFFSLSGLSIICAMSVERYLAINHAYYSHYVDKRLAGLTLFAVYASNVLFCALPNMGLGSSRLQYPDTWCFID---WTTNVTAHAAYSYMYAGFSSFLILATVLCNVLVCGALLRMAAAASSFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFVNQLYQPSEVSKNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSRRERSQRTSSAMSGHSR
>GPR4_PIG
--------------------MGNGTWEGCHVDSRVDHLFPPSLYIFVIGVGLPTNCLRLWAAYRQVRQRNELGVYLMNLSIADLLYICTLPLWVDYFLHHDNWIHGPGSCKLFGFIFYTNIYISIAFLCCISVDRYLAVAHPLFARLRRVKTAVAVSSVVWATELGANSVPLFHDEL--NH---TFCFEKPMEGWVAWMVAWMNLYRVFVGFLFPWALMLLSYRGILRAVRG------SVSTERQEKAKIKRLALSLIAIVLVCFAPYHVLLLSRSAVYLGERVFSAYHSSLAFTSLNCVADPILYCLVNEGARSDVAKALHNLLRFLTSDKPQEMDTPLTSKRNSMA
>PE23_BOVIN
PFCTRFNHSDPGIWAAERAVEAPNNLTLPPEPSEDCGSVSVAFSMTMMITGFVGNALAITLVSKSYRRKKSFLLCIGWLALTDMVGQLLTSPVVIVLYLSHQLDPSGRLCTFFGLTMTVFGLSSLFIASAMAVERALATRAPHYSSHMKTSVTRAVLLGVWLAVLAFALLPVLGVG----QYTGTWCFISNGTNSRQNWGNVFFASAFAILGLSALVVTFACNLATIKALVSRAKASQSSAQWGRITTETAIQLMGIMCVLSVCWSPLLIMMLKMIFNHTSQDECNFFLIAVRLASLNQILDPWVYLLLRKILLQKFCQLLKGHSYGLDTEGGTENNLYISNLSRFFI
>OPSD_RANCA
MNGTEGPNFYVPMSNKTGIVRSPFEYPQYYLAEPWKYSVLAAYMFLLILLGLPINFMTLYVTIQHKKLRTPLNYILLNLAFANHFMVLCGFTITMYTSLHGYFVFGQTGCYFEGFFATLGGEIALWSLVVLAIERYIVVCKPMSNFRFGENHAMMGVAFTWIMALACAVPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFVVHFLIPLIIISFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVVIMVIFFLICWVPYAYVAFYIFTHQGSEFGPIFMTVPAFFAKSSAIYNPVIYIMLNKQFRNCMITTLCCGKNPFGDEDASSAATSVSTSQVSPA
>OPS6_DROME
GR----NLSLAESVPAEIMHMVDPYWYQWPPLEPMWFGIIGFVIAILGTMSLAGNFIVMYIFTSSKGLRTPSNMFVVNLAFSDFMMMFTMFPPVVLNGFYGTWIMGPFLCELYGMFGSLFGCVSIWSMTLIAYDRYCVIVKGMARKPLTATAAVLRLMVVWTICGAWALMPLFGWNRYVPEGNMTACGTD--YFAKDWWNRSYIIVYSLWVYLTPLLTIIFSYWHIMKAVAAHNVANSEADKSKAIEIKLAKVALTTISLWFFAWTPYTIINYAGIFESMH-LSPLSTICGSVFAKANAVCNPIVYGLSHPKYKQVLREKMPCLACGKDDLTSDSRSESQA-------
>OPRM_HUMAN
LSHLDGNLSDPCGPNRTDLGGRDSLCPPTGSPSMITAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIINVCNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALVTIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSNIEQQNSTRIPSTANTVDRTNH
>AA2A_MOUSE
----------------------------------MGSSVYIMVELAIAVLAILGNVLVCWAVWINSNLQNVTNFFVVSLAAADIAVGVLAIPFAITISTG--FCAACHGCLFIACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGMKAKGIIAICWVLSFAIGLTPMLGWNNCST-K-RVTCLFE-----DVVPMNYMVYYNFFAFVLLPLLLMLAIYLRIFLAARRQESQGERTRSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCSTCHAPPWLMYLAIILSHSNSVVNPFIYAYRIREFRQTFRKIIRTHVLRRQEPFRAGGAHSTEGEQVSLR
>OPSG_CAVPO
DAYEDSTQASLFTYTNSNNTRGPFEGPNYHIAPRWVYHLTSAWMTIVVIASIFTNGLVLVATMRFKKLRHPLNWILVNLAVADLAETVIASTISVVNQVYGYFVLGHPLCVVEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIVFSWVWSAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCITPLSIIVLCYLHVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVLAYCLCWGPYAFFACFATANPGYSFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVEDSSELSSTSRSVSPAA------
>CCR4_CERTO
IYTSDNYTEEMG-SGDYDSIKEPC---FREKNAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQKPRKLLAEKVVYVGVWIPALLLTIPGFIFASVSE-DD-RFICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>OPSB_ANOCA
MNGTEGINFYVPLSNKTGLVRSPFEYPQYYLAEPWKYKVVCCYIFFLIFTGLPINILTLLVTFKHKKLRQPLNYILVNLAVADLFMACFGFTVTFYTAWNGYFIFGPIGCAIEGFFATLGGQVALWSLVVLAIERYIVVCKPMGNFRFSATHALMGISFTWFMSFSCAAPPLLGWSRYIPEGMQCSCGPDYYTLNPDYHNESYVLYMFGVHFVIPVVVIFFSYGRLICKVREAAAQQQESASTQKAEREVTRMVILMVLGFLLAWTPYAMVAFWIFTNKGVDFSATLMSVPAFFSKSSSLYNPIIYVLMNKQFRNCMITTICCGKNPFGDEDVSSSVSSVSSSQVSPA
>O1A2_HUMAN
-------------MKKENQSFNLDFILLGVTSQQEQNNVFFVIFLCIYPITLTGNLLIILAICADIRLHNPMYFLLANLSLVDIIFSSVTIPKVLANHLLGSKFISFGGCLMQMYFMIALAKADSYTLAAMAYDRAVAISCPLYTTIMSPRSCILLIAGSWVIGNTSALPHTLLTASL---SANFYCDIMKLSCSDVVFFNVKMMYLGVGVFSLPLLCIIVSYVQVFSTV--------FQVPSTKSLFKAFCTCGSHLTVVFLYYGTTMGMYFRPLTSY----SPKDAVITVMYVAVTPALNPFIYSLRNWDMKAALQKLFSKRISS---------------------
>A2AC_MOUSE
AEGPNGSDAGEWGSGGGANASGTDWVPPPGQYSAGAVAGLAAVVGFLIVFTVVGNVLVVIAVLTSRALRAPQNLFLVSLASADILVATLVMPFSLANELMAYWYFGQVWCGVYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRVKATIVAVWLISAVISFPPLVSFYR-------PQCGLN--------DETWYILSSCIGSFFAPCLIMGLVYARIYRVAKLRRRAVCRRKVAQAREKRFTFVLAVVMGVFVLCWFPFFFSYSLYGICREAQLPEPLFKFFFWIGYCNSSLNPVIYTVFNQDFRRSFKHILFRRRRRGFRQ-----------------
>5H2C_RAT
LLVWQFDISISPVAAIVTDTFNSSDGGRLFQFPDGVQNWPALSIVVIIIMTIGGNILVIMAVSMEKKLHNATNYFLMSLAIADMLVGLLVMPLSLLAILYDYWPLPRYLCPVWISLDVLFSTASIMHLCAISLDRYVAIRNPIHSRFNSRTKAIMKIAIVWAISIGVSVPIPVIGLRD-VF--NTTCVL---------NDPNFVLIGSFVAFFIPLTIMVITYFLTIYVLRRQKKKPRGTMQAINNEKKASKVLGIVFFVFLIMWCPFFITNILSVLCGKAKLMEKLLNVFVWIGYVCSGINPLVYTLFNKIYRRAFSKYLRCDYKPDKKPPVRQIALSGRELNVNIY
>OPRM_PIG
FSHLEGNLSDPCIRNRTELGGSDSLCPPTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGTILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIINVCNWILSSAIGLPVMFMATTKYGS---IDCALT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSARIPSTANTVDRTNH
>B1AR_FELCA
LPDGAATAARLLVPASPSASPLTPTSEGPAPLSQQWTAGIGLLMALIVLLIVAGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVMRGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITSPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCFARRAARGGHAAAGCLPGTRPPPSPG
>BONZ_HUMAN
------MAEHDYHEDYGFSSFNDSSQEEHQDFLQFSKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WVFGQVMCKSLLGIYTINFYTSMLILTCITVDRFIVVVKATNQQAKRMTWGKVTSLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DEAISTVVLATQMTLGFFLPLLTMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQMPFNLMKFIRSTHWEYTSFHYTIMVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT
>CH23_HUMAN
EDEDYNTSISYGDEYPDYLDSIVVLEDLSPLEARVTRIFLVVVYSIVCFLGILGNGLVIIIATFKMK-KTVNMVWFLNLAVADFLFNVFLPIHITYAAMDYHWVFGTAMCKISNFLLIHNMFTSVFLLTIISSDRCISVLLPVSQNHRSVRLAYMACMVIWVLAFFLSSPSLVFRDTAN-SS--WPTHSQ-MDPVGYSRHMVVTVTRFLCGFLVPVLIITACYLTIVCKL---------QRNRLAKTKKPFKIIVTIIITFFLCWCPYHTLNLLELHHTAMSVFSLGLPLATALAIANSCMNPILYVFMGQDFKK-FKVALFSRLVNALSEDTGHSFTKMSSMNERTS
>TSHR_HUMAN
LQAFDSHYDYTICGDSEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSLLALLGNVFVLLILLTSHYKLNVPRFLMCNLAFADFCMGMYLLLIASVDLYTHSWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHACAIMVGGWVCCFLLALLPLVGISSY----KVSICLPM-----TETPLALAYIVFVLTLNIVAFVIVCCCHVKIYITVRNP------QYNPGDKDTKIAKRMAVLIFTDFICMAPISFYALSAILNKPLITVSNSKILLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGICKRQAQAYRGSTDIQVQKVTHD
>GHSR_PIG
SEEPGPNLTLPDLGWDAPPENDSLVEELLPLFPTPLLAGVTATCVALFVVGIAGNLLTMLVVSRFREMRTTTNLYLSSMAFSDLLIFLCMPLDLFRLWQYRPWNLGNLLCKLFQFVSESCTYATVLTITALSVERYFAICFPLAKVVVTKGRVKLVILVIWAVAFCSAGPIFVLVG---T-D-TNECRAT---FAVRSGLLTVMVWVSSVFFFLPVFCLTVLYSLIGRKLWRRGE-AVGSSLRDQNHKQTVKMLAVVVFAFILCWLPFHVGRYLFSKSLEPQISQYCNLVSFVLFYLSAAINPILYNIMSKKYRVAVFKLLGFEPFSQRKLSTLKDESSINT------
>B1AR_RAT
LPDGAATAARLLVLASPPASLLPPASEGSAPLSQQWTAGMGLLLALIVLLIVVGNVLVIVAIAKTPRLQTLTNLFIMSLASADLVMGLLVVPFGATIVVWGRWEYGSFFCELWTSVDVLCVTASIETLCVIALDRYLAITLPFYQSLLTRARARALVCTVWAISALVSFLPILMHWWRAR-R--KCCDFV--------TNRAYAIASSVVSFYVPLCIMAFVYLRVFREAQKQNGRRRPSRLVALREQKALKTLGIIMGVFTLCWLPFFLANVVKAFHRDL-VPDRLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFQRLLCCARRAACRRRAAHGCLARAGPPPSPG
>ACM2_CHICK
-----------MNNSTYINSSSENVIALESPYKTIEVVFIVLVAGSLSLVTIIGNILVMVSIKVNRHLQTVNNYFLFSLACADLIIGIFSMNLYTLYTVIGYWPLGPVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPVKRTTKMAGMMIAAAWVLSFILWAPAILFWQFIV-VP-DKDCYIQ------FFSNPAVTFGTAIAAFYLPVIIMTVLYWQISRASKSRVKMPAKKKPPPSREKKVTRTILAILLAFIITWTPYNVMVLINSFCASC-IPGTVWTIGYWLCYINSTINPACYALCNATFKKTFKHLLMCHYKNIGATR----------------
>GPRK_HUMAN
ATAVTTVRTNASGLEVPLFHLFARLDEELHGTFPGLCVALMAVHGAIFLAGLVLNGLALYVFCCRTRAKTPSVIYTINLVVTDLLVGLSLPTRFAVYYGA---RGCLRCAFPHVLGYFLNMHCSILFLTCICVDRYLAIVRPEPAACRQPACARAVCAFVWLAAGAVTLSVLGVTG----------S-----------RPCCRVFALTVLEFLLPLLVISVFTGRIMCALSRP----GLLHQGRQRRVRAMQLLLTVLIIFLVCFTPFHARQVAVALWPDMHTSLVVYHVAVTLSSLNSCMDPIVYCFVTSGFQATVRGLFGQHGEREPSSGDVVSSGRHHILSAGPH
>ACM1_HUMAN
------------MNTSAPPAVSPNITVLAPGKGPWQVAFIGITTGLLSLATVTGNLLVLISFKVNTELKTVNNYFLLSLACADLIIGTFSMNLYTTYLLMGHWALGTLACDLWLALDYVASNASVMNLLLISFDRYFSVTRPLYRAKRTPRRAALMIGLAWLVSFVLWAPAILFWQYLV-VL-AGQCYIQ------FLSQPIITFGTAMAAFYLPVTVMCTLYWRIYRETENRRGKAKRKTFSLVKEKKAARTLSAILLAFILTWTPYNIMVLVSTFCKDC-VPETLWELGYWLCYVNSTINPMCYALCNKAFRDTFRLLLLCRWDKRRWRKIPKRPSRQC-------
>ML1A_HUMAN
-------MQGNGSALPNASQ---PVLRGDGARPSWLASALACVLIFTIVVDILGNLLVILSVYRNKKLRNAGNIFVVSLAVADLVVAIYPYPLVLMSIFNNGWNLGYLHCQVSGFLMGLSVIGSIFNITGIAINRYCYICHSLYDKLYSSKNSLCYVLLIWLLTLAAVLPNLRAGT-LQYDP-IYSCTFA------QSVSSAYTIAVVVFHFLVPMIIVIFCYLRIWILVLQVQR-PDRKPKLKPQDFRNFVTMFVVFVLFAICWAPLNFIGLAVASDPASRIPEWLFVASYYMAYFNSCLNAIIYGLLNQNFRKEYRRIIVSLCTARVFFVDSSNWKPSPLMTNNNV
>TSHR_RAT
LQAFDSHYDYTVCGDNEDMVCTPKSDEFNPCEDIMGYKFLRIVVWFVSPMALLGNVFVLFVLLTSHYKLTVPRFLMCNLAFADFCMGVYLLLIASVDLYTHTWQTG-PGCNTAGFFTVFASELSVYTLTVITLERWYAITFAMLDRKIRLRHAYTIMAGGWVSCFLLALLPMVGISSY----KVSICLPM-----TDTPLALAYIALVLLLNVVAFVIVCSCYVKIYITVRNP------QYNPRDKDTKIAKRMAVLIFTDFMCMAPISFYALSALMNKPLITVTNSGVLLVLFYPLNSCANPFLYAIFTKAFQRDVFILLSKFGLCKHQAQAYQANTGIQIQKIPQD
>OPS2_DROME
AQSS-GNGSVLDNVLPDMAHLVNPYWSRFAPMDPMMSKILGLFTLAIMIISCCGNGVVVYIFGGTKSLRTPANLLVLNLAFSDFCMMASQSPVMIINFYYETWVLGPLWCDIYAGCGSLFGCVSIWSMCMIAFDRYNVIVKGINGTPMTIKTSIMKILFIWMMAVFWTVMPLIGWSAYVPEGNLTACSID--YMTRMWNPRSYLITYSLFVYYTPLFLICYSYWFIIAAVAAHMNVRSSEDCDKSAEGKLAKVALTTISLWFMAWTPYLVICYFGLFKIDG-LTPLTTIWGATFAKTSAVYNPIVYGISHPKYRIVLKEKCPMCVFGNTDEPKPDATSEADSKA----
>OPRD_HUMAN
PPLFANASDAYPSACPSAGANASGPPGARSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKMKTATNIYIFNLALADALATSTLPFQSAKYLMET-WPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTRPGA---VVCMLQ-FPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLL-SGSKEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDIDPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRCFRQLCRKPCGRPDPSSFSRARVTACTPSDGPG
>OPSD_.ENLA
MNGTEGPNFYVPMSNKTGVVRSPFDYPQYYLAEPWQYSALAAYMFLLILLGLPINFMTLFVTIQHKKLRTPLNYILLNLVFANHFMVLCGFTVTMYTSMHGYFIFGPTGCYIEGFFATLGGEVALWSLVVLAVERYIVVCKPMANFRFGENHAIMGVAFTWIMALSCAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFIVHFTIPLIVIFFCYGRLLCTVKEAAAQQQESLTTQKAEKEVTRMVVIMVVFFLICWVPYAYVAFYIFTHQGSNFGPVFMTVPAFFAKSSAIYNPVIYIVLNKQFRNCLITTLCCGKNPFGDEDGSSAASSVSSSQVSPA
>PAFR_HUMAN
----------------------MEPHDSSHMDSEFRYTLFPIVYSIIFVLGVIANGYVLWVFARLYPKFNEIKIFMVNLTMADMLFLITLPLWIVYYQNQGNWILPKFLCNVAGCLFFINTYCSVAFLGVITYNRFQAVTRPITAQANTRKRGISLSLVIWVAIVGAASYFLILDSTN-G--NVTRCFEH---YEKGSVPVLIIHIFIVFSFFLVFLIILFCNLVIIRTLLMQP---VQQQRNAEVKRRALWMVCTVLAVFIICFVPHHVVQLPWTLAELGQAINDAHQVTLCLLSTNCVLDPVIYCFLTKKFRKHLTEKFYSMRSSRKCSRATTDPFNQIPGNSLKN
>OPSG_SCICA
DSHEDSTQSSIFTYTNSNATRGPFEGPNYHIAPRWVYHITSTWMIIVVIASVFTNGLVLVATMKFKKLRHPLNWILVNLAIADLAETVIASTISVVNQLYGYFVLGHPLCVVEGYTVSVCGITGLWSLAIISWERWLVVCKPFGNMRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIIPLSIIILCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATANPGYAFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDTSELSSASKSVSPAA------
>HH2R_RAT
-------------------MEPNGTVHSCCLDSMALKVTISVVLTTLILITIAGNVVVCLAVSLNRRLRSLTNCFIVSLAATDLLLGLLVLPFSAIYQLSFTWSFGHVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLVTPVRVAISLVFIWVISITLSFLSIHLGWN--RN-TF-KCKVQ--------VNEVYGLVDGLVTFYLPLLIMCVTYYRIFKIAREQR--ISSWKAATIREHKATVTLAAVMGAFIICWFPYFTAFVYRGLRGDD-INEAVEGIVLWLGYANSALNPILYAALNRDFRTAYQQLFHCKFASHNSHKTSLRRSQSREGRW---
>V1AR_RAT
SSPWWPLTTEGSNGSQEAARLGEGDSPLGDVRNEELAKLEIAVLAVIFVVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQLCWDITSSFRGPDWLCRVVKHLQVFAMFASAYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIATSWVLSFILSTPQYFIFSVIETK--TQDCWAT---FIQPWGTRAYVTWMTSGVFVAPVVVLGTCYGFICYHIWRNLLVVSSVKSISRAKIRTVKMTFVIVSAYILCWAPFFIVQMWSVWDENFDSENPSITITALLASLNSCCNPWIYMFFSGHLLQDCVQSFPCCHSMAQKFAKDDSTSYSNNRSPTNS
>GPR1_HUMAN
EDLEETLFEEFENYSYDLDYYSLESDLEEKVQLGVVHWVSLVLYCLAFVLGIPGNAIVIWFTGLKWK-KTVTTLWFLNLAIADFIFLLFLPLYISYVAMNFHWPFGIWLCKANSFTAQLNMFASVFFLTVISLDHYIHLIHPVSHRHRTLKNSLIVIIFIWLLASLIGGPALYFR--D--NN-HTLCYNNHDPDLTLIRHHVLTWVKFIIGYLFPLLTMSICYLCLIFKV---------KKRTVLISSRHFWTILVVVVAFVVCWTPYHLFSIWELTIHHNHVMQAGIPLSTGLAFLNSCLNPILYVLISKKFQARFRSSVAEILKYTLWEVSCSGNSETKNLCLLET
>DADR_MACMU
--------------MRTLNTSAMDGTGLVVERDFSVRILTACFLSLLILSTLLGNTLVCAAVIRFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVAEIAGFWPFG-SFCNIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPKAAFILISVAWTLSVLISFIPVQLSWHKAGN-TIDNCDSS--------LSRTYAISSSVISFYIPVAIMIVTYTRIYRIAQKQVECESSFKMSFKRETKVLKTLSVIMGVFVCCWLPFFILNCILPFCGSGCIDSITFDVFVWFGWANSSLNPIIYAFNA-DFRKAFSTLLGCYRLCPATNNAIETAAMFSSH-----
>AG2R_CANFA
------MILNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYVAIVHPMSPVRRTMLMAKVTCIIIWLLAGLASLPTIIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKTLKRA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWVPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKKYFLQLLKYIPPKAKSHSSLSTRPSD-------H
>OPSD_ATHBO
MNGTEGPYFYIPMLNTTGVVRSPYEYPQYYLVNPAAYAVLGAYMFFLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNVEGFSATLGGEIALWSLVVLAIERWVVVCKPISNFRFGENHAIMGVAFTWFMAAACAVPPLFGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFTCHFCIPLMVVFFCYGRLVCAVKEAAAAQQESETTQRAEREVTRMVIIMVVSFLVSWVPYASVAWYIFTHQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASSSSVSSSSVSPAA
>RDC1_RAT
YVEPGNYSDSNWPCNSSDCIVVDTVQCPAMPNKNVLLYTLSFIYIFIFVIGMIANSVVVWVNIQAKTTGYDTHCYILNLAIADLWVVITIPVWVVSLVQHNQWPMGELTCKITHLIFSINLFGSIFFLACMSVDRYLSITYFTTSSYKKKMVLRVVCVLVWLLAFFVSLPDTYYLKTVTNNE--TYCRSFYPEHSIKEWLIGMELVSVILGFAVPFTIIAIFYFLLARAM---------SASGDQEKHSSRKIIFSYVVVFLVCWLPYHFVVLLDIFSILHNVLFTALHVTQCLSLVHCCVNPVLYSFINRNYRYELMKAFIFKYSAKTGLTKLIDEYSALEQNAKA-
>IL8A_RAT
EGDFEEEFGNITRMLPTGEYFSPC----KR-VPMTNRQAVVVFYALVFLLSLLGNSLVMLVILYRRRTRSVTDVYVLNLAIADLLFSLTLPFLAVSKWKG--WIFGTPLCKMVSLLKEVNFFSGILLLACISVDRYLAIVHATRTLTRKRYLVKFVCMGTWGLSLVLSLPFAIFRQAYK-RS-GTVCYEV-LGEATADLRITLRGLSHIFGFLLPLFIMLVCYGLTLRTL---------FKAHMRQKRRAMWVIFAVVLVFLLCCLPYNLVLLSDTLLGAHNNIDQALYITEILGFSHSCLNPVIYAFVGQSFRHEFLKILAN--LVHKEVLTHHS------------
>CB2R_RAT
----MEGCRELELTNGSNGGLEFNPMKEYMILSDAQQIAVAVLCTLMGLLSALENVAVLYLILSSQRRRKPSYLFIGSLAGADFLASVIFACNFVIFHVFHG-VDSRNIFLLKIGSVTMTFTASVGSLLLTAVDRYLCLCYPPYKALVTRGRALVALGVMWVLSALISYLPLMGWTC-----CPSPCSEL------FPLIPNDYLLGWLLFIAILFSGIIYTYGYVLWKAHQHTEHQVPGIARMRLDVRLAKTLGLVMAVLLICWFPALALMGHSLVTTLSDKVKEAFAFCSMLCLVNSMVNPIIYALRSGEIRSAAQHCLTGWKKYLQGLGSEGKVTETEAEVKTTT
>LSHR_HUMAN
ESELSGWDYEYGFCLPKTPRCAPEPDAFNPCEDIMGYDFLRVLIWLINILAIMGNMTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCSTAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLRHAILIMLGGWLFSSLIAMLPLVGVSNY----KVSICFPM-----VETTLSQVYILTILILNVVAFFIICACYIKIYFAVRNP------ELMATNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKVPLITVTNSKVLLVLFYPINSCANPFLYAIFTKTFQRDFFLLLSKFGCCKRRAELYRRSNCKNGFTGSNK
>OPSD_LIZSA
MNGTEGPYFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFLLILVGFPINFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIALWSLVVLAIERWMVVCKPISNFRFGEDHAIMGLAFTWVMAAACAVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVIYMFVCHFLIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVSRMVVIMVVAFLICWCPYAGVAWYIFTHQGSEFGPLFMTFPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTSVSSSSVSPAA-
>CB1R_FELCA
EFYNKSLSSYKENEENIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKKIVTRPKAVVAFCLMWTIAIVIAVLPLLGWNCK-K--LQSVCSDI------FPLIDETYLMFWIGVTSVLLLFIVYAYMYILWKAHIHSEDQVTRPDQARMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTVFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPSCEGTAQPLDNSMGHANNTANVHRAA
>EDG3_HUMAN
TALPPRLQPVRGNETLREHYQYVGKLAGRLKEASEGSTLTTVLFLVICSFIVLENLMVLIAIWKNNKFHNRMYFFIGNLALCDLLAG-IAYKVNILMSGKKTFSLSPTVWFLREGSMFVALGASTCSLLAIAIERHLTMIKMRPYDANKRHRVFLLIGMCWLIAFTLGALPILGWNCL-H--NLPDCSTI------LPLYSKKYIAFCISIFTAILVTIVILYARIYFLVKSS---KVANHNNSERSMALLRTVVIVVSVFIACWSPLFILFLIDVACRVQCPILFKAQWFIVLAVLNSAMNPVIYTLASKEMRRAFFRLVCNCLVRGRGARASPIRSKSSSSNNSSH
>UR2R_HUMAN
AATGSSVPEPPGGPNATLNSSWASPTEPSSLEDLVATGTIGTLLSAMGVVGVVGNAYTLVVTCRSLRAVASMYVYVVNLALADLLYLLSIPFIVATYVTKE-WHFGDVGCRVLFGLDFLTMHASIFTLTVMSSERYAAVLRPLDTVQRPKGYRKLLALGTWLLALLLTLPVMLAMR---GP--KSLCLPA----WGPRAHRAYLTLLFATSIAGPGLLIGLLYARLARAYRRSQR--ASFKRARRPGARALRLVLGIVLLFWACFLPFWLWQLLAQYHQAPRTARIVNYLTTCLTYGNSCANPFLYTLLTRNYRDHLRGRVRGPGSGGGRGPVPSLRCSGRSLSSCSP
>AA2B_CHICK
--------------------------------MNTMKTTYIVLELIIAVLSIAGNVLVCWAVAINSTLKNATNYFLVSLAVADIAVGLLAIPFAITISIG--FQVDFHSCLFFACFVLVLTQSSIFSLLAVAIDRYLAIKIPLYNSLVTGKRARGLIAVLWLLSFVIGLTPLMGWNKAMG-A-FISCLFE-----NVVTMSYMVYFNFFGCVLLPLIIMLGIYIKIFMVACKQ---MGNSRTTLQKEVHAAKSLAIIVGLFAFCWLPLHILNCITHFHEEFSKPEWVMYVAIILSHANSVINPIIYAYRIRDFRYTFHKIISKILCKTDDFPKCTTVTNVNAPAASVT
>5H2A_MOUSE
NTSEASNWTIDAENRTNLSCEGYLPPTCLSILHLQEKNWSALLTTVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESNVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILAYKSSQLQVGQK
>US27_HCMVA
-------MTTSTNNQTLTQVSNMTNHTLNSTEIYQLFEYTRLGVWLMCIVGTFLNVLVITTILYYRRKKSPSDTYICNLAVADLLIVVGLPFFLEYAKHHP-KLSREVVCSGLNACFYICLFAGVCFLINLSMDRYCVIVWGVLNRVRNNKRATCWVVIFWILAVLMGMPHYLMYSHT-----NNECVGE-ANETSGWFPVFLNTKVNICGYLAPIALMAYTYNRMVRFI---------INYVGKWHMQTLHVLLVVVVSFASFWFPFNLALFLESIRLLANVIIFCLYVGQFLAYVRACLNPGIYILVGTQMRKDMWTTLRVFACCCVKQEIPYQKDIQRRAKHTKR
>NK1R_HUMAN
-----MDNVLPVDSDLSPNISTNTSEPNQFVQPAWQIVLWAAAYTVIVVTSVVGNVVVMWIILAHKRMRTVTNYFLVNLAFAEASMAAFNTVVNFTYAVHNEWYYGLFYCKFHNFFPIAAVFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVICVIWVLALLLAFPQGYYST---SR---VVCMIEWPEHPNKIYEKVYHICVTVLIYFLPLLVIGYAYTVVGITLE---IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLKFIQQVYLAIMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAGDY----STRYLQTQGSVY
>5H1A_HUMAN
-MDVLSPGQGNNTTSPPAPFETGGNTTGISDVTVSYQVITSLLLGTLIFCAVLGNACVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQVTCDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAALISLTWLIGFLISIPPMLGW-----DP--DACTIS--------KDHGYTIYSTFGAFYIPLLLMLVLYGRIFRAARFRKNEEAKRKMALARERKTVKTLGIIMGTFILCWLPFFIVALVLPFCESSHMPTLLGAIINWLGYSNSLLNPVIYAYFNKDFQNAFKKIIKCKFCRQ--------------------
>IL8B_MOUSE
SGDLDIFN-YSSGMPSILPDAVPC----HSENLEINSYAVVVIYVLVTLLSLVGNSLVMLVILYNRSTCSVTDVYLLNLAIADLFFALTLPVWAASKVNG--WTFGSTLCKIFSYVKEVTFYSSVLLLACISMDRYLAIVHATSTLIQKRHLVKFVCIAMWLLSVILALPILILRNPVK-LS-TLVCYED-VGNNTSRLRVVLRILPQTFGFLVPLLIMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLVFLLCWLPYNLVLFTDTLMRTKDDIDKALNATEILGFLHSCLNPIIYAFIGQKFRHGLLKIMATYGLVSKEFLAKEG------------
>GPRY_MOUSE
LCSSHGMHFITNYSDQASQNFGVPNVTSCPMDEKLLSTVLTTFYSVIFLVGLVGNIIALYVFLGIHRKRNSIQIYLLNVAVADLLLIFCLPFRIMYHINQNKWTLGVILCKVVGTLFYMNMYISIILLGFISLDRYIKINRSIQRRAITTKQSIYVCCIVWTVALAGFLTMIILTLKK-----STMCFHY--RDRHNAKGEAIFNFVLVVMFWLIFLLIILSYIKIGKNLLRISKR--SKFPNSGKYATTARNSFIVLIIFTICFVPYHAFRFIYISSQLNEIIHKTNEIMLVFSSFNSCLDPVMYFLMSSNIRKIMCQLLFRRFQSEASRSESTSLHDLSVTVKMPQ
>PAR2_RAT
---LDTPPPITGKGAPVEPGFSVDEFSASVLTGKLTTVFLPVIYIIVFVIGLPSNGMALWVFFFRTKKKHPAVIYMANLALADLLSVIWFPLKISYHLHGNDWTYGDALCKVLIGFFYGNMYCSILFMTCLSVQRYWVIVNPMGHSRKRANIAVGVSLAIWLLIFLVTIPLYVMRQTIY--N--TTCHDVLPEEVLVGDMFSYFLSLAIGVFLFPALLTASAYVLMIKTL------SAMDEHSEKKRRRAIRLIITVLSMYFICFAPSNVLLVVHYFLIKSSHVYALYLVALCLSTLNSCIDPFVYYFVSKDFRDQARNALLCRSVRTVKRMQISLKSSS--------
>5H5A_MOUSE
LPVNLTSFSLSTPSSLEPNRSDTEVLRPSRPFLSAFRVLVLTLLGFLAAATFTWNLLVLATILKVRTFHRVPHNLVASMAISDVLVAVLVMPLSLVHELSGRWQLGRRLCQLWIACDVLCCTASIWNVTAIALDRYWSITRHLYTLRTRKRVSNVMILLTWALSTVISLAPLLFGWGE-S-E-SEECQVS--------REPSYTVFSTVGAFYLPLWLVLFVYWKIYRAAKFRATVTEGDTWREQKEQRAALMVGILIGVFVLCWFPFFVTELISPLCSW-DVPAIWKSIFLWLGYSNSFFNPLIYTAFNRSYSSAFKVFFSKQQ-----------------------
>IL8A_GORGO
PQMWDFDDLNFTGMPPIDEDYSPC----RLETETLNKYVVIITYALAFLLSLLGNSLVMLVILYSRGGRSVTDVYLLNLALADLLFALTLPIWAASKVNG--WIFGTFLCKVVSLLKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFVCLGCWGLSMILSLPFFLFRQAYH-NS-SPVCYEV-LGNDTAKWRMVLRILPHTFGFIVPLFVMLFCYGFTLRTL---------FKAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTQNNVSLALDATEILGFLHSCLNPIIYAFIGQNFRHGFLKILAMHGLVSKEFLARHR------------
>5H2A_RAT
NTSEASNWTIDAENRTNLSCEGYLPPTCLSILHLQEKNWSALLTTVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAIWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGISMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESNVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILAYKSSQLQVGQK
>SSR2_BOVIN
ELNETQPWLTTPFDLNGSVGAANISNQTEPYYDLASNVVLTFIYFVVCIIGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVVLTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTDDGERSDLNETTETQRTLL
>GP42_HUMAN
-----------------------MDTGPDQSYFSGNHWFVFSVYLLTFLVGLPLNLLALVVFVGKLRRPVAVDVLLLNLTASDLLLLLFLPFRMVEAANGMHWPLPFILCPLSGFIFFTTIYLTALFLAAVSIERFLSVAHPLYKTRPRLGQAGLVSVACWLLASAHCSVVYVIEFSGD--T--GTCYLE-FWKDQLAILLPVRLEMAVVLFVVPLIITSYCYSRLVWILGR--------GGSHRRQRRVAGLVAATLLNFLVCFGPYNVSHVVGYICG---ESPVWRIYVTLLSTLNSCVDPFVYYFSSSGFQADFHELLRRLCGLWGQWQQESSGGEEQRADRPAE
>APJ_MACMU
---------MEEGGDFDNYYGADNQSECEYTDWKSSGALIPAIYMLVFLLGTTGNGLVLWTVFRSSRKRRSADIFIASLAVADLTFVVTLPLWATYTYRDYDWPFGTFSCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVNARLRLRVSGAVATAVLWVLAALLAMPVMVFRTTGDQCY-MDYSMVA-TVSSDWAWEVGLGVSSTTVGFVVPFTIMLTCYFFIAQTIAGHFR--KERIEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLLFLMNVFPYCTCISYVNSCLNPFLYAFFDPRFRQACTSMLCCGQSRCAGTSHSSSSSGHSQGPGPNM
>AG22_MOUSE
RNITSSRPFDNLNATGTNESAFNC----SHKPSDKHLEAIPVLYYMIFVIGFAVNIVVVSLFCCQKGPKKVSSIYIFNLALADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYVVPLVWCMACLSSLPTFYFRDVRT-LG--NACIMAFPPEKYAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALTWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSVFRVPITWLQGKRETMSREMD-------T
>FSHR_SHEEP
FDMMYSEFDYDLCSEVVDVTCSPEPDAFNPCEDIMGYDILRVLIWFISILAITGNILVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDVHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVHVRHAASIMLVGWVFAFAVALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NITSSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTRNFRRDFFILLSKFGCYEVQAQTYRSNFHPRNGHCPPA
>YN84_CAEEL
EVFHHISTTNKIFQKMFDKRNFSTDYTFNPKTFPGYRTYVASTYISFNVVGFVINAWVLYVVAPLLFVPKSILFYIFALCVGDLMTMIAMLLLVIELVFG--TWQFS-SMVCTSYLIFDSMNKFMAPMIVFLISRTCYSTVCLGEKAATLKYAIIQFCIAFAFVMILLWPVFAYSQVFTQEVVMRKCGFF----PPPQIEFWFNLIACITSYAVPLFGIIYWYVSVPFFLKRR---LVASSSMDAALRKVITTVLLLTVIYVLCWTPYWVSMFANRIWIMEKSIIIISYFIHLLPYISCVAYPLIFTLLNRGIRSAHAKIVADQRRRFRSLTDEASRTIPGTKMKKNE
>5H1D_CANFA
SPPNQSLEGLLQEASNRSLNATETPEAWGPETLQALKISLALLLSIITMATALSNAFVLTTIFLTRKLHTPANYLIGSLAMTDLLVSILVMPISIAYTTTRTWSFGQILCDIWLSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGRAAVMIATVWVISICISIPPLFWR----Q-E--SDCQVN-------TSQISYTIYSTCGAFYIPSVLLIILYGRIYVAARNRKLALERKRISAARERKATKTLGIILGAFIVCWLPFFVASLVLPICRASWLHPALFDFFTWLGYLNSLINPIIYTVFNEEFRQAFQRVVHVRKAS---------------------
>GASR_BOVIN
GASLCRSGGPLLNGSGTGNLSCEPPRIRGAGTRELELAIRVTLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVVCKAVSYFMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVIVATWMLSGLLMVPYPVYTAVQP---V-LQCMHR---WPSARVRQTWSVLLLLLLFFVPGVVMAVAYGLISRELYLGPGPTRPAQAKLLAKKRVVRMLLVIVVLFFLCWLPVYSANTWRAFDGPGALSGAPISFIHLLTYASACVNPLVYCFMHRRFRQACLDTCTRCCPRPPRARPRPLPSIASLSRLSYT
>BRS4_BOMOR
QTLPSAISSIAHLESLNDSFILGAKQSEDVSPGLEILALISVTYAVIISVGILGNTILIKVFFKIKSMQTVPNIFITSLAFGDLLLLLTCVPVDASRYIVDTWMFGRAGCKIISFIQLTSVGVSVFTLTVLSADRYRAIVKPLLQTSDAVLKTCGKAVCVWIISMLLAAPEAVFSDLYETT--FEACAPY---VSEKILQETHSLICFLVFYIVPLSIISAYYFLIAKTLYKSMPAHTHARKQIESRKRVAKTVLVLVALFAVCWLPNHMLYLYRSFTYHSAFHLSATIFARVLAFSNSCVNPFALYWLSRSFRQHFKKQVYCCKTEPPAS--QQSTGITAVKGNIQM
>OPSB_RAT
-----MSGE-EFYLFQNISSVGPWDGPQYHIAPVWAFHLQAAFMGFVFFAGTPLNATVLVATLHYKKLRQPLNYILVNVSLGGFLFCIFSVFTVFIASCHGYFLFGRHVCALEAFLGSVAGLVTGWSLAFLAFERYLVICKPFGNIRFNSKHALTVVLITWTIGIGVSIPPFFGWSRFIPEGLQCSCGPDWYTVGTKYRSEHYTWFLFIFCFIIPLSLICFSYFQLLRTLRAVAAQQQESATTQKAEREVSHMVVVMVGSFCLCYVPYAALAMYMVNNRNHGLYLRLVTIPAFFSKSSCVYNPIIYCFMNKQFRACILEMVCRKPMTD---ESDMSSTVSSSKVGPH-
>NK2R_RABIT
----MGACDIVTEANISSDIDSNATGVTAFSMPGWQLALWATAYLALVLVAVVGNATVIWIILAHRRMRTVTNYFIVNLALADLCMATFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSGPGTKAVIAGIWLVALALAFPQCFYST---GA---TKCVVAWPEDSGGKMLLLYHLTVIALIYFLPLVVMFVAYSVIGFKLWRRPGHHGANLRHLRAKKKFVKTMVLVVVTFAVCWLPYHLYFLLGHFQDDIKFIQQVYLVLFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----HTPSLSVRVNRC
>PAR2_MOUSE
---LETQPPITGKGVPVEPGFSIDEFSASILTGKLTTVFLPVVYIIVFVIGLPSNGMALWIFLFRTKKKHPAVIYMANLALADLLSVIWFPLKISYHLHGNNWVYGEALCKVLIGFFYGNMYCSILFMTCLSVQRYWVIVNPMGHPRKKANIAVGVSLAIWLLIFLVTIPLYVMKQTIY--N--TTCHDVLPEEVLVGDMFNYFLSLAIGVFLFPALLTASAYVLMIKTL------SAMDEHSEKKRQRAIRLIITVLAMYFICFAPSNLLLVVHYFLIKTSHVYALYLVALCLSTLNSCIDPFVYYFVSKDFRDHARNALLCRSVRTVNRMQISLKSGS--------
>OPSD_CARAU
MNGTEGDMFYVPMSNATGIVRSPYDYPQYYLVAPWAYACLAAYMFFLIITGFPVNFLTLYVTIEHKKLRTPLNYILLNLAISDLFMVFGGFTTTMYTSLHGYFVFGRVGCNPEGFFATLGGEMGLWSLVVLAFERWMVVCKPVSNFRFGENHAIMGVVFTWFMACTCAVPPLVGWSRYIPEGMQCSCGVDYYTRPQAYNNESFVIYMFIVHFIIPLIVIFFCYGRLVCTVKEAAAQHEESETTQRAEREVTRMVVIMVIGFLICWIPYASVAWYIFTHQGSEFGPVFMTLPAFFAKTAAVYNPCIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>TRFR_HUMAN
------------MENETVSELNQTQLQPRAVVALEYQVVTILLVLIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSLYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNLNVNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPTEKPANYSVKESDHFSTELDD
>PE24_MOUSE
GTIPRSNRELQRCVLLTTTIMSIPGVNASFSSTPERLNSPVTIPAVMFIFGVVGNLVAIVVLCKSRKKETTFYTLVCGLAVTDLLGTLLVSPVTIATYMKGQWPGDQALCDYSTFILLFFGLSGLSIICAMSIERYLAINHAYYSHYVDKRLAGLTLFAIYASNVLFCALPNMGLGRSERQYPGTWCFID---WTTNVTAYAAFSYMYAGFSSFLILATVLCNVLVCGALLRMAAAVASFRRIAGAEIQMVILLIATSLVVLICSIPLVVRVFINQLYQPNDISRNPDLQAIRIASVNPILDPWIYILLRKTVLSKAIEKIKCLFCRIGGSGRDSSRRTSSAMSGHSR
>C3AR_CAVPO
--------------MESSSAETNSTGLHLEPQYQPETILAMAILGLTFVLGLPGNGLVLWVAGLKMR-RTVNTVWFLHLTVADFVCCLSLPFSMAHLALRGYWPYGEILCKFIPTVIIFNMFASVFLLTAISLDRCLMVLKPICQNHRNVRTACIICGCIWLVAFVLCIPVFVYRETFT-EE-DDLSPFT-HEYRTPRLLKVITFTRLVVGFLLPMIIMVACYTLIIFRM--------RRVRVVKSWNKALHLAMVVVTIFLICWAPYHVFGVLILFINPEAALLSWDHVSIALASANSCFNPFLYALLGRDLRKRVRQSMKGILEAAFSEDISKSAFS---------
>GPR3_MOUSE
GAGSSMAWFSAGSGSVNVSSVDPVEEPTGPATLLPSPRAWDVVLCISGTLVSCENALVVAIIVGTPAFRAPMFLLVGSLAVADLLAG-LGLVLHFAAD-F--CIGSPEMSLMLVGVLAMAFTASIGSLLAITVDRYLSLYNALYYSETTVTRTYVMLALVWVGALGLGLVPVLAWNCR-D--GLTTCGVV-------YPLSKNHLVVLAIAFFMVFGIMLQLYAQICRIVCRHIALHLLPASHYVATRKGIATLAVVLGAFAACWLPFTVYCLLGDA----DSPRLYTYLTLLPATYNSMINPVIYAFRNQDVQKVLWAICCCCSTSKIPFRSRSP------------
>OPSD_OCTDO
--MVESTTLVNQTWWYNPTVDIHPHWAKFDPIPDAVYYSVGIFIGVVGIIGILGNGVVIYLFSKTKSLQTPANMFIINLAMSDLSFSAINGFPLKTISAFMKWIFGKVACQLYGLLGGIFGFMSINTMAMISIDRYNVIGRPMASKKMSHRRAFLMIIFVWMWSIVWSVGPVFNWGAYVPEGILTSCSFD--YLSTDPSTRSFILCMYFCGFMLPIIIIAFCYFNIVMSVSNHRLNLRKAQAGASAEMKLAKISMVIITQFMLSWSPYAIIALLAQFGPAEWVTPYAAELPVLFAKASAIHNPIVYSVSHPKFREAIQTTFPWLLTCCQFDEKECEEVVASERG-GES
>B1AR_MELGA
WLPPDCGPHNRSGGGGATAAPTGSRQVSAELLSQQWEAGMSLLMALVVLLIVAGNVLVIAAIGRTQRLQTLTNLFITSLACADLVMGLLVVPFGATLVVRGTWLWGSFLCECWTSLDVLCVTASIETLCVIAIDRYLAITSPFYQSLMTRARAKVIICTVWAISALVSFLPIMMHWWRDL-K--GCCDFV--------TNRAYAIASSIISFYIPLLIMIFVYLRVYREAKEQNGRRKTSRVMAMREHKALKTLGIIMGVFTLCWLPFFLVNIVNVFNRDL-VPDWLFVFFNWLGYANSAFNPIIYCRSP-DFRKAFKRLLCFPRKADRRLHAGGQFISTLGSPEHSP
>5H1B_CAVPO
PAVLGSQTGLPHANVSAPPNNAP-SHIYQDSIALPWKVLLVVLLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAFTDLLVSILVMPISTMYTVTGRWTLGQALCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPRRAAGMIALVWVFSICISLPPFFWR----E-E--LDCLVN-------TDHVLYTVYSTGGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGVILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFTWLGYLNSLINPIIYTMSNEDFKQAFHKLIRFKCTT---------------------
>BRB2_MOUSE
IEMFNVTTQVLGSALNGTLSKDNC---PDTEWWSWLNAIQAPFLWVLFLLAALENLFVLSVFFLHKNSCTVAEIYLGNLAAADLILACGLPFWAITIANNFDWVFGEVLCRVVNTMIYMNLYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWGCTLLLSSPMLVFRTMRE-HN--TACVIV---YPSRSWEVFTNVLLNLVGFLLPLSVITFCTVRILQVLRNN---EMKKFKEVQTERKATVLVLAVLGLFVLCWVPFQISTFLDTLLRLGHAVDVITQISSYVAYSNSGLNPLVYVIVGKRFRKKSREVYRVLCQKGGCMGEPVQLRTS-------I
>OPSB_APIME
YVPSMREKFLGWNVPPEYSDLVRPHWRAFPAPGKHFHIGLAIIYSMLLIMSLVGNCCVIWIFSTSKSLRTPSNMFIVSLAIFDIIMAFEMPMLVISSFMERM--GWEIGCDVYSVFGSISGMGQAMTNAAIAFDRYRTISCPI-DGRLNSKQAAVIIAFTWFWVTPFTVLPLLKVWGRYTEGFLTTCSFD--FLTDDEDTKVFVTCIFIWAYVIPLIFIILFYSRLLSSIRNHNVKSNQDKER-SAEVRIAKVAFTIFFLFLLAWTPYATVALIGVYGNRELLTPVSTMLPAVFAKTVSCIDPWIYAINHPRYRQELQKRCKWMGIHEPETTSDATKTDE--------
>O.YR_HUMAN
GALAANWSAEAANASAAPPGAEGNRTAGPPRRNEALARVEVAVLCLILLLALSGNACVLLALRTTRQKHSRLFFFMKHLSIADLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLATWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYITWITLAVYIVPVIVLATCYGLISFKIWQNRVAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDANA-KEASAFIIVMLLASLNSCCNPWIYMLFTGHLFHELVQRFLCCSASYLKGRRLGENSSSFVLSHRSS
>OPSR_CAPHI
ANFEESTQGSIFTYTNSNSTRDPFEGPNYHIAPRWVYHLTSAWMVFVVIASVFTNGLVLAATMRFKKLRHPLNWILVNLAIADLAETIIASTISVVNQMYGYFVLGHPLCVVEGYTVSLCGITGLWSLAIISWERWMVVCKPFGNVRFDAKLATAGIAFSWIWAAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMITCCFIPLSVIILCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVMVMIFAYCLCWGPYTFFACFAAAHPGYAFHPLVAALPAYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDS-----SELASSV--SSVSPA
>APJ_HUMAN
---------MEEGGDFDNYYGADNQSECEYTDWKSSGALIPAIYMLVFLLGTTGNGLVLWTVFRSSRKRRSADIFIASLAVADLTFVVTLPLWATYTYRDYDWPFGTFFCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVNARLRLRVSGAVATAVLWVLAALLAMPVMVLRTTGDQCY-MDYSMVA-TVSSEWAWEVGLGVSSTTVGFVVPFTIMLTCYFFIAQTIAGHFR--KERIEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLLFLMNIFPYCTCISYVNSCLNPFLYAFFDPRFRQACTSMLCCGQSRCAGTSHSSSSSGHSQGPGPNM
>NY1R_PIG
TLSSQVENHSIYYNFSEKNSQFLAFENDDCHLPLAMIFTLALAYGAVIILGVSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWVFGEVMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPSNRHAYVGIAVIWVLAVASSLPFLIYQVLTDKD--KYVCFDK---FLSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMMRDNKYRSSETKRINVMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCINPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK
>OAR1_LYMST
---------MSRDIFMKRLRLHLLFDEVAMVTHIVGDVLSSVLLCAVVLLVLVGNTLVVAAVATSRKLRTVTNVFIVNLACADLLLGVLVLPFSAVNEIKDVWIFGHVWCQVWLAVDVWLCTASILNLCCISLDRYLAITRPIYPGLMSAKRAKTLVAGVWLFSFVICCPPLIGWNDGGT-Y--TTCELT--------NSRGYRIYAALGSFFIPMLVMVFFYLQIYRAAVKTHKPMRLHMQKFNREKKAAKTLAIIVGAFIMCWMPFFTIYLVGAFCENC-ISPIVFSVAFWLGYCNSAMNPCVYALFSRDFRFAFRKLLTCSCKAWSKNRSFRPIQLHCATQDDAK
>OLF5_CHICK
-------------MALGNCTTPTTFILSGLTDNPRLQMPLFMVFLAIYTITLLANLGLIALISVDFHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERRTISYVGCILQYFSFVLLTSSECLLLAVMAYDRYVAICKPLYPAIMTKAVCWRLVEGLYSLAFLNSLVHTSGLLKL---SNHFFCDNSQISSSSTTLNELLVFIFGSWFAMSSIITTPISYVFIILTV--------VRIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRPVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRVIATNVWIH--------------------
>A2AR_CARAU
----------MDVTQSNATKDDANITVTPWPYTETAAAFIILVVSVIILVSIVGNVLVIVAVLTSRALRAPQNLFLVSLACADILVATLVIPFSLANEIMGYWFFGSTWCAFYLALDVLFCTSSIVHLCAISLDRYWSVTKAVYNLKRTPKRIKSMIAVVWVISAVISFPPLIMTKH-------KECLIN--------DETWYILSSSLVSFFAPGFIMITVYCKIYRVAKQRSKQASKTKVAQMREKRFTFVLTVVMGVFVLCWFPFFFTYSLHAICGDSEPPEALFKLFFWIGYCNSSVNPIIYTIFNRDFRKAFKKICLLDCAAHLRDSCLGTCIFECHQKSNQE
>VQ3L_CAPVK
SNYTTAYNTTYYSDDYDDYEVSIVDIPHCDDGVDTTSFGLITLYSTIFFLGLFGNIIVLTVLRKYKI-KTIQDMFLLNLTLSDLIFVLVFPFNLYDSIAKQ-WSLGDCLCKFKAMFYFVGFYNSMSFITLMSIDRYLAVVHPVSMPIRTKRYGIVLSMVVWIVSTIESFPIMLFYETKK-VY-ITYCHVF-YNDNAKIWKLFINFEINIFGMIIPLTILLYCYYKILNTL----------KTSQTKNKKAIKMVFLIVICSVLFLLPFSVTVFVSSLYLLNRFVNLAVHVAEIVSLCHCFINPLIYAFCSREFTKKLLRLRTTSSAGSISIG----------------
>GP40_HUMAN
--------------------------------MDLPPQLSFGLYVAAFALGFPLNVLAIRGATAHARRLTPSLVYALNLGCSDLLLTVSLPLKAVEALASGAWPLPASLCPVFAVAHFFPLYAGGGFLAALSAGRYLGAAFPLYQAFRRPCYSWGVCAAIWALVLCHLGLVFGLEAPGGTPVGSPVCLEA----WDPASAGPARFSLSLLLFFLPLAITAFCYVGCLRAL-------ARSGLTHRRKLRAAWVAGGALLTLLLCVGPYNASNVASFLYP--NLGGSWRKLGLITGAWSVVLNPLVTGYLGRGPGLKTVCAARTQGGKSQK------------------
>CKR8_MACMU
--MDYTLDPSMTTMTDYYYPDSLSSPCDGELIQRNDKLLLAVFYCLLFVFSLLGNSLVILVLVVCKKLRNITDIYLLNLALSDLLFVFSFPFQTYYQLDQ--WVFGTVMCKVVSGFYYIGFYSSMFFITLMSVDRYLAVVHAVIKVRTIRMGTTTLSLLVWLTAIMATIPLLVFYQVAS-ED-VLQCYSF-YNQQTLKWKIFTNFEMNILGLLIPFTIFMFCYIKILHQL---------KRCQNHNKTKAIRLVLIVVIASLLFWVPFNVVLFLTSLHSMHQQLNYATHVTEIISFTHCCVNPVIYAFVGEKFKKHLSEIFQKSCSHIFIYLGRQMSSSCQQHSFRSS
>NTR1_RAT
EATFLALSLSNGSGNTSESDTAGPNSDLDVNTDIYSKVLVTAIYLALFVVGTVGNSVTAFTLARKKSLQSTVHYHLGSLALSDLLILLLAMPVELYNFIWVHWAFGDAGCRGYYFLRDACTYATALNVASLSVERYLAICHPFAKTLMSRSRTKKFISAIWLASALLAIPMLFTMGLQN--SGGLVCTPI---VDTATVKVVIQVN-TFMSFLFPMLVISILNTVIANKLTVMEHSMTIEPGRVQALRHGVLVLRAVVIAFVVCWLPYHVRRLMFCYISDEDFYHYFYMLTNALFYVSSAINPILYNLVSANFRQVFLSTLACLCPGWRHRRKKRPSMSSNHAFSTSA
>B3AR_MACMU
MAPWPHGNSSLVPWPDVPTLAPNTANTSGLPGVPWAAALAGALLALAVLATVGGNLLVIVAITRTPRLQTMTNVFVTSLAAADLVMGLLVVPPAATLVLTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSQWWRVQ-R--RCCAFA--------SNMPYVLLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALCTLGLIMGTFTLCWLPFFLANVLRALGGPS-VPDPAFLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCHCGGRLPREPCAADAPLRPGPAPRSP
>GALS_MOUSE
------------MNGSDSQGAEDSSQE-GGGGWQPEAVLVPLFFALIFLVGAVGNALVLAVLLRGGQAVSTTNLFILNLGVADLCFILCCVPFQATIYTLDDWVFGSLLCKAVHFLIFLTMHASSFTLAAVSLDRYLAIRYPMSRELRTPRNALAAIGLIWGLALLFSGPYLSYYS---AN--LTVCHPA----WSAPRRRAMDLCTFVFSYLLPVLVLSLTYARTLHYLWRTDP-VAAGSGSQRAKRKVTRMIVIVAVLFCLCWMPHHALILCVWFGRFPRATYALRILSHLVSYANSCVNPIVYALVSKHFRKGFRKICAGLLRRAPRRASGRVHSGGMLEPESTD
>P2Y3_MELGA
----------------MSMANFTAGRNSCTFQEEFKQVLLPLVYSVVFLLGLPLNAVVIGQIWLARKALTRTTIYMLNLATADLLYVCSLPLLIYNYTQKDYWPFGDFTCKFVRFQFYTNLHGSILFLTCISVQRYMGICHPLWHKKKGKKLTWLVCAAVWFIVIAQCLPTFVFASTG-----RTVCYDL-SPPDRSASYFPYGITLTITGFLLPFAAILACYCSMARILCQK---ELIGLAVHKKKDKAVRMIIIVVIVFSISFFPFHLTKTIYLIVRSSQAFAIAYKCTRPFASMNSVLDPILFYFTQRKFRESTRYLLDKMSSKWRHDHCITY------------
>OPSD_TURTR
MNGTEGLNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSVLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVANLFMVFGGFTTTLYTSLHAYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWIMAMACAAAPLVGWSRYIPEGMQCSCGIDYYTSRQEVNNESFVIYMFVVHFTIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFTHQGSDFGPIFMTIPSFFAKSSSIYNPVIYIMMNKQFRNCMLTTLCCGRNPLGDDEASTTASKTETSQVAPA
>A2AB_CAVPO
-------------------------MDHQEPYSVQATAAIAAVITFLILFTIFGNALVILAVLTSRSLPAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFWRTWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPRRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCKIN--------QEAWYILASSIGSFFAPCLIMILVYLRIYLIAKRSGAVWWRRRTQMTREKRFTFVLAVVIGVFVLCWFPFFFTYSLGAICPQHKVPHGLFQFFFWIGYCNSSLNPVIYTIFNQDFRRAFRRILCRQWTQTAW------------------
>CML1_MOUSE
EYDAYNDSGIYDDEYSDGFGYFVDLEEASPWEAKVAPVFLVVIYSLVCFLGLLGNGLVIVIATFKMK-KTVNTVWFVNLAVADFLFNIFLPMHITYAAMDYHWVFGKAMCKISNFLLSHNMYTSVFLLTVISFDRCISVLLPVSQNHRSIRLAYMTCSAVWVLAFFLSSPSLVFRDTAN-SS--HPAHSQ-VVSTGYSRHVAVTVTRFLCGFLIPVFIITACYLTIVFKL---------QRNRLAKNKKPFKIIITIIITFFLCWCPYHTLYLLELHHTAVSVFSLGLPLATAVAIANSCMNPILYVFMGHDFRK-FKVALFSRLANALSEDTGPSFTKMSSLNEKAS
>5HT1_APLCA
-MKSLKSSTHDVPHPEHVVWAPPAYDEQHHLFFSHGTVLIGIVGSLIITVAVVGNVLVCLAIFTEPISHSKSNFFIVSLAVADLLLALLVMTFALVNDMYGYWLFGETFCFIWMSADVMCETASIFSICVISYDRLKQVQKPLYEEFMTTTRALLIIACLWICSFVLSFVPIFLEWHELGD-AKHVCLFD--------VHFTYSVIYSFICFYVPCTLMLTNYLRLFLIAQTHQLRASSYRNQGTQGSKAARTLTIITGTFLACWLPFFIINPIAAADEHL-IPLECFMVTIWLGYFNSSVNPIIYGTSNSKFRAAFKRLLRCRSVKSVVGSISPVSWIRPSRLDLSS
>5H1A_FUGRU
NDSNATSGYSDTAAVDWDEGENATGSGSLPDPELSYQIITSLFLGALILCSIFGNSCVVAAIALERSLQNVANYLIGSLAVTDLMVSVLVLPMAALYQVLNKWTLGQDICDLFIALDVLCCTSSILHLCAIALDRYWAITDPIYVNKRTPRRAAVLISVTWLIGFSISIPPMLGWRS---NP--DACIIS--------QDPGYTIYSTFGAFYIPLILMLVLYGRIFKAARFRINEGTRRKIALARERKTVKTLGIIMGTFIFCWLPFFIVALVLPFCAENYMPEWLGAVINWLGYSNSLLNPIIYAYFNKDFQSAFKKILRCKFHRH--------------------
>ACTR_BOVIN
-------------MKHILNLYENINSTARNNSDCPAVILPEEIFFTVSIVGVLENLMVLLAVAKNKSLQSPMYFFICSLAISDMLGSLYKILENVLIMFKNMGSFESTADDVVDSLFILSLLGSICSLSVIAADRYITIFHALYHRIMTPHRALVILTVLWAGCTGSGITIVTFFS----------------------HHHVPTVIAFTALFPLMLAFILCLYVHMFLLARSH---TRRTPSLPKANMRGAVTLTVLLGVFIFCWAPFVLHVLLMTFCPADACYMSLFQVNGVLIMCNAIIDPFIYAFRSPELRVAFKKMVICNCYQ---------------------
>CKR9_MOUSE
DDFSYDSTASTDDYMNLNFSSFFC---KKNNVRQFASHFLPPLYWLVFIVGTLGNSLVILVYWYCTRVKTMTDMFLLNLAIADLLFLATLPFWAIAAAGQ--WMFQTFMCKVVNSMYKMNFYSCVLLIMCISVDRYIAIVQAMVWRQKRLLYSKMVCITIWVMAAVLCTPEILYSQVSG-G--IATCTMVYPKDKNAKLKSAVLILKVTLGFFLPFMVMAFCYTIIIHTL---------VQAKKSSKHKALKVTITVLTVFIMSQFPYNSILVVQAVDAYATNIDICFQVTQTIAFFHSCLNPVLYVFVGERFRRDLVKTLKNLGCISQAQWVSFT--------GSLK
>5H1B_SPAEH
CAPPPPAGSQTQTPSSNLSHNSADSYIYQDSIALPWKVLLVALLALITLATTLSNAFVIATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPRRAAVMIALVWVFSISISLPRFFWR----E-E--LDCLVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGAFIVCWLPFFIISLVMPICKDAWFHMAIFDFFNWLGYLNSLINPIIYTMPNEDFKQAFHKLIRFKCTG---------------------
>OPS4_DROVI
VSGNGDLQFLGWNVPPDQIQHIPEHWLTQLEPPASMHYMLGVFYIFLFCASTVGNGMVIWIFSTSKALRTPSNMFVLNLAVFDFIMCLKAPIFIYNSFHRG-FALGNTGCQIFAAIGSYSGIGAGMTNAAIGYDRLNVITKPM-NRNMTFTKAIIMNVIIWLYCTPWVVLPLTQFWDRFPEGYLTSCTFD--YLTDNFDTRLFVGTIFFFSFVCPTLMIIYYYSQIVGHVFSHNVESNVDKSKDTAEIRIAKAAITICFLFFVSWTPYGVMSLIGAFGDKSLLTPGATMIPACTCKLVACIDPFVYAISHPRYRMELQKRCPWLAIDEKAPESSSAEQQQTTAA----
>OPSG_RABIT
ESHEDSTQASIFTYTNSNSTRGPFEGPNFHIAPRWVYHLTSAWMILVVIASVFTNGLVLVATMRFKKLRHPLNWILVNLAVADLAETVIASTISVVNQFYGYFVLGHPLCVVEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIAGIAFSWIWAAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIIPLSVIVLCYLQVWMAIRTVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATAHPGYSFHPLVAAIPSYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVEDS-----SELASSV--SSVSPA
>NK3R_RABIT
LVQAGNLSSSLPSSVPGLPTTPRANLTNQFVQPSWRIALWSLAYGVVVAVAVFGNLIVIWIILAHKRMRTVTNYFLVNLAFSDASMAAFNTLVNFIYALHSEWYFGANYCRFQNFFPITAVFASIYSMTAIAVDRYMAIIDPL-KPRLSATATKIVIGSIWILAFLLALPQCLYSK---GR---TLCYVQ--WPEGPKQHFIYHIIVIILVYCFPLLIMGITYTIVGITLWGG--PCDKYHEQLKAKRKVVKMMIIVVVTFAICWLPYHIYFILTAIYQQLKYIQQVYLASFWLAMSSTMYNPIIYCCLNKRFRAGFKRAFRWCPFIQVSSYDELEPTRQSSLYTVTR
>ACM5_HUMAN
-------MEGDSYHNATTVNGTPVNHQPLERHRLWEVITIAAVTAVVSLITIVGNVLVMISFKVNSQLKTVNNYYLLSLACADLIIGIFSMNLYTTYILMGRWALGSLACDLWLALDYVASNASVMNLLVISFDRYFSITRPLYRAKRTPKRAGIMIGLAWLISFILWAPAILCWQYLV-VP-LDECQIQ------FLSEPTITFGTAIAAFYIPVSVMTILYCRIYRETEKRNPSTKRKRVVLVKERKAAQTLSAILLAFIITWTPYNIMVLVSTFCDKC-VPVTLWHLGYWLCYVNSTVNPICYALCNRTFRKTFKMLLLCRWKKKKVEEKLYW------------
>O3A1_HUMAN
----------MQPESGANGTVIAEFILLGLLEAPGLQPVVFVLFLFAYLVTVRGNLSILAAVLVEPKLHTPMYFFLGNLSVLDVGCISVTVPSMLSRLLSRKRAVPCGACLTQLFFFHLFVGVDCFLLTAMAYDQFLAICRPLYSTRMSQTVQRMLVAASWACAFTNALTHTVAMSTL---NNHFYCDLPQLSCSSTQLNELLLFAVGFIMAGTPMALIVISYIHVAAAV--------LRIRSVEGRKKAFSTCGSHLTVVAIFYGSGIFNYMRLGSTK---LSDKDKAVGIFNTVINPMLNPIIYSFRNPDVQSAIWRMLTGRRSLA--------------------
>CKRB_BOVIN
YNQSTDYYYEENEMNDTHDYSQYEVICIKEEVRKFAKVFLPAFFTIAFIIGLAGNSTVVAIYAYYKKRRTKTDVYILNLAVADLFLLFTLPFWAVNAVHG--WVLGKIMCKVTSALYTVNFVSGMQFLACISTDRYWAVTKAP-SQSGVGKPCWVICFCVWVAAILLSIPQLVFYTVN----HKARCVPI--YHLGTSMKASIQILEICIGFIIPFLIMAVCYFITAKTL---------IKMPNIKKSQPLKVLFTVVIVFIVTQLPYNIVKFCQAIDIIYKRMDVAIQITESIALFHSCLNPVLYVFMGTSFKNYIMKVAKKYGSW---------NVEEIPFESEDA
>TA2R_HUMAN
--MWPNG-----------SSLGPCFRPTNITLEERRLIASPWFAASFCVVGLASNLLALSVLAGARQTRSSFLTFLCGLVLTDFLGLLVTGTIVVSQHAALFVDPGCRLCRFMGVVMIFFGLSPLLLGAAMASERYLGITRPFRPAVASQRRAWATVGLVWAAALALGLLPLLGVGRYTVQYPGSWCFLT----LGAESGDVAFGLLFSMLGGLSVGLSFLLNTVSVATLCHVYHGEAAQQRPRDSEVEMMAQLLGIMVVASVCWLPLLVFIAQTVLRNPPRTTEKELLIYLRVATWNQILDPWVYILFRRAVLRRLQPRLSTRPRRVSLCGPAWSTATSASRVQAIL
>AG2R_CHICK
------MVPNYSTEETVKRIHVDC---PVSGRHSYIYIMVPTVYSIIFIIGIFGNSLVVIVIYCYMKLKTVASIFLLNLALADLCFLITLPLWAAYTAMEYQWPFGNCLCKLASAGISFNLYASVFLLTCLSIDRYLAIVHPVSRIRRTMFVARVTCIVIWLLAGVASLPVIIHRNIFF-LN--TVCGFR-YDNNNTTLRVGLGLSKNLLGFLIPFLIILTSYTLIWKTLKKA----YQIQRNKTRNDDIFKMIVAIVFFFFFSWIPHQVFTFLDVLIQLHDIVDTAMPFTICIAYFNNCLNPFFYVFFGKNFKKYFLQLIKYIPPNVSTHPSLTTRPPE-------N
>P2YR_MOUSE
AAFLAGLGSLWGNSTVASTAAVSSSFQCALTKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDAMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAIYVSVLVWLIVVVAISPILFYSGTG----KTVTCYDT-TSNDYLRSYFIYSMCTTVAMFCIPLVLILGCYGLIVKALIY------NDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLSEFKQNGDTSL
>NY5R_RAT
-------------NTAAARNAAFPAWEDYRGSVDDLQYFLIGLYTFVSLLGFMGNLLILMAVMKKRNQKTTVNFLIGNLAFSDILVVLFCSPFTLTSVLLDQWMFGKAMCHIMPFLQCVSVLVSTLILISIAIVRYHMIKHPI-SNNLTANHGYFLIATVWTLGFAICSPLPVFHSLVESS--KYLCVES---WPSDSYRIAFTISLLLVQYILPLVCLTVSHTSVCRSISCGAHEKRSITRIKKRSRSVFYRLTILILVFAVSWMPLHVFHVVTDFNDNLRHFKLVYCICHLLGMMSCCLNPILYGFLNNGIKADLRALIHCLHMS---------------------
>THRR_PAPHA
LTEYRLVSINKSSPLQKPLPAFISEDASGYLTSSWLTLFVPSVYTGVFVVSLPVNIMAIVVFILKMKVKKPAVVYMLHLATADVLFVSVLPFKISYYLSGSDWQFGSELCRFVTAAFYCNMYASILLMTVISIDRFLAVVYPMSLSWRTLGRASFTCLAIWALAIAGVVPLLLKEQTIQ--N--TTCHDVLNETLLEGYYAYYFSAFSAVFFFVPLIISTVCYVSIIRCL------SSSTVANRSKKSRALFLSAAVFCIFIICFGPTNILLIAHYSFLSHEAAYFAYLLCVCVSSISCCIDPLIYYYASSECQRYVYSILCCKESSDPSSSNSSGDTCS--------
>FSHR_HUMAN
FDMTYTEFDYDLCNEVVDVTCSPKPDAFNPCEDIMGYNILRVLIWFISILAITGNIIVLVILTTSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDIHTKSWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLDCKVQLRHAASVMVMGWIFAFAAALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYIHIYLTVRNP------NIVSSSSDTRIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKAKILLVLFHPINSCANPFLYAIFTKNFRRDFFILLSKCGCYEMQAQIYRTNTHPRNGHCSSA
>OPSB_CONCO
MNGTEGPNFYVPMSNATGVVRSPFEYPQYYLAEPWAFSILAAYMFFLIITGFPINFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVFGETGCNLEGYFATLGGEISLWSLVVLAIERWVVVCKPISNFRFGENHAIMGLTLTWVMANACAMPPLFGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFLVHFTIPLTIISFCYGRLVCAVKEAAAQQQESETTQRAEREVTRMVVIMVISFLVCWIPYASVAWYIFTHQGSTFGPIFMTVPSFFAKSSSIYNPMIYICMNKQFRNCMITTLFCGKNPFEGEE--EGASAVSS--VSPA
>A2AA_HUMAN
----MGSLQPDAGNASWNGTEAPGGGARATPYSLQVTLTLVCLAGLLMLLTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIITVWVISAVISFPPLISIEKKGQ-P--PRCEIN--------DQKWYVISSCIGSFFAPCLIMILVYVRIYQIAKRRRVGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAVGCS--VPRTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------
>OPSD_TRIMA
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFLLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNVEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVAFTWVMALACAAPPLAGWSRYIPEGMQCSCGIDYYTLKPEVNNESFVIYMFVVHFTIPMIVIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVIAFLICWVPYASVAFYIFTHQGSNFGPIFMTLPAFFAKSASIYNPVIYIMMNKQFRNCMLTTICCGKNPFAEEEGATTVSKTETSQVAPA
>AA3R_HUMAN
-------------------------MPNNSTALSLANVTYITMEIFIGLCAIVGNVLVICVVKLNPSLQTTTFYFIVSLALADIAVGVLVMPLAIVVSLG--ITIHFYSCLFMTCLLLIFTHASIMSLLAIAVDRYLRVKLTVYKRVTTHRRIWLALGLCWLVSFLVGLTPMFGWN-MKE-Y-FLSCQFV-----SVMRMDYMVYFSFLTWIFIPLVVMCAIYLDIFYIIRNK--NSKETGAFYGREFKTAKSLFLVLFLFALSWLPLSIINCIIYFNGE--VPQLVLYMGILLSHANSMMNPIVYAYKIKKFKETYLLILKACVVCHPSDSLDTS------------
>ET1R_HUMAN
ELSFLVTTHQPTNLVLPSNGSMHNYCPQQTKITSAFKYINTVISCTIFIVGMVGNATLLRIIYQNKCMRNGPNALIASLALGDLIYVVIDLPINVFKLLAGRNDFGVFLCKLFPFLQKSSVGITVLNLCALSVDRYRAVASWSVQGIGIPLVTAIEIVSIWILSFILAIPEAIGFVMVPKT--HKTCMLNATSKFMEFYQDVKDWWLFGFYFCMPLVCTAIFYTLMTCEMLNRGSL-IALSEHLKQRREVAKTVFCLVVIFALCWFPLHLSRILKKTVYNESFLLLMDYIGINLATMNSCINPIALYFVSKKFKNCFQSCLCCCCYQSKSLMTSVPWKNHDQNNHNTD
>GALS_HUMAN
------------MNVSGCPGAGNASQAGGGGGWHPEAVIVPLLFALIFLVGTVGNTLVLAVLLRGGQAVSTTNLFILNLGVADLCFILCCVPFQATIYTLDGWVFGSLLCKAVHFLIFLTMHASSFTLAAVSLDRYLAIRYPLSRELRTPRNALAAIGLIWGLSLLFSGPYLSYYR---AN--LTVCHPA----WSAPRRRAMDICTFVFSYLLPVLVLGLTYARTLRYLWRADP-VAAGSGARRAKRKVTRMILIVAALFCLCWMPHHALILCVWFGQFPRATYALRILSHLVSYANSCVNPIVYALVSKHFRKGFRTICAGLLGRAPGRASGRVHSGSVLERESSD
>OPSV_ORYLA
-------MGKYFYLYENISKVGPYDGPQYYLAPTWAFYLQAAFMGFVFFVGTPLNFVVLLATAKYKKLRVPLNYILVNITFAGFIFVTFSVSQVFLASVRGYYFFGQTLCALEAAVGAVAGLVTSWSLAVLSFERYLVICKPFGAFKFGSNHALAAVIFTWFMGV-VRCPPFFGWSRYIPEGLGCSCGPDWYTNCEEFSCASYSKFLLVTCFICPITIIIFSYSQLLGALRAVAAQQAESASTQKAEKEVSRMIIVMVASFVTCYGPYALTAQYYAYSQDENKDYRLVTIPAFFSKSSCVYNPLIYAFMNKQFNGCIMEMVFGKKMEE-----ASESTDS--------
>YWO1_CAEEL
----------------------------MNHVDYVAHVIVMPIVLSIGMINQCLNVCTLLHIRT------SIFLYLKASAIADILSIVAFIPFLFRHAKLIDELGMFYHAHLELPLINALISASALNIVAMTVDRYVSVCHPINETKPSRRRTMLIIVMIYFIALMIYFPSVFQKKLG-VTTIYTIVRNE---VEALQVFKFYLIVRECICRWGPVLLLVILNMCVVRGLRKIRTEQRQLRSPRDDRSRISVLLFVTSATFIICNIPASVISFFVRRVSGSLFWQIFRAIANLLQVTSYLYNFYLYALCSSEYRHAFLRLFGCRSSLSPTSTGDSPGKRCHQAVVLLG
>CKR5_RAT
--MDFQGSIPTYIYDIDYSMSAPC---QKVNVKQIAAQLLPPLYSLVFIFGFVGNMMVFLILISCKKLKSMTDIYLFNLAISDLLFLLTLPFWAHYAANE--WVFGNIMCKLFTGIYHIGYFGGIFFIILLTIDRYLAIVHAVAIKARTVNFGVITSVVTWVVAVFVSLPEIIFMRSQK-EG-HYTCSPHFLHIQYRFWKHFQTLKMVILSLILPLLVMVICYSGILNTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLLLTTFQEYFNRLDQAMQVTETLGMTHCCLNPVIYAFVGEKFRNYLSVFFRKHIVKRFCKHCSIFV-------SSVY
>5HT_HELVI
TSSEWFDGSNCSWVDAVSWGCTSTNATSTDVTSFVLMAVTSVVLALIILATIVGNVFVIAAIIIERNLQNVANYLVASLAVADLMVACLVMPLGAVYEVSQGWILGPELCDMWTSSDVLCSSASILHLVAIATDRYWAVTDV-YIHIRNEKRIFTMIVLVWGAALVVSLAPQLGWKDPDT-Q--QKCLVS--------QDLAYQIFATMSTFYVPLAVILILYWKIFQTARRRPAPEKKESLEAKRERKAAKTLAIITGAFVFCWLPFFIMALVMPICQTCVISDYLASFFLWLGYFNSTLNPVIYTIFSPDFRQAFARILFGTHRRRRYKKF---------------
>NTR2_HUMAN
-----METSSPRPPRPSSNPGLSLDARLGVDTRLWAKVLFTALYALIWALGAAGNALSVHVVLKARARAGRLRHHVLSLALAGLLLLLVGVPVELYSFVWFHWVFGDLGCRGYYFVHELCAYATVLSVAGLSAERCLAVCQPLARSLLTPRRTRWLVALSWAASLGLALPMAVIMGQKH--AASRVCTVL---VVSRTALQVFIQVNVLVSFVLPLALTAFLNGVTVSHLLALGQVRHKDVRRIRSLQRSVQVLRAIVVMYVICWLPYHARRLMYCYVPDDNFYHYFYMVTNTLFYVSSAVTPLLYNAVSSSFRKLFLEAVSSLCGEHHPMKRLPPMDTASGFGDPPE
>NY4R_MOUSE
HFLAPLFPGSLQGKNGTNPLDSPYNFSDGCQDSAELLAFIITTYSIETILGVLGNLCLIFVTTRQKEKSNVTNLLIANLAFSDFLMCLICQPLTVTYTIMDYWIFGEVLCKMLTFIQCMSVTVSILSLVLVALERHQLIINPT-GWKPSIFQAYLGIVVIWFISCFLSLPFLANSTLNDED--KVVCFVS---WSSDHHRLIYTTFLLLFQYCIPLAFILVCYIRIYQRLQRQKHVAHACSSRAGQMKRINSMLMTMVTAFAVLWLPLHVFNTLEDWYQEACHGNLIFLMCHLLAMASTCVNPFIYGFLNINFKKDIKALVLTCHCRSPQGES---TVHTDLSKGSMR
>PE21_RAT
--MSPYG-LNLSLVDEATTCVTPRVPNTSVVLPTGGNGTSPALPIFSMTLGAVSNVLALALLAQVAGSTATFLLFVASLLAIDLAGHVIPGALVLRLYTAG-RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLHAARVSVARARLALALLAAMALAVALLPLVHVGHYELQYPGTWCFIS--LGPPGGWRQALLAGLFAGLGLAALLAALVCNTLSGLALLRDRRRSRGLRRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAVRLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELSLSSLRSSRHSGFS
>OPSG_RAT
DHYEDSTQASIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSTWMILVVIASVFTNGLVLAATMRFKKLRHPLNWILVNLAVADLAETIIASTISVVNQIYGYFVLGHPLCVIEGYIVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLATVGIVFSWVWAAVWTAPPIFGWSRYWPYGLKTSCGPDVFSGTSYPGVQSYMMVLMVTCCIFPLSIIVLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMVFAYCLCWGPYTFFACFATAHPGYAFHPLVASLPSYFAKSATIYNPIIYVFMNRQFRNCILQLFGKKVDDS-----SELVSSV--SSVSPA
>P2UR_MOUSE
----MAADLEPWNSTINGTWEGDELGYKCRFNEDFKYVLLPVSYGVVCVLGLCLNVVALYIFLCRLKTWNASTTYMFHLAVSDSLYAASLPLLVYYYARGDHWPFSTVLCKLVRFLFYTNLYCSILFLTCISVHRCLGVLRPLSLRWGRARYARRVAAVVWVLVLACQAPVLYFVTTS-----RITCHDT-SARELFSHFVAYSSVMLGLLFAVPFSVILVCYVLMARRLLKP---YGTTGGLPRAKRKSVRTIALVLAVFALCFLPFHVTRTLYYSFRSLNAINMAYKITRPLASANSCLDPVLYFLAGQRLVRFARDAKPPTEPTPSPQARRKL--TVRKDLSVSS
>A2AC_DIDMA
-MDLQLTTNSTDSGDRGGSSNESLQRQPPSQYSPAEVAGLAAVVSFLIVFTIVGNVLVVIPVLTSRALKAPQNLFLVSLASADILVATLVMPFSLANELMNYWYFGKVWCDIYLALDVLFCTSSIVHLCAISLDRYWSVTQAVYNLKRTPRRIKGIIVTVWLISAVISFPPLISLYR-------PQCELN--------DETWYILSSCIGSFFAPCIIMVLVYVRIYRVAKLRRRKLCRRKVTQAREKRFTFVLAVVMGVFVVCWFPFFFTYSLYGICREAQVPETLFKFFFWFGYCNSSLNPVIYTIFNQDFRRSFKHILFKKKKKTSLQ-----------------
>GPR1_RAT
EVSREMLFEELDNYSYALEYYSQEPDAEENVYPGIVHWISLLLYALAFVLGIPGNAIVIWFMGFKWK-KTVTTLWFLNLAIADFIFVLFLPLYISYVALSFHWPFGRWLCKLNSFIAQLNMFSSVFFLTVISLDRYIHLIHPGSHPHRTLKNSLLVVLFVWLLASLLGGPTLYFR--D--NN-RIICYNNQEYELTLMRHHVLTWVKFLFGYLLPLLTMSSCYLCLIFKT---------KKQNILISSKHLWMILSVVIAFMVCWTPFHLFSIWELSIHHNNVLQGGIPLSTGLAFLNSCLNPILYVIISKKFQARFRASVAEVLKRSLWEASCSGSAETKSLSLLET
>CCR4_PAPAN
IYTSDNYTEEMG-SGDYDSIKEPC---FREENAHFNRIFLPTIYSIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAVAN--WYFGNFLCKAVHVIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKVVYVGVWIPALLLTIPDFIFASVSE-DD-RYICDRF---YPNDLWVVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYIGISIDSFILLENTVHKWISITEALAFFHCCLNPILYAFLGAKFKTSAQHALTSVSRGSSLKILSKG----------GH
>D2D1_.ENLA
---MDPQNLSMYNDD------INNGTNGTAVDQKPHYNYYAMLLTLLVFVIVFGNVLVCIAVSREKALQTTTNYLIVSLAVADLLVATLVMPWAVYMEVVGEWRFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMISVVWVLSFAISCPLLFGLN---S----KVCII---------DNPAFVIYSSIVSFYVPFIVTLLVYVQIYIVLRKRRTSMSKKKLSQHKEKKATQMLAIVLGVFIICWLPFFIIHILNMHCNCN-IPQALYSAFTWLGYVNSAVNPIIYTTFNVEFRKAFIKILHC-------------------------
>OLF8_RAT
---------------MNNKTVITHFLLLGLPIPPEHQQLFFALFLIMYLTTFLGNLLIVVLVQLDSHLHTPMYLFLSNLSFSDLCFSSVTMLKLLQNIQSQVPSISYAGCLTQIFFFLLFGYLGNFLLVAMAYDRYVAICFPLYTNIMSHKLCTCLLLVFWIMTSSHAMMHTLLAARL---SLNFFCDLFKLACSDTYVNELMIHIMGVIIIVIPFVLIVISYAKIISSI--------LKVPSTQSIHKVFSTCGSHLSVVSLFYGTIIGLYLCPSGDN---FSLKGSAMAMMYTVVTPMLNPFIYSLRNRDMKQALIRVTCSKKISLPW------------------
>GASR_MOUSE
GSSLCRPGVSLLNSSSAGNLSCETPRIRGTGTRELELTIRITLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVICKAVSYLMGVSVSVSTLNLAAIALERYSAICRPLARVWQTRSHAARVILATWLLSGLLMVPYPVYTVVQP---I-LQCMHL---WPSERVQQMWSVLLLILLFFIPGVVMAVAYGLISRELYLGTGPPRPNQAKLLAKKRVVRMLLVIVLLFFVCWLPVYSANTWRAFDGPGALAGAPISFIHLLSYTSACANPLVYCFMHRRFRQACLDTCARCCPRPPRARPRPLPSIASLSRLSYT
>OPRK_CAVPO
LPNGSAWLPGWAEPDGNGSAGPQDEQLEPAHISPAIPVIITAVYSVVFVVGLVGNSLVMFVIIRYTKMKTATNIYIFNLALADALVTTTMPFQSTVYLMNS-WPFGDVLCKIVISIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPLKAKIINICIWLLSSSVGISAIILGGTKVDVD-IIECSLQFPDDDYSWWDLFMKICVFVFAFVIPVLIIIVCYTLMILRLKSVRLL-SGSREKDRNLRRITRLVLVVVAVFIICWTPIHIFILVEALGSTSTAALSSYYFCIALGYTNSSLNPILYAFLDENFKRCFRDFCFPIKMRMERQSTSRVAYMRNVDGVNKP
>OLFD_CANFA
-------------MTEKNQTVVSEFVLLGLPIDPDQRDLFYALFLAMYVTTILGNLLIIVLIQLDSHLHTPMYLFLSNLSFSDLCFSSVTMPKLLQNMQSQVPSIPYAGCLTQMYFFLFFGDLESFLLVAMAYDRYVAICFPLYTTIMSPKLCFSLLVLSWVLTMFHAVLHTLLMARL---CPHFFCDMSKLACSDTQVNELVIFIMGGLILVIPFLLIITSYARIVSSI--------LKVPSAIGICKVFSTCGSHLSVVSLFYGTVIGLYLCPSANN---STVKETIMAMMYTVVTPMLNPFIYSLRNKDMKGALRRVICRKKITFSV------------------
>O2F1_HUMAN
-------------MGTDNQTWVSEFILLGLSSDWDTRVSLFVLFLVMYVVTVLGNCLIVLLIRLDSRLHTPMYFFLTNLSLVDVSYATSVVPQLLAHFLAEHKAIPFQSCAAQLFFSLALGGIEFVLLAVMAYDRYVAVCDALYSAIMHGGLCARLAITSWVSGFISSPVQTAITFQL---PDHISCELLRLACVDTSSNEVTIMVSSIVLLMTPFCLVLLSYIQIISTI--------LKIQSREGRKKAFHTCASHLTVVALCYGVAIFTYIQPHSSP---SVLQEKLFSVFYAILTPMLNPMIYSLRNKEVKGAWQKLLWKFSGLTSKLAT---------------
>OPS._MOUSE
------------MLSEASDFNSSGSRSEGSVFSRTEHSVIAAYLIVAGITSILSNVVVLGIFIKYKELRTPTNAVIINLAFTDIGVSSIGYPMSAASDLHGSWKFGHAGCQIYAGLNIFFGMVSIGLLTVVAMDRYLTISCPDVGRRMTTNTYLSMILGAWINGLFWALMPIIGWASYA--PTGATCTIN--WRNNDTSFVSYTMMVIVVNFIVPLTVMFYCYYHVSRSLRLYAS-TAHLHRDWADQADVTKMSVIMILMFLLAWSPYSIVCLWACFGNPKKIPPSMAIIAPLFAKSSTFYNPCIYVAAHKKFRKAMLAMFKCQPHLAVPEPSTLPLAPVRI------
>A1AB_RAT
HNTSAPAHWGELKDDNFTGPNQTSSNSTLPQLDVTRAISVGLVLGAFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAIADLLLSFTVLPFSATLEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFMRILGCQCRGGRRRRRRRRTYRPWTRGGSLE
>D4DR_RAT
---MGNSSATGDGGLLAGRGP---ESLGTGTGLGGAGAAALVGGVLLIGMVLAGNSLVCVSVASERILQTPTNYFIVSLAAADLLLAVLVLPLFVYSEVQGGWLLSPRLCDTLMAMDVMLCTASIFNLCAISVDRFVAVTVPL-RYNQQGQCQLLLIAATWLLSAAVAAPVVCGLN---G-R--TVCCL---------EDRDYVVYSSICSFFLPCPLMLLLYWATFRGLRRWPAPRKRGAKITGRERKAMRVLPVVVGAFLMCWTPFFVVHITRALCPACFVSPRLVSAVTWLGYVNSALNPIIYTIFNAEFRSVFRKTLRLRC-----------------------
>GP52_HUMAN
SRWTEWRILNMSSGIVNASERHSCPLGFGHYSVVDVCIFETVVIVLLTFLIIAGNLTVIFAFHCAPLHHYTTSYFIQTMAYADLFVGVSCLVPTLSLLHYSTGVHESLTCRVFGYIISVLKSVSMACLACISVDRYLAITKPLYNQLVTPCRLRICIILIWIYSCLIFLPSFFGWG---PGY-IFEWCAT-----SWLTSAYFTGFIVCLLYAPAAFVVCFTYFHIFKICRQHFPSDSSRETGHSPDRRYAMVLFRITSVFYMLWLPYIIYFLLESSRV--LDNPTLSFLTTWLAVSNSFCNCVIYSLSNGVFRLGLRRLFETMCTSCMCVKDQEARANSCSI-----
>ACTR_MOUSE
-------------MKHIINSYEHTNDTARNNSDCPDVVLPEEIFFTISVIGILENLIVLLAVIKNKNLQSPMYFFICSLAISDMLGSLYKILENILIMFRNMGSFESTADDIIDCMFILSLLGSIFSLSVIAADRYITIFHALYHSIVTMRRTIITLTIIWMFCTGSGITMVIFFS----------------------HHHIPTVLTFTSLFPLMLVFILCLYIHMFLLARSH---ARKISTLPRTNMKGAMTLTILLGVFIFCWAPFVLHVLLMTFCPNNVCYMSLFQVNGMLIMCNAVIDPFIYAFRSPELRDAFKRMLFCNRY----------------------
>SSR4_HUMAN
GEEGLGTAWPSAANASSAPAEAEEAVAGPGDARAAGMVAIQCIYALVCLVGLVGNALVIFVILRYAKMKTATNIYLLNLAVADELFMLSVPFVASSAALRH-WPFGSVLCRAVLSVDGLNMFTSVFCLTVLSVDRYVAVVHPLAATYRRPSVAKLINLGVWLASLLVTLPIAIFADTRPGGQ-AVACNLQ---WPHPAWSAVFVVYTFLLGFLLPVLAIGLCYLLIVGKMRAVALR-AGWQQRRRSEKKITRLVLMVVVVFVLCWMPFYVVQLLNLVVTS--LDATVNHVSLILSYANSCANPILYGFLSDNFRRSFQRVLCLRCCLLEGAGGAEETALKSKGGAGCM
>OPSD_PIG
MNGTEGPNFYVPFSNKTGVVRSPFEYPQYYLAEPWQFSMLAAYMFMLIVLGFPINFLTLYVTVQHKKLRTPLNYILLNLAVADLFMVFGGFTTTLYTSLHGYFVFGPTGCNLEGFFATLGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGLALTWVMALACAAPPLVGWSRYIPEGLQCSCGIDYYTLKPEVNNESFVIYMFVVHFSIPLVIIFFCYGQLVFTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWLPYASVAFYIFTHQGSDFGPIFMTIPAFFAKSASIYNPVIYIMMNKQFRNCMLTTLCCGKNPLGDDEASTTTSKTETSQVAPA
>GRHR_SHEEP
--MANGDSPDQNENHCSAINSSILLTPGSLPTLTLSGKIRVTVTFFLFLLSTIFNTSFLLKLQNWTQKLSKMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGELLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITRPL-AVKSNSKLGQFMIGLAWLLSSIFAGPQLYIFGMIHEG--FSQCVTH--SFPQWWHQAFYNFFTFSCLFIIPLLIMLICNAKIIFTLTRVPHKNQSKNNIPQARLRTLKMTVAFATSFTVCWTPYYVLGIWYWFDPDMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>DOP1_DROME
YDGTTLTSFYNESSWTNASEMDTIVGEEPEPLSLVSIVVVGIFLSVLIFLSVAGNILVCLAIYTDGSPRIG-NLFLASLAIADLFVASLVMTFAGVNDLLGYWIFGAQFCDTWVAFDVMCSTASILNLCAISMDRYIHIKDPLYGRWVTRRVAVITIAAIWLLAAFVSFVPISLGIHRPLI-KYPTCALD--------LTPTYAVVSSCISFYFPCVVMIGIYCRLYCYAQKHFKVHTHSSPYHVSDHKAAVTVGVIMGVFLICWVPFFCVNITAAFCKTC-IGGQTFKILTWLGYSNSAFNPIIYSIFNKEFRDAFKRILTMRNP----------------------
>AA1R_RABIT
----------------------------MPPSISAFQAAYIGIEVLIALVSVPGNVLVIWAVKVNQALRDATFCFIVSLAVADVAVGALVIPLAILINIG--PETYFHTCLMVACPVLILTQSSILALLAIAVDRYLRVKIPLYKAVVTPRRAAVAIAGCWILSLVVGLTPMFGWNNLRN-G-VIKCEFE-----KVISMEYMVYFNFFVWVLPPLLLMVLIYLEVFYLIRRQK-ASGDPHKYYGKELKIAKSLALILFLFALSWLPLHILNCVTLFCPSCQKPSILVYTAIFLTHGNSAMNPIVYAFRIHKFRVTFLKIWNDHFRCRPAPAGDGDPND---------
>OPSI_ASTFA
LNGFEGDNFYIPMSNRTGLVRDPFVYEQYYLAEPWQFKLLACYMFFLICLGLPINGFTLFVTAQHKKLQQPLNFILVNLAVAGMIMVCFGFTITISSAVNGYFYFGPTACAIEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSASHALGGIGFTWFMAMTCAAPPLVGWSRYIPEGLQCSCGPDYYTLNPKYNNESYVIYMFVVHFIVPVTVIFFTYGRLVCTVKSAAAAQQDSASTQKAEKEVTRMVILMVVGFLVAWTPYATVAAWIFFNKGAAFTAQFMAVPAFFSKSSALFNPIIYVLLNKQFRNCMLTTLFCGKNPLGDEESSTVTVSS----VSPA
>ACTR_HUMAN
-------------MKHIINSYENINNTARNNSDCPRVVLPEEIFFTISIVGVLENLIVLLAVFKNKNLQAPMYFFICSLAISDMLGSLYKILENILIILRNMGSFETTADDIIDSLFVLSLLGSIFSLSVIAADRYITIFHALYHSIVTMRRTVVVLTVIWTFCTGTGITMVIFFS----------------------HHHVPTVITFTSLFPLMLVFILCLYVHMFLLARSH---TRKISTLPRANMKGAITLTILLGVFIFCWAPFVLHVLLMTFCPSNACYMSLFQVNGMLIMCNAVIDPFIYAFRSPELRDAFKKMIFCSRYW---------------------
>CCR4_RAT
IYTSDNYSEEVG-SGDYDSNKEPC---FRDENENFNRIFLPTIYFIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAMAD--WYFGKFLCKAVHIIYTVNLYSSVLILAFISLDRYLAIVHATSQSARKLLAEKAVYVGVWIPALLLTIPDIIFADVSQ-DG-RYICDRL---YPDSLWMVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYVGISIDSFILLESVVHKWISITEALAFFHCCLNPILYAFLGAKFKSSAQHALNSMSRGSSLKILSKG----------GH
>OPSD_ANOCA
MNGTEGQNFYVPMSNKTGVVRNPFEYPQYYLADPWQFSALAAYMFLLILLGFPINFLTLFVTIQHKKLRTPLNYILLNLAVANLFMVLMGFTTTMYTSMNGYFIFGTVGCNIEGFFATLGGEMGLWSLVVLAVERYVVICKPMSNFRFGETHALIGVSCTWIMALACAGPPLLGWSRYIPEGMQCSCGVDYYTPTPEVHNESFVIYMFLVHFVTPLTIIFFCYGRLVCTVKAAAAQQQESATTQKAEREVTRMVVIMVISFLVCWVPYASVAFYIFTHQGSDFGPVFMTIPAFFAKSSAIYNPVIYILMNKQFRNCMIMTLCCGKNPLGDEETSAGTSTVSTSQVSPA
>ML1A_CHICK
-------MRANGSELNGTVLPRDPPAEGSPRRPPWVTSTLATILIFTIVVDLLGNLLVILSVYRNKKLRNAGNIFVVSLAIADLVVAIYPYPLVLTSVFHNGWNLGYLHCQISGFLMGLSVIGSIFNITGIAINRYCYICHSLYDKLYSDKNSLCYVGLIWVLTVVAIVPNLFVGS-LQYDP-IYSCTFA------QSVSSAYTIAVVFFHFILPIAIVTYCYLRIWILVIQVRR-PDNNPRLKPHDFRNFVTMFVVFVLFAVCWAPLNFIGLAVAVDPETRIPEWLFVSSYYMAYFNSCLNAIIYGLLNQNFRREYKKIVVSFCTAKAFFQDSSNSKPSPLITNNNQ
>EDG1_HUMAN
VKAHRSSVSDYVNYDIIVRHYNYTGKLNISADKENSIKLTSVVFILICCFIILENIFVLLTIWKTKKFHRPMYYFIGNLALSDLLAG-VAYTANLLLSGATTYKLTPAQWFLREGSMFVALSASVFSLLAIAIERYITMLKMKLHNGSNNFRLFLLISACWVISLILGGLPIMGWNCI-S--ALSSCSTV------LPLYHKHYILFCTTVFTLLLLSIVILYCRIYSLVRTRLTFISKASRSSE-NVALLKTVIIVLSVFIACWAPLFILLLLDVGCKVKCDILFRAEYFLVLAVLNSGTNPIIYTLTNKEMRRAFIRIMSCCKCPSGDSAGKFKEFSRSKSDNSSH
>OPS1_DROPS
APS---NGSVVDKVTPDMAHLISPYWDQFPAMDPIWAKILTAYMIIIGMISWCGNGVVIYIFATTKSLRTPANLLVINLAISDFGIMITNTPMMGINLYFETWVLGPMMCDIYAGLGSAFGCSSIWSMCMISLDRYQVIVKGMAGRPMTIPLALGKIAYIWFMSTIWCCLAPVFWSRYVPEGNLTSCGID--YLERDWNPRSYLIFYSIFVYYIPLFLICYSYWFIIAAVSAHMNVRSSEDADKSAEGKLAKVALVTISLWFMAWTPYLVINCMGLFKFEG-LTPLNTIWGACFAKSAACYNPIVYGISHPKYRLALKEKCPCCVFGKVDDGK-SSTSEAESKA----
>PAFR_CAVPO
----------------------MELNSSSRVDSEFRYTLFPIVYSIIFVLGIIANGYVLWVFARLYPKLNEIKIFMVNLTVADLLFLITLPLWIVYYSNQGNWFLPKFLCNLAGCLFFINTYCSVAFLGVITYNRFQAVKYPITAQATTRKRGIALSLVIWVAIVAAASYFLVMDSTN-G--NITRCFEH---YEKGSKPVLIIHICIVLGFFIVFLLILFCNLVIIHTLLRQP---VKQQRNAEVRRRALWMVCTVLAVFVICFVPHHMVQLPWTLAELGQAINDAHQVTLCLLSTNCVLDPVIYCFLTKKFRKHLSEKLNIMRSSQKCSRVTTDPINHTPVNPIKN
>OPSB_MOUSE
-----MSGEDDFYLFQNISSVGPWDGPQYHLAPVWAFRLQAAFMGFVFFVGTPLNAIVLVATLHYKKLRQPLNYILVNVSLGGFLFCIFSVFTVFIASCHGYFLFGRHVCALEAFLGSVAGLVTGWSLAFLAFERYVVICKPFGSIRFNSKHALMVVLATWIIGIGVSIPPFFGWSRFIPEGLQCSCGPDWYTVGTKYRSEYYTWFLFIFCFIIPLSLICFSYSQLLRTLRAVAAQQQESATTQKAEREVSHMVVVMVGSFCLCYVPYAALAMYMVNNRNHGLDLRLVTIPAFFSKSSCVYNPIIYCFMNKQFRACILEMVCRKPMAD---ESDVSSTVSSSKVGPH-
>HH2R_CANFA
-------------------MISNGTGSSFCLDSPPCRITVSVVLTVLILITIAGNVVVCLAVGLNRRLRSLTNCFIVSLAITDLLLGLLVLPFSAFYQLSCRWSFGKVFCNIYTSLDVMLCTASILNLFMISLDRYCAVTDPLYPVLITPVRVAVSLVLIWVISITLSFLSIHLGWN--RN-TIPKCKVQ--------VNLVYGLVDGLVTFYLPLLVMCITYYRIFKIARDQR--MGSWKAATIGEHKATVTLAAVMGAFIICWFPYFTVFVYRGLKGDD-INEAFEAVVLWLGYANSALNPILYATLNRDFRTAYQQLFRCRPASHNAQETSLRRNQSREPMR---
>C3.1_MOUSE
--MSTSFPELDLENFEYDDSAEAC---YLGDIVAFGTIFLSVFYALVFTFGLVGNLLVVLALTNSRKPKSITDIYLLNLALSDLLFVATLPFWTHYLISHEG--LHNAMCKLTTAFFFIGFFGGIFFITVISIDRYLAIVLAASMNNRTVQHGVTISLGVWAAAILVASPQFMFTKRKD-----NECLGDYPEVLQEMWPVLRNSEVNILGFALPLLIMSFCYFRIIQTL---------FSCKNRKKARAVRLILLVVFAFFLFWTPYNIMIFLETLKFYNRDLRLALSVTETVAFSHCCLNPFIYAFAGEKFRRYLGHLYRKCLAVLCGHPVHTGRSRQDSILSS-F
>ML1._SHEEP
---------MGRTLAVPTPYGCIGCKLPQPDYPPALIVFMFCAMVITIVVDLIGNSMVILAVSKNKKLRNSGNVFVVSLSVADMLVAIYPYPLMLHAMAIGGWDLSKLQCQMVGFITGLSVVGSIFNIMAIAINRYCYICHSLYERIFSVRNTCIYLAVTWIMTVLAVLPNMYIGT-IEYDP-TYTCIFN------YVNNPAFAVTIVCIHFVLPLLIVGFCYVKIWTKVLAARD-AGQNPDNQLAEVRNFLTMFVIFLLFAVCWCPINALTVLVAVNPKEKIPNWVYLAAYFIAYFNSCLNAVIYGVLNENFRREYWTIFHAMRHPVLFLSGLLTAQAHTHARARAR
>GRPR_MOUSE
NCSHLNLDVDPFLSCNDTFNQSLSPPKMDNWFHPGFIYVIPAVYGLIIVIGLIGNITLIKIFCTVKSMRNVPNLFISSLALGDLLLLVTCAPVDASKYLADRWLFGRIGCKLIPFIQLTSVGVSVFTLTALSADRYKAIVRPMIQASHALMKICLKAALIWIVSMLLAIPEAVFSDLHPQT--FISCAPY---HSNELHPKIHSMASFLVFYVIPLAIISVYYYFIARNLIQSLPVNIHVKKQIESRKRLAKTVLVFVGLFAFCWLPNHVIYLYRSYHYSEMLHFVTSICARLLAFTNSCVNPFALYLLSKSFRKQFNTQLLCCQPGLMNR--SHSMTSFKSTNP-SA
>OPS._HUMAN
------------MLRNNLGNSSDSKNEDGSVFSQTEHNIVATYLIMAGMISIISNIIVLGIFIKYKELRTPTNAIIINLAVTDIGVSSIGYPMSAASDLYGSWKFGYAGCQVYAGLNIFFGMASIGLLTVVAVDRYLTICLPDVGRRMTTNTYIGLILGAWINGLFWALMPIIGWASYA--PTGATCTIN--WRKNDRSFVSYTMTVIAINFIVPLTVMFYCYYHVTLSIKHHSD-TESLNRDWSDQIDVTKMSVIMICMFLVAWSPYSIVCLWASFGDPKKIPPPMAIIAPLFAKSSTFYNPCIYVVANKKFRRAMLAMFKCQTHQTMPVTSILPLASGRI------
>5HT1_DROME
VSYQGITSSNLGDSNTTLVPLLEEFAAGEFVLPPLTSIFVSIVLLIVILGTVVGNVLVCIAVCMVRKLRRPCNYLLVSLALSDLCVALLVMPMALLYEVLEKWNFGPLLCDIWVSFDVLCCTASILNLCAISVDRYLAITKPLYGVKRTPRRMMLCVGIVWLAAACISLPPLLILGN--E-G--PICTVC--------QNFAYQIYATLGSFYIPLSVMLFVYYQIFRAARRILLGHKKLRFQLAKEKKASTTLGIIMSAFTVCWLPFFILALIRPFETM-HVPASLSSLFLWLGYANSLLNPIIYATLNRDFRKPFQEILYFRCSSLNTMMRENYPPSQRVMLGDER
>CRH2_MOUSE
------MANVTLKPLCPLLEEMVQLPNHSNSSLRYIDHVSVLLHGLASLLGLVENGLILFVVGCRMR-QTVVTTWVLHLALSDLLAAASLPFFTYFLAVGHSWELGTTFCKLHSSVFFLNMFASGFLLSAISLDRCLQVVRPVAQNHRTVAVAHRVCLMLWALAVLNTIPYFVFRDTIPWNPRDTTCDY---------RQKALAVSKFLLAFMVPLAIIASSHVAVSLRL---------HHRGRQRTGRFVRLVAAIVVAFVLCWGPYHIFSLLEARAHS-QLASRGLPFVTSLAFFNSVVNPLLYVFTCPDMLYKLRRSLRAVLESVLVEDSDQSRRASSTATPAST
>ML1C_CHICK
-------------MERPGSNGSCSGCRLEGGPAARAASGLAAVLIVTIVVDVLGNALVILSVLRNKKLRNAGNIFVVSLSVADLVVAVYPYPLILSAIFHNGWTMGNIHCQISGFLMGLSVIGSIFNITAIAINRYCYICHSLYDKLFNLKNTCCYICLTWTLTVVAIVPNFFVGS-LQYDP-IYSCTFA------QTVSTSYTITVVVVHFIVPLSIVTFCYLRIWILVIQVHR-QDCKQKIRAADIRNFLTMFVVFVLFAVCWGPLNFIGLAVSINPSKHIPEWLFVLSYFMAYFNSCLNAVIYGLLNQNFRKEYKRILLMLRTPRLLFIDVSKSKPSPAVTNNNQ
>5H1F_HUMAN
-------------MDFLNSSD-QNLTSEELLNRMPSKILVSLTLSGLALMTTTINSLVIAAIIVTRKLHHPANYLICSLAVTDFLVAVLVMPFSIVYIVRESWIMGQVVCDIWLSVDITCCTCSILHLSAIALDRYRAITDAVYARKRTPKHAGIMITIVWIISVFISMPPLFWR----S-R--DECIIK-------HDHIVSTIYSTFGAFYIPLALILILYYKIYRAAKTLFKHWRRQKISGTRERKAATTLGLILGAFVICWLPFFVKELVVNVCDKCKISEEMSNFLAWLGYLNSLINPLIYTIFNEDFKKAFQKLVRCRC-----------------------
>GRHR_PIG
--MANSASPEQNQNHCSAINSSILLTQGNLPTLTLSPNIRVTVTFFLFLLSTAFNASFLLKLQKWTQKLSRMKVLLKHLTLANLLETLIVMPLDGMWNITVQWYAGEFLCKVLSYLKLFSMYAPAFMMVVISLDRSLAITRPL-AVKSNSRLGRFMIGLAWLLSSIFAGPQLYIFRMIHEG--FSQCVTH--SFPQWWHQAFYDFFTFSCLFIIPLLIMLICNAKIMFTLTRVPHNNQSKNNIPRARLRTLKMTVAFAASFIVCWTPYLVLGIWYWFDPEMRVSDPVNHFFFLFAFLNPCFDPLIYGYFSL-------------------------------------
>MAS_MOUSE
------MDQSNMTSLAEEKAMNTSSRNASLGSSHPPIPIVHWVIMSISPLGFVENGILLWFLCFRMR-RNPFTVYITHLSMADISLLFCIFILSIDYALDYESSGHHYTIVTLSVTFLFGYNTGLYLLTAISVERCLSVLYPIYTSHRPKHQSAFVCALLCALSCLVTTMEYVMCI-DSHS--RSDC-----------RAVIIFIAILSFLVFTPLMLVSSSILVVKIRK----------NTWASHSSKLYIVIMVTIIIFLIFAMPMRVLYLLYYEYW--SAFGNLHNISLLFSTINSSANPFIYFFVGSSKKKRFRESLKVVLTRAFKDEMQPRTVSIETVV----
>UL33_HCMVA
----------------------------MTGPLFAIRTTEAVLNTFIIFVGGPLNAIVLITQLLTNRGYSTPTIYMTNLYSTNFLTLTVLPFIVLSNQ-WL-LPAGVASCKFLSVIYYSSCTVGFATVALIAADRYR-VLHKRTYARQSYRSTYMILLLTWLAGLIFSVPAAVYTTVVMN-NGHATCVLYVAEEVH-TVLLSWKVLLTMVWGAAPVIMMTWFYAFFYSTV---------QRTSQKQRSRTLTFVSVLLISFVALQTPYVSLMIFNSYATTATLRRTIGTLARVVPHLHCLINPILYALLGHDFLQRMRQCFRGQLLDRRAFLRSQQTNLAAGNNSQSV
>CKR5_GORGO
----MDYQVSSPTYDIDYYTSEPC---QKTNVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>PE21_MOUSE
--MSPCG-LNLSLADEAATCATPRLPNTSVVLPTGDNGTSPALPIFSMTLGAVSNVLALALLAQVAGSAATFLLFVASLLAIDLAGHVIPGALVLRLYTAG-RAPAGGACHFLGGCMVFFGLCPLLLGCGMAVERCVGVTQPLHAARVSVARARLALAVLAAMALAVALLPLVHVGRYELQYPGTWCFIS--LGPRGGWRQALLAGLFAGLGLAALLAALVCNTLSGLALLRDRRRSRGPRRVHAHDVEMVGQLVGIMVVSCICWSPLLVLVVLAIGGWNSNSLQRPLFLAVRLASWNQILDPWVYILLRQAMLRQLLRLLPLRVSAKGGPTELGLSSLRSSRHSGFS
>A1AB_MESAU
HNTSAPAQWGELKDANFTGPNQTSSNSTLPQLDVTRAISVGLVLGAFILFAIVGNILVILSVACNRHLRTPTNYFIVNLAIADLLLSFTVLPFSATLEVLGYWVLGRIFCDIWAAVDVLCCTASILSLCAISIDRYIGVRYSLYPTLVTRRKAILALLSVWVLSTVISIGPLLGWKEP--AP--KECGVT--------EEPFYALFSSLGSFYIPLAVILVMYCRVYIVAKRTHNPIAVKLFKFSREKKAAKTLGIVVGMFILCWLPFFIALPLGSLFSTLKPPDAVFKVVFWLGYFNSCLNPIIYPCSSKEFKRAFMRILGCQCRSGRRRRRRRRTYRPWTRGGSLE
>O3A2_HUMAN
----------MEPEAGTNRTAVAEFILLGLVQTEEMQPVVFVLLLFAYLVTTGGNLSILAAVLVEPKLHAPMYFFLGNLSVLDVGCITVTVPAMLGRLLSHKSTISYDACLSQLFFFHLLAGMDCFLLTAMAYDRLLAICQPLYSTRMSQTVQRMLVAASLACAFTNALTHTVAMSTL---NNHFYCDLPQLSCSSTQLNELLLFAVGFIMAGTPLVLIITAYSHVAAAV--------LRIRSVEGRKKAFSTCGSHLTVVCLFFGRGIFNYMRLGSEE---ASDKDKGVGVFNTVINPMLNPLIYSLRNPDVQGALWQIFLGRRSLTA-------------------
>OPSU_CARAU
-------MDAWTYQFGNLSKISPFEGPQYHLAPKWAFYLQAAFMGFVFFVGTPLNAIVLFVTMKYKKLRQPLNYILVNISLGGFIFDTFSVSQVFFSALRGYYFFGYTLCAMEAAMGSIAGLVTGWSLAVLAFERYVVICKPFGSFKFGQSQALGAVALTWIIGIGCATPPFWGWSRYIPEGIGTACGPDWYTKNEEYNTESYTYFLLVSCFMMPIMIITFSYSQLLGALRAVAAQQAESASTQKAEKEVSRMVVVMVGSFVVCYGPYAITALYFSYAEDSNKDYRLVAIPSLFSKSSCVYNPLIYAFMNKQFNACIMETVFGKKIDE-----SSESSVSA-------
>PAFR_RAT
----------------------MEQNGSFRVDSEFRYTLFPIVYSVIFVLGVVANGYVLWVFATLYPKLNEIKIFMVNLTVADLLFLMTLPLWIVYYSNEGDWIVHKFLCNLAGCLFFINTYCSVAFLGVITYNRYQAVAYPITAQATTRKRGITLSLVIWISIAATASYFLATDSTN-G--NITRCFEH---YEPYSVPILVVHIFITSCFFLVFFLIFYCNMVIIHTLLTRP---VRQQRKPEVKRRALWMVCTVLAVFVICFVPHHVVQLPWTLAELGQAINDAHQITLCLLSTNCVLDPVIYCFLTKKFRKHLSEKFYSMRSSRKCSRATSDPANQTPVLPLKN
>YYO1_CAEEL
ISPNASNYLTYPFDGLCLQKFFYQLQTSLRRFTPYEEIIYTTVYIIISVAAVIGNGLVIMAVVRKKTMRTNRNVLILNLALSNLILAITNIPFLWLPSIDFEFPYSRFFCKFANVLPGSNIYCSTLTISVMAIDRYYSVKKLKASNRKQCFHAVLVSLAIWIVSFILSLPLLLYYETS--MQEVRQCRLVRLPDITQSIQLLMSILQVAFLYIVPLFVLSIFNVKLTRFLKTNDSHLKNNNKTNQRTNRTTSLLIAMAGSYAALWFPFTLITFLIDFELIINLVERIDQTCKMVSMLSICVNPFLYGFLNTNFRHEFSDIYYRYIRCETKSQPAGRIAHHRQDSVYND
>OPS1_LIMPO
PN-----ASVVDTMPKEMLYMIHEHWYAFPPMNPLWYSILGVAMIILGIICVLGNGMVIYLMMTTKSLRTPTNLLVVNLAFSDFCMMAFMMPTMTSNCFAETWILGPFMCEVYGMAGSLFGCASIWSMVMITLDRYNVIVRGMAAAPLTHKKATLLLLFVWIWSGGWTILPFFGWSRYVPEGNLTSCTVD--YLTKDWSSASYVVIYGLAVYFLPLITMIYCYFFIVHAVAEHNVAANADQQKQSAECRLAKVAMMTVGLWFMAWTPYLIISWAGVFSSGTRLTPLATIWGSVFAKANSCYNPIVYGISHPRYKAALYQRFPSLACGSGESGSDVKTMEEKPKIPEA-
>OPSV_APIME
YLPAGPPRLLGWNVPAEELIHIPEHWLVYPEPNPSLHYLLALLYILFTFLALLGNGLVIWIFCAAKSLRTPSNMFVVNLAICDFFMMIKTPIFIYNSFNTG-FALGNLGCQIFAVIGSLTGIGAAITNAAIAYDRYSTIARPL-DGKLSRGQVILFIVLIWTYTIPWALMPVMGVWGRFPEGFLTSCSFD--YLTDTNEIRIFVATIFTFSYCIPMILIIYYYSQIVSHVVNHNVDSNANTSSQSAEIRIAKAAITICFLYVLSWTPYGVMSMIGAFGNKALLTPGVTMIPACTCKAVACLDPYVYAISHPKYRLELQKRLPWLELQEKPISDSTSTPPASS------
>5H1B_DIDMA
PPASGSLTSSQTNHSTFPNPNAPDLEPYQDSIALPWKVLLATFLGLITLGTTLSNAFVIATVSRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDFWLSSDITCCTASILHLCVIALDRYWAITDAVYSAKRTPKRAAGMIIMVWVFSVSISMPPLFWR------E--ADCSVN-------TDHILYTVYSTVGAFYFPTLLLIALYGRIYVEARSRKVSLEKKKLMAARERKATRTLGIILGAFIVCWLPFFIISLALPICDDAWFHLAIFDFFNWLGYLNSLINPIIYTKSNDDFKQAFQKLMRFRRTS---------------------
>NY2R_PIG
EEMKMEPSGPGHTTPRGELAPDSEPELKDSTKLIEVQIILILAYCSIILLGVVGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKRISFLIIGLAWGISALLASPLAIFREYS-FE--IVACTEKWPGEEKSIYGTVYSLSSLLILYVLPLGIISFSYARIWSKLKNHSPG-GVNDHYHQRRQKTTKMLVCVVVVFAVSWLPLHAFQLAVDIDSQVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKNLEATKNGG
>B1AR_.ENLA
WGPMEC--RNRS----GTPTTVPSPMHPLPELTHQWTMGMTMFMAAIILLIVMGNIMVIVAIGRNQRLQTLTNVFITSLACADLIMGLFVVPLGATLVVSGRWLYGSIFCEFWTSVDVLCVTASIETLCVISIDRYIAITSPFYQSLLTKGRAKGIVCSVWGISALVSFLPIMMHWWRDM-K--GCCDFV--------TNRAYAIASSIISFYFPLIIMIFVYIRVFKEAQKQHGRRILSKILVAKEQKALKTLGIIMGTFTLCWLPFFLANVVNVFYRNL-IPDKLFLFLNWLGYANSAFNPIIYCRSP-DFRKAFKRLLCCPKKADRHLHTTGEFVNSLDTN---A
>D2DR_HUMAN
---MDPLNLSWYDDDLERQNWSRPFNGSDGKADRPHYNYYATLLTLLIAVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWKFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAVAMPMNTRYSSKRRVTVMISIVWVLSFTISCPLLFGLN---Q----NECII---------ANPAFVVYSSIVSFYVPFIVTLLVYIKIYIVLRRRRTSMSRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNIHCDCN-IPPVLYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFLKILHC-------------------------
>FML2_HUMAN
-----------METNFSIPLNETEEVLPEPAGHTVLWIFSLLVHGVTFVFGVLGNGLVIWVAGFRMT-RTVNTICYLNLALADFSFSAILPFRMVSVAMREKWPFASFLCKLVHVMIDINLFVSVYLITIIALDRCICVLHPAAQNHRTMSLAKRVMTGLWIFTIVLTLPNFIFWTTISAFWAVERLNVF------ITMAKVFLILHFIIGFTVPMSIITVCYGIIAAKI---------HRNHMIKSSRPLRVFAAVVASFFICWFPYELIGILMAVWLKEKIILVLINPTSSLAFFNSCLNPILYVFMGRNFQERLIRSLPTSLERALTEVPDSATSAS--------
>ACM5_MACMU
-------MEGDSYHNATTVNGTPVYHQPLERHRLWEVISIAAVTAVVSLITIVGNVLVMISFKVNSQLKTVNNYYLLSLACADLIIGIFSMNLYTTYILMGRWALGSLACDLWLALDYVASNASVMNLLVISFDRYFSITRPLYRAKRTPKRAGVMIGLAWLISFILWAPAILCWQYLV-VP-LDECQIQ------FLSEPTITFGTAIAAFYIPVSVMTILYCRIYRETEKRNPSTKRKRMVLVKERKAAQTLSAILLAFIITWTPYNIMVLVSTFCDKC-VPVTLWHLGYWLCYVNSTVNPICYALCNRTFRKTFKMLLLCRWKKKKVEEKLYW------------
>PE22_CANFA
----------------MGSISNNSGSEDCESREWLPSGESPAISSAMFSAGVLGNLIALALLARRWRSISLFHVLVTELVFTDLLGTCLISPVVLASYARNQLEPERRACTYFAFAMTFFSLATMLMLFAMALERYLSIGRPYYQRHVTRRGGLAVLPTIYTVSLLFCSLPLLGYGQYVQYCPGTWCFIR--------HGRTAYLQLYATLLLLLIVAVLACNFSVILNLIRMDGSRRGERVSVAEETDHLILLAIMTITFAICSLPFTIFAYMNETSS---RREKWDLQALRFLSINSIIDPWVFAIFRPPVLRLMRSVLCCRVSLRAQDATQTSSRLTFVDTS---
>5H1B_RABIT
AAGSQIAVPQANLSAAHSHNCSAEGYIYQDSIALPWKVLLVLLLALFTLATTLSNAFVVATVYRTRKLHTPANYLIASLAVTDLLVSILVMPISTMYTVTGRWTLGQVVCDLWLSSDITCCTASIMHLCVIALDRYWAITDAVYSAKRTPKRAAIMIRLVWVFSICISLPPFFWR----E-E--SECLVN-------TDHVLYTVYSTVGAFYLPTLLLIALYGRIYVEARSRRVSLEKKKLMAARERKATKTLGIILGVFIVCWLPFFIISLVMPICKDAWFHQAIFDFFTWLGYVNSLINPIIYTMSNEDFKQAFHKLIRFKCTS---------------------
>GLHR_ANTEL
HSNHTPNGTQFHQCSKIPVQCVPKSDAFHPCEDIMGYVWLTVVSFMVGAVALVANLVVALVLLTSQRRLNVTRFLMCNLAFADFILGLYIFILTSVSAVTRGWQNG-AGCKILGFLAVFSSELSLFTLVMMTIERFYAIVHAMMNARLSFRKTVRFMIGGWIFALVMAVVPLTGVSGY----KVAICLPF-----VSDATSTAYVAFLLLVNGASFISVMYLYSRMLYVVVSG---GDMEGAPKRNDSKVAKRMAILVFTDMLCWAPIAFFGLLAAFGQTLLTVTQSKILLVFFFPINSICNPFLYAFFTKAFKRELFTALSRIGFCKFRALKYNGSRSRRHHSTVNA
>MSHR_OVIMO
PALGSQRRLLGSLNCTPPATLPLTLAPNRTGPQCLEVSIPNGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFVCCLAMSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICSSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSVLSITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACRHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>P2YR_CHICK
PELLAG-----------GWAAGNATTKCSLTKTGFQFYYLPTVYILVFITGFLGNSVAIWMFVFHMRPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISVHRYTGVVHPLSLGRLKKKNAVYVSSLVWALVVAVIAPILFYSGTG----KTITCYDT-TADEYLRSYFVYSMCTTVFMFCIPFIVILGCYGLIVKALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYLPFHVMKTLNLRARLDDKVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKSSRRSEPNVQSKSLTEYKQNGDTSL
>ADMR_MOUSE
LEPDNDFRDIHNWTELLHLFNQTFTDCHIEFNENTKHVVLFVFYLAIFVVGLVENVLVICVNCRRSGRVGMLNLYILNMAIADLGIILSLPVWMLEVMLYETWLWGSFSCRFIHYFYLVNMYSSIFFLTCLSIDRYVTLTNTSSWQRHQHRIRRAVCAGVWVLSAIIPLPEVVHIQLLD--E--PMCLFLAPFETYSAWALAVALSATILGFLLPFLLIAVFNILTACRL---------RRQRQTESRRHCLLMWAYIVVFAICWLPYQVTMLLLTLHGTHNLLYFFYEIIDCFSMLHCVANPILYNFLSPSFRGRLLSLVVRYLPKEQARAAGGRQHSIIITKEGSL
>FSHR_BOVIN
FDVMYSEFDYDLCNEVVDVTCSPEPDAFNPCEDIMGDDILRVLIWFISILAITGNILVLVILITSQYKLTVPRFLMCNLAFADLCIGIYLLLIASVDVHTKTWQTG-AGCDAAGFFTVFASELSVYTLTAITLERWHTITHAMLECKVQLRHAASIMLVGWIFAFAVALFPIFGISSY----KVSICLPM-----IDSPLSQLYVMSLLVLNVLAFVVICGCYTHIYLTVRNP------NITSSSSDTKIAKRMAMLIFTDFLCMAPISFFAISASLKVPLITVSKSKILLVLFYPINSCANPFLYAIFTKNFRRDFFILLSKFGCYEVQAQTYRSNFHPRNGHCPPA
>B3AR_RAT
MAPWPHKNGSLAFWSDAPTLDPSAAN---TSGLPGVPWAAALAGALLALATVGGNLLVITAIARTPRLQTITNVFVTSLATADLVVGLLVMPPGATLALTGHWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGTLVTKRRARAAVVLVWIVSATVSFAPIMSQWWRVQ-E--RCCSFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVAKRQGVPRRPARLLPLGEHRALRTLGLIMGIFSLCWLPFFLANVLRALVGPS-VPSGVFIALNWLGYANSAFNPLIYCRSP-DFRDAFRRLLCSYGGRGPEEPRVVTSRQNSPLNRFDG
>D3DR_CERAE
--------MAPLSQLSGHLNYTCGVENSTGASQARPHAYYALSYCALILAIVFGNGLVCMAVLKERALQTTTNYLVVSLAVADLLVATLVMPWVVYLEVTGGWNFSRVCCDVFVTLDVMMCTASILNLCAISIDRYTAVVMPVGTGQSSCRRVTLMITAVWVLAFAVSCPLLFGFN---D-P--TVCSI---------SNPDFVIYSSVVSFYLPFGVTVLVYARIYVVLKQRTSLPLQPRGVPLREKKATQMVAIVLGAFIVCWLPFFLTHVLNTHCQTCHVSPELYSATTWLGYVNSALNPVIYTTFNIEFRKAFLKILSC-------------------------
>NK2R_MOUSE
----MGAHASVTDTNILSGLESNATGVTAFSMPGWQLALWATAYLALVLVAVTGNATVIWIILAHERMRTVTNYFIINLALADLCMAAFNATFNFIYASHNIWYFGSTFCYFQNLFPVTAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPSTKAVIAVIWLVALALASPQCFYST---GA---TKCVVAWPNDNGGKMLLLYHLVVFVLIYFLPLVVMFAAYSVIGLTLWKRPRHHGANLRHLQAKKKFVKAMVLVVVTFAICWLPYHLYFILGTFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWGTPTEE----HTPSISRRVNRC
>BONZ_MACNE
------MAEHDYHEDYGLNSFNDSSQEEHQDFLQFRKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WIFGQVMCKTLLGVYTINFYTSMLILTCITVDRFIVVVKATNQQAKRMTWGKVICLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DKEISTVVLATQMTLGFFLPLLAMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQTPFNLVKLIRSTHWEYTSFHYTIIVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT
>NK1R_MOUSE
-----MDNVLPVDSDLFPNTSTNTSESNQFVQPTWQIVLWAAAYTVIVVTSVVGNVVVIWIILAHKRMRTVTNYFLVNLAFAEACMAAFNTVVNFTYAVHNVWYYGLFYCKFHNFFPIAALFASIYSMTAVAFDRYMAIIHPL-QPRLSATATKVVIFVIWVLALLLAFPQGYYST---SR---VVCMIEWPEHPNRTYEKAYHICVTVLIYFLPLLVIGYAYTVVGITLE---IPSDRYHEQVSAKRKVVKMMIVVVCTFAICWLPFHIFFLLPYINPDLKFIQQVYLASMWLAMSSTMYNPIIYCCLNDRFRLGFKHAFRCCPFISAGDY----STRYLQTQSSVY
>O.2R_HUMAN
ASELNETQEPFLNPTDYDDEEFLRYLWREYLHPKEYEWVLIAGYIIVFVVALIGNVLVCVAVWKNHHMRTVTNYFIVNLSLADVLVTITCLPATLVVDITETWFFGQSLCKVIPYLQTVSVSVSVLTLSCIALDRWYAICHPL-MFKSTAKRARNSIVIIWIVSCIIMIPQAIVMECSTKTTLFTVCDER---WGGEIYPKMYHICFFLVTYMAPLCLMVLAYLQIFRKLWCRKSRVAAEIKQIRARRKTARMLMVVLLVFAICYLPISILNVLKRVFGMFETVYAWFTFSHWLVYANSAANPIIYNFLSGKFREEFKAAFSCCCLGVHHRQEDRLESRKSLTTQISN
>IL8A_RABIT
WTWFEDEFANATGMPPVEKDYSPC----LVVTQTLNKYVVVVIYALVFLLSLLGNSLVMLVILYSRSNRSVTDVYLLNLAMADLLFALTMPIWAVSKEKG--WIFGTPLCKVVSLVKEVNFYSGILLLACISVDRYLAIVHATRTLTQKRHLVKFICLGIWALSLILSLPFFLFRQVFS-NS-SPVCYED-LGHNTAKWRMVLRILPHTFGFILPLLVMLFCYGFTLRTL---------FQAHMGQKHRAMRVIFAVVLIFLLCWLPYNLVLLADTLMRTHNDIDRALDATEILGFLHSCLNPIIYAFIGQNFRNGFLKMLAARGLISKEFLTRHR------------
>AG2S_MOUSE
------MILNSSIEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYQWPFGNHLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLMAGLASLPAVIHRNVYF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFVFPFVIILTSYTLIWKALKKA----YKIQKNTPRNDDIFRIIMAIVLFFFFSWVPHQIFSFLDVLIQLGDVVDTAMPITICIAYFNNCLNPLFYGFLGKKFKRYFLQLLKYIPPKARSHAGLSTRPSD-------N
>OPSD_ANGAN
MNGTEGPNFYIPMSNITGVVRSPFEYPQYYLAEPWAYTILAAYMFTLILLGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVANLFMVFGGFTTTVYTSMHGYFVFGETGCNLEGYFATLGGEISLWSLVVLAIERWVVVCKPMSNFRFGENHAIMGLAFTWIMANSCAMPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFIVHFSVPLTIISFCYGRLVCTVKEAAAQQQESETTQRAEREVTRMVVIMVIAFLVCWVPYASVAWYIFTHQGSTFGPVFMTVPSFFAKSSAIYNPLIYICLNSQFRNCMITTLFCGKNPFQEEEGASTASSVSS--VSPA
>O3A3_HUMAN
----MSLQKLMEPEAGTNRTAVAEFILLGLVQTEEMQPVVFVLLLFAYLVTIGGNLSILAAVLVEPKLHAPMYFFLGNLSVLDVGCITVTVPAMLGRLLSHKSTISYDACLSQLFFFHLLAGMDCFLLTAMAYDRLLAICQPLYSTRMSQTVQRMLVAASWACAFTNALTHTVAMSTL---NNHFYCDLPQLSCSSTQLNELLLFVAAAFMAVAPLVFISVSYAHVVAAV--------LQIRSAEGRKKAFSTCGSHLTVVGIFYGTGVFSYMRLGSVE---SSDKDKGVGVFMTVINPMLNPLIYSLRNTDVQGALCQLLVGERSLT--------------------
>GPRC_MOUSE
NLSGLPRDCIDAGAPENISAAVPSQGSVAESEPELVVNPWDIVLCSSGTLICCENAVVVLIIFHSPSLRAPMFLLIGSLALADLLAG-LGLIINFVFA-Y--LLQSEATKLVTIGLIVASFSASVCSLLAITVDRYLSLYYALYHSERTVTFTYVMLVMLWGTSICLGLLPVMGWNCL-R--DESTCSVV-------RPLTKNNAAILSISFLFMFALMLQLYIQICKIVMRHIALHFLATSHYVTTRKGVSTLALILGTFAACWMPFTLYSLIADY----TYPSIYTYATLLPATYNSIINPVIYAFRNQEIQKALCLICCGCIPSSLSQRARSP------------
>D1DR_CARAU
-------------MAVLDLNLTTVIDSGFMESDRSVRVLTGCFLSVLILSTLLGNTLVCAAVTKFRHRSKVTNFFVISLAVSDLLVAVLVMPWKAVTEVAGFWPFG-AFCDIWVAFDIMCSTASILNLCVISVDRYWAISSPFYERKMTPRVAFVMISGAWTLSVLISFIPVQLKWHKAVN-PTDNCDSS--------LNRTYAISSSLISFYIPVAIMIVTYTQIYRIAQKQGSNESSFKLSFKRETKVLKTLSVIMGVFVCCWLPFFILNCMVPFCKRTCISPTTFDVFVWFGWANSSLNPIIYAFNA-DFRRAFAILLGCQRLCPGSI-SMET------------
>OPS1_SCHGR
GG-F-ANQTVVDKVPPEMLYLVDPHWYQFPPMNPLWHGLLGFVIGVLGVISVIGNGMVIYIFSTTKSLRTPSNLLVVNLAFSDFLMMFTMSAPMGINCYYETWVLGPFMCELYALFGSLFGCGSIWTMTMIALDRYNVIVKGLSAKPMTNKTAMLRILFIWAFSVAWTIMPLFGWNRYVPEGNMTACGTD--YLTKDWVSRSYILVYSFFVYLLPLGTIIYSYFFILQAVSAHMNVRSAEASQTSAECKLAKVALMTISLWFFGWTPYLIINFTGIFETMK-ISPLLTIWGSLFAKANAVFNPIVYGISHPKYRAALEKKFPSLACASSSDDNTSVSDEKSEKSASA-
>MSHR_CAPCA
PVLGSQRRLLGSLNCTPPATFPLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDMLICGSMVSSLCFLGAIAVDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAVLYVHMLARACQHIARKRQRPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>O3A4_HUMAN
-----------MDLGNSGNDSVTKFVLLGLTETAALQPILFVIFLLAYVTTIGGTLSILAAILMETKLHSPMYFFLGNLSLPDVGCVSVTVPAMLSHFISNDRSIPYKACLSELFFFHLLAGADCFLLTIMAYDRYLAICQSLYSSRMSWGIQQALVGMSWVFSFTNALTQTVALSPL---NNHFYCDLPQLSCASVHLNGQLLFVAAAFMGVAPLVLITVSYAHVAAAV--------LRIRSAEGKKKAFSTCSSHLTVVGIFYGTGVFSYTRLGSVE---SSDKDKGIGILNTVISPMLNPLIYWTSLLDVGCISHCSSDAGVSPGPPVQSSLCLSPPPGWGGLSP
>P70658
IYTSDNYSEEVG-SGDYDSNKEPC---FRDENVHFNRIFLPTIYFIIFLTGIVGNGLVILVMGYQKKLRSMTDKYRLHLSVADLLFVITLPFWAVDAMAD--WYFGKFLCKAVHIIYTVNLYSSVLILAFISLDRYLAIVHATSQRPRKLLAEKAVYVGVWIPALLLTIPDFIFADVSQGDD-RYICDRL---YPDSLWMVVFQFQHIMVGLILPGIVILSCYCIIISKL---------SHSKGHQKRKALKTTVILILAFFACWLPYYVGISIDSFILLGSIVHKWISITEALAFFHCCLNPILYAFLGAKFKSSAQHALNSMSRGSSLKILSKG----------GH
>OPSD_AMBTI
MNGTEGPNFYVPFSNKSGVVRSPFEYPQYYLAEPWQYSVLAAYMFLLILLGFPVNFLTLYVTIQHKKLRTPLNYILLNLAFANHFMVFGGFPVTMYSSMHGYFVFGQTGCYIEGFFATMGGEIALWSLVVLAIERYVVVCKPMSNFRFGENHAIMGVMMTWIMALACAAPPLFGWSRYIPEGMQCSCGVDYYTLKPEVNNESFVIYMFLVHFTIPLMIIFFCYGRLVCTVKEAAAQQQESATTQKAEKEVTRMVIIMVVAFLICWVPYASVAFYIFSNQGTDFGPIFMTVPAFFAKSSAIYNPVIYIVLNKQFRNCMITTICCGKNPFGDDETTSAASSVSSSQVSPA
>V2R_PIG
-MLRATTSAVPRALSWPAAPGNGSEREPLDDRDPLLARVELALLSTVFVAVALSNGLVLGALVRRGRRWAPMHVFIGHLCLADLAVALFQVLPQLAWDATYRFRGPDALCRAVKYLQMVGMYASSYMILAMTLDRHRAICRPMAYRHGGGARWNRPVLVAWAFSLLLSLPQLFIFAQRDSG--VLDCWAS---FAEPWGLRAYVTWIALMVFVAPALGIAACQVLIFREIHTSGRRPREGARVSAAMAKTARMTLVIVAVYVLCWAPFFLVQLWSVWDPKA-REGPPFVLLMLLASLNSCTNPWIYASFSSSISSELRSLLCCPRRRTPPSLRPQESFSARDTSS---
>5H7_HUMAN
GSWAPHLLS---EVTASPAPTNASGCGEQINYGRVEKVVIGSILTLITLLTIAGNCLVVISVCFVKKLRQPSNYLIVSLALADLSVAVAVMPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMAKMILSVWLLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYQIYKAARKSSRLERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTFLWLGYANSLINPFIYAFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERPEFVLR
>NTR2_RAT
-----METSSPWPPRPSPSAGLSLEARLGVDTRLWAKVLFTALYSLIFAFGTAGNALSVHVVLKARARPGRLRYHVLSLALSALLLLLVSMPMELYNFVWSHWVFGDLGCRGYYFVRELCAYATVLSVASLSAERCLAVCQPLARRLLTPRRTRRLLSLVWVASLGLALPMAVIMGQKH--AASRVCTVL---VSRATLQVFIQVN-VLVSFALPLALTAFLNGITVNHLMALVQARHKDASQIRSLQHSAQVLRAIVAVYVICWLPYHARRLMYCYIPDDDFYHYFYMVTNTLFYVSSAVTPILYNAVSSSFRKLFLESLGSLCGEQHSLVPLPQSTYSFRLWGSPR
>OPSD_RAJER
MNGTEGENFYVPMSNKTGVVRSPFDYPQYYLGEPWMFSALAAYMFFLILTGLPVNFLTLFVTIQHKKLRQPLNYILLNLAVSDLFMVFGGFTTTIITSMNGYFIFGPAGCNFEGFFATLGGEVGLWCLVVLAIERYMVVCKPMANFRFGSQHAIIGVVFTWIMALSCAGPPLVGWSRYIPEGLQCSCGVDYYTMKPEVNNESFVIYMFVVHFTIPLIVIFFCYGRLVCTVKEAAAQQQESESTQRAEREVTRMVIIMVVAFLICWVPYASVAFYIFINQGCDFTPFFMTVPAFFAKSSAVYNPLIYILMNKQFRNCMITTICLGKNPFEEEESTSAASSVSSSQVAPA
>OPSD_CAMAB
GGGF-GNQTVVDKVPPEMLHMVDAHWYQFPPMNPLWHALLGFVIGVLGVISVIGNGMVIYIFTTTKSLRTPSNLLVVNLAISDFLMMLCMSPAMVINCYYETWVLGPLFCELYGLAGSLFGCASIWTMTMIAFDRYNVIVKGLSAKPMTINGALIRILTIWFFTLAWTIAPMFGWNRYVPEGNMTACGTD--YLTKDLFSRSYILIYSIFVYFTPLFLIIYSYFFIIQAVAAHMNVRSAENQSTSAECKLAKVALMTISLWFMAWTPYLVINYSGIFETTK-ISPLFTIWGSLFAKANAVYNPIVYGISHPKYRAALFQKFPSLACTTEPTG-ADTTEGNEKPAA---
>SSR4_MOUSE
VED----TTWTPGINASWAPEQEEDAMGSDGTGTAGMVTIQCIYALVCLVGLVGNALVIFVILRYAKMKTATNIYLLNLAVADELFMLSVPFVRSAAALRH-WPFGAVLCRAVLSVDGLNMFTSVFCLTVLSVDRYVAVVHPLTATYRRPSVAKLINLGVWLASLLVTLPIAVFADTRPGGE-AVACNLH---WPHPAWSAVFVIYTFLLGFLPPVLAIGLCYLLIVGKMRAVALR-GGWQQRRRSEKKITRLVLMVVTVFVLCWMPFYVVQLLNLFVTS--LDATVNHVSLILSYANSCANPILYGFLSDNFRRSFQRVLCLRCCLLETTGGAEETALKSRGGAGCI
>NY1R_.ENLA
NFSTYFENLSVPNNISG---NITFPISEDCALPLPMIFTLALAYGAVIILGLSGNLALIIIILKQKEMRNVTNILIVNLSFSDLLATIMCLPFTLIYTLMDHWIFGEVMCKLNEYIQCVSVTVSIFSLVLIAIERHQLIINPR-GWRPNNRHACFGITVIWGFAMACSTPLMMYSVLTDIG--KYVCLED---FPEDKFRLSYTTLLFILQYLGPLCFIFVCYTKIFLRLKRRNMMIRDNKYRSSETKRINIMLLSIVVGFALCWLPFFIFNLVFDWNHEACNHNLLFLICHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSREDDY---TMHTDVSKTSLK
>LSHR_PIG
ESELSDWDYDYGFCSPKTLQCAPEPDAFNPCEDIMGYDFLRVLIWLINILAIMGNVTVLFVLLTSHYKLTVPRFLMCNLSFADFCMGLYLLLIASVDAQTKGWQTG-NGCSVAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLRHAIPIMLGGWLFSTLIAMLPLVGVSSY----KVSICLPM-----VETTLSQVYILTILILNVVAFIIICACYIKIYFAVQNP------ELMATNKDTKIAKKMAVLIFTDFTCMAPISFFAISAALKVPLITVTNSKVLLVLFYPVNSCANPFLYAIFTKAFRRDFFLLLSKSGCCKHQAELYRRK---NGFTGSNK
>GPR4_HUMAN
--------------------MGNHTWEGCHVDSRVDHLFPPSLYIFVIGVGLPTNCLALWAAYRQVQQRNELGVYLMNLSIADLLYICTLPLWVDYFLHHDNWIHGPGSCKLFGFIFYTNIYISIAFLCCISVDRYLAVAHPLFARLRRVKTAVAVSSVVWATELGANSAPLFHDEL--NH---TFCFEKPMEGWVAWMVAWMNLYRVFVGFLFPWALMLLSYRGILRAVRG------SVSTERQEKAKIKRLALSLIAIVLVCFAPYHVLLLSRSAIYLGERVFSAYHSSLAFTSLNCVADPILYCLVNEGARSDVAKALHNLLRFLASDKPQEMETPLTSKRNSTA
>OPSD_POMMI
MNGTEGPFFYIPMVNTTGIVRSPYEYPQYYLVNPAAYAALGAYMFFLILTGFPINFLTLYVTLEHKKLRTALNLILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNVEGFFATLGGEIALWSLVVLAVERWVVVCKPISNFRFTENHAIMGVAFSWIMAATCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFIVHFLAPLIVIFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVIIMVIGFLTSWLPYASVAWYIFTHQGTEFGPLFMTIPAFFAKSSALYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>CB1R_POEGU
EFYNKSLSTFKDNEENIQCGENFMDMECFMILNPSQQLAIAVLSLTLGTFTVLENLLVLCVILHSRSRCRPSYHFIGSLAVADLLGSVIFVYSFVDFHVFHR-KDSPNVFLFKLGGVTASFTASVGSLFLTAIDRYISIHRPLYKRIVTRPKAVVAFCVMWTIAIVIAVLPLLGWNCK-K--LNSVCSDI------FPLIDETYLMFWIGVTSILLLFIVYAYMYILWKAHSHTEDQITRPDQTRMDIRLAKTLVLILVVLIICWGPLLAIMVYDVFGKMNKLIKTIFAFCSMLCLLNSTVNPIIYALRSKDLRHAFRSMFPTCEGTAQPLDNSMEANN-AGNVHRAA
>ADMR_HUMAN
AVPTSDLGEIHNWTELLDLFNHTLSECHVELSQSTKRVVLFALYLAMFVVGLVENLLVICVNWRGSGRAGLMNLYILNMAIADLGIVLSLPVWMLEVTLDYTWLWGSFSCRFTHYFYFVNMYSSIFFLVCLSVDRYVTLTSASSWQRYQHRVRRAMCAGIWVLSAIIPLPEVVHIQLVE--E--PMCLFMAPFETYSTWALAVALSTTILGFLLPFPLITVFNVLTACRL---------RQPGQPKSRRHCLLLCAYVAVFVMCWLPYHVTLLLLTLHGTHHLLYFFYDVIDCFSMLHCVINPILYNFLSPHFRGRLLNAVVHYLPKDQTKAGTCAQHSIIITKGDSQ
>GPR6_HUMAN
G-PPAAAALGAGGGANGSLELSSQLSAGPPGLLLPAVNPWDVLLCVSGTVIAGENALVVALIASTPALRTPMFVLVGSLATADLLAG-CGLILHFVFQ-Y--LVPSETVSLLTVGFLVASFAASVSSLLAITVDRYLSLYNALYYSRRTLLGVHLLLAATWTVSLGLGLLPVLGWNCL-A--ERAACSVV-------RPLARSHVALLSAAFFMVFGIMLHLYVRICQVVWRHIALHCLAPPHLAATRKGVGTLAVVLGTFGASWLPFAIYCVVGSH----EDPAVYTYATLLPATYNSMINPIIYAFRNQEIQRALWLLLCGCFQSKVPFRSRSP------------
>5H1D_MOUSE
SLPNQSLEGLPQEASNRSLNAT---GAWDPEVLQALRISLVVVLSVITLATVLSNAFVLTTILLTKKLHTPANYLIGSLATTDLLVSILVMPISIAYTTTRTWNFGQILCDIWVSSDITCCTASILHLCVIALDRYWAITDALYSKRRTAGHAAAMIAAVWIISICISIPPLFWR----H-E--SDCLVN-------TSQISYTIYSTCGAFYIPSILLIILYGRIYVAARSRKLALERKRISAARERKATKTLGIILGAFIICWLPFFVVSLVLPICRDSWIHPALFDFFTWLGYLNSLINPVIYTVFNEDFRQAFQKVVHFRKAS---------------------
>OPSD_ORYLA
MNGTEGPYFNVPMVNTTGIVRSPYEYPQYYLVSPAAYAALGAYMFFLILVGFPINFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVFGGFTTTMYTSMHGYFVLGRLGCNLEGFFATLGGEIGLWSLVVLAIERWVVVCKPISNFRFGENHAIMGLVFTWIMAASCAVPPLVGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVVYMFVCHFLIPLIVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVIMVIGFLVCWLPYASVAWYIFTNQGSEFGPLFMTIPAFFAKSSSIYNPAIYICMNKQFRNCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>YDBM_CAEEL
EGAGEDVDHHSLFCPKKLVGNLKGFIRNQYHQHETIQILKGSALFLLVLWTIFANSLVFIVLYKNPRLQTVPNLLVGNLAFSDLALGLIVLPLSSVYAIAGEWVFPDALCEVFVSADILCSTASIWNLSIVGLDRYWAITSPVYMSKRNKRTAGIMILSVWISSALISLAPLLGWKQ--Q-TTVRQCTFL--------DLPSYTVYSATGSFFIPTLLMFFVYFKIYQAFAKHYRKRKPKAISAAKERRGVKVLGIILGCFTVCWAPFFTMYVLVQFCKDCSPNAHIEMFITWLGYSNSAMNPIIYTVFNRDYQIALKRLFTSEKKPSSTSRV---------------
>THRR_MOUSE
LLEGRAVYLNISLPPHTPPPPFISEDASGYLTSPWLTLFMPSVYTIVFIVSLPLNVLAIAVFVLRMKVKKPAVVYMLHLAMADVLFVSVLPFKISYYFSGTDWQFGSGMCRFATAAFYGNMYASIMLMTVISIDRFLAVVYPISLSWRTLGRANFTCVVIWVMAIMGVVPLLLKEQTTR--N--TTCHDVLSENLMQGFYSYYFSAFSAIFFLVPLIVSTVCYTSIIRCL------SSSAVANRSKKSRALFLSAAVFCIFIVCFGPTNVLLIVHYLFLSDEAAYFAYLLCVCVSSVSCCIDPLIYYYASSECQRHLYSILCCKESSDPNSCNSTGDTCS--------
>H218_RAT
----MGGLYSEYLNPEKVQEHYNYTKETLDMQETPSRKVASAFIIILCCAIVVENLLVLIAVARNSKFHSAMYLFLGNLAASDLLAG-VAFVANTLLSGPVTLSLTPLQWFAREGSAFITLSASVFSLLAIAIERQVAIAKVKLYGSDKSCRMLMLIGASWLISLILGGLPILGWNCL-D--HLEACSTV------LPLYAKHYVLCVVTIFSVILLAIVALYVRIYFVVRSS-----HADVAGPQTLALLKTVTIVLGVFIICWLPAFSILLLDSTCPVRCPVLYKAHYFFAFATLNSLLNPVIYTWRSRDLRREVLRPLLCWRQGKGATGRRGGPLRSSSSLERGL
>CKR5_MOUSE
--MDFQGSVPTYIYDIDYGMSAPC---QKINVKQIAAQLLPPLYSLVFIFGFVGNMMVFLILISCKKLKSVTDIYLLNLAISDLLFLLTLPFWAHYAANE--WIFGNIMCKVFTGVYHIGYFGGIFFIILLTIDRYLAIVHAVALKVRTVNFGVITSVVTWVVAVFASLPEIIFTRSQK-EG-HYTCSPHFPHTQYHFWKSFQTLKMVILSLILPLLVMIICYSGILHTL--------FRCRNEKKRHRAVRLIFAIMIVYFLFWTPYNIVLLLTTFQEFFNRLDQAMQATETLGMTHCCLNPVIYAFVGEKFRSYLSVFFRKHIVKRFCKRCSIFV-------SSVY
>B3AR_CANFA
MAPWPHGNGSVASWPAAPTPTPDAANTSGLPGAPWAVALAGALLALEVLATVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGRWPLGATGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--HCCAFA--------SNIPYALLSSSVSFYLPLLVMLFVYARVFLVATRQAVPLRPARLLPLREHRALRTLGLIVGTFTLCWLPFFVANVMRALGGPS-VPSPALLALNWLGYANSAFNPLIYCRSP-DFRSAFRRLLCRCR---REEHRAAAAAPAALTSPAES
>P2Y5_CHICK
-----------------------MVSSNCSTEDSFKYTLYGCVFSMVFVLGLIANCVAIYIFTFTLKVRNETTTYMLNLAISDLLFVFTLPFRIYYFVVRN-WPFGDVLCKISVTLFYTNMYGSILFLTCISVDRFLAIVHPFSKTLRTKRNARIVCVAVWITVLAGSTPASFFQSTNR---EQRTCFENFPESTWKTYLSRIVIFIEIVGFFIPLILNVTCSTMVLRTLNKP----LTLSRNKLSKKKVLKMIFVHLVIFCFCFVPYNITLILYSLMRTQTAVRTMYPVTLCIAVSNCCFDPIVYYFTSDTNSELDKKQQVHQNT----------------------
>GPRO_HUMAN
HASRMSVLRAKPMSNSQRLLLLSPGSPPRTGSISYINIIMPSVFGTICLLGIIGNSTVIFAVVKKSKCNNVPDIFIINLSVVDLLFLLGMPFMIHQLMGNGVWHFGETMCTLITAMDANSQFTSTYILTAMAIDRYLATVHPISTKFRKPSVATLVICLLWALSFISITPVWLYARLIP-FP-AVGCGIR--LPNPDTDLYWFTLYQFFLAFALPFVVITAAYVRILQRMTSSV--PASQRSIRLRTKRVTRTAIAICLVFFVCWAPYYVLQLTQLSISRPTFVYLYNAAIS-LGYANSCLNPFVYIVLCETFRKRLVLSVKPAAQGQLRAVSNAQESKGT-------
>THRR_.ENLA
SGEGSGDQAPVSRSARKPIRRNITKEAEQYLSSQWLTKFVPSLYTVVFIVGLPLNLLAIIIFLFKMKVRKPAVVYMLNLAIADVFFVSVLPFKIAYHLSGNDWLFGPGMCRIVTAIFYCNMYCSVLLIASISVDRFLAVVYPMSLSWRTMSRAYMACSFIWLISIASTIPLLVTEQTQK--D--TTCHDVLDLKDLKDFYIYYFSSFCLLFFFVPFIITTICYIGIIRSL------SSSSIENSCKKTRALFLAVVVLCVFIICFGPTNVLFLTHYLQEANEFLYFAYILSACVGSVSCCLDPLIYYYASSQCQRYLYSLLCCRKVSEPGSSTGQLDNCS--------
>GHSR_HUMAN
SEEPGFNLTLADLDWDASPGNDSLGDELLQLFPAPLLAGVTATCVALFVVGIAGNLLTMLVVSRFRELRTTTNLYLSSMAFSDLLIFLCMPLDLVRLWQYRPWNFGDLLCKLFQFVSESCTYATVLTITALSVERYFAICFPLAKVVVTKGRVKLVIFVIWAVAFCSAGPIFVLVG---T-D-TNECRPT---FAVRSGLLTVMVWVSSIFFFLPVFCLTVLYSLIGRKLWRRGD-VVGASLRDQNHKQTVKMLAVVVFAFILCWLPFHVGRYLFSKSFEPQISQYCNLVSFVLFYLSAAINPILYNIMSKKYRVAVFRLLGFEPFSQRKLSTLKDESSINT------
>CKR3_CAVPO
PEEAELETEFPGTTFYDYEFAQPC---FKVSITDLGAQFLPSLFSLVFIVGLLGNITVIVVLTKYQKLKIMTNIYLLNLAISDLLFLFTLPFWTYYVHWNK-WVFGHFMCKIISGLYYVGLFSEIFFIILLTIDRYLAIVHAVALRTRTVTFGIITSVITWVLAVLAALPEFMFYGTQG-HF-VLFCGPSYPEKKEHHWKRFQALRMNIFGLALPLLIMIICYTGIIKTL---------LRCPSKKKYKAIRLIFVIMVVFFVFWTPYNLLLLFSAFDLSFKQLDMAKHVTEVIAHTHCCINPIIYAFVGERFQKYLRHFLHRNVTMHLSKYIPFFS-------SSIS
>ADMR_RAT
LAPDNDFREIHNWTELLHLFNQTFSDCRMELNENTKQVVLFVFYLAIFVVGLVENVLVICVNCRRSGRVGMLNLYILNMAVADLGIILSLPVWMLEVMLYETWLWGSFSCRFIHYFYLANMYSSIFFLTCLSIDRYVTLTNTSSWQRHQHRIRRAVCAGVWVLSAIIPLPEVVHIQLLD--E--PMCLFLAPFETYSAWALAVALSATILGFLLPFPLIAVFNILSACRL---------RRQGQTESRRHCLLMWAYIVVFAICWLPYHVTMLLLTLHTTHNFLYFFYEITDCFSMLHCVANPILYNFLSPSFRGRLLSLVVRYLPKEQARAAGGRQHSIIITKEGSL
>GPR._MOUSE
--------MDLINSSTHVINVSTSLTNSTGVPTPAPKTIIAASLFMAFIIGVISNGLYLWMLQFKMQ-RTVNTLLFFHLILSYFISTLILPFMATSFLQDNHWVFGSVLCKAFNSTLSVSMFASVFFLSAISVARYYLILHPVSQQHRTPHWASRIALQIWISATILSIPYLVFRTTHD-IS--DWESKE-HQTLGQWIHAACFVGRFLLGFLLPFLVIIFCYKRVATKM---------KEKGLFKSSKPFKVMVTAVISFFVCWMPYHVHSGLVLTKSQP-PLHLTLGLAVVTISFNTVVSPVLYLFTGENFKV-FKKSILALFNSTFSDISSTEETEI--------
>CML2_HUMAN
APNTTSPELNLSHPLLGTALANGTGELSEHQQYVIGLFLSCLYTIFLFPIGFVGNILILVVNISFREKMTIPDLYFINLAVADLILVADSLIEVFNLHER--YYDIAVLCTFMSLFLQVNMYSSVFFLTWMSFDRYIALARAMCSLFRTKHHARLSCGLIWMASVSATLVPFTAVHLQ-A----CFCFA---------DVREVQWLEVTLGFIVPFAIIGLCYSLIVRVLVR----AHRHRGLRPRRQKALRMILAVVLVFFVCWLPENVFISVHLLQRTQHAHPLTGHIVNLAAFSNSCLNPLIYSFLGETFRDKLRLYIEQKTNLPALNRFCHADSTEQSDVRFSS
>AA2A_CANFA
-------------------------------MSTMGSWVYITVELAIAVLAILGNVLVCWAVWLNSNLQNVTNYFVVSLAAADIAVGVLAIPFAITISTG--FCAACHNCLFFACFVLVLTQSSIFSLLAIAIDRYIAIRIPLYNGLVTGTRAKGIIAVCWVLSFAIGLTPMLGWNNCSS-Q-QVACLFE-----DVVPMNYMVYYNFFAFVLVPLLLMLGVYLRIFLAARRQESQGERARSTLQKEVHAAKSLAIIVGLFALCWLPLHIINCFTFFCPECHAPLWLMYLTIVLSHTNSVVNPFIYAYRIREFRQTFRKIIRSHVLRRREPFKAGGAHGSDGEQISLR
>NK2R_MESAU
----MGGRAIVTDTNIFSGLESNTTGVTAFSMPAWQLALWATAYLGLVLVAVTGNATVIWIILAHERMRTVTNYFIINLALADLCMAAFNATFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPITKATIAGIWLVALALASPQCFYST---GA---TKCVVAWPNDNGGKMLLLYHLVVFVLVYFLPLVVMFVAYSVIGLTLWKRPRHHGANLRHLHAKKKFVKAMVLVVLTFAICWLPYHLYFILGSFQKDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTEE----RTPSLSRRVNRC
>PAFR_MOUSE
----------------------MEHNGSFRVDSEFRYTLFPIVYSVIFILGVVANGYVLWVFANLYPKLNEIKIFMVNLTMADLLFLITLPLWIVYYYNEGDWILPNFLCNVAGCLFFINTYCSVAFLGVITYNRYQAVAYPITAQATTRKRGISLSLIIWVSIVATASYFLATDSTN-G--NITRCFEH---YEPYSVPILVVHVFIAFCFFLVFFLIFYCNLVIIHTLLTQP---MRQQRKAGVKRRALWMVCTVLAVFIICFVPHHVVQLPWTLAELGQAINDAHQITLCLLSTNCVLDPVIYCFLTKKFRKHLSEKFYSMRSSRKCSRATSDPANQTPIVSLKN
>GU27_RAT
--------------------------------MILNCNPFSGLFLSMYLVTVLGNLLIILAVSSNSHLHNLMYFFLSNLSFVDICFISTTIPKMLVNIHSQTKDISYIECLSQVYFLTTFGGMDNFLLTLMACDRYVAICHPLYTVIMNLQLCALLILMFWLIMFCVSLIHVLLMNEL---NPHFFCELAKVANSDTHINNVFMYVVTSLLGLIPMTGILMSYSQIASSL--------LKMSSSVSKYKAFSTCGSHLCVVSLFYGSATIVYFCSSVLHS---THKKMIASLMYTVISPMLNPFIYSLRNKDVKGALGKLFIRVASCPLWSKDFRPRQSL--------
>ET3R_.ENLA
GNVLNMSPP------------PPSPCLSRAKIRHAFKYVTTILSCVIFLVGIVGNSTLLRIIYKNKCMRNGPNVLIASLALGDLFYILIAIPIISISFWLS----TGHSEYIYQLVHLYRARVYSLSLCALSIDRYRAVASWNIRSIGIPVRKAIELTLIWAVAIIVAVPEAIAFNLVELV--ILVCMLPQTSDFMRFYQEVKVWWLFGFYFCLPLACTGVFYTLMSCEMLSINGM-IALNDHMKQRREVAKTVFCLVVIFALCWLPLHVSSIFVRLSATVQLLMVMNYTGINMASLNSCIGPVALYFVSRKFKNCFQSCLCCWCHRPTLTITPMDWKANGHDLDLDR
>B3AR_CAPHI
MAPWPPRNSSLTPWPDIPTLAPNTANASGLPGVPWAVALAGALLALAVLAIVGGNLLVIVAIARTPRLQTMTNVFVTSLATADLVVGLLVVPPGATLALTGHWPLGVTGCELWTSVDVLCVTASIETLCALAVDRYLAVTNPLYGALVTKRRARAAVVLVWVVSAAVSFAPIMSKWWRVQ-R--RCCTFA--------SNMPYALLSSSVSFYLPLLVMLFVYARVFVVATRQGVPRRPARLLPLREHRALRTLGLIMGTFTLCWLPFFVVNVVRALGGPS-VSGPTFLALNWLGYANSAFNPLIYCRSP-DFQSAFRRLLCRCRPEEHLAAASPPRVLTSPAGPRQP
>GP41_HUMAN
-----------------------MDTGPDQSYFSGNHWFVFSVYLLTFLVGLPLNLLALVVFVGKLQRPVAVDVLLLNLTASDLLLLLFLPFRMVEAANGMHWPLPFILCPLSGFIFFTTIYLTALFLAAVSIERFLSVAHPLYKTRPRLGQAGLVSVACWLLASAHCSVVYVIEFSGD--T--GTCYLE-FRKDQLAILLPVRLEMAVVLFVVPLIITSYCYSRLVWILGR--------GGSHRRQRRVAGLLAATLLNFLVCFGPYNVSHVVGYICG---ESPAWRIYVTLLSTLNSCVDPFVYYFSSSGFQADFHELLRRLCGLWGQWQQESSGGEEQRADRPAE
>O.YR_RAT
GTPAANWSVELDLGSGVPPGEEGNRTAGPPQRNEALARVEVAVLCLILFLALSGNACVLLALRTTRHKHSRLFFFMKHLSIRDLVVAVFQVLPQLLWDITFRFYGPDLLCRLVKYLQVVGMFASTYLLLLMSLDRCLAICQPL--RSLRRRTDRLAVLGTWLGCLVASAPQVHIFSLRE----VFDCWAV---FIQPWGPKAYVTWITLAVYIVPVIVLAACYGLISFKIWQNRAAVSSVKLISKAKIRTVKMTFIIVLAFIVCWTPFFFVQMWSVWDVNA-KEASAFIIAMLLASLNSCCNPWIYMLFTGHLFHELVQRFFCCSARYLKGSRPGENSSTFVLSRRSS
>OLF2_CHICK
-------------MASGNCTTPTTFILSGLTDNPRLQMPLFMVFLVIYTTTLLTNLGLIALIGMDLHLQTPMYIFLQNLSFTDAAYSTVITPKMLATFLEERRTISYVGCILQYFSFVLLTTSEWLLLAVMAYDRYVAICKPLYPSIMTKAVCWRLVKGLYSLAFLNSLVHTSGLLKL---SNHFFCDNRQISSSSTTLNELLVIISGSLFVMSSIITILISYVFIILTV--------VMIRSKDGKYKAFSTCTSHLMAVSLFHGTVIFMYLRSVKLF---SLDTDKIASLFYTVVIPMLNPLIYSWRNKEVKDALRRLTATSVWLH--------------------
>NTR1_MOUSE
PHPQFGLETMLLALSLSNGSGLEPNSNLDVNTDIYSKVLVTAVYLALFVVGTVGNSVTAFTLARKKSLQSTVHYHLGSLALSDLLILLLAMPVELYNFIWVHWAFGDAGCRGYYFLRDACTYATALNVASLSVERYLAICHPFAKTLMSRSRTKKFISAIWLASALLAVPMLFTMGLQN--SGGLVCTPT---VVDTATVKVVIQVNTFMSFLFPMLIISILNTVIANKLTVMEHSMSIEPGRVQALRHGVLVLRAVVIAFVVCWLPYHVRRLMFCYISDEDFYHYFYMLTNALFYVSSAINPILYNLVSANFRQVFLSTLACLCPGWRRRRKKRPSMSSNHAFSTSA
>OLF7_RAT
------------MERRNHSGRVSEFVLLGFPAPAPLRVLLFFLSLL-YVLVLTENMLIIIAIRNHPTLHKPMYFFLANMSFLEIWYVTVTIPKMLAGFIGSKQLISFEACMTQLYFFLGLGCTECVLLAVMAYDRYVAICHPLYPVIVSSRLCVQMAAGSWAGGFGISMVKVFLISRL---SNHFFCDVSNLSCTDMSTAELTDFVLAIFILLGPLSVTGASYMAITGAV--------MRIPSAAGRHKAFSTCASHLTVVIIFYAASIFIYARPKALS---AFDTNKLVSVLYAVIVPLFNPIIYCLRNQDVKRALRRTLHLAQDQEANTNKGSK------------
>MC5R_RAT
-MNSSSHLTLLDLTLNASEDNILGQNVNNKSSACEDMGIAVEVFLTLGLVSLLENILVIGAIVKNKNLHSPMYFFVGSLAVADMLVSMSNAWETITIYLINNDTFVRHIDNVFDSMICISVVASMCSLLAIAVDRYITIFYALYHHIMTARRSGVIIACIWTFCISCGIVFIIYYY----------------------EESKYVIVCLISMFFTMLFFMVSLYIHMFLLARNH-RIPRYNSVRQRASMKGAITLTMLLGIFIVCWSPFFLHLILMISCPQNACFMSYFNMYLILIMCNSVIDPLIYALRSQEMRRTFKEIICCHGFRRTCTLLGRY------------
>BONZ_CERAE
------MAEYDHYEDNGFNSFNDSSQEEHQDFLQFSKVFLPCMYLVVFVCGLVGNSLVLVISIFYHKLQSLTDVFLVNLPLADLVFVCTLPFWAYAGIHE--WIFGQVMCKTLLGIYTINFYTSMLILTCITVDRFIVVVKATNQQAKKMTWGKVICLLIWVISLLVSLPQIIYGNVFNLD--KLICGYH-----DEEISTVVLATQMTLGFFLPLLAMIVCYSVIIKTL---------LHAGGFQKHRSLKIIFLVMAVFLLTQTPFNLVKLIRSTHWEYTSFHYTIIVTEAIAYLRACLNPVLYAFVSLKFRKNFWKLVKDIGCLPYLGVSHQWKTFSASHNVEAT
>SSR5_MOUSE
LSLASTPSWNAS---AASSGSHNWSLVDPVSPMGARAVLVPVLYLLVCTVGLGGNTLVIYVVLRYAKMKTVTNVYILNLAVADVLFMLGLPFLATQNAVSY-WPFGSFLCRLVMTLDGINQFTSIFCLMVMSVDRYLAVVHPLSARWRRPRVAKLASAAVWVFSLLMSLPLLVFADVQE-----GTCNLS-WPEPVGLWGAAFITYTSVLGFFGPLLVICLCYLLIVVKVKAAMR--VGSSRRRRSERKVTRMVVVVVLVFVGCWLPFFIVNIVNLAFTLPPTSAGLYFFVVVLSYANSCANPLLYGFLSDNFRQSFRKALCLRRGYGVEDADAIERPQTTLPTRSCE
>OPSR_ASTFA
GD--DTTREAAFTYTNSNNTKDPFEGPNYHIAPRWVYNLATCWMFFVVVASTVTNGLVLVASAKFKKLRHPLNWILVNLAIADLLETLLASTISVCNQFFGYFILGHPMCVFEGFTVATCGIAGLWSLTVISWERWVVVCKPFGNVKFDGKMATAGIVFTWVWSAVWCAPPIFGWSRYWPHGLKTSCGPDVFSGSEDPGVQSYMIVLMITCCFIPLGIIILCYIAVWWAIRTVAQQQKDSESTQKAEKEVSRMVVVMIMAYCFCWGPYTFFACFAAANPGYAFHPLAAAMPAYFAKSATIYNPVIYVFMNRQFRVCIMQLFGKKVDDG-----SEVSS------VAPA
>C.C1_HUMAN
------MESSGNPESTTFFYYDLQSQPCENQAWVFATLATTVLYCLVFLLSLVGNSLVLWVLVKYESLESLTNIFILNLCLSDLVFACLLPVWISPYHWG--WVLGDFLCKLLNMIFSISLYSSIFFLTIMTIHRYLSVVSPLTLRVPTLRCRVLVTMAVWVASILSSILDTIFHKVLS-----SGCDYS------ELTWYLTSVYQHNLFFLLSLGIILFCYVEILRTL---------FRSRSKRRHRTVKLIFAIVVAYFLSWGPYNFTLFLQTLFRTQQQLEYALLICRNLAFSHCCFNPVLYVFVGVKFRTHLKHVLRQFWFCRLQAPSPASFAYEGASFY---
>GALT_RAT
--------------------MADIQNISLDSPGSVGAVAVPVIFALIFLLGMVGNGLVLAVLLQPGPPRSTTDLFILNLAVADLCFILCCVPFQAAIYTLDAWLFGAFVCKTVHLLIYLTMYASSFTLAAVSLDRYLAVRHPLSRALRTPRNARAAVGLVWLLAALFSAPYLSYYG---GA--LELCVPA----WEDARRRALDVATFAAGYLLPVAVVSLAYGRTLCFLWAAP--AAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAPATYACRLASHCLAYANSCLNPLVYSLASRHFRARFRRLWPCGRRRHRHHHR-AHPASSGPAGYPGD
>ACM4_HUMAN
-----MANFTPVNGSSGNQSVRLVTSSSHNRYETVEMVFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGAFSMNLYTVYIIKGYWPLGAVVCDLWLALDYVVSNASVMNLLIISFDRYFCVTKPLYPARRTTKMAGLMIAAAWVLSFVLWAPAILFWQFVV-VP-DNHCFIQ------FLSNPAVTFGTAIAAFYLPVVIMTVLYIHISLASRSRSIAVRKKRQMAARERKVTRTIFAILLAFILTWTPYNVMVLVNTFCQSC-IPDTVWSIGYWLCYVNSTINPACYALCNATFKKTFRHLLLCQYRNIGTAR----------------
>AG22_RAT
RNITSSLPFDNLNATGTNESAFNC----SHKPADKHLEAIPVLYYMIFVIGFAVNIVVVSLFCCQKGPKKVSSIYIFNLAVADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYVVPLVWCMACLSSLPTFYFRDVRT-LG--NACIMAFPPEKYAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALTWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSVFRVPITWLQGKRETMSREMD-------T
>OPRM_MOUSE
LSHVDGNQSDPCGPNRTGLGGSHSLCPQTGSPSMVTAITIMALYSIVCVVGLFGNFLVMYVIVRYTKMKTATNIYIFNLALADALATSTLPFQSVNYLMGT-WPFGNILCKIVISIDYYNMFTSIFTLCTMSVDRYIAVCHPVALDFRTPRNAKIVNVCNWILSSAIGLPVMFMATTKYGS---IDCTLT-FSHPTWYWENLLKICVFIFAFIMPVLIITVCYGLMILRLKSVRML-SGSKEKDRNLRRITRMVLVVVAVFIVCWTPIHIYVIIKALITIPTFQTVSWHFCIALGYTNSCLNPVLYAFLDENFKRCFREFCIPTSSTIEQQNSARIPSTANTVDRTNH
>MC4R_HUMAN
GMHTSLHLWNRSSYRLHSNASESLGKGYSDGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIIITLLNSQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALYHNIMTVKRVGIIISCIWAACTVSGILFIIYYS----------------------DDSSAVIICLITMFFTMLALMASLYVHMFLMARLH-RIPGTGAIRQGANMKGAITLTILIGVFVVCWAPFFLHLIFYISCPQNVCFMSHFNLYLILIMCNSIIDPLIYALRSQELRKTFKEIICCYPLGGLCDLSSRY------------
>5H7_CAVPO
STWTPRLLSGVPEVAASPSPSNVSGCGEQINYGRAEKVVIGSILTLITLLTIAGNCLVVISVCFVKKLRQPSNYLIVSLALADLSVAVAVIPFVSVTDLIGGWIFGHFFCNVFIAMDVMCCTASIMTLCVISIDRYLGITRPLYPVRQNGKCMPKMILSVWLLSASITLPPLFGWAQ--N-D--KVCLIS--------QDFGYTIYSTAVAFYIPMSVMLFMYYRIYKAARKSSRLERKNISIFKREQKAATTLGIIVGAFTVCWLPFFLLSTARPFICGTCIPLWVERTCLWLGYANSLINPFIYAFFNRDLRTTYRSLLQCQYRNINRKLSAAGAERPERPECVLQ
>NK2R_HUMAN
----MGTCDIVTEANISSGPESNTTGITAFSMPSWQLALWAPAYLALVLVAVTGNAIVIWIILAHRRMRTVTNYFIVNLALADLCMAAFNAAFNFVYASHNIWYFGRAFCYFQNLFPITAMFVSIYSMTAIAADRYMAIVHPF-QPRLSAPSTKAVIAGIWLVALALASPQCFYST---GA---TKCVVAWPEDSGGKTLLLYHLVVIALIYFLPLAVMFVAYSVIGLTLWRRPGHHGANLRHLQAKKKFVKTMVLVVLTFAICWLPYHLYFILGSFQEDIKFIQQVYLALFWLAMSSTMYNPIIYCCLNHRFRSGFRLAFRCCPWVTPTKE----PTTSLSTRVNRC
>NY1R_CAVPO
TSFSQLENHSVHYNLSEEKPSFFAFENDDCHLPLAVIFTLALAYGAVIILGVSGNLALILIILKQKEMRNVTNILIVNLSFSDLLVAIMCLPFTFVYTLMDHWIFGEIMCKLNPFVQCVSITVSIFSLVLIAVERHQLIINPR-GWRPNNRHAYIGIAVIWVLAVASSLPFMIYQVLTDKD--KLVCFDQ---FPSDSHRLSYTTLLLVLQYFGPLCFIFICYFKIYIRLKRRNMMMRDSKYRSSESKRINIMLLSIVVAFAVCWLPLTIFNTVFDWNHQICNHNLLFLLCHLTAMISTCVNPIFYGFLNKNFQRDLQFFFNFCDFRSRDDDY---TMHTDVSKTSLK
>O.1R_RAT
PGVPTSSGEPFHLPPDYED-EFLRYLWRDYLYPKQYEWVLIAAYVAVFLIALVGNTLVCLAVWRNHHMRTVTNYFIVNLSLADVLVTAICLPASLLVDITESWLFGHALCKVIPYLQAVSVSVAVLTLSFIALDRWYAICHPL-LFKSTARRARGSILGIWAVSLAVMVPQAAVMECSSRTRLFSVCDER---WADELYPKIYHSCFFFVTYLAPLGLMGMAYFQIFRKLWGPQPRFLAEVKQMRARRKTAKMLMVVLLVFALCYLPISVLNVLKRVFGMFEAVYACFTFSHWLVYANSAANPIIYNFLSGKFREQFKAAFSCCLPGLGPS-----RHKS---LSLQS
>YTJ5_CAEEL
-------------MPNYTVPPDPADTSWDSPYSIPVQIVVWIIIIVLSLETIIGNAMVVMAYRIERNSKQVSNRYIVSLAISDLIIGIEGFPFFTVYVLNGDWPLGWVACQTWLFLDYTLCLVSILTVLLITADRYLSVCHTAYLKWQSPTKTQLLIVMSWLLPAIIFGIMIYGWQAMTQSTSGAECSAP------FLSNPYVNMGMYVAYYWTTLVAMLILYKVFSSGYQKKSQPDRLAPPNKTDTFLSASGTITFIVGFFAILWSPYYIMATVYGFCKG--IPSFLYTLSYYMCYLNSSGNPFAYALANRQFRSAFMRMFRGNFNKVA------------------
>KI01_HUMAN
---------------MINSTSTQPPDESCSQNLLITQQIIPVLYCMVFIAGILLNGVSGWIFFYVPS-SKSFIIYLKNIVIADFVMSLTFPFKILGDSGLGPWQLNVFVCRVSAVLFYVNMYVSIVFFGLISFDRYYKIVKPLTSFIQSVSYSKLLSVIVWMLMLLLAVPNIILTNQS-TQ---IKCIEL--KSELGRKWHKASNYIFVAIFWIVFLLLIVFYTAITKKIFK----LKSSRNSTSVKKKSSRNIFSIVFVFFVCFVPYHIARIPYTKSQTEEILRYMKEFTLLLSAANVCLDPIIYFFLCQPFREILCKKLHIPLKAQNDLDISRIESTDTL------
>PD2R_HUMAN
---------------------MKSPFYRCQNTTSVEKGNSAVMGGVLFSTGLLGNLLALGLLARSGLLPSVFYMLVCGLTVTDLLGKCLLSPVVLAAYAQNRPALDNSLCQAFAFFMSFFGLSSTLQLLAMALECWLSLGHPFYRRHITLRLGALVAPVVSAFSLAFCALPFMGFGKFVQYCPGTWCFIQ-MVHEEGSLSVLGYSVLYSSLMALLVLATVLCNLGAMRNLYAMAEPGREASPQPLEELDHLLLLALMTVLFTMCSLPVIYRAYYGAFKDV-TSEEAEDLRALRFLSVISIVDPWIFIIFRSPVFRIFFHKIFIRPLRYRSRCSNST------------
>C3.1_RAT
--MPTSFPELDLENFEYDDSAEAC---YLGDIVAFGTIFLSIFYSLVFTFGLVGNLLVVLALTNSRKSKSITDIYLLNLALSDLLFVATLPFWTHYLISHEG--LHNAMCKLTTAFFFIGFFGGIFFITVISIDRYLAIVLAASMNNRTVQHGVTISLGVWAAAILVASPQFMFTKRKD-----NECLGDYPEVLQEIWPVLRNSEVNILGFVLPLLIMSFCYFRIVRTL---------FSCKNRKKARAIRLILLVVVVFFLFWTPYNIVIFLETLKFYNRDLRWALSVTETVAFSHCCLNPFIYAFAGEKFRRYLRHLYNKCLAVLCGRPVHAGRSRQDSILSS-L
>UR2R_RAT
TVSGSTVTELPGDSNVSLNSSWSGPTDPSSLKDLVATGVIGAVLSAMGVVGMVGNVYTLVVMCRFLRASASMYVYVVNLALADLLYLLSIPFIIATYVTKD-WHFGDVGCRVLFSLDFLTMHASIFTLTIMSSERYAAVLRPLDTVQRSKGYRKLLVLGTWLLALLLTLPMMLAIQ---GS--KSLCLPA----WGPRAHRTYLTLLFGTSIVGPGLVIGLLYVRLARAYWLS---ASFKQTRRLPNPRVLYLILGIVLLFWACFLPFWLWQLLAQYHEAMETARIVNYLTTCLTYGNSCINPFLYTLLTKNYREYLRGRQRSLGSSCHSPGSPGSLQQDSGRSLSSS
>5H4_MOUSE
------------------MDKLDANVSSNEGFRSVEKVVLLTFLAVVILMAILGNLLVMVAVCRDRQRKIKTNYFIVSLAFADLLVSVLVMPFGAIELVQDIWAYGEMFCLVRTSLDVLLTTASIFHLCCISLDRYYAICCQPYRNKMTPLRIALMLGGCWVLPMFISFLPIMQGWNNIRK-NSTWCVFM--------VNKPYAITCSVVAFYIPFLLMVLAYYRIYVTAKEHSRPDQHSTHRMRTETKAAKTLCVIMGCFCFCWAPFFVTNIVDPFIDYT-VPEQVWTAFLWLGYINSGLNPFLYAFLNKSFRRAFLIILCCDDERYKRPPILGQTINGSTHVLR--
>MC4R_RAT
GMYTSLHLWNRSSHGLHGNASESLGKGHSDGGCYEQLFVSPEVFVTLGVISLLENILVIVAIAKNKNLHSPMYFFICSLAVADMLVSVSNGSETIVITLLNSQSFTVNIDNVIDSVICSSLLASICSLLSIAVDRYFTIFYALYHNIMTVRRVGIIISCIWAACTVSGVLFIIYYS----------------------DDSSAVIICLITMFFTMLVLMASLYVHMFLMARLH-RIPGTGTIRQGANMKGAITLTILIGVFVVCWAPFFLHLLFYISCPQNVCFMSHFNLYLILIMCNAVIDPLIYALRSQELRKTFKEIICFYPLGGICELPGRY------------
>OLF2_CANFA
-------------MDGKNCSSVNEFLLVGISNKPGVKVTLFITFLIVYLIILVANLGMIILIRMDSQLHTPMYFFLSHLSFSDARYSTAVGPRMLVGFIAKNKSIPFYSCAMQWLVFCTFVDSECLLLAVMAFDRYKAISHPLYTVSMSSRVCSLLMAGVYLVGIMDASVNTILTFRL---CNHFFCDVPLLSCSDTQVNELVIFTIFGFIELITLSGLFVSYCYIILAV--------RKINSAEGRFKAFSTCTSHLTAVAIFQGTMLFMYFRPSSSY---SLDQDKIISLFYSLVIPMLNPLIYSLRNKDVKEALKKLKNKKWFH---------------------
>AG2S_.ENLA
----MLSNISAGENSEVEKIVVKC---SKSGMHNYIFITIPIIYSTIFVVGVFGNSLVVIVIYSYMKMKTMASVFLMNLALSDLCFVITLPLWAVYTAMHYHWPFGDLLCKIASTAITLNLYTTVFLLTCLSIDRYSAIVHPMSRIRRTVMVARLTCVGIWLVAFLASLPSVIYRQIFI-TN--TVCALV-YHSGHIYFMVGMSLVKNIVGFFIPFVIILTSYTLIGKTLKEV------YRAQRARNDDIFKMIVAVVLLFFFCWIPHQVFTFLDVLIQMDDIVDTGMPITICIAYFNSCLNPFLYGFFGKKFRKHFLQLIKYIPPKMRTHASVNTRLSD-------T
>OPRD_MOUSE
SSPLVNLSDAFPSAFPSAGANASGSPGARSASSLALAIAITALYSAVCAVGLLGNVLVMFGIVRYTKLKTATNIYIFNLALADALATSTLPFQSAKYLMET-WPFGELLCKAVLSIDYYNMFTSIFTLTMMSVDRYIAVCHPVALDFRTPAKAKLINICIWVLASGVGVPIMVMAVTQPGA---VVCMLQ-FPSPSWYWDTVTKICVFLFAFVVPILIITVCYGLMLLRLRSVRLL-SGSKEKDRSLRRITRMVLVVVGAFVVCWAPIHIFVIVWTLVDINPLVVAALHLCIALGYANSSLNPVLYAFLDENFKRCFRQLCRTPCGRQEPGSLRRPRVTACTPSDGPG
>CKR5_PYGNE
----MDYQVSSPTYDIDYYTSEPC---QKVNVKQIAARLLPPLYSLVFIFGFVGNILVVLILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQR-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLIMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>OPSD_GAMAF
MNGTEGPYFYVPMVNTTGIVRSPYEYPQYYLVSPAAYACLGAYMFFLILVGFPVNFLTLYVTIEHKKLRTPLNYILLNLAVADLFMVFGGFTTTIYTSMHGYFVLGRLGCNLEGYFATLGGEIGLWSLVVLAVERWLVVCKPISNFRFTENHAIMGLVFTWIMANACAAPPLLGWSRYIPEGMQCSCGVDYYTRAEGFNNESFVIYMFICHFCIPLVVVFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVVILVIGFLVCWTPYASVAWYIFSNQGSEFGPLFMTIPAFFAKSSSIYNPMIYICMNKQFRHCMITTLCCGKNPFEEEEGASTASSVSSSSVSPA
>LSHR_CALJA
ESGQSGWDYDYGFHLPKTPRCAPEPDAFNPCEDIMGYDFLRVLIWLINILAIMGNMTVLFVLLTSRYKLTVPRFLMCNLSFADFCMGLYLLLIASVDSQTKGWQTG-SGCNTAGFFTVFASELSVYTLTVITLERWHTITYAILDQKLRLRHAILIMLGGWLFSSLIAMLPLVGVSNY----KVSICFPM-----VETTLSQIYILTILILNVVAFIIICACYIKIYFAVRNP------ELMATNKDTKIAKKMAILIFTDFTCMAPISFFAISAAFKMPLITVTNSKVLLVLFYPINSCANPFLYAIFTKTFRRDFFLLLGKFGCCKHRAELYRRSNYKNGFTGSSK
>O1F1_HUMAN
-------------MSGTNQSSVSEFLLLGLSRQPQQQHLLFVFFLSMYLATVLGNLLIILSVSIDSCLHTPMYFFLSNLSFVDICFSFTTVPKMLANHILETQTISFCGCLTQMYFVFMFVDMDNFLLAVMAYDHFVAVCHPLYTAKMTHQLCALLVAGLWVVANLNVLLHTLLMAPL---STHFFCDVTKLSCSDTHLNEVIILSEGALVMITPFLCILASYMHITCTV--------LKVPSTKGRWKAFSTCGSHLAVVLLFYSTIIAVYFNPLSSH---SAEKDTMATVLYTVVTPMLNPFIYSLRNRYLKGALKKVVGRVVFSV--------------------
>GALT_HUMAN
--------------------MADAQNISLDSPGSVGAVAVPVVFALIFLLGTVGNGLVLAVLLQPGPPGSTTDLFILNLAVADLCFILCCVPFQATIYTLDAWLFGALVCKAVHLLIYLTMYASSFTLAAVSVDRYLAVRHPLSRALRTPRNARAAVGLVWLLAALFSAPYLSYYG---GA--LELCVPA----WEDARRRALDVATFAAGYLLPVAVVSLAYGRTLRFLWAAP--AAAAEARRRATGRAGRAMLAVAALYALCWGPHHALILCFWYGRFAPATYACRLASHCLAYANSCLNPLVYALASRHFRARFRRLWPCGRRRRHRARRALRGPPGCPGDARPS
>A2AA_CAVPO
----MGSLQPDSGNASWNGTEGPGGGTRATPYSLQVTVTLVCLVGLLILLTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKAWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISFEK-AQ-P--PRCEIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRRGGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLTAVGCS--VPRTLFKFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------
>A2AA_RAT
----MGSLQPDAGNSSWNGTEAPGGGTRATPYSLQVTLTLVCLAGLLMLFTVFGNVLVIIAVFTSRALKAPQNLFLVSLASADILVATLVIPFSLANEVMGYWYFGKVWCEIYLALDVLFCTSSIVHLCAISLDRYWSITQAIYNLKRTPRRIKAIIVTVWVISAVISFPPLISIEKKGQ-P--PSCKIN--------DQKWYVISSSIGSFFAPCLIMILVYVRIYQIAKRRRAGASRWRGRQNREKRFTFVLAVVIGVFVVCWFPFFFTYTLIAVGCP--VPYQLFNFFFWFGYCNSSLNPVIYTIFNHDFRRAFKKILCRGDRKRIV------------------
>OLF4_CANFA
-------------MELENDTRIPEFLLLGFSEEPKLQPFLFGLFLSMYLVTILGNLLLILAVSSDSHLHTPMYFFLANLSFVDICFTCTTIPKMLVNIQTQRKVITYESCIIQMYFFELFAGIDNFLLTVMAYDRYMAICYPLYMVIMNPQLCSLLLLVSWIMSALHSLLQTLMVLRL---SPHFFCELNQLACSDTFLNNMMLYFAAILLGVAPLVGVLYSYFKIVSSI--------RGISSAHSKYKAFSTCASHLSVVSLFYCTSLGVYLSSAAPQ---STHTSSVASVMYTVVTPMLNPFIYSLRNKDIKGALNVFFRGKP-----------------------
>SSR2_RAT
QFNGSQVWIPSPFDLNGSLGPSNGSNQTEPYYDMTSNAVLTFIYFVVCVVGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMINVAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYAFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSVAISPALKGMFDFVVILTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGAEDGERSDLNETTETQRTLL
>BRS3_SHEEP
QTLISTTNDTESSSSVVPNDSTNKRRTGDNSPGIEALCAIYITYAVIISVGILGNAILIKVFFKTKSMQTVPNIFITSLAFGDLLLLLTCVPVDVTHYLAEGWLFGRIGCKVLSFIRLTSVGVSVFTLTILSADRYKAVVKPLRQPPNAILKTCAKAGCIWIMSMIIALPEAIFSNVYTVT--FKACASY---VSERLLQEIHSLLCFLVFYIIPLSIISVYYSLIARTLYKSIPTQRHARKQIESRKRIAKTVLVLVALFALCWLPNHLLYLYRSFTSQTTVHLFVTIISRILAFSNSCVNPFALYWLSNTFQQHFKAQLFCCKAGRPDPTAANTMGRVPGAASTQM
>ETBR_BOVIN
SSATPQIPRGGRMAGIPPR--TPPPCDGPIEIKETFKYINTVVSCLVFVLGIIGNSTLLRIIYKNKCMRNGPNILIASLALGDLLHIIIDIPINTYKLLAKDWPFGVEMCKLVPFIQKASVGITVLSLCALSIDRYRAVASWSIKGIGVPKWTAVEIVLIWVVSVVLAVPEAVGFDIITRI--LRICLLHQKTAFMQFYKTAKDWWLFSFYFCLPLAITALFYTLMTCEMLRKSGM-IALNDHLKQRREVAKTVFCLVLVFALCWLPLHLSRILKLTLYDQSFLLVLDYIGINMASLNSCINPIALYLVSKRFKNCFKSCLCCWCQSFE-EKQSLEFKANDHGYDNFR
>FML1_HUMAN
-----------METNFSTPLNEYEEVSYESAGYTVLRILPLVVLGVTFVLGVLGNGLVIWVAGFRMT-RTVTTICYLNLALADFSFTATLPFLIVSMAMGEKWPFGWFLCKLIHIVVDINLFGSVFLIGFIALDRCICVLHPVAQNHRTVSLAMKVIVGPWILALVLTLPVFLFLTTVTASWPEERLKVA------ITMLTARGIIRFVIGFSLPMSIVAICYGLIAAKI---------HKKGMIKSSRPLRVLTAVVASFFICWFPFQLVALLGTVWLKEKIIDILVNPTSSLAFFNSCLNPMLYVFVGQDFRERLIHSLPTSLERALSEDSAPTAS----------
>V1AR_MOUSE
SSPWWPLTTEGANSSREAAGLGEGGSPPGDVRNEELAKLEVTVLAVIFVVAVLGNSSVLLALHRTPRKTSRMHLFIRHLSLADLAVAFFQVLPQLCWDITYRFRGPDWLCRVVKHLQVFAMFASSYMLVVMTADRYIAVCHPLKTLQQPARRSRLMIAASWGLSFVLSIPQYFIFSVIETK--AQDCWAT---FIPPWGTRAYVTWMTSGVFVVPVIILGTCYGFICYHIWRNLLVVSSVKSISRAKIRTVKMTFVIVSAYILCWTPFFIVQMWSVWDTNFDSENPSTTITALLASLNSCCNPWIYMFFSGHLLQDCVQSFPCCQSIAQKFAKDDSTSYSNNRSPTNS
>OPSB_BOVIN
--MSKMSEEEEFLLFKNISLVGPWDGPQYHLAPVWAFHLQAVFMGFVFFVGTPLNATVLVATLRYRKLRQPLNYILVNVSLGGFIYCIFSVFIVFITSCYGYFVFGRHVCALEAFLGCTAGLVTGWSLAFLAFERYIIICKPFGNFRFSSKHALMVVVATWTIGIGVSIPPFFGWSRFVPEGLQCSCGPDWYTVGTKYYSEYYTWFLFIFCYIVPLSLICFSYSQLLGALRAVAAQQQESASTQKAEREVSHMVVVMVGSFCLCYTPYAALAMYIVNNRNHGVDLRLVTIPAFFSKSACVYNPIIYCFMNKQFRACIMEMVCGKPMTD---ESELSSTVSSSQVGPN-
>APJ_MOUSE
-----------MEDDGYNYYGADNQSECDYADWKPSGALIPAIYMLVFLLGTTGNGLVLWTVFRTSRKRRSADIFIASLAVADLTFVVTLPLWATYTYREFDWPFGTFSCKLSSYLIFVNMYASVFCLTGLSFDRYLAIVRPVNARLRLRVSGAVATAVLWVLAALLAVPVMVFRSTDAQCY-MDYSMVA-TSNSEWAWEVGLGVSSTAVGFVVPFTIMLTCYFFIAQTIAGHFR--KERIEGLRKRRRLLSIIVVLVVTFALCWMPYHLVKTLYMLGSLLIFLMNVFPYCTCISYVNSCLNPFLYAFFDPRFRQACTSMLCCDQSGCKGTPHSSSSSGHSQGPGPNM
>O1G1_HUMAN
-------------MEGKNLTSISECFLLGFSEQLEEQKPLFGSFLFMYLVTVAGNLLIILVIITDTQLHTPMYFFLANLSLADACFVSTTVPKMLANIQIQSQAISYSGCLLQLYFFMLFVMLEAFLLAVMAYDCYVAICHPLYILIMSPGLCIFLVSASWIMNALHSLLHTLLMNSL---SPHFFCDINSLSCTDPFTNELVIFITGGLTGLICVLCLIISYTNVFSTI--------LKIPSAQGKRKAFSTCSSHLSVVSLFFGTSFCVDFSSPSTH---SAQKDTVASVMYTVVTPMLNPFIYSLRNQEIKSSLRKLIWVRKIHSP-------------------
>CKR2_HUMAN
FIRNTNESGEEVTTFFDYDYGAPC---HKFDVKQIGAQLLPPLYSLVFIFGFVGNMLVVLILINCKKLKCLTDIYLLNLAISDLLFLITLPLWAHSAANE--WVFGNAMCKLFTGLYHIGYFGGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWLVAVFASVPGIIFTKCQK-ED-VYVCGPY----FPRGWNNFHTIMRNILGLVLPLLIMVICYSGILKTL--------LRCRNEKKRHRAVRVIFTIMIVYFLFWTPYNIVILLNTFQEFFSQLDQATQVTETLGMTHCCINPIIYAFVGEKFRSLFHIALGCRIAPLQKPVCGGPV-------KVTT
>5H2A_CRIGR
NSSDASNWTIDGENRTNLSFEGYLPPTCLSILHLQEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYWPLPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHSRFNSRTKAFLKIIAVWTISVGVSMPIPVFGLQD-VF---GSCLL---------ADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEEPGGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESHVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILAYKSSQLQAGQN
>TRFR_MOUSE
------------MENDTVSEMNQTELQPQAAVALEYQVVTILLVVIICGLGIVGNIMVVLVVMRTKHMRTPTNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--NA-SCGYKIS------RNYYSPIYLMDFGVFYVVPMILATVLYGFIARILFLNLNLNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPTEKAANYSVKESDRFSTELED
>OPSH_CARAU
MNGTEGNNFYVPLSNRTGLVRSPFEYPQYYLAEPWQFKLLAVYMFFLICLGLPINGLTLICTAQHKKLRQPLNFILVNLAVAGAIMVCFGFTVTFYTAINGYFALGPTGCAVEGFMATLGGEVALWSLVVLAIERYIVVCKPMGSFKFSSTHASAGIAFTWVMAMACAAPPLVGWSRYIPEGIQCSCGPDYYTLNPEYNNESYVLYMFICHFILPVTIIFFTYGRLVCTVKAAAAQQQDSASTQKAEREVTKMVILMVLGFLVAWTPYATVAAWIFFNKGAAFSAQFMAIPAFFSKTSALYNPVIYVLLNKQFRSCMLTTLFCGKNPLGDEESSTVSS------VSPA
>A2AB_RAT
--------------------MSGPTMDHQEPYSVQATAAIASAITFLILFTIFGNALVILAVLTSRSLRAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFWRAWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPCRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCELN--------QEAWYILASSIGSFFAPCLIMILVYLRIYVIAKRSGVAWWRRRTQLSREKRFTFVLAVVIGVFVVCWFPFFFSYSLGAICPQHKVPHGLFQFFFWIGYCNSSLNPVIYTVFNQDFRRAFRRILCRPWTQTGW------------------
>TA2R_CERAE
--MWPNG-----------SSLGPCFRPTNITLEERRLIASPWFAASFCVVGLASNLLALSVLAGARQTRSSFLTFLCGLVLTDFLGLLVTGAIVVSQHAALFVDPGCRLCRFMGVVMIFFGLSPLLLGATMASERFLGITRPFRPVVTSQRRAWATVGLVWAAALALGLLPLLGLGRYTVQYPGSWCFLT----LGAESGDVAFGLLFSMLGGLSVGLSFLLNTVSVATLCHVYHGEAAQQRPRDSEVEMMAQLLGIMLVASVCWLPLLVFIAQTVLRNPPRATEQELLIYLRVATWNQILDPWVYILFRRAVLRRLQPRLSTRPRSLSLQPQLTQ------------
>SSR2_HUMAN
PLNGSHTWLSIPFDLNGSVVSTNTSNQTEPYYDLTSNAVLTFIYFVVCIIGLCGNTLVIYVILRYAKMKTITNIYILNLAIADELFMLGLPFLAMQVALVH-WPFGKAICRVVMTVDGINQFTSIFCLTVMSIDRYLAVVHPISAKWRRPRTAKMITMAVWGVSLLVILPIMIYAGLRSWG--RSSCTIN-WPGESGAWYTGFIIYTFILGFLVPLTIICLCYLFIIIKVKSSGIR-VGSSKRKKSEKKVTRMVSIVVAVFIFCWLPFYIFNVSSVSMAISPALKGMFDFVVVLTYANSCANPILYAFLSDNFKKSFQNVLCLVKVSGTDDGERSDLNETTETQRTLL
>ACM4_.ENLA
----MENDTWENESSASNHSIDETIVEIPGKYQTMEMIFIATVTGSLSLVTVVGNILVMLSIKVNRQLQTVNNYFLFSLACADLIIGVFSMNLYSLYIIKGYWPLGPIVCDLWLALDYVVSNASVMNLLIISLER-FCVTKPLYPARRTTKMAGLMIAAAWLLSFELWAPAILFWQFIV-VP-SGECYIQ------FLSNPAVTFGTAIAAFYLPVVIMTILYIHISLASRSRSIAVRKKRQMAAREKKVTRTIFAILLAFIITWTPYNVMVLINTFCQTC-IPETIWYIGYWLCYVNSTINPACYALCNATFKKTFKHLLMCQYKSIGTAR----------------
>Y..5_CAEEL
SVNESCDNYVEIFNKINYFFRDDQVINGTEYSPKEFGYFITFAYMLIILFGAIGNFLTIIVVILNPAMRTTRNFFILNLALSDFFVCIVTAPTTLYTVLYMFWPFSRTLCKIAGSLQGFNIFLSTFSIASIAVDRYVLIIFPT-KRERQQNLSFCFFIMIWVISLILAVPLLQASDLTPCDLALYICHEQEIWEKMIISKGTYTLAVLITQYAFPLFSLVFAYSRIAHRMKLRTTNSQRRRSVVERQRRTHLLLVCVVAVFAVAWLPLNVFHIFNTFELVN-FSVTTFSICHCLAMCSACLNPLIYAFFNHNFRIEFMHLFDRVGLRSLRVVIFGEMRTEFRSRGGCK
>GRHR_CLAGA
TLLLSNPTNVLDNSSVLNVSVSPPVLKWETPTFTTAARFRVAATLVLFVFRAASNLSVLLSVTRGRGLASHLRPLIASLASADLVMTFVVMPLDAVWNVTVQWYAGDAMCKLMCFLKLFAMHSAAFILVVVSLDRHHAILHPL-DTLDAGRRNRRMLLTAWILSLLLASPQLFIFRAIKVD--FVQCATH--SFQQHWQETAYNMFHFVTLYVFPLLVMSLCYTRILVEINRQGEPRSGTDMIPKARMKTLKMTIIIVASFVICWTPYYLLGIWYWFQPQMVIPDYVHHVFFVFGNLNTCCDPVIYGFFTPSFRADLSRCFCWRNQNASAKSLPHFSGEAESDLGSGD
>CKR5_HUMAN
----MDYQVSSPIYDINYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAVVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>OPSD_NEOSA
MNGTEGPYFYVPMVNTTGVVRSPYEYPQYYLVNPAAFAVLGAYMFFLIIFGFPINFLTLYVTLEHKKLRTPLNYILLNLAVADLFMVIGGFTTTMYSSMHGYFVLGRLGCNLEGFSATLGGMISLWSLAVLAIERWVVVCKPTSNFRFGENHAIMGVSLTWTMALACTVPPLVGWSRYIPEGMQCSCGIDYYTRAEGFNNESFVLYMFFCHFMVPLIIIFFCYGRLLCAVKEAAAAQQESETTQRAEREVTRMVILMVIGYLVCWLPYASVAWFIFTHQGSEFGPLFMTIPAFFAKSSSIYNPVIYICMNKQFRNCMITTLFCGKNPF---EGEEETEASSASSVSPA
>A2AB_HUMAN
-------------------------MDHQDPYSVQATAAIAAAITFLILFTIFGNALVILAVLTSRSLRAPQNLFLVSLAAADILVATLIIPFSLANELLGYWYFRRTWCEVYLALDVLFCTSSIVHLCAISLDRYWAVSRALYNSKRTPRRIKCIILTVWLIAAVISLPPLIYKGD-Q-----PQCKLN--------QEAWYILASSIGSFFAPCLIMILVYLRIYLIAKRSGAIWWRRRAHVTREKRFTFVLAVVIGVFVLCWFPFFFSYSLGAICPKHKVPHGLFQFFFWIGYCNSSLNPVIYTIFNQDFRRAFRRILCRPWTQTAW------------------
>CKR4_MOUSE
VTDTTQDETVYNSYYFYESMPKPC---TKEGIKAFGEVFLPPLYSLVFLLGLFGNSVVVLVLFKYKRLKSMTDVYLLNLAISDLLFVLSLPFWGYYAADQ--WVFGLGLCKIVSWMYLVGFYSGIFFIMLMSIDRYLAIVHAVSLKARTLTYGVITSLITWSVAVFASLPGLLFSTCYT-EH-HTYCKTQ-YSVNSTTWKVLSSLEINVLGLLIPLGIMLFWYSMIIRTL---------QHCKNEKKNRAVRMIFGVVVLFLGFWTPYNVVLFLETLVELERYLDYAIQATETLGFIHCCLNPVIYFFLGEKFRKYITQLFRTCRGPLVLCKHCDFMS------SSSY
>A1AD_HUMAN
GSGEDNRSSAGEPGSAGAGGDVNGTAAVGGLVVSAQGVGVGVFLAAFILMAVAGNLLVILSVACNRHLQTVTNYFIVNLAVADLLLSATVLPFSATMEVLGFWAFGRAFCDVWAAVDVLCCTASILSLCTISVDRYVGVRHSLYPAIMTERKAAAILALLWVVALVVSVGPLLGWKEP--VP--RFCGIT--------EEAGYAVFSSVCSFYLPMAVIVVMYCRVYVVARSTHTFLSVRLLKFSREKKAAKTLAIVVGVFVLCWFPFFFVLPLGSLFPQLKPSEGVFKVIFWLGYFNSCVNPLIYPCSSREFKRAFLRLLRCQCRRRRRRRPLWRASTSG-LRQDCA
>C3AR_HUMAN
--------------MASFSAETNSTDLLSQPWNEPPVILSMVILSLTFLLGLPGNGLVLWVAGLKMQ-RTVNTIWFLHLTLADLLCCLSLPFSLAHLALQGQWPYGRFLCKLIPSIIVLNMFASVFLLTAISLDRCLVVFKPICQNHRNVGMACSICGCIWVVAFVMCIPVFVYREIFT-ED-YNLGQFT-DDDQVPTPLVAITITRLVVGFLLPSVIMIACYSFIVFRM--------QRGRFAKSQSKTFRVAVVVVAVFLVCWTPYHIFGVLSLLTDPEKTLMSWDHVCIALASANSCFNPFLYALLGKDFRKKARQSIQGILEAAFSEELTRSVIS---------
>AG2R_HUMAN
------MILNSSTEDGIKRIQDDC---PKAGRHNYIFVMIPTLYSIIFVVGIFGNSLVVIVIYFYMKLKTVASVFLLNLALADLCFLLTLPLWAVYTAMEYRWPFGNYLCKIASASVSFNLYASVFLLTCLSIDRYLAIVHPMSRLRRTMLVAKVTCIIIWLLAGLASLPAIIHRNVFF-TN--TVCAFH-YESQNSTLPIGLGLTKNILGFLFPFLIILTSYTLIWKALKKA----YEIQKNKPRNDDIFKIIMAIVLFFFFSWIPHQIFTFLDVLIQLGDIVDTAMPITICIAYFNNCLNPLFYGFLGKKFKRYFLQLLKYIPPKAKSHSNLSTRPSD-------N
>CKR5_PANTR
----MDYQVSSPIYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>PF2R_MOUSE
--MSMNS---------SKQPVSPAAGLIANTTCQTENRLSVFFSIIFMTVGILSNSLAIAILMKAYQSKASFLLLASGLVITDFFGHLINGGIAVFVYASDKFDQSNILCSIFGISMVFSGLCPLFLGSAMAIERCIGVTNPIHSTKITSKHVKMILSGVCMFAVFVAVLPILGHRDYQIQASRTWCFYN--TEHIEDWEDRFYLLFFSFLGLLALGVSFSCNAVTGVTLLRVKFRSQQHRQGRSHHLEMIIQLLAIMCVSCVCWSPFLVTMANIAINGNNPVTCETTLFALRMATWNQILDPWVYILLRKAVLRNLYKLASRCCGVNIISLHIWELKVAAISESPAA
>OPSB_ORYLA
VEFPDDFWIPIPLDTNNVTALSPFLVPQDHLGSPTIFYSMSALMFVLFVAGTAINLLTIACTLQYKKLRSHLNYILVNMAVANLIVASTGSSTCFVCFAFKYMVLGPLGCKIEGFTAALGGMVSLWSLAVIAFERWLVICKPLGNFVFKSEHALLCCALTWVCGLCASVPPLVGWSRYIPEGMQCSCGPDWYTTGNKFNNESFVMFLFCFCFAVPFSIIVFCYSQLLFTLKMAAKAQADSASTQKAEKEVTRMVVVMVVAFLVCYVPYASFALWVINNRGQTFDLRLATIPSCVSKASTVYNPVIYVLLNKQFRLCMKKMLGMSADED---EESSTSKVGPS------
>CCKR_RABIT
ASLLGNASGIPPPCELGLDNETLFCLDQPPPSKEWQPAVQILLYSLIFLLSVLGNTLVITVLIRNKRMRTVTNIFLLSLAISDLMLCLFCMPFNLIPNLLKDFIFGSALCKTTTYLMGTSVSVSTLNLVAISLERYGAICKPLSRVWQTKSHALKVIAATWCLSFAIMTPYPIYN----NNQTANMCRFL---LPSDVMQQAWHTFLLLILFLIPGIVMMVAYGMISLELYQGRVSSSSSAATLMAKKRVIRMLMVIVVLFFLCWMPIFSANAWRAYDTVSRLSGTPISFILLLSYTSSCVNPIIYCFMNRRFRLGFMATFPCCPNPGPPGPRAEATTRASLSRYSYS
>BRB1_HUMAN
SSWPPLELQSSNQSQLFPQNATAC--DNAPEAWDLLHRVLPTFIISICFFGLLGNLFVLLVFLLPRRQLNVAEIYLANLAASDLVFVLGLPFWAENIWNQFNWPFGALLCRVINGVIKANLFISIFLVVAISQDRYRVLVHPMSGRQQRRRQARVTCVLIWVVGGLLSIPTFLLRSIQA-LN--TACILL---LPHEAWHFARIVELNILGFLLPLAAIVFFNYHILASLRTR---VSRTRVRGPKDSKTTALILTLVVAFLVCWAPYHFFAFLEFLFQVQDFIDLGLQLANFFAFTNSSLNPVIYVFVGRLFRTKVWELYKQCTPK---------LAPI-------S
>TRFR_RAT
------------MENETVSELNQTELPPQVAVALEYQVVTILLVVVICGLGIVGNIMVVLVVMRTKHMRTATNCYLVSLAVADLMVLVAAGLPNITDSIYGSWVYGYVGCLCITYLQYLGINASSCSITAFTIERYIAICHPIAQFLCTFSRAKKIIIFVWAFTSIYCMLWFFLLDLN--DA-SCGYKIS------RNYYSPIYLMDFGVFYVMPMILATVLYGFIARILFLNMNLNRCFNSTVSSRKQVTKMLAVVVILFALLWMPYRTLVVVNSFLSSPFQENWFLLFCRICIYLNSAINPVIYNLMSQKFRAAFRKLCNCKQKPTEKAANYSVKESDRFSTELDD
>AG22_HUMAN
KNITSGLHFGLVNISGNNESTLNC----SQKPSDKHLDAIPILYYIIFVIGFLVNIVVVTLFCCQKGPKKVSSIYIFNLAVADLLLLATLPLWATYYSYRYDWLFGPVMCKVFGSFLTLNMFASIFFITCMSVDRYQSVIYPFLSQRRNPWQASYIVPLVWCMACLSSLPTFYFRDVRT-LG--NACIMAFPPEKYAQWSAGIALMKNILGFIIPLIFIATCYFGIRKHLLKT----NSYGKNRITRDQVLKMAAAVVLAFIICWLPFHVLTFLDALAWMGAVIDLALPFAILLGFTNSCVNPFLYCFVGNRFQQKLRSVFRVPITWLQGKRESMSREME-------T
>VU51_HSV7J
-----------------MKNIDLTNWKLLAEIYEYLFFFSFFFLCLLVIIVVKFNNSTVGR-E--------YTFSTFSGMLVYILLLPVKMGMLTKM-----WDVSTDYCIILMFLSDFSFIFSSWALTLLALERINNFSFSEIKVNETKILKQMSFPIIWVTSIFQAVQISMKYKKSQ---EDDYCLLA--------IRSAEEAWILLMYTVVIPTFIVFFYVLNKRFL-----------FLERDLNSIVTHLSLFLFFGALCFFPASVLNEFNCN----RLFYGLHELLIVCLELKIFYVPTMTYIISCENYRLAAKAFFCKCFKPCFLMPSLRSTQF--------
>PAR3_MOUSE
-WTGATTTIKAECPEDSISTLHVNNATIGYLRSSLSTQVIPAIYILLFVVGVPSNIVTLWKLSLRTK-SISLVIFHTNLAIADLLFCVTLPFKIAYHLNGNNWVFGEVMCRITTVVFYGNMYCAILILTCMGINRYLATAHPFYQKLPKRSFSLLMCGIVWVMVFLYMLPFVILKQEYH--E--TTCHDVDACESPSSFRFYYFVSLAFFGFLIPFVIIIFCYTTLIHKL----------KSKDRIWLGYIKAVLLILVIFTICFAPTNIILVIHHANYYYDSLYFMYLIALCLGSLNSCLDPFLYFVMSKVVDQLNP------------------------------
>GPRC_HUMAN
NLSGLPRDYLDAAAAENISAAVSSRVPAVEPEPELVVNPWDIVLCTSGTLISCENAIVVLIIFHNPSLRAPMFLLIGSLALADLLAG-IGLITNFVFA-Y--LLQSEATKLVTIGLIVASFSASVCSLLAITVDRYLSLYYALYHSERTVTFTYVMLVMLWGTSICLGLLPVMGWNCL-R--DESTCSVV-------RPLTKNNAAILSVSFLFMFALMLQLYIQICKIVMRHIALHFLATSHYVTTRKGVSTLAIILGTFAACWMPFTLYSLIADY----TYPSIYTYATLLPATYNSIINPVIYAFRNQEIQKALCLICCGCIPSSLAQRARSP------------
>BRB2_CAVPO
---MFNITSQVSALNATLAQGNSC---LDAEWWSWLNTIQAPFLWVLFVLAVLENIFVLSVFFLHKSSCTVAEIYLGNLAVADLILAFGLPFWAITIANNFDWLFGEVLCRMVNTMIQMNMYSSICFLMLVSIDRYLALVKTMMGRMRGVRWAKLYSLVIWGCALLLSSPMLVFRTMKD-HN--TACLII---YPSLTWQVFTNVLLNLVGFLLPLSIITFCTVQIMQVLRNN---EMQKFKEIQTERRATVLVLAVLLLFVVCWLPFQIGTFLDTLRLLGHVIDLITQISSYLAYSNSCLNPLVYVIVGKRFRKKSREVYHGLCRSGGCVSEPAQLRTS-------I
>P2YR_RAT
AAFLAGLGSLWGNSTIASTAAVSSSFRCALIKTGFQFYYLPAVYILVFIIGFLGNSVAIWMFVFHMKPWSGISVYMFNLALADFLYVLTLPALIFYYFNKTDWIFGDVMCKLQRFIFHVNLYGSILFLTCISAHRYSGVVYPLSLGRLKKKNAIYVSVLVWLIVVVAISPILFYSGTG----KTVTCYDS-TSDEYLRSYFIYSMCTTVAMFCIPLVLILGCYGLIVRALIY------KDLDNSPLRRKSIYLVIIVLTVFAVSYIPFHVMKTMNLRARLDDRVYATYQVTRGLASLNSCVDPILYFLAGDTFRRRLSRATRKASRRSEANLQSKSLSEFKQNGDTSL
>PF2R_SHEEP
--MSTNN---------SVQPVSPASELLSNTTCQLEEDLSISFSIIFMTVGILSNSLAIAILMKAYQYKSSFLLLASALVITDFFGHLINGTIAVFVYASDKFDKSNILCSIFGICMVFSGLCPLFLGSLMAIERCIGVTKPIHSTKITTKHVKMMLSGVCFFAVFVALLPILGHRDYKIQASRTWCFYK--TDQIKDWEDRFYLLLFAFLGLLALGISFVCNAITGISLLKVKFRSQQHRQGRSHHFEMVIQLLGIMCVSCICWSPFLVTMASIGMNIQDKDSCERTLFTLRMATWNQILDPWVYILLRKAVLRNLYVCTRRCCGVHVISLHVWELKVAAISDLPVT
>GPRO_RAT
QTSLLSTGPNASNISDGQDNLTLPGSPPRTGSVSYINIIMPSVFGTICLLGIVGNSTVIFAVVKKSKCSNVPDIFIINLSVVDLLFLLGMPFMIHQLMGNGVWHFGETMCTLITAMDANSQFTSTYILTAMTIDRYLATVHPISTKFRKPSMATLVICLLWALSFISITPVWLYARLIP-FP-AVGCGIR--LPNPDTDLYWFTLYQFFLAFALPFVVITAAYVKILQRMTSSV--PASQRSIRLRTKRVTRTAIAICLVFFVCWAPYYVLQLTQLSISRPTFVYLYNAAIS-LGYANSCLNPFVYIVLCETFRKRLVLSVKPAAQGQLRTVSNAQESKGT-------
>ML1._HUMAN
---------MGPTLAVPTPYGCIGCKLPQPEYPPALIIFMFCAMVITIVVDLIGNSMVILAVTKNKKLRNSGNIFVVSLSVADMLVAIYPYPLMLHAMSIGGWDLSQLQCQMVGFITGLSVVGSIFNIVAIAINRYCYICHSLYERIFSVRNTCIYLVITWIMTVLAVLPNMYIGT-IEYDP-TYTCIFN------YLNNPVFTVTIVCIHFVLPLLIVGFCYVRIWTKVLAARD-AGQNPDNQLAEVRNFLTMFVIFLLFAVCWCPINVLTVLVAVSPKEKIPNWLYLAAYFIAYFNSCLNAVIYGLLNENFRREYWTIFHAMRHPIIFFPGLISARTLARARAHAR
>AVT_CATCO
-------------MGRIANQTTASNDTDPFGRNEEVAKMEITVLSVTFFVAVIGNLSVLLAMHNTKKKSSRMHLFIKHLSLADMVVAFFQVLPQLCWEITFRFYGPDFLCRIVKHLQVLGMFASTYMMVMMTLDRYIAICHPLKTLQQPTQRAYIMIGSTWLCSLLLSTPQYFIFSLSESY--VYDCWGH---FIEPWGIRAYITWITVGIFLIPVIILMICYGFICHSIWKNMIGVSSVTIISRAKLRTVKMTLVIVLAYIVCWAPFFIVQMWSVWDENFDSENAAVTLSALLASLNSCCNPWIYMLFSGHLLYDFLRCFPCCKKPRNMLQKEDSTLLTKLAAGRMT
>ACM3_PIG
N---------ISQAAGNFSSPNGTTSDPLGGHTIWQVVFIAFLTGILALVTIIGNILVIVAFKVNKQLKTVNNYFLLSLACADLIIGVISMNLFTTYIIMNRWALGNLACDLWLSIDYVASNASVMNLLVISFDRYFSITRPLYRAKRTTKRAGVMIGLAWVISFILWAPAILFWQYFV-VP-PGECFIQ------FLSEPTITFGTAIAAFYMPVTIMTILYWRIYKETEKRKTRTKRKRMSLIKEKKAAQTLSAILLAFIITWTPYNIMVLVNTFCDSC-IPKTYWNLGYWLCYINSTVNPVCYALCNKTFRTTFKMLLLCQCDKRKRRKQQYQHKRVPEQAL---
>MSHR_ALCAA
PVLGSQRRLLGSLNCTPPATFSLTLAPNRTGPQCLEVSIPDGLFLSLGLVSLVENVLVVAAIAKNRNLHSPMYYFICCLAVSDLLVSVSNVLETAVMLLLEAAAVVQQLDNVIDVLICGSMVSSLCFLGAIAMDRYISIFYALYHSVVTLPRAWRIIAAIWVASILTSLLFITYYY----------------------NNHTVVLLCLVGFFIAMLALMAILYVHMLARACQHIARKRQHPIHQGFGLKGAATLTILLGVFFLCWGPFFLHLSLIVLCPQHGCIFKNFNLFLALIICNAIVDPLIYAFRSQELRKTLQEVLQCSW-----------------------
>GP72_MOUSE
TGPNASSHFWANYTFSDWQNFVGRRRYGAESQNPTVKALLIVAYSFTIVFSLFGNVLVCHVIFKNQRMHSATSLFIVNLAVADIMITLLNTPFTLVRFVNSTWVFGKGMCHVSRFAQYCSLHVSALTLTAIAVDRHQVIMHPL-KPRISITKGVIYIAVIWVMATFFSLPHAICQKLFT--EVRSLCLPD-FPEPADLFWKYLDLATFILLYLLPLFIISVAYARVAKKLWLCGDVTEQYLALRRKKKTTVKMLVLVVVLFALCWFPLNCYVLLLSSKAI-HTNNALYFAFHWFAMSSTCYNPFIYCWLNENFRVELKALLSMCQRPPKPQEDRLPVAWTEKSHGRRA
>ACM5_RAT
--------MEGESYNESTVNGTPVNHQALERHGLWEVITIAVVTAVVSLMTIVGNVLVMISFKVNSQLKTVNNYYLLSLACADLIIGIFSMNLYTTYILMGRWVLGSLACDLWLALDYVASNASVMNLLVISFDRYFSITRPLYRAKRTPKRAGIMIGLAWLVSFILWAPAILCWQYLV-VP-PDECQIQ------FLSEPTITFGTAIAAFYIPVSVMTILYCRIYRETEKRNLSTKRKRMVLVKERKAAQTLSAILLAFIITWTPYNIMVLVSTFCDKC-VPVTLWHLGYWLCYVNSTINPICYALCNRTFRKTFKLLLLCRWKKKKVEEKLYW------------
>CKR5_PONPY
----MDYQVSSPTYDIDYYTSEPC---QKINVKQIAARLLPPLYSLVFIFGFVGNMLVILILINCKRLKSMTDIYLLNLAISDLFFLLTVPFWAHYAAAQ--WDFGNTMCQLLTGLYFIGFFSGIFFIILLTIDRYLAIVHAVALKARTVTFGVVTSVITWVVAVFASLPGIIFTRSQK-EG-HYTCSSHFPYSQYQFWKNFQTLKIVILGLVLPLLVMVICYSGILKTL--------LRCRNEKKRHRAVRLIFTIMIVYFLFWAPYNIVLLLNTFQEFFNRLDQAMQVTETLGMTHCCINPIIYAFVGEKFRNYLLVFFQKHIAKRFCKCCSIFA-------SSVY
>GASR_CANFA
GASLCRAGGALLNSSGAGNLSCEPPRLRGAGTRELELAIRVTLYAVIFLMSVGGNVLIIVVLGLSRRLRTVTNAFLLSLAVSDLLLAVACMPFTLLPNLMGTFIFGTVVCKAVSYLMGVSVSVSTLSLVAIALERYSAICRPLARVWQTRSHAARVIIATWMLSGLLMVPYPVYTAVQP---A-LQCVHR---WPSARVRQTWSVLLLLLLFFVPGVVMAVAYGLISRELYLGPGPPRPYQAKLLAKKRVVRMLLVIVVLFFLCWLPLYSANTWRAFDSSGALSGAPISFIHLLSYASACVNPLVYCFMHRRFRQACLETCARCCPRPPRARPRPLPSIASLSRLSYT
>TLR2_DROME
TLSTDQPAVGDVEDAAEDAAASMETGSFAFVVPWWRQVLWSILFGGMVIVATGGNLIVVWIVMTTKRMRTVTNYFIVNLSIADAMVSSLNVTFNYYYMLDSDWPFGEFYCKLSQFIAMLSICASVFTLMAISIDRYVAIIRPL-QPRMSKRCNLAIAAVIWLASTLISCPMMIIYR----NR--TVCYPEDGPTNHSTMESLYNILIIILTYFLPIVSMTVTYSRVGIELWGSTIGTPRQVENVRSKRRVVKMMIVVVLIFAICWLPFHSYFIITSCYPAIPFIQELYLAIYWLAMSNSMYNPIIYCWMNSRFRYGFKMVFRWCLFVRVGTEPFSRYSCSGSPDHNRI
>D2DR_MELGA
------MDPLNLSWYNTGDRNWSEPVNESSADQKPQYNYYAVLLTLLIFVIVFGNVLVCMAVSREKALQTTTNYLIVSLAVADLLVATLVMPWVVYLEVVGEWRFSRIHCDIFVTLDVMMCTASILNLCAISIDRYTAAAMPMNTRYSSKRRVTVMIACVWVLSFAISSPILFGLN---E----RECII---------ANPAFVVYSSVVSFYVPFIVTLLVYVQIYMVLRRRSTLMNRRKLSQQKEKKATQMLAIVLGVFIICWLPFFITHILNMHCDCN-IPPAMYSAFTWLGYVNSAVNPIIYTTFNIEFRKAFMKILHC-------------------------
>EDG2_BOVIN
QPQFTAMNEQQCFSNESIAFFYNRSGKYLATEWNTVTKLVMGLGITVCIFIMLANLLVMVAIYVNRRFHFPIYYLMANLAAADFFAG-LAYFYLMFNTGPNTRRLTVSTWLLRQGLIDTSLTVSVANLLAIAIERHITVFRMQLHARMSNRRVVVVIVVIWTMAIVMGAIPSVGWNCI-C--DIENCSNM------APLYSDSYLVFWAIFNLVTFVVMVVLYAHIFGYVRQRRMSSSGPRRNRDTMMSLLKTVVIVLGAFIICWTPGLVLLLLDVCCPQC-DVLAYEKFFLLLAEFNSAMNPIIYSYRDKEMSATFRQILCCQRSENTSGPTEGSNHTILAGVHSND
>NY2R_BOVIN
EEMKVDQFGPGHTTLPGELAPDSEPELIDSTKLIEVQVVLILAYCSIILLGVIGNSLVIHVVIKFKSMRTVTNFFIANLAVADLLVNTLCLPFTLTYTLMGEWKMGPVLCHLVPYAQGLAVQVSTITLTVIALDRHRCIVYHL-ESKISKQISFLIIGLAWGVSALLASPLAIFREYSLFE--IVACTEKWPGEEKGIYGTIYSLSSLLILYVLPLGIISFSYTRIWSKLKNHSPG-AAHDHYHQRRQKTTKMLVCVVVVFAVSWLPLHAFQLAVDIDSHVKEYKLIFTVFHIIAMCSTFANPLLYGWMNSNYRKAFLSAFRCEQRLDAIHSE---AKKHLQVTKNNG
>HH1R_RAT
----------MSFANTSSTFEDKMCEGNRTAMASPQLLPLVVVLSSISLVTVGLNLLVLYAVHSERKLHTVGNLYIVSLSVADLIVGAVVMPMNILYLIMTKWSLGRPLCLFWLSMDYVASTASIFSVFILCIDRYRSVQQPLYLRYRTKTRASATILGAWFFSFLWVIPILGWHHFM--EL-EDKCETD------FYNVTWFKIMTAIINFYLPTLLMLWFYVKIYKAVRRHLRSQYVSGLHLNRERKAAKQLGFIMAAFILCWIPYFIFFMVIAFCKSC-CSEPMHMFTIWLGYINSTLNPLIYPLCNENFKKTFKKILHIRS-----------------------"""
gpcr_aln = DenseAlignment(data=gpcr_ungapped.split('\n'),MolType=PROTEIN)
myos_data = """>gi|107137|pir||A37102
LSRIITRIQA
>gi|11024712|ref|NP-060003.1|
LAQLITRTQA
>gi|11276950|pir||A59286
LSRIITRIQA
>gi|11276952|pir||A59293
LAQLITRTQA
>gi|11276954|pir||A59234
LSLIITRIQA
>gi|11276955|pir||A59236
LSSIFKLIQA
>gi|11321579|ref|NP-003793.1|
LVTLMTSTQA
>gi|11342672|ref|NP-002461.1|
LAKLITRTQA
>gi|1197168|dbj|BAA08111.1|
LSSIFKLIQA
>gi|12003423|gb|AAG43570.1|AF21
LVTLMTRTQA
>gi|12003425|gb|AAG43571.1|AF21
LVTLMTRTQA
>gi|12003427|gb|AAG43572.1|AF21
LVTLMTRTQA
>gi|12053672|emb|CAC20413.1|
LSRIITRIQA
>gi|12060489|dbj|BAB20630.1|
LSRIITRIQA
>gi|12657350|emb|CAC27776.1|
LASLVTLTQA
>gi|12657354|emb|CAC27778.1|
LAKLVTMTQA
>gi|127741|sp|P02563|MYH6-RAT
LSRIITRIQA
>gi|127748|sp|P02564|MYH7-RAT
LSRIITRIQA
>gi|127755|sp|P12847|MYH3-RAT
LAKLITRTQA
>gi|1289512|gb|AAC59911.1|
LSLIITRIQA
>gi|1289514|gb|AAC59912.1|
LSLIITRIQA
>gi|13431707|sp|Q28641|MYH4-RAB
LAQLITRTQA
>gi|13431711|sp|Q90339|MYSS-CYP
LALLVTMTQA
>gi|13431716|sp|Q9UKX2|MYH2-HUM
LAQLITRTQA
>gi|13431717|sp|Q9UKX3|MYHD-HUM
LVTLMTSTQA
>gi|13431724|sp|Q9Y623|MYH4-HUM
LAQLITRTQA
>gi|1346637|sp|P02565|MYH3-CHIC
LAQLITRTQA
>gi|13560269|dbj|BAB40920.1|
LAQLMTRTQA
>gi|13560273|dbj|BAB40922.1|
LSRIITRIQA
>gi|13638390|sp|P12882|MYH1-HUM
LAQLITRTQA
>gi|14017756|dbj|BAB47399.1|
LSLIITRIQA
>gi|15384839|emb|CAC59753.1|
LATLVTMTQA
>gi|1581130|prf||2116354A
LSRIITRIQA
>gi|1619328|emb|CAA27817.1|
LAKLITRTQA
>gi|16508127|gb|AAL17913.1|
LSRIITRIQA
>gi|1698895|gb|AAB37320.1|
LSRIITRIQA
>gi|17907763|dbj|BAB79445.1|
LAKILTMLQA
>gi|179508|gb|AAA51837.1|
LSRIITRIQA
>gi|179510|gb|AAA62830.1|
LSRIITRIQA
>gi|18859641|ref|NP-542766.1|
LSRIITRIQA
>gi|191618|gb|AAA37159.1|
LSRIITRIQA
>gi|191620|gb|AAA37160.1|
LSRIITRIQA
>gi|191622|gb|AAA37161.1|
LSRIITRIQA
>gi|191624|gb|AAA37162.1|
LSRIITRIQA
>gi|2119306|pir||I49464
LSRIITRIQA
>gi|2119307|pir||I48175
LSRIITRIQA
>gi|2119308|pir||I48153
LSRIITRIQA
>gi|212376|gb|AAA48972.1|
LAQLITRTQA
>gi|21623523|dbj|BAC00871.1|
LAALVGMVQA
>gi|21743235|dbj|BAB40921.2|
LAQLITRTQA
>gi|21907898|dbj|BAC05679.1|
LAQIITRTQA
>gi|21907900|dbj|BAC05680.1|
LAQIITRTQA
>gi|21907902|dbj|BAC05681.1|
LSRIITRIQA
>gi|219524|dbj|BAA00791.1|
LSRIITRIQA
>gi|22121649|gb|AAM88909.1|
LAQIITRTQA
>gi|23379831|gb|AAM88910.1|
LAQLITRTQA
>gi|2351219|dbj|BAA22067.1|
LSHLVTMTQA
>gi|2351221|dbj|BAA22068.1|
LVNLVTMTQA
>gi|2351223|dbj|BAA22069.1|
LALLVTMTQA
>gi|27764861|ref|NP-002462.1|
LSRIITRMQA
>gi|297024|emb|CAA79675.1|
LSRIITRMQA
>gi|3024204|sp|Q02566|MYH6-MOUS
LSRIITRIQA
>gi|3041706|sp|P13533|MYH6-HUMA
LSRIITRMQA
>gi|3041708|sp|P13540|MYH7-MESA
LSRIITRIQA
>gi|3043372|sp|P11055|MYH3-HUMA
LAKLITRTQA
>gi|34870884|ref|XP-213345.2|
LAQLITRTQA
>gi|34870892|ref|XP-340820.1|
LAQIITRTQA
>gi|37720046|gb|AAN71741.1|
LARILTGIQA
>gi|38091410|ref|XP-354614.1|
LAKLITRTQA
>gi|38091413|ref|XP-354615.1|
LAQLITRTQA
>gi|38177589|gb|AAF00096.2|AF11
LSLIISGIQA
>gi|38347761|dbj|BAD01606.1|
LSLLLTRTQA
>gi|38347763|dbj|BAD01607.1|
LSLLLTRTQA
>gi|38488753|ref|NP-942118.1|
LARILTGIQA
>gi|3915779|sp|P13539|MYH6-MESA
LSRIITRIQA
>gi|402372|gb|AAA62313.1|
LSRIITRIQA
>gi|402374|gb|AAB59701.1|
LSRIITRIQA
>gi|41350446|gb|AAS00505.1|
LAALVTMTQA
>gi|41386691|ref|NP-776542.1|
LAQLITRTQA
>gi|41386711|ref|NP-777152.1|
LSRIITRIQA
>gi|42476190|ref|NP-060004.2|
LAQLITRTQA
>gi|42662294|ref|XP-371398.2|
LAKVLTLLQA
>gi|45382109|ref|NP-990097.1|
LAKILTMIQA
>gi|45383005|ref|NP-989918.1|
LAKILTMLQA
>gi|45383668|ref|NP-989559.1|
LAQLITRTQA
>gi|4557773|ref|NP-000248.1|
LSRIITRIQA
>gi|45595719|gb|AAH67305.1|
LAQLITRTQA
>gi|476355|pir||A46762
LSRIITRIQA
>gi|4808809|gb|AAD29948.1|
LVTLMTSTQA
>gi|4808811|gb|AAD29949.1|
LAQLITRTQA
>gi|4808813|gb|AAD29950.1|
LAQLITRTQA
>gi|4808815|gb|AAD29951.1|
LAQLITRTQA
>gi|5360746|dbj|BAA82144.1|
LAQLITRTQA
>gi|5360748|dbj|BAA82145.1|
LAQLITRTQA
>gi|5360750|dbj|BAA82146.1|
LAQLITRTQA
>gi|547966|sp|P12883|MYH7-HUMAN
LSRIITRIQA
>gi|56655|emb|CAA34064.1|
LSRIITRIQA
>gi|6093461|sp|P79293|MYH7-PIG
LSRIITRIQA
>gi|6683485|dbj|BAA89233.1|
LAQLITRTQA
>gi|6708502|gb|AAD09454.2|
LAKIMTMLQC
>gi|7209643|dbj|BAA92289.1|
LATLVTMTQA
>gi|7248371|dbj|BAA92710.1|
LAKILTMIQA
>gi|7669506|ref|NP-005954.2|
LAQLITRTQA
>gi|8393804|ref|NP-058935.1|
LSRIITRIQA
>gi|8393807|ref|NP-058936.1|
LSRIITRIQA
>gi|86358|pir||A29320
LAQLITRTQA
>gi|88201|pir||S04090
LAKLITRTQA
>gi|92498|pir||S06005
LSRIITRIQA
>gi|92499|pir||S06006
LSRIITRIQA
>gi|92509|pir||A24922
LAKLITRTQA
>gi|940233|gb|AAA74199.1|
LAQLITRTQA
>gi|9800486|gb|AAF99314.1|AF272
LAQLITRTQA
>gi|9800488|gb|AAF99315.1|AF272
LAQLITRTQA
>gi|9971579|dbj|BAB12571.1|
LAALVTMTQA"""
myos_aln = DenseAlignment(data=myos_data.split('\n'),MolType=PROTEIN)
# a randomly generated tree to use in tests
tree20_string='(((0:0.5,1:0.5):0.5,(((2:0.5,3:0.5):0.5,(4:0.5,(5:0.5,6:0.5):0.5):0.5):0.5,((7:0.5,8:0.5):0.5,((9:0.5,((10:0.5,11:0.5):0.5,12:0.5):0.5):0.5,13:0.5):0.5):0.5):0.5):0.5,(((14:0.5,(15:0.5,16:0.5):0.5):0.5,17:0.5):0.5,(18:0.5,19:0.5):0.5):0.5);'
default_gctmpca_aa_sub_matrix_lines = default_gctmpca_aa_sub_matrix.split('\n')
default_gctmpca_aa_sub_matrix = {}
for aa,line in zip(gctmpca_aa_order,default_gctmpca_aa_sub_matrix_lines):
default_gctmpca_aa_sub_matrix[aa] = dict([(col_aa,float(rate)/100.) \
for col_aa,rate in zip(gctmpca_aa_order,line.split())])
if __name__ == "__main__":
main()
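The module-level loop above turns a whitespace-delimited table of percentages into a nested dict keyed by amino acid, dividing each entry by 100. Here is a minimal standalone sketch of that parsing step. It assumes a hypothetical three-residue alphabet and made-up rates, since the real `gctmpca_aa_order` and matrix text are defined elsewhere in this module. `parse_rate_matrix` is a hypothetical helper, not part of PyCogent.

```python
def parse_rate_matrix(text, order, scale=100.0):
    """Parse whitespace-delimited rows (one row per residue in `order`)
    into {row_residue: {col_residue: rate / scale}}.

    Hypothetical helper; mirrors the zip-based loop in this module but
    checks row widths instead of silently truncating."""
    rows = [line for line in text.strip().split('\n') if line.strip()]
    matrix = {}
    for aa, line in zip(order, rows):
        values = line.split()
        if len(values) != len(order):
            raise ValueError("row for %s has %d values, expected %d"
                             % (aa, len(values), len(order)))
        matrix[aa] = dict((col_aa, float(rate) / scale)
                          for col_aa, rate in zip(order, values))
    return matrix

# hypothetical 3-residue example; entries are percentages
example = """0 50 25
50 0 10
25 10 0"""
m = parse_rate_matrix(example, "ACD")
print(m['A']['C'])  # 0.5
```

The explicit width check is a design choice added for the sketch. The original `zip` over `line.split()` would quietly drop or ignore mismatched columns.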
PyCogent-1.5.3/tests/test_evolve/test_likelihood_function.py
#!/usr/bin/env python
"""
Some tests for the likelihood function class.
tests to do:
setting of parameters, by coord, by for-all, checking pars sets
testing the likelihood for specified pars
getting ancestral probs
simulating sequence (not possible to verify values as random)
checking that the object resets on tree change, model change, etc
"""
import warnings
warnings.filterwarnings("ignore", "Motif probs overspecified")
warnings.filterwarnings("ignore", "Model not reversible")
warnings.filterwarnings("ignore", "Ignoring tree edge lengths")
import os
from numpy import ones, dot
from cogent.evolve import substitution_model, predicate
from cogent import DNA, LoadSeqs, LoadTree
from cogent.util.unit_test import TestCase, main
from cogent.maths.matrix_exponentiation import PadeExponentiator as expm
from cogent.maths.stats.information_criteria import aic, bic
from cogent.evolve.models import JTT92
Nucleotide = substitution_model.Nucleotide
MotifChange = predicate.MotifChange
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
"Matthew Wakefield", "Brett Easton"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')
ALIGNMENT = LoadSeqs(
moltype=DNA,
filename = os.path.join(data_path,'brca1.fasta'))
OTU_NAMES = ["Human", "Mouse", "HowlerMon"]
########################################################
# some funcs for assembling Q-matrices for 'manual' calc
def isTransition(motif1, motif2):
position = getposition(motif1, motif2)
a, b = motif1[position], motif2[position]
transitions = {('A', 'G') : 1, ('C', 'T'):1}
pair = (min(a, b), max(a, b))
return pair in transitions
def numdiffs_position(motif1, motif2):
assert len(motif1) == len(motif2),\
"motif1[%s] & motif2[%s] have inconsistent length" %\
(motif1, motif2)
ndiffs, position = 0, -1
for i in range(len(motif1)):
if motif1[i] != motif2[i]:
position = i
ndiffs += 1
return ndiffs == 1, position
def isinstantaneous(motif1, motif2):
if motif1 != motif2 and (motif1 == '-' * len(motif1) or \
motif2 == '-' * len(motif1)):
return True
one_diff, position = numdiffs_position(motif1, motif2)
return one_diff
def getposition(motif1, motif2):
ndiffs, position = numdiffs_position(motif1, motif2)
return position
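The helpers above decide whether two motifs differ at exactly one position and whether that change is a transition (A<->G or C<->T). A minimal self-contained sketch of the same classification follows. `single_diff_position` and `is_transition` are hypothetical names, not PyCogent API. One behavioural difference is deliberate: the original `isTransition` indexes position `-1` (the last character) when the motifs do not differ at exactly one site, while this sketch returns `False` in that case.

```python
PURINES = set('AG')
PYRIMIDINES = set('CT')

def single_diff_position(motif1, motif2):
    """Return the index where two equal-length motifs differ, or -1
    unless they differ at exactly one position."""
    if len(motif1) != len(motif2):
        raise ValueError("motifs %r and %r differ in length" % (motif1, motif2))
    diffs = [i for i, (a, b) in enumerate(zip(motif1, motif2)) if a != b]
    return diffs[0] if len(diffs) == 1 else -1

def is_transition(motif1, motif2):
    """True if the motifs differ by a single purine<->purine or
    pyrimidine<->pyrimidine change."""
    pos = single_diff_position(motif1, motif2)
    if pos < 0:
        return False
    pair = set([motif1[pos], motif2[pos]])
    return pair <= PURINES or pair <= PYRIMIDINES

print(is_transition('ACG', 'ATG'))  # True (C->T)
print(is_transition('AAA', 'ACA'))  # False (A->C is a transversion)
```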
##############################################################
# funcs for testing the monomer weighted substitution matrices
_root_probs = lambda x: dict([(n1+n2, p1*p2) \
for n1,p1 in x.items() for n2,p2 in x.items()])
def make_p(length, coord, val):
"""returns the substitution probability matrix for a branch of the given
length, from an equal-frequency rate matrix whose entry at coord is
multiplied by val"""
Q = ones((4,4), float)*0.25 # assumes equi-frequent mprobs at root
for i in range(4):
Q[i,i] = 0.0
Q[coord] *= val
row_sum = Q.sum(axis=1)
scale = 1/(.25*row_sum).sum()
for i in range(4):
Q[i,i] -= row_sum[i]
Q *= scale
return expm(Q)(length)
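`make_p` builds a 4x4 rate matrix under equal base frequencies, multiplies one entry by `val`, sets the diagonal so each row sums to zero, and rescales so the expected substitution rate (sum over i of pi_i * -Q[i, i]) is 1. It then returns P(t) = exp(Q t). The sketch below reproduces that construction with NumPy alone. `scaled_rate_matrix` and `expm_taylor` are hypothetical helpers. The module itself uses cogent's `PadeExponentiator`, and `scipy.linalg.expm` would be the usual library choice. A truncated Taylor series with scaling and squaring is used here only to keep the example dependency-free, and it is adequate only for small, well-conditioned `Q * t`.

```python
import numpy as np

def scaled_rate_matrix(coord, val, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Off-diagonal Q[i, j] = pi_j, entry `coord` multiplied by `val`,
    diagonal set so rows sum to zero, then scaled so that
    sum_i pi_i * (-Q[i, i]) == 1."""
    pi = np.asarray(freqs, dtype=float)
    Q = np.tile(pi, (4, 1))          # every row is pi, so Q[i, j] = pi_j
    np.fill_diagonal(Q, 0.0)
    Q[coord] *= val
    Q -= np.diag(Q.sum(axis=1))      # diagonal becomes minus the row sum
    return Q / np.dot(pi, -np.diag(Q))

def expm_taylor(A, terms=20, squarings=8):
    """exp(A) via a truncated Taylor series of A / 2**squarings,
    followed by repeated squaring."""
    A = np.asarray(A, dtype=float) / 2.0 ** squarings
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term.dot(A) / k       # A**k / k!
        result = result + term
    for _ in range(squarings):
        result = result.dot(result)
    return result

Q = scaled_rate_matrix((0, 2), 4.0)  # hypothetical: boost the (0, 2) rate 4-fold
P = expm_taylor(Q * 0.5)             # P(t) for branch length t = 0.5
print(P.sum(axis=1))                 # each row of a probability matrix sums to 1
```

Because every row of Q sums to zero, every row of P sums to one. P(t) also satisfies P(s) P(t) = P(s + t), which is a convenient check on any exponentiator.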
class LikelihoodCalcs(TestCase):
"""tests ability to calculate log-likelihoods for several
substitution models."""
def setUp(self):
self.alignment = ALIGNMENT.takeSeqs(OTU_NAMES)[0: 42]
self.tree = LoadTree(tip_names=OTU_NAMES)
def _makeLikelihoodFunction(self, submod, translate=False, **kw):
alignment = self.alignment
if translate:
alignment = alignment.getTranslation()
calc = submod.makeLikelihoodFunction(self.tree, **kw)
calc.setAlignment(alignment)
calc.setParamRule('length', value=1.0, is_constant=True)
if not translate:
calc.setParamRule('kappa', value=3.0, is_constant=True)
return calc
def test_no_seq_named_root(self):
"""root is a reserved name"""
aln = self.alignment.takeSeqs(self.alignment.Names[:4])
aln = aln.todict()
one = aln.pop(aln.keys()[0])
aln["root"] = one
aln = LoadSeqs(data=aln)
submod = Nucleotide()
tree = LoadTree(treestring="%s" % str(tuple(aln.Names)))
lf = submod.makeLikelihoodFunction(tree)
try:
lf.setAlignment(aln)
except AssertionError:
pass
collection = aln.degap().NamedSeqs
collection.pop("Human")
tree = LoadTree(treestring="%s" % str(tuple(collection.keys())))
lf = submod.makeLikelihoodFunction(tree, aligned=False)
try:
lf.setSequences(collection)
except AssertionError:
pass
def test_binned_gamma(self):
"""just rate is gamma distributed"""
submod = substitution_model.Codon(
predicates={'kappa': 'transition', 'omega': 'replacement'},
ordered_param='rate', distribution='gamma', mprob_model='tuple')
lf = self._makeLikelihoodFunction(submod, bins=3)
try:
values = lf.getParamValueDict(['bin'])['omega_factor'].values()
except KeyError:
# there shouldn't be an omega factor
pass
values = lf.getParamValueDict(['bin'])['rate'].values()
obs = round(sum(values) / len(values), 6)
self.assertEqual(obs, 1.0)
self.assertEqual(len(values), 3)
shape = lf.getParamValue('rate_shape')
def test_binned_gamma_ordered_param(self):
"""rate is gamma distributed omega follows"""
submod = substitution_model.Codon(
predicates={'kappa': 'transition', 'omega': 'replacement'},
ordered_param='rate', partitioned_params='omega',
distribution='gamma', mprob_model='tuple')
lf = self._makeLikelihoodFunction(submod,bins=3)
values = lf.getParamValueDict(['bin'])['omega_factor'].values()
self.assertEqual(round(sum(values) / len(values), 6), 1.0)
self.assertEqual(len(values), 3)
shape = lf.getParamValue('rate_shape')
def test_binned_partition(self):
submod = substitution_model.Codon(
predicates={'kappa': 'transition', 'omega': 'replacement'},
ordered_param='rate', partitioned_params='omega',
distribution='free', mprob_model='tuple')
lf = self._makeLikelihoodFunction(submod, bins=3)
values = lf.getParamValueDict(['bin'])['omega_factor'].values()
self.assertEqual(round(sum(values) / len(values), 6), 1.0)
self.assertEqual(len(values), 3)
def test_complex_binned_partition(self):
submod = substitution_model.Codon(
predicates={'kappa': 'transition', 'omega': 'replacement'},
ordered_param='kappa', partitioned_params=['omega'],
mprob_model='tuple')
lf = self._makeLikelihoodFunction(submod,
bins=['slow', 'fast'])
lf.setParamRule('kappa', value=1.0, is_constant=True)
lf.setParamRule('kappa', edge="Human", init=1.0, is_constant=False)
values = lf.getParamValueDict(['bin'])['kappa_factor'].values()
self.assertEqual(round(sum(values) / len(values), 6), 1.0)
self.assertEqual(len(values), 2)
def test_codon(self):
"""test a three taxa codon model."""
submod = substitution_model.Codon(
equal_motif_probs=True,
do_scaling=False,
motif_probs=None,
predicates={'kappa': 'transition', 'omega': 'replacement'},
mprob_model='tuple')
likelihood_function = self._makeLikelihoodFunction(submod)
likelihood_function.setParamRule('omega', value=0.5, is_constant=True)
evolve_lnL = likelihood_function.getLogLikelihood()
self.assertFloatEqual(evolve_lnL, -80.67069614541883)
def test_nucleotide(self):
"""test a nucleotide model."""
submod = Nucleotide(
equal_motif_probs=True,
do_scaling=False,
motif_probs=None,
predicates={'kappa': 'transition'})
# now do using the evolve
likelihood_function = self._makeLikelihoodFunction(
submod)
self.assertEqual(likelihood_function.getNumFreeParams(), 0)
evolve_lnL = likelihood_function.getLogLikelihood()
self.assertFloatEqual(evolve_lnL, -157.49363874840455)
def test_discrete_nucleotide(self):
"""test that partially discrete nucleotide model can be constructed,
differs from continuous, and has the expected number of free params"""
submod = Nucleotide(
equal_motif_probs=True,
do_scaling=False,
motif_probs=None,
predicates={'kappa': 'transition'})
likelihood_function = self._makeLikelihoodFunction(
submod, discrete_edges=['Human'])
self.assertEqual(likelihood_function.getNumFreeParams(), 12)
evolve_lnL = likelihood_function.getLogLikelihood()
self.assertNotEqual(evolve_lnL, -157.49363874840455)
def test_dinucleotide(self):
"""test a dinucleotide model."""
submod = substitution_model.Dinucleotide(
equal_motif_probs=True,
do_scaling=False,
motif_probs = None,
predicates = {'kappa': 'transition'},
mprob_model='tuple')
likelihood_function = self._makeLikelihoodFunction(submod)
evolve_lnL = likelihood_function.getLogLikelihood()
self.assertFloatEqual(evolve_lnL, -102.48145536663735)
def test_protein(self):
"""test a protein model."""
submod = substitution_model.Protein(
do_scaling=False, equal_motif_probs=True)
likelihood_function = self._makeLikelihoodFunction(submod,
translate=True)
evolve_lnL = likelihood_function.getLogLikelihood()
self.assertFloatEqual(evolve_lnL, -89.830370754876185)
class LikelihoodFunctionTests(TestCase):
"""Tests for the likelihood function class: construction from a
submodel and tree, setting parameters, and the output of its methods.
"""
def setUp(self):
self.submodel = Nucleotide(
do_scaling=True, model_gaps=False, equal_motif_probs=True,
predicates = {'beta': 'transition'})
self.data = LoadSeqs(
filename = os.path.join(data_path, 'brca1_5.paml'),
moltype = self.submodel.MolType)
self.tree = LoadTree(
filename = os.path.join(data_path, 'brca1_5.tree'))
def _makeLikelihoodFunction(self, **kw):
lf = self.submodel.makeLikelihoodFunction(self.tree, **kw)
lf.setParamRule('beta', is_independent=True)
lf.setAlignment(self.data)
return lf
def _setLengthsAndBetas(self, likelihood_function):
for (species, length) in [
("DogFaced", 0.1),
("NineBande", 0.2),
("Human", 0.3),
("HowlerMon", 0.4),
("Mouse", 0.5)]:
likelihood_function.setParamRule("length", value=length,
edge=species, is_constant=True)
for (species1, species2, length) in [
("Human", "HowlerMon", 0.7),
("Human", "Mouse", 0.6)]:
LCA = self.tree.getConnectingNode(species1, species2).Name
likelihood_function.setParamRule("length", value=length,
edge=LCA, is_constant=True)
likelihood_function.setParamRule("beta", value=4.0, is_constant=True)
def test_information_criteria(self):
"""test get information criteria from a model."""
lf = self._makeLikelihoodFunction()
nfp = lf.getNumFreeParams()
lnL = lf.getLogLikelihood()
l = len(self.data)
self.assertFloatEqual(lf.getAic(), aic(lnL, nfp))
self.assertFloatEqual(lf.getAic(second_order=True),
aic(lnL, nfp, l))
self.assertFloatEqual(lf.getBic(), bic(lnL, nfp, l))
def test_result_str(self):
# this is really more a test of self._setLengthsAndBetas()
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
self.assertEqual(str(likelihood_function), \
"""Likelihood Function Table\n\
======
beta
------
4.0000
------
=============================
edge parent length
-----------------------------
Human edge.0 0.3000
HowlerMon edge.0 0.4000
edge.0 edge.1 0.7000
Mouse edge.1 0.5000
edge.1 root 0.6000
NineBande root 0.2000
DogFaced root 0.1000
-----------------------------
===============
motif mprobs
---------------
T 0.2500
C 0.2500
A 0.2500
G 0.2500
---------------""")
likelihood_function = self._makeLikelihoodFunction(digits=2,space=2)
self.assertEqual(str(likelihood_function), \
"""Likelihood Function Table\n\
===============================
edge parent length beta
-------------------------------
Human edge.0 1.00 1.00
HowlerMon edge.0 1.00 1.00
edge.0 edge.1 1.00 1.00
Mouse edge.1 1.00 1.00
edge.1 root 1.00 1.00
NineBande root 1.00 1.00
DogFaced root 1.00 1.00
-------------------------------
=============
motif mprobs
-------------
T 0.25
C 0.25
A 0.25
G 0.25
-------------""")
def test_calclikelihood(self):
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
self.assertAlmostEqual(-250.686745262,
likelihood_function.getLogLikelihood(),places=9)
def test_g_statistic(self):
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
self.assertAlmostEqual(230.77670557,
likelihood_function.getGStatistic(),places=6)
def test_ancestralsequences(self):
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
result = likelihood_function.reconstructAncestralSeqs()['edge.0']
a_column_with_mostly_Ts = -1
motif_G = 2
self.assertAlmostEqual(2.28460181711e-05,
result[a_column_with_mostly_Ts][motif_G], places=8)
lf = self.submodel.makeLikelihoodFunction(self.tree, bins=['low', 'high'])
lf.setParamRule('beta', bin='low', value=0.1)
lf.setParamRule('beta', bin='high', value=10.0)
lf.setAlignment(self.data)
result = lf.reconstructAncestralSeqs()
def test_likely_ancestral(self):
"""exercising the most likely ancestral sequences"""
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
result = likelihood_function.likelyAncestralSeqs()
def test_simulateAlignment(self):
"Simulate DNA alignment"
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
simulated_alignment = likelihood_function.simulateAlignment(20, exclude_internal = False)
self.assertEqual(len(simulated_alignment), 20)
self.assertEqual(len(simulated_alignment.getSeqNames()), 8)
def test_simulateHetergeneousAlignment(self):
"Simulate substitution-heterogeneous DNA alignment"
lf = self.submodel.makeLikelihoodFunction(self.tree, bins=['low', 'high'])
lf.setParamRule('beta', bin='low', value=0.1)
lf.setParamRule('beta', bin='high', value=10.0)
simulated_alignment = lf.simulateAlignment(100)
def test_simulatePatchyHetergeneousAlignment(self):
"Simulate patchy substitution-heterogeneous DNA alignment"
lf = self.submodel.makeLikelihoodFunction(self.tree, bins=['low', 'high'], sites_independent=False)
lf.setParamRule('beta', bin='low', value=0.1)
lf.setParamRule('beta', bin='high', value=10.0)
simulated_alignment = lf.simulateAlignment(100)
def test_simulateAlignment2(self):
"Simulate alignment with dinucleotide model"
al = LoadSeqs(data={'a':'ggaatt','c':'cctaat'})
t = LoadTree(treestring="(a,c);")
sm = substitution_model.Dinucleotide(mprob_model='tuple')
lf = sm.makeParamController(t)
lf.setAlignment(al)
simalign = lf.simulateAlignment()
self.assertEqual(len(simalign), 6)
def test_simulateAlignment3(self):
"""Simulated alignment with gap-induced ambiguous positions
preserved"""
t = LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1)root;')
al = LoadSeqs(data={
'a':'g--cactat?',
'b':'---c-ctcct',
'c':'-a-c-ctat-',
'd':'-a-c-ctat-'})
sm = Nucleotide(recode_gaps=True)
lf = sm.makeParamController(t)
#pc.setConstantLengths()
lf.setAlignment(al)
#print lf.simulateAlignment(sequence_length=10)
simulated = lf.simulateAlignment()
self.assertEqual(len(simulated.getSeqNames()), 4)
import re
self.assertEqual(
re.sub('[ATCG]', 'x', simulated.todict()['a']),
'x??xxxxxx?')
def test_simulateAlignment_root_sequence(self):
"""provide a root sequence for simulating an alignment"""
def use_root_seq(root_sequence):
al = LoadSeqs(data={'a':'ggaatt','c':'cctaat'})
t = LoadTree(treestring="(a,c);")
sm = substitution_model.Dinucleotide(mprob_model='tuple')
lf = sm.makeParamController(t)
lf.setAlignment(al)
simalign = lf.simulateAlignment(exclude_internal=False,
root_sequence=root_sequence)
root = simalign.NamedSeqs['root']
self.assertEqual(str(root), str(root_sequence))
root_sequence = DNA.makeSequence('GTAATT')
use_root_seq(root_sequence) # as a sequence instance
use_root_seq('GTAATC') # as a string
def test_pc_initial_parameters(self):
"""Default parameter values from original annotated tree"""
likelihood_function = self._makeLikelihoodFunction()
self._setLengthsAndBetas(likelihood_function)
tree = likelihood_function.getAnnotatedTree()
lf = self.submodel.makeParamController(tree)
lf.setAlignment(self.data)
self.assertEqual(lf.getParamValue("length", "Human"), 0.3)
self.assertEqual(lf.getParamValue("beta", "Human"), 4.0)
def test_set_par_all(self):
likelihood_function = self._makeLikelihoodFunction()
likelihood_function.setParamRule("length", value=4.0, is_constant=True)
likelihood_function.setParamRule("beta", value=6.0, is_constant=True)
self.assertEqual(str(likelihood_function), \
"""Likelihood Function Table
======
beta
------
6.0000
------
=============================
edge parent length
-----------------------------
Human edge.0 4.0000
HowlerMon edge.0 4.0000
edge.0 edge.1 4.0000
Mouse edge.1 4.0000
edge.1 root 4.0000
NineBande root 4.0000
DogFaced root 4.0000
-----------------------------
===============
motif mprobs
---------------
T 0.2500
C 0.2500
A 0.2500
G 0.2500
---------------""")
#self.submodel.setScaleRule("ts",['beta'])
#self.submodel.setScaleRule("tv",['beta'], exclude_pars = True)
self.assertEqual(str(likelihood_function),\
"""Likelihood Function Table
======
beta
------
6.0000
------
=============================
edge parent length
-----------------------------
Human edge.0 4.0000
HowlerMon edge.0 4.0000
edge.0 edge.1 4.0000
Mouse edge.1 4.0000
edge.1 root 4.0000
NineBande root 4.0000
DogFaced root 4.0000
-----------------------------
===============
motif mprobs
---------------
T 0.2500
C 0.2500
A 0.2500
G 0.2500
---------------""")
def test_getMotifProbs(self):
likelihood_function = self._makeLikelihoodFunction()
mprobs = likelihood_function.getMotifProbs()
assert hasattr(mprobs, 'keys'), mprobs
keys = sorted(mprobs.keys())
motifs = sorted(self.submodel.getMotifs())
self.assertEqual(motifs, keys)
def test_getAnnotatedTree(self):
likelihood_function = self._makeLikelihoodFunction()
likelihood_function.setParamRule("length", value=4.0, edge="Human", is_constant=True)
result = likelihood_function.getAnnotatedTree()
self.assertEqual(result.getNodeMatchingName('Human').params['length'], 4.0)
self.assertEqual(result.getNodeMatchingName('Human').Length, 4.0)
def test_getparamsasdict(self):
likelihood_function = self._makeLikelihoodFunction()
likelihood_function.setName("TEST")
self.assertEqual(str(likelihood_function),\
"""TEST
=======================================
edge parent length beta
---------------------------------------
Human edge.0 1.0000 1.0000
HowlerMon edge.0 1.0000 1.0000
edge.0 edge.1 1.0000 1.0000
Mouse edge.1 1.0000 1.0000
edge.1 root 1.0000 1.0000
NineBande root 1.0000 1.0000
DogFaced root 1.0000 1.0000
---------------------------------------
===============
motif mprobs
---------------
T 0.2500
C 0.2500
A 0.2500
G 0.2500
---------------""")
self.assertEqual(likelihood_function.getParamValueDict(['edge']), {
'beta': {'NineBande': 1.0, 'edge.1': 1.0,'DogFaced': 1.0, 'Human': 1.0,
'edge.0': 1.0, 'Mouse': 1.0, 'HowlerMon': 1.0},
'length': {'NineBande': 1.0,'edge.1': 1.0, 'DogFaced': 1.0, 'Human': 1.0,
'edge.0': 1.0, 'Mouse': 1.0,'HowlerMon': 1.0}})
def test_get_statistics_from_empirical_model(self):
"""should return valid dict from an empirical substitution model"""
submod = JTT92()
aln = self.data.getTranslation()
lf = submod.makeLikelihoodFunction(self.tree)
lf.setAlignment(aln)
stats = lf.getParamValueDict(['edge'], params=['length'])
def test_constant_to_free(self):
"""exercise setting a constant param rule, then freeing it"""
# checks by just trying to make the calculator
lf = self.submodel.makeLikelihoodFunction(self.tree)
lf.setAlignment(self.data)
lf.setParamRule('beta', is_constant=True, value=2.0,
edges=['NineBande', 'DogFaced'], is_clade=True)
lf.setParamRule('beta', init=2.0, is_constant=False,
edges=['NineBande', 'DogFaced'], is_clade=True)
def test_get_psub_rate_matrix(self):
"""lf should return consistent rate matrix and psub"""
lf = self.submodel.makeLikelihoodFunction(self.tree)
lf.setAlignment(self.data)
Q = lf.getRateMatrixForEdge('NineBande')
P = lf.getPsubForEdge('NineBande')
self.assertFloatEqual(expm(Q.array)(1.0), P.array)
# should fail for a discrete Markov model
dm = substitution_model.DiscreteSubstitutionModel(DNA.Alphabet)
lf = dm.makeLikelihoodFunction(self.tree)
lf.setAlignment(self.data)
self.assertRaises(Exception, lf.getRateMatrixForEdge, 'NineBande')
def test_make_discrete_markov(self):
"""lf ignores tree lengths if a discrete Markov model"""
t = LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1)root;')
dm = substitution_model.DiscreteSubstitutionModel(DNA.Alphabet)
lf = dm.makeLikelihoodFunction(t)
if __name__ == '__main__':
main()
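`test_information_criteria` checks `lf.getAic()`, `lf.getAic(second_order=True)` and `lf.getBic()` against `aic(lnL, nfp)`, `aic(lnL, nfp, l)` and `bic(lnL, nfp, l)`. Here `nfp` is the number of free parameters and `l` is the alignment length. The sketch below gives the standard textbook definitions of AIC, the small-sample AICc, and BIC. That `cogent.maths.stats.information_criteria` computes exactly these is an assumption, and `textbook_aic` and `textbook_bic` are hypothetical names.

```python
import math

def textbook_aic(lnL, nfp, sample_size=None):
    """AIC = -2 lnL + 2k; with sample_size n, the second-order AICc
    adds 2k(k + 1) / (n - k - 1)."""
    value = -2.0 * lnL + 2.0 * nfp
    if sample_size is not None:
        if sample_size - nfp - 1 <= 0:
            raise ValueError("AICc needs sample_size > nfp + 1")
        value += 2.0 * nfp * (nfp + 1) / (sample_size - nfp - 1)
    return value

def textbook_bic(lnL, nfp, sample_size):
    """BIC = -2 lnL + k ln(n)."""
    return -2.0 * lnL + nfp * math.log(sample_size)

print(textbook_aic(-100.0, 3))  # 206.0
```

The AICc correction shrinks toward zero as the sample size grows, so AICc and AIC agree for long alignments.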
PyCogent-1.5.3/tests/test_evolve/test_models.py
from cogent.util.unit_test import TestCase, main
from cogent.evolve.models import JC69, F81, HKY85, TN93, GTR, \
MG94HKY, MG94GTR, GY94, H04G, H04GK, H04GGK, \
DSO78, AH96, AH96_mtmammals, JTT92, WG01, CNFGTR, CNFHKY, \
WG01_matrix, WG01_freqs
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
class CannedModelsTest(TestCase):
"""Check each canned model can actually be instantiated."""
def _instantiate_models(self, models, **kwargs):
for model in models:
model(**kwargs)
def test_nuc_models(self):
"""exercising nucleotide model construction"""
self._instantiate_models([JC69, F81, HKY85, GTR])
def test_codon_models(self):
"""exercising codon model construction"""
self._instantiate_models([CNFGTR, CNFHKY, MG94HKY, MG94GTR, GY94,
H04G, H04GK, H04GGK])
def test_aa_models(self):
"""exercising aa model construction"""
self._instantiate_models([DSO78, AH96, AH96_mtmammals, JTT92, WG01])
def test_bin_options(self):
kwargs = dict(with_rate=True, distribution='gamma')
model = WG01(**kwargs)
model = GTR(**kwargs)
def test_empirical_values_roundtrip(self):
model = WG01()
assert model.getMotifProbs() == WG01_freqs
assert (model.calcExchangeabilityMatrix('dummy_mprobs') ==
WG01_matrix).all()
def test_solved_models(self):
for klass in [TN93, HKY85, F81]:
for scaled in [True, False]:
model = klass(rate_matrix_required=False, do_scaling=scaled)
model.checkPsubCalculationsMatch()
if __name__ == '__main__':
main()
PyCogent-1.5.3/tests/test_evolve/test_motifchange.py
#!/usr/bin/env python
import unittest
from cogent.evolve.predicate import MotifChange
from cogent.core.moltype import CodonAlphabet
__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Rob Knight",
"Matthew Wakefield", "Brett Easton"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
class FakeModel(object):
def __init__(self, alphabet):
self.alphabet = alphabet
self.MolType = alphabet.MolType
def getAlphabet(self):
return self.alphabet
class TestPredicates(unittest.TestCase):
def setUp(self):
self.alphabet = CodonAlphabet()
self.model = FakeModel(self.alphabet)
def _makeMotifChange(self, *args, **kw):
pred = MotifChange(*args, **kw)
return pred.interpret(self.model)
def assertMatch(self, pred, seq1, seq2):
assert pred(seq1, seq2), (pred.__doc__, (seq1, seq2))
def assertNoMatch(self, pred, seq1, seq2):
assert not pred(seq1, seq2), ('not ' + pred.__doc__, (seq1, seq2))
def test_indels(self):
indel = self._makeMotifChange('---', 'NNN')
self.assertMatch(indel, '---', 'AAA')
def test_impossible_change(self):
self.assertRaises(Exception,
self._makeMotifChange, '----', 'NNNN')
def test_isfromcpg(self):
isFromCpG = self._makeMotifChange('CG', forward_only = True)
self.assertMatch(isFromCpG, 'CG', 'CA')
self.assertMatch(isFromCpG, 'CG', 'TG')
self.assertMatch(isFromCpG, 'ACG', 'ATG')
self.assertMatch(isFromCpG, 'CGT', 'CTT')
self.assertNoMatch(isFromCpG, 'CTT', 'CGT')
self.assertNoMatch(isFromCpG, 'C', 'G')
def test_isfromtocpg(self):
isFromToCpG = self._makeMotifChange('CG')
self.assertMatch(isFromToCpG, 'CG', 'CA')
self.assertMatch(isFromToCpG, 'CG', 'TG')
self.assertMatch(isFromToCpG, 'ACG', 'ATG')
self.assertMatch(isFromToCpG, 'CGT', 'CTT')
self.assertMatch(isFromToCpG, 'CTT', 'CGT')
def test_isFromToCpA_C_only(self):
isFromToCpA_C_only = self._makeMotifChange('CA', diff_at = 0)
self.assertMatch(isFromToCpA_C_only, 'CA', 'TA')
self.assertMatch(isFromToCpA_C_only, 'TCA', 'TTA')
self.assertMatch(isFromToCpA_C_only, 'TAA', 'CAA')
self.assertNoMatch(isFromToCpA_C_only, 'TCA', 'TCT')
def test_isFromCpA_C_only(self):
isFromCpA_C_only = self._makeMotifChange('CA', forward_only = True, diff_at = 0)
self.assertMatch(isFromCpA_C_only, 'CA', 'TA')
self.assertMatch(isFromCpA_C_only, 'TCA', 'TTA')
self.assertNoMatch(isFromCpA_C_only, 'TAA', 'CAA')
def test_isCpT_T_only(self):
isCpT_T_only = self._makeMotifChange('CT', diff_at = 1)
self.assertMatch(isCpT_T_only, 'CT', 'CA')
self.assertMatch(isCpT_T_only, 'TCA', 'TCT')
self.assertNoMatch(isCpT_T_only, 'TTA', 'TCA')
self.assertNoMatch(isCpT_T_only, 'TA', 'CT')
def test_isCCC(self):
isCCC = self._makeMotifChange('CCC')
self.assertNoMatch(isCCC, 'CC', 'CT')
def test_isC(self):
isC = self._makeMotifChange('C')
self.assertMatch(isC, 'C', 'T')
self.assertNoMatch(isC, 'CA', 'CT')
self.assertMatch(isC, 'CA', 'CC')
self.assertMatch(isC, 'CAT', 'GAT')
self.assertMatch(isC, 'CAT', 'CCT')
self.assertMatch(isC, 'CAT', 'CAC')
self.assertNoMatch(isC, 'CAT', 'CAA')
self.assertNoMatch(isC, 'C', 'C')
def test_isCtoT(self):
isCtoT = self._makeMotifChange('C', 'T')
self.assertMatch(isCtoT, 'C', 'T')
self.assertMatch(isCtoT, 'T', 'C')
self.assertNoMatch(isCtoT, 'T', 'A')
isCtoT = self._makeMotifChange('C', 'T', forward_only = True)
self.assertMatch(isCtoT, 'C', 'T')
self.assertNoMatch(isCtoT, 'T', 'C')
def test_isCGtoCA(self):
isCG_CA = self._makeMotifChange('CG', 'CA')
self.assertMatch(isCG_CA, 'CG', 'CA')
self.assertMatch(isCG_CA, 'CA', 'CG')
self.assertMatch(isCG_CA, 'CAT', 'CGT')
self.assertMatch(isCG_CA, 'CGT', 'CAT')
self.assertMatch(isCG_CA, 'TCA', 'TCG')
self.assertNoMatch(isCG_CA, 'TCT', 'TCG')
self.assertMatch(isCG_CA, 'CGTT', 'CATT')
self.assertMatch(isCG_CA, 'TCGT', 'TCAT')
self.assertMatch(isCG_CA, 'TTCG', 'TTCA')
self.assertMatch(isCG_CA, 'CATT', 'CGTT')
self.assertMatch(isCG_CA, 'TCAT', 'TCGT')
self.assertMatch(isCG_CA, 'TTCA', 'TTCG')
isCG_CA = self._makeMotifChange('CG', 'CA', forward_only = True)
self.assertMatch(isCG_CA, 'CGTT', 'CATT')
self.assertMatch(isCG_CA, 'TCGT', 'TCAT')
self.assertMatch(isCG_CA, 'TTCG', 'TTCA')
self.assertNoMatch(isCG_CA, 'CATT', 'CGTT')
self.assertNoMatch(isCG_CA, 'TCAT', 'TCGT')
self.assertNoMatch(isCG_CA, 'TTCA', 'TTCG')
isCG = self._makeMotifChange('CG', diff_at = 1)
self.assertMatch(isCG, 'CGTT', 'CATT')
self.assertMatch(isCG, 'TCGT', 'TCAT')
self.assertMatch(isCG, 'TTCG', 'TTCA')
self.assertNoMatch(isCG, 'CGTT', 'TGTT')
self.assertNoMatch(isCG, 'TCGT', 'TAGT')
self.assertNoMatch(isCG, 'TTCG', '--GG')
def test_wildcards(self):
isCG_CN = self._makeMotifChange('CG', 'CN')
self.assertMatch(isCG_CN, 'CG', 'CA')
self.assertNoMatch(isCG_CN, 'CG', 'CG')
self.assertNoMatch(isCG_CN, 'CG', 'C-')
if __name__ == '__main__':
unittest.main()
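The MotifChange assertions above imply a set of matching rules. The two sequences must have the same length and differ at exactly one site. That site must fall inside a window where one sequence carries `motif_a` and, if `motif_b` is given, the other sequence carries `motif_b`. `forward_only` fixes the direction of the change, `diff_at` fixes the offset of the changed site within the motif, and `N` in a motif matches any base but not a gap.

The sketch below is a hypothetical pure-Python re-implementation inferred from those assertions. `motif_change_matches` is not part of cogent, and cogent's own predicate may behave differently in cases these tests do not exercise.

```python
BASES = "ACGT"


def _matches(segment, motif):
    """True if segment equals motif, treating 'N' in the motif as any base."""
    return len(segment) == len(motif) and all(
        m == s or (m == "N" and s in BASES) for m, s in zip(motif, segment))


def motif_change_matches(x, y, motif_a, motif_b=None,
                         forward_only=False, diff_at=None):
    """Hypothetical stand-in for the predicates built by _makeMotifChange."""
    if len(x) != len(y):
        return False
    diffs = [i for i, (a, b) in enumerate(zip(x, y)) if a != b]
    if len(diffs) != 1:  # exactly one substitution is allowed
        return False
    pos = diffs[0]
    k = len(motif_a)
    directions = [(x, y)] if forward_only else [(x, y), (y, x)]
    for src, dst in directions:
        # every window of length k that covers the changed site
        for start in range(max(0, pos - k + 1), min(pos, len(x) - k) + 1):
            if diff_at is not None and pos - start != diff_at:
                continue
            if not _matches(src[start:start + k], motif_a):
                continue
            if motif_b is None or _matches(dst[start:start + k], motif_b):
                return True
    return False
```

For example, `motif_change_matches('TCA', 'TCG', 'CG', 'CA')` is true because the window at offset 1 reads `CA` in one sequence and `CG` in the other.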
PyCogent-1.5.3/tests/test_evolve/test_newq.py
#!/usr/bin/env python
import warnings
warnings.filterwarnings("ignore", "Motif probs overspecified")
warnings.filterwarnings("ignore", "Model not reversible")
from numpy import ones, dot, array
from cogent import LoadSeqs, DNA, LoadTree, LoadTable
from cogent.evolve.substitution_model import Nucleotide, General, \
GeneralStationary
from cogent.evolve.discrete_markov import DiscreteSubstitutionModel
from cogent.evolve.predicate import MotifChange
from cogent.util.unit_test import TestCase, main
from cogent.maths.matrix_exponentiation import PadeExponentiator as expm
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
def _dinuc_root_probs(x,y=None):
if y is None:
y = x
return dict([(n1+n2, p1*p2)
for n1,p1 in x.items() for n2,p2 in y.items()])
def _trinuc_root_probs(x,y,z):
return dict([(n1+n2+n3, p1*p2*p3)
for n1,p1 in x.items() for n2,p2 in y.items()
for n3,p3 in z.items()])
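`_dinuc_root_probs` and `_trinuc_root_probs` build tuple probabilities by treating positions as independent. Each tuple's probability is therefore the product of its per-position nucleotide probabilities.

A minimal generalisation to any tuple length could use `itertools.product`. `tuple_root_probs` is a hypothetical helper and is not part of the test module.

```python
from itertools import product


def tuple_root_probs(*position_probs):
    """Joint motif probabilities assuming independent positions.

    Each argument maps nucleotide -> probability for one position
    (hypothetical helper generalising _dinuc/_trinuc_root_probs)."""
    result = {}
    for combo in product(*(probs.items() for probs in position_probs)):
        motif = "".join(nuc for nuc, _ in combo)
        p = 1.0
        for _, prob in combo:
            p *= prob
        result[motif] = p
    return result


symm = dict(A=0.25, T=0.25, C=0.25, G=0.25)
asymm = dict(A=0.1, T=0.1, C=0.4, G=0.4)
posn = tuple_root_probs(symm, asymm)              # like _dinuc_root_probs(symm, asymm)
trinuc = tuple_root_probs(asymm, asymm, asymm)    # like _trinuc_root_probs(...)
```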
def make_p(length, coord, val):
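"""returns the probability matrix P = exp(Q*length), where Q is an
equal-rates instantaneous rate matrix whose entry at coord is multiplied
by val, scaled to an expected rate of 1 under equal motif probs"""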
Q = ones((4,4), float)*0.25 # assumes equi-frequent mprobs at root
for i in range(4):
Q[i,i] = 0.0
Q[coord] *= val
row_sum = Q.sum(axis=1)
scale = 1/(.25*row_sum).sum()
for i in range(4):
Q[i,i] -= row_sum[i]
Q *= scale
return expm(Q)(length)
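`make_p` builds the rate matrix Q and exponentiates it with cogent's `PadeExponentiator`. To show the same construction without dependencies, the sketch below swaps in a different, simpler exponentiation technique: a truncated Taylor series with scaling and squaring.

The helper names `scaled_rate_matrix` and `expm` are hypothetical. A useful sanity check is that when `val=1` the matrix is Jukes-Cantor, so P(t) should match the JC69 closed form P_ii = 1/4 + 3/4 exp(-4t/3).

```python
import math


def scaled_rate_matrix(coord, val, n=4):
    """Mirror of make_p's Q: equal rates (uniform freqs), the entry at
    coord multiplied by val, zero row sums, expected rate scaled to 1."""
    freq = 1.0 / n
    q = [[0.0 if i == j else freq for j in range(n)] for i in range(n)]
    q[coord[0]][coord[1]] *= val
    row_sums = [sum(row) for row in q]
    scale = 1.0 / sum(freq * rs for rs in row_sums)
    for i in range(n):
        q[i][i] = -row_sums[i]
    return [[x * scale for x in row] for row in q]


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm(m, t, terms=20, squarings=10):
    """exp(m * t) by a truncated Taylor series with scaling and squaring
    (stands in for the Pade method used by the test)."""
    n = len(m)
    s = t / 2 ** squarings
    a = [[x * s for x in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in _matmul(term, a)]
        result = [[r + x for r, x in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = _matmul(result, result)
    return result


q = scaled_rate_matrix((0, 3), 8.0)
p = expm(q, 0.2)                                  # analogue of make_p(.2, (0,3), 8)
p_jc = expm(scaled_rate_matrix((0, 3), 1.0), 0.2)  # val=1 gives JC69
```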
class NewQ(TestCase):
aln = LoadSeqs(data={
'seq1': 'TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACT',
'seq2': 'TGTGGCACAAATACTCATGCCAGCTCATTACAGCATGAGAACAGCAGTTTATTACTCACT'},
moltype=DNA)
tree = LoadTree(tip_names=['seq1', 'seq2'])
symm_nuc_probs = dict(A=0.25,T=0.25,C=0.25,G=0.25)
symm_root_probs = _dinuc_root_probs(symm_nuc_probs)
asymm_nuc_probs = dict(A=0.1,T=0.1,C=0.4,G=0.4)
asymm_root_probs = _dinuc_root_probs(asymm_nuc_probs)
posn_root_probs = _dinuc_root_probs(symm_nuc_probs, asymm_nuc_probs)
cond_root_probs = dict([(n1+n2, p1*[.1, .7][n1==n2])
for n1,p1 in asymm_nuc_probs.items() for n2 in 'ATCG'])
# Each of these (data, model) pairs should give a result different
# from any of the simpler models applied to the same data.
ordered_by_complexity = [
# P(AA) == P(GG) == P(AG)
[symm_root_probs, 'tuple'],
# P(GA) == P(AG) but P(AA) != P(GG)
[asymm_root_probs, 'monomer'],
# P(AG) == P(A?)*P(?G) but P(A?) != P(?A)
[posn_root_probs, 'monomers'],
# P(AG) != P(A?)*P(?G)
[cond_root_probs, 'conditional'],
]
def test_newQ_is_nuc_process(self):
"""newQ is an extension of an independent nucleotide process"""
nuc = Nucleotide(motif_probs = self.asymm_nuc_probs)
new_di = Nucleotide(motif_length=2, mprob_model='monomer',
motif_probs = self.asymm_root_probs)
nuc_lf = nuc.makeLikelihoodFunction(self.tree)
new_di_lf = new_di.makeLikelihoodFunction(self.tree)
# newQ branch length is exactly motif_length*nuc branch length
nuc_lf.setParamRule('length', is_independent=False, init=0.2)
new_di_lf.setParamRule('length', is_independent=False, init=0.4)
nuc_lf.setAlignment(self.aln)
new_di_lf.setAlignment(self.aln)
self.assertFloatEqual(nuc_lf.getLogLikelihood(),
new_di_lf.getLogLikelihood())
def test_lf_display(self):
"""str of likelihood functions should not fail"""
for (dummy, model) in self.ordered_by_complexity:
di = Nucleotide(motif_length=2, mprob_model=model)
di.adaptMotifProbs(self.cond_root_probs, auto=True)
lf = di.makeLikelihoodFunction(self.tree)
s = str(lf)
def test_get_statistics(self):
"""get statistics should correctly apply arguments"""
for (mprobs, model) in self.ordered_by_complexity:
di = Nucleotide(motif_length=2, motif_probs=mprobs,
mprob_model=model)
lf = di.makeLikelihoodFunction(self.tree)
for wm, wt in [(True, True), (True, False), (False, True),
(False, False)]:
stats = lf.getStatistics(with_motif_probs=wm, with_titles=wt)
def test_sim_alignment(self):
"""should be able to simulate an alignment under all models"""
for (mprobs, model) in self.ordered_by_complexity:
di = Nucleotide(motif_length=2, motif_probs=mprobs,
mprob_model=model)
lf = di.makeLikelihoodFunction(self.tree)
lf.setParamRule('length', is_independent=False, init=0.4)
lf.setAlignment(self.aln)
sim = lf.simulateAlignment()
def test_reconstruct_ancestor(self):
"""should be able to reconstruct ancestral sequences under all
models"""
for (mprobs, model) in self.ordered_by_complexity:
di = Nucleotide(motif_length=2, mprob_model=model)
di.adaptMotifProbs(mprobs, auto=True)
lf = di.makeLikelihoodFunction(self.tree)
lf.setParamRule('length', is_independent=False, init=0.4)
lf.setAlignment(self.aln)
ancestor = lf.reconstructAncestralSeqs()
def test_results_different(self):
for (i, (mprobs, dummy)) in enumerate(self.ordered_by_complexity):
results = []
for (dummy, model) in self.ordered_by_complexity:
di = Nucleotide(motif_length=2, motif_probs=mprobs,
mprob_model=model)
lf = di.makeLikelihoodFunction(self.tree)
lf.setParamRule('length', is_independent=False, init=0.4)
lf.setAlignment(self.aln)
lh = lf.getLogLikelihood()
for other in results[:i]:
self.failIfAlmostEqual(other, lh, places=2)
for other in results[i:]:
self.assertFloatEqual(other, lh)
results.append(lh)
def test_position_specific_mprobs(self):
"""correctly compute likelihood when positions have distinct
probabilities"""
aln_len = len(self.aln)
posn1 = []
posn2 = []
for name, seq in self.aln.todict().items():
p1 = [seq[i] for i in range(0,aln_len,2)]
p2 = [seq[i] for i in range(1,aln_len,2)]
posn1.append([name, ''.join(p1)])
posn2.append([name, ''.join(p2)])
# the position specific alignments
posn1 = LoadSeqs(data=posn1)
posn2 = LoadSeqs(data=posn2)
# a newQ dinucleotide model
sm = Nucleotide(motif_length=2, mprob_model='monomer', do_scaling=False)
lf = sm.makeLikelihoodFunction(self.tree)
lf.setAlignment(posn1)
posn1_lnL = lf.getLogLikelihood()
lf.setAlignment(posn2)
posn2_lnL = lf.getLogLikelihood()
expect_lnL = posn1_lnL+posn2_lnL
# the joint model
lf.setAlignment(self.aln)
aln_lnL = lf.getLogLikelihood()
# setting the full alignment, which has different motif probs, should
# produce a different lnL
self.failIfAlmostEqual(expect_lnL, aln_lnL)
# set the arguments for taking position specific mprobs
sm = Nucleotide(motif_length=2, mprob_model='monomers',
do_scaling=False)
lf = sm.makeLikelihoodFunction(self.tree)
lf.setAlignment(self.aln)
posn12_lnL = lf.getLogLikelihood()
self.assertFloatEqual(expect_lnL, posn12_lnL)
def test_compute_conditional_mprobs(self):
"""equal likelihood from position specific and conditional mprobs"""
def compare_models(motif_probs, motif_length):
# if the 1st and 2nd position motifs are independent of each other
# then conditional is the same as positional
ps = Nucleotide(motif_length=motif_length, motif_probs=motif_probs,
mprob_model='monomers')
cd = Nucleotide(motif_length=motif_length,motif_probs=motif_probs,
mprob_model='conditional')
ps_lf = ps.makeLikelihoodFunction(self.tree)
ps_lf.setParamRule('length', is_independent=False, init=0.4)
ps_lf.setAlignment(self.aln)
cd_lf = cd.makeLikelihoodFunction(self.tree)
cd_lf.setParamRule('length', is_independent=False, init=0.4)
cd_lf.setAlignment(self.aln)
self.assertFloatEqual(cd_lf.getLogLikelihood(),
ps_lf.getLogLikelihood())
compare_models(self.posn_root_probs, 2)
# trinucleotide
trinuc_mprobs = _trinuc_root_probs(self.asymm_nuc_probs,
self.asymm_nuc_probs, self.asymm_nuc_probs)
compare_models(trinuc_mprobs, 3)
def test_cond_pos_differ(self):
"""lnL should differ when motif probs are not multiplicative"""
dinuc_probs = {'AA': 0.088506666666666664, 'AC': 0.044746666666666664,
'GT': 0.056693333333333332, 'AG': 0.070199999999999999,
'CC': 0.048653333333333333, 'TT': 0.10678666666666667,
'CG': 0.0093600000000000003, 'GG': 0.049853333333333333,
'GC': 0.040253333333333335, 'AT': 0.078880000000000006,
'GA': 0.058639999999999998, 'TG': 0.081626666666666667,
'TA': 0.068573333333333333, 'CA': 0.06661333333333333,
'TC': 0.060866666666666666, 'CT': 0.069746666666666665}
mg = Nucleotide(motif_length=2, motif_probs=dinuc_probs,
mprob_model='monomer')
mg_lf = mg.makeLikelihoodFunction(self.tree)
mg_lf.setParamRule('length', is_independent=False, init=0.4)
mg_lf.setAlignment(self.aln)
cd = Nucleotide(motif_length=2, motif_probs=dinuc_probs,
mprob_model='conditional')
cd_lf = cd.makeLikelihoodFunction(self.tree)
cd_lf.setParamRule('length', is_independent=False, init=0.4)
cd_lf.setAlignment(self.aln)
self.assertNotAlmostEqual(mg_lf.getLogLikelihood(),
cd_lf.getLogLikelihood())
def test_getting_node_mprobs(self):
"""return correct motif probability vector for tree nodes"""
tree = LoadTree(treestring='(a:.2,b:.2,(c:.1,d:.1):.1)')
aln = LoadSeqs(data={
'a': 'TGTG',
'b': 'TGTG',
'c': 'TGTG',
'd': 'TGTG',
})
motifs = ['T', 'C', 'A', 'G']
aX = MotifChange(motifs[0], motifs[3], forward_only=True).aliased('aX')
bX = MotifChange(motifs[3], motifs[0], forward_only=True).aliased('bX')
edX = MotifChange(motifs[1], motifs[2], forward_only=True).aliased('edX')
cX = MotifChange(motifs[2], motifs[1], forward_only=True).aliased('cX')
sm = Nucleotide(predicates=[aX, bX, edX, cX], equal_motif_probs=True)
lf = sm.makeLikelihoodFunction(tree)
lf.setParamRule('aX', edge='a', value=8.0)
lf.setParamRule('bX', edge='b', value=8.0)
lf.setParamRule('edX', edge='edge.0', value=2.0)
lf.setParamRule('cX', edge='c', value=0.5)
lf.setParamRule('edX', edge='d', value=4.0)
lf.setAlignment(aln)
# we construct the hand calc variants
mprobs = ones(4, float) * .25
a = make_p(.2, (0,3), 8)
a = dot(mprobs, a)
b = make_p(.2, (3, 0), 8)
b = dot(mprobs, b)
e = make_p(.1, (1, 2), 2)
e = dot(mprobs, e)
c = make_p(.1, (2, 1), 0.5)
c = dot(e, c)
d = make_p(.1, (1, 2), 4)
d = dot(e, d)
prob_vectors = lf.getMotifProbsByNode()
self.assertFloatEqual(prob_vectors['a'].array, a)
self.assertFloatEqual(prob_vectors['b'].array, b)
self.assertFloatEqual(prob_vectors['c'].array, c)
self.assertFloatEqual(prob_vectors['d'].array, d)
self.assertFloatEqual(prob_vectors['edge.0'].array, e)
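The `ordered_by_complexity` comments distinguish dinucleotide distributions by whether P(XY) factors into the product of its first- and second-position marginals, P(X?)*P(?Y). That is the condition under which the 'monomers' and 'conditional' models should agree.

A small hypothetical helper makes the distinction explicit. The tests themselves do not use it. The two distributions are rebuilt the same way as `posn_root_probs` and `cond_root_probs`.

```python
def marginals(dinuc_probs):
    """First- and second-position nucleotide marginals."""
    first, second = {}, {}
    for motif, p in dinuc_probs.items():
        first[motif[0]] = first.get(motif[0], 0.0) + p
        second[motif[1]] = second.get(motif[1], 0.0) + p
    return first, second


def positions_independent(dinuc_probs, tol=1e-9):
    """True if P(XY) == P(X?) * P(?Y) for every dinucleotide."""
    first, second = marginals(dinuc_probs)
    return all(abs(p - first[m[0]] * second[m[1]]) <= tol
               for m, p in dinuc_probs.items())


asymm = dict(A=0.1, T=0.1, C=0.4, G=0.4)
posn = dict((a + b, 0.25 * pb) for a in "ATCG" for b, pb in asymm.items())
cond = dict((a + b, pa * (0.7 if a == b else 0.1))
            for a, pa in asymm.items() for b in "ATCG")
```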
def _make_likelihood(model, tree, results, is_discrete=False):
"""creates the likelihood function"""
# discrete model fails to make a likelihood function if tree has
# lengths
if is_discrete:
kwargs={}
else:
kwargs=dict(expm='pade')
lf = model.makeLikelihoodFunction(tree,
optimise_motif_probs=True, **kwargs)
if not is_discrete:
for param in lf.getParamNames():
if param in ('length', 'mprobs'):
continue
lf.setParamRule(param, is_independent=True, upper=5)
lf.setAlignment(results['aln'])
return lf
def MakeCachedObjects(model, tree, seq_length, opt_args):
"""simulates an alignment under F81, all models should be the same"""
lf = model.makeLikelihoodFunction(tree)
lf.setMotifProbs(dict(A=0.1,C=0.2,G=0.3,T=0.4))
aln = lf.simulateAlignment(seq_length)
results = dict(aln=aln)
discrete_tree = LoadTree(tip_names=aln.Names)
def fit_general(results=results):
if 'general' in results:
return
gen = General(DNA.Alphabet)
gen_lf = _make_likelihood(gen, tree, results)
gen_lf.optimise(**opt_args)
results['general'] = gen_lf
return
def fit_gen_stat(results=results):
if 'gen_stat' in results:
return
gen_stat = GeneralStationary(DNA.Alphabet)
gen_stat_lf = _make_likelihood(gen_stat, tree, results)
gen_stat_lf.optimise(**opt_args)
results['gen_stat'] = gen_stat_lf
def fit_constructed_gen(results=results):
if 'constructed_gen' in results:
return
preds = [MotifChange(a,b, forward_only=True) for a,b in [['A', 'C'],
['A', 'G'], ['A', 'T'], ['C', 'A'], ['C', 'G'],
['C', 'T'], ['G', 'C'], ['G', 'T'], ['T', 'A'],
['T', 'C'], ['T', 'G']]]
nuc = Nucleotide(predicates=preds)
nuc_lf = _make_likelihood(nuc, tree, results)
nuc_lf.optimise(**opt_args)
results['constructed_gen'] = nuc_lf
def fit_discrete(results=results):
if 'discrete' in results:
return
dis_lf = _make_likelihood(DiscreteSubstitutionModel(DNA.Alphabet),
discrete_tree, results, is_discrete=True)
dis_lf.optimise(**opt_args)
results['discrete'] = dis_lf
funcs = dict(general=fit_general, gen_stat=fit_gen_stat,
discrete=fit_discrete, constructed_gen=fit_constructed_gen)
def call(self, obj_name):
if obj_name not in results:
funcs[obj_name]()
return results[obj_name]
return call
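`MakeCachedObjects` returns a closure that fits each model at most once and hands back the stored likelihood function on later calls. This lets several test methods share one expensive optimisation. Because the closure takes `self`, it can be assigned as a class attribute and called as a bound method, as the commented-out `DiscreteGeneral` class does with `make_cached`.

Below is a stripped-down sketch of the same pattern. The names are hypothetical, and a counter stands in for the fitting work. Unlike the original, where each `fit_*` function writes into the shared `results` dict itself, this sketch stores the builder's return value.

```python
def make_cached(builders):
    """Return an accessor that builds each named object once, on demand."""
    results = {}

    def call(self, name):
        if name not in results:
            results[name] = builders[name]()
        return results[name]

    return call


calls = []


def expensive_fit():
    calls.append(1)  # stands in for an optimisation run
    return {"lnL": -123.4}


class Example(object):
    get = make_cached({"general": expensive_fit})


first = Example().get("general")
second = Example().get("general")  # served from the cache
```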
# class DiscreteGeneral(TestCase):
# """test discrete and general markov"""
# tree = LoadTree(treestring='(a:0.4,b:0.4,c:0.6)')
# opt_args = dict(max_restarts=1, local=True)
# make_cached = MakeCachedObjects(Nucleotide(), tree, 100000, opt_args)
#
# def _setup_discrete_from_general(self, gen_lf):
# dis_lf = self.make_cached('discrete')
# for edge in self.tree:
# init = gen_lf.getPsubForEdge(edge.Name)
# dis_lf.setParamRule('psubs', edge=edge.Name, init=init)
# dis_lf.setMotifProbs(gen_lf.getMotifProbs())
# return dis_lf
#
# def test_discrete_vs_general1(self):
# """compares fully general models"""
# gen_lf = self.make_cached('general')
# gen_lnL = gen_lf.getLogLikelihood()
# dis_lf = self._setup_discrete_from_general(gen_lf)
# self.assertFloatEqual(gen_lnL, dis_lf.getLogLikelihood())
#
# def test_general_vs_constructed_general(self):
# """a constructed general lnL should be identical to General"""
# sm_lf = self.make_cached('constructed_gen')
# sm_lnL = sm_lf.getLogLikelihood()
# gen_lf = self.make_cached('general')
# gen_lnL = gen_lf.getLogLikelihood()
# self.assertFloatEqualAbs(sm_lnL, gen_lnL, eps=0.1)
#
# def test_general_stationary(self):
# """General stationary should be close to General"""
# gen_stat_lf = self.make_cached('gen_stat')
# gen_lf = self.make_cached('general')
# gen_stat_lnL = gen_stat_lf.getLogLikelihood()
# gen_lnL = gen_lf.getLogLikelihood()
# self.assertLessThan(gen_stat_lnL, gen_lnL)
#
# def test_general_stationary_is_stationary(self):
# """should be stationary"""
# gen_stat_lf = self.make_cached('gen_stat')
# mprobs = gen_stat_lf.getMotifProbs()
# mprobs = array([mprobs[nuc] for nuc in DNA.Alphabet])
# for edge in self.tree:
# psub = gen_stat_lf.getPsubForEdge(edge.Name)
# pi = dot(mprobs, psub.array)
# self.assertFloatEqual(mprobs, pi)
#
# def test_general_is_not_stationary(self):
# """should not be stationary"""
# gen_lf = self.make_cached('general')
# mprobs = gen_lf.getMotifProbs()
# mprobs = array([mprobs[nuc] for nuc in DNA.Alphabet])
# for edge in self.tree:
# psub = gen_lf.getPsubForEdge(edge.Name)
# pi = dot(mprobs, psub.array)
# try:
# self.assertFloatEqual(mprobs, pi)
# except AssertionError:
# pass
if __name__ == "__main__":
main()
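`test_getting_node_mprobs` obtains each node's expected motif probabilities by multiplying the parent's probability vector by the edge's substitution matrix (row vector times P). The commented-out stationarity tests compare pi with pi*P. Both reduce to a vector-matrix product.

The dependency-free sketch below uses a hypothetical `propagate` helper and the closed-form Jukes-Cantor P(t) in place of fitted cogent matrices.

```python
import math


def propagate(probs, psub):
    """Row vector times matrix: motif probabilities after one edge."""
    n = len(probs)
    return [sum(probs[i] * psub[i][j] for i in range(n)) for j in range(n)]


def jc69_psub(t):
    """JC69 substitution matrix for branch length t (expected subs/site)."""
    decay = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * decay, 0.25 - 0.25 * decay
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def is_stationary(probs, psub, tol=1e-9):
    """True if probs is unchanged by the substitution matrix."""
    return all(abs(a - b) <= tol
               for a, b in zip(probs, propagate(probs, psub)))


uniform = [0.25] * 4
skewed = [0.1, 0.2, 0.3, 0.4]
edge0 = propagate(skewed, jc69_psub(0.1))  # internal node
tip = propagate(edge0, jc69_psub(0.1))     # child of that node
```

Propagating along two edges of length 0.1 should give the same vector as one edge of length 0.2. This follows from the Chapman-Kolmogorov property P(s)P(t) = P(s+t).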
PyCogent-1.5.3/tests/test_evolve/test_pairwise_distance.py
#!/usr/bin/env python
import warnings
warnings.filterwarnings('ignore', 'Not using MPI as mpi4py not found')
import numpy
# hides the warning from taking log of -ve determinant
numpy.seterr(invalid='ignore')
from cogent.util.unit_test import TestCase, main
from cogent import LoadSeqs, DNA, RNA, PROTEIN
from cogent.evolve.pairwise_distance import get_moltype_index_array, \
seq_to_indices, _fill_diversity_matrix, \
_jc69_from_matrix, JC69Pair, _tn93_from_matrix, TN93Pair, LogDetPair
from cogent.evolve._pairwise_distance import \
_fill_diversity_matrix as pyx_fill_diversity_matrix
import math
__author__ = "Gavin Huttley and Yicheng Zhu"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Yicheng Zhu"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "Production"
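The diversity-matrix tests below rely on two steps. First, each sequence is recoded as state indices in `T, C, A, G` order, with every other character (ambiguity codes, `?`, gaps) mapped to -9. Second, each aligned pair of valid states is counted into a 4x4 matrix whose rows index the first sequence.

The pure-Python sketch reproduces that counting. `to_indices` and `fill_diversity_matrix` are hypothetical names and do not share the signatures of cogent's `seq_to_indices` or `_fill_diversity_matrix`. RNA is handled here by mapping `U` to `T`, an assumption standing in for the separate RNA index array used in the tests.

```python
STATES = "TCAG"
INVALID = -9


def to_indices(seq, states=STATES):
    """Map each character to its state index, or INVALID if not a base."""
    lookup = dict((c, i) for i, c in enumerate(states))
    return [lookup.get(c, INVALID) for c in seq.upper().replace("U", "T")]


def fill_diversity_matrix(s1, s2, n=4):
    """Count aligned (s1 state, s2 state) pairs, skipping invalid sites."""
    matrix = [[0] * n for _ in range(n)]
    for a, b in zip(s1, s2):
        if a >= 0 and b >= 0:
            matrix[a][b] += 1
    return matrix


m = fill_diversity_matrix(to_indices("ACGTACGTAC"), to_indices("GTGTACGTAC"))
m_ambig = fill_diversity_matrix(to_indices("RACGTACGTACN"),
                                to_indices("AGTGTACGTACA"))
```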
class TestPair(TestCase):
dna_char_indices = get_moltype_index_array(DNA)
rna_char_indices = get_moltype_index_array(RNA)
alignment = LoadSeqs(data=[('s1', 'ACGTACGTAC'),
('s2', 'GTGTACGTAC')], moltype=DNA)
ambig_alignment = LoadSeqs(data=[('s1', 'RACGTACGTACN'),
('s2', 'AGTGTACGTACA')], moltype=DNA)
diff_alignment = LoadSeqs(data=[('s1', 'ACGTACGTTT'),
('s2', 'GTGTACGTAC')], moltype=DNA)
def est_char_to_index(self):
"""should correctly recode DNA & RNA seqs into indices"""
seq = 'TCAGRNY?-'
expected = [0, 1, 2, 3, -9, -9, -9, -9, -9]
indices = seq_to_indices(seq, self.dna_char_indices)
self.assertEquals(indices, expected)
seq = 'UCAGRNY?-'
indices = seq_to_indices(seq, self.rna_char_indices)
self.assertEquals(indices, expected)
def est_fill_diversity_matrix_all(self):
"""make correct diversity matrix when all chars valid"""
s1 = seq_to_indices('ACGTACGTAC', self.dna_char_indices)
s2 = seq_to_indices('GTGTACGTAC', self.dna_char_indices)
matrix = numpy.zeros((4,4), float)
# self-self should give a diagonal matrix of base counts
_fill_diversity_matrix(matrix, s1, s1)
self.assertEquals(matrix.sum(), len(s1))
self.assertEquals(matrix,
numpy.array([[2,0,0,0],
[0,3,0,0],
[0,0,3,0],
[0,0,0,2]], float))
# small diffs
matrix.fill(0)
_fill_diversity_matrix(matrix, s1, s2)
self.assertEquals(matrix,
numpy.array([[2,0,0,0],
[1,2,0,0],
[0,0,2,1],
[0,0,0,2]], float))
def est_fill_diversity_matrix_some(self):
"""make correct diversity matrix when not all chars valid"""
s1 = seq_to_indices('RACGTACGTACN', self.dna_char_indices)
s2 = seq_to_indices('AGTGTACGTACA', self.dna_char_indices)
matrix = numpy.zeros((4,4), float)
# small diffs
matrix.fill(0)
_fill_diversity_matrix(matrix, s1, s2)
self.assertEquals(matrix,
numpy.array([[2,0,0,0],
[1,2,0,0],
[0,0,2,1],
[0,0,0,2]], float))
def est_python_vs_cython_fill_matrix(self):
"""python & cython fill_diversity_matrix give same answer"""
s1 = seq_to_indices('RACGTACGTACN', self.dna_char_indices)
s2 = seq_to_indices('AGTGTACGTACA', self.dna_char_indices)
matrix1 = numpy.zeros((4,4), float)
_fill_diversity_matrix(matrix1, s1, s2)
matrix2 = numpy.zeros((4,4), float)
pyx_fill_diversity_matrix(matrix2, s1, s2)
self.assertFloatEqual(matrix1, matrix2)
def est_jc69_from_matrix(self):
"""compute JC69 from diversity matrix"""
s1 = seq_to_indices('ACGTACGTAC', self.dna_char_indices)
s2 = seq_to_indices('GTGTACGTAC', self.dna_char_indices)
matrix = numpy.zeros((4,4), float)
_fill_diversity_matrix(matrix, s1, s2)
total, p, dist, var = _jc69_from_matrix(matrix)
self.assertEquals(total, 10.0)
self.assertEquals(p, 0.2)
def est_jc69_from_alignment(self):
"""compute JC69 dists from an alignment"""
calc = JC69Pair(DNA, alignment=self.alignment)
calc.run()
self.assertEquals(calc.Lengths['s1', 's2'], 10)
self.assertEquals(calc.Proportions['s1', 's2'], 0.2)
# value from OSX MEGA 5
self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2326161962)
# value**2 from OSX MEGA 5
self.assertFloatEqual(calc.Variances['s1', 's2'],
0.029752066125078681)
# value from OSX MEGA 5
self.assertFloatEqual(calc.StdErr['s1', 's2'], 0.1724878724)
# same answer when using ambiguous alignment
calc.run(self.ambig_alignment)
self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2326161962)
# but different answer if subsequent alignment is different
calc.run(self.diff_alignment)
self.assertTrue(calc.Dists['s1', 's2'] != 0.2326161962)
def est_tn93_from_matrix(self):
"""compute TN93 distances"""
calc = TN93Pair(DNA, alignment=self.alignment)
calc.run()
self.assertEquals(calc.Lengths['s1', 's2'], 10)
self.assertEquals(calc.Proportions['s1', 's2'], 0.2)
# value from OSX MEGA 5
self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2554128119)
# value**2 from OSX MEGA 5
self.assertFloatEqual(calc.Variances['s1', 's2'], 0.04444444445376601)
# value from OSX MEGA 5
self.assertFloatEqual(calc.StdErr['s1', 's2'], 0.2108185107)
# same answer when using ambiguous alignment
calc.run(self.ambig_alignment)
self.assertFloatEqual(calc.Dists['s1', 's2'], 0.2554128119)
# but different answer if subsequent alignment is different
calc.run(self.diff_alignment)
self.assertTrue(calc.Dists['s1', 's2'] != 0.2554128119)
def est_distance_pair(self):
"""get distances dict"""
calc = TN93Pair(DNA, alignment=self.alignment)
calc.run()
dists = calc.getPairwiseDistances()
dist = 0.2554128119
expect = {('s1', 's2'): dist, ('s2', 's1'): dist}
self.assertEquals(dists.keys(), expect.keys())
self.assertFloatEqual(dists.values(), expect.values())
def est_logdet_pair_dna(self):
"""logdet should produce distances that match MEGA"""
aln = LoadSeqs('data/brca1_5.paml', moltype=DNA)
logdet_calc = LogDetPair(moltype=DNA, alignment=aln)
logdet_calc.run(use_tk_adjustment=True)
dists = logdet_calc.getPairwiseDistances()
all_expected = {('Human', 'NineBande'): 0.075336929999999996,
('NineBande', 'DogFaced'): 0.0898575452,
('DogFaced', 'Human'): 0.1061747919,
('HowlerMon', 'DogFaced'): 0.0934480008,
('Mouse', 'HowlerMon'): 0.26422862920000001,
('NineBande', 'Human'): 0.075336929999999996,
('HowlerMon', 'NineBande'): 0.062202897899999998,
('DogFaced', 'NineBande'): 0.0898575452,
('DogFaced', 'HowlerMon'): 0.0934480008,
('Human', 'DogFaced'): 0.1061747919,
('Mouse', 'Human'): 0.26539976700000001,
('NineBande', 'HowlerMon'): 0.062202897899999998,
('HowlerMon', 'Human'): 0.036571181899999999,
('DogFaced', 'Mouse'): 0.2652555144,
('HowlerMon', 'Mouse'): 0.26422862920000001,
('Mouse', 'DogFaced'): 0.2652555144,
('NineBande', 'Mouse'): 0.22754789210000001,
('Mouse', 'NineBande'): 0.22754789210000001,
('Human', 'Mouse'): 0.26539976700000001,
('Human', 'HowlerMon'): 0.036571181899999999}
for pair in dists:
got = dists[pair]
expected = all_expected[pair]
self.assertFloatEqual(got, expected)
def est_logdet_tk_adjustment(self):
"""logdet using the Tamura-Kumar adjustment differs from classic"""
aln = LoadSeqs('data/brca1_5.paml', moltype=DNA)
logdet_calc = LogDetPair(moltype=DNA, alignment=aln)
logdet_calc.run(use_tk_adjustment=True, show_progress=False)
tk = logdet_calc.getPairwiseDistances()
logdet_calc.run(use_tk_adjustment=False, show_progress=False)
not_tk = logdet_calc.getPairwiseDistances()
self.assertNotEqual(tk, not_tk)
def est_logdet_pair_aa(self):
"""logdet shouldn't fail to produce distances for aa seqs"""
aln = LoadSeqs('data/brca1_5.paml', moltype=DNA)
aln = aln.getTranslation()
logdet_calc = LogDetPair(moltype=PROTEIN, alignment=aln)
logdet_calc.run(use_tk_adjustment=True, show_progress=False)
dists = logdet_calc.getPairwiseDistances()
def test_logdet_missing_states(self):
"""should calculate logdet measurement with missing states"""
data = [('seq1', "GGGGGGGGGGGCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGCGGTTTTTTTTTTTTTTTTTT"),
('seq2', "TAAAAAAAAAAGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCC")]
aln = LoadSeqs(data=data, moltype=DNA)
logdet_calc = LogDetPair(moltype=DNA, alignment=aln)
logdet_calc.run(use_tk_adjustment=True, show_progress=False)
dists = logdet_calc.getPairwiseDistances()
self.assertTrue(dists.values()[0] is not None)
logdet_calc.run(use_tk_adjustment=False, show_progress=False)
dists = logdet_calc.getPairwiseDistances()
self.assertTrue(dists.values()[0] is not None)
def test_logdet_variance(self):
"""calculate logdet variance consistent with hand calculation"""
data = [('seq1', "GGGGGGGGGGGCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGCGGTTTTTTTTTTTTTTTTTT"),
('seq2', "TAAAAAAAAAAGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCC")]
aln = LoadSeqs(data=data, moltype=DNA)
logdet_calc = LogDetPair(moltype=DNA, alignment=aln)
logdet_calc.run(use_tk_adjustment=True, show_progress=False)
self.assertFloatEqual(logdet_calc.Variances[1,1], 0.5267, eps=1e-3)
logdet_calc.run(use_tk_adjustment=False, show_progress=False)
dists = logdet_calc.getPairwiseDistances()
self.assertFloatEqual(logdet_calc.Variances[1,1], 0.4797, eps=1e-3)
def est_logdet_for_determinant_lte_zero(self):
"""returns distance of None if the determinant is <= 0"""
data = dict(seq1="AGGGGGGGGGGCCCCCCCCCCCCCCCCCGGGGGGGGGGGGGGGCGGTTTTTTTTTTTTTTTTTT",
seq2="TAAAAAAAAAAGGGGGGGGGGGGGGGGGGTTTTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCC")
aln = LoadSeqs(data=data, moltype=DNA)
logdet_calc = LogDetPair(moltype=DNA, alignment=aln)
logdet_calc.run(use_tk_adjustment=True, show_progress=False)
dists = logdet_calc.getPairwiseDistances()
self.assertTrue(dists.values()[0] is None)
logdet_calc.run(use_tk_adjustment=False, show_progress=False)
dists = logdet_calc.getPairwiseDistances()
self.assertTrue(dists.values()[0] is None)
if __name__ == '__main__':
main()
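The expected JC69 values in the tests above are distance 0.2326161962, variance 0.029752... and standard error 0.1724878724, for p = 0.2 over 10 sites. They are consistent with the standard formulas d = -(3/4) ln(1 - 4p/3) and Var(d) = p(1 - p) / (L (1 - 4p/3)^2).

A minimal sketch computing them from a diversity matrix follows. The function name is hypothetical. It returns the same `(total, p, dist, var)` tuple shape that the test unpacks from `_jc69_from_matrix`, but cogent's internals may differ.

```python
import math


def jc69_from_matrix(matrix):
    """Return (total, p, dist, variance) from a 4x4 pair-count matrix."""
    total = float(sum(sum(row) for row in matrix))
    same = sum(matrix[i][i] for i in range(len(matrix)))
    p = (total - same) / total
    w = 1.0 - 4.0 * p / 3.0
    if w <= 0:
        return total, p, None, None  # saturated: distance undefined
    dist = -0.75 * math.log(w)
    var = p * (1.0 - p) / (w * w * total)
    return total, p, dist, var


# counts for s1='ACGTACGTAC' vs s2='GTGTACGTAC' in T, C, A, G order
counts = [[2, 0, 0, 0], [1, 2, 0, 0], [0, 0, 2, 1], [0, 0, 0, 2]]
total, p, dist, var = jc69_from_matrix(counts)
```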
PyCogent-1.5.3/tests/test_evolve/test_parameter_controller.py
#! /usr/bin/env python
# Matthew Wakefield Feb 2004
from __future__ import with_statement
import unittest
import os
import warnings
from cogent import LoadSeqs, LoadTree
import cogent.evolve.parameter_controller, cogent.evolve.substitution_model
from cogent.maths import optimisers
__author__ = "Peter Maxwell"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley", "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')
good_rule_sets = [
[
{'par_name' : 'length','is_independent':True},
],
[
{'par_name' : 'length','is_independent':True},
],
[
{'par_name' : 'length','is_clade' :True, 'is_independent':True, 'edges' : ['a','b']},
],
[
{'par_name' : 'length','is_independent':True, 'edges' : ['a','c','e']},
],
[
{'par_name' : 'length','is_independent':True, 'edge' : 'a'},
],
]
bad_rule_sets = [
[
{'par_name' : 'length','is_clade' :True, 'edges' : ['b','f'],},
],
]
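`test_setMotifProbs` expects that, with `include_ambiguity=True`, an ambiguous character contributes fractional counts to every state it could represent. Its three sequences contain 27 characters, two of them `N`. Each base therefore gains 2 x 1/4 = 0.5 counts, giving A = 8 + 0.5 = 8.5, C = 5.5, G = 7.5 and T = 5.5, each divided by 27.

The minimal sketch below shows that counting. It uses a deliberately small, hypothetical ambiguity table and is not cogent's implementation. It uses `Fraction` so the probabilities sum to exactly 1.

```python
from fractions import Fraction

# illustrative subset of IUPAC codes; gaps and other characters are not
# handled here and would raise KeyError
AMBIGUITIES = {"N": "ACGT", "R": "AG", "Y": "CT"}


def motif_probs_with_ambiguity(seqs, states="ACGT"):
    """Split each ambiguous character's count evenly among its states."""
    counts = dict((s, Fraction(0)) for s in states)
    for seq in seqs:
        for char in seq:
            options = AMBIGUITIES.get(char, char)
            for state in options:
                counts[state] += Fraction(1, len(options))
    total = sum(counts.values())
    return dict((s, c / total) for s, c in counts.items())


probs = motif_probs_with_ambiguity(["ACGTAAGNA", "ACGTANGTC", "ACGTACGTG"])
```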
class test_parameter_controller(unittest.TestCase):
"""Testing Parameter Controller"""
def setUp(self):
#length all edges 1 except c=2. b&d transitions all other transversions
self.al = LoadSeqs(
data={'a':'tata', 'b':'tgtc', 'c':'gcga', 'd':'gaac', 'e':'gagc',})
self.tree = LoadTree(treestring='((a,b),(c,d),e);')
self.model = cogent.evolve.substitution_model.Nucleotide(
do_scaling=True, equal_motif_probs=True, model_gaps=True)
def test_scoped_local(self):
model = cogent.evolve.substitution_model.Nucleotide(
do_scaling=True, equal_motif_probs=True, model_gaps=True,
predicates = {'kappa':'transition'})
lf = model.makeLikelihoodFunction(self.tree)
lf.setConstantLengths()
lf.setAlignment(self.al)
null = lf.getNumFreeParams()
lf.setParamRule(par_name='kappa',
is_independent=True,
edges=['b','d'])
self.assertEqual(null+2, lf.getNumFreeParams())
def test_setMotifProbs(self):
"""Mprobs supplied to the parameter controller"""
model = cogent.evolve.substitution_model.Nucleotide(
model_gaps=True, motif_probs=None)
lf = model.makeLikelihoodFunction(self.tree,
motif_probs_from_align=False)
mprobs = {'A':0.1,'C':0.2,'G':0.2,'T':0.5,'-':0.0}
lf.setMotifProbs(mprobs)
self.assertEqual(lf.getMotifProbs(), mprobs)
lf.setMotifProbsFromData(self.al[:1], is_constant=True)
self.assertEqual(lf.getMotifProbs()['G'], 0.6)
lf.setMotifProbsFromData(self.al[:1], pseudocount=1)
self.assertNotEqual(lf.getMotifProbs()['G'], 0.6)
# test with consideration of ambiguous states
al = LoadSeqs(data = {'seq1': 'ACGTAAGNA', 'seq2': 'ACGTANGTC',
'seq3': 'ACGTACGTG'})
lf.setMotifProbsFromData(al, include_ambiguity=True, is_constant=True)
motif_probs = dict(lf.getMotifProbs())
correct_probs = {'A': 8.5/27, 'C': 5.5/27, '-': 0.0, 'T': 5.5/27,
'G': 7.5/27}
self.assertEqual(motif_probs, correct_probs)
self.assertEqual(sum(motif_probs.values()), 1.0)
def test_setMultiLocus(self):
"""2 loci each with own mprobs"""
model = cogent.evolve.substitution_model.Nucleotide(motif_probs=None)
lf = model.makeLikelihoodFunction(self.tree,
motif_probs_from_align=False, loci=["a", "b"])
mprobs_a = dict(A=.2, T=.2, C=.3, G=.3)
mprobs_b = dict(A=.1, T=.2, C=.3, G=.4)
for is_constant in [False, True]:
lf.setMotifProbs(mprobs_a, is_constant=is_constant)
s = str(lf)
lf.setMotifProbs(mprobs_b, locus="b")
self.assertEqual(lf.getMotifProbs(locus="a"), mprobs_a)
self.assertEqual(lf.getMotifProbs(locus="b"), mprobs_b)
s = str(lf)
#lf.setParamRule('mprobs', is_independent=False)
def test_setParamRules(self):
lf = self.model.makeLikelihoodFunction(self.tree)
def do_rules(rule_set):
for rule in rule_set:
lf.setParamRule(**rule)
for rule_set in good_rule_sets:
lf.setDefaultParamRules()
do_rules(rule_set)
for rule_set in bad_rule_sets:
lf.setDefaultParamRules()
self.assertRaises((KeyError, TypeError,
AssertionError, ValueError), do_rules, rule_set)
def test_setLocalClock(self):
pass
def test_setConstantLengths(self):
t = LoadTree(treestring='((a:1,b:2):3,(c:4,d:5):6,e:7);')
lf = self.model.makeLikelihoodFunction(t)#self.tree)
lf.setParamRule('length', is_constant=True)
# lf.setConstantLengths(t)
lf.setAlignment(self.al)
self.assertEqual(lf.getParamValue('length', 'b'), 2)
self.assertEqual(lf.getParamValue('length', 'd'), 5)
def test_pairwise_clock(self):
al = LoadSeqs(data={'a':'agct','b':'ggct'})
tree = LoadTree(treestring='(a,b);')
model = cogent.evolve.substitution_model.Dinucleotide(
do_scaling=True, equal_motif_probs=True, model_gaps=True,
mprob_model='tuple')
lf = model.makeLikelihoodFunction(tree)
lf.setLocalClock('a','b')
lf.setAlignment(al)
lf.optimise(local=True)
rd = lf.getParamValueDict(['edge'], params=['length'])
self.assertAlmostEquals(lf.getLogLikelihood(),-10.1774488956)
self.assertEqual(rd['length']['a'],rd['length']['b'])
def test_local_clock(self):
lf = self.model.makeLikelihoodFunction(self.tree)
lf.setLocalClock('c','d')
lf.setAlignment(self.al)
lf.optimise(local=True,
tolerance=1e-8, max_restarts=2)
rd = lf.getParamValueDict(['edge'], params=['length'])
self.assertAlmostEquals(lf.getLogLikelihood(),-27.84254174)
self.assertEqual(rd['length']['c'],rd['length']['d'])
self.assertNotEqual(rd['length']['a'],rd['length']['e'])
def test_complex_parameter_rules(self):
# This test has many local minima and so does not cope
# with changes to optimiser details.
model = cogent.evolve.substitution_model.Nucleotide(
do_scaling=True, equal_motif_probs=True, model_gaps=True,
predicates = {'kappa':'transition'})
lf = model.makeLikelihoodFunction(self.tree)
lf.setParamRule(par_name='kappa',
is_independent=True)
lf.setParamRule(par_name='kappa',
is_independent=False,
edges=['b','d'])
lf.setConstantLengths(LoadTree(
treestring='((a:1,b:1):1,(c:2,d:1):1,e:1);'))
#print self.pc
lf.setAlignment(self.al)
lf.optimise(local=True)
rd = lf.getParamValueDict(['edge'], params=['kappa'])
self.assertAlmostEquals(lf.getLogLikelihood(),-27.3252, 3)
self.assertEqual(rd['kappa']['b'],rd['kappa']['d'])
self.assertNotEqual(rd['kappa']['a'],rd['kappa']['b'])
def test_bounds(self):
"""Test setting upper and lower bounds for parameters"""
lf = self.model.makeLikelihoodFunction(self.tree)
lf.setParamRule('length', value=3, lower=0, upper=5)
# Out of bounds value should warn and keep bounded
with warnings.catch_warnings(record=True) as w:
lf.setParamRule('length', lower=0, upper=2)
self.assertTrue(len(w), 'No warning issued')
self.assertEqual(lf.getParamValue('length', edge='a'), 2)
# upper < lower bounds should fail
self.assertRaises(ValueError, lf.setParamRule,
'length', lower=2, upper=0)
if __name__ == '__main__':
unittest.main()
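`test_bounds` pins down two behaviours of `setParamRule`:
- Narrowing the bounds so they exclude the current value issues a warning and clamps the value to the nearest bound.
- Passing `upper < lower` raises `ValueError`.

The hypothetical `BoundedParam` class below reproduces just those two rules in isolation. It is an illustration, not cogent's parameter controller.

```python
import warnings


class BoundedParam(object):
    """Hypothetical parameter holder with clamping bounds (not cogent API)."""

    def __init__(self, value, lower, upper):
        self.value = value
        self.set_bounds(lower, upper)

    def set_bounds(self, lower, upper):
        # inverted bounds are rejected before anything is changed
        if upper < lower:
            raise ValueError("upper bound %r is below lower bound %r"
                             % (upper, lower))
        self.lower, self.upper = lower, upper
        clamped = min(max(self.value, lower), upper)
        if clamped != self.value:
            # out-of-bounds value: warn and keep it inside the new bounds
            warnings.warn("value %r outside [%r, %r], set to %r"
                          % (self.value, lower, upper, clamped))
            self.value = clamped


length = BoundedParam(3, lower=0, upper=5)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    length.set_bounds(0, 2)

try:
    length.set_bounds(2, 0)
    rejected = False
except ValueError:
    rejected = True
```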
PyCogent-1.5.3/tests/test_evolve/test_scale_rules.py
#!/usr/bin/env python
import unittest
from cogent import LoadTree
from cogent.evolve import substitution_model
def a_c(x, y):
return (x == 'A' and y == 'C') or (x == 'C' and y == 'A')
from cogent.evolve.predicate import MotifChange, replacement
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
a_c = MotifChange('A', 'C')
trans = MotifChange('A', 'G') | MotifChange('T', 'C')
TREE = LoadTree(tip_names='ab')
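The scale-rule tests below check how a branch length is apportioned among classes of substitution. With equal motif probabilities and `do_scaling=True`, the length is the expected number of substitutions. The share attributed to a predicate is its fraction of the total rate.

With k = 6, each base has one transition partner (relative rate 6) and two transversion partners (rate 1 each). Transitions therefore account for 6/8 of a length of 4, i.e. 3.0, and transversions for 1.0. With k = 1 the transition share is 1/3, so bin1 gets 4/3, and the unweighted average over the two bins is 13/6.

A dependency-free sketch of that arithmetic follows. `scaled_lengths` and `hky_like` are hypothetical helpers, not cogent's `getScaledLengths`.

```python
BASES = "ACGT"


def is_transition(x, y):
    return set([x, y]) in (set("AG"), set("CT"))


def scaled_lengths(length, rate, masks, freqs=None):
    """Split `length` among predicates by their share of the total rate.

    rate(x, y) is the relative rate of x -> y; masks maps a name to a
    predicate(x, y) selecting the off-diagonal entries it covers."""
    freqs = freqs or dict((b, 0.25) for b in BASES)
    pairs = [(x, y) for x in BASES for y in BASES if x != y]
    total = sum(freqs[x] * rate(x, y) for x, y in pairs)
    return dict((name, length * sum(freqs[x] * rate(x, y)
                                    for x, y in pairs if pred(x, y)) / total)
                for name, pred in masks.items())


def hky_like(k):
    """Transitions at relative rate k, transversions at rate 1."""
    return lambda x, y: k if is_transition(x, y) else 1.0


ts_tv = scaled_lengths(4.0, hky_like(6.0),
                       {"ts": is_transition,
                        "tv": lambda x, y: not is_transition(x, y)})
bin1 = scaled_lengths(4.0, hky_like(1.0), {"ts": is_transition})
```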
class ScaleRuleTests(unittest.TestCase):
def _makeModel(self, do_scaling, predicates, scale_rules=[]):
return substitution_model.Nucleotide(
do_scaling=do_scaling, equal_motif_probs=True,
model_gaps=False, predicates=predicates, scales=scale_rules)
def _getScaledLengths(self, model, params):
LF = model.makeLikelihoodFunction(TREE)
for param in params:
LF.setParamRule(param, value=params[param], is_constant=True)
result = {}
for predicate in model.scale_masks:
result[predicate] = LF.getScaledLengths(predicate)['a']
return result
def test_scaled(self):
"""Scale rule requiring matrix entries to have all pars specified"""
model = self._makeModel(True, {'k':trans}, {
'ts':trans, 'tv': ~trans})
self.assertEqual(
self._getScaledLengths(model, {'k':6.0, 'length':4.0}),
{'ts': 3.0, 'tv':1.0})
def test_binned(self):
"""Scaled lengths for each rate bin, and combined across bins when bin=None"""
model = self._makeModel(True, {'k':trans}, {
'ts':trans, 'tv': ~trans})
LF = model.makeLikelihoodFunction(TREE, bins=2)
LF.setParamRule('length', value=4.0, is_constant=True)
LF.setParamRule('k', value=6.0, bin='bin0', is_constant=True)
LF.setParamRule('k', value=1.0, bin='bin1', is_constant=True)
for (bin, expected) in [('bin0', 3.0), ('bin1', 4.0/3), (None, 13.0/6)]:
self.assertEqual(LF.getScaledLengths('ts', bin=bin)['a'], expected)
def test_unscaled(self):
"""Scale rule on a model which has scaling performed after calculation
rather than during it"""
model = self._makeModel(False, {'k':trans}, {
'ts':trans, 'tv': ~trans})
self.assertEqual(
self._getScaledLengths(model, {'k':6.0, 'length':2.0}),
{'ts': 3.0, 'tv':1.0})
def test_scaled_or(self):
"""Scale rule where matrix entries can have any of the pars specified"""
model = self._makeModel(True, {'k':trans, 'ac':a_c}, {
'or': (trans | a_c), 'not': ~(trans | a_c)})
self.assertEqual(
self._getScaledLengths(model, {'k':6.0,'length':6.0, 'ac': 3.0}),
{'or': 5.0, 'not': 1.0})
def test_scaling(self):
"""Testing scaling calculations using Dn and Ds as an example."""
model = substitution_model.Codon(
do_scaling=True, model_gaps=False, recode_gaps=True,
predicates = {
'k': trans,
'r': replacement},
motif_probs={
'TAT': 0.0088813702685557206, 'TGT': 0.020511736096426307,
'TCT': 0.024529498836963416, 'TTT': 0.019454430112074435,
'TGC': 0.0010573059843518714, 'TGG': 0.0042292239374074857,
'TAC': 0.002326073165574117, 'TTC': 0.0086699090716853451,
'TCG': 0.0010573059843518714, 'TTA': 0.020723197293296681,
'TTG': 0.01036159864664834, 'TCC': 0.0082469866779445976,
'TCA': 0.022414886868259674, 'GCA': 0.015648128568407697,
'GTA': 0.014590822584055826, 'GCC': 0.0095157538591668436,
'GTC': 0.0063438359061112285, 'GCG': 0.0016916895749629942,
'GTG': 0.0067667582998519769, 'CAA': 0.018185662930852189,
'GTT': 0.021569042080778176, 'GCT': 0.014167900190315077,
'ACC': 0.0042292239374074857, 'GGT': 0.014167900190315077,
'CGA': 0.0012687671812222456, 'CGC': 0.0010573059843518714,
'GAT': 0.030238951152463524, 'AAG': 0.034891097483611758,
'CGG': 0.002326073165574117, 'ACT': 0.028758722774370905,
'GGG': 0.0071896806935927262, 'GGA': 0.016282512159018821,
'GGC': 0.0090928314654260944, 'GAG': 0.031296257136815393,
'AAA': 0.05476844998942694, 'GAC': 0.011207443434129837,
'CGT': 0.0033833791499259885, 'GAA': 0.076337492070205112,
'CTT': 0.010573059843518714, 'ATG': 0.012687671812222457,
'ACA': 0.021991964474518927, 'ACG': 0.00084584478748149711,
'ATC': 0.0076126030873334746, 'AAC': 0.022837809262000422,
'ATA': 0.017762740537111441, 'AGG': 0.013533516599703954,
'CCT': 0.025586804821315288, 'AGC': 0.029393106364982026,
'AGA': 0.021991964474518927, 'CAT': 0.021357580883907802,
'AAT': 0.05772890674561218, 'ATT': 0.019031507718333687,
'CTG': 0.012899133009092831, 'CTA': 0.013744977796574329,
'CTC': 0.0078240642842038483, 'CAC': 0.0050750687248889825,
'CCG': 0.00021146119687037428, 'AGT': 0.03742863184605625,
'CAG': 0.024106576443222668, 'CCA': 0.021357580883907802,
'CCC': 0.0069782194967223515},
scales = {'dN': replacement, 'dS': ~replacement},
mprob_model = 'tuple',
)
length = 0.1115
a = self._getScaledLengths(model,
{'k': 3.6491, 'r': 0.6317, 'length': length})
b = self._getScaledLengths(model,
{'k': 3.6491, 'r': 1.0, 'length': length})
dN = length * a['dN'] / (3.0 * b['dN'])
dS = length * a['dS'] / (3.0 * b['dS'])
# following are results from PAML
self.assertEqual('%.4f' % dN, '0.0325')
self.assertEqual('%.4f' % dS, '0.0514')
if __name__ == '__main__':
unittest.main()
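The expected values in `test_scaled`, `test_binned` and `test_scaled_or` can be checked by hand. The branch length is split among substitution classes in proportion to each class's share of the total substitution flux. The sketch below is plain Python, not PyCogent. It assumes that the off-diagonal rate is the exchangeability times the target base frequency (q_ij = s_ij * pi_j), and that with `do_scaling=True` the length equals the expected number of substitutions per site. Under those assumptions, equal base frequencies reproduce 3.0/1.0 for k = 6 and 5.0/1.0 for the "or"/"not" split.

```python
from itertools import product

NUCS = "TCAG"
PURINES = set("AG")


def is_transition(x, y):
    # A<->G or C<->T: both purines or both pyrimidines.
    return x != y and ((x in PURINES) == (y in PURINES))


def scaled_lengths(length, exch, masks, motif_probs=None):
    """Split `length` among the classes in `masks` (name -> predicate).

    Assumes q_ij = exch(i, j) * pi_j and a rate matrix normalised so that
    `length` is the expected number of substitutions per site.
    """
    pi = motif_probs or {n: 1.0 / len(NUCS) for n in NUCS}
    flux = {name: 0.0 for name in masks}
    total = 0.0
    for i, j in product(NUCS, NUCS):
        if i == j:
            continue
        rate = pi[i] * exch(i, j) * pi[j]  # i->j substitution flux
        total += rate
        for name, mask in masks.items():
            if mask(i, j):
                flux[name] += rate
    return {name: length * f / total for name, f in flux.items()}


# k = 6 on transitions, length 4, as in test_scaled
k6 = lambda i, j: 6.0 if is_transition(i, j) else 1.0
ts_tv = {"ts": is_transition, "tv": lambda i, j: not is_transition(i, j)}
print(scaled_lengths(4.0, k6, ts_tv))  # {'ts': 3.0, 'tv': 1.0}
```

With k = 1, the transition share of a length of 4 drops to 4/3. The expectation of 13/6 in `test_binned` for `bin=None` is the mean of 3 and 4/3, which is consistent with two equally weighted bins.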
# PyCogent-1.5.3/tests/test_evolve/test_simulation.py
#!/usr/bin/env python
"""Exercise the alignment simulation code. Alignments are simulated under a simple Jukes-Cantor model on a four-taxon tree with very different branch lengths, and under a Kimura two-parameter model (effectively one free parameter, kappa, here set differently on each edge).
The aim is to re-estimate the parameter values from the simulated alignments as accurately as possible."""
import sys
from cogent.core import alignment, tree
from cogent.evolve import substitution_model
__author__ = "Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
# specify the 4 taxon tree, and a 'dummy' alignment
t = tree.LoadTree(treestring='(a:0.4,b:0.3,(c:0.15,d:0.2)edge.0:0.1)root;')
#al = alignments.LoadSeqs(data={'a':'a','b':'a','c':'a','d':'a'})
# how long the simulated alignments should be
# at 1000000 the estimates get nice and close
length_of_align = 10000
#########################
#
# For a Jukes Cantor model
#
#########################
sm = substitution_model.Nucleotide()
lf = sm.makeLikelihoodFunction(t)
lf.setConstantLengths()
lf.setName('True JC model')
print lf
simulated = lf.simulateAlignment(sequence_length=length_of_align)
print simulated
new_lf = sm.makeLikelihoodFunction(t)
new_lf.setAlignment(simulated)
new_lf.optimise(tolerance=1.0)
new_lf.optimise(local=True)
new_lf.setName('Estimated JC model')
print new_lf
#########################
#
# a Kimura model
#
#########################
# has a ts/tv term, different values for every edge
sm = substitution_model.Nucleotide(predicates={'kappa':'transition'})
lf = sm.makeLikelihoodFunction(t)
lf.setConstantLengths()
lf.setParamRule('kappa',is_constant = True, value = 4.0, edge_name='a')
lf.setParamRule('kappa',is_constant = True, value = 0.5, edge_name='b')
lf.setParamRule('kappa',is_constant = True, value = 0.2, edge_name='c')
lf.setParamRule('kappa',is_constant = True, value = 3.0, edge_name='d')
lf.setParamRule('kappa',is_constant = True, value = 2.0, edge_name='edge.0')
lf.setName('True Kappa model')
print lf
simulated = lf.simulateAlignment(sequence_length=length_of_align)
print simulated
new_lf = sm.makeLikelihoodFunction(t)
new_lf.setParamRule('kappa',is_independent=True)
new_lf.setAlignment(simulated)
new_lf.optimise(tolerance=1.0)
new_lf.optimise(local=True)
new_lf.setName('Estimated Kappa model')
print new_lf
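The script above simulates alignments from likelihood functions with known parameters and then re-fits them. Its comments note that longer alignments give closer estimates. The same simulate-then-re-estimate idea can be shown without PyCogent for a single pair of sequences under Jukes-Cantor. In that case the maximum-likelihood distance has the closed form d = -3/4 ln(1 - 4p/3), where p is the proportion of differing sites. This is a minimal sketch in plain Python 3, not a description of how `simulateAlignment` or `optimise` work internally. The seed and sequence length are arbitrary choices.

```python
import math
import random

NUCS = "TCAG"


def jc_evolve(seq, d, rng):
    """Evolve `seq` along a branch of length d (expected substitutions per
    site) under Jukes-Cantor."""
    # Probability that a site ends in a different base after time d.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # Under JC all three alternative bases are equally likely.
            out.append(rng.choice([b for b in NUCS if b != base]))
        else:
            out.append(base)
    return "".join(out)


def jc_distance(a, b):
    """Maximum-likelihood JC distance between two aligned sequences."""
    p = sum(x != y for x, y in zip(a, b)) / float(len(a))
    if p >= 0.75:
        raise ValueError("sequences are saturated; distance is undefined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


rng = random.Random(1)
root = "".join(rng.choice(NUCS) for _ in range(10000))
a = jc_evolve(root, 0.4, rng)  # tip a hangs off the root with length 0.4
b = jc_evolve(root, 0.3, rng)  # tip b hangs off the root with length 0.3
print(round(jc_distance(a, b), 3))  # should be near 0.4 + 0.3 = 0.7
```

With 10,000 sites, the standard error of this estimate is on the order of 0.01. It shrinks roughly as 1/sqrt(length), which fits the script's remark that 1,000,000 sites give close estimates.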
# PyCogent-1.5.3/tests/test_evolve/test_substitution_model.py
#!/usr/bin/env python
import os
from cogent import LoadSeqs, CodonAlphabet, DNA, LoadTable
from cogent.core import genetic_code
from cogent.evolve import substitution_model, substitution_calculation
from cogent.util.unit_test import TestCase, main
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
base_path = os.getcwd()
data_path = os.path.join(base_path, 'data')
class NucleotideModelTestMethods(TestCase):
def setUp(self):
self.submodel = substitution_model.Nucleotide(
do_scaling=True, model_gaps=False)
def test_isTransition(self):
"""testing isTransition"""
isTransition = self.submodel.getPredefinedPredicate('transition')
assert isTransition('A', 'G')
assert isTransition('C', 'T')
assert not isTransition('A', 'T')
assert not isTransition('C', 'G')
def test_isTransversion(self):
"""testing isTransversion"""
isTransversion = self.submodel.getPredefinedPredicate('transversion')
assert not isTransversion('A', 'G')
assert not isTransversion('C', 'T')
assert isTransversion('A', 'T')
assert isTransversion('C', 'G')
def test_isIndel(self):
"""testing indel comparison for nucleotide model"""
model = substitution_model.Nucleotide(
do_scaling=True, model_gaps=True)
isIndel = model.getPredefinedPredicate('indel')
assert isIndel('A', '-')
assert isIndel('-', 'G')
#assert not self.submodel.isIndel('-', '-')
assert not isIndel('a', 't')
def test_PredicateChecks(self):
# overparameterisation
self.assertRaises(ValueError, substitution_model.Nucleotide,
model_gaps=False, predicates=['transition', 'transversion'])
class MultiLetterMotifSubstModelTests(TestCase):
def setUp(self):
self.submodel = substitution_model.Dinucleotide(do_scaling=True,
model_gaps=True, mprob_model='tuple')
def test_asciiArt(self):
model = substitution_model.Dinucleotide(mprob_model='tuple',
predicates=['k:transition'])
model.asciiArt()
model = substitution_model.Dinucleotide(mprob_model='tuple')
model.asciiArt()
def test_isIndel(self):
"""testing indel comparison for dinucleotide model"""
# these are non-instantaneous
isIndel = self.submodel.getPredefinedPredicate('indel')
assert not isIndel('AA', '--')
assert not isIndel('--', 'CT')
#assert not self.submodel.isIndel('--', '--')
assert not isIndel('AT', 'AA')
assert isIndel('AA', 'A-')
assert isIndel('-A', 'CA')
assert isIndel('TA', '-A')
# isIndel can now assume it won't get any non-instantaneous pairs
# assert self.submodel.isIndel('-a', 'a-') == 0
class TupleModelMotifProbFuncs(TestCase):
dinucs = ('TT', 'CT', 'AT', 'GT',
'TC', 'CC', 'AC', 'GC',
'TA', 'CA', 'AA', 'GA',
'TG', 'CG', 'AG', 'GG')
nuc_probs = [('T', 0.1), ('C', 0.2), ('A', 0.3), ('G', 0.4)]
dinuc_probs=[(m2+m1,p1*p2) for m1,p1 in nuc_probs for m2,p2 in nuc_probs]
mat_indices = dict(
C=set([(0,1),(0,4),(1,5),(2,1),(2,6),(3,1),(3,7),(4,5),(6,5),
(7,5),(8,4),(8,9),(9,5),(10,6),(10,9),(11,7),(11,9),(12,4),
(12,13),(13,5),(14,6),(14,13),(15,7),(15,13)]),
A=set([(0,2),(0,8),(1,2),(1,9),(2,10),(3,2),(3,11),(4,6),(4,8),
(5,6),(5,9),(6,10),(7,6),(7,11),(8,10),(9,10),(11,10),
(12,8),(12,14),(13,9),(13,14),(14,10),(15,11),(15,14)]),
G=set([(0,3),(0,12),(1,3),(1,13),(2,3),(2,14),(3,15),(4,7),(4,12),
(5,7),(5,13),(6,7),(6,14),(7,15),(8,11),(8,12),(9,11),
(9,13),(10,11),(10,14),(11,15),(12,15),(13,15),(14,15)]),
T=set([(1,0),(2,0),(3,0),(4,0),(5,1),(5,4),(6,2),(6,4),(7,3),
(7,4),(8,0),(9,1),(9,8),(10,2),(10,8),(11,3),(11,8),(12,0),
(13,1),(13,12),(14,2),(14,12),(15,3),(15,12)])
)
class ThreeLetterMotifSubstModelTests(TestCase):
def setUp(self):
self.submodel = substitution_model.Nucleotide(motif_length=3,
mprob_model='tuple')
def test_isIndel(self):
"""testing indel comparison for trinucleotide model"""
isIndel = self.submodel.getPredefinedPredicate('indel')
assert isIndel('AAA', 'AA-')
assert isIndel('-CA', 'CCA')
assert isIndel('TAC', 'T-C')
# isIndel can now assume it won't get any non-instantaneous pairs
assert not isIndel('AAA', '---')
assert not isIndel('---', 'CTT')
assert not isIndel('AAA', '--A')
assert not isIndel('C--', 'CTT')
class CodonSubstModelTests(TestCase):
def setUp(self):
self.standardcode = substitution_model.Codon(model_gaps=True, gc=1,
mprob_model='tuple')
self.mitocode = substitution_model.Codon(model_gaps=False, gc=2,
mprob_model='tuple')
def test_isTransition(self):
"""testing codon isTransition"""
isTransition = self.standardcode.getPredefinedPredicate('transition')
# first position
assert isTransition('TGC', 'CGC')
assert isTransition('GGC', 'AGC')
# second position
assert isTransition('CTT', 'CCT')
assert isTransition('CAT', 'CGT')
# third position
assert isTransition('CTT', 'CTC')
assert isTransition('CTA', 'CTG')
# vertebrate mitochondrial code
isTransition = self.mitocode.getPredefinedPredicate('transition')
assert isTransition('CTT', 'CTC')
assert isTransition('CTA', 'CTG')
assert not isTransition('GAG', 'GTG')
assert not isTransition('CCC', 'CGC')
def test_isReplacement(self):
"""test isReplacement for the two major genetic codes"""
isReplacement = self.standardcode.getPredefinedPredicate('replacement')
# for the standard code, a replacement
assert isReplacement('CTG', 'ATG')
assert not isReplacement('AGT','TCC')
assert not isReplacement('CTG', '---')
assert not isReplacement('---', 'CTA')
# for the vertebrate mitochondrial code, an instantaneous replacement
isReplacement = self.mitocode.getPredefinedPredicate('replacement')
assert isReplacement('AAA', 'AAC')
def test_isSilent(self):
"""testing isSilent for the two major genetic codes"""
isSilent = self.standardcode.getPredefinedPredicate('silent')
assert isSilent('CTA', 'CTG')
assert not isSilent('AGT','AAG')
assert not isSilent('CTG', '---')
assert not isSilent('---', 'CTG')
# for the vertebrate mitochondrial code
isSilent = self.mitocode.getPredefinedPredicate('silent')
assert isSilent('TCC', 'TCA')
def test_isIndel(self):
"""test isIndel for codon model"""
isIndel = self.standardcode.getPredefinedPredicate('indel')
assert isIndel('CTA', '---')
assert not isIndel('---', '---')
assert isIndel('---', 'TTC')
def test_str_(self):
"""str() and repr() of a substitution model"""
s = str(self.standardcode)
r = repr(self.standardcode)
class ModelDataInteractionTestMethods(TestCase):
def test_excludinggaps(self):
"""testing excluding gaps from model"""
model = substitution_model.Nucleotide(model_gaps=False)
assert len(model.getAlphabet()) == 4
def test_includinggaps(self):
"""testing including gaps in model"""
model = substitution_model.Nucleotide(model_gaps=True)
assert len(model.getAlphabet()) == 5
def test_getMotifs(self):
"""testing return of motifs"""
model_motifs = substitution_model.Nucleotide().getMotifs()
def test_getParamList(self):
"""testing getting the parameter list"""
model = substitution_model.Nucleotide()
self.assertEqual(model.getParamList(), [])
model = substitution_model.Nucleotide(predicates=['beta:transition'])
self.assertEqual(model.getParamList(), ['beta'])
# TODO: test that supplied motif probs sum to 1 and that the motif sets match
if __name__ == '__main__':
main()
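The predicate tests above follow a few simple rules, and so does the `mat_indices` table in `TupleModelMotifProbFuncs`. A transition is an A<->G or C<->T change at exactly one position. For tuple (non-codon) models, an indel is a gap gained or lost at exactly one position, so `'AA'` to `'--'` is not instantaneous. `mat_indices` appears to list, for each base, the dinucleotide index pairs (i, j) where j is i with one position changed to that base.

The sketch below reproduces those expectations in plain Python. It is not PyCogent's `getPredefinedPredicate` implementation, and the reading of `mat_indices` is an inference from the listed pairs. Codon models behave differently here: they presumably treat `'---'` as a single gap motif, which is why `isIndel('CTA', '---')` holds for the codon model while `'AAA'` to `'---'` is not an indel for the trinucleotide tuple model.

```python
NUCS = "TCAG"
GAP = "-"
PURINES, PYRIMIDINES = set("AG"), set("CT")


def single_difference(x, y):
    """The (old, new) pair at the only differing position, else None."""
    diffs = [(a, b) for a, b in zip(x, y) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def is_transition(x, y):
    """A<->G or C<->T at exactly one position (any motif length)."""
    d = single_difference(x, y)
    return d is not None and (set(d) <= PURINES or set(d) <= PYRIMIDINES)


def is_tuple_indel(x, y):
    """Gap gained or lost at exactly one position of a tuple motif."""
    d = single_difference(x, y)
    return d is not None and GAP in d


# Same order as the `dinucs` tuple: the first base varies fastest.
DINUCS = [first + second for second in NUCS for first in NUCS]


def change_indices(motifs, target):
    """Pairs (i, j): motif j is motif i with one base changed to `target`."""
    index = {m: k for k, m in enumerate(motifs)}
    pairs = set()
    for i, m in enumerate(motifs):
        for pos, base in enumerate(m):
            if base != target:
                pairs.add((i, index[m[:pos] + target + m[pos + 1:]]))
    return pairs


# Tuple motif probabilities are products of the nucleotide probabilities.
nuc_probs = {"T": 0.1, "C": 0.2, "A": 0.3, "G": 0.4}
dinuc_probs = {m: nuc_probs[m[0]] * nuc_probs[m[1]] for m in DINUCS}

print(is_transition("CTA", "CTG"), is_tuple_indel("AA", "--"))  # True False
print(sorted(change_indices(DINUCS, "C"))[:4])  # [(0, 1), (0, 4), (1, 5), (2, 1)]
```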
# PyCogent-1.5.3/tests/test_draw/__init__.py
#!/usr/bin/env python
__all__ = ['test_distribution_plots']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Sandra Smit", "Gavin Huttley",
"Rob Knight", "Zongzhi Liu", "Amanda Birmingham",
"Greg Caporaso", "Jai Ram Rideout"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
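In `test_distribution_plots.py`, the expected x-axis locations and tick positions come down to plain arithmetic. The functions below are a hypothetical reconstruction that reproduces the expected values in the location and tick tests. They are not taken from `cogent.draw.distribution_plots`. The distribution width 0.4 and group spacing 0.5 used in the comparative-plot example are inferred from the expected ticks (`[2.3, 7.4, 17.6, 19.3]` for bars, `[2.1, 7.2, 17.4, 19.1]` for scatter and box plots) rather than read from the module's defaults.

```python
def data_point_locations(x_values, num_points, num_samples, dist_width,
                         group_spacing):
    """x position of each data point's group of distributions (hypothetical)."""
    if dist_width <= 0 or group_spacing < 0:
        raise ValueError("dist_width must be > 0 and group_spacing >= 0")
    if x_values is None:
        x_values = range(1, num_points + 1)  # evenly spaced 1..n
    # One group holds num_samples distributions plus the gap after it.
    group_width = num_samples * dist_width + group_spacing
    return [x * group_width for x in x_values]


def data_point_ticks(locations, num_samples, dist_width, centered):
    """Tick under the middle of each group (hypothetical).

    Bars start at their location, so the group's midpoint is n*w/2 further
    on; markers centred on their location (boxes, scatter points) need only
    (n - 1)*w/2.
    """
    offset = (num_samples - (1 if centered else 0)) * dist_width / 2.0
    return [loc + offset for loc in locations]


locs = data_point_locations([1, 4, 10, 11], 4, 3, 0.4, 0.5)
print([round(t, 6) for t in data_point_ticks(locs, 3, 0.4, centered=False)])
# [2.3, 7.4, 17.6, 19.3]
print([round(t, 6) for t in data_point_ticks(locs, 3, 0.4, centered=True)])
# [2.1, 7.2, 17.4, 19.1]
```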
# PyCogent-1.5.3/tests/test_draw/test_distribution_plots.py
#!/usr/bin/env python
"""Tests public and private functions in the distribution_plots module."""
__author__ = "Jai Ram Rideout"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jai Ram Rideout"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jai Ram Rideout"
__email__ = "jai.rideout@gmail.com"
__status__ = "Production"
from matplotlib import use
use('Agg', warn=False)
from StringIO import StringIO
import sys
import matplotlib.colors as colors
from matplotlib.pyplot import boxplot
from numpy import array
from cogent.draw.distribution_plots import _validate_input,\
_get_distribution_markers, _validate_x_values, _create_plot,\
_calc_data_point_locations, _set_axes_options, generate_box_plots,\
generate_comparative_plots, _calc_data_point_ticks, _plot_bar_data,\
_plot_scatter_data, _plot_box_data, _color_box_plot,\
_create_legend, _set_figure_size
from cogent.util.unit_test import TestCase, main
class DistributionPlotsTests(TestCase):
"""Tests of the distribution_plots module."""
def setUp(self):
"""Create some data to be used in the tests."""
# Test null data list.
self.Null = None
# Test empty data list.
self.Empty = []
# Test nested empty data list.
self.EmptyNested = [[]]
# Test nested empty data list (for bar/scatter plots).
self.EmptyDeeplyNested = [[[]]]
# Test invalid number of samples in data list (for bar/scatter plots).
self.InvalidNumSamples = [[[1, 2, 3, 4, 5]],
[[4, 5, 6, 7, 8], [2, 3, 2]],
[[4, 7, 10, 33, 32, 6, 7, 8]]]
# Test valid data with one sample (for bar/scatter plots).
self.ValidSingleSampleData = [[[1, 2, 3, 4, 5]],
[[4, 5, 6, 7, 8]],
[[4, 7, 10, 33, 32, 6, 7, 8]]]
# Test valid data with three samples and four data points
# (for bar/scatter plots).
self.ValidTypicalData = [[[1.0, 2, 3.5, 5], [2, 3, 5, 6], [2, 3, 8]],
[[4, 7, 8], [8, 9, 10, 11], [9.0, 4, 1, 1]],
[[4, 33, 32, 6, 8], [5, 4, 8, 13], [1, 1, 2]],
[[2, 2, 2, 2], [3, 9, 8], [2, 1, 6, 7, 4, 5]]]
# Test typical data to be plotted by the boxplot function.
self.ValidTypicalBoxData = [[3.4, 10, 11.67, 12.0, 2, 2, 99.99],
[2.3, 4, 5, 88, 9, 10, 11, 1, 0, 3, -8],
[2, 9, 7, 5, 6]]
def test_validate_input_null(self):
"""_validate_input() should raise a ValueError if null data is passed
to it."""
self.assertRaises(ValueError, _validate_input,
self.Null, None, None, None)
def test_validate_input_empty(self):
"""_validate_input() should raise a ValueError if empty data is passed
to it."""
self.assertRaises(ValueError, _validate_input,
self.Empty, None, None, None)
def test_validate_input_empty_nested(self):
"""_validate_input() should raise a ValueError if empty nested data is
passed to it."""
self.assertRaises(ValueError, _validate_input,
self.EmptyNested, None, None, None)
def test_validate_input_empty_deeply_nested(self):
"""_validate_input() should pass for deeply nested empty data."""
num_points, num_samples = _validate_input(self.EmptyDeeplyNested,
None, None, None)
self.assertEqual(num_points, 1)
self.assertEqual(num_samples, 1)
def test_validate_input_invalid_num_samples(self):
"""_validate_input() should raise a ValueError if an inconsistent
number of samples is included in the data."""
self.assertRaises(ValueError, _validate_input,
self.InvalidNumSamples, None, None, None)
def test_validate_x_values_invalid_x_values(self):
"""_validate_x_values() should raise a ValueError on an invalid number
of x_values."""
self.assertRaises(ValueError, _validate_x_values,
[1, 2, 3, 4], ["T0", "T1", "T2"],
len(self.ValidSingleSampleData))
def test_validate_x_values_invalid_x_tick_labels(self):
"""_validate_x_values() should raise a ValueError on an invalid number
of x_tick_labels."""
self.assertRaises(ValueError, _validate_x_values,
None, ["T0"], len(self.ValidSingleSampleData))
def test_validate_x_values_nonnumber_x_values(self):
"""_validate_x_values() should raise a ValueError on x_values that
aren't numbers."""
self.assertRaises(ValueError, _validate_x_values,
["foo", 2, 3], None, len(self.ValidSingleSampleData))
def test_validate_x_values_valid_x_values(self):
"""_validate_x_values() should not throw an exception."""
_validate_x_values([1, 2.0, 3], None, 3)
def test_validate_input_invalid_data_point_names(self):
"""_validate_input() should raise a ValueError on data_point_names that
are an invalid length."""
self.assertRaises(ValueError, _validate_input,
self.ValidSingleSampleData, None, ["T0", "T1"], None)
def test_validate_input_invalid_sample_names(self):
"""_validate_input() should raise a ValueError on sample_names that are
an invalid length."""
self.assertRaises(ValueError, _validate_input,
self.ValidSingleSampleData, None, None, ["Men", "Women"])
def test_validate_input_all_valid_input(self):
"""_validate_input() should return valid information about the data
without throwing an exception."""
self.assertEqual(_validate_input(self.ValidTypicalData, [1, 3, 4, 8],
["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"]),
(4, 3))
def test_get_distribution_markers_null_marker_list(self):
"""_get_distribution_markers() should return a list of predefined
matplotlib markers."""
self.assertEqual(_get_distribution_markers('colors', None, 5),
['b', 'g', 'r', 'c', 'm'])
def test_get_distribution_markers_empty_marker_list(self):
"""_get_distribution_markers() should fall back to the predefined
matplotlib markers when given an empty marker list."""
self.assertEqual(_get_distribution_markers('colors', [], 4),
['b', 'g', 'r', 'c'])
def test_get_distribution_markers_insufficient_markers(self):
"""_get_distribution_markers() should return a wrapped list of
predefined markers."""
# Save stdout and replace it with something that will capture the print
# statement. Note: this code was taken from here:
# http://stackoverflow.com/questions/4219717/how-to-assert-output-
# with-nosetest-unittest-in-python/4220278#4220278
saved_stdout = sys.stdout
try:
out = StringIO()
sys.stdout = out
self.assertEqual(_get_distribution_markers('colors', None, 10),
['b', 'g', 'r', 'c', 'm', 'y', 'w', 'b', 'g', 'r'])
self.assertEqual(_get_distribution_markers('symbols',
['^', '>', '<'], 5), ['^', '>', '<', '^', '>'])
output = out.getvalue().strip()
self.assertEqual(output, "There are not enough markers to "
"uniquely represent each distribution in your dataset. "
"You may want to provide a list of markers that is at "
"least as large as the number of distributions in your "
"dataset.\nThere are not enough markers to "
"uniquely represent each distribution in your dataset. "
"You may want to provide a list of markers that is at "
"least as large as the number of distributions in your "
"dataset.")
finally:
sys.stdout = saved_stdout
def test_get_distribution_markers_bad_marker_type(self):
"""_get_distribution_markers() should raise a ValueError."""
self.assertRaises(ValueError, _get_distribution_markers, 'shapes', [],
3)
def test_get_distribution_markers_zero_markers(self):
"""_get_distribution_markers() should return an empty list."""
self.assertEqual(_get_distribution_markers('symbols', None, 0), [])
self.assertEqual(_get_distribution_markers('symbols', ['^'], 0), [])
def test_create_plot(self):
"""_create_plot() should return a tuple containing a Figure and
Axes."""
fig, ax = _create_plot()
self.assertEqual(fig.__class__.__name__, "Figure")
self.assertEqual(ax.__class__.__name__, "AxesSubplot")
def test_plot_bar_data(self):
"""_plot_bar_data() should return a list of Rectangle objects."""
fig, ax = _create_plot()
result = _plot_bar_data(ax, [1, 2, 3], 'red', 0.5, 3.75, 1.5, 'stdv')
self.assertEqual(result[0].__class__.__name__, "Rectangle")
self.assertEqual(len(result), 1)
self.assertFloatEqual(result[0].get_width(), 0.5)
self.assertFloatEqual(result[0].get_facecolor(), (1.0, 0.0, 0.0, 1.0))
self.assertFloatEqual(result[0].get_height(), 2.0)
fig, ax = _create_plot()
result = _plot_bar_data(ax, [1, 2, 3], 'red', 0.5, 3.75, 1.5, 'sem')
self.assertEqual(result[0].__class__.__name__, "Rectangle")
self.assertEqual(len(result), 1)
self.assertFloatEqual(result[0].get_width(), 0.5)
self.assertFloatEqual(result[0].get_facecolor(), (1.0, 0.0, 0.0, 1.0))
self.assertFloatEqual(result[0].get_height(), 2.0)
def test_plot_bar_data_bad_error_bar_type(self):
"""_plot_bar_data() should raise an exception on bad error bar type."""
fig, ax = _create_plot()
self.assertRaises(ValueError, _plot_bar_data, ax, [1, 2, 3], 'red',
0.5, 3.75, 1.5, 'var')
def test_plot_bar_data_empty(self):
"""_plot_bar_data() should not error when given empty list of data,
but should not plot anything."""
fig, ax = _create_plot()
result = _plot_bar_data(ax, [], 'red', 0.5, 3.75, 1.5, 'stdv')
self.assertTrue(result is None)
fig, ax = _create_plot()
result = _plot_bar_data(ax, [], 'red', 0.5, 3.75, 1.5, 'sem')
self.assertTrue(result is None)
def test_plot_scatter_data(self):
"""_plot_scatter_data() should return a Collection instance."""
fig, ax = _create_plot()
result = _plot_scatter_data(ax, [1, 2, 3], '^', 0.77, 1, 1.5, 'stdv')
self.assertFloatEqual(result.get_sizes(), 20)
def test_plot_scatter_data_empty(self):
"""_plot_scatter_data() should not error when given empty list of data,
but should not plot anything."""
fig, ax = _create_plot()
result = _plot_scatter_data(ax, [], '^', 0.77, 1, 1.5, 'stdv')
self.assertTrue(result is None)
def test_plot_box_data(self):
"""_plot_box_data() should return a dictionary of Line2D objects."""
fig, ax = _create_plot()
result = _plot_box_data(ax, [0, 0, 7, 8, -3, 44], 'blue', 0.33, 55,
1.5, 'stdv')
self.assertEqual(result.__class__.__name__, "dict")
self.assertEqual(len(result['boxes']), 1)
self.assertEqual(len(result['medians']), 1)
self.assertEqual(len(result['whiskers']), 2)
self.assertEqual(len(result['fliers']), 2)
self.assertEqual(len(result['caps']), 2)
def test_plot_box_data_empty(self):
"""_plot_box_data() should not error when given empty list of data,
but should not plot anything."""
fig, ax = _create_plot()
result = _plot_box_data(ax, [], 'blue', 0.33, 55, 1.5, 'stdv')
self.assertEqual(result.__class__.__name__, "dict")
self.assertEqual(len(result['boxes']), 0)
self.assertEqual(len(result['medians']), 0)
self.assertEqual(len(result['whiskers']), 0)
self.assertEqual(len(result['fliers']), 0)
self.assertEqual(len(result['caps']), 0)
def test_calc_data_point_locations_invalid_widths(self):
"""_calc_data_point_locations() should raise a ValueError
exception when it encounters bad widths."""
self.assertRaises(ValueError, _calc_data_point_locations, [1, 2, 3],
3, 2, -2, 0.5)
self.assertRaises(ValueError, _calc_data_point_locations, [1, 2, 3],
3, 2, 2, -0.5)
def test_calc_data_point_locations_default_spacing(self):
"""_calc_data_point_locations() should return an array containing
the x-axis locations for each data point, evenly spaced from 1..n."""
locs = _calc_data_point_locations(None, 4, 2, 0.25, 0.5)
self.assertEqual(locs, array([1.0, 2.0, 3.0, 4.0]))
def test_calc_data_point_locations_custom_spacing(self):
"""_calc_data_point_locations() should return an array containing
the x-axis locations for each data point, spaced according to a custom
spacing scheme."""
locs = _calc_data_point_locations([3, 4, 10, 12], 4, 2, 0.25, 0.75)
self.assertEqual(locs, array([3.75, 5.0, 12.5, 15.0]))
def test_calc_data_point_ticks(self):
"""_calc_data_point_ticks() should return an array containing the
x-axis locations for each data point tick."""
ticks = _calc_data_point_ticks(array([1, 5, 9, 11]), 1, 0.5, False)
self.assertFloatEqual(ticks, array([1.25, 5.25, 9.25, 11.25]))
ticks = _calc_data_point_ticks(array([0]), 3, 0.5, False)
self.assertFloatEqual(ticks, array([0.75]))
def test_set_axes_options(self):
"""_set_axes_options() should set the labels on the axes and not raise
any exceptions."""
fig, ax = _create_plot()
_set_axes_options(ax, "Plot Title", "x-axis label", "y-axis label",
x_tick_labels=["T0", "T1"])
self.assertEqual(ax.get_title(), "Plot Title")
self.assertEqual(ax.get_ylabel(), "y-axis label")
self.assertEqual(ax.get_xticklabels()[0].get_text(), "T0")
self.assertEqual(ax.get_xticklabels()[1].get_text(), "T1")
def test_set_axes_options_ylim(self):
"""_set_axes_options() should set the y-axis limits."""
fig, ax = _create_plot()
_set_axes_options(ax, "Plot Title", "x-axis label", "y-axis label",
x_tick_labels=["T0", "T1", "T2"], y_min=0, y_max=1)
self.assertEqual(ax.get_title(), "Plot Title")
self.assertEqual(ax.get_ylabel(), "y-axis label")
self.assertEqual(ax.get_xticklabels()[0].get_text(), "T0")
self.assertEqual(ax.get_xticklabels()[1].get_text(), "T1")
self.assertFloatEqual(ax.get_ylim(), [0, 1])
def test_set_axes_options_bad_ylim(self):
"""_set_axes_options() should raise an exception when given non-numeric
y limits."""
fig, ax = _create_plot()
self.assertRaises(ValueError, _set_axes_options, ax, "Plot Title",
"x-axis label", "y-axis label",
x_tick_labels=["T0", "T1", "T2"], y_min='car',
y_max=30)
def test_create_legend(self):
"""_create_legend() should create a legend on valid input."""
fig, ax = _create_plot()
_create_legend(ax, ['b', 'r'], ['dist1', 'dist2'], 'colors')
self.assertEqual(len(ax.get_legend().get_texts()), 2)
fig, ax = _create_plot()
_create_legend(ax, ['^', '<', '>'], ['dist1', 'dist2', 'dist3'],
'symbols')
self.assertEqual(len(ax.get_legend().get_texts()), 3)
def test_create_legend_invalid_input(self):
"""_create_legend() should raise a ValueError when the numbers of markers
and labels differ or the marker type is unknown."""
fig, ax = _create_plot()
self.assertRaises(ValueError, _create_legend, ax,
['^', '<', '>'], ['dist1', 'dist2'], 'symbols')
self.assertRaises(ValueError, _create_legend, ax, ['^', '<', '>'],
['dist1', 'dist2', 'dist3'], 'foo')
def test_generate_box_plots(self):
"""generate_box_plots() should return a valid Figure object."""
fig = generate_box_plots(self.ValidTypicalBoxData, [1, 4, 10],
["Data 1", "Data 2", "Data 3"], "Test",
"x-axis label", "y-axis label")
ax = fig.get_axes()[0]
self.assertEqual(ax.get_title(), "Test")
self.assertEqual(ax.get_xlabel(), "x-axis label")
self.assertEqual(ax.get_ylabel(), "y-axis label")
self.assertEqual(len(ax.get_xticklabels()), 3)
self.assertFloatEqual(ax.get_xticks(), [1, 4, 10])
def test_generate_comparative_plots_bar(self):
"""generate_comparative_plots() should return a valid barchart Figure
object."""
fig = generate_comparative_plots('bar', self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], ['b', 'r', 'g'],
"x-axis label", "y-axis label", "Test")
ax = fig.get_axes()[0]
self.assertEqual(ax.get_title(), "Test")
self.assertEqual(ax.get_xlabel(), "x-axis label")
self.assertEqual(ax.get_ylabel(), "y-axis label")
self.assertEqual(len(ax.get_xticklabels()), 4)
self.assertFloatEqual(ax.get_xticks(), [2.3, 7.4, 17.6, 19.3])
def test_generate_comparative_plots_insufficient_colors(self):
"""generate_comparative_plots() should work even when there aren't
enough colors. We should capture a print statement that warns the
users."""
saved_stdout = sys.stdout
try:
out = StringIO()
sys.stdout = out
generate_comparative_plots('bar', self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], ['b', 'r'],
"x-axis label", "y-axis label", "Test")
output = out.getvalue().strip()
self.assertEqual(output, "There are not enough markers to "
"uniquely represent each distribution in your dataset. "
"You may want to provide a list of markers that is at "
"least as large as the number of distributions in your "
"dataset.")
finally:
sys.stdout = saved_stdout
def test_generate_comparative_plots_scatter(self):
"""generate_comparative_plots() should return a valid scatterplot
Figure object."""
fig = generate_comparative_plots('scatter', self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], ['^', '>', '<'],
"x-axis label", "y-axis label", "Test")
ax = fig.get_axes()[0]
self.assertEqual(ax.get_title(), "Test")
self.assertEqual(ax.get_xlabel(), "x-axis label")
self.assertEqual(ax.get_ylabel(), "y-axis label")
self.assertEqual(len(ax.get_xticklabels()), 4)
self.assertFloatEqual(ax.get_xticks(), [2.1, 7.2, 17.4, 19.1])
def test_generate_comparative_plots_insufficient_symbols(self):
"""generate_comparative_plots() should work even when there aren't
enough symbols. We should capture a print statement that warns the
users."""
saved_stdout = sys.stdout
try:
out = StringIO()
sys.stdout = out
generate_comparative_plots('scatter', self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], ['^'],
"x-axis label", "y-axis label", "Test")
output = out.getvalue().strip()
self.assertEqual(output, "There are not enough markers to "
"uniquely represent each distribution in your dataset. "
"You may want to provide a list of markers that is at "
"least as large as the number of distributions in your "
"dataset.")
finally:
sys.stdout = saved_stdout
def test_generate_comparative_plots_empty_marker_list(self):
"""generate_comparative_plots() should use the predefined list of
markers if an empty list is provided by the user."""
generate_comparative_plots('scatter', self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], [],
"x-axis label", "y-axis label", "Test")
def test_generate_comparative_plots_box(self):
"""generate_comparative_plots() should return a valid boxplot Figure
object."""
fig = generate_comparative_plots('box', self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], ['b', 'g', 'y'],
"x-axis label", "y-axis label", "Test")
ax = fig.get_axes()[0]
self.assertEqual(ax.get_title(), "Test")
self.assertEqual(ax.get_xlabel(), "x-axis label")
self.assertEqual(ax.get_ylabel(), "y-axis label")
self.assertEqual(len(ax.get_xticklabels()), 4)
self.assertFloatEqual(ax.get_xticks(), [2.1, 7.2, 17.4, 19.1])
def test_generate_comparative_plots_error(self):
"""generate_comparative_plots() should raise a ValueError for an
invalid plot type."""
self.assertRaises(ValueError, generate_comparative_plots, 'pie',
self.ValidTypicalData,
[1, 4, 10, 11], ["T0", "T1", "T2", "T3"],
["Infants", "Children", "Teens"], ['b', 'g', 'y'],
"x-axis label", "y-axis label", "Test")
def test_color_box_plot(self):
"""_color_box_plot() should not throw an exception when passed the
proper input."""
fig, ax = _create_plot()
box_plot = boxplot(self.ValidTypicalBoxData)
_color_box_plot(ax, box_plot, 'blue')
def test_set_figure_size(self):
"""Test setting a valid figure size."""
fig, ax = _create_plot()
_set_axes_options(ax, 'foo', 'x_foo', 'y_foo',
x_tick_labels=['foofoofoo', 'barbarbar'],
x_tick_labels_orientation='vertical')
_set_figure_size(fig, 3, 4)
self.assertFloatEqual(fig.get_size_inches(), (3, 4))
def test_set_figure_size_defaults(self):
"""Test setting a figure size using matplotlib defaults."""
fig, ax = _create_plot()
_set_axes_options(ax, 'foo', 'x_foo', 'y_foo',
x_tick_labels=['foofoofoo', 'barbarbar'],
x_tick_labels_orientation='vertical')
orig_fig_size = fig.get_size_inches()
_set_figure_size(fig)
self.assertFloatEqual(fig.get_size_inches(), orig_fig_size)
def test_set_figure_size_invalid(self):
"""Test setting a figure size using invalid dimensions."""
fig, ax = _create_plot()
_set_axes_options(ax, 'foo', 'x_foo', 'y_foo',
x_tick_labels=['foofoofoo', 'barbarbar'],
x_tick_labels_orientation='vertical')
orig_fig_size = fig.get_size_inches()
_set_figure_size(fig, -1, 0)
self.assertFloatEqual(fig.get_size_inches(), orig_fig_size)
def test_set_figure_size_long_labels(self):
"""Test setting a figure size that has really long labels."""
saved_stdout = sys.stdout
try:
out = StringIO()
sys.stdout = out
fig, ax = _create_plot()
_set_axes_options(ax, 'foo', 'x_foo', 'y_foo',
x_tick_labels=['foofoofooooooooooooooooooooooooo'
'ooooooooooooooooooooooooooooooooooooooooooooooo'
'ooooooooooooooooooooo', 'barbarbar'],
x_tick_labels_orientation='vertical')
_set_figure_size(fig, 3, 3)
self.assertFloatEqual(fig.get_size_inches(), (3, 3))
output = out.getvalue().strip()
self.assertEqual(output,
"Warning: could not automatically resize plot to make room for "
"axes labels and plot title. This can happen if the labels or "
"title are extremely long and the plot size is too small. Your "
"plot may have its labels and/or title cut-off. To fix this, "
"try increasing the plot's size (in inches) and try again.")
finally:
sys.stdout = saved_stdout
if __name__ == '__main__':
main()
#!/usr/bin/env python
from cogent.draw.arrow_rates import make_arrow_plot, sample_data
from cogent.util.unit_test import TestCase, main
from os import remove
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class arrow_rates_tests(TestCase):
"""Tests of top-level function, primarily checking that it writes the file.
WARNING: must visually inspect output to check correctness!
"""
def test_make_arrow_plot(self):
"""arrow_plot should write correct file and not raise exception"""
make_arrow_plot(sample_data, graph_name='arrows.png')
#the line below is commented out so the result can be inspected;
#uncomment it to delete arrows.png after the test
#remove('arrows.png')
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests of the codon usage graphs.
Note: currently, this must be invoked manually because all the output is
graphical. Expects to be invoked from the test directory.
"""
from cogent.maths.stats.cai.adaptor import data_from_file, adapt_p12, \
adapt_p123gc, adapt_cai_p3, adapt_cai_histogram, adapt_fingerprint, \
adapt_pr2_bias, file_to_codon_list, \
bin_by_p3, kegg_fasta_to_codon_list
from cogent.draw.codon_usage import plot_cai_p3_scatter, \
plot_p12_p3, plot_p123_gc, plot_fingerprint, plot_pr2_bias, \
plot_p12_p3_contour, \
plot_p12_p3_contourlines, aa_labels
from cogent.draw.util import as_species, \
plot_scatter_with_histograms, plot_histograms, \
plot_filled_contour, format_contour_array
from pylab import gca, clf
from numpy import transpose
from sys import argv
from os import getcwd
from os.path import sep, join
test_path = getcwd().split(sep)
index = test_path.index('tests')
fields = test_path[:index+1] + ["data"]
test_path = sep + join(*fields)
test_file_name = join(test_path, 'Homo_sapiens_codon_usage.pri')
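# Sketch (hypothetical helper, not part of PyCogent): the lines above locate
# the shared test data directory by finding the 'tests' component of the
# current working directory and appending 'data'. The same idea is written
# here as a pure function so it can be checked without changing directories.
# It assumes POSIX-style '/' separators. list.index raises ValueError when
# 'tests' is not on the path, just as test_path.index('tests') does above.
def find_test_data_dir_sketch(cwd, sep='/'):
    """Returns the 'data' directory that sits inside the 'tests' directory."""
    parts = cwd.split(sep)
    index = parts.index('tests')  # ValueError if not run from under tests/
    return sep.join(parts[:index + 1] + ['data'])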
__author__ = "Stephanie Wilson"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight", "Stephanie Wilson"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
def insert_before_extension(name, item):
"""Inserts item before extension in name."""
last_dot = name.rfind('.')
return name[:last_dot+1]+str(item)+'.'+ name[last_dot+1:]
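# Sketch (hypothetical alternative, not used by the tests below):
# insert_before_extension relies on rfind('.'), so a name without a dot
# comes out garbled, e.g. 'graph' with item 3 becomes '3.graph'.
# os.path.splitext handles that case by treating the extension as empty,
# while giving the same result for names that do have an extension.
import os.path

def insert_before_extension_sketch(name, item):
    """Inserts item before the extension of name, e.g. a.png -> a.3.png."""
    root, ext = os.path.splitext(name)
    return root + '.' + str(item) + ext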
#TESTS USING ADAPTORS
def make_generic_adaptor_test(adaptor_f, plot_f, default_outfilename):
"""Makes adaptor test for generic graphs."""
def adaptor_test(codons, infilename, name, outfilename=default_outfilename):
print "=> outfile:", outfilename
print "from file:", infilename
graph_data = adaptor_f(codons)
plot_f(graph_data, num_genes=len(codons), graph_name=outfilename,\
title=name)
gca().clear()
clf()
return adaptor_test
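# Sketch of the factory pattern used by make_generic_adaptor_test, with
# hypothetical stand-ins for the adaptor and plotting functions so that it
# runs without matplotlib. Each call returns a new closure that remembers
# its own adaptor_f, plot_f and default output name. The default is bound
# when the closure is created, so callers can still override outfilename.
def make_adaptor_test_sketch(adaptor_f, plot_f, default_outfilename):
    def adaptor_test(codons, name, outfilename=default_outfilename):
        graph_data = adaptor_f(codons)
        return plot_f(graph_data, num_genes=len(codons),
            graph_name=outfilename, title=name)
    return adaptor_test

def fake_adaptor(codons):
    """Hypothetical adaptor: returns the codons unchanged."""
    return list(codons)

def fake_plot(data, num_genes, graph_name, title):
    """Hypothetical plot function: records what it would have drawn."""
    return (len(data), num_genes, graph_name, title)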
def make_contour_adaptor_test(adaptor_f, plot_f, default_outfilename):
"""Makes adaptor test for contour graphs."""
def adaptor_test(codons, infilename, name, outfilename=default_outfilename):
print "=> outfile:", outfilename
print "from file:", infilename
xy_data = adaptor_f(codons)
x, y, data = format_contour_array(xy_data)
plot_f(x, y, data, xy_data, num_genes=len(codons), \
graph_name=outfilename, title=name)
gca().clear()
clf()
return adaptor_test
def make_pr2bias_adaptor_test(adaptor_f, plot_f, default_outfilename):
"""Makes adaptor test for pr2bias graphs."""
def adaptor_test(codons, infilename, name, outfilename=default_outfilename):
print "=> base outfile:", outfilename
print "from file:", infilename
for aa, triplet in aa_labels.items():
triplet = triplet.replace('T','U')
graph_data = adaptor_f(codons, block=triplet[:2])
curr_outfilename = insert_before_extension(outfilename, triplet)
plot_f(graph_data, num_genes=len(codons), \
graph_name=curr_outfilename, title=aa)
gca().clear()
clf()
return adaptor_test
def make_gc_gradient_adaptor_test(adaptor_f, plot_f, default_outfilename, \
one_graph=False, one_series=False):
"""Makes adaptor test for replicated graphs over e.g. a GC gradient"""
def adaptor_test(codons, infilename, name, outfilename=default_outfilename):
min_gene_threshold=10 #suppress bins with few genes
print "=>base outfile:", outfilename
print "from file:", infilename
print "one graph:", one_graph
print "one series:", one_series
gc_bins = bin_by_p3(codons)
if one_series: #assume we want to adapt the list of codon usages
data = adaptor_f(gc_bins)
plot_f(data, num_genes=len(codons), graph_name=outfilename,\
title=name)
else:
data = []
for b in gc_bins:
try:
data.append(adaptor_f(b))
except:
data.append([])
if one_graph: #assume downstream f copes with list of data
total_genes = len(codons)
alpha = [float(len(b))/total_genes for b in gc_bins]
plot_f(data, num_genes=total_genes, graph_name=outfilename, \
alpha=alpha, title=name, multiple=True)
else: #multiple graphs: feed one at a time
for i, c in enumerate(gc_bins):
curr_p3 = i*0.1
if len(c) > min_gene_threshold:
curr_outfilename = insert_before_extension(\
outfilename, curr_p3)
graph_data = adaptor_f(c)
plot_f(graph_data, num_genes=len(c), \
graph_name=curr_outfilename, \
title=name+' '+str(curr_p3)+'-'+str(curr_p3+0.1))
gca().clear()
clf()
gca().clear()
clf()
return adaptor_test
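# Sketch (hypothetical helpers): in the one_graph branch above, each GC bin
# is drawn with an alpha equal to its share of all genes, and the per-bin
# titles are built from i*0.1. Formatting the bounds with '%.1f' avoids
# float artifacts such as 0.30000000000000004, which str() produces under
# Python 3. This version assumes the bins partition the genes, so it sums
# the bin sizes instead of using len(codons).
def bin_alphas_sketch(bins):
    """Returns each bin's fraction of the total number of items."""
    total = sum(len(b) for b in bins)
    return [float(len(b)) / total for b in bins]

def bin_title_sketch(name, i, width=0.1):
    """Returns e.g. 'Human 0.3-0.4' for the i-th bin of the given width."""
    return '%s %.1f-%.1f' % (name, i * width, (i + 1) * width)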
mgat = make_generic_adaptor_test #save typing in what follows
mcat = make_contour_adaptor_test #ditto
mpat = make_pr2bias_adaptor_test
mggat = make_gc_gradient_adaptor_test
#scatterplot adaptors
test_p12_p3_adaptor = mgat(adapt_p12, plot_p12_p3, 'test_p12_p3_A.png')
test_p123_gc_adaptor = mgat(adapt_p123gc, plot_p123_gc, \
'test_p123_gc_A.png')
def plot_p12_p3_from_gc(*args, **kwargs):
return plot_p123_gc(use_p3_as_x=True, graph_shape='sqr',*args, **kwargs)
test_p12_p3gc_adaptor = mgat(adapt_p123gc, plot_p12_p3_from_gc, \
'test_p12_from_gc_A.png')
test_cai_p3_adaptor = mgat(adapt_cai_p3, plot_cai_p3_scatter, \
'test_p3_cai_A.png')
def adapt_cai_p3_twoseries(*args, **kwargs):
return adapt_cai_p3(both_series=True, *args, **kwargs)
test_cai_p3_twoseries_adaptor = mgat(adapt_cai_p3_twoseries, \
plot_cai_p3_scatter, 'test_p3_cai_twoseries_A.png')
def scat_hist_cai_p3(data, *args, **kwargs):
return plot_scatter_with_histograms(data, x_label='$P_3$', y_label='CAI',\
*args, **kwargs)
test_cai_p3_twoseries_adaptor_hist = mgat(adapt_cai_p3_twoseries, \
scat_hist_cai_p3, 'test_p3_cai_twoseries_hist_A.png')
#hist adaptors
def cai_histogram(data, *args, **kwargs):
return plot_histograms(data, x_label='CAI', \
series_names=['others', 'ribosomal'], show_legend=True, \
colors=['white','red'], linecolors=['black','red'], alpha=0.7, \
*args, **kwargs)
test_cai_hist_adaptor = mgat(adapt_cai_histogram, cai_histogram,\
'test_cai_hist.png')
#fingerprint adaptors
test_fingerprint_adaptor = mgat(adapt_fingerprint, plot_fingerprint, \
'test_fingerprint_A.png')
test_fingerprint_gradient_adaptor = mggat(adapt_fingerprint, plot_fingerprint,\
'test_fingerprint_gradient_A.png')
test_fingerprint_gradient_adaptor_one_graph = mggat(adapt_fingerprint, \
plot_fingerprint, 'test_fingerprint_gradient_onegraph_A.png', one_graph=True)
#pr2 bias adaptors
test_pr2bias_adaptor = mpat(adapt_pr2_bias, plot_pr2_bias, \
'test_pr2_bias_A.png')
#contour adaptors
test_p12_p3_contour_adaptor = mcat(adapt_p12, plot_p12_p3_contour, \
'test_p12_p3_contour_A.png')
test_p12_p3_contourlines_adaptor = mcat(adapt_p12, plot_p12_p3_contourlines, \
'test_p12_p3_contourlines_A.png')
scatter_adaptor = [test_p12_p3_adaptor, test_p123_gc_adaptor, \
test_p12_p3gc_adaptor, test_cai_p3_adaptor, test_cai_p3_twoseries_adaptor,\
test_cai_p3_twoseries_adaptor_hist]
hist_adaptor=[test_cai_hist_adaptor]
fingerprint_adaptor = [test_fingerprint_adaptor,
test_fingerprint_gradient_adaptor, \
test_fingerprint_gradient_adaptor_one_graph]
pr2bias_adaptor = [test_pr2bias_adaptor]
contour_adaptor = [test_p12_p3_contour_adaptor, \
test_p12_p3_contourlines_adaptor]
all_adaptor = scatter_adaptor + fingerprint_adaptor + pr2bias_adaptor \
+ hist_adaptor + contour_adaptor
#take in pre-constructed codon usage objects, output requested graphs
def codons_to_graph(codons, as_file, species, which_tests):
"""Function for directly passing in codons instead of
reading them in from a file"""
adaptor_tests= all_adaptor
codon_data_fname=as_file
print "Running adaptor tests..."
codon_data = codons
if which_tests:
for i in which_tests:
print "doing test %s" % i
a = adaptor_tests[i]
a(codon_data, codon_data_fname, species)
else:
for i, a in enumerate(adaptor_tests):
print "doing test %s" % i
a(codon_data,codon_data_fname, species)
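# Sketch (hypothetical helper): the __main__ block below reads an optional
# comma-separated list of test indices from argv[2], e.g. '0,3,5'. This
# variant also rejects indices outside the list of adaptor tests, which
# would otherwise surface later as an IndexError (or, for a negative index,
# silently select a test counted from the end of the list).
def parse_test_indices_sketch(arg, num_tests):
    """Converts '0,3,5' to [0, 3, 5], checking each index is in range."""
    indices = [int(field) for field in arg.split(',') if field.strip()]
    for i in indices:
        if not 0 <= i < num_tests:
            raise ValueError('test index %d out of range 0-%d'
                % (i, num_tests - 1))
    return indices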
if __name__ == '__main__':
"""Tests if the graphs will all compile
and outputs the graph from the current
version of code:
test_fingerprint.png
"""
adaptor_tests = all_adaptor
if len(argv) > 1:
codon_data_fname = argv[1]
else:
codon_data_fname = test_file_name
if len(argv) > 2:
which_tests = map(int, argv[2].split(','))
else:
which_tests = None
print "Running adaptor tests..."
if codon_data_fname.endswith('.nuc'): #assume FASTA from KEGG
codon_data = kegg_fasta_to_codon_list(open(codon_data_fname))
else:
codon_data = file_to_codon_list(codon_data_fname)
if which_tests:
for i in which_tests:
print "doing test %s" % i
a = adaptor_tests[i]
a(codon_data, codon_data_fname, as_species(codon_data_fname))
else:
for i, a in enumerate(adaptor_tests):
print "doing test %s" % i
a(codon_data, codon_data_fname, as_species(codon_data_fname))
#!/usr/bin/env python
from cogent.draw.dinuc import dinuc_plot
from numpy import array, clip
from numpy.random import random
from cogent.util.unit_test import TestCase, main
from os import remove
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
def add_random(a):
"""Adds a small random component to gene a, while maintaining the sum."""
r = (random(a.shape)-.5)/10
return clip(a + r, 0, 1)
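# Sketch (hypothetical, pure Python): add_random above adds uniform noise in
# [-0.05, 0.05) to every element and then clips to [0, 1]. Both the random
# noise and the clipping change the total, so the sum is not preserved
# exactly. A seeded random.Random makes the jitter reproducible. The module
# is imported under another name because this file already binds 'random'
# to numpy.random.random.
import random as stdlib_random

def add_random_sketch(values, seed=None):
    """Returns values jittered by up to +/-0.05 and clipped to [0, 1]."""
    rng = stdlib_random.Random(seed)
    jittered = [v + (rng.random() - 0.5) / 10 for v in values]
    return [min(max(v, 0.0), 1.0) for v in jittered]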
class dinuc_tests(TestCase):
"""Tests of top-level function, primarily checking that it writes the file.
WARNING: must visually inspect output to check correctness!
"""
def test_dinuc(self):
"""dinuc_plot should write correct file and not raise exception"""
spec_a_ave = array([0.25, 0.20, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40,
0.45, 0.40, 0.55, 0.60, 0.65, 0.70, 0.60, 0.70])
spec_b_ave = array([0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45,
0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.20, 0.15])
spec_c_ave = array([0.40, 0.50, 0.55, 0.50, 0.45, 0.50, 0.45, 0.50,
0.45, 0.55, 0.45, 0.45, 0.45, 0.55, 0.50, 0.55])
spec_a_ht_gene = array([0.41, 0.52, 0.53, 0.53, 0.45, 0.51, 0.46, 0.52,
0.43, 0.55, 0.45, 0.45, 0.45, 0.55, 0.53, 0.55])
spec_b_ht_gene = array([0.26, 0.22, 0.17, 0.19, 0.23, 0.32, 0.36, 0.41,
0.44, 0.41, 0.53, 0.62, 0.63, 0.71, 0.64, 0.72])
spec_b_rb_gene = array([0.82, 0.74, 0.73, 0.63, 0.62, 0.54, 0.51, 0.43,
0.41, 0.34, 0.32, 0.24, 0.23, 0.17, 0.28, 0.19])
spec_c_rb_gene = array([0.43, 0.54, 0.56, 0.51, 0.44, 0.51, 0.42, 0.53,
0.44, 0.53, 0.43, 0.47, 0.46, 0.57, 0.53, 0.52])
a_data = {'hgt':[spec_a_ht_gene], \
None: map(add_random, [spec_a_ave.copy() for i in range(10)])}
b_data = {'hgt':[spec_b_ht_gene], 'ribosomal':[spec_b_rb_gene], \
None:[spec_b_ave]}
c_data = {'ribosomal':[spec_b_rb_gene], None:[spec_c_ave]}
data = {'Species A': a_data, 'Species B': b_data, 'Species C':c_data}
dinuc_plot(data, avg_formats={'markersize':5}, \
point_formats={'s':2, 'alpha':0.2}, graph_name='test.png')
#note: the next line is commented out so test.png can be inspected;
#uncomment it to delete the file after the test
#remove('test.png')
if __name__ == '__main__':
main()
#!/usr/bin/env python
from cogent.draw.multivariate_plot import (plot_ordination,
map_colors, text_points, scatter_points, plot_points, arrows)
from cogent.util.unit_test import TestCase, main
import os, pylab
from tempfile import mkstemp
from numpy import asarray, c_
from pdb import set_trace
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
######
# util class
class TestCasePlot(TestCase):
Debug = False
def fig(self, fname=None):
if self.Debug:
pylab.show()
else:
if not fname:
fd, fname = mkstemp(prefix='PlotTest_', suffix='.png')
os.close(fd)
pylab.savefig(fname)
pylab.clf()
os.remove(fname)
def p(self, obj):
if self.Debug:
print obj
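# Sketch (hypothetical helper): TestCasePlot.fig saves to a temporary PNG
# and deletes it afterwards. tempfile.mkstemp returns an open OS-level file
# descriptor as well as the path, so the descriptor should be closed before
# the file is written again by name. This context manager does that and
# always removes the file, even if the body raises.
import contextlib
import os
import tempfile

@contextlib.contextmanager
def temp_path_sketch(prefix='PlotTest_', suffix='.png'):
    """Yields a temporary file path that is removed on exit."""
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=suffix)
    os.close(fd)
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)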
class TestI(TestCasePlot):
def setUp(self):
self.points = [(0, 0), (1,1), (1,2), (2, 2)]
#adjust canvas
#pylab.xlim(-0.5, 2.5)
#pylab.ylim(-0.5, 2.5)
class FunctionsTests(TestI):
def test_map_colors(self):
#default
self.assertEqual(map_colors(range(3)),
['#000080', '#7dff7a', '#800000'])
#alternative cmap
self.assertEqual(map_colors(range(3), cmap='hot'),
['#0b0000', '#ff5c00', '#ffffff'])
#return tuples
self.assertFloatEqual(map_colors(range(3), mode='tuples'),
[(0.0, 0.0, 0.5), (0.490196, 1.0, 0.477546), (0.5, 0.0, 0.0)])
def test_text_points(self):
#same text for all the points
text_points(self.points, 'X')
self.fig()
def test_text_points_diff_texts(self):
#diff text for all the points
text_points(self.points, ['A', 'B', 'C', 'A'])
self.fig()
def test_plot_points(self):
plot_points(self.points, label='X')
self.fig()
class scatter_points_tests(TestI):
def test_basic(self):
scatter_points(self.points, label='X')
self.fig()
def test_color_list(self):
scatter_points(self.points, c=['k', 'red', '#00FF00', (0, 0, 1)],
s=[100, 200, 300, 400], marker=['o', 's', 'd', 'h'])
pylab.xlim(-0.5, 2.5)
pylab.ylim(-0.5, 2.5)
self.fig()
def test_color_shades(self):
#self.Debug = True
scatter_points(self.points, c=[1, 2, 3, 4],
s=[100, 200, 300, 400], marker=['o', 's', 'd', 'h'])
pylab.xlim(-0.5, 2.5)
pylab.ylim(-0.5, 2.5)
self.fig()
class arrows_tests(TestI):
def test_arrows(self):
points = self.points
arrows([[0,0]]*len(points), points)
self.fig()
class plot_ordination_tests(TestCasePlot):
def setUp(self):
self.points = points_3d
self.keys = ['eigvals', 'samples', 'species', 'centroids', 'biplot']
self.values = [
[0.3, 0.1, 0, -0.5], #eigvals
self.points, #samples
self.points + 0.2, #species
[(0, 1, 0), (2, -2, 0)], #centroids
[(0.5, 0.5, 1), (-1, 1.5, 1)], #biplot
]
def test_basic(self):
res = dict(zip(self.keys[:2], self.values))
plot_ordination(res)
self.fig()
def test_species(self):
res = dict(zip(self.keys[:3], self.values))
plot_ordination(res)
self.fig()
def test_centroids(self):
res = dict(zip(self.keys[:4], self.values))
plot_ordination(res)
self.fig()
def test_biplot(self):
res = dict(zip(self.keys, self.values))
plot_ordination(res,
species_kw={'label': 'sp'}, biplot_kw={'label':['b1', 'b2']},
samples_kw={'label': 'sa'})
self.fig()
def test_choices(self):
res = dict(zip(self.keys, self.values))
plot_ordination(res, choices=[2,3])
self.fig()
def test_axis_names(self):
#self.Debug = True
res = dict(zip(self.keys[:2], self.values))
plot_ordination(res, axis_names=['CCA1', 'CA1', 'CA2'],
constrained_names='CCA')
self.fig()
#####
# test data
points = asarray([(0,0), (-1, 1), (1, -2), (2, 3)], float)
points_3d = c_[points, [[3]]*len(points)]
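# Sketch (hypothetical, pure Python): numpy's c_[points, [[3]]*len(points)]
# above appends a constant third column, turning each (x, y) point into
# (x, y, 3). This gives the three-column sample coordinates used by
# plot_ordination_tests.
def append_constant_column_sketch(rows, value):
    """Returns a copy of rows with value appended to each row."""
    return [list(row) + [value] for row in rows]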
if __name__ == '__main__':
main()
#!/usr/bin/env python
__all__ = ['test_util', 'test_ncbi', 'test_pdb', 'test_rfam', 'test_ensembl']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
#!/usr/bin/env python
"""Tests of data retrieval from NCBI."""
from cogent.util.unit_test import TestCase, main
from cogent.db.ncbi import EUtils, ESearch, EFetch, ELink, ESearchResultParser,\
ELinkResultParser, get_primary_ids, ids_to_taxon_ids, \
taxon_lineage_extractor, taxon_ids_to_lineages, taxon_ids_to_names, \
taxon_ids_to_names_and_lineages, \
get_unique_lineages, get_unique_taxa, parse_taxonomy_using_elementtree_xml_parse
from string import strip
from StringIO import StringIO
__author__ = "Mike Robeson"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Mike Robeson", "Rob Knight", "Zongzhi Liu"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Mike Robeson"
__email__ = "mike.robeson@colorado.edu"
__status__ = "Production"
class EUtilsTests(TestCase):
"""Tests of the EUtils class."""
def test_simple_get(self):
"""EUtils simple access of an item should work"""
g = EUtils(db='protein',rettype='gp')
result = g['NP_003320'].read()
assert result.startswith('LOCUS')
assert 'NP_003320' in result
def test_get_slice(self):
"""EUtils access of a slice should work"""
g = EUtils(db='protein',rettype='gp', retmax=1)
result = g['NP_003320':'NP_003322'].read()
lines = result.splitlines()
is_locus = lambda x: x.startswith('LOCUS')
loci = filter(is_locus, lines)
self.assertEqual(len(loci), 3)
#EUtils access of a slice should work, while limiting
#the esearch term length
g = EUtils(db='protein',rettype='gp', retmax=1, url_limit=2)
result = g['NP_003320':'NP_003322'].read()
lines = result.splitlines()
is_locus = lambda x: x.startswith('LOCUS')
loci = filter(is_locus, lines)
self.assertEqual(len(loci), 3)
def test_get_list(self):
"""EUtils access of a list should work"""
g = EUtils(db='protein',rettype='gp')
result = g['NP_003320','NP_003321','NP_003322'].read()
lines = result.splitlines()
is_locus = lambda x: x.startswith('LOCUS')
loci = filter(is_locus, lines)
self.assertEqual(len(loci), 3)
#EUtils access of a list should work, while limiting
#the esearch term length
g = EUtils(db='protein',rettype='gp',url_limit=2)
result = g['NP_003320','NP_003321','NP_003322'].read()
lines = result.splitlines()
is_locus = lambda x: x.startswith('LOCUS')
loci = filter(is_locus, lines)
self.assertEqual(len(loci), 3)
# def test_get_from_taxonomy_db(self):
# """EUtils access from taxonomy database should work"""
# #note: this is more fragile than the nucleotide databases
# g = EUtils(db='taxonomy', rettype='Brief', retmode='text')
# ids = '9606[taxid] OR 28901[taxid]'
# result = sorted(g[ids].read().splitlines())
# self.assertEqual(result, ['Homo sapiens', 'Salmonella enterica'])
def test_get_from_taxonomy_db(self):
"""EUtils access from taxonomy database should work"""
#note: this is more fragile than the nucleotide databases
g = EUtils(db='taxonomy', rettype='xml', retmode='xml')
ids = '9606[taxid] OR 28901[taxid]'
fh = StringIO()
fh.write(g[ids].read())
fh.seek(0)
data = parse_taxonomy_using_elementtree_xml_parse(fh)
result = sorted([item['ScientificName'] for item in data])
self.assertEqual(result, ['Homo sapiens', 'Salmonella enterica'])
def test_query(self):
"""EUtils access via a query should work"""
g = EUtils(db='protein', rettype='gi', retmax=100)
result = g['homo[organism] AND erf1[ti]'].read().splitlines()
assert '5499721' in result #gi of human eRF1
#note: successfully retrieved 841,821 ids on a query for 'rrna',
#although it took about 40 min so not in the tests. RK 1/3/07.
def test_query_retmax(self):
"""EUtils should join results taken retmax at a time"""
g = EUtils(db='protein', rettype='gi', retmax=3, DEBUG=False)
result = g['homo[organism] AND myh7'].read().splitlines()
assert len(result) > 1
assert '83304912' in result #gi of human myh7
def test_query_max_recs(self):
"""EUtils should stop query at max_recs when max_recs < retmax"""
g = EUtils(db='protein', rettype='gi', max_recs=5, DEBUG=False,
retmax=100)
result = g['homo[organism] AND myh7'].read().splitlines()
self.assertEqual(len(result), 5)
def test_query_max_recs_gt_retmax(self):
"""EUtils should stop query at max_recs when max_recs > retmax"""
g = EUtils(db='protein', rettype='gi', max_recs=5, DEBUG=False,
retmax=3)
result = g['homo[organism] AND myh7'].read().splitlines()
self.assertEqual(len(result), 5)
class ESearchTests(TestCase):
"""Tests of the ESearch class: gets primary ids from search."""
def test_simple_search(self):
"""ESearch Access via a query should return accessions"""
s = ESearch(db='protein', rettype='gi', retmax=1000,
term='homo[organism] AND myh7')
result = s.read()
parsed = ESearchResultParser(result)
assert '83304912' in parsed.IdList #gi of human cardiac beta myh7
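# Sketch (not the cogent.db.ncbi API): ESearch and EFetch wrap NCBI's
# E-utilities, which are plain HTTP GET requests. This builds (but does not
# fetch) an esearch URL for the same kind of query. The base URL is NCBI's
# current public endpoint; it is an assumption about what a caller would
# target today, not necessarily what PyCogent 1.5.3 uses.
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

EUTILS_BASE_SKETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/'

def esearch_url_sketch(db, term, retmax=20):
    """Returns an esearch.fcgi URL; parameters are sorted for stable output."""
    params = sorted({'db': db, 'term': term, 'retmax': retmax}.items())
    return EUTILS_BASE_SKETCH + 'esearch.fcgi?' + urlencode(params)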
class ELinkTests(TestCase):
"""Tests of the ELink class: converts ids between databases"""
def test_simple_elink(self):
"""ELink should retrieve a link from a single id"""
l = ELink(db='taxonomy', dbfrom='protein', id='83304912')
result = l.read()
parsed = ELinkResultParser(result)
self.assertEqual(parsed, ['9606']) #human sequence
def test_multiple_elink(self):
"""ELink should find unique links in a set of ids"""
l = ELink(db='taxonomy', dbfrom='protein',
id='83304912 115496169 119586556 111309484')
result = l.read()
parsed = ELinkResultParser(result)
self.assertEqual(sorted(parsed), ['10090', '9606'])
#human and mouse sequences
class EFetchTests(TestCase):
"""Tests of the EFetch class: gets records using primary ids."""
def test_simple_efetch(self):
"""EFetch should return records from list of ids"""
f = EFetch(db='protein', rettype='fasta', retmode='text',
id='111309484')
result = f.read().splitlines()
assert result[0].startswith('>')
assert result[1].startswith('madaemaafg'.upper())
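# Sketch (hypothetical, not the cogent.db.ncbi implementation): the
# behaviour that test_taxon_lineage_extractor checks can be expressed as a
# generator that keeps only lines containing ';' and splits them into
# stripped taxon names, as setUp does for the mouse and human lineages.
def lineage_extractor_sketch(lines):
    """Yields each ';'-separated line as a list of stripped names."""
    for line in lines:
        if ';' in line:
            yield [field.strip() for field in line.split(';')]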
class NcbiTests(TestCase):
"""Tests of top-level convenience wrappers."""
def setUp(self):
"""Define some lengthy data."""
self.mouse_taxonomy = map(strip, 'cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muroidea; Muridae; Murinae; Mus; Mus'.split(';'))
self.human_taxonomy = map(strip, 'cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Primates; Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; Homininae; Homo'.split(';'))
def test_get_primary_ids(self):
"""get_primary_ids should get primary ids from query"""
res = get_primary_ids('homo[orgn] AND myh7[ti]', retmax=5, max_recs=7)
self.assertEqual(len(res), 7)
res = get_primary_ids('homo[orgn] AND myh7[ti]', retmax=5, max_recs=2)
self.assertEqual(len(res), 2)
res = get_primary_ids('homo[orgn] AND myh7[ti]', retmax=100)
assert '115496168' in res
def test_ids_to_taxon_ids(self):
"""ids_to_taxonomy should get taxon ids from primary ids"""
ids = ['83304912', '115496169', '119586556', '111309484']
result = ids_to_taxon_ids(ids, db='protein')
self.assertEqual(sorted(result), ['10090', '9606'])
def test_taxon_lineage_extractor(self):
"""taxon_lineage_extractor should find lineage lines"""
lines = """ignore
xxx;yyy
ignore
aaa;bbb
"""
self.assertEqual(list(taxon_lineage_extractor(lines.splitlines())),
[['xxx','yyy'],['aaa','bbb']])
def test_parse_taxonomy_using_elementtree_xml_parse(self):
"""parse_taxonomy_using_elementtree_xml_parse should return taxonomy associated information"""
g = EUtils(db='taxonomy', rettype='xml', retmode='xml')
ids = '28901[taxid]'
fh = StringIO()
fh.write(g[ids].read())
fh.seek(0)
data = parse_taxonomy_using_elementtree_xml_parse(fh)[0]
obs = (data['Lineage'],data['TaxId'],data['ScientificName'],\
data['Rank'])
exp = ('cellular organisms; Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Salmonella',\
'28901','Salmonella enterica','species')
self.assertEqual(obs,exp)
def test_taxon_ids_to_lineages(self):
"""taxon_ids_to_lineages should return lineages from taxon ids"""
taxon_ids = ['10090', '9606']
result = [self.mouse_taxonomy, self.human_taxonomy]
self.assertEqualItems(list(taxon_ids_to_lineages(taxon_ids)), result)
def test_taxon_ids_to_names(self):
"""taxon_ids_to_names should return names from taxon ids"""
taxon_ids = ['10090', '9606']
result = set(['Mus musculus', 'Homo sapiens'])
self.assertEqual(set(taxon_ids_to_names(taxon_ids)), result)
def test_taxon_ids_to_names_and_lineages(self):
"""taxon_ids_to_names should return names/lineages from taxon ids"""
taxon_ids = ['10090', '9606']
exp = [('10090', 'Mus musculus', '; '.join(self.mouse_taxonomy)),
('9606', 'Homo sapiens', '; '.join(self.human_taxonomy))]
obs = list(taxon_ids_to_names_and_lineages(taxon_ids))
self.assertEqualItems(obs, exp)
def test_get_unique_lineages(self):
"""get_unique_lineages should return all lineages from a query"""
result = get_unique_lineages('angiotensin[ti] AND rodents[orgn]')
assert tuple(self.mouse_taxonomy) in result
assert len(result) > 2
def test_get_unique_taxa(self):
"""get_unique_taxa should return all taxa from a query"""
result = get_unique_taxa('angiotensin[ti] AND primate[orgn]')
assert 'Homo sapiens' in result
assert len(result) > 2
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests of data retrieval from PDB."""
from cogent.util.unit_test import TestCase, main
from cogent.db.pdb import Pdb
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class PdbTests(TestCase):
"""Tests of the Pdb class."""
def test_simple_get(self):
"""Simple access of an item should work."""
rec = Pdb()
result = rec['1RMN'].read()
assert result.startswith('HEADER')
assert result.rstrip().endswith('END') #note: trailing whitespace
assert 'HAMMERHEAD RIBOZYME' in result
assert '1RMN' in result
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests of data retrieval from PDB."""
from cogent.util.unit_test import TestCase, main
from cogent.db.rfam import Rfam
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight","Jeremy Widmann"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class RfamTests(TestCase):
"""Tests of the Rfam class."""
def test_simple_get(self):
"""Simple access of an item should work."""
rec = Rfam()
result = rec['rf00100'].read()
assert result.startswith('# STOCKHOLM')
assert 'AM773434.1/1-304' in result
if __name__ == '__main__':
main()
#!/usr/bin/env python
"""Tests of the db utility functions and classes."""
from cogent.util.unit_test import TestCase, main
from cogent.db.util import UrlGetter, expand_slice, last_nondigit_index,make_lists_of_expanded_slices_of_set_size,make_lists_of_accessions_of_set_size
from os import remove
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class db_util_tests(TestCase):
"""Tests of top-level functions."""
def test_last_nondigit_index(self):
"""last_nondigit_index should return i such that s[i:] is numeric"""
ldi = last_nondigit_index
self.assertEqual(ldi('3'), 0)
self.assertEqual(ldi('345'), 0)
self.assertEqual(ldi('a34'), 1)
self.assertEqual(ldi('abc234'), 3)
self.assertEqual(ldi('abcd'), None)
def test_expand_slice(self):
"""expand_slice should get accession range"""
self.assertEqual(expand_slice(slice('AF1001','AF1003')), \
['AF1001','AF1002','AF1003'])
#can't expand if accession prefixes differ
self.assertRaises(TypeError, expand_slice, slice('AF100:','AG1002'))
#should keep leading zeros
self.assertEqual(expand_slice(slice('AF0001','AF0003')), \
['AF0001','AF0002','AF0003'])
def test_make_lists_of_expanded_slices_of_set_size(self):
"""make_lists_of_expanded_slices_of_set_size: should return a
list of space-separated strings of at most size_limit accessions"""
expected_list = ['HM780503 HM780504 HM780505','HM780506']
observed = make_lists_of_expanded_slices_of_set_size(slice('HM780503','HM780506'),size_limit=3)
self.assertEqual(observed,expected_list)
def test_make_lists_of_accessions_of_set_size(self):
"""make_lists_of_accessions_of_set_size: should return a list of
space-separated strings of at most size_limit accessions"""
expected_list = ['HM780503 HM780506 HM780660', 'HM780780']
observed = make_lists_of_accessions_of_set_size(['HM780503','HM780506', 'HM780660', 'HM780780'],size_limit=3)
self.assertEqual(observed,expected_list)
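# The expectations in db_util_tests pin down the behaviour of the accession
# helpers fairly precisely. The functions below are a minimal sketch that
# reproduces those expectations; they are hypothetical re-implementations
# written for illustration (hence the sketch_ prefix), not the code in
# cogent.db.util.
def sketch_last_nondigit_index(s):
    """Return i such that s[i:] is the trailing run of digits, or None."""
    i = len(s)
    while i > 0 and s[i - 1].isdigit():
        i -= 1
    # no trailing digits at all
    if i == len(s):
        return None
    return i

def sketch_expand_slice(s):
    """Expand slice('AF0001', 'AF0003') to every accession in between."""
    i = sketch_last_nondigit_index(s.start)
    j = sketch_last_nondigit_index(s.stop)
    if i is None or j is None or s.start[:i] != s.stop[:j]:
        raise TypeError('cannot expand %r to %r' % (s.start, s.stop))
    prefix = s.start[:i]
    # keep the zero padding of the start accession
    width = len(s.start) - i
    first, last = int(s.start[i:]), int(s.stop[j:])
    return ['%s%0*d' % (prefix, width, n) for n in range(first, last + 1)]

def sketch_chunk_accessions(accessions, size_limit):
    """Join accessions into space-separated strings of at most size_limit."""
    return [' '.join(accessions[k:k + size_limit])
            for k in range(0, len(accessions), size_limit)]
# Chunking keeps each query string under a server-side limit on the number
# of accessions, which is what the size_limit tests above exercise.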
class UrlGetterTests(TestCase):
"""Tests of the UrlGetter class"""
def retrieval_test(self):
"""Urlgetter should init, read and retrieve"""
class Google(UrlGetter):
BaseUrl='http://www.google.com'
g = Google()
#test URL construction
self.assertEqual(str(g), 'http://www.google.com')
#test reading
text = g.read()
assert 'Google ' in text
#test file getting
fname = '/tmp/google_test'
g.retrieve(fname)
g_file = open(fname)
g_text = g_file.read()
self.assertEqual(g_text, text)
g_file.close()
remove(fname)
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/__init__.py
__all__ = ['test_assembly', 'test_compara', 'test_database',
'test_feature_level', 'test_genome', 'test_host', 'test_species']
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_assembly.py
import os
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.host import HostAccount, get_ensembl_account
from cogent.db.ensembl.assembly import Coordinate, CoordSystem, \
get_coord_conversion
from cogent.db.ensembl.genome import Genome
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 68
if 'ENSEMBL_ACCOUNT' in os.environ:
args = os.environ['ENSEMBL_ACCOUNT'].split()
host, username, password = args[0:3]
kwargs = {}
if len(args) > 3:
kwargs['port'] = int(args[3])
account = HostAccount(host, username, password, **kwargs)
else:
account = get_ensembl_account(release=Release)
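# ENSEMBL_ACCOUNT, as parsed above, holds whitespace-separated
# "host username password [port]". The same parsing is repeated in each
# test_ensembl module; a minimal sketch of it as a reusable function
# follows. parse_account_string is a hypothetical helper that returns the
# positional and keyword arguments one would pass to HostAccount. Because
# the value is split on whitespace, an empty password cannot be expressed.
def parse_account_string(value):
    """Split 'host username password [port]' into (args, kwargs)."""
    fields = value.split()
    if len(fields) < 3:
        raise ValueError('expected "host username password [port]", got %r'
                         % value)
    host, username, password = fields[0:3]
    kwargs = {}
    if len(fields) > 3:
        # the optional fourth field is the port number
        kwargs['port'] = int(fields[3])
    return (host, username, password), kwargs
# usage (not executed here):
#   args, kwargs = parse_account_string(os.environ['ENSEMBL_ACCOUNT'])
#   account = HostAccount(*args, **kwargs)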
human = Genome(Species = 'human', Release=Release, account=account)
platypus = Genome(Species = 'platypus', Release=Release, account=account)
class TestLocation(TestCase):
def test_init(self):
human_loc = Coordinate(CoordName='x', Start=1000, End=10000, Strand=-1,
genome = human)
# TODO: complete test for platypus
self.assertEqual(human_loc.CoordType, 'chromosome')
self.assertEqual(human_loc.CoordName, 'x')
self.assertEqual(human_loc.Start, 1000)
self.assertEqual(human_loc.End, 10000)
self.assertEqual(human_loc.Strand, -1)
self.assertEqual(human_loc.Species, "Homo sapiens")
self.assertEqual(human_loc.seq_region_id, 27516)
def test_get_coord_conversion(self):
"""should correctly map between different coordinate levels"""
# not really testing the contig coordinates are correct
CoordName, Start, End, Strand = '1', 1000, 1000000, 1
human_loc = Coordinate(CoordName = CoordName, Start = Start, End = End,
Strand = Strand, genome = human)
results = get_coord_conversion(human_loc, 'contig', human.CoreDb)
for result in results:
self.assertTrue(result[0].CoordName == CoordName)
self.assertTrue(result[0].Start >= Start)
self.assertTrue(result[0].End <= End)
self.assertTrue(result[0].Strand == Strand)
def test_coord_shift(self):
"""adding coordinates should produce correct results"""
CoordName, Start, End, Strand = '1', 1000, 1000000, 1
loc1 = Coordinate(CoordName = CoordName, Start = Start, End = End,
Strand = Strand, genome = human)
for shift in [100, -100]:
loc2 = loc1.shifted(shift)
self.assertEqual(loc2.Start, loc1.Start+shift)
self.assertEqual(loc2.End, loc1.End+shift)
self.assertEqual(id(loc1.genome), id(loc2.genome))
self.assertNotEqual(id(loc1), id(loc2))
def test_coord_resize(self):
"""resizing should work"""
CoordName, Start, End, Strand = '1', 1000, 1000000, 1
loc1 = Coordinate(CoordName = CoordName, Start = Start, End = End,
Strand = Strand, genome = human)
front_shift = -100
back_shift = 100
loc2 = loc1.resized(front_shift, back_shift)
self.assertEqual(len(loc2), len(loc1)+200)
self.assertEqual(loc2.Start, loc1.Start+front_shift)
self.assertEqual(loc2.End, loc1.End+back_shift)
self.assertEqual(loc1.Strand, loc2.Strand)
def test_adopted(self):
"""coordinate should correctly adopt seq_region_id properties of
provided coordinate"""
CoordName, Start, End, Strand = '1', 1000, 1000000, 1
c1 = Coordinate(CoordName = CoordName, Start = Start, End = End,
Strand = Strand, genome = human)
CoordName, Start, End, Strand = '2', 2000, 2000000, 1
c2 = Coordinate(CoordName = CoordName, Start = Start, End = End,
Strand = Strand, genome = human)
c3 = c1.adopted(c2)
self.assertEqual(c3.CoordName, c2.CoordName)
self.assertEqual(c3.CoordType, c2.CoordType)
self.assertEqual(c3.seq_region_id, c2.seq_region_id)
self.assertEqual(c3.Start, c1.Start)
self.assertEqual(c3.End, c1.End)
self.assertEqual(c3.Strand, c1.Strand)
c3 = c1.adopted(c2, shift = 100)
self.assertEqual(c3.Start, c1.Start+100)
self.assertEqual(c3.End, c1.End+100)
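# test_coord_shift, test_coord_resize and test_adopted describe Coordinate
# as a value-like object: each operation returns a new instance, shares the
# same genome reference, and leaves the original untouched. The class below
# is a minimal sketch of those semantics, assuming 0-based, half-open spans
# (so len() is End - Start). SketchCoordinate is hypothetical and much
# simpler than cogent.db.ensembl.assembly.Coordinate.
class SketchCoordinate(object):
    """Hypothetical stand-in for Coordinate using 0-based, half-open spans."""

    def __init__(self, CoordName, Start, End, Strand, genome=None,
                 CoordType='chromosome', seq_region_id=None):
        self.CoordName = CoordName
        self.Start = Start
        self.End = End
        self.Strand = Strand
        self.genome = genome
        self.CoordType = CoordType
        self.seq_region_id = seq_region_id

    def __len__(self):
        return self.End - self.Start

    def _copy(self, **changes):
        # build a new instance; the genome reference is shared, not copied
        attrs = dict(CoordName=self.CoordName, Start=self.Start,
                     End=self.End, Strand=self.Strand, genome=self.genome,
                     CoordType=self.CoordType,
                     seq_region_id=self.seq_region_id)
        attrs.update(changes)
        return SketchCoordinate(**attrs)

    def shifted(self, shift):
        """Move both ends by the same amount."""
        return self._copy(Start=self.Start + shift, End=self.End + shift)

    def resized(self, front_shift, back_shift):
        """Move Start by front_shift and End by back_shift."""
        return self._copy(Start=self.Start + front_shift,
                          End=self.End + back_shift)

    def adopted(self, other, shift=0):
        """Take other's sequence region but keep this span (plus shift)."""
        return self._copy(CoordName=other.CoordName,
                          CoordType=other.CoordType,
                          seq_region_id=other.seq_region_id,
                          Start=self.Start + shift, End=self.End + shift)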
class TestCoordSystem(TestCase):
def test_call(self):
human_chrom = CoordSystem('chromosome', core_db = human.CoreDb,
species = 'human')
human_contig = CoordSystem(1, species = 'human')
self.assertEqual(human_chrom.coord_system_id, 2)
self.assertEqual(human_contig.name, 'contig')
self.assertEqual(human_contig.attr, 'default_version, sequence_level')
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_compara.py
from __future__ import division
import os
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.host import HostAccount, get_ensembl_account
from cogent.db.ensembl.compara import Compara
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 68
if 'ENSEMBL_ACCOUNT' in os.environ:
args = os.environ['ENSEMBL_ACCOUNT'].split()
host, username, password = args[0:3]
kwargs = {}
if len(args) > 3:
kwargs['port'] = int(args[3])
account = HostAccount(host, username, password, **kwargs)
else:
account = get_ensembl_account(release=Release)
def calc_slope(x1, y1, x2, y2):
"""computes the slope from two coordinate sets, assigning a delta of 1
when values are identical"""
delta_y = y2-y1
delta_x = x2-x1
delta_y = [delta_y, 1][delta_y == 0]
delta_x = [delta_x, 1][delta_x == 0]
return delta_y/delta_x
class ComparaTestBase(TestCase):
comp = Compara(['human', 'mouse', 'rat', 'platypus'], Release=Release,
account=account)
class TestCompara(ComparaTestBase):
def test_query_genome(self):
"""compara should attach valid genome attributes by common name"""
brca2 = self.comp.Mouse.getGeneByStableId("ENSMUSG00000041147")
self.assertEquals(brca2.Symbol.lower(), 'brca2')
def test_get_related_genes(self):
"""should correctly return the related gene regions from each genome"""
brca2 = self.comp.Mouse.getGeneByStableId("ENSMUSG00000041147")
Orthologs = self.comp.getRelatedGenes(gene_region=brca2,
Relationship="ortholog_one2one")
self.assertEquals("ortholog_one2one", Orthologs.Relationships[0])
def test_get_related_genes2(self):
"""should handle case where gene is absent from one of the genomes"""
clec2d = self.comp.Mouse.getGeneByStableId(
StableId='ENSMUSG00000030157')
orthologs = self.comp.getRelatedGenes(gene_region=clec2d,
Relationship='ortholog_one2many')
self.assertTrue(len(orthologs.Members) < 4)
def test_get_collection(self):
brca2 = self.comp.Human.getGeneByStableId(StableId="ENSG00000139618")
Orthologs = self.comp.getRelatedGenes(gene_region=brca2,
Relationship="ortholog_one2one")
collection = Orthologs.getSeqCollection()
self.assertTrue(len(collection.Seqs[0])> 1000)
def test_getting_alignment(self):
mid = "ENSMUSG00000041147"
brca2 = self.comp.Mouse.getGeneByStableId(StableId=mid)
result = list(self.comp.getSyntenicRegions(region=brca2,
align_method='PECAN', align_clade='vertebrates'))[0]
aln = result.getAlignment(feature_types='gene')
self.assertTrue(len(aln) > 1000)
def test_generate_method_clade_data(self):
"""should correctly determine the align_method align_clade options for
a group of species"""
# we should correctly infer the method_species_links, which is a
# cogent.util.Table instance
self.assertTrue(self.comp.method_species_links.Shape > (0,0))
def test_no_method_clade_data(self):
"""generate a Table with no rows if no alignment data"""
compara = Compara(['S.cerevisiae'], Release=Release, account=account)
self.assertEquals(compara.method_species_links.Shape[0], 0)
def test_get_syntenic_returns_nothing(self):
"""should return no SyntenicRegions for a region that falls in a
golden-path assembly gap"""
Start = 100000
End = Start + 100000
related = list(self.comp.getSyntenicRegions(Species='mouse',
CoordName='1', Start=Start, End=End,
align_method='PECAN', align_clade='vertebrates'))
self.assertEquals(related, [])
def test_get_species_set(self):
"""should return the correct set of species"""
expect = set(['Homo sapiens', 'Ornithorhynchus anatinus',
'Mus musculus', 'Rattus norvegicus'])
brca2 = self.comp.Human.getGeneByStableId(StableId="ENSG00000139618")
Orthologs = self.comp.getRelatedGenes(gene_region=brca2,
Relationship="ortholog_one2one")
self.assertEquals(Orthologs.getSpeciesSet(), expect)
def test_pool_connection(self):
"""excercising ability to specify pool connection"""
dog = Compara(['chimp', 'dog'], Release=Release, account=account,
pool_recycle=1000)
class TestSyntenicRegions(TestCase):
comp = Compara(['human', 'chimp', 'macaque'], account=account,
Release=Release)
def test_correct_alignments(self):
"""should return the correct alignments"""
# the following cases mix strands between the reference sequence and the others
coords_expected = [
[{'CoordName': 4, 'End': 78099, 'Species': 'human', 'Start': 77999, 'Strand':-1},
{'Homo sapiens:chromosome:4:77999-78099:-1':
'ATGTAAATCAAAACCAAAGTCTGCATTTATTTGCGGAAAGAGATGCTACATGTTCAAAGATAAATATGGAACATTTTTTAAAAGCATTCATGACTTAGAA',
'Macaca mulatta:chromosome:1:3891064-3891163:1':
'ATGTCAATCAAAACCAAAGTCTGTATTTATTTGCAGAAAGAGATACTGCATGTTCAAAGATAAATATGGAAC-TTTTTAAAAAGCATTAATGACTTATAC',
'Pan troglodytes:chromosome:4:102056-102156:-1':
'ATGTAAATCAAAACCAAAGTCTGCATTTATTTGCGGAAAGAGATGCTACATGTTCAAAGATAAATATGGAACATTTTTAAAAAGCATTCATGACTTAGAA'}],
[{'CoordName': 18, 'End': 213739, 'Species': 'human', 'Start': 213639, 'Strand':-1},
{'Homo sapiens:chromosome:18:213639-213739:-1':
'ATAAGCATTTCCCTTTAGGGCTCTAAGATGAGGTCATCATCGTTTTTAATCCTGAAGAAGGGCTACTGAGTGAGTGCAGATTATTCGGTAAACACT----CTTA',
'Macaca mulatta:chromosome:18:13858303-13858397:1':
'------GTTTCCCTTTAGGGCTCTAAGATGAGGTCATCATTGTTTTTAATCCTGAAGAAGGGCTACTGA----GTGCAGATTATTCTGTAAATGTGCTTACTTG',
'Pan troglodytes:chromosome:18:16601082-16601182:1':
'ATAAGCATTTCCCTTTAGGGCTCTAAGATGAGGTCATCATCGTTTTTAATCCTGAAGAAGGGCTACTGA----GTGCAGATTATTCTGTAAACACTCACTCTTA'}],
[{'CoordName': 5, 'End': 204974, 'Species': 'human', 'Start': 204874, 'Strand':1},
{'Homo sapiens:chromosome:5:204874-204974:1':
'AACACTTGGTATTT----CCCCTTTATGGAGTGAGAGAGATCTTTAAAATATAAACCCTTGATAATATAATATTACTACTTCCTATTA---CCTGTTATGCAGTTCT',
'Macaca mulatta:chromosome:6:1297736-1297840:-1':
'AACTCTTGGTGTTTCCTTCCCCTTTATGG---GAGAGAGATCTTTAAAATAAAAAACCTTGATAATATAATATTACTACTTTCTATTATCATCTGTTATGCAGTTCT',
'Pan troglodytes:chromosome:5:335911-336011:1':
'AACACTTGGTAGTT----CCCCTTTATGGAGTGAGAGAGATCTTTAAAATATAAACCCTTGATAATATAATATTACTACTTTCTATTA---CCTGTTATGCAGTTCT'}],
[{'CoordName': 18, 'End': 203270, 'Species': 'human', 'Start': 203170, 'Strand':-1},
{'Homo sapiens:chromosome:18:203170-203270:-1':
'GGAATAATGAAAGCAATTGTGAGTTAGCAATTACCTTCAAAGAATTACATTTCTTATACAAAGTAAAGTTCATTACTAACCTTAAGAACTTTGGCATTCA',
'Pan troglodytes:chromosome:18:16611584-16611684:1':
'GGAATAATGAAAGCAATTGTAAGTTAGCAATTACCTTCAAAGAATTACATTTCTTATACAAAGTAAAGTTCATTACTAACCTTAAGAACTTTGGCATTCA'}],
[{'CoordName': 2, 'End': 46445, 'Species': 'human', 'Start': 46345, 'Strand':-1},
{'Homo sapiens:chromosome:2:46345-46445:-1':
'CTACCACTCGAGCGCGTCTCCGCTGGACCCGGAACCCCGGTCGGTCCATTCCCCGCGAAGATGCGCGCCCTGGCGGCCCTGAGCGCGCCCCCGAACGAGC',
'Macaca mulatta:chromosome:13:43921-44021:-1':
'CTGCCACTCCAGCGCGTCTCCGCTGCACCCGGAGCGCCGGCCGGTCCATTCCCCGCGAGGATGCGCGCCCTGGCGGCCCTGAACACGTCGGCGAGAGAGC',
'Pan troglodytes:chromosome:2a:36792-36892:-1':
'CTACCACTCGAGCGCGTCTCCGCTGGACCCGGAACCCCAGTCGGTCCATTCCCCGCGAAGATGCGCGCCCTGGCGGCCCTGAACGCGCCCCCGAACGAGC'}],
[{'CoordName': 18, 'End': 268049, 'Species': 'human', 'Start': 267949, 'Strand':-1},
{'Homo sapiens:chromosome:18:267949-268049:-1':
'GCGCAGTGGCGGGCACGCGCAGCCGAGAAGATGTCTCCGACGCCGCCGCTCTTCAGTTTGCCCGAAGCGCGGACGCGGTTTACGGTGAGCTGTAGAGGGG',
'Macaca mulatta:chromosome:18:13805604-13805703:1':
'GCGCAG-GGCGGGCACGCGCAGCCGAGAAGATGTCTCCGACGCCGCCGCTCTTCAGTTTGCCCGAAGCGCGGACGCGGTTTACGGTGAGCTGTAGGCGGG',
'Pan troglodytes:chromosome:18:16546800-16546900:1':
'GCGCAGTGGCGGGCACGCGCAGCCGAGAAGATGTCTCCGACGCCGCCGCTCTTCAGTTTGCCCGAAGCGCGGACGCGGTTTACGGTGAGCTGTAGCGGGG'}],
[{'CoordName': 16, 'End': 107443, 'Species': 'human', 'Start': 107343, 'Strand':-1},
{'Homo sapiens:chromosome:16:107343-107443:-1':
'AAGAAGCAAACAGGTTTATTTTATACAGTGGGCCAGGCCGTGGGTCTGCCATGTGACTAGGGCATTTGGACCTAGGGAGAGGTCAGTCTCAGGCCAAGTA',
'Pan troglodytes:chromosome:16:48943-49032:-1':
'AAGAAGCAAACAGGTTTATTTTATACACTGGGCCAGGCCGTGGGTCTGCCATGTGACTAGGGAATTTGGACC-----------CAGTCTCAGGCCAAGTA'}]
]
# print self.comp.method_species_links
for coord, expect in coords_expected[1:]:
syntenic = list(
self.comp.getSyntenicRegions(method_clade_id=548, **coord))[0]
# check the slope computed from the expected and returned
# coordinates is ~ 1
got_names = dict([(n.split(':')[0], n.split(':')) for n in syntenic.getAlignment().Names])
exp_names = dict([(n.split(':')[0], n.split(':')) for n in expect.keys()])
for species in exp_names:
exp_chrom = exp_names[species][2]
got_chrom = got_names[species][2]
self.assertEquals(exp_chrom.lower(), got_chrom.lower())
exp_start, exp_end = map(int, exp_names[species][3].split('-'))
got_start, got_end = map(int, got_names[species][3].split('-'))
slope = calc_slope(exp_start, exp_end, got_start, got_end)
self.assertFloatEqual(abs(slope), 1.0, eps=1e-3)
def test_failing_region(self):
"""should correctly handle queries where Ensembl has genome block
associations for multiple coord systems"""
gene = self.comp.Human.getGeneByStableId(StableId='ENSG00000188554')
# this should simply not raise any exceptions
syntenic_regions = list(self.comp.getSyntenicRegions(region=gene,
align_method='PECAN',
align_clade='vertebrates'))
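# test_correct_alignments compares alignment sequence names of the form
# 'Species:coord_type:coord_name:start-end:strand'. A minimal sketch of
# that comparison as stand-alone functions follows; parse_region_name and
# spans_agree are hypothetical helpers, and spans_agree reproduces the
# calc_slope rule (a zero delta counts as 1) rather than calling it so
# that the sketch is self-contained.
def parse_region_name(name):
    """Split 'Species:coord_type:coord_name:start-end:strand' into a dict."""
    species, coord_type, coord_name, span, strand = name.split(':')
    start, end = [int(v) for v in span.split('-')]
    return dict(Species=species, CoordType=coord_type, CoordName=coord_name,
                Start=start, End=end, Strand=int(strand))

def spans_agree(expected_name, got_name, eps=1e-3):
    """True if species and chromosome match and the slope through the
    (start, end) points of the two names is ~1 in absolute value."""
    exp, got = parse_region_name(expected_name), parse_region_name(got_name)
    if exp['Species'] != got['Species']:
        return False
    if exp['CoordName'].lower() != got['CoordName'].lower():
        return False
    # as in calc_slope: a zero delta is replaced by 1
    delta_y = (got['End'] - exp['End']) or 1
    delta_x = (got['Start'] - exp['Start']) or 1
    return abs(abs(delta_y / float(delta_x)) - 1.0) <= eps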
if __name__ == "__main__":
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_database.py
import os
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.host import HostAccount, get_ensembl_account
from cogent.db.ensembl.database import Database
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 68
if 'ENSEMBL_ACCOUNT' in os.environ:
args = os.environ['ENSEMBL_ACCOUNT'].split()
host, username, password = args[0:3]
kwargs = {}
if len(args) > 3:
kwargs['port'] = int(args[3])
account = HostAccount(host, username, password, **kwargs)
else:
account = get_ensembl_account(release=Release)
class TestDatabase(TestCase):
def test_connect(self):
human = Database(account=account, release=Release,
species='human', db_type='core')
gene = human.getTable('gene')
def test_get_distinct(self):
"""should return list of strings"""
db = Database(account=account, release=Release,
species='human', db_type='variation')
tn, tc = 'variation_feature', 'consequence_types'
expected = set(('3_prime_UTR_variant', 'splice_acceptor_variant',
'5_prime_UTR_variant'))
got = db.getDistinct(tn, tc)
self.assertNotEquals(set(got) & expected, set())
db = Database(account=account, release=Release,
species='human', db_type='core')
tn, tc = 'gene', 'biotype'
expected = set(['protein_coding', 'pseudogene', 'processed_transcript',
'Mt_tRNA', 'Mt_rRNA', 'IG_V_gene', 'IG_J_gene',
'IG_C_gene', 'IG_D_gene', 'miRNA', 'misc_RNA', 'snoRNA', 'snRNA', 'rRNA'])
got = set(db.getDistinct(tn, tc))
self.assertNotEquals(set(got) & expected, set())
db = Database(account=account, release=Release, db_type='compara')
got = set(db.getDistinct('homology', 'description'))
expected = set(['apparent_ortholog_one2one', 'ortholog_many2many',
'ortholog_one2many', 'ortholog_one2one', 'within_species_paralog'])
self.assertEquals(len(got&expected), len(expected))
def test_get_table_row_counts(self):
"""should return correct row counts for some tables"""
expect = {'homo_sapiens_core_68_37.analysis': 61L,
'homo_sapiens_core_68_37.seq_region': 55616L,
'homo_sapiens_core_68_37.assembly': 102090L,
'homo_sapiens_core_68_37.qtl': 0L}
human = Database(account=account, release=Release,
species='human', db_type='core')
table_names = [n.split('.')[1] for n in expect]
got = dict(human.getTablesRowCount(table_names).getRawData())
for dbname in expect:
self.assertTrue(got[dbname] >= expect[dbname])
def test_table_has_column(self):
"""return correct values for whether a Table has a column"""
account = get_ensembl_account(release=Release)
var61 = Database(account=account, release=61, species='human',
db_type='variation')
var62 = Database(account=account, release=62, species='human',
db_type='variation')
self.assertTrue(var61.tableHasColumn('transcript_variation',
'peptide_allele_string'))
self.assertFalse(var61.tableHasColumn('transcript_variation',
'pep_allele_string'))
self.assertTrue(var62.tableHasColumn('transcript_variation',
'pep_allele_string'))
self.assertFalse(var62.tableHasColumn('transcript_variation',
'peptide_allele_string'))
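# test_table_has_column relies on a schema change between releases 61 and
# 62 (peptide_allele_string became pep_allele_string), and test_get_distinct
# on SELECT DISTINCT queries. The Ensembl servers run MySQL; purely to keep
# the sketch self-contained it uses the standard library sqlite3 module,
# whose PRAGMA table_info lists a table's columns. sketch_table_has_column
# and sketch_get_distinct are hypothetical helpers, not the Database methods.
# Table and column names are interpolated into the SQL because identifiers
# cannot be bound as parameters, so they must come from trusted code.
import sqlite3

def sketch_table_has_column(conn, table, column):
    """Return True if table has a column with the given name."""
    # each PRAGMA table_info row is (cid, name, type, notnull, dflt, pk)
    rows = conn.execute('PRAGMA table_info(%s)' % table).fetchall()
    return column in [row[1] for row in rows]

def sketch_get_distinct(conn, table, column):
    """Return the distinct values stored in table.column."""
    query = 'SELECT DISTINCT %s FROM %s' % (column, table)
    return [row[0] for row in conn.execute(query)]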
if __name__ == "__main__":
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_feature_level.py
import os
from cogent import DNA
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.host import HostAccount, get_ensembl_account
from cogent.db.ensembl.genome import Genome
from cogent.db.ensembl.assembly import CoordSystem, Coordinate, get_coord_conversion
from cogent.db.ensembl.feature_level import FeatureCoordLevels
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 68
if 'ENSEMBL_ACCOUNT' in os.environ:
args = os.environ['ENSEMBL_ACCOUNT'].split()
host, username, password = args[0:3]
kwargs = {}
if len(args) > 3:
kwargs['port'] = int(args[3])
account = HostAccount(host, username, password, **kwargs)
else:
account = get_ensembl_account(release=Release)
class TestFeatureCoordLevels(TestCase):
def setUp(self):
self.chicken = Genome(Species='chicken', Release=Release,
account=account)
def test_feature_levels(self):
ChickenFeatureLevels = FeatureCoordLevels('chicken')
chicken_feature_levels = ChickenFeatureLevels(
feature_types=['gene', 'cpg', 'est'],
core_db=self.chicken.CoreDb,
otherfeature_db=self.chicken.OtherFeaturesDb)
self.assertEquals(chicken_feature_levels['repeat'].levels, ['contig'])
self.assertEquals(set(chicken_feature_levels['cpg'].levels),\
set(['contig', 'supercontig', 'chromosome']))
def test_repeat(self):
# use the chicken genome as it requires coordinate conversion
# chicken coordinates corresponding to the RefSeq human IL2A region
coord = dict(CoordName=9, Start=23817146, End=23818935)
region = self.chicken.getRegion(**coord)
# repeat is recorded at contig level, strand is 0
repeats = region.getFeatures(feature_types = 'repeat')
expect = [("9", 23817293, 23817321), ("9", 23817803, 23817812),
("9", 23817963, 23817972)]
obs = []
for repeat in repeats:
loc = repeat.Location
obs.append((loc.CoordName, loc.Start, loc.End))
self.assertEquals(set(obs), set(expect))
def test_cpg(self):
# contains 3 CpG islands recorded at the chromosome level
coord1 = dict(CoordName=26, Start=110000, End=190000)
cpgs1 = self.chicken.getFeatures(feature_types = 'cpg', **coord1)
exp = [("26", 116969, 117955), ("26", 139769, 140694),
("26", 184546, 185881)]
obs = []
for cpg in cpgs1:
loc = cpg.Location
obs.append((loc.CoordName, loc.Start, loc.End))
self.assertEquals(set(exp), set(obs))
# test cpg features recorded at the supercontig level:
coord2 = dict(CoordName='Un_random', Start=29434117, End=29439117)
cpgs2 = self.chicken.getFeatures(feature_types='cpg', **coord2)
self.assertEquals(len(list(cpgs2)), 1)
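# test_repeat needs conversion because chicken repeats are stored at the
# contig level but queried in chromosome coordinates. A minimal sketch of
# that mapping follows, assuming an Ensembl-style assembly row that gives
# the chromosome span (asm_start, asm_end), the contig position aligned to
# it (cmp_start) and the relative orientation ori (1 or -1), all 1-based
# inclusive. contig_to_chromosome is hypothetical; the real conversion in
# get_coord_conversion also handles features spanning several contigs.
def contig_to_chromosome(start, end, strand, asm_start, asm_end, cmp_start,
                         ori):
    """Map a contig interval onto the chromosome as (start, end, strand)."""
    if ori == 1:
        return (asm_start + (start - cmp_start),
                asm_start + (end - cmp_start), strand)
    if ori == -1:
        # contig positions run backwards along the chromosome, so the
        # contig end becomes the chromosome start and the strand flips
        return (asm_end - (end - cmp_start),
                asm_end - (start - cmp_start), -strand)
    raise ValueError('ori must be 1 or -1, got %r' % ori)
# A repeat with strand 0 keeps strand 0 in either orientation.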
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_genome.py
import os
from cogent import DNA
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.host import HostAccount, get_ensembl_account
from cogent.db.ensembl.util import convert_strand
from cogent.db.ensembl.genome import Genome
from cogent.db.ensembl.sequence import _assemble_seq
from cogent.db.ensembl.util import asserted_one
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 67
if 'ENSEMBL_ACCOUNT' in os.environ:
args = os.environ['ENSEMBL_ACCOUNT'].split()
host, username, password = args[0:3]
kwargs = {}
if len(args) > 3:
kwargs['port'] = int(args[3])
account = HostAccount(host, username, password, **kwargs)
else:
account = get_ensembl_account(release=Release)
class GenomeTestBase(TestCase):
human = Genome(Species="human", Release=Release, account=account)
mouse = Genome(Species="mouse", Release=Release, account=account)
rat = Genome(Species="rat", Release=Release, account=account)
macaq = Genome(Species="macaque", Release=Release, account=account)
gorilla = Genome(Species="gorilla", Release=Release, account=account)
brca2 = human.getGeneByStableId(StableId="ENSG00000139618")
class TestGenome(GenomeTestBase):
def test_other_features(self):
"""should correctly return record for ENSESTG00000035043"""
est = self.human.getEstMatching(StableId='ENSESTG00000035043')
direct = list(est)[0]
ests = self.human.getFeatures(feature_types='est', CoordName=8,
Start=121470000, End=121600000)
stable_ids = [est.StableId for est in ests]
self.assertContains(stable_ids, direct.StableId)
def test_genome_comparison(self):
"""different genome instances with same CoreDb connection are equal"""
h2 = Genome(Species='human', Release=Release, account=account)
self.assertEquals(self.human, h2)
def test_make_location(self):
"""should correctly make a location for an entire chromosome"""
loc = self.human.makeLocation(CoordName=1)
self.assertEquals(len(loc), 249250621)
def test_get_region(self):
"""should return a generic region that extracts correct sequence"""
chrom = 1
Start = 11137
End = Start+20
region = self.human.getRegion(CoordName=chrom, Start=Start, End=End,
ensembl_coord=True)
self.assertEquals(region.Location.Start, Start-1)
self.assertEquals(region.Location.End, End)
self.assertEquals(region.Location.CoordName, str(chrom))
self.assertEquals(region.Location.CoordType, 'chromosome')
self.assertEquals(region.Seq, 'ACCTCAGTAATCCGAAAAGCC')
def test_get_assembly_exception_region(self):
"""should return correct sequence for region with an assembly
exception"""
##old:chrY:57767412-57767433; New: chrY:59358024-59358045
region = self.human.getRegion(CoordName = "Y", Start = 59358024,
End = 59358045, Strand = 1, ensembl_coord = True)
self.assertEquals(str(region.Seq), 'CGAGGACGACTGGGAATCCTAG')
def test_no_assembly(self):
"""return N's for coordinates with no assembly"""
krat = Genome('Kangaroo rat', Release=58)
Start=24385
End=Start+100
region = krat.getRegion(CoordName='scaffold_13754', Start=Start,
End=End)
self.assertEquals(str(region.Seq), 'N' * (End-Start))
def test_getting_annotated_seq(self):
"""a region should return a sequence with the correct annotation"""
new_loc = self.brca2.Location.resized(-100, 100)
region = self.human.getRegion(region=new_loc)
annot_seq = region.getAnnotatedSeq(feature_types='gene')
gene_annots = annot_seq.getAnnotationsMatching('gene')
self.assertEquals(gene_annots[0].Name, self.brca2.Symbol)
def test_correct_feature_type_id_cache(self):
"""should obtain the feature type identifiers without failure"""
self.assertNotEquals(self.human._feature_type_ids.CpGisland, None)
def test_strand_conversion(self):
"""should consistently convert strand info"""
self.assertEquals(convert_strand(None), 1)
self.assertEquals(convert_strand(-1), -1)
self.assertEquals(convert_strand(1), 1)
self.assertEquals(convert_strand('-'), -1)
self.assertEquals(convert_strand('+'), 1)
self.assertEquals(convert_strand(-1.0), -1)
self.assertEquals(convert_strand(1.0), 1)
def test_pool_connection(self):
"""exercising ability to specify pool connection"""
dog = Genome(Species="dog", Release=Release, account=account,
pool_recycle=1000)
def test_gorilla(self):
"""should correctly return a gorilla gene"""
self.gorilla = Genome(Species="gorilla", Release=Release, account=account)
gene = self.gorilla.getGeneByStableId('ENSGGOG00000005730')
self.assertEquals(str(gene.Seq[:10]), 'TGGGAGTCCA')
def test_diff_strand_contig_chrom(self):
"""get correct sequence when contig and chromosome strands differ"""
gene = self.gorilla.getGeneByStableId('ENSGGOG00000001953')
cds = gene.CanonicalTranscript.Cds
self.assertEquals(str(cds), 'ATGGCCCAGGATCTCAGCGAGAAGGACCTGTTGAAGATG'
'GAGGTGGAGCAGCTGAAGAAAGAAGTGAAAAACACAAGAATTCCGATTTCCAAAGCGGGAAAGGAAAT'
'CAAAGAGTACGTGGAGGCCCAAGCAGGAAACGATCCTTTTCTCAAAGGCATCCCTGAGGACAAGAATC'
'CCTTCAAGGAGAAAGGTGGCTGTCTGATAAGCTGA')
def test_get_distinct_biotype(self):
"""Genome instance getDistinct should work on all genomes"""
for genome in self.gorilla, self.human, self.mouse, self.rat, self.macaq:
biotypes = genome.getDistinct('biotype')
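# test_get_region shows that ensembl_coord=True treats Start and End as
# 1-based and inclusive (Ensembl style) and stores Start - 1 and End, i.e.
# a 0-based, half-open span usable directly for Python slicing; 11137 to
# 11157 therefore yields a 21 base sequence. test_strand_conversion lists
# the strand values that are accepted. Both are sketched below; the
# functions are hypothetical, and raising ValueError for other strand
# values is a choice of this sketch, not documented convert_strand behaviour.
def ensembl_to_python_span(start, end):
    """Convert a 1-based inclusive span to a 0-based half-open span."""
    return start - 1, end

def sketch_convert_strand(strand):
    """Normalise None, '+', '-', 1, -1, 1.0 and -1.0 to 1 or -1."""
    if strand is None:
        # no strand given: default to the forward strand
        return 1
    if strand in ('+', '-'):
        return 1 if strand == '+' else -1
    strand = int(strand)
    if strand not in (-1, 1):
        raise ValueError('strand must be -1 or 1, got %r' % strand)
    return strand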
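# test_exon_phases and test_failed_ensembl_annotation in TestGene below
# trim sequences to whole codons before translating. A minimal sketch of
# that arithmetic on plain strings follows, assuming the Ensembl phase
# convention as we read it: a start phase of 1 or 2 counts the bases of the
# first codon that lie in the preceding exon, an end phase of 1 or 2 counts
# the bases of the last codon that lie in this exon, and -1 (non-coding) or
# 0 leaves that end untrimmed. With PhaseStart 2 and PhaseEnd 1 this gives
# Seq[1:-1], matching the test. codon_aligned and truncate_to_codons are
# hypothetical helpers.
def codon_aligned(seq, phase_start, phase_end):
    """Trim an exon sequence to whole codons using its start/end phases."""
    # bases to drop at the front: the rest of a codon begun in the prior exon
    front = (3 - phase_start) % 3 if phase_start > 0 else 0
    # bases to drop at the back: the start of a codon finished in the next exon
    back = phase_end if phase_end > 0 else 0
    # slice with an explicit stop so back == 0 keeps the whole tail
    return seq[front:len(seq) - back]

def truncate_to_codons(seq):
    """Drop trailing bases so the length is a multiple of 3."""
    return seq[:len(seq) - len(seq) % 3]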
class TestGene(GenomeTestBase):
def _eval_brca2(self, brca2):
"""tests all attributes correctly created"""
self.assertEquals(brca2.Symbol.lower(), 'brca2')
self.assertEquals(brca2.StableId, 'ENSG00000139618')
self.assertEquals(brca2.BioType.lower(), 'protein_coding')
self.assertContains(brca2.Description.lower(), 'breast cancer')
self.assertEquals(brca2.Status, 'KNOWN')
self.assertEquals(brca2.CanonicalTranscript.StableId,
'ENST00000380152')
# note length can change between genome builds
self.assertGreaterThan(len(brca2), 83700)
transcript = brca2.getMember('ENST00000380152')
self.assertEquals(transcript.getCdsLength(),len(transcript.Cds))
def test_get_genes_by_stable_id(self):
"""if get gene by stable_id, attributes should be correctly
constructed"""
self._eval_brca2(self.brca2)
def test_get_exons(self):
"""transcript should return correct exons for brca2"""
transcript = self.brca2.getMember('ENST00000380152')
self.assertEquals(len(transcript.TranslatedExons), 26)
self.assertEquals(len(transcript.Cds), 3419*3)
self.assertEquals(len(transcript.ProteinSeq), 3418)
def test_translated_exons(self):
"""should correctly translate a gene with 2 exons but 1st exon
transcribed"""
gene = self.mouse.getGeneByStableId(StableId='ENSMUSG00000036136')
transcript = gene.getMember('ENSMUST00000041133')
self.assertTrue(len(transcript.ProteinSeq) > 0)
# now one on the - strand
gene = self.mouse.getGeneByStableId(StableId='ENSMUSG00000045912')
transcript = gene.Transcripts[0]
self.assertTrue(len(transcript.ProteinSeq) > 0)
def test_failed_ensembl_annotation(self):
"""we demonstrate a failed annotation by ensembl"""
# I'm including this to demonstrate that Ensembl coords are
# complex. This case has a macaque gene which we correctly
# infer the CDS boundaries for according to Ensembl, but the CDS
# length is not divisible by 3.
gene = self.macaq.getGeneByStableId(StableId='ENSMMUG00000001551')
transcript = gene.getMember('ENSMMUT00000002194')
# the following works because we enforce the length being divisible by 3
# in producing ProteinSeq
prot_seq = transcript.ProteinSeq
# BUT if you work off the Cds you will need to slice the CDS to be
# divisible by 3 to get the same protein sequence
l = transcript.getCdsLength()
trunc_cds = transcript.Cds[: l - (l % 3)]
prot_seq = trunc_cds.getTranslation()
self.assertEquals(str(prot_seq),
'MPSSPLRVAVVCSSNQNRSMEAHNILSKRGFSVRSFGTGTHVKLPGPAPDKPNVYDFKTT'\
'YDQMYNDLLRKDKELYTQNGILHMLDRNKRIKPRPERFQNCKDLFDLILTCEERVY')
def test_exon_phases(self):
"""correctly identify phase for an exon"""
stable_id = 'ENSG00000171408'
gene = self.human.getGeneByStableId(StableId=stable_id)
exon1 = gene.Transcripts[1].Exons[0]
# first two bases of codon missing
self.assertEquals(exon1.PhaseStart, 2)
# last two bases of codon missing
self.assertEquals(exon1.PhaseEnd, 1)
# can translate the sequence if we take those into account
seq = exon1.Seq[1:-1].getTranslation()
self.assertEquals(str(seq), 'HMLSKVGMWDFDIFLFDRLTN')
def test_cds_from_outofphase(self):
"""return a translatable Cds sequence from out-of-phase start"""
# canonical transcript phase end_phase
# ENSG00000111729 ENST00000229332 -1 -1
# ENSG00000177151 ENST00000317450 0 -1
# ENSG00000249624 ENST00000433395 1 -1
# ENSG00000237276 ENST00000442385 2 -1
# ENSG00000167744 ENST00000301411 -1 0
canon_ids = 'ENSG00000111729 ENSG00000177151 ENSG00000237276 ENSG00000167744 ENSG00000251184'.split()
for index, stable_id in enumerate(canon_ids):
gene = self.human.getGeneByStableId(StableId=stable_id)
transcript = gene.CanonicalTranscript
prot_seq = transcript.ProteinSeq
def test_gene_transcripts(self):
"""should return multiple transcripts"""
stable_id = 'ENSG00000012048'
gene = self.human.getGeneByStableId(StableId=stable_id)
self.assertTrue(len(gene.Transcripts) > 1)
# .. and correctly construct the Cds and location
for transcript in gene.Transcripts:
self.assertTrue(transcript.getCdsLength()>0)
self.assertEquals(transcript.Location.CoordName,'17')
def test_get_longest_cds_transcript2(self):
"""should correctly return transcript with longest cds"""
# ENSG00000123552 is protein coding, ENSG00000206629 is ncRNA
for stable_id, max_cds_length in [('ENSG00000123552', 2445),
('ENSG00000206629', 164)]:
gene = self.human.getGeneByStableId(StableId=stable_id)
ts = gene.getLongestCdsTranscript()
self.assertEquals(len(ts.Cds), max_cds_length)
self.assertEquals(ts.getCdsLength(), max(gene.getCdsLengths()))
def test_get_longest_cds_transcript1(self):
"""should correctly return transcript with longest cds"""
stable_id = 'ENSG00000178591'
gene = self.human.getGeneByStableId(StableId=stable_id)
ts = gene.getLongestCdsTranscript()
self.assertEquals(ts.getCdsLength(), max(gene.getCdsLengths()))
def test_rna_transcript_cds(self):
"""should return a Cds for an RNA gene too"""
rna_gene = self.human.getGeneByStableId(StableId='ENSG00000210049')
self.assertTrue(rna_gene.Transcripts[0].getCdsLength() > 0)
def test_gene_annotation(self):
"""should correctly annotate a sequence"""
annot_seq = self.brca2.getAnnotatedSeq(feature_types='gene')
gene_annots = annot_seq.getAnnotationsMatching('gene')
self.assertEquals(gene_annots[0].Name, self.brca2.Symbol)
def test_get_by_symbol(self):
"""selecting a gene by its HGNC symbol should correctly populate all
specified attributes"""
results = self.human.getGenesMatching(Symbol="BRCA2")
for gene in results:
self._eval_brca2(gene)
def test_get_by_symbol_synonym(self):
"""return the correct gene if provided a synonym rather than the symbol"""
synonym = 'FOXO1A'
gene = list(self.human.getGenesMatching(Symbol=synonym))[0]
self.assertEquals(gene.Symbol, 'FOXO1')
def test_get_by_description(self):
"""if get by description, all attributes should be correctly
constructed"""
description='breast cancer 2'
results = list(self.human.getGenesMatching(Description=description))
self.assertEquals(len(results), 1)
self._eval_brca2(results[0])
def test_get_member(self):
"""should return correct exon and translated exon"""
transcript = self.brca2.getMember('ENST00000380152')
# just returns the first
exon_id = 'ENSE00001484009'
exon = transcript.getMember(exon_id)
trans_exon = transcript.getMember(exon_id,'TranslatedExons')
self.assertEquals(exon.StableId, exon_id)
self.assertEquals(trans_exon.StableId, exon_id)
# we check we got Exon in the first call and TranslatedExon in the
# second using the fact that the Exons entry is longer than the
# TranslatedExons one
self.assertGreaterThan(len(exon), len(trans_exon))
def test_get_by_biotype(self):
results = list(self.human.getGenesMatching(BioType='Mt_tRNA', like=False))
self.assertEquals(len(results), 22)
results = list(self.human.getGenesMatching(BioType='Mt_tRNA', like=True))
self.assertEquals(len(results), 607)
def test_get_by_descr_biotype(self):
"""combining the description and biotype should return a result"""
results = list(self.human.getGenesMatching(BioType="protein_coding",
Description="cancer"))
self.assertTrue(len(results) > 50)
def test_variant(self):
"""variant attribute correctly constructed"""
self.assertTrue(len(self.brca2.Variants) > 880)
def test_get_gene_by_stable_id(self):
"""should correctly handle getting gene by stable_id"""
stable_id = 'ENSG00000012048'
gene = self.human.getGeneByStableId(StableId=stable_id)
self.assertEquals(gene.StableId, stable_id)
# if invalid stable_id, should just return None
stable_id = 'ENSG00000XXXXX'
gene = self.human.getGeneByStableId(StableId=stable_id)
self.assertEquals(gene, None)
def test_intron_number(self):
"""number of introns should be correct"""
for gene_id, transcript_id, exp_number in [
('ENSG00000227268', 'ENST00000445946', 0),
('ENSG00000132199', 'ENST00000319815', 8),
('ENSG00000132199', 'ENST00000383578', 15)]:
gene = asserted_one(self.human.getGenesMatching(StableId=gene_id))
transcript = asserted_one(
[t for t in gene.Transcripts if t.StableId==transcript_id])
if exp_number == 0:
self.assertEqual(transcript.Introns, None)
else:
self.assertEqual(len(transcript.Introns), exp_number)
def test_intron(self):
"""should get correct Intron sequence, regardless of strand"""
# IL2 is on - strand, IL13 is on + strand, both have three introns
IL2_exp_introns = [
(1, 123377358, 123377448, 'gtaagtatat', 'actttcttag'),
(2, 123375008, 123377298, 'gtaagtacaa', 'attattctag'),
(3, 123373017,123374864, 'gtaaggcatt', 'tcttttatag')]
IL13_exp_introns = [
(1, 131994052, 131995109, 'gtgagtgtcg', 'gctcccacag'),
(2, 131995163, 131995415, 'gtaaggacct', 'ctccccacag'),
(3, 131995520, 131995866, 'gtaaggcatc', 'tgtcctgcag')]
for symbol, stable_id, exp_introns in [
('IL2', 'ENST00000226730', IL2_exp_introns),
('IL13', 'ENST00000304506', IL13_exp_introns)]:
gene = asserted_one(self.human.getGenesMatching(Symbol=symbol))
strand = gene.Location.Strand
transcript = asserted_one(
[t for t in gene.Transcripts if t.StableId==stable_id])
introns = transcript.Introns
self.assertEqual(len(introns), len(exp_introns))
idx = 0
for intron in introns:
loc = intron.Location
start, end = loc.Start, loc.End
seq = str(intron.Seq)
exp_rank, exp_start, exp_end, exp_seq5, \
exp_seq3 = exp_introns[idx]
self.assertEqual(loc.Strand, strand)
# test the order using rank
self.assertEqual(intron.Rank, exp_rank)
# test position
self.assertEqual(start, exp_start)
self.assertEqual(end, exp_end)
# test sequence
self.assertEqual(seq[:10], exp_seq5.upper())
self.assertEqual(seq[-10:], exp_seq3.upper())
idx += 1
def test_intron_annotation(self):
"""sequences annotated with Introns should return correct seq"""
for symbol, stable_id, rank, exp_seq5, exp_seq3 in [
('IL2', 'ENST00000226730', 1, 'gtaagtatat', 'actttcttag'),
('IL13', 'ENST00000304506', 3, 'gtaaggcatc', 'tgtcctgcag')]:
gene = asserted_one(self.human.getGenesMatching(Symbol=symbol))
seq = gene.getAnnotatedSeq(feature_types='gene')
intron = asserted_one(seq.getAnnotationsMatching('intron',
'%s-%d'%(stable_id, rank)))
intron_seq = str(seq.getRegionCoveringAll(intron).getSlice())
self.assertEqual(intron_seq[:10], exp_seq5.upper())
self.assertEqual(intron_seq[-10:], exp_seq3.upper())
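# Hypothetical helper (not part of PyCogent): every expected intron in
# test_intron and test_intron_annotation starts with 'gt' and ends with
# 'ag', the canonical GT-AG splice-site dinucleotides. A minimal sketch of
# that check, assuming, as test_intron does, that the intron sequence is
# reported 5'->3' on the gene's own strand.
def has_gt_ag_ends(intron_seq):
    """Return True if intron_seq begins with GT and ends with AG."""
    seq = str(intron_seq).upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")

# Non-canonical introns (for example GC-AG or AT-AC) return False here.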
class TestVariation(GenomeTestBase):
snp_names = ['rs34213141', 'rs12791610', 'rs10792769', 'rs11545807', 'rs11270496']
snp_nt_alleles = ['G/A', 'C/T', 'A/G', 'C/A', 'CAGCTCCAGCTC/-']
snp_aa_alleles = ['G/R', 'P/L', 'Y/C', "V/F", "GAGAV/V"]
snp_effects = ['non_synonymous_codon']*3+[['2KB_upstream_variant', '5KB_upstream_variant', 'non_synonymous_codon']]+['non_synonymous_codon']
snp_nt_len = [1, 1, 1, 1, 12]
map_weights = [1,1,1,1,1]
snp_flanks = [
('CTGAGGTGAGCCAGCGTTGGAGCTGTTTTTCCTTTCAGTATGAATTCCACAAGGAAATCATCTCAGGAGGAAGGGCTCATACTTGGATCCAGAAAATATCAACATAGCCAAAGAAAAACAATCAAGACATACCTCCAGGAGCTGTGTAACAGCAACCGGAAAGAGAAACAATGGTGTGTTCCTATGTGGGATATAAAGAGCCGGGGCTCAGGGGGCTCCACACCTGCACCTCCTTCTCACCTGCTCCTCTACCTGCTCCACCCTCAATCCACCAGAACCATGGGCTGCTGTGGCTGCTCC',
'GAGGCTGTGGCTCCAGCTGTGGAGGCTGTGACTCCAGCTGTGGGAGCTGTGGCTCTGGCTGCAGGGGCTGTGGCCCCAGCTGCTGTGCACCCGTCTACTGCTGCAAGCCCGTGTGCTGCTGTGTTCCAGCCTGTTCCTGCTCTAGCTGTGGCAAGCGGGGCTGTGGCTCCTGTGGGGGCTCCAAGGGAGGCTGTGGTTCTTGTGGCTGCTCCCAGTGCAGTTGCTGCAAGCCCTGCTGTTGCTCTTCAGGCTGTGGGTCATCCTGCTGCCAGTGCAGCTGCTGCAAGCCCTACTGCTCCC'),
('GAAAATATCAACATAGCCAAAGAAAAACAATCAAGACATACCTCCAGGAGCTGTGTAACAGCAACCGGAAAGAGAAACAATGGTGTGTTCCTATGTGGGATATAAAGAGCCGGGGCTCAGGGGGCTCCACACCTGCACCTCCTTCTCACCTGCTCCTCTACCTGCTCCACCCTCAATCCACCAGAACCATGGGCTGCTGTGGCTGCTCCGGAGGCTGTGGCTCCAGCTGTGGAGGCTGTGACTCCAGCTGTGGGAGCTGTGGCTCTGGCTGCAGGGGCTGTGGCCCCAGCTGCTGTGCAC',
'CGTCTACTGCTGCAAGCCCGTGTGCTGCTGTGTTCCAGCCTGTTCCTGCTCTAGCTGTGGCAAGCGGGGCTGTGGCTCCTGTGGGGGCTCCAAGGGAGGCTGTGGTTCTTGTGGCTGCTCCCAGTGCAGTTGCTGCAAGCCCTGCTGTTGCTCTTCAGGCTGTGGGTCATCCTGCTGCCAGTGCAGCTGCTGCAAGCCCTACTGCTCCCAGTGCAGCTGCTGTAAGCCCTGTTGCTCCTCCTCGGGTCGTGGGTCATCCTGCTGCCAATCCAGCTGCTGCAAGCCCTGCTGCTCATCCTC'),
('ATCAACATAGCCAAAGAAAAACAATCAAGACATACCTCCAGGAGCTGTGTAACAGCAACCGGAAAGAGAAACAATGGTGTGTTCCTATGTGGGATATAAAGAGCCGGGGCTCAGGGGGCTCCACACCTGCACCTCCTTCTCACCTGCTCCTCTACCTGCTCCACCCTCAATCCACCAGAACCATGGGCTGCTGTGGCTGCTCCGGAGGCTGTGGCTCCAGCTGTGGAGGCTGTGACTCCAGCTGTGGGAGCTGTGGCTCTGGCTGCAGGGGCTGTGGCCCCAGCTGCTGTGCACCCGTCT',
'CTGCTGCAAGCCCGTGTGCTGCTGTGTTCCAGCCTGTTCCTGCTCTAGCTGTGGCAAGCGGGGCTGTGGCTCCTGTGGGGGCTCCAAGGGAGGCTGTGGTTCTTGTGGCTGCTCCCAGTGCAGTTGCTGCAAGCCCTGCTGTTGCTCTTCAGGCTGTGGGTCATCCTGCTGCCAGTGCAGCTGCTGCAAGCCCTACTGCTCCCAGTGCAGCTGCTGTAAGCCCTGTTGCTCCTCCTCGGGTCGTGGGTCATCCTGCTGCCAATCCAGCTGCTGCAAGCCCTGCTGCTCATCCTCAGGCTG'),
('GCTGAAGAAACCATTTCAAACAGGATTGGAATAGGGAAACCCGGCACTCAGCTCGGCGCAAGCCGGCGGTGCCTTCAGACTAGAGAGCCTCTCCTCCGGTGCGCTGCAAGTAGGGCCTCGGCTCGAGGTCAACATTCTAGTTGTCCAGCGCTCCCTCTCCGGCACCTCGGTGAGGCTAGTTGACCCGACAGGCGCGGATCATGAGCAGCTGCAGGAGAATGAAGAGCGGGGACGTAATGAGGCCGAACCAGAGCTCCCGAGTCTGCTCCGCCAGCTTCTGGCACAACAGCATCTCGAAGA',
'GAACTTGAGACTCAGGACCGTAAGTACCCAGAAAAGGCGGAGCACCGCCAGCCGCTTCTCTCCATCCTGGAAGAGGCGCACGGACACGATGGTGGTGAAGTAGGTGCTGAGCCCGTCAGCGGCGAAGAAAGGCACGAACACGTTCCACCAGGAGAGGCCCGGGACCAGGCCATCCACACGCAGTGCCAGCAGCACAGAGAACACCAACAGGGCCAGCAGGTGCACGAAGATCTCGAAGGTGGCGAAGCCTAGCCACTGCACCAGCTCCCGGAGCGAGAAGAGCATCGCGCCCGTTGAGCG')]
def test_get_variation_by_symbol(self):
"""should return correct snp when query genome by symbol"""
# supplement this test with some synonymous SNPs, where they have no
# peptide alleles
for i in range(4):
snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0]
self.assertEquals(snp.Symbol, self.snp_names[i])
self.assertEquals(snp.Effect, self.snp_effects[i])
self.assertEquals(snp.Alleles, self.snp_nt_alleles[i])
self.assertEquals(snp.MapWeight, self.map_weights[i])
def test_num_alleles(self):
"""should correctly infer the number of alleles"""
for i in range(4):
snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0]
self.assertEquals(len(snp), self.snp_nt_len[i])
def test_get_peptide_alleles(self):
"""should correctly infer the peptide alleles"""
for i in range(4):
snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0]
if snp.Effect == 'INTRONIC':
continue
self.assertEquals(snp.PeptideAlleles, self.snp_aa_alleles[i])
def test_get_peptide_location(self):
"""should return correct location for aa variants"""
index = self.snp_names.index('rs11545807')
snp = list(self.human.getVariation(Symbol=self.snp_names[index]))[0]
self.assertEquals(snp.TranslationLocation, 95)
def test_validation_status(self):
"""should return correct validation status"""
def func(x):
if type(x) == str or x is None:
x = [x]
return set(x)
data = (('rs34213141', set(['freq']), func),
('rs12791610', set(['cluster', 'freq']), func),
('rs10792769', set(['cluster', 'freq', '1000Genome',
'hapmap', 'doublehit']), func))
for name, status, conv in data:
snp = list(self.human.getVariation(Symbol=name))[0]
self.assertTrue(status <= conv(snp.Validation))
def test_get_flanking_seq(self):
"""should correctly get the flanking sequence"""
for i in range(4): # only have flanking sequences for the first 4 SNPs
snp = list(self.human.getVariation(Symbol=self.snp_names[i]))[0]
self.assertEquals(snp.FlankingSeq, self.snp_flanks[i])
def test_variation_seq(self):
"""should return the sequence for a Variation snp if asked"""
snp = list(self.human.getVariation(Symbol=self.snp_names[0]))[0]
self.assertContains(snp.Alleles, str(snp.Seq))
def test_get_validation_condition(self):
"""simple test of SNP validation status"""
snp_status = [('rs94', False), ('rs90', True)]
for symbol, status in snp_status:
snp = list(self.human.getVariation(Symbol=symbol, validated=True))
self.assertEquals(snp != [], status)
def test_allele_freqs(self):
"""exercising getting AlleleFreq data"""
snp = list(self.human.getVariation(Symbol='rs34213141'))[0]
expect = set([('A', '0.0303'), ('G', '0.9697')])
allele_freqs = snp.AlleleFreqs
allele_freqs = set((a, '%.4f' % f )
for a, f in allele_freqs.getRawData(['allele', 'freq']))
self.assertTrue(expect.issubset(allele_freqs))
class TestFeatures(GenomeTestBase):
def setUp(self):
self.igf2 = self.human.getGeneByStableId(StableId='ENSG00000167244')
def test_CpG_island(self):
"""should return correct CpG islands"""
CpGislands = self.human.getFeatures(region=self.igf2,
feature_types='CpG')
expected_stats = [(630, 757), (652, 537), (3254, 3533)]
obs_stats = [(int(island.Score), len(island)) \
for island in CpGislands]
obs_stats.sort()
self.assertTrue(set(expected_stats) & set(obs_stats) != set())
def test_get_multiple_features(self):
"""should not fail to get multiple feature types"""
regions =\
self.human.getFeatures(feature_types=['repeat','gene','cpg'],
CoordName=1, Start=869936,End=901867)
for region in regions:
pass
def test_repeats(self):
"""should correctly return a repeat"""
loc = self.igf2.Location.resized(-1000, 1000)
repeats = list(self.human.getFeatures(
region=loc, feature_types='repeat'))
self.assertTrue(len(repeats) >= 4)
def test_genes(self):
"""should correctly identify igf2 within a region"""
loc = self.igf2.Location.resized(-1000, 1000)
genes = self.human.getFeatures(region=loc, feature_types='gene')
symbols = [g.Symbol.lower() for g in genes]
self.assertContains(symbols, self.igf2.Symbol.lower())
def test_other_genes(self):
"""should return gene annotations for mouse and rat regions"""
mouse = self.mouse.getRegion(CoordName='5', Start=150791005,
End=150838512, Strand='-')
rat = self.rat.getRegion(CoordName='12', Start=4282534, End=4324019,
Strand='+')
for region in [mouse, rat]:
features = region.getFeatures(feature_types=['gene'])
ann_seq = region.getAnnotatedSeq(feature_types='gene')
genes = ann_seq.getAnnotationsMatching('gene')
self.assertTrue(genes != [])
def test_get_variation_feature(self):
"""should correctly return variation features within a region"""
snps = self.human.getFeatures(feature_types='variation',
region=self.brca2)
# snp coordname, start, end should satisfy constraints of brca2 loc
c = 0
loc = self.brca2.Location
for snp in snps:
self.assertEquals(snp.Location.CoordName, loc.CoordName)
self.assertTrue(loc.Start < snp.Location.Start < loc.End)
c += 1
if c == 2:
break
def test_gene_feature_data_correct(self):
"""should apply gene feature data in a manner consistent with strand
and the Cogent sequence annotations slice should return the same
result"""
plus = list(self.human.getFeatures(feature_types='gene',
CoordName=13,
Start=31787610,
End=31871820))[0]
minus = plus.Location.copy()
minus.Strand *= -1
minus = self.human.getRegion(region = minus)
# get Sequence
plus_seq = plus.getAnnotatedSeq(feature_types='gene')
minus_seq = minus.getAnnotatedSeq(feature_types='gene')
# the seqs should be the rc of each other
self.assertEquals(str(plus_seq), str(minus_seq.rc()))
# the Cds, however, from the annotated sequences should be identical
plus_cds = plus_seq.getAnnotationsMatching('CDS')[0]
minus_cds = minus_seq.getAnnotationsMatching('CDS')[0]
self.assertEquals(str(plus_cds.getSlice()),str(minus_cds.getSlice()))
def test_other_feature_data_correct(self):
"""should apply CpG feature data in a manner consistent with strand"""
human = self.human
coord = dict(CoordName=11, Start=2165124,End=2165724)
exp_coord = dict(CoordName=11, Start=2165136, End=2165672)
exp_loc = human.getRegion(Strand=1, ensembl_coord=True, **exp_coord)
exp = exp_loc.Seq
ps_feat = human.getRegion(Strand=1, **coord)
ms_feat = human.getRegion(Strand=-1, **coord)
ps_seq = ps_feat.getAnnotatedSeq(feature_types='CpG')
ps_cgi = ps_seq.getAnnotationsMatching('CpGisland')[0]
self.assertEquals(ps_feat.Seq, ms_feat.Seq.rc())
self.assertEquals(ps_cgi.getSlice().rc(), exp)
ms_seq = ms_feat.getAnnotatedSeq(feature_types='CpG')
ms_cgi = ms_seq.getAnnotationsMatching('CpGisland')[0]
self.assertEquals(ms_cgi.getSlice(), ps_cgi.getSlice())
def test_other_repeat(self):
"""should apply repeat feature data in a manner consistent with strand"""
coord=dict(CoordName=13, Start=32890200, End=32890500)
ps_repeat = self.human.getRegion(Strand=1, **coord)
ms_repeat = self.human.getRegion(Strand=-1, **coord)
exp = DNA.makeSequence('CTTACTGTGAGGATGGGAACATTTTACAGCTGTGCTG'\
'TCCAAACCGGTGCCACTAGCCACATTAAGCACTCGAAACGTGGCTAGTGCGACTAGAGAAGAGGA'\
'TTTTCATACGATTTAGTTTCAATCACGCTAACCAGTGACGCGTGGCTAGTGG')
self.assertEquals(ms_repeat.Seq, ps_repeat.Seq.rc())
ps_annot_seq = ps_repeat.getAnnotatedSeq(feature_types='repeat')
ms_annot_seq = ms_repeat.getAnnotatedSeq(feature_types='repeat')
ps_seq = ps_annot_seq.getAnnotationsMatching('repeat')[0]
ms_seq = ms_annot_seq.getAnnotationsMatching('repeat')[0]
self.assertEquals(ms_seq.getSlice(), ps_seq.getSlice())
self.assertEquals(ps_seq.getSlice(), exp)
def test_get_features_from_nt(self):
"""should correctly return the encompassing gene from 1nt"""
snp = list(self.human.getVariation(Symbol='rs34213141'))[0]
gene=list(self.human.getFeatures(feature_types='gene',region=snp))[0]
self.assertEquals(gene.StableId, 'ENSG00000254997')
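# Hypothetical helper (not PyCogent's implementation): the strand tests in
# TestFeatures rely on the minus-strand sequence being the reverse
# complement of the plus-strand one (``seq.rc()``). A minimal plain-string
# sketch, assuming upper-case unambiguous DNA plus 'N' only.
_DNA_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    # complement each base while walking the sequence from its 3' end
    return "".join(_DNA_COMPLEMENT[base] for base in reversed(seq))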
class TestAssembly(TestCase):
def test_assemble_seq(self):
"""should correctly fill in a sequence with N's"""
expect = DNA.makeSequence("NAAAAANNCCCCCNNGGGNNN")
frags = ["AAAAA","CCCCC","GGG"]
positions = [(11, 16), (18, 23), (25, 28)]
self.assertEqual(_assemble_seq(frags, 10, 31, positions), expect)
positions = [(1, 6), (8, 13), (15, 18)]
self.assertEqual(_assemble_seq(frags, 0, 21, positions), expect)
# should work with:
# start matches first frag start
expect = DNA.makeSequence("AAAAANNCCCCCNNGGGNNN")
positions = [(0, 5), (7, 12), (14, 17)]
self.assertEqual(_assemble_seq(frags, 0, 20, positions), expect)
# end matches last frag_end
expect = DNA.makeSequence("NAAAAANNCCCCCNNGGG")
positions = [(11, 16), (18, 23), (25, 28)]
self.assertEqual(_assemble_seq(frags, 10, 28, positions), expect)
# both start and end matched
expect = DNA.makeSequence("AAAAANNCCCCCNNGGG")
positions = [(10, 15), (17, 22), (24, 27)]
self.assertEqual(_assemble_seq(frags, 10, 27, positions), expect)
# one frag
expect = DNA.makeSequence(''.join(frags))
positions = [(10, 23)]
self.assertEqual(_assemble_seq([''.join(frags)],10,23,positions),
expect)
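# Hypothetical re-implementation (not the private cogent helper
# _assemble_seq): a minimal sketch of the behaviour TestAssembly checks,
# assuming positions are half-open (start, end) genome coordinates and that
# every base between start and end not covered by a fragment becomes 'N'.
def assemble_with_ns(frags, start, end, positions):
    """Place each fragment at its coordinates within [start, end)."""
    seq = ["N"] * (end - start)
    for frag, (frag_start, frag_end) in zip(frags, positions):
        if len(frag) != frag_end - frag_start:
            raise ValueError("fragment length does not match its positions")
        # shift genome coordinates so that `start` maps to index 0
        seq[frag_start - start:frag_end - start] = frag
    return "".join(seq)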
if __name__ == "__main__":
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_host.py
import os
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.name import EnsemblDbName
from cogent.db.ensembl.host import get_db_name, get_latest_release,\
DbConnection, HostAccount, get_ensembl_account
from cogent.db.ensembl.species import Species
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 68
if 'ENSEMBL_ACCOUNT' in os.environ:
args = os.environ['ENSEMBL_ACCOUNT'].split()
host, username, password = args[0:3]
kwargs = {}
if len(args) > 3:
kwargs['port'] = int(args[3])
account = HostAccount(host, username, password, **kwargs)
else:
account = get_ensembl_account(release=Release)
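# The block above reads ENSEMBL_ACCOUNT as whitespace-separated
# "host username password [port]". A minimal sketch of the same parsing as a
# pure function (the name parse_account_string is hypothetical; it returns
# plain values instead of a HostAccount so it runs without a database).
def parse_account_string(value):
    """Split 'host user password [port]' into (host, user, password, kwargs)."""
    args = value.split()
    if len(args) < 3:
        raise ValueError("expected at least host, username and password")
    host, username, password = args[0:3]
    kwargs = {}
    if len(args) > 3:
        kwargs["port"] = int(args[3])
    return host, username, password, kwargs

# Like the original, splitting on whitespace cannot express an empty password.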
class TestEnsemblDbName(TestCase):
def test_cmp_name(self):
"""should validly compare names by attributes"""
n1 = EnsemblDbName('homo_sapiens_core_46_36h')
n2 = EnsemblDbName('homo_sapiens_core_46_36h')
self.assertEqual(n1, n2)
def test_name_without_build(self):
"""should correctly handle a db name without a build"""
n = EnsemblDbName("pongo_pygmaeus_core_49_1")
self.assertEqual(n.Prefix, "pongo_pygmaeus")
self.assertEqual(n.Type, "core")
self.assertEqual(n.Build, '1')
def test_ensemblgenomes_names(self):
"""correctly handle the ensemblgenomes naming system"""
n = EnsemblDbName('aedes_aegypti_core_5_58_1e')
self.assertEqual(n.Prefix, 'aedes_aegypti')
self.assertEqual(n.Type, 'core')
self.assertEqual(n.Release, '5')
self.assertEqual(n.GeneralRelease, '58')
self.assertEqual(n.Build, '1e')
n = EnsemblDbName('ensembl_compara_metazoa_6_59')
self.assertEqual(n.Release, '6')
self.assertEqual(n.GeneralRelease, '59')
self.assertEqual(n.Type, 'compara')
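# Hypothetical sketch (not EnsemblDbName's actual implementation) of how the
# core database names used in these tests decompose. Assumption: after the
# 'core' token, two fields mean <release>_<build> (Ensembl) and three mean
# <release>_<general release>_<build> (Ensembl Genomes). Compara names such
# as 'ensembl_compara_metazoa_6_59' are not handled.
from collections import namedtuple

CoreDbName = namedtuple("CoreDbName",
                        "prefix type release general_release build")

def parse_core_db_name(name):
    parts = name.split("_")
    idx = parts.index("core")  # raises ValueError if this is not a core db
    prefix = "_".join(parts[:idx])
    fields = parts[idx + 1:]
    if len(fields) == 2:
        release, build = fields
        general_release = None  # not encoded in the name
    elif len(fields) == 3:
        release, general_release, build = fields
    else:
        raise ValueError("unrecognised core db name: %r" % name)
    return CoreDbName(prefix, "core", release, general_release, build)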
class TestDBconnects(TestCase):
def test_get_ensembl_account(self):
"""return a HostAccount with the correct port"""
for release in [48, '48', None]:
act_new = get_ensembl_account(release=release)
self.assertEqual(act_new.port, 5306)
for release in [45, '45']:
act_old = get_ensembl_account(release=release)
self.assertEqual(act_old.port, 3306)
def test_getdb(self):
"""should discover the core database for each species correctly"""
for name, db_name in [("human", "homo_sapiens_core_49_36k"),
("mouse", "mus_musculus_core_49_37b"),
("rat", "rattus_norvegicus_core_49_34s"),
("platypus", "ornithorhynchus_anatinus_core_49_1f")]:
result = get_db_name(species=name, db_type="core", release='49')
self.assertEqual(len(result), 1)
result = result[0]
self.assertEqual(result.Name, db_name)
self.assertEqual(result.Release, '49')
def test_latest_release_number(self):
"""should correctly return the latest release number"""
self.assertGreaterThan(int(get_latest_release()), 53)
def test_get_all_available(self):
"""should return a listing of all the available databases on the
indicated server"""
available = get_db_name()
# make sure we have a compara db present -- a crude check on
# correctness
one_valid = False
for db in available:
if db.Type == "compara":
one_valid = True
break
self.assertEqual(one_valid, True)
# now check that when we request available under a specific version
# that we only receive valid ones back
available = get_db_name(release="46")
for db in available:
self.assertEqual(db.Release, '46')
def test_active_connections(self):
"""connecting to a given database on a specified server should create
only one connection; the same database on a different server should
get its own connection"""
ensembl_acct = get_ensembl_account(release='46')
engine1 = DbConnection(account=ensembl_acct,
db_name="homo_sapiens_core_46_36h")
engine2 = DbConnection(account=ensembl_acct,
db_name="homo_sapiens_core_46_36h")
self.assertEqual(engine1, engine2)
def test_pool_recycle_option(self):
"""exercising the ability to specify a pool recycle option"""
ensembl_acct = get_ensembl_account(release='56')
engine1 = DbConnection(account=ensembl_acct,
db_name="homo_sapiens_core_46_36h", pool_recycle=1000)
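# Hypothetical helper (not get_ensembl_account itself): the port expectations
# in test_get_ensembl_account are 5306 for release 48 (or None, meaning the
# latest release) and 3306 for release 45. A sketch assuming the public
# Ensembl MySQL server switched ports at release 48, with the release given
# as an int, a numeric str, or None.
def ensembl_mysql_port(release=None):
    if release is None or int(release) >= 48:
        return 5306
    return 3306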
if __name__ == "__main__":
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_metazoa.py
from cogent.db.ensembl.host import HostAccount, get_ensembl_account
from cogent.db.ensembl.compara import Compara, Genome
from cogent.util.unit_test import TestCase, main
__author__ = "Jason Merkin"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
Release = 12
account = HostAccount('mysql.ebi.ac.uk','anonymous', '', port=4157)
class MZ_ComparaTestBase(TestCase):
comp = Compara(['D.grimshawi', 'D.melanogaster'], Release=Release,
account=account, division='metazoa')
class MZ_TestCompara(MZ_ComparaTestBase):
def test_query_genome(self):
"""compara should attach valid genome attributes by common name"""
brca2 = self.comp.Dmelanogaster.getGeneByStableId("FBgn0050169")
self.assertEquals(brca2.Symbol.lower(), 'brca2')
def test_get_related_genes(self):
"""should correctly return the related gene regions from each genome"""
# using sc35, a splicing factor
sc35 = self.comp.Dmelanogaster.getGeneByStableId("FBgn0040286")
Orthologs = self.comp.getRelatedGenes(gene_region=sc35,
Relationship="ortholog_one2one")
self.assertEquals("ortholog_one2one", Orthologs.Relationships[0])
def test_get_related_genes2(self):
"""should handle case where gene is absent from one of the genomes"""
# here, it is brca2
brca2 = self.comp.Dmelanogaster.getGeneByStableId(
StableId='FBgn0050169')
orthologs = self.comp.getRelatedGenes(gene_region=brca2,
Relationship='ortholog_one2one')
self.assertEquals(len(orthologs.Members),2)
def test_get_collection(self):
sc35 = self.comp.Dmelanogaster.getGeneByStableId(StableId="FBgn0040286")
Orthologs = self.comp.getRelatedGenes(gene_region=sc35,
Relationship="ortholog_one2one")
collection = Orthologs.getSeqCollection()
self.assertTrue(len(collection.Seqs[0])> 1000)
class MZ_Genome(TestCase):
def test_get_general_release(self):
"""should correctly infer the general release"""
rel_lt_65 = Genome('D.melanogaster', Release=11, account=account)
self.assertEqual(rel_lt_65.GeneralRelease, 64)
self.assertEqual(rel_lt_65.CoreDb.db_name, 'drosophila_melanogaster_core_11_64_539')
rel_gt_65 = Genome('D.melanogaster', Release=13, account=account)
self.assertEqual(rel_gt_65.GeneralRelease, 66)
self.assertEqual(rel_gt_65.CoreDb.db_name, 'drosophila_melanogaster_core_13_66_539')
if __name__ == "__main__":
main()
# PyCogent-1.5.3/tests/test_db/test_ensembl/test_species.py
from cogent.util.unit_test import TestCase, main
from cogent.db.ensembl.species import Species
__author__ = "Gavin Huttley, Hua Ying"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley", "Hua Ying"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "Gavin.Huttley@anu.edu.au"
__status__ = "alpha"
class TestSpeciesNamemaps(TestCase):
def test_get_name_type(self):
"""should return the (latin|common) name given a latin, common or ensembl
db prefix name"""
self.assertEqual(Species.getSpeciesName("human"), "Homo sapiens")
self.assertEqual(Species.getSpeciesName("homo_sapiens"), "Homo sapiens")
self.assertEqual(Species.getCommonName("Mus musculus"), "Mouse")
self.assertEqual(Species.getCommonName("mus_musculus"), "Mouse")
def test_get_ensembl_format(self):
"""should take common or latin names and return the corresponding
ensembl db prefix"""
self.assertEqual(Species.getEnsemblDbPrefix("human"), "homo_sapiens")
self.assertEqual(Species.getEnsemblDbPrefix("mouse"), "mus_musculus")
self.assertEqual(Species.getEnsemblDbPrefix("Mus musculus"),
"mus_musculus")
def test_add_new_species(self):
"""should correctly add a new species/common combination and infer the
correct ensembl prefix"""
species_name, common_name = "Otolemur garnettii", "Bushbaby"
Species.amendSpecies(species_name, common_name)
self.assertEqual(Species.getSpeciesName(species_name), species_name)
self.assertEqual(Species.getSpeciesName("Bushbaby"), species_name)
self.assertEqual(Species.getSpeciesName(common_name), species_name)
self.assertEqual(Species.getCommonName(species_name), common_name)
self.assertEqual(Species.getCommonName("Bushbaby"), common_name)
self.assertEqual(Species.getEnsemblDbPrefix("Bushbaby"), "otolemur_garnettii")
self.assertEqual(Species.getEnsemblDbPrefix(species_name), "otolemur_garnettii")
self.assertEqual(Species.getEnsemblDbPrefix(common_name), "otolemur_garnettii")
def test_amend_existing(self):
"""should correctly amend an existing species"""
species_name = 'Ochotona princeps'
common_name1 = 'american pika'
common_name2 = 'pika'
ensembl_pref = 'ochotona_princeps'
Species.amendSpecies(species_name, common_name1)
self.assertEqual(Species.getCommonName(species_name),common_name1)
Species.amendSpecies(species_name, common_name2)
self.assertEqual(Species.getSpeciesName(common_name2), species_name)
self.assertEqual(Species.getSpeciesName(ensembl_pref), species_name)
self.assertEqual(Species.getCommonName(species_name), common_name2)
self.assertEqual(Species.getCommonName(ensembl_pref), common_name2)
self.assertEqual(Species.getEnsemblDbPrefix(species_name),
ensembl_pref)
self.assertEqual(Species.getEnsemblDbPrefix(common_name2),
ensembl_pref)
def test_get_compara_name(self):
"""should correctly form valid names for assignment onto objects"""
self.assertEqual(Species.getComparaName('pika'), 'Pika')
self.assertEqual(Species.getComparaName('C.elegans'), 'Celegans')
self.assertEqual(Species.getComparaName('Caenorhabditis elegans'),
'Celegans')
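# Hypothetical sketch (not cogent.db.ensembl.species.Species): a small
# registry reproducing the name mappings exercised in TestSpeciesNamemaps.
# Assumptions: the Ensembl db prefix is the lower-cased Latin name with
# spaces replaced by underscores, and amending a species replaces its common
# name. compara_name only mirrors the three cases in test_get_compara_name.
class SpeciesNames(object):
    def __init__(self):
        self._common_by_latin = {}

    def amend(self, latin, common):
        self._common_by_latin[latin] = common

    @staticmethod
    def db_prefix_of(latin):
        return latin.lower().replace(" ", "_")

    def _latin_for(self, name):
        # accept a Latin name, a common name, or an Ensembl db prefix
        if name in self._common_by_latin:
            return name
        for latin, common in self._common_by_latin.items():
            if name.lower() in (common.lower(), self.db_prefix_of(latin)):
                return latin
        raise KeyError(name)

    def species_name(self, name):
        return self._latin_for(name)

    def common_name(self, name):
        return self._common_by_latin[self._latin_for(name)]

    def ensembl_db_prefix(self, name):
        return self.db_prefix_of(self._latin_for(name))

def compara_name(name):
    """'C.elegans' or 'Caenorhabditis elegans' -> 'Celegans'; 'pika' -> 'Pika'."""
    if " " in name:  # Latin binomial: abbreviate the genus to its initial
        genus, species = name.split(None, 1)
        name = genus[0] + species.replace(" ", "")
    name = name.replace(".", "")
    return name[:1].upper() + name[1:]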
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_data/__init__.py
__all__ = ['test_molecular_weight']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_data/test_molecular_weight.py
"""Tests for molecular weight.
"""
from cogent.util.unit_test import TestCase, main
from cogent.data.molecular_weight import WeightCalculator, DnaMW, RnaMW, \
ProteinMW
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class WeightCalculatorTests(TestCase):
"""Tests for WeightCalculator, which should calculate molecular weights.
"""
def test_call(self):
"""WeightCalculator should return correct molecular weight"""
r = RnaMW
p = ProteinMW
self.assertEqual(p(''), 0)
self.assertEqual(r(''), 0)
self.assertFloatEqual(p('A'), 107.09)
self.assertFloatEqual(r('A'), 375.17)
self.assertFloatEqual(p('AAA'), 285.27)
self.assertFloatEqual(r('AAA'), 1001.59)
self.assertFloatEqual(r('AAACCCA'), 2182.37)
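# Worked arithmetic (a hypothetical model, not the library's weight tables):
# the values asserted in test_call fit weight = sum(per-residue weights)
# + a fixed offset exactly. For ProteinMW, 'A' -> 107.09 and 'AAA' -> 285.27
# give 89.09 per alanine and an offset of 18.00; for RnaMW, 'A' -> 375.17,
# 'AAA' -> 1001.59 and 'AAACCCA' -> 2182.37 give A = 313.21, C = 289.19 and
# an offset of 61.96. The sketch only reproduces those asserted numbers.
def linear_weight(seq, residue_weights, offset):
    """Return 0 for an empty sequence, else sum of residue weights + offset."""
    if not seq:
        return 0
    return sum(residue_weights[ch] for ch in seq) + offset

PROTEIN_FIT = ({"A": 89.09}, 18.00)
RNA_FIT = ({"A": 313.21, "C": 289.19}, 61.96)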
#run if called from command-line
if __name__ == "__main__":
main()
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_core/__init__.py
__all__ = ['test_alignment',
'test_alphabet',
'test_annotation',
'test_bitvector',
'test_core_standalone',
'test_entity',
'test_genetic_code',
'test_info',
'test_location',
'test_maps',
'test_moltype',
'test_profile',
'test_sequence',
'test_tree',
'test_tree2',
'test_usage']
__author__ = ""
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Catherine Lozupone", "Peter Maxwell", "Rob Knight",
"Gavin Huttley", "Jeremy Widmann", "Greg Caporaso",
"Sandra Smit", "Justin Kuczynski", "Marcin Cieslik"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
#!/usr/bin/env python
# PyCogent-1.5.3/tests/test_core/test_alignment.py
from cogent.util.unit_test import TestCase, main
from cogent.core.sequence import RnaSequence, frac_same, ModelSequence, Sequence
from cogent.maths.stats.util import Freqs, Numbers
from cogent.core.moltype import RNA, DNA, PROTEIN, BYTES
from cogent.struct.rna2d import ViennaStructure
from cogent.core.alignment import SequenceCollection, \
make_gap_filter, coerce_to_string, \
seqs_from_array, seqs_from_model_seqs, seqs_from_generic, seqs_from_fasta, \
seqs_from_dict, seqs_from_aln, seqs_from_kv_pairs, seqs_from_empty, \
aln_from_array, aln_from_model_seqs, aln_from_collection,\
aln_from_generic, aln_from_fasta, aln_from_dense_aln, aln_from_empty, \
DenseAlignment, Alignment, DataError
from cogent.core.moltype import AB, DNA
from cogent.parse.fasta import MinimalFastaParser
from numpy import array, arange, transpose
from tempfile import mktemp
from os import remove
import re
__author__ = "Rob Knight"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Catherine Lozupone", "Gavin Huttley",
"Rob Knight", "Daniel McDonald", "Jan Kosinski"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
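# Hypothetical plain-Python sketches (not cogent.core.alignment code) of
# three conventions the tests below rely on:
# 1. seqs_from_array treats each column of a 2-D array as one sequence, so
#    [[0,1,2],[2,1,0]] yields three 2-element sequences (a transpose);
# 2. seqs_from_fasta joins wrapped sequence lines under each '>' label;
# 3. with RNA.Alphabet, the expected arrays imply the ordering U=0, C=1,
#    A=2, G=3 (e.g. 'ACC' -> [2, 1, 1], 'GGU' -> [3, 3, 0]).
def columns_as_seqs(rows):
    """Return the columns of a list-of-rows matrix as lists."""
    return [list(col) for col in zip(*rows)]

def parse_minimal_fasta(text):
    """Return (seqs, labels) from a FASTA string; no validation is done."""
    seqs, labels = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            labels.append(line[1:])
            seqs.append("")
        else:
            seqs[-1] += line  # IndexError if data precedes the first label
    return seqs, labels

RNA_ORDER = "UCAG"  # inferred from the expected arrays, see note 3

def encode_rna(seq):
    """Map each base to its index in RNA_ORDER."""
    return [RNA_ORDER.index(base) for base in seq]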
class alignment_tests(TestCase):
"""Tests of top-level functions."""
def test_seqs_from_array(self):
"""seqs_from_array should return chars, and successive indices."""
a = array([[0,1,2],[2,1,0]]) #three 2-char seqs
obs_a, obs_labels = seqs_from_array(a)
#note transposition
self.assertEqual(obs_a, [array([0,2]), array([1,1]), array([2,0])])
self.assertEqual(obs_labels, None)
def test_seqs_from_model_seqs(self):
"""seqs_from_model_seqs should return model seqs + names."""
s1 = ModelSequence('ABC', Name='a')
s2 = ModelSequence('DEF', Name='b')
obs_a, obs_labels = seqs_from_model_seqs([s1, s2])
self.assertEqual(obs_a, [s1,s2]) #model seqs are returned as-is
self.assertEqual(obs_labels, ['a','b'])
def test_seqs_from_generic(self):
"""seqs_from_generic should initialize seqs from list of lists, etc."""
s1 = 'ABC'
s2 = 'DEF'
obs_a, obs_labels = seqs_from_generic([s1, s2])
self.assertEqual(obs_a, ['ABC','DEF'])
self.assertEqual(obs_labels, [None, None])
def test_seqs_from_fasta(self):
"""seqs_from_fasta should initialize seqs from fasta-format string"""
s = '>aa\nAB\nC\n>bb\nDE\nF\n'
obs_a, obs_labels = seqs_from_fasta(s)
self.assertEqual(obs_a, ['ABC','DEF'])
self.assertEqual(obs_labels, ['aa','bb'])
def test_seqs_from_aln(self):
"""seqs_from_aln should initialize from existing alignment"""
c = SequenceCollection(['abc','def'])
obs_a, obs_labels = seqs_from_aln(c)
self.assertEqual(obs_a, ['abc','def'])
self.assertEqual(obs_labels, ['seq_0','seq_1'])
def test_seqs_from_kv_pairs(self):
"""seqs_from_kv_pairs should initialize from key-value pairs"""
c = [['a', 'abc'], ['b', 'def']]
obs_a, obs_labels = seqs_from_kv_pairs(c)
self.assertEqual(obs_a, ['abc','def'])
self.assertEqual(obs_labels, ['a','b'])
def test_seqs_from_empty(self):
"""seqs_from_empty should always raise ValueError"""
self.assertRaises(ValueError, seqs_from_empty, 'xyz')
def test_aln_from_array(self):
"""aln_from_array should return same array, and successive indices."""
a = array([[0,1,2],[3,4,5]]) #three 2-char seqs
obs_a, obs_labels = aln_from_array(a)
self.assertEqual(obs_a, transpose(a))
self.assertEqual(obs_labels, None)
def test_aln_from_model_seqs(self):
"""aln_from_model_seqs should initialize aln from sequence objects."""
s1 = ModelSequence('ACC', Name='a', Alphabet=RNA.Alphabet)
s2 = ModelSequence('GGU', Name='b', Alphabet=RNA.Alphabet)
obs_a, obs_labels = aln_from_model_seqs([s1, s2], \
Alphabet=BYTES.Alphabet)
self.assertEqual(obs_a, array([[2,1,1],[3,3,0]], 'b'))
#seq -> numbers
self.assertEqual(obs_labels, ['a','b'])
def test_aln_from_generic(self):
"""aln_from_generic should initialize aln from list of lists, etc."""
s1 = 'AAA'
s2 = 'GGG'
obs_a, obs_labels = aln_from_generic([s1, s2], 'b', \
Alphabet=RNA.Alphabet) #specify array type
self.assertEqual(obs_a, array([[2,2,2],[3,3,3]], 'b')) #str -> alphabet indices
self.assertEqual(obs_labels, [None, None])
def test_aln_from_fasta(self):
"""aln_from_fasta should initialize aln from fasta-format string"""
s = '>aa\nAB\nC\n>bb\nDE\nF\n'
obs_a, obs_labels = aln_from_fasta(s.splitlines())
self.assertEqual(obs_a, array(['ABC','DEF'], 'c').view('B')) #seq -> numbers
self.assertEqual(obs_labels, ['aa','bb'])
def test_aln_from_dense_aln(self):
"""aln_from_dense_aln should initialize from existing alignment"""
a = DenseAlignment(array([[0,1,2],[3,4,5]]), conversion_f=aln_from_array)
obs_a, obs_labels = aln_from_dense_aln(a)
self.assertEqual(obs_a, a.SeqData)
self.assertEqual(obs_labels, a.Names)
def test_aln_from_collection(self):
"""aln_from_collection should initialize from existing alignment"""
a = SequenceCollection(['AAA','GGG'])
obs_a, obs_labels = aln_from_collection(a, Alphabet=RNA.Alphabet)
self.assertEqual(a.toFasta(), '>seq_0\nAAA\n>seq_1\nGGG')
self.assertEqual(obs_a, array([[2,2,2],[3,3,3]]))
def test_aln_from_empty(self):
"""aln_from_empty should always raise ValueError"""
self.assertRaises(ValueError, aln_from_empty, 'xyz')
class SequenceCollectionBaseTests(object):
"""Base class for testing the SequenceCollection object.
Unlike Alignments, SequenceCollections can contain sequences of unequal
length. This class contains all the tests that do _not_ depend on being
able to look at "ragged" SequenceCollections. All classes that inherit
from SequenceCollection should have test classes that inherit from this
class; the SequenceCollection tests themselves additionally cover
SequenceCollections of unequal length.
Set self.Class in subclasses to select the right constructor.
"""
Class = SequenceCollection
def setUp(self):
"""Define some standard SequenceCollection objects."""
self.one_seq = self.Class({'a':'AAAAA'})
self.ragged_padded = self.Class({'a':'AAAAAA','b':'AAA---', \
'c':'AAAA--'})
self.identical = self.Class({'a':'AAAA','b':'AAAA'})
self.gaps = self.Class({'a':'AAAAAAA','b':'A--A-AA', \
'c':'AA-----'})
self.gaps_rna = self.Class({'a':RnaSequence('AAAAAAA'), \
'b':RnaSequence('A--A-AA'), \
'c':RnaSequence('AA-----')})
self.unordered = self.Class({'a':'AAAAA','b':'BBBBB'})
self.ordered1 = self.Class({'a':'AAAAA','b':'BBBBB'}, \
Names=['a','b'])
self.ordered2 = self.Class({'a':'AAAAA','b':'BBBBB'}, \
Names=['b','a'])
self.mixed = self.Class({'a':'ABCDE', 'b':'LMNOP'})
self.end_gaps = self.Class({'a':'--A-BC-', 'b':'-CB-A--', \
'c':'--D-EF-'}, Names=['a','b','c'])
self.many = self.Class({
'a': RnaSequence('UCAGUCAGUU'),
'b': RnaSequence('UCCGUCAAUU'),
'c': RnaSequence('ACCAUCAGUC'),
'd': RnaSequence('UCAAUCGGUU'),
'e': RnaSequence('UUGGUUGGGU'),
'f': RnaSequence('CCGGGCGGCC'),
'g': RnaSequence('UCAACCGGAA'),
})
#Additional SequenceCollections for tests added 6/4/04 by Jeremy Widmann
self.sequences = self.Class(map(RnaSequence, ['UCAG', 'UCAG', 'UCAG']))
self.structures = self.Class(map(ViennaStructure,
['(())..', '......', '(....)']), MolType=BYTES)
self.labeled = self.Class(['ABC', 'DEF'], ['1st', '2nd'])
#Additional SequenceCollection for tests added 1/30/06 by Cathy Lozupone
self.omitSeqsTemplate_aln = self.Class({
's1':RnaSequence('UC-----CU---C'),
's2':RnaSequence('UC------U---C'),
's3':RnaSequence('UUCCUUCUU-UUC'),
's4':RnaSequence('UU-UUUU-UUUUC'),
's5':RnaSequence('-------------')
})
self.a = DenseAlignment(['AAA','AAA'])
self.b = Alignment(['AAA','AAA'])
self.c = SequenceCollection(['AAA','AAA'])
def test_guess_input_type(self):
"""SequenceCollection _guess_input_type should figure out data type correctly"""
git = self.a._guess_input_type
self.assertEqual(git(self.a), 'dense_aln')
self.assertEqual(git(self.b), 'aln')
self.assertEqual(git(self.c), 'collection')
self.assertEqual(git('>ab\nabc'), 'fasta')
self.assertEqual(git(['>ab','abc']), 'fasta')
self.assertEqual(git(['abc','def']), 'generic')
self.assertEqual(git([[1,2],[4,5]]), 'kv_pairs') #precedence over generic
self.assertEqual(git([[1,2,3],[4,5,6]]), 'generic')
self.assertEqual(git([ModelSequence('abc')]), 'model_seqs')
self.assertEqual(git(array([[1,2,3],[4,5,6]])), 'array')
self.assertEqual(git({'a':'aca'}), 'dict')
self.assertEqual(git([]), 'empty')
def test_init_aln(self):
""" SequenceCollection should init from existing alignments"""
exp = self.Class(['AAA','AAA'])
x = self.Class(self.a)
y = self.Class(self.b)
z = self.Class(self.c)
self.assertEqual(x, exp)
self.assertEqual(z, exp)
self.assertEqual(y, exp)
test_init_aln.__doc__ = Class.__name__ + test_init_aln.__doc__
def test_init_dict(self):
"""SequenceCollection init from dict should work as expected"""
d = {'a':'AAAAA', 'b':'BBBBB'}
a = self.Class(d)
self.assertEqual(a, d)
self.assertEqual(a.NamedSeqs.items(), d.items())
def test_init_name_mapped(self):
"""SequenceCollection init should allow name mapping function"""
d = {'a':'AAAAA', 'b':'BBBBB'}
f = lambda x: x.upper()
a = self.Class(d, label_to_name=f)
self.assertNotEqual(a, d)
self.assertNotEqual(a.NamedSeqs.items(), d.items())
d_upper = {'A':'AAAAA','B':'BBBBB'}
self.assertEqual(a, d_upper)
self.assertEqual(a.NamedSeqs.items(), d_upper.items())
def test_init_seq(self):
"""SequenceCollection init from list of sequences should use indices as keys"""
seqs = ['AAAAA', 'BBBBB', 'CCCCC']
a = self.Class(seqs)
self.assertEqual(len(a.NamedSeqs), 3)
self.assertEqual(a.NamedSeqs['seq_0'], 'AAAAA')
self.assertEqual(a.NamedSeqs['seq_1'], 'BBBBB')
self.assertEqual(a.NamedSeqs['seq_2'], 'CCCCC')
self.assertEqual(a.Names, ['seq_0','seq_1','seq_2'])
self.assertEqual(list(a.Seqs), ['AAAAA','BBBBB','CCCCC'])
def test_init_annotated_seq(self):
"""SequenceCollection init from seqs w/ Info should preserve data"""
a = Sequence('AAA', Name='a', Info={'x':3})
b = Sequence('CCC', Name='b', Info={'x':4})
c = Sequence('GGG', Name='c', Info={'x':5})
seqs = [c,b,a]
a = self.Class(seqs)
self.assertEqual(list(a.Names), ['c','b','a'])
self.assertEqual(map(str, a.Seqs), ['GGG','CCC','AAA'])
if self.Class is not DenseAlignment:
#DenseAlignment is allowed to strip Info objects
self.assertEqual([i.Info.x for i in a.Seqs], [5,4,3])
#check it still works if constructed from same class
b = self.Class(a)
self.assertEqual(list(b.Names), ['c','b','a'])
self.assertEqual(map(str, b.Seqs), ['GGG','CCC','AAA'])
if self.Class is not DenseAlignment:
#DenseAlignment is allowed to strip Info objects
self.assertEqual([i.Info.x for i in b.Seqs], [5,4,3])
def test_init_pairs(self):
"""SequenceCollection init from list of (key,val) pairs should work correctly"""
seqs = [['x', 'XXX'], ['b','BBB'], ['c','CCC']]
a = self.Class(seqs)
self.assertEqual(len(a.NamedSeqs), 3)
self.assertEqual(a.NamedSeqs['x'], 'XXX')
self.assertEqual(a.NamedSeqs['b'], 'BBB')
self.assertEqual(a.NamedSeqs['c'], 'CCC')
self.assertEqual(a.Names, ['x','b','c'])
self.assertEqual(list(a.Seqs), ['XXX','BBB','CCC'])
def test_init_duplicate_keys(self):
"""SequenceCollection init from (key, val) pairs should fail on dup. keys"""
seqs = [['x', 'XXX'], ['b','BBB'],['x','CCC'], ['d','DDD'], ['a','AAA']]
self.assertRaises(ValueError, self.Class, seqs)
aln = self.Class(seqs, remove_duplicate_names=True)
self.assertEqual(str(aln), '>x\nXXX\n>b\nBBB\n>d\nDDD\n>a\nAAA\n')
def test_init_ordered(self):
"""SequenceCollection should iterate over seqs correctly even if ordered"""
first = self.ordered1
sec = self.ordered2
un = self.unordered
self.assertEqual(first.Names, ['a','b'])
self.assertEqual(sec.Names, ['b', 'a'])
self.assertEqual(un.Names, un.NamedSeqs.keys())
first_list = list(first.Seqs)
sec_list = list(sec.Seqs)
un_list = list(un.Seqs)
self.assertEqual(first_list, ['AAAAA','BBBBB'])
self.assertEqual(sec_list, ['BBBBB','AAAAA'])
#check that the unordered seq matches one of the lists
self.assertTrue((un_list == first_list) or (un_list == sec_list))
self.assertNotEqual(first_list, sec_list)
def test_init_ambig(self):
"""SequenceCollection should tolerate ambiguous chars"""
aln = self.Class(['AAA','CCC'],MolType=DNA)
aln = self.Class(['ANS','CWC'],MolType=DNA)
aln = self.Class(['A-A','CC-'],MolType=DNA)
aln = self.Class(['A?A','CC-'],MolType=DNA)
def test_aln_from_fasta_parser(self):
"""aln_from_fasta_parser should init from iterator"""
s = '>aa\nAC\n>bb\nAA\n>c\nGG\n'.splitlines()
p = MinimalFastaParser(s)
aln = self.Class(p, MolType=DNA)
self.assertEqual(aln.NamedSeqs['aa'], 'AC')
self.assertEqual(aln.toFasta(), '>aa\nAC\n>bb\nAA\n>c\nGG')
s2 = '>aa\nAC\n>bb\nAA\n>c\nGG\n'
d = DenseAlignment(MinimalFastaParser(s2.splitlines()))
self.assertEqual(d.toFasta(), aln.toFasta())
def test_aln_from_fasta(self):
"""SequenceCollection should init from fasta-format string"""
s = '>aa\nAC\n>bb\nAA\n>c\nGG\n'
aln = self.Class(s)
self.assertEqual(aln.toFasta(), s.strip())
def test_SeqLen_get(self):
"""SequenceCollection SeqLen should return length of longest seq"""
self.assertEqual(self.one_seq.SeqLen, 5)
self.assertEqual(self.identical.SeqLen, 4)
self.assertEqual(self.gaps.SeqLen, 7)
def test_Seqs(self):
"""SequenceCollection Seqs property should return seqs in correct order."""
first = self.ordered1
sec = self.ordered2
un = self.unordered
first_list = list(first.Seqs)
sec_list = list(sec.Seqs)
un_list = list(un.Seqs)
self.assertEqual(first_list, ['AAAAA','BBBBB'])
self.assertEqual(sec_list, ['BBBBB','AAAAA'])
#check that the unordered seq matches one of the lists
self.assertTrue((un_list == first_list) or (un_list == sec_list))
self.assertNotEqual(first_list, sec_list)
def test_iterSeqs(self):
"""SequenceCollection iterSeqs() method should support reordering of seqs"""
self.ragged_padded = self.Class(self.ragged_padded.NamedSeqs, \
Names=['a','b','c'])
seqs = list(self.ragged_padded.iterSeqs())
self.assertEqual(seqs, ['AAAAAA', 'AAA---', 'AAAA--'])
seqs = list(self.ragged_padded.iterSeqs(seq_order=['b','a','a']))
self.assertEqual(seqs, ['AAA---', 'AAAAAA', 'AAAAAA'])
self.assertSameObj(seqs[1], seqs[2])
self.assertSameObj(seqs[0], self.ragged_padded.NamedSeqs['b'])
def test_Items(self):
"""SequenceCollection Items should iterate over items in specified order."""
#should work if one row
self.assertEqual(list(self.one_seq.Items), ['A']*5)
#should take order into account
self.assertEqual(list(self.ordered1.Items), ['A']*5 + ['B']*5)
self.assertEqual(list(self.ordered2.Items), ['B']*5 + ['A']*5)
def test_iterItems(self):
"""SequenceCollection iterItems() should iterate over items in correct order"""
#should work if one row
self.assertEqual(list(self.one_seq.iterItems()), ['A']*5)
#should take order into account
self.assertEqual(list(self.ordered1.iterItems()), ['A']*5 + ['B']*5)
self.assertEqual(list(self.ordered2.iterItems()), ['B']*5 + ['A']*5)
#should allow row and/or col specification
r = self.ragged_padded
self.assertEqual(list(r.iterItems(seq_order=['c','b'], \
pos_order=[5,1,3])), list('-AA-A-'))
#should not interfere with superclass iteritems()
i = list(r.NamedSeqs.iteritems())
i.sort()
self.assertEqual(i, [('a','AAAAAA'),('b','AAA---'),('c','AAAA--')])
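# Illustrative sketch (hypothetical helper, not PyCogent's iterItems): walking
# the requested seqs in seq_order and, within each seq, only the requested
# columns in pos_order reproduces the '-AA-A-' expectation used in
# test_iterItems above (seq_order=['c','b'], pos_order=[5,1,3]).
def sketch_iter_items(named_seqs, seq_order, pos_order=None):
    """Yield the character of each seq in seq_order at each index in pos_order."""
    for name in seq_order:
        seq = named_seqs[name]
        positions = pos_order if pos_order is not None else range(len(seq))
        for pos in positions:
            yield seq[pos]
# e.g. with {'a':'AAAAAA','b':'AAA---','c':'AAAA--'}:
# ''.join(sketch_iter_items(seqs, ['c','b'], [5,1,3])) -> '-AA-A-'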
def test_takeSeqs(self):
"""SequenceCollection takeSeqs should return new SequenceCollection with selected seqs."""
a = self.ragged_padded.takeSeqs('bc')
self.assertTrue(isinstance(a, SequenceCollection))
self.assertEqual(a, {'b':'AAA---','c':'AAAA--'})
#should be able to negate
a = self.ragged_padded.takeSeqs('bc', negate=True)
self.assertEqual(a, {'a':'AAAAAA'})
def test_takeSeqs_moltype(self):
"""takeSeqs should preserve the MolType"""
orig = self.Class(data={'a':'CCCCCC','b':'AAA---', 'c':'AAAA--'}, MolType=DNA)
subset = orig.takeSeqs('ab')
self.assertEqual(set(subset.MolType), set(orig.MolType))
def test_getSeqIndices(self):
"""SequenceCollection getSeqIndices should return names of seqs where f(row) is True"""
srp = self.ragged_padded
is_long = lambda x: len(x) > 10
is_med = lambda x: len(str(x).replace('-','')) > 3 #strips gaps
is_any = lambda x: len(x) > 0
self.assertEqual(srp.getSeqIndices(is_long), [])
srp.Names = 'cba'
self.assertEqual(srp.getSeqIndices(is_med), ['c','a'])
srp.Names = 'bac'
self.assertEqual(srp.getSeqIndices(is_med), ['a','c'])
self.assertEqual(srp.getSeqIndices(is_any),['b','a','c'])
#should be able to negate
self.assertEqual(srp.getSeqIndices(is_med, negate=True), ['b'])
self.assertEqual(srp.getSeqIndices(is_any, negate=True), [])
def test_takeSeqsIf(self):
"""SequenceCollection takeSeqsIf should return seqs where f(row) is True"""
is_long = lambda x: len(x) > 10
is_med = lambda x: len(str(x).replace('-','')) > 3
is_any = lambda x: len(x) > 0
srp = self.ragged_padded
self.assertEqual(srp.takeSeqsIf(is_long), {})
srp.Names = 'cba'
self.assertEqual(srp.takeSeqsIf(is_med), \
{'c':'AAAA--','a':'AAAAAA'})
srp.Names = srp.NamedSeqs.keys()
self.assertEqual(srp.takeSeqsIf(is_med), \
{'c':'AAAA--','a':'AAAAAA'})
self.assertEqual(srp.takeSeqsIf(is_any), srp)
self.assertTrue(isinstance(srp.takeSeqsIf(is_med), SequenceCollection))
#should be able to negate
self.assertEqual(srp.takeSeqsIf(is_med, negate=True), \
{'b':'AAA---'})
def test_getItems(self):
"""SequenceCollection getItems should return list of items from k,v pairs"""
self.assertEqual(self.mixed.getItems([('a',3),('b',4),('a',0)]), \
['D','P','A'])
self.assertRaises(KeyError, self.mixed.getItems, [('x','y')])
self.assertRaises(IndexError, self.mixed.getItems, [('a',1000)])
#should be able to negate -- note that results will have seqs in
#arbitrary order
self.assertEqualItems(self.mixed.getItems([('a',3),('b',4),('a',0)], \
negate=True), ['B','C','E','L','M','N','O'])
def test_getItemIndices(self):
"""SequenceCollection getItemIndices should return coordinates of matching items"""
is_vowel = lambda x: x in 'AEIOU'
#reverse name order to test that it's not alphabetical
self.mixed = self.Class(self.mixed.NamedSeqs, Names=['b','a'])
self.assertEqual(self.mixed.getItemIndices(is_vowel), \
[('b',3),('a',0),('a',4)])
is_lower = lambda x: x.islower()
self.assertEqual(self.ragged_padded.getItemIndices(is_lower), [])
#should be able to negate
self.assertEqualItems(self.mixed.getItemIndices(is_vowel, negate=True),\
[('a',1),('a',2),('a',3),('b',0),('b',1),('b',2),('b',4)])
def test_getItemsIf(self):
"""SequenceCollection getItemsIf should return matching items"""
is_vowel = lambda x: x in 'AEIOU'
#reverse name order to test that it's not alphabetical
self.mixed = self.Class(self.mixed.NamedSeqs, Names=['b','a'])
self.assertEqual(self.mixed.getItemsIf(is_vowel), ['O','A','E'])
self.assertEqual(self.one_seq.getItemsIf(is_vowel), list('AAAAA'))
#should be able to negate
self.assertEqualItems(self.mixed.getItemsIf(is_vowel, negate=True), \
list('BCDLMNP'))
def test_getSimilar(self):
"""SequenceCollection getSimilar should get all sequences close to target seq"""
aln = self.many
x = RnaSequence('GGGGGGGGGG')
y = RnaSequence('----------')
#test min and max similarity ranges
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.4,\
max_similarity=0.7)
for seq in 'cefg':
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 4)
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.95, \
max_similarity=1)
for seq in 'a':
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 1)
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.75, \
max_similarity=0.85)
for seq in 'bd':
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 2)
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0, \
max_similarity=0.2)
self.assertEqual(result, {})
#test some sequence transformations
transform = lambda s: s[1:4]
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.5, \
transform=transform)
for seq in 'abdfg':
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 5)
transform = lambda s: s[-3:]
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=0.5, \
transform=transform)
for seq in 'abcde':
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 5)
#test a different distance metric
metric = lambda x, y: str(x).count('G') + str(y).count('G')
result = aln.getSimilar(aln.NamedSeqs['a'], min_similarity=5, \
max_similarity=10, metric=metric)
for seq in 'ef':
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 2)
#test the combination of a transform and a distance metric
aln = self.Class(dict(enumerate(map(RnaSequence, \
['GA-GU','A-GAC','GG-GG']))), MolType=RNA)
transform = lambda s: RnaSequence(str(s).replace('G','A'\
).replace('U','C'))
metric = RnaSequence.fracSameNonGaps
null_transform = lambda s: RnaSequence(str(s))
#first, do it without the transformation
try:
result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.5, \
metric=metric)
except TypeError: #need to coerce to RNA seq w/ null_transform
result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.5, \
metric=metric, transform=null_transform)
for seq in [0,2]:
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 2)
#repeat with higher similarity
try:
result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.8, \
metric=metric)
except TypeError: #need to coerce to RNA
result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.8, \
metric=metric, transform=null_transform)
for seq in [0]:
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 1)
#then, verify that the transform changes the results
result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.5, \
metric=metric, transform=transform)
for seq in [0,1,2]:
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 3)
result = aln.getSimilar(aln.NamedSeqs[0], min_similarity=0.8, \
metric=metric, transform=transform)
for seq in [0,1]:
self.assertContains(result.NamedSeqs, seq)
self.assertEquals(result.NamedSeqs[seq], aln.NamedSeqs[seq])
self.assertEqual(len(result.NamedSeqs), 2)
def test_distanceMatrix(self):
"""SequenceCollection distanceMatrix should produce correct scores"""
self.assertEqual(self.one_seq.distanceMatrix(frac_same), {'a':{'a':1}})
self.assertEqual(self.gaps.distanceMatrix(frac_same),
{ 'a':{'a':7/7.0,'b':4/7.0,'c':2/7.0},
'b':{'a':4/7.0,'b':7/7.0,'c':3/7.0},
'c':{'a':2/7.0,'b':3/7.0,'c':7/7.0},
})
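# Illustrative sketch (hypothetical helpers): the expected matrix above is
# consistent with frac_same scoring the fraction of aligned positions whose
# characters are identical, with gaps compared like any other character
# (e.g. 'AAAAAAA' vs 'A--A-AA' match at 4 of 7 positions). This assumes
# equal-length seqs and is not necessarily how cogent's frac_same is written.
def sketch_frac_same(x, y):
    """Fraction of paired positions where x and y hold the same character."""
    pairs = list(zip(x, y))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if a == b) / float(len(pairs))

def sketch_distance_matrix(named_seqs, f):
    """Return {name_i: {name_j: f(seq_i, seq_j)}} over all pairs of names."""
    return dict((i, dict((j, f(named_seqs[i], named_seqs[j]))
                         for j in named_seqs))
                for i in named_seqs)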
def test_isRagged(self):
"""SequenceCollection isRagged should return True only if seqs differ in length"""
assert(not self.identical.isRagged())
assert(not self.gaps.isRagged())
def test_toPhylip(self):
"""SequenceCollection should return PHYLIP string format correctly"""
align_norm = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
'ACDEFGHIKLMNPQRSUUVWF-',
'ACDEFGHIKLMNPERSKUVWC-',
'ACNEFGHIKLMNPQRS-UVWP-',
])
phylip_str, id_map = align_norm.toPhylip()
self.assertEqual(phylip_str, """4 22\nseq0000001 ACDEFGHIKLMNPQRSTUVWY-\nseq0000002 ACDEFGHIKLMNPQRSUUVWF-\nseq0000003 ACDEFGHIKLMNPERSKUVWC-\nseq0000004 ACNEFGHIKLMNPQRS-UVWP-""")
self.assertEqual(id_map, {'seq0000004':'seq_3', 'seq0000001':'seq_0', \
'seq0000003': 'seq_2', 'seq0000002': 'seq_1'})
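# Illustrative sketch (hypothetical helper, not PyCogent's toPhylip): the
# expected output is a "<num_seqs> <num_cols>" header followed by one line per
# seq, each renamed to a 10-character id ('seq' plus a 7-digit, 1-based
# counter), together with a dict mapping the new ids back to the original
# names. Assumes all seqs have the same length.
def sketch_to_phylip(names, seqs):
    """Return (phylip_string, {new_id: original_name})."""
    id_map = {}
    lines = ['%d %d' % (len(seqs), len(seqs[0]))]
    for i, (name, seq) in enumerate(zip(names, seqs)):
        new_id = 'seq%07d' % (i + 1)
        id_map[new_id] = name
        lines.append('%s %s' % (new_id, seq))
    return '\n'.join(lines), id_map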
def test_toFasta(self):
"""SequenceCollection should return correct FASTA string"""
aln = self.Class(['AAA','CCC'])
self.assertEqual(aln.toFasta(), '>seq_0\nAAA\n>seq_1\nCCC')
#NOTE: a list of two-character strings is read as (name, seq) key-value
#pairs (see test_guess_input_type), so each string's first char is its name:
aln = self.Class(['AA','CC'])
self.assertEqual(aln.toFasta(), '>A\nA\n>C\nC')
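# Illustrative sketch (hypothetical helper): a simplified version of the
# "two-item" rule exercised above. If every input item has exactly two
# elements it is read as a (name, seq) pair; otherwise items are treated as
# bare sequences named 'seq_0', 'seq_1', ... PyCogent's real input-type
# detection (_guess_input_type) distinguishes more cases than this.
def sketch_to_fasta(data):
    """Format a list of seqs or (name, seq) pairs as a FASTA string."""
    if data and all(len(item) == 2 for item in data):
        pairs = [(item[0], item[1]) for item in data]
    else:
        pairs = [('seq_%d' % i, item) for i, item in enumerate(data)]
    return '\n'.join('>%s\n%s' % (name, seq) for name, seq in pairs)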
def test_toNexus(self):
"""SequenceCollection should return correct Nexus string format"""
align_norm = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
'ACDEFGHIKLMNPQRSUUVWF-',
'ACDEFGHIKLMNPERSKUVWC-',
'ACNEFGHIKLMNPQRS-UVWP-'])
expect = '#NEXUS\n\nbegin data;\n dimensions ntax=4 nchar=22;\n'+\
' format datatype=protein interleave=yes missing=? gap=-;\n'+\
' matrix\n seq_1 ACDEFGHIKLMNPQRSUUVWF-\n seq_0'+\
' ACDEFGHIKLMNPQRSTUVWY-\n seq_3 ACNEFGHIKLMNPQRS-UVWP-\n '+\
' seq_2 ACDEFGHIKLMNPERSKUVWC-\n\n ;\nend;'
self.assertEqual(align_norm.toNexus('protein'), expect)
def test_getIntMap(self):
"""SequenceCollection.getIntMap should return correct mapping."""
aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
int_keys = {'seq_0':'seq1','seq_1':'seq2','seq_2':'seq3'}
int_map = {'seq_0':'ACGU','seq_1':'CGUA','seq_2':'CCGU'}
im,ik = aln.getIntMap()
self.assertEqual(ik,int_keys)
self.assertEqual(im,int_map)
#test change prefix from default 'seq_'
prefix='seqn_'
int_keys = {'seqn_0':'seq1','seqn_1':'seq2','seqn_2':'seq3'}
int_map = {'seqn_0':'ACGU','seqn_1':'CGUA','seqn_2':'CCGU'}
im,ik = aln.getIntMap(prefix=prefix)
self.assertEqual(ik,int_keys)
self.assertEqual(im,int_map)
def test_getNumSeqs(self):
"""SequenceCollection.getNumSeqs should count seqs."""
aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
self.assertEqual(aln.getNumSeqs(), 3)
def test_copyAnnotations(self):
"""SequenceCollection copyAnnotations should copy from seq objects"""
aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
seq_1 = Sequence('ACGU', Name='seq1')
seq_1.addFeature('xyz','abc', [(1,2)])
seq_5 = Sequence('ACGUAAAAAA', Name='seq5')
seq_5.addFeature('xyzzz','abc', [(1,2)])
annot = {'seq1': seq_1, 'seq5':seq_5}
aln.copyAnnotations(annot)
aln_seq_1 = aln.NamedSeqs['seq1']
if not hasattr(aln_seq_1, 'annotations'):
aln_seq_1 = aln_seq_1.data
aln_seq_2 = aln.NamedSeqs['seq2']
if not hasattr(aln_seq_2, 'annotations'):
aln_seq_2 = aln_seq_2.data
self.assertEqual(len(aln_seq_1.annotations), 1)
self.assertEqual(aln_seq_1.annotations[0].Name,'abc')
self.assertEqual(len(aln_seq_2.annotations), 0)
def test_annotateFromGff(self):
"""SequenceCollection.annotateFromGff should read gff features"""
aln = self.Class({'seq1':'ACGU','seq2':'CGUA','seq3':'CCGU'})
gff = [
['seq1', 'prog1', 'snp', '1', '2', '1.0', '+', '1','"abc"'],
['seq5', 'prog2', 'snp', '2', '3', '1.0', '+', '1','"yyy"'],
]
gff = map('\t'.join, gff)
aln.annotateFromGff(gff)
aln_seq_1 = aln.NamedSeqs['seq1']
if not hasattr(aln_seq_1, 'annotations'):
aln_seq_1 = aln_seq_1.data
aln_seq_2 = aln.NamedSeqs['seq2']
if not hasattr(aln_seq_2, 'annotations'):
aln_seq_2 = aln_seq_2.data
self.assertEqual(len(aln_seq_1.annotations), 1)
self.assertEqual(aln_seq_1.annotations[0].Name,'abc')
self.assertEqual(len(aln_seq_2.annotations), 0)
def test_replaceSeqs(self):
"""replaceSeqs should replace 1-letter w/ 3-letter seqs"""
a = Alignment({'seq1':'ACGU','seq2':'C-UA','seq3':'C---'})
seqs = {'seq1':'AAACCCGGGUUU','seq2':'CCCUUUAAA','seq3':'CCC'}
result = a.replaceSeqs(seqs)
self.assertEqual(result.toFasta(), \
">seq1\nAAACCCGGGUUU\n>seq2\nCCC---UUUAAA\n>seq3\nCCC---------")
def test_getGappedSeq(self):
"""SequenceCollection.getGappedSeq should return seq, with gaps"""
aln = self.Class({'seq1': '--TTT?', 'seq2': 'GATC??'})
self.assertEqual(str(aln.getGappedSeq('seq1')), '--TTT?')
def test_add(self):
"""__add__ should concatenate sequence data, by name"""
align1 = self.Class({'a': 'AAAA', 'b': 'TTTT', 'c': 'CCCC'})
align2 = self.Class({'a': 'GGGG', 'b': '----', 'c': 'NNNN'})
align = align1 + align2
concatdict = align.todict()
self.assertEqual(concatdict, {'a': 'AAAAGGGG', 'b': 'TTTT----', 'c': 'CCCCNNNN'})
def test_addSeqs(self):
"""addSeqs should return an alignment with the new sequences appended or inserted"""
data = [('name1', 'AAA'), ('name2', 'AAA'), ('name3', 'AAA'), ('name4', 'AAA')]
data1 = [('name1', 'AAA'), ('name2', 'AAA')]
data2 = [('name3', 'AAA'), ('name4', 'AAA')]
data3 = [('name5', 'BBB'), ('name6', 'CCC')]
aln = self.Class(data)
aln3 = self.Class(data3)
out_aln = aln.addSeqs(aln3)
self.assertEqual(str(out_aln), str(self.Class(data+data3))) #test append at the end
out_aln = aln.addSeqs(aln3, before_name='name3')
self.assertEqual(str(out_aln), str(self.Class(data1+data3+data2))) # test insert before
out_aln = aln.addSeqs(aln3, after_name='name2')
self.assertEqual(str(out_aln), str(self.Class(data1+data3+data2))) # test insert after
out_aln = aln.addSeqs(aln3, before_name='name1')
self.assertEqual(str(out_aln), str(self.Class(data3+data))) #test if insert before first seq works
out_aln = aln.addSeqs(aln3, after_name='name4')
self.assertEqual(str(out_aln), str(self.Class(data+data3))) #test if insert after last seq works
self.assertRaises(ValueError, aln.addSeqs, aln3, before_name='name5') #wrong after/before name
self.assertRaises(ValueError, aln.addSeqs, aln3, after_name='name5') #wrong after/before name
if isinstance(aln, Alignment) or isinstance(aln, DenseAlignment):
self.assertRaises((DataError, ValueError), aln.addSeqs, aln3+aln3)
else:
exp = set([seq for name, seq in data])
exp.update([seq+seq for name, seq in data3])
got = set()
for seq in aln.addSeqs(aln3+aln3).Seqs:
got.update([str(seq).strip()])
self.assertEqual(got, exp)
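# Illustrative sketch (hypothetical helper, not PyCogent's addSeqs): the
# ordering rules tested above, expressed on plain lists of (name, seq) pairs.
# New pairs go before before_name, after after_name, or at the end when
# neither is given; an unknown anchor name raises ValueError.
def sketch_add_seqs(pairs, new_pairs, before_name=None, after_name=None):
    """Return a new list with new_pairs inserted at the requested position."""
    names = [name for name, _ in pairs]
    anchor = before_name if before_name is not None else after_name
    if anchor is None:
        index = len(pairs)
    elif anchor not in names:
        raise ValueError('unknown sequence name: %r' % (anchor,))
    else:
        index = names.index(anchor) + (0 if before_name is not None else 1)
    return pairs[:index] + new_pairs + pairs[index:]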
def test_writeToFile(self):
"""SequenceCollection.writeToFile should write in correct format"""
aln = self.Class([('a','AAAA'),( 'b','TTTT'),('c','CCCC')])
fn = mktemp(suffix='.fasta')
aln.writeToFile(fn)
result = open(fn, 'U').read()
self.assertEqual(result, '>a\nAAAA\n>b\nTTTT\n>c\nCCCC\n')
remove(fn)
def test_len(self):
"""len(SequenceCollection) returns length of longest sequence"""
aln = self.Class([('a','AAAA'),( 'b','TTTT'),('c','CCCC')])
self.assertEqual(len(aln), 4)
def test_getTranslation(self):
"""SequenceCollection.getTranslation translates each seq"""
for seqs in [
{'seq1': 'GATTTT', 'seq2': 'GATC??'},
{'seq1': 'GAT---', 'seq2': '?GATCT'}]:
alignment = self.Class(data=seqs, MolType=DNA)
self.assertEqual(len(alignment.getTranslation()), 2)
# without a MolType, getTranslation may raise AttributeError; tolerate it
alignment = self.Class(data=seqs)
try:
peps = alignment.getTranslation()
except AttributeError:
pass
def test_getSeq(self):
"""SequenceCollection.getSeq should return specified seq"""
aln = self.Class({'seq1': 'GATTTT', 'seq2': 'GATC??'})
self.assertEqual(aln.getSeq('seq1'), 'GATTTT')
self.assertRaises(KeyError, aln.getSeq, 'seqx')
def test_todict(self):
"""SequenceCollection.todict should return dict of strings (not obj)"""
aln = self.Class({'seq1': 'GATTTT', 'seq2': 'GATC??'})
self.assertEqual(aln.todict(), {'seq1':'GATTTT','seq2':'GATC??'})
for i in aln.todict().values():
assert isinstance(i, str)
def test_getPerSequenceAmbiguousPositions(self):
"""SequenceCollection.getPerSequenceAmbiguousPositions should return pos"""
aln = self.Class({'s1':'ATGRY?','s2':'T-AG??'}, MolType=DNA)
self.assertEqual(aln.getPerSequenceAmbiguousPositions(), \
{'s2': {4: '?', 5: '?'}, 's1': {3: 'R', 4: 'Y', 5: '?'}})
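# Illustrative sketch (hypothetical helper): per-sequence ambiguous positions,
# assuming for DNA that only 'ACGT' and the gap '-' count as unambiguous. This
# matches the expectation above, where 'R', 'Y' and '?' are reported but '-'
# is not; the real method's handling of other characters is not shown here.
def sketch_ambiguous_positions(named_seqs, unambiguous='ACGT-'):
    """Return {name: {index: char}} for chars outside the unambiguous set."""
    result = {}
    for name, seq in named_seqs.items():
        result[name] = dict((i, c) for i, c in enumerate(seq)
                            if c not in unambiguous)
    return result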
def test_degap(self):
"""SequenceCollection.degap should strip gaps from each seq"""
aln = self.Class({'s1':'ATGRY?','s2':'T-AG??'}, MolType=DNA)
self.assertEqual(aln.degap(), {'s1':'ATGRY','s2':'TAG'})
def test_withModifiedTermini(self):
"""SequenceCollection.withModifiedTermini should code trailing gaps as ?"""
aln = self.Class({'s1':'AATGR--','s2':'-T-AG?-'}, MolType=DNA)
self.assertEqual(aln.withModifiedTermini(), \
{'s1':'AATGR??','s2':'?T-AG??'})
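# Illustrative sketch (hypothetical helper): recode terminal gaps as '?' while
# leaving internal gaps alone. Treating both '-' and '?' as gap characters at
# the ends is an assumption that reproduces the expected output above
# ('-T-AG?-' -> '?T-AG??').
def sketch_modified_termini(seq, gap_chars='-?', missing='?'):
    """Replace leading and trailing runs of gap_chars with missing."""
    lead = len(seq) - len(seq.lstrip(gap_chars))
    if lead == len(seq):
        # all-gap seq: every position is terminal
        return missing * len(seq)
    trail = len(seq) - len(seq.rstrip(gap_chars))
    return missing * lead + seq[lead:len(seq) - trail] + missing * trail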
def test_omitSeqsTemplate(self):
"""SequenceCollection.omitSeqsTemplate should return new aln of seqs well-aligned to template"""
aln = self.omitSeqsTemplate_aln
result = aln.omitSeqsTemplate('s3', 0.9, 5)
self.assertEqual(result, {'s3': 'UUCCUUCUU-UUC', \
's4': 'UU-UUUU-UUUUC'})
result2 = aln.omitSeqsTemplate('s4', 0.9, 4)
self.assertEqual(result2, {'s3': 'UUCCUUCUU-UUC', \
's4': 'UU-UUUU-UUUUC'})
result3 = aln.omitSeqsTemplate('s1', 0.9, 4)
self.assertEqual(result3, {'s2': 'UC------U---C', \
's1': 'UC-----CU---C', 's5': '-------------'})
result4 = aln.omitSeqsTemplate('s3', 0.5, 13)
self.assertEqual(result4, {'s3': 'UUCCUUCUU-UUC', \
's4': 'UU-UUUU-UUUUC'})
def test_make_gap_filter(self):
"""make_gap_filter returns f(seq) -> True if aligned ok w/ query"""
s1 = RnaSequence('UC-----CU---C')
s3 = RnaSequence('UUCCUUCUU-UUC')
s4 = RnaSequence('UU-UUUU-UUUUC')
#check that the behavior is ok for gap runs
f1 = make_gap_filter(s1, 0.9, 5)
f3 = make_gap_filter(s3, 0.9, 5)
#Should return False since s1 has gap run >= 5 with respect to s3
self.assertEqual(f3(s1), False)
#Should return False since s3 has an insertion run >= 5 to s1
self.assertEqual(f1(s3), False)
#Should return True since s4 does not have a long enough gap or ins run
self.assertEqual(f3(s4), True)
f3 = make_gap_filter(s3, 0.9, 6)
self.assertEqual(f3(s1), True)
#Check that behavior is ok for gap_fractions
f1 = make_gap_filter(s1, 0.5, 6)
f3 = make_gap_filter(s3, 0.5, 6)
#Should return False since 7 of 13 positions (~54%) differ in gap status
self.assertEqual(f3(s1), False)
self.assertEqual(f1(s3), False)
self.assertEqual(f3(s4), True)
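# Illustrative sketch (hypothetical helpers, not PyCogent's make_gap_filter):
# one rule consistent with every assertion in test_make_gap_filter compares
# the gap status of template and candidate position by position. A candidate
# is rejected if the fraction of mismatched positions exceeds gap_fraction, or
# if it has a run of gap_run or more consecutive mismatches. The exact
# boundary handling (<= vs <) is an assumption.
def gap_status_mismatches(template, seq, gap_chars='-'):
    """Return booleans: True where exactly one of the two seqs has a gap."""
    return [(a in gap_chars) != (b in gap_chars) for a, b in zip(template, seq)]

def longest_true_run(flags):
    """Length of the longest run of consecutive True values."""
    best = current = 0
    for flag in flags:
        current = current + 1 if flag else 0
        best = max(best, current)
    return best

def sketch_gap_filter(template, gap_fraction, gap_run):
    """Return f(seq) -> True if seq is acceptably aligned to template."""
    def accept(seq):
        mismatches = gap_status_mismatches(template, seq)
        if not mismatches:
            return True
        frac = sum(mismatches) / float(len(mismatches))
        return frac <= gap_fraction and longest_true_run(mismatches) < gap_run
    return accept
# e.g. 'UC-----CU---C' vs 'UUCCUUCUU-UUC': 7 of 13 positions mismatch and the
# longest mismatch run is 5, so sketch_gap_filter(s3, 0.9, 5) rejects s1.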
def test_omitGapSeqs(self):
"""SequenceCollection omitGapSeqs should return alignment w/o seqs with gaps"""
#check default params
self.assertEqual(self.gaps.omitGapSeqs(), self.gaps.omitGapSeqs(0))
#check for boundary effects
self.assertEqual(self.gaps.omitGapSeqs(-1), {})
self.assertEqual(self.gaps.omitGapSeqs(0), {'a':'AAAAAAA'})
self.assertEqual(self.gaps.omitGapSeqs(0.1), {'a':'AAAAAAA'})
self.assertEqual(self.gaps.omitGapSeqs(3.0/7 - 0.01), {'a':'AAAAAAA'})
self.assertEqual(self.gaps.omitGapSeqs(3.0/7), \
{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps.omitGapSeqs(3.0/7 + 0.01), \
{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps.omitGapSeqs(5.0/7 - 0.01), \
{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps.omitGapSeqs(5.0/7 + 0.01), self.gaps)
self.assertEqual(self.gaps.omitGapSeqs(0.99), self.gaps)
#check new object creation
self.assertNotSameObj(self.gaps.omitGapSeqs(0.99), self.gaps)
self.assertTrue(isinstance(self.gaps.omitGapSeqs(3.0/7),
SequenceCollection))
#repeat tests for object that supplies its own gaps
self.assertEqual(self.gaps_rna.omitGapSeqs(-1), {})
self.assertEqual(self.gaps_rna.omitGapSeqs(0), {'a':'AAAAAAA'})
self.assertEqual(self.gaps_rna.omitGapSeqs(0.1), {'a':'AAAAAAA'})
self.assertEqual(self.gaps_rna.omitGapSeqs(3.0/7 - 0.01), \
{'a':'AAAAAAA'})
self.assertEqual(self.gaps_rna.omitGapSeqs(3.0/7), \
{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps_rna.omitGapSeqs(3.0/7 + 0.01), \
{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps_rna.omitGapSeqs(5.0/7 - 0.01), \
{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps_rna.omitGapSeqs(5.0/7 + 0.01), self.gaps_rna)
self.assertEqual(self.gaps_rna.omitGapSeqs(0.99), self.gaps_rna)
self.assertNotSameObj(self.gaps_rna.omitGapSeqs(0.99), self.gaps_rna)
self.assertTrue(isinstance(self.gaps_rna.omitGapSeqs(3.0/7),
SequenceCollection))
def test_omitGapRuns(self):
"""SequenceCollection omitGapRuns should return alignment w/o runs of gaps"""
#negative value will still let through ungapped sequences
self.assertEqual(self.gaps.omitGapRuns(-5), {'a':'AAAAAAA'})
#test edge effects
self.assertEqual(self.gaps.omitGapRuns(0), {'a':'AAAAAAA'})
self.assertEqual(self.gaps.omitGapRuns(1), {'a':'AAAAAAA'})
self.assertEqual(self.gaps.omitGapRuns(2),{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps.omitGapRuns(3),{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps.omitGapRuns(4),{'a':'AAAAAAA','b':'A--A-AA'})
self.assertEqual(self.gaps.omitGapRuns(5), self.gaps)
self.assertEqual(self.gaps.omitGapRuns(6), self.gaps)
self.assertEqual(self.gaps.omitGapRuns(1000), self.gaps)
#test new object creation
self.assertNotSameObj(self.gaps.omitGapRuns(6), self.gaps)
self.assertTrue(isinstance(self.gaps.omitGapRuns(6),
SequenceCollection))
def test_consistent_gap_degen_handling(self):
"""gap degen character should be treated consistently"""
# the degen character '?' can be a gap, so when we strip gaps it should
# be gone too
raw_seq = "---??-??TC-GGCG-GCA-G-GC-?-C-TAN-GCGC-CCTC-AGGA?-???-??--"
raw_ungapped = re.sub("[-?]", "", raw_seq)
raw_no_ambigs = re.sub("[N?]+", "", raw_seq)
dna = DNA.makeSequence(raw_seq)
aln = self.Class(data=[("a", dna),("b", dna)])
expect = self.Class(data=[("a", raw_ungapped),("b", raw_ungapped)]).toFasta()
self.assertEqual(aln.degap().toFasta(), expect)
seqs = self.Class(data=[("a", dna),("b", dna)])
self.assertEqual(seqs.degap().toFasta(), expect)
def test_padSeqs(self):
"""SequenceCollection padSeqs should work on alignment."""
#pad to max length
padded1 = self.ragged_padded.padSeqs()
seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
self.assertEqual(map(str,seqs1),['AAAAAA', 'AAA---', 'AAAA--'])
#pad to alternate length
padded1 = self.ragged_padded.padSeqs(pad_length=10)
seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
self.assertEqual(map(str,seqs1),['AAAAAA----', 'AAA-------',\
'AAAA------'])
#assertRaises error when pad_length is less than max seq length
self.assertRaises(ValueError, self.ragged_padded.padSeqs, 5)
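
# Illustrative sketch only: _sketch_omit_gap_seqs and _sketch_omit_gap_runs are
# hypothetical helpers, not PyCogent API. Assuming '-' is the only gap
# character, they restate the keep/drop rules that the omitGapSeqs and
# omitGapRuns tests above expect:
#   * a sequence is kept when its gap fraction is <= allowed_gap_frac
#     (e.g. 'A--A-AA' has 3/7 gaps, so it survives at exactly 3.0/7);
#   * a sequence is kept when none of its gap runs is longer than allowed_run
#     (an ungapped sequence has no runs, so it passes even for negative values).
import re

def _sketch_omit_gap_seqs(seqs, allowed_gap_frac, gap='-'):
    """Return {name: seq} for seqs whose fraction of gaps is allowed."""
    return dict((name, seq) for name, seq in seqs.items()
                if seq.count(gap) / float(len(seq)) <= allowed_gap_frac)

def _sketch_omit_gap_runs(seqs, allowed_run, gap='-'):
    """Return {name: seq} for seqs with no gap run longer than allowed_run."""
    run_pattern = re.compile(re.escape(gap) + '+')
    # all() over an empty list is True, so ungapped seqs are always kept
    return dict((name, seq) for name, seq in seqs.items()
                if all(len(run) <= allowed_run
                       for run in run_pattern.findall(seq)))
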
class SequenceCollectionTests(SequenceCollectionBaseTests, TestCase):
"""Tests of the SequenceCollection object. Includes ragged collection tests.
Should not test alignment-specific features.
"""
def setUp(self):
"""Adds self.ragged for ragged collection tests."""
self.ragged = SequenceCollection({'a':'AAAAAA', 'b':'AAA', 'c':'AAAA'})
super(SequenceCollectionTests, self).setUp()
def test_SeqLen_get_ragged(self):
"""SequenceCollection SeqLen get should work for ragged seqs"""
self.assertEqual(self.ragged.SeqLen, 6)
def test_isRagged_ragged(self):
"""SequenceCollection isRagged should return True if ragged"""
self.assertTrue(self.ragged.isRagged())
def test_Seqs_ragged(self):
"""SequenceCollection Seqs should work on ragged alignment"""
self.ragged.Names = 'bac'
self.assertEqual(list(self.ragged.Seqs), ['AAA', 'AAAAAA', 'AAAA'])
def test_iterSeqs_ragged(self):
"""SequenceCollection iterSeqs() method should support reordering of seqs"""
self.ragged.Names = ['a','b','c']
seqs = list(self.ragged.iterSeqs())
self.assertEqual(seqs, ['AAAAAA', 'AAA', 'AAAA'])
seqs = list(self.ragged.iterSeqs(seq_order=['b','a','a']))
self.assertEqual(seqs, ['AAA', 'AAAAAA', 'AAAAAA'])
self.assertSameObj(seqs[1], seqs[2])
self.assertSameObj(seqs[0], self.ragged.NamedSeqs['b'])
def test_toPHYLIP_ragged(self):
"""SequenceCollection should refuse to convert ragged seqs to phylip"""
align_rag = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
'ACDEFGHIKLMNPQRSUUVWF-',
'ACDEFGHIKLMNPERSKUVWC-',
'ACNEFGHIKLMNUVWP-',
])
self.assertRaises(ValueError, align_rag.toPhylip)
def test_padSeqs_ragged(self):
"""SequenceCollection padSeqs should work on ragged alignment."""
#pad to max length
padded1 = self.ragged.padSeqs()
seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
self.assertEqual(map(str,seqs1),['AAAAAA', 'AAA---', 'AAAA--'])
#pad to alternate length
padded1 = self.ragged.padSeqs(pad_length=10)
seqs1 = list(padded1.iterSeqs(seq_order=['a','b','c']))
self.assertEqual(map(str,seqs1),['AAAAAA----', 'AAA-------',\
'AAAA------'])
#assertRaises error when pad_length is less than max seq length
self.assertRaises(ValueError, self.ragged.padSeqs, 5)
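
# Illustrative sketch only: _sketch_pad_seqs is a hypothetical helper, not
# PyCogent API. It mirrors what the padSeqs tests above expect: every sequence
# is right-padded with '-' to the longest sequence length (or to pad_length if
# one is given), and a pad_length shorter than the longest sequence raises
# ValueError.
def _sketch_pad_seqs(seqs, pad_length=None, gap='-'):
    """Return {name: seq} with every seq right-padded to a common length."""
    longest = max(len(seq) for seq in seqs.values())
    if pad_length is None:
        pad_length = longest
    elif pad_length < longest:
        raise ValueError("pad_length %d is shorter than the longest sequence (%d)"
                         % (pad_length, longest))
    # str.ljust pads on the right with the given fill character
    return dict((name, seq.ljust(pad_length, gap)) for name, seq in seqs.items())
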
class AlignmentBaseTests(SequenceCollectionBaseTests):
"""Tests of basic Alignment functionality. All Alignments should pass these.
Note that this is not a TestCase: need to subclass to test each specific
type of Alignment. Set the Class attribute of each subclass to the
alignment class under test (e.g. Class = DenseAlignment).
"""
def test_Positions(self):
"""SequenceCollection Positions property should iterate over positions, using self.Names"""
r = self.Class({'a':'AAAAAA','b':'AAA---','c':'AAAA--'})
r.Names = ['a','b','c']
self.assertEqual(list(r.Positions), map(list, \
['AAA','AAA','AAA', 'A-A', 'A--', 'A--']))
def test_iterPositions(self):
"""SequenceCollection iterPositions() method should support reordering of cols"""
r = self.Class(self.ragged_padded.NamedSeqs, Names=['c','b'])
self.assertEqual(list(r.iterPositions(pos_order=[5,1,3])),\
map(list,['--','AA','A-']))
#reorder names
r = self.Class(self.ragged_padded.NamedSeqs, Names=['a','b','c'])
cols = list(r.iterPositions())
self.assertEqual(cols, map(list, ['AAA','AAA','AAA','A-A','A--','A--']))
def test_takePositions(self):
"""SequenceCollection takePositions should return new alignment w/ specified pos"""
self.assertEqual(self.gaps.takePositions([5,4,0], \
seq_constructor=coerce_to_string), \
{'a':'AAA','b':'A-A','c':'--A'})
self.assertTrue(isinstance(self.gaps.takePositions([0]),
SequenceCollection))
#should be able to negate
self.assertEqual(self.gaps.takePositions([5,4,0], negate=True, \
seq_constructor=coerce_to_string),
{'a':'AAAA','b':'--AA','c':'A---'})
def test_getPositionIndices(self):
"""SequenceCollection getPositionIndices should return names of cols where f(col)"""
gap_1st = lambda x: x[0] == '-'
gap_2nd = lambda x: x[1] == '-'
gap_3rd = lambda x: x[2] == '-'
is_list = lambda x: isinstance(x, list)
self.gaps = self.Class(self.gaps.NamedSeqs, Names=['a','b','c'])
self.assertEqual(self.gaps.getPositionIndices(gap_1st), [])
self.assertEqual(self.gaps.getPositionIndices(gap_2nd), [1,2,4])
self.assertEqual(self.gaps.getPositionIndices(gap_3rd), [2,3,4,5,6])
self.assertEqual(self.gaps.getPositionIndices(is_list), [0,1,2,3,4,5,6])
#should be able to negate
self.assertEqual(self.gaps.getPositionIndices(gap_2nd, negate=True), \
[0,3,5,6])
self.assertEqual(self.gaps.getPositionIndices(gap_1st, negate=True), \
[0,1,2,3,4,5,6])
self.assertEqual(self.gaps.getPositionIndices(is_list, negate=True), [])
def test_takePositionsIf(self):
"""SequenceCollection takePositionsIf should return cols where f(col) is True"""
gap_1st = lambda x: x[0] == '-'
gap_2nd = lambda x: x[1] == '-'
gap_3rd = lambda x: x[2] == '-'
is_list = lambda x: isinstance(x, list)
self.gaps.Names = 'abc'
self.assertEqual(self.gaps.takePositionsIf(gap_1st,seq_constructor=coerce_to_string),\
{'a':'', 'b':'', 'c':''})
self.assertEqual(self.gaps.takePositionsIf(gap_2nd,seq_constructor=coerce_to_string),\
{'a':'AAA', 'b':'---', 'c':'A--'})
self.assertEqual(self.gaps.takePositionsIf(gap_3rd,seq_constructor=coerce_to_string),\
{'a':'AAAAA', 'b':'-A-AA', 'c':'-----'})
self.assertEqual(self.gaps.takePositionsIf(is_list,seq_constructor=coerce_to_string),\
self.gaps)
self.assertTrue(isinstance(self.gaps.takePositionsIf(gap_1st),
SequenceCollection))
#should be able to negate
self.assertEqual(self.gaps.takePositionsIf(gap_1st, seq_constructor=coerce_to_string,\
negate=True), self.gaps)
self.assertEqual(self.gaps.takePositionsIf(gap_2nd, seq_constructor=coerce_to_string,\
negate=True), {'a':'AAAA','b':'AAAA','c':'A---'})
self.assertEqual(self.gaps.takePositionsIf(gap_3rd, seq_constructor=coerce_to_string,\
negate=True), {'a':'AA','b':'A-','c':'AA'})
def test_omitGapPositions(self):
"""SequenceCollection omitGapPositions should return alignment w/o positions of gaps"""
aln = self.end_gaps
#first, check behavior when we're just acting on the cols (and not
#trying to delete the naughty seqs).
#default should strip out cols that are 100% gaps
self.assertEqual(aln.omitGapPositions(seq_constructor=coerce_to_string), \
{'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})
#if allowed_gap_frac is 1, shouldn't delete anything
self.assertEqual(aln.omitGapPositions(1, seq_constructor=coerce_to_string), \
{'a':'--A-BC-', 'b':'-CB-A--', 'c':'--D-EF-'})
#if allowed_gap_frac is 0, should strip out any cols containing gaps
self.assertEqual(aln.omitGapPositions(0, seq_constructor=coerce_to_string), \
{'a':'AB', 'b':'BA', 'c':'DE'})
#intermediate numbers should work as expected
self.assertEqual(aln.omitGapPositions(0.4, seq_constructor=coerce_to_string), \
{'a':'ABC', 'b':'BA-', 'c':'DEF'})
self.assertEqual(aln.omitGapPositions(0.7, seq_constructor=coerce_to_string), \
{'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})
#second, need to check behavior when the naughty seqs should be
#deleted as well.
#default should strip out cols that are 100% gaps
self.assertEqual(aln.omitGapPositions(seq_constructor=coerce_to_string, \
del_seqs=True), {'a':'-ABC', 'b':'CBA-', 'c':'-DEF'})
#if allowed_gap_frac is 1, shouldn't delete anything
self.assertEqual(aln.omitGapPositions(1, seq_constructor=coerce_to_string, \
del_seqs=True), {'a':'--A-BC-', 'b':'-CB-A--', 'c':'--D-EF-'})
#if allowed_gap_frac is 0, should strip out any cols containing gaps
self.assertEqual(aln.omitGapPositions(0, seq_constructor=coerce_to_string, \
del_seqs=True), {}) #everything has at least one naughty non-gap
#intermediate numbers should work as expected
self.assertEqual(aln.omitGapPositions(0.4, seq_constructor=coerce_to_string,
del_seqs=True), {'a':'ABC', 'c':'DEF'}) #b has a naughty non-gap
#check that b is not deleted if allowed_frac_bad_cols is higher than 0.14
self.assertEqual(aln.omitGapPositions(0.4, seq_constructor=coerce_to_string,
del_seqs=True, allowed_frac_bad_cols=0.2), \
{'a':'ABC', 'b':'BA-','c':'DEF'})
self.assertEqual(aln.omitGapPositions(0.4, seq_constructor=coerce_to_string,
del_seqs=True), {'a':'ABC', 'c':'DEF'}) #b has a naughty non-gap
self.assertEqual(aln.omitGapPositions(0.7, seq_constructor=coerce_to_string,
del_seqs=True), {'a':'-ABC', 'b':'CBA-', 'c':'-DEF'}) #all ok
#when we increase the number of sequences to 6, more differences
#start to appear.
new_aln_data = aln.NamedSeqs.copy()
new_aln_data['d'] = '-------'
new_aln_data['e'] = 'XYZXYZX'
new_aln_data['f'] = 'AB-CDEF'
aln = self.Class(new_aln_data)
#if no gaps are allowed, everything is deleted...
result = aln.omitGapPositions(seq_constructor=coerce_to_string)
self.assertEqual(aln.omitGapPositions(0, del_seqs=False), \
{'a':'', 'b':'', 'c':'', 'd':'', 'e':'', 'f':''})
#...though not a sequence that's all gaps, since it has no positions
#that are not gaps. This 'feature' should possibly be considered a bug.
self.assertEqual(aln.omitGapPositions(0, del_seqs=True), {'d':''})
#if we're deleting only full positions of gaps, del_seqs does nothing.
self.assertEqual(aln.omitGapPositions(del_seqs=True, \
seq_constructor=coerce_to_string), aln)
#at 50%, should delete a bunch of minority sequences
self.assertEqual(aln.omitGapPositions(0.5, del_seqs=True, \
seq_constructor=coerce_to_string), \
{'a':'-ABC','b':'CBA-','c':'-DEF','d':'----'})
#shouldn't depend on order of seqs
aln.Names = 'fadbec'
self.assertEqual(aln.omitGapPositions(0.5, del_seqs=True, \
seq_constructor=coerce_to_string), \
{'a':'-ABC','b':'CBA-','c':'-DEF','d':'----'})
def test_IUPACConsensus_RNA(self):
"""SequenceCollection IUPACConsensus should use RNA IUPAC symbols correctly"""
alignmentUpper = self.Class( ['UCAGN-UCAGN-UCAGN-UCAGAGCAUN-',
'UUCCAAGGNN--UUCCAAGGNNAGCAG--',
'UUCCAAGGNN--UUCCAAGGNNAGCUA--',
'UUUUCCCCAAAAGGGGNNNN--AGCUA--',
'UUUUCCCCAAAAGGGGNNNN--AGCUA--',
], MolType=RNA)
#following IUPAC consensus calculated by hand
#Test all upper
self.assertEqual(alignmentUpper.IUPACConsensus(),
'UYHBN?BSNN??KBVSN?NN??AGCWD?-')
def test_IUPACConsensus_DNA(self):
"""SequenceCollection IUPACConsensus should use DNA IUPAC symbols correctly"""
alignmentUpper = self.Class( ['TCAGN-TCAGN-TCAGN-TCAGAGCATN-',
'TTCCAAGGNN--TTCCAAGGNNAGCAG--',
'TTCCAAGGNN--TTCCAAGGNNAGCTA--',
'TTTTCCCCAAAAGGGGNNNN--AGCTA--',
'TTTTCCCCAAAAGGGGNNNN--AGCTA--',
])
#following IUPAC consensus calculated by hand
#Test all upper
self.assertEqual(alignmentUpper.IUPACConsensus(DNA),
'TYHBN?BSNN??KBVSN?NN??AGCWD?-')
def test_IUPACConsensus_Protein(self):
"""SequenceCollection IUPACConsensus should use protein IUPAC symbols correctly"""
alignmentUpper = self.Class( ['ACDEFGHIKLMNPQRSTUVWY-',
'ACDEFGHIKLMNPQRSUUVWF-',
'ACDEFGHIKLMNPERSKUVWC-',
'ACNEFGHIKLMNPQRS-UVWP-',
])
#following IUPAC consensus calculated by hand
#Test all upper
self.assertEqual(alignmentUpper.IUPACConsensus(PROTEIN),
'ACBEFGHIKLMNPZRS?UVWX-')
def test_isRagged(self):
"""SequenceCollection isRagged should return False for non-ragged alignments"""
self.assertFalse(self.identical.isRagged())
self.assertFalse(self.gaps.isRagged())
def test_columnProbs(self):
"""SequenceCollection.columnProbs should find Pr(symbol) in each column"""
#make an alignment with 4 seqs (easy to calculate probabilities)
align = self.Class(["AAA", "ACA", "GGG", "GUC"])
cp = align.columnProbs()
#check that the column probs match the counts we expect
self.assertEqual(cp, map(Freqs, [
{'A':0.5, 'G':0.5},
{'A':0.25, 'C':0.25, 'G':0.25, 'U':0.25},
{'A':0.5, 'G':0.25, 'C':0.25},
]))
def test_majorityConsensus(self):
"""SequenceCollection.majorityConsensus should return commonest symbol per column"""
#Check the exact strings expected from string transform
self.assertEqual(self.sequences.majorityConsensus(str), 'UCAG')
self.assertEqual(self.structures.majorityConsensus(str), '(.....')
def test_uncertainties(self):
"""SequenceCollection.uncertainties should match hand-calculated values"""
aln = self.Class(['ABC', 'AXC'])
obs = aln.uncertainties()
self.assertFloatEqual(obs, [0, 1, 0])
#check what happens with only one input sequence
aln = self.Class(['ABC'])
obs = aln.uncertainties()
self.assertFloatEqual(obs, [0, 0, 0])
#check that we can screen out bad items OK
aln = self.Class(['ABC', 'DEF', 'GHI', 'JKL', '333'], MolType=BYTES)
obs = aln.uncertainties('ABCDEFGHIJKLMNOP')
self.assertFloatEqual(obs, [2.0] * 3)
def test_columnFreqs(self):
"""Alignment.columnFreqs should count symbols in each column"""
#calculate by hand what the first and last positions should look like in
#each case
firstvalues = [
[self.sequences, Freqs('UUU')],
[self.structures, Freqs('(.(')],
]
lastvalues = [
[self.sequences, Freqs('GGG')],
[self.structures, Freqs('..)')],
]
#check that the first positions are what we expected
for obj, result in firstvalues:
freqs = obj.columnFreqs()
self.assertEqual(str(freqs[0]), str(result))
#check that the last positions are what we expected
for obj, result in lastvalues:
freqs = obj.columnFreqs()
self.assertEqual(str(freqs[-1]), str(result))
def test_scoreMatrix(self):
"""Alignment scoreMatrix should produce position specific score matrix."""
scoreMatrix = {
0:{'A':1.0,'C':1.0,'U':5.0},
1:{'C':6.0,'U':1.0},
2:{'A':3.0,'C':2.0,'G':2.0},
3:{'A':3.0,'G':4.0},
4:{'C':1.0,'G':1.0,'U':5.0},
5:{'C':6.0,'U':1.0},
6:{'A':3.0,'G':4.0},
7:{'A':1.0,'G':6.0},
8:{'A':1.0,'C':1.0,'G':1.0,'U':4.0},
9:{'A':1.0,'C':2.0,'U':4.0},
}
self.assertEqual(self.many.scoreMatrix(), scoreMatrix)
def test_sample(self):
"""Alignment.sample should permute alignment by default"""
alignment = self.Class({'seq1': 'ABCDEFGHIJKLMNOP',
'seq2': 'ABCDEFGHIJKLMNOP'})
# effectively permute columns, preserving length
shuffled = alignment.sample()
# ensure length correct
sample = alignment.sample(10)
self.assertEqual(len(sample), 10)
# test columns alignment preserved
seqs = sample.todict().values()
self.assertEqual(seqs[0], seqs[1])
# ensure each char occurs once as sampling without replacement
for char in seqs[0]:
self.assertEqual(seqs[0].count(char), 1)
def test_sample_with_replacement(self):
#test with replacement -- just verify that it runs
alignment = self.Class({'seq1': 'gatc', 'seq2': 'gatc'})
sample = alignment.sample(1000, with_replacement=True)
self.assertEqual(len(sample), 1000)
# ensure that sampling with replacement works on single col alignment
alignment1 = self.Class({'seq1': 'A',
'seq2': 'A'})
result = alignment1.sample(with_replacement=True)
self.assertEqual(len(result), 1)
def test_sample_tuples(self):
##### test with motif size != 1 #####
alignment = self.Class({'seq1': 'AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP',
'seq2': 'AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPP'})
shuffled = alignment.sample(motif_length=2)
# ensure length correct
sample = alignment.sample(10,motif_length=2)
self.assertEqual(len(sample), 20)
# test columns alignment preserved
seqs = sample.todict().values()
self.assertEqual(seqs[0], seqs[1])
# ensure each char occurs twice as sampling dinucs without replacement
for char in seqs[0]:
self.assertEqual(seqs[0].count(char), 2)
def test_copy(self):
"""correctly copy an alignment"""
aln = self.Class(data=[('a', 'AC-GT'), ('b', 'ACCGT')])
copied = aln.copy()
self.assertEqual(type(aln), type(copied))
self.assertEqual(aln.todict(), copied.todict())
self.assertEqual(id(aln.MolType), id(copied.MolType))
aln = self.Class(data=[('a', 'AC-GT'), ('b', 'ACCGT')],
Info={'check': True})
copied = aln.copy()
self.assertEqual(aln.Info, copied.Info)
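
# Illustrative sketch only: _sketch_column_uncertainties is a hypothetical
# helper, not PyCogent API. It restates the arithmetic behind the
# test_uncertainties expectations above: each column's uncertainty is taken to
# be its Shannon entropy in bits, H = -sum(p * log2(p)), over the symbols in
# that column, optionally ignoring symbols not listed in `allowed`. For
# ['ABC', 'AXC'] the middle column holds B and X at p = 0.5 each, so H = 1.
import math

def _sketch_column_uncertainties(seqs, allowed=None):
    """Return the Shannon entropy (bits) of each column of equal-length seqs."""
    result = []
    for column in zip(*seqs):
        if allowed is not None:
            column = [c for c in column if c in allowed]
        counts = {}
        for c in column:
            counts[c] = counts.get(c, 0) + 1
        total = float(sum(counts.values()))
        entropy = 0.0
        for n in counts.values():
            p = n / total
            entropy -= p * math.log(p, 2)
        result.append(entropy)
    return result
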
class DenseAlignmentTests(AlignmentBaseTests, TestCase):
Class = DenseAlignment
def test_get_freqs(self):
"""DenseAlignment _get_freqs: should work on positions and sequences
"""
s1 = DNA.Sequence('TCAG', Name='s1')
s2 = DNA.Sequence('CCAC', Name='s2')
s3 = DNA.Sequence('AGAT', Name='s3')
da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
seq_exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]])
pos_exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]])
self.assertEqual(da._get_freqs(index=1), pos_exp)
self.assertEqual(da._get_freqs(index=0), seq_exp)
def test_getSeqFreqs(self):
"""DenseAlignment getSeqFreqs: should work with DnaSequences and strings
"""
exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]])
s1 = DNA.Sequence('TCAG', Name='s1')
s2 = DNA.Sequence('CCAC', Name='s2')
s3 = DNA.Sequence('AGAT', Name='s3')
da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = da.getSeqFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
s1 = 'TCAG'
s2 = 'CCAC'
s3 = 'AGAT'
da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = da.getSeqFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
def test_getPosFreqs_sequence(self):
"""DenseAlignment getPosFreqs: should work with DnaSequences and strings
"""
exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]])
s1 = DNA.Sequence('TCAG', Name='s1')
s2 = DNA.Sequence('CCAC', Name='s2')
s3 = DNA.Sequence('AGAT', Name='s3')
da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = da.getPosFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
s1 = 'TCAG'
s2 = 'CCAC'
s3 = 'AGAT'
da = DenseAlignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = da.getPosFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
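
# Illustrative sketch only: _sketch_freq_counts is a hypothetical helper, not
# PyCogent API. It shows how the expected arrays in the getSeqFreqs and
# getPosFreqs tests above can be derived by hand: count each symbol in a fixed
# order ('TCAG' here) either per sequence (one row per seq) or per position
# (one row per alignment column).
def _sketch_freq_counts(seqs, order='TCAG', by_position=False):
    """Return a list of count rows, one per sequence or per position."""
    # zip(*seqs) yields the alignment columns as tuples of characters
    units = zip(*seqs) if by_position else seqs
    return [[list(unit).count(symbol) for symbol in order] for unit in units]
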
class AlignmentTests(AlignmentBaseTests, TestCase):
Class = Alignment
def test_get_freqs(self):
"""Alignment _get_freqs: should work on positions and sequences
"""
s1 = DNA.Sequence('TCAG', Name='s1')
s2 = DNA.Sequence('CCAC', Name='s2')
s3 = DNA.Sequence('AGAT', Name='s3')
aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
seq_exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]])
pos_exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]])
self.assertEqual(aln._get_freqs(index=1), pos_exp)
self.assertEqual(aln._get_freqs(index=0), seq_exp)
def test_getSeqFreqs(self):
"""Alignment getSeqFreqs: should work with DnaSequences and strings
"""
exp = array([[1,1,1,1],[0,3,1,0],[1,0,2,1]])
s1 = DNA.Sequence('TCAG', Name='s1')
s2 = DNA.Sequence('CCAC', Name='s2')
s3 = DNA.Sequence('AGAT', Name='s3')
aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = aln.getSeqFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
s1 = 'TCAG'
s2 = 'CCAC'
s3 = 'AGAT'
aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = aln.getSeqFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
def test_getPosFreqs(self):
"""Alignment getPosFreqs: should work with DnaSequences and strings
"""
exp = array([[1,1,1,0],[0,2,0,1],[0,0,3,0],[1,1,0,1]])
s1 = DNA.Sequence('TCAG', Name='s1')
s2 = DNA.Sequence('CCAC', Name='s2')
s3 = DNA.Sequence('AGAT', Name='s3')
aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = aln.getPosFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
s1 = 'TCAG'
s2 = 'CCAC'
s3 = 'AGAT'
aln = Alignment([s1,s2,s3], MolType=DNA, Alphabet=DNA.Alphabet)
obs = aln.getPosFreqs()
self.assertEqual(obs.Data, exp)
self.assertEqual(obs.Alphabet, DNA.Alphabet)
self.assertEqual(obs.CharOrder, list("TCAG"))
def make_and_filter(self, raw, expected, motif_length):
# a simple filter func
func = lambda x: re.findall("[-N?]", " ".join(x)) == []
aln = self.Class(raw)
result = aln.filtered(func,motif_length=motif_length,log_warnings=False)
self.assertEqual(result.todict(), expected)
def test_filtered(self):
"""filtered should return new alignment with positions consistent with
provided callback function"""
# a simple filter option
raw = {'a':'ACGACGACG',
'b':'CCC---CCC',
'c':'AAAA--AAA'}
self.make_and_filter(raw, {'a':'ACGACG','b':'CCCCCC','c':'AAAAAA'}, 1)
# check with motif_length = 2
self.make_and_filter(raw, {'a':'ACAC','b':'CCCC','c':'AAAA'}, 2)
# check with motif_length = 3
self.make_and_filter(raw, {'a':'ACGACG','b':'CCCCCC','c':'AAAAAA'}, 3)
def test_slidingWindows(self):
"""slidingWindows should return slices of alignments."""
alignment = self.Class({'seq1': 'ACGTACGT', 'seq2': 'ACGTACGT', 'seq3': 'ACGTACGT'})
result = []
for bit in alignment.slidingWindows(5,2):
result+=[bit]
self.assertEqual(result[0].todict(), {'seq3': 'ACGTA', 'seq2': 'ACGTA', 'seq1': 'ACGTA'})
self.assertEqual(result[1].todict(), {'seq3': 'GTACG', 'seq2': 'GTACG', 'seq1': 'GTACG'})
result = []
for bit in alignment.slidingWindows(5,1):
result+=[bit]
self.assertEqual(result[0].todict(), {'seq3': 'ACGTA', 'seq2': 'ACGTA', 'seq1': 'ACGTA'})
self.assertEqual(result[1].todict(), {'seq3': 'CGTAC', 'seq2': 'CGTAC', 'seq1': 'CGTAC'})
self.assertEqual(result[2].todict(), {'seq3': 'GTACG', 'seq2': 'GTACG', 'seq1': 'GTACG'})
self.assertEqual(result[3].todict(), {'seq3': 'TACGT', 'seq2': 'TACGT', 'seq1': 'TACGT'})
def test_withGapsFrom(self):
"""withGapsFrom should overwrite with gaps."""
gapless = self.Class({'seq1': 'TCG', 'seq2': 'TCG'})
pregapped = self.Class({'seq1': '-CG', 'seq2': 'TCG'})
template = self.Class({'seq1': 'A-?', 'seq2': 'ACG'})
r1 = gapless.withGapsFrom(template).todict()
r2 = pregapped.withGapsFrom(template).todict()
self.assertEqual(r1, {'seq1': 'T-G', 'seq2': 'TCG'})
self.assertEqual(r2, {'seq1': '--G', 'seq2': 'TCG'})
def test_getDegappedRelativeTo(self):
"""should remove all columns with a gap in sequence with given name"""
aln = self.Class([
['name1', '-AC-DEFGHI---'],
['name2', 'XXXXXX--XXXXX'],
['name3', 'YYYY-YYYYYYYY'],
['name4', '-KL---MNPR---'],
])
out_aln = self.Class([
['name1', 'ACDEFGHI'],
['name2', 'XXXX--XX'],
['name3', 'YY-YYYYY'],
['name4', 'KL--MNPR'],
])
self.assertEqual(aln.getDegappedRelativeTo('name1'), out_aln)
self.assertRaises(ValueError, aln.getDegappedRelativeTo, 'nameX')
def test_addFromReferenceAln(self):
"""should add or insert seqs based on align to reference"""
aln1 = self.Class([
['name1', '-AC-DEFGHI---'],
['name2', 'XXXXXX--XXXXX'],
['name3', 'YYYY-YYYYYYYY'],
])
aln2 = self.Class([
['name1', 'ACDEFGHI'],
['name4', 'KL--MNPR'],
['name5', 'KLACMNPR'],
['name6', 'KL--MNPR'],
])
aligned_to_ref_out_aln_inserted = self.Class([
['name1', '-AC-DEFGHI---'],
['name4', '-KL---MNPR---'],
['name5', '-KL-ACMNPR---'],
['name6', '-KL---MNPR---'],
['name2', 'XXXXXX--XXXXX'],
['name3', 'YYYY-YYYYYYYY'],
])
aln2_wrong_refseq = self.Class((
('name1', 'ACDXFGHI'),
('name4', 'KL--MNPR'),
))
aln2_wrong_refseq_name = self.Class([
['nameY', 'ACDEFGHI'],
['name4', 'KL--MNPR'],
])
aln2_different_aln_class = DenseAlignment([
['name1', 'ACDEFGHI'],
['name4', 'KL--MNPR'],
])
aln2_list = [
['name1', 'ACDEFGHI'],
['name4', 'KL--MNPR'],
]
aligned_to_ref_out_aln = self.Class([
['name1', '-AC-DEFGHI---'],
['name2', 'XXXXXX--XXXXX'],
['name3', 'YYYY-YYYYYYYY'],
['name4', '-KL---MNPR---'],
])
out_aln = aln1.addFromReferenceAln(aln2, after_name='name1')
self.assertEqual(str(aligned_to_ref_out_aln_inserted),
str(out_aln)) #test insert_after
out_aln = aln1.addFromReferenceAln(aln2, before_name='name2')
self.assertEqual(aligned_to_ref_out_aln_inserted,
out_aln) #test insert_before
self.assertRaises(ValueError, aln1.addFromReferenceAln,
aln2_wrong_refseq_name) #test wrong_refseq_name
aln = aln1.addFromReferenceAln(aln2_different_aln_class)
self.assertEqual(aligned_to_ref_out_aln,
aln) #test_align_to_refseq_different_aln_class
aln = aln1.addFromReferenceAln(aln2_list)
self.assertEqual(aligned_to_ref_out_aln,
aln) #test from_list
self.assertRaises(ValueError, aln1.addFromReferenceAln,
aln2_wrong_refseq) #test wrong_refseq
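
# Illustrative sketch only: _sketch_filter_motifs and
# _sketch_no_gaps_or_ambiguity are hypothetical helpers, not PyCogent API.
# They reproduce the expectations of test_filtered above under the assumption
# that the alignment is cut into consecutive blocks of motif_length columns,
# a trailing partial block is discarded, and a block is kept only when the
# predicate accepts it. With motif_length=2 on 9 columns, blocks start at
# 0, 2, 4, 6 and the 9th column is dropped.
import re

def _sketch_filter_motifs(seqs, predicate, motif_length=1):
    """Return {name: seq} keeping only column blocks accepted by predicate."""
    names = list(seqs)
    length = len(seqs[names[0]])
    kept = dict((name, []) for name in names)
    # stop before a trailing block shorter than motif_length
    for start in range(0, length - motif_length + 1, motif_length):
        block = [seqs[name][start:start + motif_length] for name in names]
        if predicate(block):
            for name, motif in zip(names, block):
                kept[name].append(motif)
    return dict((name, ''.join(parts)) for name, parts in kept.items())

def _sketch_no_gaps_or_ambiguity(block):
    """Same check as make_and_filter: reject blocks containing '-', 'N' or '?'."""
    return re.findall("[-N?]", " ".join(block)) == []
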
class DenseAlignmentSpecificTests(TestCase):
"""Tests of the DenseAlignment object and its methods"""
def setUp(self):
"""Define some standard alignments."""
self.a = DenseAlignment(array([[0,1,2],[3,4,5]]), \
conversion_f=aln_from_array)
self.a2 = DenseAlignment(['ABC','DEF'], Names=['x','y'])
class ABModelSequence(ModelSequence):
Alphabet = AB.Alphabet
self.ABModelSequence = ABModelSequence
self.a = DenseAlignment(map(ABModelSequence, ['abaa','abbb']), \
Alphabet=AB.Alphabet)
self.b = Alignment(['ABC','DEF'])
self.c = SequenceCollection(['ABC','DEF'])
def test_init(self):
"""DenseAlignment init should work from a sequence"""
a = DenseAlignment(array([[0,1,2],[3,4,5]]), conversion_f=aln_from_array)
self.assertEqual(a.SeqData, array([[0,3],[1,4],[2,5]], 'B'))
self.assertEqual(a.ArrayPositions, array([[0,1,2],[3,4,5]], 'B'))
self.assertEqual(a.Names, ['seq_0','seq_1','seq_2'])
def test_guess_input_type(self):
"""DenseAlignment _guess_input_type should figure out data type correctly"""
git = self.a._guess_input_type
self.assertEqual(git(self.a), 'dense_aln')
self.assertEqual(git(self.b), 'aln')
self.assertEqual(git(self.c), 'collection')
self.assertEqual(git('>ab\nabc'), 'fasta')
self.assertEqual(git(['>ab','abc']), 'fasta')
self.assertEqual(git(['abc','def']), 'generic')
self.assertEqual(git([[1,2],[4,5]]), 'kv_pairs') #precedence over generic
self.assertEqual(git([[1,2,3],[4,5,6]]), 'generic')
self.assertEqual(git([ModelSequence('abc')]), 'model_seqs')
self.assertEqual(git(array([[1,2,3],[4,5,6]])), 'array')
self.assertEqual(git({'a':'aca'}), 'dict')
self.assertEqual(git([]), 'empty')
def test_init_seqs(self):
"""DenseAlignment init should work from ModelSequence objects."""
s = map(ModelSequence, ['abc','def'])
a = DenseAlignment(s)
self.assertEqual(a.SeqData, array(['abc','def'], 'c').view('B'))
def test_init_generic(self):
"""DenseAlignment init should work from generic objects."""
s = ['abc','def']
a = DenseAlignment(s)
self.assertEqual(a.SeqData, array(['abc','def'], 'c').view('B'))
def test_init_aln(self):
"""DenseAlignment init should work from another alignment."""
s = ['abc','def']
a = DenseAlignment(s)
b = DenseAlignment(a)
self.assertNotSameObj(a.SeqData, b.SeqData)
self.assertEqual(b.SeqData, array(['abc','def'], 'c').view('B'))
def test_init_dict(self):
"""DenseAlignment init should work from dict."""
s = {'abc':'aaaccc','xyz':'gcgcgc'}
a = DenseAlignment(s)
self.assertEqual(a.SeqData, array(['aaaccc','gcgcgc'], 'c').view('B'))
self.assertEqual(tuple(a.Names), ('abc','xyz'))
def test_init_empty(self):
"""DenseAlignment init should fail if empty."""
self.assertRaises(TypeError, DenseAlignment)
self.assertRaises(ValueError, DenseAlignment, 3)
def test_get_alphabet_and_moltype(self):
"""DenseAlignment should figure out correct alphabet and moltype"""
s1 = 'A'
s2 = RNA.Sequence('AA')
d = DenseAlignment(s1)
self.assertSameObj(d.MolType, BYTES)
self.assertSameObj(d.Alphabet, BYTES.Alphabet)
d = DenseAlignment(s1, MolType=RNA)
self.assertSameObj(d.MolType, RNA)
self.assertSameObj(d.Alphabet, RNA.Alphabets.DegenGapped)
d = DenseAlignment(s1, Alphabet=RNA.Alphabet)
self.assertSameObj(d.MolType, RNA)
self.assertSameObj(d.Alphabet, RNA.Alphabet)
d = DenseAlignment(s2)
self.assertSameObj(d.MolType, RNA)
self.assertSameObj(d.Alphabet, RNA.Alphabets.DegenGapped)
d = DenseAlignment(s2, MolType=DNA)
self.assertSameObj(d.MolType, DNA)
self.assertSameObj(d.Alphabet, DNA.Alphabets.DegenGapped)
#checks for containers
d = DenseAlignment([s2])
self.assertSameObj(d.MolType, RNA)
d = DenseAlignment({'x':s2})
self.assertSameObj(d.MolType, RNA)
d = DenseAlignment(set([s2]))
self.assertSameObj(d.MolType, RNA)
def test_iter(self):
"""DenseAlignment iter should iterate over positions"""
result = list(iter(self.a2))
for i, j in zip(result, [list(i) for i in ['AD', 'BE', 'CF']]):
self.assertEqual(i,j)
def test_getitem(self):
"""DenseAlignment getitem should default to positions as chars"""
a2 = self.a2
self.assertEqual(a2[1], ['B','E'])
self.assertEqual(a2[1:], [['B','E'],['C','F']])
def test_getSubAlignment(self):
"""DenseAlignment getSubAlignment should get requested part of alignment."""
a = DenseAlignment('>x ABCE >y FGHI >z JKLM'.split())
#passing in positions should keep all seqs, but just selected positions
b = DenseAlignment('>x BC >y GH >z KL'.split())
a_1 = a.getSubAlignment(pos=[1,2])
self.assertEqual(a_1.Names, b.Names)
self.assertEqual(a_1.Seqs, b.Seqs)
#...and with invert_pos, should keep all except the positions passed in
a_2 = a.getSubAlignment(pos=[0,3], invert_pos=True)
self.assertEqual(a_2.Seqs, b.Seqs)
self.assertEqual(a_2.Names, b.Names)
#passing in seqs should keep all positions, but just selected seqs
c = DenseAlignment('>x ABCE >z JKLM'.split())
a_3 = a.getSubAlignment(seqs=[0,2])
self.assertEqual(a_3.Seqs, c.Seqs)
#check that labels were updated as well...
self.assertEqual(a_3.Names, c.Names)
#...and should work with invert_seqs to exclude just selected seqs
a_4 = a.getSubAlignment(seqs=[1], invert_seqs=True)
self.assertEqual(a_4.Seqs, c.Seqs)
self.assertEqual(a_4.Names, c.Names)
#should be able to do both seqs and positions simultaneously
d = DenseAlignment('>x BC >z KL'.split())
a_5 = a.getSubAlignment(seqs=[0,2], pos=[1,2])
self.assertEqual(a_5.Seqs, d.Seqs)
self.assertEqual(a_5.Names, d.Names)
def test_str(self):
"""DenseAlignment str should return FASTA representation of aln"""
self.assertEqual(str(self.a2), '>x\nABC\n>y\nDEF\n')
#should work if labels diff length
self.a2.Names[-1] = 'yyy'
self.assertEqual(str(self.a2), '>x\nABC\n>yyy\nDEF\n')
def test_get_freqs(self):
"""DenseAlignment _get_freqs should get row or col freqs"""
ABModelSequence = self.ABModelSequence
a = self.a
self.assertEqual(a._get_freqs(0), array([[3,1],[1,3]]))
self.assertEqual(a._get_freqs(1), array([[2,0],[0,2],[1,1],[1,1]]))
def test_getSeqFreqs(self):
"""DenseAlignment getSeqFreqs should get profile of freqs in each seq"""
ABModelSequence = self.ABModelSequence
a = self.a
f = a.getSeqFreqs()
self.assertEqual(f.Data, array([[3,1],[1,3]]))
def test_getPosFreqs(self):
"""DenseAlignment getPosFreqs should get profile of freqs at each pos"""
ABModelSequence = self.ABModelSequence
a = self.a
f = a.getPosFreqs()
self.assertEqual(f.Data, array([[2,0],[0,2],[1,1],[1,1]]))
def test_getSeqEntropy(self):
"""DenseAlignment getSeqEntropy should get entropy of each seq"""
ABModelSequence = self.ABModelSequence
a = DenseAlignment(map(ABModelSequence, ['abab','bbbb','abbb']), \
Alphabet=AB.Alphabet)
f = a.getSeqEntropy()
e = 0.81127812445913283 #-sum(p log_2 p) for p = 0.25, 0.75
self.assertFloatEqual(f, array([1,0,e]))
def test_getPosEntropy(self):
"""DenseAlignment getPosEntropy should get entropy of each pos"""
ABModelSequence = self.ABModelSequence
a = self.a
f = a.getPosEntropy()
e = array([0,0,1,1])
self.assertEqual(f, e)
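# Plain-Python sketches (hypothetical helpers, not DenseAlignment's
# array-based implementation) of two behaviours the tests above pin down.
# select_indices/sub_alignment mirror the getSubAlignment selection rules:
# keep the listed rows (seqs) and columns (pos), or everything except them
# when the matching invert flag is set; omitting a selector keeps all.
# seq_entropy reproduces the getSeqEntropy values, using
# H = -sum(p * log2(p)) over the symbol frequencies p of one sequence.
import math
from collections import Counter


def select_indices(n, chosen, invert=False):
    """Return the indices in range(n) that are kept by a selection."""
    if chosen is None:
        return list(range(n))
    chosen = set(chosen)
    return [i for i in range(n) if (i in chosen) != invert]


def sub_alignment(names, seqs, seq_idx=None, pos=None,
                  invert_seqs=False, invert_pos=False):
    """Return (names, seqs) restricted to the selected rows and columns."""
    rows = select_indices(len(seqs), seq_idx, invert_seqs)
    cols = select_indices(len(seqs[0]) if seqs else 0, pos, invert_pos)
    return ([names[r] for r in rows],
            [''.join(seqs[r][c] for c in cols) for r in rows])


def seq_entropy(seq):
    """Return the base-2 Shannon entropy of the symbols in a non-empty seq."""
    total = float(len(seq))
    return -sum((n / total) * math.log(n / total, 2)
                for n in Counter(seq).values())

# e.g. sub_alignment(['x', 'y', 'z'], ['ABCE', 'FGHI', 'JKLM'], pos=[1, 2])
# returns (['x', 'y', 'z'], ['BC', 'GH', 'KL']), and seq_entropy('abbb') is
# about 0.8113, matching e in test_getSeqEntropy.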
class IntegrationTests(TestCase):
"""Test for integration between regular and model seqs and alns"""
def setUp(self):
"""Initialize some standard sequences"""
self.r1 = RNA.Sequence('AAA', Name='x')
self.r2 = RNA.Sequence('CCC', Name='y')
self.m1 = RNA.ModelSeq('AAA', Name='xx')
self.m2 = RNA.ModelSeq('CCC', Name='yy')
def test_model_to_model(self):
"""Model seq should work with dense alignment"""
a = DenseAlignment([self.m1, self.m2])
self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
a = DenseAlignment([self.m1, self.m2], MolType=DNA)
self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
self.assertEqual(self.m1.Name, 'xx')
def test_regular_to_model(self):
"""Regular seq should work with dense alignment"""
a = DenseAlignment([self.r1, self.r2])
self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
a = DenseAlignment([self.r1, self.r2], MolType=DNA)
self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
self.assertEqual(self.r1.Name, 'x')
def test_model_to_regular(self):
"""Model seq should work with regular alignment"""
a = Alignment([self.m1, self.m2])
self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
a = Alignment([self.m1, self.m2], MolType=DNA)
self.assertEqual(str(a), '>xx\nAAA\n>yy\nCCC\n')
self.assertEqual(self.m1.Name, 'xx')
def test_regular_to_regular(self):
"""Regular seq should work with regular alignment"""
a = Alignment([self.r1, self.r2])
self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
a = Alignment([self.r1, self.r2], MolType=DNA)
self.assertEqual(str(a), '>x\nAAA\n>y\nCCC\n')
self.assertEqual(self.r1.Name, 'x')
def test_model_aln_to_regular_aln(self):
"""Dense aln should convert to regular aln"""
a = DenseAlignment([self.r1, self.r2])
d = Alignment(a)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
d = Alignment(a, MolType=DNA)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
self.assertEqual(self.r1.Name, 'x')
def test_regular_aln_to_model_aln(self):
"""Regular aln should convert to model aln"""
a = Alignment([self.r1, self.r2])
d = DenseAlignment(a)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
d = DenseAlignment(a, MolType=DNA)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
self.assertEqual(self.r1.Name, 'x')
def test_regular_aln_to_regular_aln(self):
"""Regular aln should convert to regular aln"""
a = Alignment([self.r1, self.r2])
d = Alignment(a)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
d = Alignment(a, MolType=DNA)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
self.assertEqual(self.r1.Name, 'x')
def test_model_aln_to_model_aln(self):
"""Model aln should convert to model aln"""
a = DenseAlignment([self.r1, self.r2])
d = DenseAlignment(a)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
d = DenseAlignment(a, MolType=DNA)
self.assertEqual(str(d), '>x\nAAA\n>y\nCCC\n')
self.assertEqual(self.r1.Name, 'x')
#run tests if invoked from command line
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_core/test_alphabet.py
#!/usr/bin/env python
"""Tests of the Enumeration and Alphabet objects.
Note: individual Alphabets are typically in MolType and are tested there.
"""
from cogent.core.alphabet import Enumeration, get_array_type, \
uint8, uint16, uint32, array, JointEnumeration, CharAlphabet, \
_make_translation_tables, _make_complement_array
from cogent.core.moltype import RNA
from cogent.util.unit_test import TestCase, main
DnaBases = CharAlphabet('TCAG')
RnaBases = CharAlphabet('UCAG')
AminoAcids = CharAlphabet('ACDEFGHIKLMNPQRSTVWY')
__author__ = "Rob Knight, Peter Maxwell and Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Peter Maxwell", "Rob Knight", "Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"
class translation_table_tests(TestCase):
"""Tests of top-level translation table functions"""
def test_make_translation_tables(self):
"""_make_translation_tables should translate from chars to indices"""
a = 'ucag'
itoa, atoi = _make_translation_tables(a)
s = 'ggacu'
obs = s.translate(atoi)
self.assertEqual(obs, '\x03\x03\x02\x01\x00')
orig = obs.translate(itoa)
self.assertEqual(orig, s)
def test_make_complement_array(self):
"""_make_complement_array should identify complements correctly"""
complement_array = _make_complement_array(RNA.Alphabet, RNA.Complements)
test = 'UCAG'
test_array = [RNA.Alphabet.index(i) for i in test]
complements = complement_array.take(test_array)
result = ''.join([RNA.Alphabet[i] for i in complements])
self.assertEqual(result, 'AGUC')
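# Sketch of the two mappings translation_table_tests checks, written with
# plain lists and dicts rather than the str.translate tables built by
# _make_translation_tables (a swapped-in technique; the helper names are
# hypothetical). Characters map to their position in the alphabet and back,
# and the complement of index i is the index of the complement of
# alphabet[i].
def make_index_maps(alphabet):
    """Return (index -> char list, char -> index dict) for an alphabet."""
    indices_to_chars = list(alphabet)
    chars_to_indices = dict((c, i) for i, c in enumerate(alphabet))
    return indices_to_chars, chars_to_indices


def make_complement_indices(alphabet, complements):
    """Return a list whose i-th entry is the index of alphabet[i]'s complement."""
    chars_to_indices = make_index_maps(alphabet)[1]
    return [chars_to_indices[complements[c]] for c in alphabet]

# e.g. with complements {'U': 'A', 'C': 'G', 'A': 'U', 'G': 'C'},
# make_complement_indices('UCAG', ...) returns [2, 3, 0, 1], which maps
# 'UCAG' to 'AGUC' as in test_make_complement_array.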
class get_array_type_tests(TestCase):
"""Tests of the get_array_type top-level function."""
def test_get_array_type(self):
"""get_array_type should return unsigned type that fits elements."""
self.assertEqual(get_array_type(0), uint8)
self.assertEqual(get_array_type(100), uint8)
self.assertEqual(get_array_type(256), uint8) #boundary case
self.assertEqual(get_array_type(257), uint16) #boundary case
self.assertEqual(get_array_type(10000), uint16)
self.assertEqual(get_array_type(65536), uint16)
self.assertEqual(get_array_type(65537), uint32)
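# Sketch of the rule test_get_array_type encodes: choose the narrowest
# unsigned width whose 2**bits distinct values can index every element,
# so 256 elements still fit in 8 bits but 257 need 16. smallest_uint is a
# hypothetical helper that returns a type name, not a numpy dtype.
def smallest_uint(num_elements):
    """Return 'uint8', 'uint16', ... for the smallest type indexing num_elements items."""
    for bits in (8, 16, 32, 64):
        if num_elements <= 2 ** bits:
            return 'uint%d' % bits
    raise ValueError('too many elements: %d' % num_elements)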
class EnumerationTests(TestCase):
"""Tests of the Enumeration object."""
def test_init(self):
"""Enumeration init should work from any sequence"""
a = Enumeration('abc')
self.assertEqual(a.index('a'), 0)
self.assertEqual(a.index('b'), 1)
self.assertEqual(a.index('c'), 2)
self.assertEqual(a[0], 'a')
self.assertEqual(a[1], 'b')
self.assertEqual(a[2], 'c')
self.assertEqual(a.ArrayType, uint8)
a = Enumeration('bca')
self.assertEqual(a.index('b'), 0)
self.assertEqual(a.index('c'), 1)
self.assertEqual(a.index('a'), 2)
self.assertEqual(a[0], 'b')
self.assertEqual(a[1], 'c')
self.assertEqual(a[2], 'a')
a = Enumeration([1,'2'])
self.assertEqual(a.index(1), 0)
self.assertEqual(a.index('2'), 1)
self.assertRaises(KeyError, a.index, '1')
#check that it works with gaps
a = Enumeration('ab-', '-')
self.assertEqual(a.Gap, '-')
self.assertEqual(a.GapIndex, 2)
a = Enumeration(range(257)) #too big to fit in uint8
self.assertEqual(a.ArrayType, uint16)
def test_index(self):
"""Enumeration index should return first index of item"""
a = Enumeration('bca')
self.assertEqual(a.index('b'), 0)
self.assertEqual(a.index('c'), 1)
self.assertEqual(a.index('a'), 2)
def test_getitem(self):
"""Enumeration[i] should return character at i"""
a = Enumeration('bca')
self.assertEqual(a[0], 'b')
self.assertEqual(a[1], 'c')
self.assertEqual(a[2], 'a')
def test_toIndices(self):
"""Enumeration toIndices should return indices from elements"""
a = Enumeration('bca')
self.assertEqual(a.toIndices(''), [])
self.assertEqual(a.toIndices('ccabac'), [1,1,2,0,2,1])
def test_isValid(self):
"""Enumeration isValid should return True for valid sequence"""
a = Enumeration('bca')
self.assertEqual(a.isValid(''), True)
self.assertEqual(a.isValid('bbb'), True)
self.assertEqual(a.isValid('bbbaac'), True)
self.assertEqual(a.isValid('bbd'), False)
self.assertEqual(a.isValid('d'), False)
self.assertEqual(a.isValid(['a', 'b']), True)
self.assertEqual(a.isValid(['a', None]), False)
def test_fromIndices(self):
"""Enumeration fromIndices should return elements from indices"""
a = Enumeration('bca')
self.assertEqual(a.fromIndices([]), [])
self.assertEqual(a.fromIndices([1,1,2,0,2,1]), list('ccabac'))
def test_pow(self):
"""Enumeration pow should produce JointEnumeration with n copies"""
a = AminoAcids**3
self.assertEqual(a[0], (AminoAcids[0],)*3)
self.assertEqual(a[-1], (AminoAcids[-1],)*3)
self.assertEqual(len(a), len(AminoAcids)**3)
self.assertEqual(a.ArrayType, uint16)
#check that it works with gaps
a = Enumeration('a-b', '-')
b = a**3
self.assertEqual(len(b), 27)
self.assertEqual(b.Gap, ('-','-','-'))
self.assertEqual(b.GapIndex, 13)
self.assertEqual(b.ArrayType, uint8)
#check that array type is set correctly if needed
b = a**6 #3**6 = 729 joint states, too many to fit in uint8
self.assertEqual(b.ArrayType, uint16)
def test_mul(self):
"""Enumeration mul should produce correct JointEnumeration"""
a = DnaBases * RnaBases
self.assertEqual(len(a), 16)
self.assertEqual(a[0], ('T','U'))
self.assertEqual(a[-1], ('G','G'))
#check that it works with gaps
a = Enumeration('ab-','-')
b = Enumeration('xz','z')
x = a*b
self.assertEqual(x.Gap, ('-','z'))
self.assertEqual(x.GapIndex, 5)
self.assertEqual(len(x), 6)
self.assertEqual(x, (('a','x'),('a','z'),('b','x'),('b','z'),('-','x'),\
('-','z')))
#check that it doesn't work when only one seq has gaps
c = Enumeration('c')
x = a*c
self.assertEqual(x.Gap, None)
def test_counts(self):
"""Enumeration counts should count freqs in array"""
a = DnaBases
f = array([[0,0,1,0,0,3]])
self.assertEqual(a.counts(f), array([4,1,0,1]))
#check that it works with byte array
f = array([[0,0,1,0,0,3]], 'B')
self.assertEqual(a.counts(f), array([4,1,0,1]))
#should ignore out-of-bounds items
g = [0,4]
self.assertEqual(a.counts(g), array([1,0,0,0]))
#make sure it works for long sequences, i.e. no wraparound at 255
h = [0, 3] * 70000
self.assertEqual(a.counts(h), array([70000,0,0,70000]))
h2 = array(h).astype('B')
self.assertEqual(a.counts(h2), array([70000,0,0,70000]))
i = array([0,3] * 75000)
self.assertEqual(a.counts(i), array([75000,0,0,75000]))
#make sure it works for long _binary_ sequences, e.g. the results
#of array comparisons.
a = array([0,1,2,3]*10000)
b = array([0,0,0,0]*10000)
same = (a==b)
self.assertEqual(DnaBases.counts(same), array([30000,10000,0,0]))
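# Plain-Python sketch (a hypothetical helper, not the vectorised
# Enumeration.counts) of the behaviour test_counts pins down: tally each
# index 0 .. n-1, silently skip out-of-range values, and treat booleans
# from an array comparison as the indices 0 and 1. Python ints cannot wrap
# around at 255, which is the failure the long-sequence cases guard against.
def count_indices(indices, n):
    """Return a list of length n holding the count of each index."""
    counts = [0] * n
    for value in indices:
        value = int(value)  # True/False become 1/0
        if 0 <= value < n:
            counts[value] += 1
    return counts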
class CharAlphabetTests(TestCase):
"""Tests of CharAlphabets."""
def test_init(self):
"""CharAlphabet init should make correct translation tables"""
r = CharAlphabet('UCAG')
i2c, c2i = r._indices_to_chars, r._chars_to_indices
s = array([0,0,1,0,3,2], 'b').tostring()
self.assertEqual(s.translate(i2c), 'UUCUGA')
self.assertEqual('UUCUGA'.translate(c2i), '\000\000\001\000\003\002')
def test_fromString(self):
"""CharAlphabet fromString should return correct array"""
r = CharAlphabet('UCAG')
self.assertEqual(r.fromString('UUCUGA'), array([0,0,1,0,3,2],'B'))
def test_isValid(self):
"""CharAlphabet isValid should return True for valid sequence"""
a = CharAlphabet('bca')
self.assertEqual(a.isValid(''), True)
self.assertEqual(a.isValid('bbb'), True)
self.assertEqual(a.isValid('bbbaac'), True)
self.assertEqual(a.isValid('bbd'), False)
self.assertEqual(a.isValid('d'), False)
self.assertEqual(a.isValid(['a', 'b']), True)
self.assertEqual(a.isValid(['a', None]), False)
def test_fromArray(self):
"""CharAlphabet fromArray should return correct array"""
r = CharAlphabet('UCAG')
self.assertEqual(r.fromArray(array(['UUC','UGA'], 'c')), \
array([[0,0,1],[0,3,2]], 'B'))
def test_toChars(self):
"""CharAlphabet toChars should convert an input array to chars"""
r = CharAlphabet('UCAG')
c = r.toChars(array([[0,0,1],[0,3,2]], 'B'))
self.assertEqual(c, \
array(['UUC','UGA'], 'c'))
def test_toString(self):
"""CharAlphabet toString should convert an input array to string"""
r = CharAlphabet('UCAG')
self.assertEqual(r.toString(array([[0,0,1],[0,3,2]], 'B')), 'UUC\nUGA')
#should work with single seq
self.assertEqual(r.toString(array([[0,0,1,0,3,2]], 'B')), 'UUCUGA')
#should work with single seq
self.assertEqual(r.toString(array([0,0,1,0,3,2], 'B')), 'UUCUGA')
#should work with empty seq
self.assertEqual(r.toString(array([], 'B')), '')
def test_pairs(self):
"""pairs should cache the same object."""
r = CharAlphabet('UCAG')
rp = r.Pairs
self.assertEqual(len(rp), 16)
rp2 = r.Pairs
self.assertSameObj(rp, rp2)
def test_triples(self):
"""triples should cache the same object."""
r = CharAlphabet('UCAG')
rt = r.Triples
self.assertEqual(len(rt), 64)
rt2 = r.Triples
self.assertSameObj(rt, rt2)
class JointEnumerationTests(TestCase):
"""Tests of JointEnumerations."""
def test_init(self):
"""JointEnumeration init should work as expected"""
#should work for alphabet object
a = JointEnumeration([DnaBases, RnaBases])
self.assertEqual(len(a), 16)
self.assertEqual(a.Shape, (4,4))
self.assertEqual(a[0], ('T','U'))
self.assertEqual(a[-1], ('G','G'))
self.assertEqual(a._sub_enum_factors, array([[4],[1]]))
#should work for arbitrary sequences
a = JointEnumeration(['TCAG', 'UCAG'])
self.assertEqual(len(a), 16)
self.assertEqual(a[0], ('T','U'))
self.assertEqual(a[-1], ('G','G'))
self.assertEqual(a._sub_enum_factors, array([[4],[1]]))
#should work for different length sequences
a = JointEnumeration(['TCA', 'UCAG'])
self.assertEqual(a.Shape, (3,4))
self.assertEqual(len(a), 12)
self.assertEqual(a[0], ('T','U'))
self.assertEqual(a[-1], ('A','G'))
self.assertEqual(a._sub_enum_factors, \
array([[4],[1]])) #note: _not_ [3,1]
def test_toIndices(self):
"""JointEnumeration toIndices should convert tuples correctly"""
a = JointEnumeration(['TCAG','UCAG'])
i = a.toIndices([('T','U'),('G','G'),('G','G')])
self.assertEqual(i, [0, 15, 15])
def test_fromIndices(self):
"""JointEnumeration fromIndices should return correct tuples"""
a = JointEnumeration(['TCAG','UCAG'])
i = a.fromIndices([0, 15, 15])
self.assertEqual(i, [('T','U'),('G','G'),('G','G')])
def test_packArrays(self):
"""JointEnumeration packArrays should return correct array."""
a = JointEnumeration(['xyz', 'abcd', 'ef'])
v = [[0,1,2,0],[3,3,1,0], [1,1,0,0]]
result = a.packArrays(v)
self.assertEqual(result, array([7,15,18,0]))
def test_unpackArrays(self):
"""JointEnumeration unpackArrays should return correct arrays."""
a = JointEnumeration(['xyz', 'abcd', 'ef'])
v = [7,15,18,0]
result = a.unpackArrays(v)
self.assertEqual(result, array([[0,1,2,0],[3,3,1,0], [1,1,0,0]]))
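# Sketch of how the JointEnumeration tests above fit together, using tuples
# and itertools.product in place of the library's classes (all names here
# are hypothetical). Joint states are listed with the last sub-alphabet
# varying fastest, so each position is weighted by the product of the sizes
# of the sub-alphabets after it: (3, 4, 2) gives weights [8, 2, 1], and the
# first weight for ('TCA', 'UCAG') is 4, not 3. Packing sums index*weight;
# unpacking reverses it with integer division and modulo.
from itertools import product


def place_values(sizes):
    """Return the weight of each position, e.g. (3, 4, 2) -> [8, 2, 1]."""
    weights = []
    running = 1
    for size in reversed(sizes):
        weights.append(running)
        running *= size
    return weights[::-1]


def pack(rows, sizes):
    """Combine parallel index rows (one per sub-alphabet) into packed ints."""
    weights = place_values(sizes)
    return [sum(w * v for w, v in zip(weights, column))
            for column in zip(*rows)]


def unpack(packed, sizes):
    """Invert pack(), returning one row of indices per sub-alphabet."""
    weights = place_values(sizes)
    rows = [[] for _ in sizes]
    for value in packed:
        for row, w, size in zip(rows, weights, sizes):
            row.append((value // w) % size)
    return rows


def joint_gap_index(alphabets, gaps):
    """Index of the all-gap joint state, or None if any sub-alphabet lacks a gap."""
    if any(g is None for g in gaps):
        return None
    return list(product(*alphabets)).index(tuple(gaps))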
if __name__ == '__main__':
main()
# PyCogent-1.5.3/tests/test_core/test_annotation.py
#!/usr/bin/env python
import unittest
from cogent import DNA, LoadSeqs
from cogent.core.annotation import Feature, Variable
from cogent.core.location import Map, Span
__author__ = "Gavin Huttley"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Gavin Huttley"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Gavin Huttley"
__email__ = "gavin.huttley@anu.edu.au"
__status__ = "Production"
def makeSampleSequence(with_gaps=False):
raw_seq = 'AACCCAAAATTTTTTGGGGGGGGGGCCCC'
cds = (15, 25)
utr = (12, 15)
if with_gaps:
raw_seq = raw_seq[:5] + '-----' + raw_seq[10:-2] + '--'
seq = DNA.makeSequence(raw_seq)
seq.addAnnotation(Feature, 'CDS', 'CDS', [cds])
seq.addAnnotation(Feature, "5'UTR", "5' UTR", [utr])
return seq
def makeSampleAlignment():
seq1 = makeSampleSequence()
seq2 = makeSampleSequence(with_gaps=True)
seqs = {'FAKE01': seq1, 'FAKE02': seq2}
aln = LoadSeqs(data = seqs)
aln.addAnnotation(Feature, 'misc_feature', 'misc', [(12,25)])
aln.addAnnotation(Feature, 'CDS', 'blue', [(15, 25)])
aln.addAnnotation(Feature, "5'UTR", 'red', [(2, 4)])
aln.addAnnotation(Feature, "LTR", "fake", [(2,15)])
return aln
class TestAnnotations(unittest.TestCase):
def setUp(self):
self.seq = makeSampleSequence()
self.aln = makeSampleAlignment()
def test_slice_seq_with_annotations(self):
newseq = self.seq[:5] + self.seq[10:]
for annot_type in ["CDS", "5'UTR"]:
orig = str(list(self.seq.getByAnnotation(annot_type))[0])
new = str(list(newseq.getByAnnotation(annot_type))[0])
assert orig == new, (annot_type, orig, new)
def test_aln_annotations(self):
"""test that annotations on an alignment and its sequences return the expected slices"""
aln_expecteds = {"misc_feature":{'FAKE01': 'TTTGGGGGGGGGG',
'FAKE02': 'TTTGGGGGGGGGG'},
"CDS": {'FAKE01': 'GGGGGGGGGG', 'FAKE02': 'GGGGGGGGGG'},
"5'UTR": {'FAKE01': 'CC', 'FAKE02': 'CC'},
"LTR" : {"FAKE01": "CCCAAAATTTTTT",
"FAKE02": "CCC-----TTTTT"}
}
seq_expecteds = {"CDS": {"FAKE01": "GGGGGGGGGG",
"FAKE02": "GGGGGGGGGG"},
"5'UTR": {"FAKE01": "TTT",
"FAKE02": "TTT"}}
for annot_type in ["misc_feature", "CDS", "5'UTR", "LTR"]:
observed = list(self.aln.getByAnnotation(annot_type))[0].todict()
expected = aln_expecteds[annot_type]
assert observed == expected, (annot_type, expected, observed)
if annot_type in ["misc_feature", "LTR"]:
continue # because seqs haven't been annotated with it
for name in self.aln.Names:
observed = list(self.aln.NamedSeqs[name].data.\
getByAnnotation(annot_type))[0]
observed = str(observed)
expected = seq_expecteds[annot_type][name]
assert str(observed) == expected, (annot_type, name, expected,
observed)
def test_slice_aln_with_annotations(self):
"""test that annotations of sequences and alignments survive alignment
slicing."""
aln_expecteds = {"misc_feature":{'FAKE01': 'TTTGGGGGGGGGG',
'FAKE02': 'TTTGGGGGGGGGG'},
"CDS": {'FAKE01': 'GGGGGGGGGG', 'FAKE02': 'GGGGGGGGGG'},
"5'UTR": {'FAKE01': 'CC', 'FAKE02': 'CC'},
"LTR" : {"FAKE01": "CCCTTTTT",
"FAKE02": "CCCTTTTT"}}
newaln = self.aln[:5]+self.aln[10:]
for annot_type in ["LTR", "misc_feature", "CDS", "5'UTR"]:
feature_list = newaln.getAnnotationsMatching(annot_type)
new = newaln.getRegionCoveringAll(feature_list).getSlice().todict()
expected = aln_expecteds[annot_type]
assert expected == new, (annot_type, expected, new)
if annot_type in ["misc_feature", "LTR"]:
continue # because seqs haven't been annotated with it
for name in self.aln.Names:
orig = str(list(self.aln.getAnnotationsFromSequence(name,
annot_type))[0].getSlice())
new = str(list(newaln.getAnnotationsFromSequence(name,
annot_type))[0].getSlice())
assert orig == new, (name, annot_type, orig, new)
def test_feature_projection(self):
expecteds = {"FAKE01": "CCCAAAATTTTTT", "FAKE02": "CCC-----TTTTT"}
aln_ltr = self.aln.getAnnotationsMatching('LTR')[0]
for seq_name in ['FAKE01', 'FAKE02']:
expected = expecteds[seq_name]
seq_ltr = self.aln.projectAnnotation(seq_name, aln_ltr)
if '-' in expected:
self.assertRaises(ValueError, seq_ltr.getSlice)
seq_ltr = seq_ltr.withoutLostSpans()
expected = expected.replace('-', '')
self.assertEqual(seq_ltr.getSlice(), expected)
def test_reversecomplement(self):
"""test correct translation of annotations on reverse complement."""
aln_expecteds = {"misc_feature":{'FAKE01': 'TTTGGGGGGGGGG',
'FAKE02': 'TTTGGGGGGGGGG'},
"CDS": {'FAKE01': 'GGGGGGGGGG', 'FAKE02': 'GGGGGGGGGG'},
"5'UTR": {'FAKE01': 'CC', 'FAKE02': 'CC'},
"LTR" : {"FAKE01": "CCCAAAATTTTTT",
"FAKE02": "CCC-----TTTTT"}
}
seq_expecteds = {"CDS": {"FAKE01": "GGGGGGGGGG",
"FAKE02": "GGGGGGGGGG"},
"5'UTR": {"FAKE01": "TTT",
"FAKE02": "TTT"}}
rc = self.aln.rc()
# rc'ing an Alignment or Sequence rc's their annotations too. This means
# slicing returns the same sequence as the non-rc'd alignment/seq
for annot_type in ["misc_feature", "CDS", "5'UTR", "LTR"]:
observed = list(self.aln.getByAnnotation(annot_type))[0].todict()
expected = aln_expecteds[annot_type]
assert observed == expected, ("+", annot_type, expected, observed)
observed = list(rc.getByAnnotation(annot_type))[0].todict()
expected = aln_expecteds[annot_type]
assert observed == expected, ("-", annot_type, expected, observed)
if annot_type in ["misc_feature", "LTR"]:
continue # because seqs haven't been annotated with it
for name in self.aln.Names:
observed = list(self.aln.NamedSeqs[name].data.\
getByAnnotation(annot_type))[0]
observed = str(observed)
expected = seq_expecteds[annot_type][name]
assert str(observed) == expected, ("+", annot_type, name, expected,
observed)
observed = list(rc.NamedSeqs[name].data.\
getByAnnotation(annot_type))[0]
observed = str(observed)
expected = seq_expecteds[annot_type][name]
assert str(observed) == expected, ("-", annot_type, name, expected,
observed)
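# Sketch of the column-to-sequence projection that test_feature_projection
# exercises: walk a gapped alignment row, record the ungapped sequence index
# held by each column (None for a gap), and keep only residues inside a
# column range. Dropping the gap columns gives the same strings the test
# expects after withoutLostSpans(); the helper names are hypothetical and
# rows are plain strings rather than annotated Sequence objects.
def column_to_seq_index(gapped_row, gap='-'):
    """Return, for each column, its ungapped sequence index or None."""
    positions = []
    next_index = 0
    for char in gapped_row:
        if char == gap:
            positions.append(None)
        else:
            positions.append(next_index)
            next_index += 1
    return positions


def project_columns(gapped_row, start, end, gap='-'):
    """Return the residues of gapped_row that fall in columns [start, end)."""
    ungapped = gapped_row.replace(gap, '')
    kept = [i for i in column_to_seq_index(gapped_row, gap)[start:end]
            if i is not None]
    return ''.join(ungapped[i] for i in kept)

# e.g. the gapped FAKE02 row from makeSampleSequence(with_gaps=True) is
# 'AACCC-----TTTTTGGGGGGGGGGCC--', and project_columns(row, 2, 15) gives
# 'CCCTTTTT' for the (2, 15) LTR span.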
class TestMapSpans(unittest.TestCase):
"""Test attributes of Map & Spans classes critical to annotation
manipulation."""
def test_span(self):
length = 100
forward = Span(20, 30)
reverse = Span(70, 80, Reverse=True)
assert forward.reversedRelativeTo(length) == reverse
assert reverse.reversedRelativeTo(length) == forward
def test_map(self):
"""reversing a map with multiple spans should preserve span relative
order"""
forward = [Span(20,30), Span(40,50)]
fmap = Map(spans=forward, parent_length=100)
fmap_reversed = fmap.nucleicReversed()
reverse = [Span(70,80, Reverse=True), Span(50,60, Reverse=True)]
rmap = Map(spans=reverse, parent_length=100)
for i in range(2):
self.assertEqual(fmap_reversed.spans[i], rmap.spans[i])
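# Sketch of the coordinate arithmetic TestMapSpans relies on, with spans as
# plain (start, end) tuples rather than cogent.core.location.Span objects:
# on a parent of length L the half-open span [start, end) becomes
# [L - end, L - start) on the other strand, and reversing a multi-span map
# reverses each span while keeping the list in its original order.
def reverse_span(span, parent_length):
    """Return span's coordinates relative to the reversed parent."""
    start, end = span
    return (parent_length - end, parent_length - start)


def reverse_spans(spans, parent_length):
    """Reverse every span, preserving their relative order in the list."""
    return [reverse_span(span, parent_length) for span in spans]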
if __name__ == '__main__':
unittest.main()
# PyCogent-1.5.3/tests/test_core/test_bitvector.py
#!/usr/bin/env python
"""Tests of the bitvector module.
"""
from cogent.util.unit_test import TestCase, main
from cogent.core.bitvector import is_nonzero_string_char, is_nonzero_char, \
seq_to_bitstring, is_nonzero_string_int, is_nonzero_int, seq_to_bitlist,\
num_to_bitstring, bitcount, Bitvector, MutableBitvector, \
ImmutableBitvector, VectorFromCases, VectorFromMatches, VectorFromRuns, \
VectorFromSpans, VectorFromPositions, PackedBases, \
LongBitvector, ShortBitvector
import re
__author__ = "Jeremy Widmann"
__copyright__ = "Copyright 2007-2012, The Cogent Project"
__credits__ = ["Jeremy Widmann", "Rob Knight"]
__license__ = "GPL"
__version__ = "1.5.3"
__maintainer__ = "Jeremy Widmann"
__email__ = "jeremy.widmann@colorado.edu"
__status__ = "Production"
class bitvectorTests(TestCase):
"""Tests of top-level functions."""
def test_is_nonzero_string_char(self):
"""is_nonzero_string_char should return '1' for anything but '0', ''"""
self.assertEqual(is_nonzero_string_char('0'), '0')
self.assertEqual(is_nonzero_string_char(''), '0')
for char in "QWERTYUIOPASDFGHJKL:ZXCGHJMK?{|!@#$%^&*()12345678":
self.assertEqual(is_nonzero_string_char(char), '1')
def test_is_nonzero_char(self):
"""is_nonzero_char should return '0' for any False item or '0'"""
zero = ['', 0, '0', [], {}, None, 0L, 0.0, False]
for z in zero:
self.assertEqual(is_nonzero_char(z), '0')
nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True]
for n in nonzero:
self.assertEqual(is_nonzero_char(n), '1')
def test_seq_to_bitstring(self):
"""seq_to_bitstring should provide expected results"""
zero = ['', 0, '0', [], {}, None, 0L, 0.0, False]
self.assertEqual(seq_to_bitstring(zero), '0'*9)
nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True]
self.assertEqual(seq_to_bitstring(nonzero), '1'*10)
self.assertEqual(seq_to_bitstring(''), '')
self.assertEqual(seq_to_bitstring('305'), '101')
self.assertEqual(seq_to_bitstring(''), '')
def test_is_nonzero_string_int(self):
"""is_nonzero_string_int should return 1 for anything but '0', ''"""
self.assertEqual(is_nonzero_string_int('0'), 0)
self.assertEqual(is_nonzero_string_int(''), 0)
for char in "QWERTYUIOPASDFGHJKL:ZXCGHJMK?{|!@#$%^&*()12345678":
self.assertEqual(is_nonzero_string_int(char), 1)
def test_is_nonzero_int(self):
"""is_nonzero_int should return 0 for any False item or '0'"""
zero = ['', 0, '0', [], {}, None, 0L, 0.0, False]
for z in zero:
self.assertEqual(is_nonzero_int(z), 0)
nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True]
for n in nonzero:
self.assertEqual(is_nonzero_int(n), 1)
def test_seq_to_bitlist(self):
"""seq_to_bitlist should provide expected results"""
zero = ['', 0, '0', [], {}, None, 0L, 0.0, False]
self.assertEqual(seq_to_bitlist(zero), [0]*9)
nonzero = ['z', '1', '00', ' ', 1, -1, 1e-30, [''], {'':None}, True]
self.assertEqual(seq_to_bitlist(nonzero), [1]*10)
self.assertEqual(seq_to_bitlist(''), [])
self.assertEqual(seq_to_bitlist('305'), [1,0,1])
self.assertEqual(seq_to_bitlist(''), [])
def test_number_to_bitstring(self):
"""num_to_bitstring should provide expected results"""
numbers = [0, 1, 2, 7, 8, 1024, 814715L]
for n in numbers:
self.assertEqual(num_to_bitstring(n, 0), '')
single_results = list('0101001')
for exp, num in zip(single_results, numbers):
self.assertEqual(num_to_bitstring(num, 1), exp)
three_results = ['000','001','010','111','000','000','011']
for exp, num in zip(three_results, numbers):
self.assertEqual(num_to_bitstring(num, 3), exp)
#should pad or truncate to the correct length
self.assertEqual(num_to_bitstring(814715, 20),'11000110111001111011')
self.assertEqual(num_to_bitstring(814715, 10),'1001111011')
self.assertEqual(num_to_bitstring(8, 10),'0000001000')
def test_bitcount(self):
"""bitcount should provide expected results"""
numbers = [0, 1, 2, 7, 8, 1024, 814715L]
twenty_results = [0, 1, 1, 3, 1, 1, 13]
for exp, num in zip(twenty_results, numbers):
self.assertEqual(bitcount(num, 20), exp)
self.assertEqual(bitcount(num, 20, 1), exp)
self.assertEqual(bitcount(num, 20, 0), 20 - exp)
three_results = [0,1,1,3,0,0,2]
for exp, num in zip(three_results, numbers):
self.assertEqual(bitcount(num, 3), exp)
self.assertEqual(bitcount(num, 3, 1), exp)
self.assertEqual(bitcount(num, 3, 0), 3 - exp)
for num in numbers:
self.assertEqual(bitcount(num, 0), 0)
self.assertEqual(bitcount(num, 0, 0), 0)
self.assertEqual(bitcount(num, 0, 1), 0)
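# Sketch of the two integer helpers tested above, assuming (as the expected
# values show) that a number is represented by its `length` lowest-order
# bits, most significant first: longer values are truncated from the left
# and shorter ones are zero-padded. Both function names are hypothetical.
def num_to_bits(num, length):
    """Return the low `length` bits of num as a '0'/'1' string."""
    if length <= 0:
        return ''
    return format(num & ((1 << length) - 1), '0%db' % length)


def count_bits(num, length, value=1):
    """Count 1 bits (or 0 bits if value is 0) among the low `length` bits."""
    if length <= 0:
        return 0
    ones = bin(num & ((1 << length) - 1)).count('1')
    return ones if value else length - ones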
class BitvectorTests(TestCase):
"""Tests of the (immutable) Bitvector class."""
def setUp(self):
"""Define a few standard strings and vectors."""
self.strings = ['', '0', '1', '00', '01', '10', '11']
self.vectors = map(Bitvector, self.strings)
def test_init(self):
"""Bitvector init should give expected results."""
self.assertEqual(Bitvector(), 0)
self.assertEqual(Bitvector('1001'), 9)
self.assertEqual(Bitvector(['1','0','0','0']), 8)
self.assertEqual(Bitvector([]), 0)
#if passing in non-sequence, must specify length
self.assertRaises(TypeError, Bitvector, 1024)
self.assertEqual(Bitvector(1024, 10), 1024)
bv = Bitvector(10, 3)
self.assertEqual(bv, 10)
self.assertEqual(len(bv), 3)
self.assertEqual(len(Bitvector('1'*1000)), 1000)
#check that initializing a bv from itself preserves length
bv2 = Bitvector(bv)
self.assertEqual(bv2, 10)
self.assertEqual(len(bv2), 3)
def test_len(self):
"""Bitvector len should match initialized length"""
self.assertEqual(len(Bitvector()), 0)
self.assertEqual(len(Bitvector('010')), 3)
self.assertEqual(len(Bitvector(1024, 5)), 5)
self.assertEqual(len(Bitvector(1024, 0)), 0)
self.assertEqual(len(Bitvector('1'*1000)), 1000)
def test_str(self):
"""Bitvector str should match expected results"""
vecs = [Bitvector(i, 0) for i in [0, 1, 2, 7, 8, 1024, 814715L]]
for v in vecs:
self.assertEqual(str(v), '')
vecs = [Bitvector(i, 1) for i in [0, 1, 2, 7, 8, 1024, 814715L,'1'*50]]
single_results = list('01010011')
for exp, vec in zip(single_results, vecs):
self.assertEqual(str(vec), exp)
vecs = [Bitvector(i, 3) for i in [0, 1, 2, 7, 8, 1024, 814715L,'1'*50]]
three_results = ['000','001','010','111','000','000','011','111']
for exp, vec in zip(three_results, vecs):
self.assertEqual(str(vec), exp)
#should pad or truncate to the correct length
self.assertEqual(str(Bitvector(814715, 20)),'11000110111001111011')
self.assertEqual(str(Bitvector(814715, 10)),'1001111011')
self.assertEqual(str(Bitvector(8, 10)),'0000001000')
self.assertEqual(str(Bitvector('1'*50)), '1'*50)
def test_or(self):
"""Bitvector A|B should return 1 for each position that is 1 in A or B"""
results = [
['', '', '', '', '', '', ''], #'' or x
['', '0', '1', '0', '0', '1', '1'], #'0' or x
['', '1', '1', '1', '1', '1', '1'], #'1' or x
['', '0', '1', '00', '01', '10', '11'], #'00' or x
['', '0', '1', '01', '01', '11', '11'], #'01' or x
['', '1', '1', '10', '11', '10', '11'], #'10' or x
['', '1', '1', '11', '11', '11', '11'], #'11' or x
]
vectors = self.vectors
for first_pos, first in enumerate(vectors):
for second_pos, second in enumerate(vectors):
self.assertEqual( str(first | second),
results[first_pos][second_pos])
#test chaining
expected = Bitvector('1110')
observed = Bitvector('1000') | Bitvector('0100') | Bitvector('0110')
self.assertEqual(observed, expected)
#test long
self.assertEqual(Bitvector('10'*50) | Bitvector('01'*50), \
Bitvector('11'*50))
def test_and(self):
"""Bitvector A&B should return 1 only where both A and B are 1"""
results = [
['', '', '', '', '', '', ''], #'' and x
['', '0', '0', '0', '0', '0', '0'], #'0' and x
['', '0', '1', '0', '0', '1', '1'], #'1' and x
['', '0', '0', '00', '00', '00', '00'], #'00' and x
['', '0', '0', '00', '01', '00', '01'], #'01' and x
['', '0', '1', '00', '00', '10', '10'], #'10' and x
['', '0', '1', '00', '01', '10', '11'], #'11' and x
]
vectors = self.vectors
for first_pos, first in enumerate(vectors):
for second_pos, second in enumerate(vectors):
self.assertEqual( str(first & second),
results[first_pos][second_pos])
#test chaining
expected = Bitvector('0110')
observed = Bitvector('1110') & Bitvector('1111') & Bitvector('0111')
self.assertEqual(observed, expected)
#test long
self.assertEqual(Bitvector('10'*50) & Bitvector('11'*50), \
Bitvector('10'*50))
def test_xor(self):
"""Bitvector A^B should return 0 for each identical position in A and B"""
results = [
['', '', '', '', '', '', ''], #'' xor x
['', '0', '1', '0', '0', '1', '1'], #'0' xor x
['', '1', '0', '1', '1', '0', '0'], #'1' xor x
['', '0', '1', '00', '01', '10', '11'], #'00' xor x
['', '0', '1', '01', '00', '11', '10'], #'01' xor x
['', '1', '0', '10', '11', '00', '01'], #'10' xor x
['', '1', '0', '11', '10', '01', '00'], #'11' xor x
]
vectors = self.vectors
for first_pos, first in enumerate(vectors):
for second_pos, second in enumerate(vectors):
self.assertEqual( str(first ^ second),
results[first_pos][second_pos])
#test chaining
expected = Bitvector('0110')
observed = Bitvector('1111') ^ Bitvector('0110') ^ Bitvector('1111')
self.assertEqual(observed, expected)
#test long
self.assertEqual(Bitvector('11'*50) ^ Bitvector('01'*50), \
Bitvector('10'*50))
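# Sketch of the convention behind the result tables in test_or, test_and
# and test_xor: operands are combined position by position from the left,
# and the result is only as long as the shorter operand (so '10' | '0' is
# '1'). Bit strings stand in for Bitvector objects; bitwise() is a
# hypothetical helper, not part of cogent.core.bitvector.
import operator


def bitwise(first, second, op):
    """Apply a bitwise operator to two bit strings, truncating to the shorter."""
    return ''.join(str(op(int(a), int(b))) for a, b in zip(first, second))

# e.g. bitwise('01', '11', operator.xor) returns '10', matching the '01' xor
# '11' entry of the xor table.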
def test_invert(self):
"""Bitvector ~A should return a vector exchanging 1's for 0's"""
results = map(Bitvector, ['', '1', '0', '11', '10', '01', '00'])
for data, result in zip(self.vectors, results):
self.assertEqual(~data, result)
if len(data):
self.assertNotEqual(data, result)
else:
self.assertEqual(data, result)
#test chaining
self.assertEqual(~~data, data) #inverting twice should give original
self.assertEqual(~~~data, ~data)
#test long
self.assertEqual(~Bitvector('10'*50), Bitvector('01'*50))
self.assertEqual(str(~Bitvector('10'*50)), str(Bitvector('01'*50)))
def test_getitem(self):
"""Bitvector getitem should return states at specified position(s)"""
vec_strings = ['', '0', '1', '10', '10001101', '101'*50]
vecs = map(Bitvector, vec_strings)
for vec_string, vec in zip(vec_strings, vecs):
for char, item in zip(vec_string, vec):
self.assertEqual(char, str(item))
#test some 2- and 3-item slices as well
vec = Bitvector('1001000101001')
self.assertEqual(vec[3:7], Bitvector('1000'))
self.assertEqual(vec[:4], Bitvector('1001'))
self.assertEqual(vec[7:], Bitvector('101001'))
self.assertEqual(vec[1:11:2], Bitvector('01011'))
def test_bitcount(self):
"""Bitvector bitcount should correctly count 1's or 0's"""
vec_strings = ['', '0', '1', '10', '10001101', '101'*50]
vecs = map(Bitvector, vec_strings)
one_counts = [0, 0, 1, 1, 4, 100]
zero_counts = [0, 1, 0, 1, 4, 50]
for v, o, z in zip(vecs, one_counts, zero_counts):
self.assertEqual(v.bitcount(), o)
self.assertEqual(v.bitcount(1), o)
self.assertEqual(v.bitcount(0), z)
def test_repr(self):
"""Bitvector repr should look like a normal object"""
v = Bitvector(3, 10)
v_id = str(hex(id(v)))
expected = '