changeo-1.2.0/README.rst

.. image:: https://img.shields.io/pypi/dm/changeo
    :target: https://pypi.org/project/changeo
.. image:: https://img.shields.io/static/v1?label=AIRR-C%20sw-tools%20v1&message=compliant&color=008AFF&labelColor=000000&style=plastic
    :target: https://docs.airr-community.org/en/stable/swtools/airr_swtools_standard.html

Change-O - Repertoire clonal assignment toolkit
================================================================================

Change-O is a collection of tools for processing the output of V(D)J alignment
tools, assigning clonal clusters to immunoglobulin (Ig) sequences, and
reconstructing germline sequences.

Dramatic improvements in high-throughput sequencing technologies now enable
large-scale characterization of Ig repertoires, defined as the collection of
trans-membrane antigen-receptor proteins located on the surface of B cells and
T cells. Change-O is a suite of utilities to facilitate advanced analysis of
Ig and TCR sequences following germline segment assignment. Change-O handles
output from IMGT/HighV-QUEST and IgBLAST, and provides a wide variety of
clustering methods for assigning clonal groups to Ig sequences. Record
sorting, grouping, and various database manipulation operations are also
included.

changeo-1.2.0/PKG-INFO

Metadata-Version: 1.1
Name: changeo
Version: 1.2.0
Summary: A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.
Home-page: http://changeo.readthedocs.io
Author: Namita Gupta, Jason Anthony Vander Heiden
Author-email: immcantation@googlegroups.com
License: GNU Affero General Public License 3 (AGPL-3)
Download-URL: https://bitbucket.org/kleinstein/changeo/downloads
Keywords: bioinformatics,sequencing,immunology,adaptive immunity,immunoglobulin,AIRR-seq,Rep-Seq,B cell repertoire analysis,adaptive immune receptor repertoires
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.4
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics

changeo-1.2.0/NEWS.rst

Release Notes
===============================================================================

Version 1.2.0: October 29, 2021
-------------------------------------------------------------------------------

+ Updated dependencies to presto >= v0.7.0.

AssignGenes:

+ Fixed reporting of IgBLAST output counts when specifying ``--format airr``.

BuildTrees:

+ Added support for specifying fixed omega and hotness parameters at the
  commandline.

CreateGermlines:

+ Will now use the first allele in the reference database when duplicate
  allele names are provided. This appears to affect only mouse BCR light
  chains and TCR alleles in the IMGT database where the same allele name
  differs by strain.

MakeDb:

+ Added support for changes in how IMGT/HighV-QUEST v1.8.4 handles special
  characters in sequence identifiers.
+ Fixed the ``imgt`` subcommand incorrectly allowing execution without
  specifying the IMGT/HighV-QUEST output file at the commandline.

ParseDb:

+ Added reporting of output file sizes to the console log of the ``split``
  subcommand.

Version 1.1.0: June 21, 2021
-------------------------------------------------------------------------------

+ Fixed gene parsing for IMGT temporary designation nomenclature.
+ Updated dependencies to biopython >= v1.77, airr >= v1.3.1, PyYAML >= 5.1.

MakeDb:

+ Added the ``--imgt-id-len`` argument to accommodate changes introduced in
  how IMGT/HighV-QUEST truncates sequence identifiers as of v1.8.3 (May 7,
  2021). The header lines in the FASTA files are now truncated to 49
  characters. IMGT/HighV-QUEST versions older than v1.8.3 truncated them to 50
  characters. The default value of ``--imgt-id-len`` is 49. Specify
  ``--imgt-id-len 50`` to analyze IMGT results generated with IMGT/HighV-QUEST
  versions older than v1.8.3.
+ Added the ``--infer-junction`` argument to ``MakeDb igblast`` to enable
  inference of the junction sequence when it is not reported by IgBLAST. This
  should be used with data from IgBLAST v1.6.0 or older, which predates the
  addition of IMGT-CDR3 inference to IgBLAST.

Version 1.0.2: January 18, 2021
-------------------------------------------------------------------------------

AlignRecords:

+ Fixed a bug that caused the program to exit when encountering missing
  sequence data. It will now fail the row or group with missing data and
  continue.

MakeDb:

+ Added support for IgBLAST v1.17.0.

ParseDb:

+ Added a relevant error message when an input field is missing from the data.

Version 1.0.1: October 13, 2020
-------------------------------------------------------------------------------

+ Updated to support Biopython v1.78.
+ Increased the biopython dependency to v1.71.
+ Increased the presto dependency to 0.6.2.

Version 1.0.0: May 6, 2020
-------------------------------------------------------------------------------

+ The default output in all tools is now the AIRR Rearrangement standard
  (``--format airr``). Support for the legacy Change-O data standard is still
  provided through the ``--format changeo`` argument to the tools.
+ License changed to AGPL-3.

AssignGenes:

+ Added the ``igblast-aa`` subcommand to run igblastp on amino acid input.

BuildTrees:

+ Adjusted ``RECORDS`` to indicate all sequences in the input file.
  ``INITIAL_FILTER`` now shows the sequence count after initial ``min_seq``
  filtering.
+ Added an option to skip codon masking: ``--nmask``.
+ Now masks ``:``, ``,``, ``)``, and ``(`` in IDs and metadata with ``-``.
+ Can obtain the germline from ``GERMLINE_IMGT`` if ``GERMLINE_IMGT_D_MASK``
  is not specified.
+ Can reconstruct intermediate sequences with IgPhyML using ``--asr``.

ConvertDb:

+ Fixed a bug in the ``airr`` subcommand that caused the ``junction_length``
  field to be deleted from the output.
+ Fixed a bug in the ``genbank`` subcommand that caused the junction CDS to be
  missing from the ASN output.

CreateGermlines:

+ Added the ``--cf`` argument to allow specification of the clone field.

MakeDb:

+ Added the ``igblast-aa`` subcommand to parse the output of igblastp.
+ Changed the log entry ``FUNCTIONAL`` to ``PRODUCTIVE`` and removed the
  ``IMGT_PASS`` log entry in favor of an informative ``ERROR`` entry when
  sequences fail the junction region validation.
+ Added the ``--regions`` argument to the ``igblast`` and ``igblast-aa``
  subcommands to allow specification of the IMGT CDR/FWR region boundaries.
  Currently, the supported specifications are ``default`` (human, mouse) and
  ``rhesus-igl``.

Version 0.4.6: July 19, 2019
-------------------------------------------------------------------------------

BuildTrees:

+ Added the capability of running IgPhyML on the output data (``--igphyml``)
  and support for passing IgPhyML arguments through BuildTrees.
+ Added the ``--clean`` argument to force deletion of all intermediate files
  after IgPhyML execution.
+ Added the ``--format`` argument to allow specification of the input and
  output format as either the Change-O standard (``changeo``) or the AIRR
  Rearrangement standard (``airr``).

CreateGermlines:

+ Fixed a bug causing incorrect reporting of the germline format in the
  console log.

ConvertDb:

+ Removed the requirement for the ``NP1_LENGTH`` and ``NP2_LENGTH`` fields
  from the genbank subcommand.

DefineClones:

+ Fixed a biopython warning arising when applying ``--model aa`` to junction
  sequences that are not a multiple of three.
  The junction will now be padded with an appropriate number of Ns (usually
  resulting in a translation to X).

MakeDb:

+ Added the ``--10x`` argument to all subcommands to support merging of Cell
  Ranger annotation data, such as UMI count and C-region assignment, with the
  output of the supported alignment tools.
+ Added inference of the receptor locus from the alignment data to all
  subcommands. The result is output in the ``LOCUS`` field.
+ Combined the extended field arguments of all subcommands (``--scores``,
  ``--regions``, ``--cdr3``, and ``--junction``) into a single ``--extended``
  argument.
+ Removed parsing of old IgBLAST v1.5 CDR3 fields (``CDR3_IGBLAST``,
  ``CDR3_IGBLAST_AA``).

Version 0.4.5: January 9, 2019
-------------------------------------------------------------------------------

+ Slightly changed the version number display in the commandline help.

BuildTrees:

+ Fixed a bug that caused a malformed ``lineages.tsv`` output file.

CreateGermlines:

+ Fixed a bug in the CreateGermlines log output causing incorrect missing D
  gene or J gene error messages.

DefineClones:

+ Fixed a bug that caused sequences to be clustered together when the junction
  column was missing.

MakeDb:

+ Fixed a bug that caused failed germline reconstructions to be recorded as
  ``None``, rather than an empty string, in the ``GERMLINE_IMGT`` column.

Version 0.4.4: October 27, 2018
-------------------------------------------------------------------------------

+ Fixed a bug causing the values of ``_start`` fields to be off by one from
  the v1.2 AIRR Schema requirement when specifying ``--format airr``.

Version 0.4.3: October 19, 2018
-------------------------------------------------------------------------------

+ Updated the airr library requirement to v1.2.1 to fix empty V(D)J start
  coordinate values when specifying ``--format airr`` to tools.
+ Changed the pRESTO dependency to v0.5.10.

BuildTrees:

+ New tool.
+ Converts tab-delimited database files into input for `IgPhyML `_

CreateGermlines:

+ Now verifies that all files and folders passed to the ``-r`` argument exist.

Version 0.4.2: September 6, 2018
-------------------------------------------------------------------------------

+ Updated support for the AIRR Rearrangement schema to v1.2 and added the
  associated airr library dependency.

AssignGenes:

+ New tool.
+ Provides a simple IgBLAST wrapper as the ``igblast`` subcommand.

ConvertDb:

+ The ``genbank`` subcommand will perform a check for some of the required
  columns in the input file and exit if they are not found.
+ Changed the behavior of the ``-y`` argument in the ``genbank`` subcommand.
  This argument is now limited to sample features only, but allows for the
  inclusion of any BioSample attribute.

CreateGermlines:

+ Will now perform a naive verification that the reference sequences provided
  to the ``-r`` argument are IMGT-gapped. A warning will be issued to standard
  error if the reference sequences fail the check.
+ Will perform a check for some of the required columns in the input file and
  exit if they are not found.

MakeDb:

+ Changed the output of ``SEQUENCE_VDJ`` from the igblast subcommand to
  retain insertions in the query sequence rather than delete them, as is done
  in the ``SEQUENCE_IMGT`` field.
+ Will now perform a naive verification that the reference sequences provided
  to the ``-r`` argument are IMGT-gapped. A warning will be issued to standard
  error if the reference sequences fail the check.

Version 0.4.1: July 16, 2018
-------------------------------------------------------------------------------

+ Fixed installation incompatibility with pip 10.
+ Fixed duplicate newline issue on Windows.
+ Tools will no longer create empty pass or fail files if there are no
  records meeting the appropriate criteria for output.
+ Most tools now allow explicit specification of the output file name via the
  optional ``-o`` argument.
+ Added support for the AIRR standard TSV via the ``--format airr`` argument
  to all relevant tools.
+ Replaced the V, D and J ``BTOP`` columns with ``CIGAR`` columns in the data
  standard.
+ Numerous API changes and internal structural changes to commandline tools.

AlignRecords:

+ Fixed a bug arising when space characters are present in the sequence
  identifiers.

ConvertDb:

+ New tool.
+ Includes the airr and changeo subcommands to convert between AIRR and
  Change-O formatted TSV files.
+ The genbank subcommand creates MiAIRR-compliant files for submission to
  GenBank/TLS.
+ Contains the baseline and fasta subcommands previously in ParseDb.

CreateGermlines:

+ Changed the character used to pad clonal consensus sequences from ``.`` to
  ``N``.
+ Changed tie resolution in clonal consensus from random V/J gene to
  alphabetical by sequence identifier.
+ Added the ``--df`` and ``--jf`` arguments for specifying D and J fields,
  respectively.
+ Added an initial sorting step when specifying ``--cloned`` so that clonally
  ordered input is no longer required.

DefineClones:

+ Removed the chen2010 and ademokun2011 subcommands and made the previous
  bygroup subcommand the default behavior.
+ Renamed the ``--f`` argument to ``--gf`` for consistency with other tools.
+ Added the arguments ``--vf`` and ``--jf`` to allow specification of V and J
  call fields, respectively.

MakeDb:

+ Renamed the ``--noparse`` argument to ``--asis-id``.
+ Added the ``--asis-calls`` argument to the igblast subcommand to allow use
  with non-standard gene names.
+ Added the ``GERMLINE_IMGT`` column to the default output.
+ Changed junction inference in the igblast subcommand to use IgBLAST's CDR3
  assignment for IgBLAST versions greater than or equal to 1.7.0.
+ Added a verification that the ``SEQUENCE_IMGT`` and ``JUNCTION`` fields are
  in agreement for records to pass.
+ Changed the igblast subcommand's translation of the junction sequence to
  truncate junctions that are not multiples of 3, rather than pad them to a
  multiple of 3 (removes the trailing X character).
+ The igblast subcommand will now fail records missing the required optional
  fields ``subject seq``, ``query seq`` and ``BTOP``, rather than abort.
+ Fixed a bug causing parsing of IgBLAST <= 1.4 output to fail.

ParseDb:

+ Added the merge subcommand, which combines TSV files.
+ All field arguments are now case sensitive to provide support for both the
  Change-O and AIRR data standards.

Version 0.3.12: February 16, 2018
-------------------------------------------------------------------------------

MakeDb:

+ Fixed a bug wherein specifying multiple simultaneous inputs would cause
  parsed pRESTO fields to be duplicated in the second and subsequent output
  files.

Version 0.3.11: February 6, 2018
-------------------------------------------------------------------------------

MakeDb:

+ Fixed junction inference for the igblast subcommand when the J region is
  truncated.

Version 0.3.10: February 6, 2018
-------------------------------------------------------------------------------

Fixed incorrect progress bars resulting from files containing empty lines.

DefineClones:

+ Fixed several bugs in the chen2010 and ademokun2011 methods that caused them
  to either fail or incorrectly cluster all sequences into a single clone.
+ Added an informative message for out of memory errors in the chen2010 and
  ademokun2011 methods.

Version 0.3.9: October 17, 2017
-------------------------------------------------------------------------------

DefineClones:

+ Fixed a bug causing DefineClones to fail when all sequences are removed from
  a group due to missing characters.

Version 0.3.8: October 5, 2017
-------------------------------------------------------------------------------

AlignRecords:

+ Resurrected AlignRecords, which performs multiple alignment of sequence
  fields.
+ Added the new subcommands ``across`` (multiple aligns within columns),
  ``within`` (multiple aligns columns within each row), and ``block``
  (multiple aligns across both columns and rows).

CreateGermlines:

+ Fixed a bug causing CreateGermlines to incorrectly fail records when using
  the argument ``--vf V_CALL_GENOTYPED``.

DefineClones:

+ Added the ``--maxmiss`` argument to the bygroup subcommand of DefineClones.
  It sets exclusion criteria for junction sequences with ambiguous and
  missing characters. By default, bygroup will now fail all sequences with any
  missing characters in the junction (``--maxmiss 0``).

Version 0.3.7: June 30, 2017
-------------------------------------------------------------------------------

MakeDb:

+ Fixed an incompatibility with IgBLAST v1.7.0.

CreateGermlines:

+ Fixed an error that occurred when using ``--cloned`` with an input file
  containing duplicate values in ``SEQUENCE_ID``, which caused some records to
  be discarded.

Version 0.3.6: June 13, 2017
-------------------------------------------------------------------------------

+ Fixed an overflow error on Windows that caused tools to fatally exit.
+ All tools will now print detailed help if no arguments are provided.

Version 0.3.5: May 12, 2017
-------------------------------------------------------------------------------

Fixed a bug wherein ``.tsv`` was not being recognized as a valid extension.

MakeDb:

+ Added the ``--cdr3`` argument to the igblast subcommand to extract the CDR3
  nucleotide and amino acid sequences defined by IgBLAST.
+ Updated the IMGT/HighV-QUEST parser to handle recent column name changes.
+ Fixed a bug in the igblast parser wherein some sequence identifiers were not
  being processed correctly.

DefineClones:

+ Changed the way ``X`` characters are handled in the amino acid Hamming
  distance model so that they count as a match against any character.

Version 0.3.4: February 14, 2017
-------------------------------------------------------------------------------

License changed to Creative Commons Attribution-ShareAlike 4.0 International
(CC BY-SA 4.0).

CreateGermlines:

+ Added ``GERMLINE_V_CALL``, ``GERMLINE_D_CALL`` and ``GERMLINE_J_CALL``
  columns to the output when the ``--cloned`` argument is specified. These
  columns contain the consensus annotations when clonal groups contain
  ambiguous gene assignments.
+ Fixed the error message for an invalid repo (``-r``) argument.

DefineClones:

+ Deprecated the ``m1n`` and ``hs1f`` distance models, renamed them to
  ``m1n_compat`` and ``hs1f_compat``, and replaced them with ``mk_rs1nf`` and
  ``hh_s1f``, respectively.
+ Renamed the ``hs5f`` distance model to ``hh_s5f``.
+ Added the mouse-specific distance model ``mk_rs5nf`` from Cui et al., 2016.

MakeDb:

+ Added compatibility for IgBLAST v1.6.
+ Added the flag ``--partial``, which tells MakeDb to pass incomplete
  alignment results.
+ Added missing console log entries for the ihmm subcommand.
+ The IMGT/HighV-QUEST, IgBLAST and iHMMune-Align parsers have been cleaned up,
  better documented and moved into the iterable classes
  ``changeo.Parsers.IMGTReader``, ``changeo.Parsers.IgBLASTReader``, and
  ``changeo.Parsers.IHMMuneReader``, respectively.
+ Corrected the behavior of the ``D_FRAME`` annotation from the
  ``--junction`` argument to the imgt subcommand. It now reports no value
  when no value is reported by IMGT, rather than reporting the reading frame
  as 0 in these cases.
+ Fixed parsing of the ``IN_FRAME``, ``STOP``, ``D_SEQ_START`` and
  ``D_SEQ_LENGTH`` fields from iHMMune-Align output.
+ Removed extraneous score fields from each parser.
+ Fixed the error message for an invalid repo (``-r``) argument.

Version 0.3.3: August 8, 2016
-------------------------------------------------------------------------------

Increased ``csv.field_size_limit`` in changeo.IO, ParseDb and DefineClones to
handle files with a larger number of UMIs in one field.

Renamed the fields ``N1_LENGTH`` to ``NP1_LENGTH`` and ``N2_LENGTH`` to
``NP2_LENGTH``.
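
The ``csv.field_size_limit`` change in v0.3.3 is needed because Python's ``csv`` module rejects any field longer than its limit, which is 131072 characters by default. The sketch below shows one common way to lift that limit. It assumes nothing about how ``changeo.IO`` actually does this. It requests ``sys.maxsize`` and backs off on ``OverflowError``, because CPython stores the limit in a C ``long``, which is only 32 bits on Windows. The helper ``raise_field_size_limit`` and the sample record are hypothetical.

```python
import csv
import io
import sys


def raise_field_size_limit():
    """Set csv.field_size_limit to the largest value the platform accepts.

    Hypothetical helper: starts at sys.maxsize and divides by 10 until the
    value fits in the C long used by the csv module (32 bits on Windows).
    Returns the limit that was finally set.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


# Hypothetical tab-delimited record whose UMI field (179999 characters)
# exceeds the default 131072-character field limit.
big_field = ",".join(["ACGTACGT"] * 20000)
data = "SEQUENCE_ID\tUMI\nseq1\t" + big_field + "\n"

raise_field_size_limit()
rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
print(len(rows[1][1]))  # prints 179999
```

Without the call to ``raise_field_size_limit()``, the same ``csv.reader`` loop raises ``csv.Error`` because the ``UMI`` field is longer than the default limit.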

CreateGermlines:

+ Added differentiation of the N and P regions to the ``REGION`` log field if
  the N/P region info is present in the input file (e.g., from the
  ``--junction`` argument to MakeDb-imgt). If the additional N/P region
  columns are not present, then both N and P regions will be denoted by N, as
  in previous versions.
+ Added the option 'regions' to the ``-g`` argument. It adds the
  ``GERMLINE_REGIONS`` field to the output, which represents the germline
  positions as V, D, J, N and P characters. This is equivalent to the
  ``REGION`` log entry.

DefineClones:

+ Significantly improved the performance of the ``--act set`` grouping method
  in the bygroup subcommand.

MakeDb:

+ Fixed a bug producing ``D_SEQ_START`` and ``J_SEQ_START`` relative to
  ``SEQUENCE_VDJ`` when they should be relative to ``SEQUENCE_INPUT``.
+ Added the argument ``--junction`` to the imgt subcommand to parse additional
  junction information fields, including N/P region lengths and the D-segment
  reading frame. This provides the following additional output fields:
  ``D_FRAME``, ``N1_LENGTH``, ``N2_LENGTH``, ``P3V_LENGTH``, ``P5D_LENGTH``,
  ``P3D_LENGTH``, ``P5J_LENGTH``.
+ The fields ``N1_LENGTH`` and ``N2_LENGTH`` have been renamed to accommodate
  additional output from IMGT under the ``--junction`` flag. The new names
  are ``NP1_LENGTH`` and ``NP2_LENGTH``.
+ Fixed a bug that caused the ``IN_FRAME``, ``MUTATED_INVARIANT`` and ``STOP``
  fields to be parsed incorrectly from IMGT data.
+ Output from iHMMuneAlign can now be parsed via the ``ihmm`` subcommand. Note
  that iHMMuneAlign does not return enough information to reliably
  reconstruct germline sequences from its output using CreateGermlines.

ParseDb:

+ Renamed the clip subcommand to baseline.

Version 0.3.2: March 8, 2016
-------------------------------------------------------------------------------

Fixed a bug with installation on Windows due to old file paths lingering in
changeo.egg-info/SOURCES.txt.

Updated license from CC BY-NC-SA 3.0 to CC BY-NC-SA 4.0.

CreateGermlines:

+ Fixed a bug producing incorrect values in the ``SEQUENCE`` field of the log
  file.

MakeDb:

+ Updated the igblast subcommand to correctly parse records with indels.
  IgBLAST must now be run with the argument ``-outfmt "7 std qseq sseq btop"``.
+ Changed the names of the FWR and CDR output columns added with
  ``--regions`` to ``_IMGT``.
+ Added ``V_BTOP`` and ``J_BTOP`` output when the ``--scores`` flag is
  specified to the igblast subcommand.

Version 0.3.1: December 18, 2015
-------------------------------------------------------------------------------

MakeDb:

+ Fixed a bug wherein the imgt subcommand was not properly recognizing an
  extracted folder as input to the ``-i`` argument.

Version 0.3.0: December 4, 2015
-------------------------------------------------------------------------------

Conversion to a proper Python package which uses pip and setuptools for
installation.

The package now requires Python 3.4. Python 2.7 is no longer supported.

The required dependency versions have been bumped to numpy 1.9, scipy 0.14,
pandas 0.16 and biopython 1.65.

DbCore:

+ Divided DbCore functionality into the separate modules: Defaults, Distance,
  IO, Multiprocessing and Receptor.

IgCore:

+ Removed IgCore in favor of a dependency on pRESTO >= 0.5.0.

AnalyzeAa:

+ This tool was removed. Its functionality has been migrated to the alakazam
  R package.

DefineClones:

+ Added the ``--sf`` flag to specify the sequence field used to calculate
  distance between sequences.
+ Fixed a bug wherein sequences with missing data in grouping columns were
  being assigned to a single group and clustered. Sequences with missing
  grouping variables will now be failed.
+ Fixed a bug where sequences with "None" junctions were grouped together.

GapRecords:

+ This tool was removed in favor of adding IMGT gapping support to the igblast
  subcommand of MakeDb.

MakeDb:

+ Updated the IgBLAST parser to create an IMGT-gapped sequence and infer the
  junction region as defined by IMGT.
+ Added the ``--regions`` flag, which adds extra columns containing FWR and
  CDR regions as defined by IMGT.
+ Added support to the imgt subcommand for the new IMGT/HighV-QUEST
  compression scheme (.txz files).

Version 0.2.5: August 25, 2015
-------------------------------------------------------------------------------

CreateGermlines:

+ Removed the default '-r' repository and added informative error messages
  when invalid germline repositories are provided.
+ Updated the '-r' flag to take a list of folders and/or FASTA files with
  germlines.

Version 0.2.4: August 19, 2015
-------------------------------------------------------------------------------

MakeDb:

+ Fixed a bug wherein N1 and N2 region indexing was off by one nucleotide for
  the igblast subcommand (leading to incorrect SEQUENCE_VDJ values).

ParseDb:

+ Fixed a bug wherein specifying the ``-f`` argument to the index subcommand
  would cause an error.

Version 0.2.3: July 22, 2015
-------------------------------------------------------------------------------

DefineClones:

+ Fixed a typo in the default normalization setting of the bygroup
  subcommand, which was being interpreted as 'none' rather than 'len'.
+ Changed the 'hs5f' model of the bygroup subcommand to be the centered
  -log10 of the targeting probability.
+ Added the ``--sym`` argument to the bygroup subcommand, which determines how
  asymmetric distances are handled.

Version 0.2.2: July 8, 2015
-------------------------------------------------------------------------------

CreateGermlines:

+ Germline creation now works for IgBLAST output parsed with MakeDb. The
  argument ``--sf SEQUENCE_VDJ`` must be provided to generate germlines from
  IgBLAST output. The same reference database used for the IgBLAST alignment
  must be specified with the ``-r`` flag.
+ Fixed a bug with determination of N1 and N2 region positions.

MakeDb:

+ Combined the ``-z`` and ``-f`` flags of the imgt subcommand into a single
  flag, ``-i``, which autodetects the input type.
+ Added the requirement that IgBLAST input be generated using the
  ``-outfmt "7 std qseq"`` argument to igblastn.
+ Modified the SEQUENCE_VDJ output from the IgBLAST parser to include gaps
  inserted during alignment.
+ Added a correction for IgBLAST alignments where V/D, D/J or V/J segments are
  assigned overlapping positions.
+ Corrected the N1_LENGTH and N2_LENGTH calculation from IgBLAST output.
+ Added the ``--scores`` flag, which adds extra columns containing alignment
  scores from IMGT and IgBLAST output.

Version 0.2.1: June 18, 2015
-------------------------------------------------------------------------------

DefineClones:

+ Removed the mouse 3-mer model, 'm3n'.

Version 0.2.0: June 17, 2015
-------------------------------------------------------------------------------

Initial public prerelease.

Output files were added to the usage documentation of all scripts.

General code cleanup.

DbCore:

+ Updated loading of database files to convert column names to uppercase.

AnalyzeAa:

+ Fixed a bug where junctions less than one codon long would lead to a
  division by zero error.
+ Added the ``--failed`` flag to create a database with records that fail
  analysis.
+ Added the ``--sf`` flag to specify the sequence field to be analyzed.

CreateGermlines:

+ Fixed a bug where germline sequences could not be created for light chains.

DefineClones:

+ Added a human 1-mer model, 'hs1f', which uses the substitution rates from
  Yaari et al., 2013.
+ Changed the default model to 'hs1f' and the default normalization to length
  for the bygroup subcommand.
+ Added the ``--link`` argument, which allows for specification of single,
  complete, or average linkage during clonal clustering (default single).

GapRecords:

+ Fixed a bug wherein non-standard sequence fields could not be aligned.

MakeDb:

+ Fixed a bug where the allele 'TRGVA*01' was not recognized as a valid allele.
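
The ``--link`` argument above chooses how DefineClones merges groups during hierarchical clustering, before the tree is cut at a distance threshold. The sketch below follows the same SciPy calls as ``formClusters`` in ``changeo/Distance.py``: ``squareform``, then ``linkage``, then ``fcluster`` with ``criterion='distance'``. The helper name ``form_clusters``, the three-sequence distance matrix and the 1.5 cutoff are hypothetical. They were chosen to show that the linkage method alone can change clone assignments.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def form_clusters(dists, link, cutoff):
    """Cut a hierarchical clustering of a square distance matrix at `cutoff`.

    Hypothetical helper mirroring the steps of formClusters in Distance.py.
    """
    condensed = squareform(dists)           # square matrix to condensed vector
    tree = linkage(condensed, method=link)  # 'single', 'complete' or 'average'
    return fcluster(tree, cutoff, criterion='distance')


# Hypothetical pairwise junction distances for three sequences A, B, C
dists = np.array([[0.0, 1.0, 2.2],
                  [1.0, 0.0, 1.2],
                  [2.2, 1.2, 0.0]])

for link in ('single', 'complete', 'average'):
    print(link, form_clusters(dists, link, cutoff=1.5))
```

With single linkage, C joins {A, B} at height 1.2 through its distance to B, so all three sequences share one cluster under the 1.5 cutoff. Complete and average linkage merge C at 2.2 and 1.7 respectively, so C stays in its own cluster. The label values themselves are arbitrary; only which sequences share a label matters.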

ParseDb:

+ Added the rename subcommand to ParseDb, which renames fields.

Version 0.2.0.beta-2015-05-31: May 31, 2015
-------------------------------------------------------------------------------

Minor changes to a few output file names and log field entries.

ParseDb:

+ Added the index subcommand to ParseDb, which adds a numeric index field.

Version 0.2.0.beta-2015-05-05: May 05, 2015
-------------------------------------------------------------------------------

Prerelease for review.

changeo-1.2.0/requirements.txt

numpy>=1.8
scipy>=0.14
pandas>=0.24
biopython>=1.77
PyYAML>=5.1
setuptools>=2.0
presto>=0.7.0
airr>=1.3.1

changeo-1.2.0/changeo/Distance.py

"""
Distance calculations
"""

# Info
__author__ = 'Jason Anthony Vander Heiden, Namita Gupta'

# Imports
import numpy as np
import pandas as pd
from itertools import combinations, product, zip_longest
from pkg_resources import resource_stream
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Presto and changeo imports
from presto.Sequence import scoreDNA, scoreAA


def zip_equal(*iterables):
    """
    Zips iterables and raises an exception if they have different lengths

    Arguments:
      iterables : pointer to iterables to zip together

    Returns:
      iter : A generator of tuples with combined elements from the iterables
    """
    for combo in zip_longest(*iterables):
        if None in combo:
            raise IndexError('Distance model requires sequences to have same length.')
        yield combo


def getDNADistMatrix(mat=None, mask_dist=0, gap_dist=0):
    """
    Generates a DNA distance matrix

    Arguments:
      mat : Input distance matrix to extend to full alphabet;
            if unspecified, creates Hamming distance matrix that incorporates
            IUPAC equivalencies
      mask_dist : Distance for all matches against an N character
      gap_dist : Distance
                 for all matches against a gap (-, .) character

    Returns:
      DataFrame : pandas.DataFrame of distances
    """
    IUPAC_chars = list('-.ACGTRYSWKMBDHVN')
    mask_char = 'N'

    # Default matrix to inf
    dist_mat = pd.DataFrame(float('inf'), index=IUPAC_chars, columns=IUPAC_chars, dtype=float)

    # Fill in provided distances from input matrix
    if mat is not None:
        for i, j in product(mat.index, mat.columns):
            dist_mat.at[i, j] = mat.at[i, j]
    # If no input matrix, create IUPAC-defined Hamming distance
    else:
        for i, j in product(dist_mat.index, dist_mat.columns):
            dist_mat.at[i, j] = 1 - scoreDNA(i, j)

    # Set gap distance
    for c in '-.':
        dist_mat.loc[c] = dist_mat.loc[:, c] = float(gap_dist)

    # Set mask distance
    dist_mat.loc[mask_char] = dist_mat.loc[:, mask_char] = float(mask_dist)

    return dist_mat


def getAADistMatrix(mat=None, mask_dist=0, gap_dist=0):
    """
    Generates an amino acid distance matrix

    Arguments:
      mat : Input distance matrix to extend to full alphabet;
            if unspecified, creates Hamming distance matrix that incorporates
            IUPAC equivalencies
      mask_dist : Distance for all matches against an X character
      gap_dist : Distance for all matches against a gap (-, .)
                 character

    Returns:
      DataFrame : pandas.DataFrame of distances
    """
    IUPAC_chars = list('-.*ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    mask_char = 'X'

    # Default matrix to inf
    dist_mat = pd.DataFrame(float('inf'), index=IUPAC_chars, columns=IUPAC_chars, dtype=float)

    # Fill in provided distances from input matrix
    if mat is not None:
        for i, j in product(mat.index, mat.columns):
            dist_mat.at[i, j] = mat.at[i, j]
    # If no input matrix, create IUPAC-defined Hamming distance
    else:
        for i, j in product(dist_mat.index, dist_mat.columns):
            dist_mat.at[i, j] = 1 - scoreAA(i, j)

    # Set gap distance
    for c in '-.':
        dist_mat.loc[c] = dist_mat.loc[:, c] = float(gap_dist)

    # Set mask distance
    dist_mat.loc[mask_char] = dist_mat.loc[:, mask_char] = float(mask_dist)

    return dist_mat


def getNmers(sequences, n):
    """
    Breaks input sequences down into n-mers

    Arguments:
      sequences : List of sequences to be broken into n-mers
      n : Length of n-mers to return

    Returns:
      dict : Dictionary mapping sequence to a list of n-mers
    """
    # Add Ns so first nucleotide is center of first n-mer
    sequences_n = ['N' * ((n - 1) // 2) + seq + 'N' * ((n - 1) // 2) for seq in sequences]
    nmers = {}
    for seq, seqn in zip(sequences, sequences_n):
        nmers[seq] = [seqn[i:i+n] for i in range(len(seqn) - n + 1)]
    # nmers = {(seq, [seqn[i:i+n] for i in range(len(seqn)-n+1)]) for seq,seqn in izip(sequences,sequences_n)}

    return nmers


def calcDistances(sequences, n, dist_mat, sym='avg', norm=None):
    """
    Calculate pairwise distances between input sequences

    Arguments:
      sequences : List of sequences for which to calculate pairwise distances
      n : Length of n-mers to be used in calculating distance
      dist_mat : pandas.DataFrame of mutation distances
      norm : Normalization method. One of None, 'len', or 'mut'.
      sym : Symmetry method; one of 'avg' or 'min'.

    Returns:
      ndarray : numpy matrix of pairwise distances between input sequences
    """
    # Initialize output distance matrix
    dists = np.zeros((len(sequences), len(sequences)))
    # Generate dictionary of n-mers from input sequences
    nmers = getNmers(sequences, n)
    # Iterate over combinations of input sequences
    for j, k in combinations(list(range(len(sequences))), 2):
        # Only consider characters and n-mers with mutations
        mutated = [i for i, (c1, c2) in enumerate(zip_equal(sequences[j], sequences[k])) if c1 != c2]
        seq1 = [sequences[j][i] for i in mutated]
        seq2 = [sequences[k][i] for i in mutated]
        nmer1 = [nmers[sequences[j]][i] for i in mutated]
        nmer2 = [nmers[sequences[k]][i] for i in mutated]

        # Determine normalizing factor
        if norm == 'len':
            norm_by = len(sequences[0])
        elif norm == 'mut':
            norm_by = len(mutated)
        else:
            norm_by = 1

        # Determine symmetry function
        if sym == 'avg':
            sym_fun = np.mean
        elif sym == 'min':
            sym_fun = min
        else:
            sym_fun = sum

        # Calculate distances
        try:
            dists[j, k] = dists[k, j] = \
                sum([sym_fun([dist_mat.at[c1, n2], dist_mat.at[c2, n1]])
                     for c1, c2, n1, n2 in zip(seq1, seq2, nmer1, nmer2)]) / norm_by
        except KeyError:
            raise KeyError('Unrecognized character in sequence.')

    return dists


def formClusters(dists, link, distance):
    """
    Form clusters based on hierarchical clustering of input distance matrix
    with linkage type and cutoff distance

    Arguments:
      dists : numpy matrix of distances
      link : Linkage type for hierarchical clustering
      distance : Distance at which to cut into clusters

    Returns:
      list : List of cluster assignments
    """
    # Convert the square distance matrix to condensed form for linkage
    dists = squareform(dists)
    # Compute linkage
    links = linkage(dists, link)
    # Break into clusters based on cutoff
    clusters = fcluster(links, distance, criterion='distance')

    return clusters


# TODO: This should all probably be a class
# Amino acid Hamming distance
aa_model = getAADistMatrix(mask_dist=0, gap_dist=0)

# DNA Hamming distance
ham_model = getDNADistMatrix(mask_dist=0, gap_dist=0)

# Load model data
with resource_stream(__name__, 'data/hh_s1f_dist.tsv') as f:
    hh_s1f_model = pd.read_csv(f, sep='\t', index_col=0)
with resource_stream(__name__, 'data/hh_s5f_dist.tsv') as f:
    hh_s5f_model = pd.read_csv(f, sep='\t', index_col=0)
with resource_stream(__name__, 'data/mk_rs1nf_dist.tsv') as f:
    mk_rs1nf_model = pd.read_csv(f, sep='\t', index_col=0)
with resource_stream(__name__, 'data/mk_rs5nf_dist.tsv') as f:
    mk_rs5nf_model = pd.read_csv(f, sep='\t', index_col=0)
with resource_stream(__name__, 'data/m1n_compat_dist.tsv') as f:
    m1n_compat_model = pd.read_csv(f, sep='\t', index_col=0)
with resource_stream(__name__, 'data/hs1f_compat_dist.tsv') as f:
    hs1f_compat_model = pd.read_csv(f, sep='\t', index_col=0)

distance_models = {'ham': ham_model,
                   'aa': aa_model,
                   'hh_s1f': hh_s1f_model,
                   'hh_s5f': hh_s5f_model,
                   'mk_rs1nf': mk_rs1nf_model,
                   'mk_rs5nf': mk_rs5nf_model,
                   'm1n_compat': m1n_compat_model,
                   'hs1f_compat': hs1f_compat_model}

changeo-1.2.0/changeo/Alignment.py

"""
Alignment manipulation
"""

# Info
__author__ = 'Jason Anthony Vander Heiden'

# Imports
import re
from Bio.Seq import Seq

# Presto and changeo imports
from changeo.Gene import getVAllele, getJAllele

# Load regions
# import yaml
# from pkg_resources import resource_stream
# with resource_stream(__name__, 'data/regions.yaml') as f:
#     imgt_regions = yaml.load(f, Loader=yaml.FullLoader)
imgt_regions = {'default': {'fwr1': 1, 'cdr1': 27, 'fwr2': 39, 'cdr2': 56, 'fwr3': 66, 'cdr3': 105},
                'rhesus-igl': {'fwr1': 1, 'cdr1': 28, 'fwr2': 40, 'cdr2': 59, 'fwr3': 69, 'cdr3': 108}}


class RegionDefinition:
    """
    FWR and CDR region boundary definitions
    """
    def __init__(self, junction_length, amino_acid=False, definition='default'):
        """
        Initializer

        Arguments:
          junction_length (int): length of the junction region. If None then CDR3 end and
                                 FWR4 start/end are undefined.
          definition (str): region definition entry in imgt_regions to use
                            (e.g., 'default' or 'rhesus-igl').
          amino_acid (bool): if True define boundaries in amino acid space,
                             otherwise use nucleotide positions.

        Returns:
          changeo.Alignment.RegionDefinition
        """
        self.junction_length = junction_length
        self.amino_acid = amino_acid
        self.definition = definition
        pos_mod = 1 if amino_acid else 3

        # Define regions
        regions = {k: (int(v) - 1) * pos_mod for k, v in imgt_regions[definition].items()}

        # Assign positions
        if junction_length is not None:
            fwr4_start = max(regions['cdr3'], regions['cdr3'] - (2 * pos_mod) + junction_length)
            junction_end = fwr4_start + (1 * pos_mod)
        else:
            fwr4_start = None
            junction_end = None

        self.positions = {'fwr1': [regions['fwr1'], regions['cdr1']],
                          'cdr1': [regions['cdr1'], regions['fwr2']],
                          'fwr2': [regions['fwr2'], regions['cdr2']],
                          'cdr2': [regions['cdr2'], regions['fwr3']],
                          'fwr3': [regions['fwr3'], regions['cdr3']],
                          'cdr3': [regions['cdr3'], fwr4_start],
                          'fwr4': [fwr4_start, None],
                          'junction': [regions['cdr3'] - (1 * pos_mod), junction_end]}

    def getRegions(self, seq):
        """
        Return IMGT defined FWR and CDR regions

        Arguments:
          seq : IMGT-gapped sequence.

        Returns:
          dict : dictionary of FWR and CDR sequences.
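
        Example:
          A standalone sketch, not part of this module, that re-derives the
          nucleotide-space boundaries this method slices by for the 'default'
          definition; imgt_default and region_positions are hypothetical names
          that mirror the arithmetic in __init__.

```python
# Copy of the 'default' entry in imgt_regions (1-based IMGT codon numbers)
imgt_default = {'fwr1': 1, 'cdr1': 27, 'fwr2': 39, 'cdr2': 56, 'fwr3': 66, 'cdr3': 105}


def region_positions(junction_length, pos_mod=3):
    # Convert 1-based IMGT codon numbers to 0-based nucleotide offsets
    r = {k: (v - 1) * pos_mod for k, v in imgt_default.items()}
    # CDR3 ends one codon before the end of the junction
    fwr4_start = max(r['cdr3'], r['cdr3'] - 2 * pos_mod + junction_length)
    return {'fwr1': [r['fwr1'], r['cdr1']],
            'cdr1': [r['cdr1'], r['fwr2']],
            'fwr2': [r['fwr2'], r['cdr2']],
            'cdr2': [r['cdr2'], r['fwr3']],
            'fwr3': [r['fwr3'], r['cdr3']],
            'cdr3': [r['cdr3'], fwr4_start],
            'fwr4': [fwr4_start, None],
            'junction': [r['cdr3'] - pos_mod, fwr4_start + pos_mod]}


positions = region_positions(junction_length=45)
print(positions['cdr3'], positions['junction'])  # [312, 351] [309, 354]
```

          With a 45-nt junction, the junction spans offsets 309 to 354, and CDR3
          is that span minus one codon on each side.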
""" regions = {'fwr1_imgt': None, 'fwr2_imgt': None, 'fwr3_imgt': None, 'fwr4_imgt': None, 'cdr1_imgt': None, 'cdr2_imgt': None, 'cdr3_imgt': None} try: seq_len = len(seq) regions['fwr1_imgt'] = seq[self.positions['fwr1'][0]:min(self.positions['fwr1'][1], seq_len)] except (KeyError, IndexError, TypeError): return regions try: regions['cdr1_imgt'] = seq[self.positions['cdr1'][0]:min(self.positions['cdr1'][1], seq_len)] except (IndexError): return regions try: regions['fwr2_imgt'] = seq[self.positions['fwr2'][0]:min(self.positions['fwr2'][1], seq_len)] except (IndexError): return regions try: regions['cdr2_imgt'] = seq[self.positions['cdr2'][0]:min(self.positions['cdr2'][1], seq_len)] except (IndexError): return regions try: regions['fwr3_imgt'] = seq[self.positions['fwr3'][0]:min(self.positions['fwr3'][1], seq_len)] except (IndexError): return regions try: regions['cdr3_imgt'] = seq[self.positions['cdr3'][0]:min(self.positions['cdr3'][1], seq_len)] regions['fwr4_imgt'] = seq[self.positions['fwr4'][0]:] except (KeyError, IndexError, TypeError): return regions return regions def decodeBTOP(btop): """ Parse a BTOP string into a list of tuples in CIGAR annotation. Arguments: btop : BTOP string. Returns: list : tuples of (operation, length) for each operation in the BTOP string using CIGAR annotation. """ # Determine chunk type and length def _recode(m): if m.isdigit(): return ('=', int(m)) elif m[0] == '-': return ('I', len(m) // 2) elif m[1] == '-': return ('D', len(m) // 2) else: return ('X', len(m) // 2) # Split BTOP string into sections btop_split = re.sub(r'(\d+|[-A-Z]{2})', r'\1;', btop) # Parse each chunk of encoding matches = re.finditer(r'(\d+)|([A-Z]{2};)+|(-[A-Z];)+|([A-Z]-;)+', btop_split) return [_recode(m.group().replace(';', '')) for m in matches] def decodeCIGAR(cigar): """ Parse a CIGAR string into a list of tuples. Arguments: cigar : CIGAR string. Returns: list : tuples of (operation, length) for each operation in the CIGAR string. 
""" matches = re.findall(r'(\d+)([A-Z])', cigar) return [(m[1], int(m[0])) for m in matches] def encodeCIGAR(alignment): """ Encodes a list of tuple with alignment information into a CIGAR string. Arguments: tuple : tuples of (type, length) for each alignment operation. Returns: str : CIGAR string. """ return ''.join(['%i%s' % (x, s) for s, x in alignment]) def padAlignment(alignment, q_start, r_start): """ Pads the start of an alignment based on query and reference positions. Arguments: alignment : tuples of (operation, length) for each alignment operation. q_start : query (input) start position (0-based) r_start : reference (subject) start position (0-based) Returns: list : updated list of tuples of (operation, length) for the alignment. """ # Copy list to avoid weirdness result = alignment[:] # Add query deletions if result [0][0] == 'S': result[0] = ('S', result[0][1] + q_start) elif q_start > 0: result.insert(0, ('S', q_start)) # Add reference padding if present if result[0][0] == 'N': result[0] = ('N', result[0][1] + r_start) elif result [0][0] == 'S' and result[1][0] == 'N': result[1] = ('N', result[1][1] + r_start) elif result[0][0] == 'S' and r_start > 0: result.insert(1, ('N', r_start)) elif r_start > 0: result.insert(0, ('N', r_start)) return result def alignmentPositions(alignment): """ Extracts start position and length from an alignment Arguments: alignment : tuples of (operation, length) for each alignment operation. Returns: dict : query (q) and reference (r) start (0-based) and length information with keys {q_start, q_length, r_start, r_length}. 
""" # Return object result = {'q_start': 0, 'q_length': 0, 'r_start': 0, 'r_length': 0} # Query start if alignment[0][0] == 'S': result['q_start'] = alignment[0][1] # Reference start if alignment[0][0] == 'N': result['r_start'] = alignment[0][1] elif alignment[0][0] == 'S' and alignment[1][0] == 'N': result['r_start'] = alignment[1][1] # Reference length for x, i in alignment: if x in ('M', '=', 'X'): result['r_length'] += i result['q_length'] += i elif x == 'D': result['r_length'] += i elif x == 'I': result['q_length'] += i return result def gapV(seq, v_germ_start, v_germ_length, v_call, references, asis_calls=False): """ Construction IMGT-gapped V segment sequences. Arguments: seq (str): V(D)J sequence alignment (SEQUENCE_VDJ). v_germ_start (int): start position V segment alignment in the germline (V_GERM_START_VDJ, 1-based). v_germ_length (int): length of the V segment alignment against the germline (V_GERM_LENGTH_VDJ, 1-based). v_call (str): V segment allele assignment (V_CALL). references (dict): dictionary of IMGT-gapped reference sequences. asis_calls (bool): if True do not parse v_call for allele names and just split by comma. Returns: dict: dictionary containing IMGT-gapped query sequences and germline positions. Raises: KeyError: raised if the v_call is not found in the reference dictionary. """ # Initialize return object imgt_dict = {'sequence_imgt': None, 'v_germ_start_imgt': None, 'v_germ_length_imgt': None} # Initialize imgt gapped sequence seq_imgt = '.' 
* (int(v_germ_start) - 1) + seq # Extract first V call if not asis_calls: vgene = getVAllele(v_call, action='first') else: vgene = v_call.split(',')[0] # Find gapped germline V segment try: #if vgene in references: vgap = references[vgene] # Iterate over gaps in the germline segment gaps = re.finditer(r'\.', vgap) gapcount = int(v_germ_start) - 1 for gap in gaps: i = gap.start() # Break if gap begins after V region if i >= v_germ_length + gapcount: break # Insert gap into IMGT sequence seq_imgt = seq_imgt[:i] + '.' + seq_imgt[i:] # Update gap counter gapcount += 1 imgt_dict['sequence_imgt'] = seq_imgt # Update IMGT positioning information for V imgt_dict['v_germ_start_imgt'] = 1 imgt_dict['v_germ_length_imgt'] = v_germ_length + gapcount except KeyError as e: raise KeyError('%s was not found in the germline repository.' % vgene) #else: # printWarning('%s was not found in the germline repository. IMGT-gapped sequence cannot be determined.' % vgene) return imgt_dict def inferJunction(seq, j_germ_start, j_germ_length, j_call, references, asis_calls=False, regions='default'): """ Identify junction region by IMGT definition. Arguments: seq (str): IMGT-gapped V(D)J sequence alignment (SEQUENCE_IMGT). j_germ_start (int): start position J segment alignment in the germline (J_GERM_START, 1-based). j_germ_length (int): length of the J segment alignment against the germline (J_GERM_LENGTH). j_call (str): J segment allele assignment (J_CALL). references (dict): dictionary of IMGT-gapped reference sequences. asis_calls (bool): if True do not parse V_CALL for allele names and just split by comma. regions (str): name of the IMGT FWR/CDR region definitions to use. Returns: dict : dictionary containing junction sequence, translation and length. 
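
    Example:
      A standalone sketch, not part of this module, of the J motif search and
      junction-end arithmetic used below. The germline sequence is a hypothetical
      IGHJ4-like segment, the codon table covers only the four codons needed, and
      the alignment coordinates are assumed.

```python
import re

# Hypothetical germline J segment (illustrative only)
jgerm = 'ACTACTTTGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTCCTCAG'

# Same (F|W)GXG nucleotide motif that inferJunction searches for
motif = re.search('T(TT|TC|GG)GG[ACGT]{4}GG[AGCT]', jgerm)

# Translate the four motif codons with a minimal codon table
codons = {'TGG': 'W', 'GGC': 'G', 'CAG': 'Q', 'GGA': 'G'}
motif_aa = ''.join(codons[motif.group()[i:i + 3]] for i in range(0, 12, 3))
print(motif.start(), motif.group(), motif_aa)  # 14 TGGGGCCAGGGA WGQG

# Junction end, assuming the J alignment covers the whole segment from
# germline position 1 and ends at the end of a hypothetical 400-nt sequence
seq_len, j_germ_start, j_germ_length = 400, 1, len(jgerm)
j_start = seq_len - j_germ_length
motif_pos = max(motif.start() - j_germ_start + 1, -1)
junc_end = j_start + motif_pos + 3
print(junc_end)  # 369, just past the W codon
```

      The "+ 3" keeps the conserved W/F codon inside the junction.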
""" junc_dict = {'junction': None, 'junction_aa': None, 'junction_length': None} # Find germline J segment if not asis_calls: jgene = getJAllele(j_call, action='first') else: jgene = j_call.split(',')[0] jgerm = references.get(jgene, None) if jgerm is not None: # Look for (F|W)GXG amino acid motif in germline nucleotide sequence motif = re.search(r'T(TT|TC|GG)GG[ACGT]{4}GG[AGCT]', jgerm) # Define junction end position seq_len = len(seq) if motif: j_start = seq_len - j_germ_length motif_pos = max(motif.start() - j_germ_start + 1, -1) junc_end = j_start + motif_pos + 3 else: junc_end = seq_len # Extract junction rd = RegionDefinition(None, amino_acid=False, definition=regions) junc_start = rd.positions['junction'][0] junc_dict['junction'] = seq[junc_start:junc_end] junc_len = len(junc_dict['junction']) junc_dict['junction_length'] = junc_len # Translation junc_tmp = junc_dict['junction'].replace('-', 'N').replace('.', 'N') if junc_len % 3 > 0: junc_tmp = junc_tmp[:junc_len - junc_len % 3] junc_dict['junction_aa'] = str(Seq(junc_tmp).translate()) return junc_dict def getRegions(seq, junction_length): """ Identify FWR and CDR regions by IMGT definition. Arguments: seq : IMGT-gapped sequence. junction_length : length of the junction region in nucleotides. Returns: dict : dictionary of FWR and CDR sequences. 
""" region_dict = {'fwr1_imgt': None, 'fwr2_imgt': None, 'fwr3_imgt': None, 'fwr4_imgt': None, 'cdr1_imgt': None, 'cdr2_imgt': None, 'cdr3_imgt': None} try: seq_len = len(seq) region_dict['fwr1_imgt'] = seq[0:min(78, seq_len)] except (KeyError, IndexError, TypeError): return region_dict try: region_dict['cdr1_imgt'] = seq[78:min(114, seq_len)] except (IndexError): return region_dict try: region_dict['fwr2_imgt'] = seq[114:min(165, seq_len)] except (IndexError): return region_dict try: region_dict['cdr2_imgt'] = seq[165:min(195, seq_len)] except (IndexError): return region_dict try: region_dict['fwr3_imgt'] = seq[195:min(312, seq_len)] except (IndexError): return region_dict try: # CDR3 cdr3_end = 306 + junction_length region_dict['cdr3_imgt'] = seq[312:cdr3_end] # FWR4 region_dict['fwr4_imgt'] = seq[cdr3_end:] except (KeyError, IndexError, TypeError): return region_dict return region_dict changeo-1.2.0/changeo/Applications.py0000644000175000017500000002224313674203454017063 0ustar nileshnilesh""" Application wrappers """ # Info __author__ = 'Jason Anthony Vander Heiden' # Imports import os import re from subprocess import check_output, STDOUT, CalledProcessError # Presto and changeo imports from presto.IO import printError, printWarning from changeo.Defaults import default_igblastn_exec, default_igblastp_exec, default_tbl2asn_exec, \ default_igphyml_exec # Defaults default_igblast_output = 'legacy' def runASN(fasta, template=None, exec=default_tbl2asn_exec): """ Executes tbl2asn to generate Sequin files Arguments: fasta (str): fsa file name. template (str): sbt file name. exec (str): the name or path to the tbl2asn executable. Returns: str: tbl2asn console output. 
""" # Basic command that requires .fsa and .tbl files in the same directory # tbl2asn -i records.fsa -a s -V vb -t template.sbt # Define tbl2asn command cmd = [exec, '-i', os.path.abspath(fasta), '-a', 's', '-V', 'vb'] if template is not None: cmd.extend(['-t', os.path.abspath(template)]) # Execute tbl2asn try: stdout_str = check_output(cmd, stderr=STDOUT, shell=False, universal_newlines=True) except CalledProcessError as e: printError('Running command: %s\n%s' % (' '.join(cmd), e.output)) if 'Unable to read any FASTA records' in stdout_str: printError('%s failed: %s' % (' '.join(cmd), stdout_str)) return stdout_str def runIgPhyML(rep_file, rep_dir, model='HLP17', motifs='FCH', threads=1, exec=default_igphyml_exec): """ Run IgPhyML Arguments: rep_file (str): repertoire tsv file. rep_dir (str): directory containing input fasta files. model (str): model to use. motif (str): motifs argument. threads : number of threads. exec : the path to the IgPhyMl executable. Returns: str: name of the output tree file. """ # cd rep_dir # igphyml --repfile rep_file -m HLP17 --motifs FCH --omegaOpt e,e --run_id test -o tlr --threads 4 --minSeq 2 # Define igphyml command cmd = [exec, '--repfile', rep_file, '-m', model, '--motifs', motifs, '--omegaOpt', 'e,e', '-o', 'tlr', '--minSeq', '2', '--threads', str(threads)] # Run IgPhyMl try: stdout_str = check_output(cmd, stderr=STDOUT, shell=False, universal_newlines=True, cwd=rep_dir) except CalledProcessError as e: printError('Running command: %s\n%s' % (' '.join(cmd), e.output)) return None def runIgBLASTN(fasta, igdata, loci='ig', organism='human', vdb=None, ddb=None, jdb=None, output=None, format=default_igblast_output, threads=1, exec=default_igblastn_exec): """ Runs igblastn on a sequence file Arguments: fasta (str): fasta file containing sequences. igdata (str): path to the IgBLAST database directory (IGDATA environment). loci (str): receptor type; one of 'ig' or 'tr'. organism (str): species name. 
      vdb (str): name of a custom V reference in the database folder to use.
      ddb (str): name of a custom D reference in the database folder to use.
      jdb (str): name of a custom J reference in the database folder to use.
      output (str): output file name. If None, automatically generate from the fasta file name.
      format (str): output format. One of 'blast' or 'airr'.
      threads (int): number of threads for igblastn.
      exec (str): the name or path to the igblastn executable.

    Returns:
      str: IgBLAST console output.
    """
    # export IGDATA
    # declare -A SEQTYPE
    # SEQTYPE[ig]="Ig"
    # SEQTYPE[tr]="TCR"
    # GERMLINE_V="imgt_${SPECIES}_${RECEPTOR}_v"
    # GERMLINE_D="imgt_${SPECIES}_${RECEPTOR}_d"
    # GERMLINE_J="imgt_${SPECIES}_${RECEPTOR}_j"
    # AUXILIARY="${SPECIES}_gl.aux"
    # IGBLAST_DB="${IGDATA}/database"
    # IGBLAST_CMD="igblastn \
    #     -germline_db_V ${IGBLAST_DB}/${GERMLINE_V} \
    #     -germline_db_D ${IGBLAST_DB}/${GERMLINE_D} \
    #     -germline_db_J ${IGBLAST_DB}/${GERMLINE_J} \
    #     -auxiliary_data ${IGDATA}/optional_file/${AUXILIARY} \
    #     -ig_seqtype ${SEQTYPE[${RECEPTOR}]} -organism ${SPECIES} \
    #     -domain_system imgt -outfmt '7 std qseq sseq btop'"
    #
    # # Set run command
    # OUTFILE=$(basename ${READFILE})
    # OUTFILE="${OUTDIR}/${OUTFILE%.fasta}.fmt7"
    # IGBLAST_VER=$(${IGBLAST_CMD} -version | grep 'Package' | sed s/'Package: '//)
    # IGBLAST_RUN="${IGBLAST_CMD} -query ${READFILE} -out ${OUTFILE} -num_threads ${NPROC}"
    try:
        outfmt = {'blast': '7 std qseq sseq btop', 'airr': '19'}[format]
    except KeyError:
        printError('Invalid output format %s.' % format)

    try:
        seqtype = {'ig': 'Ig', 'tr': 'TCR'}[loci]
    except KeyError:
        printError('Invalid receptor type %s.'
                   % loci)

    # Set auxiliary data
    auxilary = os.path.join(igdata, 'optional_file', '%s_gl.aux' % organism)

    # Set V database
    if vdb is not None:
        v_germ = os.path.join(igdata, 'database', vdb)
    else:
        v_germ = os.path.join(igdata, 'database', 'imgt_%s_%s_v' % (organism, loci))

    # Set D database
    if ddb is not None:
        d_germ = os.path.join(igdata, 'database', ddb)
    else:
        d_germ = os.path.join(igdata, 'database', 'imgt_%s_%s_d' % (organism, loci))

    # Set J database
    if jdb is not None:
        j_germ = os.path.join(igdata, 'database', jdb)
    else:
        j_germ = os.path.join(igdata, 'database', 'imgt_%s_%s_j' % (organism, loci))

    # Define IgBLAST command
    cmd = [exec,
           '-query', os.path.abspath(fasta),
           '-out', os.path.abspath(output),
           '-num_threads', str(threads),
           '-ig_seqtype', seqtype,
           '-organism', organism,
           '-auxiliary_data', str(auxilary),
           '-germline_db_V', str(v_germ),
           '-germline_db_D', str(d_germ),
           '-germline_db_J', str(j_germ),
           '-outfmt', outfmt,
           '-domain_system', 'imgt']

    # Execute IgBLAST
    env = os.environ.copy()
    env['IGDATA'] = igdata
    try:
        stdout_str = check_output(cmd, stderr=STDOUT, shell=False, env=env,
                                  universal_newlines=True)
    except CalledProcessError as e:
        printError('Running command: %s\n%s' % (' '.join(cmd), e.output))

    #if 'Unable to read any FASTA records' in stdout_str:
    #    sys.stderr.write('\n%s failed: %s\n' % (' '.join(cmd), stdout_str))

    return stdout_str


def runIgBLASTP(fasta, igdata, loci='ig', organism='human', vdb=None, output=None,
                threads=1, exec=default_igblastp_exec):
    """
    Runs igblastp on a sequence file

    Arguments:
      fasta (str): fasta file containing sequences.
      igdata (str): path to the IgBLAST database directory (IGDATA environment).
      loci (str): receptor type; one of 'ig' or 'tr'.
      organism (str): species name.
      vdb (str): name of a custom V reference in the database folder to use.
      output (str): output file name. If None, automatically generate from the fasta file name.
      threads (int): number of threads for igblastp.
      exec (str): the name or path to the igblastp executable.

    Returns:
      str: IgBLAST console output.
    """
    # IGBLAST_CMD="igblastp \
    #     -germline_db_V ${IGBLAST_DB}/imgt_aa_${SPECIES}_${RECEPTOR}_v \
    #     -ig_seqtype ${SEQTYPE} -organism ${SPECIES} \
    #     -domain_system imgt -outfmt '7 std qseq sseq btop'"
    try:
        seqtype = {'ig': 'Ig', 'tr': 'TCR'}[loci]
    except KeyError:
        printError('Invalid receptor type %s.' % loci)

    # Set V database
    if vdb is not None:
        v_germ = os.path.join(igdata, 'database', vdb)
    else:
        v_germ = os.path.join(igdata, 'database', 'imgt_aa_%s_%s_v' % (organism, loci))

    # Define IgBLAST command
    cmd = [exec,
           '-query', os.path.abspath(fasta),
           '-out', os.path.abspath(output),
           '-num_threads', str(threads),
           '-ig_seqtype', seqtype,
           '-organism', organism,
           '-germline_db_V', str(v_germ),
           '-outfmt', '7 std qseq sseq btop',
           '-domain_system', 'imgt']

    # Execute IgBLAST
    env = os.environ.copy()
    env['IGDATA'] = igdata
    try:
        stdout_str = check_output(cmd, stderr=STDOUT, shell=False, env=env,
                                  universal_newlines=True)
    except CalledProcessError as e:
        printError('Running command: %s\n%s' % (' '.join(cmd), e.output))

    return stdout_str


def getIgBLASTVersion(exec=default_igblastn_exec):
    """
    Gets the version of the IgBLAST executable

    Arguments:
      exec (str): the name or path to the igblastn executable.

    Returns:
      str: version number.
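
    Example:
      A standalone sketch, not part of this module, of the version extraction
      applied to a made-up console string (real igblastn output varies between
      releases). It writes [0-9] and [.] in place of escaped classes and guards
      against a missing match, which the function itself does not do.

```python
import re

# Hypothetical 'igblastn -version' output
stdout_str = 'igblastn: 1.17.1 Package: igblast 1.17.1, build Jan 25 2021 12:00:00'

# Same fixed-width lookbehind as getIgBLASTVersion
pattern = '(?<=Package: igblast )([0-9]+[.][0-9]+[.][0-9]+)'
match = re.search(pattern, stdout_str)
version = match.group(0) if match else None
print(version)  # 1.17.1
```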
""" # Build commandline cmd = [exec, '-version'] # Run try: stdout_str = check_output(cmd, stderr=STDOUT, shell=False, universal_newlines=True) except CalledProcessError as e: printError('Running command: %s\n%s' % (' '.join(cmd), e.output)) # Extract version number match = re.search('(?<=Package: igblast )(\d+\.\d+\.\d+)', stdout_str) version = match.group(0) return versionchangeo-1.2.0/changeo/Multiprocessing.py0000644000175000017500000002553113674203454017627 0ustar nileshnilesh""" Multiprocessing """ # Info __author__ = 'Jason Anthony Vander Heiden' # Imports import os import sys from collections import OrderedDict from time import time # Presto and changeo imports from presto.IO import printProgress, printLog, printError, printWarning from changeo.Defaults import default_out_args from changeo.IO import countDbFile, getOutputHandle, AIRRReader, AIRRWriter from changeo.Receptor import Receptor class DbData: """ A class defining data objects for worker processes Attributes: id : result identifier data : list of data records valid : True if preprocessing was successfull and data should be processed """ # Instantiation def __init__(self, key, records): self.id = key self.data = records self.valid = (key is not None and records is not None) # Boolean evaluation def __bool__(self): return self.valid # Length evaluation def __len__(self): if isinstance(self.data, Receptor): return 1 elif self.data is None: return 0 else: return len(self.data) class DbResult: """ A class defining result objects for collector processes Attributes: id : result identifier data : list of original data records results: list of processed records data_pass: list of records that pass filtering for workers that split data before processing data_fail: list of records that failed filtering for workers that split data before processing valid : True if processing was successful and results should be written log : OrderedDict of log items """ # Instantiation def __init__(self, key, records): self.id 
= key self.data = records self.results = None self.data_pass = records self.data_fail = None self.valid = False self.log = OrderedDict([('ID', key)]) # Boolean evaluation def __bool__(self): return self.valid # Length evaluation def __len__(self): if isinstance(self.results, Receptor): return 1 elif self.results is None: return 0 else: return len(self.results) # Set data_count to number of data records @property def data_count(self): if isinstance(self.data, Receptor): return 1 elif self.data is None: return 0 else: return len(self.data) def feedDbQueue(alive, data_queue, db_file, reader=AIRRReader, group_func=None, group_args={}): """ Feeds the data queue with Ig records Arguments: alive : multiprocessing.Value boolean controlling whether processing continues if False exit process data_queue : multiprocessing.Queue to hold data for processing db_file : database file reader : database reader class group_func : function to use for grouping records group_args : dictionary of arguments to pass to group_func Returns: None """ # Open input file and perform grouping try: # Iterate over records and assign groups db_handle = open(db_file, 'rt') db_iter = reader(db_handle) if group_func is not None: # import cProfile # prof = cProfile.Profile() # group_dict = prof.runcall(group_func, db_iter, **group_args) # prof.dump_stats('feed-%d.prof' % os.getpid()) group_dict = group_func(db_iter, **group_args) group_iter = iter(group_dict.items()) else: group_iter = ((r.sequence_id, r) for r in db_iter) except: alive.value = False raise # Add groups to data queue try: # Iterate over groups and feed data queue while alive.value: # Get data from queue if data_queue.full(): continue else: data = next(group_iter, None) # Exit upon reaching end of iterator if data is None: break # Feed queue data_queue.put(DbData(*data)) else: sys.stderr.write('PID %s> Error in sibling process detected. 
Cleaning up.\n' \ % os.getpid()) return None except: #sys.stderr.write('Exception in feeder queue feeding step\n') alive.value = False raise return None def processDbQueue(alive, data_queue, result_queue, process_func, process_args={}, filter_func=None, filter_args={}): """ Pulls from data queue, performs calculations, and feeds results queue Arguments: alive : multiprocessing.Value boolean controlling whether processing continues; when False function returns data_queue : multiprocessing.Queue holding data to process result_queue : multiprocessing.Queue to hold processed results process_func : function to use for processing sequences process_args : dictionary of arguments to pass to process_func filter_func : function to use for filtering sequences before processing filter_args : dictionary of arguments to pass to filter_func Returns: None """ try: # Iterator over data queue until sentinel object reached while alive.value: # Get data from queue if data_queue.empty(): continue else: data = data_queue.get() # Exit upon reaching sentinel if data is None: break # Perform work if filter_func is None: result = process_func(data, **process_args) else: result = filter_func(data, **filter_args) result = process_func(result, **process_args) # Feed results to result queue result_queue.put(result) else: sys.stderr.write('PID %s> Error in sibling process detected. Cleaning up.\n' \ % os.getpid()) return None except: alive.value = False printError('Processing data with ID: %s.' % str(data.id), exit=False) raise return None def collectDbQueue(alive, result_queue, collect_queue, db_file, label, fields, writer=AIRRWriter, out_file=None, out_args=default_out_args): """ Pulls from results queue, assembles results and manages log and file IO Arguments: alive : multiprocessing.Value boolean controlling whether processing continues; when False function returns. result_queue : multiprocessing.Queue holding worker results. 
      collect_queue : multiprocessing.Queue to store collector return values.
      db_file : database file name.
      label : task label used to tag the output files.
      fields : list of output fields.
      writer : writer class.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      None: Adds a dictionary with key value pairs to collect_queue containing
            'log' defining a log object along with the 'pass' and 'fail' output file names.
    """
    # Wrapper for opening handles and writers
    def _open(x, f, writer=writer, label=label, out_file=out_file):
        if out_file is not None and x == 'pass':
            handle = open(out_file, 'w')
        else:
            handle = getOutputHandle(db_file,
                                     out_label='%s-%s' % (label, x),
                                     out_dir=out_args['out_dir'],
                                     out_name=out_args['out_name'],
                                     out_type=out_args['out_type'])
        return handle, writer(handle, fields=f)

    try:
        # Count input
        result_count = countDbFile(db_file)

        # Define log handle
        if out_args['log_file'] is None:
            log_handle = None
        else:
            log_handle = open(out_args['log_file'], 'w')
    except:
        alive.value = False
        raise

    try:
        # Initialize handles, writers and counters
        pass_handle, pass_writer = None, None
        fail_handle, fail_writer = None, None
        set_count, rec_count, pass_count, fail_count = 0, 0, 0, 0
        start_time = time()

        # Iterate over results queue until sentinel object reached
        while alive.value:
            # Get result from queue
            if result_queue.empty():
                continue
            else:
                result = result_queue.get()
            # Exit upon reaching sentinel
            if result is None:
                break

            # Print progress for previous iteration
            printProgress(rec_count, result_count, 0.05, start_time=start_time)

            # Update counts for current iteration
            set_count += 1
            rec_count += result.data_count

            # Write log
            if result.log is not None:
                printLog(result.log, handle=log_handle)

            # Write output
            if result:
                # Write passing results
                pass_count += result.data_count
                try:
                    pass_writer.writeReceptor(result.results)
                except AttributeError:
                    # Open pass file and define writer object
                    pass_handle, pass_writer = _open('pass', fields)
                    pass_writer.writeReceptor(result.results)
            else:
                # Write failing data
                fail_count += result.data_count
                if out_args['failed']:
                    try:
                        fail_writer.writeReceptor(result.data)
                    except AttributeError:
                        # Open fail file and define writer object
                        fail_handle, fail_writer = _open('fail', fields)
                        fail_writer.writeReceptor(result.data)
        else:
            sys.stderr.write('PID %s> Error in sibling process detected. Cleaning up.\n' \
                             % os.getpid())
            return None

        # Print total counts
        printProgress(rec_count, result_count, 0.05, start_time=start_time)

        # Update log
        log = OrderedDict()
        log['OUTPUT'] = os.path.basename(pass_handle.name) if pass_handle is not None else None
        log['RECORDS'] = rec_count
        log['GROUPS'] = set_count
        log['PASS'] = pass_count
        log['FAIL'] = fail_count

        # Close file handles and generate return data
        collect_dict = {'log': log, 'pass': None, 'fail': None}
        if pass_handle is not None:
            collect_dict['pass'] = pass_handle.name
            pass_handle.close()
        if fail_handle is not None:
            collect_dict['fail'] = fail_handle.name
            fail_handle.close()
        if log_handle is not None:
            log_handle.close()
        collect_queue.put(collect_dict)
    except:
        alive.value = False
        raise

    return None

changeo-1.2.0/changeo/Defaults.py

"""
Default parameters
"""

# Info
__author__ = 'Jason Anthony Vander Heiden, Namita Gupta'

# System settings
default_csv_size = 2**24

# Fields
default_v_field = 'v_call'
default_d_field = 'd_call'
default_j_field = 'j_call'
default_id_field = 'sequence_id'
default_seq_field = 'sequence_alignment'
default_germ_field = 'germline_alignment'
default_junction_field = 'junction'
default_clone_field = 'clone_id'

# Receptor attributes
v_attr = 'v_call'
d_attr = 'd_call'
j_attr = 'j_call'
id_attr = 'sequence_id'
seq_attr = 'sequence_imgt'
germ_attr = 'germline_imgt'
junction_attr = 'junction'
clone_attr = 'clone'

# External applications
default_igblastn_exec = 'igblastn'
default_igblastp_exec = 'igblastp'
default_tbl2asn_exec = 'tbl2asn'
default_igphyml_exec = 'igphyml'

# Commandline arguments
choices_format = ('airr', 'changeo')
default_format = 'airr'
default_out_args = {'log_file': None,
                    'out_dir': None,
                    'out_name': None,
                    'out_type': 'tsv',
                    'failed': False}

# IMGT
default_imgt_id_len = 49

changeo-1.2.0/changeo/Commandline.py

"""
Commandline interface
"""

# Info
__author__ = 'Jason Anthony Vander Heiden, Namita Gupta'
from changeo import __version__, __date__

# Imports
import os
import sys
import multiprocessing as mp
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter, \
                     RawDescriptionHelpFormatter

# Changeo imports
from presto.IO import printWarning, printError
from changeo.Defaults import choices_format, default_format
from changeo.Receptor import AIRRSchema, ChangeoSchema


class CommonHelpFormatter(RawDescriptionHelpFormatter, ArgumentDefaultsHelpFormatter):
    """
    Custom argparse.HelpFormatter
    """
    # TODO: add some sort of list formatting for arguments with choices
    # TODO: override argument position.
    # def __init__(self, prog, indent_increment=2, max_help_position=10, width=None):
    #     super(self.__class__, self).__init__(self, prog,
    #                                          indent_increment=indent_increment,
    #                                          max_help_position=max_help_position,
    #                                          width=width)

    # TODO: remove multiple inheritance and clean up default value printing.
    # From ArgumentDefaultsHelpFormatter
    # def _get_help_string(self, action):
    #     help = action.help
    #     if '%(default)' not in action.help:
    #         if action.default is not SUPPRESS:
    #             defaulting_nargs = [OPTIONAL, ZERO_OR_MORE]
    #             if action.option_strings or action.nargs in defaulting_nargs:
    #                 help += ' (default: %(default)s)'
    #     return help
    pass


def getCommonArgParser(db_in=True, db_out=True, out_file=True, failed=True, log=True, format=True,
                       multiproc=False, add_help=True):
    """
    Defines an ArgumentParser object with common Change-O arguments

    Arguments:
      db_in (bool): if True include tab delimited database input arguments.
      db_out (bool): reserved for database output arguments; currently adds no arguments.
      out_file (bool): if True add explicit output file name arguments.
      failed (bool): if True include arguments for output of failed results.
      log (bool): if True include log arguments.
      format (bool): if True include input and output format arguments.
      multiproc (bool): if True include multiprocessing arguments.
      add_help (bool): if True include the help and version arguments.

    Returns:
      argparse.ArgumentParser : an argument parser.
    """
    parser = ArgumentParser(formatter_class=CommonHelpFormatter, add_help=False)

    # Add help and version arguments
    if add_help:
        group_help = parser.add_argument_group('help')
        group_help.add_argument('--version', action='version',
                                version='%(prog)s:' + ' %s %s' %(__version__, __date__))
        group_help.add_argument('-h', '--help', action='help', help='show this help message and exit')

    # Set standard group
    group = parser.add_argument_group('standard arguments')

    # Database arguments
    if db_in:
        group.add_argument('-d', nargs='+', action='store', dest='db_files', required=True,
                           help='A list of tab delimited database files.')
    if db_out:
        # Place holder for the future
        pass

    # Output filename
    if out_file:
        group.add_argument('-o', nargs='+', action='store', dest='out_files', default=None,
                           help='''Explicit output file name. Note, this argument cannot be used with
                                the --failed, --outdir, or --outname arguments.
                                If unspecified, then the output filename will be based on the
                                input filename(s).''')

    # Universal arguments
    group.add_argument('--outdir', action='store', dest='out_dir', default=None,
                       help='''Specify to change the output directory to the location specified.
                            The input file directory is used if this is not specified.''')
    group.add_argument('--outname', action='store', dest='out_name', default=None,
                       help='''Changes the prefix of the successfully processed output file
                            to the string specified. May not be specified with multiple
                            input files.''')

    # Log arguments
    if log:
        group.add_argument('--log', action='store', dest='log_file', default=None,
                           help='''Specify to write verbose logging to a file. May not be
                                specified with multiple input files.''')

    # Failed result arguments
    if failed:
        group.add_argument('--failed', action='store_true', dest='failed',
                           help='''If specified create files containing records that
                                fail processing.''')

    # Format arguments
    if format:
        group.add_argument('--format', action='store', dest='format',
                           default=default_format, choices=choices_format,
                           help='''Output format. Also specifies the input format for tools
                                accepting tab delimited AIRR Rearrangement or Change-O files.''')

    # Multiprocessing arguments
    if multiproc:
        group.add_argument('--nproc', action='store', dest='nproc', type=int, default=mp.cpu_count(),
                           help='''The number of simultaneous computational processes to execute
                                (CPU cores to utilize).''')

    return parser


def parseCommonArgs(args, in_arg=None, in_types=None, in_list=False):
    """
    Checks common arguments from getCommonArgParser and transforms output options to a dictionary

    Arguments:
      args : Argument Namespace defined by ArgumentParser.parse_args.
      in_arg : String defining a non-standard input file argument to verify;
               by default 'seq_files' and 'db_files' are checked, in that order.
      in_types : List of types (file extensions as strings) to allow for files in in_arg;
                 if None do not check type.
      in_list : if True allow multiple input files with the out_name and log arguments.

    Returns:
      dict : Dictionary copy of args with output arguments embedded in the dictionary out_args
    """
    db_types = ['.tab', '.tsv', '.txt']
    seq_types = ['.fasta', '.fna', '.fa', '.fastq', '.fq']
    if in_types is not None:
        in_types = [f.lower() for f in in_types]
    args_dict = args.__dict__.copy()

    # Count input files
    if 'seq_files' in args_dict:
        input_count = len(args_dict['seq_files'] or [])
        input_files = args_dict['seq_files']
    elif 'db_files' in args_dict:
        input_count = len(args_dict['db_files'] or [])
        input_files = args_dict['db_files']
    elif in_arg is not None and in_arg in args_dict:
        input_count = len(args_dict[in_arg] or [])
        input_files = args_dict[in_arg]
    else:
        printError('Cannot determine input file argument.')

    # Verify sequence files
    if 'seq_files' in args_dict and args_dict['seq_files']:
        for f in args_dict['seq_files']:
            if not os.path.isfile(f):
                printError('Sequence file %s does not exist.' % f)
            if os.path.splitext(f)[-1].lower() not in seq_types:
                printError('Sequence file %s is not a supported type. Must be one of: %s.'
                           % (f, ', '.join(seq_types)))

    # Verify database files
    if 'db_files' in args_dict and args_dict['db_files']:
        for f in args_dict['db_files']:
            if not os.path.isfile(f):
                printError('Database file %s does not exist.' % f)
            if os.path.splitext(f)[-1].lower() not in db_types:
                printError('Database file %s is not a supported type. Must be one of: %s.'
                           % (f, ', '.join(db_types)))

    # Verify non-standard input files
    if in_arg is not None and in_arg in args_dict and args_dict[in_arg]:
        files = args_dict[in_arg] if isinstance(args_dict[in_arg], list) else [args_dict[in_arg]]
        for f in files:
            if not os.path.exists(f):
                printError('Input %s does not exist.' % f)
            if in_types is not None and os.path.splitext(f)[-1].lower() not in in_types:
                printError('Input %s is not a supported type. Must be one of: %s.'
                           % (f, ', '.join(in_types)))

    # Verify output file arguments and exit if anything is hinky
    if args_dict.get('out_files', None) is not None \
            or args_dict.get('out_file', None) is not None:
        if args_dict.get('out_dir', None) is not None:
            printError('The -o argument may not be specified with the --outdir argument.')
        if args_dict.get('out_name', None) is not None:
            printError('The -o argument may not be specified with the --outname argument.')
        if args_dict.get('failed', False):
            printError('The -o argument may not be specified with the --failed argument.')
    if args_dict.get('out_files', None) is not None:
        if len(args_dict['out_files']) != input_count:
            printError('The -o argument requires one output file name per input file.')
        for f in args_dict['out_files']:
            if f in input_files:
                printError('Output files and input files cannot have the same names.')
        for f in args_dict['out_files']:
            if os.path.isfile(f):
                printWarning('Output file %s already exists and will be overwritten.' % f)
    if args_dict.get('out_file', None) is not None:
        if args_dict['out_file'] in input_files:
            printError('Output files and input files cannot have the same names.')
        if os.path.isfile(args_dict['out_file']):
            printWarning('Output file %s already exists and will be overwritten.' % args_dict['out_file'])

    # Exit if output names or log files are specified with multiple input files
    if args_dict.get('out_name', None) is not None \
            and input_count > 1 and not in_list:
        printError('The --outname argument may not be specified with multiple input files.')
    if args_dict.get('log_file', None) is not None \
            and input_count > 1 and not in_list:
        printError('The --log argument may not be specified with multiple input files.')

    # Verify output directory
    if 'out_dir' in args_dict and args_dict['out_dir']:
        if os.path.exists(args_dict['out_dir']) and not os.path.isdir(args_dict['out_dir']):
            printError('Path %s exists but it is not a directory.'
                       % args_dict['out_dir'])

    # Redefine common output options as out_args dictionary
    out_args = ['log_file', 'out_dir', 'out_name', 'out_type', 'failed']
    args_dict['out_args'] = {k: args_dict.setdefault(k, None) for k in out_args}
    for k in out_args:
        del args_dict[k]

    return args_dict


def checkArgs(parser):
    """
    Checks that arguments have been provided and prints help if they have not.

    Arguments:
      parser : An argparse.ArgumentParser defining the commandline arguments.

    Returns:
      boolean : True if arguments are present. Prints help and exits if not.
    """
    if len(sys.argv) == 1:
        parser.print_help()
        sys.exit(1)

    return True


def setDefaultFields(args, defaults, format='airr'):
    """
    Sets default field arguments by format

    Arguments:
      args (dict): parsed argument dictionary.
      defaults (dict): default variables to set with keys as argument variables and values as AIRR field names.
      format (str): one of 'changeo' or 'airr' which defines the file format.

    Returns:
      dict: modified input args.
    """
    if format == 'changeo':
        defaults = {k: ChangeoSchema.fromReceptor(AIRRSchema.toReceptor(v))
                    for k, v in defaults.items()}

    for f in defaults:
        if args[f] is None:
            args[f] = defaults[f]

    return args

changeo-1.2.0/changeo/IO.py
"""
File I/O and parsers
"""

# Info
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'

# Imports
import csv
import os
import re
import tarfile
import yaml
import zipfile
from itertools import chain, groupby, zip_longest
from tempfile import TemporaryDirectory
from textwrap import indent
from Bio import SeqIO
from Bio.Seq import Seq

# Presto and changeo imports
from presto.IO import getFileType, printError, printWarning, printDebug
from changeo.Defaults import default_csv_size
from changeo.Gene import getAllele, getLocus, getVAllele, getDAllele, getJAllele
from changeo.Receptor import AIRRSchema, AIRRSchemaAA, ChangeoSchema, ChangeoSchemaAA, Receptor, ReceptorData
from changeo.Alignment import decodeBTOP, \
    encodeCIGAR, padAlignment, gapV, inferJunction, \
    RegionDefinition, getRegions

# System settings
csv.field_size_limit(default_csv_size)


class TSVReader:
    """
    Simple csv.DictReader wrapper to read format agnostic TSV files.

    Attributes:
      reader (iter): reader object.
      fields (list): field names.
    """
    def __init__(self, handle):
        """
        Initializer

        Arguments:
          handle : handle to an open TSV file

        Returns:
          changeo.IO.TSVReader
        """
        # Arguments
        self.handle = handle
        self.receptor = False
        self.reader = csv.DictReader(self.handle, dialect='excel-tab')
        self.fields = self.reader.fieldnames

    def __iter__(self):
        """
        Iterator initializer

        Returns:
          changeo.IO.TSVReader
        """
        return self

    def __next__(self):
        """
        Next method

        Returns:
          dict : row as a dictionary of field:value pairs.
        """
        # Get next row from reader iterator
        try:
            record = next(self.reader)
        except StopIteration:
            self.handle.close()
            raise StopIteration

        return self._parse(record)

    def _parse(self, record):
        """
        Parses a dictionary of fields

        Arguments:
          record : dict with fields and values to parse.

        Returns:
          dict : parsed dict.
        """
        return record


class TSVWriter:
    """
    Simple csv.DictWriter wrapper to write format agnostic TSV files.
    """
    def __init__(self, handle, fields, header=True):
        """
        Initializer

        Arguments:
          handle : handle to an open output file
          fields : list of output field names
          header : if True write the header on initialization.

        Returns:
          changeo.IO.TSVWriter
        """
        # Arguments
        self.handle = handle
        self.fields = fields
        self.writer = csv.DictWriter(self.handle, fieldnames=self.fields,
                                     dialect='excel-tab', extrasaction='ignore', lineterminator='\n')
        if header:
            self.writeHeader()

    def writeHeader(self):
        """
        Writes the header

        Returns:
          None
        """
        self.writer.writeheader()

    def writeDict(self, records):
        """
        Writes a row from a dictionary

        Arguments:
          records : dictionary of row data or an iterable of such objects.

        Returns:
          None
        """
        if isinstance(records, dict):
            self.writer.writerow(records)
        else:
            self.writer.writerows(records)


class ChangeoReader(TSVReader):
    """
    An iterator to read and parse Change-O formatted data.
    """
    def __init__(self, handle):
        """
        Initializer

        Arguments:
          handle : handle to an open Change-O formatted file

        Returns:
          changeo.IO.ChangeoReader
        """
        # Arguments
        self.handle = handle
        self.reader = csv.DictReader(self.handle, dialect='excel-tab')
        self.reader.fieldnames = [n.strip().upper() for n in self.reader.fieldnames]
        self.fields = self.reader.fieldnames

    def _parse(self, record):
        """
        Parses a dictionary to a Receptor object

        Arguments:
          record : dict with fields and values in the Change-O format

        Returns:
          changeo.Receptor.Receptor : parsed Receptor object.
        """
        # Parse fields
        result = {}
        for k, v in record.items():
            k = ChangeoSchema.toReceptor(k)
            result[k] = v

        return Receptor(result)


class ChangeoWriter(TSVWriter):
    """
    Writes Change-O formatted data.
    """
    def __init__(self, handle, fields=ChangeoSchema.required, header=True):
        """
        Initializer

        Arguments:
          handle : handle to an open output file
          fields : list of output field names
          header : if True write the header on initialization.

        Returns:
          changeo.IO.ChangeoWriter
        """
        # Arguments
        self.handle = handle
        self.fields = [n.strip().upper() for n in fields]
        self.writer = csv.DictWriter(self.handle, fieldnames=self.fields,
                                     dialect='excel-tab', extrasaction='ignore', lineterminator='\n')
        if header:
            self.writeHeader()

    def _parseReceptor(self, record):
        """
        Parses a Receptor object to a Change-O dictionary

        Arguments:
          record : changeo.Receptor.Receptor object to parse.

        Returns:
          dict : parsed dict.
        """
        row = record.toDict()
        # Parse known fields
        result = {}
        for k, v in row.items():
            k = ChangeoSchema.fromReceptor(k)
            result[k] = v

        return result

    def writeReceptor(self, records):
        """
        Writes a row from a Receptor object

        Arguments:
          records : a changeo.Receptor.Receptor object to write or an iterable of such objects.

        Returns:
          None
        """
        if isinstance(records, Receptor):
            row = self._parseReceptor(records)
            self.writer.writerow(row)
        else:
            rows = (self._parseReceptor(r) for r in records)
            self.writer.writerows(rows)


class AIRRReader(TSVReader):
    """
    An iterator to read and parse AIRR formatted data.
    """
    def __init__(self, handle):
        """
        Initializer

        Arguments:
          handle : handle to an open AIRR formatted file

        Returns:
          changeo.IO.AIRRReader
        """
        # Arguments
        self.handle = handle

        # Define reader
        try:
            import airr
            self.reader = airr.io.RearrangementReader(self.handle, base=0, debug=False)
        except ImportError as e:
            printError('AIRR library cannot be imported: %s.' % e)

        # Set field list
        self.fields = self.reader.fields

    def _parse(self, record):
        """
        Parses an AIRR record dictionary to a Receptor object

        Arguments:
          record : dict with fields and values in the AIRR format.

        Returns:
          changeo.Receptor.Receptor : parsed Receptor object.
        """
        # Parse fields
        result = {}
        for k, v in record.items():
            # Rename fields
            k = AIRRSchema.toReceptor(k)
            # Convert start positions to 0-based
            # if k in ReceptorData.start_fields and v is not None and v != '':
            #     v = str(int(v) + 1)
            # Assign new field
            result[k] = v

        # Assign length based on start and end
        for end, (start, length) in ReceptorData.end_fields.items():
            if end in result and result[end] is not None:
                result[length] = int(result[end]) - int(result[start]) + 1

        return Receptor(result)


class AIRRWriter(TSVWriter):
    """
    Writes AIRR formatted data.
    """
    def __init__(self, handle, fields=AIRRSchema.required):
        """
        Initializer

        Arguments:
          handle : handle to an open output file
          fields : list of output field names

        Returns:
          changeo.IO.AIRRWriter
        """
        # Arguments
        self.handle = handle
        self.fields = [f.lower() for f in fields]

        # Define writer
        try:
            import airr
            self.writer = airr.io.RearrangementWriter(self.handle, fields=self.fields,
                                                      base=0, debug=False)
        except ImportError as e:
            printError('AIRR library cannot be imported: %s.' % e)

    def _parseReceptor(self, record):
        """
        Parses a Receptor object to an AIRR dictionary

        Arguments:
          record : changeo.Receptor.Receptor object to parse.

        Returns:
          dict : a parsed dict.
        """
        result = {}
        row = record.toDict()
        for k, v in row.items():
            # Convert start positions to 0-based
            # if k in ReceptorData.start_fields and v is not None and v != '':
            #     v = str(int(v) - 1)
            # Convert field names
            k = AIRRSchema.fromReceptor(k)
            result[k] = v

        return result

    def writeReceptor(self, records):
        """
        Writes a row from a Receptor object

        Arguments:
          records : a changeo.Receptor.Receptor object to write or an iterable of such objects.

        Returns:
          None
        """
        if isinstance(records, Receptor):
            row = self._parseReceptor(records)
            self.writer.write(row)
        else:
            rows = (self._parseReceptor(r) for r in records)
            for r in rows:
                self.writer.write(r)


class IMGTReader:
    """
    An iterator to read and parse IMGT output files.
    """
    @staticmethod
    def customFields(scores=False, regions=False, junction=False, schema=None):
        """
        Returns non-standard fields defined by the parser

        Arguments:
          scores : if True include alignment scoring fields.
          regions : if True include IMGT-gapped CDR and FWR region fields.
          junction : if True include detailed junction annotation fields.
          schema : schema class to pass field through for conversion.
                   If None, return changeo.Receptor.Receptor attribute names.

        Returns:
          list : list of field names.
        """
        # Alignment scoring fields
        score_fields = ['v_score', 'v_identity', 'j_score', 'j_identity']

        # FWR and CDR fields
        region_fields = ['fwr1_imgt', 'fwr2_imgt', 'fwr3_imgt', 'fwr4_imgt',
                         'cdr1_imgt', 'cdr2_imgt', 'cdr3_imgt']

        # Define default detailed junction field ordering
        junction_fields = ['n1_length', 'n2_length',
                           'p3v_length', 'p5d_length', 'p3d_length', 'p5j_length',
                           'd_frame']

        fields = []
        if scores:
            fields.extend(score_fields)
        if regions:
            fields.extend(region_fields)
        if junction:
            fields.extend(junction_fields)

        # Convert field names if schema provided
        if schema is not None:
            fields = [schema.fromReceptor(f) for f in fields]

        return fields

    def __init__(self, summary, gapped, ntseq, junction, receptor=True):
        """
        Initializer

        Arguments:
          summary : handle to an open '1_Summary' IMGT/HighV-QUEST output file.
          gapped : handle to an open '2_IMGT-gapped-nt-sequences' IMGT/HighV-QUEST output file.
          ntseq: handle to an open '3_Nt-sequences' IMGT/HighV-QUEST output file.
          junction : handle to an open '6_Junction' IMGT/HighV-QUEST output file.
          receptor : if True (default) iteration returns a Receptor object, otherwise it returns a dictionary.

        Returns:
          changeo.IO.IMGTReader
        """
        # Arguments
        self.summary = summary
        self.gapped = gapped
        self.ntseq = ntseq
        self.junction = junction
        self.receptor = receptor

        # Open readers
        readers = [csv.DictReader(self.summary, delimiter='\t'),
                   csv.DictReader(self.gapped, delimiter='\t'),
                   csv.DictReader(self.ntseq, delimiter='\t'),
                   csv.DictReader(self.junction, delimiter='\t')]
        self.records = zip(*readers)

    @staticmethod
    def _parseFunctionality(summary):
        """
        Parse functionality information

        Arguments:
          summary : dictionary containing one row of the '1_Summary' file.

        Returns:
          dict : database entries for functionality information.
        """
        # Correct for new functionality column names
        if 'Functionality' not in summary:
            summary['Functionality'] = summary['V-DOMAIN Functionality']
            summary['Functionality comment'] = summary['V-DOMAIN Functionality comment']

        # Orientation parser
        def _revcomp():
            x = {'+': 'F', '-': 'T'}
            return x.get(summary['Orientation'], None)

        # Functionality parser
        def _functional():
            x = summary['Functionality']
            if x.startswith('productive'):
                return 'T'
            elif x.startswith('unproductive'):
                return 'F'
            else:
                return None

        # Junction frame parser
        def _inframe():
            x = {'in-frame': 'T', 'out-of-frame': 'F'}
            return x.get(summary['JUNCTION frame'], None)

        # Stop codon parser
        def _stop():
            return 'T' if 'stop codon' in summary['Functionality comment'] else 'F'

        # Mutated invariant parser
        def _invariant():
            x = summary['Functionality comment']
            y = summary['V-REGION potential ins/del']
            return 'T' if ('missing' in x) or ('missing' in y) else 'F'

        # Indel parser
        def _indels():
            x = summary['V-REGION potential ins/del']
            y = summary['V-REGION insertions']
            z = summary['V-REGION deletions']
            return 'T' if any([x, y, z]) else 'F'

        result = {}
        # Parse functionality information
        if 'No results' not in summary['Functionality']:
            result['rev_comp'] = _revcomp()
            result['functional'] = _functional()
            result['in_frame'] = _inframe()
            result['stop'] = _stop()
            result['mutated_invariant'] = _invariant()
            result['indels'] = _indels()

        return result

    @staticmethod
    def _parseGenes(summary):
        """
        Parse gene calls

        Arguments:
          summary : dictionary containing one row of the '1_Summary' file.

        Returns:
          dict : database entries for gene calls.
        """
        clean_regex = re.compile(r'(,)|(\(see comment\))')
        delim_regex = re.compile(r'\sor\s')

        # Gene calls
        v_str = summary['V-GENE and allele']
        d_str = summary['D-GENE and allele']
        j_str = summary['J-GENE and allele']
        v_call = delim_regex.sub(',', clean_regex.sub('', v_str)) if v_str else None
        d_call = delim_regex.sub(',', clean_regex.sub('', d_str)) if d_str else None
        j_call = delim_regex.sub(',', clean_regex.sub('', j_str)) if j_str else None

        # Locus
        locus_list = [getLocus(v_call, action='first'),
                      getLocus(j_call, action='first')]
        locus = set(filter(None, locus_list))

        # Result
        result = {'v_call': v_call,
                  'd_call': d_call,
                  'j_call': j_call,
                  'locus': locus.pop() if len(locus) == 1 else None}

        return result

    @staticmethod
    def _parseSequences(gapped, ntseq):
        """
        Parses full-length V(D)J sequences

        Arguments:
          gapped : dictionary containing one row of the '2_IMGT-gapped-nt-sequences' file.
          ntseq: dictionary containing one row of the '3_Nt-sequences' file.

        Returns:
          dict : database entries for full-length V(D)J sequences.
        """
        result = {}
        # Extract ungapped sequences
        if ntseq['V-D-J-REGION']:
            result['sequence_vdj'] = ntseq['V-D-J-REGION']
        elif ntseq['V-J-REGION']:
            result['sequence_vdj'] = ntseq['V-J-REGION']
        else:
            result['sequence_vdj'] = ntseq['V-REGION']
        # Extract gapped sequences
        if gapped['V-D-J-REGION']:
            result['sequence_imgt'] = gapped['V-D-J-REGION']
        elif gapped['V-J-REGION']:
            result['sequence_imgt'] = gapped['V-J-REGION']
        else:
            result['sequence_imgt'] = gapped['V-REGION']

        return result

    @staticmethod
    def _parseVPos(gapped, ntseq):
        """
        Parses V alignment positions

        Arguments:
          gapped : dictionary containing one row of the '2_IMGT-gapped-nt-sequences' file.
          ntseq: dictionary containing one row of the '3_Nt-sequences' file.

        Returns:
          dict : database entries for V query and germline alignment positions.
        """
        result = {}
        result['v_seq_start'] = ntseq['V-REGION start']
        result['v_seq_length'] = len(ntseq['V-REGION']) if ntseq['V-REGION'] else 0
        result['v_germ_start_imgt'] = 1
        result['v_germ_length_imgt'] = len(gapped['V-REGION']) if gapped['V-REGION'] else 0

        return result

    @staticmethod
    def _parseJuncPos(junction, db):
        """
        Parses junction N/P and D alignment positions

        Arguments:
          junction : dictionary containing one row of the '6_Junction' file.
          db : database containing V alignment information.

        Returns:
          dict : database entries for junction, N/P and D region alignment positions.
        """
        v_start = db['v_seq_start']
        v_length = db['v_seq_length']

        # First N/P length
        def _np1():
            nb = [junction['P3\'V-nt nb'],
                  junction['N-REGION-nt nb'],
                  junction['N1-REGION-nt nb'],
                  junction['P5\'D-nt nb']]
            return sum(int(i) for i in nb if i)

        # D start
        def _dstart():
            nb = [v_start,
                  v_length,
                  junction['P3\'V-nt nb'],
                  junction['N-REGION-nt nb'],
                  junction['N1-REGION-nt nb'],
                  junction['P5\'D-nt nb']]
            return sum(int(i) for i in nb if i)

        # Second N/P length
        def _np2():
            nb = [junction['P3\'D-nt nb'],
                  junction['N2-REGION-nt nb'],
                  junction['P5\'J-nt nb']]
            return sum(int(i) for i in nb if i)

        result = {}
        # Junction sequence
        result['junction'] = junction['JUNCTION']
        result['junction_aa'] = junction['JUNCTION (AA)']
        result['junction_length'] = len(junction['JUNCTION']) if junction['JUNCTION'] else 0

        # N/P and D alignment positions
        result['np1_length'] = _np1()
        result['d_seq_start'] = _dstart()
        result['d_seq_length'] = int(junction['D-REGION-nt nb'] or 0)
        result['d_germ_start'] = int(junction['5\'D-REGION trimmed-nt nb'] or 0) + 1
        result['d_germ_length'] = int(junction['D-REGION-nt nb'] or 0)
        result['np2_length'] = _np2()

        return result

    @staticmethod
    def _parseJPos(gapped, ntseq, junction, db):
        """
        Parses J alignment positions

        Arguments:
          gapped : dictionary containing one row of the '2_IMGT-gapped-nt-sequences' file.
          ntseq: dictionary containing one row of the '3_Nt-sequences' file.
          junction : dictionary containing one row of the '6_Junction' file.
          db : database containing V, N/P and D alignment information.

        Returns:
          dict : database entries for J region alignment positions.
        """
        # J start
        def _jstart():
            nb = [db['v_seq_start'],
                  db['v_seq_length'],
                  db['np1_length'],
                  db['d_seq_length'],
                  db['np2_length']]
            return sum(int(i) for i in nb if i)

        # J region alignment positions
        result = {}
        result['j_seq_start'] = _jstart()
        result['j_seq_length'] = len(ntseq['J-REGION']) if ntseq['J-REGION'] else 0
        result['j_germ_start'] = int(junction['5\'J-REGION trimmed-nt nb'] or 0) + 1
        result['j_germ_length'] = len(gapped['J-REGION']) if gapped['J-REGION'] else 0

        return result

    @staticmethod
    def _parseScores(summary):
        """
        Parse alignment scores

        Arguments:
          summary : dictionary containing one row of the '1_Summary' file.

        Returns:
          dict : database entries for alignment scores.
        """
        result = {}

        # V score
        try:
            result['v_score'] = float(summary['V-REGION score'])
        except (TypeError, ValueError):
            result['v_score'] = None

        # V identity
        try:
            result['v_identity'] = float(summary['V-REGION identity %']) / 100.0
        except (TypeError, ValueError):
            result['v_identity'] = None

        # J score
        try:
            result['j_score'] = float(summary['J-REGION score'])
        except (TypeError, ValueError):
            result['j_score'] = None

        # J identity
        try:
            result['j_identity'] = float(summary['J-REGION identity %']) / 100.0
        except (TypeError, ValueError):
            result['j_identity'] = None

        return result

    @staticmethod
    def _parseJuncDetails(junction):
        """
        Parse detailed junction region information

        Arguments:
          junction : dictionary containing one row of the '6_Junction' file.

        Returns:
          dict : database entries for detailed D, N and P region information.
        """
        # D reading frame
        def _dframe():
            frame = None
            x = junction['D-REGION reading frame']
            if x:
                try:
                    frame = int(x)
                except ValueError:
                    m = re.search(r'reading frame ([0-9])', x).group(1)
                    frame = int(m)
            return frame

        # First N region length
        def _n1():
            nb = [junction['N-REGION-nt nb'], junction['N1-REGION-nt nb']]
            return sum(int(i) for i in nb if i)

        # D Frame and junction fields
        result = {}
        result['d_frame'] = _dframe()
        result['n1_length'] = _n1()
        result['n2_length'] = int(junction['N2-REGION-nt nb'] or 0)
        result['p3v_length'] = int(junction['P3\'V-nt nb'] or 0)
        result['p5d_length'] = int(junction['P5\'D-nt nb'] or 0)
        result['p3d_length'] = int(junction['P3\'D-nt nb'] or 0)
        result['p5j_length'] = int(junction['P5\'J-nt nb'] or 0)

        return result

    def parseRecord(self, summary, gapped, ntseq, junction):
        """
        Parses a single row from each IMGT file.

        Arguments:
          summary : dictionary containing one row of the '1_Summary' file.
          gapped : dictionary containing one row of the '2_IMGT-gapped-nt-sequences' file.
          ntseq : dictionary containing one row of the '3_Nt-sequences' file.
          junction : dictionary containing one row of the '6_Junction' file.

        Returns:
          dict: database entry for the row.
        """
        # Check that rows are synchronized
        id_set = [summary['Sequence ID'],
                  gapped['Sequence ID'],
                  ntseq['Sequence ID'],
                  junction['Sequence ID']]
        if len(set(id_set)) != 1:
            printError('IMGT files are corrupt starting with Summary file record %s.'
                       % id_set[0])

        # Initialize db with query ID and sequence
        db = {'sequence_id': summary['Sequence ID'],
              'sequence_input': summary['Sequence']}

        # Parse required fields
        db.update(IMGTReader._parseFunctionality(summary))
        db.update(IMGTReader._parseGenes(summary))
        db.update(IMGTReader._parseSequences(gapped, ntseq))
        db.update(IMGTReader._parseVPos(gapped, ntseq))
        db.update(IMGTReader._parseJuncPos(junction, db))
        db.update(IMGTReader._parseJPos(gapped, ntseq, junction, db))

        # Parse optional fields
        db.update(IMGTReader._parseScores(summary))
        rd = RegionDefinition(junction_length=db.get('junction_length', None),
                              amino_acid=False, definition='default')
        db.update(rd.getRegions(db.get('sequence_imgt', None)))
        db.update(IMGTReader._parseJuncDetails(junction))

        return db

    def __iter__(self):
        """
        Iterator initializer.

        Returns:
          changeo.IO.IMGTReader
        """
        return self

    def __next__(self):
        """
        Next method.

        Returns:
          changeo.Receptor.Receptor : parsed IMGT/HighV-QUEST result as a Receptor (receptor=True)
                                      or dictionary (receptor=False).
        """
        # Get next set of records from dictionary readers
        try:
            summary, gapped, ntseq, junction = next(self.records)
        except StopIteration:
            raise StopIteration

        db = self.parseRecord(summary, gapped, ntseq, junction)

        if self.receptor:
            return Receptor(db)
        else:
            return db


class IgBLASTReader:
    """
    An iterator to read and parse IgBLAST output files
    """
    # Ordered list of known fields
    @staticmethod
    def customFields(schema=None):
        """
        Returns non-standard fields defined by the parser

        Arguments:
          schema : schema class to pass field through for conversion.
                   If None, return changeo.Receptor.Receptor attribute names.

        Returns:
          list : list of field names.
        """
        # IgBLAST scoring fields
        fields = ['v_score', 'v_identity', 'v_evalue', 'v_cigar',
                  'd_score', 'd_identity', 'd_evalue', 'd_cigar',
                  'j_score', 'j_identity', 'j_evalue', 'j_cigar',
                  'fwr1_imgt', 'fwr2_imgt', 'fwr3_imgt', 'fwr4_imgt',
                  'cdr1_imgt', 'cdr2_imgt', 'cdr3_imgt']

        # Convert field names if schema provided
        if schema is not None:
            fields = [schema.fromReceptor(f) for f in fields]

        return fields

    def __init__(self, igblast, sequences, references, asis_calls=False, regions='default',
                 receptor=True, infer_junction=False):
        """
        Initializer.

        Arguments:
          igblast (file): handle to an open IgBLAST output file written with '-outfmt 7 std qseq sseq btop'.
          sequences (dict): dictionary of query sequences;
                            sequence descriptions as keys with original query sequences as SeqRecord values.
          references (dict): dictionary of IMGT gapped germline sequences.
          asis_calls (bool): if True do not parse gene calls for allele names.
          regions (str): name of the IMGT FWR/CDR region definitions to use.
          receptor (bool): if True (default) iteration returns a Receptor object, otherwise it returns a dictionary.
          infer_junction (bool): if True, infer the junction region if not reported by IgBLAST.

        Returns:
          changeo.IO.IgBLASTReader
        """
        # Arguments
        self.igblast = igblast
        self.sequences = sequences
        self.references = references
        self.regions = regions
        self.asis_calls = asis_calls
        self.receptor = receptor
        self.infer_junction = infer_junction

        # Define parsing blocks
        self.groups = groupby(self.igblast, lambda x: not re.match('# IGBLAST', x))

    def _parseQueryChunk(self, chunk):
        """
        Parse query section

        Arguments:
          chunk : list of strings

        Returns:
          str : query identifier
        """
        # Extract query id from comments
        query = next((x for x in chunk if x.startswith('# Query:')))

        return query.replace('# Query: ', '', 1)

    def _parseSummaryChunk(self, chunk):
        """
        Parse summary section

        Args:
          chunk: list of strings

        Returns:
          dict : summary section.
        """
        # Mapping for field names in the summary section
        summary_map = {'Top V gene match': 'v_match',
                       'Top D gene match': 'd_match',
                       'Top J gene match': 'j_match',
                       'Chain type': 'chain_type',
                       'stop codon': 'stop_codon',
                       'V-J frame': 'vj_frame',
                       'Productive': 'productive',
                       'Strand': 'strand',
                       'V Frame shift': 'v_frameshift'}

        # Extract column names from comments
        f = next((x for x in chunk if x.startswith('# V-(D)-J rearrangement summary')))
        f = re.search(r'summary for query sequence \((.+)\)\.', f).group(1)
        columns = [summary_map[x.strip()] for x in f.split(',')]

        # Extract first row as a list
        row = next((x.split('\t') for x in chunk if not x.startswith('#')))

        # Populate template dictionary with parsed fields
        summary = {v: None for v in summary_map.values()}
        summary.update(dict(zip(columns, row)))

        return summary

    def _parseSubregionChunk(self, chunk):
        """
        Parse CDR3 sequences generated by IgBLAST

        Args:
          chunk: list of strings

        Returns:
          dict : nucleotide and amino acid CDR3 sequences
        """
        # Example:
        #   # Sub-region sequence details (nucleotide sequence, translation, start, end)
        #   CDR3  CAACAGTGGAGTAGTTACCCACGGACG  QQWSSYPRT  248  287

        # Define column names
        cdr3_map = {'nucleotide sequence': 'cdr3_igblast',
                    'translation': 'cdr3_igblast_aa',
                    'start': 'cdr3_igblast_start',
                    'end': 'cdr3_igblast_end'}

        # Extract column names from comments
        f = next((x for x in chunk if x.startswith('# Sub-region sequence details')))
        f = re.search(r'sequence details \((.+)\)', f).group(1)
        columns = [cdr3_map[x.strip()] for x in f.split(',')]

        # Extract first CDR3 as a list and remove the CDR3 label
        rows = next((x.split('\t') for x in chunk if x.startswith('CDR3')))[1:]

        # Populate dictionary with parsed fields
        cdr = {v: None for v in columns}
        cdr.update(dict(zip(columns, rows)))

        # Add length
        if cdr.get('cdr3_igblast', None) is not None:
            cdr['cdr3_igblast_length'] = len(cdr['cdr3_igblast'])

        return cdr

    def _parseHitsChunk(self, chunk):
        """
        Parse hits section

        Args:
          chunk: list of strings

        Returns:
          list: hit table
as a list of dictionaries """ # Extract column names from comments f = next((x for x in chunk if x.startswith('# Fields:'))) columns = chain(['segment'], f.replace('# Fields:', '', 1).split(',')) columns = [x.strip() for x in columns] # Split non-comment rows into a list of lists rows = [x.split('\t') for x in chunk if not x.startswith('#')] # Create list of dictionaries containing hits hits = [{k: x[i] for i, k in enumerate(columns)} for x in rows] return hits def _parseSummarySection(self, summary, db, asis_calls=False): """ Parse summary section Arguments: summary : summary section dictionary return by parseBlock db : initial database dictionary. asis_calls : if True do not parse gene calls for allele names. Returns: dict : db of results. """ result = {} # Parse V, D, and J calls if not asis_calls: v_call = getVAllele(summary['v_match'], action='list') d_call = getDAllele(summary['d_match'], action='list') j_call = getJAllele(summary['j_match'], action='list') result['v_call'] = ','.join(v_call) if v_call else None result['d_call'] = ','.join(d_call) if d_call else None result['j_call'] = ','.join(j_call) if j_call else None else: result['v_call'] = None if summary['v_match'] == 'N/A' else summary['v_match'] result['d_call'] = None if summary['d_match'] == 'N/A' else summary['d_match'] result['j_call'] = None if summary['j_match'] == 'N/A' else summary['j_match'] # Parse locus locus = None if summary['chain_type'] == 'N/A' else summary['chain_type'] locus_map = {'VH': 'IGH', 'VK': 'IGK', 'VL': 'IGL', 'VB': 'TRB', 'VD': 'TRD', 'VA': 'TRA', 'VG': 'TRG'} result['locus'] = locus_map.get(locus, locus) # Parse quality information result['stop'] = 'T' if summary['stop_codon'] == 'Yes' else 'F' result['in_frame'] = 'T' if summary['vj_frame'] == 'In-frame' else 'F' result['functional'] = 'T' if summary['productive'] == 'Yes' else 'F' # Reverse complement input sequence if required if summary['strand'] == '-': seq_rc = Seq(db['sequence_input']).reverse_complement() 
            result['sequence_input'] = str(seq_rc)
            result['rev_comp'] = 'T'
        else:
            result['rev_comp'] = 'F'

        # Add v_frameshift field if present
        if 'v_frameshift' in summary:
            result['v_frameshift'] = 'T' if summary['v_frameshift'] == 'Yes' else 'F'

        return result

    def _parseSubregionSection(self, section, sequence):
        """
        Parse subregion section

        Arguments:
          section : subregion section dictionary returned by parseBlock.
          sequence : input sequence.

        Returns:
          dict : db of results.
        """
        # Extract junction
        junc_start = int(section['cdr3_igblast_start']) - 3
        junc_end = int(section['cdr3_igblast_end']) + 3
        junc_seq = sequence[(junc_start - 1):junc_end]
        junc_len = len(junc_seq)

        # Translation
        junc_tmp = junc_seq.replace('-', 'N').replace('.', 'N')
        if junc_len % 3 > 0:  junc_tmp = junc_tmp[:junc_len - junc_len % 3]
        junc_aa = str(Seq(junc_tmp).translate())

        # Build return values
        return {'junction': junc_seq,
                'junction_aa': junc_aa,
                'junction_length': junc_len,
                'junction_start': junc_start,
                'junction_end': junc_end}

    def _parseVHitPos(self, v_hit):
        """
        Parse V alignment positions

        Arguments:
          v_hit : V alignment row from the hit table

        Returns:
          dict: db of V starts and lengths
        """
        result = {}
        # Germline positions
        result['v_germ_start_vdj'] = int(v_hit['s. start'])
        result['v_germ_length_vdj'] = int(v_hit['s. end']) - result['v_germ_start_vdj'] + 1
        # Query sequence positions
        result['v_seq_start'] = int(v_hit['q. start'])
        result['v_seq_length'] = int(v_hit['q. end']) - result['v_seq_start'] + 1
        result['indels'] = 'F' if int(v_hit['gap opens']) == 0 else 'T'

        return result

    def _parseDHitPos(self, d_hit, overlap):
        """
        Parse D alignment positions

        Arguments:
          d_hit : D alignment row from the hit table
          overlap : V-D overlap length

        Returns:
          dict: db of D starts and lengths
        """
        result = {}
        # Query sequence positions
        result['d_seq_start'] = int(d_hit['q. start']) + overlap
        result['d_seq_length'] = max(int(d_hit['q. end']) - result['d_seq_start'] + 1, 0)
        # Germline positions
        result['d_germ_start'] = int(d_hit['s. start']) + overlap
        result['d_germ_length'] = max(int(d_hit['s. end']) - result['d_germ_start'] + 1, 0)

        return result

    def _parseJHitPos(self, j_hit, overlap):
        """
        Parse J alignment positions

        Arguments:
          j_hit : J alignment row from the hit table
          overlap : D-J or V-J overlap length

        Returns:
          dict: db of J starts and lengths
        """
        result = {}
        result['j_seq_start'] = int(j_hit['q. start']) + overlap
        result['j_seq_length'] = max(int(j_hit['q. end']) - result['j_seq_start'] + 1, 0)
        result['j_germ_start'] = int(j_hit['s. start']) + overlap
        result['j_germ_length'] = max(int(j_hit['s. end']) - result['j_germ_start'] + 1, 0)

        return result

    def _appendSeq(self, seq, hits, start, trim=True):
        """
        Append aligned query sequence segment

        Arguments:
          seq : sequence to modify.
          hits : hit table row for the sequence.
          start : start position of the query sequence.
          trim : if True then remove insertions from the hit sequence before appending.

        Returns:
          str: modified sequence.
        """
        if 'subject seq' not in hits or 'query seq' not in hits:
            return None

        # Remove insertions
        if trim:
            for m in re.finditer(r'-', hits['subject seq']):
                ins = m.start()
                seq += hits['query seq'][start:ins]
                start = ins + 1

        # Append
        seq += hits['query seq'][start:]

        return seq

    def _parseVHits(self, hits, db):
        """
        Parse V hit sub-table

        Arguments:
          hits : hit table as a list of dictionaries.
          db : database dictionary containing summary results.

        Returns:
          dict : db of results.
        """
        result = {}
        seq_vdj = db['sequence_vdj']
        seq_trim = db['sequence_trim']
        v_hit = next(x for x in hits if x['segment'] == 'V')

        # Alignment positions
        result.update(self._parseVHitPos(v_hit))
        # Update VDJ sequence with and without removing insertions
        result['sequence_vdj'] = self._appendSeq(seq_vdj, v_hit, 0, trim=False)
        result['sequence_trim'] = self._appendSeq(seq_trim, v_hit, 0, trim=True)

        return result

    def _parseDHits(self, hits, db):
        """
        Parse D hit sub-table

        Arguments:
          hits : hit table as a list of dictionaries.
          db : database dictionary containing summary and V results.

        Returns:
          dict : db of results.
        """
        result = {}
        seq_vdj = db['sequence_vdj']
        seq_trim = db['sequence_trim']
        d_hit = next(x for x in hits if x['segment'] == 'D')

        # Determine N-region length and amount of D overlap with V alignment
        overlap = 0
        if db['v_call']:
            np1_len = int(d_hit['q. start']) - (db['v_seq_start'] + db['v_seq_length'])
            if np1_len < 0:
                result['np1_length'] = 0
                overlap = abs(np1_len)
            else:
                result['np1_length'] = np1_len
                np1_start = db['v_seq_start'] + db['v_seq_length'] - 1
                np1_end = int(d_hit['q. start']) - 1
                if seq_vdj is not None:
                    seq_vdj += db['sequence_input'][np1_start:np1_end]
                if seq_trim is not None:
                    seq_trim += db['sequence_input'][np1_start:np1_end]

        # D alignment positions
        result.update(self._parseDHitPos(d_hit, overlap))
        # Update VDJ sequence with and without removing insertions
        result['sequence_vdj'] = self._appendSeq(seq_vdj, d_hit, overlap, trim=False)
        result['sequence_trim'] = self._appendSeq(seq_trim, d_hit, overlap, trim=True)

        return result

    def _parseJHits(self, hits, db):
        """
        Parse J hit sub-table

        Arguments:
          hits : hit table as a list of dictionaries.
          db : database dictionary containing summary, V and D results.

        Returns:
          dict : db of results.
        """
        result = {}
        seq_vdj = db['sequence_vdj']
        seq_trim = db['sequence_trim']
        j_hit = next(x for x in hits if x['segment'] == 'J')

        # Determine N-region length and amount of J overlap with V or D alignment
        overlap = 0
        if db['d_call']:
            np2_len = int(j_hit['q. start']) - (db['d_seq_start'] + db['d_seq_length'])
            if np2_len < 0:
                result['np2_length'] = 0
                overlap = abs(np2_len)
            else:
                result['np2_length'] = np2_len
                n2_start = db['d_seq_start'] + db['d_seq_length'] - 1
                n2_end = int(j_hit['q. start']) - 1
                if seq_vdj is not None:
                    seq_vdj += db['sequence_input'][n2_start:n2_end]
                if seq_trim is not None:
                    seq_trim += db['sequence_input'][n2_start:n2_end]
        elif db['v_call']:
            np1_len = int(j_hit['q. start']) - (db['v_seq_start'] + db['v_seq_length'])
            if np1_len < 0:
                result['np1_length'] = 0
                overlap = abs(np1_len)
            else:
                result['np1_length'] = np1_len
                np1_start = db['v_seq_start'] + db['v_seq_length'] - 1
                np1_end = int(j_hit['q. start']) - 1
                if seq_vdj is not None:
                    seq_vdj += db['sequence_input'][np1_start:np1_end]
                if seq_trim is not None:
                    seq_trim += db['sequence_input'][np1_start:np1_end]
        else:
            result['np1_length'] = 0

        # J alignment positions
        result.update(self._parseJHitPos(j_hit, overlap))
        # Update VDJ sequence with and without removing insertions
        result['sequence_vdj'] = self._appendSeq(seq_vdj, j_hit, overlap, trim=False)
        result['sequence_trim'] = self._appendSeq(seq_trim, j_hit, overlap, trim=True)

        return result

    def _parseHitScores(self, hits, segment):
        """
        Parse alignment scores

        Arguments:
          hits : hit table as a list of dictionaries.
          segment : segment name; one of 'v', 'd' or 'j'.

        Returns:
          dict : scores
        """
        result = {}
        s_hit = next(x for x in hits if x['segment'] == segment.upper())

        # Score
        try:
            result['%s_score' % segment] = float(s_hit['bit score'])
        except (TypeError, ValueError):
            result['%s_score' % segment] = None

        # Identity
        try:
            result['%s_identity' % segment] = float(s_hit['% identity']) / 100.0
        except (TypeError, ValueError):
            result['%s_identity' % segment] = None

        # E-value
        try:
            result['%s_evalue' % segment] = float(s_hit['evalue'])
        except (TypeError, ValueError):
            result['%s_evalue' % segment] = None

        # BTOP
        try:
            result['%s_btop' % segment] = s_hit['BTOP']
        except (KeyError, TypeError, ValueError):
            result['%s_btop' % segment] = None

        # CIGAR
        try:
            align = decodeBTOP(s_hit['BTOP'])
            align = padAlignment(align, int(s_hit['q. start']) - 1, int(s_hit['s. start']) - 1)
            result['%s_cigar' % segment] = encodeCIGAR(align)
        except (KeyError, TypeError, ValueError):
            result['%s_cigar' % segment] = None

        return result

    def parseBlock(self, block):
        """
        Parses an IgBLAST result into separate sections

        Arguments:
          block (iter): an iterator from itertools.groupby containing a single IgBLAST result.

        Returns:
          dict: a parsed results block; with the keys 'query' (sequence identifier as a string),
                'summary' (dictionary of the alignment summary), 'subregion' (dictionary of IgBLAST CDR3 sequences),
                and 'hits' (VDJ hit table as a list of dictionaries).
                Returns None if the block has no data that can be parsed.
        """
        # Parsing info
        #
        #   Columns for non-hit-table sections
        #     'V-(D)-J rearrangement summary': (Top V gene match, Top D gene match, Top J gene match, Chain type, stop codon, V-J frame, Productive, Strand)
        #     'V-(D)-J junction details': (V end, V-D junction, D region, D-J junction, J start)
        #     'Alignment summary': (from, to, length, matches, mismatches, gaps, percent identity)
        #     'subregion': (nucleotide sequence, translation, start, end)
        #
        #   Ignored sections
        #     'junction': '# V-(D)-J junction details'
        #     'v_alignment': '# Alignment summary'
        #
        #   Hit table fields for -outfmt "7 std qseq sseq btop"
        #     0:  segment
        #     1:  query id
        #     2:  subject id
        #     3:  % identity
        #     4:  alignment length
        #     5:  mismatches
        #     6:  gap opens
        #     7:  gaps
        #     8:  q. start
        #     9:  q. end
        #     10: s. start
        #     11: s. end
        #     12: evalue
        #     13: bit score
        #     14: query seq
        #     15: subject seq
        #     16: btop

        # Map of valid block parsing keys and functions
        chunk_map = {'query': ('# Query:', self._parseQueryChunk),
                     'summary': ('# V-(D)-J rearrangement summary', self._parseSummaryChunk),
                     'subregion': ('# Sub-region sequence details', self._parseSubregionChunk),
                     'hits': ('# Hit table', self._parseHitsChunk)}

        # Parsing chunks
        results = {}
        for match, chunk in groupby(block, lambda x: x != '\n'):
            if match:
                # Strip whitespace and convert to list
                chunk = [x.strip() for x in chunk]

                # Parse non-query sections
                chunk_dict = {k: f(chunk) for k, (v, f) in chunk_map.items() if chunk[0].startswith(v)}
                results.update(chunk_dict)

        return results if results else None

    def parseSections(self, sections):
        """
        Parses IgBLAST sections into a db dictionary

        Arguments:
          sections : dictionary of parsed sections from parseBlock.

        Returns:
          dict : db entries.
        """
        # Initialize dictionary with input sequence and id
        db = {}
        if 'query' in sections:
            query = sections['query']
            db['sequence_id'] = query
            db['sequence_input'] = str(self.sequences[query].seq)

        # Parse summary section
        if 'summary' in sections:
            db.update(self._parseSummarySection(sections['summary'], db, asis_calls=self.asis_calls))

        # Parse hit table
        if 'hits' in sections:
            db['sequence_vdj'] = ''
            db['sequence_trim'] = ''
            if db['v_call']:
                db.update(self._parseVHits(sections['hits'], db))
                db.update(self._parseHitScores(sections['hits'], 'v'))
            if db['d_call']:
                db.update(self._parseDHits(sections['hits'], db))
                db.update(self._parseHitScores(sections['hits'], 'd'))
            if db['j_call']:
                db.update(self._parseJHits(sections['hits'], db))
                db.update(self._parseHitScores(sections['hits'], 'j'))

        # Create IMGT-gapped sequence
        if ('v_call' in db and db['v_call']) and ('sequence_trim' in db and db['sequence_trim']):
            try:
                imgt_dict = gapV(db['sequence_trim'],
                                 v_germ_start=db['v_germ_start_vdj'],
                                 v_germ_length=db['v_germ_length_vdj'],
                                 v_call=db['v_call'],
                                 references=self.references,
                                 asis_calls=self.asis_calls)
            except KeyError as e:
                imgt_dict = {'sequence_imgt': None,
                             'v_germ_start_imgt': None,
                             'v_germ_length_imgt': None}
                printWarning(e)
            db.update(imgt_dict)
            del db['sequence_trim']

        # Add junction
        if 'subregion' in sections and 'cdr3_igblast_start' in sections['subregion']:
            junc_dict = self._parseSubregionSection(sections['subregion'], db['sequence_input'])
            db.update(junc_dict)
        elif self.infer_junction and ('j_call' in db and db['j_call']) and ('sequence_imgt' in db and db['sequence_imgt']):
            junc_dict = inferJunction(db['sequence_imgt'],
                                      j_germ_start=db['j_germ_start'],
                                      j_germ_length=db['j_germ_length'],
                                      j_call=db['j_call'],
                                      references=self.references,
                                      asis_calls=self.asis_calls,
                                      regions=self.regions)
            db.update(junc_dict)

        # Add IgBLAST CDR3 sequences
        if 'subregion' in sections:
            # Sequences already parsed into dict by parseBlock
            db.update(sections['subregion'])
        else:
            # Section does not exist (i.e., older version of IgBLAST or CDR3 not found)
            db.update({'cdr3_igblast': None, 'cdr3_igblast_aa': None})

        # Add FWR and CDR regions
        rd = RegionDefinition(junction_length=db.get('junction_length', None),
                              amino_acid=False, definition=self.regions)
        db.update(rd.getRegions(db.get('sequence_imgt', None)))

        return db

    def __iter__(self):
        """
        Iterator initializer.

        Returns:
          changeo.IO.IgBLASTReader
        """
        return self

    def __next__(self):
        """
        Next method.

        Returns:
          changeo.Receptor.Receptor : parsed IgBLAST result as a Receptor (receptor=True) or dictionary (receptor=False).
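
Each block returned here is handed to parseBlock, which splits it into blank-line-separated chunks with itertools.groupby and keeps only the runs whose key is True. The sketch below shows that chunking pattern in isolation. It assumes the lines were already split without trailing newlines, so the blank separator is an empty string rather than the newline parseBlock tests for. The function split_chunks and the sample lines are hypothetical.

```python
from itertools import groupby


def split_chunks(lines):
    # Group consecutive lines by whether they are non-blank and keep only
    # the non-blank runs, mirroring the `if match:` test in parseBlock
    chunks = []
    for match, chunk in groupby(lines, lambda x: x != ''):
        if match:
            chunks.append([x.strip() for x in chunk])
    return chunks


# Hypothetical, abbreviated IgBLAST-like lines
lines = ['# Query: seq1',
         '',
         '# V-(D)-J rearrangement summary for query sequence (Top V gene match).',
         'IGHV1-2*02',
         '',
         '# Hit table',
         '# Fields: query id, subject id',
         'V seq1 IGHV1-2*02']
chunks = split_chunks(lines)
```

parseBlock then dispatches each chunk by the prefix of its first line, for example '# Hit table', using chunk_map.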
        """
        # Get next block from groups iterator
        try:
            match = False
            block = None
            while not match:
                match, block = next(self.groups)
        except StopIteration:
            raise StopIteration

        # Parse block
        sections = self.parseBlock(block)
        db = self.parseSections(sections)

        if self.receptor:
            return Receptor(db)
        else:
            return db


class IgBLASTReaderAA(IgBLASTReader):
    """
    An iterator to read and parse IgBLAST amino acid alignment output files
    """
    @staticmethod
    def customFields(schema=None):
        """
        Returns non-standard fields defined by the parser

        Arguments:
          schema : schema class to pass field through for conversion.
                   If None, return changeo.Receptor.Receptor attribute names.

        Returns:
          list : list of field names.
        """
        # IgBLAST scoring fields
        fields = ['v_score', 'v_identity', 'v_evalue', 'v_cigar',
                  'fwr1_aa_imgt', 'fwr2_aa_imgt', 'fwr3_aa_imgt', 'cdr1_aa_imgt', 'cdr2_aa_imgt']

        # Convert field names if schema provided
        if schema is not None:
            fields = [schema.fromReceptor(f) for f in fields]

        return fields

    def _parseVHitPos(self, v_hit):
        """
        Parse V alignment positions

        Arguments:
          v_hit : V alignment row from the hit table

        Returns:
          dict: db of V starts and lengths
        """
        result = {}
        # Germline positions
        result['v_germ_aa_start_vdj'] = int(v_hit['s. start'])
        result['v_germ_aa_length_vdj'] = int(v_hit['s. end']) - result['v_germ_aa_start_vdj'] + 1
        # Query sequence positions
        result['v_seq_aa_start'] = int(v_hit['q. start'])
        result['v_seq_aa_length'] = int(v_hit['q. end']) - result['v_seq_aa_start'] + 1
        result['indels'] = 'F' if int(v_hit['gap opens']) == 0 else 'T'

        return result

    def _parseVHits(self, hits, db):
        """
        Parse V hit sub-table

        Arguments:
          hits : hit table as a list of dictionaries.
          db : database dictionary containing summary results.

        Returns:
          dict : db of results.
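
This method builds sequence_aa_vdj with _appendSeq(trim=False) and sequence_aa_trim with _appendSeq(trim=True). In the trimmed version, query residues aligned to a gap ('-') in the subject sequence count as insertions relative to the germline and are dropped. The sketch below re-implements that trimming loop as a standalone function. The name append_aligned and the toy sequences are hypothetical, and positions are found with enumerate instead of re.finditer.

```python
def append_aligned(seq, query, subject, start=0, trim=True):
    # When trimming, copy the query in pieces, skipping every position
    # where the subject (germline) has a gap, i.e. a query insertion
    if trim:
        for ins, char in enumerate(subject):
            if char == '-':
                seq += query[start:ins]
                start = ins + 1
    # Append whatever remains after the last insertion
    seq += query[start:]
    return seq


full = append_aligned('', 'QVQLVQ', 'QV-LVQ', trim=False)
trimmed = append_aligned('', 'QVQLVQ', 'QV-LVQ', trim=True)
```

Here the untrimmed result keeps the inserted Q at position 3. The trimmed result drops it, so it lines up with the ungapped germline, which is what gapV expects.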
        """
        result = {}
        seq_vdj = db['sequence_aa_vdj']
        seq_trim = db['sequence_aa_trim']
        v_hit = next(x for x in hits if x['segment'] == 'V')

        # Alignment positions
        result.update(self._parseVHitPos(v_hit))

        # Assign V gene and update VDJ sequence with and without removing insertions
        result['v_call'] = v_hit['subject id']
        result['sequence_aa_vdj'] = self._appendSeq(seq_vdj, v_hit, 0, trim=False)
        result['sequence_aa_trim'] = self._appendSeq(seq_trim, v_hit, 0, trim=True)

        # Derived functionality
        result['locus'] = getLocus(result['v_call'], action='first')
        result['stop'] = '*' in result['sequence_aa_vdj']

        return result

    def parseSections(self, sections):
        """
        Parses IgBLAST sections into a db dictionary

        Arguments:
          sections : dictionary of parsed sections from parseBlock.

        Returns:
          dict : db entries.
        """
        # Initialize dictionary with input sequence and id
        db = {}
        if 'query' in sections:
            query = sections['query']
            db['sequence_id'] = query
            db['sequence_aa_input'] = str(self.sequences[query].seq)

        # Parse hit table
        if 'hits' in sections:
            db['v_call'] = ''
            db['sequence_aa_vdj'] = ''
            db['sequence_aa_trim'] = ''
            db.update(self._parseVHits(sections['hits'], db))
            db.update(self._parseHitScores(sections['hits'], 'v'))

        # Create IMGT-gapped sequence
        if ('v_call' in db and db['v_call']) and ('sequence_aa_trim' in db and db['sequence_aa_trim']):
            try:
                gap = gapV(db['sequence_aa_trim'],
                           v_germ_start=db['v_germ_aa_start_vdj'],
                           v_germ_length=db['v_germ_aa_length_vdj'],
                           v_call=db['v_call'],
                           references=self.references,
                           asis_calls=self.asis_calls)
                imgt_dict = {'sequence_aa_imgt': gap['sequence_imgt'],
                             'v_germ_aa_start_imgt': gap['v_germ_start_imgt'],
                             'v_germ_aa_length_imgt': gap['v_germ_length_imgt']}
            except KeyError as e:
                imgt_dict = {'sequence_aa_imgt': None,
                             'v_germ_aa_start_imgt': None,
                             'v_germ_aa_length_imgt': None}
                printWarning(e)
            db.update(imgt_dict)
            del db['sequence_aa_trim']

        # Add FWR and CDR regions
        rd = RegionDefinition(junction_length=db.get('junction_length', None),
                              amino_acid=True,
                              definition=self.regions)
        regions = rd.getRegions(db.get('sequence_aa_imgt', None))
        regions = {'fwr1_aa_imgt': regions['fwr1_imgt'],
                   'fwr2_aa_imgt': regions['fwr2_imgt'],
                   'fwr3_aa_imgt': regions['fwr3_imgt'],
                   'cdr1_aa_imgt': regions['cdr1_imgt'],
                   'cdr2_aa_imgt': regions['cdr2_imgt']}
        db.update(regions)

        return db


class IHMMuneReader:
    """
    An iterator to read and parse iHMMune-Align output files.
    """
    # iHMMuneAlign columns
    # Courtesy of Katherine Jackson
    #
    #  1: Identifier - sequence identifier from FASTA input file
    #  2: IGHV - IGHV gene match from the IGHV repertoire, if multiple genes had equally
    #            good alignments both will be listed, if indels were found this will be
    #            listed, in case of multiple IGHV all further data is reported with
    #            respect to the first listed gene
    #  3: IGHD - IGHD gene match, if no IGHD could be found or the IGHD that was found
    #            failed to meet confidence criteria this will be 'NO_DGENE_ALIGNMENT'
    #  4: IGHJ - IGHJ gene match, only a single best matching IGHJ is reported, if indels
    #            are found then 'indel' will be listed
    #  5: V-REGION - portion of input sequence that matches to the germline IGHV, where
    #                nucleotides are missing at start or end the sequence is padded back
    #                to full length with '.' (the exonuclease loss from the end of the
    #                gene will therefore be equal to the number of '.' characters at the
    #                5' end), mismatches between germline and rearranged are in uppercase,
    #                matches are in lowercase
    #  6: N1-REGION - sequence between V- and D-REGIONs
    #  7: D-REGION - portion of input sequence that matches to the germline IGHD
    #                (model doesn't currently permit indels in the IGHD), where IGHD is
    #                reported as 'NO_DGENE_ALIGNMENT' this field contains all nucleotides
    #                between the V- and J-REGIONs
    #  8: N2-REGION - sequence between D- and J-REGIONs
    #  9: J-REGION - portion of the input sequence that matches germline IGHJ, padded
    #                5' and 3' to length of germline match
    # 10: V mutation count - count of mismatches in the V-REGION
    # 11: D mutation count - count of mismatches in the D-REGION
    # 12: J mutation count - count of mismatches in the J-REGION
    # 13: count of ambiguous nts - count of 'n' or 'x' nucleotides in the input sequence
    # 14: IGHJ in-frame - 'true' if IGHJ is in-frame and 'false' if IGHJ is out-of-frame,
    #                     WARNING indels and germline IGHV database sequences that are
    #                     not RF1 can cause this to report inaccurately
    # 15: IGHV start offset - offset for start of alignment between input sequence and
    #                         germline IGHV
    #                         NOTE: appears to be base 1 indexing.
    # 16: stop codons - count of stop codons in the sequence, WARNING indels and germline
    #                   IGHV database sequences that are not RF can cause this to be inaccurate
    # 17: IGHD probability - probability that N-nucleotide addition could have created the
    #                        D-REGION sequence
    # 18: HMM path score - path score from HMM
    # 19: reverse complement - 0 for no reverse complement, 1 if alignment was to reverse
    #                          complement NOTE currently this version only functions with
    #                          input in coding orientation
    # 20: mutations in common region - count of mutations in common region, which is a
    #                                  portion of the IGHV that is highly conserved,
    #                                  mutations in this region are used to set various
    #                                  probabilities in the HMM
    # 21: ambiguous nts in common region - count of 'n' or 'x' nucleotides in the
    #                                      common region
    # 22: IGHV start offset - offset for start of alignment between input sequence and
    #                         germline IGHV
    #                         NOTE: appears to be base 0 indexing.
    #                         NOTE: don't know if this differs from 15; it doesn't appear to.
    # 23: IGHV gene length - length of IGHV gene
    # 24: A score - A score probability is calculated from the common region mutations
    #               and is used for HMM calculations relating to expected mutation
    #               probability at different positions in the rearrangement

    # Ordered list of known fields
    ihmmune_fields = ['SEQUENCE_ID', 'V_CALL', 'D_CALL', 'J_CALL', 'V_SEQ', 'NP1_SEQ', 'D_SEQ',
                      'NP2_SEQ', 'J_SEQ', 'V_MUT', 'D_MUT', 'J_MUT', 'NX_COUNT', 'J_INFRAME',
                      'V_SEQ_START', 'STOP_COUNT', 'D_PROB', 'HMM_SCORE', 'RC', 'COMMON_MUT',
                      'COMMON_NX_COUNT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'A_SCORE']

    @staticmethod
    def customFields(scores=False, regions=False, cell=False, schema=None):
        """
        Returns non-standard Receptor attributes defined by the parser

        Arguments:
          scores : if True include alignment scoring fields.
          regions : if True include IMGT-gapped CDR and FWR region fields.
          cell : not used by this parser.
          schema : schema class to pass field through for conversion.
                   If None, return changeo.Receptor.Receptor attribute names.

        Returns:
          list : list of field names.
        """
        # Alignment scoring fields
        score_fields = ['vdj_score']

        # FWR and CDR fields
        region_fields = ['fwr1_imgt', 'fwr2_imgt', 'fwr3_imgt', 'fwr4_imgt',
                         'cdr1_imgt', 'cdr2_imgt', 'cdr3_imgt']

        fields = []
        if scores:  fields.extend(score_fields)
        if regions:  fields.extend(region_fields)

        # Convert field names if schema provided
        if schema is not None:
            fields = [schema.fromReceptor(f) for f in fields]

        return fields

    def __init__(self, ihmmune, sequences, references, receptor=True):
        """
        Initializer

        Arguments:
          ihmmune (file): handle to an open iHMMune-Align output file.
          sequences (dict): dictionary with sequence descriptions as keys mapping to the SeqRecord containing
                            the original query sequences.
          references (dict): dictionary of IMGT gapped germline sequences.
          receptor (bool): if True (default) iteration returns a Receptor object, otherwise it returns a dictionary.

        Returns:
          changeo.IO.IHMMuneReader
        """
        # Arguments
        self.ihmmune = ihmmune
        self.sequences = sequences
        self.references = references
        self.receptor = receptor

        # Open reader
        self.records = csv.DictReader(self.ihmmune, fieldnames=IHMMuneReader.ihmmune_fields,
                                      delimiter=';', quotechar='"')

    @staticmethod
    def _parseFunctionality(record):
        """
        Parse functionality information

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.

        Returns:
          dict : database entries containing functionality information.
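
The reader reads semicolon-delimited rows with csv.DictReader using the positional ihmmune_fields names. This method then maps the raw strings to 'T'/'F' flags. The sketch below reproduces that pattern on two hypothetical rows with a hypothetical four-column subset of the field names (real iHMMune-Align rows have 24 columns). csv.DictReader accepts any iterable of strings, so no file is needed.

```python
import csv

# Hypothetical subset of column names; the real reader uses ihmmune_fields
fields = ['SEQUENCE_ID', 'J_INFRAME', 'STOP_COUNT', 'RC']
rows = ['seq1;true;0;0', 'seq2;false;2;1']

flags = {}
for record in csv.DictReader(rows, fieldnames=fields, delimiter=';'):
    # Same string-to-flag rules as _stop, _inframe and _revcomp
    flags[record['SEQUENCE_ID']] = {
        'stop': 'T' if int(record['STOP_COUNT']) > 0 else 'F',
        'in_frame': 'T' if record['J_INFRAME'] == 'true' else 'F',
        'rev_comp': 'F' if int(record['RC']) == 0 else 'T'}
```

Because the fieldnames are supplied explicitly, DictReader treats every row as data. It does not consume a header line, which matches iHMMune-Align output with no header row.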
        """
        # Orientation
        def _revcomp():
            return 'F' if int(record['RC']) == 0 else 'T'

        # Functional
        def _functional():
            if not record['V_CALL'] or \
                    record['V_CALL'].startswith('NA - ') or \
                    record['J_INFRAME'] != 'true' or \
                    not record['J_CALL'] or \
                    record['J_CALL'] == 'NO_JGENE_ALIGNMENT' or \
                    int(record['STOP_COUNT']) > 0:
                return 'F'
            else:
                return 'T'

        # Stop codon
        def _stop():
            return 'T' if int(record['STOP_COUNT']) > 0 else 'F'

        # J in-frame
        def _inframe():
            return 'T' if record['J_INFRAME'] == 'true' else 'F'

        # Indels
        def _indels():
            check = [x is not None and 'indels' in x \
                     for x in [record['V_CALL'], record['D_CALL'], record['J_CALL']]]
            return 'T' if any(check) else 'F'

        # Parse functionality
        result = {'rev_comp': _revcomp(),
                  'functional': _functional(),
                  'in_frame': _inframe(),
                  'stop': _stop(),
                  'indels': _indels()}

        return result

    @staticmethod
    def _parseGenes(record):
        """
        Parse gene calls

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.

        Returns:
          dict : database entries for gene calls.
        """
        # Extract allele calls
        v_call = getVAllele(record['V_CALL'], action='list')
        d_call = getDAllele(record['D_CALL'], action='list')
        j_call = getJAllele(record['J_CALL'], action='list')

        # Locus
        locus_list = [getLocus(record['V_CALL'], action='first'),
                      getLocus(record['J_CALL'], action='first')]
        locus = set(filter(None, locus_list))

        # Build return object
        result = {'v_call': ','.join(v_call) if v_call else None,
                  'd_call': ','.join(d_call) if d_call else None,
                  'j_call': ','.join(j_call) if j_call else None,
                  'locus': locus.pop() if len(locus) == 1 else None}

        return result

    @staticmethod
    def _parseNPHit(record):
        """
        Parse N/P region alignment information

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.

        Returns:
          dict : database entries containing N/P region lengths.
        """
        # N/P lengths
        result = {'np1_length': len(record['NP1_SEQ']),
                  'np2_length': len(record['NP2_SEQ'])}

        return result

    @staticmethod
    def _parseVHit(record, db):
        """
        Parse V alignment information

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.
          db : database containing V and D alignment information.

        Returns:
          dict : database entries containing V call and alignment positions.
        """
        # Default return
        result = {'v_seq_start': None,
                  'v_seq_length': None,
                  'v_germ_start_vdj': None,
                  'v_germ_length_vdj': None}

        # Find V positions
        if db['v_call']:
            # Query positions
            result['v_seq_start'] = int(record['V_SEQ_START'])
            result['v_seq_length'] = len(record['V_SEQ'].strip('.'))
            # Germline positions
            result['v_germ_start_vdj'] = 1
            result['v_germ_length_vdj'] = result['v_seq_length']

        return result

    @staticmethod
    def _parseDHit(record, db):
        """
        Parse D alignment information

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.
          db : database containing V alignment information.

        Returns:
          dict : database entries containing D call and alignment positions.
        """
        # D start position
        def _dstart():
            nb = [db['v_seq_start'],
                  db['v_seq_length'],
                  db['np1_length']]
            return sum(int(i) for i in nb if i)

        # Default return
        result = {'d_seq_start': None,
                  'd_seq_length': None,
                  'd_germ_start': None,
                  'd_germ_length': None}

        if db['d_call']:
            # Query positions
            result['d_seq_start'] = _dstart()
            result['d_seq_length'] = len(record['D_SEQ'].strip('.'))
            # Germline positions
            result['d_germ_start'] = len(record['D_SEQ']) - len(record['D_SEQ'].lstrip('.'))
            result['d_germ_length'] = result['d_seq_length']

        return result

    @staticmethod
    def _parseJHit(record, db):
        """
        Parse J alignment information

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.
          db : database containing V and D alignment information.

        Returns:
          dict : database entries containing J call and alignment positions.
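
iHMMune-Align pads each aligned region with '.' to the germline length. This method therefore takes the aligned length from the stripped string and the germline offset from the number of leading pad characters. The query start comes from _jstart, which sums the upstream query values. The sketch below shows that arithmetic. The helper names padded_positions and j_query_start and all input values are hypothetical.

```python
def padded_positions(region):
    # Aligned length: the region without its '.' padding on either side
    length = len(region.strip('.'))
    # Germline offset: number of leading '.' characters (5' padding)
    germ_start = len(region) - len(region.lstrip('.'))
    return germ_start, length


def j_query_start(values):
    # Sum the upstream values, skipping missing (None) or zero entries,
    # as _jstart does with v_seq_start, v_seq_length, np1, d and np2 lengths
    return sum(int(i) for i in values if i)


germ_start, length = padded_positions('...TGGGGCCAGGGAACC..')
start = j_query_start([1, 296, 5, None, 0])
```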
        """
        # J start position
        def _jstart():
            # J positions
            nb = [db['v_seq_start'],
                  db['v_seq_length'],
                  db['np1_length'],
                  db['d_seq_length'],
                  db['np2_length']]
            return sum(int(i) for i in nb if i)

        # Default return
        result = {'j_seq_start': None,
                  'j_seq_length': None,
                  'j_germ_start': None,
                  'j_germ_length': None}

        # Find J region
        if db['j_call']:
            # Query positions
            result['j_seq_start'] = _jstart()
            result['j_seq_length'] = len(record['J_SEQ'].strip('.'))
            # Germline positions
            result['j_germ_start'] = len(record['J_SEQ']) - len(record['J_SEQ'].lstrip('.'))
            result['j_germ_length'] = result['j_seq_length']

        return result

    @staticmethod
    def _assembleVDJ(record, db):
        """
        Build full length V(D)J sequence

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.
          db : database containing V and D alignment information.

        Returns:
          dict : database entries containing the full length V(D)J sequence.
        """
        segments = [record['V_SEQ'].strip('.') if db['v_call'] else '',
                    record['NP1_SEQ'] if db['np1_length'] else '',
                    record['D_SEQ'].strip('.') if db['d_call'] else '',
                    record['NP2_SEQ'] if db['np2_length'] else '',
                    record['J_SEQ'].strip('.') if db['j_call'] else '']

        return {'sequence_vdj': ''.join(segments)}

    @staticmethod
    def _parseScores(record):
        """
        Parse alignment scores

        Arguments:
          record : dictionary containing a single row from the iHMMune-Align output.

        Returns:
          dict : database entries for alignment scores.
        """
        result = {}
        try:
            result['vdj_score'] = float(record['HMM_SCORE'])
        except (TypeError, ValueError):
            result['vdj_score'] = None

        return result

    def parseRecord(self, record):
        """
        Parses a single row from an iHMMune-Align output file.

        Arguments:
          record : dictionary containing one row of iHMMune-Align file.

        Returns:
          dict : database entry for the row.
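
This method builds sequence_vdj through _assembleVDJ. That step joins the V, D and J regions with their '.' padding removed, and it adds an N/P region only when its length is non-zero. The sketch below shows that concatenation on a hypothetical record dictionary. The function assemble_vdj is hypothetical, and it omits the gene-call checks that _assembleVDJ performs.

```python
def assemble_vdj(record):
    # Strip '.' padding from the templated segments, keep N/P regions as-is,
    # and skip empty segments (gene-call checks omitted in this sketch)
    segments = [record['V_SEQ'].strip('.'),
                record['NP1_SEQ'],
                record['D_SEQ'].strip('.'),
                record['NP2_SEQ'],
                record['J_SEQ'].strip('.')]
    return ''.join(s for s in segments if s)


record = {'V_SEQ': '..CAGGTG', 'NP1_SEQ': 'GA', 'D_SEQ': '.TAC.',
          'NP2_SEQ': '', 'J_SEQ': 'TGG...'}
vdj = assemble_vdj(record)
```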
        """
        # Extract query ID and sequence
        query = record['SEQUENCE_ID']
        db = {'sequence_id': query,
              'sequence_input': str(self.sequences[query].seq)}

        # Check for valid alignment
        if not record['V_CALL'] or \
                record['V_CALL'].startswith('NA - ') or \
                record['V_CALL'].startswith('State path'):
            db['functional'] = None
            db['v_call'] = None
            db['d_call'] = None
            db['j_call'] = None
            return db

        # Parse record
        db.update(IHMMuneReader._parseFunctionality(record))
        db.update(IHMMuneReader._parseGenes(record))
        db.update(IHMMuneReader._parseNPHit(record))
        db.update(IHMMuneReader._parseVHit(record, db))
        db.update(IHMMuneReader._parseDHit(record, db))
        db.update(IHMMuneReader._parseJHit(record, db))
        db.update(IHMMuneReader._assembleVDJ(record, db))

        # Create IMGT-gapped sequence
        if 'v_call' in db and db['v_call'] and 'sequence_vdj' in db and db['sequence_vdj']:
            try:
                imgt_dict = gapV(db['sequence_vdj'],
                                 v_germ_start=db['v_germ_start_vdj'],
                                 v_germ_length=db['v_germ_length_vdj'],
                                 v_call=db['v_call'],
                                 references=self.references)
            except KeyError as e:
                imgt_dict = {'sequence_imgt': None,
                             'v_germ_start_imgt': None,
                             'v_germ_length_imgt': None}
                printWarning(e)
            db.update(imgt_dict)

        # Infer IMGT junction
        if ('j_call' in db and db['j_call']) and ('sequence_imgt' in db and db['sequence_imgt']):
            junc_dict = inferJunction(db['sequence_imgt'],
                                      j_germ_start=db['j_germ_start'],
                                      j_germ_length=db['j_germ_length'],
                                      j_call=db['j_call'],
                                      references=self.references,
                                      regions='default')
            db.update(junc_dict)

        # Overall alignment score
        db.update(IHMMuneReader._parseScores(record))

        # FWR and CDR regions
        rd = RegionDefinition(junction_length=db.get('junction_length', None),
                              amino_acid=False, definition='default')
        db.update(rd.getRegions(db.get('sequence_imgt', None)))

        return db

    def __iter__(self):
        """
        Iterator initializer.

        Returns:
          changeo.IO.IHMMuneReader
        """
        return self

    def __next__(self):
        """
        Next method.

        Returns:
          changeo.Receptor.Receptor : parsed iHMMune-Align result as a Receptor (receptor=True) or dictionary (receptor=False).
        """
        # Get next set of records from dictionary readers
        try:
            record = None
            while not record:
                record = next(self.records)
        except StopIteration:
            raise StopIteration

        db = self.parseRecord(record)

        if self.receptor:
            return Receptor(db)
        else:
            return db


def readGermlines(references, asis=False, warn=False):
    """
    Parses germline repositories

    Arguments:
      references (list): list of strings specifying directories and/or files from which to read germline records.
      asis (bool): if True use sequence ID as record name and do not parse headers for allele names.
      warn (bool): print warning messages to standard error if True.

    Returns:
      dict: Dictionary of germlines in the form {allele: sequence}.
    """
    repo_files = []
    # Iterate over items passed to commandline
    for r in references:
        if os.path.isdir(r):
            # If directory, get fasta files from within
            repo_files.extend([os.path.join(r, f) for f in os.listdir(r) \
                               if getFileType(f) == 'fasta'])
        elif os.path.isfile(r) and getFileType(r) == 'fasta':
            # If file, make sure file is fasta
            repo_files.extend([r])

    # Catch instances where no valid fasta files were passed in
    if len(repo_files) < 1:
        printError('No valid germline fasta files (.fasta, .fna, .fa) were found at %s.' % ','.join(references))

    repo_dict = {}
    duplicates = []
    for file_name in repo_files:
        # Text mode already uses universal newlines; the 'U' flag was removed in Python 3.11
        with open(file_name, 'r') as file_handle:
            germlines = SeqIO.parse(file_handle, 'fasta')
            for g in germlines:
                germ_key = getAllele(g.description, 'first') if not asis else g.id
                if germ_key not in repo_dict:
                    repo_dict[germ_key] = str(g.seq).upper()
                else:
                    duplicates.append(g.description)

    if warn and len(duplicates) > 0:
        w = indent('\n'.join(duplicates), ' '*9)
        printWarning('Duplicated germline allele names excluded from references:\n%s' % w)

    return repo_dict


def extractIMGT(imgt_output):
    """
    Extract necessary files from IMGT/HighV-QUEST results.
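
extractIMGT picks the required files by basename prefix (str.startswith accepts a tuple of prefixes) and pairs the sorted matches with fixed keys. The sketch below shows that selection step with no extraction, run on a hypothetical list of archive member names.

```python
import os
from itertools import zip_longest

imgt_names = ('1_Summary', '2_IMGT-gapped', '3_Nt-sequences', '6_Junction')
imgt_keys = ('summary', 'gapped', 'ntseq', 'junction')

# Hypothetical member names, e.g. as returned by ZipFile.namelist()
members = ['run/3_Nt-sequences.txt', 'run/1_Summary.txt', 'run/README.txt',
           'run/6_Junction.txt', 'run/2_IMGT-gapped-nt-sequences.txt']

# Keep files whose basename starts with any required prefix; sorting puts
# them in the same numeric order as imgt_keys
imgt_files = sorted(n for n in members if os.path.basename(n).startswith(imgt_names))
imgt_dict = {k: f for k, f in zip_longest(imgt_keys, imgt_files)}
```

zip_longest pads the shorter sequence with None. An extra matching file therefore becomes a None key and changes the dictionary length. A missing file instead produces a None value and leaves the length unchanged.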

    Arguments:
      imgt_output : zipped file, tarball, or unzipped folder output by IMGT/HighV-QUEST.

    Returns:
      tuple : (temporary directory handle, dictionary with names of extracted IMGT files).
    """
    # Map of IMGT file names
    imgt_names = ('1_Summary', '2_IMGT-gapped', '3_Nt-sequences', '6_Junction')
    imgt_keys = ('summary', 'gapped', 'ntseq', 'junction')

    # Open temporary directory and initialize return dictionary
    temp_dir = TemporaryDirectory()

    # Zip input
    if zipfile.is_zipfile(imgt_output):
        imgt_zip = zipfile.ZipFile(imgt_output, 'r')
        # Extract required files
        imgt_files = sorted([n for n in imgt_zip.namelist() \
                             if os.path.basename(n).startswith(imgt_names)])
        imgt_zip.extractall(temp_dir.name, imgt_files)
        # Define file dictionary
        imgt_dict = {k: os.path.join(temp_dir.name, f) for k, f in zip_longest(imgt_keys, imgt_files)}
    # Folder input
    elif os.path.isdir(imgt_output):
        folder_files = []
        for root, dirs, files in os.walk(imgt_output):
            folder_files.extend([os.path.join(os.path.abspath(root), f) for f in files])
        # Define file dictionary
        imgt_files = sorted([n for n in folder_files \
                             if os.path.basename(n).startswith(imgt_names)])
        imgt_dict = {k: f for k, f in zip_longest(imgt_keys, imgt_files)}
    # Tarball input
    elif tarfile.is_tarfile(imgt_output):
        imgt_tar = tarfile.open(imgt_output, 'r')
        # Extract required files
        imgt_files = sorted([n for n in imgt_tar.getnames() \
                             if os.path.basename(n).startswith(imgt_names)])
        imgt_tar.extractall(temp_dir.name, [imgt_tar.getmember(n) for n in imgt_files])
        # Define file dictionary
        imgt_dict = {k: os.path.join(temp_dir.name, f) for k, f in zip_longest(imgt_keys, imgt_files)}
    else:
        printError('Unsupported IMGT output file. Must be either a zipped file (.zip), LZMA compressed tarfile (.txz) or a folder.')

    # Check extraction for errors
    if len(imgt_dict) != len(imgt_names):
        printError('Extra files or missing necessary files in IMGT output %s.'
                   % imgt_output)

    return temp_dir, imgt_dict


def countDbFile(file):
    """
    Counts the records in database files

    Arguments:
      file : tab-delimited database file.

    Returns:
      int : count of records in the database file.
    """
    # Count records and check file
    try:
        with open(file, 'rt') as db_handle:
            db_records = csv.reader(db_handle, dialect='excel-tab')
            for i, __ in enumerate(db_records):
                pass
        # The header is row 0, so the index of the last row equals the number of records
        db_count = i
    except IOError:
        printError('File %s cannot be read.' % file)
    except:
        printError('File %s is invalid.' % file)
    else:
        if db_count == 0:
            printError('File %s is empty.' % file)

    return db_count


def getDbFields(file, add=None, exclude=None, reader=TSVReader):
    """
    Get field names from a db file

    Arguments:
      file : db file to pull base fields from.
      add : fields to append to the field set.
      exclude : fields to exclude from the field set.
      reader : reader class.

    Returns:
      list : list of field names
    """
    try:
        with open(file, 'rt') as handle:
            fields = reader(handle).fields
    except IOError:
        printError('File %s cannot be read.' % file)
    except:
        printError('File %s is invalid.' % file)

    # Add extra fields
    if add is not None:
        if not isinstance(add, list):
            add = [add]
        fields.extend([f for f in add if f not in fields])

    # Remove unwanted fields
    if exclude is not None:
        if not isinstance(exclude, list):
            exclude = [exclude]
        fields = [f for f in fields if f not in exclude]

    return fields


def getFormatOperators(format):
    """
    Simple wrapper for fetching the set of operator classes for a data format

    Arguments:
      format (str): name of the data format.

    Returns:
      tuple: a tuple with the reader class, writer class, and schema definition class.
    """
    # Format options
    if format == 'changeo':
        reader = ChangeoReader
        writer = ChangeoWriter
        schema = ChangeoSchema
    elif format == 'changeo-aa':
        reader = ChangeoReader
        writer = ChangeoWriter
        schema = ChangeoSchemaAA
    elif format == 'airr':
        reader = AIRRReader
        writer = AIRRWriter
        schema = AIRRSchema
    elif format == 'airr-aa':
        reader = AIRRReader
        writer = AIRRWriter
        schema = AIRRSchemaAA
    else:
        raise ValueError('Unsupported data format: %s' % format)

    return reader, writer, schema


def splitName(file):
    """
    Split a file name into its directory, basename and extension

    Arguments:
      file (str): file name.

    Returns:
      tuple : tuple of the file directory, basename and extension.
    """
    directory, filename = os.path.split(file)
    basename, extension = os.path.splitext(filename)
    extension = extension.lower().lstrip('.')

    return directory, basename, extension


def getOutputName(file, out_label=None, out_dir=None, out_name=None, out_type=None):
    """
    Creates an output filename from an existing filename

    Arguments:
      file : filename to base output file name on.
      out_label : text to be inserted before the file extension; if None do not add a label.
      out_type : the file extension of the output file; if None use input file extension.
      out_dir : the output directory; if None use directory of input file.
      out_name : the short filename to use for the output file; if None use input file short name.

    Returns:
      str: file name.
    """
    # Get filename components
    directory, basename, extension = splitName(file)

    # Define output directory, creating it (and any missing parents) if needed
    if out_dir is None:
        out_dir = directory
    else:
        out_dir = os.path.abspath(out_dir)
        os.makedirs(out_dir, exist_ok=True)

    # Define output file prefix
    if out_name is None:
        out_name = basename

    # Define output file extension
    if out_type is None:
        out_type = extension

    # Define output file name
    if out_label is None:
        out_file = os.path.join(out_dir, '%s.%s' % (out_name, out_type))
    else:
        out_file = os.path.join(out_dir, '%s_%s.%s' % (out_name, out_label, out_type))

    # Return file name
    return out_file


def getOutputHandle(file, out_label=None, out_dir=None, out_name=None, out_type=None):
    """
    Opens an output file handle

    Arguments:
      file : filename to base output file name on.
      out_label : text to be inserted before the file extension; if None do not add a label.
      out_type : the file extension of the output file; if None use input file extension.
      out_dir : the output directory; if None use directory of input file.
      out_name : the short filename to use for the output file; if None use input file short name.

    Returns:
      file : File handle
    """
    out_file = getOutputName(file, out_label=out_label, out_dir=out_dir,
                             out_name=out_name, out_type=out_type)

    # Open and return handle
    try:
        return open(out_file, mode='w')
    except:
        printError('File %s cannot be opened.' % out_file)


def checkFields(attributes, header, schema=AIRRSchema):
    """
    Checks that a file header contains a required set of Receptor attributes

    Arguments:
      attributes (list): list of Receptor attributes to check for.
      header (list): list of field names in the file header.
      schema (object): schema object used to convert Receptor attributes to field names;
                       if None, the attributes are compared to the header as-is.

    Returns:
      bool: True if the fields for all attributes are found.

    Raises:
      LookupError: if any required field is missing from the header.
    """
    if schema is None:
        columns = attributes
    else:
        columns = [schema.fromReceptor(f) for f in attributes]
    missing = [x for x in columns if x not in header]

    if len(missing) > 0:
        raise LookupError('Missing required fields: %s' % ', '.join(missing))

    return True


def yamlDict(file):
    """
    Returns a dictionary from a yaml file

    Arguments:
      file (str): simple yaml file with rows in the form 'argument: value'.

    Returns:
      dict: dictionary of key:value pairs in the file.
    """
    try:
        # Use a context manager so the file handle is closed after parsing
        with open(file, 'r') as handle:
            yaml_dict = dict(yaml.load(handle, Loader=yaml.FullLoader))
    except:
        printError('YAML file %s is invalid.' % file)

    return yaml_dict
changeo-1.2.0/changeo/Version.py0000644000175000017500000000046314136777037016071 0ustar nileshnilesh
"""
Version and authorship information
"""

__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
__copyright__ = 'Copyright 2021 Kleinstein Lab, Yale University. All rights reserved.'
__license__ = 'GNU Affero General Public License 3 (AGPL-3)'
__version__ = '1.2.0'
__date__ = '2021.10.29'
changeo-1.2.0/changeo/data/0000755000175000017500000000000014136777167015004 5ustar nileshnileshchangeo-1.2.0/changeo/data/m1n_compat_dist.tsv0000644000175000017500000000022713674203454020611 0ustar nileshnilesh A C G T N . - A 0 2.86 1 2.14 0 0 0 C 2.86 0 2.14 1 0 0 0 G 1 2.14 0 2.86 0 0 0 T 2.14 1 2.86 0 0 0 0 N 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 changeo-1.2.0/changeo/data/hh_s1f_dist.tsv0000644000175000017500000000024313674203454017721 0ustar nileshnilesh A C G T N - . A 0 1.21 0.64 1.16 0 0 0 C 1.21 0 1.16 0.64 0 0 0 G 0.64 1.16 0 1.21 0 0 0 T 1.16 0.64 1.21 0 0 0 0 N 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 .
0 0 0 0 0 0 0 changeo-1.2.0/changeo/data/hh_s5f_dist.tsv0000644000175000017500000021315313674203454017733 0ustar nileshnilesh AAAAA AAAAC AAAAG AAAAT AAAAN AAACA AAACC AAACG AAACT AAACN AAAGA AAAGC AAAGG AAAGT AAAGN AAATA AAATC AAATG AAATT AAATN AAANA AAANC AAANG AAANT AAANN AACAA AACAC AACAG AACAT AACAN AACCA AACCC AACCG AACCT AACCN AACGA AACGC AACGG AACGT AACGN AACTA AACTC AACTG AACTT AACTN AACNA AACNC AACNG AACNT AACNN AAGAA AAGAC AAGAG AAGAT AAGAN AAGCA AAGCC AAGCG AAGCT AAGCN AAGGA AAGGC AAGGG AAGGT AAGGN AAGTA AAGTC AAGTG AAGTT AAGTN AAGNA AAGNC AAGNG AAGNT AAGNN AATAA AATAC AATAG AATAT AATAN AATCA AATCC AATCG AATCT AATCN AATGA AATGC AATGG AATGT AATGN AATTA AATTC AATTG AATTT AATTN AATNA AATNC AATNG AATNT AATNN AANAA AANAC AANAG AANAT AANAN AANCA AANCC AANCG AANCT AANCN AANGA AANGC AANGG AANGT AANGN AANTA AANTC AANTG AANTT AANTN AANNA AANNC AANNG AANNT AANNN ACAAA ACAAC ACAAG ACAAT ACAAN ACACA ACACC ACACG ACACT ACACN ACAGA ACAGC ACAGG ACAGT ACAGN ACATA ACATC ACATG ACATT ACATN ACANA ACANC ACANG ACANT ACANN ACCAA ACCAC ACCAG ACCAT ACCAN ACCCA ACCCC ACCCG ACCCT ACCCN ACCGA ACCGC ACCGG ACCGT ACCGN ACCTA ACCTC ACCTG ACCTT ACCTN ACCNA ACCNC ACCNG ACCNT ACCNN ACGAA ACGAC ACGAG ACGAT ACGAN ACGCA ACGCC ACGCG ACGCT ACGCN ACGGA ACGGC ACGGG ACGGT ACGGN ACGTA ACGTC ACGTG ACGTT ACGTN ACGNA ACGNC ACGNG ACGNT ACGNN ACTAA ACTAC ACTAG ACTAT ACTAN ACTCA ACTCC ACTCG ACTCT ACTCN ACTGA ACTGC ACTGG ACTGT ACTGN ACTTA ACTTC ACTTG ACTTT ACTTN ACTNA ACTNC ACTNG ACTNT ACTNN ACNAA ACNAC ACNAG ACNAT ACNAN ACNCA ACNCC ACNCG ACNCT ACNCN ACNGA ACNGC ACNGG ACNGT ACNGN ACNTA ACNTC ACNTG ACNTT ACNTN ACNNA ACNNC ACNNG ACNNT ACNNN AGAAA AGAAC AGAAG AGAAT AGAAN AGACA AGACC AGACG AGACT AGACN AGAGA AGAGC AGAGG AGAGT AGAGN AGATA AGATC AGATG AGATT AGATN AGANA AGANC AGANG AGANT AGANN AGCAA AGCAC AGCAG AGCAT AGCAN AGCCA AGCCC AGCCG AGCCT AGCCN AGCGA AGCGC AGCGG AGCGT AGCGN AGCTA AGCTC AGCTG AGCTT AGCTN AGCNA AGCNC AGCNG AGCNT AGCNN AGGAA AGGAC AGGAG AGGAT AGGAN AGGCA AGGCC AGGCG AGGCT AGGCN AGGGA AGGGC 
AGGGG AGGGT AGGGN AGGTA AGGTC AGGTG AGGTT AGGTN AGGNA AGGNC AGGNG AGGNT AGGNN AGTAA AGTAC AGTAG AGTAT AGTAN AGTCA AGTCC AGTCG AGTCT AGTCN AGTGA AGTGC AGTGG AGTGT AGTGN AGTTA AGTTC AGTTG AGTTT AGTTN AGTNA AGTNC AGTNG AGTNT AGTNN AGNAA AGNAC AGNAG AGNAT AGNAN AGNCA AGNCC AGNCG AGNCT AGNCN AGNGA AGNGC AGNGG AGNGT AGNGN AGNTA AGNTC AGNTG AGNTT AGNTN AGNNA AGNNC AGNNG AGNNT AGNNN ATAAA ATAAC ATAAG ATAAT ATAAN ATACA ATACC ATACG ATACT ATACN ATAGA ATAGC ATAGG ATAGT ATAGN ATATA ATATC ATATG ATATT ATATN ATANA ATANC ATANG ATANT ATANN ATCAA ATCAC ATCAG ATCAT ATCAN ATCCA ATCCC ATCCG ATCCT ATCCN ATCGA ATCGC ATCGG ATCGT ATCGN ATCTA ATCTC ATCTG ATCTT ATCTN ATCNA ATCNC ATCNG ATCNT ATCNN ATGAA ATGAC ATGAG ATGAT ATGAN ATGCA ATGCC ATGCG ATGCT ATGCN ATGGA ATGGC ATGGG ATGGT ATGGN ATGTA ATGTC ATGTG ATGTT ATGTN ATGNA ATGNC ATGNG ATGNT ATGNN ATTAA ATTAC ATTAG ATTAT ATTAN ATTCA ATTCC ATTCG ATTCT ATTCN ATTGA ATTGC ATTGG ATTGT ATTGN ATTTA ATTTC ATTTG ATTTT ATTTN ATTNA ATTNC ATTNG ATTNT ATTNN ATNAA ATNAC ATNAG ATNAT ATNAN ATNCA ATNCC ATNCG ATNCT ATNCN ATNGA ATNGC ATNGG ATNGT ATNGN ATNTA ATNTC ATNTG ATNTT ATNTN ATNNA ATNNC ATNNG ATNNT ATNNN ANAAA ANAAC ANAAG ANAAT ANAAN ANACA ANACC ANACG ANACT ANACN ANAGA ANAGC ANAGG ANAGT ANAGN ANATA ANATC ANATG ANATT ANATN ANANA ANANC ANANG ANANT ANANN ANCAA ANCAC ANCAG ANCAT ANCAN ANCCA ANCCC ANCCG ANCCT ANCCN ANCGA ANCGC ANCGG ANCGT ANCGN ANCTA ANCTC ANCTG ANCTT ANCTN ANCNA ANCNC ANCNG ANCNT ANCNN ANGAA ANGAC ANGAG ANGAT ANGAN ANGCA ANGCC ANGCG ANGCT ANGCN ANGGA ANGGC ANGGG ANGGT ANGGN ANGTA ANGTC ANGTG ANGTT ANGTN ANGNA ANGNC ANGNG ANGNT ANGNN ANTAA ANTAC ANTAG ANTAT ANTAN ANTCA ANTCC ANTCG ANTCT ANTCN ANTGA ANTGC ANTGG ANTGT ANTGN ANTTA ANTTC ANTTG ANTTT ANTTN ANTNA ANTNC ANTNG ANTNT ANTNN ANNAA ANNAC ANNAG ANNAT ANNAN ANNCA ANNCC ANNCG ANNCT ANNCN ANNGA ANNGC ANNGG ANNGT ANNGN ANNTA ANNTC ANNTG ANNTT ANNTN ANNNA ANNNC ANNNG ANNNT ANNNN CAAAA CAAAC CAAAG CAAAT CAAAN CAACA CAACC CAACG CAACT CAACN CAAGA CAAGC CAAGG CAAGT CAAGN CAATA CAATC CAATG CAATT CAATN 
CAANA CAANC CAANG CAANT CAANN CACAA CACAC CACAG CACAT CACAN CACCA CACCC CACCG CACCT CACCN CACGA CACGC CACGG CACGT CACGN CACTA CACTC CACTG CACTT CACTN CACNA CACNC CACNG CACNT CACNN CAGAA CAGAC CAGAG CAGAT CAGAN CAGCA CAGCC CAGCG CAGCT CAGCN CAGGA CAGGC CAGGG CAGGT CAGGN CAGTA CAGTC CAGTG CAGTT CAGTN CAGNA CAGNC CAGNG CAGNT CAGNN CATAA CATAC CATAG CATAT CATAN CATCA CATCC CATCG CATCT CATCN CATGA CATGC CATGG CATGT CATGN CATTA CATTC CATTG CATTT CATTN CATNA CATNC CATNG CATNT CATNN CANAA CANAC CANAG CANAT CANAN CANCA CANCC CANCG CANCT CANCN CANGA CANGC CANGG CANGT CANGN CANTA CANTC CANTG CANTT CANTN CANNA CANNC CANNG CANNT CANNN CCAAA CCAAC CCAAG CCAAT CCAAN CCACA CCACC CCACG CCACT CCACN CCAGA CCAGC CCAGG CCAGT CCAGN CCATA CCATC CCATG CCATT CCATN CCANA CCANC CCANG CCANT CCANN CCCAA CCCAC CCCAG CCCAT CCCAN CCCCA CCCCC CCCCG CCCCT CCCCN CCCGA CCCGC CCCGG CCCGT CCCGN CCCTA CCCTC CCCTG CCCTT CCCTN CCCNA CCCNC CCCNG CCCNT CCCNN CCGAA CCGAC CCGAG CCGAT CCGAN CCGCA CCGCC CCGCG CCGCT CCGCN CCGGA CCGGC CCGGG CCGGT CCGGN CCGTA CCGTC CCGTG CCGTT CCGTN CCGNA CCGNC CCGNG CCGNT CCGNN CCTAA CCTAC CCTAG CCTAT CCTAN CCTCA CCTCC CCTCG CCTCT CCTCN CCTGA CCTGC CCTGG CCTGT CCTGN CCTTA CCTTC CCTTG CCTTT CCTTN CCTNA CCTNC CCTNG CCTNT CCTNN CCNAA CCNAC CCNAG CCNAT CCNAN CCNCA CCNCC CCNCG CCNCT CCNCN CCNGA CCNGC CCNGG CCNGT CCNGN CCNTA CCNTC CCNTG CCNTT CCNTN CCNNA CCNNC CCNNG CCNNT CCNNN CGAAA CGAAC CGAAG CGAAT CGAAN CGACA CGACC CGACG CGACT CGACN CGAGA CGAGC CGAGG CGAGT CGAGN CGATA CGATC CGATG CGATT CGATN CGANA CGANC CGANG CGANT CGANN CGCAA CGCAC CGCAG CGCAT CGCAN CGCCA CGCCC CGCCG CGCCT CGCCN CGCGA CGCGC CGCGG CGCGT CGCGN CGCTA CGCTC CGCTG CGCTT CGCTN CGCNA CGCNC CGCNG CGCNT CGCNN CGGAA CGGAC CGGAG CGGAT CGGAN CGGCA CGGCC CGGCG CGGCT CGGCN CGGGA CGGGC CGGGG CGGGT CGGGN CGGTA CGGTC CGGTG CGGTT CGGTN CGGNA CGGNC CGGNG CGGNT CGGNN CGTAA CGTAC CGTAG CGTAT CGTAN CGTCA CGTCC CGTCG CGTCT CGTCN CGTGA CGTGC CGTGG CGTGT CGTGN CGTTA CGTTC CGTTG CGTTT CGTTN CGTNA CGTNC CGTNG CGTNT CGTNN CGNAA CGNAC CGNAG 
CGNAT CGNAN CGNCA CGNCC CGNCG CGNCT CGNCN CGNGA CGNGC CGNGG CGNGT CGNGN CGNTA CGNTC CGNTG CGNTT CGNTN CGNNA CGNNC CGNNG CGNNT CGNNN CTAAA CTAAC CTAAG CTAAT CTAAN CTACA CTACC CTACG CTACT CTACN CTAGA CTAGC CTAGG CTAGT CTAGN CTATA CTATC CTATG CTATT CTATN CTANA CTANC CTANG CTANT CTANN CTCAA CTCAC CTCAG CTCAT CTCAN CTCCA CTCCC CTCCG CTCCT CTCCN CTCGA CTCGC CTCGG CTCGT CTCGN CTCTA CTCTC CTCTG CTCTT CTCTN CTCNA CTCNC CTCNG CTCNT CTCNN CTGAA CTGAC CTGAG CTGAT CTGAN CTGCA CTGCC CTGCG CTGCT CTGCN CTGGA CTGGC CTGGG CTGGT CTGGN CTGTA CTGTC CTGTG CTGTT CTGTN CTGNA CTGNC CTGNG CTGNT CTGNN CTTAA CTTAC CTTAG CTTAT CTTAN CTTCA CTTCC CTTCG CTTCT CTTCN CTTGA CTTGC CTTGG CTTGT CTTGN CTTTA CTTTC CTTTG CTTTT CTTTN CTTNA CTTNC CTTNG CTTNT CTTNN CTNAA CTNAC CTNAG CTNAT CTNAN CTNCA CTNCC CTNCG CTNCT CTNCN CTNGA CTNGC CTNGG CTNGT CTNGN CTNTA CTNTC CTNTG CTNTT CTNTN CTNNA CTNNC CTNNG CTNNT CTNNN CNAAA CNAAC CNAAG CNAAT CNAAN CNACA CNACC CNACG CNACT CNACN CNAGA CNAGC CNAGG CNAGT CNAGN CNATA CNATC CNATG CNATT CNATN CNANA CNANC CNANG CNANT CNANN CNCAA CNCAC CNCAG CNCAT CNCAN CNCCA CNCCC CNCCG CNCCT CNCCN CNCGA CNCGC CNCGG CNCGT CNCGN CNCTA CNCTC CNCTG CNCTT CNCTN CNCNA CNCNC CNCNG CNCNT CNCNN CNGAA CNGAC CNGAG CNGAT CNGAN CNGCA CNGCC CNGCG CNGCT CNGCN CNGGA CNGGC CNGGG CNGGT CNGGN CNGTA CNGTC CNGTG CNGTT CNGTN CNGNA CNGNC CNGNG CNGNT CNGNN CNTAA CNTAC CNTAG CNTAT CNTAN CNTCA CNTCC CNTCG CNTCT CNTCN CNTGA CNTGC CNTGG CNTGT CNTGN CNTTA CNTTC CNTTG CNTTT CNTTN CNTNA CNTNC CNTNG CNTNT CNTNN CNNAA CNNAC CNNAG CNNAT CNNAN CNNCA CNNCC CNNCG CNNCT CNNCN CNNGA CNNGC CNNGG CNNGT CNNGN CNNTA CNNTC CNNTG CNNTT CNNTN CNNNA CNNNC CNNNG CNNNT CNNNN GAAAA GAAAC GAAAG GAAAT GAAAN GAACA GAACC GAACG GAACT GAACN GAAGA GAAGC GAAGG GAAGT GAAGN GAATA GAATC GAATG GAATT GAATN GAANA GAANC GAANG GAANT GAANN GACAA GACAC GACAG GACAT GACAN GACCA GACCC GACCG GACCT GACCN GACGA GACGC GACGG GACGT GACGN GACTA GACTC GACTG GACTT GACTN GACNA GACNC GACNG GACNT GACNN GAGAA GAGAC GAGAG GAGAT GAGAN GAGCA GAGCC GAGCG GAGCT GAGCN GAGGA 
GAGGC GAGGG GAGGT GAGGN GAGTA GAGTC GAGTG GAGTT GAGTN GAGNA GAGNC GAGNG GAGNT GAGNN GATAA GATAC GATAG GATAT GATAN GATCA GATCC GATCG GATCT GATCN GATGA GATGC GATGG GATGT GATGN GATTA GATTC GATTG GATTT GATTN GATNA GATNC GATNG GATNT GATNN GANAA GANAC GANAG GANAT GANAN GANCA GANCC GANCG GANCT GANCN GANGA GANGC GANGG GANGT GANGN GANTA GANTC GANTG GANTT GANTN GANNA GANNC GANNG GANNT GANNN GCAAA GCAAC GCAAG GCAAT GCAAN GCACA GCACC GCACG GCACT GCACN GCAGA GCAGC GCAGG GCAGT GCAGN GCATA GCATC GCATG GCATT GCATN GCANA GCANC GCANG GCANT GCANN GCCAA GCCAC GCCAG GCCAT GCCAN GCCCA GCCCC GCCCG GCCCT GCCCN GCCGA GCCGC GCCGG GCCGT GCCGN GCCTA GCCTC GCCTG GCCTT GCCTN GCCNA GCCNC GCCNG GCCNT GCCNN GCGAA GCGAC GCGAG GCGAT GCGAN GCGCA GCGCC GCGCG GCGCT GCGCN GCGGA GCGGC GCGGG GCGGT GCGGN GCGTA GCGTC GCGTG GCGTT GCGTN GCGNA GCGNC GCGNG GCGNT GCGNN GCTAA GCTAC GCTAG GCTAT GCTAN GCTCA GCTCC GCTCG GCTCT GCTCN GCTGA GCTGC GCTGG GCTGT GCTGN GCTTA GCTTC GCTTG GCTTT GCTTN GCTNA GCTNC GCTNG GCTNT GCTNN GCNAA GCNAC GCNAG GCNAT GCNAN GCNCA GCNCC GCNCG GCNCT GCNCN GCNGA GCNGC GCNGG GCNGT GCNGN GCNTA GCNTC GCNTG GCNTT GCNTN GCNNA GCNNC GCNNG GCNNT GCNNN GGAAA GGAAC GGAAG GGAAT GGAAN GGACA GGACC GGACG GGACT GGACN GGAGA GGAGC GGAGG GGAGT GGAGN GGATA GGATC GGATG GGATT GGATN GGANA GGANC GGANG GGANT GGANN GGCAA GGCAC GGCAG GGCAT GGCAN GGCCA GGCCC GGCCG GGCCT GGCCN GGCGA GGCGC GGCGG GGCGT GGCGN GGCTA GGCTC GGCTG GGCTT GGCTN GGCNA GGCNC GGCNG GGCNT GGCNN GGGAA GGGAC GGGAG GGGAT GGGAN GGGCA GGGCC GGGCG GGGCT GGGCN GGGGA GGGGC GGGGG GGGGT GGGGN GGGTA GGGTC GGGTG GGGTT GGGTN GGGNA GGGNC GGGNG GGGNT GGGNN GGTAA GGTAC GGTAG GGTAT GGTAN GGTCA GGTCC GGTCG GGTCT GGTCN GGTGA GGTGC GGTGG GGTGT GGTGN GGTTA GGTTC GGTTG GGTTT GGTTN GGTNA GGTNC GGTNG GGTNT GGTNN GGNAA GGNAC GGNAG GGNAT GGNAN GGNCA GGNCC GGNCG GGNCT GGNCN GGNGA GGNGC GGNGG GGNGT GGNGN GGNTA GGNTC GGNTG GGNTT GGNTN GGNNA GGNNC GGNNG GGNNT GGNNN GTAAA GTAAC GTAAG GTAAT GTAAN GTACA GTACC GTACG GTACT GTACN GTAGA GTAGC GTAGG GTAGT GTAGN GTATA GTATC GTATG GTATT 
GTATN GTANA GTANC GTANG GTANT GTANN GTCAA GTCAC GTCAG GTCAT GTCAN GTCCA GTCCC GTCCG GTCCT GTCCN GTCGA GTCGC GTCGG GTCGT GTCGN GTCTA GTCTC GTCTG GTCTT GTCTN GTCNA GTCNC GTCNG GTCNT GTCNN GTGAA GTGAC GTGAG GTGAT GTGAN GTGCA GTGCC GTGCG GTGCT GTGCN GTGGA GTGGC GTGGG GTGGT GTGGN GTGTA GTGTC GTGTG GTGTT GTGTN GTGNA GTGNC GTGNG GTGNT GTGNN GTTAA GTTAC GTTAG GTTAT GTTAN GTTCA GTTCC GTTCG GTTCT GTTCN GTTGA GTTGC GTTGG GTTGT GTTGN GTTTA GTTTC GTTTG GTTTT GTTTN GTTNA GTTNC GTTNG GTTNT GTTNN GTNAA GTNAC GTNAG GTNAT GTNAN GTNCA GTNCC GTNCG GTNCT GTNCN GTNGA GTNGC GTNGG GTNGT GTNGN GTNTA GTNTC GTNTG GTNTT GTNTN GTNNA GTNNC GTNNG GTNNT GTNNN GNAAA GNAAC GNAAG GNAAT GNAAN GNACA GNACC GNACG GNACT GNACN GNAGA GNAGC GNAGG GNAGT GNAGN GNATA GNATC GNATG GNATT GNATN GNANA GNANC GNANG GNANT GNANN GNCAA GNCAC GNCAG GNCAT GNCAN GNCCA GNCCC GNCCG GNCCT GNCCN GNCGA GNCGC GNCGG GNCGT GNCGN GNCTA GNCTC GNCTG GNCTT GNCTN GNCNA GNCNC GNCNG GNCNT GNCNN GNGAA GNGAC GNGAG GNGAT GNGAN GNGCA GNGCC GNGCG GNGCT GNGCN GNGGA GNGGC GNGGG GNGGT GNGGN GNGTA GNGTC GNGTG GNGTT GNGTN GNGNA GNGNC GNGNG GNGNT GNGNN GNTAA GNTAC GNTAG GNTAT GNTAN GNTCA GNTCC GNTCG GNTCT GNTCN GNTGA GNTGC GNTGG GNTGT GNTGN GNTTA GNTTC GNTTG GNTTT GNTTN GNTNA GNTNC GNTNG GNTNT GNTNN GNNAA GNNAC GNNAG GNNAT GNNAN GNNCA GNNCC GNNCG GNNCT GNNCN GNNGA GNNGC GNNGG GNNGT GNNGN GNNTA GNNTC GNNTG GNNTT GNNTN GNNNA GNNNC GNNNG GNNNT GNNNN TAAAA TAAAC TAAAG TAAAT TAAAN TAACA TAACC TAACG TAACT TAACN TAAGA TAAGC TAAGG TAAGT TAAGN TAATA TAATC TAATG TAATT TAATN TAANA TAANC TAANG TAANT TAANN TACAA TACAC TACAG TACAT TACAN TACCA TACCC TACCG TACCT TACCN TACGA TACGC TACGG TACGT TACGN TACTA TACTC TACTG TACTT TACTN TACNA TACNC TACNG TACNT TACNN TAGAA TAGAC TAGAG TAGAT TAGAN TAGCA TAGCC TAGCG TAGCT TAGCN TAGGA TAGGC TAGGG TAGGT TAGGN TAGTA TAGTC TAGTG TAGTT TAGTN TAGNA TAGNC TAGNG TAGNT TAGNN TATAA TATAC TATAG TATAT TATAN TATCA TATCC TATCG TATCT TATCN TATGA TATGC TATGG TATGT TATGN TATTA TATTC TATTG TATTT TATTN TATNA TATNC TATNG TATNT TATNN TANAA TANAC 
TANAG TANAT TANAN TANCA TANCC TANCG TANCT TANCN TANGA TANGC TANGG TANGT TANGN TANTA TANTC TANTG TANTT TANTN TANNA TANNC TANNG TANNT TANNN TCAAA TCAAC TCAAG TCAAT TCAAN TCACA TCACC TCACG TCACT TCACN TCAGA TCAGC TCAGG TCAGT TCAGN TCATA TCATC TCATG TCATT TCATN TCANA TCANC TCANG TCANT TCANN TCCAA TCCAC TCCAG TCCAT TCCAN TCCCA TCCCC TCCCG TCCCT TCCCN TCCGA TCCGC TCCGG TCCGT TCCGN TCCTA TCCTC TCCTG TCCTT TCCTN TCCNA TCCNC TCCNG TCCNT TCCNN TCGAA TCGAC TCGAG TCGAT TCGAN TCGCA TCGCC TCGCG TCGCT TCGCN TCGGA TCGGC TCGGG TCGGT TCGGN TCGTA TCGTC TCGTG TCGTT TCGTN TCGNA TCGNC TCGNG TCGNT TCGNN TCTAA TCTAC TCTAG TCTAT TCTAN TCTCA TCTCC TCTCG TCTCT TCTCN TCTGA TCTGC TCTGG TCTGT TCTGN TCTTA TCTTC TCTTG TCTTT TCTTN TCTNA TCTNC TCTNG TCTNT TCTNN TCNAA TCNAC TCNAG TCNAT TCNAN TCNCA TCNCC TCNCG TCNCT TCNCN TCNGA TCNGC TCNGG TCNGT TCNGN TCNTA TCNTC TCNTG TCNTT TCNTN TCNNA TCNNC TCNNG TCNNT TCNNN TGAAA TGAAC TGAAG TGAAT TGAAN TGACA TGACC TGACG TGACT TGACN TGAGA TGAGC TGAGG TGAGT TGAGN TGATA TGATC TGATG TGATT TGATN TGANA TGANC TGANG TGANT TGANN TGCAA TGCAC TGCAG TGCAT TGCAN TGCCA TGCCC TGCCG TGCCT TGCCN TGCGA TGCGC TGCGG TGCGT TGCGN TGCTA TGCTC TGCTG TGCTT TGCTN TGCNA TGCNC TGCNG TGCNT TGCNN TGGAA TGGAC TGGAG TGGAT TGGAN TGGCA TGGCC TGGCG TGGCT TGGCN TGGGA TGGGC TGGGG TGGGT TGGGN TGGTA TGGTC TGGTG TGGTT TGGTN TGGNA TGGNC TGGNG TGGNT TGGNN TGTAA TGTAC TGTAG TGTAT TGTAN TGTCA TGTCC TGTCG TGTCT TGTCN TGTGA TGTGC TGTGG TGTGT TGTGN TGTTA TGTTC TGTTG TGTTT TGTTN TGTNA TGTNC TGTNG TGTNT TGTNN TGNAA TGNAC TGNAG TGNAT TGNAN TGNCA TGNCC TGNCG TGNCT TGNCN TGNGA TGNGC TGNGG TGNGT TGNGN TGNTA TGNTC TGNTG TGNTT TGNTN TGNNA TGNNC TGNNG TGNNT TGNNN TTAAA TTAAC TTAAG TTAAT TTAAN TTACA TTACC TTACG TTACT TTACN TTAGA TTAGC TTAGG TTAGT TTAGN TTATA TTATC TTATG TTATT TTATN TTANA TTANC TTANG TTANT TTANN TTCAA TTCAC TTCAG TTCAT TTCAN TTCCA TTCCC TTCCG TTCCT TTCCN TTCGA TTCGC TTCGG TTCGT TTCGN TTCTA TTCTC TTCTG TTCTT TTCTN TTCNA TTCNC TTCNG TTCNT TTCNN TTGAA TTGAC TTGAG TTGAT TTGAN TTGCA TTGCC TTGCG TTGCT TTGCN 
TTGGA TTGGC TTGGG TTGGT TTGGN TTGTA TTGTC TTGTG TTGTT TTGTN TTGNA TTGNC TTGNG TTGNT TTGNN TTTAA TTTAC TTTAG TTTAT TTTAN TTTCA TTTCC TTTCG TTTCT TTTCN TTTGA TTTGC TTTGG TTTGT TTTGN TTTTA TTTTC TTTTG TTTTT TTTTN TTTNA TTTNC TTTNG TTTNT TTTNN TTNAA TTNAC TTNAG TTNAT TTNAN TTNCA TTNCC TTNCG TTNCT TTNCN TTNGA TTNGC TTNGG TTNGT TTNGN TTNTA TTNTC TTNTG TTNTT TTNTN TTNNA TTNNC TTNNG TTNNT TTNNN TNAAA TNAAC TNAAG TNAAT TNAAN TNACA TNACC TNACG TNACT TNACN TNAGA TNAGC TNAGG TNAGT TNAGN TNATA TNATC TNATG TNATT TNATN TNANA TNANC TNANG TNANT TNANN TNCAA TNCAC TNCAG TNCAT TNCAN TNCCA TNCCC TNCCG TNCCT TNCCN TNCGA TNCGC TNCGG TNCGT TNCGN TNCTA TNCTC TNCTG TNCTT TNCTN TNCNA TNCNC TNCNG TNCNT TNCNN TNGAA TNGAC TNGAG TNGAT TNGAN TNGCA TNGCC TNGCG TNGCT TNGCN TNGGA TNGGC TNGGG TNGGT TNGGN TNGTA TNGTC TNGTG TNGTT TNGTN TNGNA TNGNC TNGNG TNGNT TNGNN TNTAA TNTAC TNTAG TNTAT TNTAN TNTCA TNTCC TNTCG TNTCT TNTCN TNTGA TNTGC TNTGG TNTGT TNTGN TNTTA TNTTC TNTTG TNTTT TNTTN TNTNA TNTNC TNTNG TNTNT TNTNN TNNAA TNNAC TNNAG TNNAT TNNAN TNNCA TNNCC TNNCG TNNCT TNNCN TNNGA TNNGC TNNGG TNNGT TNNGN TNNTA TNNTC TNNTG TNNTT TNNTN TNNNA TNNNC TNNNG TNNNT TNNNN NAAAA NAAAC NAAAG NAAAT NAAAN NAACA NAACC NAACG NAACT NAACN NAAGA NAAGC NAAGG NAAGT NAAGN NAATA NAATC NAATG NAATT NAATN NAANA NAANC NAANG NAANT NAANN NACAA NACAC NACAG NACAT NACAN NACCA NACCC NACCG NACCT NACCN NACGA NACGC NACGG NACGT NACGN NACTA NACTC NACTG NACTT NACTN NACNA NACNC NACNG NACNT NACNN NAGAA NAGAC NAGAG NAGAT NAGAN NAGCA NAGCC NAGCG NAGCT NAGCN NAGGA NAGGC NAGGG NAGGT NAGGN NAGTA NAGTC NAGTG NAGTT NAGTN NAGNA NAGNC NAGNG NAGNT NAGNN NATAA NATAC NATAG NATAT NATAN NATCA NATCC NATCG NATCT NATCN NATGA NATGC NATGG NATGT NATGN NATTA NATTC NATTG NATTT NATTN NATNA NATNC NATNG NATNT NATNN NANAA NANAC NANAG NANAT NANAN NANCA NANCC NANCG NANCT NANCN NANGA NANGC NANGG NANGT NANGN NANTA NANTC NANTG NANTT NANTN NANNA NANNC NANNG NANNT NANNN NCAAA NCAAC NCAAG NCAAT NCAAN NCACA NCACC NCACG NCACT NCACN NCAGA NCAGC NCAGG NCAGT NCAGN NCATA NCATC NCATG 
NCATT NCATN NCANA NCANC NCANG NCANT NCANN NCCAA NCCAC NCCAG NCCAT NCCAN NCCCA NCCCC NCCCG NCCCT NCCCN NCCGA NCCGC NCCGG NCCGT NCCGN NCCTA NCCTC NCCTG NCCTT NCCTN NCCNA NCCNC NCCNG NCCNT NCCNN NCGAA NCGAC NCGAG NCGAT NCGAN NCGCA NCGCC NCGCG NCGCT NCGCN NCGGA NCGGC NCGGG NCGGT NCGGN NCGTA NCGTC NCGTG NCGTT NCGTN NCGNA NCGNC NCGNG NCGNT NCGNN NCTAA NCTAC NCTAG NCTAT NCTAN NCTCA NCTCC NCTCG NCTCT NCTCN NCTGA NCTGC NCTGG NCTGT NCTGN NCTTA NCTTC NCTTG NCTTT NCTTN NCTNA NCTNC NCTNG NCTNT NCTNN NCNAA NCNAC NCNAG NCNAT NCNAN NCNCA NCNCC NCNCG NCNCT NCNCN NCNGA NCNGC NCNGG NCNGT NCNGN NCNTA NCNTC NCNTG NCNTT NCNTN NCNNA NCNNC NCNNG NCNNT NCNNN NGAAA NGAAC NGAAG NGAAT NGAAN NGACA NGACC NGACG NGACT NGACN NGAGA NGAGC NGAGG NGAGT NGAGN NGATA NGATC NGATG NGATT NGATN NGANA NGANC NGANG NGANT NGANN NGCAA NGCAC NGCAG NGCAT NGCAN NGCCA NGCCC NGCCG NGCCT NGCCN NGCGA NGCGC NGCGG NGCGT NGCGN NGCTA NGCTC NGCTG NGCTT NGCTN NGCNA NGCNC NGCNG NGCNT NGCNN NGGAA NGGAC NGGAG NGGAT NGGAN NGGCA NGGCC NGGCG NGGCT NGGCN NGGGA NGGGC NGGGG NGGGT NGGGN NGGTA NGGTC NGGTG NGGTT NGGTN NGGNA NGGNC NGGNG NGGNT NGGNN NGTAA NGTAC NGTAG NGTAT NGTAN NGTCA NGTCC NGTCG NGTCT NGTCN NGTGA NGTGC NGTGG NGTGT NGTGN NGTTA NGTTC NGTTG NGTTT NGTTN NGTNA NGTNC NGTNG NGTNT NGTNN NGNAA NGNAC NGNAG NGNAT NGNAN NGNCA NGNCC NGNCG NGNCT NGNCN NGNGA NGNGC NGNGG NGNGT NGNGN NGNTA NGNTC NGNTG NGNTT NGNTN NGNNA NGNNC NGNNG NGNNT NGNNN NTAAA NTAAC NTAAG NTAAT NTAAN NTACA NTACC NTACG NTACT NTACN NTAGA NTAGC NTAGG NTAGT NTAGN NTATA NTATC NTATG NTATT NTATN NTANA NTANC NTANG NTANT NTANN NTCAA NTCAC NTCAG NTCAT NTCAN NTCCA NTCCC NTCCG NTCCT NTCCN NTCGA NTCGC NTCGG NTCGT NTCGN NTCTA NTCTC NTCTG NTCTT NTCTN NTCNA NTCNC NTCNG NTCNT NTCNN NTGAA NTGAC NTGAG NTGAT NTGAN NTGCA NTGCC NTGCG NTGCT NTGCN NTGGA NTGGC NTGGG NTGGT NTGGN NTGTA NTGTC NTGTG NTGTT NTGTN NTGNA NTGNC NTGNG NTGNT NTGNN NTTAA NTTAC NTTAG NTTAT NTTAN NTTCA NTTCC NTTCG NTTCT NTTCN NTTGA NTTGC NTTGG NTTGT NTTGN NTTTA NTTTC NTTTG NTTTT NTTTN NTTNA NTTNC NTTNG NTTNT NTTNN NTNAA 
NTNAC NTNAG NTNAT NTNAN NTNCA NTNCC NTNCG NTNCT NTNCN NTNGA NTNGC NTNGG NTNGT NTNGN NTNTA NTNTC NTNTG NTNTT NTNTN NTNNA NTNNC NTNNG NTNNT NTNNN NNAAA NNAAC NNAAG NNAAT NNAAN NNACA NNACC NNACG NNACT NNACN NNAGA NNAGC NNAGG NNAGT NNAGN NNATA NNATC NNATG NNATT NNATN NNANA NNANC NNANG NNANT NNANN NNCAA NNCAC NNCAG NNCAT NNCAN NNCCA NNCCC NNCCG NNCCT NNCCN NNCGA NNCGC NNCGG NNCGT NNCGN NNCTA NNCTC NNCTG NNCTT NNCTN NNCNA NNCNC NNCNG NNCNT NNCNN NNGAA NNGAC NNGAG NNGAT NNGAN NNGCA NNGCC NNGCG NNGCT NNGCN NNGGA NNGGC NNGGG NNGGT NNGGN NNGTA NNGTC NNGTG NNGTT NNGTN NNGNA NNGNC NNGNG NNGNT NNGNN NNTAA NNTAC NNTAG NNTAT NNTAN NNTCA NNTCC NNTCG NNTCT NNTCN NNTGA NNTGC NNTGG NNTGT NNTGN NNTTA NNTTC NNTTG NNTTT NNTTN NNTNA NNTNC NNTNG NNTNT NNTNN NNNAA NNNAC NNNAG NNNAT NNNAN NNNCA NNNCC NNNCG NNNCT NNNCN NNNGA NNNGC NNNGG NNNGT NNNGN NNNTA NNNTC NNNTG NNNTT NNNTN NNNNA NNNNC NNNNG NNNNT NNNNN A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.93 0.99 0.99 0.95 0.89 0.9 0.95 0.92 0.91 1 1.1 0.89 0.9 0.95 1 0.96 0.93 0.95 0.96 0.95 0.95 0.94 0.93 0.94 1.03 1.11 1.05 0.95 1.02 0.78 1.1 0.86 0.79 0.84 1.01 1.16 1.16 0.9 1.01 0.76 1.14 0.95 0.84 0.86 0.84 1.12 0.96 0.86 0.91 0.94 0.94 0.91 0.94 0.93 1.1 1.08 1.09 1.09 1.08 1.08 0.98 0.98 0.98 1 1.07 1.13 0.96 1.04 1.03 1.04 1.01 0.97 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.11 1.08 1.08 1.15 1.1 1.05 1.04 1.05 1.12 1.06 1.24 1.17 1.16 1.19 1.18 1.06 1.1 1.03 1.03 1.05 1.09 1.08 1.06 1.1 1.08 1.01 1.13 1.01 0.8 0.92 0.75 0.96 0.93 0.73 0.8 0.92 1.14 1.11 1.02 1.02 0.71 1 0.91 0.85 0.82 0.8 1.04 0.96 0.81 0.87 0.95 0.98 0.99 1 0.98 1.19 1.13 1.22 1.17 1.16 1.18 1.02 1.16 1.13 1.1 1.13 1.14 1.11 1.13 1.12 1.07 1.04 1.09 1.09 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.7 0.7 0.69 0.68 0.69 0.93 0.83 0.87 1.01 0.89 0.86 1.03 0.89 0.86 0.89 0.91 0.99 0.91 0.89 0.92 0.82 0.84 0.81 0.82 0.82 0.96 1.05 1.13 
0.93 0.99 1.03 1.07 1.03 0.74 0.88 0.92 1.07 1.11 0.91 0.97 0.77 1.22 0.89 0.82 0.85 0.87 1.09 0.99 0.82 0.9 1.03 1 0.91 0.95 0.96 1.11 1.1 1.13 1.18 1.12 1.05 1.08 1.01 1.03 1.04 1.16 1 1.06 1.03 1.05 1.07 1.03 1.01 1.03 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.05 1.02 1.11 1.06 1.11 1.05 1.06 0.9 1 1.04 1.18 1.17 1.13 1.11 1.04 1.11 1.02 1.03 1.04 1.04 1.09 1.04 1 1.04 0.96 1.13 1.01 0.85 0.95 0.75 0.97 0.88 0.8 0.82 0.91 1.11 1.18 0.94 0.99 0.79 1.05 0.92 0.86 0.87 0.82 1.05 0.95 0.85 0.89 0.98 0.97 0.92 0.94 0.95 1.21 1.21 1.21 1.21 1.21 1.14 1.08 1.09 1.06 1.09 1.2 1.15 1.16 1.22 1.18 1.1 1.07 1.06 1.07 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.88 0.88 0.89 0.89 0.88 0.96 0.91 0.95 0.96 0.94 0.99 1.11 0.97 0.95 0.99 0.99 1.03 0.96 0.96 0.98 0.95 0.96 0.93 0.93 0.94 0.98 1.1 1.04 0.86 0.96 0.79 1.01 0.91 0.76 0.83 0.93 1.12 1.14 0.93 1 0.75 1.07 0.91 0.84 0.85 0.83 1.07 0.96 0.83 0.89 0.97 0.96 0.93 0.95 0.95 1.15 1.11 1.15 1.15 1.14 1.11 1.03 1.04 1.04 1.05 1.11 1.09 1.05 1.09 1.08 1.07 1.03 1.02 1.04 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.22 1.05 1.1 1.17 1.12 1.13 1.11 1.12 1.12 1.1 1.11 1.08 1.13 1.15 1.09 1.1 1.22 1.05 1.15 1.12 1.14 1.09 1.06 1.14 1.1 1.03 1.19 1.02 1.02 1.04 0.81 1.07 0.95 0.79 0.86 0.93 1.17 1.18 0.93 1 0.74 0.96 0.93 0.86 0.84 0.82 1.04 0.98 0.88 0.9 0.89 0.95 0.98 0.95 0.94 1.16 1.13 1.18 1.07 1.13 0.96 0.99 1.11 0.96 0.99 1.02 1.21 0.96 1.02 1.01 0.99 1.04 1.02 0.99 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.22 1.18 1.21 1.21 1.2 1.17 1.22 1.17 1.16 1.17 1.19 1.32 1.18 1.28 1.23 1.31 1.17 1.2 1.29 1.22 1.22 1.17 1.17 1.23 1.2 1.01 1.1 1.01 0.92 0.99 0.74 1.09 0.9 0.78 0.82 0.92 1.15 1.23 0.94 1 0.74 1.1 0.91 0.84 0.85 0.81 1.1 0.96 0.86 0.89 0.96 0.93 0.96 0.97 0.95 1.2 
1.18 1.23 1.19 1.2 1.09 1.19 1.12 1.02 1.07 1.13 1.27 1.19 1.07 1.15 1.06 1.09 1.1 1.03 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.85 0.86 0.89 1.16 0.91 1.16 0.98 1 1.13 1.05 1.01 1.01 1.03 0.99 1.01 1.01 1.12 0.99 1.17 1.05 0.97 0.97 0.95 1.1 0.99 0.95 1.06 0.99 0.84 0.91 0.88 0.92 0.92 0.79 0.86 0.94 1.07 1.1 0.93 0.99 0.73 1.1 0.89 0.84 0.84 0.83 1.01 0.95 0.83 0.88 0.89 0.98 0.91 0.97 0.93 1.14 1.12 1.13 1.14 1.13 1.03 1.02 0.99 0.95 0.99 1.01 1.03 1.01 1.06 1.03 1 1.02 0.99 1.01 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.21 1.18 1.18 1.16 1.17 1.3 1.12 1.14 1.14 1.16 1.19 1.18 1.22 1.19 1.19 1.24 1.27 1.19 1.24 1.23 1.23 1.17 1.18 1.18 1.19 0.96 1.2 1 0.85 0.95 0.86 1.1 0.87 0.78 0.86 0.99 1.11 1.09 1.03 1.05 0.85 1.14 0.93 0.9 0.92 0.9 1.13 0.94 0.86 0.92 0.95 0.86 0.97 0.95 0.93 1.21 1.21 1.21 1.21 1.21 1.16 1.08 1.19 1.22 1.14 1.18 1.19 1.18 1.17 1.18 1.08 1.03 1.11 1.09 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.02 1.05 1.16 1.06 1.17 1.05 1.08 1.12 1.1 1.1 1.1 1.08 1.1 1.09 1.11 1.19 1.08 1.2 1.13 1.1 1.08 1.06 1.14 1.09 0.98 1.12 1 0.89 0.97 0.81 1.02 0.91 0.78 0.85 0.94 1.12 1.14 0.95 1.01 0.76 1.05 0.91 0.85 0.86 0.83 1.06 0.96 0.85 0.9 0.92 0.92 0.95 0.96 0.94 1.17 1.16 1.18 1.15 1.16 1.03 1.05 1.08 1.01 1.04 1.07 1.14 1.04 1.06 1.07 1.03 1.04 1.05 1.02 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.18 0.93 1.21 1.01 1.05 1.02 1.06 1.11 1.08 1.06 1.2 1.02 1.21 1.09 1.11 1.1 1.16 1.29 1.17 1.15 1.07 1.02 1.19 1.08 1.08 0.88 1.11 1.02 0.83 0.93 0.93 0.92 0.93 0.71 0.83 0.82 1.17 1.17 0.89 0.93 0.79 0.98 0.91 0.85 0.85 0.83 1.01 0.95 0.8 0.88 0.96 0.81 0.92 0.92 0.89 1.2 1.16 1.23 1.12 1.16 0.99 0.91 0.97 0.81 0.9 0.97 1.11 1.17 0.98 1.04 1.01 0.95 1.03 0.92 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.25 1.13 1.16 1.2 1.16 1.2 1.19 1.21 1.14 1.18 1.25 1.25 1.38 1.2 1.26 1.22 1.26 1.21 1.18 1.21 1.2 1.19 1.21 1.17 1.19 0.99 1.14 1.11 0.95 1.02 0.84 0.99 0.85 0.96 0.89 1.03 1.05 1.15 0.96 1.03 0.8 1 0.94 0.86 0.88 0.88 1.04 0.97 0.91 0.94 1.01 0.96 1.05 0.95 0.99 1.24 1.23 1.19 1.25 1.22 1.17 1.02 1.13 1.07 1.09 1.04 1.14 1.19 1.03 1.07 1.09 1.04 1.12 1.05 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.22 0.75 1.15 1.11 0.96 1.18 1.31 1.19 1.19 1.21 1.1 1.2 1.17 1.21 1.16 1.11 1.23 1.24 1.19 1.18 1.13 1 1.18 1.16 1.1 0.92 1.05 0.9 0.77 0.87 0.73 1.08 0.91 0.82 0.78 0.81 1.25 1.08 0.96 0.95 0.71 1.08 0.92 0.76 0.79 0.73 1.1 0.93 0.8 0.82 0.86 0.8 0.88 0.88 0.85 1.15 1.18 1.15 1.04 1.12 0.99 1.02 0.99 1.02 1 1.02 1 1.16 0.97 1.01 0.98 0.95 1.01 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.16 1.18 1.21 1.09 1.14 1.21 1.12 1.26 1.17 1.17 1.32 1.17 1.16 1.2 1.2 1.08 1.25 1.23 1.22 1.13 1.09 1.16 1.19 1.16 1.14 1.06 1.11 1.01 0.88 0.99 0.97 1 1.06 0.93 0.98 0.95 1.13 1.08 0.95 1.01 0.77 1.07 0.97 0.87 0.87 0.85 1.07 1.01 0.89 0.94 0.94 0.99 0.94 1.02 0.97 1.23 1.2 1.2 1.2 1.2 0.97 1.11 1.12 1.09 1.05 1.12 1.17 1.19 1.18 1.16 1.03 1.09 1.07 1.1 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.2 0.92 1.17 1.08 1.06 1.12 1.14 1.17 1.13 1.13 1.18 1.12 1.19 1.16 1.16 1.1 1.21 1.24 1.18 1.16 1.11 1.07 1.19 1.13 1.12 0.95 1.09 0.99 0.85 0.94 0.77 0.98 0.92 0.81 0.84 0.88 1.13 1.11 0.93 0.97 0.76 1.02 0.93 0.83 0.84 0.81 1.05 0.96 0.85 0.89 0.93 0.87 0.94 0.93 0.92 1.2 1.19 1.18 1.13 1.17 1 1 1.04 0.94 0.99 1.02 1.09 1.16 1.02 1.06 1.02 1 1.05 0.99 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.05 1.02 0.96 1.01 0.94 1.06 0.93 0.95 0.95 1.09 1.06 0.91 0.94 0.98 1.03 0.96 
0.85 0.93 0.93 1 1.01 0.92 0.94 0.96 1.15 1.1 1.01 0.81 0.96 0.79 0.94 0.96 0.8 0.85 0.96 1.13 1.11 1.05 1.04 0.78 1.07 0.96 0.82 0.86 0.87 1.03 0.99 0.85 0.91 0.96 0.9 1.02 0.94 0.95 1.09 1.2 1.08 1.17 1.13 1.08 1.16 0.94 1.11 1.04 1.13 1.19 1.04 1.02 1.08 1.04 1.07 1.01 1.04 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.01 1.2 1.12 1.1 1.05 1.26 1 1.1 1.08 1.11 1.27 1.19 1.09 1.15 1.03 1.12 1.09 1.02 1.02 1.03 1.14 1.1 1.07 1.07 1.08 1.12 1.13 0.9 1 1.1 0.99 0.9 0.78 0.89 1.12 1.16 1.34 1.13 1.15 0.76 1.06 0.92 0.85 0.85 0.91 1.07 0.98 0.88 0.93 1.01 0.88 1.06 0.96 0.96 1.19 1.14 1.23 1.19 1.18 1.08 1.13 1.1 1.11 1.09 1.07 1.11 1.14 1.17 1.11 1.04 1.02 1.11 1.08 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.73 0.87 1.06 0.97 0.84 1.05 0.97 0.99 1 1 1.23 1.18 0.92 0.94 1 1.11 0.99 1.07 1.04 1.04 0.94 0.95 0.95 0.95 0.95 0.96 1.09 0.98 0.84 0.94 0.84 1.08 0.92 0.79 0.87 0.92 1.08 1.1 0.92 0.98 0.75 1.01 0.9 0.86 0.85 0.83 1.06 0.95 0.84 0.89 0.96 0.88 0.9 0.9 0.91 1.13 1.12 1.13 1.14 1.13 1.02 1.02 1.03 1.08 1.03 0.99 1.02 1.08 1.05 1.03 1.01 0.99 1.02 1.02 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.8 1.06 1.01 1.06 0.93 0.94 0.93 0.97 0.94 0.94 1.11 0.97 0.98 0.98 1 0.99 1.08 1.03 1.04 1.03 0.92 0.99 0.99 0.98 0.97 1.08 1.12 0.96 1.11 1.05 0.76 1 0.87 0.78 0.82 1.07 1.12 1.16 1.01 1.08 0.78 1.07 0.94 0.87 0.87 0.84 1.07 0.94 0.88 0.9 0.95 0.93 0.96 0.94 0.95 1.19 1.22 1.22 1.22 1.21 1.08 1.09 1.07 1.15 1.09 1.19 1.16 1.18 1.17 1.18 1.07 1.06 1.08 1.08 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.86 0.98 1.01 1 0.94 0.98 1.02 0.97 0.98 0.98 1.11 1.06 0.96 0.97 1.01 1.02 1.03 0.97 0.99 1 0.96 1 0.97 0.97 0.98 1.03 1.1 1.01 0.88 0.98 0.82 0.99 0.91 0.79 0.85 1 1.12 1.15 1.01 1.05 0.77 1.05 0.93 0.85 0.86 0.86 1.05 0.96 
0.86 0.91 0.96 0.9 0.98 0.93 0.94 1.14 1.17 1.16 1.18 1.16 1.06 1.08 1.01 1.11 1.06 1.07 1.09 1.09 1.09 1.08 1.04 1.03 1.05 1.05 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 0.97 1.04 1 1 0.97 0.97 1.01 0.99 0.98 1.04 1.02 0.95 0.97 0.99 1.03 1.03 0.96 1.01 1.01 1.01 0.99 0.98 0.99 0.99 0.99 1.12 1.02 0.88 0.98 0.82 0.98 0.92 0.77 0.85 0.91 1.15 1.15 0.93 0.99 0.77 1.01 0.93 0.84 0.85 0.84 1.05 0.97 0.85 0.9 0.93 0.89 0.95 0.94 0.93 1.13 1.13 1.13 1.11 1.12 1.01 1 0.98 0.93 0.98 1.03 1.16 1 1.01 1.04 1.01 1.01 1 0.98 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.06 1.15 1.15 1.12 1.1 1.12 1.08 1.11 1.1 1.17 1.19 1.2 1.16 1.18 1.07 1.14 1.1 1.09 1.09 1.09 1.11 1.12 1.11 1.11 1 1.12 1.05 0.88 0.98 0.81 1 0.89 0.78 0.84 0.98 1.12 1.18 1 1.04 0.74 1.04 0.92 0.85 0.85 0.84 1.06 0.97 0.86 0.9 0.98 0.93 1.01 0.97 0.97 1.2 1.15 1.21 1.19 1.19 1.12 1.06 1.1 1.07 1.08 1.06 1.15 1.15 1.08 1.1 1.06 1.04 1.1 1.06 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.8 0.77 0.84 0.86 0.81 1.04 0.95 0.97 1.05 0.99 0.98 1.08 0.96 0.95 0.98 1 1.06 1 1.01 1.02 0.93 0.92 0.93 0.95 0.93 0.95 1.06 0.97 0.83 0.92 0.77 1.01 0.93 0.78 0.83 0.89 1.11 1.1 0.93 0.97 0.74 1.08 0.9 0.81 0.83 0.8 1.06 0.95 0.82 0.87 0.92 0.89 0.9 0.92 0.91 1.13 1.13 1.13 1.12 1.13 1.02 1.03 1 1.02 1.02 1.03 1.02 1.07 1.02 1.03 1.01 1 1.01 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.95 1.1 1.07 1.1 1.03 1.08 1.03 1.06 0.97 1.03 1.13 1.08 1.09 1.09 1.1 1.06 1.15 1.07 1.08 1.08 1.03 1.08 1.07 1.05 1.05 1 1.12 0.99 0.89 0.98 0.8 1.01 0.9 0.81 0.86 0.97 1.12 1.11 0.97 1.02 0.79 1.07 0.94 0.87 0.88 0.85 1.07 0.96 0.87 0.91 0.95 0.93 0.95 0.96 0.95 1.21 1.21 1.21 1.21 1.21 1.06 1.09 1.1 1.12 1.09 1.17 1.17 1.18 1.18 1.17 1.07 1.06 1.08 1.09 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 0.93 1 1 0.96 1.03 0.99 1.01 1.01 1.01 1.06 1.08 1.01 1.01 1.04 1.04 1.08 1.02 1.04 1.04 1 1.01 1.01 1.01 1.01 0.98 1.1 1.01 0.87 0.96 0.8 1 0.91 0.78 0.85 0.93 1.12 1.13 0.95 1 0.76 1.05 0.92 0.84 0.85 0.83 1.06 0.96 0.85 0.89 0.94 0.91 0.95 0.94 0.93 1.16 1.15 1.17 1.15 1.16 1.05 1.04 1.04 1.01 1.03 1.06 1.1 1.08 1.06 1.07 1.04 1.03 1.04 1.03 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C 0.87 0.98 0.95 0.74 0.87 0.97 1 0.88 1.03 0.96 0.99 0.98 1.01 0.98 0.99 0.88 0.94 0.92 0.81 0.88 0.92 0.97 0.93 0.86 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 1.14 1.09 0.95 1.06 0.89 1.21 0.97 0.92 0.95 1.12 1.24 1.25 1.01 1.11 0.81 1.15 0.98 0.92 0.9 0.92 1.18 1.03 0.94 0.98 0.91 0.9 0.82 0.87 0.87 0.95 0.99 0.99 0.97 0.97 0.99 0.95 0.88 0.89 0.92 0.95 0.84 0.85 0.81 0.85 0.94 0.9 0.87 0.87 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.05 1.04 0.86 0.98 1.11 0.93 0.98 1.07 1.01 1.07 1.06 1.09 1.04 1.06 0.99 1.07 0.97 0.97 0.99 1.04 1.02 1.01 0.96 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.12 1.05 0.8 0.93 0.88 1.09 1.05 0.86 0.93 1 1.21 1.18 1.13 1.1 0.75 1.02 0.93 0.88 0.85 0.85 1.09 1.03 0.87 0.92 0.88 0.9 0.89 0.93 0.9 0.97 1.04 1 0.95 0.99 1.09 0.99 1.05 1.04 1.02 0.83 0.84 0.82 0.91 0.85 0.91 0.92 0.92 0.95 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.08 1.02 1.09 1.06 1 1 1.01 1.17 1.03 0.99 1.09 0.96 1.05 1.01 1.06 1.11 1.01 1.07 1.06 1.02 1.06 1 1.09 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.16 1.25 1.04 1.11 1.13 1.16 1.12 0.84 0.98 1.11 1.25 1.3 1.1 1.16 0.82 1.28 0.94 0.87 0.9 0.97 1.19 1.09 0.93 1.01 1 0.98 0.88 0.92 0.94 0.95 0.94 0.97 1.01 0.96 1.01 1.04 0.97 0.99 1 1.05 0.89 0.96 0.92 0.94 0.99 0.95 0.93 0.95 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 0.93 0.98 0.96 0.95 0.93 0.95 0.96 0.95 0.94 1.02 1 0.96 0.99 0.99 0.89 0.84 0.8 0.85 0.84 0.94 0.92 0.91 0.93 0.92 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1.17 1.05 0.89 0.99 0.85 1.07 0.98 0.89 0.92 1.03 1.24 1.31 1.07 1.11 0.78 1.04 0.91 0.85 0.86 0.88 1.11 1.01 0.91 0.95 0.87 0.86 0.82 0.84 0.84 1.03 1.03 1.03 1.03 1.03 1.03 0.98 0.99 0.96 0.99 0.93 0.88 0.89 0.96 0.91 0.95 0.92 0.9 0.92 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.95 0.98 0.99 0.87 0.94 0.98 0.96 0.95 1.02 0.97 1.01 1.01 0.98 1 1 0.93 0.95 0.9 0.89 0.92 0.96 0.97 0.95 0.93 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.14 1.09 0.9 1.01 0.9 1.12 1.02 0.87 0.94 1.05 1.23 1.25 1.07 1.12 0.78 1.09 0.94 0.88 0.88 0.9 1.14 1.04 0.91 0.96 0.91 0.9 0.85 0.88 0.88 0.97 0.99 1 0.99 0.99 1.02 0.98 0.95 0.96 0.97 0.93 0.86 0.87 0.89 0.88 0.95 0.92 0.9 0.92 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.89 0.85 0.83 0.87 0.86 1.01 0.91 0.93 0.91 0.93 0.95 0.78 0.92 0.88 0.87 0.93 0.84 0.84 0.87 0.88 0.94 0.84 0.88 0.88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.32 1.03 1.04 1.1 0.9 1.15 1 0.89 0.94 1.09 1.28 1.31 1.05 1.13 0.79 0.94 0.95 0.86 0.86 0.92 1.11 1.03 0.93 0.97 0.87 0.9 0.85 0.83 0.86 0.99 0.98 1.08 0.96 0.99 0.89 0.87 1.01 0.91 0.91 0.88 0.84 0.82 0.87 0.83 0.89 0.88 0.9 0.89 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.95 1.03 0.95 0.97 0.97 0.97 0.98 0.99 1 0.99 1.08 1 0.98 1.03 1 0.98 0.95 0.98 0.98 0.98 0.97 0.98 0.97 0.99 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.09 1 0.92 0.99 0.87 1.22 1.02 0.9 0.95 1 1.23 1.3 1.01 1.07 0.76 1.05 0.94 0.87 0.85 0.86 1.14 1.01 0.91 0.94 0.94 0.88 0.83 0.85 0.87 0.98 0.96 1.01 0.97 0.98 1.02 1.1 1.02 0.97 1 0.84 0.9 0.89 0.78 0.84 0.93 0.93 0.9 0.87 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.99 1.01 0.94 0.99 0.96 1.02 1.05 1 1 0.93 1 1.02 0.95 0.97 1 1 1.06 0.94 1 0.97 1 1.03 0.95 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.17 1.1 0.85 1 0.96 1.01 1.01 0.89 0.95 1.13 1.26 1.29 1.12 1.18 0.78 1.11 0.94 0.82 0.86 0.93 1.11 1.05 0.89 0.97 0.86 0.81 0.88 
0.85 0.85 0.97 0.96 0.96 0.98 0.97 0.99 0.98 0.92 1.01 0.97 0.9 0.92 0.9 0.95 0.92 0.92 0.9 0.91 0.93 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.87 0.95 0.94 0.94 0.92 0.9 0.94 0.95 0.94 0.93 0.92 0.99 1.01 0.94 0.95 0.83 0.84 0.85 0.84 0.84 0.86 0.92 0.93 0.91 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.33 1.01 0.89 1.01 0.95 1.2 0.97 0.88 0.95 1.15 1.24 1.22 1.15 1.18 0.9 1.18 0.92 1.02 0.97 0.99 1.23 1 0.95 1.01 0.85 0.82 0.87 0.87 0.85 1.04 1.04 1.04 1.04 1.04 1.09 0.95 0.94 1.12 1.01 0.91 0.92 0.91 0.9 0.91 0.94 0.89 0.92 0.94 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.92 0.95 0.92 0.91 0.92 0.92 0.97 0.96 0.95 0.95 0.94 0.98 0.91 0.95 0.94 0.9 0.92 0.91 0.89 0.9 0.91 0.95 0.92 0.92 0.93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.2 1.03 0.91 1.02 0.91 1.12 1 0.89 0.95 1.08 1.24 1.27 1.07 1.13 0.8 1.04 0.94 0.88 0.88 0.92 1.14 1.02 0.92 0.97 0.88 0.84 0.86 0.85 0.86 0.99 0.98 1.01 0.99 0.99 0.97 0.96 0.95 0.98 0.96 0.88 0.89 0.84 0.86 0.87 0.92 0.9 0.9 0.91 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.9 0.94 0.9 0.91 0.99 1.01 0.93 0.91 0.96 0.98 0.99 1.09 0.97 0.98 0.85 0.93 0.9 0.94 0.9 0.93 0.95 0.94 0.93 0.93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.16 1.05 0.95 1.01 1.12 1.01 0.98 0.81 0.94 0.96 1.24 1.32 1.01 1.06 0.77 1.09 0.92 0.95 0.9 0.93 1.1 1.01 0.91 0.96 0.96 0.82 0.9 0.86 0.87 1.02 0.92 0.95 1.04 0.98 0.94 0.9 0.92 0.78 0.87 0.86 0.83 0.89 0.84 0.85 0.93 0.84 0.9 0.85 0.88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.09 0.94 1.02 0.98 1 1.04 1.08 0.93 0.87 0.97 1.09 1.19 1.16 0.95 1.07 1.04 1.03 0.97 0.97 1 1.06 1.04 1.01 0.93 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.14 1.06 0.94 1.02 0.97 1.12 0.98 1.16 1.04 1.13 1.12 1.22 1.04 1.11 0.91 1.04 0.96 0.88 0.93 0.98 1.09 1.01 0.98 1 0.91 0.88 0.93 0.91 0.9 1.02 0.98 0.91 1.03 0.98 1.1 0.99 1.01 1.04 1.03 0.95 0.8 0.92 0.9 0.88 0.98 0.9 0.94 0.95 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0.98 1.06 0.95 0.95 0.97 1.07 1.02 0.99 1.04 1.02 1 1.09 1.09 0.96 1.03 1.04 1.02 0.93 1.06 1 1.02 1.04 0.97 0.99 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.1 1.18 0.96 0.88 0.98 0.96 1.18 0.91 0.91 0.85 1.01 1.31 1.28 1.14 1.1 0.75 1.24 0.97 0.85 0.87 0.86 1.21 0.99 0.92 0.93 0.87 0.84 0.85 0.83 0.84 0.99 0.99 0.99 0.93 0.97 0.96 0.98 0.95 1.01 0.97 0.92 0.9 1.05 0.83 0.9 0.91 0.89 0.93 0.88 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.93 0.92 0.92 0.92 0.95 0.97 0.9 0.97 0.94 0.93 1.04 0.93 1.02 0.97 0.85 0.86 0.87 0.84 0.85 0.91 0.93 0.9 0.93 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.17 1.05 1.01 0.92 1.01 1.16 1.1 1.12 1.05 1.1 1.05 1.21 1.13 1.07 1.1 0.74 1.06 0.98 0.86 0.86 0.94 1.1 1.05 0.96 0.99 0.84 0.84 0.84 0.86 0.84 1.05 1.02 1.02 1.02 1.03 0.87 1.01 1.01 0.99 0.95 0.91 0.9 0.92 0.91 0.91 0.89 0.92 0.92 0.93 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.94 0.95 0.93 0.94 1 1.01 0.93 0.93 0.96 0.97 1.06 1.03 0.97 1 0.92 0.93 0.91 0.93 0.92 0.96 0.98 0.95 0.94 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.12 1.01 0.91 1 0.94 1.09 0.98 0.94 0.95 1.01 1.2 1.22 1.06 1.09 0.79 1.09 0.95 0.88 0.89 0.91 1.12 1.01 0.93 0.96 0.88 0.83 0.87 0.86 0.86 1.02 0.97 0.96 0.99 0.98 0.94 0.96 0.95 0.9 0.94 0.91 0.85 0.94 0.86 0.88 0.92 0.88 0.92 0.89 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.91 0.89 0.87 0.89 0.89 1.04 0.94 0.91 0.92 0.95 0.96 0.95 0.9 0.93 0.93 0.9 0.87 0.89 0.87 0.88 0.95 0.91 0.89 0.9 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 1.15 1.02 0.87 0.96 0.91 1.02 1.01 0.91 0.94 0.96 1.18 1.18 1.02 1.06 0.77 1.1 0.99 0.87 0.88 0.86 1.09 1.03 0.9 0.94 0.93 0.84 0.83 0.86 0.86 1 0.95 0.97 1.02 0.98 1.01 0.91 0.83 0.99 0.91 0.87 0.84 0.88 0.83 0.85 0.94 0.87 0.86 0.9 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.96 0.99 0.97 0.98 1.13 0.99 1 0.97 1.01 1.01 0.97 0.97 0.97 0.98 0.96 0.92 0.99 0.99 0.96 1.02 0.95 0.98 0.97 0.98 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.91 1.12 1.13 0.89 0.97 1.22 1.12 1.02 0.91 1.02 1.12 1.23 1.41 1.1 1.18 0.78 1.09 0.94 0.88 0.88 0.91 1.12 1.03 0.91 0.97 0.97 0.82 0.87 0.89 0.88 0.97 0.92 1.01 1.04 0.98 1.01 0.87 0.99 0.98 0.96 0.81 0.76 0.85 0.88 0.82 0.92 0.83 0.92 0.93 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.99 1.01 1.01 1 1.02 1.01 1.03 0.99 1.01 1.03 0.98 1.01 1 1 1 1 1.04 1 1.01 1.01 1 1.02 1 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.2 1.1 0.95 1.05 0.93 1.18 1.01 0.89 0.96 1.11 1.27 1.29 1.11 1.16 0.8 1.06 0.95 0.92 0.9 0.94 1.16 1.05 0.95 1 0.94 0.85 0.88 0.87 0.88 0.97 0.96 0.97 0.98 0.97 0.98 0.99 0.99 1.04 1 0.88 0.92 0.97 0.94 0.92 0.93 0.91 0.94 0.94 0.93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.95 0.93 0.92 0.92 0.93 0.93 0.95 0.96 0.95 0.94 1.1 0.97 0.98 1.09 1.02 0.81 0.88 0.85 0.83 0.84 0.92 0.93 0.92 0.93 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.16 1.01 1.15 1.09 0.85 1.1 0.97 0.88 0.92 1.2 1.25 1.28 1.13 1.2 0.77 1.06 0.93 0.86 0.86 0.9 1.13 1 0.94 0.97 0.84 0.83 0.86 0.84 0.84 1.02 1.04 1.04 1.04 1.04 0.97 0.98 0.97 1.04 0.99 0.92 0.89 0.91 0.9 0.91 0.92 0.91 0.93 0.93 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.93 0.93 0.93 0.94 1.01 0.96 0.96 0.95 0.97 1.01 0.96 0.95 0.99 0.97 0.9 0.91 0.92 0.91 0.91 0.96 0.94 0.94 0.94 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1.15 1.05 0.93 1.01 0.93 1.09 1 0.89 0.96 1.06 1.23 1.26 1.07 1.13 0.78 1.07 0.95 0.88 0.88 0.9 1.12 1.03 0.92 0.97 0.91 0.83 0.85 0.86 0.86 0.99 0.96 0.99 1.02 0.99 0.99 0.93 0.92 1.01 0.96 0.86 0.84 0.9 0.88 0.87 0.93 0.88 0.91 0.92 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.91 0.91 0.9 0.83 0.88 0.96 0.98 0.9 0.94 0.94 0.94 0.96 0.91 0.95 0.94 0.87 0.92 0.89 0.86 0.88 0.92 0.94 0.9 0.89 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.18 1.05 0.93 1.02 0.94 1.07 0.99 0.88 0.94 1.01 1.23 1.26 1.01 1.08 0.78 1.04 0.95 0.9 0.88 0.91 1.11 
1.02 0.92 0.96 0.91 0.86 0.84 0.85 0.86 0.99 0.96 0.99 0.99 0.98 0.94 0.9 0.9 0.86 0.9 0.89 0.84 0.83 0.84 0.84 0.92 0.87 0.88 0.87 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.99 1 0.94 0.98 1.05 0.99 0.97 0.97 0.99 1.04 1.04 1.03 0.99 1.02 0.99 0.98 0.98 0.98 0.98 1.02 1 0.99 0.96 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.98 1.12 1.05 0.87 0.97 0.94 1.13 1.02 0.93 0.98 1.03 1.19 1.25 1.06 1.11 0.79 1.04 0.94 0.88 0.87 0.89 1.11 1.02 0.91 0.95 0.92 0.87 0.87 0.89 0.89 0.98 0.97 0.98 0.99 0.98 1.05 0.97 0.99 1 1 0.85 0.82 0.86 0.86 0.84 0.93 0.89 0.92 0.92 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1.02 0.99 0.98 1 1.01 1.01 1.01 1.03 1.02 0.98 1.03 1.01 0.98 1 1.02 1.03 0.99 1.01 1.01 1 1.02 1 1 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.18 1.07 0.92 1.03 0.87 1.11 1 0.88 0.92 1.08 1.26 1.29 1.11 1.15 0.79 1.14 0.95 0.86 0.88 0.91 1.16 1.04 0.92 0.97 0.9 0.85 0.87 0.86 0.87 0.97 0.96 0.97 0.97 0.97 0.98 1 0.96 1.01 0.98 0.92 0.91 0.96 0.9 0.92 0.93 0.91 0.92 0.92 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.92 0.93 0.94 0.94 0.93 0.92 0.95 0.94 0.95 0.94 0.96 1 0.97 1 0.98 0.84 0.85 0.84 0.84 0.84 0.91 0.93 0.91 0.92 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.16 1.02 0.93 1.02 0.92 1.11 0.99 0.91 0.96 1.09 1.23 1.22 1.1 1.14 0.79 1.07 0.93 0.89 0.88 0.92 1.13 1.01 0.94 0.98 0.85 0.83 0.84 0.85 0.84 1.03 1.03 1.03 1.03 1.03 0.96 0.98 0.97 1.02 0.98 0.92 0.89 0.91 0.91 0.91 0.92 0.91 0.92 0.93 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 0.95 0.94 0.91 0.94 0.97 0.97 0.95 0.96 0.96 0.97 1 0.96 0.97 0.98 0.91 0.93 0.91 0.9 0.91 0.95 0.96 0.94 0.93 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.15 1.04 0.91 1.01 0.91 1.1 1 0.9 0.95 1.04 1.23 1.25 1.06 1.11 0.79 1.07 0.94 0.88 0.88 0.91 1.13 1.03 0.92 0.97 0.89 0.85 0.85 0.86 0.86 0.99 0.98 0.99 1 0.99 0.98 0.95 0.94 0.95 0.96 0.89 0.86 0.88 0.87 0.88 0.93 0.9 0.91 0.91 0.91 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G 0.85 0.89 0.87 0.85 0.86 0.89 0.89 0.81 0.96 0.88 0.88 0.86 0.93 0.9 0.89 0.86 0.84 0.83 0.8 0.83 0.87 0.87 0.85 0.85 0.86 0.85 1 0.95 1 0.94 0.86 0.89 0.91 0.9 0.89 0.83 1.05 0.99 0.97 0.93 0.88 0.84 0.83 0.88 0.85 0.85 0.92 0.91 0.93 0.9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.95 0.93 0.93 0.94 1 1.24 1.1 1.08 1.09 1.07 0.89 0.89 1.03 0.95 1.1 1.08 0.97 1.14 1.05 1.03 1 0.95 1.01 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.97 0.97 0.97 0.97 1.03 0.88 0.94 1 0.95 0.95 0.94 1.01 0.95 0.96 0.97 0.96 0.9 0.9 0.93 0.98 0.93 0.94 0.94 0.95 1.04 1.14 1.05 1.16 1.09 1.09 1.08 1.09 1.1 1.09 1.07 1.12 1.26 1.26 1.16 0.94 0.98 0.93 0.96 0.95 1.01 1.07 1.05 1.09 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.04 1 0.99 1.01 1.09 1.29 1.12 1.07 1.13 1.17 0.93 1.06 1.17 1.06 1.14 1.15 1.13 1.24 1.16 1.06 1.05 1.04 1.09 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.03 0.97 1.04 1.01 0.92 0.92 0.94 1.1 0.95 0.95 1.04 0.91 1 0.97 0.99 1.05 0.94 1.01 0.99 0.96 1 0.94 1.03 0.98 0.91 0.91 0.9 0.89 0.9 0.92 0.82 0.86 1 0.88 0.93 1.1 0.96 0.93 0.97 0.81 0.89 0.81 0.79 0.82 0.87 0.89 0.87 0.87 0.88 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.04 0.95 0.99 1 1.03 1.02 1.05 1.1 1.05 1.07 1.1 1.03 1.05 1.06 1.15 0.99 1.06 1.02 1.04 1.07 1.03 1.01 1.03 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.83 0.82 0.88 0.86 0.85 0.82 0.84 0.85 0.84 0.83 0.88 0.86 0.82 0.85 0.85 0.89 0.84 0.8 0.85 0.84 0.85 0.83 0.82 0.84 0.83 1.04 1.02 0.99 1.08 1.03 1.05 1 1 0.84 0.94 0.96 1.1 1.08 1.05 1.03 0.86 0.93 0.84 0.85 0.87 0.96 1 0.95 0.92 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.95 0.93 0.89 0.91 0.92 1.02 1.02 1.02 1.02 1.02 1.1 1.04 1.05 1.02 1.05 1.07 1.02 1.03 1.1 1.05 1.01 0.98 0.96 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 0.91 0.91 0.89 0.9 0.89 0.88 0.87 0.93 0.89 0.91 0.91 0.9 0.91 0.91 0.9 0.88 0.84 0.85 0.87 0.9 0.89 0.88 0.89 0.89 0.9 
0.96 0.92 0.95 0.93 0.94 0.9 0.93 0.93 0.92 0.89 1.08 1.01 0.98 0.97 0.86 0.9 0.84 0.85 0.86 0.89 0.94 0.92 0.92 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 0.99 0.94 0.95 0.96 1.03 1.1 1.06 1.06 1.06 1.09 0.98 0.99 1.05 1.02 1.09 1.05 1.02 1.08 1.06 1.04 1.01 0.99 1.02 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.82 0.88 0.86 0.88 0.86 0.8 0.95 0.91 0.89 0.87 1.03 0.92 0.82 0.9 0.9 0.83 0.86 0.82 0.81 0.83 0.85 0.89 0.85 0.86 0.86 1.01 1.07 0.94 1.04 1 1.07 1.23 1.08 1.04 1.09 1.03 0.99 1.25 1.2 1.09 1.05 1.02 0.96 1.05 1.01 1.03 1.06 1.03 1.07 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.91 0.93 0.98 0.99 0.95 1.05 1.08 0.96 1.07 1.03 0.91 0.97 0.86 1.01 0.92 1 1.09 0.74 1.05 0.92 0.95 1 0.84 1.02 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 1.02 0.97 0.98 0.98 0.92 0.93 0.95 0.95 0.94 1.17 0.96 1.02 0.98 1.02 0.92 0.88 0.92 0.92 0.91 0.97 0.94 0.96 0.96 0.96 1.19 1.2 1.18 1.18 1.19 1.21 1.33 1.2 1.2 1.23 1.23 1.36 1.33 1.32 1.3 1.23 1.02 1.13 1.2 1.12 1.2 1.18 1.2 1.21 1.2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 0.97 0.99 1.04 0.99 1.1 1.08 1.13 1.09 1.1 1.13 1.08 0.85 1.07 1 1.15 1.15 1.2 1.09 1.13 1.06 1.03 0.98 1.04 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.95 0.97 0.99 0.97 0.9 0.95 0.98 0.92 0.93 0.89 0.96 0.97 0.93 0.93 0.94 0.94 0.99 0.91 0.94 0.92 0.95 0.98 0.94 0.94 1.07 1.08 1.1 0.99 1.04 1.11 0.97 0.99 1.05 1.02 1.08 1.09 1.11 1.06 1.08 0.95 1.02 0.89 1.07 0.96 1.03 1.02 1.01 1.02 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.96 0.95 0.98 0.95 1.05 1.04 1.05 1.07 1.05 1.05 1.04 1.02 1.09 1.05 1 1.02 1 1.05 1.02 0.99 1 1 1.03 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.77 0.85 0.84 0.84 0.82 0.86 0.83 0.84 0.83 0.84 1.14 0.85 0.87 0.88 0.9 0.83 0.84 0.85 0.84 0.84 0.86 0.83 0.84 0.84 0.84 0.99 1.2 1.03 1.05 1.06 1.11 1.06 1.09 1.08 1.08 1.11 1.09 1.09 1.11 1.1 1.06 1.04 1.03 1.06 1.05 1.06 1.09 1.06 1.07 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0.92 0.84 0.94 0.94 0.9 1 1.02 1.02 1.02 1.01 1.1 1.06 1.02 1.18 1.07 1.05 1.06 1.05 1.04 1.05 0.98 0.95 0.99 1 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.86 0.91 0.89 0.91 0.89 0.86 0.9 0.9 0.89 0.89 1.01 0.92 0.89 0.92 0.93 0.86 0.86 0.87 0.85 0.86 0.89 0.89 0.89 0.89 0.89 1.02 1.09 1.03 1.05 1.04 1.11 1.09 1.06 1.08 1.08 1.08 1.08 1.14 1.12 1.1 1.01 1.02 0.97 1.08 1.01 1.05 1.06 1.04 1.08 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.92 0.96 0.98 0.95 1.04 1.05 1.03 1.06 1.04 1.02 1.02 0.92 1.06 1 1.03 1.07 0.93 1.03 1.01 0.99 0.99 0.94 1.02 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.83 0.85 0.9 0.85 0.85 0.9 0.88 0.87 0.86 0.87 0.83 0.86 1.03 0.93 0.9 0.82 0.83 0.79 0.86 0.82 0.84 0.85 0.87 0.87 0.86 1.15 1.14 1.19 1.09 1.11 1.04 1.07 1.09 1.13 1.08 1.25 1.12 1.2 1.19 1.17 0.93 1.01 1.18 1.05 1.02 1.04 1.07 1.15 1.11 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.86 0.96 0.97 0.94 1.03 1.06 1.1 1.09 1.06 0.99 1 1 0.88 0.95 1 1.07 1.15 1 1.05 0.99 0.98 1.04 0.95 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.89 0.93 0.99 0.94 0.97 0.94 0.88 0.91 0.92 0.96 1.06 0.97 0.92 0.95 0.89 0.94 0.91 0.91 0.91 0.94 0.94 0.92 0.93 0.93 1.24 1.27 1.13 1.32 1.22 1.24 1.3 1.25 1.39 1.28 1.3 1.35 1.42 1.32 1.34 1.06 1.15 1.13 1.22 1.13 1.18 1.25 1.2 1.3 1.23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.06 1.11 1.08 1.07 1.14 1.12 1.06 1.15 1.11 1.2 1.1 1.28 1.14 1.16 1.12 1.17 1.17 1.14 1.13 1.11 1.08 1.14 1.1 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 1.02 0.92 0.89 0.92 0.98 0.95 0.86 0.95 0.93 0.96 0.94 1.04 0.92 0.95 0.97 0.95 0.82 0.98 0.92 0.95 0.96 0.89 0.93 0.93 1.16 1.13 1.12 1.16 1.03 1.25 1.14 1.14 1.2 1.17 1.18 1.27 1.25 1.28 1.24 1.01 1.11 1.14 1.06 1.07 1.12 1.04 1.15 1.16 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 0.86 0.91 0.92 0.9 1.07 1.09 1.07 0.98 1.05 0.96 1.04 1.02 1.09 1.02 1.07 0.96 1.15 0.9 0.99 0.98 0.96 1.01 0.95 
0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.83 0.83 0.82 0.82 0.82 0.84 0.86 0.88 0.86 0.85 0.78 0.9 0.88 0.88 0.85 0.88 0.75 0.87 0.84 0.83 0.83 0.82 0.86 0.84 0.83 1.16 1.21 1.27 1.06 1.16 1.08 1.06 1.29 1.26 1.14 1.23 1.09 1.08 1.16 1.13 0.84 1.09 1.11 1.09 0.98 1 1.1 1.17 1.12 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.91 0.93 0.91 0.99 0.93 1.06 1 1 1 1.02 0.93 1.07 1.21 1.05 1.04 0.95 1.05 1.06 1.05 1.02 0.94 0.99 1.01 1.01 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.87 0.88 0.88 0.88 0.88 0.91 0.9 0.86 0.88 0.89 0.86 0.92 0.95 0.91 0.9 0.86 0.84 0.84 0.88 0.85 0.87 0.88 0.88 0.89 0.88 1.17 1.09 1.17 1.13 1.11 1.11 1.12 1.17 1.22 1.15 1.2 1.17 1.19 1.22 1.2 0.93 1.08 1.14 1.09 1.04 1.07 1.09 1.17 1.16 1.12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.91 0.96 0.98 0.95 1.07 1.06 1.05 1.04 1.05 0.99 1.04 1.1 0.99 1.02 1.02 1.03 1.12 1 1.04 1 0.99 1.04 0.99 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.84 0.84 0.86 0.89 0.86 0.89 0.89 0.87 0.89 0.88 0.88 0.9 0.93 0.91 0.9 0.82 0.82 0.81 0.83 0.82 0.85 0.86 0.86 0.88 0.86 0.88 1.05 1.06 1.02 0.98 0.91 0.88 0.98 1.03 0.94 0.99 0.89 0.95 0.99 0.94 0.82 0.83 0.89 0.86 0.84 0.88 0.9 0.96 0.96 0.92 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1 1.15 0.97 1.01 1.07 1.14 1.05 1.08 1.08 1.11 1.09 0.91 1.11 1.03 0.99 1.19 1.01 1.01 1.03 1.03 1.08 1 1.02 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 0.98 0.98 0.99 0.97 0.98 0.94 0.97 0.94 0.95 0.97 0.91 1 0.95 0.96 0.89 0.86 0.92 0.92 0.9 0.94 0.92 0.96 0.95 0.94 0.98 1.02 1.24 1.17 1.07 1.09 1.07 1.03 1.18 1.08 1.15 1.11 1.23 1.15 1.14 0.82 0.99 1.13 0.95 0.94 0.96 1.03 1.14 1.09 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 0.98 1.19 1.03 1.04 1.09 1.04 1.13 1.09 1.08 1.1 1.05 1.07 1.1 1.07 0.93 1.1 1.16 1.19 1.06 1.02 1.02 1.1 1.07 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.95 0.97 0.97 0.96 0.95 0.94 0.95 0.92 0.94 0.99 0.94 0.96 0.95 0.96 0.93 0.93 0.97 
0.93 0.94 0.96 0.94 0.96 0.94 0.95 0.94 1.08 1.28 1.18 1.06 1.04 0.96 0.98 0.99 0.99 1.3 1.26 0.99 1.02 1.07 1.01 0.89 0.96 0.93 0.94 1 1.01 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.92 0.94 0.94 0.95 1.05 1.04 1.05 1.06 1.05 1.04 1.05 1.05 1.1 1.06 0.98 1.02 1.07 1.04 1.02 1.01 0.99 1.01 1.02 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.85 0.83 0.82 0.82 0.83 0.82 0.84 0.85 0.84 0.83 0.96 0.83 0.84 0.95 0.88 0.81 0.88 0.85 0.83 0.84 0.84 0.84 0.83 0.84 0.84 0.77 1.02 0.98 1.03 0.89 0.88 0.87 0.92 0.88 0.89 1.03 0.89 0.89 0.9 0.92 0.81 0.9 0.86 0.86 0.85 0.84 0.91 0.9 0.9 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.92 0.9 0.93 0.91 0.91 1 1.03 1.03 1.03 1.02 1.03 1.05 1.03 1.11 1.05 1.06 1.03 1.05 1.04 1.05 0.98 0.97 0.99 0.99 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.89 0.89 0.89 0.9 0.89 0.89 0.89 0.9 0.89 0.89 0.94 0.89 0.92 0.94 0.92 0.84 0.86 0.86 0.86 0.86 0.88 0.88 0.89 0.89 0.89 0.84 1.03 1.09 1.07 0.98 0.96 0.92 0.97 0.99 0.96 1.09 0.98 0.97 0.99 0.99 0.85 0.89 0.92 0.89 0.88 0.91 0.95 0.98 0.98 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 0.94 1.02 0.96 0.97 1.05 1.05 1.06 1.06 1.05 1.07 1.05 0.99 1.1 1.05 0.98 1.04 1.06 1.05 1.03 1.01 1 1.02 1.02 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.84 0.86 0.87 0.86 0.85 0.86 0.9 0.86 0.89 0.88 0.9 0.88 0.89 0.91 0.89 0.83 0.83 0.81 0.82 0.82 0.85 0.87 0.86 0.86 0.86 0.91 1.05 0.99 1.01 0.98 0.94 0.96 0.99 1 0.97 0.97 0.99 1.02 1.04 1 0.9 0.88 0.9 0.92 0.9 0.93 0.96 0.97 0.98 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.92 0.99 0.96 0.96 1.03 1.12 1.04 1.08 1.06 1 0.97 0.9 0.98 0.96 1.01 1.1 0.92 1.04 1.01 1 1.01 0.94 1 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.96 0.96 0.98 0.96 0.97 0.92 0.93 0.94 0.94 1 0.96 0.99 0.95 0.97 0.91 0.91 0.91 0.91 0.91 0.96 0.93 0.94 0.94 0.94 1.08 1.12 1.13 1.19 1.12 1.14 1.15 1.11 1.19 1.15 1.16 1.19 1.29 1.23 1.21 0.93 1.01 1.05 1.04 1 1.05 1.11 1.13 1.15 
1.11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.01 1.05 1.03 1.03 1.1 1.11 1.1 1.1 1.1 1.14 1.01 1.02 1.11 1.06 1.06 1.13 1.16 1.15 1.12 1.06 1.04 1.06 1.07 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.98 0.95 0.96 0.96 0.94 0.94 0.93 0.96 0.94 0.94 0.96 0.96 0.95 0.95 0.95 0.96 0.91 0.95 0.94 0.95 0.96 0.94 0.95 0.95 0.93 1.01 1 0.95 0.96 1.03 0.9 0.95 1.03 0.97 1.05 1.15 1.04 1.02 1.06 0.91 0.95 0.9 0.9 0.91 0.97 0.97 0.96 0.96 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 0.93 0.94 0.95 0.94 1.05 1.05 1.06 1.05 1.05 1.03 1.06 1.03 1.08 1.05 1.03 1 1.06 0.99 1.02 1.01 0.99 1.01 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.82 0.83 0.84 0.84 0.83 0.83 0.84 0.85 0.84 0.84 0.89 0.86 0.85 0.88 0.87 0.85 0.82 0.84 0.84 0.84 0.84 0.83 0.84 0.84 0.84 0.89 1.09 1.03 1.05 0.99 0.98 0.97 1.02 0.94 0.97 1.04 1 1 1.01 1.01 0.86 0.97 0.92 0.92 0.91 0.93 1 0.99 0.97 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.92 0.9 0.91 0.93 0.92 1.02 1.02 1.02 1.02 1.02 1.01 1.05 1.06 1.08 1.05 1.03 1.04 1.05 1.05 1.04 0.98 0.97 0.99 1 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.88 0.89 0.89 0.89 0.89 0.89 0.89 0.88 0.9 0.89 0.92 0.91 0.91 0.92 0.91 0.86 0.86 0.85 0.86 0.86 0.89 0.89 0.88 0.89 0.89 0.92 1.03 1.02 1.02 0.99 1 0.97 1 1.02 1 1.03 1.06 1.04 1.04 1.04 0.9 0.94 0.93 0.93 0.92 0.96 0.99 0.99 1 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.94 0.97 0.97 0.96 1.05 1.06 1.05 1.05 1.05 1.04 1.02 0.99 1.04 1.02 1.03 1.05 1.02 1.04 1.03 1.01 1 0.99 1.01 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T 0.92 0.95 0.92 0.92 0.92 0.96 0.93 0.91 1.04 0.95 0.96 0.9 1.09 0.91 0.95 0.85 0.88 0.92 0.89 0.88 0.92 0.91 0.95 0.91 0.92 0.82 0.87 0.88 0.89 0.86 0.81 0.82 0.86 0.86 0.84 0.88 0.94 0.81 0.81 0.85 0.84 0.81 0.8 0.87 0.83 0.84 0.85 0.83 0.85 0.84 1.17 1.25 1.19 1.13 1.16 0.95 1.27 1.03 0.97 1.01 1.15 1.27 1.29 1.03 1.14 0.88 1.28 1.07 1.02 1 0.98 1.26 1.1 1.03 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.03 1.02 1.03 1.03 1.1 0.93 0.98 1.07 1.01 1.03 0.98 1.17 0.96 1.02 0.95 1.01 0.95 0.95 0.96 1.03 0.98 1.02 0.99 1 1.01 1.01 0.97 1.05 1.01 0.99 0.98 1 1.06 1.01 1.12 1.02 1.08 1.1 1.07 0.91 0.95 0.9 0.96 0.93 0.99 0.99 0.97 1.02 0.99 1.16 1.28 1.15 0.98 1.08 0.94 1.14 1.11 0.91 0.98 1.04 1.25 1.23 1.16 1.14 0.82 1.15 1.04 0.98 0.95 0.94 1.18 1.11 0.97 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.21 1.22 1.17 1.23 1.2 1.12 1.12 1.13 1.3 1.15 1.13 1.22 1.09 1.18 1.14 1.16 1.22 1.12 1.18 1.16 1.15 1.18 1.12 1.21 1.16 0.89 0.9 0.89 0.88 0.89 0.83 0.72 0.76 0.91 0.79 0.73 0.91 0.77 0.74 0.77 0.77 0.86 0.78 0.76 0.79 0.79 0.8 0.78 0.78 0.79 1.17 1.26 1.34 1.14 1.2 1.23 1.26 1.23 0.94 1.08 1.09 1.24 1.29 1.08 1.14 0.97 1.42 1.09 1.02 1.05 1.06 1.28 1.18 1.02 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.99 1.04 1.02 1.01 0.89 0.91 0.92 0.91 0.91 0.97 0.95 0.92 0.95 0.94 0.87 0.83 0.78 0.83 0.82 0.92 0.91 0.9 0.92 0.91 1.03 1.01 0.97 1.06 1.01 1.05 0.99 1 0.84 0.94 0.89 1.04 1.02 0.99 0.97 0.88 0.95 0.86 0.88 0.89 0.94 0.99 0.94 0.9 0.94 1.07 1.24 1.11 0.96 1.06 0.92 1.14 1.05 0.96 0.99 1.03 1.24 1.3 1.06 1.11 0.85 1.12 0.99 0.93 0.94 0.93 1.16 1.06 0.97 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.02 1.02 1.01 1.01 0.99 0.96 0.97 1.03 0.98 1.01 0.98 1.04 0.97 1 0.93 0.93 0.91 0.92 0.92 0.98 0.97 0.98 0.98 0.98 0.88 0.88 0.87 0.88 0.88 0.88 0.84 0.87 0.89 0.87 0.86 0.97 0.86 0.84 0.87 0.84 0.88 0.83 0.85 0.85 0.86 0.88 0.85 0.86 0.86 1.13 1.25 1.18 1.03 1.11 0.97 1.19 1.09 0.94 1.01 1.07 1.24 1.27 1.07 1.13 0.87 1.2 1.04 0.98 0.98 0.97 1.21 1.11 0.99 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0.88 0.97 1.11 0.97 1.01 1.1 1.04 1.03 1.03 1.08 0.99 
0.91 1.08 1 0.94 0.93 0.97 1.04 0.95 0.99 0.96 0.97 1.05 0.98 1.09 1.02 0.99 1.05 1.03 0.98 1.12 1.01 0.95 1.01 0.98 0.97 1.2 1.05 1.03 0.94 1.06 0.94 1.05 0.99 0.99 1.03 1.01 1.02 1.01 1.15 1.28 1.15 1.13 1.15 0.99 1.25 1.11 0.97 1.04 1.06 1.29 1.35 1.07 1.14 0.85 1.07 1.06 1.07 0.98 0.96 1.17 1.13 1.03 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.02 1.05 1.06 1.04 0.97 0.98 0.99 1 0.99 1.3 1.02 1.11 1.04 1.09 0.97 0.95 0.97 0.97 0.96 1.04 0.99 1.02 1.01 1.01 1.11 1.18 1.11 1.1 1.12 1.12 1.23 1.11 1.1 1.13 1.09 1.22 1.26 1.18 1.17 1.2 1 1.09 1.18 1.09 1.13 1.12 1.13 1.13 1.13 1.16 1.25 1.16 1.07 1.15 0.92 1.27 1.08 0.96 1 1.04 1.27 1.35 1.06 1.12 0.87 1.22 1.04 0.97 0.97 0.95 1.25 1.11 1 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.16 1.14 1.16 1.22 1.16 1.11 1.15 1.17 1.12 1.13 1.06 1.14 1.15 1.11 1.11 1.11 1.11 1.17 1.14 1.12 1.1 1.13 1.16 1.14 1.13 1.05 1.06 1.09 0.96 1.02 1.02 0.88 0.9 0.95 0.92 0.89 0.89 0.91 0.86 0.89 0.84 0.99 0.86 1.04 0.91 0.92 0.93 0.92 0.94 0.93 1.16 1.27 1.2 0.94 1.1 1.07 1.11 1.11 0.99 1.06 1.11 1.24 1.28 1.1 1.16 0.93 1.25 1.09 1.06 1.03 1.02 1.2 1.14 1 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 1.01 1 1 0.98 1 0.9 0.91 0.9 0.93 0.94 0.94 0.96 0.9 0.93 0.81 0.83 0.84 0.83 0.83 0.9 0.91 0.92 0.9 0.91 1.07 1.15 1.08 1.08 1.09 1.12 1.06 1.08 1.08 1.08 1.05 1.03 1.07 1.05 1.05 1.09 1.11 1.06 1.08 1.08 1.07 1.08 1.07 1.07 1.07 1.08 1.29 1.13 0.96 1.06 1.04 1.27 1.04 0.94 1.03 1.12 1.23 1.21 1.16 1.17 0.97 1.25 0.99 1.05 1.03 1.03 1.25 1.06 1 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.98 1.03 1.07 1.02 1 1.01 1.01 0.99 1 1.05 1.01 1 1.01 1.02 0.93 0.93 0.95 0.95 0.94 0.99 0.98 0.99 1 0.99 1.04 1.06 1.04 1.03 1.04 1.05 1.02 1 1.01 1.02 0.98 0.98 1.07 0.99 1 0.96 1.03 0.96 1.08 1 1 
1.02 1.01 1.02 1.01 1.13 1.25 1.16 1.01 1.1 1 1.2 1.08 0.97 1.03 1.08 1.26 1.29 1.09 1.15 0.89 1.17 1.04 1.02 0.99 0.98 1.21 1.11 1.01 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 1.03 1.1 1.01 1.03 0.99 0.91 0.99 0.98 0.96 0.93 0.94 1.18 0.93 0.97 0.86 0.88 0.97 0.97 0.91 0.93 0.93 1.03 0.96 0.96 1.18 1.08 1.15 1.08 1.1 0.96 1.05 1 1.07 1.01 1.22 1.07 1.13 1.03 1.09 0.96 1.02 1.16 1.07 1.03 1.03 1.05 1.1 1.06 1.05 1.02 1.24 1.21 1.05 1.09 1.02 1.09 1.04 0.92 0.97 0.98 1.19 1.3 0.99 1.04 0.86 1.14 0.9 1.11 0.96 0.94 1.13 1.05 0.99 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.1 0.99 1.08 1.08 1.05 1.03 0.98 0.99 0.91 0.97 1.04 1.13 1.06 0.89 1.01 0.97 0.97 0.96 0.96 0.96 1.03 1 1.01 0.95 0.99 1.24 1.18 1.05 1.22 1.16 1.14 1.23 1.15 1.28 1.19 1.27 1.29 1.28 1.18 1.24 1.1 1.22 1.11 1.1 1.12 1.17 1.22 1.12 1.18 1.17 1.2 1.3 1.22 1.1 1.18 1.02 1.17 1.03 1.22 1.09 1.17 1.17 1.27 1.08 1.15 0.96 1.17 1.07 0.99 1.03 1.05 1.19 1.1 1.08 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 1.2 1.12 1.06 1.11 1.16 1.14 1.13 1.14 1.13 1.13 1.15 1.21 1.06 1.13 1.15 1.13 1 1.09 1.07 1.13 1.15 1.09 1.08 1.11 1.21 1.12 1.09 1.25 1.05 1.11 1.16 1.07 1.15 1.12 0.98 1.08 1.05 1.08 1.04 1 1.05 1.11 1.09 1.05 1.05 0.98 1.08 1.12 1.04 1.12 1.28 1.12 0.98 1.08 0.6 1.28 1.15 1.02 0.85 1 1.29 1.26 1.13 1.09 0.88 1.25 1.12 1.02 0.99 0.81 1.25 1.13 1.01 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.98 0.99 0.98 0.98 0.98 0.91 0.93 0.84 0.93 0.9 0.88 0.99 0.89 0.97 0.92 0.84 0.83 0.85 0.83 0.84 0.9 0.92 0.88 0.92 0.9 1.25 1.14 1.14 1.04 1.12 1.2 1.05 1.17 1.14 1.12 1.16 1.03 1.01 1.07 1.06 0.89 1.09 1.1 1.2 1.01 1.04 1.07 1.09 1.1 1.07 1.13 1.11 1.09 0.98 1.05 1.13 1.17 1.15 1.1 1.13 1.09 1.14 1.12 1.05 1.08 0.82 1.14 0.97 0.94 0.91 0.96 1.13 1.06 1 1.02 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.03 1.06 1.02 1.03 1 0.97 0.96 0.98 0.98 0.97 1.02 1.04 0.95 0.99 0.93 0.92 0.93 0.94 0.93 0.98 0.98 0.99 0.97 0.98 1.21 1.03 1.1 1.12 1.09 1.07 1.1 1.08 1.15 1.1 1.12 1.09 1.08 1.08 1.09 0.96 1.08 1.12 1.1 1.05 1.06 1.06 1.09 1.11 1.07 1.09 1.22 1.14 1.01 1.09 0.83 1.16 1.07 1.02 0.98 1.04 1.18 1.22 1.06 1.08 0.87 1.16 0.99 1 0.96 0.92 1.17 1.08 1.01 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.96 0.95 0.95 0.96 0.98 0.94 1.02 0.91 0.95 0.96 0.96 0.99 0.96 0.96 0.93 0.9 0.95 0.92 0.93 0.96 0.93 0.97 0.93 0.95 0.9 0.93 0.82 0.86 0.87 0.84 0.83 0.84 0.87 0.85 0.95 0.82 0.82 0.8 0.84 0.79 0.81 0.78 0.85 0.81 0.86 0.84 0.81 0.84 0.84 1.04 1.22 1.16 1 1.05 0.87 1.04 1.07 1 0.97 1.14 1.21 1.23 1.05 1.13 0.86 1.22 0.98 1.04 0.96 0.92 1.15 1.08 0.99 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.06 1.06 1.07 1.06 1.07 0.99 1.11 0.95 1.02 1.03 0.97 1.06 1.01 1.01 1 0.95 0.98 0.98 0.97 1.04 0.99 1.05 1 1.02 1.01 0.9 1.01 1.02 0.97 0.99 1.03 0.94 1.03 0.99 1.01 1.03 1.1 0.95 1.01 0.79 0.97 1.02 0.94 0.9 0.91 0.97 1 0.97 0.95 0.96 1.27 1.28 1.05 1.08 1.18 1.17 1.08 0.96 1.04 1.3 1.28 1.46 1.13 1.25 0.89 1.19 1.05 0.98 0.98 0.98 1.22 1.13 0.99 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.14 1.16 1.16 1.15 1.15 1.14 1.15 1.11 1.14 1.16 1.11 1.14 1.13 1.13 1.1 1.1 1.14 1.1 1.11 1.14 1.12 1.15 1.12 1.13 0.93 1.07 1.26 1.17 1.04 0.95 0.86 0.89 0.89 0.89 1.1 1.06 0.79 0.82 0.88 0.98 0.86 0.93 0.9 0.91 0.91 0.92 0.92 0.92 0.91 1.17 1.3 1.19 1.05 1.15 1.03 1.28 1.11 0.99 1.06 1.09 1.26 1.27 1.09 1.15 0.95 1.2 1.1 1.06 1.05 1.02 1.25 1.14 1.04 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.99 0.98 0.98 0.99 0.89 0.91 0.92 
0.91 0.91 1.06 0.92 0.93 1.05 0.97 0.8 0.87 0.83 0.82 0.83 0.91 0.92 0.91 0.92 0.91 0.75 1.01 0.96 1.01 0.88 0.88 0.87 0.91 0.88 0.88 0.97 0.83 0.83 0.84 0.85 0.83 0.93 0.88 0.88 0.88 0.83 0.89 0.89 0.89 0.87 1.19 1.22 1.07 1.21 1.16 0.92 1.17 1.04 0.95 0.99 1.19 1.24 1.28 1.13 1.2 0.85 1.13 1 0.93 0.94 0.95 1.18 1.06 1 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.02 1.02 1.02 1.02 1 0.98 1.03 0.96 0.99 1.04 0.98 1.02 1.02 1.01 0.93 0.94 0.94 0.93 0.94 0.99 0.98 1 0.98 0.99 0.85 0.96 0.96 0.98 0.92 0.9 0.88 0.89 0.9 0.89 0.99 0.89 0.85 0.84 0.88 0.83 0.88 0.87 0.89 0.86 0.87 0.89 0.89 0.89 0.88 1.07 1.25 1.16 1.04 1.1 0.94 1.15 1.07 0.97 1.01 1.15 1.24 1.28 1.09 1.17 0.88 1.18 1.02 1 0.98 0.96 1.19 1.1 1 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.94 0.97 0.98 0.96 0.97 0.96 0.98 0.98 0.97 0.97 0.94 1.01 0.95 0.97 0.89 0.9 0.95 0.94 0.92 0.95 0.93 0.98 0.95 0.95 0.94 0.96 0.92 0.95 0.94 0.87 0.91 0.9 0.91 0.9 0.96 0.93 0.91 0.87 0.92 0.86 0.88 0.86 0.92 0.88 0.9 0.92 0.9 0.91 0.91 1.07 1.24 1.17 1.06 1.1 0.94 1.13 1.06 0.96 1 1.06 1.23 1.29 1.03 1.1 0.86 1.16 0.99 1.06 0.97 0.95 1.17 1.09 1.01 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.02 1.05 1.05 1.05 1.03 0.97 1.01 0.97 0.99 1.08 1.01 1.09 0.97 1.03 0.97 0.96 0.96 0.96 0.97 1.03 0.99 1.02 0.99 1.01 1.06 1.03 1.02 1.08 1.05 1.04 1.08 1.02 1.1 1.06 1.1 1.1 1.15 1.07 1.1 0.92 1.01 1 1 0.98 1.01 1.05 1.04 1.05 1.03 1.1 1.27 1.19 1.04 1.11 0.96 1.18 1.07 0.98 1.02 1.11 1.23 1.3 1.09 1.16 0.88 1.18 1.05 0.98 0.98 0.97 1.21 1.11 1 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.17 1.15 1.15 1.15 1.13 1.14 1.14 1.15 1.14 1.11 1.15 1.14 1.11 1.13 1.13 1.13 1.09 1.11 1.11 1.13 1.14 1.13 1.13 1.13 0.94 1 0.97 0.95 0.96 0.93 0.84 0.86 0.95 0.89 0.85 0.95 0.84 
0.82 0.86 0.86 0.91 0.87 0.88 0.88 0.88 0.89 0.88 0.89 0.88 1.16 1.28 1.18 1.01 1.12 0.85 1.21 1.14 0.98 0.99 1.06 1.24 1.27 1.1 1.13 0.93 1.26 1.1 1.03 1.03 0.95 1.24 1.14 1.02 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.98 0.99 1 1 0.99 0.92 0.91 0.9 0.91 0.91 0.94 0.95 0.92 0.96 0.94 0.83 0.83 0.82 0.83 0.83 0.91 0.91 0.9 0.91 0.91 0.92 1.05 1.01 1.05 0.99 1.01 0.97 1 0.92 0.97 0.98 0.94 0.95 0.95 0.95 0.89 1 0.93 0.96 0.94 0.94 0.98 0.97 0.96 0.96 1.1 1.2 1.1 1 1.07 0.97 1.18 1.05 0.97 1.02 1.09 1.21 1.21 1.09 1.13 0.87 1.15 0.98 0.96 0.95 0.96 1.17 1.06 0.99 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.01 1.03 1.03 1.02 1 0.98 0.99 0.99 0.99 1.01 1 1.02 0.99 1 0.93 0.93 0.93 0.94 0.93 0.99 0.98 0.99 0.98 0.98 0.94 0.97 0.96 0.97 0.96 0.95 0.92 0.93 0.95 0.94 0.95 0.96 0.93 0.9 0.93 0.88 0.94 0.9 0.93 0.91 0.92 0.94 0.93 0.94 0.93 1.1 1.24 1.16 1.02 1.1 0.92 1.17 1.08 0.97 1.01 1.08 1.23 1.26 1.08 1.13 0.88 1.18 1.02 1 0.98 0.96 1.19 1.1 1 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 N 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 changeo-1.2.0/changeo/data/mk_rs1nf_dist.tsv0000644000175000017500000000024313674203454020271 0ustar nileshnilesh A C G T N - . A 0 1.51 0.32 1.17 0 0 0 C 1.51 0 1.17 0.32 0 0 0 G 0.32 1.17 0 1.51 0 0 0 T 1.17 0.32 1.51 0 0 0 0 N 0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 . 0 0 0 0 0 0 0 changeo-1.2.0/changeo/data/hs1f_compat_dist.tsv0000644000175000017500000000022713674203454020757 0ustar nileshnilesh A C G T N . - A 0 2.08 1 1.75 0 0 0 C 2.08 0 1.75 1 0 0 0 G 1 1.75 0 2.08 0 0 0 T 1.75 1 2.08 0 0 0 0 N 0 0 0 0 0 0 0 . 
0 0 0 0 0 0 0 - 0 0 0 0 0 0 0 changeo-1.2.0/changeo/data/mk_rs5nf_dist.tsv0000644000175000017500000021324313674203454020303 0ustar nileshnilesh AAAAA AAAAC AAAAG AAAAT AAAAN AAACA AAACC AAACG AAACT AAACN AAAGA AAAGC AAAGG AAAGT AAAGN AAATA AAATC AAATG AAATT AAATN AAANA AAANC AAANG AAANT AAANN AACAA AACAC AACAG AACAT AACAN AACCA AACCC AACCG AACCT AACCN AACGA AACGC AACGG AACGT AACGN AACTA AACTC AACTG AACTT AACTN AACNA AACNC AACNG AACNT AACNN AAGAA AAGAC AAGAG AAGAT AAGAN AAGCA AAGCC AAGCG AAGCT AAGCN AAGGA AAGGC AAGGG AAGGT AAGGN AAGTA AAGTC AAGTG AAGTT AAGTN AAGNA AAGNC AAGNG AAGNT AAGNN AATAA AATAC AATAG AATAT AATAN AATCA AATCC AATCG AATCT AATCN AATGA AATGC AATGG AATGT AATGN AATTA AATTC AATTG AATTT AATTN AATNA AATNC AATNG AATNT AATNN AANAA AANAC AANAG AANAT AANAN AANCA AANCC AANCG AANCT AANCN AANGA AANGC AANGG AANGT AANGN AANTA AANTC AANTG AANTT AANTN AANNA AANNC AANNG AANNT AANNN ACAAA ACAAC ACAAG ACAAT ACAAN ACACA ACACC ACACG ACACT ACACN ACAGA ACAGC ACAGG ACAGT ACAGN ACATA ACATC ACATG ACATT ACATN ACANA ACANC ACANG ACANT ACANN ACCAA ACCAC ACCAG ACCAT ACCAN ACCCA ACCCC ACCCG ACCCT ACCCN ACCGA ACCGC ACCGG ACCGT ACCGN ACCTA ACCTC ACCTG ACCTT ACCTN ACCNA ACCNC ACCNG ACCNT ACCNN ACGAA ACGAC ACGAG ACGAT ACGAN ACGCA ACGCC ACGCG ACGCT ACGCN ACGGA ACGGC ACGGG ACGGT ACGGN ACGTA ACGTC ACGTG ACGTT ACGTN ACGNA ACGNC ACGNG ACGNT ACGNN ACTAA ACTAC ACTAG ACTAT ACTAN ACTCA ACTCC ACTCG ACTCT ACTCN ACTGA ACTGC ACTGG ACTGT ACTGN ACTTA ACTTC ACTTG ACTTT ACTTN ACTNA ACTNC ACTNG ACTNT ACTNN ACNAA ACNAC ACNAG ACNAT ACNAN ACNCA ACNCC ACNCG ACNCT ACNCN ACNGA ACNGC ACNGG ACNGT ACNGN ACNTA ACNTC ACNTG ACNTT ACNTN ACNNA ACNNC ACNNG ACNNT ACNNN AGAAA AGAAC AGAAG AGAAT AGAAN AGACA AGACC AGACG AGACT AGACN AGAGA AGAGC AGAGG AGAGT AGAGN AGATA AGATC AGATG AGATT AGATN AGANA AGANC AGANG AGANT AGANN AGCAA AGCAC AGCAG AGCAT AGCAN AGCCA AGCCC AGCCG AGCCT AGCCN AGCGA AGCGC AGCGG AGCGT AGCGN AGCTA AGCTC AGCTG AGCTT AGCTN AGCNA AGCNC AGCNG AGCNT AGCNN AGGAA AGGAC AGGAG AGGAT AGGAN AGGCA AGGCC AGGCG AGGCT 
AGGCN AGGGA AGGGC AGGGG AGGGT AGGGN AGGTA AGGTC AGGTG AGGTT AGGTN AGGNA AGGNC AGGNG AGGNT AGGNN AGTAA AGTAC AGTAG AGTAT AGTAN AGTCA AGTCC AGTCG AGTCT AGTCN AGTGA AGTGC AGTGG AGTGT AGTGN AGTTA AGTTC AGTTG AGTTT AGTTN AGTNA AGTNC AGTNG AGTNT AGTNN AGNAA AGNAC AGNAG AGNAT AGNAN AGNCA AGNCC AGNCG AGNCT AGNCN AGNGA AGNGC AGNGG AGNGT AGNGN AGNTA AGNTC AGNTG AGNTT AGNTN AGNNA AGNNC AGNNG AGNNT AGNNN ATAAA ATAAC ATAAG ATAAT ATAAN ATACA ATACC ATACG ATACT ATACN ATAGA ATAGC ATAGG ATAGT ATAGN ATATA ATATC ATATG ATATT ATATN ATANA ATANC ATANG ATANT ATANN ATCAA ATCAC ATCAG ATCAT ATCAN ATCCA ATCCC ATCCG ATCCT ATCCN ATCGA ATCGC ATCGG ATCGT ATCGN ATCTA ATCTC ATCTG ATCTT ATCTN ATCNA ATCNC ATCNG ATCNT ATCNN ATGAA ATGAC ATGAG ATGAT ATGAN ATGCA ATGCC ATGCG ATGCT ATGCN ATGGA ATGGC ATGGG ATGGT ATGGN ATGTA ATGTC ATGTG ATGTT ATGTN ATGNA ATGNC ATGNG ATGNT ATGNN ATTAA ATTAC ATTAG ATTAT ATTAN ATTCA ATTCC ATTCG ATTCT ATTCN ATTGA ATTGC ATTGG ATTGT ATTGN ATTTA ATTTC ATTTG ATTTT ATTTN ATTNA ATTNC ATTNG ATTNT ATTNN ATNAA ATNAC ATNAG ATNAT ATNAN ATNCA ATNCC ATNCG ATNCT ATNCN ATNGA ATNGC ATNGG ATNGT ATNGN ATNTA ATNTC ATNTG ATNTT ATNTN ATNNA ATNNC ATNNG ATNNT ATNNN ANAAA ANAAC ANAAG ANAAT ANAAN ANACA ANACC ANACG ANACT ANACN ANAGA ANAGC ANAGG ANAGT ANAGN ANATA ANATC ANATG ANATT ANATN ANANA ANANC ANANG ANANT ANANN ANCAA ANCAC ANCAG ANCAT ANCAN ANCCA ANCCC ANCCG ANCCT ANCCN ANCGA ANCGC ANCGG ANCGT ANCGN ANCTA ANCTC ANCTG ANCTT ANCTN ANCNA ANCNC ANCNG ANCNT ANCNN ANGAA ANGAC ANGAG ANGAT ANGAN ANGCA ANGCC ANGCG ANGCT ANGCN ANGGA ANGGC ANGGG ANGGT ANGGN ANGTA ANGTC ANGTG ANGTT ANGTN ANGNA ANGNC ANGNG ANGNT ANGNN ANTAA ANTAC ANTAG ANTAT ANTAN ANTCA ANTCC ANTCG ANTCT ANTCN ANTGA ANTGC ANTGG ANTGT ANTGN ANTTA ANTTC ANTTG ANTTT ANTTN ANTNA ANTNC ANTNG ANTNT ANTNN ANNAA ANNAC ANNAG ANNAT ANNAN ANNCA ANNCC ANNCG ANNCT ANNCN ANNGA ANNGC ANNGG ANNGT ANNGN ANNTA ANNTC ANNTG ANNTT ANNTN ANNNA ANNNC ANNNG ANNNT ANNNN CAAAA CAAAC CAAAG CAAAT CAAAN CAACA CAACC CAACG CAACT CAACN CAAGA CAAGC CAAGG CAAGT CAAGN CAATA CAATC 
CAATG CAATT CAATN CAANA CAANC CAANG CAANT CAANN CACAA CACAC CACAG CACAT CACAN CACCA CACCC CACCG CACCT CACCN CACGA CACGC CACGG CACGT CACGN CACTA CACTC CACTG CACTT CACTN CACNA CACNC CACNG CACNT CACNN CAGAA CAGAC CAGAG CAGAT CAGAN CAGCA CAGCC CAGCG CAGCT CAGCN CAGGA CAGGC CAGGG CAGGT CAGGN CAGTA CAGTC CAGTG CAGTT CAGTN CAGNA CAGNC CAGNG CAGNT CAGNN CATAA CATAC CATAG CATAT CATAN CATCA CATCC CATCG CATCT CATCN CATGA CATGC CATGG CATGT CATGN CATTA CATTC CATTG CATTT CATTN CATNA CATNC CATNG CATNT CATNN CANAA CANAC CANAG CANAT CANAN CANCA CANCC CANCG CANCT CANCN CANGA CANGC CANGG CANGT CANGN CANTA CANTC CANTG CANTT CANTN CANNA CANNC CANNG CANNT CANNN CCAAA CCAAC CCAAG CCAAT CCAAN CCACA CCACC CCACG CCACT CCACN CCAGA CCAGC CCAGG CCAGT CCAGN CCATA CCATC CCATG CCATT CCATN CCANA CCANC CCANG CCANT CCANN CCCAA CCCAC CCCAG CCCAT CCCAN CCCCA CCCCC CCCCG CCCCT CCCCN CCCGA CCCGC CCCGG CCCGT CCCGN CCCTA CCCTC CCCTG CCCTT CCCTN CCCNA CCCNC CCCNG CCCNT CCCNN CCGAA CCGAC CCGAG CCGAT CCGAN CCGCA CCGCC CCGCG CCGCT CCGCN CCGGA CCGGC CCGGG CCGGT CCGGN CCGTA CCGTC CCGTG CCGTT CCGTN CCGNA CCGNC CCGNG CCGNT CCGNN CCTAA CCTAC CCTAG CCTAT CCTAN CCTCA CCTCC CCTCG CCTCT CCTCN CCTGA CCTGC CCTGG CCTGT CCTGN CCTTA CCTTC CCTTG CCTTT CCTTN CCTNA CCTNC CCTNG CCTNT CCTNN CCNAA CCNAC CCNAG CCNAT CCNAN CCNCA CCNCC CCNCG CCNCT CCNCN CCNGA CCNGC CCNGG CCNGT CCNGN CCNTA CCNTC CCNTG CCNTT CCNTN CCNNA CCNNC CCNNG CCNNT CCNNN CGAAA CGAAC CGAAG CGAAT CGAAN CGACA CGACC CGACG CGACT CGACN CGAGA CGAGC CGAGG CGAGT CGAGN CGATA CGATC CGATG CGATT CGATN CGANA CGANC CGANG CGANT CGANN CGCAA CGCAC CGCAG CGCAT CGCAN CGCCA CGCCC CGCCG CGCCT CGCCN CGCGA CGCGC CGCGG CGCGT CGCGN CGCTA CGCTC CGCTG CGCTT CGCTN CGCNA CGCNC CGCNG CGCNT CGCNN CGGAA CGGAC CGGAG CGGAT CGGAN CGGCA CGGCC CGGCG CGGCT CGGCN CGGGA CGGGC CGGGG CGGGT CGGGN CGGTA CGGTC CGGTG CGGTT CGGTN CGGNA CGGNC CGGNG CGGNT CGGNN CGTAA CGTAC CGTAG CGTAT CGTAN CGTCA CGTCC CGTCG CGTCT CGTCN CGTGA CGTGC CGTGG CGTGT CGTGN CGTTA CGTTC CGTTG CGTTT CGTTN CGTNA CGTNC CGTNG CGTNT CGTNN 
CGNAA CGNAC CGNAG CGNAT CGNAN CGNCA CGNCC CGNCG CGNCT CGNCN CGNGA CGNGC CGNGG CGNGT CGNGN CGNTA CGNTC CGNTG CGNTT CGNTN CGNNA CGNNC CGNNG CGNNT CGNNN CTAAA CTAAC CTAAG CTAAT CTAAN CTACA CTACC CTACG CTACT CTACN CTAGA CTAGC CTAGG CTAGT CTAGN CTATA CTATC CTATG CTATT CTATN CTANA CTANC CTANG CTANT CTANN CTCAA CTCAC CTCAG CTCAT CTCAN CTCCA CTCCC CTCCG CTCCT CTCCN CTCGA CTCGC CTCGG CTCGT CTCGN CTCTA CTCTC CTCTG CTCTT CTCTN CTCNA CTCNC CTCNG CTCNT CTCNN CTGAA CTGAC CTGAG CTGAT CTGAN CTGCA CTGCC CTGCG CTGCT CTGCN CTGGA CTGGC CTGGG CTGGT CTGGN CTGTA CTGTC CTGTG CTGTT CTGTN CTGNA CTGNC CTGNG CTGNT CTGNN CTTAA CTTAC CTTAG CTTAT CTTAN CTTCA CTTCC CTTCG CTTCT CTTCN CTTGA CTTGC CTTGG CTTGT CTTGN CTTTA CTTTC CTTTG CTTTT CTTTN CTTNA CTTNC CTTNG CTTNT CTTNN CTNAA CTNAC CTNAG CTNAT CTNAN CTNCA CTNCC CTNCG CTNCT CTNCN CTNGA CTNGC CTNGG CTNGT CTNGN CTNTA CTNTC CTNTG CTNTT CTNTN CTNNA CTNNC CTNNG CTNNT CTNNN CNAAA CNAAC CNAAG CNAAT CNAAN CNACA CNACC CNACG CNACT CNACN CNAGA CNAGC CNAGG CNAGT CNAGN CNATA CNATC CNATG CNATT CNATN CNANA CNANC CNANG CNANT CNANN CNCAA CNCAC CNCAG CNCAT CNCAN CNCCA CNCCC CNCCG CNCCT CNCCN CNCGA CNCGC CNCGG CNCGT CNCGN CNCTA CNCTC CNCTG CNCTT CNCTN CNCNA CNCNC CNCNG CNCNT CNCNN CNGAA CNGAC CNGAG CNGAT CNGAN CNGCA CNGCC CNGCG CNGCT CNGCN CNGGA CNGGC CNGGG CNGGT CNGGN CNGTA CNGTC CNGTG CNGTT CNGTN CNGNA CNGNC CNGNG CNGNT CNGNN CNTAA CNTAC CNTAG CNTAT CNTAN CNTCA CNTCC CNTCG CNTCT CNTCN CNTGA CNTGC CNTGG CNTGT CNTGN CNTTA CNTTC CNTTG CNTTT CNTTN CNTNA CNTNC CNTNG CNTNT CNTNN CNNAA CNNAC CNNAG CNNAT CNNAN CNNCA CNNCC CNNCG CNNCT CNNCN CNNGA CNNGC CNNGG CNNGT CNNGN CNNTA CNNTC CNNTG CNNTT CNNTN CNNNA CNNNC CNNNG CNNNT CNNNN GAAAA GAAAC GAAAG GAAAT GAAAN GAACA GAACC GAACG GAACT GAACN GAAGA GAAGC GAAGG GAAGT GAAGN GAATA GAATC GAATG GAATT GAATN GAANA GAANC GAANG GAANT GAANN GACAA GACAC GACAG GACAT GACAN GACCA GACCC GACCG GACCT GACCN GACGA GACGC GACGG GACGT GACGN GACTA GACTC GACTG GACTT GACTN GACNA GACNC GACNG GACNT GACNN GAGAA GAGAC GAGAG GAGAT GAGAN GAGCA GAGCC GAGCG 
GAGCT GAGCN GAGGA GAGGC GAGGG GAGGT GAGGN GAGTA GAGTC GAGTG GAGTT GAGTN GAGNA GAGNC GAGNG GAGNT GAGNN GATAA GATAC GATAG GATAT GATAN GATCA GATCC GATCG GATCT GATCN GATGA GATGC GATGG GATGT GATGN GATTA GATTC GATTG GATTT GATTN GATNA GATNC GATNG GATNT GATNN GANAA GANAC GANAG GANAT GANAN GANCA GANCC GANCG GANCT GANCN GANGA GANGC GANGG GANGT GANGN GANTA GANTC GANTG GANTT GANTN GANNA GANNC GANNG GANNT GANNN GCAAA GCAAC GCAAG GCAAT GCAAN GCACA GCACC GCACG GCACT GCACN GCAGA GCAGC GCAGG GCAGT GCAGN GCATA GCATC GCATG GCATT GCATN GCANA GCANC GCANG GCANT GCANN GCCAA GCCAC GCCAG GCCAT GCCAN GCCCA GCCCC GCCCG GCCCT GCCCN GCCGA GCCGC GCCGG GCCGT GCCGN GCCTA GCCTC GCCTG GCCTT GCCTN GCCNA GCCNC GCCNG GCCNT GCCNN GCGAA GCGAC GCGAG GCGAT GCGAN GCGCA GCGCC GCGCG GCGCT GCGCN GCGGA GCGGC GCGGG GCGGT GCGGN GCGTA GCGTC GCGTG GCGTT GCGTN GCGNA GCGNC GCGNG GCGNT GCGNN GCTAA GCTAC GCTAG GCTAT GCTAN GCTCA GCTCC GCTCG GCTCT GCTCN GCTGA GCTGC GCTGG GCTGT GCTGN GCTTA GCTTC GCTTG GCTTT GCTTN GCTNA GCTNC GCTNG GCTNT GCTNN GCNAA GCNAC GCNAG GCNAT GCNAN GCNCA GCNCC GCNCG GCNCT GCNCN GCNGA GCNGC GCNGG GCNGT GCNGN GCNTA GCNTC GCNTG GCNTT GCNTN GCNNA GCNNC GCNNG GCNNT GCNNN GGAAA GGAAC GGAAG GGAAT GGAAN GGACA GGACC GGACG GGACT GGACN GGAGA GGAGC GGAGG GGAGT GGAGN GGATA GGATC GGATG GGATT GGATN GGANA GGANC GGANG GGANT GGANN GGCAA GGCAC GGCAG GGCAT GGCAN GGCCA GGCCC GGCCG GGCCT GGCCN GGCGA GGCGC GGCGG GGCGT GGCGN GGCTA GGCTC GGCTG GGCTT GGCTN GGCNA GGCNC GGCNG GGCNT GGCNN GGGAA GGGAC GGGAG GGGAT GGGAN GGGCA GGGCC GGGCG GGGCT GGGCN GGGGA GGGGC GGGGG GGGGT GGGGN GGGTA GGGTC GGGTG GGGTT GGGTN GGGNA GGGNC GGGNG GGGNT GGGNN GGTAA GGTAC GGTAG GGTAT GGTAN GGTCA GGTCC GGTCG GGTCT GGTCN GGTGA GGTGC GGTGG GGTGT GGTGN GGTTA GGTTC GGTTG GGTTT GGTTN GGTNA GGTNC GGTNG GGTNT GGTNN GGNAA GGNAC GGNAG GGNAT GGNAN GGNCA GGNCC GGNCG GGNCT GGNCN GGNGA GGNGC GGNGG GGNGT GGNGN GGNTA GGNTC GGNTG GGNTT GGNTN GGNNA GGNNC GGNNG GGNNT GGNNN GTAAA GTAAC GTAAG GTAAT GTAAN GTACA GTACC GTACG GTACT GTACN GTAGA GTAGC GTAGG GTAGT GTAGN GTATA 
GTATC GTATG GTATT GTATN GTANA GTANC GTANG GTANT GTANN GTCAA GTCAC GTCAG GTCAT GTCAN GTCCA GTCCC GTCCG GTCCT GTCCN GTCGA GTCGC GTCGG GTCGT GTCGN GTCTA GTCTC GTCTG GTCTT GTCTN GTCNA GTCNC GTCNG GTCNT GTCNN GTGAA GTGAC GTGAG GTGAT GTGAN GTGCA GTGCC GTGCG GTGCT GTGCN GTGGA GTGGC GTGGG GTGGT GTGGN GTGTA GTGTC GTGTG GTGTT GTGTN GTGNA GTGNC GTGNG GTGNT GTGNN GTTAA GTTAC GTTAG GTTAT GTTAN GTTCA GTTCC GTTCG GTTCT GTTCN GTTGA GTTGC GTTGG GTTGT GTTGN GTTTA GTTTC GTTTG GTTTT GTTTN GTTNA GTTNC GTTNG GTTNT GTTNN GTNAA GTNAC GTNAG GTNAT GTNAN GTNCA GTNCC GTNCG GTNCT GTNCN GTNGA GTNGC GTNGG GTNGT GTNGN GTNTA GTNTC GTNTG GTNTT GTNTN GTNNA GTNNC GTNNG GTNNT GTNNN GNAAA GNAAC GNAAG GNAAT GNAAN GNACA GNACC GNACG GNACT GNACN GNAGA GNAGC GNAGG GNAGT GNAGN GNATA GNATC GNATG GNATT GNATN GNANA GNANC GNANG GNANT GNANN GNCAA GNCAC GNCAG GNCAT GNCAN GNCCA GNCCC GNCCG GNCCT GNCCN GNCGA GNCGC GNCGG GNCGT GNCGN GNCTA GNCTC GNCTG GNCTT GNCTN GNCNA GNCNC GNCNG GNCNT GNCNN GNGAA GNGAC GNGAG GNGAT GNGAN GNGCA GNGCC GNGCG GNGCT GNGCN GNGGA GNGGC GNGGG GNGGT GNGGN GNGTA GNGTC GNGTG GNGTT GNGTN GNGNA GNGNC GNGNG GNGNT GNGNN GNTAA GNTAC GNTAG GNTAT GNTAN GNTCA GNTCC GNTCG GNTCT GNTCN GNTGA GNTGC GNTGG GNTGT GNTGN GNTTA GNTTC GNTTG GNTTT GNTTN GNTNA GNTNC GNTNG GNTNT GNTNN GNNAA GNNAC GNNAG GNNAT GNNAN GNNCA GNNCC GNNCG GNNCT GNNCN GNNGA GNNGC GNNGG GNNGT GNNGN GNNTA GNNTC GNNTG GNNTT GNNTN GNNNA GNNNC GNNNG GNNNT GNNNN TAAAA TAAAC TAAAG TAAAT TAAAN TAACA TAACC TAACG TAACT TAACN TAAGA TAAGC TAAGG TAAGT TAAGN TAATA TAATC TAATG TAATT TAATN TAANA TAANC TAANG TAANT TAANN TACAA TACAC TACAG TACAT TACAN TACCA TACCC TACCG TACCT TACCN TACGA TACGC TACGG TACGT TACGN TACTA TACTC TACTG TACTT TACTN TACNA TACNC TACNG TACNT TACNN TAGAA TAGAC TAGAG TAGAT TAGAN TAGCA TAGCC TAGCG TAGCT TAGCN TAGGA TAGGC TAGGG TAGGT TAGGN TAGTA TAGTC TAGTG TAGTT TAGTN TAGNA TAGNC TAGNG TAGNT TAGNN TATAA TATAC TATAG TATAT TATAN TATCA TATCC TATCG TATCT TATCN TATGA TATGC TATGG TATGT TATGN TATTA TATTC TATTG TATTT TATTN TATNA TATNC TATNG TATNT 
TATNN TANAA TANAC TANAG TANAT TANAN TANCA TANCC TANCG TANCT TANCN TANGA TANGC TANGG TANGT TANGN TANTA TANTC TANTG TANTT TANTN TANNA TANNC TANNG TANNT TANNN TCAAA TCAAC TCAAG TCAAT TCAAN TCACA TCACC TCACG TCACT TCACN TCAGA TCAGC TCAGG TCAGT TCAGN TCATA TCATC TCATG TCATT TCATN TCANA TCANC TCANG TCANT TCANN TCCAA TCCAC TCCAG TCCAT TCCAN TCCCA TCCCC TCCCG TCCCT TCCCN TCCGA TCCGC TCCGG TCCGT TCCGN TCCTA TCCTC TCCTG TCCTT TCCTN TCCNA TCCNC TCCNG TCCNT TCCNN TCGAA TCGAC TCGAG TCGAT TCGAN TCGCA TCGCC TCGCG TCGCT TCGCN TCGGA TCGGC TCGGG TCGGT TCGGN TCGTA TCGTC TCGTG TCGTT TCGTN TCGNA TCGNC TCGNG TCGNT TCGNN TCTAA TCTAC TCTAG TCTAT TCTAN TCTCA TCTCC TCTCG TCTCT TCTCN TCTGA TCTGC TCTGG TCTGT TCTGN TCTTA TCTTC TCTTG TCTTT TCTTN TCTNA TCTNC TCTNG TCTNT TCTNN TCNAA TCNAC TCNAG TCNAT TCNAN TCNCA TCNCC TCNCG TCNCT TCNCN TCNGA TCNGC TCNGG TCNGT TCNGN TCNTA TCNTC TCNTG TCNTT TCNTN TCNNA TCNNC TCNNG TCNNT TCNNN TGAAA TGAAC TGAAG TGAAT TGAAN TGACA TGACC TGACG TGACT TGACN TGAGA TGAGC TGAGG TGAGT TGAGN TGATA TGATC TGATG TGATT TGATN TGANA TGANC TGANG TGANT TGANN TGCAA TGCAC TGCAG TGCAT TGCAN TGCCA TGCCC TGCCG TGCCT TGCCN TGCGA TGCGC TGCGG TGCGT TGCGN TGCTA TGCTC TGCTG TGCTT TGCTN TGCNA TGCNC TGCNG TGCNT TGCNN TGGAA TGGAC TGGAG TGGAT TGGAN TGGCA TGGCC TGGCG TGGCT TGGCN TGGGA TGGGC TGGGG TGGGT TGGGN TGGTA TGGTC TGGTG TGGTT TGGTN TGGNA TGGNC TGGNG TGGNT TGGNN TGTAA TGTAC TGTAG TGTAT TGTAN TGTCA TGTCC TGTCG TGTCT TGTCN TGTGA TGTGC TGTGG TGTGT TGTGN TGTTA TGTTC TGTTG TGTTT TGTTN TGTNA TGTNC TGTNG TGTNT TGTNN TGNAA TGNAC TGNAG TGNAT TGNAN TGNCA TGNCC TGNCG TGNCT TGNCN TGNGA TGNGC TGNGG TGNGT TGNGN TGNTA TGNTC TGNTG TGNTT TGNTN TGNNA TGNNC TGNNG TGNNT TGNNN TTAAA TTAAC TTAAG TTAAT TTAAN TTACA TTACC TTACG TTACT TTACN TTAGA TTAGC TTAGG TTAGT TTAGN TTATA TTATC TTATG TTATT TTATN TTANA TTANC TTANG TTANT TTANN TTCAA TTCAC TTCAG TTCAT TTCAN TTCCA TTCCC TTCCG TTCCT TTCCN TTCGA TTCGC TTCGG TTCGT TTCGN TTCTA TTCTC TTCTG TTCTT TTCTN TTCNA TTCNC TTCNG TTCNT TTCNN TTGAA TTGAC TTGAG TTGAT TTGAN TTGCA TTGCC 
TTGCG TTGCT TTGCN TTGGA TTGGC TTGGG TTGGT TTGGN TTGTA TTGTC TTGTG TTGTT TTGTN TTGNA TTGNC TTGNG TTGNT TTGNN TTTAA TTTAC TTTAG TTTAT TTTAN TTTCA TTTCC TTTCG TTTCT TTTCN TTTGA TTTGC TTTGG TTTGT TTTGN TTTTA TTTTC TTTTG TTTTT TTTTN TTTNA TTTNC TTTNG TTTNT TTTNN TTNAA TTNAC TTNAG TTNAT TTNAN TTNCA TTNCC TTNCG TTNCT TTNCN TTNGA TTNGC TTNGG TTNGT TTNGN TTNTA TTNTC TTNTG TTNTT TTNTN TTNNA TTNNC TTNNG TTNNT TTNNN TNAAA TNAAC TNAAG TNAAT TNAAN TNACA TNACC TNACG TNACT TNACN TNAGA TNAGC TNAGG TNAGT TNAGN TNATA TNATC TNATG TNATT TNATN TNANA TNANC TNANG TNANT TNANN TNCAA TNCAC TNCAG TNCAT TNCAN TNCCA TNCCC TNCCG TNCCT TNCCN TNCGA TNCGC TNCGG TNCGT TNCGN TNCTA TNCTC TNCTG TNCTT TNCTN TNCNA TNCNC TNCNG TNCNT TNCNN TNGAA TNGAC TNGAG TNGAT TNGAN TNGCA TNGCC TNGCG TNGCT TNGCN TNGGA TNGGC TNGGG TNGGT TNGGN TNGTA TNGTC TNGTG TNGTT TNGTN TNGNA TNGNC TNGNG TNGNT TNGNN TNTAA TNTAC TNTAG TNTAT TNTAN TNTCA TNTCC TNTCG TNTCT TNTCN TNTGA TNTGC TNTGG TNTGT TNTGN TNTTA TNTTC TNTTG TNTTT TNTTN TNTNA TNTNC TNTNG TNTNT TNTNN TNNAA TNNAC TNNAG TNNAT TNNAN TNNCA TNNCC TNNCG TNNCT TNNCN TNNGA TNNGC TNNGG TNNGT TNNGN TNNTA TNNTC TNNTG TNNTT TNNTN TNNNA TNNNC TNNNG TNNNT TNNNN NAAAA NAAAC NAAAG NAAAT NAAAN NAACA NAACC NAACG NAACT NAACN NAAGA NAAGC NAAGG NAAGT NAAGN NAATA NAATC NAATG NAATT NAATN NAANA NAANC NAANG NAANT NAANN NACAA NACAC NACAG NACAT NACAN NACCA NACCC NACCG NACCT NACCN NACGA NACGC NACGG NACGT NACGN NACTA NACTC NACTG NACTT NACTN NACNA NACNC NACNG NACNT NACNN NAGAA NAGAC NAGAG NAGAT NAGAN NAGCA NAGCC NAGCG NAGCT NAGCN NAGGA NAGGC NAGGG NAGGT NAGGN NAGTA NAGTC NAGTG NAGTT NAGTN NAGNA NAGNC NAGNG NAGNT NAGNN NATAA NATAC NATAG NATAT NATAN NATCA NATCC NATCG NATCT NATCN NATGA NATGC NATGG NATGT NATGN NATTA NATTC NATTG NATTT NATTN NATNA NATNC NATNG NATNT NATNN NANAA NANAC NANAG NANAT NANAN NANCA NANCC NANCG NANCT NANCN NANGA NANGC NANGG NANGT NANGN NANTA NANTC NANTG NANTT NANTN NANNA NANNC NANNG NANNT NANNN NCAAA NCAAC NCAAG NCAAT NCAAN NCACA NCACC NCACG NCACT NCACN NCAGA NCAGC NCAGG NCAGT NCAGN 
NCATA NCATC NCATG NCATT NCATN NCANA NCANC NCANG NCANT NCANN NCCAA NCCAC NCCAG NCCAT NCCAN NCCCA NCCCC NCCCG NCCCT NCCCN NCCGA NCCGC NCCGG NCCGT NCCGN NCCTA NCCTC NCCTG NCCTT NCCTN NCCNA NCCNC NCCNG NCCNT NCCNN NCGAA NCGAC NCGAG NCGAT NCGAN NCGCA NCGCC NCGCG NCGCT NCGCN NCGGA NCGGC NCGGG NCGGT NCGGN NCGTA NCGTC NCGTG NCGTT NCGTN NCGNA NCGNC NCGNG NCGNT NCGNN NCTAA NCTAC NCTAG NCTAT NCTAN NCTCA NCTCC NCTCG NCTCT NCTCN NCTGA NCTGC NCTGG NCTGT NCTGN NCTTA NCTTC NCTTG NCTTT NCTTN NCTNA NCTNC NCTNG NCTNT NCTNN NCNAA NCNAC NCNAG NCNAT NCNAN NCNCA NCNCC NCNCG NCNCT NCNCN NCNGA NCNGC NCNGG NCNGT NCNGN NCNTA NCNTC NCNTG NCNTT NCNTN NCNNA NCNNC NCNNG NCNNT NCNNN NGAAA NGAAC NGAAG NGAAT NGAAN NGACA NGACC NGACG NGACT NGACN NGAGA NGAGC NGAGG NGAGT NGAGN NGATA NGATC NGATG NGATT NGATN NGANA NGANC NGANG NGANT NGANN NGCAA NGCAC NGCAG NGCAT NGCAN NGCCA NGCCC NGCCG NGCCT NGCCN NGCGA NGCGC NGCGG NGCGT NGCGN NGCTA NGCTC NGCTG NGCTT NGCTN NGCNA NGCNC NGCNG NGCNT NGCNN NGGAA NGGAC NGGAG NGGAT NGGAN NGGCA NGGCC NGGCG NGGCT NGGCN NGGGA NGGGC NGGGG NGGGT NGGGN NGGTA NGGTC NGGTG NGGTT NGGTN NGGNA NGGNC NGGNG NGGNT NGGNN NGTAA NGTAC NGTAG NGTAT NGTAN NGTCA NGTCC NGTCG NGTCT NGTCN NGTGA NGTGC NGTGG NGTGT NGTGN NGTTA NGTTC NGTTG NGTTT NGTTN NGTNA NGTNC NGTNG NGTNT NGTNN NGNAA NGNAC NGNAG NGNAT NGNAN NGNCA NGNCC NGNCG NGNCT NGNCN NGNGA NGNGC NGNGG NGNGT NGNGN NGNTA NGNTC NGNTG NGNTT NGNTN NGNNA NGNNC NGNNG NGNNT NGNNN NTAAA NTAAC NTAAG NTAAT NTAAN NTACA NTACC NTACG NTACT NTACN NTAGA NTAGC NTAGG NTAGT NTAGN NTATA NTATC NTATG NTATT NTATN NTANA NTANC NTANG NTANT NTANN NTCAA NTCAC NTCAG NTCAT NTCAN NTCCA NTCCC NTCCG NTCCT NTCCN NTCGA NTCGC NTCGG NTCGT NTCGN NTCTA NTCTC NTCTG NTCTT NTCTN NTCNA NTCNC NTCNG NTCNT NTCNN NTGAA NTGAC NTGAG NTGAT NTGAN NTGCA NTGCC NTGCG NTGCT NTGCN NTGGA NTGGC NTGGG NTGGT NTGGN NTGTA NTGTC NTGTG NTGTT NTGTN NTGNA NTGNC NTGNG NTGNT NTGNN NTTAA NTTAC NTTAG NTTAT NTTAN NTTCA NTTCC NTTCG NTTCT NTTCN NTTGA NTTGC NTTGG NTTGT NTTGN NTTTA NTTTC NTTTG NTTTT NTTTN NTTNA NTTNC NTTNG 
NTTNT NTTNN NTNAA NTNAC NTNAG NTNAT NTNAN NTNCA NTNCC NTNCG NTNCT NTNCN NTNGA NTNGC NTNGG NTNGT NTNGN NTNTA NTNTC NTNTG NTNTT NTNTN NTNNA NTNNC NTNNG NTNNT NTNNN NNAAA NNAAC NNAAG NNAAT NNAAN NNACA NNACC NNACG NNACT NNACN NNAGA NNAGC NNAGG NNAGT NNAGN NNATA NNATC NNATG NNATT NNATN NNANA NNANC NNANG NNANT NNANN NNCAA NNCAC NNCAG NNCAT NNCAN NNCCA NNCCC NNCCG NNCCT NNCCN NNCGA NNCGC NNCGG NNCGT NNCGN NNCTA NNCTC NNCTG NNCTT NNCTN NNCNA NNCNC NNCNG NNCNT NNCNN NNGAA NNGAC NNGAG NNGAT NNGAN NNGCA NNGCC NNGCG NNGCT NNGCN NNGGA NNGGC NNGGG NNGGT NNGGN NNGTA NNGTC NNGTG NNGTT NNGTN NNGNA NNGNC NNGNG NNGNT NNGNN NNTAA NNTAC NNTAG NNTAT NNTAN NNTCA NNTCC NNTCG NNTCT NNTCN NNTGA NNTGC NNTGG NNTGT NNTGN NNTTA NNTTC NNTTG NNTTT NNTTN NNTNA NNTNC NNTNG NNTNT NNTNN NNNAA NNNAC NNNAG NNNAT NNNAN NNNCA NNNCC NNNCG NNNCT NNNCN NNNGA NNNGC NNNGG NNNGT NNNGN NNNTA NNNTC NNNTG NNNTT NNNTN NNNNA NNNNC NNNNG NNNNT NNNNN A 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 0.91 0.95 1.01 0.97 0.99 0.96 0.95 0.97 0.96 0.97 1.16 0.97 0.96 0.98 0.87 1.04 0.96 0.77 0.87 0.96 0.95 0.96 0.9 0.94 0.74 0.91 0.91 0.76 0.8 0.61 0.88 0.85 0.58 0.66 0.72 1.01 0.99 0.88 0.84 0.62 0.89 0.71 0.61 0.67 0.65 0.91 0.82 0.65 0.71 0.99 0.98 1.05 0.95 0.95 0.99 1.07 1.08 1.11 1.06 1.02 0.97 0.97 1.09 1 1.01 0.99 0.97 1.17 1.02 1 1 1.01 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 1.06 1.04 1.05 1.02 1.1 1 1.03 1.18 1.06 1.13 1.1 1.1 1.11 1.11 1.01 1.26 1.13 1.04 1.08 1.02 1.06 1.06 1.07 1.05 0.9 0.91 0.86 0.77 0.85 0.64 0.78 0.72 0.77 0.71 0.92 0.98 0.98 0.82 0.91 0.63 0.88 0.74 0.63 0.68 0.7 0.87 0.79 0.72 0.75 1.09 0.91 1.02 0.99 0.99 1.25 1.16 1.16 1.15 1.17 1.17 1.07 1.12 1.17 1.13 1.2 1.14 1.14 1.12 1.15 1.15 1.04 1.1 1.08 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 0.92 0.97 0.95 0.96 0.98 0.93 1.06 0.98 0.98 0.8 0.98 0.78 0.92 0.84 0.95 0.99 0.99 0.99 0.97 0.89 0.88 
0.91 0.9 0.9 0.95 0.95 0.98 0.83 0.91 0.7 0.94 0.85 0.72 0.77 0.97 1.01 1.02 0.87 0.95 0.66 0.97 0.73 0.81 0.75 0.75 0.96 0.83 0.79 0.81 1 0.99 1.06 0.95 0.99 1.1 1.15 1.21 1.2 1.15 1.06 1 1.11 1.17 1.07 1.14 1.02 1.11 1.08 1.08 1.07 1.03 1.11 1.07 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.09 1.06 1.15 1.07 1.06 1.01 1 1.12 1.04 1.03 1.22 1.21 1.08 1.11 0.94 0.96 1.03 1.04 0.98 1.01 1.03 1.06 1.07 1.04 0.76 1.02 0.88 0.87 0.85 0.64 0.8 0.73 0.71 0.71 0.82 1.23 0.9 0.86 0.89 0.66 1.04 0.88 0.7 0.75 0.7 0.93 0.82 0.76 0.77 0.95 0.94 1.05 0.94 0.96 1.14 1.1 1.21 1.12 1.13 1.13 1.24 1.17 1.17 1.17 1.12 1.11 1.03 1.1 1.08 1.06 1.06 1.08 1.05 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 0.96 0.99 1.01 0.98 0.98 0.94 0.99 0.99 0.97 0.94 1.09 0.94 0.98 0.97 0.9 1.01 0.99 0.89 0.93 0.94 0.96 0.98 0.96 0.96 0.81 0.94 0.9 0.79 0.85 0.64 0.83 0.77 0.66 0.7 0.82 1.02 0.96 0.86 0.89 0.64 0.93 0.75 0.67 0.7 0.69 0.91 0.81 0.72 0.76 0.99 0.95 1.02 0.93 0.97 1.09 1.11 1.16 1.13 1.12 1.08 1.04 1.07 1.15 1.08 1.1 1.06 1.05 1.11 1.08 1.06 1.03 1.07 1.05 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.06 1.02 0.93 1.01 1.14 1.01 0.96 0.99 1 1.07 1.17 1.08 1.07 1.09 0.97 0.95 1.05 1.1 1.01 1.03 1.03 1.01 0.99 1.02 0.82 0.86 0.91 0.77 0.82 0.67 0.85 0.72 0.62 0.69 0.83 1.04 1.01 0.86 0.9 0.72 0.91 0.8 0.67 0.74 0.73 0.89 0.81 0.69 0.76 0.97 0.87 0.87 0.9 0.89 1.11 1.1 1.08 1.08 1.09 1.09 1 0.99 1.1 1.03 0.89 1.04 1.02 0.98 0.97 0.99 0.98 0.96 0.99 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.2 1.15 1.12 1.23 1.17 1.23 1.05 1.15 1.07 1.11 1.26 1.23 1.16 1.21 1.21 1.19 1.12 1.1 1.17 1.14 1.21 1.13 1.12 1.16 1.15 0.98 0.92 0.86 0.85 0.89 0.62 0.78 0.79 0.63 0.68 0.81 0.95 0.96 0.91 0.89 0.61 0.89 0.86 0.74 0.72 0.69 0.87 0.85 0.73 
0.76 1.02 0.97 0.96 0.96 0.97 1.29 1.33 1.17 1.15 1.19 1.11 1.15 1.14 1.3 1.15 1.13 1.14 1.23 1.13 1.15 1.11 1.09 1.1 1.08 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.04 1.03 0.99 1.02 1 1.04 1.02 1.12 1.04 0.91 0.91 0.9 0.87 0.9 1.2 1.08 1.12 1.26 1.14 1 0.99 0.99 1 0.99 0.79 0.9 0.84 0.88 0.84 0.7 0.77 0.77 0.64 0.7 0.79 0.86 0.99 0.82 0.85 0.71 0.92 0.76 0.67 0.73 0.74 0.85 0.81 0.71 0.76 0.95 0.97 0.96 0.97 0.96 1.21 1.21 1.18 1.2 1.2 1.16 1.08 1.16 1.08 1.11 1.05 1.07 1.08 0.95 1.02 1.06 1.06 1.07 1.02 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.17 1.04 1.18 1.08 1.13 1.14 1.09 1.2 1.13 1.12 1.23 1.17 1.14 1.16 1.12 1.12 1.12 1.12 1.11 1.08 1.15 1.1 1.12 1.11 0.92 1.06 1.01 0.76 0.88 0.7 0.87 0.79 0.65 0.72 0.85 1.04 1.03 0.85 0.91 0.67 0.94 0.85 0.63 0.72 0.75 0.95 0.88 0.69 0.78 1.15 0.91 0.97 0.95 0.97 1.19 1.29 1.19 1.24 1.22 1.16 1.06 1.23 1.1 1.12 1.06 1.12 1.09 1.08 1.08 1.12 1.05 1.08 1.06 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.08 1.02 1.02 1.04 1.06 1.04 1.01 1.07 1.04 1.04 1.08 1.04 1.02 1.05 1.08 1.03 1.05 1.11 1.06 1.05 1.05 1.03 1.04 1.04 0.85 0.91 0.89 0.8 0.86 0.67 0.81 0.76 0.63 0.7 0.81 0.95 0.99 0.85 0.88 0.67 0.91 0.81 0.67 0.73 0.72 0.88 0.84 0.7 0.76 1 0.92 0.93 0.94 0.94 1.18 1.19 1.15 1.15 1.17 1.12 1.06 1.1 1.11 1.1 1.01 1.09 1.09 1.02 1.05 1.06 1.04 1.04 1.03 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.16 1.11 1.15 1.11 1.19 1.21 1.07 1.21 1.15 1.27 1.13 1.13 1.12 1.14 0.97 1.01 1.1 1.15 1.04 1.06 1.1 1.08 1.14 1.09 0.74 0.8 0.76 0.68 0.73 0.57 0.75 0.86 0.6 0.65 0.72 0.91 1.01 0.87 0.82 0.55 0.84 0.68 0.55 0.61 0.61 0.81 0.77 0.63 0.67 0.92 0.92 0.86 0.9 0.9 1.09 1.07 1.1 1.09 1.08 0.98 1.06 1.01 1.01 1.01 1.06 0.95 1.02 1.05 1.02 1 0.98 0.98 0.99 0.99 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.16 1.28 1.38 1.36 1.26 1.23 1.23 1.2 1.2 1.21 1.31 1.31 1.28 1.28 1.29 1.27 1.17 1.25 1.26 1.23 1.23 1.23 1.26 1.26 1.24 0.81 0.93 0.93 0.78 0.84 0.65 0.9 0.77 0.63 0.7 0.82 1 0.97 0.85 0.88 0.65 1.02 0.9 0.67 0.73 0.71 0.95 0.86 0.7 0.77 1.16 0.91 0.99 0.98 0.98 1.15 1.18 1.23 1.14 1.17 1.13 1.12 1.15 1.11 1.13 1.15 1.16 1.15 1.2 1.16 1.13 1.05 1.1 1.09 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.25 1.21 1.21 1.25 1.23 1.19 1.16 1.13 1.17 1.16 1.06 1.03 1 1 1.02 1.16 1.1 1.08 1.03 1.06 1.12 1.08 1.09 1.09 1.09 0.71 1.01 0.85 0.83 0.81 0.59 0.82 0.75 0.64 0.67 0.88 1.02 0.95 0.82 0.9 0.56 0.85 0.66 0.59 0.63 0.64 0.89 0.76 0.68 0.71 0.99 0.94 0.88 0.9 0.92 1.18 1.24 1.18 1.16 1.19 1.11 1.17 1.04 1.09 1.09 1.24 0.98 0.95 1.07 1.02 1.11 1.04 0.98 1.03 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.21 1.1 1.16 1.15 1.16 1.11 1.25 1.25 1.18 1.19 1.2 1.34 1.19 1.22 1.24 1.18 1.13 1.15 1.15 1.18 1.15 1.18 1.17 1.17 0.89 1.04 0.8 0.78 0.85 0.71 0.91 0.77 0.64 0.72 0.88 1.04 0.96 0.79 0.88 0.62 0.89 0.77 0.59 0.67 0.72 0.85 0.8 0.67 0.74 1.23 0.93 1.14 0.89 0.97 1.17 1.2 1.16 1.3 1.2 1.07 1.22 0.95 1.26 1.06 1.13 0.93 1.24 1.05 1.04 1.13 1 1.07 1.04 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.14 1.2 1.16 1.19 1.17 1.18 1.15 1.12 1.19 1.16 1.18 1.13 1.13 1.11 1.13 1.08 1.08 1.11 1.12 1.09 1.12 1.12 1.13 1.15 1.13 0.77 0.91 0.82 0.75 0.8 0.61 0.79 0.78 0.63 0.68 0.8 0.98 0.97 0.83 0.87 0.59 0.88 0.72 0.59 0.65 0.66 0.86 0.79 0.66 0.72 1.02 0.92 0.93 0.91 0.94 1.14 1.16 1.16 1.16 1.15 1.06 1.13 1.01 1.1 1.07 1.13 0.98 1.05 1.09 1.05 1.08 1.01 1.02 1.04 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 0.89 0.96 0.94 0.96 0.89 0.92 0.98 1 0.94 
0.96 0.99 0.98 0.96 0.97 0.91 0.96 0.94 0.87 0.91 0.95 0.92 0.96 0.93 0.94 0.72 0.88 0.89 0.74 0.78 0.62 0.77 0.74 0.59 0.65 0.75 0.98 0.89 0.76 0.81 0.63 0.9 0.75 0.67 0.7 0.66 0.85 0.79 0.67 0.71 1.12 1.05 0.9 0.93 0.97 1.08 1.14 1.1 1.08 1.1 1.06 0.97 1.06 1.01 1.02 1.12 1.05 1.07 1.04 1.07 1.08 1.04 1.01 1 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.1 1.12 1.17 1.1 1.15 1.05 1.07 1.09 1.08 1.12 1.14 1.12 1.11 1.12 0.98 1.04 1.06 1.08 1.03 1.07 1.08 1.08 1.1 1.08 0.81 0.92 0.87 0.78 0.83 0.63 0.79 0.78 0.82 0.73 1 0.99 0.98 0.94 0.97 0.63 1.02 0.73 0.81 0.73 0.7 0.89 0.81 0.82 0.78 1.13 0.94 0.89 1 0.96 1.18 1.18 1.18 1.18 1.18 1.18 1.23 1.07 1.15 1.15 1.08 1.27 1.09 1.2 1.14 1.13 1.1 1.02 1.11 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.98 0.9 1.01 1.02 0.96 0.96 1.07 1.09 1.07 1.04 0.83 0.84 0.81 0.83 0.83 1.1 0.93 0.99 0.89 0.95 0.91 0.89 0.94 0.9 0.91 0.82 1.07 0.97 0.84 0.89 0.73 0.9 0.76 0.68 0.74 0.89 1 0.96 0.92 0.93 0.67 0.89 0.74 0.7 0.73 0.75 0.94 0.81 0.75 0.79 1.06 0.97 1.04 0.95 0.99 1.24 1.25 1.3 1.19 1.24 1.11 1.13 1.19 1.31 1.16 1.07 1.14 1.11 1.05 1.09 1.1 1.1 1.13 1.06 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.18 1.09 1.09 1.18 1.12 1.1 0.99 1.08 1.07 1.05 1.1 1.13 1.08 1.13 1.11 1.11 1.13 1 1.1 1.07 1.11 1.07 1.06 1.11 1.08 0.87 0.9 0.86 0.8 0.85 0.67 0.88 0.79 0.69 0.73 0.84 1.13 1.06 0.81 0.88 0.64 0.92 0.86 0.71 0.74 0.72 0.93 0.86 0.74 0.78 1.02 1.01 1.02 0.97 1 1.16 1.21 1.17 1.22 1.19 1.1 1.08 1.13 1.15 1.11 1.11 1.1 1.15 1.06 1.1 1.09 1.09 1.11 1.07 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 0.96 1 1.02 1 0.99 0.99 1.03 1.03 1 0.96 0.97 0.95 0.96 0.96 0.93 0.97 0.97 0.93 0.94 0.97 0.96 0.98 0.97 0.97 0.79 0.92 0.89 0.78 0.83 0.65 0.81 0.77 0.66 0.7 0.84 1.01 0.96 0.83 0.89 0.64 
0.92 0.76 0.71 0.72 0.7 0.89 0.82 0.73 0.76 1.07 0.99 0.95 0.96 0.98 1.15 1.18 1.17 1.16 1.17 1.1 1.07 1.11 1.12 1.1 1.09 1.12 1.1 1.07 1.1 1.1 1.08 1.06 1.06 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 0.97 0.96 0.98 0.98 1 0.97 0.98 1 0.98 1.01 1.07 1.02 1.01 1.02 0.93 0.94 0.99 0.91 0.94 0.98 0.97 0.98 0.96 0.97 0.75 0.86 0.85 0.73 0.78 0.61 0.79 0.78 0.59 0.66 0.75 0.98 0.96 0.83 0.84 0.62 0.88 0.73 0.62 0.67 0.66 0.86 0.8 0.66 0.71 0.98 0.94 0.89 0.9 0.92 1.06 1.09 1.09 1.09 1.08 1.03 0.99 1 1.04 1.02 1 1 1.02 1.04 1.02 1.01 1 0.98 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.11 1.09 1.16 1.1 1.14 1.06 1.09 1.12 1.1 1.18 1.17 1.15 1.16 1.16 1.06 1.1 1.1 1.11 1.09 1.1 1.1 1.1 1.13 1.11 0.85 0.92 0.88 0.79 0.85 0.64 0.8 0.76 0.68 0.7 0.86 0.98 0.97 0.87 0.91 0.63 0.94 0.79 0.69 0.71 0.7 0.89 0.82 0.74 0.76 1.08 0.93 0.96 0.98 0.98 1.21 1.19 1.18 1.15 1.18 1.14 1.13 1.12 1.17 1.14 1.13 1.16 1.14 1.16 1.15 1.13 1.07 1.07 1.09 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 0.97 1.03 1.02 1.01 1 1.02 1.07 1.06 1.03 0.86 0.92 0.84 0.89 0.87 1.04 0.99 1.01 0.95 0.99 0.95 0.94 0.96 0.95 0.95 0.79 0.96 0.9 0.85 0.86 0.67 0.84 0.78 0.67 0.72 0.87 0.95 0.98 0.85 0.9 0.64 0.9 0.72 0.67 0.7 0.71 0.9 0.8 0.72 0.76 0.99 0.96 0.96 0.94 0.96 1.18 1.19 1.21 1.19 1.19 1.1 1.08 1.11 1.13 1.1 1.1 1.04 1.04 1.02 1.05 1.08 1.05 1.06 1.04 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.12 1.06 1.16 1.1 1.1 1.04 1.08 1.13 1.08 1.1 1.19 1.17 1.12 1.14 1.07 1.07 1.06 1.1 1.06 1.08 1.09 1.09 1.11 1.09 0.84 0.98 0.87 0.8 0.86 0.68 0.81 0.77 0.67 0.72 0.84 1.08 0.97 0.82 0.89 0.65 0.93 0.83 0.65 0.71 0.72 0.9 0.84 0.71 0.77 1.05 0.94 1.02 0.93 0.97 1.16 1.18 1.18 1.2 1.18 1.11 1.12 1.07 1.15 1.11 1.1 1.04 1.1 1.07 1.08 1.1 1.04 1.08 
1.06 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.01 1.01 1.04 1.02 1.02 1 1.03 1.05 1.02 0.99 1.05 0.99 1 1.01 0.97 0.99 1.01 0.98 0.98 1 1 1.01 1.01 1 0.8 0.92 0.87 0.78 0.83 0.64 0.81 0.77 0.65 0.69 0.82 0.99 0.97 0.84 0.88 0.63 0.91 0.76 0.65 0.7 0.69 0.89 0.81 0.7 0.75 1.02 0.94 0.95 0.93 0.96 1.14 1.15 1.16 1.15 1.15 1.09 1.07 1.07 1.12 1.09 1.08 1.05 1.07 1.07 1.07 1.08 1.04 1.05 1.04 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 C 1.04 0.99 1.08 1.11 1.05 1.08 1.11 1.04 1.04 1.06 1.1 1.23 1.16 1.09 1.13 1 1.03 1.02 1.09 1.03 1.05 1.06 1.07 1.08 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 1.11 1.11 0.94 1 0.68 1.07 0.98 0.7 0.78 0.96 1.25 1.23 1.12 1.07 0.74 1.03 0.88 0.77 0.81 0.8 1.1 1 0.82 0.88 1.01 1 1.07 0.86 0.95 1.08 1.01 1.02 1.05 1.03 1 0.95 0.95 1.08 0.99 0.93 0.91 0.94 1.09 0.95 1 0.96 0.99 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.11 1.12 1.02 1.14 1.09 1.13 1.02 1.12 1.13 1.09 1.19 1.19 1.24 1.14 1.18 1.12 1.03 1.05 1.12 1.07 1.13 1.07 1.09 1.12 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.25 1.25 1.2 1.11 1.19 0.9 1.04 0.98 1.04 0.97 1.11 1.17 1.17 1.01 1.1 0.91 1.16 1.02 0.91 0.96 0.96 1.12 1.04 0.98 1.01 1.07 0.98 1 0.92 0.98 1.09 0.99 1 0.98 1.01 1.11 1.01 1.07 1.11 1.07 1 0.96 0.98 0.94 0.97 1.06 0.97 1.01 0.97 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.1 1 1.06 1.15 1.07 1.13 1.17 1.07 1.09 1.11 1.08 1.08 1.14 1.14 1.1 1.06 0.99 1.07 1.08 1.04 1.09 1.04 1.08 1.11 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.22 1.22 1.17 1.11 1.16 1.03 1.21 1.12 0.99 1.05 1.28 1.28 1.32 1.18 1.25 0.94 1.11 0.94 1.06 0.96 1.04 1.19 1.07 1.07 1.07 0.98 0.98 1.04 0.95 0.98 1.01 0.92 1.03 0.98 0.98 1 0.95 1.02 1.11 1 1.03 0.92 1.01 0.99 0.98 1 0.94 1.02 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 0.99 1.01 1.02 1 0.96 0.99 0.94 0.98 0.97 0.98 1.17 1.14 1.06 1.06 0.95 0.87 0.97 0.93 0.92 0.98 0.97 
0.98 0.98 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 1.3 1.15 1.13 1.12 0.98 1 0.93 1 0.95 1.19 1.52 1.17 1.16 1.19 1.14 1.35 1.14 0.97 1.05 1.05 1.19 1.07 1.04 1.05 0.89 0.91 0.85 0.85 0.87 0.97 0.92 1.04 0.94 0.96 1.05 1.14 1.07 1.05 1.07 0.97 0.94 0.88 0.95 0.93 0.95 0.94 0.93 0.92 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 0.97 1.02 1.05 1.02 1.05 1.04 1.02 1.04 1.04 1.04 1.12 1.09 1.09 1.08 1.01 0.96 0.99 1.02 0.99 1.03 1.01 1.02 1.04 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.2 1.14 1.05 1.1 0.85 1.06 0.98 0.88 0.92 1.08 1.27 1.21 1.11 1.14 0.87 1.13 0.97 0.9 0.92 0.93 1.14 1.04 0.95 0.99 0.97 0.96 0.97 0.88 0.94 1.02 0.95 1.02 0.98 0.99 1.03 0.99 1.01 1.08 1.02 0.98 0.93 0.94 0.98 0.95 0.99 0.95 0.98 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.99 1.02 1.06 1.01 1.08 0.96 1.05 1.11 1.04 1.03 1.01 0.99 1.15 1.03 0.95 0.93 1.13 0.89 0.94 1 0.97 1.02 1.01 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.06 1.28 0.94 1.04 0.83 0.98 0.9 0.75 0.84 1.12 1.28 1.2 1.1 1.14 0.81 1.05 0.93 0.79 0.86 0.91 1.06 1.01 0.85 0.93 0.98 0.89 0.91 0.91 0.91 1.04 0.95 1.02 1.03 1.01 1.07 0.99 0.98 1.09 1.02 0.98 0.96 0.94 0.9 0.94 1 0.93 0.93 0.95 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.1 1.11 1.05 0.99 1.05 1.12 1.06 1.06 1.02 1.06 1.2 1.21 1.18 1.04 1.14 1.14 1.01 0.9 0.99 0.98 1.14 1.08 1.02 0.99 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.33 1.27 1.2 1.24 1.25 0.88 1.04 1.05 0.89 0.94 1 1.14 1.15 1.16 1.09 0.9 1.17 1.17 1.04 1.02 0.94 1.13 1.12 1.02 1.02 0.99 0.95 0.94 0.94 0.95 1.13 0.94 1.01 0.98 1 1.04 1.09 1.06 1.24 1.09 0.94 0.97 1.05 0.96 0.97 1.01 0.97 1 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.07 1.09 1.05 1.07 1.09 1.07 1.09 1.09 1.09 1.09 1.08 0.91 1.09 1.01 1.01 1.03 1.02 0.99 1.01 1.06 1.06 1 1.05 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.17 1.11 1.15 1.11 0.88 1.04 1.04 0.91 0.95 1.09 1.16 1.29 1.12 1.14 
0.95 1.14 0.98 0.81 0.93 0.98 1.11 1.07 0.94 1.01 0.94 0.95 0.95 0.96 0.95 1.02 1.02 1 1.02 1.02 1.09 1.02 1.1 1.02 1.05 0.95 0.97 0.98 0.85 0.93 0.98 0.98 0.99 0.93 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 0.96 1.17 1.05 1.04 0.98 0.93 0.99 0.93 0.95 1.04 1.06 0.92 1.04 1.01 1 0.9 0.89 0.92 0.92 1.01 0.95 0.97 0.97 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.33 1.33 1.29 1.04 1.18 1.01 1.09 0.99 0.92 0.97 1.23 1.32 1.29 1.07 1.19 1.14 1.19 1.07 0.89 0.99 1.13 1.21 1.11 0.95 1.05 1.09 0.92 0.91 0.91 0.93 1.03 1.12 1.02 1.07 1.05 1.06 0.99 1.14 0.99 1.03 0.91 0.97 0.93 0.93 0.93 1.01 0.97 0.97 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 0.99 1.04 1.02 1.02 1.04 0.99 1.03 1.02 1.02 1.06 1.07 0.97 1.04 1.03 1 0.95 0.93 0.93 0.95 1.03 0.99 0.99 0.99 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.14 1.17 1.2 1.06 1.13 0.89 1.02 0.98 0.85 0.91 1.09 1.19 1.22 1.11 1.13 0.89 1.13 1.01 0.86 0.93 0.97 1.11 1.07 0.93 1 0.98 0.92 0.91 0.93 0.93 1.04 0.99 1.01 1.02 1.02 1.07 1.01 1.04 1.06 1.04 0.94 0.96 0.96 0.9 0.94 0.99 0.96 0.96 0.95 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.11 1.03 1.03 1 1.04 1.02 1.04 1.06 0.98 1.02 1.12 1.03 1.02 1.03 1.05 1.17 0.87 1.04 0.99 0.99 1.08 0.98 1.04 1 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 1.01 1.04 0.89 0.97 0.69 0.85 0.99 0.74 0.77 0.87 1.19 1.24 1.1 1.04 0.67 0.99 0.82 0.72 0.76 0.77 0.98 0.96 0.81 0.85 0.94 0.93 0.94 0.87 0.91 1.04 0.94 1.04 1.03 1.01 1.02 1.04 0.99 1.06 1.02 0.91 0.92 0.94 0.94 0.92 0.96 0.95 0.95 0.96 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.11 1.1 0.97 1.17 1.07 1.07 0.99 1.07 1.05 1.04 1.21 1.14 1.19 1.11 1.15 1.13 1.1 1.06 0.91 1.02 1.11 1.07 1.04 1.04 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.28 1.27 1.12 1.18 0.91 1.16 1.02 0.89 0.96 1.01 1.19 1.16 1.04 1.07 0.94 1.31 1.2 0.95 1.02 0.96 1.21 1.12 0.96 1.02 1.14 0.9 0.97 0.97 0.97 1 1.01 1.07 0.98 1.01 1.07 1.06 1.09 1.06 1.07 0.97 
0.92 0.97 1.02 0.97 1.04 0.95 1.01 1 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.18 1.03 1.03 1.15 1.08 1.17 1.11 1.06 1.13 1.1 1.17 1.03 1.11 1.09 1.09 1.02 0.96 0.98 1.13 1.01 1.11 1.02 1.04 1.1 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.27 1.31 1.11 1.14 0.92 1.09 1.02 0.91 0.95 1.24 1.32 1.25 1.15 1.22 0.99 1.07 0.98 0.99 0.94 1.02 1.15 1.08 0.98 1.02 0.97 0.9 0.87 0.93 0.91 1.01 1.05 1 0.99 1.01 1.05 1.11 0.97 1.03 1.03 1.14 0.94 0.86 0.97 0.94 1.02 0.96 0.9 0.96 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.98 1.03 0.95 0.94 0.97 0.95 0.92 0.96 1.11 0.96 1.01 0.98 1.12 0.99 1 0.82 0.9 0.91 0.86 0.86 0.91 0.95 0.96 0.95 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.16 1.32 1.08 1.24 1.15 1.02 0.69 0.97 0.99 0.82 1.34 1.33 1.26 1.08 1.2 1.02 1.18 1.12 1.06 1.03 1.06 0.95 1.07 1.04 0.98 1.17 0.8 1.08 0.83 0.89 1.01 1.04 1 1.13 1.03 0.97 1.12 0.85 1.16 0.96 0.98 0.81 1.09 0.91 0.9 1.01 0.87 0.96 0.93 0.93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 1.03 0.96 0.99 1 1.03 0.97 1.02 1.02 1.01 1.08 1.02 1.05 1.04 1.04 0.98 0.93 0.97 0.94 0.95 1.03 0.98 1 0.99 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.17 1.14 1.04 1.09 0.85 0.87 0.99 0.86 0.85 1.04 1.24 1.22 1.08 1.11 0.84 1.11 0.97 0.87 0.9 0.91 1.04 1.04 0.93 0.95 1 0.88 0.93 0.89 0.92 1.01 1 1.02 1.02 1.01 1.02 1.07 0.95 1.05 1.01 0.98 0.88 0.93 0.95 0.93 1 0.93 0.94 0.96 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.96 1.03 1.09 0.99 1 1.11 1.06 1.04 1.05 1.06 1.13 1.07 0.95 1.16 1.04 1.12 0.99 0.94 1.05 1.01 1.06 1.03 0.97 1.05 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.92 1.08 1.09 0.8 0.93 0.7 0.84 0.87 0.71 0.76 0.96 1.22 1.13 1 1.04 0.73 1.03 0.87 0.8 0.82 0.8 1 0.95 0.8 0.86 1.13 1.07 0.98 0.94 1 1.11 1.08 1.04 1.02 1.06 1.04 0.95 1.05 0.99 1 0.98 0.97 0.97 0.93 0.96 1.05 1 0.99 0.96 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.14 1.09 1.07 1.1 1.07 1.16 1.05 1.03 1.07 1.19 1.09 1.18 1.13 
1.14 1.1 1.12 1.1 1.04 1.09 1.12 1.11 1.1 1.06 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.14 1.26 1.15 1.12 1.15 0.89 1.05 1.04 1.05 0.98 1.19 1.18 1.17 1.13 1.16 0.91 1.26 1.02 1.05 0.99 0.95 1.14 1.06 1.06 1.03 1.12 0.93 0.88 0.98 0.95 1.05 0.98 1.02 1 1.01 1.12 1.17 1.07 1.09 1.1 1.01 1.09 0.91 1.02 0.99 1.06 1 0.94 1.01 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1.1 1.05 1.05 1.06 1.08 1.1 1.1 1.03 1.07 1.17 1.07 1.08 1.09 1.1 1.05 1.06 0.94 1 1 1.08 1.08 1.03 1.04 1.05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.33 1.2 1.07 1.13 0.96 1.17 1.03 0.91 0.99 1.2 1.3 1.26 1.18 1.22 0.88 1.1 0.88 0.87 0.91 0.99 1.2 1.03 0.97 1.02 1.05 0.96 1.03 0.93 0.98 1.06 1.07 1.12 1.01 1.06 1.05 1.06 1.13 1.24 1.1 0.98 1.04 1.01 0.95 0.99 1.02 1.01 1.05 0.98 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 0.99 0.96 1.02 0.99 0.95 1 0.94 0.92 0.95 1.01 1.01 1.01 0.97 0.99 1.09 0.88 0.85 0.95 0.92 1.02 0.96 0.92 0.95 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.17 1.14 0.95 1.08 0.87 1.08 0.99 0.96 0.95 1.12 1.42 1.35 1.09 1.17 1.02 1.22 1.02 0.95 0.98 0.99 1.18 1.08 0.96 1.02 0.96 0.95 0.95 0.89 0.94 0.99 1.03 1 1.05 1.01 1.01 0.97 1.02 1.04 1.01 0.96 0.94 1 0.9 0.95 0.98 0.97 0.99 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.02 1 1.01 1.01 1.03 1.06 1.01 1 1.02 1.09 1.05 1 1.05 1.04 1.07 0.98 0.93 1 0.99 1.05 1.02 0.98 1.01 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 1.18 1.13 0.94 1.05 0.83 0.99 0.97 0.87 0.9 1.08 1.26 1.21 1.07 1.13 0.85 1.13 0.93 0.9 0.91 0.92 1.11 1.03 0.92 0.97 1.05 0.97 0.95 0.93 0.97 1.04 1.03 1.03 1.01 1.03 1.05 1.01 1.06 1.06 1.04 0.98 1 0.96 0.94 0.97 1.03 0.99 0.99 0.97 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1 1.04 1.03 1.02 1.06 1.03 1.04 1.04 1.04 1.09 1.06 1 1.09 1.06 1.02 0.95 1 0.98 0.99 1.04 1.01 1.02 1.03 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 1.06 1.1 0.88 0.98 0.72 0.91 0.92 0.72 0.78 0.96 1.22 
1.19 1.07 1.07 0.73 1.03 0.86 0.76 0.8 0.81 1.03 0.98 0.82 0.88 0.99 0.95 0.95 0.89 0.94 1.06 0.99 1.03 1.03 1.03 1.03 0.98 0.99 1.04 1.01 0.94 0.94 0.95 0.95 0.94 1 0.96 0.96 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 1.11 1.03 1.07 1.08 1.1 1.05 1.07 1.05 1.06 1.18 1.15 1.18 1.09 1.15 1.13 1.06 1 1 1.03 1.12 1.08 1.06 1.04 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.19 1.26 1.2 1.14 1.19 0.89 1.06 1.02 0.94 0.96 1.05 1.17 1.16 1.07 1.1 0.91 1.21 1.08 0.97 0.99 0.95 1.14 1.08 1 1.02 1.07 0.94 0.94 0.95 0.96 1.06 0.98 1.02 0.99 1.01 1.08 1.07 1.07 1.11 1.08 0.98 0.97 0.96 0.98 0.97 1.04 0.97 0.98 0.99 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.09 1.05 1.06 1.09 1.07 1.1 1.11 1.08 1.08 1.09 1.12 1.06 1.02 1.1 1.07 1.03 1.01 0.99 1.04 1.01 1.08 1.05 1.03 1.07 1.06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.23 1.15 1.11 1.13 0.92 1.11 1.04 0.92 0.98 1.18 1.24 1.28 1.15 1.2 0.91 1.09 0.93 0.88 0.92 0.99 1.15 1.06 0.98 1.02 0.98 0.94 0.95 0.94 0.95 1.02 1 1.03 1 1.01 1.04 1.02 1.04 1.07 1.04 1.01 0.96 0.94 0.93 0.95 1 0.97 0.97 0.96 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 0.98 0.99 1 1 0.96 0.95 0.96 0.96 0.95 1 1.03 1.01 1.01 1.01 0.94 0.88 0.88 0.91 0.9 0.98 0.95 0.95 0.96 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.26 1.14 1.05 1.13 0.95 0.89 0.97 0.96 0.91 1.21 1.37 1.25 1.09 1.19 1.07 1.22 1.06 0.93 1 1.05 1.11 1.08 0.98 1.02 0.98 0.88 0.92 0.87 0.9 1 1 1.01 1.03 1.01 1.01 1.03 0.97 1.05 1.01 0.95 0.89 0.95 0.92 0.93 0.98 0.93 0.96 0.94 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1 1 1.01 1.01 1.04 1.01 1.02 1.01 1.02 1.06 1.06 1.02 1.05 1.05 1.01 0.95 0.95 0.97 0.97 1.03 1 1 1.01 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.18 1.14 1.02 1.09 0.85 0.97 0.98 0.86 0.89 1.07 1.23 1.21 1.09 1.13 0.86 1.12 0.96 0.87 0.91 0.93 1.1 1.04 0.93 0.98 1 0.93 0.93 0.9 0.94 1.03 0.99 1.02 1.01 1.01 1.04 1.01 1.01 1.06 1.03 0.97 0.93 0.95 0.94 0.95 
1 0.95 0.97 0.96 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 G 0.88 0.84 0.92 0.86 0.87 0.91 0.86 0.87 0.91 0.88 0.84 1 0.93 0.87 0.9 0.85 0.83 0.87 0.9 0.86 0.87 0.86 0.89 0.88 0.87 0.85 1.15 0.94 1.06 0.95 0.91 0.9 0.95 1.09 0.94 1.05 1.29 1.05 1.04 1.07 0.75 0.75 0.86 0.78 0.78 0.85 0.93 0.93 0.94 0.91 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 1.03 1.1 0.79 0.95 1.05 1.09 1.1 1.13 1.09 1.11 1.06 1.06 1.2 1.1 1.05 1.03 1.07 1.21 1.07 1.06 1.05 1.07 1 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1.02 0.92 1.04 0.99 1 0.92 1.01 1.03 0.98 1.02 1.03 1.08 0.98 1.02 1.02 0.98 0.93 0.95 0.97 1.01 0.97 0.97 0.99 0.98 1.12 1.21 1.19 1.28 1.19 1.12 1.09 1.12 1.27 1.12 1.09 1.06 1.07 1.08 1.08 1.03 1.01 1.08 1.06 1.04 1.05 1.08 1.09 1.14 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.22 1.14 1.14 1.25 1.16 1.23 1.13 1.14 1.19 1.16 1.24 1.14 1.19 1.24 1.2 1.06 1.11 1.08 1.1 1.08 1.17 1.12 1.13 1.16 1.14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.04 0.94 1.02 1.09 1.01 1.03 1.08 0.98 1 1.02 0.97 0.97 1.03 1.03 1 0.97 0.91 0.92 0.99 0.94 1 0.96 0.98 1.02 0.99 0.9 0.87 0.99 1.14 0.95 0.9 0.9 1.03 0.93 0.93 1.05 1.24 1.02 1.14 1.08 0.72 0.72 0.84 0.71 0.74 0.85 0.87 0.94 0.9 0.89 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.07 1.13 1.03 1.07 1.2 1.2 1.24 1.2 1.2 1.12 1.08 1.12 1.23 1.12 1.11 0.97 1.08 1.05 1.05 1.12 1.06 1.14 1.1 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.86 0.77 0.83 0.84 0.82 0.89 0.91 0.86 0.89 0.89 0.8 0.85 0.79 0.89 0.83 0.8 0.82 0.76 0.79 0.79 0.83 0.82 0.8 0.84 0.82 1.06 1.09 1.08 1.22 1.1 1.12 1.22 1.12 1.3 1.17 1.21 1.37 1.29 1.24 1.25 0.94 0.93 1.02 1.02 0.96 1.05 1.09 1.1 1.15 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 0.97 1.08 1 1.01 1.05 1.02 1.13 1.03 1.05 1.19 1.29 1.23 1.21 1.22 1.24 1.13 1.04 1.14 1.13 1.1 1.07 1.08 1.06 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.85 0.91 0.92 0.89 0.93 0.92 0.91 0.94 0.92 0.87 0.94 0.9 0.93 
0.91 0.89 0.86 0.84 0.88 0.86 0.9 0.88 0.88 0.91 0.89 0.95 1.03 1.03 1.13 1.02 0.96 0.98 1.04 1.07 1 1.04 1.2 1.03 1.08 1.07 0.82 0.81 0.92 0.84 0.84 0.92 0.96 0.99 1 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.04 1.09 0.96 1.03 1.12 1.1 1.14 1.12 1.12 1.15 1.12 1.13 1.22 1.15 1.09 1.05 1.06 1.11 1.08 1.1 1.07 1.1 1.07 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 0.89 0.81 0.88 0.86 0.83 0.86 0.88 0.94 0.87 0.84 0.81 0.86 0.85 0.84 0.8 0.84 0.75 0.76 0.78 0.83 0.85 0.82 0.82 0.83 1.06 1.11 1.27 1.15 1.12 1.14 1.1 0.96 1.08 1.05 1.15 1.17 1.15 1.15 1.14 0.89 0.91 0.99 0.93 0.93 1.02 1.05 1.05 1.04 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.92 0.8 0.95 0.91 1.07 1.12 1.1 1.13 1.1 1.18 1.1 1.08 1.2 1.13 1.07 1.08 1.02 1.02 1.04 1.06 1.04 0.96 1.04 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.01 0.95 0.89 0.95 1.02 0.96 0.95 0.89 0.95 1.05 0.98 1 1.05 1.01 1.05 0.91 0.86 0.86 0.9 1.03 0.95 0.93 0.91 0.94 1.36 1.3 1.34 1.39 1.34 1.33 1.15 1.24 1.17 1.2 1.23 1.2 1.12 1.18 1.18 1.21 1.14 1.21 1.19 1.19 1.27 1.18 1.21 1.21 1.22 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.07 1.08 1.09 1.09 1.27 1.13 1.14 1.12 1.15 1.18 1.21 1.23 1.37 1.23 1.09 1.12 1.2 1.1 1.12 1.15 1.12 1.15 1.13 1.13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1.01 1.03 0.99 1.01 1 0.98 0.99 0.99 0.99 0.97 0.98 0.8 0.98 0.9 0.92 0.94 0.93 0.9 0.92 0.97 0.97 0.91 0.96 0.95 1.03 1.03 1.02 0.98 1.01 0.97 1.01 0.99 1.09 1.01 1.16 1.16 1.15 1.12 1.15 1.02 0.9 0.94 0.95 0.95 1.01 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.04 1.04 1.05 1.04 1.24 1.24 1.22 1.23 1.23 1.21 1.14 1.21 1.14 1.17 1.02 1.04 1.06 0.92 1 1.1 1.1 1.11 1.05 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.8 0.78 0.99 0.84 0.83 0.84 0.85 0.91 0.88 0.87 0.87 0.89 0.79 0.87 0.85 0.91 0.9 0.83 0.81 0.85 0.85 0.84 0.85 0.84 0.84 1.07 1.23 1.24 1.25 1.17 1.2 1.27 1.21 1.23 1.2 1.3 1.41 1.32 1.29 1.33 1.1 1.12 1.17 1.06 1.1 1.14 1.23 
1.21 1.16 1.18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.22 1 1.03 1 1.04 1.12 1.22 1.12 1.17 1.15 1.21 1.11 1.29 1.13 1.17 1.1 1.16 1.12 1.11 1.12 1.14 1.08 1.1 1.08 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 0.88 0.92 0.89 0.9 0.9 0.89 0.92 0.91 0.9 0.91 0.9 0.84 0.92 0.89 0.89 0.88 0.83 0.82 0.85 0.9 0.88 0.87 0.87 0.88 1.09 1.14 1.17 1.13 1.13 1.1 1.11 1.05 1.12 1.09 1.15 1.17 1.14 1.13 1.15 1.02 0.99 1.05 1.01 1.01 1.08 1.09 1.09 1.08 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1 0.96 1.02 1.01 1.15 1.15 1.14 1.15 1.15 1.19 1.13 1.18 1.18 1.17 1.06 1.09 1.08 1.02 1.06 1.11 1.08 1.07 1.07 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.84 0.82 0.84 0.81 0.83 0.85 0.87 0.89 0.85 0.86 0.83 0.8 0.82 0.8 0.81 1.02 0.86 0.89 0.84 0.88 0.87 0.83 0.86 0.83 0.84 1.13 1.21 1.31 1.04 1.14 1.18 1.2 1.07 1.21 1.15 1.39 1.21 1.21 1.2 1.23 0.85 0.99 1 1.06 0.96 1.05 1.12 1.11 1.09 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.97 0.92 1.09 0.97 1.12 1.04 1.11 1.12 1.09 1.08 1.15 1.1 1.03 1.09 1.11 0.97 1.06 1.06 1.04 1.05 1.02 1.03 1.05 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 0.9 0.95 0.95 0.95 0.97 0.89 0.96 0.91 0.93 0.94 0.98 0.93 0.94 0.94 1.03 0.95 0.85 0.93 0.92 0.98 0.92 0.91 0.91 0.93 1.32 1.44 1.54 1.52 1.42 1.32 1.32 1.29 1.29 1.3 1.28 1.28 1.25 1.25 1.26 1.29 1.19 1.27 1.28 1.25 1.29 1.29 1.31 1.31 1.3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.29 1.05 1.12 1.12 1.11 1.13 1.15 1.21 1.12 1.15 1.2 1.19 1.22 1.18 1.2 1.12 1.05 1.11 1.17 1.11 1.17 1.09 1.15 1.14 1.13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.93 0.97 0.97 1.09 0.98 0.89 1.02 0.97 1.04 0.97 0.98 0.92 1.01 0.96 0.96 0.93 0.88 0.89 0.91 0.9 0.93 0.93 0.95 0.98 0.95 1.25 1.2 1.15 1.24 1.21 1.17 1.13 1.1 1.14 1.13 1.32 1.28 1.25 1.25 1.27 0.98 0.93 1.05 1.11 1 1.13 1.09 1.12 1.16 1.12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 0.94 0.96 1.04 0.99 1.22 1.27 1.22 1.2 1.22 1.17 1.22 1.09 1.15 
1.15 1.21 1.03 0.93 1.05 1.02 1.14 1.07 1.02 1.09 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.8 0.84 0.79 0.75 0.79 0.91 0.78 0.88 0.85 0.85 0.77 0.88 0.78 0.82 0.81 0.79 0.77 0.75 0.77 0.77 0.81 0.81 0.79 0.79 0.8 1.21 1.25 1.34 1.23 1.25 1.27 1.33 1.36 1.37 1.32 1.35 1.36 1.49 1.34 1.37 1.22 1.06 1.23 1.13 1.14 1.24 1.21 1.33 1.24 1.25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.29 0.93 1.2 0.96 1.02 1.1 1.13 1.09 1.23 1.12 1.1 1.27 1 1.31 1.11 1.16 1 1.27 1.09 1.09 1.14 1.01 1.09 1.06 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.87 0.87 0.86 0.84 0.86 0.89 0.85 0.91 0.88 0.88 0.84 0.87 0.85 0.86 0.86 0.89 0.84 0.83 0.83 0.85 0.87 0.85 0.86 0.85 0.86 1.21 1.26 1.28 1.2 1.23 1.22 1.21 1.16 1.23 1.2 1.29 1.23 1.23 1.21 1.24 1 1.02 1.11 1.12 1.05 1.15 1.15 1.18 1.18 1.17 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.09 0.96 1.01 1.04 1.02 1.13 1.13 1.15 1.15 1.14 1.13 1.2 1.08 1.14 1.13 1.12 1 1.05 1.08 1.06 1.12 1.04 1.06 1.08 1.07 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.79 0.9 0.93 0.83 0.85 0.94 0.9 0.87 0.89 0.9 0.9 0.84 0.71 0.94 0.81 0.98 0.85 0.83 0.91 0.88 0.89 0.87 0.81 0.88 0.86 0.94 1.07 1.07 1.06 0.99 0.89 0.92 0.98 1.08 0.95 1.04 1.07 1.06 1.04 1.05 0.81 0.77 0.91 0.84 0.82 0.89 0.92 0.98 0.98 0.93 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.16 1.1 1.05 0.97 1.04 1.05 1.16 1.12 1.1 1.1 1.15 1.07 1.16 1.1 1.11 1.06 1.1 1.07 1.13 1.08 1.1 1.09 1.08 1.06 1.08 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 1.04 0.98 0.97 1 0.95 0.95 0.95 1.01 0.96 1 1 1 0.99 1 1.01 1.02 1.01 0.94 0.99 1 1 0.98 0.97 0.98 1.24 1.15 1.21 1.33 1.22 1.19 1.16 1.16 1.21 1.18 1.09 1.11 1.09 1.08 1.09 1.08 1.12 1.14 1.1 1.1 1.13 1.12 1.14 1.16 1.14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.26 1.07 1.02 1.14 1.09 1.16 1.13 1.16 1.16 1.15 1.24 1.3 1.13 1.22 1.21 1.13 1.23 1.06 1.17 1.13 1.19 1.14 1.06 1.16 1.13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1.05 0.98 0.98 1 0.99 1.03 1.01 0.94 0.99 
1.07 0.96 0.97 0.99 0.99 0.96 0.91 1.01 0.92 0.94 1 0.98 0.98 0.95 0.98 0.9 0.94 1.03 1.08 0.97 0.97 1.05 1.06 1.05 1.02 1.08 1.08 1.06 1.08 1.08 0.78 0.87 0.9 0.79 0.82 0.9 0.95 0.98 0.94 0.94 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.14 1.05 1.12 1.04 1.08 1.27 1.28 1.34 1.23 1.27 1.16 1.18 1.24 1.36 1.21 1.05 1.12 1.08 1.02 1.06 1.14 1.13 1.17 1.1 1.13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.87 0.81 0.77 0.84 0.82 0.86 0.92 0.86 0.91 0.89 0.84 0.99 0.84 0.8 0.86 0.85 0.79 0.88 0.87 0.83 0.85 0.86 0.83 0.85 0.85 1.26 1.23 1.18 1.26 1.22 1.22 1.11 1.2 1.18 1.17 1.26 1.29 1.27 1.28 1.27 1.08 1.03 1.04 1.08 1.06 1.18 1.13 1.15 1.17 1.16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.08 1.08 1.08 1.09 1.08 1.04 1.13 1.1 1.15 1.1 1.16 1.12 1.17 1.2 1.16 1.16 1.13 1.19 1.09 1.14 1.1 1.11 1.12 1.1 1.11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 0.91 0.88 0.89 0.89 0.91 0.94 0.9 0.92 0.92 0.92 0.93 0.83 0.89 0.89 0.94 0.86 0.91 0.9 0.9 0.92 0.91 0.88 0.9 0.9 1.02 1.05 1.08 1.12 1.06 1.02 1.03 1.07 1.1 1.05 1.06 1.07 1.05 1.06 1.06 0.89 0.9 0.97 0.9 0.91 0.98 1 1.03 1.02 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.15 1.06 1.05 1.05 1.07 1.11 1.17 1.16 1.15 1.15 1.17 1.14 1.17 1.18 1.16 1.09 1.13 1.09 1.09 1.1 1.13 1.12 1.1 1.1 1.11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.85 0.86 0.86 0.84 0.85 0.87 0.87 0.88 0.89 0.88 0.85 0.84 0.81 0.85 0.84 0.88 0.84 0.83 0.84 0.84 0.86 0.85 0.84 0.85 0.85 0.97 1.08 1.08 1.02 1.02 0.97 0.98 0.98 1.07 1 1.1 1.14 1.1 1.08 1.1 0.82 0.83 0.92 0.86 0.85 0.93 0.97 1 0.98 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 0.99 0.93 0.92 0.96 1.07 1.1 1.11 1.12 1.1 1.13 1.09 1.1 1.12 1.1 1.06 1.04 1.05 1.08 1.06 1.06 1.05 1.03 1.04 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 0.98 0.95 0.95 0.97 0.98 0.92 0.97 0.95 0.95 0.99 0.99 0.99 0.98 0.99 1.03 0.96 0.9 0.91 0.94 1 0.96 0.94 0.94 0.96 1.22 1.24 1.25 1.34 1.26 1.19 1.15 1.19 1.23 1.18 1.15 1.14 1.11 1.13 
1.13 1.11 1.09 1.15 1.13 1.12 1.15 1.14 1.16 1.19 1.16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.21 1.08 1.08 1.14 1.11 1.19 1.13 1.16 1.15 1.15 1.21 1.2 1.19 1.23 1.21 1.1 1.11 1.1 1.13 1.11 1.17 1.11 1.12 1.15 1.13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.99 0.98 1 1.03 1 0.97 1.02 0.99 0.99 0.99 0.99 0.95 0.91 0.99 0.96 0.94 0.91 0.93 0.93 0.93 0.97 0.96 0.95 0.98 0.96 0.97 0.96 1.02 1.06 1 0.97 0.99 1.04 1.02 1 1.12 1.17 1.09 1.13 1.12 0.82 0.82 0.91 0.83 0.84 0.94 0.95 0.99 0.98 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.07 1.02 1.04 1.04 1.04 1.23 1.23 1.24 1.21 1.23 1.16 1.14 1.15 1.19 1.16 1.08 1.03 1.02 1 1.03 1.12 1.09 1.09 1.08 1.1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.83 0.8 0.82 0.81 0.81 0.87 0.85 0.88 0.88 0.87 0.81 0.89 0.8 0.84 0.83 0.83 0.81 0.8 0.8 0.81 0.83 0.83 0.82 0.83 0.83 1.13 1.18 1.19 1.24 1.18 1.19 1.2 1.19 1.22 1.19 1.27 1.35 1.31 1.28 1.3 1.06 1.01 1.1 1.06 1.05 1.14 1.15 1.18 1.17 1.16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.11 0.98 1.07 1 1.03 1.07 1.1 1.11 1.12 1.1 1.16 1.17 1.12 1.2 1.16 1.16 1.08 1.13 1.11 1.12 1.12 1.06 1.1 1.08 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.9 0.88 0.89 0.88 0.88 0.9 0.89 0.91 0.91 0.9 0.88 0.91 0.85 0.9 0.88 0.9 0.86 0.85 0.85 0.86 0.89 0.88 0.87 0.88 0.88 1.05 1.08 1.11 1.12 1.09 1.05 1.05 1.07 1.11 1.07 1.1 1.15 1.09 1.1 1.11 0.91 0.9 0.99 0.93 0.93 1.01 1.03 1.06 1.05 1.04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.09 1.01 1.02 1.01 1.03 1.13 1.13 1.15 1.14 1.14 1.16 1.14 1.13 1.18 1.15 1.09 1.06 1.07 1.07 1.07 1.11 1.08 1.08 1.08 1.09 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T 1.1 1.06 1.13 1.19 1.11 1.02 0.97 0.99 0.97 0.99 1.1 1.23 1.16 1.21 1.15 0.94 0.99 0.96 0.91 0.94 1.02 1.03 1.04 1.01 1.03 0.75 0.73 0.71 0.81 0.74 0.76 0.66 0.73 0.75 0.72 0.68 0.92 0.68 0.68 0.71 0.68 0.65 0.7 0.71 0.68 0.71 0.7 0.7 0.72 0.71 1.12 1.21 1.2 0.95 1.08 0.92 1.07 1.16 0.93 0.93 0.98 1.28 1.26 1.16 1.11 0.87 1.2 1.03 
0.9 0.96 0.94 1.17 1.11 0.92 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 1.13 1.04 1.16 1.1 1.04 0.95 1.04 1.06 1.01 1.19 1.11 1.16 1.06 1.12 1.04 0.92 1.01 0.95 0.97 1.09 1 1.04 1.05 1.04 0.8 0.88 0.8 0.91 0.84 1.03 0.86 0.9 1.03 0.93 0.89 0.87 0.87 0.88 0.88 0.82 0.83 0.88 0.85 0.84 0.86 0.86 0.86 0.9 0.87 1.06 1.1 1.05 0.95 1.02 0.92 1.06 1 1.04 0.99 1.17 1.23 1.24 1.08 1.16 0.83 1.09 0.94 0.81 0.88 0.92 1.09 1.01 0.94 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.18 1.08 1.05 1.21 1.11 1.13 1.18 1.08 1.1 1.12 1.17 1.16 1.22 1.22 1.19 1.1 1.02 1.09 1.07 1.07 1.14 1.09 1.1 1.14 1.11 0.72 0.68 0.72 0.73 0.71 0.72 0.68 0.82 0.71 0.72 0.69 0.86 0.67 0.83 0.74 0.62 0.6 0.7 0.65 0.64 0.68 0.67 0.71 0.71 0.69 1.17 1.17 1.22 1.06 1.14 0.94 1.2 1.1 0.94 1.01 1.21 1.24 1.25 1.1 1.18 0.82 1.14 0.93 0.88 0.89 0.96 1.18 1.06 0.96 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 0.78 0.9 0.91 0.87 0.86 0.89 0.84 0.87 0.86 0.87 0.95 0.86 0.96 0.9 0.92 0.77 0.76 0.79 0.8 0.89 0.83 0.83 0.87 0.85 0.84 0.81 0.83 0.95 0.85 0.82 0.8 0.8 0.94 0.83 0.8 0.94 0.96 0.81 0.86 0.79 0.69 0.85 0.8 0.77 0.81 0.78 0.84 0.86 0.82 0.98 1.25 1.11 1.09 1.08 0.84 1.07 0.99 0.93 0.94 1.11 1.44 1.11 1.08 1.12 0.89 1.26 1.14 0.88 0.97 0.93 1.16 1.06 0.97 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 0.95 1.01 1.06 1.01 0.99 0.97 0.96 0.98 0.97 1.03 1.08 1.03 1.06 1.05 0.97 0.89 0.91 0.9 0.92 1.01 0.96 0.97 0.99 0.98 0.77 0.75 0.75 0.82 0.77 0.8 0.73 0.8 0.8 0.77 0.74 0.89 0.74 0.78 0.77 0.7 0.66 0.75 0.71 0.7 0.74 0.73 0.76 0.77 0.75 1.04 1.17 1.13 1 1.07 0.88 1.08 1.04 0.92 0.95 1.08 1.27 1.2 1.1 1.13 0.84 1.15 0.98 0.83 0.91 0.93 1.14 1.06 0.93 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.09 
1.08 0.92 1.17 1.04 1.08 1.01 1 1.06 1.03 1.06 1.09 1.08 1.06 1.07 0.88 0.93 1 0.81 0.89 1.01 1.01 0.98 0.97 0.99 0.87 0.86 0.84 0.82 0.84 0.92 0.82 0.74 0.88 0.82 0.79 0.87 0.79 0.79 0.8 0.75 0.77 0.81 0.8 0.78 0.81 0.82 0.79 0.82 0.81 1.23 1.13 1.21 1.06 1.13 0.96 1.16 1.11 0.96 1.02 1.03 1.32 1.23 1.14 1.14 0.88 1.13 1.06 0.97 0.97 0.98 1.16 1.09 0.99 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 1.11 1.07 1.01 1.07 1.04 0.99 0.98 0.94 0.98 1.12 1.08 1.14 1.1 1.11 1.06 0.92 1.05 0.87 0.94 1.08 1.01 1.03 0.96 1.01 1.04 0.97 1.01 1.06 1.01 1.1 0.92 1.01 0.94 0.97 1.03 1 0.92 0.98 0.98 1 0.94 0.99 0.98 0.98 1.04 0.95 0.98 0.98 0.98 1.17 1.11 1.05 1.06 1.09 0.9 1.06 1.06 0.91 0.96 1.06 1.2 1.22 1.23 1.16 0.82 1.09 1.1 0.94 0.93 0.91 1.09 1.09 0.97 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.15 1.17 1.12 1.14 1.1 1.08 1.1 1.1 1.09 1.16 1.17 0.99 1.17 1.09 1.05 1.06 1.05 1.03 1.05 1.1 1.11 1.04 1.09 1.08 0.79 0.79 0.78 0.75 0.78 0.75 0.79 0.77 0.88 0.79 0.81 0.81 0.8 0.77 0.8 0.89 0.76 0.8 0.8 0.8 0.8 0.79 0.79 0.79 0.79 1.01 1.12 1.07 1.11 1.07 0.98 1.02 1.02 0.89 0.96 1.02 1.08 1.22 1.05 1.07 0.87 1.12 0.96 0.85 0.92 0.96 1.07 1.04 0.93 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.97 0.86 1.06 0.91 0.93 0.9 0.83 0.89 0.86 0.87 0.93 0.96 0.96 0.94 0.94 0.9 0.8 0.82 0.77 0.81 0.92 0.85 0.9 0.86 0.88 0.89 0.93 0.87 0.98 0.91 0.96 0.94 0.89 1.08 0.95 0.89 0.98 0.91 0.89 0.91 0.88 0.83 0.95 0.84 0.87 0.9 0.91 0.9 0.92 0.91 1.17 1.28 1.24 1 1.11 1.05 1.19 1.06 0.91 1.02 1.1 1.24 1.16 1.08 1.11 0.99 1.27 1.07 0.9 0.99 1.03 1.21 1.08 0.95 1.03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.06 1 1.02 1.02 1.03 1 0.95 0.98 0.97 0.97 1.03 1.04 1.01 1.04 1.03 0.94 0.91 0.95 0.85 0.9 1 0.97 0.98 0.95 0.97 0.87 0.87 0.85 0.84 0.86 0.87 
0.85 0.81 0.92 0.86 0.85 0.88 0.84 0.82 0.85 0.84 0.8 0.86 0.83 0.83 0.86 0.85 0.84 0.85 0.85 1.1 1.14 1.12 1.04 1.09 0.96 1.1 1.05 0.91 0.98 1.05 1.18 1.2 1.12 1.12 0.88 1.14 1.04 0.9 0.95 0.96 1.13 1.07 0.96 1.01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.95 1.15 1.08 1.1 1.04 0.97 0.99 1.01 0.94 0.98 1.1 1.03 1.1 0.98 1.05 1.11 0.91 0.98 0.93 0.96 1 0.99 1.02 0.97 1 0.87 0.95 0.98 0.94 0.93 0.96 0.98 0.85 0.99 0.93 1.03 0.85 0.85 0.84 0.87 0.74 0.8 0.86 0.91 0.81 0.84 0.87 0.87 0.91 0.87 1.17 1.1 1.12 1.06 1.09 0.87 0.95 1.17 0.92 0.92 0.94 1.19 1.28 1.14 1.08 0.85 1.07 1.01 0.9 0.91 0.91 1.05 1.08 0.95 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.12 1.15 1.05 1.08 1.09 0.99 0.92 0.99 1.05 0.98 1.03 1.05 1.06 0.96 1.02 1.05 0.94 0.86 0.83 0.9 1.04 0.99 0.97 0.96 0.99 0.99 1.1 1.2 1.19 1.09 1.09 1.09 1.06 1.06 1.07 1.08 1.08 1.05 1.05 1.06 1.09 0.99 1.06 1.07 1.05 1.05 1.05 1.08 1.08 1.07 0.99 1.12 1.11 0.97 1.03 0.93 1.18 1.04 0.91 0.97 1.07 1.26 1.23 1.1 1.14 0.86 1.21 1.13 0.87 0.93 0.93 1.17 1.09 0.93 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.11 1.1 1.22 1.13 1.09 1.12 1.07 1.13 1.09 1.23 1.1 1.21 1.09 1.14 1.06 0.98 1.02 1 1.01 1.12 1.07 1.09 1.09 1.09 1.01 0.97 0.95 1.01 0.98 0.95 0.91 0.88 0.92 0.91 0.97 0.93 0.9 0.9 0.92 0.84 0.79 0.92 0.87 0.84 0.92 0.87 0.91 0.91 0.9 0.96 1.24 1.1 1.06 1.05 0.87 1.07 1 0.89 0.93 1.08 1.25 1.18 1.07 1.12 0.84 1.06 1.06 0.79 0.88 0.89 1.12 1.03 0.9 0.95 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.87 0.92 0.84 0.81 0.86 0.84 0.76 0.86 0.8 0.81 0.82 0.84 0.86 0.89 0.85 0.89 0.75 0.86 0.76 0.81 0.84 0.8 0.85 0.81 0.82 0.93 0.98 0.94 0.96 0.95 0.95 0.93 1.04 1.05 0.98 0.94 0.95 1.08 0.93 0.96 1 0.86 0.97 0.91 0.92 0.95 0.92 0.99 0.95 0.95 1.12 1.27 1.02 1.08 1.09 0.98 1.02 1.04 
0.86 0.96 1.09 1.25 1.18 1 1.09 0.87 1.1 0.89 0.84 0.87 0.96 1.06 1 0.91 0.97 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.98 1.05 0.98 0.97 0.99 0.95 0.9 0.97 0.94 0.94 0.98 0.97 1.01 0.96 0.98 0.97 0.87 0.91 0.86 0.9 0.97 0.93 0.96 0.93 0.95 0.94 0.99 0.98 0.99 0.97 0.98 0.96 0.92 0.99 0.96 0.99 0.93 0.93 0.91 0.94 0.84 0.84 0.93 0.92 0.87 0.92 0.91 0.94 0.95 0.93 1.02 1.13 1.06 1.01 1.04 0.89 1.02 1.05 0.89 0.94 1.02 1.22 1.21 1.07 1.1 0.84 1.09 0.95 0.83 0.88 0.91 1.09 1.04 0.91 0.96 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 1.14 1.15 1.04 1.07 1.06 1.01 0.99 1.01 1.01 1.14 1.08 0.95 1.17 1.05 1.07 0.93 1 1 0.98 1.04 1.02 0.99 1.04 1.02 0.79 0.7 0.69 0.71 0.72 0.68 0.7 0.76 0.74 0.71 0.68 0.67 0.67 0.68 0.67 0.7 0.61 0.68 0.67 0.66 0.7 0.66 0.7 0.69 0.69 1.02 1.17 1.18 0.92 1.04 0.98 1.09 1.05 0.84 0.96 1.07 1.25 1.16 1.04 1.09 0.88 1.17 0.95 0.93 0.94 0.96 1.13 1.05 0.91 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.17 1.16 1.09 1.09 1.12 1.04 0.96 0.98 0.97 0.98 1.03 1.06 1.08 1.03 1.05 1.02 1.04 1.03 0.95 1 1.06 1.04 1.04 1 1.03 0.9 0.86 0.94 1 0.92 0.87 0.88 0.93 0.96 0.91 0.89 0.91 0.89 0.88 0.89 0.91 0.81 0.86 0.89 0.86 0.89 0.86 0.9 0.92 0.89 0.94 1.1 1.07 0.96 1 0.91 1.06 1.06 1.09 1 1.26 1.24 1.24 1.19 1.23 0.83 1.22 0.94 0.99 0.93 0.9 1.12 1.04 1.04 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.14 1.18 1.27 1.12 1.16 1.08 1.1 1.1 1.04 1.08 1.26 1.15 1.15 1.18 1.18 1.09 1.14 1.09 1.04 1.09 1.13 1.13 1.14 1.09 1.12 0.71 0.67 0.77 0.81 0.73 0.74 0.83 0.84 0.83 0.8 0.73 0.73 0.71 0.73 0.73 0.66 0.63 0.72 0.63 0.65 0.7 0.69 0.75 0.72 0.71 1.05 1.29 1.2 1.02 1.1 0.99 1.15 1.02 0.94 1 1.09 1.23 1.19 1.18 1.16 0.9 1.09 1.03 0.91 0.96 0.97 1.17 1.06 0.97 1.02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.01 0.88 0.85 0.91 0.91 0.83 0.9 0.84 0.87 0.86 0.91 0.92 0.9 0.86 0.89 0.98 0.87 0.86 0.83 0.88 0.92 0.88 0.86 0.87 0.88 0.98 0.86 0.9 0.98 0.92 0.87 0.79 0.88 0.86 0.84 0.84 0.88 0.86 0.86 0.86 0.87 0.83 0.86 0.86 0.85 0.88 0.83 0.87 0.88 0.87 1.1 1.13 1.09 1.1 1.09 0.96 1.14 1.05 0.96 1 1 1.34 1.27 0.94 1.06 0.94 1.14 1.05 0.87 0.95 0.95 1.15 1.08 0.94 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.05 1.04 1.03 1.02 1.03 0.97 0.98 0.95 0.96 0.96 1.03 1.02 0.96 1.01 1 1.03 0.95 0.96 0.94 0.97 1.02 0.99 0.97 0.98 0.99 0.8 0.74 0.79 0.82 0.79 0.75 0.78 0.83 0.82 0.79 0.76 0.76 0.75 0.75 0.75 0.74 0.68 0.76 0.71 0.72 0.76 0.73 0.78 0.76 0.76 1 1.15 1.13 0.98 1.04 0.94 1.09 1.04 0.93 0.98 1.06 1.25 1.2 1.05 1.11 0.88 1.14 0.98 0.91 0.94 0.94 1.13 1.05 0.95 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.02 1.1 1.05 1.12 1.06 1.02 0.99 1 0.99 1 1.09 1.08 1.05 1.08 1.07 0.97 0.94 0.97 0.89 0.94 1.02 1.01 1.01 1 1.01 0.81 0.77 0.76 0.8 0.79 0.77 0.74 0.76 0.81 0.77 0.74 0.79 0.73 0.72 0.74 0.71 0.69 0.74 0.73 0.71 0.75 0.74 0.75 0.76 0.75 1.12 1.15 1.16 0.97 1.08 0.92 1.04 1.11 0.9 0.95 0.99 1.25 1.22 1.11 1.1 0.85 1.14 0.99 0.91 0.93 0.93 1.12 1.08 0.94 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.13 1.12 1.06 1.07 1.09 1.03 0.95 1 0.99 0.99 1.08 1.07 1.1 1.03 1.07 1.04 0.95 0.96 0.89 0.95 1.07 1.01 1.02 0.98 1.02 0.89 0.92 0.92 1 0.93 0.99 0.91 0.96 0.99 0.96 0.95 0.94 0.92 0.93 0.93 0.91 0.87 0.93 0.92 0.91 0.93 0.91 0.93 0.95 0.93 1.02 1.11 1.07 0.98 1.03 0.91 1.08 1.04 0.96 0.98 1.12 1.23 1.23 1.14 1.17 0.83 1.13 1 0.88 0.92 0.92 1.11 1.05 0.96 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.14 1.12 1.12 1.16 1.14 1.09 1.11 1.09 1.09 1.09 1.2 1.14 1.1 1.16 1.14 1.07 1.04 1.06 1.03 1.05 1.12 1.1 1.09 
1.1 1.1 0.76 0.73 0.78 0.79 0.76 0.76 0.77 0.82 0.8 0.79 0.76 0.81 0.74 0.79 0.77 0.7 0.67 0.76 0.69 0.7 0.74 0.73 0.77 0.76 0.75 1.02 1.19 1.13 1.06 1.08 0.93 1.09 1.03 0.91 0.97 1.08 1.18 1.21 1.09 1.13 0.84 1.1 0.97 0.83 0.9 0.93 1.13 1.04 0.93 0.99 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.94 0.85 0.89 0.88 0.89 0.86 0.83 0.85 0.85 0.85 0.88 0.91 0.89 0.91 0.89 0.91 0.79 0.82 0.79 0.82 0.89 0.84 0.86 0.85 0.86 0.89 0.88 0.88 0.96 0.9 0.89 0.85 0.88 0.96 0.89 0.86 0.93 0.92 0.86 0.89 0.86 0.78 0.89 0.85 0.84 0.87 0.85 0.89 0.9 0.87 1.08 1.21 1.09 1.06 1.09 0.94 1.08 1.03 0.91 0.98 1.07 1.29 1.16 1.01 1.09 0.91 1.17 1.02 0.86 0.94 0.97 1.13 1.05 0.94 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.03 1.01 1.01 1.01 1.01 0.98 0.94 0.96 0.96 0.96 1.02 1.02 1 1.01 1.01 0.97 0.9 0.93 0.88 0.92 1 0.96 0.97 0.96 0.97 0.83 0.8 0.82 0.85 0.82 0.82 0.8 0.83 0.86 0.83 0.8 0.85 0.79 0.8 0.81 0.76 0.72 0.81 0.76 0.76 0.8 0.78 0.81 0.81 0.8 1.03 1.15 1.1 1 1.06 0.92 1.07 1.05 0.91 0.96 1.05 1.23 1.2 1.08 1.12 0.85 1.13 0.98 0.86 0.91 0.93 1.12 1.05 0.93 0.98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 N 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 changeo-1.2.0/changeo/__init__.py0000644000175000017500000000040213674203454016165 0ustar nileshnilesh# Set package info from .Version import __author__ from .Version import __version__ from .Version import __date__ from .Version import __copyright__ from .Version import __license__ # Set package level imports __all__ = ['Defaults'] from .Defaults import * changeo-1.2.0/changeo/Receptor.py0000755000175000017500000016322113755307545016233 0ustar nileshnilesh""" 
Receptor data structure """ # Info __author__ = 'Jason Anthony Vander Heiden, Namita Gupta, Scott Christley' # Imports from collections import OrderedDict from Bio.Seq import Seq # import yaml # from pkg_resources import resource_stream # Presto and changeo imports from presto.IO import printError, printWarning from changeo.Gene import getAllele, getGene, getFamily, getAlleleNumber # class Schema: # """ # Schema for mapping Receptor attributes to column names # """ # def __init__(self, schema): # """ # Initializer # # Arguments: # schema (str): name of schema to load. # # Returns: # changeo.Receptor.Schema # """ # with resource_stream(__name__, 'data/receptor.yaml') as f: # data = yaml.load(f, Loader=yaml.FullLoader) # receptor = {v[schema]: k for k, v in data['receptor'].items()} # definition = data[schema] # # # Output extension # self.out_type = definition['out_type'] # # # Field sets # self.fields = list(receptor.keys()) # self.required = definition['standard'] # self.custom_fields = definition['custom'] # # # Mapping of schema column names to Receptor attributes # self._schema_map = {k : receptor[k] for k in self.fields} # self._receptor_map = {v: k for k, v in self._schema_map.items()} # # def toReceptor(self, field): # """ # Returns a Receptor attribute name from a Schema column name # # Arguments: # field (str): schema column name. # Returns: # str: Receptor attribute name. # """ # return self._schema_map.get(field, field) # # def fromReceptor(self, field): # """ # Returns a schema column name from a Receptor attribute name # # Arguments: # field (str): Receptor attribute name. # # Returns: # str: schema column name.
# """ # return self._receptor_map.get(field, field) # # # AIRRSchema = Schema('airr') # ChangeoSchema = Schema('changeo') class AIRRSchema: """ AIRR format to Receptor mappings """ # Default file extension out_type = 'tsv' # Core fields required = ['sequence_id', 'sequence', 'sequence_alignment', 'germline_alignment', 'rev_comp', 'productive', 'stop_codon', 'vj_in_frame', 'locus', 'v_call', 'd_call', 'j_call', 'junction', 'junction_length', 'junction_aa', 'np1_length', 'np2_length', 'v_sequence_start', 'v_sequence_end', 'v_germline_start', 'v_germline_end', 'd_sequence_start', 'd_sequence_end', 'd_germline_start', 'd_germline_end', 'j_sequence_start', 'j_sequence_end', 'j_germline_start', 'j_germline_end'] # Mapping of AIRR column names to Receptor attributes _schema_map = OrderedDict([('sequence_id', 'sequence_id'), ('sequence', 'sequence_input'), ('sequence_alignment', 'sequence_imgt'), ('germline_alignment', 'germline_imgt'), ('sequence_aa', 'sequence_aa_input'), ('sequence_aa_alignment', 'sequence_aa_imgt'), ('germline_aa_alignment', 'germline_aa_imgt'), ('rev_comp', 'rev_comp'), ('productive', 'functional'), ('stop_codon', 'stop'), ('vj_in_frame', 'in_frame'), ('v_frameshift', 'v_frameshift'), ('locus', 'locus'), ('v_call', 'v_call'), ('d_call', 'd_call'), ('j_call', 'j_call'), ('junction', 'junction'), ('junction_start', 'junction_start'), ('junction_end', 'junction_end'), ('junction_length', 'junction_length'), ('junction_aa', 'junction_aa'), ('junction_aa_length', 'junction_aa_length'), ('np1_length', 'np1_length'), ('np2_length', 'np2_length'), ('np1_aa_length', 'np1_aa_length'), ('np2_aa_length', 'np2_aa_length'), ('v_sequence_start', 'v_seq_start'), ('v_sequence_end', 'v_seq_end'), ('v_sequence_length', 'v_seq_length'), ('v_germline_start', 'v_germ_start_imgt'), ('v_germline_end', 'v_germ_end_imgt'), ('v_germline_length', 'v_germ_length_imgt'), ('v_sequence_aa_start', 'v_seq_aa_start'), ('v_sequence_aa_end', 'v_seq_aa_end'), ('v_sequence_aa_length', 
'v_seq_aa_length'), ('v_germline_aa_start', 'v_germ_aa_start_imgt'), ('v_germline_aa_end', 'v_germ_aa_end_imgt'), ('v_germline_aa_length', 'v_germ_aa_length_imgt'), ('d_sequence_start', 'd_seq_start'), ('d_sequence_end', 'd_seq_end'), ('d_sequence_length', 'd_seq_length'), ('d_germline_start', 'd_germ_start'), ('d_germline_end', 'd_germ_end'), ('d_germline_length', 'd_germ_length'), ('d_sequence_aa_start', 'd_seq_aa_start'), ('d_sequence_aa_end', 'd_seq_aa_end'), ('d_sequence_aa_length', 'd_seq_aa_length'), ('d_germline_aa_start', 'd_germ_aa_start'), ('d_germline_aa_end', 'd_germ_aa_end'), ('d_germline_aa_length', 'd_germ_aa_length'), ('j_sequence_start', 'j_seq_start'), ('j_sequence_end', 'j_seq_end'), ('j_sequence_length', 'j_seq_length'), ('j_germline_start', 'j_germ_start'), ('j_germline_end', 'j_germ_end'), ('j_germline_length', 'j_germ_length'), ('j_sequence_aa_start', 'j_seq_aa_start'), ('j_sequence_aa_end', 'j_seq_aa_end'), ('j_sequence_aa_length', 'j_seq_aa_length'), ('j_germline_aa_start', 'j_germ_aa_start'), ('j_germline_aa_end', 'j_germ_aa_end'), ('j_germline_aa_length', 'j_germ_aa_length'), ('c_call', 'c_call'), ('germline_alignment_d_mask', 'germline_imgt_d_mask'), ('v_score', 'v_score'), ('v_identity', 'v_identity'), ('v_support', 'v_evalue'), ('v_cigar', 'v_cigar'), ('d_score', 'd_score'), ('d_identity', 'd_identity'), ('d_support', 'd_evalue'), ('d_cigar', 'd_cigar'), ('j_score', 'j_score'), ('j_identity', 'j_identity'), ('j_support', 'j_evalue'), ('j_cigar', 'j_cigar'), ('vdj_score', 'vdj_score'), ('cdr1', 'cdr1_imgt'), ('cdr2', 'cdr2_imgt'), ('cdr3', 'cdr3_imgt'), ('fwr1', 'fwr1_imgt'), ('fwr2', 'fwr2_imgt'), ('fwr3', 'fwr3_imgt'), ('fwr4', 'fwr4_imgt'), ('cdr1_aa', 'cdr1_aa_imgt'), ('cdr2_aa', 'cdr2_aa_imgt'), ('cdr3_aa', 'cdr3_aa_imgt'), ('fwr1_aa', 'fwr1_aa_imgt'), ('fwr2_aa', 'fwr2_aa_imgt'), ('fwr3_aa', 'fwr3_aa_imgt'), ('fwr4_aa', 'fwr4_aa_imgt'), ('cdr1_start', 'cdr1_start'), ('cdr1_end', 'cdr1_end'), ('cdr2_start', 'cdr2_start'), 
('cdr2_end', 'cdr2_end'), ('cdr3_start', 'cdr3_start'), ('cdr3_end', 'cdr3_end'), ('fwr1_start', 'fwr1_start'), ('fwr1_end', 'fwr1_end'), ('fwr2_start', 'fwr2_start'), ('fwr2_end', 'fwr2_end'), ('fwr3_start', 'fwr3_start'), ('fwr3_end', 'fwr3_end'), ('fwr4_start', 'fwr4_start'), ('fwr4_end', 'fwr4_end'), ('n1_length', 'n1_length'), ('n2_length', 'n2_length'), ('p3v_length', 'p3v_length'), ('p5d_length', 'p5d_length'), ('p3d_length', 'p3d_length'), ('p5j_length', 'p5j_length'), ('d_frame', 'd_frame'), ('cdr3_igblast', 'cdr3_igblast'), ('cdr3_igblast_aa', 'cdr3_igblast_aa'), ('duplicate_count', 'dupcount'), ('consensus_count', 'conscount'), ('umi_count', 'umicount'), ('clone_id', 'clone'), ('cell_id', 'cell')]) # Mapping of Receptor attributes to AIRR column names _receptor_map = {v: k for k, v in _schema_map.items()} # All fields fields = list(_schema_map.keys()) @staticmethod def toReceptor(field): """ Returns a Receptor attribute name from an AIRR column name Arguments: field : AIRR column name. Returns: str: Receptor attribute name. """ field = field.lower() return AIRRSchema._schema_map.get(field, field) @staticmethod def fromReceptor(field): """ Returns an AIRR column name from a Receptor attribute name Arguments: field : Receptor attribute name. Returns: str: AIRR column name. 
""" field = field.lower() return AIRRSchema._receptor_map.get(field, field) class AIRRSchemaAA(AIRRSchema): """ AIRR format to Receptor amino acid mappings """ # Core fields required = ['sequence_id', 'sequence', 'sequence_alignment', 'germline_alignment', 'sequence_aa', 'sequence_aa_alignment', 'germline_aa_alignment', 'rev_comp', 'productive', 'stop_codon', 'locus', 'v_call', 'd_call', 'j_call', 'junction', 'junction_length', 'junction_aa', 'v_sequence_aa_start', 'v_sequence_aa_end', 'v_germline_aa_start', 'v_germline_aa_end'] class ChangeoSchema: """ Change-O to Receptor mappings """ # Default file extension out_type = 'tab' # Standard fields required = ['SEQUENCE_ID', 'SEQUENCE_INPUT', 'FUNCTIONAL', 'IN_FRAME', 'STOP', 'MUTATED_INVARIANT', 'INDELS', 'LOCUS', 'V_CALL', 'D_CALL', 'J_CALL', 'SEQUENCE_VDJ', 'SEQUENCE_IMGT', 'V_SEQ_START', 'V_SEQ_LENGTH', 'V_GERM_START_VDJ', 'V_GERM_LENGTH_VDJ', 'V_GERM_START_IMGT', 'V_GERM_LENGTH_IMGT', 'NP1_LENGTH', 'D_SEQ_START', 'D_SEQ_LENGTH', 'D_GERM_START', 'D_GERM_LENGTH', 'NP2_LENGTH', 'J_SEQ_START', 'J_SEQ_LENGTH', 'J_GERM_START', 'J_GERM_LENGTH', 'JUNCTION', 'JUNCTION_LENGTH', 'GERMLINE_IMGT'] # Mapping of Change-O column names to Receptor attributes _schema_map = OrderedDict([('SEQUENCE_ID', 'sequence_id'), ('SEQUENCE_INPUT', 'sequence_input'), ('SEQUENCE_AA_INPUT', 'sequence_aa_input'), ('FUNCTIONAL', 'functional'), ('IN_FRAME', 'in_frame'), ('STOP', 'stop'), ('MUTATED_INVARIANT', 'mutated_invariant'), ('INDELS', 'indels'), ('V_FRAMESHIFT', 'v_frameshift'), ('LOCUS', 'locus'), ('V_CALL', 'v_call'), ('D_CALL', 'd_call'), ('J_CALL', 'j_call'), ('SEQUENCE_VDJ', 'sequence_vdj'), ('SEQUENCE_IMGT', 'sequence_imgt'), ('SEQUENCE_AA_VDJ', 'sequence_aa_vdj'), ('SEQUENCE_AA_IMGT', 'sequence_aa_imgt'), ('V_SEQ_START', 'v_seq_start'), ('V_SEQ_LENGTH', 'v_seq_length'), ('V_GERM_START_VDJ', 'v_germ_start_vdj'), ('V_GERM_LENGTH_VDJ', 'v_germ_length_vdj'), ('V_GERM_START_IMGT', 'v_germ_start_imgt'), ('V_GERM_LENGTH_IMGT', 
'v_germ_length_imgt'), ('V_SEQ_AA_START', 'v_seq_aa_start'), ('V_SEQ_AA_LENGTH', 'v_seq_aa_length'), ('V_GERM_AA_START_VDJ', 'v_germ_aa_start_vdj'), ('V_GERM_AA_LENGTH_VDJ', 'v_germ_aa_length_vdj'), ('V_GERM_AA_START_IMGT', 'v_germ_aa_start_imgt'), ('V_GERM_AA_LENGTH_IMGT', 'v_germ_aa_length_imgt'), ('NP1_LENGTH', 'np1_length'), ('NP1_AA_LENGTH', 'np1_aa_length'), ('D_SEQ_START', 'd_seq_start'), ('D_SEQ_LENGTH', 'd_seq_length'), ('D_GERM_START', 'd_germ_start'), ('D_GERM_LENGTH', 'd_germ_length'), ('D_SEQ_AA_START', 'd_seq_aa_start'), ('D_SEQ_AA_LENGTH', 'd_seq_aa_length'), ('D_GERM_AA_START', 'd_germ_aa_start'), ('D_GERM_AA_LENGTH', 'd_germ_aa_length'), ('NP2_LENGTH', 'np2_length'), ('NP2_AA_LENGTH', 'np2_aa_length'), ('J_SEQ_START', 'j_seq_start'), ('J_SEQ_LENGTH', 'j_seq_length'), ('J_GERM_START', 'j_germ_start'), ('J_GERM_LENGTH', 'j_germ_length'), ('J_SEQ_AA_START', 'j_seq_aa_start'), ('J_SEQ_AA_LENGTH', 'j_seq_aa_length'), ('J_GERM_AA_START', 'j_germ_aa_start'), ('J_GERM_AA_LENGTH', 'j_germ_aa_length'), ('JUNCTION', 'junction'), ('JUNCTION_LENGTH', 'junction_length'), ('GERMLINE_IMGT', 'germline_imgt'), ('GERMLINE_AA_IMGT', 'germline_aa_imgt'), ('JUNCTION_START', 'junction_start'), ('V_SCORE', 'v_score'), ('V_IDENTITY', 'v_identity'), ('V_EVALUE', 'v_evalue'), ('V_BTOP', 'v_btop'), ('V_CIGAR', 'v_cigar'), ('D_SCORE', 'd_score'), ('D_IDENTITY', 'd_identity'), ('D_EVALUE', 'd_evalue'), ('D_BTOP', 'd_btop'), ('D_CIGAR', 'd_cigar'), ('J_SCORE', 'j_score'), ('J_IDENTITY', 'j_identity'), ('J_EVALUE', 'j_evalue'), ('J_BTOP', 'j_btop'), ('J_CIGAR', 'j_cigar'), ('VDJ_SCORE', 'vdj_score'), ('FWR1_IMGT', 'fwr1_imgt'), ('FWR2_IMGT', 'fwr2_imgt'), ('FWR3_IMGT', 'fwr3_imgt'), ('FWR4_IMGT', 'fwr4_imgt'), ('CDR1_IMGT', 'cdr1_imgt'), ('CDR2_IMGT', 'cdr2_imgt'), ('CDR3_IMGT', 'cdr3_imgt'), ('FWR1_AA_IMGT', 'fwr1_aa_imgt'), ('FWR2_AA_IMGT', 'fwr2_aa_imgt'), ('FWR3_AA_IMGT', 'fwr3_aa_imgt'), ('FWR4_AA_IMGT', 'fwr4_aa_imgt'), ('CDR1_AA_IMGT', 'cdr1_aa_imgt'), ('CDR2_AA_IMGT', 
'cdr2_aa_imgt'), ('CDR3_AA_IMGT', 'cdr3_aa_imgt'), ('N1_LENGTH', 'n1_length'), ('N2_LENGTH', 'n2_length'), ('P3V_LENGTH', 'p3v_length'), ('P5D_LENGTH', 'p5d_length'), ('P3D_LENGTH', 'p3d_length'), ('P5J_LENGTH', 'p5j_length'), ('D_FRAME', 'd_frame'), ('C_CALL', 'c_call'), ('CDR3_IGBLAST', 'cdr3_igblast'), ('CDR3_IGBLAST_AA', 'cdr3_igblast_aa'), ('CONSCOUNT', 'conscount'), ('DUPCOUNT', 'dupcount'), ('UMICOUNT', 'umicount'), ('CLONE', 'clone'), ('CELL', 'cell')]) # Mapping of Receptor attributes to Change-O column names _receptor_map = {v: k for k, v in _schema_map.items()} # All fields fields = list(_schema_map.keys()) @staticmethod def toReceptor(field): """ Returns a Receptor attribute name from a Change-O column name Arguments: field : Change-O column name. Returns: str: Receptor attribute name. """ return ChangeoSchema._schema_map.get(field, field.lower()) @staticmethod def fromReceptor(field): """ Returns a Change-O column name from a Receptor attribute name Arguments: field : Receptor attribute name. Returns: str: Change-O column name. """ return ChangeoSchema._receptor_map.get(field, field.upper()) class ChangeoSchemaAA(ChangeoSchema): """ Change-O to Receptor amino acid mappings """ # Standard fields required = ['SEQUENCE_ID', 'SEQUENCE_AA_INPUT', 'STOP', 'INDELS', 'LOCUS', 'V_CALL', 'SEQUENCE_AA_VDJ', 'SEQUENCE_AA_IMGT', 'V_SEQ_AA_START', 'V_SEQ_AA_LENGTH', 'V_GERM_AA_START_VDJ', 'V_GERM_AA_LENGTH_VDJ', 'V_GERM_AA_START_IMGT', 'V_GERM_AA_LENGTH_IMGT', 'GERMLINE_AA_IMGT'] class ReceptorData: """ A class containing type conversion methods for Receptor data attributes Attributes: sequence_id (str): unique sequence identifier. rev_comp (bool): whether the alignment is relative to the reverse complement of the input sequence. functional (bool): whether the sample V(D)J sequence is predicted to be functional. in_frame (bool): whether the junction region is in-frame. stop (bool): whether a stop codon is present in the V(D)J sequence.
mutated_invariant (bool): whether the conserved amino acids are mutated in the V(D)J sequence. indels (bool): whether the V(D)J nucleotide sequence contains insertions and/or deletions. v_frameshift (bool): whether the V segment contains a frameshift. sequence_input (Bio.Seq.Seq): input nucleotide sequence. sequence_vdj (Bio.Seq.Seq): aligned V(D)J nucleotide sequence without IMGT gaps. sequence_imgt (Bio.Seq.Seq): IMGT-gapped V(D)J nucleotide sequence. sequence_aa_input (Bio.Seq.Seq): input amino acid sequence. sequence_aa_vdj (Bio.Seq.Seq): aligned V(D)J amino acid sequence without IMGT gaps. sequence_aa_imgt (Bio.Seq.Seq): IMGT-gapped V(D)J amino acid sequence. junction (Bio.Seq.Seq): ungapped junction region nucleotide sequence. junction_aa (Bio.Seq.Seq): ungapped junction region amino acid sequence. junction_start (int): start position of the junction in the input nucleotide sequence. junction_length (int): length of the junction in nucleotides. germline_vdj (Bio.Seq.Seq): full ungapped germline V(D)J nucleotide sequence. germline_vdj_d_mask (Bio.Seq.Seq): ungapped germline V(D)J nucleotide sequence with Ns masking the NP1-D-NP2 regions. germline_imgt (Bio.Seq.Seq): full IMGT-gapped germline V(D)J nucleotide sequence. germline_imgt_d_mask (Bio.Seq.Seq): IMGT-gapped germline V(D)J nucleotide sequence with Ns masking the NP1-D-NP2 regions. germline_aa_vdj (Bio.Seq.Seq): full ungapped germline V(D)J amino acid sequence. germline_aa_imgt (Bio.Seq.Seq): full IMGT-gapped germline V(D)J amino acid sequence. v_call (str): V allele assignment(s). d_call (str): D allele assignment(s). j_call (str): J allele assignment(s). c_call (str): C region assignment. v_seq_start (int): position of the first V nucleotide in the input sequence (1-based). v_seq_length (int): number of V nucleotides in the input sequence. v_germ_start_imgt (int): position of the first V nucleotide in IMGT-gapped V germline sequence alignment (1-based).
v_germ_length_imgt (int): length of the IMGT numbered germline V alignment. v_germ_start_vdj (int): position of the first nucleotide in ungapped V germline sequence alignment (1-based). v_germ_length_vdj (int): length of the ungapped germline V alignment. v_seq_aa_start (int): position of the first V amino acid in the amino acid input sequence (1-based). v_seq_aa_length (int): number of V amino acids in the amino acid input sequence. v_germ_aa_start_imgt (int): position of the first V amino acid in IMGT-gapped V germline amino acid alignment (1-based). v_germ_aa_length_imgt (int): length of the IMGT numbered germline V amino acid alignment. v_germ_aa_start_vdj (int): position of the first amino acid in ungapped V germline amino acid alignment (1-based). v_germ_aa_length_vdj (int): length of the ungapped germline V amino acid alignment. np1_start (int): position of the first untemplated nucleotide between the V and D segments in the input sequence (1-based). np1_length (int): number of untemplated nucleotides between the V and D segments. np1_aa_start (int): position of the first untemplated amino acid between the V and D segments in the input amino acid sequence (1-based). np1_aa_length (int): number of untemplated amino acids between the V and D segments. d_seq_start (int): position of the first D nucleotide in the input sequence (1-based). d_seq_length (int): number of D nucleotides in the input sequence. d_germ_start (int): position of the first nucleotide in D germline sequence alignment (1-based). d_germ_length (int): length of the germline D alignment. d_seq_aa_start (int): position of the first D amino acid in the input amino acid sequence (1-based). d_seq_aa_length (int): number of D amino acids in the input amino acid sequence. d_germ_aa_start (int): position of the first amino acid in D germline amino acid alignment (1-based). d_germ_aa_length (int): length of the germline D amino acid alignment. 
np2_start (int): position of the first untemplated nucleotide between the D and J segments in the input sequence (1-based). np2_length (int): number of untemplated nucleotides between the D and J segments. np2_aa_start (int): position of the first untemplated amino acid between the D and J segments in the input amino acid sequence (1-based). np2_aa_length (int): number of untemplated amino acids between the D and J segments. j_seq_start (int): position of the first J nucleotide in the input sequence (1-based). j_seq_length (int): number of J nucleotides in the input sequence. j_germ_start (int): position of the first nucleotide in J germline sequence alignment (1-based). j_germ_length (int): length of the germline J alignment. j_seq_aa_start (int): position of the first J amino acid in the input amino acid sequence (1-based). j_seq_aa_length (int): number of J amino acids in the input amino acid sequence. j_germ_aa_start (int): position of the first amino acid in J germline amino acid alignment (1-based). j_germ_aa_length (int): length of the germline J amino acid alignment. v_score (float): alignment score for the V. v_identity (float): alignment identity for the V. v_evalue (float): E-value for the alignment of the V. v_btop (str): BTOP for the alignment of the V. v_cigar (str): CIGAR for the alignment of the V. d_score (float): alignment score for the D. d_identity (float): alignment identity for the D. d_evalue (float): E-value for the alignment of the D. d_btop (str): BTOP for the alignment of the D. d_cigar (str): CIGAR for the alignment of the D. j_score (float): alignment score for the J. j_identity (float): alignment identity for the J. j_evalue (float): E-value for the alignment of the J. j_btop (str): BTOP for the alignment of the J. j_cigar (str): CIGAR for the alignment of the J. vdj_score (float): alignment score for the V(D)J. fwr1_imgt (Bio.Seq.Seq): IMGT-gapped FWR1 nucleotide sequence. fwr2_imgt (Bio.Seq.Seq): IMGT-gapped FWR2 nucleotide sequence. 
fwr3_imgt (Bio.Seq.Seq): IMGT-gapped FWR3 nucleotide sequence. fwr4_imgt (Bio.Seq.Seq): IMGT-gapped FWR4 nucleotide sequence. cdr1_imgt (Bio.Seq.Seq): IMGT-gapped CDR1 nucleotide sequence. cdr2_imgt (Bio.Seq.Seq): IMGT-gapped CDR2 nucleotide sequence. cdr3_imgt (Bio.Seq.Seq): IMGT-gapped CDR3 nucleotide sequence. cdr3_igblast (Bio.Seq.Seq): CDR3 nucleotide sequence assigned by IgBLAST. fwr1_aa_imgt (Bio.Seq.Seq): IMGT-gapped FWR1 amino acid sequence. fwr2_aa_imgt (Bio.Seq.Seq): IMGT-gapped FWR2 amino acid sequence. fwr3_aa_imgt (Bio.Seq.Seq): IMGT-gapped FWR3 amino acid sequence. fwr4_aa_imgt (Bio.Seq.Seq): IMGT-gapped FWR4 amino acid sequence. cdr1_aa_imgt (Bio.Seq.Seq): IMGT-gapped CDR1 amino acid sequence. cdr2_aa_imgt (Bio.Seq.Seq): IMGT-gapped CDR2 amino acid sequence. cdr3_aa_imgt (Bio.Seq.Seq): IMGT-gapped CDR3 amino acid sequence. cdr3_igblast_aa (Bio.Seq.Seq): CDR3 amino acid sequence assigned by IgBLAST. n1_length (int): number of untemplated N nucleotides 5' of the D segment. n2_length (int): number of untemplated N nucleotides 3' of the D segment. p3v_length (int): number of palindromic nucleotides 3' of the V segment. p5d_length (int): number of palindromic nucleotides 5' of the D segment. p3d_length (int): number of palindromic nucleotides 3' of the D segment. p5j_length (int): number of palindromic nucleotides 5' of the J segment. d_frame (int): D segment reading frame. conscount (int): number of reads contributing to the UMI consensus sequence. dupcount (int): copy number of the sequence. umicount (int): number of UMIs representing the sequence. clone (str): clonal cluster identifier. cell (str): origin cell identifier. annotations (dict): dictionary containing all unknown fields. 
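Examples: the parse and deparse conventions used for these attributes can be illustrated with a minimal standalone sketch. The functions `logical` and `integer` below are hypothetical stand-ins that mirror the ReceptorData static methods of the same names defined in this class. Unlike those methods, they catch only the exceptions that dictionary lookup and int() are expected to raise, instead of using a bare except.

```python
# Hypothetical standalone mirrors of ReceptorData.logical and ReceptorData.integer.

def logical(v, deparse=False):
    # Parsing maps 'T'/'F'/'NA' style strings to True/False/None;
    # deparsing maps True/False/None back to 'T'/'F'/''.
    parse_map = {True: True, 'T': True, 'TRUE': True,
                 False: False, 'F': False, 'FALSE': False,
                 'NA': None, 'None': None, '': None}
    deparse_map = {False: 'F', True: 'T', None: ''}
    try:
        return deparse_map[v] if deparse else parse_map[v]
    except (KeyError, TypeError):
        # Unknown or unhashable values become None (parse) or '' (deparse)
        return '' if deparse else None


def integer(v, deparse=False):
    # Parsing converts to int or None; deparsing converts to str or ''.
    if deparse:
        return '' if v is None else str(v)
    try:
        return int(v)
    except (TypeError, ValueError):
        return None


print(logical('T'), logical('NA'), logical(False, deparse=True))
print(integer('42'), integer(''), repr(integer(None, deparse=True)))
```

Round-tripping through these conversions is what lets a record read string cells from a tab-delimited file and write them back as strings later.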
""" #with resource_stream(__name__, 'data/receptor.yaml') as f: # data = yaml.load(f, Loader=yaml.FullLoader) # # # Define type parsers # parsers = {k: v['type'] for k, v in data['receptor'].items()} # # # Define coordinate field sets # coordinates = {} # for k, v in data['receptor'].items(): # if 'coordinate' in v: # position = {v['coordinate']['position']: k} # group = coordinates.setdefault(v['coordinate']['group'], {}) # group.update(position) # # # Positional fields sets in the form {start: (length, end)} # self.start_fields = {x['start']: (x['length'], x['end']) for x in coordinates.values()} # # # Positional fields sets in the form {length: (start, end)} # self.length_fields = {x['length']: (x['start'], x['end']) for x in coordinates.values()} # # # Positional fields sets in the form {end: (start, length)} # self.end_fields = {x['end']: (x['start'], x['length']) for x in coordinates.values()} # Mapping of member variables to parsing functions parsers = {'sequence_id': 'identity', 'rev_comp': 'logical', 'functional': 'logical', 'locus': 'identity', 'in_frame': 'logical', 'stop': 'logical', 'mutated_invariant': 'logical', 'indels': 'logical', 'v_frameshift': 'logical', 'sequence_input': 'nucleotide', 'sequence_imgt': 'nucleotide', 'sequence_vdj': 'nucleotide', 'sequence_aa_input': 'aminoacid', 'sequence_aa_imgt': 'aminoacid', 'sequence_aa_vdj': 'aminoacid', 'junction': 'nucleotide', 'junction_aa': 'aminoacid', 'junction_start': 'integer', 'junction_length': 'integer', 'germline_imgt': 'nucleotide', 'germline_imgt_d_mask': 'nucleotide', 'germline_vdj': 'nucleotide', 'germline_vdj_d_mask': 'nucleotide', 'germline_aa_imgt': 'aminoacid', 'germline_aa_vdj': 'aminoacid', 'v_call': 'identity', 'd_call': 'identity', 'j_call': 'identity', 'c_call': 'identity', 'v_seq_start': 'integer', 'v_seq_length': 'integer', 'v_germ_start_imgt': 'integer', 'v_germ_length_imgt': 'integer', 'v_germ_start_vdj': 'integer', 'v_germ_length_vdj': 'integer', 'v_seq_aa_start': 'integer', 
'v_seq_aa_length': 'integer', 'v_germ_aa_start_imgt': 'integer', 'v_germ_aa_length_imgt': 'integer', 'v_germ_aa_start_vdj': 'integer', 'v_germ_aa_length_vdj': 'integer', 'np1_start': 'integer', 'np1_length': 'integer', 'np1_aa_start': 'integer', 'np1_aa_length': 'integer', 'd_seq_start': 'integer', 'd_seq_length': 'integer', 'd_germ_start': 'integer', 'd_germ_length': 'integer', 'd_seq_aa_start': 'integer', 'd_seq_aa_length': 'integer', 'd_germ_aa_start': 'integer', 'd_germ_aa_length': 'integer', 'np2_start': 'integer', 'np2_length': 'integer', 'np2_aa_start': 'integer', 'np2_aa_length': 'integer', 'j_seq_start': 'integer', 'j_seq_length': 'integer', 'j_germ_start': 'integer', 'j_germ_length': 'integer', 'j_seq_aa_start': 'integer', 'j_seq_aa_length': 'integer', 'j_germ_aa_start': 'integer', 'j_germ_aa_length': 'integer', 'v_score': 'double', 'v_identity': 'double', 'v_evalue': 'double', 'v_btop': 'identity', 'v_cigar': 'identity', 'd_score': 'double', 'd_identity': 'double', 'd_evalue': 'double', 'd_btop': 'identity', 'd_cigar': 'identity', 'j_score': 'double', 'j_identity': 'double', 'j_evalue': 'double', 'j_btop': 'identity', 'j_cigar': 'identity', 'vdj_score': 'double', 'fwr1_imgt': 'nucleotide', 'fwr2_imgt': 'nucleotide', 'fwr3_imgt': 'nucleotide', 'fwr4_imgt': 'nucleotide', 'cdr1_imgt': 'nucleotide', 'cdr2_imgt': 'nucleotide', 'cdr3_imgt': 'nucleotide', 'fwr1_aa_imgt': 'aminoacid', 'fwr2_aa_imgt': 'aminoacid', 'fwr3_aa_imgt': 'aminoacid', 'fwr4_aa_imgt': 'aminoacid', 'cdr1_aa_imgt': 'aminoacid', 'cdr2_aa_imgt': 'aminoacid', 'cdr3_aa_imgt': 'aminoacid', 'n1_length': 'integer', 'n2_length': 'integer', 'p3v_length': 'integer', 'p5d_length': 'integer', 'p3d_length': 'integer', 'p5j_length': 'integer', 'd_frame': 'integer', 'cdr3_igblast': 'nucleotide', 'cdr3_igblast_aa': 'aminoacid', 'conscount': 'integer', 'dupcount': 'integer', 'umicount': 'integer', 'clone': 'identity', 'cell': 'identity'} # Positional fields sets in the form (start, length, end) 
_coordinate_map = [('v_seq_start', 'v_seq_length', 'v_seq_end'), ('v_germ_start_imgt', 'v_germ_length_imgt', 'v_germ_end_imgt'), ('v_germ_start_vdj', 'v_germ_length_vdj', 'v_germ_end_vdj'), ('v_alignment_start', 'v_alignment_length', 'v_alignment_end'), ('v_seq_aa_start', 'v_seq_aa_length', 'v_seq_aa_end'), ('v_germ_aa_start_imgt', 'v_germ_aa_length_imgt', 'v_germ_aa_end_imgt'), ('v_germ_aa_start_vdj', 'v_germ_aa_length_vdj', 'v_germ_aa_end_vdj'), ('v_alignment_aa_start', 'v_alignment_aa_length', 'v_alignment_aa_end'), ('d_seq_start', 'd_seq_length', 'd_seq_end'), ('d_germ_start', 'd_germ_length', 'd_germ_end'), ('d_seq_aa_start', 'd_seq_aa_length', 'd_seq_aa_end'), ('d_germ_aa_start', 'd_germ_aa_length', 'd_germ_aa_end'), ('j_seq_start', 'j_seq_length', 'j_seq_end'), ('j_germ_start', 'j_germ_length', 'j_germ_end'), ('j_seq_aa_start', 'j_seq_aa_length', 'j_seq_aa_end'), ('j_germ_aa_start', 'j_germ_aa_length', 'j_germ_aa_end'), ('junction_start', 'junction_length', 'junction_end'), ('fwr1_start', 'fwr1_length', 'fwr1_end'), ('fwr2_start', 'fwr2_length', 'fwr2_end'), ('fwr3_start', 'fwr3_length', 'fwr3_end'), ('fwr4_start', 'fwr4_length', 'fwr4_end'), ('cdr1_start', 'cdr1_length', 'cdr1_end'), ('cdr2_start', 'cdr2_length', 'cdr2_end'), ('cdr3_start', 'cdr3_length', 'cdr3_end')] # Positional fields sets in the form {start: (length, end)} start_fields = {x[0]: (x[1], x[2]) for x in _coordinate_map} # Positional fields sets in the form {length: (start, end)} length_fields = {x[1]: (x[0], x[2]) for x in _coordinate_map} # Positional fields sets in the form {end: (start, length)} end_fields = {x[2]: (x[0], x[1]) for x in _coordinate_map} @staticmethod def identity(v, deparse=False): return v # Logical type conversion @staticmethod def logical(v, deparse=False): parse_map = {True: True, 'T': True, 'TRUE': True, False: False, 'F': False, 'FALSE': False, 'NA': None, 'None': None, '': None} deparse_map = {False: 'F', True: 'T', None: ''} if not deparse: try: return 
parse_map[v] except: return None else: try: return deparse_map[v] except: return '' # Integer type conversion @staticmethod def integer(v, deparse=False): if not deparse: try: return int(v) except: return None else: return '' if v is None else str(v) # Float type conversion @staticmethod def double(v, deparse=False): if not deparse: try: return float(v) except: return None else: return '' if v is None else str(v) # Nucleotide sequence type conversion @staticmethod def nucleotide(v, deparse=False): if not deparse: try: #return '' if v in ('NA', 'None') else Seq(v, IUPAC.ambiguous_dna).upper() return '' if v in ('NA', 'None') else v.upper() except: return '' else: return '' if v in ('NA', 'None', None) else str(v) # Sequence type conversion @staticmethod def aminoacid(v, deparse=False): if not deparse: try: #return '' if v in ('NA', 'None') else Seq(v, IUPAC.extended_protein).upper() return '' if v in ('NA', 'None') else v.upper() except: return '' else: return '' if v in ('NA', 'None', None) else str(v) class Receptor: """ A class defining a V(D)J sequence and its annotations """ # Mapping of derived properties to types _derived = {'v_seq_end': 'integer', 'v_germ_end_vdj': 'integer', 'v_germ_end_imgt': 'integer', 'v_seq_aa_end': 'integer', 'v_germ_aa_end_vdj': 'integer', 'v_germ_aa_end_imgt': 'integer', 'd_seq_end': 'integer', 'd_germ_end': 'integer', 'd_seq_aa_end': 'integer', 'd_germ_aa_end': 'integer', 'j_seq_end': 'integer', 'j_germ_end': 'integer', 'j_seq_aa_end': 'integer', 'j_germ_aa_end': 'integer', 'junction_end': 'integer'} def _junction_start(self): """ Determine the position of the first junction nucleotide in the input sequence """ try: x = self.v_germ_end_imgt - 310 return self.v_seq_end - x if x >= 0 else None except TypeError: return None def __init__(self, data): """ Initializer Arguments: data : dict of field/value data Returns: changeo.Receptor.Receptor """ # Convert case of keys data = {k.lower(): v for k, v in data.items()} # Define known keys 
required_keys = ('sequence_id', ) optional_keys = (x for x in ReceptorData.parsers if x not in required_keys) # Parse required fields try: for k in required_keys: f = getattr(ReceptorData, ReceptorData.parsers[k]) setattr(self, k, f(data.pop(k))) except: printError('Input must contain valid %s values.' % ','.join(required_keys)) # Parse optional known fields for k in optional_keys: f = getattr(ReceptorData, ReceptorData.parsers[k]) setattr(self, k, f(data.pop(k, None))) # Derive junction_start if not provided if not hasattr(self, 'junction_start') or self.junction_start is None: setattr(self, 'junction_start', self._junction_start()) # Add remaining elements as annotations dictionary self.annotations = data def setDict(self, data, parse=False): """ Adds or updates multiple attributes and annotations Arguments: data : a dictionary of annotations to add or update. parse : if True pass values through string parsing functions for known fields. Returns: None : updates attribute values and the annotations attribute. """ # Partition data attributes = {k.lower(): v for k, v in data.items() if k.lower() in ReceptorData.parsers} annotations = {k.lower(): v for k, v in data.items() if k.lower() not in attributes} # Update attributes for k, v in attributes.items(): if parse: f = getattr(ReceptorData, ReceptorData.parsers[k]) setattr(self, k, f(v)) else: setattr(self, k, v) # Update annotations self.annotations.update(annotations) def setField(self, field, value, parse=False): """ Set an attribute or annotation value Arguments: field : attribute name as a string value : value to assign parse : if True pass values through string parsing functions for known fields. Returns: None. Updates attribute or annotation. 
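Examples: a minimal sketch of how setField and getField route names. It assumes a hypothetical `MiniRecord` class whose three-field `known_fields` tuple stands in for the keys of ReceptorData.parsers. Known fields are matched case-insensitively and stored as attributes. Everything else is stored in the annotations dictionary. The optional parse step is omitted here.

```python
class MiniRecord:
    # Hypothetical subset of the ReceptorData.parsers keys
    known_fields = ('sequence_id', 'v_call', 'dupcount')

    def __init__(self):
        # Known fields always exist as attributes, defaulting to None
        for field in self.known_fields:
            setattr(self, field, None)
        self.annotations = {}

    def set_field(self, field, value):
        # Field names are lower-cased before routing
        field = field.lower()
        if field in self.known_fields:
            setattr(self, field, value)
        else:
            self.annotations[field] = value

    def get_field(self, field):
        field = field.lower()
        if field in self.known_fields:
            return getattr(self, field)
        # Unknown names return None when no annotation exists
        return self.annotations.get(field)


record = MiniRecord()
record.set_field('V_CALL', 'IGHV1-2*02')
record.set_field('SAMPLE', 'S1')
print(record.v_call, record.annotations)
```

Keeping known fields as attributes lets them be parsed with a fixed type. Unrecognized columns still survive a read and write cycle through the annotations dictionary.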
""" field = field.lower() if field in ReceptorData.parsers and parse: f = getattr(ReceptorData, ReceptorData.parsers[field]) setattr(self, field, f(value)) elif field in ReceptorData.parsers: setattr(self, field, value) else: self.annotations[field] = value def getField(self, field): """ Get an attribute or annotation value Arguments: field : attribute name as a string Returns: Value in the attribute. Returns None if the attribute cannot be found. """ field = field.lower() if field in ReceptorData.parsers: return getattr(self, field) elif field in self.annotations: return self.annotations[field] else: return None def getSeq(self, field): """ Get an attribute value converted to a Seq object Arguments: field : variable name as a string Returns: Bio.Seq.Seq : Value in the field as a Seq object """ v = self.getField(field) if isinstance(v, Seq): return v elif isinstance(v, str): return Seq(v) else: return None def getAIRR(self, field, seq=False): """ Get an attribute from an AIRR field name Arguments: field : AIRR column name as a string seq : if True return the attribute as a Seq object Returns: Value in the AIRR field. Returns None if the field cannot be found. """ # Map to Receptor attribute field = AIRRSchema.toReceptor(field) if seq: return self.getSeq(field) else: return self.getField(field) def getChangeo(self, field, seq=False): """ Get an attribute from a Change-O field name Arguments: field : Change-O column name as a string seq : if True return the attribute as a Seq object Returns: Value in the Change-O field. Returns None if the field cannot be found. 
""" # Map to Receptor attribute field = ChangeoSchema.toReceptor(field) if seq: return self.getSeq(field) else: return self.getField(field) def toDict(self): """ Convert the namespace to a dictionary Returns: dict : member fields with values converted to appropriate strings """ d = {} n = self.__dict__ # Parse attributes for k, v in n.items(): if k == 'annotations': d.update(n['annotations']) else: f = getattr(ReceptorData, ReceptorData.parsers[k]) d[k] = f(v, deparse=True) # Parse properties for k in Receptor._derived: f = getattr(ReceptorData, Receptor._derived[k]) v = getattr(self, k) d[k] = f(v, deparse=True) return d def getAlleleCalls(self, calls, action='first'): """ Get multiple allele calls Arguments: calls : iterable of calls to get; one or more of ('v','d','j') actions : One of ('first','set') Returns: list : List of requested calls in order """ vdj = {'v': self.getVAllele(action), 'd': self.getDAllele(action), 'j': self.getJAllele(action)} return [vdj[k] for k in calls] def getGeneCalls(self, calls, action='first'): """ Get multiple gene calls Arguments: calls : iterable of calls to get; one or more of ('v','d','j') actions : One of ('first','set') Returns: list : List of requested calls in order """ vdj = {'v': self.getVGene(action), 'd': self.getDGene(action), 'j': self.getJGene(action)} return [vdj[k] for k in calls] def getFamilyCalls(self, calls, action='first'): """ Get multiple family calls Arguments: calls : iterable of calls to get; one or more of ('v','d','j') actions : One of ('first','set') Returns: list : List of requested calls in order """ vdj = {'v': self.getVFamily(action), 'd': self.getDFamily(action), 'j': self.getJFamily(action)} return [vdj[k] for k in calls] # TODO: this can't distinguish empty value ("") from missing field (no column) def getVAllele(self, action='first', field=None): """ V segment allele getter Arguments: actions : One of 'first', 'set' or list' field : attribute or annotation name containing the V call. 
Use v_call attribute if None. Returns: str : String of the allele when action is 'first'; tuple : Tuple of allele calls for 'set' or 'list' actions. """ x = self.v_call if field is None else self.getField(field) return getAllele(x, action=action) def getDAllele(self, action='first', field=None): """ D segment allele getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the D call. Use d_call attribute if None. Returns: str : String of the allele when action is 'first'; tuple : Tuple of allele calls for 'set' or 'list' actions. """ x = self.d_call if field is None else self.getField(field) return getAllele(x, action=action) def getJAllele(self, action='first', field=None): """ J segment allele getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the J call. Use j_call attribute if None. Returns: str : String of the allele when action is 'first'; tuple : Tuple of allele calls for 'set' or 'list' actions. """ x = self.j_call if field is None else self.getField(field) return getAllele(x, action=action) def getVGene(self, action='first', field=None): """ V segment gene getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the V call. Use v_call attribute if None. Returns: str : String of the gene when action is 'first'; tuple : Tuple of gene calls for 'set' or 'list' actions. """ x = self.v_call if field is None else self.getField(field) return getGene(x, action=action) def getDGene(self, action='first', field=None): """ D segment gene getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the D call. Use d_call attribute if None. Returns: str : String of the gene when action is 'first'; tuple : Tuple of gene calls for 'set' or 'list' actions. 
""" x = self.d_call if field is None else self.getField(field) return getGene(x, action=action) def getJGene(self, action='first', field=None): """ J segment gene getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the J call. Use j_call attribute if None. Returns: str : String of the gene when action is 'first'; tuple : Tuple of gene calls for 'set' or 'list' actions. """ x = self.j_call if field is None else self.getField(field) return getGene(x, action=action) def getVFamily(self, action='first', field=None): """ V segment family getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the V call. Use v_call attribute if None. Returns: str : String of the family when action is 'first'; tuple : Tuple of family calls for 'set' or 'list' actions. """ x = self.v_call if field is None else self.getField(field) return getFamily(x, action=action) def getDFamily(self, action='first', field=None): """ D segment family getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the D call. Use d_call attribute if None. Returns: str : String of the family when action is 'first'; tuple : Tuple of family calls for 'set' or 'list' actions. """ x = self.d_call if field is None else self.getField(field) return getFamily(x, action=action) def getJFamily(self, action='first', field=None): """ J segment family getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the J call. Use j_call attribute if None. Returns: str : String of the family when action is 'first'; tuple : Tuple of family calls for 'set' or 'list' actions. 
""" x = self.j_call if field is None else self.getField(field) return getFamily(x, action=action) def getAlleleNumbers(self, calls, action='first'): """ Get multiple allele numeric identifiers Arguments: calls : iterable of calls to get; one or more of ('v','d','j') action : One of ('first','set') Returns: list : List of requested allele numbers in order """ vdj = {'v': self.getVAlleleNumber(action), 'd': self.getDAlleleNumber(action), 'j': self.getJAlleleNumber(action)} return [vdj[k] for k in calls] def getVAlleleNumber(self, action='first', field=None): """ V segment allele number getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the V call. Use v_call attribute if None. Returns: str : String of the allele number when action is 'first'; tuple : Tuple of allele numbers for 'set' or 'list' actions. """ x = self.v_call if field is None else self.getField(field) return getAlleleNumber(x, action=action) def getDAlleleNumber(self, action='first', field=None): """ D segment allele number getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the D call. Use d_call attribute if None. Returns: str : String of the allele number when action is 'first'; tuple : Tuple of allele numbers for 'set' or 'list' actions. """ x = self.d_call if field is None else self.getField(field) return getAlleleNumber(x, action=action) def getJAlleleNumber(self, action='first', field=None): """ J segment allele number getter Arguments: action : One of 'first', 'set' or 'list' field : attribute or annotation name containing the J call. Use j_call attribute if None. Returns: str : String of the allele number when action is 'first'; tuple : Tuple of allele numbers for 'set' or 'list' actions. 
""" x = self.j_call if field is None else self.getField(field) return getAlleleNumber(x, action=action) @property def v_seq_end(self): """ Position of the last V nucleotide in the input sequence """ try: return self.v_seq_start + self.v_seq_length - 1 except TypeError: return None @property def v_germ_end_imgt(self): """ Position of the last nucleotide in the IMGT-gapped V germline sequence alignment """ try: return self.v_germ_start_imgt + self.v_germ_length_imgt - 1 except TypeError: return None @property def v_germ_end_vdj(self): """ Position of the last nucleotide in the ungapped V germline sequence alignment """ try: return self.v_germ_start_vdj + self.v_germ_length_vdj - 1 except TypeError: return None @property def v_seq_aa_end(self): """ Position of the last V amino acid in the input amino acid sequence """ try: return self.v_seq_aa_start + self.v_seq_aa_length - 1 except TypeError: return None @property def v_germ_aa_end_imgt(self): """ Position of the last amino acid in the IMGT-gapped V germline amino acid alignment """ try: return self.v_germ_aa_start_imgt + self.v_germ_aa_length_imgt - 1 except TypeError: return None @property def v_germ_aa_end_vdj(self): """ Position of the last amino acid in the ungapped V germline amino acid alignment """ try: return self.v_germ_aa_start_vdj + self.v_germ_aa_length_vdj - 1 except TypeError: return None @property def d_seq_end(self): """ Position of the last D nucleotide in the input sequence """ try: return self.d_seq_start + self.d_seq_length - 1 except TypeError: return None @property def d_germ_end(self): """ Position of the last nucleotide in the D germline sequence alignment """ try: return self.d_germ_start + self.d_germ_length - 1 except TypeError: return None @property def d_seq_aa_end(self): """ Position of the last D amino acid in the input amino acid sequence """ try: return self.d_seq_aa_start + self.d_seq_aa_length - 1 except TypeError: return None @property def d_germ_aa_end(self): """ Position of the last amino acid 
in the D germline amino acid alignment """ try: return self.d_germ_aa_start + self.d_germ_aa_length - 1 except TypeError: return None @property def j_seq_end(self): """ Position of the last J nucleotide in the input sequence """ try: return self.j_seq_start + self.j_seq_length - 1 except TypeError: return None @property def j_germ_end(self): """ Position of the last nucleotide in the J germline sequence alignment """ try: return self.j_germ_start + self.j_germ_length - 1 except TypeError: return None @property def j_seq_aa_end(self): """ Position of the last J amino acid in the input amino acid sequence """ try: return self.j_seq_aa_start + self.j_seq_aa_length - 1 except TypeError: return None @property def j_germ_aa_end(self): """ Position of the last amino acid in the J germline amino acid alignment """ try: return self.j_germ_aa_start + self.j_germ_aa_length - 1 except TypeError: return None @property def junction_end(self): """ Position of the last junction nucleotide in the input sequence """ try: gaps = self.junction.count('.') return self.junction_start + self.junction_length - gaps - 1 except (TypeError, AttributeError): return None changeo-1.2.0/changeo/Gene.py0000644000175000017500000005332514062470431015311 0ustar nileshnilesh""" Gene annotations """ # Info __author__ = 'Jason Anthony Vander Heiden' # Imports import re from collections import OrderedDict # Presto and changeo imports from changeo.Defaults import v_attr, d_attr, j_attr, seq_attr # Ig and TCR Regular expressions allele_number_regex = re.compile(r'(?<=\*)([\.\w]+)') allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*[-\*][\.\w]+))') gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/\w]*))') family_regex = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+))') locus_regex = re.compile(r'(IG[HLK]|TR[ABGD])') v_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])V[A-R0-9]+[-/\w]*[-\*][\.\w]+)') d_allele_regex = re.compile(r'((IG[HLK]|TR[ABGD])D[A-R0-9]+[-/\w]*[-\*][\.\w]+)') 
j_allele_regex = 
re.compile(r'((IG[HLK]|TR[ABGD])J[A-R0-9]+[-/\w]*[-\*][\.\w]+)') c_gene_regex = re.compile(r'((IG[HLK]|TR[ABGD])([DMAGEC][P0-9]?[A-Z]?))') def parseGeneCall(gene, regex, action='first'): """ Extract regular expression matches (alleles, genes, families, etc.) from a gene call string Arguments: gene (str): string with gene calls regex (re.Pattern): compiled regular expression to match action (str): action to perform for multiple matches; one of ('first', 'set', 'list'). Returns: str: String of the first match when action is 'first'; tuple: Sorted tuple of unique matches for 'set' or of all matches for 'list'. None is returned if there are no matches or the action is not recognized. """ try: match = [x.group(0) for x in regex.finditer(gene)] except: match = None if action == 'first': return match[0] if match else None elif action == 'set': return tuple(sorted(set(match))) if match else None elif action == 'list': return tuple(sorted(match)) if match else None else: return None def getAllele(gene, action='first'): """ Extract allele from gene call string Arguments: gene (str): string with gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). Returns: str: String of the first allele call when action is 'first'. tuple: Tuple of allele calls for 'set' or 'list' actions. """ return parseGeneCall(gene, allele_regex, action=action) def getGene(gene, action='first'): """ Extract gene from gene call string Arguments: gene (str): string with gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). Returns: str: String of the first gene call when action is 'first'. tuple: Tuple of gene calls for 'set' or 'list' actions. """ return parseGeneCall(gene, gene_regex, action=action) def getFamily(gene, action='first'): """ Extract family from gene call string Arguments: gene (str): string with gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). 
Returns: str: String of the first family call when action is 'first'. tuple: Tuple of family calls for 'set' or 'list' actions. 
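Examples: the sketch below shows how the action argument changes the result. It re-implements parseGeneCall as a hypothetical standalone `parse_gene_call`. The three patterns are ASCII-only approximations of allele_regex, gene_regex and family_regex above, written with character classes instead of escape sequences, so they may differ from the originals on non-ASCII input. The sketch also catches only TypeError, the error re raises when the call string is None, rather than using a bare except.

```python
import re

# ASCII-only approximations of the allele, gene and family patterns
allele_pattern = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/A-Za-z0-9_]*[-*][.A-Za-z0-9_]+))')
gene_pattern = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+[-/A-Za-z0-9_]*))')
family_pattern = re.compile(r'((IG[HLK]|TR[ABGD])([VDJ][A-R0-9]+))')


def parse_gene_call(gene, regex, action='first'):
    # Collect every non-overlapping match; a None call yields no matches
    try:
        match = [x.group(0) for x in regex.finditer(gene)]
    except TypeError:
        match = None
    if action == 'first':
        return match[0] if match else None
    elif action == 'set':
        # Deduplicate, then sort
        return tuple(sorted(set(match))) if match else None
    elif action == 'list':
        # Keep one entry per match, sorted
        return tuple(sorted(match)) if match else None
    return None


calls = 'IGHV3-23*01,IGHV3-30*04'
print(parse_gene_call(calls, allele_pattern))
print(parse_gene_call(calls, gene_pattern, action='set'))
print(parse_gene_call(calls, family_pattern, action='list'))
```

Because 'set' deduplicates before sorting, ambiguous calls that share a family collapse to a single entry. With 'list', the family appears once per matching call.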
""" return parseGeneCall(gene, family_regex, action=action) def getLocus(gene, action='first'): """ Extract locus from gene call string Arguments: gene (str): string with gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). Returns: str: String of the first locus call when action is 'first'. tuple: Tuple of locus calls for 'set' or 'list' actions. """ return parseGeneCall(gene, locus_regex, action=action) def getAlleleNumber(gene, action='first'): """ Extract allele number from gene call string Arguments: gene (str): string with gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). Returns: str: String of the first allele number call when action is 'first'. tuple: Tuple of allele numbers for 'set' or 'list' actions. """ return parseGeneCall(gene, allele_number_regex, action=action) def getCGene(gene, action='first'): """ Extract C-region gene from gene call string Arguments: gene (str): string with C-region gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). Returns: str: String of the first C-region gene call when action is 'first'. tuple: Tuple of gene calls for 'set' or 'list' actions. """ return parseGeneCall(gene, c_gene_regex, action=action) def getVAllele(gene, action='first'): """ Extract V allele gene from gene call string Arguments: gene (str): string with V gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). Returns: str: String of the first V allele call when action is 'first'. tuple: Tuple of V allele calls for 'set' or 'list' actions. """ return parseGeneCall(gene, v_allele_regex, action=action) def getDAllele(gene, action='first'): """ Extract D allele gene from gene call string Arguments: gene (str): string with D gene calls action (str): action to perform for multiple alleles; one of ('first', 'set', 'list'). 
    Returns:
      str: String of the first D allele call when action is 'first'.
      tuple: Tuple of D allele calls for 'set' or 'list' actions.
    """
    return parseGeneCall(gene, d_allele_regex, action=action)


def getJAllele(gene, action='first'):
    """
    Extract J allele from gene call string

    Arguments:
      gene (str): string with J gene calls
      action (str): action to perform for multiple alleles;
                    one of ('first', 'set', 'list').

    Returns:
      str: String of the first J allele call when action is 'first'.
      tuple: Tuple of J allele calls for 'set' or 'list' actions.
    """
    return parseGeneCall(gene, j_allele_regex, action=action)


# TODO: this is not generalized for non-IMGT gapped sequences!
def getVGermline(receptor, references, v_field=v_attr, amino_acid=False):
    """
    Extract V allele and germline sequence

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object
      references (dict): dictionary of germline sequences
      v_field (str): Receptor attribute containing the V allele assignment
      amino_acid (bool): if True then use the amino acid positional fields,
                         otherwise use the nucleotide fields.
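Example: a minimal sketch of the three outcomes of the V lookup in getVGermline, with the 1-based germline start and the length passed in directly instead of being read from the v_germ_start_imgt and v_germ_length_imgt attributes. The function name v_germline and the toy reference dictionary are hypothetical.

```python
def v_germline(v_call, references, start, length, pad_char='N'):
    # No V call: the germline is a run of pad characters
    if v_call is None:
        return pad_char * length
    # Call absent from the reference set: signal failure with None
    if v_call not in references:
        return None
    ref_seq = references[v_call]
    offset = start - 1  # 1-based germline start to 0-based index
    # Pad on the right when the reference ends before the aligned length
    pad = max(length - len(ref_seq[offset:]), 0)
    return ref_seq[offset:offset + length] + pad_char * pad


refs = {'IGHV1-2*02': 'ACGTACGT'}  # toy reference, not a real germline
print(v_germline('IGHV1-2*02', refs, 3, 4))  # GTAC
print(v_germline('IGHV1-2*02', refs, 6, 5))  # CGTNN
print(v_germline(None, refs, 1, 3))          # NNN
print(v_germline('IGHV9-9*01', refs, 1, 3))  # None
```

getJGermline follows the same pattern, while getDGermline returns an empty string for a missing D call and does not pad.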
    Returns:
      tuple: V allele name, V segment germline sequence
    """
    # Extract V allele call
    vgene = receptor.getVAllele(action='first', field=v_field)

    # Get germline start and length
    if not amino_acid:
        pad_char = 'N'
        try: vstart = int(receptor.v_germ_start_imgt) - 1
        except (TypeError, ValueError): vstart = 0
        try: vlen = int(receptor.v_germ_length_imgt)
        except (TypeError, ValueError): vlen = 0
    else:
        pad_char = 'X'
        try: vstart = int(receptor.v_germ_aa_start_imgt) - 1
        except (TypeError, ValueError, AttributeError): vstart = 0
        try: vlen = int(receptor.v_germ_aa_length_imgt)
        except (TypeError, ValueError, AttributeError): vlen = 0

    # Build V segment germline sequence
    if vgene is None:
        germ_vseq = pad_char * vlen
    elif vgene in references:
        vseq = references[vgene]
        vpad = vlen - len(vseq[vstart:])
        if vpad < 0: vpad = 0
        germ_vseq = vseq[vstart:(vstart + vlen)] + (pad_char * vpad)
    else:
        germ_vseq = None

    return vgene, germ_vseq


def getDGermline(receptor, references, d_field=d_attr, amino_acid=False):
    """
    Extract D allele and germline sequence

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object
      references (dict): dictionary of germline sequences
      d_field (str): Receptor attribute containing the D allele assignment
      amino_acid (bool): if True then use the amino acid positional fields,
                         otherwise use the nucleotide fields.
    Returns:
      tuple: D allele name, D segment germline sequence
    """
    # Extract D allele call
    dgene = receptor.getDAllele(action='first', field=d_field)

    # Get germline start and length
    if not amino_acid:
        try: dstart = int(receptor.d_germ_start) - 1
        except (TypeError, ValueError): dstart = 0
        try: dlen = int(receptor.d_germ_length)
        except (TypeError, ValueError): dlen = 0
    else:
        try: dstart = int(receptor.d_germ_aa_start) - 1
        except (TypeError, ValueError, AttributeError): dstart = 0
        try: dlen = int(receptor.d_germ_aa_length)
        except (TypeError, ValueError, AttributeError): dlen = 0

    # Build D segment germline sequence
    if dgene is None:
        germ_dseq = ''
    elif dgene in references:
        # Define D germline sequence
        dseq = references[dgene]
        germ_dseq = dseq[dstart:(dstart + dlen)]
    else:
        germ_dseq = None

    return dgene, germ_dseq


def getJGermline(receptor, references, j_field=j_attr, amino_acid=False):
    """
    Extract J allele and germline sequence

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object
      references (dict): dictionary of germline sequences
      j_field (str): Receptor attribute containing the J allele assignment
      amino_acid (bool): if True then use the amino acid positional fields,
                         otherwise use the nucleotide fields.
    Returns:
      tuple: J allele name, J segment germline sequence
    """
    # Extract J allele call
    jgene = receptor.getJAllele(action='first', field=j_field)

    # Get germline start and length
    if not amino_acid:
        pad_char = 'N'
        try: jstart = int(receptor.j_germ_start) - 1
        except (TypeError, ValueError): jstart = 0
        try: jlen = int(receptor.j_germ_length)
        except (TypeError, ValueError): jlen = 0
    else:
        pad_char = 'X'
        try: jstart = int(receptor.j_germ_aa_start) - 1
        except (TypeError, ValueError, AttributeError): jstart = 0
        try: jlen = int(receptor.j_germ_aa_length)
        except (TypeError, ValueError, AttributeError): jlen = 0

    # Build J segment germline sequence
    if jgene is None:
        germ_jseq = pad_char * jlen
    elif jgene in references:
        jseq = references[jgene]
        jpad = jlen - len(jseq[jstart:])
        if jpad < 0: jpad = 0
        germ_jseq = jseq[jstart:(jstart + jlen)] + (pad_char * jpad)
    else:
        germ_jseq = None

    return jgene, germ_jseq


def stitchVDJ(receptor, v_seq, d_seq, j_seq, amino_acid=False):
    """
    Assemble full length germline sequence

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object
      v_seq (str): V segment sequence as a string
      d_seq (str): D segment sequence as a string
      j_seq (str): J segment sequence as a string
      amino_acid (bool): if True use X for N/P regions and amino acid positional fields,
                         otherwise use N and nucleotide fields.
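Example: a minimal sketch of the concatenation stitchVDJ performs, with the N/P lengths passed in directly rather than read from the np1_length/np2_length (or np1_aa_length/np2_aa_length) attributes. The function name stitch is hypothetical.

```python
def stitch(v_seq, d_seq, j_seq, np1_len, np2_len, amino_acid=False):
    # N/P additions are masked with N for nucleotides or X for amino acids
    np_char = 'X' if amino_acid else 'N'
    return v_seq + np_char * np1_len + d_seq + np_char * np2_len + j_seq


print(stitch('CAGGTG', 'GGTAC', 'TGGGGC', 2, 3))  # CAGGTGNNGGTACNNNTGGGGC
```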
    Returns:
      str: full germline sequence
    """
    # Get N/P lengths
    if not amino_acid:
        np_char = 'N'
        try: np1_len = int(receptor.np1_length)
        except (TypeError, ValueError): np1_len = 0
        try: np2_len = int(receptor.np2_length)
        except (TypeError, ValueError): np2_len = 0
    else:
        np_char = 'X'
        try: np1_len = int(receptor.np1_aa_length)
        except (TypeError, ValueError, AttributeError): np1_len = 0
        try: np2_len = int(receptor.np2_aa_length)
        except (TypeError, ValueError, AttributeError): np2_len = 0

    # Assemble pieces starting with V segment
    sequence = v_seq
    sequence += np_char * np1_len
    sequence += d_seq
    sequence += np_char * np2_len
    sequence += j_seq

    return sequence


def stitchRegions(receptor, v_seq, d_seq, j_seq, amino_acid=False):
    """
    Assemble full length region encoding

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object
      v_seq (str): V segment germline sequence as a string
      d_seq (str): D segment germline sequence as a string
      j_seq (str): J segment germline sequence as a string
      amino_acid (bool): if True use amino acid positional fields,
                         otherwise use nucleotide fields.
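Example: a hedged sketch of the one-letter-per-position region string stitchRegions builds when detailed junction lengths are available (the full_junction branch): V, P, N, P, D, P, N, P, J. The function name region_string and its keyword arguments are hypothetical stand-ins for the p3v_length, n1_length, p5d_length, p3d_length, n2_length and p5j_length attributes.

```python
def region_string(v_len, d_len, j_len, p3v=0, n1=0, p5d=0, p3d=0, n2=0, p5j=0):
    # V segment, then P/N/P additions between V and D
    regions = 'V' * v_len + 'P' * p3v + 'N' * n1 + 'P' * p5d
    # D segment, then P/N/P additions between D and J
    regions += 'D' * d_len + 'P' * p3d + 'N' * n2 + 'P' * p5j
    # J segment
    regions += 'J' * j_len
    return regions


print(region_string(3, 2, 2, p3v=1, n1=2, n2=1))  # VVVPNNDDNJJ
```

Without detailed junction lengths, stitchRegions instead writes a single run of N for each of np1 and np2.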
    Returns:
      str: string defining germline regions
    """
    # Set mode for region definitions
    full_junction = True if getattr(receptor, 'n1_length', None) is not None else False

    # Assemble pieces starting with V segment
    regions = 'V' * len(v_seq)

    # NP nucleotide additions after V
    if amino_acid:
        # NP additions after V
        try: np1_len = int(receptor.np1_aa_length)
        except (TypeError, ValueError, AttributeError): np1_len = 0
        regions += 'N' * np1_len
    elif not full_junction:
        # NP nucleotide additions after V
        try: np1_len = int(receptor.np1_length)
        except (TypeError, ValueError): np1_len = 0
        regions += 'N' * np1_len
    else:
        # P nucleotide additions before N1
        try: p3v_len = int(receptor.p3v_length)
        except (TypeError, ValueError): p3v_len = 0
        # N1 nucleotide additions
        try: n1_len = int(receptor.n1_length)
        except (TypeError, ValueError): n1_len = 0
        # P nucleotide additions before D
        try: p5d_len = int(receptor.p5d_length)
        except (TypeError, ValueError): p5d_len = 0

        # Update regions
        regions += 'P' * p3v_len
        regions += 'N' * n1_len
        regions += 'P' * p5d_len

    # Add D segment
    regions += 'D' * len(d_seq)

    # NP nucleotide additions before J
    if amino_acid:
        # NP additions before J
        try: np2_len = int(receptor.np2_aa_length)
        except (TypeError, ValueError, AttributeError): np2_len = 0
        regions += 'N' * np2_len
    elif not full_junction:
        # NP nucleotide additions before J
        try: np2_len = int(receptor.np2_length)
        except (TypeError, ValueError): np2_len = 0
        regions += 'N' * np2_len
    else:
        # P nucleotide additions after D
        try: p3d_len = int(receptor.p3d_length)
        except (TypeError, ValueError): p3d_len = 0
        # N2 nucleotide additions
        try: n2_len = int(receptor.n2_length)
        except (TypeError, ValueError): n2_len = 0
        # P nucleotide additions before J
        try: p5j_len = int(receptor.p5j_length)
        except (TypeError, ValueError): p5j_len = 0

        # Update regions
        regions += 'P' * p3d_len
        regions += 'N' * n2_len
        regions += 'P' * p5j_len

    # Add J segment
    regions += 'J' * len(j_seq)

    return regions


# TODO: Should do 'first' method for ambiguous V/J groups. And explicit allele extraction.
def buildGermline(receptor, references, seq_field=seq_attr, v_field=v_attr, d_field=d_attr, j_field=j_attr,
                  amino_acid=False):
    """
    Join gapped germline sequences aligned with sample sequences

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object.
      references (dict): dictionary of IMGT gapped germline sequences.
      seq_field (str): Receptor attribute in which to look for sequence.
      v_field (str): Receptor attribute in which to look for V call.
      d_field (str): Receptor attribute in which to look for D call.
      j_field (str): Receptor attribute in which to look for J call.
      amino_acid (bool): if True then use the amino acid positional fields,
                         otherwise use the nucleotide fields.

    Returns:
      tuple: log dictionary, dictionary of {germline_type: germline_sequence},
             dictionary of {segment: gene call}
    """
    # Return objects
    log = OrderedDict()
    germlines = {'full': '', 'dmask': '', 'vonly': '', 'regions': ''}

    # Build V segment germline sequence
    vgene, germ_vseq = getVGermline(receptor, references, v_field=v_field, amino_acid=amino_acid)
    log['V_CALL'] = vgene
    if germ_vseq is None:
        log['ERROR'] = 'Allele %s is not in the provided germline database.' % vgene
        return log, None, None

    # Build D segment germline sequence
    dgene, germ_dseq = getDGermline(receptor, references, d_field=d_field, amino_acid=amino_acid)
    log['D_CALL'] = dgene
    if germ_dseq is None:
        log['ERROR'] = 'Allele %s is not in the provided germline database.' % dgene
        return log, None, None

    # Build J segment germline sequence
    jgene, germ_jseq = getJGermline(receptor, references, j_field=j_field, amino_acid=amino_acid)
    log['J_CALL'] = jgene
    if germ_jseq is None:
        log['ERROR'] = 'Allele %s is not in the provided germline database.' % jgene
        return log, None, None

    # Stitch complete germlines
    germ_seq = stitchVDJ(receptor, germ_vseq, germ_dseq, germ_jseq, amino_acid=amino_acid)
    regions = stitchRegions(receptor, germ_vseq, germ_dseq, germ_jseq, amino_acid=amino_acid)

    # Update log
    log['SEQUENCE'] = receptor.getField(seq_field)
    log['GERMLINE'] = germ_seq
    log['REGIONS'] = regions

    # Check that input and germline sequence match
    if len(receptor.getField(seq_field)) == 0:
        log['ERROR'] = 'Sequence is missing from the %s field' % seq_field
        return log, None, None

    len_check = len(germ_seq) - len(receptor.getField(seq_field))
    if len_check != 0:
        log['ERROR'] = 'Germline sequence differs in length from input sequence by %i characters.' % abs(len_check)
        return log, None, None

    # Define return germlines object
    # The J slice uses an explicit start index: germ_seq[-0:] would return the
    # whole sequence when the J germline is empty.
    pad_char = 'X' if amino_acid else 'N'
    germ_dmask = germ_seq[:len(germ_vseq)] + \
                 pad_char * (len(germ_seq) - len(germ_vseq) - len(germ_jseq)) + \
                 germ_seq[len(germ_seq) - len(germ_jseq):]
    germlines = {'full': germ_seq, 'dmask': germ_dmask, 'vonly': germ_vseq, 'regions': regions}
    for k, v in germlines.items():
        germlines[k] = v.upper()

    # Define return genes object
    genes = {'v': log['V_CALL'],
             'd': log['D_CALL'],
             'j': log['J_CALL']}

    return log, germlines, genes


def buildClonalGermline(receptors, references, seq_field=seq_attr, v_field=v_attr, d_field=d_attr, j_field=j_attr,
                        amino_acid=False):
    """
    Determine consensus clone sequence and create germline for clone

    Arguments:
      receptors (list): list of changeo.Receptor.Receptor objects
      references (dict): dictionary of IMGT gapped germline sequences
      seq_field (str): Receptor attribute in which to look for sequence
      v_field (str): Receptor attribute in which to look for V call
      d_field (str): Receptor attribute in which to look for D call
      j_field (str): Receptor attribute in which to look for J call
      amino_acid (bool): if True then use the amino acid positional fields,
                         otherwise use the nucleotide fields.
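Example: a simplified sketch of how buildClonalGermline chooses the consensus record: the most frequent V and J calls are the consensus calls, the longest sequence carrying both is preferred, any sequence carrying both is the fallback, and ties are broken by sorting on sequence_id. Records are modelled with a hypothetical namedtuple instead of Receptor objects, and allele extraction from the call strings is skipped.

```python
from collections import Counter, namedtuple

# Hypothetical lightweight stand-in for Receptor objects
Rec = namedtuple('Rec', ['sequence_id', 'v_call', 'j_call', 'sequence'])


def pick_consensus(records):
    v_counts = Counter(r.v_call for r in records)
    j_counts = Counter(r.j_call for r in records)
    # Every call tied for the highest count is a consensus call
    v_cons = {k for k, n in v_counts.items() if n == max(v_counts.values())}
    j_cons = {k for k, n in j_counts.items() if n == max(j_counts.values())}
    max_length = max(len(r.sequence) for r in records)

    both = [r for r in records if r.v_call in v_cons and r.j_call in j_cons]
    # Prefer the longest sequences, otherwise fall back to any with both calls
    cons = [r for r in both if len(r.sequence) == max_length] or both
    if not cons:
        return None
    # Resolve ties by alphabetical ordering of sequence_id
    return sorted(cons, key=lambda r: r.sequence_id)[0]


clone = [Rec('seq2', 'IGHV1-2*02', 'IGHJ4*02', 'ACGTAC'),
         Rec('seq1', 'IGHV1-2*02', 'IGHJ4*02', 'ACGTAC'),
         Rec('seq3', 'IGHV3-23*01', 'IGHJ4*02', 'ACGTACGT')]
print(pick_consensus(clone).sequence_id)  # seq1
```

Here seq3 is the longest sequence but does not carry the consensus V call, so the shorter seq1 and seq2 are the candidates and seq1 wins the tie.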
    Returns:
      tuple: log dictionary, dictionary of {germline_type: germline_sequence},
             dictionary of consensus {segment: gene call}
    """
    # Log
    log = OrderedDict()

    # Create dictionaries to count observed V/J calls
    v_dict = OrderedDict()
    j_dict = OrderedDict()

    # Amino acid settings
    pad_char = 'X' if amino_acid else 'N'

    # Find longest sequence in clone
    max_length = 0
    for rec in receptors:
        v = rec.getVAllele(action='first', field=v_field)
        v_dict[v] = v_dict.get(v, 0) + 1
        j = rec.getJAllele(action='first', field=j_field)
        j_dict[j] = j_dict.get(j, 0) + 1
        seq_len = len(rec.getField(seq_field))
        if seq_len > max_length:
            max_length = seq_len

    # Consensus V and J having most observations
    v_cons = [k for k in list(v_dict.keys()) if v_dict[k] == max(v_dict.values())]
    j_cons = [k for k in list(j_dict.keys()) if j_dict[k] == max(j_dict.values())]

    # Consensus sequence(s) with consensus V/J calls and longest sequence
    cons = [x for x in receptors if x.getVAllele(action='first', field=v_field) in v_cons and \
                                    x.getJAllele(action='first', field=j_field) in j_cons and \
                                    len(x.getField(seq_field)) == max_length]

    # Consensus sequence(s) with consensus V/J calls but not the longest sequence
    if not cons:
        cons = [x for x in receptors if x.getVAllele(action='first', field=v_field) in v_cons and \
                                        x.getJAllele(action='first', field=j_field) in j_cons]

    # Return without germline if no sequence has both consensus V and J call
    if not cons:
        log['V_CALL'] = ','.join(v_cons)
        log['J_CALL'] = ','.join(j_cons)
        log['ERROR'] = 'No sequence found with both consensus V and J calls.'
        return log, None, None

    # Select consensus Receptor, resolving ties by alphabetical ordering of sequence id.
    cons = sorted(cons, key=lambda x: x.sequence_id)[0]

    # Pad end of consensus sequence with gaps to make it the max length
    gap_length = max_length - len(cons.getField(seq_field))
    if gap_length > 0:
        if amino_acid:
            cons.j_germ_aa_length = int(cons.j_germ_aa_length or 0) + gap_length
        else:
            cons.j_germ_length = int(cons.j_germ_length or 0) + gap_length
        cons.setField(seq_field, cons.getField(seq_field) + (pad_char * gap_length))

    # Update lengths padded to longest sequence in clone
    for rec in receptors:
        x = max_length - len(rec.getField(seq_field))
        if amino_acid:
            rec.j_germ_aa_length = int(rec.j_germ_aa_length or 0) + x
        else:
            rec.j_germ_length = int(rec.j_germ_length or 0) + x
        rec.setField(seq_field, rec.getField(seq_field) + (pad_char * x))

    # Stitch consensus germline
    cons_log, germlines, genes = buildGermline(cons, references, seq_field=seq_field, v_field=v_field,
                                               d_field=d_field, j_field=j_field, amino_acid=amino_acid)

    # Update log
    log['CONSENSUS'] = cons.sequence_id
    log.update(cons_log)

    # Return log
    return log, germlines, genes
changeo-1.2.0/setup.py
#!/usr/bin/env python3
"""
Change-O setup
"""
# Imports
import os
import sys

# Check setup requirements
if sys.version_info < (3,4,0):
    sys.exit('At least Python 3.4.0 is required.\n')

try:
    from setuptools import setup
except ImportError:
    sys.exit('Please install setuptools before installing changeo.\n')

# Get version, author and license information
info_file = os.path.join('changeo', 'Version.py')
__version__, __author__, __license__ = None, None, None
try:
    exec(open(info_file).read())
except:
    sys.exit('Failed to load package information from %s.\n' % info_file)

if __version__ is None:
    sys.exit('Missing version information in %s.\n' % info_file)
if __author__ is None:
    sys.exit('Missing author information in %s.\n' % info_file)
if __license__ is None:
    sys.exit('Missing license information in %s.\n' % info_file)

# Define installation path for commandline tools
scripts = ['AlignRecords.py',
           'AssignGenes.py',
           'BuildTrees.py',
           'ConvertDb.py',
           'CreateGermlines.py',
           'DefineClones.py',
           'MakeDb.py',
           'ParseDb.py']
install_scripts = [os.path.join('bin', s) for s in scripts]

# Load long package description
desc_files = ['README.rst']
long_description = '\n\n'.join([open(f, 'r').read() for f in desc_files])

# Parse requirements
if os.environ.get('READTHEDOCS', None) == 'True':
    # Set empty install_requires to get install to work on readthedocs
    install_requires = []
else:
    with open('requirements.txt') as req:
        install_requires = req.read().splitlines()

# Setup
setup(name='changeo',
      version=__version__,
      author=__author__,
      author_email='immcantation@googlegroups.com',
      description='A bioinformatics toolkit for processing high-throughput lymphocyte receptor sequencing data.',
      long_description=long_description,
      zip_safe=False,
      license=__license__,
      url='http://changeo.readthedocs.io',
      download_url='https://bitbucket.org/kleinstein/changeo/downloads',
      keywords=['bioinformatics', 'sequencing', 'immunology', 'adaptive immunity',
                'immunoglobulin', 'AIRR-seq', 'Rep-Seq', 'B cell repertoire analysis',
                'adaptive immune receptor repertoires'],
      install_requires=install_requires,
      packages=['changeo'],
      package_dir={'changeo': 'changeo'},
      package_data={'changeo': ['data/*_dist.tsv']},
      scripts=install_scripts,
      classifiers=['Development Status :: 4 - Beta',
                   'Environment :: Console',
                   'Intended Audience :: Science/Research',
                   'Natural Language :: English',
                   'Operating System :: OS Independent',
                   'Programming Language :: Python :: 3.4',
                   'Topic :: Scientific/Engineering :: Bio-Informatics'])
changeo-1.2.0/bin/CreateGermlines.py
#!/usr/bin/env python3
"""
Reconstructs germline sequences from alignment data
"""

# Info
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
from changeo import __version__, __date__

# Imports
import os
from argparse import ArgumentParser
from collections import OrderedDict
from itertools import groupby
from textwrap import dedent
from time import time

# Presto and changeo imports
from presto.Defaults import default_out_args
from presto.IO import printLog, printMessage, printProgress, printError, printWarning
from changeo.Defaults import default_v_field, default_d_field, default_j_field, default_clone_field, \
                             default_seq_field, default_format
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs, \
                                setDefaultFields
from changeo.Gene import buildGermline, buildClonalGermline
from changeo.IO import countDbFile, getDbFields, getFormatOperators, getOutputHandle, readGermlines, \
                       checkFields

# Defaults
default_germ_types = ['dmask']


def createGermlines(db_file, references, seq_field=default_seq_field, v_field=default_v_field,
                    d_field=default_d_field, j_field=default_j_field,
                    cloned=False, clone_field=default_clone_field, germ_types=default_germ_types,
                    format=default_format, out_file=None, out_args=default_out_args):
    """
    Write germline sequences to tab-delimited database file

    Arguments:
      db_file : input tab-delimited database file.
      references : folders and/or files containing germline repertoire data in FASTA format.
      seq_field : field in which to look for sequence.
      v_field : field in which to look for V call.
      d_field : field in which to look for D call.
      j_field : field in which to look for J call.
      cloned : if True build germlines by clone, otherwise build individual germlines.
      clone_field : field containing clone identifiers; ignored if cloned=False.
      germ_types : list of germline sequence types to be output from the set of
                   'full', 'dmask', 'vonly', 'regions'.
      format : input and output format.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : arguments for output preferences.
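Example: a small sketch of how createGermlines derives the germline field names from the suffix of seq_field and the requested germ_types; the real function then passes these names through schema.fromReceptor to obtain the output columns. The helper name germline_field_names is hypothetical.

```python
from collections import OrderedDict


def germline_field_names(seq_field, germ_types, cloned=False):
    # The suffix after the last underscore names the sequence type, e.g. 'alignment'
    seq_type = seq_field.split('_')[-1]
    fields = OrderedDict()
    if 'full' in germ_types:
        fields['full'] = 'germline_' + seq_type
    if 'dmask' in germ_types:
        fields['dmask'] = 'germline_' + seq_type + '_d_mask'
    if 'vonly' in germ_types:
        fields['vonly'] = 'germline_' + seq_type + '_v_region'
    if 'regions' in germ_types:
        fields['regions'] = 'germline_regions'
    if cloned:
        # Clone-level gene calls are only reported when cloned=True
        fields['v'] = 'germline_v_call'
        fields['d'] = 'germline_d_call'
        fields['j'] = 'germline_j_call'
    return fields


print(list(germline_field_names('sequence_alignment', ['full', 'dmask']).values()))
# ['germline_alignment', 'germline_alignment_d_mask']
```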
    Returns:
      dict: names of the 'pass' and 'fail' output files.
    """
    # Print parameter info
    log = OrderedDict()
    log['START'] = 'CreateGermlines'
    log['FILE'] = os.path.basename(db_file)
    log['GERM_TYPES'] = ','.join(germ_types)
    log['SEQ_FIELD'] = seq_field
    log['V_FIELD'] = v_field
    log['D_FIELD'] = d_field
    log['J_FIELD'] = j_field
    log['CLONED'] = cloned
    if cloned:  log['CLONE_FIELD'] = clone_field
    printLog(log)

    # Define format operators
    try:
        reader, writer, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s' % format)
    out_args['out_type'] = schema.out_type

    # TODO: this won't work for AIRR necessarily
    # Define output germline fields
    germline_fields = OrderedDict()
    seq_type = seq_field.split('_')[-1]
    if 'full' in germ_types:  germline_fields['full'] = 'germline_' + seq_type
    if 'dmask' in germ_types:  germline_fields['dmask'] = 'germline_' + seq_type + '_d_mask'
    if 'vonly' in germ_types:  germline_fields['vonly'] = 'germline_' + seq_type + '_v_region'
    if 'regions' in germ_types:  germline_fields['regions'] = 'germline_regions'
    if cloned:
        germline_fields['v'] = 'germline_v_call'
        germline_fields['d'] = 'germline_d_call'
        germline_fields['j'] = 'germline_j_call'
    out_fields = getDbFields(db_file,
                             add=[schema.fromReceptor(f) for f in germline_fields.values()],
                             reader=reader)

    # Get repertoire and open Db reader
    reference_dict = readGermlines(references)
    db_handle = open(db_file, 'rt')
    db_iter = reader(db_handle)

    # Check for required columns
    try:
        required = ['v_germ_start_imgt', 'd_germ_start', 'j_germ_start', 'np1_length', 'np2_length']
        checkFields(required, db_iter.fields, schema=schema)
    except LookupError as e:
        printError(e)

    # Check for IMGT-gaps in germlines
    if all('...' not in x for x in reference_dict.values()):
        printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')

    # Count input
    total_count = countDbFile(db_file)

    # Check for existence of fields
    for f in [v_field, d_field, j_field, seq_field]:
        if f not in db_iter.fields:
            printError('%s field does not exist in input database file.' % f)

    # Translate to Receptor attribute names
    v_field = schema.toReceptor(v_field)
    d_field = schema.toReceptor(d_field)
    j_field = schema.toReceptor(j_field)
    seq_field = schema.toReceptor(seq_field)
    clone_field = schema.toReceptor(clone_field)

    # Define Receptor iterator
    if cloned:
        start_time = time()
        printMessage('Sorting by clone', start_time=start_time, width=20)
        sorted_records = sorted(db_iter, key=lambda x: x.getField(clone_field))
        printMessage('Done', start_time=start_time, end=True, width=20)
        receptor_iter = groupby(sorted_records, lambda x: x.getField(clone_field))
    else:
        receptor_iter = ((x.sequence_id, [x]) for x in db_iter)

    # Define log handle
    if out_args['log_file'] is None:
        log_handle = None
    else:
        log_handle = open(out_args['log_file'], 'w')

    # Initialize handles, writers and counters
    pass_handle, pass_writer = None, None
    fail_handle, fail_writer = None, None
    rec_count, pass_count, fail_count = 0, 0, 0
    start_time = time()

    # Iterate over rows
    for key, records in receptor_iter:
        # Print progress
        printProgress(rec_count, total_count, 0.05, start_time=start_time)

        # Define iteration variables
        records = list(records)
        rec_log = OrderedDict([('ID', key)])
        rec_count += len(records)

        # Build germline for records
        if len(records) == 1:
            germ_log, germlines, genes = buildGermline(records[0], reference_dict, seq_field=seq_field,
                                                       v_field=v_field, d_field=d_field, j_field=j_field)
        else:
            germ_log, germlines, genes = buildClonalGermline(records, reference_dict, seq_field=seq_field,
                                                             v_field=v_field, d_field=d_field, j_field=j_field)
        rec_log.update(germ_log)

        # Write row to pass or fail file
        if germlines is not None:
            pass_count += len(records)

            # Add germlines to Receptor record
            annotations = {}
            if 'full' in germ_types:
                annotations[germline_fields['full']] = germlines['full']
            if 'dmask' in germ_types:
                annotations[germline_fields['dmask']] = germlines['dmask']
            if 'vonly' in germ_types:
                annotations[germline_fields['vonly']] = germlines['vonly']
            if 'regions' in germ_types:
                annotations[germline_fields['regions']] = germlines['regions']
            if cloned:
                annotations[germline_fields['v']] = genes['v']
                annotations[germline_fields['d']] = genes['d']
                annotations[germline_fields['j']] = genes['j']

            # Write records
            try:
                for r in records:
                    r.setDict(annotations)
                    pass_writer.writeReceptor(r)
            except AttributeError:
                # Create output file handle and writer
                if out_file is not None:
                    pass_handle = open(out_file, 'w')
                else:
                    pass_handle = getOutputHandle(db_file,
                                                  out_label='germ-pass',
                                                  out_dir=out_args['out_dir'],
                                                  out_name=out_args['out_name'],
                                                  out_type=out_args['out_type'])
                pass_writer = writer(pass_handle, fields=out_fields)
                for r in records:
                    r.setDict(annotations)
                    pass_writer.writeReceptor(r)
        else:
            fail_count += len(records)
            if out_args['failed']:
                try:
                    fail_writer.writeReceptor(records)
                except AttributeError:
                    fail_handle = getOutputHandle(db_file,
                                                  out_label='germ-fail',
                                                  out_dir=out_args['out_dir'],
                                                  out_name=out_args['out_name'],
                                                  out_type=out_args['out_type'])
                    fail_writer = writer(fail_handle, fields=out_fields)
                    fail_writer.writeReceptor(records)

        # Write log
        printLog(rec_log, handle=log_handle)

    # Print log
    printProgress(rec_count, total_count, 0.05, start_time=start_time)
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name) if pass_handle is not None else None
    log['RECORDS'] = rec_count
    log['PASS'] = pass_count
    log['FAIL'] = fail_count
    log['END'] = 'CreateGermlines'
    printLog(log)

    # Close file handles
    db_handle.close()
    output = {'pass': None, 'fail': None}
    if pass_handle is not None:
        output['pass'] = pass_handle.name
        pass_handle.close()
    if fail_handle is not None:
        output['fail'] = fail_handle.name
        fail_handle.close()
    if log_handle is not None:
        log_handle.close()

    return output


def getArgParser():
    """
    Defines the ArgumentParser

    Arguments:
      None

    Returns:
      an ArgumentParser object
    """
    # Define input and output field help message
    fields = dedent(
             '''
             output files:
                 germ-pass
                    database with assigned germline sequences.
                 germ-fail
                    database with records failing germline assignment.

             required fields:
                 sequence_id, sequence_alignment, v_call, d_call, j_call,
                 v_sequence_start, v_sequence_end, v_germline_start, v_germline_end,
                 d_sequence_start, d_sequence_end, d_germline_start, d_germline_end,
                 j_sequence_start, j_sequence_end, j_germline_start, j_germline_end,
                 np1_length, np2_length

             optional fields:
                 n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length,
                 clone_id

             output fields:
                 germline_v_call, germline_d_call, germline_j_call,
                 germline_alignment, germline_alignment_d_mask,
                 germline_alignment_v_region, germline_regions
             ''')

    # Define argument parser
    parser = ArgumentParser(description=__doc__, epilog=fields,
                            parents=[getCommonArgParser(format=True)],
                            formatter_class=CommonHelpFormatter, add_help=False)

    # Germlines arguments
    group = parser.add_argument_group('germline construction arguments')
    group.add_argument('-r', nargs='+', action='store', dest='references', required=True,
                       help='''List of folders and/or fasta files (with .fasta, .fna or .fa
                            extension) with germline sequences. When using the default
                            Change-O sequence and coordinate fields, these reference sequences
                            must contain IMGT-numbering spacers (gaps) in the V segment.
                            Alternative numbering schemes, or no numbering, may work for
                            alternative sequence and coordinate definitions that define a
                            valid alignment, but a warning will be issued.''')
    group.add_argument('-g', action='store', dest='germ_types',
                       default=default_germ_types, nargs='+',
                       choices=('full', 'dmask', 'vonly', 'regions'),
                       help='''Specify type(s) of germlines to include: full germline,
                            germline with D segment masked, germline for V segment only,
                            or the germline region annotation string.''')
    group.add_argument('--cloned', action='store_true', dest='cloned',
                       help='''Specify to create only one germline per clone. Note, if allele
                            calls are ambiguous within a clonal group, this will place the
                            germline call used for the entire clone within the
                            germline_v_call, germline_d_call and germline_j_call fields.''')
    group.add_argument('--sf', action='store', dest='seq_field', default=None,
                       help='''Field containing the aligned sequence.
                            Defaults to sequence_alignment (airr) or SEQUENCE_IMGT (changeo).''')
    group.add_argument('--vf', action='store', dest='v_field', default=None,
                       help='''Field containing the germline V segment call.
                            Defaults to v_call (airr) or V_CALL (changeo).''')
    group.add_argument('--df', action='store', dest='d_field', default=None,
                       help='''Field containing the germline D segment call.
                            Defaults to d_call (airr) or D_CALL (changeo).''')
    group.add_argument('--jf', action='store', dest='j_field', default=None,
                       help='''Field containing the germline J segment call.
                            Defaults to j_call (airr) or J_CALL (changeo).''')
    group.add_argument('--cf', action='store', dest='clone_field', default=None,
                       help='''Field containing clone identifiers.
                            Ignored if --cloned is not also specified.
Defaults to clone_id (airr) or CLONE (changeo).''') return parser if __name__ == '__main__': """ Parses command line arguments and calls main """ # Parse command line arguments parser = getArgParser() checkArgs(parser) args = parser.parse_args() args_dict = parseCommonArgs(args) # Set default fields default_fields = {'seq_field': default_seq_field, 'v_field': default_v_field, 'd_field': default_d_field, 'j_field': default_j_field, 'clone_field': default_clone_field} args_dict = setDefaultFields(args_dict, default_fields, format=args_dict['format']) # Check that reference files exist for f in args_dict['references']: if not os.path.exists(f): parser.error('Germline reference file or folder %s does not exist.' % f) # Clean arguments dictionary del args_dict['db_files'] if 'out_files' in args_dict: del args_dict['out_files'] # Call main function for each input file for i, f in enumerate(args.__dict__['db_files']): args_dict['db_file'] = f args_dict['out_file'] = args.__dict__['out_files'][i] \ if args.__dict__['out_files'] else None createGermlines(**args_dict) changeo-1.2.0/bin/AlignRecords.py0000755000175000017500000004552214001402022016136 0ustar nileshnilesh#!/usr/bin/env python3 """ Multiple aligns sequence fields """ # Info __author__ = 'Jason Anthony Vander Heiden' from changeo import __version__, __date__ # Imports import os import shutil from argparse import ArgumentParser from collections import OrderedDict from itertools import chain from textwrap import dedent from Bio.SeqRecord import SeqRecord # Presto and changeo import from presto.Defaults import default_out_args, default_muscle_exec from presto.Applications import runMuscle from presto.IO import printLog, printError, printWarning from presto.Multiprocessing import manageProcesses from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs from changeo.IO import getDbFields, getFormatOperators from changeo.Multiprocessing import DbResult, feedDbQueue, 
processDbQueue, collectDbQueue # TODO: maybe not bothering with 'set' is best. can just work off field identity def groupRecords(records, fields=None, calls=['v', 'j'], mode='gene', action='first'): """ Groups Receptor objects based on gene or annotation Arguments: records : an iterator of Receptor objects to group. fields : gene field to group by. calls : allele calls to use for grouping. one or more of ('v', 'd', 'j'). mode : specificity of alignment call to use for allele call fields. one of ('allele', 'gene'). action : only 'first' is currently supported. Returns: dictionary of grouped records """ # Define functions for grouping keys if mode == 'allele' and fields is None: def _get_key(rec, calls, action): return tuple(rec.getAlleleCalls(calls, action)) elif mode == 'gene' and fields is None: def _get_key(rec, calls, action): return tuple(rec.getGeneCalls(calls, action)) elif mode == 'allele' and fields is not None: def _get_key(rec, calls, action): vdj = rec.getAlleleCalls(calls, action) ann = [rec.getChangeo(k) for k in fields] return tuple(chain(vdj, ann)) elif mode == 'gene' and fields is not None: def _get_key(rec, calls, action): vdj = rec.getGeneCalls(calls, action) ann = [rec.getChangeo(k) for k in fields] return tuple(chain(vdj, ann)) rec_index = {} for rec in records: key = _get_key(rec, calls, action) # Assigned grouped records to individual keys and all failed to a single key if all([k is not None for k in key]): rec_index.setdefault(key, []).append(rec) else: rec_index.setdefault(None, []).append(rec) return rec_index def alignBlocks(data, field_map, muscle_exec=default_muscle_exec): """ Multiple aligns blocks of sequence fields together Arguments: data : DbData object with Receptor objects to process. field_map : a dictionary of {input sequence : output sequence) field names to multiple align. muscle_exec : the MUSCLE executable. Returns: changeo.Multiprocessing.DbResult : object containing Receptor objects with multiple aligned sequence fields. 
    """
    # Define sequence fields
    seq_fields = list(field_map.keys())

    # Function to validate record
    def _pass(rec):
        if all([len(rec.getField(f)) > 0 for f in seq_fields]):
            return True
        else:
            return False

    # Define return object
    result = DbResult(data.id, data.data)
    result.results = data.data
    result.valid = True

    # Fail invalid groups
    if result.id is None or not all([_pass(x) for x in data.data]):
        result.log = None
        result.valid = False
        return result

    # Run muscle and map results
    seq_list = [SeqRecord(r.getSeq(f), id='%s_%s' % (r.sequence_id.replace(' ', '_'), f))
                for f in seq_fields for r in data.data]
    seq_aln = runMuscle(seq_list, aligner_exec=muscle_exec)
    if seq_aln is not None:
        aln_map = {x.id: i for i, x in enumerate(seq_aln)}
        for i, r in enumerate(result.results, start=1):
            for f in seq_fields:
                idx = aln_map['%s_%s' % (r.sequence_id.replace(' ', '_'), f)]
                seq = str(seq_aln[idx].seq)
                r.annotations[field_map[f]] = seq
                result.log['%s-%s' % (f, r.sequence_id)] = seq
    else:
        result.valid = False

    return result


def alignAcross(data, field_map, muscle_exec=default_muscle_exec):
    """
    Multiple aligns sequence fields column wise

    Arguments:
      data : DbData object with Receptor objects to process.
      field_map : a dictionary of {input sequence : output sequence} field names to multiple align.
      muscle_exec : the MUSCLE executable.

    Returns:
      changeo.Multiprocessing.DbResult : object containing Receptor objects with multiple aligned sequence fields.
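
    Example (illustrative sketch only; records are plain dicts and pad_align is a
    hypothetical stand-in that right-pads with gaps instead of running MUSCLE): alignAcross
    runs one alignment per field across all records in the group, so every record's
    cdr3_align has the same width, independently of the width of its fwr3_align.

```python
def pad_align(seqs):
    # Stand-in for MUSCLE: right-pad every sequence with '-' to a common width
    width = max(len(s) for s in seqs)
    return [s.ljust(width, '-') for s in seqs]


records = [{'sequence_id': 's1', 'cdr3': 'GCGAGA', 'fwr3': 'TATTAC'},
           {'sequence_id': 's2', 'cdr3': 'GCGAGAGAT', 'fwr3': 'TAC'}]
field_map = {'cdr3': 'cdr3_align', 'fwr3': 'fwr3_align'}

# One alignment per field (column), across every record (row) in the group.
# The stand-in preserves input order, so zip is safe here; alignAcross itself
# maps aligned sequences back by sequence ID.
for field, out_field in field_map.items():
    aligned = pad_align([r[field] for r in records])
    for r, seq in zip(records, aligned):
        r[out_field] = seq
```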
    """
    # Define sequence fields
    seq_fields = list(field_map.keys())

    # Function to validate record
    def _pass(rec):
        if all([len(rec.getField(f)) > 0 for f in seq_fields]):
            return True
        else:
            return False

    # Define return object
    result = DbResult(data.id, data.data)
    result.results = data.data
    result.valid = True

    # Fail invalid groups
    if result.id is None or not all([_pass(x) for x in data.data]):
        result.log = None
        result.valid = False
        return result

    # Run muscle separately for each field and map results
    for f in seq_fields:
        seq_list = [SeqRecord(r.getSeq(f), id=r.sequence_id.replace(' ', '_')) for r in data.data]
        seq_aln = runMuscle(seq_list, aligner_exec=muscle_exec)
        if seq_aln is not None:
            aln_map = {x.id: i for i, x in enumerate(seq_aln)}
            for i, r in enumerate(result.results, start=1):
                idx = aln_map[r.sequence_id.replace(' ', '_')]
                seq = str(seq_aln[idx].seq)
                r.annotations[field_map[f]] = seq
                result.log['%s-%s' % (f, r.sequence_id)] = seq
        else:
            result.valid = False

    return result


def alignWithin(data, field_map, muscle_exec=default_muscle_exec):
    """
    Multiple aligns sequence fields within a row

    Arguments:
      data : DbData object with Receptor objects to process.
      field_map : a dictionary of {input sequence : output sequence} field names to multiple align.
      muscle_exec : the MUSCLE executable.

    Returns:
      changeo.Multiprocessing.DbResult : object containing Receptor objects with multiple aligned sequence fields.
    """
    # Define sequence fields
    seq_fields = list(field_map.keys())

    # Function to validate record
    def _pass(rec):
        if all([len(rec.getField(f)) > 0 for f in seq_fields]):
            return True
        else:
            return False

    # Define return object
    result = DbResult(data.id, data.data)
    result.results = data.data
    result.valid = True

    # Fail invalid groups
    if result.id is None or not _pass(data.data):
        result.log = None
        result.valid = False
        return result

    record = data.data
    seq_list = [SeqRecord(record.getSeq(f), id=f) for f in seq_fields]
    seq_aln = runMuscle(seq_list, aligner_exec=muscle_exec)
    if seq_aln is not None:
        aln_map = {x.id: i for i, x in enumerate(seq_aln)}
        for f in seq_fields:
            idx = aln_map[f]
            seq = str(seq_aln[idx].seq)
            record.annotations[field_map[f]] = seq
            result.log[f] = seq
    else:
        result.valid = False

    return result


def alignRecords(db_file, seq_fields, group_func, align_func, group_args={}, align_args={},
                 format='changeo', out_file=None, out_args=default_out_args,
                 nproc=None, queue_size=None):
    """
    Performs a multiple alignment on sets of sequences

    Arguments:
      db_file : filename of the input database.
      seq_fields : the sequence fields to multiple align.
      group_func : function to use to group records.
      align_func : function to use to multiple align sequence groups.
      group_args : dictionary of arguments to pass to group_func.
      align_args : dictionary of arguments to pass to align_func.
      format : output format. One of 'changeo' or 'airr'.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.
      nproc : the number of processQueue processes.
              if None defaults to the number of CPUs.
      queue_size : maximum size of the argument queue.
                   if None defaults to 2*nproc.

    Returns:
      dict : names of the 'pass' and 'fail' output files.
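
    Example (illustrative sketch only; group_records and gene_key are hypothetical, and
    gene_key simply strips the allele suffix rather than reproducing Receptor.getGeneCalls):
    with the default group_func, groupRecords, records are grouped by their V and J gene
    calls, and any record with a missing key component is collected under the single key None.

```python
def group_records(records, key_func):
    # Mirrors the grouping loop in groupRecords (collects IDs for readability)
    groups = {}
    for rec in records:
        key = key_func(rec)
        if all(k is not None for k in key):
            groups.setdefault(key, []).append(rec['sequence_id'])
        else:
            groups.setdefault(None, []).append(rec['sequence_id'])
    return groups


def gene_key(rec):
    # Hypothetical gene-level key: drop the allele suffix (e.g. '*02') from each call
    return tuple(rec[c].split('*')[0] if rec[c] else None for c in ('v_call', 'j_call'))


records = [{'sequence_id': 'a', 'v_call': 'IGHV1-2*02', 'j_call': 'IGHJ4*02'},
           {'sequence_id': 'b', 'v_call': 'IGHV1-2*04', 'j_call': 'IGHJ4*01'},
           {'sequence_id': 'c', 'v_call': None, 'j_call': 'IGHJ6*01'}]
groups = group_records(records, gene_key)
```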
    """
    # Define subcommand label dictionary
    cmd_dict = {alignAcross: 'across', alignWithin: 'within', alignBlocks: 'block'}

    # Print parameter info
    log = OrderedDict()
    log['START'] = 'AlignRecords'
    log['COMMAND'] = cmd_dict.get(align_func, align_func.__name__)
    log['FILE'] = os.path.basename(db_file)
    log['SEQ_FIELDS'] = ','.join(seq_fields)
    if 'group_fields' in group_args:
        log['GROUP_FIELDS'] = ','.join(group_args['group_fields'])
    if 'mode' in group_args:
        log['MODE'] = group_args['mode']
    if 'action' in group_args:
        log['ACTION'] = group_args['action']
    log['NPROC'] = nproc
    printLog(log)

    # Define format operators
    try:
        reader, writer, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.' % format)

    # Define feeder function and arguments
    if 'group_fields' in group_args and group_args['group_fields'] is not None:
        group_args['group_fields'] = [schema.toReceptor(f) for f in group_args['group_fields']]
    feed_func = feedDbQueue
    feed_args = {'db_file': db_file,
                 'reader': reader,
                 'group_func': group_func,
                 'group_args': group_args}

    # Define worker function and arguments
    field_map = OrderedDict([(schema.toReceptor(f), '%s_align' % f) for f in seq_fields])
    align_args['field_map'] = field_map
    work_func = processDbQueue
    work_args = {'process_func': align_func,
                 'process_args': align_args}

    # Define collector function and arguments
    out_fields = getDbFields(db_file, add=list(field_map.values()), reader=reader)
    out_args['out_type'] = schema.out_type
    collect_func = collectDbQueue
    collect_args = {'db_file': db_file,
                    'label': 'align',
                    'fields': out_fields,
                    'writer': writer,
                    'out_file': out_file,
                    'out_args': out_args}

    # Call process manager
    result = manageProcesses(feed_func, work_func, collect_func,
                             feed_args, work_args, collect_args,
                             nproc, queue_size)

    # Print log
    result['log']['END'] = 'AlignRecords'
    printLog(result['log'])
    output = {k: v for k, v in result.items() if k in ('pass', 'fail')}

    return output


def getArgParser():
    """
    Defines the ArgumentParser

    Arguments:
      None

    Returns:
      an ArgumentParser object
    """
    # Define output file names and header fields
    fields = dedent(
             '''
             output files:
                 align-pass
                     database with multiple aligned sequences.
                 align-fail
                     database with records failing alignment.

             required fields:
                 sequence_id, v_call, j_call
                 <field>
                     user specified sequence fields to align.

             output fields:
                 <field>_align
             ''')

    # Define ArgumentParser
    parser = ArgumentParser(description=__doc__, epilog=fields,
                            formatter_class=CommonHelpFormatter, add_help=False)
    group_help = parser.add_argument_group('help')
    group_help.add_argument('--version', action='version',
                            version='%(prog)s:' + ' %s %s' %(__version__, __date__))
    group_help.add_argument('-h', '--help', action='help', help='show this help message and exit')
    subparsers = parser.add_subparsers(title='subcommands', dest='command', metavar='',
                                       help='alignment method')
    # TODO: This is a temporary fix for Python issue 9253
    subparsers.required = True

    # Parent parser
    parser_parent = getCommonArgParser(format=True, multiproc=True)

    # Argument parser for column-wise alignment across records
    parser_across = subparsers.add_parser('across', parents=[parser_parent],
                                          formatter_class=CommonHelpFormatter, add_help=False,
                                          help='''Multiple aligns sequence columns within groups
                                               and across rows using MUSCLE.''')
    group_across = parser_across.add_argument_group('alignment arguments')
    group_across.add_argument('--sf', nargs='+', action='store', dest='seq_fields', required=True,
                              help='The sequence fields to multiple align within each group.')
    group_across.add_argument('--gf', nargs='+', action='store', dest='group_fields', default=None,
                              help='Additional (not allele call) fields to use for grouping.')
    group_across.add_argument('--calls', nargs='+', action='store', dest='calls',
                              choices=('v', 'd', 'j'), default=['v', 'j'],
                              help='Segment calls (allele assignments) to use for grouping.')
    group_across.add_argument('--mode', action='store', dest='mode',
                              choices=('allele', 'gene'), default='gene',
                              help='''Specifies
                                   whether to use the V(D)J allele or gene when
                                   an allele call field (--calls) is specified.''')
    group_across.add_argument('--act', action='store', dest='action', default='first',
                              choices=('first', ),
                              help='''Specifies how to handle multiple values within default
                                   allele call fields. Currently, only "first" is supported.''')
    group_across.add_argument('--exec', action='store', dest='muscle_exec',
                              default=default_muscle_exec,
                              help='The location of the MUSCLE executable')
    parser_across.set_defaults(group_func=groupRecords, align_func=alignAcross)

    # Argument parser for alignment of fields within records
    parser_within = subparsers.add_parser('within', parents=[parser_parent],
                                          formatter_class=CommonHelpFormatter, add_help=False,
                                          help='Multiple aligns sequence fields within rows using MUSCLE')
    group_within = parser_within.add_argument_group('alignment arguments')
    group_within.add_argument('--sf', nargs='+', action='store', dest='seq_fields', required=True,
                              help='The sequence fields to multiple align within each record.')
    group_within.add_argument('--exec', action='store', dest='muscle_exec',
                              default=default_muscle_exec,
                              help='The location of the MUSCLE executable')
    parser_within.set_defaults(group_func=None, align_func=alignWithin)

    # Argument parser for block alignment across both columns and rows
    parser_block = subparsers.add_parser('block', parents=[parser_parent],
                                         formatter_class=CommonHelpFormatter, add_help=False,
                                         help='''Multiple aligns sequence groups across both
                                              columns and rows using MUSCLE.''')
    group_block = parser_block.add_argument_group('alignment arguments')
    group_block.add_argument('--sf', nargs='+', action='store', dest='seq_fields', required=True,
                             help='The sequence fields to multiple align within each group.')
    group_block.add_argument('--gf', nargs='+', action='store', dest='group_fields', default=None,
                             help='Additional (not allele call) fields to use for grouping.')
    group_block.add_argument('--calls', nargs='+', action='store', dest='calls',
                             choices=('v', 'd',
                                      'j'), default=['v', 'j'],
                             help='Segment calls (allele assignments) to use for grouping.')
    group_block.add_argument('--mode', action='store', dest='mode',
                             choices=('allele', 'gene'), default='gene',
                             help='''Specifies whether to use the V(D)J allele or gene when
                                  an allele call field (--calls) is specified.''')
    group_block.add_argument('--act', action='store', dest='action', default='first',
                             choices=('first', ),
                             help='''Specifies how to handle multiple values within default
                                  allele call fields. Currently, only "first" is supported.''')
    group_block.add_argument('--exec', action='store', dest='muscle_exec',
                             default=default_muscle_exec,
                             help='The location of the MUSCLE executable')
    parser_block.set_defaults(group_func=groupRecords, align_func=alignBlocks)

    return parser


if __name__ == '__main__':
    """
    Parses command line arguments and calls main function
    """
    # Parse arguments
    parser = getArgParser()
    checkArgs(parser)
    args = parser.parse_args()
    args_dict = parseCommonArgs(args)

    # Check if a valid MUSCLE executable was specified for muscle mode
    if not shutil.which(args.muscle_exec):
        parser.error('%s does not exist or is not executable.'
                     % args.muscle_exec)

    # Define align_args
    args_dict['align_args'] = {'muscle_exec': args_dict['muscle_exec']}
    del args_dict['muscle_exec']

    # Define group_args
    if args_dict['group_func'] is groupRecords:
        args_dict['group_args'] = {'fields': args_dict['group_fields'],
                                   'calls': args_dict['calls'],
                                   'mode': args_dict['mode'],
                                   'action': args_dict['action']}
        del args_dict['group_fields']
        del args_dict['calls']
        del args_dict['mode']
        del args_dict['action']

    # Clean arguments dictionary
    del args_dict['command']
    del args_dict['db_files']
    if 'out_files' in args_dict:
        del args_dict['out_files']

    # Call main function for each input file
    for i, f in enumerate(args.__dict__['db_files']):
        args_dict['db_file'] = f
        args_dict['out_file'] = args.__dict__['out_files'][i] \
            if args.__dict__['out_files'] else None
        alignRecords(**args_dict)
changeo-1.2.0/bin/MakeDb.py
#!/usr/bin/env python3
"""
Create tab-delimited database file to store sequence alignment information
"""

# Info
__author__ = 'Namita Gupta, Jason Anthony Vander Heiden'
from changeo import __version__, __date__

# Imports
import os
import re
import csv
from argparse import ArgumentParser
from collections import OrderedDict
from textwrap import dedent
from time import time
from Bio import SeqIO

# Presto and changeo imports
from presto.Annotation import parseAnnotation
from presto.IO import countSeqFile, printLog, printMessage, printProgress, printError, printWarning, readSeqFile
from changeo.Defaults import default_format, default_out_args, default_imgt_id_len
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs
from changeo.Alignment import RegionDefinition
from changeo.Gene import buildGermline
from changeo.IO import countDbFile, extractIMGT, readGermlines, getFormatOperators, getOutputHandle, \
                       AIRRWriter, ChangeoWriter, IgBLASTReader, IgBLASTReaderAA, IMGTReader, IHMMuneReader
from changeo.Receptor import ChangeoSchema, AIRRSchema

# 10X Receptor attributes
cellranger_base = ['cell', 'c_call', 'conscount', 'umicount']
cellranger_extended = ['cell', 'c_call', 'conscount', 'umicount',
                       'v_call_10x', 'd_call_10x', 'j_call_10x',
                       'junction_10x', 'junction_10x_aa']


def readCellRanger(cellranger_file, fields=cellranger_base):
    """
    Load a Cell Ranger annotation table

    Arguments:
      cellranger_file (str): path to the annotation file.
      fields (list): list of fields to keep.

    Returns:
      dict: dict of dicts with contig_id as the primary key.
    """
    # Mapping of 10X annotations to Receptor attributes
    cellranger_map = {'cell': 'barcode',
                      'c_call': 'c_gene',
                      'locus': 'chain',
                      'conscount': 'reads',
                      'umicount': 'umis',
                      'v_call_10x': 'v_gene',
                      'd_call_10x': 'd_gene',
                      'j_call_10x': 'j_gene',
                      'junction_10x': 'cdr3_nt',
                      'junction_10x_aa': 'cdr3'}

    # Function to parse individual fields
    def _parse(x):
        return '' if x == 'None' else x

    # Generate annotation dictionary
    ann_dict = {}
    with open(cellranger_file) as csv_file:
        # Detect delimiters
        dialect = csv.Sniffer().sniff(csv_file.readline())
        csv_file.seek(0)
        # Read in annotation file
        csv_reader = csv.DictReader(csv_file, dialect=dialect)

        # Generate annotation dictionary
        for row in csv_reader:
            ann_dict[row['contig_id']] = {f: _parse(row[cellranger_map[f]]) for f in fields}

    return ann_dict


def addGermline(receptor, references, amino_acid=False):
    """
    Add full length germline to Receptor object

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object to modify.
      references (dict): dictionary of IMGT-gapped reference sequences.
      amino_acid (bool): if True build amino acid germline, otherwise build nucleotide germline

    Returns:
      changeo.Receptor.Receptor: modified Receptor with the germline sequence added.
    """
    if amino_acid:
        __, germlines, __ = buildGermline(receptor, references, seq_field='sequence_aa_imgt', amino_acid=True)
        germline_seq = None if germlines is None else germlines['full']
        receptor.setField('germline_aa_imgt', germline_seq)
    else:
        __, germlines, __ = buildGermline(receptor, references, amino_acid=False)
        germline_seq = None if germlines is None else germlines['full']
        receptor.setField('germline_imgt', germline_seq)

    return receptor


def getIDforIMGT(seq_file, imgt_id_len=default_imgt_id_len):
    """
    Create a sequence ID translation using IMGT truncation.

    Arguments:
      seq_file : a fasta file of sequences input to IMGT.
      imgt_id_len : maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.

    Returns:
      dict : a dictionary with the IMGT truncated ID as the key and the full sequence description as the value.
    """

    # Map IDs truncated to imgt_id_len characters (with IMGT character replacements) to full descriptions
    ids = {}
    for rec in readSeqFile(seq_file):
        if len(rec.description) <= imgt_id_len:
            id_key = rec.description
        else:  # truncate and replace characters
            if imgt_id_len == 49:  # 28 September 2021 (version 1.8.4)
                id_key = re.sub(r'\s|\t', '_', rec.description[:imgt_id_len])
            else:  # older versions
                id_key = re.sub(r'\||\s|!|&|\*|<|>|\?', '_', rec.description[:imgt_id_len])
        ids.update({id_key: rec.description})

    return ids


def getSeqDict(seq_file):
    """
    Create a dictionary from a sequence file.

    Arguments:
      seq_file : sequence file.

    Returns:
      dict : sequence description as keys with Bio.SeqRecords as values.
    """
    seq_dict = SeqIO.to_dict(readSeqFile(seq_file),
                             key_function=lambda x: x.description)

    return seq_dict


def writeDb(records, fields, aligner_file, total_count, id_dict=None, annotations=None,
            amino_acid=False, partial=False, asis_id=True, regions='default',
            writer=AIRRWriter, out_file=None, out_args=default_out_args):
    """
    Writes parsed records to an output file

    Arguments:
      records : an iterator of Receptor objects containing alignment data.
      fields : a list of ordered field names to write.
      aligner_file : input file name.
      total_count : number of records (for progress bar).
      id_dict : a dictionary of the truncated sequence ID mapped to the full sequence ID.
      annotations : additional annotation dictionary.
      amino_acid : if True do verification on amino acid fields.
      partial : if True put incomplete alignments in the pass file.
      asis_id : if True, keep sequence IDs as-is instead of parsing them as pRESTO-style annotations.
      regions (str): name of the IMGT FWR/CDR region definitions to use.
      writer : writer class.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      dict : names of the 'pass' and 'fail' output files.
    """
    # Wrapper for opening handles and writers
    def _open(x, f, writer=writer, out_file=out_file):
        if out_file is not None and x == 'pass':
            handle = open(out_file, 'w')
        else:
            handle = getOutputHandle(aligner_file,
                                     out_label='db-%s' % x,
                                     out_dir=out_args['out_dir'],
                                     out_name=out_args['out_name'],
                                     out_type=out_args['out_type'])
        return handle, writer(handle, fields=f)

    # Function to convert fasta header annotations to changeo columns
    def _changeo(f, header):
        h = [ChangeoSchema.fromReceptor(x) for x in header if x.upper() not in f]
        f.extend(h)
        return f

    def _airr(f, header):
        h = [AIRRSchema.fromReceptor(x) for x in header if x.lower() not in f]
        f.extend(h)
        return f

    # Function to verify IMGT-gapped sequence and junction concur
    def _imgt_check(rec):
        try:
            if amino_acid:
                rd = RegionDefinition(rec.junction_aa_length, amino_acid=amino_acid, definition=regions)
                x, y = rd.positions['junction']
                check = (rec.junction_aa == rec.sequence_aa_imgt[x:y])
            else:
                rd = RegionDefinition(rec.junction_length, amino_acid=amino_acid, definition=regions)
                x, y = rd.positions['junction']
                check = (rec.junction == rec.sequence_imgt[x:y])
        except (TypeError, AttributeError):
            check = False
        return check

    # Function to check for valid records strictly
    def _strict(rec):
        if amino_acid:
            valid = [rec.v_call and rec.v_call != 'None',
                     rec.j_call and rec.j_call != 'None',
                     rec.functional is not None,
                     rec.sequence_aa_imgt,
                     rec.junction_aa,
                     _imgt_check(rec)]
        else:
            valid = [rec.v_call and rec.v_call != 'None',
                     rec.j_call and rec.j_call != 'None',
                     rec.functional is not None,
                     rec.sequence_imgt,
                     rec.junction,
                     _imgt_check(rec)]
        return all(valid)

    # Function to check for valid records loosely
    def _gentle(rec):
        valid = [rec.v_call and rec.v_call != 'None',
                 rec.d_call and rec.d_call != 'None',
                 rec.j_call and rec.j_call != 'None']
        return any(valid)

    # Set writer class and annotation conversion function
    if writer == ChangeoWriter:
        _annotate = _changeo
    elif writer == AIRRWriter:
        _annotate = _airr
    else:
        printError('Invalid output writer.')

    # Additional annotation (e.g. 10X cell calls)
    # _append_table = None
    # if cellranger_file is not None:
    #     with open(cellranger_file) as csv_file:
    #         # Read in annotation file (use Sniffer to discover file delimiters)
    #         dialect = csv.Sniffer().sniff(csv_file.readline())
    #         csv_file.seek(0)
    #         csv_reader = csv.DictReader(csv_file, dialect = dialect)
    #
    #         # Generate annotation dictionary
    #         anntab_dict = {entry['contig_id']: {cellranger_map[field]: entry[field] \
    #                        for field in cellranger_map.keys()} for entry in csv_reader}
    #
    #     fields = _annotate(fields, cellranger_map.values())
    #     _append_table = lambda sequence_id: anntab_dict[sequence_id]

    # Set pass criteria
    _pass = _gentle if partial else _strict

    # Define log handle
    if out_args['log_file'] is None:
        log_handle = None
    else:
        log_handle = open(out_args['log_file'], 'w')

    # Initialize handles, writers and counters
    pass_handle, pass_writer = None, None
    fail_handle, fail_writer = None, None
    pass_count, fail_count = 0, 0
    start_time = time()

    # Validate and write output
    printProgress(0, total_count, 0.05, start_time=start_time)
    for i, record in enumerate(records, start=1):
        # Replace sequence description with full string, if required
        if id_dict is not None and record.sequence_id in id_dict:
            record.sequence_id = id_dict[record.sequence_id]

        # Parse sequence description into new columns
        if not asis_id:
            try:
                ann_raw = parseAnnotation(record.sequence_id)
                record.sequence_id = ann_raw.pop('ID')

                # Convert to Receptor fields
                ann_parsed = OrderedDict()
                for k, v in ann_raw.items():
                    ann_parsed[ChangeoSchema.toReceptor(k)] = v

                # Add annotations to Receptor and update field list
                record.setDict(ann_parsed, parse=True)
                if i == 1:
                    fields = _annotate(fields, ann_parsed.keys())
            except IndexError:
                # Could not parse pRESTO-style annotations so fall back to no parse
                asis_id = True
                printWarning('Sequence annotation format not recognized. Sequence headers will not be parsed.')

        # Add supplemental annotation fields
        # if _append_table is not None:
        #     record.setDict(_append_table(record.sequence_id), parse=True)
        if annotations is not None:
            record.setDict(annotations[record.sequence_id], parse=True)
            if i == 1:
                fields = _annotate(fields, annotations[record.sequence_id].keys())

        # Count pass or fail and write to appropriate file
        if _pass(record):
            pass_count += 1
            # Write row to pass file
            try:
                pass_writer.writeReceptor(record)
            except AttributeError:
                # Open pass file and writer
                pass_handle, pass_writer = _open('pass', fields)
                pass_writer.writeReceptor(record)
        else:
            fail_count += 1
            # Write row to fail file if specified
            if out_args['failed']:
                try:
                    fail_writer.writeReceptor(record)
                except AttributeError:
                    # Open fail file and writer
                    fail_handle, fail_writer = _open('fail', fields)
                    fail_writer.writeReceptor(record)

        # Write log
        if log_handle is not None:
            log = OrderedDict([('ID', record.sequence_id),
                               ('V_CALL', record.v_call),
                               ('D_CALL', record.d_call),
                               ('J_CALL', record.j_call),
                               ('PRODUCTIVE', record.functional)])
            if not _imgt_check(record) and not amino_acid:
                log['ERROR'] = 'Junction does not match the sequence starting at position 310 in the IMGT numbered V(D)J sequence.'
            printLog(log, log_handle)

        # Print progress
        printProgress(i, total_count, 0.05, start_time=start_time)

    # Print console log
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name) if pass_handle is not None else None
    log['PASS'] = pass_count
    log['FAIL'] = fail_count
    log['END'] = 'MakeDb'
    printLog(log)

    # Close file handles
    output = {'pass': None, 'fail': None}
    if pass_handle is not None:
        output['pass'] = pass_handle.name
        pass_handle.close()
    if fail_handle is not None:
        output['fail'] = fail_handle.name
        fail_handle.close()
    if log_handle is not None:
        log_handle.close()

    return output


def parseIMGT(aligner_file, seq_file=None, repo=None, cellranger_file=None, partial=False, asis_id=True,
              extended=False, format=default_format, out_file=None, out_args=default_out_args,
              imgt_id_len=default_imgt_id_len):
    """
    Main for IMGT aligned sample sequences.

    Arguments:
      aligner_file : zipped file or unzipped folder output by IMGT.
      seq_file : FASTA file input to IMGT (from which to get seqID).
      repo : folder with germline repertoire files.
      cellranger_file : path to a Cell Ranger annotation table (.csv or .tsv); no 10X annotations are added if None.
      partial : If True put incomplete alignments in the pass file.
      asis_id : if True, keep sequence IDs as-is instead of parsing them as pRESTO-style annotations.
      extended : if True add alignment score, FWR, CDR and junction fields to output file.
      format : output format. one of 'changeo' or 'airr'.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.
      imgt_id_len : maximum character length of sequence identifiers reported by IMGT/HighV-QUEST.

    Returns:
      dict : names of the 'pass' and 'fail' output files.
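
    Example (illustrative sketch only; truncate_imgt_id is a hypothetical restatement of the
    rule in getIDforIMGT, written without regular expressions): descriptions longer than
    imgt_id_len are cut to that length. For imgt_id_len 49, which the comment in getIDforIMGT
    ties to IMGT/HighV-QUEST version 1.8.4, only whitespace is replaced with '_'; other lengths
    also replace the characters | ! & * < > ?. MakeDb uses the truncated form as the key back
    to the full description in seq_file.

```python
def truncate_imgt_id(description, imgt_id_len):
    # Short identifiers are kept unchanged
    if len(description) <= imgt_id_len:
        return description
    head = description[:imgt_id_len]
    if imgt_id_len == 49:
        # Newer IMGT/HighV-QUEST: only whitespace becomes an underscore
        return ''.join('_' if c.isspace() else c for c in head)
    # Older IMGT/HighV-QUEST: whitespace and | ! & * < > ? become underscores
    bad = set('|!&*<>?')
    return ''.join('_' if c.isspace() or c in bad else c for c in head)


full = 'A' * 45 + ' B|C*D|E'
ids = {truncate_imgt_id(full, 49): full}
```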
    """
    # Print parameter info
    log = OrderedDict()
    log['START'] = 'MakeDb'
    log['COMMAND'] = 'imgt'
    log['ALIGNER_FILE'] = aligner_file
    log['SEQ_FILE'] = os.path.basename(seq_file) if seq_file else ''
    log['ASIS_ID'] = asis_id
    log['PARTIAL'] = partial
    log['EXTENDED'] = extended
    printLog(log)

    start_time = time()
    printMessage('Loading files', start_time=start_time, width=20)

    # Extract IMGT files
    temp_dir, imgt_files = extractIMGT(aligner_file)

    # Count records in IMGT files
    total_count = countDbFile(imgt_files['summary'])

    # Get (parsed) IDs from fasta file submitted to IMGT
    id_dict = getIDforIMGT(seq_file, imgt_id_len) if seq_file else {}

    # Load supplementary annotation table
    if cellranger_file is not None:
        f = cellranger_extended if extended else cellranger_base
        annotations = readCellRanger(cellranger_file, fields=f)
    else:
        annotations = None

    printMessage('Done', start_time=start_time, end=True, width=20)

    # Define format operators
    try:
        __, writer, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.' % format)
    out_args['out_type'] = schema.out_type

    # Define output fields
    fields = list(schema.required)
    if extended:
        custom = IMGTReader.customFields(scores=True, regions=True, junction=True, schema=schema)
        fields.extend(custom)

    # Parse IMGT output and write db
    with open(imgt_files['summary'], 'r') as summary_handle, \
            open(imgt_files['gapped'], 'r') as gapped_handle, \
            open(imgt_files['ntseq'], 'r') as ntseq_handle, \
            open(imgt_files['junction'], 'r') as junction_handle:

        # Open parser
        parse_iter = IMGTReader(summary_handle, gapped_handle, ntseq_handle, junction_handle)

        # Add germline sequence
        if repo is None:
            germ_iter = parse_iter
        else:
            references = readGermlines(repo)
            # Check for IMGT-gaps in germlines
            if all('...' not in x for x in references.values()):
                printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')
            germ_iter = (addGermline(x, references) for x in parse_iter)

        # Write db
        output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
                         annotations=annotations, id_dict=id_dict, asis_id=asis_id, partial=partial,
                         writer=writer, out_file=out_file, out_args=out_args)

    # Cleanup temp directory
    temp_dir.cleanup()

    return output


def parseIgBLAST(aligner_file, seq_file, repo, amino_acid=False, cellranger_file=None, partial=False,
                 asis_id=True, asis_calls=False, extended=False, regions='default', infer_junction=False,
                 format='changeo', out_file=None, out_args=default_out_args):
    """
    Main for IgBLAST aligned sample sequences.

    Arguments:
      aligner_file (str): IgBLAST output file to process.
      seq_file (str): fasta file input to IgBLAST (from which to get sequence).
      repo (str): folder with germline repertoire files.
      amino_acid (bool): if True then the IgBLAST output files are results from igblastp. igblastn is assumed if False.
      cellranger_file (str): path to a Cell Ranger annotation table (.csv or .tsv); no 10X annotations are added if None.
      partial (bool): if True put incomplete alignments in the pass file.
      asis_id (bool): if True, keep sequence IDs as-is instead of parsing them as pRESTO-style annotations.
      asis_calls (bool): if True do not parse gene calls for allele names.
      extended (bool): if True add alignment scores, FWR regions, and CDR regions to the output.
      regions (str): name of the IMGT FWR/CDR region definitions to use.
      infer_junction (bool): if True, infer the junction sequence, if not reported by IgBLAST.
      format (str): output format. one of 'changeo' or 'airr'.
      out_file (str): output file name. Automatically generated from the input file if None.
      out_args (dict): common output argument dictionary from parseCommonArgs.

    Returns:
      dict : names of the 'pass' and 'fail' output files.
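
    Example (illustrative sketch only; read_10x_table is a hypothetical, trimmed restatement
    of readCellRanger and the table rows are made-up values): cellranger_file is loaded by
    sniffing the delimiter from the header row and keying each row by contig_id, so writeDb
    can look annotations up by a record's sequence_id. Values equal to 'None' are stored as ''.

```python
import csv
import io

# Subset of the readCellRanger field map (Receptor attribute -> Cell Ranger column)
cellranger_map = {'cell': 'barcode', 'c_call': 'c_gene',
                  'conscount': 'reads', 'umicount': 'umis'}


def read_10x_table(handle, fields):
    # Sniff the delimiter (comma or tab) from the header line, then rewind
    dialect = csv.Sniffer().sniff(handle.readline())
    handle.seek(0)
    table = {}
    for row in csv.DictReader(handle, dialect=dialect):
        # Store the literal string 'None' as an empty value
        table[row['contig_id']] = {f: '' if row[cellranger_map[f]] == 'None' else row[cellranger_map[f]]
                                   for f in fields}
    return table


text = '''barcode,contig_id,c_gene,reads,umis
AAACCTG-1,AAACCTG-1_contig_1,IGHM,1520,12
AAACCTG-1,AAACCTG-1_contig_2,None,310,3
'''
annotations = read_10x_table(io.StringIO(text), ['cell', 'c_call', 'conscount', 'umicount'])
```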
    """
    # Print parameter info
    log = OrderedDict()
    log['START'] = 'MakeDb'
    log['COMMAND'] = 'igblast-aa' if amino_acid else 'igblast'
    log['ALIGNER_FILE'] = os.path.basename(aligner_file)
    log['SEQ_FILE'] = os.path.basename(seq_file)
    log['ASIS_ID'] = asis_id
    log['ASIS_CALLS'] = asis_calls
    log['PARTIAL'] = partial
    log['EXTENDED'] = extended
    log['INFER_JUNCTION'] = infer_junction
    printLog(log)

    # Set amino acid conditions
    if amino_acid:
        format = '%s-aa' % format
        parser = IgBLASTReaderAA
    else:
        parser = IgBLASTReader

    # Start
    start_time = time()
    printMessage('Loading files', start_time=start_time, width=20)

    # Count records in sequence file
    total_count = countSeqFile(seq_file)

    # Get input sequence dictionary
    seq_dict = getSeqDict(seq_file)

    # Create germline repo dictionary
    references = readGermlines(repo, asis=asis_calls)

    # Load supplementary annotation table
    if cellranger_file is not None:
        f = cellranger_extended if extended else cellranger_base
        annotations = readCellRanger(cellranger_file, fields=f)
    else:
        annotations = None

    printMessage('Done', start_time=start_time, end=True, width=20)

    # Check for IMGT-gaps in germlines
    if all('...' not in x for x in references.values()):
        printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')

    # Define format operators
    try:
        __, writer, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.'
                   % format)
    out_args['out_type'] = schema.out_type

    # Define output fields
    fields = list(schema.required)
    if extended:
        custom = parser.customFields(schema=schema)
        fields.extend(custom)

    # Parse and write output
    with open(aligner_file, 'r') as f:
        parse_iter = parser(f, seq_dict, references, regions=regions,
                            asis_calls=asis_calls, infer_junction=infer_junction)
        germ_iter = (addGermline(x, references, amino_acid=amino_acid) for x in parse_iter)
        output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
                         annotations=annotations, amino_acid=amino_acid, partial=partial, asis_id=asis_id,
                         regions=regions, writer=writer, out_file=out_file, out_args=out_args)

    return output


def parseIHMM(aligner_file, seq_file, repo, cellranger_file=None, partial=False, asis_id=True,
              extended=False, format=default_format, out_file=None, out_args=default_out_args):
    """
    Main for iHMMuneAlign aligned sample sequences.

    Arguments:
      aligner_file : iHMMune-Align output file to process.
      seq_file : fasta file input to iHMMuneAlign (from which to get sequence).
      repo : folder with germline repertoire files.
      cellranger_file : path to a Cell Ranger annotation table (.csv or .tsv); no 10X annotations are added if None.
      partial : If True put incomplete alignments in the pass file.
      asis_id : if True, keep sequence IDs as-is instead of parsing them as pRESTO-style annotations.
      extended : if True parse alignment scores, FWR and CDR region fields.
      format : output format. One of 'changeo' or 'airr'.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      dict : names of the 'pass' and 'fail' output files.
    """
    # Print parameter info
    log = OrderedDict()
    log['START'] = 'MakeDb'
    log['COMMAND'] = 'ihmm'
    log['ALIGNER_FILE'] = os.path.basename(aligner_file)
    log['SEQ_FILE'] = os.path.basename(seq_file)
    log['ASIS_ID'] = asis_id
    log['PARTIAL'] = partial
    log['EXTENDED'] = extended
    printLog(log)

    start_time = time()
    printMessage('Loading files', start_time=start_time, width=20)

    # Count records in sequence file
    total_count = countSeqFile(seq_file)

    # Get input sequence dictionary
    seq_dict = getSeqDict(seq_file)

    # Create germline repo dictionary
    references = readGermlines(repo)

    # Load supplementary annotation table
    if cellranger_file is not None:
        f = cellranger_extended if extended else cellranger_base
        annotations = readCellRanger(cellranger_file, fields=f)
    else:
        annotations = None

    printMessage('Done', start_time=start_time, end=True, width=20)

    # Check for IMGT-gaps in germlines
    if all('...' not in x for x in references.values()):
        printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')

    # Define format operators
    try:
        __, writer, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.' % format)
    out_args['out_type'] = schema.out_type

    # Define output fields
    fields = list(schema.required)
    if extended:
        custom = IHMMuneReader.customFields(scores=True, regions=True, schema=schema)
        fields.extend(custom)

    # Parse and write output
    with open(aligner_file, 'r') as f:
        parse_iter = IHMMuneReader(f, seq_dict, references)
        germ_iter = (addGermline(x, references) for x in parse_iter)
        output = writeDb(germ_iter, fields=fields, aligner_file=aligner_file, total_count=total_count,
                         annotations=annotations, asis_id=asis_id, partial=partial,
                         writer=writer, out_file=out_file, out_args=out_args)

    return output


def getArgParser():
    """
    Defines the ArgumentParser.

    Returns:
      argparse.ArgumentParser
    """
    fields = dedent(
             '''
              output files:
                  db-pass
                      database of alignment records with functionality information,
                      V and J calls, and a junction region.
                  db-fail
                      database with records that fail due to no productivity information,
                      no V gene assignment, no J gene assignment, or no junction region.

              universal output fields:
                  sequence_id, sequence, sequence_alignment, germline_alignment,
                  rev_comp, productive, stop_codon, vj_in_frame, locus,
                  v_call, d_call, j_call, junction, junction_length, junction_aa,
                  v_sequence_start, v_sequence_end, v_germline_start, v_germline_end,
                  d_sequence_start, d_sequence_end, d_germline_start, d_germline_end,
                  j_sequence_start, j_sequence_end, j_germline_start, j_germline_end,
                  np1_length, np2_length, fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3

              imgt specific output fields:
                  n1_length, n2_length, p3v_length, p5d_length, p3d_length, p5j_length,
                  d_frame, v_score, v_identity, d_score, d_identity, j_score, j_identity

              igblast specific output fields:
                  v_score, v_identity, v_support, v_cigar,
                  d_score, d_identity, d_support, d_cigar,
                  j_score, j_identity, j_support, j_cigar

              ihmm specific output fields:
                  vdj_score

              10X specific output fields:
                  cell_id, c_call, consensus_count, umi_count,
                  v_call_10x, d_call_10x, j_call_10x,
                  junction_10x, junction_10x_aa
              ''')

    # Define ArgumentParser
    parser = ArgumentParser(description=__doc__, epilog=fields,
                            formatter_class=CommonHelpFormatter, add_help=False)
    group_help = parser.add_argument_group('help')
    group_help.add_argument('--version', action='version',
                            version='%(prog)s:' + ' %s %s' %(__version__, __date__))
    group_help.add_argument('-h', '--help', action='help', help='show this help message and exit')
    subparsers = parser.add_subparsers(title='subcommands', dest='command',
                                       help='Aligner used', metavar='')
    # TODO:  This is a temporary fix for Python issue 9253
    subparsers.required = True

    # Parent parser
    parser_parent = getCommonArgParser(db_in=False)

    # igblastn output parser
    parser_igblast = \
        subparsers.add_parser('igblast', parents=[parser_parent],
                              formatter_class=CommonHelpFormatter, add_help=False,
                              help='Process igblastn output.',
                              description='Process igblastn output.')
    group_igblast = parser_igblast.add_argument_group('aligner parsing arguments')
    group_igblast.add_argument('-i', nargs='+', action='store', dest='aligner_files',
                               required=True,
                               help='''IgBLAST output files in format 7 with query sequence
                                    (igblastn argument \'-outfmt "7 std qseq sseq btop"\').''')
    group_igblast.add_argument('-r', nargs='+', action='store', dest='repo', required=True,
                               help='''List of folders and/or fasta files containing
                                    the same germline set used in the IgBLAST alignment. These
                                    reference sequences must contain IMGT-numbering spacers (gaps)
                                    in the V segment.''')
    group_igblast.add_argument('-s', action='store', nargs='+', dest='seq_files',
                               required=True,
                               help='''List of input FASTA files (with .fasta, .fna or .fa
                                    extension) containing sequences.''')
    group_igblast.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
                               help='''Table file containing 10X annotations (with .csv or .tsv
                                    extension).''')
    group_igblast.add_argument('--asis-id', action='store_true', dest='asis_id',
                               help='''Specify to prevent input sequence headers from being parsed
                                    to add new columns to database. Parsing of sequence headers requires
                                    headers to be in the pRESTO annotation format, so this should be specified
                                    when sequence headers are incompatible with the pRESTO annotation scheme.
                                    Note, unrecognized header formats will default to this behavior.''')
    group_igblast.add_argument('--asis-calls', action='store_true', dest='asis_calls',
                               help='''Specify to prevent gene calls from being parsed into standard allele names
                                    in both the IgBLAST output and reference database.
                                    Note, this requires the sequence identifiers in the reference sequence set
                                    and the IgBLAST database to be exact string matches.''')
    group_igblast.add_argument('--partial', action='store_true', dest='partial',
                               help='''If specified, include incomplete V(D)J alignments in
                                    the pass file instead of the fail file. An incomplete alignment
                                    is defined as a record for which a valid IMGT-gapped sequence
                                    cannot be built or that is missing a V gene assignment,
                                    J gene assignment, junction region, or productivity call.''')
    group_igblast.add_argument('--extended', action='store_true', dest='extended',
                               help='''Specify to include additional aligner specific fields in the output.
                                    Adds v_score, v_identity, v_support, v_cigar, d_score, d_identity,
                                    d_support, d_cigar, j_score, j_identity, j_support, j_cigar,
                                    fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.''')
    group_igblast.add_argument('--regions', action='store', dest='regions',
                               choices=('default', 'rhesus-igl'), default='default',
                               help='''IMGT CDR and FWR boundary definition to use.''')
    group_igblast.add_argument('--infer-junction', action='store_true', dest='infer_junction',
                               help='''Infer the junction sequence. For use with IgBLAST v1.6.0 or older,
                                    prior to the addition of IMGT-CDR3 inference.''')
    parser_igblast.set_defaults(func=parseIgBLAST, amino_acid=False)

    # igblastp output parser
    parser_igblast_aa = subparsers.add_parser('igblast-aa', parents=[parser_parent],
                                              formatter_class=CommonHelpFormatter, add_help=False,
                                              help='Process igblastp output.',
                                              description='Process igblastp output.')
    group_igblast_aa = parser_igblast_aa.add_argument_group('aligner parsing arguments')
    group_igblast_aa.add_argument('-i', nargs='+', action='store', dest='aligner_files',
                                  required=True,
                                  help='''IgBLAST output files in format 7 with query sequence
                                       (igblastp argument \'-outfmt "7 std qseq sseq btop"\').''')
    group_igblast_aa.add_argument('-r', nargs='+', action='store', dest='repo', required=True,
                                  help='''List of folders and/or fasta files containing
                                       the same germline set used in the IgBLAST alignment.
                                       These reference sequences must contain IMGT-numbering spacers (gaps)
                                       in the V segment.''')
    group_igblast_aa.add_argument('-s', action='store', nargs='+', dest='seq_files',
                                  required=True,
                                  help='''List of input FASTA files (with .fasta, .fna or .fa
                                       extension) containing sequences.''')
    group_igblast_aa.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
                                  help='''Table file containing 10X annotations (with .csv or .tsv
                                       extension).''')
    group_igblast_aa.add_argument('--asis-id', action='store_true', dest='asis_id',
                                  help='''Specify to prevent input sequence headers from being parsed
                                       to add new columns to database. Parsing of sequence headers requires
                                       headers to be in the pRESTO annotation format, so this should be specified
                                       when sequence headers are incompatible with the pRESTO annotation scheme.
                                       Note, unrecognized header formats will default to this behavior.''')
    group_igblast_aa.add_argument('--asis-calls', action='store_true', dest='asis_calls',
                                  help='''Specify to prevent gene calls from being parsed into standard allele names
                                       in both the IgBLAST output and reference database. Note, this requires
                                       the sequence identifiers in the reference sequence set and the IgBLAST
                                       database to be exact string matches.''')
    group_igblast_aa.add_argument('--extended', action='store_true', dest='extended',
                                  help='''Specify to include additional aligner specific fields in the output.
                                       Adds v_score, v_identity, v_support, v_cigar, fwr1, fwr2, fwr3, cdr1 and cdr2.''')
    group_igblast_aa.add_argument('--regions', action='store', dest='regions',
                                  choices=('default', 'rhesus-igl'), default='default',
                                  help='''IMGT CDR and FWR boundary definition to use.''')
    parser_igblast_aa.set_defaults(func=parseIgBLAST, partial=True, amino_acid=True)

    # IMGT aligner
    parser_imgt = subparsers.add_parser('imgt', parents=[parser_parent],
                                        formatter_class=CommonHelpFormatter, add_help=False,
                                        help='''Process IMGT/HighV-QUEST output
                                             (does not work with V-QUEST).''',
                                        description='''Process IMGT/HighV-QUEST output
                                             (does not work with V-QUEST).''')
    group_imgt = parser_imgt.add_argument_group('aligner parsing arguments')
    group_imgt.add_argument('-i', nargs='+', action='store', dest='aligner_files', required=True,
                            help='''Either zipped IMGT output files (.zip or .txz) or a
                                 folder containing unzipped IMGT output files (which must
                                 include 1_Summary, 2_IMGT-gapped, 3_Nt-sequences,
                                 and 6_Junction).''')
    group_imgt.add_argument('-s', nargs='*', action='store', dest='seq_files', required=False,
                            help='''List of FASTA files (with .fasta, .fna or .fa extension)
                                 that were submitted to IMGT/HighV-QUEST. If unspecified,
                                 sequence identifiers truncated by IMGT/HighV-QUEST
                                 will not be corrected.''')
    group_imgt.add_argument('-r', nargs='+', action='store', dest='repo', required=False,
                            help='''List of folders and/or fasta files containing
                                 the germline sequence set used by IMGT/HighV-QUEST.
                                 These reference sequences must contain IMGT-numbering spacers (gaps)
                                 in the V segment. If unspecified, the germline sequence reconstruction
                                 will not be included in the output.''')
    group_imgt.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
                            help='''Table file containing 10X annotations (with .csv or .tsv
                                 extension).''')
    group_imgt.add_argument('--asis-id', action='store_true', dest='asis_id',
                            help='''Specify to prevent input sequence headers from being parsed
                                 to add new columns to database.
                                 Parsing of sequence headers requires headers to be in the pRESTO
                                 annotation format, so this should be specified when sequence headers
                                 are incompatible with the pRESTO annotation scheme. Note, unrecognized
                                 header formats will default to this behavior.''')
    group_imgt.add_argument('--partial', action='store_true', dest='partial',
                            help='''If specified, include incomplete V(D)J alignments in
                                 the pass file instead of the fail file. An incomplete alignment
                                 is defined as a record that is missing a V gene assignment,
                                 J gene assignment, junction region, or productivity call.''')
    group_imgt.add_argument('--extended', action='store_true', dest='extended',
                            help='''Specify to include additional aligner specific fields in the output.
                                 Adds v_score, v_identity, d_score, d_identity, j_score, j_identity,
                                 fwr1, fwr2, fwr3, fwr4, cdr1, cdr2, cdr3, n1_length, n2_length,
                                 p3v_length, p5d_length, p3d_length, p5j_length and d_frame.''')
    group_imgt.add_argument('--imgt-id-len', action='store', dest='imgt_id_len', type=int,
                            default=default_imgt_id_len,
                            help='''The maximum character length of sequence identifiers reported by
                                 IMGT/HighV-QUEST. Specify 50 if the IMGT files (-i) were generated with
                                 an IMGT/HighV-QUEST version older than 1.8.3 (May 7, 2021).''')
    parser_imgt.set_defaults(func=parseIMGT)

    # iHMMuneAlign Aligner
    parser_ihmm = subparsers.add_parser('ihmm', parents=[parser_parent],
                                        formatter_class=CommonHelpFormatter, add_help=False,
                                        help='Process iHMMune-Align output.',
                                        description='Process iHMMune-Align output.')
    group_ihmm = parser_ihmm.add_argument_group('aligner parsing arguments')
    group_ihmm.add_argument('-i', nargs='+', action='store', dest='aligner_files',
                            required=True,
                            help='''iHMMune-Align output files.''')
    group_ihmm.add_argument('-r', nargs='+', action='store', dest='repo', required=True,
                            help='''List of folders and/or FASTA files containing
                                 the set of germline sequences used by iHMMune-Align.
                                 These reference sequences must contain IMGT-numbering spacers (gaps)
                                 in the V segment.''')
    group_ihmm.add_argument('-s', action='store', nargs='+', dest='seq_files',
                            required=True,
                            help='''List of input FASTA files (with .fasta, .fna or .fa
                                 extension) containing sequences.''')
    group_ihmm.add_argument('--10x', action='store', nargs='+', dest='cellranger_file',
                            help='''Table file containing 10X annotations (with .csv or .tsv
                                 extension).''')
    group_ihmm.add_argument('--asis-id', action='store_true', dest='asis_id',
                            help='''Specify to prevent input sequence headers from being parsed
                                 to add new columns to database. Parsing of sequence headers requires
                                 headers to be in the pRESTO annotation format, so this should be specified
                                 when sequence headers are incompatible with the pRESTO annotation scheme.
                                 Note, unrecognized header formats will default to this behavior.''')
    group_ihmm.add_argument('--partial', action='store_true', dest='partial',
                            help='''If specified, include incomplete V(D)J alignments in
                                 the pass file instead of the fail file. An incomplete alignment
                                 is defined as a record for which a valid IMGT-gapped sequence
                                 cannot be built or that is missing a V gene assignment,
                                 J gene assignment, junction region, or productivity call.''')
    group_ihmm.add_argument('--extended', action='store_true', dest='extended',
                            help='''Specify to include additional aligner specific fields in the output.
                                 Adds the path score of the iHMMune-Align hidden Markov model as vdj_score;
                                 adds fwr1, fwr2, fwr3, fwr4, cdr1, cdr2 and cdr3.''')
    parser_ihmm.set_defaults(func=parseIHMM)

    return parser


if __name__ == "__main__":
    """
    Parses command line arguments and calls main
    """
    parser = getArgParser()
    checkArgs(parser)
    args = parser.parse_args()
    args_dict = parseCommonArgs(args, in_arg='aligner_files')

    # Set no ID parsing if sequence files are not provided
    if 'seq_files' in args_dict and not args_dict['seq_files']:
        args_dict['asis_id'] = True

    # Remove list-valued and dispatch arguments that are not parameters of the subcommand function
    if 'aligner_files' in args_dict: del args_dict['aligner_files']
    if 'seq_files' in args_dict: del args_dict['seq_files']
    if 'out_files' in args_dict: del args_dict['out_files']
    if 'command' in args_dict: del args_dict['command']
    if 'func' in args_dict: del args_dict['func']

    # Call main
    for i, f in enumerate(args.__dict__['aligner_files']):
        args_dict['aligner_file'] = f
        args_dict['seq_file'] = args.__dict__['seq_files'][i] \
                                if args.__dict__['seq_files'] else None
        args_dict['out_file'] = args.__dict__['out_files'][i] \
                                if args.__dict__['out_files'] else None
        args_dict['cellranger_file'] = args.__dict__['cellranger_file'][i] \
                                if args.__dict__['cellranger_file'] else None
        args.func(**args_dict)

changeo-1.2.0/bin/BuildTrees.py
#!/usr/bin/env python3
"""
Converts TSV files into IgPhyML input files
"""

# Info
__author__ = "Kenneth Hoehn"
from changeo import __version__, __date__

# Imports
import os
import random
import subprocess
import multiprocessing as mp
from argparse import ArgumentParser
from collections import OrderedDict
from textwrap import dedent
from time import time
from Bio.Seq import Seq
from functools import partial

# Presto and changeo imports
from presto.Defaults import default_out_args
from presto.IO import printLog, printMessage, printWarning, printError, printDebug
from changeo.Defaults import default_format
from changeo.IO import splitName, \
    getDbFields, getFormatOperators, getOutputHandle, getOutputName
from changeo.Alignment import RegionDefinition
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs


def correctMidCodonStart(scodons, qi, debug):
    """
    Mask a first IMGT codon that begins mid-codon (leading '.' characters) with N, padding the
    input sequence with N when needed to keep the two sequences in frame.

    Arguments:
      scodons (list): list of codons in IMGT sequence.
      qi (str) : input sequence, starting at the V segment alignment start.
      debug (bool) : print debugging statements.

    Returns:
      tuple: (modified input sequence, index of the first non-gap codon in scodons).
    """
    spos = 0
    for i in range(0, len(scodons)):
        printDebug("%s %s" % (scodons[i], qi[0:3]), debug)
        if scodons[i] != "...":
            if scodons[i][0:2] == "..":
                scodons[i] = "NN" + scodons[i][2]
                #sometimes IMGT will just cut off first letter if non-match, at which point we'll just want to mask the
                #first codon in the IMGT seq, other times it will be legitimately absent from the query, at which point
                #we have to shift the frame. This attempts to correct for this by looking at the next codon over in the
                #alignment
                if scodons[i][2:3] != qi[2:3] or scodons[i + 1] != qi[3:6]:
                    qi = "NN" + qi
                spos = i
                break
            elif scodons[i][0] == ".":
                scodons[i] = "N" + scodons[i][1:3]
                if scodons[i][1:3] != qi[1:3] or scodons[i+1] != qi[3:6]:
                    qi = "N" + qi
                spos = i
                break
            else:
                spos = i
                break

    return qi, spos


def checkFrameShifts(receptor, oqpos, ospos, log, debug):
    """
    Checks whether a frameshift occurred in a sequence

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object.
      oqpos (int) : position of interest in input sequence.
      ospos (int) : position of interest in IMGT sequence.
      log (dict) : log of information for each sequence.
      debug (bool) : print debugging statements.

    Returns:
      int: 1 if removing one or two nucleotides downstream of the offending codon
           resolves the mismatch, otherwise 0.
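
Example (a simplified sketch, not changeo's implementation): checkFrameShifts tests whether
deleting one or two nucleotides immediately downstream of a mismatching codon lets the rest of
the input sequence line up with the IMGT sequence. The hypothetical helpers below apply the same
idea to plain strings, assuming both sequences start at codon 0 and are compared codon by codon;
changeo instead re-runs maskSplitCodons on the edited Receptor.

```python
from typing import List


def codons(seq: str) -> List[str]:
    # Split a nucleotide string into consecutive triplets (the last one may be short).
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def insertion_length(query: str, ref: str, pos: int) -> int:
    # pos is the 0-based codon index of the mismatching codon in both sequences.
    # Try deleting 1 or 2 nucleotides right after that codon and return the first
    # length for which every later full codon matches the reference; 0 if none does.
    site = pos * 3 + 3
    for ins in (1, 2):
        q = codons(query[:site] + query[site + ins:])
        r = codons(ref)
        n = min(len(q), len(r))
        pairs = [(q[i], r[i]) for i in range(pos + 1, n)
                 if len(q[i]) == 3 and len(r[i]) == 3]
        if pairs and all(a == b for a, b in pairs):
            return ins
    return 0


ref = "ATGGCCAAATTTGGG"
print(insertion_length("ATGGCTCAAATTTGGG", ref, 1))  # one extra C after codon 1
print(insertion_length("ATGGCTAAATTTGGG", ref, 1))   # point mutation only
```

The first call reports a one-nucleotide insertion and the second reports none, because a plain
substitution leaves the downstream codons in frame.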
    """
    frameshifts = 0
    for ins in range(1, 3):
        ros = receptor.sequence_input
        ris = receptor.sequence_imgt
        psite = receptor.v_seq_start - 1 + oqpos*3
        pisite = ospos * 3
        if (psite + 3 + ins) < len(ros) and (pisite + 3) < len(ris):
            #cut out 1 or 2 nucleotides downstream of offending codon
            receptor.sequence_input = ros[0:(psite + 3)] + ros[(psite + 3 + ins):]
            receptor.sequence_imgt = ris[0:(pisite + 3)] + ris[(pisite + 3):]

            # Debug sequence modifications
            printDebug(ros, debug)
            printDebug(receptor.sequence_input, debug)
            printDebug(ris, debug)
            printDebug(receptor.sequence_imgt, debug)
            printDebug("RUNNING %d" % ins, debug)

            mout = maskSplitCodons(receptor, recursive=True)
            if mout[1]["PASS"]:
                #if debug:
                receptor.sequence_input = ros
                receptor.sequence_imgt = ris
                frameshifts += 1
                printDebug("FRAMESHIFT of length %d!" % ins, debug)
                log["FAIL"] = "SINGLE FRAME-SHIFTING INSERTION"
                break
            else:
                receptor.sequence_input = ros
                receptor.sequence_imgt = ris

    return frameshifts


def findAndMask(receptor, scodons, qcodons, spos, s_end, qpos, log, debug, recursive=False):
    """
    Find and mask split codons

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object.
      scodons (list): list of codons in IMGT sequence
      qcodons (list): list of codons in input sequence
      spos (int): starting position of IMGT sequence in input sequence
      s_end (int): end of IMGT sequence
      qpos (int): starting position of input sequence in IMGT sequence
      log (dict): log of information for each sequence
      debug (bool): print debugging statements?
      recursive (bool): was this function called recursively?
    """
    frameshifts = 0
    while spos < s_end and qpos < len(qcodons):
        if debug:
            print(scodons[spos] + "\t" + qcodons[qpos])

        if scodons[spos] == "..." \
                and qcodons[qpos] != "...":
            #if IMGT gap, move forward in imgt
            spos += 1
        elif scodons[spos] == qcodons[qpos]:
            # if both are the same, move both forward
            spos += 1
            qpos += 1
        elif qcodons[qpos] == "N":
            # possible that SEQ-IMGT ends on a bunch of Ns
            qpos += 1
            spos += 1
        else:
            # if not the same, mask IMGT at that site and scan forward until you find a codon that matches next site
            if debug:
                print("checking %s at position %d %d" % (scodons[spos], spos, qpos))
            ospos=spos
            oqpos=qpos
            spos += 1
            qpos += 1
            while spos < s_end and scodons[spos] == "...":
                #possible next codon is just a gap
                spos += 1
            while qpos < len(qcodons) and spos < s_end and scodons[spos] != qcodons[qpos]:
                printDebug("Checking " + scodons[spos]+ "\t" + qcodons[qpos], debug)
                qpos += 1
            if qcodons[qpos-1] == scodons[ospos]:
                #if codon in previous position is equal to original codon, it was preserved
                qpos -= 1
                spos = ospos
                printDebug("But codon was apparently preserved", debug)
                if "IN-FRAME" in log:
                    log["IN-FRAME"] = log["IN-FRAME"] + "," + str(spos)
                else:
                    log["IN-FRAME"] = str(spos)
            elif qpos >= len(qcodons) and spos < s_end:
                printDebug("FAILING MATCH", debug)
                log["PASS"] = False
                #if no match for the adjacent codon was found, something's up.
                log["FAIL"] = "FAILED_MATCH_QSTRING:"+str(spos)
                #figure out if this was due to a frame-shift by repeating this method but with an edited input sequence
                if not recursive:
                    frameshifts += checkFrameShifts(receptor, oqpos, ospos, log, debug)
            elif spos >= s_end or qcodons[qpos] != scodons[spos]:
                scodons[ospos] = "NNN"
                if spos >= s_end:
                    printDebug("Masked %s at position %d, at end of subject sequence" % (scodons[ospos], ospos), debug)
                    if "END-MASKED" in log:
                        log["END-MASKED"] = log["END-MASKED"] + "," + str(spos)
                    else:
                        log["END-MASKED"] = str(spos)
                else:
                    printDebug("Masked %s at position %d, but couldn't find upstream match" % (scodons[ospos], ospos), debug)
                    log["PASS"]=False
                    log["FAIL"]="FAILED_MATCH:"+str(spos)
            elif qcodons[qpos] == scodons[spos]:
                printDebug("Masked %s at position %d" % (scodons[ospos], ospos), debug)
                scodons[ospos] = "NNN"
                if "MASKED" in log:
                    log["MASKED"] = log["MASKED"] + "," + str(spos)
                else:
                    log["MASKED"] = str(spos)
            else:
                log["PASS"] = False
                log["FAIL"] = "UNKNOWN"


def maskSplitCodons(receptor, recursive=False, mask=True):
    """
    Identify and mask codons split by the alignment to the IMGT reference.

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object.
      recursive (bool) : was this method part of a recursive call?
      mask (bool) : mask split codons for use with igphyml?

    Returns:
      str: modified IMGT gapped sequence.
      log: dict of sequence information
    """
    debug = False
    qi = receptor.sequence_input
    si = receptor.sequence_imgt
    log = OrderedDict()
    log["ID"]=receptor.sequence_id
    log["CLONE"]=receptor.clone
    log["PASS"] = True
    if debug:
        print(receptor.sequence_id)

    # adjust starting position of query sequence
    qi = qi[(receptor.v_seq_start - 1):]

    #tally where --- gaps are in IMGT sequence and remove them for now
    gaps = []
    ndotgaps = []
    nsi = ""
    for i in range(0,len(si)):
        if si[i] == "-":
            gaps.append(1)
            ndotgaps.append(1)
        else:
            gaps.append(0)
            nsi = nsi + si[i]
            if si[i] != ".":
                ndotgaps.append(0)

    #find any gaps not divisible by three
    curgap = 0
    for i in ndotgaps:
        if i == 1:
            curgap += 1
        elif i == 0 and curgap != 0:
            if curgap % 3 != 0 :
                printDebug("Frame-shifting gap detected! Refusing to include sequence.", debug)
                log["PASS"] = False
                log["FAIL"] = "FRAME-SHIFTING DELETION"
                log["SEQ_IN"] = receptor.sequence_input
                log["SEQ_IMGT"] = receptor.sequence_imgt
                log["SEQ_MASKED"] = receptor.sequence_imgt
                return receptor.sequence_imgt, log
            else:
                curgap = 0

    si = nsi

    scodons = [si[i:i + 3] for i in range(0, len(si), 3)]

    # deal with the fact that it's possible to start mid-codon
    qi,spos = correctMidCodonStart(scodons, qi, debug)

    qcodons = [qi[i:i + 3] for i in range(0, len(qi), 3)]
    frameshifts = 0
    s_end = 0 #adjust for the fact that IMGT sequences can end on gaps
    for i in range(spos, len(scodons)):
        if scodons[i] != "..." and len(scodons[i]) == 3 and scodons[i] != "NNN":
            s_end = i
    printDebug("%i:%i:%s" % (s_end, len(scodons), scodons[s_end]), debug)
    s_end += 1
    qpos = 0

    if mask:
        findAndMask(receptor, scodons, qcodons, spos, s_end, qpos, log, debug, recursive)

    if not log["PASS"] and not recursive:
        log["FRAMESHIFTS"] = frameshifts

    if len(scodons[-1]) != 3:
        if scodons[-1] == ".." or scodons[-1] == ".":
            scodons[-1] = "..."
        else:
            scodons[-1] = "NNN"
        if "END-MASKED" in log:
            log["END-MASKED"] = log["END-MASKED"] + "," + str(len(scodons))
        else:
            log["END-MASKED"] = str(spos)

    concatenated_seq = Seq("")
    for i in scodons:
        concatenated_seq += i

    # add --- gaps back to IMGT sequence
    ncon_seq = ""
    counter = 0
    for i in gaps:
        #print(str(i) + ":" + ncon_seq)
        if i == 1:
            ncon_seq = ncon_seq + "."
        elif i == 0:
            ncon_seq = ncon_seq + concatenated_seq[counter]
            counter += 1
    ncon_seq = ncon_seq + concatenated_seq[counter:]
    concatenated_seq = ncon_seq
    log["SEQ_IN"] = receptor.sequence_input
    log["SEQ_IMGT"] = receptor.sequence_imgt
    log["SEQ_MASKED"] = concatenated_seq

    return concatenated_seq, log


def unAmbigDist(seq1, seq2, fbreak=False):
    """
    Calculate the distance between two sequences, ignoring positions where either
    sequence has an N, '-' or '.'

    Arguments:
      seq1 (str): sequence 1
      seq2 (str): sequence 2
      fbreak (bool): stop counting after the first difference found?

    Returns:
      int: number of differences at unambiguous positions.
    """
    if len(seq1) != len(seq2):
        printError("Sequences are not the same length! %s %s" % (seq1, seq2))

    dist = 0
    for i in range(0,len(seq1)):
        if seq1[i] != "N" and seq1[i] != "-" and seq1[i] != ".":
            if seq2[i] != "N" and seq2[i] != "-" and seq2[i] != ".":
                if seq1[i] != seq2[i]:
                    dist += 1
                    if fbreak:
                        break

    return dist


def deduplicate(useqs, receptors, log=None, meta_data=None, delim=":"):
    """
    Collapses sequences that are identical at all unambiguous positions

    Arguments:
      useqs (dict): unique sequences within a clone. maps sequence to index in Receptor list.
      receptors (dict): receptors within a clone (index is value in useqs dict).
      log (collections.OrderedDict): log of sequence errors.
      meta_data (str): Field to append to sequence IDs. Splits identical sequences with different meta_data.
      delim (str): delimiter to use when appending meta_data.

    Returns:
      dict: useqs with the collapsed sequences removed.
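
Example (an illustrative sketch, not the algorithm above): deduplicate merges sequences that are
identical at unambiguous positions into the copy with the most A/C/G/T characters. The greedy
variant below applies the same merge rule to plain strings; unambiguous_distance and collapse
are hypothetical names, and unlike deduplicate it ignores metadata, Receptor objects and
duplicate-count bookkeeping.

```python
from typing import Dict, List, Tuple

AMBIGUOUS = "N-."  # characters ignored when comparing, as in unAmbigDist


def unambiguous_distance(a: str, b: str) -> int:
    # Count mismatches only at positions where neither sequence is N, - or .
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(1 for x, y in zip(a, b)
               if x not in AMBIGUOUS and y not in AMBIGUOUS and x != y)


def collapse(seqs: List[str]) -> Tuple[List[str], Dict[str, str]]:
    # Visit sequences from most to least A/C/G/T content (ties keep input order) and
    # merge each one into the first kept sequence it matches at distance 0.
    def informative(s: str) -> int:
        return sum(s.count(c) for c in "ACGT")

    kept: List[str] = []
    merged_into: Dict[str, str] = {}
    for s in sorted(seqs, key=informative, reverse=True):
        for k in kept:
            if unambiguous_distance(s, k) == 0:
                merged_into[s] = k
                break
        else:
            kept.append(s)
    return kept, merged_into


kept, merged = collapse(["ACGTAC", "ACNTAC", "ACGTTC"])
print(kept, merged)
```

Here ACNTAC is absorbed into ACGTAC because the N hides the only position where they could
differ, while ACGTTC stays separate.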
    """
    keys = list(useqs.keys())
    join = {} # id -> sequence id to join with (least ambiguous chars)
    joinseqs = {} # id -> useq to join with (least ambiguous chars)
    ambigchar = {} #sequence id -> number ATCG nucleotides
    for i in range(0,len(keys)-1):
        for j in range(i+1,len(keys)):
            ki = keys[i]
            kj = keys[j]
            if meta_data is None:
                ski = keys[i]
                skj = keys[j]
            else:
                ski, cid = keys[i].split(delim)
                skj, cid = keys[j].split(delim)
            ri = receptors[useqs[ki]]
            rj = receptors[useqs[kj]]
            dist = unAmbigDist(ski, skj, True)
            m_match = True
            if meta_data is not None:
                matches = 0
                for m in meta_data:
                    if ri.getField(m) == rj.getField(m) and m != "DUPCOUNT":
                        matches += 1
                m_match = (matches == len(meta_data))
            if dist == 0 and m_match:
                ncounti = ki.count("A") + ki.count("T") + ki.count("G") + ki.count("C")
                ncountj = kj.count("A") + kj.count("T") + kj.count("G") + kj.count("C")
                ambigchar[useqs[ki]] = ncounti
                ambigchar[useqs[kj]] = ncountj
                # this algorithm depends on the fact that all sequences are compared pairwise, and all are zero
                # distance from the sequence they will be collapsed to.
                if ncountj > ncounti:
                    nci = 0
                    if useqs[ki] in join:
                        nci = ambigchar[join[useqs[ki]]]
                    if nci < ncountj:
                        join[useqs[ki]] = useqs[kj]
                        joinseqs[ki] = kj
                else:
                    ncj = 0
                    if useqs[kj] in join:
                        ncj = ambigchar[join[useqs[kj]]]
                    if ncj < ncounti:
                        join[useqs[kj]] = useqs[ki]
                        joinseqs[kj] = ki

    # loop through list of joined sequences and collapse
    keys = list(useqs.keys())
    for k in keys:
        if useqs[k] in join:
            rfrom = receptors[useqs[k]]
            rto = receptors[join[useqs[k]]]
            rto.dupcount += rfrom.dupcount
            if log is not None:
                log[rfrom.sequence_id]["PASS"] = False
                log[rfrom.sequence_id]["DUPLICATE"] = True
                log[rfrom.sequence_id]["COLLAPSETO"] = joinseqs[k]
                log[rfrom.sequence_id]["COLLAPSEFROM"] = k
                log[rfrom.sequence_id]["FAIL"] = "Collapsed with " + rto.sequence_id
            del useqs[k]

    return useqs


def hasPTC(sequence):
    """
    Determines whether a premature termination codon (PTC) exists in a sequence

    Arguments:
      sequence (str): IMGT gapped sequence in frame 1.

    Returns:
      int: -1 if no PTC is found, otherwise the nucleotide index at which the first PTC starts.
    """
    ptcs = ("TAA", "TGA", "TAG", "TRA", "TRG", "TAR", "TGR", "TRR")
    for i in range(0, len(sequence), 3):
        if sequence[i:(i+3)] in ptcs:
            return i
    return -1


def rmCDR3(sequences, clones):
    """
    Remove CDR3 from all sequences and germline of a clone

    Arguments:
      sequences (list): list of sequences in clones.
      clones (list): list of Receptor objects.
    """
    for i in range(0,len(sequences)):
        imgtar = clones[i].getField("imgtpartlabels")
        germline = clones[i].getField("germline_imgt_d_mask")
        nseq = []
        nimgtar = []
        ngermline = []
        ncdr3 = 0
        #print("imgtarlen: " + str(len(imgtar)))
        #print("seqlen: " + str(len(sequences[i])))
        #print("germline: " + str(len(germline)))
        #if len(germline) < len(sequences[i]):
        #    print("\n" + str(clones[i].sequence_id))
        #    print("\n " + str((sequences[i])) )
        #    print("\n" + str((germline)))
        for j in range(0,len(imgtar)):
            if imgtar[j] != 108:
                nseq.append(sequences[i][j])
                if j < len(germline):
                    ngermline.append(germline[j])
                nimgtar.append(imgtar[j])
            else:
                ncdr3 += 1
        clones[i].setField("imgtpartlabels",nimgtar)
        clones[i].setField("germline_imgt_d_mask", "".join(ngermline))
        sequences[i] = "".join(nseq)
        #print("Length: " + str(ncdr3))


def characterizePartitionErrors(sequences, clones, meta_data):
    """
    Characterize potential mismatches between IMGT labels within a clone

    Arguments:
      sequences (list): list of sequences in clones.
      clones (list): list of Receptor objects.
      meta_data (str): Field to append to sequence IDs. Splits identical sequences with different meta_data.

    Returns:
      tuple: tuple of length four containing a list of IMGT positions for first sequence in clones,
             the germline sequence of the first receptor in clones, the length of the first sequence
             in clones, and the number of sequences in clones.
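
Example (an illustrative sketch): when sequences in a clone differ in length, the code below
right-pads them with N and extends shorter IMGT label lists by repeating their last label.
pad_clone is a hypothetical stand-alone version of that padding step that returns new lists
instead of mutating Receptor objects, and it raises ValueError where changeo calls printError.
The label values in the usage line are placeholders.

```python
from typing import List, Tuple


def pad_clone(seqs: List[str],
              labels: List[List[int]]) -> Tuple[List[str], List[List[int]]]:
    # Right-pad every sequence with N up to the longest sequence in the clone.
    maxlen = max(len(s) for s in seqs)
    padded = [s + "N" * (maxlen - len(s)) for s in seqs]
    # Extend each per-nucleotide label list by repeating its last label.
    target = max(len(lab) for lab in labels)
    extended = [lab + [lab[-1]] * (target - len(lab)) for lab in labels]
    # Codon-based models need whole codons.
    if maxlen % 3 != 0:
        raise ValueError("number of sites must be divisible by 3")
    return padded, extended


seqs, labels = pad_clone(["ACGTAA", "ACG"], [[1, 1, 1, 2, 2, 2], [1, 1, 1]])
print(seqs, labels)
```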
    """
    sites = len(sequences[0])
    nseqs = len(sequences)
    imgtar = clones[0].getField("imgtpartlabels")
    germline = clones[0].getField("germline_imgt_d_mask")
    if germline == "":
        germline = clones[0].getField("germline_imgt")

    correctseqs = False
    for seqi in range(0, len(sequences)):
        i = sequences[seqi]
        if len(i) != sites or len(clones[seqi].getField("imgtpartlabels")) != len(imgtar):
            correctseqs = True

    if correctseqs:
        maxlen = sites
        maximgt = len(imgtar)
        for j in range(0,len(sequences)):
            if len(sequences[j]) > maxlen:
                maxlen = len(sequences[j])
            if len(clones[j].getField("imgtpartlabels")) > maximgt:
                imgtar = clones[j].getField("imgtpartlabels")
                maximgt = len(imgtar)
        sites = maxlen
        for j in range(0,len(sequences)):
            cimgt = clones[j].getField("imgtpartlabels")
            seqdiff = maxlen - len(sequences[j])
            imgtdiff = len(imgtar)-len(cimgt)
            sequences[j] = sequences[j] + "N"*(seqdiff)
            last = cimgt[-1]
            cimgt.extend([last]*(imgtdiff))
            clones[j].setField("imgtpartlabels",cimgt)

    if meta_data is not None:
        meta_data_ar = meta_data[0].split(",")
    for c in clones:
        if meta_data is not None:
            c.setField(meta_data[0],c.getField(meta_data_ar[0]))
            for m in range(1,len(meta_data_ar)):
                st = c.getField(meta_data[0])+":"+c.getField(meta_data_ar[m])
                c.setField(meta_data[0],st)
        if len(c.getField("imgtpartlabels")) != len(imgtar):
            printError("IMGT assignments are not the same within clone %d!\n" % c.clone,False)
            printError(c.getField("imgtpartlabels"),False)
            printError("%s\n%d\n" % (imgtar,j),False)
            for j in range(0, len(sequences)):
                printError("%s\n%s\n" % (sequences[j],clones[j].getField("imgtpartlabels")),False)
            printError("ChangeO file needs to be corrected")
        for j in range(0,len(imgtar)):
            if c.getField("imgtpartlabels")[j] != imgtar[j]:
                printError("IMGT assignments are not the same within clone %d!\n" % c.clone, False)
                printError(c.getField("imgtpartlabels"), False)
                printError("%s\n%d\n" % (imgtar, j))

    # Resolve germline if there are differences, e.g.
    # if reconstruction was done before clonal clustering
    resolveglines = False
    for c in clones:
        ngermline = c.getField("germline_imgt_d_mask")
        if ngermline == "":
            ngermline = c.getField("germline_imgt")
        if ngermline != germline:
            resolveglines = True

    if resolveglines:
        printError("%s %s" % ("Predicted germlines are not the same among sequences in the same clone.",
                              "Be sure to cluster sequences into clones first and then predict germlines using --cloned"))

    if sites > (len(germline)):
        seqdiff = sites - len(germline)
        germline = germline + "N" * (seqdiff)

    if sites % 3 != 0:
        printError("number of sites must be divisible by 3! len: %d, clone: %s, id: %s, seq: %s" % (len(sequences[0]),
                   clones[0].clone, clones[0].sequence_id, sequences[0]))
    return imgtar, germline, sites, nseqs


def outputSeqPartFiles(out_dir, useqs_f, meta_data, clones, collapse, nseqs, delim, newgerm, conseqs, duplicate, imgt):
    """
    Create intermediate sequence alignment and partition files for IgPhyML output

    Arguments:
      out_dir (str): directory for sequence files.
      useqs_f (dict): unique sequences mapped to ids.
      meta_data (str): Field to append to sequence IDs. Splits identical sequences with different meta_data.
      clones (list) : list of receptor objects.
      collapse (bool) : deduplicate sequences.
      nseqs (int): number of sequences.
      delim (str) : delimiter for extracting metadata from ID.
      newgerm (str) : modified germline of clonal lineage.
      conseqs (list) : consensus sequences.
      duplicate (bool) : duplicate sequence if only one in a clone.
      imgt (list) : IMGT numbering of clonal positions.
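
Example (an illustrative sketch): the partition file written by this function contains, in
order, a header line with 2 and the number of germline codon sites, the FWR:IMGT and CDR:IMGT
lines, the V and J gene names without allele suffixes, and a comma-separated list of per-codon
region labels. partition_lines is a hypothetical helper that builds those lines in memory; the
label values in the usage line are placeholders, not real IMGT region codes.

```python
from typing import List


def partition_lines(germline_codons: List[str], v_call: str, j_call: str,
                    imgt: List[int]) -> List[str]:
    # Same order as the partf.write calls, one list entry per output line.
    return [
        "%d %d" % (2, len(germline_codons)),  # 2 matches the FWR/CDR lines that follow
        "FWR:IMGT",
        "CDR:IMGT",
        v_call.split("*")[0],                 # V gene with the allele suffix removed
        j_call.split("*")[0],                 # J gene with the allele suffix removed
        ",".join(map(str, imgt)),             # one region label per codon site
    ]


lines = partition_lines(["CAG", "GTG", "CAG"], "IGHV1-2*02", "IGHJ4*02", [0, 0, 1])
print(lines)
```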
    """
    # bootstrap these data if desired
    lg = len(newgerm)
    sites = range(0, lg)
    transtable = clones[0].sequence_id.maketrans(" ", "_")
    outfile = os.path.join(out_dir, "%s.fasta" % clones[0].clone)
    with open(outfile, "w") as clonef:
        if collapse:
            for seq_f, num in useqs_f.items():
                seq = seq_f
                cid = ""
                if meta_data is not None:
                    seq, cid = seq_f.split(delim)
                    cid = delim + cid.replace(":", "_")
                sid = clones[num].sequence_id.translate(transtable) + cid
                clonef.write(">%s\n%s\n" % (sid.replace(":","-"), seq.replace(".", "-")))
                if len(useqs_f) == 1 and duplicate:
                    if meta_data is not None:
                        if meta_data[0] == "DUPCOUNT":
                            cid = delim + "0"
                    sid = clones[num].sequence_id.translate(transtable) + "_1" + cid
                    clonef.write(">%s\n%s\n" % (sid.replace(":","-"), seq.replace(".", "-")))
        else:
            for j in range(0, nseqs):
                cid = ""
                if meta_data is not None:
                    meta_data_list = []
                    for m in meta_data:
                        meta_data_list.append(clones[j].getField(m).replace(":", "_"))
                    cid = delim + str(delim.join(meta_data_list))
                sid = clones[j].sequence_id.translate(transtable) + cid
                clonef.write(">%s\n%s\n" % (sid.replace(":","-"), conseqs[j].replace(".", "-")))
                if nseqs == 1 and duplicate:
                    if meta_data is not None:
                        if meta_data[0] == "DUPCOUNT":
                            cid = delim + "0"
                    sid = clones[j].sequence_id.translate(transtable)+"_1" + cid
                    clonef.write(">%s\n%s\n" % (sid.replace(":","-"), conseqs[j].replace(".", "-")))

        germ_id = ["GERM"]
        if meta_data is not None:
            for i in range(1,len(meta_data)):
                germ_id.append("GERM")
        clonef.write(">%s_%s\n" % (clones[0].clone,"_".join(germ_id)))
        for i in range(0, len(newgerm)):
            clonef.write("%s" % newgerm[i].replace(".","-"))
        clonef.write("\n")

    #output partition file
    partfile = os.path.join(out_dir, "%s.part.txt" % clones[0].clone)
    with open(partfile, "w") as partf:
        partf.write("%d %d\n" % (2, len(newgerm)))
        partf.write("FWR:IMGT\n")
        partf.write("CDR:IMGT\n")
        partf.write("%s\n" % (clones[0].v_call.split("*")[0]))
        partf.write("%s\n" % (clones[0].j_call.split("*")[0]))
        partf.write(",".join(map(str,
imgt))) partf.write("\n") def outputIgPhyML(clones, sequences, meta_data=None, collapse=False, ncdr3=False, logs=None, fail_writer=None, out_dir=None, min_seq=1): """ Create intermediate sequence alignment and partition files for IgPhyML output Arguments: clones (list): receptor objects within the same clone. sequences (list): sequences within the same clone (share indexes with clones parameter). meta_data (list): fields to append to sequence IDs. Splits identical sequences with different meta_data. collapse (bool): if True collapse identical sequences. ncdr3 (bool): if True remove CDR3. logs (dict): contains log information for each sequence. out_dir (str): directory for output files. fail_writer (changeo.IO.TSVWriter): failed sequences writer object. min_seq (int): minimum number of data sequences to include. Returns: int: number of sequences written for the clone; negative if the clone has fewer than min_seq sequences. """ delim = "_" duplicate = True # duplicate sequences in clones with only 1 sequence? imgtar, germline, sites, nseqs = characterizePartitionErrors(sequences, clones, meta_data) tallies = [] for i in range(0, sites, 3): tally = 0 for j in range(0, nseqs): if sequences[j][i:(i + 3)] != "...": tally += 1 tallies.append(tally) newseqs = [] # remove gap only sites from observed data newgerm = [] imgt = [] for j in range(0, nseqs): for i in range(0, sites, 3): if i == 0: newseqs.append([]) if tallies[i//3] > 0: newseqs[j].append(sequences[j][i:(i+3)]) lcodon = "" for i in range(0, sites, 3): if tallies[i//3] > 0: newgerm.append(germline[i:(i+3)]) lcodon=germline[i:(i+3)] imgt.append(imgtar[i]) if len(lcodon) == 2: newgerm[-1] = newgerm[-1] + "N" elif len(lcodon) == 1: newgerm[-1] = newgerm[-1] + "NN" if ncdr3: ngerm = [] nimgt = [] for i in range(0, len(newseqs)): nseq = [] ncdr3_len = 0 for j in range(0, len(imgt)): if imgt[j] != 108: nseq.append(newseqs[i][j]) if i == 0: ngerm.append(newgerm[j]) nimgt.append(imgt[j]) else: ncdr3_len += 1 newseqs[i] = nseq newgerm = 
ngerm imgt = nimgt #print("Length: " + str(ncdr3_len)) useqs_f = 
OrderedDict() conseqs = [] for j in range(0, nseqs): conseq = "".join([str(seq_rec) for seq_rec in newseqs[j]]) if meta_data is not None: meta_data_list = [] for m in range(0,len(meta_data)): if isinstance(clones[j].getField(meta_data[m]), str): clones[j].setField(meta_data[m],clones[j].getField(meta_data[m]).replace("_", "")) meta_data_list.append(str(clones[j].getField(meta_data[m]))) conseq_f = conseq+delim+":".join(meta_data_list) else: conseq_f = conseq if conseq_f in useqs_f and collapse: clones[useqs_f[conseq_f]].dupcount += clones[j].dupcount logs[clones[j].sequence_id]["PASS"] = False logs[clones[j].sequence_id]["FAIL"] = "Duplication of " + clones[useqs_f[conseq_f]].sequence_id logs[clones[j].sequence_id]["DUPLICATE"]=True if fail_writer is not None: fail_writer.writeReceptor(clones[j]) else: useqs_f[conseq_f] = j conseqs.append(conseq) if collapse: useqs_f = deduplicate(useqs_f, clones, logs, meta_data, delim) if collapse and len(useqs_f) < min_seq: for seq_f, num in useqs_f.items(): logs[clones[num].sequence_id]["FAIL"] = "Clone too small: " + str(len(useqs_f)) logs[clones[num].sequence_id]["PASS"] = False return -len(useqs_f) elif not collapse and len(conseqs) < min_seq: for j in range(0, nseqs): logs[clones[j].sequence_id]["FAIL"] = "Clone too small: " + str(len(conseqs)) logs[clones[j].sequence_id]["PASS"] = False return -len(conseqs) # Output fasta file of masked, concatenated sequences outputSeqPartFiles(out_dir, useqs_f, meta_data, clones, collapse, nseqs, delim, newgerm, conseqs, duplicate, imgt) if collapse: return len(useqs_f) else: return nseqs def maskCodonsLoop(r, clones, cloneseqs, logs, fails, out_args, fail_writer, mask=True): """ Masks codons split by alignment to IMGT reference Arguments: r (changeo.Receptor.Receptor): receptor object for a particular sequence. clones (dict): receptors grouped by clone ID. cloneseqs (dict): masked sequences grouped by clone ID. 
logs (dict): contains log information for each sequence. fails (dict): counts of various sequence processing failures. out_args (dict): arguments for output preferences. fail_writer (changeo.IO.TSVWriter): failed sequences writer object. mask (bool): if True, mask split codons. Returns: 0: returns 0 if an error occurs or masking fails. 1: returns 1 if masking succeeds. """ if r.clone is None: printError("Cannot export datasets until sequences are clustered into clones.") if r.dupcount is None: r.dupcount = 1 fails["rec_count"] += 1 #printProgress(rec_count, rec_count, 0.05, start_time) ptcs = hasPTC(r.sequence_imgt) gptcs = hasPTC(r.getField("germline_imgt_d_mask")) if gptcs >= 0: log = OrderedDict() log["ID"] = r.sequence_id log["CLONE"] = r.clone log["SEQ_IN"] = r.sequence_input log["SEQ_IMGT"] = r.sequence_imgt logs[r.sequence_id] = log logs[r.sequence_id]["PASS"] = False logs[r.sequence_id]["FAIL"] = "Germline PTC" fails["seq_fail"] += 1 fails["germlineptc"] += 1 return 0 if r.functional and ptcs < 0: #If IMGT regions are provided, record their positions rd = RegionDefinition(r.junction_length, amino_acid=False) regions = rd.getRegions(r.sequence_imgt) if regions["cdr3_imgt"] != "" and regions["cdr3_imgt"] is not None: simgt = regions["fwr1_imgt"] + regions["cdr1_imgt"] + regions["fwr2_imgt"] + regions["cdr2_imgt"] + \ regions["fwr3_imgt"] + regions["cdr3_imgt"] + regions["fwr4_imgt"] if len(simgt) < len(r.sequence_imgt): r.fwr4_imgt = r.fwr4_imgt + ("."*(len(r.sequence_imgt) - len(simgt))) simgt = regions["fwr1_imgt"] + regions["cdr1_imgt"] + regions["fwr2_imgt"] + \ regions["cdr2_imgt"] + regions["fwr3_imgt"] + regions["cdr3_imgt"] + regions["fwr4_imgt"] imgtpartlabels = [13]*len(regions["fwr1_imgt"]) + [30]*len(regions["cdr1_imgt"]) + [45]*len(regions["fwr2_imgt"]) + \ [60]*len(regions["cdr2_imgt"]) + [80]*len(regions["fwr3_imgt"]) + [108] * len(regions["cdr3_imgt"]) + \ [120] * len(regions["fwr4_imgt"]) r.setField("imgtpartlabels", 
imgtpartlabels) if len(r.getField("imgtpartlabels")) != 
len(r.sequence_imgt) or simgt != r.sequence_imgt: log = OrderedDict() log["ID"] = r.sequence_id log["CLONE"] = r.clone log["SEQ_IN"] = r.sequence_input log["SEQ_IMGT"] = r.sequence_imgt logs[r.sequence_id] = log logs[r.sequence_id]["PASS"] = False logs[r.sequence_id]["FAIL"] = "FWR/CDR error" logs[r.sequence_id]["FWRCDRSEQ"] = simgt fails["seq_fail"] += 1 fails["region_fail"] += 1 return 0 elif regions["fwr3_imgt"] != "" and regions["fwr3_imgt"] is not None: simgt = regions["fwr1_imgt"] + regions["cdr1_imgt"] + regions["fwr2_imgt"] + regions["cdr2_imgt"] + \ regions["fwr3_imgt"] nseq = r.sequence_imgt[len(simgt):len(r.sequence_imgt)] if len(simgt) < len(r.sequence_imgt): simgt = regions["fwr1_imgt"] + regions["cdr1_imgt"] + regions["fwr2_imgt"] + \ regions["cdr2_imgt"] + regions["fwr3_imgt"] + nseq imgtpartlabels = [13] * len(regions["fwr1_imgt"]) + [30] * len(regions["cdr1_imgt"]) + [45] * len( regions["fwr2_imgt"]) + \ [60] * len(regions["cdr2_imgt"]) + [80] * len(regions["fwr3_imgt"]) + \ [108] * int(len(nseq)) r.setField("imgtpartlabels", imgtpartlabels) if len(r.getField("imgtpartlabels")) != len(r.sequence_imgt) or simgt != r.sequence_imgt: log = OrderedDict() log["ID"] = r.sequence_id log["CLONE"] = r.clone log["SEQ_IN"] = r.sequence_input log["SEQ_IMGT"] = r.sequence_imgt logs[r.sequence_id] = log logs[r.sequence_id]["PASS"] = False logs[r.sequence_id]["FAIL"] = "FWR/CDR error" logs[r.sequence_id]["FWRCDRSEQ"] = simgt fails["seq_fail"] += 1 fails["region_fail"] += 1 return 0 else: #imgt_warn = "\n! IMGT FWR/CDR sequence columns not detected.\n! Cannot run CDR/FWR partitioned model on this data.\n" imgtpartlabels = [0] * len(r.sequence_imgt) r.setField("imgtpartlabels", imgtpartlabels) mout = maskSplitCodons(r, mask=mask) mask_seq = mout[0] ptcs = hasPTC(mask_seq) if ptcs >= 0: printWarning("Masked sequence suddenly has a PTC.. 
%s\n" % r.sequence_id) mout[1]["PASS"] = False mout[1]["FAIL"] = "PTC_ADDED_FROM_MASKING" logs[mout[1]["ID"]] = mout[1] if mout[1]["PASS"]: #passreads += r.dupcount if r.clone in clones: clones[r.clone].append(r) cloneseqs[r.clone].append(mask_seq) else: clones[r.clone] = [r] cloneseqs[r.clone] = [mask_seq] return 1 else: if out_args["failed"]: fail_writer.writeReceptor(r) fails["seq_fail"] += 1 fails["failreads"] += r.dupcount if mout[1]["FAIL"] == "FRAME-SHIFTING DELETION": fails["del_fail"] += 1 elif mout[1]["FAIL"] == "SINGLE FRAME-SHIFTING INSERTION": fails["in_fail"] += 1 else: fails["other_fail"] += 1 else: log = OrderedDict() log["ID"] = r.sequence_id log["CLONE"] = r.clone log["PASS"] = False log["FAIL"] = "NONFUNCTIONAL/PTC" log["SEQ_IN"] = r.sequence_input logs[r.sequence_id] = log if out_args["failed"]: fail_writer.writeReceptor(r) fails["seq_fail"] += 1 fails["nf_fail"] += 1 return 0 # Run IgPhyML on output data def runIgPhyML(outfile, igphyml_out, clone_dir, nproc=1, optimization="lr", omega="e,e", kappa="e", motifs="FCH", hotness="e,e,e,e,e,e", oformat="tab", nohlp=False, asr=-1, clean="none"): """ Run IgPhyML on output data Arguments: outfile (str): Output file name. igphyml_out (str): IgPhyML output file name. clone_dir (str): directory containing the per-clone sequence and partition files. nproc (int): Number of threads to parallelize IgPhyML across optimization (str): Optimize combination of topology (t) branch lengths (l) and parameters (r) in IgPhyML. omega (str): omega optimization in IgPhyML (--omega) kappa (str): kappa optimization in IgPhyML (-t) motifs (str): motifs to use in IgPhyML (--motifs) hotness (str): motif mutability parameters in IgPhyML (--hotness) oformat (str): output format for IgPhyML (tab or txt) nohlp (bool): If True, only estimate GY94 trees and parameters asr (float): ancestral sequence reconstruction interval (0-1); skipped if negative. clean (str): delete intermediate files? 
(none, all) """ osplit = outfile.split(".") outrep = ".".join(osplit[0:(len(osplit)-1)]) + "_gy.tsv" gyout = outfile + "_igphyml_stats_gy.txt" gy_args = ["igphyml", "--repfile", outfile, "-m", "GY", "--run_id", "gy", "--outrep", outrep, "--threads", str(nproc),"--outname",gyout] hlp_args = ["igphyml","--repfile", outrep, "-m", "HLP", "--run_id", "hlp", "--threads", str(nproc), "-o", optimization, "--omega", omega, "-t", kappa, "--motifs", motifs, "--hotness", hotness, "--oformat", oformat, "--outname", igphyml_out] if asr >= 0: hlp_args.append("--ASRc") hlp_args.append(str(asr)) log = OrderedDict() log["START"] = "IgPhyML GY94 tree estimation" printLog(log) try: #check for igphyml executable subprocess.check_output(["igphyml"]) except (OSError, subprocess.CalledProcessError): printError("igphyml not found :-/") try: #get GY94 starting topologies p = subprocess.check_output(gy_args) except subprocess.CalledProcessError as e: print(" ".join(gy_args)) print('error>', e.output, '<') printError("GY94 tree building in IgPhyML failed") log = OrderedDict() log["START"] = "IgPhyML HLP analysis" log["OPTIMIZE"] = optimization log["TS/TV"] = kappa log["wFWR,wCDR"] = omega log["MOTIFS"] = motifs log["HOTNESS"] = hotness log["NPROC"] = nproc printLog(log) if not nohlp: try: #estimate HLP parameters/trees p = subprocess.check_output(hlp_args) except subprocess.CalledProcessError as e: print(" ".join(hlp_args)) print('error>', e.output, '<') printError("HLP tree building failed") log = OrderedDict() log["OUTPUT"] = igphyml_out if oformat == "tab": with open(igphyml_out) as igf: names = igf.readline().split("\t") vals = igf.readline().split("\t") for i in range(3,len(names)-1): log[names[i]] = round(float(vals[i]),2) printLog(log) if clean != "none": log = OrderedDict() log["START"] = "CLEANING" log["SCOPE"] = clean printLog(log) todelete = open(outrep) for line in todelete: line = line.rstrip("\n") line = line.rstrip("\r") lsplit = line.split("\t") if len(lsplit) == 4: 
os.remove(lsplit[0]) os.remove(lsplit[1]) 
os.remove(lsplit[3]) todelete.close() os.remove(outrep) os.remove(outfile) os.remove(gyout) cilog = outrep + "_igphyml_CIlog.txt_hlp" if os.path.isfile(cilog): os.remove(cilog) if oformat == "tab": os.rmdir(clone_dir) else: printWarning("Using --clean all with --oformat txt will not delete all tree file results.\n" "You'll have to do that yourself.") log = OrderedDict() log["END"] = "IgPhyML analysis" printLog(log) # Note: Collapse can give misleading dupcount information if some sequences have ambiguous characters at polymorphic sites def buildTrees(db_file, meta_data=None, target_clones=None, collapse=False, ncdr3=False, nmask=False, sample_depth=-1, min_seq=1,append=None, igphyml=False, nproc=1, optimization="lr", omega="e,e", kappa="e", motifs="FCH", hotness="e,e,e,e,e,e", oformat="tab", clean="none", nohlp=False, asr=-1, format=default_format, out_args=default_out_args): """ Masks codons split by alignment to IMGT reference, then produces input files for IgPhyML Arguments: db_file (str): input tab-delimited database file. meta_data (list): fields to append to sequence IDs. Splits identical sequences with different meta_data. target_clones (list): clone IDs to analyze. collapse (bool): if True collapse identical sequences. ncdr3 (bool): if True remove all CDR3s. nmask (bool): if True, do not attempt to mask split codons. sample_depth (int): depth of subsampling before deduplication. min_seq (int): minimum number of sequences per clone. append (list): column names to append to sequence_id. igphyml (bool): If True, run IgPhyML on output data. nproc (int): Number of threads to parallelize IgPhyML across. optimization (str): Optimize combination of topology (t) branch lengths (l) and parameters (r) in IgPhyML. 
omega (str): omega optimization in IgPhyML (--omega) kappa (str): kappa optimization in IgPhyML (-t) motifs (str): motifs to use in IgPhyML (--motifs) hotness (str): motif mutability parameters in IgPhyML (--hotness) oformat (str): output format for IgPhyML (tab or txt) clean (str): delete intermediate files? (none, all) nohlp (bool): If True, only estimate GY94 trees and parameters asr (float): ancestral sequence reconstruction interval (0-1); skipped if negative. format (str): input and output format. out_args (dict): arguments for output preferences. Returns: dict: dictionary of output pass and fail files. """ # Print parameter info log = OrderedDict() log["START"] = "BuildTrees" log["FILE"] = os.path.basename(db_file) log["COLLAPSE"] = collapse printLog(log) # Open output files out_label = "lineages" pass_handle = getOutputHandle(db_file, out_label=out_label, out_dir=out_args["out_dir"], out_name= out_args["out_name"], out_type="tsv") igphyml_out = None if igphyml: igphyml_out = getOutputName(db_file, out_label="igphyml-pass", out_dir=out_args["out_dir"], out_name=out_args["out_name"], out_type=oformat) dir_name, __ = os.path.split(pass_handle.name) if out_args["out_name"] is None: __, clone_name, __ = splitName(db_file) else: clone_name = out_args["out_name"] if dir_name is None: clone_dir = clone_name else: clone_dir = os.path.join(dir_name, clone_name) if not os.path.exists(clone_dir): os.makedirs(clone_dir) # Format options try: reader, writer, __ = getFormatOperators(format) except ValueError: printError("Invalid format %s." 
% format) out_fields = getDbFields(db_file, reader=reader) # open input file handle = open(db_file, "r") records = reader(handle) fail_handle, fail_writer = None, None if out_args["failed"]: fail_handle = getOutputHandle(db_file, out_label="lineages-fail", out_dir=out_args["out_dir"], out_name=out_args["out_name"], out_type=out_args["out_type"]) fail_writer = writer(fail_handle, fields=out_fields) cloneseqs = {} clones = {} logs = OrderedDict() fails = {"rec_count":0, "seq_fail":0, "nf_fail":0, "del_fail":0, "in_fail":0, "minseq_fail":0, "other_fail":0, "region_fail":0, "germlineptc":0, "fdcount":0, "totalreads":0, "passreads":0, "failreads":0} # Mask codons split by indels start_time = time() printMessage("Correcting frames and indels of sequences", start_time=start_time, width=50) #subsampling loop init_clone_sizes = {} big_enough = [] all_records = [] found_no_funct = False for r in records: if r.functional is None: r.functional = True if found_no_funct is False: printWarning("FUNCTIONAL column not found.") found_no_funct = True all_records.append(r) if r.clone in init_clone_sizes: init_clone_sizes[r.clone] += 1 else: init_clone_sizes[r.clone] = 1 for r in all_records: if target_clones is None or r.clone in target_clones: if init_clone_sizes[r.clone] >= min_seq: big_enough.append(r) fails["totalreads"] = len(all_records) #fails["minseq_fail"] = len(all_records) - len(big_enough) if len(big_enough) == 0: printError("\n\nNo sequences found that match specified criteria.",1) if sample_depth > 0: random.shuffle(big_enough) total = 0 for r in big_enough: if r.functional is None: r.functional = True if found_no_funct is False: printWarning("FUNCTIONAL column not found.") found_no_funct = True r.sequence_id = r.sequence_id.replace(",","-") #remove commas from sequence ID r.sequence_id = r.sequence_id.replace(":","-") #remove colons from sequence ID r.sequence_id = r.sequence_id.replace(",","-") #remove commas from sequence ID r.sequence_id = 
r.sequence_id.replace(")","-") #remove parentheses from sequence ID r.sequence_id = r.sequence_id.replace("(","-") #remove parentheses from sequence ID if meta_data is not None: for m in range(0,len(meta_data)): md = r.getField(meta_data[m]) md = md.replace(",","-") #remove commas from metadata md = md.replace(":","-") #remove colons from metadata md = md.replace(")","-") #remove parentheses from metadata md = md.replace("(","-") #remove parentheses from metadata r.setField(meta_data[m],md) if append is not None: for m in append: r.sequence_id = r.sequence_id + "_" + r.getField(m) total += maskCodonsLoop(r, clones, cloneseqs, logs, fails, out_args, fail_writer, mask = not nmask) if total == sample_depth: break # Start processing clones clonesizes = {} pass_count, nclones = 0, 0 printMessage("Processing clones", start_time=start_time, width=50) for k in clones.keys(): if len(clones[str(k)]) < min_seq: for j in range(0, len(clones[str(k)])): logs[clones[str(k)][j].sequence_id]["FAIL"] = "Clone too small: " + str(len(cloneseqs[str(k)])) logs[clones[str(k)][j].sequence_id]["PASS"] = False clonesizes[str(k)] = -len(cloneseqs[str(k)]) else: clonesizes[str(k)] = outputIgPhyML(clones[str(k)], cloneseqs[str(k)], meta_data=meta_data, collapse=collapse, ncdr3=ncdr3, logs=logs, fail_writer=fail_writer, out_dir=clone_dir, min_seq=min_seq) #If clone is too small, size is returned as a negative if clonesizes[str(k)] > 0: nclones += 1 pass_count += clonesizes[str(k)] else: fails["seq_fail"] -= clonesizes[str(k)] fails["minseq_fail"] -= clonesizes[str(k)] fail_count = fails["rec_count"] - pass_count # End clone processing printMessage("Done", start_time=start_time, end=True, width=50) log_handle = None if out_args["log_file"] is not None: log_handle = open(out_args["log_file"], "w") for j in logs.keys(): printLog(logs[j], handle=log_handle) pass_handle.write(str(nclones)+"\n") for key in 
sorted(clonesizes, key=clonesizes.get, reverse=True): #print(key + "\t" + str(clonesizes[key])) outfile = os.path.join(clone_dir, "%s.fasta" % key) partfile = os.path.join(clone_dir, "%s.part.txt" % key) if clonesizes[key] > 0: germ_id = ["GERM"] if meta_data is not None: for i in range(1, len(meta_data)): germ_id.append("GERM") pass_handle.write("%s\t%s\t%s_%s\t%s\n" % (outfile, "N", key,"_".join(germ_id), partfile)) handle.close() output = {"pass": None, "fail": None} if pass_handle is not None: output["pass"] = pass_handle.name pass_handle.close() if fail_handle is not None: output["fail"] = fail_handle.name fail_handle.close() if log_handle is not None: log_handle.close() #printProgress(rec_count, rec_count, 0.05, start_time) log = OrderedDict() log["OUTPUT"] = os.path.basename(pass_handle.name) if pass_handle is not None else None log["RECORDS"] = fails["totalreads"] log["INITIAL_FILTER"] = fails["rec_count"] log["PASS"] = pass_count log["FAIL"] = fail_count log["NONFUNCTIONAL"] = fails["nf_fail"] log["FRAMESHIFT_DEL"] = fails["del_fail"] log["FRAMESHIFT_INS"] = fails["in_fail"] log["CLONETOOSMALL"] = fails["minseq_fail"] log["CDRFWR_ERROR"] = fails["region_fail"] log["GERMLINE_PTC"] = fails["germlineptc"] log["OTHER_FAIL"] = fails["other_fail"] if collapse: log["DUPLICATE"] = fail_count - fails["seq_fail"] log["END"] = "BuildTrees" printLog(log) #Run IgPhyML on outputted data? if igphyml: runIgPhyML(pass_handle.name, igphyml_out=igphyml_out, clone_dir=clone_dir, nproc=nproc, optimization=optimization, omega=omega, kappa=kappa, motifs=motifs, hotness=hotness, oformat=oformat, nohlp=nohlp,clean=clean,asr=asr) return output def getArgParser(): """ Defines the ArgumentParser Returns: argparse.ArgumentParser: argument parsers. """ # Define input and output field help message fields = dedent( """ output files: folder containing fasta and partition files for each clone. lineages successfully processed records. lineages-fail database records failed processing. 
igphyml-pass parameter estimates and lineage trees from running IgPhyML, if specified required fields: sequence_id, sequence, sequence_alignment, germline_alignment_d_mask or germline_alignment, v_call, j_call, clone_id, v_sequence_start """) # Parent parser parser_parent = getCommonArgParser(out_file=False, log=True, format=True) # Define argument parser parser = ArgumentParser(description=__doc__, epilog=fields, parents=[parser_parent], formatter_class=CommonHelpFormatter, add_help=False) group = parser.add_argument_group("sequence processing arguments") group.add_argument("--collapse", action="store_true", dest="collapse", help="""If specified, collapse identical sequences before exporting to fasta.""") group.add_argument("--ncdr3", action="store_true", dest="ncdr3", help="""If specified, remove CDR3 from all sequences.""") group.add_argument("--nmask", action="store_true", dest="nmask", help="""If specified, do not attempt to mask split codons.""") group.add_argument("--md", nargs="+", action="store", dest="meta_data", help="""List of fields containing metadata to include in output fasta file sequence headers.""") group.add_argument("--clones", nargs="+", action="store", dest="target_clones", help="""List of clone IDs to output, if specified.""") group.add_argument("--minseq", action="store", dest="min_seq", type=int, default=1, help="""Minimum number of data sequences. 
Any clones with fewer than the specified number of sequences will be excluded.""") group.add_argument("--sample", action="store", dest="sample_depth", type=int, default=-1, help="""Depth of reads to be subsampled (before deduplication).""") group.add_argument("--append", nargs="+", action="store", dest="append", help="""List of columns to append to sequence ID to ensure uniqueness.""") igphyml_group = parser.add_argument_group("IgPhyML arguments (see igphyml -h for details)") igphyml_group.add_argument("--igphyml", action="store_true", dest="igphyml", help="""Run IgPhyML on output?""") igphyml_group.add_argument("--nproc", action="store", dest="nproc", type=int, default=1, help="""Number of threads to parallelize IgPhyML across.""") igphyml_group.add_argument("--clean", action="store", choices=("none", "all"), dest="clean", type=str, default="none", help="""Delete intermediate files? none: leave all intermediate files; all: delete all intermediate files.""") igphyml_group.add_argument("--optimize", action="store", dest="optimization", type=str, default="lr", choices=("n","r","l","lr","tl","tlr"), help="""Optimize combination of topology (t), branch lengths (l), and parameters (r), or nothing (n), for IgPhyML.""") igphyml_group.add_argument("--omega", action="store", dest="omega", type=str, default="e,e", help="""Omega parameters to estimate for FWR,CDR respectively: e = estimate, ce = estimate + confidence interval, or numeric value""") igphyml_group.add_argument("-t", action="store", dest="kappa", type=str, default="e", help="""Kappa parameters to estimate: e = estimate, ce = estimate + confidence interval, or numeric value""") igphyml_group.add_argument("--motifs", action="store", dest="motifs", type=str, default="WRC_2:0,GYW_0:1,WA_1:2,TW_0:3,SYC_2:4,GRS_0:5", help="""Motifs for which to estimate mutability.""") igphyml_group.add_argument("--hotness", action="store", dest="hotness", type=str, default="e,e,e,e,e,e", help="""Mutability parameters to estimate: e = 
estimate, ce = estimate + confidence interval, or numeric value""") igphyml_group.add_argument("--oformat", action="store", dest="oformat", type=str, default="tab", choices=("tab", "txt"), help="""IgPhyML output format.""") igphyml_group.add_argument("--nohlp", action="store_true", dest="nohlp", help="""Don't run HLP model?""") igphyml_group.add_argument("--asr", action="store", dest="asr", type=float, default=-1, help="""Ancestral sequence reconstruction interval (0-1).""") return parser if __name__ == "__main__": """ Parses command line arguments and calls main """ # Parse command line arguments parser = getArgParser() checkArgs(parser) args = parser.parse_args() args_dict = parseCommonArgs(args) del args_dict["db_files"] # Call main for each input file for f in args.__dict__["db_files"]: args_dict["db_file"] = f buildTrees(**args_dict)changeo-1.2.0/bin/AssignGenes.py0000755000175000017500000002744614135353312016014 0ustar nileshnilesh#!/usr/bin/env python3 """ Assign V(D)J gene annotations """ # Info __author__ = 'Jason Anthony Vander Heiden' from changeo import __version__, __date__ # Imports import os import shutil from argparse import ArgumentParser from collections import OrderedDict from pkg_resources import parse_version from textwrap import dedent from time import time import re # Presto imports from presto.IO import printLog, printMessage, printError, printWarning from changeo.Defaults import default_igblastn_exec, default_igblastp_exec, default_out_args from changeo.Applications import runIgBLASTN, runIgBLASTP, getIgBLASTVersion from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs from changeo.IO import getOutputName # Defaults choices_format = ('blast', 'airr') choices_loci = ('ig', 'tr') choices_organism = ('human', 'mouse', 'rabbit', 'rat', 'rhesus_monkey') default_format = 'blast' default_loci = 'ig' default_organism = 'human' default_igdata = '~/share/igblast' def assignIgBLAST(seq_file, 
amino_acid=False, igdata=default_igdata, loci='ig', organism='human', vdb=None, ddb=None, jdb=None, format=default_format, igblast_exec=default_igblastn_exec, out_file=None, out_args=default_out_args, nproc=None): """ Assigns V(D)J gene annotations using IgBLAST Arguments: seq_file (str): the sample sequence file name. amino_acid (bool): if True then run igblastp. igblastn is assumed if False. igdata (str): path to the IgBLAST database directory (IGDATA environment). loci (str): receptor type; one of 'ig' or 'tr'. organism (str): species name. vdb (str): name of a custom V reference in the database folder to use. ddb (str): name of a custom D reference in the database folder to use. jdb (str): name of a custom J reference in the database folder to use. format (str): output format. One of 'blast' or 'airr'. igblast_exec (str): the path to the igblastn or igblastp executable. out_file (str): output file name. Automatically generated from the input file if None. out_args (dict): common output argument dictionary from parseCommonArgs. nproc (int): the number of threads passed to IgBLAST; if None defaults to the number of CPUs. Returns: str: the output file name """ # Check format argument try: out_type = {'blast': 'fmt7', 'airr': 'tsv'}[format] except KeyError: printError('Invalid output format %s.' % format) # Get IgBLAST version version = getIgBLASTVersion(exec=igblast_exec) if parse_version(version) < parse_version('1.6'): printError('IgBLAST version is %s and 1.6 or higher is required.' % version) if format == 'airr' and parse_version(version) < parse_version('1.9'): printError('IgBLAST version is %s and 1.9 or higher is required for AIRR format support.' 
% version) # Print parameter info log = OrderedDict() log['START'] = 'AssignGenes' log['COMMAND'] = 'igblast-aa' if amino_acid else 'igblast' log['VERSION'] = version log['FILE'] = os.path.basename(seq_file) log['ORGANISM'] = organism log['LOCI'] = loci log['NPROC'] = nproc printLog(log) # Set output file name if out_file is None: out_file = getOutputName(seq_file, out_label='igblast', out_dir=out_args['out_dir'], out_name=out_args['out_name'], out_type=out_type) # Run IgBLAST start_time = time() printMessage('Running IgBLAST', start_time=start_time, width=25) if not amino_acid: console_out = runIgBLASTN(seq_file, igdata, loci=loci, organism=organism, vdb=vdb, ddb=ddb, jdb=jdb, output=out_file, format=format, threads=nproc, exec=igblast_exec) else: console_out = runIgBLASTP(seq_file, igdata, loci=loci, organism=organism, vdb=vdb, output=out_file, threads=nproc, exec=igblast_exec) printMessage('Done', start_time=start_time, end=True, width=25) # Get number of processed sequences if format == 'blast': with open(out_file, 'rb') as f: f.seek(-2, os.SEEK_END) while f.read(1) != b'\n': f.seek(-2, os.SEEK_CUR) pass_info = f.readline().decode() num_seqs_match = re.search(r'(# BLAST processed )(\d+)( .*)', pass_info) num_sequences = num_seqs_match.group(2) else: with open(out_file, 'rb') as f: lines = 0 buf_size = 1024 * 1024 read_f = f.raw.read buf = read_f(buf_size) while buf: lines += buf.count(b'\n') buf = read_f(buf_size) num_sequences = lines - 1 # Print log log = OrderedDict() log['PASS'] = num_sequences log['OUTPUT'] = os.path.basename(out_file) log['END'] = 'AssignGenes' printLog(log) return out_file def getArgParser(): """ Defines the ArgumentParser Arguments: None Returns: an ArgumentParser object """ # Define output file names and header fields fields = dedent( ''' output files: igblast Reference alignment results from IgBLAST. 
''') # Define ArgumentParser parser = ArgumentParser(description=__doc__, epilog=fields, formatter_class=CommonHelpFormatter, add_help=False) group_help = parser.add_argument_group('help') group_help.add_argument('--version', action='version', version='%(prog)s:' + ' %s %s' %(__version__, __date__)) group_help.add_argument('-h', '--help', action='help', help='show this help message and exit') subparsers = parser.add_subparsers(title='subcommands', dest='command', metavar='', help='Assignment operation') # TODO: This is a temporary fix for Python issue 9253 subparsers.required = True # Parent parser parent_parser = getCommonArgParser(db_in=False, log=False, failed=False, format=False, multiproc=True) # Subparser to run igblastn parser_igblast = subparsers.add_parser('igblast', parents=[parent_parser], formatter_class=CommonHelpFormatter, add_help=False, help='Executes igblastn.', description='Executes igblastn.') group_igblast = parser_igblast.add_argument_group('alignment arguments') group_igblast.add_argument('-s', nargs='+', action='store', dest='seq_files', required=True, help='A list of FASTA files containing sequences to process.') group_igblast.add_argument('-b', action='store', dest='igdata', required=True, help='IgBLAST database directory (IGDATA).') group_igblast.add_argument('--organism', action='store', dest='organism', default=default_organism, choices=choices_organism, help='Organism name.') group_igblast.add_argument('--loci', action='store', dest='loci', default=default_loci, choices=choices_loci, help='The receptor type.') group_igblast.add_argument('--vdb', action='store', dest='vdb', default=None, help='''Name of the custom V reference in the IgBLAST database folder. If not specified, then a default database name with the form imgt___v will be used.''') group_igblast.add_argument('--ddb', action='store', dest='ddb', default=None, help='''Name of the custom D reference in the IgBLAST database folder. 
If not specified, then a default database name with the form imgt___d will be used.''') group_igblast.add_argument('--jdb', action='store', dest='jdb', default=None, help='''Name of the custom J reference in the IgBLAST database folder. If not specified, then a default database name with the form imgt___j will be used.''') group_igblast.add_argument('--format', action='store', dest='format', default=default_format, choices=choices_format, help='''Specify the output format. The "blast" will result in the IgBLAST "-outfmt 7 std qseq sseq btop" output format. Specifying "airr" will output the AIRR TSV format provided by the IgBLAST argument "-outfmt 19".''') group_igblast.add_argument('--exec', action='store', dest='igblast_exec', default=default_igblastn_exec, help='Path to the igblastn executable.') parser_igblast.set_defaults(func=assignIgBLAST, amino_acid=False) # Subparser to run igblastp parser_igblast_aa = subparsers.add_parser('igblast-aa', parents=[parent_parser], formatter_class=CommonHelpFormatter, add_help=False, help='Executes igblastp.', description='Executes igblastp.') group_igblast_aa = parser_igblast_aa.add_argument_group('alignment arguments') group_igblast_aa.add_argument('-s', nargs='+', action='store', dest='seq_files', required=True, help='A list of FASTA files containing sequences to process.') group_igblast_aa.add_argument('-b', action='store', dest='igdata', required=True, help='IgBLAST database directory (IGDATA).') group_igblast_aa.add_argument('--organism', action='store', dest='organism', default=default_organism, choices=choices_organism, help='Organism name.') group_igblast_aa.add_argument('--loci', action='store', dest='loci', default=default_loci, choices=choices_loci, help='The receptor type.') group_igblast_aa.add_argument('--vdb', action='store', dest='vdb', default=None, help='''Name of the custom V reference in the IgBLAST database folder. 
                                       If not specified, then a default database name with the form
                                       imgt_aa___v will be used.''')
    group_igblast_aa.add_argument('--exec', action='store', dest='igblast_exec',
                                  default=default_igblastp_exec,
                                  help='Path to the igblastp executable.')
    parser_igblast_aa.set_defaults(func=assignIgBLAST, amino_acid=True, ddb=None, jdb=None,
                                   format='blast')

    return parser


if __name__ == '__main__':
    """
    Parses command line arguments and calls main function
    """
    # Parse arguments
    parser = getArgParser()
    checkArgs(parser)
    args = parser.parse_args()
    args_dict = parseCommonArgs(args)

    # Check if a valid IgBLAST executable was specified
    if not shutil.which(args_dict['igblast_exec']):
        parser.error('%s executable not found' % args_dict['igblast_exec'])

    # Clean arguments dictionary
    del args_dict['seq_files']
    if 'out_files' in args_dict:
        del args_dict['out_files']
    del args_dict['func']
    del args_dict['command']

    # Call main function for each input file
    for i, f in enumerate(args.__dict__['seq_files']):
        args_dict['seq_file'] = f
        args_dict['out_file'] = args.__dict__['out_files'][i] \
            if args.__dict__['out_files'] else None
        args.func(**args_dict)
changeo-1.2.0/bin/ConvertDb.py0000755000175000017500000014237414135625454015503 0ustar nileshnilesh
#!/usr/bin/env python3
"""
Parses tab delimited database files
"""

# Info
__author__ = 'Jason Anthony Vander Heiden'
from changeo import __version__, __date__

# Imports
import csv
import os
import re
import shutil
from argparse import ArgumentParser
from collections import OrderedDict
from itertools import chain
from textwrap import dedent
from time import time
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Presto and changeo imports
from presto.Annotation import flattenAnnotation
from presto.IO import printLog, printMessage, printProgress, printError, printWarning
from changeo.Alignment import gapV
from changeo.Applications import default_tbl2asn_exec, runASN
from changeo.Defaults import default_id_field, default_seq_field, default_germ_field, \
                             default_csv_size, default_format, default_out_args
from changeo.Commandline import CommonHelpFormatter, checkArgs, getCommonArgParser, parseCommonArgs
from changeo.Gene import getCGene, buildGermline
from changeo.IO import countDbFile, getFormatOperators, getOutputHandle, AIRRReader, AIRRWriter, \
                       ChangeoReader, ChangeoWriter, TSVReader, ReceptorData, readGermlines, \
                       checkFields, yamlDict
from changeo.Receptor import AIRRSchema, ChangeoSchema

# System settings
csv.field_size_limit(default_csv_size)

# Defaults
default_db_xref = 'IMGT/GENE-DB'
default_molecule = 'mRNA'
default_product = 'immunoglobulin heavy chain'
default_allele_delim = '*'


def buildSeqRecord(db_record, id_field, seq_field, meta_fields=None):
    """
    Parses a database record into a SeqRecord

    Arguments:
      db_record : a dictionary containing a database record.
      id_field : the field containing identifiers.
      seq_field : the field containing sequences.
      meta_fields : a list of fields to add to sequence annotations.

    Returns:
      Bio.SeqRecord.SeqRecord: record, or None if the ID or sequence field is empty.
    """
    # Return None if ID or sequence fields are empty
    if not db_record[id_field] or not db_record[seq_field]:
        return None

    # Create description string
    desc_dict = OrderedDict([('ID', db_record[id_field])])
    if meta_fields is not None:
        desc_dict.update([(f, db_record[f]) for f in meta_fields if f in db_record])
    desc_str = flattenAnnotation(desc_dict)

    # Create SeqRecord
    seq_record = SeqRecord(Seq(db_record[seq_field]), id=desc_str, name=desc_str, description='')

    return seq_record


def correctIMGTFields(receptor, references):
    """
    Add IMGT-gaps to IMGT fields in a Receptor object

    Arguments:
      receptor (changeo.Receptor.Receptor): Receptor object to modify.
      references (dict): dictionary of IMGT-gapped reference sequences.

    Returns:
      dict: IMGT-gapped field values (sequence_imgt, v_germ_start_imgt, v_germ_length_imgt,
            germline_imgt), or None if the record could not be corrected.
    """
    # Initialize update object
    imgt_dict = {'sequence_imgt': None,
                 'v_germ_start_imgt': None,
                 'v_germ_length_imgt': None,
                 'germline_imgt': None}

    try:
        if not all([receptor.sequence_imgt,
                    receptor.v_germ_start_imgt,
                    receptor.v_germ_length_imgt,
                    receptor.v_call]):
            raise AttributeError
    except AttributeError:
        return None

    # Update IMGT fields
    try:
        gapped = gapV(receptor.sequence_imgt,
                      receptor.v_germ_start_imgt,
                      receptor.v_germ_length_imgt,
                      receptor.v_call,
                      references)
    except KeyError as e:
        printWarning(e)
        return None

    # Verify IMGT-gapped sequence and junction concur
    try:
        check = (receptor.junction == gapped['sequence_imgt'][309:(309 + receptor.junction_length)])
    except TypeError:
        check = False
    if not check:
        return None

    # Rebuild germline sequence
    __, germlines, __ = buildGermline(receptor, references)
    if germlines is None:
        return None
    else:
        gapped['germline_imgt'] = germlines['full']

    # Update return object
    imgt_dict.update(gapped)

    return imgt_dict


def insertGaps(db_file, references=None, format=default_format,
               out_file=None, out_args=default_out_args):
    """
    Inserts IMGT numbering into V fields

    Arguments:
      db_file : the database file name.
      references : folder with germline repertoire files. If None, do not update
                   alignment columns with IMGT gaps.
      format : input format.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      str : output file name
    """
    log = OrderedDict()
    log['START'] = 'ConvertDb'
    log['COMMAND'] = 'imgt'
    log['FILE'] = os.path.basename(db_file)
    printLog(log)

    # Define format operators
    try:
        reader, writer, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.' % format)

    # Open input
    db_handle = open(db_file, 'rt')
    db_iter = reader(db_handle)

    # Check for required columns
    try:
        required = ['sequence_imgt', 'v_germ_start_imgt']
        checkFields(required, db_iter.fields, schema=schema)
    except LookupError as e:
        printError(e)

    # Load references
    reference_dict = readGermlines(references)

    # Check for IMGT-gaps in germlines
    if all('...' not in x for x in reference_dict.values()):
        printWarning('Germline reference sequences do not appear to contain IMGT-numbering spacers. Results may be incorrect.')

    # Open output writer
    if out_file is not None:
        pass_handle = open(out_file, 'w')
    else:
        pass_handle = getOutputHandle(db_file, out_label='gap', out_dir=out_args['out_dir'],
                                      out_name=out_args['out_name'], out_type=schema.out_type)
    pass_writer = writer(pass_handle, fields=db_iter.fields)

    # Count records
    result_count = countDbFile(db_file)

    # Iterate over records
    start_time = time()
    rec_count = pass_count = 0
    for rec in db_iter:
        # Print progress for previous iteration
        printProgress(rec_count, result_count, 0.05, start_time=start_time)
        rec_count += 1

        # Update IMGT fields
        imgt_dict = correctIMGTFields(rec, reference_dict)

        # Write records
        if imgt_dict is not None:
            pass_count += 1
            rec.setDict(imgt_dict, parse=False)
            pass_writer.writeReceptor(rec)

    # Print counts
    printProgress(rec_count, result_count, 0.05, start_time=start_time)
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name)
    log['RECORDS'] = rec_count
    log['PASS'] = pass_count
    log['FAIL'] = rec_count - pass_count
    log['END'] = 'ConvertDb'
    printLog(log)

    # Close file handles
    pass_handle.close()
    db_handle.close()

    return pass_handle.name


def convertToAIRR(db_file, format=default_format, out_file=None, out_args=default_out_args):
    """
    Converts a Change-O formatted file into an AIRR formatted file

    Arguments:
      db_file : the database file name.
      format : input format.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      str : output file name
    """
    log = OrderedDict()
    log['START'] = 'ConvertDb'
    log['COMMAND'] = 'airr'
    log['FILE'] = os.path.basename(db_file)
    printLog(log)

    # Define format operators
    try:
        reader, __, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.' % format)

    # Open input
    db_handle = open(db_file, 'rt')
    db_iter = reader(db_handle)

    # Set output fields replacing length with end fields
    in_fields = [schema.toReceptor(f) for f in db_iter.fields]
    out_fields = []
    for f in in_fields:
        if f in ReceptorData.length_fields and ReceptorData.length_fields[f][0] in in_fields:
            out_fields.append(ReceptorData.length_fields[f][1])
        out_fields.append(f)
    out_fields = list(OrderedDict.fromkeys(out_fields))
    out_fields = [AIRRSchema.fromReceptor(f) for f in out_fields]

    # Open output writer
    if out_file is not None:
        pass_handle = open(out_file, 'w')
    else:
        pass_handle = getOutputHandle(db_file, out_label='airr', out_dir=out_args['out_dir'],
                                      out_name=out_args['out_name'], out_type=AIRRSchema.out_type)
    pass_writer = AIRRWriter(pass_handle, fields=out_fields)

    # Count records
    result_count = countDbFile(db_file)

    # Iterate over records
    start_time = time()
    rec_count = 0
    for rec in db_iter:
        # Print progress for previous iteration
        printProgress(rec_count, result_count, 0.05, start_time=start_time)
        rec_count += 1
        # Write records
        pass_writer.writeReceptor(rec)

    # Print counts
    printProgress(rec_count, result_count, 0.05, start_time=start_time)
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name)
    log['RECORDS'] = rec_count
    log['END'] = 'ConvertDb'
    printLog(log)

    # Close file handles
    pass_handle.close()
    db_handle.close()

    return pass_handle.name


def convertToChangeo(db_file, out_file=None, out_args=default_out_args):
    """
    Converts an AIRR formatted file into a Change-O formatted file

    Arguments:
      db_file : the database file name.
      out_file : output file name.
                 Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      str : output file name.
    """
    log = OrderedDict()
    log['START'] = 'ConvertDb'
    log['COMMAND'] = 'changeo'
    log['FILE'] = os.path.basename(db_file)
    printLog(log)

    # Open input
    db_handle = open(db_file, 'rt')
    db_iter = AIRRReader(db_handle)

    # Set output fields, adding length fields derived from end fields
    in_fields = [AIRRSchema.toReceptor(f) for f in db_iter.fields]
    out_fields = []
    for f in in_fields:
        out_fields.append(f)
        if f in ReceptorData.end_fields and ReceptorData.end_fields[f][0] in in_fields:
            out_fields.append(ReceptorData.end_fields[f][1])
    out_fields = list(OrderedDict.fromkeys(out_fields))
    out_fields = [ChangeoSchema.fromReceptor(f) for f in out_fields]

    # Open output writer
    if out_file is not None:
        pass_handle = open(out_file, 'w')
    else:
        pass_handle = getOutputHandle(db_file, out_label='changeo', out_dir=out_args['out_dir'],
                                      out_name=out_args['out_name'], out_type=ChangeoSchema.out_type)
    pass_writer = ChangeoWriter(pass_handle, fields=out_fields)

    # Count records
    result_count = countDbFile(db_file)

    # Iterate over records
    start_time = time()
    rec_count = 0
    for rec in db_iter:
        # Print progress for previous iteration
        printProgress(rec_count, result_count, 0.05, start_time=start_time)
        rec_count += 1
        # Write records
        pass_writer.writeReceptor(rec)

    # Print counts
    printProgress(rec_count, result_count, 0.05, start_time=start_time)
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name)
    log['RECORDS'] = rec_count
    log['END'] = 'ConvertDb'
    printLog(log)

    # Close file handles
    pass_handle.close()
    db_handle.close()

    return pass_handle.name


# TODO: SHOULD ALLOW FOR UNSORTED CLUSTER COLUMN
# TODO: SHOULD ALLOW FOR GROUPING FIELDS
def convertToBaseline(db_file, id_field=default_id_field, seq_field=default_seq_field,
                      germ_field=default_germ_field, cluster_field=None,
                      meta_fields=None, out_file=None, out_args=default_out_args):
    """
    Builds a BASELINe fasta file from database records

    Arguments:
      db_file : the database file name.
      id_field : the field containing identifiers.
      seq_field : the field containing sample sequences.
      germ_field : the field containing germline sequences.
      cluster_field : the field containing clonal groupings;
                      if None write the germline for each record.
      meta_fields : a list of fields to add to sequence annotations.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      str : output file name
    """
    log = OrderedDict()
    log['START'] = 'ConvertDb'
    log['COMMAND'] = 'baseline'
    log['FILE'] = os.path.basename(db_file)
    log['ID_FIELD'] = id_field
    log['SEQ_FIELD'] = seq_field
    log['GERM_FIELD'] = germ_field
    log['CLUSTER_FIELD'] = cluster_field
    if meta_fields is not None:  log['META_FIELDS'] = ','.join(meta_fields)
    printLog(log)

    # Open input
    db_handle = open(db_file, 'rt')
    db_iter = TSVReader(db_handle)
    result_count = countDbFile(db_file)

    # Open output
    if out_file is not None:
        pass_handle = open(out_file, 'w')
    else:
        pass_handle = getOutputHandle(db_file, out_label='sequences', out_dir=out_args['out_dir'],
                                      out_name=out_args['out_name'], out_type='clip')

    # Iterate over records
    start_time = time()
    rec_count, germ_count, pass_count, fail_count = 0, 0, 0, 0
    cluster_last = None
    for rec in db_iter:
        # Print progress for previous iteration
        printProgress(rec_count, result_count, 0.05, start_time=start_time)
        rec_count += 1

        # Update cluster ID
        cluster = rec.get(cluster_field, None)

        # Get germline SeqRecord when needed
        if cluster_field is None:
            germ = buildSeqRecord(rec, id_field, germ_field, meta_fields)
        elif cluster != cluster_last:
            germ = buildSeqRecord(rec, cluster_field, germ_field)
        else:
            germ = None

        # Prefix germline IDs with an extra '>' so SeqIO writes '>>' headers;
        # buildSeqRecord returns None for empty fields, so guard before modifying
        if germ is not None:
            germ.id = '>' + germ.id

        # Get read SeqRecord
        seq = buildSeqRecord(rec, id_field, seq_field, meta_fields)

        # Write germline
        if germ is not None:
            germ_count += 1
            SeqIO.write(germ, pass_handle, 'fasta')

        # Write sequences
        if seq is not None:
            pass_count += 1
            SeqIO.write(seq, pass_handle, 'fasta')
        else:
            fail_count += 1

        # Set last cluster ID
        cluster_last = cluster

    # Print counts
    printProgress(rec_count, result_count, 0.05, start_time=start_time)
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name)
    log['RECORDS'] = rec_count
    log['GERMLINES'] = germ_count
    log['PASS'] = pass_count
    log['FAIL'] = fail_count
    log['END'] = 'ConvertDb'
    printLog(log)

    # Close file handles
    pass_handle.close()
    db_handle.close()

    return pass_handle.name


def convertToFasta(db_file, id_field=default_id_field, seq_field=default_seq_field,
                   meta_fields=None, out_file=None, out_args=default_out_args):
    """
    Builds fasta files from database records

    Arguments:
      db_file : the database file name.
      id_field : the field containing identifiers.
      seq_field : the field containing sequences.
      meta_fields : a list of fields to add to sequence annotations.
      out_file : output file name. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      str : output file name.
    """
    log = OrderedDict()
    log['START'] = 'ConvertDb'
    log['COMMAND'] = 'fasta'
    log['FILE'] = os.path.basename(db_file)
    log['ID_FIELD'] = id_field
    log['SEQ_FIELD'] = seq_field
    if meta_fields is not None:  log['META_FIELDS'] = ','.join(meta_fields)
    printLog(log)

    # Open input
    out_type = 'fasta'
    db_handle = open(db_file, 'rt')
    db_iter = TSVReader(db_handle)
    result_count = countDbFile(db_file)

    # Open output
    if out_file is not None:
        pass_handle = open(out_file, 'w')
    else:
        pass_handle = getOutputHandle(db_file, out_label='sequences', out_dir=out_args['out_dir'],
                                      out_name=out_args['out_name'], out_type=out_type)

    # Iterate over records
    start_time = time()
    rec_count, pass_count, fail_count = 0, 0, 0
    for rec in db_iter:
        # Print progress for previous iteration
        printProgress(rec_count, result_count, 0.05, start_time=start_time)
        rec_count += 1

        # Get SeqRecord
        seq = buildSeqRecord(rec, id_field, seq_field, meta_fields)

        # Write sequences
        if seq is not None:
            pass_count += 1
            SeqIO.write(seq, pass_handle, out_type)
        else:
            fail_count += 1

    # Print counts
    printProgress(rec_count, result_count, 0.05, start_time=start_time)
    log = OrderedDict()
    log['OUTPUT'] = os.path.basename(pass_handle.name)
    log['RECORDS'] = rec_count
    log['PASS'] = pass_count
    log['FAIL'] = fail_count
    log['END'] = 'ConvertDb'
    printLog(log)

    # Close file handles
    pass_handle.close()
    db_handle.close()

    return pass_handle.name


def makeGenbankFeatures(record, start=None, end=None, product=default_product,
                        inference=None, db_xref=None, c_field=None, allow_stop=False,
                        asis_calls=False, allele_delim=default_allele_delim):
    """
    Creates a feature table for GenBank submissions

    Arguments:
      record : Receptor record.
      start : start position of the modified sequence in the input sequence. Used for feature position offsets.
      end : end position of the modified sequence in the input sequence. Used for feature position offsets.
      product : Product (protein) name.
      inference : Reference alignment tool.
      db_xref : Reference database name.
      c_field : column containing the C region gene call.
      allow_stop : if True retain records with junctions having stop codons.
      asis_calls : if True do not parse gene calls for IMGT nomenclature.
      allele_delim : delimiter separating the gene name from the allele number when asis_calls=True.

    Returns:
      dict : dictionary defining GenBank features where the key is a tuple
             (start, end, feature key) and values are a list of tuples containing
             (qualifier key, qualifier value). Returns None if the record cannot be
             annotated (missing V or J call, a coordinate range error, or a junction
             stop codon when allow_stop=False).
    """
    # .tbl file format
    #   Line 1, Column 1: Start location of feature
    #   Line 1, Column 2: Stop location of feature
    #   Line 1, Column 3: Feature key
    #   Line 2, Column 4: Qualifier key
    #   Line 2, Column 5: Qualifier value

    # Get genes and alleles
    c_gene = None
    if not asis_calls:
        # V gene
        v_gene = record.getVGene()
        v_allele = record.getVAlleleNumber()
        # D gene
        d_gene = record.getDGene()
        d_allele = record.getDAlleleNumber()
        # J gene
        j_gene = record.getJGene()
        j_allele = record.getJAlleleNumber()
        # C region
        if c_field is not None:
            c_gene = getCGene(record.getField(c_field), action='first')
    else:
        # V gene
        v_split = iter(record.v_call.rsplit(allele_delim, maxsplit=1))
        v_gene = next(v_split, None)
        v_allele = next(v_split, None)
        # D gene
        d_split = iter(record.d_call.rsplit(allele_delim, maxsplit=1))
        d_gene = next(d_split, None)
        d_allele = next(d_split, None)
        # J gene
        j_split = iter(record.j_call.rsplit(allele_delim, maxsplit=1))
        j_gene = next(j_split, None)
        j_allele = next(j_split, None)
        # C region
        if c_field is not None:
            c_gene = record.getField(c_field)

    # Fail if V or J is missing
    if v_gene is None or j_gene is None:
        return None

    # Set position offsets if required
    start_trim = 0 if start is None else start
    end_trim = 0 if end is None else len(record.sequence_input) - end
    source_len = len(record.sequence_input) - end_trim

    # Define return object
    result = OrderedDict()

    # C_region
    #     gene
    #     db_xref
    #     inference
    c_region_start = record.j_seq_end + 1 - start_trim
    c_region_length = len(record.sequence_input[(c_region_start + start_trim - 1):]) - end_trim
    if c_region_length > 0:
        if c_gene is not None:
            c_region = [('gene', c_gene)]
            if db_xref is not None:
                c_region.append(('db_xref', '%s:%s' % (db_xref, c_gene)))
        else:
            c_region = []

        # Assign C_region feature
        c_region_end = c_region_start + c_region_length - 1
        result[(c_region_start, '>%i' % c_region_end, 'C_region')] = c_region

        # Preserve J segment end position
        j_end = record.j_seq_end

        # Check for range error
        if c_region_end > source_len:
            return None
    else:
        # Trim J segment end position
        j_end = record.j_seq_end + c_region_length

    # V_region
    variable_start = max(record.v_seq_start - start_trim, 1)
    variable_end = j_end - start_trim
    result[(variable_start, variable_end, 'V_region')] = []

    # Check for range error
    if variable_end > source_len:
        return None

    # Product feature
    result[(variable_start, variable_end, 'misc_feature')] = [('note', '%s variable region' % product)]

    # V_segment
    #     gene (gene name)
    #     allele (allele only, without gene name, don't use if ambiguous)
    #     db_xref (database link)
    #     inference (reference alignment tool)
    v_segment = [('gene', v_gene)]
    if v_allele is not None:
        v_segment.append(('allele', v_allele))
    if db_xref is not None:
        v_segment.append(('db_xref', '%s:%s' % (db_xref, v_gene)))
    if inference is not None:
        v_segment.append(('inference', 'COORDINATES:alignment:%s' % inference))
    result[(variable_start, record.v_seq_end - start_trim, 'V_segment')] = v_segment

    # D_segment
    #     gene
    #     allele
    #     db_xref
    #     inference
    if d_gene:
        d_segment = [('gene', d_gene)]
        if d_allele is not None:
            d_segment.append(('allele', d_allele))
        if db_xref is not None:
            d_segment.append(('db_xref', '%s:%s' % (db_xref, d_gene)))
        if inference is not None:
            d_segment.append(('inference', 'COORDINATES:alignment:%s' % inference))
        result[(record.d_seq_start - start_trim, record.d_seq_end - start_trim, 'D_segment')] = d_segment

    # J_segment
    #     gene
    #     allele
    #     db_xref
    #     inference
    j_segment = [('gene', j_gene)]
    if j_allele is not None:
        j_segment.append(('allele', j_allele))
    if db_xref is not None:
        j_segment.append(('db_xref', '%s:%s' % (db_xref, j_gene)))
    if inference is not None:
        j_segment.append(('inference', 'COORDINATES:alignment:%s' % inference))
    result[(record.j_seq_start - start_trim, j_end - start_trim, 'J_segment')] = j_segment

    # CDS
    #     codon_start (must indicate codon offset)
    #     function = JUNCTION
    #     inference
    # print(record.v_germ_end_imgt, record.v_seq_end, record.v_germ_end_imgt - 310)
    # print(record.junction_start, record.junction_end, record.junction_length)
    if record.junction_start is not None and record.junction_end is not None:
        # Define junction boundaries
        junction_start = record.junction_start - start_trim
        junction_end = record.junction_end - start_trim

        # CDS record
        cds_start = '<%i' % junction_start
        cds_end = '>%i' % junction_end
        cds_record = [('function', 'JUNCTION')]
        if inference is not None:
            cds_record.append(('inference', 'COORDINATES:protein motif:%s' % inference))

        # Check for valid translation
        junction_seq = record.sequence_input[(junction_start - 1):junction_end]
        if len(junction_seq) % 3 > 0:
            junction_seq = junction_seq + 'N' * (3 - len(junction_seq) % 3)
        junction_aa = Seq(junction_seq).translate()

        # Return invalid record upon junction stop codon
        if '*' in junction_aa and not allow_stop:
            return None
        elif '*' in junction_aa:
            cds_record.append(('note', '%s junction region' % product))
            result[(cds_start, cds_end, 'misc_feature')] = cds_record
        else:
            cds_record.append(('product', '%s junction region' % product))
            cds_record.append(('codon_start', 1))
            result[(cds_start, cds_end, 'CDS')] = cds_record

    return result


def makeGenbankSequence(record, name=None, label=None, count_field=None, index_field=None,
                        molecule=default_molecule, features=None):
    """
    Creates a sequence for GenBank submissions

    Arguments:
      record : Receptor record.
      name : sequence identifier for the output sequence. If None,
             use the original sequence identifier.
      label : a string to use as a label for the ID. If None do not add a field label.
      count_field : field name to populate the AIRR_READ_COUNT note.
      index_field : field name to populate the AIRR_CELL_INDEX note.
      molecule : source molecule (e.g., "mRNA", "genomic DNA")
      features : dictionary of sample features (BioSample attributes) to add to the description of each record.

    Returns:
      dict: dictionary with {'record': SeqRecord,
                             'start': start position in raw sequence,
                             'end': end position in raw sequence}
    """
    # Replace gaps with N
    seq = record.sequence_input
    seq = seq.replace('-', 'N').replace('.', 'N')

    # Strip leading and trailing Ns
    head_match = re.search('^N+', seq)
    tail_match = re.search('N+$', seq)
    seq_start = head_match.end() if head_match else 0
    seq_end = tail_match.start() if tail_match else len(seq)

    # Define ID
    if name is None:
        name = record.sequence_id.split(' ')[0]
    if label is not None:
        name = '%s=%s' % (label, name)
    if features is not None:
        sample_desc = ' '.join(['[%s=%s]' % (k, v) for k, v in features.items()])
        name = '%s %s' % (name, sample_desc)
    name = '%s [moltype=%s] [keyword=TLS; Targeted Locus Study; AIRR; MiAIRR:1.0]' % (name, molecule)

    # Notes
    note_dict = OrderedDict()
    if count_field is not None:
        note_dict['AIRR_READ_COUNT'] = record.getField(count_field)
    if index_field is not None:
        note_dict['AIRR_CELL_INDEX'] = record.getField(index_field)
    if note_dict:
        note = '; '.join(['%s:%s' % (k, v) for k, v in note_dict.items()])
        name = '%s [note=%s]' % (name, note)

    # Return SeqRecord and positions
    record = SeqRecord(Seq(seq[seq_start:seq_end]), id=name, name=name, description='')
    result = {'record': record, 'start': seq_start, 'end': seq_end}

    return result


def convertToGenbank(db_file, inference=None, db_xref=None, molecule=default_molecule,
                     product=default_product, features=None, c_field=None, label=None,
                     count_field=None, index_field=None, allow_stop=False, asis_id=False,
                     asis_calls=False, allele_delim=default_allele_delim, build_asn=False,
                     asn_template=None, tbl2asn_exec=default_tbl2asn_exec, format=default_format,
                     out_file=None, out_args=default_out_args):
    """
    Builds GenBank submission fasta and table files

    Arguments:
      db_file : the database file name.
      inference : reference alignment tool.
      db_xref : reference database link.
      molecule : source molecule (e.g., "mRNA", "genomic DNA")
      product : Product (protein) name.
      features : dictionary of sample features (BioSample attributes) to add to the description of each record.
      c_field : column containing the C region gene call.
      label : a string to use as a label for the ID. If None do not add a field label.
      count_field : field name to populate the AIRR_READ_COUNT note.
      index_field : field name to populate the AIRR_CELL_INDEX note.
      allow_stop : if True retain records with junctions having stop codons.
      asis_id : if True use the original sequence ID for the output IDs.
      asis_calls : if True do not parse gene calls for IMGT nomenclature.
      allele_delim : delimiter separating the gene name from the allele number when asis_calls=True.
      build_asn : if True run tbl2asn on the generated .tbl and .fsa files.
      asn_template : template file (.sbt) to pass to tbl2asn.
      tbl2asn_exec : name of or path to the tbl2asn executable.
      format : input and output format.
      out_file : output file name without extension. Automatically generated from the input file if None.
      out_args : common output argument dictionary from parseCommonArgs.

    Returns:
      tuple : the output (feature table, fasta) file names.
    """
    log = OrderedDict()
    log['START'] = 'ConvertDb'
    log['COMMAND'] = 'genbank'
    log['FILE'] = os.path.basename(db_file)
    printLog(log)

    # Define format operators
    try:
        reader, __, schema = getFormatOperators(format)
    except ValueError:
        printError('Invalid format %s.' % format)

    # Open input
    db_handle = open(db_file, 'rt')
    db_iter = reader(db_handle)

    # Check for required columns
    try:
        required = ['sequence_input',
                    'v_call', 'd_call', 'j_call',
                    'v_seq_start', 'd_seq_start', 'j_seq_start']
        checkFields(required, db_iter.fields, schema=schema)
    except LookupError as e:
        printError(e)

    # Open output
    if out_file is not None:
        out_name, __ = os.path.splitext(out_file)
        fsa_handle = open('%s.fsa' % out_name, 'w')
        tbl_handle = open('%s.tbl' % out_name, 'w')
    else:
        fsa_handle = getOutputHandle(db_file, out_label='genbank', out_dir=out_args['out_dir'],
                                     out_name=out_args['out_name'], out_type='fsa')
        tbl_handle = getOutputHandle(db_file, out_label='genbank', out_dir=out_args['out_dir'],
                                     out_name=out_args['out_name'], out_type='tbl')

    # Count records
    result_count = countDbFile(db_file)

    # Define writer
    writer = csv.writer(tbl_handle, delimiter='\t', quoting=csv.QUOTE_NONE)

    # Iterate over records
    start_time = time()
    rec_count, pass_count, fail_count = 0, 0, 0
    for rec in db_iter:
        # Print progress for previous iteration
        printProgress(rec_count, result_count, 0.05, start_time=start_time)
        rec_count += 1

        # Extract table dictionary
        name = None if asis_id else rec_count
        seq = makeGenbankSequence(rec, name=name, label=label, count_field=count_field,
                                  index_field=index_field, molecule=molecule, features=features)
        tbl = makeGenbankFeatures(rec, start=seq['start'], end=seq['end'], product=product,
                                  db_xref=db_xref, inference=inference, c_field=c_field,
                                  allow_stop=allow_stop, asis_calls=asis_calls,
                                  allele_delim=allele_delim)

        if tbl is not None:
            pass_count += 1
            # Write table
            writer.writerow(['>Features', seq['record'].id])
            for feature, qualifiers in tbl.items():
                writer.writerow(feature)
                if qualifiers:
                    for x in qualifiers:
                        writer.writerow(list(chain(['', '', ''], x)))

            # Write sequence
            SeqIO.write(seq['record'], fsa_handle, 'fasta')
        else:
            fail_count += 1

    # Final progress bar
    printProgress(rec_count, result_count, 0.05, start_time=start_time)

    # Run tbl2asn
    if build_asn:
        # Flush buffered output so tbl2asn reads complete .fsa and .tbl files
        tbl_handle.flush()
        fsa_handle.flush()
        start_time = time()
        printMessage('Running tbl2asn', start_time=start_time, width=25)
        result = runASN(fsa_handle.name, template=asn_template, exec=tbl2asn_exec)
        printMessage('Done', start_time=start_time, end=True, width=25)

    # Print ending console log
    log = OrderedDict()
    log['OUTPUT_TBL'] = os.path.basename(tbl_handle.name)
    log['OUTPUT_FSA'] = os.path.basename(fsa_handle.name)
    log['RECORDS'] = rec_count
    log['PASS'] = pass_count
    log['FAIL'] = fail_count
    log['END'] = 'ConvertDb'
    printLog(log)

    # Close file handles
    tbl_handle.close()
    fsa_handle.close()
    db_handle.close()

    return (tbl_handle.name, fsa_handle.name)


def getArgParser():
    """
    Defines the ArgumentParser

    Arguments:
    None

    Returns:
    an ArgumentParser object
    """
    # Define input and output field help message
    fields = dedent(
             '''
              output files:
                  airr
                      AIRR formatted database files.
                  changeo
                      Change-O formatted database files.
                  sequences
                      FASTA formatted sequences output from the subcommands fasta and baseline.
                  genbank
                      feature tables and fasta files containing MiAIRR compliant input for tbl2asn.

              required fields:
                  sequence_id, sequence, sequence_alignment, junction, v_call, d_call,
                  j_call, v_germline_start, v_germline_end, v_sequence_start, v_sequence_end,
                  d_sequence_start, d_sequence_end, j_sequence_start, j_sequence_end

              optional fields:
                  germline_alignment, c_call, clone_id
              ''')

    # Define ArgumentParser
    parser = ArgumentParser(description=__doc__, epilog=fields,
                            formatter_class=CommonHelpFormatter, add_help=False)
    group_help = parser.add_argument_group('help')
    group_help.add_argument('--version', action='version',
                            version='%(prog)s:' + ' %s %s' %(__version__, __date__))
    group_help.add_argument('-h', '--help', action='help', help='show this help message and exit')
    subparsers = parser.add_subparsers(title='subcommands', dest='command', metavar='',
                                       help='Database operation')
    # TODO: This is a temporary fix for Python issue 9253
    subparsers.required = True

    # Define parent parsers
    default_parent = getCommonArgParser(failed=False, log=False, format=False)
    format_parent = getCommonArgParser(failed=False, log=False)

    # Subparser to convert changeo to AIRR files
    parser_airr = subparsers.add_parser('airr', parents=[default_parent],
                                        formatter_class=CommonHelpFormatter, add_help=False,
                                        help='Converts input to an AIRR TSV file.',
                                        description='Converts input to an AIRR TSV file.')
    parser_airr.set_defaults(func=convertToAIRR)

    # Subparser to convert AIRR to changeo files
    parser_changeo = subparsers.add_parser('changeo', parents=[default_parent],
                                           formatter_class=CommonHelpFormatter, add_help=False,
                                           help='Converts input into a Change-O TSV file.',
                                           description='Converts input into a Change-O TSV file.')
    parser_changeo.set_defaults(func=convertToChangeo)

    # Subparser to insert IMGT-gaps
    # desc_gap = dedent('''
    #     Inserts IMGT numbering spacers into the observed sequence
    #     (SEQUENCE_IMGT, sequence_alignment) and rebuilds the germline sequence
    #     (GERMLINE_IMGT, germline_alignment) if present. Also adjusts the values
    #     in the V germline coordinate fields (V_GERM_START_IMGT, V_GERM_LENGTH_IMGT;
    #     v_germline_end, v_germline_start), which are required.
    #     ''')
    # parser_gap = subparsers.add_parser('gap', parents=[format_parent],
    #                                    formatter_class=CommonHelpFormatter, add_help=False,
    #                                    help='Inserts IMGT numbering spacers into the V region.',
    #                                    description=desc_gap)
    # group_gap = parser_gap.add_argument_group('conversion arguments')
    # group_gap.add_argument('-r', nargs='+', action='store', dest='references', required=False,
    #                        help='''List of folders and/or fasta files containing
    #                             IMGT-gapped germline sequences corresponding to the
    #                             set of germlines used for the alignment.''')
    # parser_gap.set_defaults(func=insertGaps)

    # Subparser to convert database entries to sequence file
    parser_fasta = subparsers.add_parser('fasta', parents=[default_parent],
                                         formatter_class=CommonHelpFormatter, add_help=False,
                                         help='Creates a fasta file from database records.',
                                         description='Creates a fasta file from database records.')
    group_fasta = parser_fasta.add_argument_group('conversion arguments')
    group_fasta.add_argument('--if', action='store', dest='id_field',
                             default=default_id_field,
                             help='The name of the field containing identifiers')
    group_fasta.add_argument('--sf', action='store', dest='seq_field',
                             default=default_seq_field,
                             help='The name of the field containing sequences')
    group_fasta.add_argument('--mf', nargs='+', action='store', dest='meta_fields',
                             help='List of annotation fields to add to the sequence description')
    parser_fasta.set_defaults(func=convertToFasta)

    # Subparser to convert database entries to clip-fasta file
    parser_baseln = subparsers.add_parser('baseline', parents=[default_parent],
                                          formatter_class=CommonHelpFormatter, add_help=False,
                                          description='Creates a BASELINe fasta file from database records.',
                                          help='''Creates a specially formatted fasta file
                                               from database records for input into the BASELINe
                                               website.
                                                  The format groups clonally related sequences
                                                  sequentially, with the germline sequence preceding
                                                  each clone and denoted by headers starting with ">>".''')
    group_baseln = parser_baseln.add_argument_group('conversion arguments')
    group_baseln.add_argument('--if', action='store', dest='id_field',
                              default=default_id_field,
                              help='The name of the field containing identifiers')
    group_baseln.add_argument('--sf', action='store', dest='seq_field',
                              default=default_seq_field,
                              help='The name of the field containing reads')
    group_baseln.add_argument('--gf', action='store', dest='germ_field',
                              default=default_germ_field,
                              help='The name of the field containing germline sequences')
    group_baseln.add_argument('--cf', action='store', dest='cluster_field', default=None,
                              help='The name of the field containing sorted clone IDs')
    group_baseln.add_argument('--mf', nargs='+', action='store', dest='meta_fields',
                              help='List of annotation fields to add to the sequence description')
    parser_baseln.set_defaults(func=convertToBaseline)

    # Subparser to convert database entries to a GenBank fasta and feature table file
    parser_gb = subparsers.add_parser('genbank', parents=[format_parent],
                                      formatter_class=CommonHelpFormatter, add_help=False,
                                      help='Creates files for GenBank/TLS submissions.',
                                      description='Creates files for GenBank/TLS submissions.')

    # Genbank source information arguments
    group_gb_src = parser_gb.add_argument_group('source information arguments')
    group_gb_src.add_argument('--mol', action='store', dest='molecule', default=default_molecule,
                              help='''The source molecule type. Usually one of "mRNA" or "genomic DNA".''')
    group_gb_src.add_argument('--product', action='store', dest='product', default=default_product,
                              help='''The product name, such as "immunoglobulin heavy chain".''')
    group_gb_src.add_argument('--db', action='store', dest='db_xref', default=None,
                              help='''Name of the reference database used for alignment.
                                      Usually "IMGT/GENE-DB".''')
    group_gb_src.add_argument('--inf', action='store', dest='inference', default=None,
                              help='''Name and version of the inference tool used for
                                      reference alignment in the form tool:version.''')

    # Genbank sample information arguments
    group_gb_sam = parser_gb.add_argument_group('sample information arguments')
    group_gb_sam.add_argument('--organism', action='store', dest='organism', default=None,
                              help='The scientific name of the organism.')
    group_gb_sam.add_argument('--sex', action='store', dest='sex', default=None,
                              help='''If specified, adds the given sex annotation
                                      to the fasta headers.''')
    group_gb_sam.add_argument('--isolate', action='store', dest='isolate', default=None,
                              help='''If specified, adds the given isolate annotation
                                      (sample label) to the fasta headers.''')
    group_gb_sam.add_argument('--tissue', action='store', dest='tissue', default=None,
                              help='''If specified, adds the given tissue-type annotation
                                      to the fasta headers.''')
    group_gb_sam.add_argument('--cell-type', action='store', dest='cell_type', default=None,
                              help='''If specified, adds the given cell-type annotation
                                      to the fasta headers.''')
    group_gb_sam.add_argument('-y', action='store', dest='yaml_config', default=None,
                              help='''A yaml file specifying sample features (BioSample attributes)
                                      in the form \'variable: value\'. If specified, any features
                                      provided in the yaml file will override those provided on
                                      the command line. Note: this config file applies only to
                                      sample features and cannot be used for required source
                                      features, such as the --product or --mol arguments.''')

    # General genbank conversion arguments
    group_gb_cvt = parser_gb.add_argument_group('conversion arguments')
    group_gb_cvt.add_argument('--label', action='store', dest='label', default=None,
                              help='''If specified, adds a field name to the sequence identifier.
                                      Sequence identifiers will be output in the form