vdjtools-1.2.1+git20190311/ 000755 001750 001750 00000000000 13525024557 016127 5 ustar 00moeller moeller 000000 000000 vdjtools-1.2.1+git20190311/README.md 000644 001750 001750 00000005640 13525023773 017412 0 ustar 00moeller moeller 000000 000000
## VDJtools
A comprehensive analysis framework for T-cell and B-cell repertoire sequencing data. Compiled binaries are available from [here](https://github.com/mikessh/vdjtools/releases/latest). You can download them and run VDJtools as follows:
```bash
java -jar vdjtools.jar ...
```
Make sure that you've specified the full, correct path to the jar file. If you get a Java Heap Space exception, you can increase the JVM memory limit by adding ``-Xmx20G`` (sets the maximum heap size to 20 GB) before the ``-jar`` argument.
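As a minimal sketch of the memory setting (the jar location `$HOME/tools/vdjtools.jar` is an assumption, and the `CalcBasicStats` arguments are only illustrative), the command line could be assembled like this:

```bash
# Assumed location of the downloaded jar; adjust to your setup.
VDJTOOLS_JAR="$HOME/tools/vdjtools.jar"
# JVM options such as -Xmx go before -jar; arguments after the jar path are passed to VDJtools.
VDJTOOLS="java -Xmx20G -jar $VDJTOOLS_JAR"
# Print the resulting command line instead of running it.
echo $VDJTOOLS CalcBasicStats -m metadata.txt out/0
```

Remove the `echo` to actually run the command.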
The software is cross-platform and requires Java v1.8+ to run and R to perform plotting.
Easy installation on **MacOS/Linux** via [Homebrew](http://brew.sh/) or [Linuxbrew](http://linuxbrew.sh/):
```bash
brew tap homebrew/science
brew tap mikessh/repseq
brew install vdjtools
vdjtools CalcBasicStats ...
```
See [homebrew-repseq](https://github.com/mikessh/homebrew-repseq) for other RepSeq analysis software Homebrew installers.
A list of features and detailed documentation can be found at [ReadTheDocs](http://vdjtools-doc.readthedocs.io).
Example datasets and shell scripts are provided in a separate [repository](https://github.com/mikessh/vdjtools-examples).
### Please cite VDJtools as:
Shugay M et al. VDJtools: Unifying Post-analysis of T Cell Receptor Repertoires. [PLoS Comp Biol 2015; 11(11):e1004503-e1004503](http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004503).
### Some recent publications where VDJtools was used:
- Feng Y et al. A mechanism for expansion of regulatory T-cell repertoire and its role in self-tolerance. [Nature 2015; doi:10.1038/nature16141](http://www.nature.com/nature/journal/vaop/ncurrent/full/nature16141.html)
- Britanova OV et al. Dynamics of Individual T Cell Repertoires: From Cord Blood to Centenarians. [J Immunol 2016; doi:10.4049/jimmunol.1600005](http://www.jimmunol.org/content/196/12/5005.short)
- Joachims ML et al. Single-cell analysis of glandular T cell receptors in Sjögren’s syndrome. [JCI Insight 2016; doi:10.1172/jci.insight.85609](https://insight.jci.org/articles/view/85609)
- Plitas G et al. Regulatory T cells exhibit distinct features in human breast cancer. [Immunity 2017; doi.org:10.1016/j.immuni.2016.10.032](http://www.sciencedirect.com/science/article/pii/S1074761316304435)
- Izraelson M et al. Comparative Analysis of Murine T Cell Receptor Repertoires. [Immunology 2017; doi.org:10.1111/imm.12857](http://onlinelibrary.wiley.com/doi/10.1111/imm.12857/full)
- Bolotin DA et al. Antigen receptor repertoire profiling from RNA-seq data. [Nat Biotech 2017; doi:10.1038/nbt.3979](https://www.nature.com/articles/nbt.3979)
- Meng W et al. An atlas of B-cell clonal distribution in the human body. [Nat Biotech 2017; doi:10.1038/nbt.3942](https://www.nature.com/articles/nbt.3942) vdjtools-1.2.1+git20190311/travis.post.sh 000644 001750 001750 00000005307 13525023773 020763 0 ustar 00moeller moeller 000000 000000 VDJTOOLS="java -Xmx4G -jar ../build/libs/vdjtools-*.jar"
cd aging_lite/
# basic analysis
$VDJTOOLS CalcBasicStats -m metadata.txt out/0
$VDJTOOLS CalcSpectratype -m metadata.txt out/1
$VDJTOOLS CalcSegmentUsage -m metadata.txt -p -f age -n out/2
$VDJTOOLS PlotFancySpectratype A4-i125.txt.gz out/3
$VDJTOOLS PlotSpectratypeV A4-i125.txt.gz out/4
$VDJTOOLS PlotFancyVJUsage A4-i125.txt.gz out/5
# diversity estimates
$VDJTOOLS PlotQuantileStats A4-i125.txt.gz out/6
$VDJTOOLS CalcDiversityStats -m metadata.txt out/7
$VDJTOOLS RarefactionPlot -m metadata.txt -f age -n -l sample.id out/8
# sample overlap
$VDJTOOLS OverlapPair -p A4-i189.txt.gz A4-i190.txt.gz out/9
$VDJTOOLS CalcPairwiseDistances -m metadata.small.txt out/10
$VDJTOOLS ClusterSamples -p -f age -n -l sample.id out/10 out/10.age
# sample operations and filtering
$VDJTOOLS Decontaminate -m metadata.txt -c out/dec/
$VDJTOOLS Downsample -m metadata.txt -c -x 10000 out/ds/
$VDJTOOLS FilterNonFunctional -m metadata.txt -c out/nf/
$VDJTOOLS JoinSamples -p -m metadata.small.txt out/12
$VDJTOOLS PoolSamples -m metadata.small.txt out/13
# annotation
$VDJTOOLS CalcCdrAaStats -m metadata.txt out/14
$VDJTOOLS Annotate -m metadata.txt out/annot/
$VDJTOOLS SegmentsToFamilies -s human -m metadata.txt out/s2f/
$VDJTOOLS CalcDegreeStats -m metadata.txt out/degstat/
# check all output files are generated
cd out/
ls -lh
flist=(
'0.basicstats.txt'
'1.spectratype.insert.wt.txt'
'1.spectratype.ndn.wt.txt'
'1.spectratype.nt.wt.txt'
'2.segments.wt.J.pdf'
'2.segments.wt.J.txt'
'2.segments.wt.V.pdf'
'2.segments.wt.V.txt'
'3.fancyspectra.pdf'
'3.fancyspectra.txt'
'4.spectraV.wt.pdf'
'4.spectraV.wt.txt'
'5.fancyvj.wt.pdf'
'5.fancyvj.wt.txt'
'6.qstat.pdf'
'6.qstat.txt'
'7.diversity.strict.exact.txt'
'7.diversity.strict.resampled.txt'
'8.rarefaction.strict.pdf'
'8.rarefaction.strict.txt'
'9.paired.strict.summary.txt'
'9.paired.strict.table.collapsed.pdf'
'9.paired.strict.table.collapsed.txt'
'9.paired.strict.table.txt'
'9.strict.paired.scatter.pdf'
'10.age.hc.aa.F.pdf'
'10.age.mds.aa.F.pdf'
'10.age.mds.aa.F.txt'
'10.intersect.batch.aa.txt'
'12.join.aa.summary.txt'
'12.join.aa.table.txt'
'12.join.aa.venn.pdf'
'13.pool.aa.table.txt'
'14.cdr3aa.stat.unwt.unnorm.txt'
'dec/metadata.txt'
'ds/metadata.txt'
'nf/metadata.txt'
'annot/metadata.txt'
's2f/metadata.txt'
'degstat/metadata.txt'
)
for f in "${flist[@]}"
do
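# fail the build if an expected output file is missing or empty
if [[ ! -s "$f" ]]
then
echo "Missing or empty output file: $f" >&2
exit 1
fi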
done vdjtools-1.2.1+git20190311/doc/ 000755 001750 001750 00000000000 13525023773 016673 5 ustar 00moeller moeller 000000 000000 vdjtools-1.2.1+git20190311/doc/overlap.rst 000644 001750 001750 00000122031 13525023773 021074 0 ustar 00moeller moeller 000000 000000 .. _overlap:
Repertoire overlap analysis
---------------------------
.. _OverlapPair:
OverlapPair
^^^^^^^^^^^
Performs a comprehensive analysis of clonotype sharing for a pair of samples.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS OverlapPair [options] sample1.txt sample2.txt output_prefix
Parameters:
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+============+=====================================================================================================================================================+
| ``-i`` | ``--intersect-type`` | string | Sample intersection rule. Defaults to ``strict``. See :ref:`common_params` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-t`` | ``--top`` | int | Number of top clonotypes to visualize explicitly on the stacked area plot and provide in the collapsed joint table. Should not exceed 100, default is 20 |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-p`` | ``--plot`` | | Turns on plotting. See :ref:`common_params` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| | ``--plot-area-v2`` | | Alternative plotting mode, clonotype CDR3 sequences are shown at plot sides and connected to corresponding areas with lines. |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
Two joint clonotype abundance tables with the
``paired.[intersection type shorthand].table.txt`` and
``paired.[intersection type shorthand].table.collapsed.txt`` suffixes
are generated. Tables are written in :ref:`vdjtools_format`.
The collapsed table contains rows corresponding to the top N clonotypes and
summary abundances for non-overlapping and hidden clonotypes.
See :ref:`joint_table_structure` for a detailed description of table columns.
A summary table (``paired.[intersection type shorthand].summary.txt``
suffix) containing information on sample overlap size, etc, is also
provided. See tabular output in :ref:`CalcPairwiseDistances` section
below for details.
Graphical output
~~~~~~~~~~~~~~~~
A composite scatterplot with the
``paired.[intersection type shorthand].scatter.pdf`` suffix is
generated. A second plot file with the
``paired.[intersection type shorthand].table.collapsed.pdf`` suffix
contains a clonotype stacked area plot.
.. figure:: _static/images/modules/intersect-pair-scatter.png
:align: center
:scale: 50 %
**Clonotype scatterplot**. Main frame contains a scatterplot of clonotype abundances (overlapping
clonotypes only) and a linear regression. Point size is scaled to the geometric mean of clonotype
frequency in both samples. Scatterplot axes represent log10 clonotype frequencies in each sample.
Two marginal histograms show the overlapping (red) and total clonotype (grey) abundance distributions
in the corresponding sample. Histograms are weighted by clonotype abundance, i.e. they display
read distribution by clonotype size.
.. figure:: _static/images/modules/intersect-pair-stack.png
:align: center
:scale: 50 %
**Shared clonotype abundance plot**. Plot shows details for top 20 clonotypes
shared between samples, as well as collapsed ("NotShown") and non-overlapping
("NonOverlapping") clonotypes. Clonotype CDR3 amino acid sequence is
plotted against the sample where the clonotype reaches maximum
abundance.
--------------
.. _CalcPairwiseDistances:
CalcPairwiseDistances
^^^^^^^^^^^^^^^^^^^^^
Performs an all-versus-all pairwise overlap for a list of samples
and computes a set of repertoire similarity measures. At least 3 samples
should be provided. Note that this is one of the most memory-demanding routines,
as it will load all samples into memory at once (unless used with the ``--low-mem`` option).
Repertoire similarity measures include
- Pearson correlation of clonotype frequencies, restricted only to the overlapping clonotypes
.. math:: R_{ij} = \frac{\sum^N_{k=1} \left(\phi _{ik} - \bar{\phi _{i}} \right ) \left(\phi _{jk} - \bar{\phi _{j}} \right )}{\sqrt{\sum^N_{k=1} \left(\phi _{ik} - \bar{\phi _{i}} \right )^2 \sum^N_{k=1} \left(\phi _{jk} - \bar{\phi _{j}} \right )^2}}
where :math:`k=1..N` are the indices of overlapping clonotypes,
:math:`\phi_{ik}` is the frequency of clonotype :math:`k` in sample :math:`i` and
:math:`\bar{\phi_{i}}` is the average frequency of overlapping clonotypes in sample :math:`i`.
- Relative overlap diversity, computed with the following normalization
.. math:: D_{ij} = \frac{d_{ij}}{d_{i}d_{j}}
where :math:`d_{ij}` is the number of clonotypes present in both samples
and :math:`d_{i}` is the diversity of sample :math:`i`. See
`this paper `__
for the rationale behind normalization.
- Geometric mean of relative overlap frequencies
.. math:: F_{ij} = \sqrt{f_{ij}f_{ji}}
where :math:`f_{ij}=\sum^N_{k=1}\phi_{ik}` is the total frequency of clonotypes that overlap
between samples :math:`i` and :math:`j` in sample :math:`i`.
- Clonotype-wise sum of geometric mean frequencies
.. math:: F2_{ij} = \sum^N_{k=1}\sqrt{\phi_{ik}\phi_{jk}}
Note that this measure performs similarly to :math:`F` and provides slightly more robust
results in case cross-sample contamination is present.
- `Jensen-Shannon divergence
`__ between
Variable segment usage profiles
(will be moved to :ref:`CalcSegmentUsage` in the near future).
- `Jaccard index `__.
- `Morisita-Horn index `__.
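
As a worked illustration of the first four measures (toy numbers, not taken from any real dataset), assume samples :math:`i` and :math:`j` share :math:`N=3` clonotypes with frequencies :math:`\phi_{i}=(0.4, 0.1, 0.1)` and :math:`\phi_{j}=(0.1, 0.4, 0.1)`, and have diversities :math:`d_{i}=10` and :math:`d_{j}=20`. Then :math:`\bar{\phi_{i}}=\bar{\phi_{j}}=0.2`, :math:`f_{ij}=f_{ji}=0.6`, and the definitions above give

.. math:: R_{ij} = \frac{(0.2)(-0.1) + (-0.1)(0.2) + (-0.1)(-0.1)}{\sqrt{0.06 \cdot 0.06}} = \frac{-0.03}{0.06} = -0.5

.. math:: D_{ij} = \frac{3}{10 \cdot 20} = 0.015

.. math:: F_{ij} = \sqrt{0.6 \cdot 0.6} = 0.6

.. math:: F2_{ij} = \sqrt{0.4 \cdot 0.1} + \sqrt{0.1 \cdot 0.4} + \sqrt{0.1 \cdot 0.1} = 0.2 + 0.2 + 0.1 = 0.5

In this toy case the two samples share a large fraction of their reads (:math:`F=0.6`), yet the shared clonotypes have different relative sizes, which is why :math:`R` is negative.
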
The :ref:`ClusterSamples` routine can additionally be run on CalcPairwiseDistances
results.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS CalcPairwiseDistances \
[options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+============+=====================================================================================================+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------+
| ``-i`` | ``--intersect-type`` | string | Sample intersection rule. Defaults to ``aa``. See :ref:`common_params` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------+
| | ``--low-mem`` | | Low memory mode, will keep only a pair of samples in memory during execution, but run much slower. |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------+
| ``-p`` | ``--plot`` | | Turns on plotting. See :ref:`common_params` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
A table suffixed
``intersect.batch.[intersection type shorthand].summary.txt`` with
comprehensive information on sample pair intersections is generated.
This table is non-redundant: it contains ``N * (N - 1) / 2`` rows
corresponding to the upper triangle of the matrix of possible pairs ``(i,j)``.
The table layout is given below in three parts.
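
For example, with :math:`N=4` samples (an arbitrary illustrative number) the table would have

.. math:: \frac{N(N-1)}{2} = \frac{4 \cdot 3}{2} = 6

rows, one for each of the pairs :math:`(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)`.
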
**General info**
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| Column | Description |
+=================+=============================================================================================================================+
| 1\_sample\_id | First sample unique identifier |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| 2\_sample\_id | Second sample unique identifier |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| div1 | Total number of clonotypes in the first sample after identical clonotypes are collapsed based on intersection type ``-i`` |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| div2 | Same as above, second sample |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| div12 | Number of overlapping clonotypes |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| div21 | Same as above |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| count1 | Total number of reads in the first sample |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| count2 | ... |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| count12 | For clonotypes **overlapping** between two samples: total number of reads they have in the **first** sample |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| count21 | ... |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| freq1 | Total clonotype relative abundance for the first sample (should be 1.0 if sample is unaltered) |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| freq2 | ... |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| freq12 | For clonotypes **overlapping** between two samples: their sum of relative abundances in the **first** sample |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
| freq21 | ... |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------+
**Overlap metrics**
+---------------+--------------------------------------------------------------------------------------------+
| Column | Description |
+===============+============================================================================================+
| R | Pearson correlation |
+---------------+--------------------------------------------------------------------------------------------+
| D | Relative overlap diversity |
+---------------+--------------------------------------------------------------------------------------------+
| F | Geometric mean of relative overlap frequencies |
+---------------+--------------------------------------------------------------------------------------------+
| F2 | Sum of geometric means of overlapping clonotype frequencies. |
+---------------+--------------------------------------------------------------------------------------------+
| vJSD | Jensen-Shannon divergence of Variable segment usage distributions |
+---------------+--------------------------------------------------------------------------------------------+
| vjJSD | <*experimental*\ > |
+---------------+--------------------------------------------------------------------------------------------+
| vj2JSD | <*experimental*\ > |
+---------------+--------------------------------------------------------------------------------------------+
| sJSD | <*experimental*\ > |
+---------------+--------------------------------------------------------------------------------------------+
| Jaccard | Jaccard index |
+---------------+--------------------------------------------------------------------------------------------+
| MorisitaHorn | Morisita-Horn index |
+---------------+--------------------------------------------------------------------------------------------+
**Sample metadata**
+----------+------------------------------------------------------------+
| Column | Description |
+==========+============================================================+
| 1\_... | First sample metadata columns. See :ref:`metadata` section |
+----------+------------------------------------------------------------+
| 2\_... | Second sample metadata columns |
+----------+------------------------------------------------------------+
Graphical output
~~~~~~~~~~~~~~~~
Circos plots showing pairwise overlap are stored in a file suffixed
``intersect.batch.[intersection type shorthand].summary.pdf``.
.. figure:: _static/images/modules/intersect-batch-circos.png
:align: center
:scale: 50 %
**Pairwise overlap circos plot**. Count, frequency and diversity
panels correspond to the read count, frequency (both non-symmetric)
and the total number of clonotypes that are shared between samples.
Pairwise overlaps are stacked, i.e. segment arc length is not equal
to sample size.
--------------
.. _ClusterSamples:
ClusterSamples
^^^^^^^^^^^^^^
This routine provides additional cluster analysis (hierarchical
clustering), multi-dimensional scaling (MDS)
and plotting for :ref:`CalcPairwiseDistances` output.
Note that this routine requires the following parameter settings:
- Input file prefix (``input_prefix``) is set to the same value
as the output prefix of :ref:`CalcPairwiseDistances`
- The ``-i`` argument setting is the same as in :ref:`CalcPairwiseDistances`
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS ClusterSamples \
[options] input_prefix [output_prefix]
Parameters:
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+============+=============================================================================================================================================+
| ``-e`` | ``--measure`` | string | Sample overlap metric, see **Overlap metrics** section of :ref:`CalcPairwiseDistances` tabular output for allowed values. Defaults to ``F`` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-i`` | ``--intersect-type`` | string | Intersection type, defaults to ``aa``. See :ref:`common_params` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-f`` | ``--factor`` | string | Specifies metadata column with plotting factor (used to color sample labels and the figure legend). See :ref:`common_params` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-n`` | ``--numeric`` | | Specifies if plotting factor is continuous. See :ref:`common_params` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-l`` | ``--label`` | string | Specifies metadata column with sample labels. See :ref:`common_params` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-p`` | ``--plot`` | | Turns on plotting. See :ref:`common_params` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
Two output files are generated:
- Table suffixed ``mds.[value of -i argument].[value of -e argument].txt``
that contains coordinates of samples computed using
multi-dimensional scaling (MDS), i.e. the coordinates of samples
projected to a 2D plane in a manner that pairwise sample distances are preserved.
- A file in `Newick format `__ suffixed
``hc.[value of -i argument].[value of -e argument].newick``
that contains the sample dendrogram produced by hierarchical clustering.
.. note::
Hierarchical clustering and MDS are performed using ``hclust()`` and
``isoMDS()`` (`MASS package `__) R functions.
Default parameters are used for those algorithms.
Distances are scaled as ``-log10(.)`` and ``(1-.)/2`` for relative overlap and
correlation metrics respectively; in case of Jensen-Shannon divergence,
Jaccard and Morisita-Horn indices no scaling is performed.
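
As a quick illustration with assumed values (and assuming :math:`F` is treated as a relative overlap metric), an overlap of :math:`F=0.6` and a correlation of :math:`R=-0.5` would be converted to distances as

.. math:: -\log_{10}(0.6) \approx 0.222, \qquad \frac{1-(-0.5)}{2} = 0.75

so in both cases a larger overlap or correlation yields a smaller distance.
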
Graphical output
~~~~~~~~~~~~~~~~
Hierarchical clustering plot is stored in a file suffixed
``hc.[value of -i argument].[value of -e argument].pdf``.
MDS plot is stored in a file with
``mds.[value of -i argument].[value of -e argument].pdf`` suffix.
.. figure:: _static/images/modules/intersect-batch-dendro.png
:align: center
:scale: 50 %
**Hierarchical clustering**. Dendrogram of samples, branch
length shows the distance between repertoires. Node colors
correspond to factor value, continuous scale is used in
present case (``-n -f age`` argument).
.. figure:: _static/images/modules/intersect-batch-mds.png
:align: center
:scale: 50 %
**MDS plot**. A scatterplot of samples. Euclidean distance
between points reflects the distance between repertoires.
Points are colored by factor value.
--------------
.. _TestClusters:
TestClusters
^^^^^^^^^^^^
This routine tests whether a given factor influences
repertoire clustering. It assesses the compactness of samples that
have the same factor level and the separation between samples with
distinct factor levels for the factor specified in
:ref:`ClusterSamples`.
It performs post-hoc permutation testing
based on MDS coordinates generated by the :ref:`ClusterSamples` routine
and can only be run if a discrete factor (``-f``) was specified
in :ref:`ClusterSamples`.
Note that this routine requires the following parameter settings:
- Input file prefix (``input_prefix``) is set to the same value
as the output prefix of :ref:`ClusterSamples`
- The ``-i`` and ``-e`` argument setting is the
same as in :ref:`ClusterSamples`
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS TestClusters \
[options] input_prefix [output_prefix]
Parameters:
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+============+=============================================================================================================================================+
| ``-e`` | ``--measure`` | string | Sample overlap metric, see **Overlap metrics** section of :ref:`CalcPairwiseDistances` tabular output for allowed values. Defaults to ``F`` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
| ``-i`` | ``--intersect-type`` | string | Intersection type, defaults to ``aa``. See :ref:`common_params` |
+-------------+------------------------+------------+---------------------------------------------------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
none
Graphical output
~~~~~~~~~~~~~~~~
Permutation summary plot is generated having the
``perms.[value of -i argument].[value of -e argument].pdf`` suffix.
.. figure:: _static/images/modules/test-clusters.png
:align: center
:scale: 50 %
**Testing compactness and separation of sample clustering for a given factor**.
Average repertoire similarity values for
sample pairs in which both samples have the same (within panel)
and different (between panel) factor levels. Each row corresponds
to a specific factor level. Red lines show observed values,
histograms correspond to values generated by randomly permuting
factor levels. Numbers near red lines indicate P-values for
n=10000 permutations.
--------------
.. _TrackClonotypes:
TrackClonotypes
^^^^^^^^^^^^^^^
This routine performs an all-vs-all intersection between an ordered list
of samples for clonotype tracking purposes. The user can specify the sample whose
clonotypes will be traced, e.g. the pre-therapy sample.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS TrackClonotypes \
[options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+===================+===========================================================================================================================================================================================================================================+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-i`` | ``--intersect-type`` | string | Sample intersection rule. Defaults to ``strict``. See :ref:`common_params` |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-f`` | ``--factor`` | string | Specifies factor that should be treated as ``time`` variable. Factor values should be numeric. If no such column is set, time points are taken either from values provided with the ``-s`` argument or from sample order. See :ref:`common_params` |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-x`` | ``--track-sample`` | integer | A zero-based index of time point to track. If not provided, will consider all clonotypes that were detected in 2+ samples |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-s`` | ``--sequence`` | ``[t1,t2,...]`` | Time point sequence. Unused if ``-m`` is specified. If not specified, either ``time`` column values from metadata, or sample indexes (as in command line) are used. |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-t`` | ``--top`` | integer | Number of top clonotypes to visualize explicitly on the stacked area plot and provide in the collapsed joint table. Should not exceed 100, default is 200 |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-p`` | ``--plot`` | | Turns on plotting. See :ref:`common_params` |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-c`` | ``--compress`` | | Compressed output for clonotype table. See :ref:`common_params` |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+------------------------+-------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
Summary table suffixed ``sequential.[value of -i argument].summary.txt``
is created with the following columns.
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Column | Description |
+=================+===========================================================================================================================================================================================================================================================================================================+
| 1\_sample\_id | First sample unique identifier |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2\_sample\_id | Second sample unique identifier |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| value | Value of the intersection metric |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| metric | Metric type: ``diversity``, ``frequency`` or ``count``. Metrics correspond to the number of unique clonotypes, total frequency and total read count for clonotypes overlapping between first and second sample. In case tracking is on (``-x``), only clonotypes present in tracked sample are counted. |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1\_time | Time value for the first sample |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2\_time | Time value for the second sample |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 1\_... | First sample metadata columns. See :ref:`metadata` section |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 2\_... | Second sample metadata columns |
+-----------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Two joint clonotype abundance tables with
``sequential.[intersection type shorthand].table.txt`` and
``sequential.[intersection type shorthand].table.collapsed.txt``
suffixes are generated. The latter contains top ``-t``
clonotypes, with two additional rows containing summary count and frequency
for non-overlapping and collapsed clonotypes.
See :ref:`joint_table_structure` for a detailed description of table columns.
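The summary table can be loaded for downstream analysis with a few lines of code. The following is a minimal Python sketch, not part of VDJtools: it assumes the header uses exactly the column names listed above, and the function name ``read_tracking_summary`` and the two example rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_tracking_summary(handle):
    """Group the 'value' column by metric, keyed by (first sample, second sample)."""
    by_metric = defaultdict(dict)
    # Tab-delimited, no quoting: same settings as suggested for R's read.table.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        pair = (row["1_sample_id"], row["2_sample_id"])
        by_metric[row["metric"]][pair] = float(row["value"])
    return dict(by_metric)


# Hypothetical two-row summary table, for illustration only.
example = (
    "1_sample_id\t2_sample_id\tvalue\tmetric\t1_time\t2_time\n"
    "s0\ts1\t120\tdiversity\t0\t1\n"
    "s0\ts1\t0.25\tfrequency\t0\t1\n"
)
summary = read_tracking_summary(io.StringIO(example))
print(summary["diversity"][("s0", "s1")])
```

Keying by sample pair keeps the non-symmetric ``count`` and ``frequency`` metrics distinct for ``(s0, s1)`` and ``(s1, s0)``.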
**Graphical output**
Summary table is visualized in a plot file suffixed
``sequential.[value of -i argument].summary.pdf``.
A plot file with ``.sequential.[value of -i argument].stackplot.pdf``
suffix contains a clonotype abundance stack area plot.
The same is also visualized using a heatmap in a file with the
``.sequential.[value of -i argument].heatplot.pdf`` suffix.
.. figure:: _static/images/modules/intersect-seq-summary.png
:align: center
:scale: 50 %
**Clonotype tracking summary**. Count, frequency and diversity
panels correspond to the read count, frequency (both non-symmetric)
and the total number of clonotypes that are shared between samples.
Rows and columns of each matrix are sorted according to time point
sequence.
.. figure:: _static/images/modules/intersect-seq-stackplot.png
:align: center
:scale: 50 %
**Clonotype tracking stackplot**. Contains detailed profiles for top
``-t`` clonotypes, as well as collapsed ("NotShown") and non-overlapping
("NonOverlapping") clonotypes. Clonotype CDR3 amino acid sequence is
plotted against the sample where the clonotype reaches maximum
abundance. Clonotypes are colored by the peak position of their
abundance profile.
.. figure:: _static/images/modules/intersect-seq-heatplot.png
:align: center
:scale: 50 %
**Clonotype tracking heatmap**. Shows a heatmap for top ``-t``
joint clonotype abundances.
.. _modules:
Analysis modules
----------------
Table of VDJtools modules
^^^^^^^^^^^^^^^^^^^^^^^^^
VDJtools software package contains a comprehensive set of immune
repertoire post-analysis routines, which are subdivided into several
analysis modules. Each module's section provides command line usage
syntax and parameter descriptions for each of the routines, as well as
output example and description.
:ref:`basic`
~~~~~~~~~~~~
Summary statistics, spectratyping, etc
- :ref:`CalcBasicStats`
Computes summary statistics for samples: read counts, mean clonotype
sizes, number of non-functional clonotypes, etc
- :ref:`CalcSegmentUsage`
Computes Variable (V) and Joining (J) segment usage profiles
- :ref:`CalcSpectratype`
Computes spectratype, the distribution of clonotype abundance by CDR3
sequence length
- :ref:`PlotFancySpectratype`
Plots spectratype explicitly showing top N clonotypes
- :ref:`PlotFancyVJUsage`
Plots the frequency of different V-J pairings
- :ref:`PlotSpectratypeV`
Plots distribution of V segment abundance by resulting CDR3 sequence
length
:ref:`diversity`
~~~~~~~~~~~~~~~~
Repertoire richness and diversity
- :ref:`PlotQuantileStats`
Visualizes repertoire clonality
- :ref:`RarefactionPlot`
Performs rarefaction analysis
- :ref:`CalcDiversityStats`
Computes repertoire diversity estimates
:ref:`overlap`
~~~~~~~~~~~~~~
Clonotype sharing between samples
- :ref:`OverlapPair`
Computes intersection between a pair of samples
- :ref:`CalcPairwiseDistances`
Computes pairwise intersections for a list of samples
- :ref:`ClusterSamples`
Clusters samples according to the results of batch intersection
- :ref:`TrackClonotypes`
Time-course analysis for a sequence of samples
:ref:`preprocess`
~~~~~~~~~~~~~~~~~
Filtering and resampling
- :ref:`Correct`
Performs a frequency-based erroneous clonotype correction
- :ref:`Decontaminate`
Filters possible cross-sample contaminations in a set of samples
- :ref:`DownSample`
Performs down-sampling, i.e. takes a subset of random reads from sample(s)
- :ref:`FilterNonFunctional`
Filters non-functional clonotypes
- :ref:`SelectTop`
Selects a fixed number of top (most abundant) clonotypes from sample(s)
- :ref:`FilterByFrequency`
Filters clonotypes based on a specified frequency threshold.
- :ref:`ApplySampleAsFilter`
Filters clonotypes that are present in a specified sample from sample(s)
- :ref:`FilterBySegment`
Filters clonotypes according to their V/D/J segment
:ref:`operate`
~~~~~~~~~~~~~~
Clonotype table operations
- :ref:`PoolSamples`
Pools clonotypes from several samples together
- :ref:`JoinSamples`
Joins a set of samples and generates clonotype abundance profiles
:ref:`annotate`
~~~~~~~~~~~~~~~
Functional annotation of clonotype tables (antigen specificity, amino acid properties, etc)
- :ref:`CalcCdrAAProfile`
Builds a profile of CDR3 regions (V germline, V-D junction, ...) using a set of amino-acid physical properties
- :ref:`Annotate`
Computes a set of basic (insert size, ...) and amino acid physical properties (GRAVY, ...) for clonotypes
- :ref:`ScanDatabase`
Queries a database containing clonotypes of known antigen specificity.
:ref:`util`
~~~~~~~~~~~
Some useful utilities
- :ref:`FilterMetadata`
Filters metadata file by values in specified column
- :ref:`SplitMetadata`
Splits metadata file by specified columns
- :ref:`Convert`
Converts from one software format to another
- :ref:`Rinstall`
Installs necessary R dependencies
Output
~~~~~~
Each routine generates a comprehensive tabular output, and some also
produce optional graphical output. In the latter case,
the corresponding R script, with its arguments listed in comments at the beginning of
the script, is stored in the analysis folder. The user can thus
uncomment the script arguments, modify the script and re-run it. This behavior
can be disabled by running VDJtools with the ``discard_scripts`` argument prior
to the routine name.
By default, all graphical output is generated in PDF format. To generate
PNG images, use the ``--plot-type png`` option.
When running routines that output clonotype tables consider the following:
- Joint and pooled samples are stored in VDJtools format
- Samples produced using :ref:`ScanDatabase` or :ref:`Annotate` routine are in VDJtools format and include additional annotation columns. Annotation columns are retained when running most of VDJtools routines
- When loading a joint/pooled sample into VDJtools, clonotype abundance vectors, incidence counts, etc will be treated as clonotype level annotations
- Annotation columns are not preserved when joining/pooling annotated samples; a workaround
is to use the :ref:`ApplySampleAsFilter` routine
.. attention::
When importing a table generated by one of VDJtools routines
into R, use the following command to parse the input correctly:
.. code:: r
read.table("some_table.txt", header=T, quote="", sep = "\t")
.. _common_params:
Common parameters
^^^^^^^^^^^^^^^^^
There are several parameters that are commonly used among analysis
routines:
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+============+=============================================================================================================================================================================================================================================================================+
| ``-h`` | ``--help`` | | Brings up the help message for selected routine |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-m`` | ``--metadata`` | path | Path to metadata file. Should point to a tab-delimited file with the first two columns containing sample path and sample id respectively, and the remaining columns containing user-specified data. See :ref:`metadata` section |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-u`` | ``--unweighted`` | | Where a routine provides this option and it is not set, all statistics are weighted by clonotype frequency |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-i`` | ``--intersect-type`` | string | :ref:`overlap_type`, that specifies which clonotype features (CDR3 sequence, V/J segments, hypermutations) will be compared when checking if two clonotypes match. Allowed values: ``strict``,\ ``nt``,\ ``ntV``,\ ``ntVJ``,\ ``aa``,\ ``aaV``,\ ``aaVJ`` and ``aa!nt``. |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-p`` | ``--plot`` | | [*plotting*] Enable plotting for routines that support it. |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | ``--plot-type`` | | [*plotting*] Specifies whether to generate a PDF or PNG file. While the latter can be easily embedded, PDF plots have superior quality. |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-f`` | ``--factor`` | string | [*plotting*] Name of the sample metadata column that should be treated as factor. If the name contains spaces, the argument should be surrounded with double quotes, e.g. ``-f "Treatment type"`` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-n`` | ``--factor-numeric`` | | [*plotting*] Treat the factor as numeric? |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-l`` | ``--label`` | string | [*plotting*] Name of the sample metadata column that should be treated as label. If the name contains spaces, the argument should be surrounded with double quotes, e.g. ``-l "Patient id"`` |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-c`` | ``--compress`` | | Compress resulting clonotype tables using GZIP. |
+-------------+------------------------+------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
.. _overlap_type:
Overlap type
~~~~~~~~~~~~
Some VDJtools routines require a clonotype matching strategy to be defined
when computing clonotype sharing between samples. This parameter is also
used when collapsing clonotype tables. A common example is
estimating the extent of convergent recombination,
i.e. the number of distinct CDR3 nucleotide sequences per CDR3
amino acid sequence, which requires collapsing the clonotype table by identical
CDR3aa field.
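The convergent recombination estimate described above amounts to grouping clonotypes by CDR3 amino acid sequence and counting distinct nucleotide variants in each group. A minimal Python sketch of that idea follows; it is an illustration rather than the VDJtools implementation, and the function name and input tuples are hypothetical.

```python
from collections import defaultdict


def convergence_by_cdr3aa(clonotypes):
    """Count distinct CDR3nt sequences observed for every CDR3aa sequence.

    `clonotypes` is an iterable of (cdr3nt, cdr3aa) pairs.
    """
    variants = defaultdict(set)
    for cdr3nt, cdr3aa in clonotypes:
        variants[cdr3aa].add(cdr3nt)
    return {aa: len(nts) for aa, nts in variants.items()}


# Hypothetical, shortened sequences used only to show the grouping.
example = [
    ("TGTGCCAGC", "CAS"),
    ("TGCGCCAGC", "CAS"),  # synonymous variant of the first entry
    ("TGTGCTTGG", "CAW"),
]
convergence = convergence_by_cdr3aa(example)
print(convergence)
```

Values above 1 indicate amino acid sequences that are encoded by more than one nucleotide sequence in the sample.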
The list of strategies is defined below.
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Rule | Note |
+=============+===================================================+=======================================================================================================================================+
| strict | **CDR3nt** (AND) **V** (AND) **J** (AND) **SHMs** | Require full match for receptor nucleotide sequence |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| nt | **CDR3nt** | |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| ntV | **CDR3nt** (AND) **V** | |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| ntVJ | **CDR3nt** (AND) **V** (AND) **J** | |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| aa | **CDR3aa** | |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| aaV | **CDR3aa** (AND) **V** | |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| aaVJ | **CDR3aa** (AND) **V** (AND) **J** | |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| aa!nt | **CDR3aa** (AND)((NOT) **CDR3nt** ) | Removes nearly all contamination bias from overlap results. Should not be used for samples from the same donor/tracking experiments |
+-------------+---------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
As somatic hypermutations (SHMs) are currently not supported by VDJtools,
``strict`` and ``ntVJ`` options are identical. See VDJtools :ref:`clonotype_spec`
specification for details.
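The rules in the table can be read as a choice of which clonotype fields must be equal for two clonotypes to match. The sketch below expresses that reading in Python. It is not taken from the VDJtools code base: the ``Clonotype`` tuple and ``clonotypes_match`` function are hypothetical names, and ``strict`` is mapped to the same fields as ``ntVJ`` because SHMs are not supported.

```python
from collections import namedtuple

Clonotype = namedtuple("Clonotype", "cdr3nt cdr3aa v j")

# Fields that must be identical for each overlap type shorthand.
KEY_FIELDS = {
    "strict": ("cdr3nt", "v", "j"),  # SHMs unsupported, so identical to ntVJ
    "nt": ("cdr3nt",),
    "ntV": ("cdr3nt", "v"),
    "ntVJ": ("cdr3nt", "v", "j"),
    "aa": ("cdr3aa",),
    "aaV": ("cdr3aa", "v"),
    "aaVJ": ("cdr3aa", "v", "j"),
}


def clonotypes_match(a, b, overlap_type):
    """Return True if clonotypes a and b match under the given overlap type."""
    if overlap_type == "aa!nt":
        # Same amino acid sequence encoded by different nucleotide sequences.
        return a.cdr3aa == b.cdr3aa and a.cdr3nt != b.cdr3nt
    fields = KEY_FIELDS[overlap_type]
    return all(getattr(a, f) == getattr(b, f) for f in fields)


# Hypothetical clonotypes: same CDR3aa, different CDR3nt and V segment.
x = Clonotype("TGTGCCAGC", "CAS", "TRBV12-4", "TRBJ1-1")
y = Clonotype("TGCGCCAGC", "CAS", "TRBV12-3", "TRBJ1-1")
print(clonotypes_match(x, y, "aa"), clonotypes_match(x, y, "aaV"))
```

Under ``aa!nt`` an exact nucleotide copy of a clonotype never matches itself, which is why this mode suppresses cross-sample contamination but is unsuitable for tracking clonotypes within one donor.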
Input
-----
Clonotype tables
^^^^^^^^^^^^^^^^
The processing stage of RepSeq analysis starts with mapping of Variable,
Diversity and Joining segments. Mapped reads are then assembled into clonotypes
and stored as clonotype abundance tables.
.. _clonotype_spec:
Clonotype
~~~~~~~~~
VDJtools **clonotype** specification includes the following fields:
- Variable (*V*) segment name.
- Diversity (*D*) segment name for some of the receptor chains (TRB,
TRD and IGH). Set to ``.`` if not applicable or if the D segment was not
identified.
- Joining (*J*) segment name.
- Complementarity determining region 3 nucleotide sequence (*CDR3nt*).
CDR3 starts with Variable region reference point (conserved Cys residue)
and ends with Joining segment reference point (conserved Phe/Trp).
- Translated CDR3 sequence (*CDR3aa*).
- Somatic hypermutations (*SHMs*) in the variable segment (antibody only, **planned**).
.. important::
For ambiguous segment assignments encoded by a comma separated list
of segment names only the first one is selected.
.. hint::
In case of non-coding CDR3 sequences, the convention is to
translate in both directions: downstream from V segment
reference point and upstream from J segment reference point.
The resulting sequence (e.g. ``CASSLA_TNEKFF``)
is linked by a ``_`` symbol that marks the incomplete codon.
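The two-sided translation convention can be sketched in Python as follows. This is only an illustration under stated assumptions: the standard genetic code is used, the split between the V-side and J-side parts is placed in the middle of the sequence (the exact split point used by VDJtools is not specified here), and ``translate_cdr3`` is a hypothetical helper name.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def _translate(nt):
    # Codons with ambiguous bases (e.g. N) are reported as "?".
    return "".join(CODON_TABLE.get(nt[i:i + 3], "?") for i in range(0, len(nt), 3))


def translate_cdr3(cdr3nt):
    """Translate a CDR3; out-of-frame sequences get a '_' at the incomplete codon."""
    nt = cdr3nt.upper()
    n_codons, remainder = divmod(len(nt), 3)
    if remainder == 0:
        return _translate(nt)
    # Assumption of this sketch: split the complete codons evenly.
    n_left = (n_codons + 1) // 2
    n_right = n_codons - n_left
    left = nt[:3 * n_left]                # read downstream from the V side
    right = nt[len(nt) - 3 * n_right:]    # read in the frame set by the J side
    return _translate(left) + "_" + _translate(right)


print(translate_cdr3("TGTGCCAGCAGC"))   # in-frame
print(translate_cdr3("TGTGCCATTCTTT"))  # out-of-frame, one extra nucleotide
```

The leftover one or two nucleotides in the middle are not translated; the ``_`` symbol stands in for them.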
Clonotype **abundance** data is represented by *count* and *frequency* fields:
- *Count*: number of reads or cDNA/DNA molecules in case
`UMIs `__
are used.
- *Frequency*: the share of clonotype in the sample. While seemingly
redundant, this property is left for compatibility with cases when
the sample represents a subset of another one, e.g. clonotypes from
PBMCs filtered by intersection with lymph node clonotypes.
The following fields are optional, but are used for computing various
statistics and visualization:
- *Vend*, *Dstart*, *Dend* and *Jstart* - marking V, D and J segment
boundaries within CDR3 nucleotide sequence (inclusive)
.. tip::
VDJtools accepts `gzip `__-compressed
files; such files should have a ``.gz`` suffix. Input data
should be provided in the form of a tab-delimited table.
.. _vdjtools_format:
VDJtools format
^^^^^^^^^^^^^^^
This is a core tabular format for VDJtools. All datasets
should be converted to this format using the :ref:`convert` routine
prior to analysis. Columns 8-11 are optional.
+-----------+-------------+---------------------------+------------------+------------+-----------+-----------+------------+-----------+-----------+-----------+
| column1 | column2 | column3 | column4 | column5 | column6 | column7 | column8 | column9 | column10 | column11 |
+===========+=============+===========================+==================+============+===========+===========+============+===========+===========+===========+
| count | frequency | CDR3nt | CDR3aa | V | D | J | Vend | Dstart | Dend | Jstart |
+-----------+-------------+---------------------------+------------------+------------+-----------+-----------+------------+-----------+-----------+-----------+
| 1176 | 9.90E-02 | TGTGCCAGC...AAGCTTTCTTT | CAST...EAFF | TRBV12-4 | TRBD1 | TRBJ1-1 | 11 | 14 | 16 | 23 |
+-----------+-------------+---------------------------+------------------+------------+-----------+-----------+------------+-----------+-----------+-----------+
All additional columns after column 11 will be considered as clonotype annotations
and carried over unmodified during most stages of VDJtools analysis. This is especially
useful when processing results of :ref:`Annotate` and :ref:`ScanDatabase` routines.
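For readers who want to inspect such tables outside VDJtools, the following Python sketch reads a (possibly gzip-compressed) table in this layout. It is a hedged illustration, not VDJtools code: the first seven columns are taken by position, every later column is kept under its header name, only the first name of a comma-separated segment list is retained, and the function name and example row are hypothetical.

```python
import csv
import gzip
import os
import tempfile

CORE_COLUMNS = ["count", "frequency", "cdr3nt", "cdr3aa", "v", "d", "j"]


def read_vdjtools_table(path):
    """Read a tab-delimited clonotype table; '.gz' files are decompressed."""
    opener = gzip.open if path.endswith(".gz") else open
    records = []
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        for row in reader:
            if not row:
                continue  # skip blank lines
            record = dict(zip(CORE_COLUMNS, row[:7]))
            record["count"] = int(record["count"])
            record["frequency"] = float(record["frequency"])
            for segment in ("v", "d", "j"):
                # Ambiguous assignments: keep only the first segment name.
                record[segment] = record[segment].split(",")[0]
            # Segment boundaries and annotations, keyed by header name.
            record["extra"] = dict(zip(header[7:], row[7:]))
            records.append(record)
    return records


# Hypothetical one-clonotype table written to a temporary gzip file.
header = ["count", "frequency", "CDR3nt", "CDR3aa", "V", "D", "J",
          "Vend", "Dstart", "Dend", "Jstart"]
row = ["1176", "9.90E-02", "TGTGCCAGCAGCTTT", "CASSF", "TRBV12-4,TRBV12-3",
       "TRBD1", "TRBJ1-1", "8", "9", "11", "12"]
path = os.path.join(tempfile.mkdtemp(), "sample.txt.gz")
with gzip.open(path, "wt", newline="") as handle:
    handle.write("\t".join(header) + "\n")
    handle.write("\t".join(row) + "\n")

records = read_vdjtools_table(path)
print(records[0]["v"], records[0]["count"])
```

Reading the core fields by position rather than by name avoids depending on the exact header spelling of a given VDJtools version.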
.. _supported_input:
Formats supported for conversion
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MiTCR
~~~~~
Output from MiTCR software (`executable jar `__,
`documentation `__) in
``full`` mode can be used without any pre-processing. Corresponding
table should start with **two header lines** (default MiTCR output
stores processing options and version in the first line), followed by a clonotype
list.
Run :ref:`convert` routine with ``-S mitcr`` argument to prepare datasets
in this format for VDJtools analysis.
MiGEC
~~~~~
`MiGEC `__ is a software for V/D/J mapping and CDR3
extraction that relies on BLAST algorithm for running alignments. MIGEC software
additionally implements processing of unique molecular identifier (UMI)-tagged libraries
for error correction and dataset normalization. Default output of MIGEC software
can be directly used with VDJtools.
Run :ref:`convert` routine with ``-S migec`` argument to prepare datasets
in this format for VDJtools analysis.
IgBlast (MIGMAP)
~~~~~~~~~~~~~~~~
As IgBlast doesn't compute a canonical clonotype abundance table,
VDJtools supports output of `MIGMAP `__,
a versatile IgBlast wrapper. Note that VDJtools currently does not import somatic hypermutation (SHM)
information, nor does it provide any dedicated
routines to analyze SHM profiles, but you can check out the `post-analysis provided by MIGMAP `__.
Run :ref:`convert` routine with ``-S migmap`` argument to prepare datasets
in this format for VDJtools analysis.
ImmunoSEQ
~~~~~~~~~
This is one of the most commonly used RepSeq data formats: more than 90% of recently published studies
were performed using the `immunoSEQ `__
assay. We have implemented a parser for clonotype tables as provided by
`Adaptive Biotechnologies `__.
- The resulting datasets for most studies that use ImmunoSEQ technology can be accessed and exported using the
`ImmunoSEQ Analyzer `__.
- Example datasets in this format could be found in the
`Supplementary Data `__
section of `Spreafico R et al. Ann Rheum Dis. 2014 `__.
- Column header information was taken from **page 24** of the immunoSEQ Analyzer
`manual `__
- VDJtools will use V/J segment information only at the family level, as many of the clonotypes miss
segment (`-X`) and allele (`-X*0Y`) information.
The clonotype table is then collapsed to handle unique V/J/CDR3 entries.
- Raw clonotype tables in this format do not contain CDR3 nucleotide sequence.
Instead, an entire sequencing read (first column) is provided. Therefore, we have
implemented additional algorithms for CDR3 extraction and "virtual" translation
to tell out-of-frame clonotypes from partially read ones.
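The family-level reduction and subsequent collapsing mentioned in the list above can be sketched as follows. This Python example is an assumption-laden illustration, not the converter itself: it presumes segment names follow a ``FAMILY-X*0Y`` pattern, and ``to_family`` and ``collapse_to_family`` are hypothetical helper names.

```python
import re
from collections import Counter


def to_family(segment):
    """Drop the segment (-X) and allele (*0Y) parts of a segment name."""
    return re.split(r"[-*]", segment, maxsplit=1)[0]


def collapse_to_family(rows):
    """Sum read counts over unique (CDR3nt, V family, J family) entries.

    `rows` is an iterable of (count, cdr3nt, v, j) tuples.
    """
    counts = Counter()
    for count, cdr3nt, v, j in rows:
        counts[(cdr3nt, to_family(v), to_family(j))] += count
    return counts


# Hypothetical rows: the first two differ only below the family level.
rows = [
    (10, "TGTGCCAGC", "TRBV12-4*01", "TRBJ1-1"),
    (5, "TGTGCCAGC", "TRBV12-3", "TRBJ1-2*01"),
    (3, "TGTGCCAGC", "TRBV9", "TRBJ2-1"),
]
collapsed = collapse_to_family(rows)
print(collapsed[("TGTGCCAGC", "TRBV12", "TRBJ1")])
```

After the reduction, entries that differed only in segment or allele designation merge into a single clonotype with the summed count.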
.. attention::
Some of the clonotype entries will be dropped during conversion as they contain an incomplete
CDR3 sequence (lacking J segment), which is due to short reads used in immunoSEQ assay,
see this `blog post `__
for details.
Run :ref:`convert` routine with ``-S immunoseq`` argument to prepare datasets
in this format for VDJtools analysis. Note that there are currently two possible ImmunoSEQ
output formats that have different column naming:
- This option should be used in case you have selected
``Export samples`` option in the ImmunoSEQ analyzer.
- In case you have used the ``Export samples v2``
option you should pass the ``-S immunoseqv2`` argument to VDJtools Convert routine.
IMGT/HighV-QUEST
~~~~~~~~~~~~~~~~
Another commonly used RepSeq processing tool is the
`IMGT/HighV-QUEST `__ web server.
Please refer to the official `documentation `__
to see the description of output files and their formats.
.. tip::
The output for each submission consists of several files and only
.. code:: bash
3_Nt-sequences_${chain}_${sx}_${date}.txt
should be used as an input for VDJtools :ref:`convert` routine.
Run :ref:`convert` routine with ``-S imgthighvquest`` argument to prepare datasets
in this format for VDJtools analysis.
VDJdb
~~~~~
VDJtools has native support for the analysis of clonotype tables annotated
with `VDJdb `__ software.
Note that as those tables can list the same clonotype several times with
different annotation, they should not be used directly in most VDJtools
routines (e.g. diversity statistics), check out
`VDJdb README `__
for corresponding guidelines and workarounds.
Vidjil
~~~~~~
VDJtools supports parsing JSON output files produced by the
`Vidjil `__ software. VDJtools will only use
top clonotypes for which V/D/J details are present in the output.
RTCR
~~~~
VDJtools supports parsing the ``results.tsv`` table with clonotype list
generated by the `RTCR `__ software.
Run :ref:`convert` routine with ``-S rtcr`` argument to prepare datasets
in this format for VDJtools analysis.
MiXCR
~~~~~
Output from `MiXCR `__ software ``export`` routine
in ``full`` (default) mode can be used without any pre-processing.
Run :ref:`convert` routine with ``-S mixcr`` argument to prepare datasets
in this format for VDJtools analysis.
IMSEQ
~~~~~
Output from `IMSEQ `__ software can be used
if results are collapsed to nucleotide-level clonotypes using
``-on`` argument with IMSEQ.
Run :ref:`convert` routine with ``-S imseq`` argument to prepare datasets
in this format for VDJtools analysis.
.. _metadata:
Metadata
^^^^^^^^
Most VDJtools routines will accept multiple sample files as command
line arguments for batch processing. Batch processing should always be preferred over
multiple single-sample calls, because every VDJtools call incurs
initialisation time.
An alternative way to specify a sample batch is to pass the sample metadata
file with the ``-m`` option. The file should contain sample file paths and
sample names. It can also be supplemented with optional metadata columns
that will be appended to analysis results and can be used for plotting.
Additionally, for each step that involves modification of samples (e.g.
converting or filtering non-functional rearrangements) a new metadata
file will be created in the folder containing the processed sample batch.
.. note::
- VDJtools will append metadata fields to its output tables to
facilitate the exploration of analysis results.
- Metadata entries are used as a factor in some analysis routines and
most plotting routines.
- When performing tasks that involve modifying clonotype abundance
tables themselves, such as down-sampling, VDJtools will also provide
a copy of metadata file pointing to newly generated samples.
- Newly generated metadata file would contain an additional
``..filter..`` column, which has a comma-separated list of filters
that were applied. For example the :ref:`downsample` routine run with
``-n 50000`` will append ``ds:50000`` to the ``..filter..`` column.
Note that this column name is reserved and should not be modified.
- Some routines for working with metadata files can be found in
:ref:`util` section.
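The bookkeeping of the ``..filter..`` column can be pictured with a small Python sketch. It only illustrates the comma-separated convention described in the note: the helper name ``append_filter`` is hypothetical, the ``top:100`` tag is an invented example, and treating an empty string as "no filters applied yet" is an assumption, since the representation of an empty filter list is not specified here.

```python
def append_filter(filter_value, tag):
    """Append a filter tag to a comma-separated '..filter..' value."""
    tags = [t for t in filter_value.split(",") if t]
    tags.append(tag)
    return ",".join(tags)


# First filtering step on a sample with no filters recorded (assumed empty).
after_downsampling = append_filter("", "ds:50000")
# A second, invented tag to show how the list grows.
after_second_step = append_filter(after_downsampling, "top:100")
print(after_second_step)
```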
Below are the basic guidelines for creating a metadata file.
- Metadata file should be a tab-delimited table, e.g.

  +-----------------+--------------+-------------+-------+
  | #file.name      | sample.id    | col.name    | ...   |
  +=================+==============+=============+=======+
  | sample\_1.txt   | sample\_1    | A           | ...   |
  +-----------------+--------------+-------------+-------+
  | sample\_2.txt   | sample\_2    | A           | ...   |
  +-----------------+--------------+-------------+-------+
  | sample\_3.txt   | sample\_3    | B           | ...   |
  +-----------------+--------------+-------------+-------+
  | sample\_4.txt   | sample\_4    | C           | ...   |
  +-----------------+--------------+-------------+-------+
  | ...             | ...          | ...         | ...   |
  +-----------------+--------------+-------------+-------+

- Header is mandatory, first two columns should be named **file\_name**
and **sample\_id**. Names of the remaining columns will later be used
to specify metadata variable names.
- First two columns should contain the file name and sample id
respectively.
- The file name should be either an absolute path
(e.g. ``/Users/username/somedir/file.txt``) or a path relative to the
parent directory of metadata file (e.g. ``../file.txt``)
- Sample IDs should be unique
- Columns after **sample.id** are treated as metadata entries. There
are also several cases when info from metadata is used during
execution:
- VDJtools plotting routines can be directed to use metadata fields
for naming samples and creating intuitive legends. If a column name
contains spaces, it should be quoted, e.g. ``-f "patient id"``.
- Metadata fields are categorized as factor (contain only strings),
numeric (contain only numbers) and semi-numeric (numbers and
strings). Numeric and semi-numeric fields can be used for
gradient coloring by plotting routines.
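
The guidelines above can be sketched in Python as follows. This is an illustration only, not VDJtools' own parser: the function names ``read_metadata`` and ``classify_column`` are hypothetical, and treating anything accepted by ``float()`` as a number is an assumption.

.. code-block:: python

    import csv
    from pathlib import Path


    def read_metadata(metadata_path):
        """Read a tab-delimited metadata file and return (header, rows)."""
        metadata_path = Path(metadata_path)
        with metadata_path.open(newline="") as handle:
            table = list(csv.reader(handle, delimiter="\t"))
        header, rows = table[0], table[1:]
        sample_ids = [row[1] for row in rows]
        if len(set(sample_ids)) != len(sample_ids):
            raise ValueError("sample IDs must be unique")
        for row in rows:
            # Relative paths are resolved against the folder holding the metadata file.
            path = Path(row[0])
            row[0] = str(path if path.is_absolute() else metadata_path.parent / path)
        return header, rows


    def classify_column(values):
        """Label a metadata column as 'numeric', 'factor' or 'semi-numeric'."""
        def is_number(text):
            try:
                float(text)
                return True
            except ValueError:
                return False

        flags = [is_number(value) for value in values]
        if all(flags):
            return "numeric"
        return "semi-numeric" if any(flags) else "factor"


    print(classify_column(["A", "A", "B", "C"]))
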
.. _util:
Utilities
---------
.. _SplitMetadata:
SplitMetadata
^^^^^^^^^^^^^
Splits a metadata file into separate metadata files according to the set of values in the specified column(s).
Can be handy for implementing pipelines using VDJtools.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS SplitMetadata [options] metadata.txt output_dir
Parameters:
+-------------+------------------------+---------------------+-----------------------------------------------------------------+
| Shorthand   | Long name              | Argument            | Description                                                     |
+=============+========================+=====================+=================================================================+
| ``-c``      | ``--columns``          | string1,string2,... | A comma-separated list of column name(s) to split metadata by.  |
+-------------+------------------------+---------------------+-----------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
Outputs the resulting metadata files to the specified folder. Unique combinations of metadata entries in the specified columns will be appended to the names of the corresponding metadata files;
relative sample paths will be handled appropriately.
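
The grouping step can be pictured with the Python sketch below. It only groups metadata rows in memory by the unique value combinations of the chosen columns. The function name ``split_metadata`` is hypothetical, and the naming of output files and path rewriting done by the real routine are not reproduced.

.. code-block:: python

    from collections import defaultdict


    def split_metadata(header, rows, columns):
        """Group metadata rows by the values found in the given columns."""
        indices = [header.index(name) for name in columns]
        groups = defaultdict(list)
        for row in rows:
            key = tuple(row[i] for i in indices)
            groups[key].append(row)
        return dict(groups)


    header = ["#file.name", "sample.id", "col.name"]
    rows = [
        ["sample_1.txt", "sample_1", "A"],
        ["sample_2.txt", "sample_2", "A"],
        ["sample_3.txt", "sample_3", "B"],
    ]
    for key, members in split_metadata(header, rows, ["col.name"]).items():
        # Each group corresponds to one output metadata file.
        print(".".join(key), [row[1] for row in members])
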
-------------
.. _FilterMetadata:
FilterMetadata
^^^^^^^^^^^^^^
Filters metadata by evaluating expression over values in specified metadata columns, e.g.:
.. code-block:: java
"__chain__=~/TR[AB]/"
"__chain__=='TRA'||__chain__=='TRB'"
"__chain__.contains('TRA')"
"!__condition__.startsWith('control')"
Both Java and Groovy syntax are supported. Column names should be marked by double underscores before and after the name.
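
To make the semantics of two of the expressions above concrete, here is a Python analogue. It is an illustration only: VDJtools evaluates the Groovy/Java expression itself, and the helper ``filter_rows`` as well as the ``chain`` and ``condition`` column values are made up for this example.

.. code-block:: python

    import re


    def filter_rows(header, rows, predicate):
        """Keep rows for which predicate(row_as_dict) is true."""
        return [row for row in rows if predicate(dict(zip(header, row)))]


    header = ["#file.name", "sample.id", "chain", "condition"]
    rows = [
        ["s1.txt", "s1", "TRA", "control_1"],
        ["s2.txt", "s2", "TRB", "treated"],
        ["s3.txt", "s3", "IGH", "treated"],
    ]

    # Python analogue of "__chain__=~/TR[AB]/"
    tcr = filter_rows(header, rows, lambda m: re.search(r"TR[AB]", m["chain"]) is not None)
    # Python analogue of "!__condition__.startsWith('control')"
    non_control = filter_rows(header, rows, lambda m: not m["condition"].startswith("control"))
    print([row[1] for row in tcr], [row[1] for row in non_control])
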
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS FilterMetadata [options] metadata.txt output_dir output_suffix
Parameters:
+-------------+------------------------+--------------+-------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+==============+===================================================================================================================+
| ``-f`` | ``--filter`` | expression | Filter expression, should be surrounded with quotation marks, metadata column names should be marked with ``__``. |
+-------------+------------------------+--------------+-------------------------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
A filtered metadata table with the corresponding suffix will be created in the specified folder; relative sample paths will be handled appropriately.
-------------
.. _Convert:
Convert
^^^^^^^
Converts datasets from any supported format to :ref:`vdjtools_format`. You can also re-normalize your data, i.e. collapse clonotypes by
V, D, J and CDR3 nucleotide sequence and re-compute clonotype frequencies, by using the ``-S VDJtoolsRenorm`` option. This is useful if you want
to groom manually converted data, or if for some reason your clonotype frequencies do not sum to 1.
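
The re-normalization idea can be sketched in Python as below. This is not the code behind ``-S VDJtoolsRenorm``: the function name ``renormalize``, the dictionary keys and the final sort order are assumptions made for illustration.

.. code-block:: python

    from collections import defaultdict


    def renormalize(clonotypes):
        """Collapse clonotypes by (v, d, j, cdr3nt) and recompute frequencies."""
        counts = defaultdict(int)
        for clonotype in clonotypes:
            key = (clonotype["v"], clonotype["d"], clonotype["j"], clonotype["cdr3nt"])
            counts[key] += clonotype["count"]
        total = sum(counts.values())
        collapsed = [
            {"v": v, "d": d, "j": j, "cdr3nt": cdr3nt, "count": n, "freq": n / total}
            for (v, d, j, cdr3nt), n in counts.items()
        ]
        # Most abundant clonotypes first (the sort order is an assumption).
        return sorted(collapsed, key=lambda c: c["count"], reverse=True)


    sample = [
        {"v": "V1", "d": "D1", "j": "J1", "cdr3nt": "TGTGCC", "count": 3},
        {"v": "V1", "d": "D1", "j": "J1", "cdr3nt": "TGTGCC", "count": 1},
        {"v": "V2", "d": "D1", "j": "J2", "cdr3nt": "TGTGCA", "count": 4},
    ]
    for clonotype in renormalize(sample):
        print(clonotype["cdr3nt"], clonotype["count"], clonotype["freq"])

After collapsing, the two identical rearrangements are merged and the frequencies of the remaining clonotypes sum to 1.
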
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS Convert \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+------------------------+-----------+-------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+========================+===========+=============================================================================================================+
| ``-S`` | ``--software`` | name | Format to convert from, see the :ref:`supported_input` section |
+-------------+------------------------+-----------+-------------------------------------------------------------------------------------------------------------+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+------------------------+-----------+-------------------------------------------------------------------------------------------------------------+
| ``-c`` | ``--compress`` | | Compressed output for clonotype table. See :ref:`common_params` |
+-------------+------------------------+-----------+-------------------------------------------------------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
Outputs converted samples to the path specified by output prefix and creates a
corresponding metadata file. Will also append ``conv:[-S value]`` to ``..filter..``
metadata column.
-------------
.. _Rinstall:
RInstall
^^^^^^^^
Prints the list of required R packages and installs dependencies into a local library
(`RPackages` folder) which is placed in the parent folder of VDJtools jar.
If this routine does not return with a "PASSED" message, manual installation of
the packages that failed to deploy is required.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS RInstall
.. _vdjviz:
Clonotype browser
-----------------
In order to demonstrate VDJtools API features, a lightweight immune
repertoire browser **VDJviz** was implemented by
`@bvdmitri `__. VDJviz is a
`Play framework `__
application that uses `D3js `__ for interactive
visualization of VDJtools output. It allows visualizing and
comparing various immune repertoire features such as spectratypes and
rarefaction curves.
To try it out register at
`vdjviz.milaboratory.com `__ and upload
some RepSeq files in any supported format.
.. important::
Currently, there is an upload limit of 25 files with at
most 10,000 clonotypes, so the :ref:`DownSample` routine
could come in handy.
.. figure:: _static/images/vdjviz.png
:align: center
Clonotype browser panel
.. figure:: _static/images/vdjviz1.png
:align: center
Interactive graphs
.. _annotate:
Annotation
----------
.. _SegmentsToFamilies:
SegmentsToFamilies
^^^^^^^^^^^^^^^^^^
Replaces V and J segment IDs in samples with segment 'family' IDs. Here, 'families' are defined as clusters of V/J
sequences built using hierarchical clustering of pairwise amino acid sequence alignment distances. Thus, two segments are
assigned to the same family if they have homologous sequences. The actual table of segment <> family conversions can
be accessed `here `__.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS SegmentsToFamilies \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+-----------------------+--------------------+----------------------------------------------------+
| Shorthand   | Long name             | Argument           | Description                                        |
+=============+=======================+====================+====================================================+
| ``-m``      | ``--metadata``        | path               | Path to metadata file. See :ref:`common_params`    |
+-------------+-----------------------+--------------------+----------------------------------------------------+
| ``-s``      | ``--species``         | name               | [Required] Species name: ``human`` or ``mouse``.   |
+-------------+-----------------------+--------------------+----------------------------------------------------+
| ``-h``      | ``--help``            |                    | Display help message                               |
+-------------+-----------------------+--------------------+----------------------------------------------------+
| ``-c``      |                       |                    | Compressed output.                                 |
+-------------+-----------------------+--------------------+----------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
Samples are returned as is, with the content of ``v`` and ``j`` columns replaced by families.
A metadata file will be created for resulting samples with ``segm2fam``
appended to the ``..filter..`` metadata column.
Graphical output
~~~~~~~~~~~~~~~~
none
--------------
.. _CalcDegreeStats:
CalcDegreeStats
^^^^^^^^^^^^^^^
Performs a TCR neighborhood enrichment test (TCRNET), testing each sample for clonotypes
that have more neighbours (higher **degree** in a graph), i.e. clonotypes with similar CDR3 amino acid sequences, than would be expected
by chance according to some control dataset. The user can specify the actual **search scope** (i.e. the
number of allowed CDR3 mismatches), whether to only compare clonotypes with the same V/J, and the
control sample. If a control sample is not provided, a pooling (see :ref:`PoolSamples`) of all provided samples is used.
Note that this test, if supplied with real samples and a control pooled using the ``-i strict`` option,
will account for the number of neighbours with the same CDR3 amino acid sequence but distinct nucleotide
sequences. If this is not desired, all input samples and the control should be pre-pooled with ``-i aa`` or
``-i aaVJ`` to collapse variants coding for the same amino acid CDR3 sequence.
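
The notion of **degree** can be illustrated with a small Python sketch that counts neighbours within one amino acid substitution (no indels). It is a naive quadratic illustration, not the search used by VDJtools. The function names are hypothetical, the input is assumed to contain unique CDR3 sequences, and a clonotype is assumed not to count as its own neighbour.

.. code-block:: python

    def hamming(a, b):
        """Number of mismatching positions between two equal-length strings."""
        return sum(x != y for x, y in zip(a, b))


    def degrees(cdr3_list, max_substitutions=1):
        """Count, for each CDR3, the other sequences within the search scope."""
        result = {}
        for query in cdr3_list:
            result[query] = sum(
                1
                for other in cdr3_list
                if other != query
                and len(other) == len(query)
                and hamming(query, other) <= max_substitutions
            )
        return result


    cdr3s = ["CASSLGQYF", "CASSLGEYF", "CASSLAQYF", "CASSPDRNTEAFF"]
    print(degrees(cdr3s))
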
.. note::
Running this routine will not return the actual clonotype graph for you, just annotate input samples.
To build the graph, one should refer to `VDJmatch `__ software
and its ``Cluster`` routine. Make sure the search scope option is the same as ``-o`` used for ``CalcDegreeStats``
and that all scoring/filtering is turned off. Next, one should retain only the edges that connect pairs of
enriched clonotypes and enriched clonotypes with their neighbours.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS CalcDegreeStats \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+=======================+====================+============================================================================================================================================================+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-b`` | ``--background`` | path | Path to the background (control) sample, used to compute expected statistics/P-values. If not provided, will pool input samples and use them as control. |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-o`` | ``--search-scope`` | s,id,t | Search scope: number of substitutions (s), indels (id) and total number of mismatches (t) allowed. Default is ``1,0,1`` |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-g`` | ``--grouping`` | string | Primary grouping type, limits set of clonotype comparisons: 'dummy' (no grouping, default), 'vj' (same V and J) or 'vjl' (same V, J and CDR3 length). |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-g2`` | ``--grouping2`` | string | Secondary grouping, used for computing statistics, accepts same values as ``-g``. By default will select 'vjl' if no indels allowed and 'vj' otherwise. |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
.. note::
There are two possible schemes for running the algorithm. First, one can select,
say, a search scope of ``1,0,1`` allowing no indels, and ``-g vjl`` to only allow comparisons
between clonotypes that match in V, J and CDR3 length. Then, one should
only consider ``p.value.g`` in the output and disregard all columns with ``g2/group2``.
On the other hand, if one wants to allow comparison of clonotypes with different V/J,
and/or comparisons with indels, the option ``-g dummy`` should be used. If one thinks there
might be certain biases in V/J frequencies between the control/background sample and input samples,
and one wants to control for them, one should select ``-g2 vj``. Observed degree values
will then be provided as is (i.e. not limiting clonotype comparisons to a fixed V/J),
but the expected degree will be corrected to account for the V/J usage difference
between the input sample and the control. One should only consider ``p.value.g2``
in this case. See below for more explanation of the output columns.
Tabular output
~~~~~~~~~~~~~~
Processed samples will have additional annotation columns appended to VDJtools clonotype
table columns. These columns are the following:
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Column | Description |
+=================+=======================================================================================================================================================================================================+
| degree.s | Degree (number of neighbours) of a given clonotype in sample. The degree is the number of unique clonotypes (incl. nucleotide variants) that match a given clonotype under specified search scope. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| group.count.s | Number of unique clonotypes that match the group, defined by primary grouping (``-g``), of a given clonotype in sample, say have the same V and J. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| group2.count.s | Same as above, but the group is defined by secondary grouping ``-g2``. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| degree.c | Degree (number of neighbours) of a given clonotype in the control sample. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| group.count.c | Number of unique clonotypes in the control sample that match the group of given clonotype as defined by primary grouping (``-g``). |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| group2.count.c | Same as above, but the group is defined by secondary grouping ``-g2``. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p.value.g | P-value for the neighbour (degree) enrichment of a given clonotype according to primary grouping. The P-value is computed as ``Pbinom(n=degree.s|p=degree.c/group.count.c, N=group.count.s)``. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| p.value.g2 | P-value for the neighbour (degree) enrichment of a given clonotype according to secondary grouping. The P-value is computed as ``Ppoisson(n=degree.s|lambda=group.count.s*degree.c/group.count.c)``. |
+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
A metadata file will be created for resulting samples with ``degstat``
appended to the ``..filter..`` metadata column.
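
The two P-value formulas from the table can be sketched in Python as follows. This assumes that ``Pbinom`` and ``Ppoisson`` denote upper-tail probabilities (the chance of observing at least ``degree.s`` neighbours), which the table does not state explicitly. The function names are hypothetical and the counts are passed in as plain numbers.

.. code-block:: python

    import math


    def binomial_upper_tail(n, N, p):
        """P(X >= n) for X ~ Binomial(N, p)."""
        return sum(math.comb(N, k) * p**k * (1 - p) ** (N - k) for k in range(n, N + 1))


    def poisson_upper_tail(n, lam):
        """P(X >= n) for X ~ Poisson(lam)."""
        return 1.0 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(n))


    def enrichment_p_values(degree_s, group_count_s, degree_c, group_count_c):
        """Return (binomial, Poisson) P-values built from the documented quantities."""
        p = degree_c / group_count_c  # neighbour probability estimated from the control
        return (
            binomial_upper_tail(degree_s, group_count_s, p),
            poisson_upper_tail(degree_s, group_count_s * p),
        )


    # 5 neighbours among 100 sample clonotypes versus 50 among 10000 in the control.
    print(enrichment_p_values(5, 100, 50, 10000))
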
Graphical output
~~~~~~~~~~~~~~~~
none
--------------
.. _CalcCdrAAProfile:
CalcCdrAAProfile
^^^^^^^^^^^^^^^^
Generates an amino acid physical property profile of CDR3. Amino acids are
first grouped into corresponding CDR3 sub-regions and then binned by position
within the sub-region. Each amino acid in a given bin is scored according to
its physical properties; the sum of those scores and the total number of amino acids
are reported for each sample/sub-region/bin/property combination.
For example, under the **polarity** property amino acids are marked as polar (``1``)
or non-polar (``0``) and the sum of these values is returned. When divided by
the total number of amino acids, this gives the fraction of polar amino acids
in a given sample/sub-region. For **volume** the same operation will return the
average volume of amino acids.
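
The polarity example can be sketched in Python as below. The set of polar residues is illustrative and is not the property table shipped with VDJtools. The function name is hypothetical, and the sketch ignores sub-regions, positional bins and frequency weighting.

.. code-block:: python

    # Illustrative set of polar residues; an assumption, not VDJtools' own table.
    POLAR = set("RNDQEHKSTY")


    def polarity_profile(cdr3_sequences):
        """Return (sum of polarity scores, total amino acids, fraction polar)."""
        score_sum = sum(1 for seq in cdr3_sequences for aa in seq if aa in POLAR)
        total = sum(len(seq) for seq in cdr3_sequences)
        return score_sum, total, score_sum / total


    print(polarity_profile(["CASSLGQYF", "CASSPDRNTEAFF"]))
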
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS CalcCdrAAProfile \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+=======================+====================+============================================================================================================================================================+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-w`` | ``--weighted`` | | If set, will weight amino acid property values by clonotype frequency. |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-n`` | ``--normalize`` | | If set, will normalize amino acid property values by dividing them by corresponding CDR3 sub-region size. |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-r`` | ``--region-list`` | region1,... | List of CDR3 sub-regions to count statistics for, default is ``CDR3-full,VJ-junc,V-germ,J-germ`` |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-o`` | ``--property-list`` | property1,... | List of amino acid physicochemical properties to use, see below for allowed values. Uses all amino acid properties from list below by default. |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+-----------------------+--------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
Supported CDR3 sub-regions:
+-----------------+--------------------------------------------------------------------------+
| Name | Description |
+=================+==========================================================================+
| ``CDR3-full`` | Complete CDR3 region |
+-----------------+--------------------------------------------------------------------------+
| ``CDR3-center`` | Central 5 amino acids of CDR3 |
+-----------------+--------------------------------------------------------------------------+
| ``V-germ`` | Germline part of CDR3 region corresponding to Variable segment |
+-----------------+--------------------------------------------------------------------------+
| ``D-germ`` | Germline part of CDR3 region corresponding to Diversity segment |
+-----------------+--------------------------------------------------------------------------+
| ``J-germ`` | Germline part of CDR3 region corresponding to Joining segment |
+-----------------+--------------------------------------------------------------------------+
| ``VD-junc`` | Variable-Diversity segment junction, applicable when D segment is mapped |
+-----------------+--------------------------------------------------------------------------+
| ``DJ-junc`` | Diversity-Joining segment junction, applicable when D segment is mapped |
+-----------------+--------------------------------------------------------------------------+
| ``VJ-junc`` | Variable-Joining segment junction, including D segment if it is mapped |
+-----------------+--------------------------------------------------------------------------+
Supported amino acid physical properties (see `full table `__ for raw values):
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| Name | Description | Reference |
+===================+=================================================================================================================+=================================================================+
| ``alpha`` | Preference to appear in alpha helices | Stryer L et al. Biochemistry, 5th edition. ISBN 978-0716746843 |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``beta`` | Preference to appear in beta sheets | Stryer L et al. Biochemistry, 5th edition. ISBN 978-0716746843 |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``turn`` | Preference to appear in turns | Stryer L et al. Biochemistry, 5th edition. ISBN 978-0716746843 |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``surface`` | Residues that have unchanged accessibility area when PPI partner is present | `PMID:22559010 `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``rim`` | Residues that have changed accessibility area, but no atoms with zero accessibility in PPI interfaces | `PMID:22559010 `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``core`` | Residues that have changed accessibility area and at least one atom with zero accessibility in PPI interfaces | `PMID:22559010 `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``disorder`` | Intrinsic structural disorder-promoting, order-promoting and neutral amino acids | `PMID:11381529 `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``charge`` | Charged/non-charged amino acids | `Wikipedia `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``pH`` | Amino acid pH level | `Wikipedia `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``polarity`` | Polar/non-polar amino acids | `Wikipedia `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``hydropathy`` | Amino acid hydropathy | `Wikipedia `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``volume`` | Amino acid volume | `Wikipedia `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``strength`` | Strongly-interacting amino acids / amino acids depleted by purifying selection in thymus | `PMID:18946038 `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``mjenergy`` | Mean value of MJ statistical potential for each amino acid, used to derive 'strength' | `PMID:8604144 `__ |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
| ``kf1``..``kf10`` | Values of 10 Kidera factors summarizing physicochemical properties of amino acids | unpublished |
+-------------------+-----------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------+
Tabular output
~~~~~~~~~~~~~~
A summary table with averaged amino acid property values is generated,
suffixed ``cdr3aa.profile.[wt or unwt based on -w].txt``. The table contains
the following columns:
+---------------+---------------------------------------------------------------------------------------------------------------+
| Column | Description |
+===============+===============================================================================================================+
| sample\_id | Sample unique identifier |
+---------------+---------------------------------------------------------------------------------------------------------------+
| ... | Sample metadata columns. See `Metadata `__ section |
+---------------+---------------------------------------------------------------------------------------------------------------+
| region | Current CDR3 sub-region, see above |
+---------------+---------------------------------------------------------------------------------------------------------------+
| property | Amino acid physical property name, see above |
+---------------+---------------------------------------------------------------------------------------------------------------+
| mean | Mean property value |
+---------------+---------------------------------------------------------------------------------------------------------------+
Graphical output
~~~~~~~~~~~~~~~~
none
--------------
.. _Annotate2:
Annotate
^^^^^^^^
This routine computes a set of properties for each clonotype's CDR3 sequence and
appends them to the resulting clonotype table, for example the number of added N-nucleotides
and the sum of polar amino acids in CDR3. The main difference from :ref:`CalcCdrAAProfile`
is that that routine computes sample-level averages, while this routine performs calculations
at the clonotype level.
Command line usage
~~~~~~~~~~~~~~~~~~
.. code-block:: bash
$VDJTOOLS Annotate \
[options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix
Parameters:
+-------------+-----------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+=======================+====================+===========================================================================================================================================================================================================================================================================+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+-----------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-b`` | ``--base`` | param1,param2,... | Comma-separated list of basic clonotype features to calculate and append to resulting clonotype tables. See below for allowed values. Default: ``cdr3Length,ndnSize,insertSize`` |
+-------------+-----------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-a`` | ``--aaprop`` | property1,... | Comma-separated list of amino acid properties. Amino acid property value sum will be calculated for CDR3 sequence (blank annotations will be generated for non-coding clonotypes). See below for allowed values. Default: ``hydropathy,charge,polarity,strength,contact`` |
+-------------+-----------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+-----------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
List of basic annotation properties:
+----------------+--------------------------------------------------------------------------------------------------+
| Name | Description |
+================+==================================================================================================+
| ``cdr3Length`` | Length of CDR3 region |
+----------------+--------------------------------------------------------------------------------------------------+
| ``NDNSize`` | Number of nucleotides between last base of V germline and first base of J germline parts of CDR3 |
+----------------+--------------------------------------------------------------------------------------------------+
| ``insertSize`` | Number of added N-nucleotides |
+----------------+--------------------------------------------------------------------------------------------------+
| ``VDIns`` | Number of added N-nucleotides in V-D junction or ``-1`` if D segment is undefined |
+----------------+--------------------------------------------------------------------------------------------------+
| ``DJIns`` | Number of added N-nucleotides in D-J junction or ``-1`` if D segment is undefined |
+----------------+--------------------------------------------------------------------------------------------------+
See :ref:`CalcCdrAAProfile` for the list of amino acid properties available for annotation.
Sum of specified amino acid property values across all amino acids of CDR3 will be computed.
Dividing this sum by ``cdr3Length / 3`` (the CDR3 length in amino acids, derived from
the ``cdr3Length`` basic property) gives the per-residue average.
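
The division described above can be sketched as follows. This is only an illustration: the column names ``base.cdr3Length`` and ``aaprop.hydropathy`` are assumed from the ``base.`` / ``aaprop.`` prefix convention, and the helper name and example values are hypothetical.

```python
def average_aaprop(row, prop="hydropathy"):
    """Per-residue average of an amino acid property for one clonotype row.

    `row` maps column names to string values. The column names used here
    (base.cdr3Length, aaprop.<prop>) are assumptions, not verified output.
    """
    value = row.get("aaprop." + prop, "")
    if value == "":
        # Non-coding clonotypes carry blank annotations, so no average exists.
        return None
    # cdr3Length is in nucleotides; three nucleotides encode one amino acid.
    n_residues = int(row["base.cdr3Length"]) / 3
    return float(value) / n_residues


# Hypothetical annotated row: 42 nt CDR3 (14 residues), property sum of -7.0.
example = {"base.cdr3Length": "42", "aaprop.hydropathy": "-7.0"}
print(average_aaprop(example))
```

Rows with a blank property value return ``None`` rather than a number, mirroring the blank annotations generated for non-coding clonotypes.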
Tabular output
~~~~~~~~~~~~~~
Processed samples will have additional annotation columns appended to VDJtools clonotype
table columns. Those columns will be prefixed with ``base.`` for basic CDR3 properties
and ``aaprop.`` for CDR3 amino acid composition properties.
A metadata file will be created for resulting samples with ``annot:[-b value]:[-a value]``
appended to the ``..filter..`` metadata column.
Graphical output
~~~~~~~~~~~~~~~~
none
----------------
.. _ScanDatabase:
ScanDatabase (DEPRECATED since v1.0.5, use `VDJmatch `__)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Annotates a set of samples using an immune receptor database based on
V-(D)-J junction matching. By default uses
`VDJdb `__, which contains CDR3
sequences, Variable and Joining segments of known specificity obtained
using literature mining. This routine supports user-provided databases
and allows flexible filtering of results based on database fields. The
output of ScanDatabase includes both detailed (clonotype-wise)
annotation of samples and summary statistics. Only amino-acid CDR3
sequences are used in database querying.
Command line usage
~~~~~~~~~~~~~~~~~~

.. code-block:: bash

    $VDJTOOLS ScanDatabase \
    [options] [sample1.txt sample2.txt ... if -m is not specified] output_prefix

Parameters:
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Shorthand | Long name | Argument | Description |
+=============+=======================+==================+===================================================================================================================================================================================+
| ``-m`` | ``--metadata`` | path | Path to metadata file. See :ref:`common_params` |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-D`` | ``--database`` | path | Path to an external database file. Will use built-in VDJdb if not specified. |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-d`` | ``--details`` | | Will provide a detailed output for each sample with annotated clonotype matches |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-f`` | ``--fuzzy`` | | Will query database allowing at most 2 substitutions, 1 deletion and 1 insertion but no more than 2 mismatches simultaneously. If not set, only exact matches will be reported |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | ``--filter`` | ``expression`` | Logical pre-filter on database columns. See below |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | ``--v-match`` | | V segment must match |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| | ``--j-match`` | | J segment must match |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ``-h`` | ``--help`` | | Display help message |
+-------------+-----------------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
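
The ``--fuzzy`` limits can be read as a constrained edit-distance check. The sketch below is not the ScanDatabase implementation; it only illustrates the stated limits (at most 2 substitutions, 1 insertion, 1 deletion, and 2 mismatches in total). The function name and the convention that an extra residue in the query counts as an insertion are assumptions.

```python
from functools import lru_cache


def fuzzy_match(query, subject, max_subs=2, max_ins=1, max_dels=1, max_total=2):
    """Return True if `query` can be aligned to `subject` within the limits."""

    @lru_cache(maxsize=None)
    def walk(i, j, subs, ins, dels):
        # Abandon any alignment path that exceeds one of the limits.
        if (subs > max_subs or ins > max_ins or dels > max_dels
                or subs + ins + dels > max_total):
            return False
        if i == len(query) and j == len(subject):
            return True
        if i < len(query) and j < len(subject):
            # Consume one residue from both: a match is free, a mismatch
            # costs one substitution.
            cost = 0 if query[i] == subject[j] else 1
            if walk(i + 1, j + 1, subs + cost, ins, dels):
                return True
        # Extra residue in the query (assumed to count as an insertion).
        if i < len(query) and walk(i + 1, j, subs, ins + 1, dels):
            return True
        # Extra residue in the subject (assumed to count as a deletion).
        if j < len(subject) and walk(i, j + 1, subs, ins, dels + 1):
            return True
        return False

    return walk(0, 0, 0, 0, 0)


# Hypothetical sequences, two substitutions apart.
print(fuzzy_match("CASSLG", "CATTLG"))
```

Setting all limits to zero reduces the check to exact matching, which corresponds to running without ``--fuzzy``.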

.. note::

    The database filter is a logical expression that references
    database columns. Column names should be surrounded with
    double underscores (``__``). The syntax supports regular expressions and
    standard Java/Groovy functions such as ``.contains()``, ``.startsWith()``,
    etc. Here are some examples:

    .. code-block:: groovy

        __origin__=~/EBV/

        !(__origin__=~/CMV/)

    Note that the expression should be quoted: ``--filter "__origin__=~/HSV/"``

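
To make the semantics of the two filter examples concrete, here is a rough Python equivalent. It assumes that Groovy's ``=~`` in a boolean context behaves like a "find anywhere" search, which corresponds to ``re.search``. The records and the ``id`` field are hypothetical; only the ``origin`` column name comes from the examples above.

```python
import re


def column_matches(record, column, pattern):
    """True if `pattern` is found anywhere in the given column value."""
    return re.search(pattern, record[column]) is not None


# Hypothetical database records, used only to show which rows survive.
database = [
    {"id": "rec1", "origin": "EBV"},
    {"id": "rec2", "origin": "CMV"},
    {"id": "rec3", "origin": "HSV"},
]

# Counterpart of  __origin__=~/EBV/
only_ebv = [r["id"] for r in database if column_matches(r, "origin", "EBV")]

# Counterpart of  !(__origin__=~/CMV/)
not_cmv = [r["id"] for r in database if not column_matches(r, "origin", "CMV")]

print(only_ebv, not_cmv)
```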
Tabular output
~~~~~~~~~~~~~~
A summary table suffixed ``annot.[database name].summary.txt`` is
generated. The first header line, marked with ``##FILTER``, contains the filtering
expression that was used. The table contains the following columns:
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Column | Description |
+==================================+==================================================================================================================================================================================================================================================================================================+
| sample\_id | Sample unique identifier |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ... | Sample metadata columns. See `Metadata `__ section |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| diversity | Number of clonotypes in sample |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| match\_size | Number of matches between sample and database. In case ``--fuzzy`` mode is on, all matches will be counted. E.g. if clonotype ``a`` in the sample matches clonotypes ``A`` and ``B`` in the database and clonotype ``b`` in the sample matches clonotype B the value in this column will be 3. |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sample\_diversity\_in\_matches | Number of unique clonotypes in the sample that matched clonotypes from the database |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| db\_diversity\_in\_matches | Number of unique clonotypes in the database that matched clonotypes from the sample |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sample\_freq\_in\_matches | Overall frequency of unique clonotypes in the sample that matched clonotypes from the database |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| mean\_matched\_clone\_size | Geometric mean of frequency of unique clonotypes in the sample that matched clonotypes from the database |
+----------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
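
The relationship between the summary columns can be illustrated with a short sketch that reproduces the ``match_size`` example from the table (sample clonotype ``a`` matching ``A`` and ``B``, and ``b`` matching ``B``). The function name, the input layout and the frequencies are assumptions made for illustration; this is not how ScanDatabase computes its output internally.

```python
import math


def summarize_matches(matches, sample_freq):
    """Summarize (sample_clonotype, db_clonotype) match pairs.

    `sample_freq` maps each sample clonotype to its frequency in the sample.
    """
    matched_sample = {s for s, _ in matches}
    matched_db = {d for _, d in matches}
    freqs = [sample_freq[s] for s in matched_sample]
    # Geometric mean computed in log space; defined as 0.0 with no matches.
    geo_mean = math.exp(sum(math.log(f) for f in freqs) / len(freqs)) if freqs else 0.0
    return {
        "match_size": len(matches),
        "sample_diversity_in_matches": len(matched_sample),
        "db_diversity_in_matches": len(matched_db),
        "sample_freq_in_matches": sum(freqs),
        "mean_matched_clone_size": geo_mean,
    }


# Hypothetical frequencies; "c" is a sample clonotype without any match.
pairs = [("a", "A"), ("a", "B"), ("b", "B")]
summary = summarize_matches(pairs, {"a": 0.04, "b": 0.01, "c": 0.95})
print(summary)
```

Every pair counts toward ``match_size``, while the two diversity columns count each clonotype only once.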
Detailed database query results will also be reported for each sample if
``-d`` is specified. Those tables are suffixed
``annot.[database name].[sample id].txt`` and contain the following
columns.
+-------------------+-----------------------------------------------------------------------+
| Column | Description |
+===================+=======================================================================+
| score | CDR3 sequence alignment score |
+-------------------+-----------------------------------------------------------------------+
| query\_cdr3aa | Query CDR3 amino acid sequence |
+-------------------+-----------------------------------------------------------------------+
| query\_v | Query Variable segment |
+-------------------+-----------------------------------------------------------------------+
| query\_j | Query Joining segment |
+-------------------+-----------------------------------------------------------------------+
| subject\_cdr3aa | Subject CDR3 amino acid sequence |
+-------------------+-----------------------------------------------------------------------+
| subject\_v | Subject Variable segment |
+-------------------+-----------------------------------------------------------------------+
| subject\_j | Subject Joining segment |
+-------------------+-----------------------------------------------------------------------+
| v\_match | ``true`` if Variable segments of query and subject clonotypes match |
+-------------------+-----------------------------------------------------------------------+
| j\_match | ``true`` if Joining segments of query and subject clonotypes match |
+-------------------+-----------------------------------------------------------------------+
| mismatches | Comma-separated list of query->subject mismatches |
+-------------------+-----------------------------------------------------------------------+
| ... | Database fields corresponding to subject clonotype |
+-------------------+-----------------------------------------------------------------------+
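
As a usage sketch, the detailed table can be post-filtered to keep only hits where both segments agree. The example assumes the file is tab-delimited with a header row and that non-matching segments are written as ``false``; neither is confirmed here. The rows and sequences are hypothetical, and only a subset of the columns is shown.

```python
import csv
import io


def both_segments_match(table_text):
    """Return rows whose v_match and j_match columns are both 'true'."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    return [row for row in reader
            if row["v_match"] == "true" and row["j_match"] == "true"]


# Hypothetical excerpt of a detailed annotation table.
detailed = (
    "score\tquery_cdr3aa\tsubject_cdr3aa\tv_match\tj_match\n"
    "10\tCASSAAF\tCASSAAF\ttrue\ttrue\n"
    "8\tCASSGGF\tCASSGAF\ttrue\tfalse\n"
    "9\tCASSLLF\tCASSLLF\tfalse\ttrue\n"
)

hits = both_segments_match(detailed)
print([row["query_cdr3aa"] for row in hits])
```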
Graphical output
~~~~~~~~~~~~~~~~
none
vdjtools-1.2.1+git20190311/doc/conf.py
# -*- coding: utf-8 -*-
#
# vdjtools documentation build configuration file, created by
# sphinx-quickstart on Tue Mar 24 17:33:45 2015.
#
# This file is execfile()d with the current directory set to its
# containing dir.
#
# Note that not all possible configuration values are present in this
# autogenerated file.
#
# All configuration values have a default; values that are commented out
# serve to show the default.
import sys
import os
import shlex
# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#sys.path.insert(0, os.path.abspath('.'))
# -- General configuration ------------------------------------------------
# If your documentation needs a minimal Sphinx version, state it here.
#needs_sphinx = '1.0'
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.mathjax',
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8-sig'
# The master toctree document.
master_doc = 'index'
# General information about the project.
project = u'vdjtools'
copyright = u'2015, Mikhail Shugay'
author = u'Mikhail Shugay'
# The version info for the project you're documenting, acts as replacement for
# |version| and |release|, also used in various other places throughout the
# built documents.
#
# The short X.Y version.
version = '1.0'
# The full version, including alpha/beta/rc tags.
release = 'SNAPSHOT'
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None
# There are two options for replacing |today|: either, you set today to some
# non-false value, then it is used:
#today = ''
# Else, today_fmt is used as the format for a strftime call.
#today_fmt = '%B %d, %Y'
# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
exclude_patterns = ['_build']
# The reST default role (used for this markup: `text`) to use for all
# documents.
#default_role = None
# If true, '()' will be appended to :func: etc. cross-reference text.
#add_function_parentheses = True
# If true, the current module name will be prepended to all description
# unit titles (such as .. function::).
#add_module_names = True
# If true, sectionauthor and moduleauthor directives will be shown in the
# output. They are ignored by default.
#show_authors = False
# The name of the Pygments (syntax highlighting) style to use.
pygments_style = 'sphinx'
# A list of ignored prefixes for module index sorting.
#modindex_common_prefix = []
# If true, keep warnings as "system message" paragraphs in the built documents.
#keep_warnings = False
# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = False
# -- Options for HTML output ----------------------------------------------
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#html_theme_options = {}
# Add any paths that contain custom themes here, relative to this directory.
#html_theme_path = []
# The name for this set of Sphinx documents. If None, it defaults to
# "<project> v<release> documentation".
#html_title = None
# A shorter title for the navigation bar. Default is the same as html_title.
#html_short_title = None
# The name of an image file (relative to this directory) to place at the top
# of the sidebar.
#html_logo = None
# The name of an image file (within the static path) to use as favicon of the
# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
# pixels large.
#html_favicon = None
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
# Add any extra paths that contain custom files (such as robots.txt or
# .htaccess) here, relative to this directory. These files are copied
# directly to the root of the documentation.
#html_extra_path = []
# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
# using the given strftime format.
#html_last_updated_fmt = '%b %d, %Y'
# If true, SmartyPants will be used to convert quotes and dashes to
# typographically correct entities.
#html_use_smartypants = True
# Custom sidebar templates, maps document names to template names.
#html_sidebars = {}
# Additional templates that should be rendered to pages, maps page names to
# template names.
#html_additional_pages = {}
# If false, no module index is generated.
#html_domain_indices = True
# If false, no index is generated.
#html_use_index = True
# If true, the index is split into individual pages for each letter.
#html_split_index = False
# If true, links to the reST sources are added to the pages.
#html_show_sourcelink = True
# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
#html_show_sphinx = True
# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
#html_show_copyright = True
# If true, an OpenSearch description file will be output, and all pages will
# contain a <link> tag referring to it. The value of this option must be the
# base URL from which the finished HTML is served.
#html_use_opensearch = ''
# This is the file name suffix for HTML files (e.g. ".xhtml").
#html_file_suffix = None
# Language to be used for generating the HTML full-text search index.
# Sphinx supports the following languages:
# 'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja'
# 'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr'
#html_search_language = 'en'
# A dictionary with options for the search language support, empty by default.
# Now only 'ja' uses this config value
#html_search_options = {'type': 'default'}
# The name of a javascript file (relative to the configuration directory) that
# implements a search results scorer. If empty, the default will be used.
#html_search_scorer = 'scorer.js'
# Output file base name for HTML help builder.
htmlhelp_basename = 'vdjtoolsdoc'
# -- Options for LaTeX output ---------------------------------------------
latex_elements = {
# The paper size ('letterpaper' or 'a4paper').
#'papersize': 'letterpaper',
# The font size ('10pt', '11pt' or '12pt').
#'pointsize': '10pt',
# Additional stuff for the LaTeX preamble.
#'preamble': '',
# Latex figure (float) alignment
#'figure_align': 'htbp',
}
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'vdjtools.tex', u'vdjtools Documentation',
u'Mikhail Shugay', 'manual'),
]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
#latex_logo = None
# For "manual" documents, if this is true, then toplevel headings are parts,
# not chapters.
#latex_use_parts = False
# If true, show page references after internal links.
#latex_show_pagerefs = False
# If true, show URL addresses after external links.
#latex_show_urls = False
# Documents to append as an appendix to all manuals.
#latex_appendices = []
# If false, no module index is generated.
#latex_domain_indices = True
# -- Options for manual page output ---------------------------------------
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [
(master_doc, 'vdjtools', u'vdjtools Documentation',
[author], 1)
]
# If true, show URL addresses after external links.
#man_show_urls = False
# -- Options for Texinfo output -------------------------------------------
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
(master_doc, 'vdjtools', u'vdjtools Documentation',
author, 'vdjtools', 'One line description of project.',
'Miscellaneous'),
]
# Documents to append as an appendix to all manuals.
#texinfo_appendices = []
# If false, no module index is generated.
#texinfo_domain_indices = True
# How to display URL addresses: 'footnote', 'no', or 'inline'.
#texinfo_show_urls = 'footnote'
# If true, do not generate a @detailmenu in the "Top" node's menu.
#texinfo_no_detailmenu = False
vdjtools-1.2.1+git20190311/doc/Makefile
# Makefile for Sphinx documentation
#
# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
# User-friendly check for sphinx-build
ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
$(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
endif
# Internal variables.
PAPEROPT_a4 = -D latex_paper_size=a4
PAPEROPT_letter = -D latex_paper_size=letter
ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
# the i18n builder cannot share the environment and doctrees with the others
I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest coverage gettext

help:
	@echo "Please use \`make <target>' where <target> is one of"
	@echo "  html       to make standalone HTML files"
	@echo "  dirhtml    to make HTML files named index.html in directories"
	@echo "  singlehtml to make a single large HTML file"
	@echo "  pickle     to make pickle files"
	@echo "  json       to make JSON files"
	@echo "  htmlhelp   to make HTML files and a HTML help project"
	@echo "  qthelp     to make HTML files and a qthelp project"
	@echo "  applehelp  to make an Apple Help Book"
	@echo "  devhelp    to make HTML files and a Devhelp project"
	@echo "  epub       to make an epub"
	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
	@echo "  text       to make text files"
	@echo "  man        to make manual pages"
	@echo "  texinfo    to make Texinfo files"
	@echo "  info       to make Texinfo files and run them through makeinfo"
	@echo "  gettext    to make PO message catalogs"
	@echo "  changes    to make an overview of all changed/added/deprecated items"
	@echo "  xml        to make Docutils-native XML files"
	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
	@echo "  linkcheck  to check all external links for integrity"
	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
	@echo "  coverage   to run coverage check of the documentation (if enabled)"

clean:
	rm -rf $(BUILDDIR)/*

html:
	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."

dirhtml:
	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
	@echo
	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."

singlehtml:
	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
	@echo
	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."

pickle:
	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
	@echo
	@echo "Build finished; now you can process the pickle files."

json:
	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
	@echo
	@echo "Build finished; now you can process the JSON files."

htmlhelp:
	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
	@echo
	@echo "Build finished; now you can run HTML Help Workshop with the" \
	      ".hhp project file in $(BUILDDIR)/htmlhelp."

qthelp:
	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
	@echo
	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/vdjtools.qhcp"
	@echo "To view the help file:"
	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/vdjtools.qhc"

applehelp:
	$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
	@echo
	@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
	@echo "N.B. You won't be able to view it unless you put it in" \
	      "~/Library/Documentation/Help or install it in your application" \
	      "bundle."

devhelp:
	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
	@echo
	@echo "Build finished."
	@echo "To view the help file:"
	@echo "# mkdir -p $$HOME/.local/share/devhelp/vdjtools"
	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/vdjtools"
	@echo "# devhelp"

epub:
	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
	@echo
	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."

latex:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo
	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
	@echo "Run \`make' in that directory to run these through (pdf)latex" \
	      "(use \`make latexpdf' here to do that automatically)."

latexpdf:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through pdflatex..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

latexpdfja:
	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
	@echo "Running LaTeX files through platex and dvipdfmx..."
	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."

text:
	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
	@echo
	@echo "Build finished. The text files are in $(BUILDDIR)/text."

man:
	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
	@echo
	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."

texinfo:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo
	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
	@echo "Run \`make' in that directory to run these through makeinfo" \
	      "(use \`make info' here to do that automatically)."

info:
	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
	@echo "Running Texinfo files through makeinfo..."
	make -C $(BUILDDIR)/texinfo info
	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."

gettext:
	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
	@echo
	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."

changes:
	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
	@echo
	@echo "The overview file is in $(BUILDDIR)/changes."

linkcheck:
	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
	@echo
	@echo "Link check complete; look for any errors in the above output " \
	      "or in $(BUILDDIR)/linkcheck/output.txt."

doctest:
	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
	@echo "Testing of doctests in the sources finished, look at the " \
	      "results in $(BUILDDIR)/doctest/output.txt."

coverage:
	$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
	@echo "Testing of coverage in the sources finished, look at the " \
	      "results in $(BUILDDIR)/coverage/python.txt."

xml:
	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
	@echo
	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."

pseudoxml:
	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
	@echo
	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."

vdjtools-1.2.1+git20190311/doc/usage.rst
Usage
-----
Command line usage
^^^^^^^^^^^^^^^^^^
The general way to execute a VDJtools routine is as follows:

.. code-block:: bash

    java -Xmx16G -jar vdjtools.jar RoutineName [arguments] -m metadata.txt output/prefix

The output prefix can be either an output directory name (if it ends with
``/``) or an output file prefix. Most VDJtools routines append a
descriptive suffix and extension to the prefix.
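
The two prefix forms can be illustrated with a small sketch. It assumes that a routine joins a file prefix and its suffix with a dot and writes directly into a directory prefix; the actual naming is routine-specific and may differ. The function name and the ``basicstats.txt`` suffix are hypothetical.

```python
def output_path(prefix, suffix):
    """Compose an output file path from a VDJtools-style output prefix."""
    if prefix.endswith("/"):
        # Directory prefix: the file is placed inside that directory.
        return prefix + suffix
    # File prefix: the suffix is appended (dot separator is an assumption).
    return prefix + "." + suffix


print(output_path("output/", "basicstats.txt"))
print(output_path("output/run1", "basicstats.txt"))
```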
The ``-m metadata.txt`` argument specifies a metadata file that lists relative sample paths,
sample names and any other sample information that should be available later in the analysis.
For more details, see the :ref:`metadata` section.
Alternatively, the ``-m`` argument can be replaced with a
space-separated list of files, e.g.

.. code-block:: bash

    java -Xmx16G -jar vdjtools.jar RoutineName sample1.txt[.gz] sample2.txt[.gz] ... output/prefix

Where applicable, and unless plotting is the explicit purpose of the routine
(as in the "...Plot" routines), plotting is turned on with the ``-p`` argument.
The ``-h`` argument displays the help message for the specified routine.
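
When VDJtools is driven from a script, the command line shown above can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of VDJtools; it only builds the argument list and leaves execution (for example via ``subprocess.run``) to the caller.

```python
import shlex


def vdjtools_command(routine, args=(), jar="vdjtools.jar", memory="16G"):
    """Build the argument list for one VDJtools invocation.

    The -Xmx option must precede -jar so that it is passed to the JVM
    rather than to VDJtools itself.
    """
    return ["java", "-Xmx" + memory, "-jar", jar, routine, *args]


cmd = vdjtools_command("CalcBasicStats", ["-m", "metadata.txt", "output/"])
# shlex.join gives a copy-pasteable shell form of the same command.
print(shlex.join(cmd))
```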

.. warning::

    Consider allocating sufficient memory for the Java Virtual Machine
    when running the pipeline. To do so, run ``java`` with the
    ``-Xmx`` argument, e.g.:

    .. code-block:: bash

        java -Xmx16G -jar vdjtools.jar RoutineName [arguments]

    If insufficient memory is allocated, the Java Virtual Machine
    may terminate with a *Java Heap Space Out of Memory* error.

.. warning::

    Due to JAR loading overhead, running VDJtools once for a batch of samples is
    preferable to running it separately for each sample whenever possible.
    See the :ref:`metadata` section for more details.

.. tip::

    Some routines can be memory demanding, especially when running sample
    intersection/joining/pooling with a large number of large (~1,000,000 clonotypes)
    datasets. Setting the ``-Xmx`` argument to 20-60 GB of memory should be enough
    for most purposes, e.g. 100 samples with 500,000 clonotypes on average.
    Another workaround is to down-sample datasets to ~100,000 reads
    each using the :ref:`downsample` routine.

Importing clonotype tables
^^^^^^^^^^^^^^^^^^^^^^^^^^
To proceed with VDJtools analysis, datasets should first be converted to
VDJtools format (see :ref:`vdjtools_format`). To do this, run either of the following commands:

.. code-block:: bash

    java -Xmx16G -jar vdjtools.jar Convert -S software -m metadata.txt ... output_dir/

or

.. code-block:: bash

    java -Xmx16G -jar vdjtools.jar Convert -S software sample1.txt[.gz] sample2.txt[.gz] ... output_dir/

An additional ``-c`` flag can be set to compress output files.