==> cwl-format-2022.02.18/.github/workflows/tests.yml <==

name: Tests

on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: [3.6, 3.7, 3.8]

    steps:
    - uses: actions/checkout@v2
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v1
      with:
        python-version: ${{ matrix.python-version }}
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install .
    - name: Lint with flake8
      run: |
        pip install flake8
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        pip install pytest
        cd tests
        py.test

==> cwl-format-2022.02.18/LICENSE <==

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   [Sections 1-9 of the standard, unmodified Apache License 2.0 terms appear
   here in the original file; see the URL above for the full text.]
   Copyright (c) 2020 Seven Bridges

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

==> cwl-format-2022.02.18/MANIFEST.in <==

include cwlformat/keyorder.yml

==> cwl-format-2022.02.18/Readme.md <==

# CWL Format

[![Tests](https://github.com/rabix/cwl-format/workflows/Tests/badge.svg)](https://github.com/rabix/cwl-format/actions?query=workflow%3ATests)
[![PyPI version](https://badge.fury.io/py/cwlformat.svg)](https://pypi.org/project/cwlformat/)

CWL Format is a specification and a reference implementation for a very opinionated CWL code formatter. It outputs CWL in a standardized YAML format. It has no settings or options because you have better things to do with your time. And because CWL Format is always correct.

This repository lists the formatting rules and also contains a Python implementation of the formatter.
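As a quick illustration (a made-up two-field tool, not one of the bundled conformance tests — the Rules section below gives the exact conventions), given this scrambled input:

```yaml
inputs:
  infile: File
baseCommand: echo
outputs:
  echo_out:
    type: stdout
class: CommandLineTool
cwlVersion: v1.0
```

`cwl-format` should emit:

```yaml
#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

inputs:
  infile: File

outputs:
  echo_out:
    type: stdout

baseCommand: echo
```

Install and run it with: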
```
pip install cwlformat
cwl-format unformatted.cwl > formatted.cwl
```

If you don't have a Python 3.6+ environment, you can use `pipx`:

```
pip install pipx      # from your pre-3.6 Python environment
pipx ensurepath       # ensures the CLI application directory is on your $PATH
pipx install cwlformat --python python3.7   # tells pipx to set up a Py3.7 env for this app
```

Use it programmatically in Python by doing

```python
from cwlformat.formatter import cwl_format

formatted_text = cwl_format(unformatted_text)
```

or

```python
from cwlformat.formatter import stringify_dict

as_dict = load_yaml(unformatted_text)
formatted_str = stringify_dict(as_dict)
```

## Rules

- Only comment lines at the top of the file, including blank lines, before the actual CWL code are preserved. All other comments are lost. **Do not use this tool if all the comments in your YAML are important to you.**
- If the first line does not start with `#!/usr/bin/env `, the line `#!/usr/bin/env cwl-runner` is added to the top of the file.
- All CWL fields are ordered systematically. Specific fields have a defined order of precedence ("pinned fields"). Any fields not in that list ("free fields") are printed after the pinned fields and ordered alphabetically.
- A single blank line is added before the following fields if the parent structure is a process:
  - inputs
  - outputs
  - steps
  - requirements
  - hints
  - baseCommand
- The pinned fields are defined in [this YAML file][spec].
- Specific pinned field orderings are available for CommandLineTool, ExpressionTool and Workflow processes. All other types follow a generic pinned field list.
- All strings that fit within 80 columns are expressed in flow style. Longer strings, and strings with newlines, are expressed in block style.
- All lists and maps are expressed in block style.
- The ordering of all lists is preserved.
- Indentation is 2 spaces, including for lists.

[spec]: https://raw.githubusercontent.com/rabix/cwl-format/master/cwlformat/keyorder.yml

## Conformance tests

A series of documents is found in the [`tests`][tests] directory that can be used to check the correctness of a formatter. The files named `original-*` are the input files and the files named `formatted-*` are the corresponding formatted documents. There is a mixture of YAML and JSON input files. Formatted files are always YAML.

[tests]: https://github.com/rabix/cwl-format/tree/master/tests/cwl

# CWL Exploder

This takes as input a packed workflow (a workflow with all steps inlined) and splits it recursively into its parts.
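The same split can be driven from Python. Here is a minimal sketch built on the `CWLProcess` and `explode` names from `cwlformat/explode.py` (included later in this listing); the file names are just placeholders mirroring the command-line example that follows:

```python
import pathlib

import ruamel.yaml

from cwlformat.explode import CWLProcess, explode

yaml = ruamel.yaml.YAML()

src = pathlib.Path("formatted-atac-seq-pipeline.cwl")  # packed input workflow
out = pathlib.Path("expected-exploded-atac-seq.cwl").absolute()

# explode() returns the parent process plus one CWLProcess per inlined
# step, after rewriting each step's "run" field to the new relative path.
for part in explode(CWLProcess(yaml.load(src.read_text()), out)):
    part.save()  # writes formatted CWL; creates the *.steps/ folder as needed
```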
```
cwl-explode formatted-atac-seq-pipeline.cwl expected-exploded-atac-seq.cwl
```

Results in the [exploded parent workflow](https://github.com/rabix/cwl-format/blob/master/tests/cwl/expected-exploded-atac-seq.cwl) and [52 sub-components](https://github.com/rabix/cwl-format/tree/master/tests/cwl/expected-exploded-atac-seq.cwl.steps).

==> cwl-format-2022.02.18/cwlformat/__init__.py <==

==> cwl-format-2022.02.18/cwlformat/explode.py <==

# Copyright (c) 2020 Seven Bridges

# Given a cwl dict, if it is a workflow, split out any inlined steps into their own files

from typing import List
import pathlib
import sys

import ruamel.yaml

from .version import __version__
from .formatter import stringify_dict, leading_comment_lines

yaml = ruamel.yaml.YAML()


class CWLProcess:
    def __init__(self, cwl: dict, file_path: pathlib.Path):
        self.cwl = cwl
        self.file_path = file_path

    def __str__(self):
        return stringify_dict(self.cwl)

    def save(self):
        self.file_path.parent.mkdir(parents=True, exist_ok=True)
        self.file_path.write_text(leading_comment_lines("") + stringify_dict(self.cwl))


def explode(cwl: CWLProcess) -> List[CWLProcess]:
    _processes = [cwl]
    _cwl = cwl.cwl
    if _cwl.get("class") == "Workflow":
        _cwl_steps = _cwl.get("steps", {})
        _is_dict = isinstance(_cwl_steps, dict)
        for _k, _step in (_cwl_steps.items() if _is_dict else enumerate(_cwl_steps)):
            _step_id = _k if _is_dict else _step.get("id")
            if _step_id is not None:
                _run = _step.get("run")
                if isinstance(_run, dict):
                    step_path = \
                        cwl.file_path.parent / \
                        (cwl.file_path.name + ".steps") / \
                        (_step_id + ".cwl")
                    _step["run"] = str(step_path.relative_to(cwl.file_path.parent))
                    _processes += explode(CWLProcess(_run, step_path))

    return _processes


def main():
    import argparse
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=f"Rabix/cwl-explode v{__version__}\n"
                    "Explodes CWL workflow with inlined steps")
    parser.add_argument("cwlfile")
    parser.add_argument("outname")

    args = parser.parse_args()
    fp = pathlib.Path(args.cwlfile).absolute()
    as_dict = yaml.load(fp.read_text())
    fp_out = pathlib.Path(args.outname).absolute()
    for n, exploded in enumerate(explode(CWLProcess(as_dict, fp_out))):
        sys.stderr.write(f"{n + 1}: {exploded.file_path.relative_to(fp.parent)}\n")
        exploded.save()


if __name__ == "__main__":
    main()

==> cwl-format-2022.02.18/cwlformat/formatter.py <==

# Copyright (c) 2020 Seven Bridges

from typing import Union
import sys
import pathlib

try:
    from importlib.resources import read_text
except ImportError:
    # Python 3.6 fallback: https://importlib-resources.readthedocs.io/en/latest/
    from importlib_resources import read_text

import ruamel.yaml
from ruamel.yaml import scalarstring
from ruamel.yaml.compat import StringIO
from ruamel.yaml.comments import CommentedMap

from cwlformat.version import __version__

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=2, offset=0)

Literal = ruamel.yaml.scalarstring.LiteralScalarString

key_order_dict = yaml.load(read_text("cwlformat", "keyorder.yml"))

hash_bang = "#!/usr/bin/env cwl-runner\n\n"
hash_bang_pre = "#!/usr/bin/env "
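
# Implements the comment-preservation rule from the Readme: only comment and
# blank lines at the very top of the file survive formatting, and a
# "#!/usr/bin/env cwl-runner" hash-bang is prepended when one is missing
# (JSON input, which starts with "{", never carries leading comments).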
def leading_comment_lines(raw_cwl: str):
    top_comment = []
    if len(raw_cwl) > 0 and raw_cwl.lstrip()[0] != "{":
        for _line in raw_cwl.splitlines(keepends=True):
            line = _line.strip()
            if line == "" or line[0] == "#":
                top_comment += [_line]
            else:
                break

    if len(top_comment) == 0 or not top_comment[0].startswith(hash_bang_pre):
        top_comment = [hash_bang] + top_comment

    return "".join(top_comment)


def format_node(cwl: Union[dict, list, str], node_path=None):
    if isinstance(cwl, str):
        if len(cwl) > 80:
            return Literal(cwl)
        else:
            return cwl

    elif isinstance(cwl, dict):
        _fmt_cwl = CommentedMap([
            (k, format_node(v, node_path + [k])) for k, v in reorder_node(cwl, node_path)])
        if _fmt_cwl.get("class") in ["CommandLineTool", "ExpressionTool", "Workflow"]:
            add_space_between_main_sections(_fmt_cwl)
        return _fmt_cwl

    elif isinstance(cwl, list):
        return [format_node(v, node_path) for v in cwl]

    else:
        return cwl


def add_space_between_main_sections(cwl: CommentedMap):
    for k in cwl.keys():
        if k in ["inputs", "outputs", "steps", "requirements", "hints", "baseCommand"]:
            cwl.yaml_set_comment_before_after_key(key=k, before="\n")


def reorder_node(cwl: dict, node_path: list) -> dict:
    known_key_order = key_order_dict.get(
        infer_type(cwl, node_path), key_order_dict["generic-ordering"])
    extra_keys = sorted(set(cwl.keys()) - set(known_key_order))
    for k in known_key_order + extra_keys:
        if k in cwl:
            yield k, cwl[k]


def infer_type(cwl: dict, node_path: list):
    if "class" in cwl:
        return cwl["class"]
    else:
        return "generic-ordering"


def cwl_format(raw_cwl: str) -> str:
    as_dict = yaml.load(raw_cwl)
    return leading_comment_lines(raw_cwl) + stringify_dict(as_dict)


def stringify_dict(as_dict: dict) -> str:
    as_dict = format_node(as_dict, node_path=[])
    stream = StringIO()
    yaml.dump(as_dict, stream)
    return stream.getvalue()


def main():
    import argparse
    parser = argparse.ArgumentParser(
        formatter_class=argparse.RawDescriptionHelpFormatter,
        description=f"Rabix/cwl-format v{__version__}\n"
                    "A very opinionated code formatter for CWL")
    parser.add_argument("cwlfile")
    parser.add_argument("--inplace", action="store_true",
                        help="Instead of writing formatted code to stdout, overwrite original file")

    args = parser.parse_args()
    fp = pathlib.Path(args.cwlfile)
    formatted = cwl_format(fp.read_text())

    if args.inplace:
        fp.write_text(formatted)
    else:
        sys.stdout.write(formatted)


if __name__ == "__main__":
    main()

==> cwl-format-2022.02.18/cwlformat/keyorder.yml <==

generic-ordering:
- id
- label
- name
- doc
- class
- type
- format
- default
- secondaryFiles
- inputBinding
- prefix
- position
- valueFrom
- separate
- itemSeparator
- shellQuote
- outputBinding
- glob
- outputEval
- loadContents
- loadListing
- dockerPull

# Dirent
- entryname
- writable

# WorkflowStep
- in
- scatter
- scatterMethod
- run
- when
- out
- requirements
- hints

# WorkflowStepInput
- source

- outputSource
- linkMerge

CommandLineTool:
- cwlVersion
- class
- label
- doc
- $namespaces
- requirements
- inputs
- outputs
- stdout
- stderr
- baseCommand
- arguments
- stdout
- stderr
- hints
- id

ExpressionTool:
- cwlVersion
- class
- label
- doc
- requirements
- inputs
- outputs
- expression
- hints
- id

Workflow:
- cwlVersion
- class
- label
- doc
- $namespaces
- requirements
- inputs
- outputs
- steps
- hints
- id
==> cwl-format-2022.02.18/cwlformat/version.py <==

# Copyright (c) 2020 Seven Bridges. See LICENSE

__version__ = "2022.02.18"

==> cwl-format-2022.02.18/setup.py <==

# -*- coding: utf-8 -*-

import pathlib
from setuptools import setup, find_packages

current_path = pathlib.Path(__file__).parent

ver_path = pathlib.Path(current_path, "cwlformat", "version.py")
_ver = {}
exec(ver_path.open("r").read(), _ver)
version = _ver["__version__"]

readme = pathlib.Path(current_path, "Readme.md").read_text()

setup(
    name='cwlformat',
    python_requires='>=3.6.0',
    version=version,
    description='A prettifier for CWL code',
    long_description=readme,
    long_description_content_type="text/markdown",
    author='Kaushik Ghose',
    author_email='kaushik.ghose@sbgenomics.com',
    url='https://github.com/rabix/cwl-format',
    packages=find_packages(exclude=('tests', 'docs')),
    entry_points={
        'console_scripts': [
            'cwl-format=cwlformat.formatter:main',
            'cwl-explode=cwlformat.explode:main'
        ],
    },
    include_package_data=True,
    install_requires=[
        "ruamel.yaml >= 0.16.12",
    ],
    extras_require={':python_version<"3.7"': ['importlib-resources']},
)

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow
label: ATAC-seq-pipeline-se
doc: 'ATAC-seq pipeline - reads: SE'
$namespaces:
  sbg: https://sevenbridges.com

requirements:
- class: ScatterFeatureRequirement
- class: SubworkflowFeatureRequirement
- class: StepInputExpressionRequirement

inputs:
  as_narrowPeak_file:
    doc: Definition narrowPeak file in AutoSql format (used in bedToBigBed)
    type: File
  default_adapters_file:
    doc: Adapters file
    type: File
  genome_effective_size:
    doc: |-
      Effective genome size used by MACS2. It can be numeric or a shortcuts:'hs'
      for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7)
      and 'dm' for fruitfly (1.2e8), Default:hs
    type: string
    default: hs
  genome_ref_first_index_file:
    doc: |-
      "First index file of Bowtie reference genome with extension 1.ebwt. \
      (Note: the rest of the index files MUST be in the same folder)"
    type: File
    secondaryFiles:
    - ^^.2.ebwt
    - ^^.3.ebwt
    - ^^.4.ebwt
    - ^^.rev.1.ebwt
    - ^^.rev.2.ebwt
  genome_sizes_file:
    doc: Genome sizes tab-delimited file (used in samtools)
    type: File
  input_fastq_files:
    type: File[]
  nthreads_map:
    doc: Number of threads required for the 03-map step
    type: int
  nthreads_peakcall:
    doc: Number of threads required for the 04-peakcall step
    type: int
  nthreads_qc:
    doc: Number of threads required for the 01-qc step
    type: int
  nthreads_quant:
    doc: Number of threads required for the 05-quantification step
    type: int
  nthreads_trimm:
    doc: Number of threads required for the 02-trim step
    type: int
  picard_jar_path:
    doc: Picard Java jar file
    type: string
  picard_java_opts:
    doc: |-
      JVM arguments should be a quoted, space separated list
      (e.g. "-Xms128m -Xmx512m")
    type: string?
  trimmomatic_jar_path:
    doc: Trimmomatic Java jar file
    type: string
  trimmomatic_java_opts:
    doc: |-
      JVM arguments should be a quoted, space separated list
      (e.g. "-Xms128m -Xmx512m")
    type: string?
outputs:
  map_bowtie_log_files:
    doc: Bowtie log file with mapping stats
    type: File[]
    outputSource: map/output_bowtie_log
  map_dedup_bam_files:
    doc: Filtered BAM files (post-processing end point)
    type: File[]
    outputSource: map/output_data_sorted_dups_marked_bam_files
  map_mark_duplicates_files:
    doc: |-
      Summary of duplicates removed with Picard tool MarkDuplicates
      (for multiple reads aligned to the same positions
    type: File[]
    outputSource: map/output_picard_mark_duplicates_files
  map_pbc_files:
    doc: PCR Bottleneck Coefficient files (used to flag samples when pbc<0.5)
    type: File[]
    outputSource: map/output_pbc_files
  map_percent_mitochondrial_reads:
    doc: Percentage of mitochondrial reads
    type: File[]
    outputSource: map/output_percent_mitochondrial_reads
  map_preseq_c_curve_files:
    doc: Preseq c_curve output files
    type: File[]
    outputSource: map/output_preseq_c_curve_files
  map_preseq_percentage_uniq_reads:
    doc: Preseq percentage of uniq reads
    type: File[]
    outputSource: map/output_percentage_uniq_reads
  map_read_count_mapped:
    doc: Read counts of the mapped BAM files
    type: File[]
    outputSource: map/output_read_count_mapped
  peakcall_extended_peak_file:
    doc: Extended fragment peaks in ENCODE Peak file format
    type: File[]
    outputSource: peak_call/output_extended_peak_file
  peakcall_filtered_read_count_file:
    doc: Filtered read count after peak calling
    type: File[]
    outputSource: peak_call/output_filtered_read_count_file
  peakcall_peak_bigbed_file:
    doc: Peaks in bigBed format
    type: File[]
    outputSource: peak_call/output_peak_bigbed_file
  peakcall_peak_count_within_replicate:
    doc: Peak counts within replicate
    type: File[]
    outputSource: peak_call/output_peak_count_within_replicate
  peakcall_peak_file:
    doc: Peaks in ENCODE Peak file format
    type: File[]
    outputSource: peak_call/output_peak_file
  peakcall_peak_summits_file:
    doc: Peaks summits in bedfile format
    type: File[]
    outputSource: peak_call/output_peak_summits_file
  peakcall_peak_xls_file:
    doc: Peak calling report file
    type: File[]
    outputSource: peak_call/output_peak_xls_file
  peakcall_read_in_peak_count_within_replicate:
    doc: Peak counts within replicate
    type: File[]
    outputSource: peak_call/output_read_in_peak_count_within_replicate
  peakcall_spp_x_cross_corr:
    doc: SPP strand cross correlation summary
    type: File[]
    outputSource: peak_call/output_spp_x_cross_corr
  peakcall_spp_x_cross_corr_plot:
    doc: SPP strand cross correlation plot
    type: File[]
    outputSource: peak_call/output_spp_cross_corr_plot
  qc_count_raw_reads:
    doc: Raw read counts of fastq files after QC
    type: File[]
    outputSource: qc/output_count_raw_reads
  qc_diff_counts:
    doc: Diff file between number of raw reads and number of reads counted by FASTQC,
    type: File[]
    outputSource: qc/output_diff_counts
  qc_fastqc_data_files:
    doc: FastQC data files
    type: File[]
    outputSource: qc/output_fastqc_data_files
  qc_fastqc_report_files:
    doc: FastQC reports in zip format
    type: File[]
    outputSource: qc/output_fastqc_report_files
  quant_bigwig_norm_files:
    doc: Normalized reads bigWig (signal) files
    type: File[]
    outputSource: quant/bigwig_norm_files
  quant_bigwig_raw_files:
    doc: Raw reads bigWig (signal) files
    type: File[]
    outputSource: quant/bigwig_raw_files
  trimm_fastq_files:
    doc: FASTQ files after trimming
    type: File[]
    outputSource: trimm/output_data_fastq_trimmed_files
  trimm_raw_counts:
    doc: Raw read counts of fastq files after trimming
    type: File[]
    outputSource: trimm/output_trimmed_fastq_read_count

steps:
  map:
    in:
      genome_ref_first_index_file: genome_ref_first_index_file
      genome_sizes_file: genome_sizes_file
      input_fastq_files: trimm/output_data_fastq_trimmed_files
      nthreads: nthreads_map
      picard_jar_path: picard_jar_path
      picard_java_opts: picard_java_opts
    run: expected-exploded-atac-seq.cwl.steps/map.cwl
    out:
    - output_data_sorted_dedup_bam_files
    - output_data_sorted_dups_marked_bam_files
    - output_picard_mark_duplicates_files
    - output_pbc_files
    - output_bowtie_log
    - output_preseq_c_curve_files
    - output_percentage_uniq_reads
    - output_read_count_mapped
    - output_percent_mitochondrial_reads
  peak_call:
    in:
      as_narrowPeak_file: as_narrowPeak_file
      genome_effective_size: genome_effective_size
      input_bam_files: map/output_data_sorted_dedup_bam_files
      input_bam_format:
        valueFrom: BAM
      input_genome_sizes: genome_sizes_file
      nthreads: nthreads_peakcall
    run: expected-exploded-atac-seq.cwl.steps/peak_call.cwl
    out:
    - output_spp_x_cross_corr
    - output_spp_cross_corr_plot
    - output_read_in_peak_count_within_replicate
    - output_peak_file
    - output_peak_bigbed_file
    - output_peak_summits_file
    - output_extended_peak_file
    - output_peak_xls_file
    - output_filtered_read_count_file
    - output_peak_count_within_replicate
  qc:
    in:
      default_adapters_file: default_adapters_file
      input_fastq_files: input_fastq_files
      nthreads: nthreads_qc
    run: expected-exploded-atac-seq.cwl.steps/qc.cwl
    out:
    - output_count_raw_reads
    - output_diff_counts
    - output_fastqc_report_files
    - output_fastqc_data_files
    - output_custom_adapters
  quant:
    in:
      input_bam_files: map/output_data_sorted_dedup_bam_files
      input_genome_sizes: genome_sizes_file
      nthreads: nthreads_quant
    run: expected-exploded-atac-seq.cwl.steps/quant.cwl
    out:
    - bigwig_raw_files
    - bigwig_norm_files
  trimm:
    in:
      input_adapters_files: qc/output_custom_adapters
      input_fastq_files: input_fastq_files
      nthreads: nthreads_trimm
      trimmomatic_jar_path: trimmomatic_jar_path
      trimmomatic_java_opts: trimmomatic_java_opts
    run: expected-exploded-atac-seq.cwl.steps/trimm.cwl
    out:
    - output_data_fastq_trimmed_files
    - output_trimmed_fastq_read_count
id: |-
  https://api.sbgenomics.com/v2/apps/kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2/raw/
sbg:appVersion:
- v1.0
sbg:content_hash: ad9474546d1d7aba5aa20e3c7a03b5429e5f8ec1d18be92cbab7315600a6bce48
sbg:contributors:
- kghosesbg
sbg:createdBy: kghosesbg
sbg:createdOn: 1580500895
sbg:id: kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2
sbg:image_url: |-
  https://igor.sbgenomics.com/ns/brood/images/kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2.png
sbg:latestRevision: 2
sbg:modifiedBy: kghosesbg
sbg:modifiedOn: 1581699121
sbg:project: kghosesbg/sbpla-31744
sbg:projectName: SBPLA-31744
sbg:publisher: sbg
sbg:revision: 2
sbg:revisionNotes: |-
  Uploaded using sbpack v2020.02.14.
  Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl
sbg:revisionsInfo:
- sbg:modifiedBy: kghosesbg
  sbg:modifiedOn: 1580500895
  sbg:revision: 0
  sbg:revisionNotes: |-
    Uploaded using sbpack. Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl
- sbg:modifiedBy: kghosesbg
  sbg:modifiedOn: 1580742764
  sbg:revision: 1
  sbg:revisionNotes: Just moved a node
- sbg:modifiedBy: kghosesbg
  sbg:modifiedOn: 1581699121
  sbg:revision: 2
  sbg:revisionNotes: |-
    Uploaded using sbpack v2020.02.14.
    Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl
sbg:sbgMaintained: false
sbg:validationErrors:
- 'Required input is not set: #qc.input_fastq_files'
- 'Required input is not set: #qc.default_adapters_file'
- 'Required input is not set: #qc.nthreads'
- 'Required input is not set: #trimm.input_fastq_files'
- 'Required input is not set: #trimm.input_adapters_files'
- 'Required input is not set: #map.input_fastq_files'
- 'Required input is not set: #map.genome_sizes_file'
- 'Required input is not set: #map.genome_ref_first_index_file'
- 'Required input is not set: #peak_call.input_bam_files'
- 'Required input is not set: #peak_call.input_genome_sizes'
- 'Required input is not set: #peak_call.as_narrowPeak_file'
- 'Required input is not set: #quant.input_bam_files'
- 'Required input is not set: #quant.input_genome_sizes'

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow
doc: 'ATAC-seq 03 mapping - reads: SE'

requirements:
- class: ScatterFeatureRequirement
- class: SubworkflowFeatureRequirement
- class: StepInputExpressionRequirement
- class: InlineJavascriptRequirement

inputs:
  genome_ref_first_index_file:
    doc: |-
      Bowtie first index files for reference genome (e.g. *1.ebwt). The rest of
      the files should be in the same folder.
    type: File
    secondaryFiles:
    - ^^.2.ebwt
    - ^^.3.ebwt
    - ^^.4.ebwt
    - ^^.rev.1.ebwt
    - ^^.rev.2.ebwt
  genome_sizes_file:
    doc: Genome sizes tab-delimited file (used in samtools)
    type: File
  input_fastq_files:
    doc: Input fastq files
    type: File[]
  nthreads:
    type: int
    default: 1
  picard_jar_path:
    doc: Picard Java jar file
    type: string
    default: /usr/picard/picard.jar
  picard_java_opts:
    doc: |-
      JVM arguments should be a quoted, space separated list
      (e.g. "-Xms128m -Xmx512m")
    type: string?

outputs:
  output_bowtie_log:
    doc: Bowtie log file.
    type: File[]
    outputSource: bowtie-se/output_bowtie_log
  output_data_sorted_dedup_bam_files:
    doc: BAM files without duplicate reads.
    type: File[]
    outputSource: index_dedup_bams/indexed_file
  output_data_sorted_dups_marked_bam_files:
    doc: BAM files with marked duplicate reads.
    type: File[]
    outputSource: index_dups_marked_bams/indexed_file
  output_pbc_files:
    doc: PCR Bottleneck Coeficient files.
    type: File[]
    outputSource: execute_pcr_bottleneck_coef/pbc_file
  output_percent_mitochondrial_reads:
    doc: Percentage of mitochondrial reads.
    type: File[]
    outputSource: percent_mitochondrial_reads/percent_map
  output_percentage_uniq_reads:
    doc: Percentage of uniq reads from preseq c_curve output
    type: File[]
    outputSource: percent_uniq_reads/output
  output_picard_mark_duplicates_files:
    doc: Picard MarkDuplicates metrics files.
    type: File[]
    outputSource: mark_duplicates/output_metrics_file
  output_preseq_c_curve_files:
    doc: Preseq c_curve output files.
    type: File[]
    outputSource: preseq-c-curve/output_file
  output_read_count_mapped:
    doc: Read counts of the mapped BAM files
    type: File[]
    outputSource: mapped_reads_count/output
  output_read_count_mapped_filtered:
    doc: Read counts of the mapped and filtered BAM files
    type: File[]
    outputSource: mapped_filtered_reads_count/output_read_count

steps:
  bam_idxstats:
    in:
      bam: index_bams/indexed_file
    scatter: bam
    run: map.cwl.steps/bam_idxstats.cwl
    out:
    - idxstats_file
  bowtie-se:
    in:
      X:
        valueFrom: ${return 2000}
      genome_ref_first_index_file: genome_ref_first_index_file
      input_fastq_file: input_fastq_files
      nthreads: nthreads
      output_filename: extract_basename_2/output_path
      v:
        valueFrom: ${return 2}
    scatter:
    - input_fastq_file
    - output_filename
    scatterMethod: dotproduct
    run: map.cwl.steps/bowtie-se.cwl
    out:
    - output_aligned_file
    - output_bowtie_log
  execute_pcr_bottleneck_coef:
    in:
      genome_sizes: genome_sizes_file
      input_bam_files: filtered2sorted/sorted_file
      input_output_filenames: extract_basename_2/output_path
    run: map.cwl.steps/execute_pcr_bottleneck_coef.cwl
    out:
    - pbc_file
  extract_basename_1:
    in:
      input_file: input_fastq_files
    scatter: input_file
    run: map.cwl.steps/extract_basename_1.cwl
    out:
    - output_basename
  extract_basename_2:
    in:
      file_path: extract_basename_1/output_basename
    scatter: file_path
    run: map.cwl.steps/extract_basename_2.cwl
    out:
    - output_path
  filter-unmapped:
    in:
      input_file: sort_bams/sorted_file
      output_filename: extract_basename_2/output_path
    scatter:
    - input_file
    - output_filename
    scatterMethod: dotproduct
    run: map.cwl.steps/filter-unmapped.cwl
    out:
    - filtered_file
  filtered2sorted:
    in:
      input_file: filter-unmapped/filtered_file
      nthreads: nthreads
    scatter:
    - input_file
    run: map.cwl.steps/filtered2sorted.cwl
    out:
    - sorted_file
  index_bams:
    in:
      input_file: sort_bams/sorted_file
    scatter: input_file
    run: map.cwl.steps/index_bams.cwl
    out:
    - indexed_file
  index_dedup_bams:
    in:
      input_file: sort_dedup_bams/sorted_file
    scatter:
    - input_file
    run: map.cwl.steps/index_dedup_bams.cwl
    out:
    - indexed_file
  index_dups_marked_bams:
    in:
      input_file: sort_dups_marked_bams/sorted_file
    scatter:
    - input_file
    run: map.cwl.steps/index_dups_marked_bams.cwl
    out:
    - indexed_file
  index_filtered_bam:
    in:
      input_file: filtered2sorted/sorted_file
    scatter: input_file
    run: map.cwl.steps/index_filtered_bam.cwl
    out:
    - indexed_file
  mapped_filtered_reads_count:
    in:
      input_bam_file: sort_dedup_bams/sorted_file
      output_suffix:
        valueFrom: .mapped_and_filtered.read_count.txt
    scatter: input_bam_file
    run: map.cwl.steps/mapped_filtered_reads_count.cwl
    out:
    - output_read_count
  mapped_reads_count:
    in:
      bowtie_log: bowtie-se/output_bowtie_log
    scatter: bowtie_log
    run: map.cwl.steps/mapped_reads_count.cwl
    out:
    - output
  mark_duplicates:
    in:
      input_file: index_filtered_bam/indexed_file
      java_opts: picard_java_opts
      output_filename: extract_basename_2/output_path
      output_suffix:
        valueFrom: bam
      picard_jar_path: picard_jar_path
    scatter:
    - input_file
    - output_filename
    scatterMethod: dotproduct
    run: map.cwl.steps/mark_duplicates.cwl
    out:
    - output_metrics_file
    - output_dedup_bam_file
  percent_mitochondrial_reads:
    in:
      chrom:
        valueFrom: chrM
      idxstats: bam_idxstats/idxstats_file
      output_filename:
        valueFrom: |-
          ${return inputs.idxstats.basename.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '').replace(/\.[^/.]+$/, '').replace(/\.[^/.]+$/, '.mitochondrial_percentage.txt')}
    scatter: idxstats
    run: map.cwl.steps/percent_mitochondrial_reads.cwl
    out:
    - percent_map
  percent_uniq_reads:
    in:
      preseq_c_curve_outfile: preseq-c-curve/output_file
    scatter: preseq_c_curve_outfile
    run: map.cwl.steps/percent_uniq_reads.cwl
    out:
    - output
  preseq-c-curve:
    in:
      input_sorted_file: filtered2sorted/sorted_file
      output_file_basename: extract_basename_2/output_path
    scatter:
    - input_sorted_file
    - output_file_basename
    scatterMethod: dotproduct
    run: map.cwl.steps/preseq-c-curve.cwl
    out:
    - output_file
  remove_duplicates:
    in:
      F:
        valueFrom: ${return 1024}
      b:
        valueFrom: ${return true}
      input_file: index_dups_marked_bams/indexed_file
      outfile_name:
        valueFrom: ${return inputs.input_file.basename.replace('dups_marked', 'dedup')}
      suffix:
        valueFrom: .dedup.bam
    scatter:
    - input_file
    run: map.cwl.steps/remove_duplicates.cwl
    out:
    - outfile
  sam2bam:
    in:
      input_file: bowtie-se/output_aligned_file
      nthreads: nthreads
    scatter: input_file
    run: map.cwl.steps/sam2bam.cwl
    out:
    - bam_file
  sort_bams:
    in:
      input_file: sam2bam/bam_file
      nthreads: nthreads
    scatter: input_file
    run: map.cwl.steps/sort_bams.cwl
    out:
    - sorted_file
  sort_dedup_bams:
    in:
      input_file: remove_duplicates/outfile
      nthreads: nthreads
    scatter:
    - input_file
    run: map.cwl.steps/sort_dedup_bams.cwl
    out:
    - sorted_file
  sort_dups_marked_bams:
    in:
      input_file: mark_duplicates/output_dedup_bam_file
      nthreads: nthreads
      suffix:
        valueFrom: .dups_marked.bam
    scatter:
    - input_file
    run: map.cwl.steps/sort_dups_marked_bams.cwl
    out:
    - sorted_file

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/bam_idxstats.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
    - $(inputs.bam)
  InlineJavascriptRequirement: {}

inputs:
  bam:
    doc: Bam file (it should be indexed)
    type: File
    secondaryFiles:
    - .bai
    inputBinding:
      position: 1

outputs:
  idxstats_file:
    doc: |
      Idxstats output file. TAB-delimited with each line consisting of reference
      sequence name, sequence length, # mapped reads and # unmapped reads
    type: File
    outputBinding:
      glob: $(inputs.bam.basename + ".idxstats")
stdout: $(inputs.bam.basename + ".idxstats")

baseCommand:
- samtools
- idxstats

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/bowtie-se.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InlineJavascriptRequirement: {}
  ShellCommandRequirement: {}

inputs:
  X:
    doc: 'maximum insert size for paired-end alignment (default: 250)'
    type: int?
    inputBinding:
      prefix: -X
      position: 4
  best:
    doc: Hits guaranteed best stratum; ties broken by quality
    type: boolean
    default: true
    inputBinding:
      prefix: --best
      position: 5
  chunkmbs:
    doc: |-
      The number of megabytes of memory a given thread is given to store path
      descriptors in --best mode. (Default: 256)
    type: int?
    inputBinding:
      prefix: --chunkmbs
      position: 5
  genome_ref_first_index_file:
    doc: |-
      First file (extension .1.ebwt) of the Bowtie index files generated for the
      reference genome (see http://bowtie-bio.sourceforge.net/tutorial.shtml#newi)
    type: File
    secondaryFiles:
    - ^^.2.ebwt
    - ^^.3.ebwt
    - ^^.4.ebwt
    - ^^.rev.1.ebwt
    - ^^.rev.2.ebwt
    inputBinding:
      position: 9
      valueFrom: $(self.path.split('.').splice(0,self.path.split('.').length-2).join("."))
  input_fastq_file:
    doc: Query input FASTQ file.
    type: File
    inputBinding:
      position: 10
  m:
    doc: 'Suppress all alignments if > <int> exist (def: 1)'
    type: int
    default: 1
    inputBinding:
      prefix: -m
      position: 7
  nthreads:
    doc: ' number of alignment threads to launch (default: 1)'
    type: int
    default: 1
    inputBinding:
      prefix: --threads
      position: 8
  output_filename:
    type: string
  sam:
    doc: 'Write hits in SAM format (default: BAM)'
    type: boolean
    default: true
    inputBinding:
      prefix: --sam
      position: 2
  seedlen:
    doc: 'seed length for -n (default: 28)'
    type: int?
    inputBinding:
      prefix: --seedlen
      position: 1
  seedmms:
    doc: 'max mismatches in seed (between [0, 3], default: 2)'
    type: int?
    inputBinding:
      prefix: --seedmms
      position: 1
  strata:
    doc: Hits in sub-optimal strata aren't reported (requires --best)
    type: boolean
    default: true
    inputBinding:
      prefix: --strata
      position: 6
  t:
    doc: Print wall-clock time taken by search phases
    type: boolean
    default: true
    inputBinding:
      prefix: -t
      position: 1
  trim3:
    doc: trim bases from 3' (right) end of reads
    type: int?
    inputBinding:
      prefix: --trim3
      position: 1
  trim5:
    doc: trim bases from 5' (left) end of reads
    type: int?
    inputBinding:
      prefix: --trim5
      position: 1
  v:
    doc: Report end-to-end hits w/ <=v mismatches; ignore qualities
    type: int?
    inputBinding:
      prefix: -v
      position: 3

outputs:
  output_aligned_file:
    doc: Aligned bowtie file in [SAM|BAM] format.
    type: File
    outputBinding:
      glob: $(inputs.output_filename + '.sam')
  output_bowtie_log:
    type: File
    outputBinding:
      glob: $(inputs.output_filename + '.bowtie.log')
stderr: $(inputs.output_filename + '.bowtie.log')

baseCommand: bowtie
arguments:
- position: 11
  valueFrom: $(inputs.output_filename + '.sam')

hints:
  DockerRequirement:
    dockerPull: dukegcb/bowtie

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/execute_pcr_bottleneck_coef.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: Workflow
doc: ChIP-seq - map - PCR Bottleneck Coefficients

requirements:
- class: ScatterFeatureRequirement

inputs:
  genome_sizes:
    type: File
  input_bam_files:
    type: File[]
  input_output_filenames:
    type: string[]

outputs:
  pbc_file:
    type: File[]
    outputSource: compute_pbc/pbc

steps:
  bedtools_genomecov:
    in:
      bg:
        default: true
      g: genome_sizes
      ibam: input_bam_files
    scatter: ibam
    run: execute_pcr_bottleneck_coef.cwl.steps/bedtools_genomecov.cwl
    out:
    - output_bedfile
  compute_pbc:
    in:
      bedgraph_file: bedtools_genomecov/output_bedfile
      output_filename: input_output_filenames
    scatter:
    - bedgraph_file
    - output_filename
    scatterMethod: dotproduct
    run: execute_pcr_bottleneck_coef.cwl.steps/compute_pbc.cwl
    out:
    - pbc

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/execute_pcr_bottleneck_coef.cwl.steps/bedtools_genomecov.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
doc: |-
  Tool:    bedtools genomecov (aka genomeCoverageBed)
  Version: v2.25.0
  Summary: Compute the coverage of a feature file among a genome.

  Usage: bedtools genomecov [OPTIONS] -i <bed/gff/vcf> -g <genome>

  Options:
    -ibam  The input file is in BAM format.
           Note: BAM _must_ be sorted by position
    -d     Report the depth at each genome position (with one-based coordinates).
           Default behavior is to report a histogram.
    -dz    Report the depth at each genome position (with zero-based coordinates).
           Reports only non-zero positions.
           Default behavior is to report a histogram.
    -bg    Report depth in BedGraph format. For details, see:
           genome.ucsc.edu/goldenPath/help/bedgraph.html
    -bga   Report depth in BedGraph format, as above (-bg).
           However with this option, regions with zero
           coverage are also reported. This allows one to
           quickly extract all regions of a genome with 0
           coverage by applying: "grep -w 0$" to the output.
    -split Treat "split" BAM or BED12 entries as distinct BED intervals.
           when computing coverage.
           For BAM files, this uses the CIGAR "N" and "D" operations
           to infer the blocks for computing coverage.
           For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds
           fields (i.e., columns 10,11,12).
    -strand Calculate coverage of intervals from a specific strand.
           With BED files, requires at least 6 columns (strand is column 6).
           - (STRING): can be + or -
    -5     Calculate coverage of 5" positions (instead of entire interval).
    -3     Calculate coverage of 3" positions (instead of entire interval).
    -max   Combine all positions with a depth >= max into
           a single bin in the histogram. Irrelevant
           for -d and -bedGraph
           - (INTEGER)
    -scale Scale the coverage by a constant factor.
           Each coverage value is multiplied by this factor before being reported.
           Useful for normalizing coverage by, e.g., reads per million (RPM).
           - Default is 1.0; i.e., unscaled.
           - (FLOAT)
    -trackline Adds a UCSC/Genome-Browser track line definition in the first line of the output.
           - See here for more details about track line definition:
             http://genome.ucsc.edu/goldenPath/help/bedgraph.html
           - NOTE: When adding a trackline definition, the output BedGraph can be easily
             uploaded to the Genome Browser as a custom track,
             BUT CAN NOT be converted into a BigWig file (w/o removing the first line).
    -trackopts Writes additional track line definition parameters in the first line.
           - Example:
             -trackopts 'name="My Track" visibility=2 color=255,30,30'
             Note the use of single-quotes if you have spaces in your parameters.
           - (TEXT)

  Notes:
  (1) The genome file should tab delimited and structured as follows:
      <chromName><TAB><chromSize>

      For example, Human (hg19):
      chr1    249250621
      chr2    243199373
      ...
      chr18_gl000207_random   4262

  (2) The input BED (-i) file must be grouped by chromosome.
      A simple "sort -k 1,1 <BED> > <BED>.sorted" will suffice.

  (3) The input BAM (-ibam) file must be sorted by position.
      A "samtools sort <BAM>" should suffice.

  Tips:
  One can use the UCSC Genome Browser's MySQL database to extract
  chromosome sizes. For example, H. sapiens:

  mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
  "select chrom, size from hg19.chromInfo" > hg19.genome

requirements:
  InlineJavascriptRequirement: {}
  ShellCommandRequirement: {}

inputs:
  '3':
    doc: "\tCalculate coverage of 3\" positions (instead of entire interval).\n"
    type: boolean?
    inputBinding:
      prefix: '-3'
      position: 1
  '5':
    doc: "\tCalculate coverage of 5\" positions (instead of entire interval).\n"
    type: boolean?
    inputBinding:
      prefix: '-5'
      position: 1
  bg:
    doc: |
      Report depth in BedGraph format. For details, see:
      genome.ucsc.edu/goldenPath/help/bedgraph.html
    type: boolean?
    inputBinding:
      prefix: -bg
      position: 1
  bga:
    doc: |
      Report depth in BedGraph format, as above (-bg).
      However with this option, regions with zero
      coverage are also reported. This allows one to
      quickly extract all regions of a genome with 0
      coverage by applying: "grep -w 0$" to the output.
    type: boolean?
    inputBinding:
      prefix: -bga
      position: 1
  d:
    doc: |
      Report the depth at each genome position (with one-based coordinates).
      Default behavior is to report a histogram.
    type: boolean?
    inputBinding:
      prefix: -d
      position: 1
  dz:
    doc: |
      Report the depth at each genome position (with zero-based coordinates).
      Reports only non-zero positions.
      Default behavior is to report a histogram.
    type: boolean?
    inputBinding:
      prefix: -dz
      position: 1
  g:
    doc:
    type: File
    inputBinding:
      prefix: -g
      position: 3
  ibam:
    doc: "\tThe input file is in BAM format.\nNote: BAM _must_ be sorted by position\n"
    type: File
    inputBinding:
      prefix: -ibam
      position: 2
  max:
    doc: |
      Combine all positions with a depth >= max into
      a single bin in the histogram. Irrelevant
      for -d and -bedGraph
      - (INTEGER)
    type: int?
    inputBinding:
      prefix: -max
      position: 1
  scale:
    doc: |
      Scale the coverage by a constant factor.
      Each coverage value is multiplied by this factor before being reported.
      Useful for normalizing coverage by, e.g., reads per million (RPM).
      - Default is 1.0; i.e., unscaled.
      - (FLOAT)
    type: float?
    inputBinding:
      prefix: -scale
      position: 1
  split:
    doc: |
      Treat "split" BAM or BED12 entries as distinct BED intervals.
      when computing coverage.
      For BAM files, this uses the CIGAR "N" and "D" operations
      to infer the blocks for computing coverage.
      For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds
      fields (i.e., columns 10,11,12).
    type: boolean?
    inputBinding:
      prefix: -split
      position: 1
  strand:
    doc: |
      Calculate coverage of intervals from a specific strand.
      With BED files, requires at least 6 columns (strand is column 6).
      - (STRING): can be + or -
    type: string?
    inputBinding:
      prefix: -strand
      position: 1
  trackline:
    doc: |
      Adds a UCSC/Genome-Browser track line definition in the first line of the output.
      - See here for more details about track line definition:
      http://genome.ucsc.edu/goldenPath/help/bedgraph.html
      - NOTE: When adding a trackline definition, the output BedGraph can be easily
      uploaded to the Genome Browser as a custom track,
      BUT CAN NOT be converted into a BigWig file (w/o removing the first line).
    type: boolean?
    inputBinding:
      prefix: -trackline
      position: 1
  trackopts:
    doc: |
      Writes additional track line definition parameters in the first line.
      - Example:
      -trackopts 'name="My Track" visibility=2 color=255,30,30'
      Note the use of single-quotes if you have spaces in your parameters.
      - (TEXT)
    type: string?
    inputBinding:
      prefix: -trackopts
      position: 1

outputs:
  output_bedfile:
    type: File
    outputBinding:
      glob: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg')
stdout: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg')

baseCommand:
- bedtools
- genomecov

hints:
  DockerRequirement:
    dockerPull: dukegcb/bedtools

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/execute_pcr_bottleneck_coef.cwl.steps/compute_pbc.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
doc: Compute PCR Bottleneck Coeficient from BedGraph file.
inputs:
  bedgraph_file:
    type: File
    inputBinding:
      position: 1
  output_filename:
    type: string

outputs:
  pbc:
    type: File
    outputBinding:
      glob: $(inputs.output_filename + '.PBC.txt')
stdout: $(inputs.output_filename + '.PBC.txt')

baseCommand:
- awk
- $4==1 {N1 += $3 - $2}; $4>=1 {Nd += $3 - $2} END {print N1/Nd}

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/extract_basename_1.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
doc: Extracts the base name of a file

requirements:
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    type: File
    inputBinding:
      position: 1

outputs:
  output_basename:
    type: string
    outputBinding:
      outputEval: |-
        $(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') +
        1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/')
        + 1)))

baseCommand: echo

hints:
  DockerRequirement:
    dockerPull: reddylab/workflow-utils:ggr

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/extract_basename_2.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool
doc: Extracts the base name of a file

inputs:
  file_path:
    type: string
    inputBinding:
      position: 1

outputs:
  output_path:
    type: string
    outputBinding:
      outputEval: $(inputs.file_path.replace(/\.[^/.]+$/, ""))

baseCommand: echo

hints:
  DockerRequirement:
    dockerPull: reddylab/workflow-utils:ggr

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/filter-unmapped.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    doc: Aligned file to be sorted with samtools
    type: File
    inputBinding:
      position: 1000
  nthreads:
    doc: Number of threads used in sorting
    type: int
    default: 1
    inputBinding:
      prefix: -@
      position: 1
  output_filename:
    doc: Basename for the output file
    type: string

outputs:
  filtered_file:
    doc: Filter unmapped reads in aligned file
    type: File
    outputBinding:
      glob: $(inputs.output_filename + '.accepted_hits.bam')
stdout: $(inputs.output_filename + '.accepted_hits.bam')

baseCommand:
- samtools
- view
- -F
- '4'
- -b
- -h

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/filtered2sorted.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    doc: Aligned file to be sorted with samtools
    type: File
    inputBinding:
      position: 1000
  n:
    doc: Sort by read name
    type: boolean
    default: false
    inputBinding:
      prefix: -n
      position: 1
  nthreads:
    doc: Number of threads used in sorting
    type: int
    default: 1
    inputBinding:
      prefix: -@
      position: 1
  suffix:
    doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam)
    type: string
    default: .sorted.bam

outputs:
  sorted_file:
    doc: Sorted aligned file
    type: File
    outputBinding:
      glob: |-
        $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '')
        + inputs.suffix)
stdout: |-
  $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix)

baseCommand:
- samtools
- sort

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/index_bams.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
    - $(inputs.input_file)
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    doc: Aligned file to be sorted with samtools
    type: File
    inputBinding:
      position: 1

outputs:
  indexed_file:
    doc: Indexed BAM file
    type: File
    secondaryFiles: .bai
    outputBinding:
      glob: $(inputs.input_file.basename)

baseCommand:
- samtools
- index
arguments:
- position: 2
  valueFrom: $(inputs.input_file.basename + '.bai')

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/index_dedup_bams.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
    - $(inputs.input_file)
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    doc: Aligned file to be sorted with samtools
    type: File
    inputBinding:
      position: 1

outputs:
  indexed_file:
    doc: Indexed BAM file
    type: File
    secondaryFiles: .bai
    outputBinding:
      glob: $(inputs.input_file.basename)

baseCommand:
- samtools
- index
arguments:
- position: 2
  valueFrom: $(inputs.input_file.basename + '.bai')

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/index_dups_marked_bams.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
    - $(inputs.input_file)
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    doc: Aligned file to be sorted with samtools
    type: File
    inputBinding:
      position: 1

outputs:
  indexed_file:
    doc: Indexed BAM file
    type: File
    secondaryFiles: .bai
    outputBinding:
      glob: $(inputs.input_file.basename)

baseCommand:
- samtools
- index
arguments:
- position: 2
  valueFrom: $(inputs.input_file.basename + '.bai')

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3

==> cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/index_filtered_bam.cwl <==

#!/usr/bin/env cwl-runner

cwlVersion: v1.0
class: CommandLineTool

requirements:
  InitialWorkDirRequirement:
    listing:
    - $(inputs.input_file)
  InlineJavascriptRequirement: {}

inputs:
  input_file:
    doc: Aligned file to be sorted with samtools
    type: File
    inputBinding:
      position: 1

outputs:
  indexed_file:
    doc: Indexed BAM file
    type: File
    secondaryFiles: .bai
    outputBinding:
      glob: $(inputs.input_file.basename)

baseCommand:
- samtools
- index
arguments:
- position: 2
  valueFrom: $(inputs.input_file.basename + '.bai')

hints:
  DockerRequirement:
    dockerPull: dukegcb/samtools:1.3
mapped_filtered_reads_count.cwl000066400000000000000000000016101420374476100360040ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Extract mapped reads from BAM file using Samtools flagstat command requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: input_bam_file: doc: Aligned BAM file to filter type: File inputBinding: position: 1 output_suffix: type: string outputs: output_read_count: doc: Samtools Flagstat report file type: File outputBinding: glob: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) stdout: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) baseCommand: - samtools - flagstat arguments: - position: 10000 valueFrom: " | head -n1 | cut -f 1 -d ' '" shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/samtools mapped_reads_count.cwl000066400000000000000000000011241420374476100341220ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Get number of processed reads from Bowtie log. requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: bowtie_log: type: File inputBinding: {} outputs: output: type: File outputBinding: glob: $(inputs.bowtie_log.path.replace(/^.*[\\\/]/, '') + '.read_count.mapped') stdout: $(inputs.bowtie_log.path.replace(/^.*[\\\/]/, '') + '.read_count.mapped') baseCommand: read-count-from-bowtie-log.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr mark_duplicates.cwl000066400000000000000000000043611420374476100334430ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: barcode_tag: doc: |- Barcode SAM tag (e.g. BC for 10X Genomics). (Default: null) type: string? inputBinding: prefix: BARCODE_TAG= position: 5 separate: false input_file: doc: One or more input SAM or BAM files to analyze. Must be coordinate sorted. type: File inputBinding: position: 4 valueFrom: $('INPUT=' + self.path) shellQuote: false java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? inputBinding: position: 1 shellQuote: false metrics_suffix: doc: 'Suffix used to create the metrics output file (Default: dedup_metrics.txt)' type: string default: dedup_metrics.txt output_filename: doc: Output filename used as basename type: string output_suffix: doc: 'Suffix used to identify the output file (Default: dedup.bam)' type: string default: dedup.bam picard_jar_path: doc: Path to the picard.jar file type: string inputBinding: prefix: -jar position: 2 remove_duplicates: doc: |- If true do not write duplicates to the output file instead of writing them with appropriate flags set. (Default false). type: boolean default: false inputBinding: position: 5 valueFrom: $('REMOVE_DUPLICATES=' + self) outputs: output_dedup_bam_file: type: File outputBinding: glob: $(inputs.output_filename + '.' + inputs.output_suffix) output_metrics_file: type: File outputBinding: glob: $(inputs.output_filename + '.'
+ inputs.metrics_suffix) baseCommand: - java arguments: - position: 3 valueFrom: MarkDuplicates - position: 5 valueFrom: $('OUTPUT=' + inputs.output_filename + '.' + inputs.output_suffix) shellQuote: false - position: 5 valueFrom: $('METRICS_FILE='+inputs.output_filename + '.' + inputs.metrics_suffix) shellQuote: false - position: 5 valueFrom: $('TMP_DIR='+runtime.tmpdir) shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/picard percent_mitochondrial_reads.cwl000066400000000000000000000023411420374476100360220ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: ExpressionTool requirements: InlineJavascriptRequirement: {} inputs: chrom: doc: Query chromosome used to calculate percentage type: string idxstats: doc: Samtools idxstats file type: File inputBinding: loadContents: true output_filename: doc: Save the percentage in a file of the given name type: string? outputs: percent_map: type: - File - string expression: | ${ var regExp = new RegExp(inputs.chrom + "\\s\\d+\\s(\\d+)\\s(\\d+)"); var match = inputs.idxstats.contents.match(regExp); if (match){ var chrom_mapped_reads = match[1]; var total_reads = inputs.idxstats.contents.split("\n") .map(function(x){ var rr = x.match(/.*\s\d+\s(\d+)\s\d+/); return (rr ? rr[1] : 0); }) .reduce(function(a, b) { return Number(a) + Number(b); }); var output = (100*chrom_mapped_reads/total_reads).toFixed(4) + "%" + "\n"; if (inputs.output_filename){ return { percent_map : { "class": "File", "basename" : inputs.output_filename, "contents" : output, } } } return output; } } percent_uniq_reads.cwl000066400000000000000000000013221420374476100341400ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Get percentage of unique reads from a preseq c_curve output file. requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: preseq_c_curve_outfile: type: File inputBinding: {} outputs: output: type: File outputBinding: glob: |- $(inputs.preseq_c_curve_outfile.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + '.percentage_unique_reads.txt') stdout: |- $(inputs.preseq_c_curve_outfile.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + '.percentage_unique_reads.txt') baseCommand: percent-uniq-reads-from-preseq.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr preseq-c-curve.cwl000066400000000000000000000045141420374476100331350ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: |- Usage: c_curve [OPTIONS] Options: -o, -output yield output file (default: stdout) -s, -step step size in extrapolations (default: 1e+06) -v, -verbose print more information -P, -pe input is paired end read file -H, -hist input is a text file containing the observed histogram -V, -vals input is a text file containing only the observed counts -B, -bam input is in BAM format -l, -seg_len maximum segment length when merging paired end bam reads (default: 5000) Help options: -?, -help print this help message -about print about message requirements: InlineJavascriptRequirement: {} inputs: B: doc: "-bam input is in BAM format \n" type: boolean default: true inputBinding: prefix: -B position: 1 H: doc: "-hist input is a text file containing the observed histogram \n" type: File?
inputBinding: prefix: -H position: 1 V: doc: "-vals input is a text file containing only the observed counts \n" type: File? inputBinding: prefix: -V position: 1 input_sorted_file: doc: Sorted bed or BAM file type: File inputBinding: position: 2 l: doc: | -seg_len maximum segment length when merging paired end bam reads (default: 5000) Help options: -?, -help print this help message -about print about message type: int? inputBinding: prefix: -l position: 1 output_file_basename: type: string pe: doc: "-pe input is paired end read file \n" type: boolean? inputBinding: prefix: -P position: 1 s: doc: "-step step size in extrapolations (default: 1e+06) \n" type: float? inputBinding: prefix: -s position: 1 v: doc: "-verbose print more information \n" type: boolean default: false inputBinding: prefix: -v position: 1 outputs: output_file: type: File outputBinding: glob: $(inputs.output_file_basename + '.preseq_c_curve.txt') stdout: $(inputs.output_file_basename + '.preseq_c_curve.txt') baseCommand: - preseq - c_curve hints: DockerRequirement: dockerPull: reddylab/preseq:2.0 remove_duplicates.cwl000066400000000000000000000046331420374476100340100ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: F: doc: only include reads with none of the bits set in INT set in FLAG [0] type: int? inputBinding: prefix: -F position: 1 L: doc: FILE only include reads overlapping this BED FILE [null] type: File? inputBinding: prefix: -L position: 1 S: doc: Input format autodetected type: boolean default: true inputBinding: prefix: -S position: 1 b: doc: output BAM type: boolean? inputBinding: prefix: -b position: 1 f: doc: only include reads with all bits set in INT set in FLAG [0] type: int? inputBinding: prefix: -f position: 1 header: doc: Include header in output type: boolean? inputBinding: prefix: -h position: 1 input_file: doc: File to be converted to BAM with samtools type: File inputBinding: position: 2 nthreads: doc: Number of threads used type: int default: 1 inputBinding: prefix: -@ position: 1 outfile_name: doc: |- Output file name. If not specified, the basename of the input file with the suffix specified in the suffix argument will be used. type: string? q: doc: only include reads with mapping quality >= INT [0] type: int? inputBinding: prefix: -q position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string? u: doc: uncompressed BAM output (implies -b) type: boolean default: true inputBinding: prefix: -u position: 1 outputs: outfile: doc: Aligned file in SAM or BAM format type: File outputBinding: glob: | ${ if (inputs.outfile_name) return inputs.outfile_name; var suffix = inputs.b ? '.bam' : '.sam'; suffix = inputs.suffix || suffix; return inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + suffix } stdout: | ${ if (inputs.outfile_name) return inputs.outfile_name; var suffix = inputs.b ? 
'.bam' : '.sam'; suffix = inputs.suffix || suffix; return inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + suffix } baseCommand: - samtools - view hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/sam2bam.cwl000066400000000000000000000016011420374476100316670ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: S: doc: Input format autodetected type: boolean default: true inputBinding: prefix: -S position: 1 input_file: doc: File to be converted to BAM with samtools type: File inputBinding: position: 2 nthreads: doc: Number of threads used type: int default: 1 inputBinding: prefix: -@ position: 1 outputs: bam_file: doc: Aligned file in BAM format type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bam') stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bam') baseCommand: - samtools - view - -b hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps/sort_bams.cwl000066400000000000000000000020311420374476100323340ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 sort_dedup_bams.cwl000066400000000000000000000020311420374476100334360ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. 
.filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 sort_dups_marked_bams.cwl000066400000000000000000000020311420374476100346330ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/map.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl000066400000000000000000000114111420374476100275620ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow doc: ATAC-seq 04 quantification - SE requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: as_narrowPeak_file: doc: Definition narrowPeak file in AutoSql format (used in bedToBigBed) type: File genome_effective_size: doc: |- Effective genome size used by MACS2. It can be numeric or one of the shortcuts: 'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C.
elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs type: string default: hs input_bam_files: type: File[] input_genome_sizes: doc: Two column tab-delimited file with chromosome size information type: File nthreads: type: int default: 1 outputs: output_extended_peak_file: doc: peakshift/phantomPeak extended fragment results file type: File[] outputSource: peak-calling/output_ext_frag_bdg_file output_filtered_read_count_file: doc: Filtered read count reported by MACS2 type: File[] outputSource: count-reads-filtered/read_count_file output_peak_bigbed_file: doc: Peaks in bigBed format type: File[] outputSource: peaks-bed-to-bigbed/bigbed output_peak_count_within_replicate: doc: Peak counts within replicate type: File[] outputSource: count-peaks/output_counts output_peak_file: doc: peakshift/phantomPeak results file type: File[] outputSource: peak-calling/output_peak_file output_peak_summits_file: doc: File containing peak summits type: File[] outputSource: peak-calling/output_peak_summits_file output_peak_xls_file: doc: Peak calling report file (*_peaks.xls file produced by MACS2) type: File[] outputSource: peak-calling/output_peak_xls_file output_read_in_peak_count_within_replicate: doc: Reads peak counts within replicate type: File[] outputSource: extract-count-reads-in-peaks/output_read_count output_spp_cross_corr_plot: doc: peakshift/phantomPeak results file type: File[] outputSource: spp/output_spp_cross_corr_plot output_spp_x_cross_corr: doc: peakshift/phantomPeak results file type: File[] outputSource: spp/output_spp_cross_corr steps: count-peaks: in: input_file: peak-calling/output_peak_file output_suffix: valueFrom: .peak_count.within_replicate.txt scatter: input_file run: peak_call.cwl.steps/count-peaks.cwl out: - output_counts count-reads-filtered: in: peak_xls_file: peak-calling/output_peak_xls_file scatter: peak_xls_file run: peak_call.cwl.steps/count-reads-filtered.cwl out: - read_count_file extract-count-reads-in-peaks: in: input_bam_file: filter-reads-in-peaks/filtered_file output_suffix: valueFrom: .read_count.within_replicate.txt scatter: input_bam_file run: peak_call.cwl.steps/extract-count-reads-in-peaks.cwl out: - output_read_count extract-peak-frag-length: in: input_spp_txt_file: spp/output_spp_cross_corr scatter: input_spp_txt_file run: peak_call.cwl.steps/extract-peak-frag-length.cwl out: - output_best_frag_length filter-reads-in-peaks: in: input_bam_file: input_bam_files input_bedfile: peak-calling/output_peak_file scatter: - input_bam_file - input_bedfile scatterMethod: dotproduct run: peak_call.cwl.steps/filter-reads-in-peaks.cwl out: - filtered_file peak-calling: in: format: valueFrom: BAM bdg: valueFrom: ${return true} extsize: valueFrom: ${return 200} g: genome_effective_size nomodel: valueFrom: ${return true} q: valueFrom: ${return 0.1} shift: valueFrom: ${return -100} treatment: valueFrom: $([self]) source: input_bam_files scatter: - treatment scatterMethod: dotproduct run: peak_call.cwl.steps/peak-calling.cwl out: - output_peak_file - output_peak_summits_file - output_ext_frag_bdg_file - output_peak_xls_file peaks-bed-to-bigbed: in: type: valueFrom: bed6+4 as: as_narrowPeak_file bed: trunk-peak-score/trunked_scores_peaks genome_sizes: input_genome_sizes scatter: bed run: peak_call.cwl.steps/peaks-bed-to-bigbed.cwl out: - bigbed spp: in: input_bam: input_bam_files nthreads: nthreads savp: valueFrom: ${return true} scatter: - input_bam scatterMethod: dotproduct run: peak_call.cwl.steps/spp.cwl out: - output_spp_cross_corr - output_spp_cross_corr_plot 
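# Note on the peak-calling step above: with nomodel set, MACS2 moves each 5' cut site by shift and then extends it by extsize, so shift = -100 and extsize = 200 turn a Tn5 cut site at position x into a pileup window spanning x-100 .. x+100, i.e. 200 bp centered on the cut site (the usual single-end ATAC-seq smoothing). A rough equivalent command line for a single BAM (sketch only; sample.bam is a hypothetical placeholder):
#
#   macs2 callpeak --treatment sample.bam -f BAM -g hs --nomodel \
#     --shift -100 --extsize 200 -q 0.1 --bdg -n sample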
trunk-peak-score: in: peaks: peak-calling/output_peak_file scatter: peaks run: peak_call.cwl.steps/trunk-peak-score.cwl out: - trunked_scores_peaks cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps/000077500000000000000000000000001420374476100307765ustar00rootroot00000000000000count-peaks.cwl000066400000000000000000000010671420374476100336630ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Counts lines in a file and returns a suffixed file with that number requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File output_suffix: type: string default: .count outputs: output_counts: type: File outputBinding: glob: $(inputs.input_file.path.replace(/^.*[\\\/]/, '') + inputs.output_suffix) stdout: $(inputs.input_file.path.replace(/^.*[\\\/]/, '') + inputs.output_suffix) baseCommand: - wc - -l stdin: $(inputs.input_file.path) count-reads-filtered.cwl000066400000000000000000000012261420374476100354470ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Count number of dedup-ed reads used in peak calling requirements: InlineJavascriptRequirement: {} inputs: peak_xls_file: type: File inputBinding: position: 1 outputs: read_count_file: type: File outputBinding: glob: |- $(inputs.peak_xls_file.path.replace(/^.*[\\\/]/, '').replace(/\_peaks\.xls$/, '_read_count.txt')) stdout: |- $(inputs.peak_xls_file.path.replace(/^.*[\\\/]/, '').replace(/\_peaks\.xls$/, '_read_count.txt')) baseCommand: count-filtered-reads-macs2.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr extract-count-reads-in-peaks.cwl000066400000000000000000000016101420374476100370250ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Extract mapped reads from BAM file using Samtools flagstat command requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: input_bam_file: doc: Aligned BAM file to filter type: File inputBinding: position: 1 output_suffix: type: string outputs: output_read_count: doc: Samtools Flagstat report file type: File outputBinding: glob: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) stdout: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) baseCommand: - samtools - flagstat arguments: - position: 10000 valueFrom: " | head -n1 | cut -f 1 -d ' '" shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/samtools extract-peak-frag-length.cwl000066400000000000000000000010341420374476100362100ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Extracts best fragment length from SPP output text file inputs: input_spp_txt_file: type: File inputBinding: position: 1 outputs: output_best_frag_length: type: float outputBinding: glob: best_frag_length outputEval: $(Number(self[0].contents.replace('\n', ''))) loadContents: true stdout: best_frag_length baseCommand: extract-best-frag-length.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr 
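# Pattern note for extract-peak-frag-length.cwl above: the script's stdout is captured into a file named best_frag_length, globbed back with loadContents: true, and outputEval then coerces the file text to the declared float type, e.g. a file containing "135\n" yields the number 135. A minimal sketch of that output block in isolation (it mirrors the tool above rather than adding a new file):
#
#   outputs:
#     output_best_frag_length:
#       type: float
#       outputBinding:
#         glob: best_frag_length
#         loadContents: true
#         outputEval: $(Number(self[0].contents.replace('\n', '')))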
filter-reads-in-peaks.cwl000066400000000000000000000015221420374476100355140ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Filter BAM file to only include reads overlapping with a BED file requirements: InlineJavascriptRequirement: {} inputs: input_bam_file: doc: Aligned BAM file to filter type: File inputBinding: position: 3 input_bedfile: doc: Bedfile used to only include reads overlapping this BED FILE type: File inputBinding: prefix: -L position: 2 outputs: filtered_file: doc: Filtered aligned BAM file by BED coordinates file type: File outputBinding: glob: $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '') + '.in_peaks.bam') stdout: $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '') + '.in_peaks.bam') baseCommand: - samtools - view - -b - -h hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 peak-calling.cwl000066400000000000000000000314431420374476100337620ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: format: doc: |- -f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE} Format of tag file, "AUTO", "BED" or "ELAND" or "ELANDMULTI" or "ELANDEXPORT" or "SAM" or "BAM" or "BOWTIE" or "BAMPE". The default AUTO option will let MACS decide which format the file is. Note that MACS can't detect "BAMPE" or "BEDPE" format with "AUTO", and you have to explicitly specify the format for "BAMPE" and "BEDPE". DEFAULT: "AUTO". type: string? inputBinding: prefix: -f position: 1 SPMR: doc: |- If True, MACS will save signal per million reads for fragment pileup profiles. Require --bdg to be set. Default: False type: boolean? inputBinding: prefix: --SPMR position: 1 bdg: doc: |- Whether or not to save extended fragment pileup, and local lambda tracks (two files) at every bp into a bedGraph file. DEFAULT: True type: boolean? inputBinding: prefix: --bdg position: 1 broad: doc: |- If set, MACS will try to call broad peaks by linking nearby highly enriched regions. The linking region is controlled by another cutoff through --linking-cutoff. The maximum linking region length is 4 times of d from MACS. DEFAULT: False type: boolean? inputBinding: prefix: --broad position: 1 broad-cutoff: doc: | BROADCUTOFF Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a pvalue cutoff, otherwise, it's a qvalue cutoff. DEFAULT: 0.1 type: float? inputBinding: prefix: --broad-cutoff position: 1 buffer-size: doc: | BUFFER_SIZE Buffer size for incrementally increasing internal array size to store reads alignment information. In most cases, you don't have to change this parameter. However, if there are large number of chromosomes/contigs/scaffolds in your alignment, it's recommended to specify a smaller buffer size in order to decrease memory usage (but it will take longer time to read alignment files). Minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 2 Bytes. DEFAULT: 100000 type: int? inputBinding: prefix: --buffer-size position: 1 bw: doc: | BW Band width for picking regions to compute fragment size. This value is only used while building the shifting model. DEFAULT: 300 type: int?
inputBinding: prefix: --bw position: 1 call-summits: doc: |- If set, MACS will use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region. DEFAULT: False type: boolean? inputBinding: prefix: --call-summits position: 1 control: doc: Control sample file. type: File? inputBinding: prefix: --control position: 2 cutoff-analysis: doc: |- While set, MACS2 will analyze number or total length of peaks that can be called by different p-value cutoff then output a summary table to help user decide a better cutoff. The table will be saved in NAME_cutoff_analysis.txt file. Note, minlen and maxgap may affect the results. WARNING: May take ~30 folds longer time to finish. DEFAULT: False Post-processing options: type: boolean? inputBinding: prefix: --cutoff-analysis position: 1 down-sample: doc: |- When set, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. Warning: This option will make your result unstable and irreproducible since each time, random reads would be selected. Consider to use 'randsample' script instead. If used together with --SPMR, 1 million unique reads will be randomly picked. Caution: due to the implementation, the final number of selected reads may not be as you expected! DEFAULT: False type: boolean? inputBinding: prefix: --down-sample position: 1 extsize: doc: |- The arbitrary extension size in bp. When nomodel is true, MACS will use this value as fragment size to extend each read towards 3' end, then pile them up. It's exactly twice the number of obsolete SHIFTSIZE. In previous language, each read is moved 5'->3' direction to middle of fragment by 1/2 d, then extended to both direction with 1/2 d. This is equivalent to say each read is extended towards 5'->3' into a d size fragment. DEFAULT: 200. EXTSIZE and SHIFT can be combined when necessary. Check SHIFT option. type: float? inputBinding: prefix: --extsize position: 1 fe-cutoff: doc: | FECUTOFF When set, the value will be used to filter out peaks with low fold-enrichment. Note, MACS2 use 1.0 as pseudocount while calculating fold-enrichment. DEFAULT: 1.0 type: float? inputBinding: prefix: --fe-cutoff position: 1 fix-bimodal: doc: |- Whether turn on the auto pair model process. If set, when MACS failed to build paired model, it will use the nomodel settings, the --extsize parameter to extend each tags towards 3' direction. Not to use this automate fixation is a default behavior now. DEFAULT: False type: boolean? inputBinding: prefix: --fix-bimodal position: 1 g: doc: |- Effective genome size. It can be 1.0e+9 or 1000000000, or shortcuts:'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs. type: string? inputBinding: prefix: -g position: 1 keep-dup: doc: | KEEPDUPLICATES It controls the MACS behavior towards duplicate tags at the exact same location -- the same coordination and the same strand. The 'auto' option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff; and the 'all' option keeps every tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1 type: string? inputBinding: prefix: --keep-dup position: 1 llocal: doc: | LARGELOCAL The large nearby region in basepairs to calculate dynamic lambda. This is used to capture the surround bias. If you set this to 0, MACS will skip llocal lambda calculation.
*Note* that MACS will always perform a d-size local lambda calculation. The final local bias should be the maximum of the lambda value from d, slocal, and llocal size windows. DEFAULT: 10000. type: int? inputBinding: prefix: --llocal position: 1 m: doc: | MFOLD MFOLD, --mfold MFOLD MFOLD Select the regions within MFOLD range of high- confidence enrichment ratio against background to build model. Fold-enrichment in regions must be lower than upper limit, and higher than the lower limit. Use as "-m 10 30". DEFAULT:5 50 type: string? inputBinding: prefix: -m position: 1 nolambda: doc: |- If True, MACS will use fixed background lambda as local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure. type: boolean? inputBinding: prefix: --nolambda position: 1 nomodel: doc: |- Whether or not to build the shifting model. If True, MACS will not build model. By default it means shifting size = 100, try to set extsize to change it. DEFAULT: False type: boolean? inputBinding: prefix: --nomodel position: 1 p: doc: |- Pvalue cutoff for peak detection. DEFAULT: not set. -q, and -p are mutually exclusive. If pvalue cutoff is set, qvalue will not be calculated and reported as -1 in the final .xls file. type: float? inputBinding: prefix: -p position: 1 q: doc: |- Minimum FDR (q-value) cutoff for peak detection. DEFAULT: 0.05. -q, and -p are mutually exclusive. type: float? inputBinding: prefix: -q position: 1 ratio: doc: | RATIO When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling. DEFAULT: ignore type: float? inputBinding: prefix: --ratio position: 1 s: doc: | TSIZE, --tsize TSIZE Tag size. This will override the auto detected tag size. DEFAULT: Not set type: int? inputBinding: prefix: -s position: 1 seed: doc: | SEED Set the random seed while down sampling data. Must be a non-negative integer in order to be effective. DEFAULT: not set type: int? inputBinding: prefix: --seed position: 1 shift: doc: |- (NOT the legacy --shiftsize option!) The arbitrary shift in bp. Use discretion while setting it other than default value. When NOMODEL is set, MACS will use this value to move cutting ends (5') towards 5'->3' direction then apply EXTSIZE to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with EXTSIZE option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE for paired-end data. DEFAULT: 0. type: int? inputBinding: prefix: --shift position: 1 slocal: doc: | SMALLLOCAL The small nearby region in basepairs to calculate dynamic lambda. This is used to capture the bias near the peak summit region. Invalid if there is no control data. If you set this to 0, MACS will skip slocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias should be the maximum of the lambda value from d, slocal, and llocal size windows. DEFAULT: 1000 type: int? inputBinding: prefix: --slocal position: 1 to-large: doc: |- When set, scale the small sample up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead to smaller p/qvalues and more specific results. Keep in mind that scaling down will bring down background noise more. DEFAULT: False type: boolean?
inputBinding: prefix: --to-large position: 1 trackline: doc: |- Tells MACS to include trackline with bedGraph files. To include this trackline while displaying bedGraph at UCSC genome browser, can show name and description of the file as well. However my suggestion is to convert bedGraph to bigWig, then show the smaller and faster binary bigWig file at UCSC genome browser, as well as downstream analysis. Require --bdg to be set. Default: Not include trackline. type: boolean? inputBinding: prefix: --trackline position: 1 treatment: doc: |- Treatment sample file(s). If multiple files are given as -t A B C, then they will all be read and pooled together. IMPORTANT: the first sample will be used as the outputs basename. type: File[] inputBinding: prefix: --treatment position: 2 verbose: doc: | VERBOSE_LEVEL Set verbose level of runtime message. 0: only show critical message, 1: show additional warning message, 2: show process information, 3: show debug messages. DEFAULT:2 type: int? inputBinding: prefix: --verbose position: 1 outputs: output_ext_frag_bdg_file: doc: Bedgraph with extended fragment pileup. type: File? outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_treat_pileup.bdg') output_peak_file: doc: Peak calling output file in narrowPeak|broadPeak format. type: File outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_peaks.*Peak') outputEval: $(self[0]) output_peak_summits_file: doc: Peaks summits bedfile. type: File? outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_summits.bed') output_peak_xls_file: doc: Peaks information/report file. type: File outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_peaks.xls') baseCommand: - macs2 - callpeak arguments: - prefix: -n position: 1 valueFrom: $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '')) hints: DockerRequirement: dockerPull: dukegcb/macs2 peaks-bed-to-bigbed.cwl000066400000000000000000000061451420374476100351210ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: | "bedToBigBed v. 2.7 - Convert bed file to bigBed. (BigBed version: 4) usage: bedToBigBed in.bed chrom.sizes out.bb Where in.bed is in one of the ascii bed formats, but not including track lines and chrom.sizes is two column: and out.bb is the output indexed big bed file. Use the script: fetchChromSizes to obtain the actual chrom.sizes information from UCSC, please do not make up a chrom sizes from your own information. The in.bed file must be sorted by chromosome,start, to sort a bed file, use the unix sort command: sort -k1,1 -k2,2n unsorted.bed > sorted.bed" requirements: InlineJavascriptRequirement: {} inputs: type: doc: | -type=bedN[+[P]] : N is between 3 and 15, optional (+) if extra "bedPlus" fields, optional P specifies the number of extra fields. Not required, but preferred. Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1) type: string? inputBinding: prefix: -type= position: 1 separate: false as: doc: | -as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition of each field in a row in AutoSql format here.1) type: File? 
inputBinding: prefix: -as= position: 1 separate: false bed: doc: Input bed file type: File inputBinding: position: 2 blockSize: doc: "-blockSize=N - Number of items to bundle in r-tree. Default 256\n" type: int? inputBinding: prefix: -blockSize= position: 1 separate: false extraIndex: doc: | -extraIndex=fieldList - If set, make an index on each field in a comma separated list extraIndex=name and extraIndex=name,id are commonly used. type: - 'null' - type: array items: string inputBinding: prefix: -extraIndex= position: 1 itemSeparator: ',' genome_sizes: doc: "genome_sizes is two column: .\n" type: File inputBinding: position: 3 itemsPerSlot: doc: "-itemsPerSlot=N - Number of data points bundled at lowest level. Default\ \ 512\n" type: int? inputBinding: prefix: -itemsPerSlot= position: 1 separate: false output_suffix: type: string default: .bb tab: doc: | -tab - If set, expect fields to be tab separated, normally expects white space separator. type: boolean? inputBinding: position: 1 unc: doc: "-unc - If set, do not use compression.\n" type: boolean? inputBinding: position: 1 outputs: bigbed: type: File outputBinding: glob: $(inputs.bed.path.replace(/^.*[\\\/]/, '')+ inputs.output_suffix) baseCommand: bedToBigBed arguments: - position: 4 valueFrom: $(inputs.bed.path.replace(/^.*[\\\/]/, '') + inputs.output_suffix) hints: DockerRequirement: dockerPull: dleehr/docker-hubutils cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps/spp.cwl000066400000000000000000000111141420374476100323050ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: control_bam: doc: |- , full path and name (or URL) of tagAlign/BAM file (can be gzipped) (FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz) type: File? inputBinding: prefix: -i= separate: false fdr: doc: -fdr= , false discovery rate threshold for peak calling type: float? inputBinding: prefix: -fdr= separate: false filtchr: doc: |- -filtchr= , Pattern to use to remove tags that map to specific chromosomes e.g. _ will remove all tags that map to chromosomes with _ in their name type: string? inputBinding: prefix: -filtchr= separate: false input_bam: doc: |- , full path and name (or URL) of tagAlign/BAM file (can be gzipped)(FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz) type: File inputBinding: prefix: -c= separate: false npeak: doc: -npeak=, threshold on number of peaks to call type: int? inputBinding: prefix: -npeak= separate: false nthreads: doc: -p= , number of parallel processing nodes, default=0 type: int? inputBinding: prefix: -p= separate: false rf: doc: 'overwrite (force remove) output files in case they exist. Default: true' type: boolean default: true inputBinding: prefix: -rf s: doc: |- -s=:: , strand shifts at which cross-correlation is evaluated, default=-500:5:1500 type: string? inputBinding: prefix: -s= separate: false savd: doc: -savd= OR -savd, save Rdata file type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savd=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.Rdata"; return null} savn: doc: -savn= OR -savn NarrowPeak file name (fixed width peaks) type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savn=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.narrowPeak"; return null} savp: doc: save cross-correlation plot type: boolean? 
inputBinding: valueFrom: |- ${ if (self) return "-savp=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.pdf"; return null} savr: doc: |- -savr= OR -savr RegionPeak file name (variable width peaks with regions of enrichment) type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savr=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.regionPeak"; return null} speak: doc: -speak=, user-defined cross-correlation peak strandshift type: string? inputBinding: prefix: -speak= separate: false x: doc: |- -x=:, strand shifts to exclude (This is mainly to avoid region around phantom peak) default=10:(readlen+10) type: string? inputBinding: prefix: -x= separate: false outputs: output_spp_cross_corr: doc: peakshift/phantomPeak results summary file type: File outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.txt") output_spp_cross_corr_plot: doc: peakshift/phantomPeak results summary plot type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.pdf") output_spp_narrow_peak: doc: narrowPeak output file type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.narrowPeak") output_spp_rdata: doc: Rdata file from the run_spp.R run type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.Rdata") output_spp_region_peak: doc: regionPeak output file type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.regionPeak") baseCommand: run_spp.R arguments: - valueFrom: |- $("-out=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.txt") - valueFrom: $("-tmpdir="+runtime.tmpdir) shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/spp trunk-peak-score.cwl000066400000000000000000000013321420374476100346170ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/peak_call.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Trunk scores in ENCODE bed6+4 files inputs: peaks: type: File inputBinding: position: 10000 sep: type: string default: \t inputBinding: prefix: -F position: 2 outputs: trunked_scores_peaks: type: File outputBinding: glob: |- $(inputs.peaks.path.replace(/^.*[\\\/]/, '').replace(/\.([^/.]+)$/, "\.trunked_scores\.$1")) stdout: |- $(inputs.peaks.path.replace(/^.*[\\\/]/, '').replace(/\.([^/.]+)$/, "\.trunked_scores\.$1")) baseCommand: awk arguments: - position: 3 valueFrom: BEGIN{OFS=FS}$5>1000{$5=1000}{print} hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl000066400000000000000000000056001420374476100262550ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow doc: 'ATAC-seq 01 QC - reads: SE' requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: default_adapters_file: doc: Adapters file type: File input_fastq_files: doc: Input fastq files type: File[] nthreads: doc: Number of threads. 
type: int outputs: output_count_raw_reads: type: File[] outputSource: count_raw_reads/output_read_count output_custom_adapters: type: File[] outputSource: overrepresented_sequence_extract/output_custom_adapters output_diff_counts: type: File[] outputSource: compare_read_counts/result output_fastqc_data_files: doc: FastQC data files type: File[] outputSource: extract_fastqc_data/output_fastqc_data_file output_fastqc_report_files: doc: FastQC reports in zip format type: File[] outputSource: fastqc/output_qc_report_file steps: compare_read_counts: in: file1: count_raw_reads/output_read_count file2: count_fastqc_reads/output_fastqc_read_count scatter: - file1 - file2 scatterMethod: dotproduct run: qc.cwl.steps/compare_read_counts.cwl out: - result count_fastqc_reads: in: input_basename: extract_basename/output_basename input_fastqc_data: extract_fastqc_data/output_fastqc_data_file scatter: - input_fastqc_data - input_basename scatterMethod: dotproduct run: qc.cwl.steps/count_fastqc_reads.cwl out: - output_fastqc_read_count count_raw_reads: in: input_basename: extract_basename/output_basename input_fastq_file: input_fastq_files scatter: - input_fastq_file - input_basename scatterMethod: dotproduct run: qc.cwl.steps/count_raw_reads.cwl out: - output_read_count extract_basename: in: input_file: input_fastq_files scatter: input_file run: qc.cwl.steps/extract_basename.cwl out: - output_basename extract_fastqc_data: in: input_basename: extract_basename/output_basename input_qc_report_file: fastqc/output_qc_report_file scatter: - input_qc_report_file - input_basename scatterMethod: dotproduct run: qc.cwl.steps/extract_fastqc_data.cwl out: - output_fastqc_data_file fastqc: in: input_fastq_file: input_fastq_files threads: nthreads scatter: input_fastq_file run: qc.cwl.steps/fastqc.cwl out: - output_qc_report_file overrepresented_sequence_extract: in: default_adapters_file: default_adapters_file input_basename: extract_basename/output_basename input_fastqc_data: extract_fastqc_data/output_fastqc_data_file scatter: - input_fastqc_data - input_basename scatterMethod: dotproduct run: qc.cwl.steps/overrepresented_sequence_extract.cwl out: - output_custom_adapters cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps/000077500000000000000000000000001420374476100274665ustar00rootroot00000000000000compare_read_counts.cwl000066400000000000000000000007751420374476100341430ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Compares 2 files inputs: brief: type: boolean default: true inputBinding: prefix: --brief position: 3 file1: type: File inputBinding: position: 1 file2: type: File inputBinding: position: 2 outputs: result: type: File outputBinding: glob: stdout.txt stdout: stdout.txt baseCommand: diff hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr count_fastqc_reads.cwl000066400000000000000000000010331420374476100337620ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Extracts read count from fastqc_data.txt inputs: input_basename: type: string input_fastqc_data: type: File inputBinding: position: 1 outputs: output_fastqc_read_count: type: File outputBinding: glob: $(inputs.input_basename + '.fastqc-read_count.txt') stdout: $(inputs.input_basename + '.fastqc-read_count.txt') baseCommand: 
count-fastqc_data-reads.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr count_raw_reads.cwl000066400000000000000000000010441420374476100332740ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Counts reads in a fastq file requirements: InlineJavascriptRequirement: {} inputs: input_basename: type: string input_fastq_file: type: File inputBinding: position: 1 outputs: output_read_count: type: File outputBinding: glob: $(inputs.input_basename + '.read_count.txt') stdout: $(inputs.input_basename + '.read_count.txt') baseCommand: count-fastq-reads.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr extract_basename.cwl000066400000000000000000000011051420374476100334200ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Extracts the base name of a file requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File inputBinding: position: 1 outputs: output_basename: type: string outputBinding: outputEval: |- $(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1))) baseCommand: echo hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr extract_fastqc_data.cwl000066400000000000000000000014061420374476100341230ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: |- Unzips a zipped fastqc report and returns the fastqc_data.txt file. 
Unzips the file to pipe and uses redirection inputs: extract_pattern: type: string default: '*/fastqc_data.txt' inputBinding: position: 3 input_basename: type: string input_qc_report_file: type: File inputBinding: position: 2 pipe: type: boolean default: true inputBinding: prefix: -p position: 1 outputs: output_fastqc_data_file: type: File outputBinding: glob: $(inputs.input_basename + '.fastqc_data.txt') stdout: $(inputs.input_basename + '.fastqc_data.txt') baseCommand: unzip hints: DockerRequirement: dockerPull: dukegcb/fastqc cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps/fastqc.cwl000066400000000000000000000016211420374476100314560ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: format: type: string default: fastq inputBinding: prefix: --format position: 3 input_fastq_file: type: File inputBinding: position: 4 noextract: type: boolean default: true inputBinding: prefix: --noextract position: 2 threads: type: int default: 1 inputBinding: prefix: --threads position: 5 outputs: output_qc_report_file: type: File outputBinding: glob: |- $(inputs.input_fastq_file.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, '') + "_fastqc.zip") baseCommand: fastqc arguments: - prefix: --dir position: 5 valueFrom: $(runtime.tmpdir) - prefix: -o position: 5 valueFrom: $(runtime.outdir) hints: DockerRequirement: dockerPull: dukegcb/fastqc overrepresented_sequence_extract.cwl000066400000000000000000000014301420374476100367520ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/qc.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool inputs: default_adapters_file: doc: Adapters file in fasta format type: File inputBinding: position: 2 input_basename: doc: Name of the sample - used as a base name for generating output files type: string input_fastqc_data: doc: fastqc_data.txt file from a fastqc report type: File inputBinding: position: 1 outputs: output_custom_adapters: type: File outputBinding: glob: $(inputs.input_basename + '.custom_adapters.fasta') baseCommand: overrepresented_sequence_extract.py arguments: - position: 3 valueFrom: $(inputs.input_basename + '.custom_adapters.fasta') hints: DockerRequirement: dockerPull: reddylab/overrepresented_sequence_extract:1.0 cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/quant.cwl000066400000000000000000000031521420374476100270020ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow doc: ATAC-seq - Quantification requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: input_bam_files: type: File[] input_genome_sizes: type: File nthreads: type: int default: 1 outputs: bigwig_norm_files: doc: signal files of pileup reads in RPKM type: File[] outputSource: bamcoverage/output_bam_coverage bigwig_raw_files: doc: Raw reads bigWig (signal) files type: File[] outputSource: bdg2bw-raw/output_bigwig steps: bamcoverage: in: bam: input_bam_files binSize: valueFrom: ${return 1} extendReads: valueFrom: ${return 200} normalizeUsing: valueFrom: RPKM numberOfProcessors: nthreads output_suffix: valueFrom: .rpkm.bw scatter: bam run: quant.cwl.steps/bamcoverage.cwl out: - output_bam_coverage bdg2bw-raw: in: bed_graph: bedsort_genomecov/bed_file_sorted genome_sizes: input_genome_sizes output_suffix: valueFrom: .raw.bw scatter: bed_graph run: 
quant.cwl.steps/bdg2bw-raw.cwl out: - output_bigwig bedsort_genomecov: in: bed_file: bedtools_genomecov/output_bedfile scatter: bed_file run: quant.cwl.steps/bedsort_genomecov.cwl out: - bed_file_sorted bedtools_genomecov: in: bg: valueFrom: ${return true} g: input_genome_sizes ibam: input_bam_files scatter: ibam run: quant.cwl.steps/bedtools_genomecov.cwl out: - output_bedfile cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/quant.cwl.steps/000077500000000000000000000000001420374476100302135ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/quant.cwl.steps/bamcoverage.cwl000066400000000000000000000507201420374476100332010ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: | usage: An example usage is: $ bamCoverage -b reads.bam -o coverage.bw This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. It is possible to extend the length of the reads to better reflect the actual fragment length. *bamCoverage* offers normalization by scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), and 1x depth (reads per genome coverage, RPGC). Required arguments: --bam BAM file, -b BAM file BAM file to process (default: None) Output: --outFileName FILENAME, -o FILENAME Output file name. (default: None) --outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph} Output file type. Either "bigwig" or "bedgraph". (default: bigwig) Optional arguments: --help, -h show this help message and exit --scaleFactor SCALEFACTOR The computed scaling factor (or 1, if not applicable) will be multiplied by this. (default: 1.0) --MNase Determine nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are defined by the two mate reads. Only fragment lengths between 130 - 200 bp are considered to avoid dinucleosomes or other artifacts. *NOTE*: Requires paired-end data. A bin size of 1 is recommended. (default: False) --filterRNAstrand {forward,reverse} Selects RNA-seq reads (single-end or paired-end) in the given strand. (default: None) --version show program's version number and exit --binSize INT bp, -bs INT bp Size of the bins, in bases, for the output of the bigwig/bedgraph file. (default: 50) --region CHR:START:END, -r CHR:START:END Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None) --blackListFileName BED file, -bl BED file A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. (default: None) --numberOfProcessors INT, -p INT Number of processors to use.
Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2) --verbose, -v Set to see processing messages. (default: False) Read coverage normalization options: --normalizeTo1x EFFECTIVE GENOME SIZE LENGTH Report read coverage normalized to 1x sequencing depth (also known as Reads Per Genomic Content (RPGC)). Sequencing depth is defined as: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. To use this option, the effective genome size has to be indicated after the option. The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly. Common values are: mm9: 2,150,570,000; hg19:2,451,960,000; dm3:121,400,000 and ce10:93,260,000. See Table 2 of http://www.plosone.org /article/info:doi/10.1371/journal.pone.0030377 or http ://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_ T1.html for several effective genome sizes. (default: None) --ignoreForNormalization IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...] A list of space-delimited chromosome names containing those chromosomes that should be excluded for computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. An usage examples is --ignoreForNormalization chrX chrM. (default: None) --skipNonCoveredRegions, --skipNAs This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zero. The decision to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for example, repetitive regions that should be skipped. (default: False) --smoothLength INT bp The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered. Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None) Read processing options: --extendReads [INT bp], -e [INT bp] This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. *NOTE*: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. *Single- end*: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended. *Paired-end*: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False) --ignoreDuplicates If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. 
(default: False) --minMappingQuality INT If set, only reads that have a mapping quality score of at least this are considered. (default: None) --centerReads By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False) --samFlagInclude INT Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will be also considered for the coverage. (default: None) --samFlagExclude INT Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. (default: None) requirements: InlineJavascriptRequirement: {} inputs: MNase: doc: | Determine nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are defined by the two mate reads. Only fragment lengths between 130 - 200 bp are considered to avoid dinucleosomes or other artifacts. *NOTE*: Requires paired-end data. A bin size of 1 is recommended. (default: False) type: boolean? inputBinding: prefix: --MNase position: 1 bam: doc: 'BAM file to process ' type: File secondaryFiles: - .bai inputBinding: prefix: --bam position: 1 binSize: doc: | INT bp Size of the bins, in bases, for the output of the bigwig/bedgraph file. (default: 50) type: int? inputBinding: prefix: --binSize position: 1 blackListFileName: doc: | BED file A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. (default: None) type: File? inputBinding: prefix: --blackListFileName position: 1 centerReads: doc: | By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False) type: boolean? inputBinding: prefix: --centerReads position: 1 extendReads: doc: | INT bp This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. *NOTE*: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. *Single-end*: Requires a user-specified value for the final fragment length. Reads that already exceed this fragment length will not be extended. *Paired-end*: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False) type: int?
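# Editorial sketch (not part of the original tool): for single-end ATAC-seq data a
# fixed fragment length would be supplied here, e.g. extendReads: 200, which makes
# bamCoverage stretch every read to 200 bp before the per-bin coverage is counted.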
inputBinding: prefix: --extendReads position: 1 filterRNAstrand: doc: | {forward,reverse} Selects RNA-seq reads (single-end or paired-end) in the given strand. (default: None) type: string? inputBinding: prefix: --filterRNAstrand position: 1 ignoreDuplicates: doc: | If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. (default: False) type: boolean? inputBinding: prefix: --ignoreDuplicates position: 1 ignoreForNormalization: doc: | A list of space-delimited chromosome names containing those chromosomes that should be excluded for computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. A usage example is --ignoreForNormalization chrX chrM. type: - 'null' - type: array items: string inputBinding: prefix: --ignoreForNormalization position: 1 minMappingQuality: doc: | INT If set, only reads that have a mapping quality score of at least this are considered. (default: None) type: int? inputBinding: prefix: --minMappingQuality position: 1 normalizeUsing: doc: | Possible choices: RPKM, CPM, BPM, RPGC Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. RPKM = Reads Per Kilobase per Million mapped reads; CPM = Counts Per Million mapped reads, same as CPM in RNA-seq; BPM = Bins Per Million mapped reads, same as TPM in RNA-seq; RPGC = reads per genomic content (1x normalization); Mapped reads are considered after blacklist filtering (if applied). RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). CPM (per bin) = number of reads per bin / number of mapped reads (in millions). BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions). RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. This scaling factor, in turn, is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize. Each read is considered independently; if you want to only count one mate from a pair in paired-end data, then use the --samFlagInclude/--samFlagExclude options. type: string? inputBinding: prefix: --normalizeUsing position: 1 numberOfProcessors: doc: | INT Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2) type: int? inputBinding: prefix: --numberOfProcessors position: 1 outFileFormat: doc: | {bigwig,bedgraph}, -of {bigwig,bedgraph} Output file type. Either "bigwig" or "bedgraph". (default: bigwig) type: string default: bigwig inputBinding: prefix: --outFileFormat position: 1 outFileName: doc: | FILENAME Output file name. (default: input BAM filename with bigwig [*.bw] or bedgraph [*.bdg] extension.) type: string? output_suffix: doc: Suffix used for output file (input BAM filename + suffix) type: string? region: doc: | CHR:START:END Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None) type: string? inputBinding: prefix: --region position: 1 samFlagExclude: doc: | INT Exclude reads based on the SAM flag.
For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. (default: None) type: int? inputBinding: prefix: --samFlagExclude position: 1 samFlagInclude: doc: | INT Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will be also considered for the coverage. (default: None) type: int? inputBinding: prefix: --samFlagInclude position: 1 scaleFactor: doc: | The computed scaling factor (or 1, if not applicable) will be multiplied by this. (default: 1.0) type: float? inputBinding: prefix: --scaleFactor position: 1 skipNonCoveredRegions: doc: | --skipNonCoveredRegions, --skipNAs This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zero. The decision to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for example, repetitive regions that should be skipped. (default: False) type: boolean? inputBinding: prefix: --skipNonCoveredRegions position: 1 smoothLength: doc: | INT bp The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered. Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None) type: int? inputBinding: prefix: --smoothLength position: 1 verbose: doc: "--verbose \nSet to see processing messages. (default: False)\n" type: boolean? inputBinding: prefix: --verbose position: 1 version: doc: show program's version number and exit type: boolean?
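# Worked example (editorial, using the RPGC formula quoted in the doc above): with
# 20e6 mapped reads of 50 bp and an effective genome size of 2.7e9, sequencing depth
# = (20e6 * 50) / 2.7e9 ≈ 0.37x, so the 1x scaling factor is 1 / 0.37 ≈ 2.7 and each
# per-bin read count is multiplied by it.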
inputBinding: prefix: --version position: 1 outputs: output_bam_coverage: type: File outputBinding: glob: |- ${ if (inputs.outFileName) return inputs.outFileName; if (inputs.output_suffix) return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + inputs.output_suffix; if (inputs.outFileFormat == "bedgraph") return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bdg"; return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bw"; } baseCommand: bamCoverage arguments: - prefix: --outFileName position: 3 valueFrom: |- ${ if (inputs.outFileName) return inputs.outFileName; if (inputs.output_suffix) return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + inputs.output_suffix; if (inputs.outFileFormat == "bedgraph") return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bdg"; return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bw"; } hints: DockerRequirement: dockerPull: reddylab/deeptools:3.0.1 cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/quant.cwl.steps/bdg2bw-raw.cwl000066400000000000000000000017271420374476100326670ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: 'Tool: bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig format.' requirements: InlineJavascriptRequirement: {} inputs: bed_graph: doc: "\tbed_graph is a four column file in the format: \n" type: File inputBinding: position: 1 genome_sizes: doc: "\tgenome_sizes is two column: .\n" type: File inputBinding: position: 2 output_suffix: type: string default: .bw outputs: output_bigwig: type: File outputBinding: glob: |- $(inputs.bed_graph.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) baseCommand: bedGraphToBigWig arguments: - position: 3 valueFrom: |- $(inputs.bed_graph.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) hints: DockerRequirement: dockerPull: dukegcb/bedgraphtobigwig bedsort_genomecov.cwl000066400000000000000000000012571420374476100343540ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/quant.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: | bedSort - Sort a .bed file by chrom,chromStart usage: bedSort in.bed out.bed in.bed and out.bed may be the same. requirements: InlineJavascriptRequirement: {} inputs: bed_file: doc: Bed or bedGraph file to be sorted type: File inputBinding: position: 1 outputs: bed_file_sorted: type: File outputBinding: glob: $(inputs.bed_file.path.replace(/^.*[\\\/]/, '') + "_sorted") baseCommand: bedSort arguments: - position: 2 valueFrom: $(inputs.bed_file.path.replace(/^.*[\\\/]/, '') + "_sorted") hints: DockerRequirement: dockerPull: dleehr/docker-hubutils bedtools_genomecov.cwl000066400000000000000000000170571420374476100345320ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/quant.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: |- Tool: bedtools genomecov (aka genomeCoverageBed) Version: v2.25.0 Summary: Compute the coverage of a feature file among a genome. Usage: bedtools genomecov [OPTIONS] -i -g Options: -ibam The input file is in BAM format. Note: BAM _must_ be sorted by position -d Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. -dz Report the depth at each genome position (with zero-based coordinates). 
Reports only non-zero positions. Default behavior is to report a histogram. -bg Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html -bga Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. -split Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). -strand Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - -5 Calculate coverage of 5' positions (instead of entire interval). -3 Calculate coverage of 3' positions (instead of entire interval). -max Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) -scale Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) -trackline Adds a UCSC/Genome-Browser track line definition in the first line of the output. - See here for more details about track line definition: http://genome.ucsc.edu/goldenPath/help/bedgraph.html - NOTE: When adding a trackline definition, the output BedGraph can be easily uploaded to the Genome Browser as a custom track, BUT CAN NOT be converted into a BigWig file (w/o removing the first line). -trackopts Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) Notes: (1) The genome file should be tab delimited and structured as follows: For example, Human (hg19): chr1 249250621 chr2 243199373 ... chr18_gl000207_random 4262 (2) The input BED (-i) file must be grouped by chromosome. A simple "sort -k 1,1 > .sorted" will suffice. (3) The input BAM (-ibam) file must be sorted by position. A "samtools sort " should suffice. Tips: One can use the UCSC Genome Browser's MySQL database to extract chromosome sizes. For example, H. sapiens: mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \ "select chrom, size from hg19.chromInfo" > hg19.genome requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: '3': doc: "\tCalculate coverage of 3' positions (instead of entire interval).\n" type: boolean? inputBinding: prefix: '-3' position: 1 '5': doc: "\tCalculate coverage of 5' positions (instead of entire interval).\n" type: boolean? inputBinding: prefix: '-5' position: 1 bg: doc: | Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html type: boolean? inputBinding: prefix: -bg position: 1 bga: doc: | Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. type: boolean? inputBinding: prefix: -bga position: 1 d: doc: | Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. type: boolean?
inputBinding: prefix: -d position: 1 dz: doc: | Report the depth at each genome position (with zero-based coordinates). Reports only non-zero positions. Default behavior is to report a histogram. type: boolean? inputBinding: prefix: -dz position: 1 g: doc: 'Genome sizes file: tab-delimited, one chromosome name and size per line' type: File inputBinding: prefix: -g position: 3 ibam: doc: "\tThe input file is in BAM format.\nNote: BAM _must_ be sorted by position\n" type: File inputBinding: prefix: -ibam position: 2 max: doc: | Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) type: int? inputBinding: prefix: -max position: 1 scale: doc: | Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) type: float? inputBinding: prefix: -scale position: 1 split: doc: | Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). type: boolean? inputBinding: prefix: -split position: 1 strand: doc: | Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - type: string? inputBinding: prefix: -strand position: 1 trackline: doc: | Adds a UCSC/Genome-Browser track line definition in the first line of the output. - See here for more details about track line definition: http://genome.ucsc.edu/goldenPath/help/bedgraph.html - NOTE: When adding a trackline definition, the output BedGraph can be easily uploaded to the Genome Browser as a custom track, BUT CAN NOT be converted into a BigWig file (w/o removing the first line). type: boolean? inputBinding: prefix: -trackline position: 1 trackopts: doc: | Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) type: string? inputBinding: prefix: -trackopts position: 1 outputs: output_bedfile: type: File outputBinding: glob: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg') stdout: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg') baseCommand: - bedtools - genomecov hints: DockerRequirement: dockerPull: dukegcb/bedtools cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/trimm.cwl000066400000000000000000000043051420374476100270030ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow doc: 'ATAC-seq 02 trimming - reads: SE' requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: input_adapters_files: doc: Input adapters files type: File[] input_fastq_files: doc: Input fastq files type: File[] nthreads: doc: Number of threads type: int default: 1 quality_score: type: string default: -phred33 trimmomatic_jar_path: doc: Trimmomatic Java jar file type: string default: /usr/share/java/trimmomatic.jar trimmomatic_java_opts: doc: JVM arguments should be a quoted, space separated list type: string?
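# Editorial note: the trimmomatic step below scatters with scatterMethod dotproduct,
# so the i-th file in input_fastq_files is trimmed with the i-th file in
# input_adapters_files; both arrays are therefore expected to be of equal length.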
outputs: output_data_fastq_trimmed_files: doc: Trimmed fastq files type: File[] outputSource: trimmomatic/output_read1_trimmed_file output_trimmed_fastq_read_count: doc: Trimmed read counts of fastq files type: File[] outputSource: count_fastq_reads/output_read_count steps: count_fastq_reads: in: input_basename: extract_basename/output_basename input_fastq_file: trimmomatic/output_read1_trimmed_file scatter: - input_fastq_file - input_basename scatterMethod: dotproduct run: trimm.cwl.steps/count_fastq_reads.cwl out: - output_read_count extract_basename: in: input_file: trimmomatic/output_read1_trimmed_file scatter: input_file run: trimm.cwl.steps/extract_basename.cwl out: - output_basename trimmomatic: in: end_mode: valueFrom: SE illuminaclip: valueFrom: 2:30:15 input_adapters_file: input_adapters_files input_read1_fastq_file: input_fastq_files java_opts: trimmomatic_java_opts leading: valueFrom: ${return 3} minlen: valueFrom: ${return 15} nthreads: nthreads phred: valueFrom: '33' slidingwindow: valueFrom: 4:20 trailing: valueFrom: ${return 3} trimmomatic_jar_path: trimmomatic_jar_path scatter: - input_read1_fastq_file - input_adapters_file scatterMethod: dotproduct run: trimm.cwl.steps/trimmomatic.cwl out: - output_read1_trimmed_file cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/trimm.cwl.steps/000077500000000000000000000000001420374476100302135ustar00rootroot00000000000000count_fastq_reads.cwl000066400000000000000000000010441420374476100343460ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/trimm.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Counts reads in a fastq file requirements: InlineJavascriptRequirement: {} inputs: input_basename: type: string input_fastq_file: type: File inputBinding: position: 1 outputs: output_read_count: type: File outputBinding: glob: $(inputs.input_basename + '.read_count.txt') stdout: $(inputs.input_basename + '.read_count.txt') baseCommand: count-fastq-reads.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr extract_basename.cwl000066400000000000000000000011051420374476100341450ustar00rootroot00000000000000cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/trimm.cwl.steps#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: Extracts the base name of a file requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File inputBinding: position: 1 outputs: output_basename: type: string outputBinding: outputEval: |- $(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1))) baseCommand: echo hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr cwl-format-2022.02.18/tests/cwl/expected-exploded-atac-seq.cwl.steps/trimm.cwl.steps/trimmomatic.cwl000066400000000000000000000256401420374476100332560ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: | Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application. There are two major modes of the program: Paired end mode and Single end mode. 
The paired end mode will maintain correspondence of read pairs and also use the additional information contained in paired reads to better find adapter or PCR primer fragments introduced by the library preparation process. Trimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used). requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: avgqual: doc: | Drop the read if the average quality is below the specified level : Specifies the minimum average quality required to keep a read. type: int? inputBinding: prefix: 'AVGQUAL:' position: 101 separate: false crop: doc: | Removes bases regardless of quality from the end of the read, so that the read has maximally the specified length after this step has been performed. Steps performed after CROP might of course further shorten the read. : The number of bases to keep, from the start of the read. type: int? inputBinding: prefix: 'CROP:' position: 13 separate: false end_mode: doc: "SE|PE\nSingle End (SE) or Paired End (PE) mode\n" type: string inputBinding: position: 3 headcrop: doc: | Removes the specified number of bases, regardless of quality, from the beginning of the read. : The number of bases to remove from the start of the read. type: int? inputBinding: prefix: 'HEADCROP:' position: 13 separate: false illuminaclip: doc: | ::::: Find and remove Illumina adapters. REQUIRED: : specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below. : specifies the maximum mismatch count which will still allow a full match to be performed : specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment. : specifies how accurate the match between any adapter etc. sequence must be against a read OPTIONAL: : In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. : After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying "true" for this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. type: string input_adapters_file: doc: |- FASTA file containing adapters, PCR sequences, etc. It is used to search for and remove these sequences in the input FASTQ file(s) type: File input_read1_fastq_file: doc: FASTQ file for input read (read R1 in Paired End mode) type: File inputBinding: position: 5 input_read2_fastq_file: doc: FASTQ file for read R2 in Paired End mode type: File? inputBinding: position: 6 java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? inputBinding: position: 1 shellQuote: false leading: doc: | Remove low quality bases from the beginning. As long as a base has a value below this threshold the base is removed and the next base will be investigated. : Specifies the minimum quality required to keep a base. type: int?
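# Sketch of the assembled command (editorial; filenames are hypothetical, options
# taken from the defaults here and the values wired in by trimm.cwl):
#   java -jar /usr/share/java/trimmomatic.jar SE -threads 1 -phred33 \
#     reads.fastq reads.trimmed.fastq \
#     ILLUMINACLIP:adapters.fa:2:30:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:20 MINLEN:15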
inputBinding: prefix: 'LEADING:' position: 14 separate: false log_filename: doc: | Specifying a trimlog file creates a log of all read trimmings, indicating the following details: the read name the surviving sequence length the location of the first surviving base, aka. the amount trimmed from the start the location of the last surviving base in the original read the amount trimmed from the end : filename for the generated output log file. type: string? inputBinding: prefix: -trimlog position: 4 maxinfo: doc: | : Performs an adaptive quality trim, balancing the benefits of retaining longer reads against the costs of retaining bases with errors. : This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. : This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. type: string? inputBinding: prefix: 'MAXINFO:' position: 15 separate: false minlen: doc: | This module removes reads that fall below the specified minimal length. If required, it should normally be after all other processing steps. Reads removed by this step will be counted and included in the "dropped reads" count presented in the trimmomatic summary. : Specifies the minimum length of reads to be kept type: int? inputBinding: prefix: 'MINLEN:' position: 100 separate: false nthreads: doc: Number of threads type: int default: 1 inputBinding: prefix: -threads position: 4 phred: doc: | "33"|"64" -phred33 ("33") or -phred64 ("64") specifies the base quality encoding. Default: -phred64 type: string default: '64' inputBinding: prefix: -phred position: 4 separate: false slidingwindow: doc: | : Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold. By considering multiple bases, a single poor quality base will not cause the removal of high quality data later in the read. : specifies the number of bases to average across : specifies the average quality required type: string? inputBinding: prefix: 'SLIDINGWINDOW:' position: 15 separate: false tophred33: doc: This (re)encodes the quality part of the FASTQ file to base 33. type: boolean? inputBinding: prefix: TOPHRED33 position: 12 separate: false tophred64: doc: This (re)encodes the quality part of the FASTQ file to base 64. type: boolean? inputBinding: prefix: TOPHRED64 position: 12 separate: false trailing: doc: | Remove low quality bases from the end. As long as a base has a value below this threshold the base is removed and the next base (which, as Trimmomatic is starting from the 3' end, would be the base preceding the just removed base) will be investigated. This approach can be used for removing the special Illumina "low quality segment" regions (which are marked with a quality score of 2), but we recommend Sliding Window or MaxInfo instead : Specifies the minimum quality required to keep a base. type: int? inputBinding: prefix: 'TRAILING:' position: 14 separate: false trimmomatic_jar_path: type: string inputBinding: prefix: -jar position: 2 outputs: output_log_file: doc: Trimmomatic Log file. type: File? outputBinding: glob: $(inputs.log_filename) output_read1_trimmed_file: type: File outputBinding: glob: |- $(inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq') output_read1_trimmed_unpaired_file: type: File?
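# Editorial note: the three optional outputs below only exist for paired-end runs;
# their glob expressions return null when end_mode is SE, so a single-end invocation
# yields just output_read1_trimmed_file (and the log, if requested).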
outputBinding: glob: | ${ if (inputs.end_mode == "PE") return inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.unpaired.fastq'; return null; } output_read2_trimmed_paired_file: type: File? outputBinding: glob: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq'; return null; } output_read2_trimmed_unpaired_file: type: File? outputBinding: glob: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.unpaired.fastq'; return null; } baseCommand: java arguments: - position: 1 valueFrom: $("-Djava.io.tmpdir="+runtime.tmpdir) shellQuote: false - position: 7 valueFrom: |- $(inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq') - position: 8 valueFrom: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.unpaired.fastq'; return null; } - position: 9 valueFrom: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq'; return null; } - position: 10 valueFrom: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.unpaired.fastq'; return null; } - position: 11 valueFrom: $("ILLUMINACLIP:" + inputs.input_adapters_file.path + ":"+ inputs.illuminaclip) hints: DockerRequirement: dockerPull: dukegcb/trimmomatic cwl-format-2022.02.18/tests/cwl/formatted-atac-seq-pipeline.cwl000066400000000000000000005300501420374476100241200ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow label: ATAC-seq-pipeline-se doc: 'ATAC-seq pipeline - reads: SE' $namespaces: sbg: https://sevenbridges.com requirements: - class: ScatterFeatureRequirement - class: SubworkflowFeatureRequirement - class: StepInputExpressionRequirement inputs: as_narrowPeak_file: doc: Definition narrowPeak file in AutoSql format (used in bedToBigBed) type: File default_adapters_file: doc: Adapters file type: File genome_effective_size: doc: |- Effective genome size used by MACS2. It can be numeric or a shortcut: 'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8). Default: hs type: string default: hs genome_ref_first_index_file: doc: |- "First index file of Bowtie reference genome with extension 1.ebwt.
\ (Note: the rest of the index files MUST be in the same folder)" type: File secondaryFiles: - ^^.2.ebwt - ^^.3.ebwt - ^^.4.ebwt - ^^.rev.1.ebwt - ^^.rev.2.ebwt genome_sizes_file: doc: Genome sizes tab-delimited file (used in samtools) type: File input_fastq_files: type: File[] nthreads_map: doc: Number of threads required for the 03-map step type: int nthreads_peakcall: doc: Number of threads required for the 04-peakcall step type: int nthreads_qc: doc: Number of threads required for the 01-qc step type: int nthreads_quant: doc: Number of threads required for the 05-quantification step type: int nthreads_trimm: doc: Number of threads required for the 02-trim step type: int picard_jar_path: doc: Picard Java jar file type: string picard_java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? trimmomatic_jar_path: doc: Trimmomatic Java jar file type: string trimmomatic_java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? outputs: map_bowtie_log_files: doc: Bowtie log file with mapping stats type: File[] outputSource: map/output_bowtie_log map_dedup_bam_files: doc: Filtered BAM files (post-processing end point) type: File[] outputSource: map/output_data_sorted_dups_marked_bam_files map_mark_duplicates_files: doc: |- Summary of duplicates removed with Picard tool MarkDuplicates (for multiple reads aligned to the same positions) type: File[] outputSource: map/output_picard_mark_duplicates_files map_pbc_files: doc: PCR Bottleneck Coefficient files (used to flag samples when pbc<0.5) type: File[] outputSource: map/output_pbc_files map_percent_mitochondrial_reads: doc: Percentage of mitochondrial reads type: File[] outputSource: map/output_percent_mitochondrial_reads map_preseq_c_curve_files: doc: Preseq c_curve output files type: File[] outputSource: map/output_preseq_c_curve_files map_preseq_percentage_uniq_reads: doc: Preseq percentage of uniq reads type: File[] outputSource: map/output_percentage_uniq_reads map_read_count_mapped: doc: Read counts of the mapped BAM files type: File[] outputSource: map/output_read_count_mapped peakcall_extended_peak_file: doc: Extended fragment peaks in ENCODE Peak file format type: File[] outputSource: peak_call/output_extended_peak_file peakcall_filtered_read_count_file: doc: Filtered read count after peak calling type: File[] outputSource: peak_call/output_filtered_read_count_file peakcall_peak_bigbed_file: doc: Peaks in bigBed format type: File[] outputSource: peak_call/output_peak_bigbed_file peakcall_peak_count_within_replicate: doc: Peak counts within replicate type: File[] outputSource: peak_call/output_peak_count_within_replicate peakcall_peak_file: doc: Peaks in ENCODE Peak file format type: File[] outputSource: peak_call/output_peak_file peakcall_peak_summits_file: doc: Peaks summits in bedfile format type: File[] outputSource: peak_call/output_peak_summits_file peakcall_peak_xls_file: doc: Peak calling report file type: File[] outputSource: peak_call/output_peak_xls_file peakcall_read_in_peak_count_within_replicate: doc: Peak counts within replicate type: File[] outputSource: peak_call/output_read_in_peak_count_within_replicate peakcall_spp_x_cross_corr: doc: SPP strand cross correlation summary type: File[] outputSource: peak_call/output_spp_x_cross_corr peakcall_spp_x_cross_corr_plot: doc: SPP strand cross correlation plot type: File[] outputSource: peak_call/output_spp_cross_corr_plot qc_count_raw_reads: doc: Raw read counts of
fastq files after QC type: File[] outputSource: qc/output_count_raw_reads qc_diff_counts: doc: Diff file between number of raw reads and number of reads counted by FASTQC type: File[] outputSource: qc/output_diff_counts qc_fastqc_data_files: doc: FastQC data files type: File[] outputSource: qc/output_fastqc_data_files qc_fastqc_report_files: doc: FastQC reports in zip format type: File[] outputSource: qc/output_fastqc_report_files quant_bigwig_norm_files: doc: Normalized reads bigWig (signal) files type: File[] outputSource: quant/bigwig_norm_files quant_bigwig_raw_files: doc: Raw reads bigWig (signal) files type: File[] outputSource: quant/bigwig_raw_files trimm_fastq_files: doc: FASTQ files after trimming type: File[] outputSource: trimm/output_data_fastq_trimmed_files trimm_raw_counts: doc: Raw read counts of fastq files after trimming type: File[] outputSource: trimm/output_trimmed_fastq_read_count steps: map: in: genome_ref_first_index_file: genome_ref_first_index_file genome_sizes_file: genome_sizes_file input_fastq_files: trimm/output_data_fastq_trimmed_files nthreads: nthreads_map picard_jar_path: picard_jar_path picard_java_opts: picard_java_opts run: cwlVersion: v1.0 class: Workflow doc: 'ATAC-seq 03 mapping - reads: SE' requirements: - class: ScatterFeatureRequirement - class: SubworkflowFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: genome_ref_first_index_file: doc: |- Bowtie first index files for reference genome (e.g. *1.ebwt). The rest of the files should be in the same folder. type: File secondaryFiles: - ^^.2.ebwt - ^^.3.ebwt - ^^.4.ebwt - ^^.rev.1.ebwt - ^^.rev.2.ebwt genome_sizes_file: doc: Genome sizes tab-delimited file (used in samtools) type: File input_fastq_files: doc: Input fastq files type: File[] nthreads: type: int default: 1 picard_jar_path: doc: Picard Java jar file type: string default: /usr/picard/picard.jar picard_java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? outputs: output_bowtie_log: doc: Bowtie log file. type: File[] outputSource: bowtie-se/output_bowtie_log output_data_sorted_dedup_bam_files: doc: BAM files without duplicate reads. type: File[] outputSource: index_dedup_bams/indexed_file output_data_sorted_dups_marked_bam_files: doc: BAM files with marked duplicate reads. type: File[] outputSource: index_dups_marked_bams/indexed_file output_pbc_files: doc: PCR Bottleneck Coefficient files. type: File[] outputSource: execute_pcr_bottleneck_coef/pbc_file output_percent_mitochondrial_reads: doc: Percentage of mitochondrial reads. type: File[] outputSource: percent_mitochondrial_reads/percent_map output_percentage_uniq_reads: doc: Percentage of unique reads from preseq c_curve output type: File[] outputSource: percent_uniq_reads/output output_picard_mark_duplicates_files: doc: Picard MarkDuplicates metrics files. type: File[] outputSource: mark_duplicates/output_metrics_file output_preseq_c_curve_files: doc: Preseq c_curve output files.
type: File[] outputSource: preseq-c-curve/output_file output_read_count_mapped: doc: Read counts of the mapped BAM files type: File[] outputSource: mapped_reads_count/output output_read_count_mapped_filtered: doc: Read counts of the mapped and filtered BAM files type: File[] outputSource: mapped_filtered_reads_count/output_read_count steps: bam_idxstats: in: bam: index_bams/indexed_file scatter: bam run: cwlVersion: v1.0 class: CommandLineTool requirements: InitialWorkDirRequirement: listing: - $(inputs.bam) InlineJavascriptRequirement: {} inputs: bam: doc: Bam file (it should be indexed) type: File secondaryFiles: - .bai inputBinding: position: 1 outputs: idxstats_file: doc: | Idxstats output file. TAB-delimited with each line consisting of reference sequence name, sequence length, # mapped reads and # unmapped reads type: File outputBinding: glob: $(inputs.bam.basename + ".idxstats") stdout: $(inputs.bam.basename + ".idxstats") baseCommand: - samtools - idxstats hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - idxstats_file bowtie-se: in: X: valueFrom: ${return 2000} genome_ref_first_index_file: genome_ref_first_index_file input_fastq_file: input_fastq_files nthreads: nthreads output_filename: extract_basename_2/output_path v: valueFrom: ${return 2} scatter: - input_fastq_file - output_filename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: X: doc: 'maximum insert size for paired-end alignment (default: 250)' type: int? inputBinding: prefix: -X position: 4 best: doc: Hits guaranteed best stratum; ties broken by quality type: boolean default: true inputBinding: prefix: --best position: 5 chunkmbs: doc: |- The number of megabytes of memory a given thread is given to store path descriptors in --best mode. (Default: 256) type: int? inputBinding: prefix: --chunkmbs position: 5 genome_ref_first_index_file: doc: |- First file (extension .1.ebwt) of the Bowtie index files generated for the reference genome (see http://bowtie-bio.sourceforge.net/tutorial.shtml#newi) type: File secondaryFiles: - ^^.2.ebwt - ^^.3.ebwt - ^^.4.ebwt - ^^.rev.1.ebwt - ^^.rev.2.ebwt inputBinding: position: 9 valueFrom: $(self.path.split('.').splice(0,self.path.split('.').length-2).join(".")) input_fastq_file: doc: Query input FASTQ file. type: File inputBinding: position: 10 m: doc: 'Suppress all alignments if > exist (def: 1)' type: int default: 1 inputBinding: prefix: -m position: 7 nthreads: doc: ' number of alignment threads to launch (default: 1)' type: int default: 1 inputBinding: prefix: --threads position: 8 output_filename: type: string sam: doc: 'Write hits in SAM format (default: BAM)' type: boolean default: true inputBinding: prefix: --sam position: 2 seedlen: doc: 'seed length for -n (default: 28)' type: int? inputBinding: prefix: --seedlen position: 1 seedmms: doc: 'max mismatches in seed (between [0, 3], default: 2)' type: int? inputBinding: prefix: --seedmms position: 1 strata: doc: Hits in sub-optimal strata aren't reported (requires --best) type: boolean default: true inputBinding: prefix: --strata position: 6 t: doc: Print wall-clock time taken by search phases type: boolean default: true inputBinding: prefix: -t position: 1 trim3: doc: trim bases from 3' (right) end of reads type: int? inputBinding: prefix: --trim3 position: 1 trim5: doc: trim bases from 5' (left) end of reads type: int? 
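# Sketch of the generated command (editorial; filenames are hypothetical, -v 2 and
# -X 2000 come from the valueFrom overrides in the bowtie-se step above):
#   bowtie -t --sam -v 2 -X 2000 --best --strata -m 1 --threads 1 \
#     genome_index_basename sample.trimmed.fastq sample.trimmed.sam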
inputBinding: prefix: --trim5 position: 1 v: doc: Report end-to-end hits w/ <=v mismatches; ignore qualities type: int? inputBinding: prefix: -v position: 3 outputs: output_aligned_file: doc: Aligned bowtie file in [SAM|BAM] format. type: File outputBinding: glob: $(inputs.output_filename + '.sam') output_bowtie_log: type: File outputBinding: glob: $(inputs.output_filename + '.bowtie.log') stderr: $(inputs.output_filename + '.bowtie.log') baseCommand: bowtie arguments: - position: 11 valueFrom: $(inputs.output_filename + '.sam') hints: DockerRequirement: dockerPull: dukegcb/bowtie out: - output_aligned_file - output_bowtie_log execute_pcr_bottleneck_coef: in: genome_sizes: genome_sizes_file input_bam_files: filtered2sorted/sorted_file input_output_filenames: extract_basename_2/output_path run: cwlVersion: v1.0 class: Workflow doc: ChIP-seq - map - PCR Bottleneck Coefficients requirements: - class: ScatterFeatureRequirement inputs: genome_sizes: type: File input_bam_files: type: File[] input_output_filenames: type: string[] outputs: pbc_file: type: File[] outputSource: compute_pbc/pbc steps: bedtools_genomecov: in: bg: default: true g: genome_sizes ibam: input_bam_files scatter: ibam run: cwlVersion: v1.0 class: CommandLineTool doc: |- Tool: bedtools genomecov (aka genomeCoverageBed) Version: v2.25.0 Summary: Compute the coverage of a feature file among a genome. Usage: bedtools genomecov [OPTIONS] -i -g Options: -ibam The input file is in BAM format. Note: BAM _must_ be sorted by position -d Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. -dz Report the depth at each genome position (with zero-based coordinates). Reports only non-zero positions. Default behavior is to report a histogram. -bg Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html -bga Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. -split Treat "split" BAM or BED12 entries as distinct BED intervals when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). -strand Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - -5 Calculate coverage of 5' positions (instead of entire interval). -3 Calculate coverage of 3' positions (instead of entire interval). -max Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) -scale Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) -trackline Adds a UCSC/Genome-Browser track line definition in the first line of the output.
-trackopts Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) Notes: (1) The genome file should tab delimited and structured as follows: For example, Human (hg19): chr1 249250621 chr2 243199373 ... chr18_gl000207_random 4262 (2) The input BED (-i) file must be grouped by chromosome. A simple "sort -k 1,1 > .sorted" will suffice. (3) The input BAM (-ibam) file must be sorted by position. A "samtools sort " should suffice. Tips: One can use the UCSC Genome Browser's MySQL database to extract chromosome sizes. For example, H. sapiens: mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \ "select chrom, size from hg19.chromInfo" > hg19.genome requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: '3': doc: "\tCalculate coverage of 3\" positions (instead of entire\ \ interval).\n" type: boolean? inputBinding: prefix: '-3' position: 1 '5': doc: "\tCalculate coverage of 5\" positions (instead of entire\ \ interval).\n" type: boolean? inputBinding: prefix: '-5' position: 1 bg: doc: | Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html type: boolean? inputBinding: prefix: -bg position: 1 bga: doc: | Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. type: boolean? inputBinding: prefix: -bga position: 1 d: doc: | Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. type: boolean? inputBinding: prefix: -d position: 1 dz: doc: | Report the depth at each genome position (with zero-based coordinates). Reports only non-zero positions. Default behavior is to report a histogram. type: boolean? inputBinding: prefix: -dz position: 1 g: doc: type: File inputBinding: prefix: -g position: 3 ibam: doc: "\tThe input file is in BAM format.\nNote: BAM _must_ be\ \ sorted by position\n" type: File inputBinding: prefix: -ibam position: 2 max: doc: | Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) type: int? inputBinding: prefix: -max position: 1 scale: doc: | Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) type: float? inputBinding: prefix: -scale position: 1 split: doc: | Treat "split" BAM or BED12 entries as distinct BED intervals. when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). type: boolean? inputBinding: prefix: -split position: 1 strand: doc: | Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - type: string? inputBinding: prefix: -strand position: 1 trackline: doc: | Adds a UCSC/Genome-Browser track line definition in the first line of the output. 
- See here for more details about track line definition: http://genome.ucsc.edu/goldenPath/help/bedgraph.html - NOTE: When adding a trackline definition, the output BedGraph can be easily uploaded to the Genome Browser as a custom track, BUT CAN NOT be converted into a BigWig file (w/o removing the first line). type: boolean? inputBinding: prefix: -trackline position: 1 trackopts: doc: | Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) type: string? inputBinding: prefix: -trackopts position: 1 outputs: output_bedfile: type: File outputBinding: glob: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg') stdout: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg') baseCommand: - bedtools - genomecov out: - output_bedfile compute_pbc: in: bedgraph_file: bedtools_genomecov/output_bedfile output_filename: input_output_filenames scatter: - bedgraph_file - output_filename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: Compute PCR Bottleneck Coefficient from BedGraph file. inputs: bedgraph_file: type: File inputBinding: position: 1 output_filename: type: string outputs: pbc: type: File outputBinding: glob: $(inputs.output_filename + '.PBC.txt') stdout: $(inputs.output_filename + '.PBC.txt') baseCommand: - awk - $4==1 {N1 += $3 - $2}; $4>=1 {Nd += $3 - $2} END {print N1/Nd} out: - pbc out: - pbc_file extract_basename_1: in: input_file: input_fastq_files scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool doc: Extracts the base name of a file requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File inputBinding: position: 1 outputs: output_basename: type: string outputBinding: outputEval: |- $(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1))) baseCommand: echo hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_basename extract_basename_2: in: file_path: extract_basename_1/output_basename scatter: file_path run: cwlVersion: v1.0 class: CommandLineTool doc: Extracts the base name of a file inputs: file_path: type: string inputBinding: position: 1 outputs: output_path: type: string outputBinding: outputEval: $(inputs.file_path.replace(/\.[^/.]+$/, "")) baseCommand: echo hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_path filter-unmapped: in: input_file: sort_bams/sorted_file output_filename: extract_basename_2/output_path scatter: - input_file - output_filename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be filtered with samtools type: File inputBinding: position: 1000 nthreads: doc: Number of threads used in filtering type: int default: 1 inputBinding: prefix: -@ position: 1 output_filename: doc: Basename for the output file type: string outputs: filtered_file: doc: Aligned file with unmapped reads removed type: File outputBinding: glob: $(inputs.output_filename + '.accepted_hits.bam') stdout: $(inputs.output_filename + '.accepted_hits.bam') baseCommand: - samtools - view - -F - '4' - -b - -h hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - filtered_file filtered2sorted: in:
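# Editorial note on compute_pbc above: in a -bg bedGraph, column 4 is the read depth
# of the interval [$2, $3), so the awk program computes PBC = N1/Nd = (genomic bp
# covered by exactly one read) / (bp covered by at least one read); values close to 1
# indicate little PCR duplication, and samples with PBC < 0.5 are flagged upstream.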
input_file: filter-unmapped/filtered_file nthreads: nthreads scatter: - input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - sorted_file index_bams: in: input_file: sort_bams/sorted_file scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InitialWorkDirRequirement: listing: - $(inputs.input_file) InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned BAM file to be indexed with samtools type: File inputBinding: position: 1 outputs: indexed_file: doc: Indexed BAM file type: File secondaryFiles: .bai outputBinding: glob: $(inputs.input_file.basename) baseCommand: - samtools - index arguments: - position: 2 valueFrom: $(inputs.input_file.basename + '.bai') hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - indexed_file index_dedup_bams: in: input_file: sort_dedup_bams/sorted_file scatter: - input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InitialWorkDirRequirement: listing: - $(inputs.input_file) InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned BAM file to be indexed with samtools type: File inputBinding: position: 1 outputs: indexed_file: doc: Indexed BAM file type: File secondaryFiles: .bai outputBinding: glob: $(inputs.input_file.basename) baseCommand: - samtools - index arguments: - position: 2 valueFrom: $(inputs.input_file.basename + '.bai') hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - indexed_file index_dups_marked_bams: in: input_file: sort_dups_marked_bams/sorted_file scatter: - input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InitialWorkDirRequirement: listing: - $(inputs.input_file) InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned BAM file to be indexed with samtools type: File inputBinding: position: 1 outputs: indexed_file: doc: Indexed BAM file type: File secondaryFiles: .bai outputBinding: glob: $(inputs.input_file.basename) baseCommand: - samtools - index arguments: - position: 2 valueFrom: $(inputs.input_file.basename + '.bai') hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - indexed_file index_filtered_bam: in: input_file: filtered2sorted/sorted_file scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InitialWorkDirRequirement: listing: - $(inputs.input_file) InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned BAM file to be indexed with samtools type: File inputBinding: position: 1 outputs: indexed_file: doc: Indexed BAM file type: File secondaryFiles: .bai outputBinding: glob: $(inputs.input_file.basename) baseCommand: - samtools - index arguments: - position: 2 valueFrom: $(inputs.input_file.basename + '.bai') hints: DockerRequirement: dockerPull:
dukegcb/samtools:1.3 out: - indexed_file mapped_filtered_reads_count: in: input_bam_file: sort_dedup_bams/sorted_file output_suffix: valueFrom: .mapped_and_filtered.read_count.txt scatter: input_bam_file run: cwlVersion: v1.0 class: CommandLineTool doc: Extract mapped reads from BAM file using Samtools flagstat command requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: input_bam_file: doc: Aligned BAM file to count reads from type: File inputBinding: position: 1 output_suffix: type: string outputs: output_read_count: doc: Mapped read count extracted from the samtools flagstat report type: File outputBinding: glob: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) stdout: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) baseCommand: - samtools - flagstat arguments: - position: 10000 valueFrom: " | head -n1 | cut -f 1 -d ' '" shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/samtools out: - output_read_count mapped_reads_count: in: bowtie_log: bowtie-se/output_bowtie_log scatter: bowtie_log run: cwlVersion: v1.0 class: CommandLineTool doc: Get number of processed reads from Bowtie log. requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: bowtie_log: type: File inputBinding: {} outputs: output: type: File outputBinding: glob: $(inputs.bowtie_log.path.replace(/^.*[\\\/]/, '') + '.read_count.mapped') stdout: $(inputs.bowtie_log.path.replace(/^.*[\\\/]/, '') + '.read_count.mapped') baseCommand: read-count-from-bowtie-log.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output mark_duplicates: in: input_file: index_filtered_bam/indexed_file java_opts: picard_java_opts output_filename: extract_basename_2/output_path output_suffix: valueFrom: bam picard_jar_path: picard_jar_path scatter: - input_file - output_filename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: barcode_tag: doc: |- Barcode SAM tag (e.g. BC for 10X Genomics). (Default: null). type: string? inputBinding: prefix: BARCODE_TAG= position: 5 separate: false input_file: doc: One or more input SAM or BAM files to analyze. Must be coordinate sorted. type: File inputBinding: position: 4 valueFrom: $('INPUT=' + self.path) shellQuote: false java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? inputBinding: position: 1 shellQuote: false metrics_suffix: doc: 'Suffix used to create the metrics output file (Default: dedup_metrics.txt)' type: string default: dedup_metrics.txt output_filename: doc: Output filename used as basename type: string output_suffix: doc: 'Suffix used to identify the output file (Default: dedup.bam)' type: string default: dedup.bam picard_jar_path: doc: Path to the picard.jar file type: string inputBinding: prefix: -jar position: 2 remove_duplicates: doc: |- If true do not write duplicates to the output file instead of writing them with appropriate flags set. (Default false). type: boolean default: false inputBinding: position: 5 valueFrom: $('REMOVE_DUPLICATES=' + self) outputs: output_dedup_bam_file: type: File outputBinding: glob: $(inputs.output_filename + '.' + inputs.output_suffix) output_metrics_file: type: File outputBinding: glob: $(inputs.output_filename + '.'
+ inputs.metrics_suffix) baseCommand: - java arguments: - position: 3 valueFrom: MarkDuplicates - position: 5 valueFrom: $('OUTPUT=' + inputs.output_filename + '.' + inputs.output_suffix) shellQuote: false - position: 5 valueFrom: $('METRICS_FILE='+inputs.output_filename + '.' + inputs.metrics_suffix) shellQuote: false - position: 5 valueFrom: $('TMP_DIR='+runtime.tmpdir) shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/picard out: - output_metrics_file - output_dedup_bam_file percent_mitochondrial_reads: in: chrom: valueFrom: chrM idxstats: bam_idxstats/idxstats_file output_filename: valueFrom: |- ${return inputs.idxstats.basename.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '').replace(/\.[^/.]+$/, '').replace(/\.[^/.]+$/, '.mitochondrial_percentage.txt')} scatter: idxstats run: cwlVersion: v1.0 class: ExpressionTool requirements: InlineJavascriptRequirement: {} inputs: chrom: doc: Query chromosome used to calculate percentage type: string idxstats: doc: Samtools idxstats file type: File inputBinding: loadContents: true output_filename: doc: Save the percentage in a file of the given name type: string? outputs: percent_map: type: - File - string expression: | ${ var regExp = new RegExp(inputs.chrom + "\\s\\d+\\s(\\d+)\\s(\\d+)"); var match = inputs.idxstats.contents.match(regExp); if (match){ var chrom_mapped_reads = match[1]; var total_reads = inputs.idxstats.contents.split("\n") .map(function(x){ var rr = x.match(/.*\s\d+\s(\d+)\s\d+/); return (rr ? rr[1] : 0); }) .reduce(function(a, b) { return Number(a) + Number(b); }); var output = (100*chrom_mapped_reads/total_reads).toFixed(4) + "%" + "\n"; if (inputs.output_filename){ return { percent_map : { "class": "File", "basename" : inputs.output_filename, "contents" : output, } } } return output; } } out: - percent_map percent_uniq_reads: in: preseq_c_curve_outfile: preseq-c-curve/output_file scatter: preseq_c_curve_outfile run: cwlVersion: v1.0 class: CommandLineTool doc: Compute the percentage of unique reads from a preseq c_curve output file.
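# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# With the argument bindings above, the mark_duplicates step renders a Picard
# command along these lines. The java_opts value and all file paths are
# hypothetical; the step passes output_suffix=bam, and metrics_suffix
# defaults to dedup_metrics.txt:
#
#   java -Xmx512m -jar /opt/picard.jar MarkDuplicates \
#     INPUT=/path/sample.accepted_hits.sorted.bam OUTPUT=sample.bam \
#     METRICS_FILE=sample.dedup_metrics.txt REMOVE_DUPLICATES=false \
#     TMP_DIR=$TMPDIR
# ----------------------------------------------------------------------------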
requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: preseq_c_curve_outfile: type: File inputBinding: {} outputs: output: type: File outputBinding: glob: |- $(inputs.preseq_c_curve_outfile.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + '.percentage_unique_reads.txt') stdout: |- $(inputs.preseq_c_curve_outfile.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + '.percentage_unique_reads.txt') baseCommand: percent-uniq-reads-from-preseq.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output preseq-c-curve: in: input_sorted_file: filtered2sorted/sorted_file output_file_basename: extract_basename_2/output_path scatter: - input_sorted_file - output_file_basename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: |- Usage: c_curve [OPTIONS] Options: -o, -output yield output file (default: stdout) -s, -step step size in extrapolations (default: 1e+06) -v, -verbose print more information -P, -pe input is paired end read file -H, -hist input is a text file containing the observed histogram -V, -vals input is a text file containing only the observed counts -B, -bam input is in BAM format -l, -seg_len maximum segment length when merging paired end bam reads (default: 5000) Help options: -?, -help print this help message -about print about message requirements: InlineJavascriptRequirement: {} inputs: B: doc: "-bam input is in BAM format \n" type: boolean default: true inputBinding: prefix: -B position: 1 H: doc: "-hist input is a text file containing the observed histogram\ \ \n" type: File? inputBinding: prefix: -H position: 1 V: doc: "-vals input is a text file containing only the observed\ \ counts \n" type: File? inputBinding: prefix: -V position: 1 input_sorted_file: doc: Sorted bed or BAM file type: File inputBinding: position: 2 l: doc: | -seg_len maximum segment length when merging paired end bam reads (default: 5000) Help options: -?, -help print this help message -about print about message type: int? inputBinding: prefix: -l position: 1 output_file_basename: type: string pe: doc: "-pe input is paired end read file \n" type: boolean? inputBinding: prefix: -P position: 1 s: doc: "-step step size in extrapolations (default: 1e+06) \n" type: float? inputBinding: prefix: -s position: 1 v: doc: "-verbose print more information \n" type: boolean default: false inputBinding: prefix: -v position: 1 outputs: output_file: type: File outputBinding: glob: $(inputs.output_file_basename + '.preseq_c_curve.txt') stdout: $(inputs.output_file_basename + '.preseq_c_curve.txt') baseCommand: - preseq - c_curve hints: DockerRequirement: dockerPull: reddylab/preseq:2.0 out: - output_file remove_duplicates: in: F: valueFrom: ${return 1024} b: valueFrom: ${return true} input_file: index_dups_marked_bams/indexed_file outfile_name: valueFrom: ${return inputs.input_file.basename.replace('dups_marked', 'dedup')} suffix: valueFrom: .dedup.bam scatter: - input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: F: doc: only include reads with none of the bits set in INT set in FLAG [0] type: int? inputBinding: prefix: -F position: 1 L: doc: FILE only include reads overlapping this BED FILE [null] type: File? inputBinding: prefix: -L position: 1 S: doc: Input format autodetected type: boolean default: true inputBinding: prefix: -S position: 1 b: doc: output BAM type: boolean? 
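# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# The preseq-c-curve tool above runs with B defaulting to true and v to
# false, so the effective command is roughly the following ("sample" is a
# hypothetical basename; stdout is captured via the stdout field):
#
#   preseq c_curve -B /path/sample.accepted_hits.sorted.bam \
#     > sample.preseq_c_curve.txt
# ----------------------------------------------------------------------------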
inputBinding: prefix: -b position: 1 f: doc: only include reads with all bits set in INT set in FLAG [0] type: int? inputBinding: prefix: -f position: 1 header: doc: Include header in output type: boolean? inputBinding: prefix: -h position: 1 input_file: doc: File to be converted to BAM with samtools type: File inputBinding: position: 2 nthreads: doc: Number of threads used type: int default: 1 inputBinding: prefix: -@ position: 1 outfile_name: doc: |- Output file name. If not specified, the basename of the input file with the suffix specified in the suffix argument will be used. type: string? q: doc: only include reads with mapping quality >= INT [0] type: int? inputBinding: prefix: -q position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string? u: doc: uncompressed BAM output (implies -b) type: boolean default: true inputBinding: prefix: -u position: 1 outputs: outfile: doc: Aligned file in SAM or BAM format type: File outputBinding: glob: | ${ if (inputs.outfile_name) return inputs.outfile_name; var suffix = inputs.b ? '.bam' : '.sam'; suffix = inputs.suffix || suffix; return inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + suffix } stdout: | ${ if (inputs.outfile_name) return inputs.outfile_name; var suffix = inputs.b ? '.bam' : '.sam'; suffix = inputs.suffix || suffix; return inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + suffix } baseCommand: - samtools - view hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - outfile sam2bam: in: input_file: bowtie-se/output_aligned_file nthreads: nthreads scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: S: doc: Input format autodetected type: boolean default: true inputBinding: prefix: -S position: 1 input_file: doc: File to be converted to BAM with samtools type: File inputBinding: position: 2 nthreads: doc: Number of threads used type: int default: 1 inputBinding: prefix: -@ position: 1 outputs: bam_file: doc: Aligned file in BAM format type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bam') stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bam') baseCommand: - samtools - view - -b hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - bam_file sort_bams: in: input_file: sam2bam/bam_file nthreads: nthreads scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. 
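# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# The remove_duplicates step above drops Picard-flagged duplicates (SAM flag
# 1024) with samtools view. Given the step values (F=1024, b=true) and the
# tool defaults (S and u true, nthreads 1), the rendered command is roughly
# the following; flag order may differ, and the file names are hypothetical:
#
#   samtools view -F 1024 -S -b -u -@ 1 /path/sample.dups_marked.bam \
#     > sample.dedup.bam
# ----------------------------------------------------------------------------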
.filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - sorted_file sort_dedup_bams: in: input_file: remove_duplicates/outfile nthreads: nthreads scatter: - input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - sorted_file sort_dups_marked_bams: in: input_file: mark_duplicates/output_dedup_bam_file nthreads: nthreads suffix: valueFrom: .dups_marked.bam scatter: - input_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: input_file: doc: Aligned file to be sorted with samtools type: File inputBinding: position: 1000 n: doc: Sort by read name type: boolean default: false inputBinding: prefix: -n position: 1 nthreads: doc: Number of threads used in sorting type: int default: 1 inputBinding: prefix: -@ position: 1 suffix: doc: suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam) type: string default: .sorted.bam outputs: sorted_file: doc: Sorted aligned file type: File outputBinding: glob: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) stdout: |- $(inputs.input_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + inputs.suffix) baseCommand: - samtools - sort hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - sorted_file out: - output_data_sorted_dedup_bam_files - output_data_sorted_dups_marked_bam_files - output_picard_mark_duplicates_files - output_pbc_files - output_bowtie_log - output_preseq_c_curve_files - output_percentage_uniq_reads - output_read_count_mapped - output_percent_mitochondrial_reads peak_call: in: as_narrowPeak_file: as_narrowPeak_file genome_effective_size: genome_effective_size input_bam_files: map/output_data_sorted_dedup_bam_files input_bam_format: valueFrom: BAM input_genome_sizes: genome_sizes_file nthreads: nthreads_peakcall run: cwlVersion: v1.0 class: Workflow doc: ATAC-seq 04 quantification - SE requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: as_narrowPeak_file: doc: Definition narrowPeak file in AutoSql format (used in bedToBigBed) type: File genome_effective_size: doc: |- Effective genome size used by MACS2. It can be numeric or a shortcuts:'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. 
elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs type: string default: hs input_bam_files: type: File[] input_genome_sizes: doc: Two column tab-delimited file with chromosome size information type: File nthreads: type: int default: 1 outputs: output_extended_peak_file: doc: peakshift/phantomPeak extended fragment results file type: File[] outputSource: peak-calling/output_ext_frag_bdg_file output_filtered_read_count_file: doc: Filtered read count reported by MACS2 type: File[] outputSource: count-reads-filtered/read_count_file output_peak_bigbed_file: doc: Peaks in bigBed format type: File[] outputSource: peaks-bed-to-bigbed/bigbed output_peak_count_within_replicate: doc: Peak counts within replicate type: File[] outputSource: count-peaks/output_counts output_peak_file: doc: peakshift/phantomPeak results file type: File[] outputSource: peak-calling/output_peak_file output_peak_summits_file: doc: File containing peak summits type: File[] outputSource: peak-calling/output_peak_summits_file output_peak_xls_file: doc: Peak calling report file (*_peaks.xls file produced by MACS2) type: File[] outputSource: peak-calling/output_peak_xls_file output_read_in_peak_count_within_replicate: doc: Reads peak counts within replicate type: File[] outputSource: extract-count-reads-in-peaks/output_read_count output_spp_cross_corr_plot: doc: peakshift/phantomPeak results file type: File[] outputSource: spp/output_spp_cross_corr_plot output_spp_x_cross_corr: doc: peakshift/phantomPeak results file type: File[] outputSource: spp/output_spp_cross_corr steps: count-peaks: in: input_file: peak-calling/output_peak_file output_suffix: valueFrom: .peak_count.within_replicate.txt scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool doc: Counts lines in a file and returns a suffixed file with that number requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File output_suffix: type: string default: .count outputs: output_counts: type: File outputBinding: glob: $(inputs.input_file.path.replace(/^.*[\\\/]/, '') + inputs.output_suffix) stdout: $(inputs.input_file.path.replace(/^.*[\\\/]/, '') + inputs.output_suffix) baseCommand: - wc - -l stdin: $(inputs.input_file.path) out: - output_counts count-reads-filtered: in: peak_xls_file: peak-calling/output_peak_xls_file scatter: peak_xls_file run: cwlVersion: v1.0 class: CommandLineTool doc: Count number of dedup-ed reads used in peak calling requirements: InlineJavascriptRequirement: {} inputs: peak_xls_file: type: File inputBinding: position: 1 outputs: read_count_file: type: File outputBinding: glob: |- $(inputs.peak_xls_file.path.replace(/^.*[\\\/]/, '').replace(/\_peaks\.xls$/, '_read_count.txt')) stdout: |- $(inputs.peak_xls_file.path.replace(/^.*[\\\/]/, '').replace(/\_peaks\.xls$/, '_read_count.txt')) baseCommand: count-filtered-reads-macs2.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - read_count_file extract-count-reads-in-peaks: in: input_bam_file: filter-reads-in-peaks/filtered_file output_suffix: valueFrom: .read_count.within_replicate.txt scatter: input_bam_file run: cwlVersion: v1.0 class: CommandLineTool doc: Extract mapped reads from BAM file using Samtools flagstat command requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: input_bam_file: doc: Aligned BAM file to filter type: File inputBinding: position: 1 output_suffix: type: string outputs: output_read_count: doc: Samtools Flagstat report file type: File outputBinding: glob: |- 
$(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) stdout: |- $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) baseCommand: - samtools - flagstat arguments: - position: 10000 valueFrom: " | head -n1 | cut -f 1 -d ' '" shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/samtools out: - output_read_count extract-peak-frag-length: in: input_spp_txt_file: spp/output_spp_cross_corr scatter: input_spp_txt_file run: cwlVersion: v1.0 class: CommandLineTool doc: Extracts best fragment length from SPP output text file inputs: input_spp_txt_file: type: File inputBinding: position: 1 outputs: output_best_frag_length: type: float outputBinding: glob: best_frag_length outputEval: $(Number(self[0].contents.replace('\n', ''))) loadContents: true stdout: best_frag_length baseCommand: extract-best-frag-length.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_best_frag_length filter-reads-in-peaks: in: input_bam_file: input_bam_files input_bedfile: peak-calling/output_peak_file scatter: - input_bam_file - input_bedfile scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: Filter BAM file to only include reads overlapping with a BED file requirements: InlineJavascriptRequirement: {} inputs: input_bam_file: doc: Aligned BAM file to filter type: File inputBinding: position: 3 input_bedfile: doc: Bedfile used to only include reads overlapping this BED FILE type: File inputBinding: prefix: -L position: 2 outputs: filtered_file: doc: Filtered aligned BAM file by BED coordinates file type: File outputBinding: glob: $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '') + '.in_peaks.bam') stdout: $(inputs.input_bam_file.path.replace(/^.*[\\\/]/, '') + '.in_peaks.bam') baseCommand: - samtools - view - -b - -h hints: DockerRequirement: dockerPull: dukegcb/samtools:1.3 out: - filtered_file peak-calling: in: format: valueFrom: BAM bdg: valueFrom: ${return true} extsize: valueFrom: ${return 200} g: genome_effective_size nomodel: valueFrom: ${return true} q: valueFrom: ${return 0.1} shift: valueFrom: ${return -100} treatment: valueFrom: $([self]) source: input_bam_files scatter: - treatment scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: format: doc: |- -f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE} Format of tag file, "AUTO", "BED" or "ELAND" or "ELANDMULTI" or "ELANDEXPORT" or "SAM" or "BAM" or "BOWTIE" or "BAMPE". The default AUTO option will let MACS decide which format the file is. Note that MACS can't detect "BAMPE" or "BEDPE" format with "AUTO", and you have to implicitly specify the format for "BAMPE" and "BEDPE". DEFAULT: "AUTO". type: string? inputBinding: prefix: -f position: 1 SPMR: doc: |- If True, MACS will save signal per million reads for fragment pileup profiles. Require --bdg to be set. Default: False type: boolean? inputBinding: prefix: --SPMR position: 1 bdg: doc: |- Whether or not to save extended fragment pileup, and local lambda tracks (two files) at every bp into a bedGraph file. DEFAULT: True type: boolean? inputBinding: prefix: --bdg position: 1 broad: doc: |- If set, MACS will try to call broad peaks by linking nearby highly enriched regions. The linking region is controlled by another cutoff through --linking-cutoff. 
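# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# The flagstat-based counters above (mapped_filtered_reads_count and
# extract-count-reads-in-peaks) share one shell pattern: keep only the
# leading number of the first samtools flagstat line. Roughly ("sample" is a
# hypothetical basename):
#
#   samtools flagstat sample.bam | head -n1 | cut -f 1 -d ' ' \
#     > sample.read_count.within_replicate.txt
# ----------------------------------------------------------------------------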
The maximum linking region length is 4 times of d from MACS. DEFAULT: False type: boolean? inputBinding: prefix: --broad position: 1 broad-cutoff: doc: | BROADCUTOFF Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a pvalue cutoff, otherwise, it's a qvalue cutoff. DEFAULT: 0.1 type: float? inputBinding: prefix: --broad-cutoff position: 1 buffer-size: doc: | BUFFER_SIZE Buffer size for incrementally increasing internal array size to store reads alignment information. In most cases, you don't have to change this parameter. However, if there are large number of chromosomes/contigs/scaffolds in your alignment, it's recommended to specify a smaller buffer size in order to decrease memory usage (but it will take longer time to read alignment files). Minimum memory requested for reading an alignment file is about # of CHROMOSOME * BUFFER_SIZE * 2 Bytes. DEFAULT: 100000 type: int? inputBinding: prefix: --buffer-size position: 1 bw: doc: | BW Band width for picking regions to compute fragment size. This value is only used while building the shifting model. DEFAULT: 300 type: int? inputBinding: prefix: --bw position: 1 call-summits: doc: |- If set, MACS will use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region. DEFAULT: False type: boolean? inputBinding: prefix: --call-summits position: 1 control: doc: Control sample file. type: File? inputBinding: prefix: --control position: 2 cutoff-analysis: doc: |- While set, MACS2 will analyze number or total length of peaks that can be called by different p-value cutoff then output a summary table to help user decide a better cutoff. The table will be saved in NAME_cutoff_analysis.txt file. Note, minlen and maxgap may affect the results. WARNING: May take ~30 folds longer time to finish. DEFAULT: False Post-processing options: type: boolean? inputBinding: prefix: --cutoff-analysis position: 1 down-sample: doc: |- When set, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. Warning: This option will make your result unstable and irreproducible since each time, random reads would be selected. Consider to use 'randsample' script instead. If used together with --SPMR, 1 million unique reads will be randomly picked. Caution: due to the implementation, the final number of selected reads may not be as you expected! DEFAULT: False type: boolean? inputBinding: prefix: --down-sample position: 1 extsize: doc: |- The arbitrary extension size in bp. When nomodel is true, MACS will use this value as fragment size to extend each read towards 3' end, then pile them up. It's exactly twice the number of obsolete SHIFTSIZE. In previous language, each read is moved 5'->3' direction to middle of fragment by 1/2 d, then extended to both direction with 1/2 d. This is equivalent to say each read is extended towards 5'->3' into a d size fragment. DEFAULT: 200. EXTSIZE and SHIFT can be combined when necessary. Check SHIFT option. type: float? inputBinding: prefix: --extsize position: 1 fe-cutoff: doc: | FECUTOFF When set, the value will be used to filter out peaks with low fold-enrichment. Note, MACS2 use 1.0 as pseudocount while calculating fold-enrichment. DEFAULT: 1.0 type: float? inputBinding: prefix: --fe-cutoff position: 1 fix-bimodal: doc: |- Whether turn on the auto pair model process. If set, when MACS failed to build paired model, it will use the nomodel settings, the --exsize parameter to extend each tags towards 3' direction. 
Not using this automatic fixation is now the default behavior. DEFAULT: False type: boolean? inputBinding: prefix: --fix-bimodal position: 1 g: doc: |- Effective genome size. It can be 1.0e+9 or 1000000000, or shortcuts:'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs. type: string? inputBinding: prefix: -g position: 1 keep-dup: doc: | KEEPDUPLICATES It controls the MACS behavior towards duplicate tags at the exact same location -- the same coordinates and the same strand. The 'auto' option makes MACS calculate the maximum tags at the exact same location based on binomial distribution using 1e-5 as pvalue cutoff; and the 'all' option keeps every tag. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1 type: string? inputBinding: prefix: --keep-dup position: 1 llocal: doc: | LARGELOCAL The large nearby region in basepairs to calculate dynamic lambda. This is used to capture the surrounding bias. If you set this to 0, MACS will skip llocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias should be the maximum of the lambda value from d, slocal, and llocal size windows. DEFAULT: 10000. type: int? inputBinding: prefix: --llocal position: 1 m: doc: | MFOLD MFOLD, --mfold MFOLD MFOLD Select the regions within MFOLD range of high-confidence enrichment ratio against background to build model. Fold-enrichment in regions must be lower than upper limit, and higher than the lower limit. Use as "-m 10 30". DEFAULT:5 50 type: string? inputBinding: prefix: -m position: 1 nolambda: doc: |- If True, MACS will use fixed background lambda as local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure. type: boolean? inputBinding: prefix: --nolambda position: 1 nomodel: doc: |- Whether or not to build the shifting model. If True, MACS will not build the model. By default it means shifting size = 100, try to set extsize to change it. DEFAULT: False type: boolean? inputBinding: prefix: --nomodel position: 1 p: doc: |- Pvalue cutoff for peak detection. DEFAULT: not set. -q, and -p are mutually exclusive. If pvalue cutoff is set, qvalue will not be calculated and reported as -1 in the final .xls file. type: float? inputBinding: prefix: -p position: 1 q: doc: |- Minimum FDR (q-value) cutoff for peak detection. DEFAULT: 0.05. -q, and -p are mutually exclusive. type: float? inputBinding: prefix: -q position: 1 ratio: doc: | RATIO When set, use a custom scaling ratio of ChIP/control (e.g. calculated using NCIS) for linear scaling. DEFAULT: ignore type: float? inputBinding: prefix: --ratio position: 1 s: doc: | TSIZE, --tsize TSIZE Tag size. This will override the auto-detected tag size. DEFAULT: Not set type: int? inputBinding: prefix: -s position: 1 seed: doc: | SEED Set the random seed while down sampling data. Must be a non-negative integer in order to be effective. DEFAULT: not set type: int? inputBinding: prefix: --seed position: 1 shift: doc: |- (NOT the legacy --shiftsize option!) The arbitrary shift in bp. Use discretion while setting it other than default value. When NOMODEL is set, MACS will use this value to move cutting ends (5') towards 5'->3' direction then apply EXTSIZE to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction.
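# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# With the fixed step values shown for peak-calling (format=BAM, bdg=true,
# extsize=200, nomodel=true, q=0.1, shift=-100) and the default
# genome_effective_size of hs, the MACS2 invocation is roughly the following;
# flag order may differ, and "sample" is a hypothetical basename:
#
#   macs2 callpeak -f BAM --bdg --extsize 200 --nomodel -q 0.1 --shift -100 \
#     -g hs -n sample --treatment /path/sample.bam
# ----------------------------------------------------------------------------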
Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with EXTSIZE option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE for paired-end data. DEFAULT: 0. type: int? inputBinding: prefix: --shift position: 1 slocal: doc: | SMALLLOCAL The small nearby region in basepairs to calculate dynamic lambda. This is used to capture the bias near the peak summit region. Invalid if there is no control data. If you set this to 0, MACS will skip slocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The final local bias should be the maximum of the lambda value from d, slocal, and llocal size windows. DEFAULT: 1000 type: int? inputBinding: prefix: --slocal position: 1 to-large: doc: |- When set, scale the small sample up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead to smaller p/qvalues and more specific results. Keep in mind that scaling down will bring down background noise more. DEFAULT: False type: boolean? inputBinding: prefix: --to-large position: 1 trackline: doc: |- Tells MACS to include trackline with bedGraph files. To include this trackline while displaying bedGraph at UCSC genome browser, can show name and description of the file as well. However my suggestion is to convert bedGraph to bigWig, then show the smaller and faster binary bigWig file at UCSC genome browser, as well as downstream analysis. Require --bdg to be set. Default: Not include trackline. type: boolean? inputBinding: prefix: --trackline position: 1 treatment: doc: |- Treatment sample file(s). If multiple files are given as -t A B C, then they will all be read and pooled together. IMPORTANT: the first sample will be used as the outputs basename. type: File[] inputBinding: prefix: --treatment position: 2 verbose: doc: | VERBOSE_LEVEL Set verbose level of runtime message. 0: only show critical message, 1: show additional warning message, 2: show process information, 3: show debug messages. DEFAULT:2 type: int? inputBinding: prefix: --verbose position: 1 outputs: output_ext_frag_bdg_file: doc: Bedgraph with extended fragment pileup. type: File? outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_treat_pileup.bdg') output_peak_file: doc: Peak calling output file in narrowPeak|broadPeak format. type: File outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_peaks.*Peak') outputEval: $(self[0]) output_peak_summits_file: doc: Peaks summits bedfile. type: File? outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_summits.bed') output_peak_xls_file: doc: Peaks information/report file. type: File outputBinding: glob: |- $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '_peaks.xls') baseCommand: - macs2 - callpeak arguments: - prefix: -n position: 1 valueFrom: $(inputs.treatment[0].path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '')) hints: DockerRequirement: dockerPull: dukegcb/macs2 out: - output_peak_file - output_peak_summits_file - output_ext_frag_bdg_file - output_peak_xls_file peaks-bed-to-bigbed: in: type: valueFrom: bed6+4 as: as_narrowPeak_file bed: trunk-peak-score/trunked_scores_peaks genome_sizes: input_genome_sizes scatter: bed run: cwlVersion: v1.0 class: CommandLineTool doc: | "bedToBigBed v. 
2.7 - Convert bed file to bigBed. (BigBed version: 4) usage: bedToBigBed in.bed chrom.sizes out.bb Where in.bed is in one of the ascii bed formats, but not including track lines and chrom.sizes is two column: and out.bb is the output indexed big bed file. Use the script: fetchChromSizes to obtain the actual chrom.sizes information from UCSC, please do not make up a chrom sizes from your own information. The in.bed file must be sorted by chromosome,start, to sort a bed file, use the unix sort command: sort -k1,1 -k2,2n unsorted.bed > sorted.bed" requirements: InlineJavascriptRequirement: {} inputs: type: doc: | -type=bedN[+[P]] : N is between 3 and 15, optional (+) if extra "bedPlus" fields, optional P specifies the number of extra fields. Not required, but preferred. Examples: -type=bed6 or -type=bed6+ or -type=bed6+3 (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1) type: string? inputBinding: prefix: -type= position: 1 separate: false as: doc: | -as=fields.as - If you have non-standard "bedPlus" fields, it's great to put a definition of each field in a row in AutoSql format here.1) type: File? inputBinding: prefix: -as= position: 1 separate: false bed: doc: Input bed file type: File inputBinding: position: 2 blockSize: doc: "-blockSize=N - Number of items to bundle in r-tree. Default\ \ 256\n" type: int? inputBinding: prefix: -blockSize= position: 1 separate: false extraIndex: doc: | -extraIndex=fieldList - If set, make an index on each field in a comma separated list extraIndex=name and extraIndex=name,id are commonly used. type: - 'null' - type: array items: string inputBinding: prefix: -extraIndex= position: 1 itemSeparator: ',' genome_sizes: doc: "genome_sizes is two column: .\n" type: File inputBinding: position: 3 itemsPerSlot: doc: "-itemsPerSlot=N - Number of data points bundled at lowest level.\ \ Default 512\n" type: int? inputBinding: prefix: -itemsPerSlot= position: 1 separate: false output_suffix: type: string default: .bb tab: doc: | -tab - If set, expect fields to be tab separated, normally expects white space separator. type: boolean? inputBinding: position: 1 unc: doc: "-unc - If set, do not use compression.\n" type: boolean? inputBinding: position: 1 outputs: bigbed: type: File outputBinding: glob: $(inputs.bed.path.replace(/^.*[\\\/]/, '')+ inputs.output_suffix) baseCommand: bedToBigBed arguments: - position: 4 valueFrom: $(inputs.bed.path.replace(/^.*[\\\/]/, '') + inputs.output_suffix) hints: DockerRequirement: dockerPull: dleehr/docker-hubutils out: - bigbed spp: in: input_bam: input_bam_files nthreads: nthreads savp: valueFrom: ${return true} scatter: - input_bam scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: control_bam: doc: |- , full path and name (or URL) of tagAlign/BAM file (can be gzipped) (FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz) type: File? inputBinding: prefix: -i= separate: false fdr: doc: -fdr= , false discovery rate threshold for peak calling type: float? inputBinding: prefix: -fdr= separate: false filtchr: doc: |- -filtchr= , Pattern to use to remove tags that map to specific chromosomes e.g. _ will remove all tags that map to chromosomes with _ in their name type: string? 
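# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# The peaks-bed-to-bigbed step above calls bedToBigBed with type=bed6+4 and
# the narrowPeak AutoSql definition, writing <bed basename>.bb. Roughly (all
# file names are hypothetical):
#
#   bedToBigBed -type=bed6+4 -as=narrowPeak.as \
#     sample_peaks.trunked_scores.narrowPeak chrom.sizes \
#     sample_peaks.trunked_scores.narrowPeak.bb
# ----------------------------------------------------------------------------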
inputBinding: prefix: -filtchr= separate: false input_bam: doc: |- , full path and name (or URL) of tagAlign/BAM file (can be gzipped)(FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz) type: File inputBinding: prefix: -c= separate: false npeak: doc: -npeak=, threshold on number of peaks to call type: int? inputBinding: prefix: -npeak= separate: false nthreads: doc: -p= , number of parallel processing nodes, default=0 type: int? inputBinding: prefix: -p= separate: false rf: doc: 'overwrite (force remove) output files in case they exist. Default: true' type: boolean default: true inputBinding: prefix: -rf s: doc: |- -s=:: , strand shifts at which cross-correlation is evaluated, default=-500:5:1500 type: string? inputBinding: prefix: -s= separate: false savd: doc: -savd= OR -savd, save Rdata file type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savd=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.Rdata"; return null} savn: doc: -savn= OR -savn NarrowPeak file name (fixed width peaks) type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savn=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.narrowPeak"; return null} savp: doc: save cross-correlation plot type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savp=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.pdf"; return null} savr: doc: |- -savr= OR -savr RegionPeak file name (variable width peaks with regions of enrichment) type: boolean? inputBinding: valueFrom: |- ${ if (self) return "-savr=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.regionPeak"; return null} speak: doc: -speak=, user-defined cross-correlation peak strandshift type: string? inputBinding: prefix: -speak= separate: false x: doc: |- -x=:, strand shifts to exclude (This is mainly to avoid region around phantom peak) default=10:(readlen+10) type: string? inputBinding: prefix: -x= separate: false outputs: output_spp_cross_corr: doc: peakshift/phantomPeak results summary file type: File outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.txt") output_spp_cross_corr_plot: doc: peakshift/phantomPeak results summary plot type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.pdf") output_spp_narrow_peak: doc: narrowPeak output file type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.narrowPeak") output_spp_rdata: doc: Rdata file from the run_spp.R run type: File? outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.Rdata") output_spp_region_peak: doc: regionPeak output file type: File? 
outputBinding: glob: |- $(inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp.regionPeak") baseCommand: run_spp.R arguments: - valueFrom: |- $("-out=" + inputs.input_bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".spp_cross_corr.txt") - valueFrom: $("-tmpdir="+runtime.tmpdir) shellQuote: false hints: DockerRequirement: dockerPull: dukegcb/spp out: - output_spp_cross_corr - output_spp_cross_corr_plot trunk-peak-score: in: peaks: peak-calling/output_peak_file scatter: peaks run: cwlVersion: v1.0 class: CommandLineTool doc: Truncate scores in ENCODE bed6+4 files inputs: peaks: type: File inputBinding: position: 10000 sep: type: string default: \t inputBinding: prefix: -F position: 2 outputs: trunked_scores_peaks: type: File outputBinding: glob: |- $(inputs.peaks.path.replace(/^.*[\\\/]/, '').replace(/\.([^/.]+)$/, "\.trunked_scores\.$1")) stdout: |- $(inputs.peaks.path.replace(/^.*[\\\/]/, '').replace(/\.([^/.]+)$/, "\.trunked_scores\.$1")) baseCommand: awk arguments: - position: 3 valueFrom: BEGIN{OFS=FS}$5>1000{$5=1000}{print} hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - trunked_scores_peaks out: - output_spp_x_cross_corr - output_spp_cross_corr_plot - output_read_in_peak_count_within_replicate - output_peak_file - output_peak_bigbed_file - output_peak_summits_file - output_extended_peak_file - output_peak_xls_file - output_filtered_read_count_file - output_peak_count_within_replicate qc: in: default_adapters_file: default_adapters_file input_fastq_files: input_fastq_files nthreads: nthreads_qc run: cwlVersion: v1.0 class: Workflow doc: 'ATAC-seq 01 QC - reads: SE' requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: default_adapters_file: doc: Adapters file type: File input_fastq_files: doc: Input fastq files type: File[] nthreads: doc: Number of threads.
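# --- Illustrative sketch (editor's annotation, not part of the workflow) ---
# The trunk-peak-score tool above caps the narrowPeak score column ($5) at
# 1000, as bedToBigBed requires. With the default tab separator, the rendered
# command is roughly ("sample_peaks" is a hypothetical basename):
#
#   awk -F '\t' 'BEGIN{OFS=FS}$5>1000{$5=1000}{print}' \
#     sample_peaks.narrowPeak > sample_peaks.trunked_scores.narrowPeak
# ----------------------------------------------------------------------------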
type: int outputs: output_count_raw_reads: type: File[] outputSource: count_raw_reads/output_read_count output_custom_adapters: type: File[] outputSource: overrepresented_sequence_extract/output_custom_adapters output_diff_counts: type: File[] outputSource: compare_read_counts/result output_fastqc_data_files: doc: FastQC data files type: File[] outputSource: extract_fastqc_data/output_fastqc_data_file output_fastqc_report_files: doc: FastQC reports in zip format type: File[] outputSource: fastqc/output_qc_report_file steps: compare_read_counts: in: file1: count_raw_reads/output_read_count file2: count_fastqc_reads/output_fastqc_read_count scatter: - file1 - file2 scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: Compares 2 files inputs: brief: type: boolean default: true inputBinding: prefix: --brief position: 3 file1: type: File inputBinding: position: 1 file2: type: File inputBinding: position: 2 outputs: result: type: File outputBinding: glob: stdout.txt stdout: stdout.txt baseCommand: diff hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - result count_fastqc_reads: in: input_basename: extract_basename/output_basename input_fastqc_data: extract_fastqc_data/output_fastqc_data_file scatter: - input_fastqc_data - input_basename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: Extracts read count from fastqc_data.txt inputs: input_basename: type: string input_fastqc_data: type: File inputBinding: position: 1 outputs: output_fastqc_read_count: type: File outputBinding: glob: $(inputs.input_basename + '.fastqc-read_count.txt') stdout: $(inputs.input_basename + '.fastqc-read_count.txt') baseCommand: count-fastqc_data-reads.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_fastqc_read_count count_raw_reads: in: input_basename: extract_basename/output_basename input_fastq_file: input_fastq_files scatter: - input_fastq_file - input_basename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: Counts reads in a fastq file requirements: InlineJavascriptRequirement: {} inputs: input_basename: type: string input_fastq_file: type: File inputBinding: position: 1 outputs: output_read_count: type: File outputBinding: glob: $(inputs.input_basename + '.read_count.txt') stdout: $(inputs.input_basename + '.read_count.txt') baseCommand: count-fastq-reads.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_read_count extract_basename: in: input_file: input_fastq_files scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool doc: Extracts the base name of a file requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File inputBinding: position: 1 outputs: output_basename: type: string outputBinding: outputEval: |- $(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1))) baseCommand: echo hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_basename extract_fastqc_data: in: input_basename: extract_basename/output_basename input_qc_report_file: fastqc/output_qc_report_file scatter: - input_qc_report_file - input_basename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: |- Unzips a zipped fastqc report and returns the fastqc_data.txt file. 
Unzips the file to pipe and uses redirection inputs: extract_pattern: type: string default: '*/fastqc_data.txt' inputBinding: position: 3 input_basename: type: string input_qc_report_file: type: File inputBinding: position: 2 pipe: type: boolean default: true inputBinding: prefix: -p position: 1 outputs: output_fastqc_data_file: type: File outputBinding: glob: $(inputs.input_basename + '.fastqc_data.txt') stdout: $(inputs.input_basename + '.fastqc_data.txt') baseCommand: unzip hints: DockerRequirement: dockerPull: dukegcb/fastqc out: - output_fastqc_data_file fastqc: in: input_fastq_file: input_fastq_files threads: nthreads scatter: input_fastq_file run: cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: format: type: string default: fastq inputBinding: prefix: --format position: 3 input_fastq_file: type: File inputBinding: position: 4 noextract: type: boolean default: true inputBinding: prefix: --noextract position: 2 threads: type: int default: 1 inputBinding: prefix: --threads position: 5 outputs: output_qc_report_file: type: File outputBinding: glob: |- $(inputs.input_fastq_file.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, '') + "_fastqc.zip") baseCommand: fastqc arguments: - prefix: --dir position: 5 valueFrom: $(runtime.tmpdir) - prefix: -o position: 5 valueFrom: $(runtime.outdir) hints: DockerRequirement: dockerPull: dukegcb/fastqc out: - output_qc_report_file overrepresented_sequence_extract: in: default_adapters_file: default_adapters_file input_basename: extract_basename/output_basename input_fastqc_data: extract_fastqc_data/output_fastqc_data_file scatter: - input_fastqc_data - input_basename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool inputs: default_adapters_file: doc: Adapters file in fasta format type: File inputBinding: position: 2 input_basename: doc: Name of the sample - used as a base name for generating output files type: string input_fastqc_data: doc: fastqc_data.txt file from a fastqc report type: File inputBinding: position: 1 outputs: output_custom_adapters: type: File outputBinding: glob: $(inputs.input_basename + '.custom_adapters.fasta') baseCommand: overrepresented_sequence_extract.py arguments: - position: 3 valueFrom: $(inputs.input_basename + '.custom_adapters.fasta') hints: DockerRequirement: dockerPull: reddylab/overrepresented_sequence_extract:1.0 out: - output_custom_adapters out: - output_count_raw_reads - output_diff_counts - output_fastqc_report_files - output_fastqc_data_files - output_custom_adapters quant: in: input_bam_files: map/output_data_sorted_dedup_bam_files input_genome_sizes: genome_sizes_file nthreads: nthreads_quant run: cwlVersion: v1.0 class: Workflow doc: ATAC-seq - Quantification requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: input_bam_files: type: File[] input_genome_sizes: type: File nthreads: type: int default: 1 outputs: bigwig_norm_files: doc: signal files of pileup reads in RPKM type: File[] outputSource: bamcoverage/output_bam_coverage bigwig_raw_files: doc: Raw reads bigWig (signal) files type: File[] outputSource: bdg2bw-raw/output_bigwig steps: bamcoverage: in: bam: input_bam_files binSize: valueFrom: ${return 1} extendReads: valueFrom: ${return 200} normalizeUsing: valueFrom: RPKM numberOfProcessors: nthreads output_suffix: valueFrom: .rpkm.bw scatter: bam run: cwlVersion: v1.0 class: CommandLineTool doc: | usage: An example usage is:$ bamCoverage -b reads.bam -o 
coverage.bw This tool takes an alignment of reads or fragments as input (BAM file) and generates a coverage track (bigWig or bedGraph) as output. The coverage is calculated as the number of reads per bin, where bins are short consecutive counting windows of a defined size. It is possible to extend the length of the reads to better reflect the actual fragment length. *bamCoverage* offers normalization by scaling factor, Reads Per Kilobase per Million mapped reads (RPKM), and 1x depth (reads per genome coverage, RPGC). Required arguments: --bam BAM file, -b BAM file BAM file to process (default: None) Output: --outFileName FILENAME, -o FILENAME Output file name. (default: None) --outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph} Output file type. Either "bigwig" or "bedgraph". (default: bigwig) Optional arguments: --help, -h show this help message and exit --scaleFactor SCALEFACTOR The computed scaling factor (or 1, if not applicable) is multiplied by this. (default: 1.0) --MNase Determine nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are defined by the two mate reads. Only fragment lengths between 130 - 200 bp are considered to avoid dinucleosomes or other artifacts. *NOTE*: Requires paired-end data. A bin size of 1 is recommended. (default: False) --filterRNAstrand {forward,reverse} Selects RNA-seq reads (single-end or paired-end) in the given strand. (default: None) --version show program's version number and exit --binSize INT bp, -bs INT bp Size of the bins, in bases, for the output of the bigwig/bedgraph file. (default: 50) --region CHR:START:END, -r CHR:START:END Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None) --blackListFileName BED file, -bl BED file A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. (default: None) --numberOfProcessors INT, -p INT Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2) --verbose, -v Set to see processing messages. (default: False) Read coverage normalization options: --normalizeTo1x EFFECTIVE GENOME SIZE LENGTH Report read coverage normalized to 1x sequencing depth (also known as Reads Per Genomic Content (RPGC)). Sequencing depth is defined as: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. To use this option, the effective genome size has to be indicated after the option. The effective genome size is the portion of the genome that is mappable. Large fractions of the genome are stretches of NNNN that should be discarded. Also, if repetitive regions were not included in the mapping of reads, the effective genome size needs to be adjusted accordingly.
Common values are: mm9: 2,150,570,000; hg19:2,451,960,000; dm3:121,400,000 and ce10:93,260,000. See Table 2 of http://www.plosone.org/article/info:doi/10.1371/journal.pone.0030377 or http://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_T1.html for several effective genome sizes. (default: None) --ignoreForNormalization IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...] A list of space-delimited chromosome names containing those chromosomes that should be excluded for computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. A usage example is --ignoreForNormalization chrX chrM. (default: None) --skipNonCoveredRegions, --skipNAs This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zero. The decision to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for example, repetitive regions that should be skipped. (default: False) --smoothLength INT bp The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered. Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None) Read processing options: --extendReads [INT bp], -e [INT bp] This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. *NOTE*: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. *Single-end*: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended. *Paired-end*: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False) --ignoreDuplicates If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. (default: False) --minMappingQuality INT If set, only reads that have a mapping quality score of at least this are considered. (default: None) --centerReads By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False) --samFlagInclude INT Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will also be considered for the coverage. (default: None) --samFlagExclude INT Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand.
(default: None) requirements: InlineJavascriptRequirement: {} inputs: MNase: doc: | Determine nucleosome positions from MNase-seq data. Only 3 nucleotides at the center of each fragment are counted. The fragment ends are defined by the two mate reads. Only fragment lengths between 130 - 200 bp are considered to avoid dinucleosomes or other artifacts. *NOTE*: Requires paired-end data. A bin size of 1 is recommended. (default: False) type: boolean? inputBinding: prefix: --MNase position: 1 bam: doc: 'BAM file to process' type: File secondaryFiles: - .bai inputBinding: prefix: --bam position: 1 binSize: doc: | INT bp Size of the bins, in bases, for the output of the bigwig/bedgraph file. (default: 50) type: int? inputBinding: prefix: --binSize position: 1 blackListFileName: doc: | BED file A BED file containing regions that should be excluded from all analyses. Currently this works by rejecting genomic chunks that happen to overlap an entry. Consequently, for BAM files, if a read partially overlaps a blacklisted region or a fragment spans over it, then the read/fragment might still be considered. (default: None) type: File? inputBinding: prefix: --blackListFileName position: 1 centerReads: doc: | By adding this option, reads are centered with respect to the fragment length. For paired-end data, the read is centered at the fragment length defined by the two ends of the fragment. For single-end data, the given fragment length is used. This option is useful to get a sharper signal around enriched regions. (default: False) type: boolean? inputBinding: prefix: --centerReads position: 1 extendReads: doc: | INT bp This parameter allows the extension of reads to fragment size. If set, each read is extended, without exception. *NOTE*: This feature is generally NOT recommended for spliced-read data, such as RNA-seq, as it would extend reads over skipped regions. *Single-end*: Requires a user specified value for the final fragment length. Reads that already exceed this fragment length will not be extended. *Paired-end*: Reads with mates are always extended to match the fragment size defined by the two read mates. Unmated reads, mate reads that map too far apart (>4x fragment length) or even map to different chromosomes are treated like single-end reads. The input of a fragment length value is optional. If no value is specified, it is estimated from the data (mean of the fragment size of all mate reads). (default: False) type: int? inputBinding: prefix: --extendReads position: 1 filterRNAstrand: doc: | {forward,reverse} Selects RNA-seq reads (single-end or paired-end) in the given strand. (default: None) type: string? inputBinding: prefix: --filterRNAstrand position: 1 ignoreDuplicates: doc: | If set, reads that have the same orientation and start position will be considered only once. If reads are paired, the mate's position also has to coincide to ignore a read. (default: False) type: boolean? inputBinding: prefix: --ignoreDuplicates position: 1 ignoreForNormalization: doc: | A list of space-delimited chromosome names containing those chromosomes that should be excluded for computing the normalization. This is useful when considering samples with unequal coverage across chromosomes, like male samples. A usage example is --ignoreForNormalization chrX chrM. type: - 'null' - type: array items: string inputBinding: prefix: --ignoreForNormalization position: 1 minMappingQuality: doc: | INT If set, only reads that have a mapping quality score of at least this are considered. (default: None) type: int?
inputBinding: prefix: --minMappingQuality position: 1 normalizeUsing: doc: | Possible choices: RPKM, CPM, BPM, RPGC Use one of the entered methods to normalize the number of reads per bin. By default, no normalization is performed. RPKM = Reads Per Kilobase per Million mapped reads; CPM = Counts Per Million mapped reads, same as CPM in RNA-seq; BPM = Bins Per Million mapped reads, same as TPM in RNA-seq; RPGC = reads per genomic content (1x normalization); Mapped reads are considered after blacklist filtering (if applied). RPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)). CPM (per bin) = number of reads per bin / number of mapped reads (in millions). BPM (per bin) = number of reads per bin / sum of all reads per bin (in millions). RPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage. This scaling factor, in turn, is determined from the sequencing depth: (total number of mapped reads * fragment length) / effective genome size. The scaling factor used is the inverse of the sequencing depth computed for the sample to match the 1x coverage. This option requires --effectiveGenomeSize. Each read is considered independently, if you want to only count one mate from a pair in paired-end data, then use the --samFlagInclude/--samFlagExclude options. type: string? inputBinding: prefix: --normalizeUsing position: 1 numberOfProcessors: doc: | INT Number of processors to use. Type "max/2" to use half the maximum number of processors or "max" to use all available processors. (default: max/2) type: int? inputBinding: prefix: --numberOfProcessors position: 1 outFileFormat: doc: | {bigwig,bedgraph}, -of {bigwig,bedgraph} Output file type. Either "bigwig" or "bedgraph". (default: bigwig) type: string default: bigwig inputBinding: prefix: --outFileFormat position: 1 outFileName: doc: | FILENAME Output file name. (default: input BAM filename with bigwig [*.bw] or bedgraph [*.bdg] extension.) type: string? output_suffix: doc: Suffix used for output file (input BAM filename + suffix) type: string? region: doc: | CHR:START:END Region of the genome to limit the operation to - this is useful when testing parameters to reduce the computing time. The format is chr:start:end, for example --region chr10 or --region chr10:456700:891000. (default: None) type: string? inputBinding: prefix: --region position: 1 samFlagExclude: doc: | INT Exclude reads based on the SAM flag. For example, to get only reads that map to the forward strand, use --samFlagExclude 16, where 16 is the SAM flag for reads that map to the reverse strand. (default: None) type: int? inputBinding: prefix: --samFlagExclude position: 1 samFlagInclude: doc: | INT Include reads based on the SAM flag. For example, to get only reads that are the first mate, use a flag of 64. This is useful to count properly paired reads only once, as otherwise the second mate will be also considered for the coverage. (default: None) type: int? inputBinding: prefix: --samFlagInclude position: 1 scaleFactor: doc: | The computed scaling factor (or 1, if not applicable) will be multiplied by this. (default: 1.0) type: float? 
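# A sketch of the bamCoverage invocation this tool assembles from the
# inputs above (file names and flag values are hypothetical; only the
# optional inputs actually bound in a job appear on the command line):
#   bamCoverage --bam sample.bam --binSize 50 --normalizeUsing RPKM \
#     --outFileFormat bigwig --outFileName sample.bw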
inputBinding: prefix: --scaleFactor position: 1 skipNonCoveredRegions: doc: | --skipNonCoveredRegions, --skipNAs This parameter determines if non-covered regions (regions without overlapping reads) in a BAM file should be skipped. The default is to treat those regions as having a value of zero. The decision to skip non-covered regions depends on the interpretation of the data. Non-covered regions may represent, for example, repetitive regions that should be skipped. (default: False) type: boolean? inputBinding: prefix: --skipNonCoveredRegions position: 1 smoothLength: doc: | INT bp The smooth length defines a window, larger than the binSize, to average the number of reads. For example, if the --binSize is set to 20 and the --smoothLength is set to 60, then, for each bin, the average of the bin and its left and right neighbors is considered. Any value smaller than --binSize will be ignored and no smoothing will be applied. (default: None) type: int? inputBinding: prefix: --smoothLength position: 1 verbose: doc: "--verbose \nSet to see processing messages. (default:\ \ False)\n" type: boolean? inputBinding: prefix: --verbose position: 1 version: doc: show program's version number and exit type: boolean? inputBinding: prefix: --version position: 1 outputs: output_bam_coverage: type: File outputBinding: glob: |- ${ if (inputs.outFileName) return inputs.outFileName; if (inputs.output_suffix) return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + inputs.output_suffix; if (inputs.outFileFormat == "bedgraph") return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bdg"; return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bw"; } baseCommand: bamCoverage arguments: - prefix: --outFileName position: 3 valueFrom: |- ${ if (inputs.outFileName) return inputs.outFileName; if (inputs.output_suffix) return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + inputs.output_suffix; if (inputs.outFileFormat == "bedgraph") return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bdg"; return inputs.bam.path.replace(/^.*[\\\/]/, "").replace(/\.[^/.]+$/, "") + ".bw"; } hints: DockerRequirement: dockerPull: reddylab/deeptools:3.0.1 out: - output_bam_coverage bdg2bw-raw: in: bed_graph: bedsort_genomecov/bed_file_sorted genome_sizes: input_genome_sizes output_suffix: valueFrom: .raw.bw scatter: bed_graph run: cwlVersion: v1.0 class: CommandLineTool doc: 'Tool: bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig format.' requirements: InlineJavascriptRequirement: {} inputs: bed_graph: doc: "\tbed_graph is a four column file in the format: <chrom> <start> <end> <value>\n" type: File inputBinding: position: 1 genome_sizes: doc: "\tgenome_sizes is two column: <chromosome name> <size in bases>.\n" type: File inputBinding: position: 2 output_suffix: type: string default: .bw outputs: output_bigwig: type: File outputBinding: glob: |- $(inputs.bed_graph.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) baseCommand: bedGraphToBigWig arguments: - position: 3 valueFrom: |- $(inputs.bed_graph.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, "") + inputs.output_suffix) hints: DockerRequirement: dockerPull: dukegcb/bedgraphtobigwig out: - output_bigwig bedsort_genomecov: in: bed_file: bedtools_genomecov/output_bedfile scatter: bed_file run: cwlVersion: v1.0 class: CommandLineTool doc: | bedSort - Sort a .bed file by chrom,chromStart usage: bedSort in.bed out.bed in.bed and out.bed may be the same. 
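# Example of the wrapped command (the file name is hypothetical); the
# tool appends the literal suffix "_sorted" used in its glob/argument:
#   bedSort sample.bdg sample.bdg_sorted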
requirements: InlineJavascriptRequirement: {} inputs: bed_file: doc: Bed or bedGraph file to be sorted type: File inputBinding: position: 1 outputs: bed_file_sorted: type: File outputBinding: glob: $(inputs.bed_file.path.replace(/^.*[\\\/]/, '') + "_sorted") baseCommand: bedSort arguments: - position: 2 valueFrom: $(inputs.bed_file.path.replace(/^.*[\\\/]/, '') + "_sorted") hints: DockerRequirement: dockerPull: dleehr/docker-hubutils out: - bed_file_sorted bedtools_genomecov: in: bg: valueFrom: ${return true} g: input_genome_sizes ibam: input_bam_files scatter: ibam run: cwlVersion: v1.0 class: CommandLineTool doc: |- Tool: bedtools genomecov (aka genomeCoverageBed) Version: v2.25.0 Summary: Compute the coverage of a feature file among a genome. Usage: bedtools genomecov [OPTIONS] -i -g Options: -ibam The input file is in BAM format. Note: BAM _must_ be sorted by position -d Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. -dz Report the depth at each genome position (with zero-based coordinates). Reports only non-zero positions. Default behavior is to report a histogram. -bg Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html -bga Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. -split Treat "split" BAM or BED12 entries as distinct BED intervals. when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). -strand Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - -5 Calculate coverage of 5" positions (instead of entire interval). -3 Calculate coverage of 3" positions (instead of entire interval). -max Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) -scale Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) -trackline Adds a UCSC/Genome-Browser track line definition in the first line of the output. - See here for more details about track line definition: http://genome.ucsc.edu/goldenPath/help/bedgraph.html - NOTE: When adding a trackline definition, the output BedGraph can be easily uploaded to the Genome Browser as a custom track, BUT CAN NOT be converted into a BigWig file (w/o removing the first line). -trackopts Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) Notes: (1) The genome file should tab delimited and structured as follows: For example, Human (hg19): chr1 249250621 chr2 243199373 ... chr18_gl000207_random 4262 (2) The input BED (-i) file must be grouped by chromosome. A simple "sort -k 1,1 > .sorted" will suffice. (3) The input BAM (-ibam) file must be sorted by position. A "samtools sort " should suffice. Tips: One can use the UCSC Genome Browser's MySQL database to extract chromosome sizes. For example, H. 
sapiens: mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \ "select chrom, size from hg19.chromInfo" > hg19.genome requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: '3': doc: "\tCalculate coverage of 3\" positions (instead of entire interval).\n" type: boolean? inputBinding: prefix: '-3' position: 1 '5': doc: "\tCalculate coverage of 5\" positions (instead of entire interval).\n" type: boolean? inputBinding: prefix: '-5' position: 1 bg: doc: | Report depth in BedGraph format. For details, see: genome.ucsc.edu/goldenPath/help/bedgraph.html type: boolean? inputBinding: prefix: -bg position: 1 bga: doc: | Report depth in BedGraph format, as above (-bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: "grep -w 0$" to the output. type: boolean? inputBinding: prefix: -bga position: 1 d: doc: | Report the depth at each genome position (with one-based coordinates). Default behavior is to report a histogram. type: boolean? inputBinding: prefix: -d position: 1 dz: doc: | Report the depth at each genome position (with zero-based coordinates). Reports only non-zero positions. Default behavior is to report a histogram. type: boolean? inputBinding: prefix: -dz position: 1 g: doc: type: File inputBinding: prefix: -g position: 3 ibam: doc: "\tThe input file is in BAM format.\nNote: BAM _must_ be sorted\ \ by position\n" type: File inputBinding: prefix: -ibam position: 2 max: doc: | Combine all positions with a depth >= max into a single bin in the histogram. Irrelevant for -d and -bedGraph - (INTEGER) type: int? inputBinding: prefix: -max position: 1 scale: doc: | Scale the coverage by a constant factor. Each coverage value is multiplied by this factor before being reported. Useful for normalizing coverage by, e.g., reads per million (RPM). - Default is 1.0; i.e., unscaled. - (FLOAT) type: float? inputBinding: prefix: -scale position: 1 split: doc: | Treat "split" BAM or BED12 entries as distinct BED intervals. when computing coverage. For BAM files, this uses the CIGAR "N" and "D" operations to infer the blocks for computing coverage. For BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds fields (i.e., columns 10,11,12). type: boolean? inputBinding: prefix: -split position: 1 strand: doc: | Calculate coverage of intervals from a specific strand. With BED files, requires at least 6 columns (strand is column 6). - (STRING): can be + or - type: string? inputBinding: prefix: -strand position: 1 trackline: doc: | Adds a UCSC/Genome-Browser track line definition in the first line of the output. - See here for more details about track line definition: http://genome.ucsc.edu/goldenPath/help/bedgraph.html - NOTE: When adding a trackline definition, the output BedGraph can be easily uploaded to the Genome Browser as a custom track, BUT CAN NOT be converted into a BigWig file (w/o removing the first line). type: boolean? inputBinding: prefix: -trackline position: 1 trackopts: doc: | Writes additional track line definition parameters in the first line. - Example: -trackopts 'name="My Track" visibility=2 color=255,30,30' Note the use of single-quotes if you have spaces in your parameters. - (TEXT) type: string? 
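# A sketch of the command this tool runs when wired up as in the
# bedtools_genomecov step above, where -bg is forced on and stdout is
# captured as <bam basename>.bdg (file names here are hypothetical):
#   bedtools genomecov -bg -ibam sample.bam -g hg19.genome > sample.bdg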
inputBinding: prefix: -trackopts position: 1 outputs: output_bedfile: type: File outputBinding: glob: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg') stdout: $(inputs.ibam.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.bdg') baseCommand: - bedtools - genomecov hints: DockerRequirement: dockerPull: dukegcb/bedtools out: - output_bedfile out: - bigwig_raw_files - bigwig_norm_files trimm: in: input_adapters_files: qc/output_custom_adapters input_fastq_files: input_fastq_files nthreads: nthreads_trimm trimmomatic_jar_path: trimmomatic_jar_path trimmomatic_java_opts: trimmomatic_java_opts run: cwlVersion: v1.0 class: Workflow doc: 'ATAC-seq 02 trimming - reads: SE' requirements: - class: ScatterFeatureRequirement - class: StepInputExpressionRequirement - class: InlineJavascriptRequirement inputs: input_adapters_files: doc: Input adapters files type: File[] input_fastq_files: doc: Input fastq files type: File[] nthreads: doc: Number of threads type: int default: 1 quality_score: type: string default: -phred33 trimmomatic_jar_path: doc: Trimmomatic Java jar file type: string default: /usr/share/java/trimmomatic.jar trimmomatic_java_opts: doc: JVM arguments should be a quoted, space separated list type: string? outputs: output_data_fastq_trimmed_files: doc: Trimmed fastq files type: File[] outputSource: trimmomatic/output_read1_trimmed_file output_trimmed_fastq_read_count: doc: Trimmed read counts of fastq files type: File[] outputSource: count_fastq_reads/output_read_count steps: count_fastq_reads: in: input_basename: extract_basename/output_basename input_fastq_file: trimmomatic/output_read1_trimmed_file scatter: - input_fastq_file - input_basename scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: Counts reads in a fastq file requirements: InlineJavascriptRequirement: {} inputs: input_basename: type: string input_fastq_file: type: File inputBinding: position: 1 outputs: output_read_count: type: File outputBinding: glob: $(inputs.input_basename + '.read_count.txt') stdout: $(inputs.input_basename + '.read_count.txt') baseCommand: count-fastq-reads.sh hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_read_count extract_basename: in: input_file: trimmomatic/output_read1_trimmed_file scatter: input_file run: cwlVersion: v1.0 class: CommandLineTool doc: Extracts the base name of a file requirements: InlineJavascriptRequirement: {} inputs: input_file: type: File inputBinding: position: 1 outputs: output_basename: type: string outputBinding: outputEval: |- $(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1))) baseCommand: echo hints: DockerRequirement: dockerPull: reddylab/workflow-utils:ggr out: - output_basename trimmomatic: in: end_mode: valueFrom: SE illuminaclip: valueFrom: 2:30:15 input_adapters_file: input_adapters_files input_read1_fastq_file: input_fastq_files java_opts: trimmomatic_java_opts leading: valueFrom: ${return 3} minlen: valueFrom: ${return 15} nthreads: nthreads phred: valueFrom: '33' slidingwindow: valueFrom: 4:20 trailing: valueFrom: ${return 3} trimmomatic_jar_path: trimmomatic_jar_path scatter: - input_read1_fastq_file - input_adapters_file scatterMethod: dotproduct run: cwlVersion: v1.0 class: CommandLineTool doc: | Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. 
These adapters can pose a real problem depending on the library preparation and downstream application. There are two major modes of the program: Paired end mode and Single end mode. The paired end mode will maintain correspondence of read pairs and also use the additional information contained in paired reads to better find adapter or PCR primer fragments introduced by the library preparation process. Trimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality scores, depending on the Illumina pipeline used). requirements: InlineJavascriptRequirement: {} ShellCommandRequirement: {} inputs: avgqual: doc: | Drop the read if the average quality is below the specified level : Specifies the minimum average quality required to keep a read. type: int? inputBinding: prefix: 'AVGQUAL:' position: 101 separate: false crop: doc: | Removes bases regardless of quality from the end of the read, so that the read has maximally the specified length after this step has been performed. Steps performed after CROP might of course further shorten the read. : The number of bases to keep, from the start of the read. type: int? inputBinding: prefix: 'CROP:' position: 13 separate: false end_mode: doc: "SE|PE\nSingle End (SE) or Paired End (PE) mode\n" type: string inputBinding: position: 3 headcrop: doc: | Removes the specified number of bases, regardless of quality, from the beginning of the read. : The number of bases to keep, from the start of the read. type: int? inputBinding: prefix: 'HEADCROP:' position: 13 separate: false illuminaclip: doc: | ::::: Find and remove Illumina adapters. REQUIRED: : specifies the path to a fasta file containing all the adapters, PCR sequences etc. The naming of the various sequences within this file determines how they are used. See below. : specifies the maximum mismatch count which will still allow a full match to be performed : specifies how accurate the match between the two 'adapter ligated' reads must be for PE palindrome read alignment. : specifies how accurate the match between any adapter etc. sequence must be against a read OPTIONAL: : In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. : After read-though has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying "true" for this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. type: string input_adapters_file: doc: |- FASTA file containing adapters, PCR sequences, etc. It is used to search for and remove these sequences in the input FASTQ file(s) type: File input_read1_fastq_file: doc: FASTQ file for input read (read R1 in Paired End mode) type: File inputBinding: position: 5 input_read2_fastq_file: doc: FASTQ file for read R2 in Paired End mode type: File? inputBinding: position: 6 java_opts: doc: |- JVM arguments should be a quoted, space separated list (e.g. "-Xms128m -Xmx512m") type: string? inputBinding: position: 1 shellQuote: false leading: doc: | Remove low quality bases from the beginning. 
As long as a base has a value below this threshold the base is removed and the next base will be investigated. : Specifies the minimum quality required to keep a base. type: int? inputBinding: prefix: 'LEADING:' position: 14 separate: false log_filename: doc: | Specifying a trimlog file creates a log of all read trimmings, indicating the following details: the read name the surviving sequence length the location of the first surviving base, aka. the amount trimmed from the start the location of the last surviving base in the original read the amount trimmed from the end : filename for the generated output log file. type: string? inputBinding: prefix: -trimlog position: 4 maxinfo: doc: | : Performs an adaptive quality trim, balancing the benefits of retaining longer reads against the costs of retaining bases with errors. : This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. : This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. type: string? inputBinding: prefix: 'MAXINFO:' position: 15 separate: false minlen: doc: | This module removes reads that fall below the specified minimal length. If required, it should normally be after all other processing steps. Reads removed by this step will be counted and included in the "dropped reads" count presented in the trimmomatic summary. : Specifies the minimum length of reads to be kept type: int? inputBinding: prefix: 'MINLEN:' position: 100 separate: false nthreads: doc: Number of threads type: int default: 1 inputBinding: prefix: -threads position: 4 phred: doc: | "33"|"64" -phred33 ("33") or -phred64 ("64") specifies the base quality encoding. Default: -phred64 type: string default: '64' inputBinding: prefix: -phred position: 4 separate: false slidingwindow: doc: | : Perform a sliding window trimming, cutting once the average quality within the window falls below a threshold. By considering multiple bases, a single poor quality base will not cause the removal of high quality data later in the read. : specifies the number of bases to average across : specifies the average quality required type: string? inputBinding: prefix: 'SLIDINGWINDOW:' position: 15 separate: false tophred33: doc: This (re)encodes the quality part of the FASTQ file to base 33. type: boolean? inputBinding: prefix: TOPHRED33 position: 12 separate: false tophred64: doc: This (re)encodes the quality part of the FASTQ file to base 64. type: boolean? inputBinding: prefix: TOPHRED64 position: 12 separate: false trailing: doc: | Remove low quality bases from the end. As long as a base has a value below this threshold the base is removed and the next base (which as trimmomatic is starting from the 3" prime end would be base preceding the just removed base) will be investigated. This approach can be used removing the special illumina "low quality segment" regions (which are marked with quality score of 2), but we recommend Sliding Window or MaxInfo instead : Specifies the minimum quality required to keep a base. type: int? inputBinding: prefix: 'TRAILING:' position: 14 separate: false trimmomatic_jar_path: type: string inputBinding: prefix: -jar position: 2 outputs: output_log_file: doc: Trimmomatic Log file. type: File? 
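# A sketch of the single-end command line this tool assembles with the
# values bound by the trimm step (paths are hypothetical; PE mode adds
# the extra paired/unpaired output arguments):
#   java -Djava.io.tmpdir=/tmp -jar /usr/share/java/trimmomatic.jar SE \
#     -threads 1 -phred33 reads.fastq reads.trimmed.fastq \
#     ILLUMINACLIP:adapters.fasta:2:30:15 LEADING:3 TRAILING:3 \
#     SLIDINGWINDOW:4:20 MINLEN:15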
outputBinding: glob: $(inputs.log_filename) output_read1_trimmed_file: type: File outputBinding: glob: |- $(inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq') output_read1_trimmed_unpaired_file: type: File? outputBinding: glob: | ${ if (inputs.end_mode == "PE") return inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.unpaired.trimmed.fastq'; return null; } output_read2_trimmed_paired_file: type: File? outputBinding: glob: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq'; return null; } output_read2_trimmed_unpaired_file: type: File? outputBinding: glob: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.unpaired.trimmed.fastq'; return null; } baseCommand: java arguments: - position: 1 valueFrom: $("-Djava.io.tmpdir="+runtime.tmpdir) shellQuote: false - position: 7 valueFrom: |- $(inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq') - position: 8 valueFrom: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read1_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.unpaired.fastq'; return null; } - position: 9 valueFrom: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.fastq'; return null; } - position: 10 valueFrom: | ${ if (inputs.end_mode == "PE" && inputs.input_read2_fastq_file) return inputs.input_read2_fastq_file.path.replace(/^.*[\\\/]/, '').replace(/\.[^/.]+$/, '') + '.trimmed.unpaired.fastq'; return null; } - position: 11 valueFrom: $("ILLUMINACLIP:" + inputs.input_adapters_file.path + ":"+ inputs.illuminaclip) hints: DockerRequirement: dockerPull: dukegcb/trimmomatic out: - output_read1_trimmed_file out: - output_data_fastq_trimmed_files - output_trimmed_fastq_read_count id: |- https://api.sbgenomics.com/v2/apps/kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2/raw/ sbg:appVersion: - v1.0 sbg:content_hash: ad9474546d1d7aba5aa20e3c7a03b5429e5f8ec1d18be92cbab7315600a6bce48 sbg:contributors: - kghosesbg sbg:createdBy: kghosesbg sbg:createdOn: 1580500895 sbg:id: kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2 sbg:image_url: |- https://igor.sbgenomics.com/ns/brood/images/kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2.png sbg:latestRevision: 2 sbg:modifiedBy: kghosesbg sbg:modifiedOn: 1581699121 sbg:project: kghosesbg/sbpla-31744 sbg:projectName: SBPLA-31744 sbg:publisher: sbg sbg:revision: 2 sbg:revisionNotes: |- Uploaded using sbpack v2020.02.14. Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl sbg:revisionsInfo: - sbg:modifiedBy: kghosesbg sbg:modifiedOn: 1580500895 sbg:revision: 0 sbg:revisionNotes: |- Uploaded using sbpack. Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl - sbg:modifiedBy: kghosesbg sbg:modifiedOn: 1580742764 sbg:revision: 1 sbg:revisionNotes: Just moved a node - sbg:modifiedBy: kghosesbg sbg:modifiedOn: 1581699121 sbg:revision: 2 sbg:revisionNotes: |- Uploaded using sbpack v2020.02.14. 
Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl sbg:sbgMaintained: false sbg:validationErrors: - 'Required input is not set: #qc.input_fastq_files' - 'Required input is not set: #qc.default_adapters_file' - 'Required input is not set: #qc.nthreads' - 'Required input is not set: #trimm.input_fastq_files' - 'Required input is not set: #trimm.input_adapters_files' - 'Required input is not set: #map.input_fastq_files' - 'Required input is not set: #map.genome_sizes_file' - 'Required input is not set: #map.genome_ref_first_index_file' - 'Required input is not set: #peak_call.input_bam_files' - 'Required input is not set: #peak_call.input_genome_sizes' - 'Required input is not set: #peak_call.as_narrowPeak_file' - 'Required input is not set: #quant.input_bam_files' - 'Required input is not set: #quant.input_genome_sizes' cwl-format-2022.02.18/tests/cwl/formatted-commented.cwl000066400000000000000000000010001420374476100225560ustar00rootroot00000000000000#!/usr/bin/env cwl-runner # Top comment is preserved cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: in1: type: string inputBinding: position: 1 valueFrom: A_$(inputs.in1)_B_${return inputs.in1}_C_$(inputs.in1) outputs: out1: type: string outputBinding: glob: out.txt outputEval: $(self[0].contents)_D_$(runtime.cores) loadContents: true stdout: out.txt baseCommand: echo arguments: - valueFrom: $(runtime) cwl-format-2022.02.18/tests/cwl/formatted-fragment.cwl000066400000000000000000000001321420374476100224130ustar00rootroot00000000000000#!/usr/bin/env cwl-runner 'A: this should go first': - what - a - list no such field: 22 cwl-format-2022.02.18/tests/cwl/formatted-no-comment.cwl000066400000000000000000000007431420374476100226740ustar00rootroot00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: in1: type: string inputBinding: position: 1 valueFrom: A_$(inputs.in1)_B_${return inputs.in1}_C_$(inputs.in1) outputs: out1: type: string outputBinding: glob: out.txt outputEval: $(self[0].contents)_D_$(runtime.cores) loadContents: true stdout: out.txt baseCommand: echo arguments: - valueFrom: $(runtime) cwl-format-2022.02.18/tests/cwl/formatted-other-runner.cwl000066400000000000000000000007451420374476100232520ustar00rootroot00000000000000#!/usr/bin/env other-runner cwlVersion: v1.0 class: CommandLineTool requirements: InlineJavascriptRequirement: {} inputs: in1: type: string inputBinding: position: 1 valueFrom: A_$(inputs.in1)_B_${return inputs.in1}_C_$(inputs.in1) outputs: out1: type: string outputBinding: glob: out.txt outputEval: $(self[0].contents)_D_$(runtime.cores) loadContents: true stdout: out.txt baseCommand: echo arguments: - valueFrom: $(runtime) cwl-format-2022.02.18/tests/cwl/original-atac-seq-pipeline.json000066400000000000000000006212641420374476100241330ustar00rootroot00000000000000{ "class": "Workflow", "cwlVersion": "v1.0", "doc": "ATAC-seq pipeline - reads: SE", "requirements": [ { "class": "ScatterFeatureRequirement" }, { "class": "SubworkflowFeatureRequirement" }, { "class": "StepInputExpressionRequirement" } ], "inputs": { "input_fastq_files": { "type": "File[]" }, "genome_sizes_file": { "doc": "Genome sizes tab-delimited file (used in samtools)", "type": "File" }, "default_adapters_file": { "doc": "Adapters file", "type": "File" }, "genome_effective_size": { "default": "hs", "doc": "Effective genome size used by MACS2. 
It can be numeric or a shortcuts:'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs", "type": "string" }, "genome_ref_first_index_file": { "doc": "\"First index file of Bowtie reference genome with extension 1.ebwt. \\ (Note: the rest of the index files MUST be in the same folder)\" ", "type": "File", "secondaryFiles": [ "^^.2.ebwt", "^^.3.ebwt", "^^.4.ebwt", "^^.rev.1.ebwt", "^^.rev.2.ebwt" ] }, "as_narrowPeak_file": { "doc": "Definition narrowPeak file in AutoSql format (used in bedToBigBed)", "type": "File" }, "trimmomatic_jar_path": { "doc": "Trimmomatic Java jar file", "type": "string" }, "trimmomatic_java_opts": { "doc": "JVM arguments should be a quoted, space separated list (e.g. \"-Xms128m -Xmx512m\")", "type": "string?" }, "picard_jar_path": { "doc": "Picard Java jar file", "type": "string" }, "picard_java_opts": { "doc": "JVM arguments should be a quoted, space separated list (e.g. \"-Xms128m -Xmx512m\")", "type": "string?" }, "nthreads_qc": { "doc": "Number of threads required for the 01-qc step", "type": "int" }, "nthreads_trimm": { "doc": "Number of threads required for the 02-trim step", "type": "int" }, "nthreads_map": { "doc": "Number of threads required for the 03-map step", "type": "int" }, "nthreads_peakcall": { "doc": "Number of threads required for the 04-peakcall step", "type": "int" }, "nthreads_quant": { "doc": "Number of threads required for the 05-quantification step", "type": "int" } }, "steps": { "qc": { "run": { "class": "Workflow", "cwlVersion": "v1.0", "doc": "ATAC-seq 01 QC - reads: SE", "requirements": [ { "class": "ScatterFeatureRequirement" }, { "class": "StepInputExpressionRequirement" }, { "class": "InlineJavascriptRequirement" } ], "inputs": { "input_fastq_files": { "doc": "Input fastq files", "type": "File[]" }, "default_adapters_file": { "doc": "Adapters file", "type": "File" }, "nthreads": { "doc": "Number of threads.", "type": "int" } }, "steps": { "extract_basename": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extracts the base name of a file", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "output_basename": { "type": "string", "outputBinding": { "outputEval": "$(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1)))" } } }, "baseCommand": "echo" }, "scatter": "input_file", "in": { "input_file": "input_fastq_files" }, "out": [ "output_basename" ] }, "count_raw_reads": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Counts reads in a fastq file", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_basename": { "type": "string" }, "input_fastq_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "output_read_count": { "type": "File", "outputBinding": { "glob": "$(inputs.input_basename + '.read_count.txt')" } } }, "baseCommand": "count-fastq-reads.sh", "stdout": "$(inputs.input_basename + '.read_count.txt')" }, "scatterMethod": "dotproduct", "scatter": [ "input_fastq_file", "input_basename" ], "in": { "input_basename": "extract_basename/output_basename", "input_fastq_file": "input_fastq_files" }, "out": [ 
"output_read_count" ] }, "fastqc": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/fastqc" } }, "inputs": { "format": { "type": "string", "default": "fastq", "inputBinding": { "position": 3, "prefix": "--format" } }, "threads": { "type": "int", "default": 1, "inputBinding": { "position": 5, "prefix": "--threads" } }, "noextract": { "type": "boolean", "default": true, "inputBinding": { "prefix": "--noextract", "position": 2 } }, "input_fastq_file": { "type": "File", "inputBinding": { "position": 4 } } }, "outputs": { "output_qc_report_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_fastq_file.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, '') + \"_fastqc.zip\")" } } }, "baseCommand": "fastqc", "arguments": [ { "valueFrom": "$(runtime.tmpdir)", "prefix": "--dir", "position": 5 }, { "valueFrom": "$(runtime.outdir)", "prefix": "-o", "position": 5 } ] }, "scatter": "input_fastq_file", "in": { "threads": "nthreads", "input_fastq_file": "input_fastq_files" }, "out": [ "output_qc_report_file" ] }, "extract_fastqc_data": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Unzips a zipped fastqc report and returns the fastqc_data.txt file. Unzips the file to pipe and uses redirection", "hints": { "DockerRequirement": { "dockerPull": "dukegcb/fastqc" } }, "inputs": { "pipe": { "type": "boolean", "default": true, "inputBinding": { "prefix": "-p", "position": 1 } }, "input_basename": { "type": "string" }, "input_qc_report_file": { "type": "File", "inputBinding": { "position": 2 } }, "extract_pattern": { "type": "string", "default": "*/fastqc_data.txt", "inputBinding": { "position": 3 } } }, "outputs": { "output_fastqc_data_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_basename + '.fastqc_data.txt')" } } }, "baseCommand": "unzip", "stdout": "$(inputs.input_basename + '.fastqc_data.txt')" }, "scatterMethod": "dotproduct", "scatter": [ "input_qc_report_file", "input_basename" ], "in": { "input_basename": "extract_basename/output_basename", "input_qc_report_file": "fastqc/output_qc_report_file" }, "out": [ "output_fastqc_data_file" ] }, "overrepresented_sequence_extract": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "hints": { "DockerRequirement": { "dockerPull": "reddylab/overrepresented_sequence_extract:1.0" } }, "inputs": { "input_fastqc_data": { "type": "File", "inputBinding": { "position": 1 }, "doc": "fastqc_data.txt file from a fastqc report" }, "input_basename": { "type": "string", "doc": "Name of the sample - used as a base name for generating output files" }, "default_adapters_file": { "type": "File", "inputBinding": { "position": 2 }, "doc": "Adapters file in fasta format" } }, "outputs": { "output_custom_adapters": { "type": "File", "outputBinding": { "glob": "$(inputs.input_basename + '.custom_adapters.fasta')" } } }, "baseCommand": "overrepresented_sequence_extract.py", "arguments": [ { "valueFrom": "$(inputs.input_basename + '.custom_adapters.fasta')", "position": 3 } ] }, "scatterMethod": "dotproduct", "scatter": [ "input_fastqc_data", "input_basename" ], "in": { "input_fastqc_data": "extract_fastqc_data/output_fastqc_data_file", "input_basename": "extract_basename/output_basename", "default_adapters_file": "default_adapters_file" }, "out": [ "output_custom_adapters" ] }, "compare_read_counts": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Compares 2 files", 
"hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "file2": { "type": "File", "inputBinding": { "position": 2 } }, "file1": { "type": "File", "inputBinding": { "position": 1 } }, "brief": { "type": "boolean", "default": true, "inputBinding": { "prefix": "--brief", "position": 3 } } }, "outputs": { "result": { "type": "File", "outputBinding": { "glob": "stdout.txt" } } }, "baseCommand": "diff", "stdout": "stdout.txt" }, "scatterMethod": "dotproduct", "scatter": [ "file1", "file2" ], "in": { "file2": "count_fastqc_reads/output_fastqc_read_count", "file1": "count_raw_reads/output_read_count" }, "out": [ "result" ] }, "count_fastqc_reads": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extracts read count from fastqc_data.txt", "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_fastqc_data": { "type": "File", "inputBinding": { "position": 1 } }, "input_basename": { "type": "string" } }, "outputs": { "output_fastqc_read_count": { "type": "File", "outputBinding": { "glob": "$(inputs.input_basename + '.fastqc-read_count.txt')" } } }, "baseCommand": "count-fastqc_data-reads.sh", "stdout": "$(inputs.input_basename + '.fastqc-read_count.txt')" }, "scatterMethod": "dotproduct", "scatter": [ "input_fastqc_data", "input_basename" ], "in": { "input_fastqc_data": "extract_fastqc_data/output_fastqc_data_file", "input_basename": "extract_basename/output_basename" }, "out": [ "output_fastqc_read_count" ] } }, "outputs": { "output_fastqc_data_files": { "doc": "FastQC data files", "type": "File[]", "outputSource": "extract_fastqc_data/output_fastqc_data_file" }, "output_fastqc_report_files": { "doc": "FastQC reports in zip format", "type": "File[]", "outputSource": "fastqc/output_qc_report_file" }, "output_custom_adapters": { "outputSource": "overrepresented_sequence_extract/output_custom_adapters", "type": "File[]" }, "output_count_raw_reads": { "outputSource": "count_raw_reads/output_read_count", "type": "File[]" }, "output_diff_counts": { "outputSource": "compare_read_counts/result", "type": "File[]" } } }, "in": { "input_fastq_files": "input_fastq_files", "default_adapters_file": "default_adapters_file", "nthreads": "nthreads_qc" }, "out": [ "output_count_raw_reads", "output_diff_counts", "output_fastqc_report_files", "output_fastqc_data_files", "output_custom_adapters" ] }, "trimm": { "run": { "class": "Workflow", "cwlVersion": "v1.0", "doc": "ATAC-seq 02 trimming - reads: SE", "requirements": [ { "class": "ScatterFeatureRequirement" }, { "class": "StepInputExpressionRequirement" }, { "class": "InlineJavascriptRequirement" } ], "inputs": { "input_fastq_files": { "doc": "Input fastq files", "type": "File[]" }, "input_adapters_files": { "doc": "Input adapters files", "type": "File[]" }, "quality_score": { "default": "-phred33", "type": "string" }, "trimmomatic_jar_path": { "default": "/usr/share/java/trimmomatic.jar", "doc": "Trimmomatic Java jar file", "type": "string" }, "trimmomatic_java_opts": { "doc": "JVM arguments should be a quoted, space separated list", "type": "string?" 
}, "nthreads": { "default": 1, "doc": "Number of threads", "type": "int" } }, "steps": { "extract_basename": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extracts the base name of a file", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "output_basename": { "type": "string", "outputBinding": { "outputEval": "$(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1)))" } } }, "baseCommand": "echo" }, "scatter": "input_file", "in": { "input_file": "trimmomatic/output_read1_trimmed_file" }, "out": [ "output_basename" ] }, "trimmomatic": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop\nIllumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem\ndepending on the library preparation and downstream application.\nThere are two major modes of the program: Paired end mode and Single end mode. The\npaired end mode will maintain correspondence of read pairs and also use the additional\ninformation contained in paired reads to better find adapter or PCR primer fragments\nintroduced by the library preparation process.\nTrimmomatic works with FASTQ files (using phred + 33 or phred + 64 quality scores,\ndepending on the Illumina pipeline used).\n", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/trimmomatic" } }, "inputs": { "phred": { "type": "string", "default": "64", "inputBinding": { "prefix": "-phred", "separate": false, "position": 4 }, "doc": "\"33\"|\"64\"\n-phred33 (\"33\") or -phred64 (\"64\") specifies the base quality encoding. Default: -phred64\n" }, "headcrop": { "type": "int?", "inputBinding": { "position": 13, "prefix": "HEADCROP:", "separate": false }, "doc": "\nRemoves the specified number of bases, regardless of quality, from the beginning of the read.\n: The number of bases to keep, from the start of the read.\n" }, "end_mode": { "type": "string", "inputBinding": { "position": 3 }, "doc": "SE|PE\nSingle End (SE) or Paired End (PE) mode\n" }, "input_read1_fastq_file": { "type": "File", "inputBinding": { "position": 5 }, "doc": "FASTQ file for input read (read R1 in Paired End mode)" }, "minlen": { "type": "int?", "inputBinding": { "position": 100, "prefix": "MINLEN:", "separate": false }, "doc": "\nThis module removes reads that fall below the specified minimal length. If required, it should\nnormally be after all other processing steps. Reads removed by this step will be counted and\nincluded in the \"dropped reads\" count presented in the trimmomatic summary.\n: Specifies the minimum length of reads to be kept\n" }, "input_read2_fastq_file": { "type": "File?", "inputBinding": { "position": 6 }, "doc": "FASTQ file for read R2 in Paired End mode" }, "leading": { "type": "int?", "inputBinding": { "position": 14, "prefix": "LEADING:", "separate": false }, "doc": "\nRemove low quality bases from the beginning. 
As long as a base has a value below this\nthreshold the base is removed and the next base will be investigated.\n: Specifies the minimum quality required to keep a base.\n" }, "log_filename": { "type": "string?", "inputBinding": { "position": 4, "prefix": "-trimlog" }, "doc": "\nSpecifying a trimlog file creates a log of all read trimmings, indicating the following details:\n the read name\n the surviving sequence length\n the location of the first surviving base, aka. the amount trimmed from the start\n the location of the last surviving base in the original read\n the amount trimmed from the end\n: filename for the generated output log file.\n" }, "slidingwindow": { "type": "string?", "inputBinding": { "position": 15, "prefix": "SLIDINGWINDOW:", "separate": false }, "doc": ":\nPerform a sliding window trimming, cutting once the average quality within the window falls\nbelow a threshold. By considering multiple bases, a single poor quality base will not cause the\nremoval of high quality data later in the read.\n: specifies the number of bases to average across\n: specifies the average quality required\n" }, "illuminaclip": { "type": "string", "doc": ":::::\nFind and remove Illumina adapters.\nREQUIRED:\n: specifies the path to a fasta file containing all the adapters, PCR sequences etc.\nThe naming of the various sequences within this file determines how they are used. See below.\n: specifies the maximum mismatch count which will still allow a full match to be performed\n: specifies how accurate the match between the two 'adapter ligated' reads must be\nfor PE palindrome read alignment.\n: specifies how accurate the match between any adapter etc. sequence must be against a read\nOPTIONAL:\n: In addition to the alignment score, palindrome mode can verify\nthat a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases,\nfor historical reasons. However, since palindrome mode has a very low false positive rate, this\ncan be safely reduced, even down to 1, to allow shorter adapter fragments to be removed.\n: After read-though has been detected by palindrome mode, and the\nadapter sequence removed, the reverse read contains the same sequence information as the\nforward read, albeit in reverse complement. For this reason, the default behaviour is to\nentirely drop the reverse read. By specifying \"true\" for this parameter, the reverse read will\nalso be retained, which may be useful e.g. if the downstream tools cannot handle a\ncombination of paired and unpaired reads.\n" }, "crop": { "type": "int?", "inputBinding": { "position": 13, "prefix": "CROP:", "separate": false }, "doc": "\nRemoves bases regardless of quality from the end of the read, so that the read has maximally\nthe specified length after this step has been performed. Steps performed after CROP might of\ncourse further shorten the read.\n: The number of bases to keep, from the start of the read.\n" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 4, "prefix": "-threads" }, "doc": "Number of threads" }, "java_opts": { "type": "string?", "inputBinding": { "position": 1, "shellQuote": false }, "doc": "JVM arguments should be a quoted, space separated list (e.g. \"-Xms128m -Xmx512m\")" }, "tophred64": { "type": "boolean?", "inputBinding": { "position": 12, "prefix": "TOPHRED64", "separate": false }, "doc": "This (re)encodes the quality part of the FASTQ file to base 64." 
}, "avgqual": { "type": "int?", "inputBinding": { "position": 101, "prefix": "AVGQUAL:", "separate": false }, "doc": "\nDrop the read if the average quality is below the specified level\n: Specifies the minimum average quality required to keep a read.\n" }, "tophred33": { "type": "boolean?", "inputBinding": { "position": 12, "prefix": "TOPHRED33", "separate": false }, "doc": "This (re)encodes the quality part of the FASTQ file to base 33." }, "trailing": { "type": "int?", "inputBinding": { "position": 14, "prefix": "TRAILING:", "separate": false }, "doc": "\nRemove low quality bases from the end. As long as a base has a value below this threshold\nthe base is removed and the next base (which as trimmomatic is starting from the 3\" prime end\nwould be base preceding the just removed base) will be investigated. This approach can be\nused removing the special illumina \"low quality segment\" regions (which are marked with\nquality score of 2), but we recommend Sliding Window or MaxInfo instead\n: Specifies the minimum quality required to keep a base.\n" }, "maxinfo": { "type": "string?", "inputBinding": { "position": 15, "prefix": "MAXINFO:", "separate": false }, "doc": ":\nPerforms an adaptive quality trim, balancing the benefits of retaining longer reads against the\ncosts of retaining bases with errors.\n: This specifies the read length which is likely to allow the\nlocation of the read within the target sequence to be determined.\n: This value, which should be set between 0 and 1, specifies the\nbalance between preserving as much read length as possible vs. removal of incorrect\nbases. A low value of this parameter (<0.2) favours longer reads, while a high value\n(>0.8) favours read correctness.\n" }, "trimmomatic_jar_path": { "type": "string", "inputBinding": { "position": 2, "prefix": "-jar" } }, "input_adapters_file": { "type": "File", "doc": "FASTA file containing adapters, PCR sequences, etc. It is used to search for and remove these sequences in the input FASTQ file(s)" } }, "outputs": { "output_read1_trimmed_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_read1_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.trimmed.fastq')" } }, "output_log_file": { "type": "File?", "outputBinding": { "glob": "$(inputs.log_filename)" }, "doc": "Trimmomatic Log file." 
}, "output_read1_trimmed_unpaired_file": { "type": "File?", "outputBinding": { "glob": "${\n if (inputs.end_mode == \"PE\")\n return inputs.input_read1_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.unpaired.trimmed.fastq';\n return null;\n}\n" } }, "output_read2_trimmed_paired_file": { "type": "File?", "outputBinding": { "glob": "${\n if (inputs.end_mode == \"PE\" && inputs.input_read2_fastq_file)\n return inputs.input_read2_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.trimmed.fastq';\n return null;\n}\n" } }, "output_read2_trimmed_unpaired_file": { "type": "File?", "outputBinding": { "glob": "${\n if (inputs.end_mode == \"PE\" && inputs.input_read2_fastq_file)\n return inputs.input_read2_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.unpaired.trimmed.fastq';\n return null;\n}\n" } } }, "baseCommand": "java", "arguments": [ { "valueFrom": "$(\"-Djava.io.tmpdir=\"+runtime.tmpdir)", "shellQuote": false, "position": 1 }, { "valueFrom": "$(inputs.input_read1_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.trimmed.fastq')", "position": 7 }, { "valueFrom": "${\n if (inputs.end_mode == \"PE\" && inputs.input_read2_fastq_file)\n return inputs.input_read1_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.trimmed.unpaired.fastq';\n return null;\n}\n", "position": 8 }, { "valueFrom": "${\n if (inputs.end_mode == \"PE\" && inputs.input_read2_fastq_file)\n return inputs.input_read2_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.trimmed.fastq';\n return null;\n}\n", "position": 9 }, { "valueFrom": "${\n if (inputs.end_mode == \"PE\" && inputs.input_read2_fastq_file)\n return inputs.input_read2_fastq_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.trimmed.unpaired.fastq';\n return null;\n}\n", "position": 10 }, { "valueFrom": "$(\"ILLUMINACLIP:\" + inputs.input_adapters_file.path + \":\"+ inputs.illuminaclip)", "position": 11 } ] }, "scatterMethod": "dotproduct", "scatter": [ "input_read1_fastq_file", "input_adapters_file" ], "in": { "input_read1_fastq_file": "input_fastq_files", "input_adapters_file": "input_adapters_files", "phred": { "valueFrom": "33" }, "nthreads": "nthreads", "minlen": { "valueFrom": "${return 15}" }, "java_opts": "trimmomatic_java_opts", "leading": { "valueFrom": "${return 3}" }, "slidingwindow": { "valueFrom": "4:20" }, "illuminaclip": { "valueFrom": "2:30:15" }, "trailing": { "valueFrom": "${return 3}" }, "trimmomatic_jar_path": "trimmomatic_jar_path", "end_mode": { "valueFrom": "SE" } }, "out": [ "output_read1_trimmed_file" ] }, "count_fastq_reads": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Counts reads in a fastq file", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_basename": { "type": "string" }, "input_fastq_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "output_read_count": { "type": "File", "outputBinding": { "glob": "$(inputs.input_basename + '.read_count.txt')" } } }, "baseCommand": "count-fastq-reads.sh", "stdout": "$(inputs.input_basename + '.read_count.txt')" }, "scatterMethod": "dotproduct", "scatter": [ "input_fastq_file", "input_basename" ], "in": { "input_basename": "extract_basename/output_basename", "input_fastq_file": "trimmomatic/output_read1_trimmed_file" }, "out": [ "output_read_count" ] } }, "outputs": { 
"output_data_fastq_trimmed_files": { "doc": "Trimmed fastq files", "type": "File[]", "outputSource": "trimmomatic/output_read1_trimmed_file" }, "output_trimmed_fastq_read_count": { "doc": "Trimmed read counts of fastq files", "type": "File[]", "outputSource": "count_fastq_reads/output_read_count" } } }, "in": { "input_fastq_files": "input_fastq_files", "input_adapters_files": "qc/output_custom_adapters", "trimmomatic_jar_path": "trimmomatic_jar_path", "trimmomatic_java_opts": "trimmomatic_java_opts", "nthreads": "nthreads_trimm" }, "out": [ "output_data_fastq_trimmed_files", "output_trimmed_fastq_read_count" ] }, "map": { "run": { "class": "Workflow", "cwlVersion": "v1.0", "doc": "ATAC-seq 03 mapping - reads: SE", "requirements": [ { "class": "ScatterFeatureRequirement" }, { "class": "SubworkflowFeatureRequirement" }, { "class": "StepInputExpressionRequirement" }, { "class": "InlineJavascriptRequirement" } ], "inputs": { "input_fastq_files": { "doc": "Input fastq files", "type": "File[]" }, "genome_sizes_file": { "doc": "Genome sizes tab-delimited file (used in samtools)", "type": "File" }, "genome_ref_first_index_file": { "doc": "Bowtie first index files for reference genome (e.g. *1.ebwt). The rest of the files should be in the same folder.", "type": "File", "secondaryFiles": [ "^^.2.ebwt", "^^.3.ebwt", "^^.4.ebwt", "^^.rev.1.ebwt", "^^.rev.2.ebwt" ] }, "picard_jar_path": { "default": "/usr/picard/picard.jar", "doc": "Picard Java jar file", "type": "string" }, "picard_java_opts": { "doc": "JVM arguments should be a quoted, space separated list (e.g. \"-Xms128m -Xmx512m\")", "type": "string?" }, "nthreads": { "default": 1, "type": "int" } }, "steps": { "extract_basename_1": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extracts the base name of a file", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "output_basename": { "type": "string", "outputBinding": { "outputEval": "$(inputs.input_file.path.substr(inputs.input_file.path.lastIndexOf('/') + 1, inputs.input_file.path.lastIndexOf('.') - (inputs.input_file.path.lastIndexOf('/') + 1)))" } } }, "baseCommand": "echo" }, "scatter": "input_file", "in": { "input_file": "input_fastq_files" }, "out": [ "output_basename" ] }, "extract_basename_2": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extracts the base name of a file", "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "file_path": { "type": "string", "inputBinding": { "position": 1 } } }, "outputs": { "output_path": { "type": "string", "outputBinding": { "outputEval": "$(inputs.file_path.replace(/\\.[^/.]+$/, \"\"))" } } }, "baseCommand": "echo" }, "scatter": "file_path", "in": { "file_path": "extract_basename_1/output_basename" }, "out": [ "output_path" ] }, "bowtie-se": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/bowtie" } }, "inputs": { "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 8, "prefix": "--threads" }, "doc": " number of alignment threads to launch (default: 1)" }, "sam": { "type": "boolean", "default": true, "inputBinding": { "position": 2, "prefix": "--sam" }, "doc": "Write hits in SAM format (default: BAM)" }, "seedmms": 
{ "type": "int?", "inputBinding": { "position": 1, "prefix": "--seedmms" }, "doc": "max mismatches in seed (between [0, 3], default: 2)" }, "m": { "type": "int", "default": 1, "inputBinding": { "position": 7, "prefix": "-m" }, "doc": "Suppress all alignments if > exist (def: 1)" }, "strata": { "type": "boolean", "default": true, "inputBinding": { "position": 6, "prefix": "--strata" }, "doc": "Hits in sub-optimal strata aren't reported (requires --best)" }, "output_filename": { "type": "string" }, "seedlen": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--seedlen" }, "doc": "seed length for -n (default: 28)" }, "t": { "type": "boolean", "default": true, "inputBinding": { "position": 1, "prefix": "-t" }, "doc": "Print wall-clock time taken by search phases" }, "v": { "type": "int?", "inputBinding": { "position": 3, "prefix": "-v" }, "doc": "Report end-to-end hits w/ <=v mismatches; ignore qualities" }, "X": { "type": "int?", "inputBinding": { "position": 4, "prefix": "-X" }, "doc": "maximum insert size for paired-end alignment (default: 250)" }, "trim3": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--trim3" }, "doc": "trim bases from 3' (right) end of reads" }, "genome_ref_first_index_file": { "type": "File", "secondaryFiles": [ "^^.2.ebwt", "^^.3.ebwt", "^^.4.ebwt", "^^.rev.1.ebwt", "^^.rev.2.ebwt" ], "inputBinding": { "position": 9, "valueFrom": "$(self.path.split('.').splice(0,self.path.split('.').length-2).join(\".\"))" }, "doc": "First file (extension .1.ebwt) of the Bowtie index files generated for the reference genome (see http://bowtie-bio.sourceforge.net/tutorial.shtml#newi)" }, "trim5": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--trim5" }, "doc": "trim bases from 5' (left) end of reads" }, "chunkmbs": { "type": "int?", "inputBinding": { "position": 5, "prefix": "--chunkmbs" }, "doc": "The number of megabytes of memory a given thread is given to store path descriptors in --best mode. (Default: 256)" }, "best": { "type": "boolean", "default": true, "inputBinding": { "position": 5, "prefix": "--best" }, "doc": "Hits guaranteed best stratum; ties broken by quality" }, "input_fastq_file": { "type": "File", "inputBinding": { "position": 10 }, "doc": "Query input FASTQ file." } }, "outputs": { "output_aligned_file": { "type": "File", "outputBinding": { "glob": "$(inputs.output_filename + '.sam')" }, "doc": "Aligned bowtie file in [SAM|BAM] format." 
}, "output_bowtie_log": { "type": "File", "outputBinding": { "glob": "$(inputs.output_filename + '.bowtie.log')" } } }, "baseCommand": "bowtie", "stderr": "$(inputs.output_filename + '.bowtie.log')", "arguments": [ { "valueFrom": "$(inputs.output_filename + '.sam')", "position": 11 } ] }, "scatterMethod": "dotproduct", "scatter": [ "input_fastq_file", "output_filename" ], "in": { "input_fastq_file": "input_fastq_files", "output_filename": "extract_basename_2/output_path", "v": { "valueFrom": "${return 2}" }, "X": { "valueFrom": "${return 2000}" }, "genome_ref_first_index_file": "genome_ref_first_index_file", "nthreads": "nthreads" }, "out": [ "output_aligned_file", "output_bowtie_log" ] }, "sam2bam": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "S": { "type": "boolean", "default": true, "inputBinding": { "position": 1, "prefix": "-S" }, "doc": "Input format autodetected" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used" }, "input_file": { "type": "File", "inputBinding": { "position": 2 }, "doc": "File to be converted to BAM with samtools" } }, "outputs": { "bam_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.bam')" }, "doc": "Aligned file in BAM format" } }, "baseCommand": [ "samtools", "view", "-b" ], "stdout": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.bam')" }, "scatter": "input_file", "in": { "nthreads": "nthreads", "input_file": "bowtie-se/output_aligned_file" }, "out": [ "bam_file" ] }, "sort_bams": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "n": { "type": "boolean", "default": false, "inputBinding": { "position": 1, "prefix": "-n" }, "doc": "Sort by read name" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used in sorting" }, "input_file": { "type": "File", "inputBinding": { "position": 1000 }, "doc": "Aligned file to be sorted with samtools" }, "suffix": { "type": "string", "default": ".sorted.bam", "doc": "suffix of the transformed SAM/BAM file (including extension, e.g. 
.filtered.sam)" } }, "outputs": { "sorted_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "doc": "Sorted aligned file" } }, "baseCommand": [ "samtools", "sort" ], "stdout": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "scatter": "input_file", "in": { "nthreads": "nthreads", "input_file": "sam2bam/bam_file" }, "out": [ "sorted_file" ] }, "index_bams": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "InitialWorkDirRequirement": { "listing": [ "$(inputs.input_file)" ] } }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Aligned file to be sorted with samtools" } }, "outputs": { "indexed_file": { "doc": "Indexed BAM file", "type": "File", "outputBinding": { "glob": "$(inputs.input_file.basename)" }, "secondaryFiles": ".bai" } }, "baseCommand": [ "samtools", "index" ], "arguments": [ { "valueFrom": "$(inputs.input_file.basename + '.bai')", "position": 2 } ] }, "scatter": "input_file", "in": { "input_file": "sort_bams/sorted_file" }, "out": [ "indexed_file" ] }, "filter-unmapped": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used in sorting" }, "output_filename": { "type": "string", "doc": "Basename for the output file" }, "input_file": { "type": "File", "inputBinding": { "position": 1000 }, "doc": "Aligned file to be sorted with samtools" } }, "outputs": { "filtered_file": { "type": "File", "outputBinding": { "glob": "$(inputs.output_filename + '.accepted_hits.bam')" }, "doc": "Filter unmapped reads in aligned file" } }, "baseCommand": [ "samtools", "view", "-F", "4", "-b", "-h" ], "stdout": "$(inputs.output_filename + '.accepted_hits.bam')" }, "scatterMethod": "dotproduct", "scatter": [ "input_file", "output_filename" ], "in": { "output_filename": "extract_basename_2/output_path", "input_file": "sort_bams/sorted_file" }, "out": [ "filtered_file" ] }, "filtered2sorted": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "n": { "type": "boolean", "default": false, "inputBinding": { "position": 1, "prefix": "-n" }, "doc": "Sort by read name" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used in sorting" }, "input_file": { "type": "File", "inputBinding": { "position": 1000 }, "doc": "Aligned file to be sorted with samtools" }, "suffix": { "type": "string", "default": ".sorted.bam", "doc": "suffix of the transformed SAM/BAM file (including extension, e.g. 
.filtered.sam)" } }, "outputs": { "sorted_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "doc": "Sorted aligned file" } }, "baseCommand": [ "samtools", "sort" ], "stdout": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "in": { "nthreads": "nthreads", "input_file": "filter-unmapped/filtered_file" }, "scatter": [ "input_file" ], "out": [ "sorted_file" ] }, "index_filtered_bam": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "InitialWorkDirRequirement": { "listing": [ "$(inputs.input_file)" ] } }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Aligned file to be sorted with samtools" } }, "outputs": { "indexed_file": { "doc": "Indexed BAM file", "type": "File", "outputBinding": { "glob": "$(inputs.input_file.basename)" }, "secondaryFiles": ".bai" } }, "baseCommand": [ "samtools", "index" ], "arguments": [ { "valueFrom": "$(inputs.input_file.basename + '.bai')", "position": 2 } ] }, "scatter": "input_file", "in": { "input_file": "filtered2sorted/sorted_file" }, "out": [ "indexed_file" ] }, "preseq-c-curve": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Usage: c_curve [OPTIONS] \n\nOptions:\n -o, -output yield output file (default: stdout) \n -s, -step step size in extrapolations (default: 1e+06) \n -v, -verbose print more information \n -P, -pe input is paired end read file \n -H, -hist input is a text file containing the observed histogram \n -V, -vals input is a text file containing only the observed counts \n -B, -bam input is in BAM format \n -l, -seg_len maximum segment length when merging paired end bam reads \n (default: 5000) \n\nHelp options:\n -?, -help print this help message \n -about print about message", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/preseq:2.0" } }, "inputs": { "V": { "type": "File?", "inputBinding": { "position": 1, "prefix": "-V" }, "doc": "-vals input is a text file containing only the observed counts \n" }, "B": { "type": "boolean", "default": true, "inputBinding": { "position": 1, "prefix": "-B" }, "doc": "-bam input is in BAM format \n" }, "output_file_basename": { "type": "string" }, "H": { "type": "File?", "inputBinding": { "position": 1, "prefix": "-H" }, "doc": "-hist input is a text file containing the observed histogram \n" }, "v": { "type": "boolean", "default": false, "inputBinding": { "position": 1, "prefix": "-v" }, "doc": "-verbose print more information \n" }, "input_sorted_file": { "type": "File", "inputBinding": { "position": 2 }, "doc": "Sorted bed or BAM file" }, "l": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-l" }, "doc": "-seg_len maximum segment length when merging paired end bam reads \n(default: 5000)\nHelp options:\n-?, -help print this help message\n-about print about message\n" }, "s": { "type": "float?", "inputBinding": { "position": 1, "prefix": "-s" }, "doc": "-step step size in extrapolations (default: 1e+06) \n" }, "pe": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-P" }, "doc": "-pe input is paired end read file \n" } }, "outputs": { "output_file": { "type": "File", "outputBinding": { "glob": "$(inputs.output_file_basename + '.preseq_c_curve.txt')" } } }, 
"baseCommand": [ "preseq", "c_curve" ], "stdout": "$(inputs.output_file_basename + '.preseq_c_curve.txt')" }, "scatterMethod": "dotproduct", "scatter": [ "input_sorted_file", "output_file_basename" ], "in": { "input_sorted_file": "filtered2sorted/sorted_file", "output_file_basename": "extract_basename_2/output_path" }, "out": [ "output_file" ] }, "execute_pcr_bottleneck_coef": { "in": { "input_bam_files": "filtered2sorted/sorted_file", "genome_sizes": "genome_sizes_file", "input_output_filenames": "extract_basename_2/output_path" }, "run": { "class": "Workflow", "cwlVersion": "v1.0", "doc": "ChIP-seq - map - PCR Bottleneck Coefficients", "requirements": [ { "class": "ScatterFeatureRequirement" } ], "inputs": { "input_bam_files": { "type": "File[]" }, "genome_sizes": { "type": "File" }, "input_output_filenames": { "type": "string[]" } }, "steps": { "compute_pbc": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Compute PCR Bottleneck Coeficient from BedGraph file.", "inputs": { "bedgraph_file": { "type": "File", "inputBinding": { "position": 1 } }, "output_filename": { "type": "string" } }, "outputs": { "pbc": { "type": "File", "outputBinding": { "glob": "$(inputs.output_filename + '.PBC.txt')" } } }, "baseCommand": [ "awk", "$4==1 {N1 += $3 - $2}; $4>=1 {Nd += $3 - $2} END {print N1/Nd}" ], "stdout": "$(inputs.output_filename + '.PBC.txt')" }, "in": { "bedgraph_file": "bedtools_genomecov/output_bedfile", "output_filename": "input_output_filenames" }, "scatterMethod": "dotproduct", "scatter": [ "bedgraph_file", "output_filename" ], "out": [ "pbc" ] }, "bedtools_genomecov": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Tool: bedtools genomecov (aka genomeCoverageBed)\nVersion: v2.25.0\nSummary: Compute the coverage of a feature file among a genome.\n\nUsage: bedtools genomecov [OPTIONS] -i -g \n\nOptions: \n\t-ibam\t\tThe input file is in BAM format.\n\t\t\tNote: BAM _must_ be sorted by position\n\n\t-d\t\tReport the depth at each genome position (with one-based coordinates).\n\t\t\tDefault behavior is to report a histogram.\n\n\t-dz\t\tReport the depth at each genome position (with zero-based coordinates).\n\t\t\tReports only non-zero positions.\n\t\t\tDefault behavior is to report a histogram.\n\n\t-bg\t\tReport depth in BedGraph format. For details, see:\n\t\t\tgenome.ucsc.edu/goldenPath/help/bedgraph.html\n\n\t-bga\t\tReport depth in BedGraph format, as above (-bg).\n\t\t\tHowever with this option, regions with zero \n\t\t\tcoverage are also reported. This allows one to\n\t\t\tquickly extract all regions of a genome with 0 \n\t\t\tcoverage by applying: \"grep -w 0$\" to the output.\n\n\t-split\t\tTreat \"split\" BAM or BED12 entries as distinct BED intervals.\n\t\t\twhen computing coverage.\n\t\t\tFor BAM files, this uses the CIGAR \"N\" and \"D\" operations \n\t\t\tto infer the blocks for computing coverage.\n\t\t\tFor BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds\n\t\t\tfields (i.e., columns 10,11,12).\n\n\t-strand\t\tCalculate coverage of intervals from a specific strand.\n\t\t\tWith BED files, requires at least 6 columns (strand is column 6). \n\t\t\t- (STRING): can be + or -\n\n\t-5\t\tCalculate coverage of 5\" positions (instead of entire interval).\n\n\t-3\t\tCalculate coverage of 3\" positions (instead of entire interval).\n\n\t-max\t\tCombine all positions with a depth >= max into\n\t\t\ta single bin in the histogram. 
Irrelevant\n\t\t\tfor -d and -bedGraph\n\t\t\t- (INTEGER)\n\n\t-scale\t\tScale the coverage by a constant factor.\n\t\t\tEach coverage value is multiplied by this factor before being reported.\n\t\t\tUseful for normalizing coverage by, e.g., reads per million (RPM).\n\t\t\t- Default is 1.0; i.e., unscaled.\n\t\t\t- (FLOAT)\n\n\t-trackline\tAdds a UCSC/Genome-Browser track line definition in the first line of the output.\n\t\t\t- See here for more details about track line definition:\n\t\t\t http://genome.ucsc.edu/goldenPath/help/bedgraph.html\n\t\t\t- NOTE: When adding a trackline definition, the output BedGraph can be easily\n\t\t\t uploaded to the Genome Browser as a custom track,\n\t\t\t BUT CAN NOT be converted into a BigWig file (w/o removing the first line).\n\n\t-trackopts\tWrites additional track line definition parameters in the first line.\n\t\t\t- Example:\n\t\t\t -trackopts 'name=\"My Track\" visibility=2 color=255,30,30'\n\t\t\t Note the use of single-quotes if you have spaces in your parameters.\n\t\t\t- (TEXT)\n\nNotes: \n\t(1) The genome file should be tab delimited and structured as follows:\n\t \n\n\tFor example, Human (hg19):\n\tchr1\t249250621\n\tchr2\t243199373\n\t...\n\tchr18_gl000207_random\t4262\n\n\t(2) The input BED (-i) file must be grouped by chromosome.\n\t A simple \"sort -k 1,1 > .sorted\" will suffice.\n\n\t(3) The input BAM (-ibam) file must be sorted by position.\n\t A \"samtools sort \" should suffice.\n\nTips: \n\tOne can use the UCSC Genome Browser's MySQL database to extract\n\tchromosome sizes. For example, H. sapiens:\n\n\tmysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \\\n\t\"select chrom, size from hg19.chromInfo\" > hg19.genome", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/bedtools" } }, "inputs": { "bga": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-bga" }, "doc": "\tReport depth in BedGraph format, as above (-bg).\nHowever with this option, regions with zero\ncoverage are also reported. This allows one to\nquickly extract all regions of a genome with 0\ncoverage by applying: \"grep -w 0$\" to the output.\n" }, "bg": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-bg" }, "doc": "\tReport depth in BedGraph format. For details, see:\ngenome.ucsc.edu/goldenPath/help/bedgraph.html\n" }, "d": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-d" }, "doc": "\tReport the depth at each genome position (with one-based coordinates).\nDefault behavior is to report a histogram.\n" }, "g": { "type": "File", "inputBinding": { "position": 3, "prefix": "-g" }, "doc": "Genome sizes file: two-column, tab-delimited file with chromosome name and size" }, "max": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-max" }, "doc": "\tCombine all positions with a depth >= max into\na single bin in the histogram. 
Irrelevant\nfor -d and -bedGraph\n- (INTEGER)\n" }, "trackopts": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-trackopts" }, "doc": "Writes additional track line definition parameters in the first line.\n- Example:\n-trackopts 'name=\"My Track\" visibility=2 color=255,30,30'\nNote the use of single-quotes if you have spaces in your parameters.\n- (TEXT)\n" }, "trackline": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-trackline" }, "doc": "Adds a UCSC/Genome-Browser track line definition in the first line of the output.\n- See here for more details about track line definition:\nhttp://genome.ucsc.edu/goldenPath/help/bedgraph.html\n- NOTE: When adding a trackline definition, the output BedGraph can be easily\nuploaded to the Genome Browser as a custom track,\nBUT CAN NOT be converted into a BigWig file (w/o removing the first line).\n" }, "3": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-3" }, "doc": "\tCalculate coverage of 3\" positions (instead of entire interval).\n" }, "scale": { "type": "float?", "inputBinding": { "position": 1, "prefix": "-scale" }, "doc": "\tScale the coverage by a constant factor.\nEach coverage value is multiplied by this factor before being reported.\nUseful for normalizing coverage by, e.g., reads per million (RPM).\n- Default is 1.0; i.e., unscaled.\n- (FLOAT)\n" }, "dz": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-dz" }, "doc": "\tReport the depth at each genome position (with zero-based coordinates).\nReports only non-zero positions.\nDefault behavior is to report a histogram.\n" }, "split": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-split" }, "doc": "\tTreat \"split\" BAM or BED12 entries as distinct BED intervals.\nwhen computing coverage.\nFor BAM files, this uses the CIGAR \"N\" and \"D\" operations\nto infer the blocks for computing coverage.\nFor BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds\nfields (i.e., columns 10,11,12).\n" }, "ibam": { "type": "File", "inputBinding": { "position": 2, "prefix": "-ibam" }, "doc": "\tThe input file is in BAM format.\nNote: BAM _must_ be sorted by position\n" }, "5": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-5" }, "doc": "\tCalculate coverage of 5\" positions (instead of entire interval).\n" }, "strand": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-strand" }, "doc": "\tCalculate coverage of intervals from a specific strand.\nWith BED files, requires at least 6 columns (strand is column 6).\n- (STRING): can be + or -\n" } }, "outputs": { "output_bedfile": { "type": "File", "outputBinding": { "glob": "$(inputs.ibam.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.bdg')" } } }, "baseCommand": [ "bedtools", "genomecov" ], "stdout": "$(inputs.ibam.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.bdg')" }, "in": { "bg": { "default": true }, "g": "genome_sizes", "ibam": "input_bam_files" }, "scatter": "ibam", "out": [ "output_bedfile" ] } }, "outputs": { "pbc_file": { "outputSource": "compute_pbc/pbc", "type": "File[]" } } }, "out": [ "pbc_file" ] }, "mark_duplicates": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/picard" } }, "inputs": { "remove_duplicates": { "type": "boolean", "default": false, "inputBinding": { "valueFrom": "$('REMOVE_DUPLICATES=' + self)", 
"position": 5 }, "doc": "If true do not write duplicates to the output file instead of writing them with appropriate flags set. (Default false)." }, "input_file": { "type": "File", "inputBinding": { "position": 4, "valueFrom": "$('INPUT=' + self.path)", "shellQuote": false }, "doc": "One or more input SAM or BAM files to analyze. Must be coordinate sorted." }, "java_opts": { "type": "string?", "inputBinding": { "position": 1, "shellQuote": false }, "doc": "JVM arguments should be a quoted, space separated list (e.g. \"-Xms128m -Xmx512m\")" }, "picard_jar_path": { "type": "string", "inputBinding": { "position": 2, "prefix": "-jar" }, "doc": "Path to the picard.jar file" }, "metrics_suffix": { "type": "string", "default": "dedup_metrics.txt", "doc": "Suffix used to create the metrics output file (Default: dedup_metrics.txt)" }, "barcode_tag": { "type": "string?", "inputBinding": { "prefix": "BARCODE_TAG=", "separate": false, "position": 5 }, "doc": "If true do not write duplicates to the output file instead of writing them with appropriate flags set. (Default false)." }, "output_filename": { "type": "string", "doc": "Output filename used as basename" }, "output_suffix": { "type": "string", "default": "dedup.bam", "doc": "Suffix used to identify the output file (Default: dedup.bam)" } }, "outputs": { "output_metrics_file": { "type": "File", "outputBinding": { "glob": "$(inputs.output_filename + '.' + inputs.metrics_suffix)" } }, "output_dedup_bam_file": { "type": "File", "outputBinding": { "glob": "$(inputs.output_filename + '.' + inputs.output_suffix)" } } }, "baseCommand": [ "java" ], "arguments": [ { "valueFrom": "MarkDuplicates", "position": 3 }, { "valueFrom": "$('OUTPUT=' + inputs.output_filename + '.' + inputs.output_suffix)", "position": 5, "shellQuote": false }, { "valueFrom": "$('METRICS_FILE='+inputs.output_filename + '.' + inputs.metrics_suffix)", "position": 5, "shellQuote": false }, { "valueFrom": "$('TMP_DIR='+runtime.tmpdir)", "position": 5, "shellQuote": false } ] }, "scatterMethod": "dotproduct", "scatter": [ "input_file", "output_filename" ], "in": { "input_file": "index_filtered_bam/indexed_file", "java_opts": "picard_java_opts", "picard_jar_path": "picard_jar_path", "output_filename": "extract_basename_2/output_path", "output_suffix": { "valueFrom": "bam" } }, "out": [ "output_metrics_file", "output_dedup_bam_file" ] }, "sort_dups_marked_bams": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "n": { "type": "boolean", "default": false, "inputBinding": { "position": 1, "prefix": "-n" }, "doc": "Sort by read name" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used in sorting" }, "input_file": { "type": "File", "inputBinding": { "position": 1000 }, "doc": "Aligned file to be sorted with samtools" }, "suffix": { "type": "string", "default": ".sorted.bam", "doc": "suffix of the transformed SAM/BAM file (including extension, e.g. 
.filtered.sam)" } }, "outputs": { "sorted_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "doc": "Sorted aligned file" } }, "baseCommand": [ "samtools", "sort" ], "stdout": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "scatter": [ "input_file" ], "in": { "nthreads": "nthreads", "input_file": "mark_duplicates/output_dedup_bam_file", "suffix": { "valueFrom": ".dups_marked.bam" } }, "out": [ "sorted_file" ] }, "index_dups_marked_bams": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "InitialWorkDirRequirement": { "listing": [ "$(inputs.input_file)" ] } }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Aligned file to be sorted with samtools" } }, "outputs": { "indexed_file": { "doc": "Indexed BAM file", "type": "File", "outputBinding": { "glob": "$(inputs.input_file.basename)" }, "secondaryFiles": ".bai" } }, "baseCommand": [ "samtools", "index" ], "arguments": [ { "valueFrom": "$(inputs.input_file.basename + '.bai')", "position": 2 } ] }, "scatter": [ "input_file" ], "in": { "input_file": "sort_dups_marked_bams/sorted_file" }, "out": [ "indexed_file" ] }, "remove_duplicates": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "b": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-b" }, "doc": "output BAM" }, "header": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-h" }, "doc": "Include header in output" }, "f": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-f" }, "doc": "only include reads with all bits set in INT set in FLAG [0]" }, "F": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-F" }, "doc": "only include reads with none of the bits set in INT set in FLAG [0]" }, "S": { "type": "boolean", "default": true, "inputBinding": { "position": 1, "prefix": "-S" }, "doc": "Input format autodetected" }, "u": { "type": "boolean", "default": true, "inputBinding": { "position": 1, "prefix": "-u" }, "doc": "uncompressed BAM output (implies -b)" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used" }, "q": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-q" }, "doc": "only include reads with mapping quality >= INT [0]" }, "L": { "type": "File?", "inputBinding": { "position": 1, "prefix": "-L" }, "doc": "FILE only include reads overlapping this BED FILE [null]" }, "input_file": { "type": "File", "inputBinding": { "position": 2 }, "doc": "File to be converted to BAM with samtools" }, "suffix": { "type": "string?", "doc": "suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam)" }, "outfile_name": { "type": "string?", "doc": "Output file name. If not specified, the basename of the input file with the suffix specified in the suffix argument will be used." } }, "outputs": { "outfile": { "type": "File", "outputBinding": { "glob": "${\n if (inputs.outfile_name) return inputs.outfile_name;\n var suffix = inputs.b ? 
'.bam' : '.sam';\n suffix = inputs.suffix || suffix;\n return inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + suffix\n}\n" }, "doc": "Aligned file in SAM or BAM format" } }, "baseCommand": [ "samtools", "view" ], "stdout": "${\n if (inputs.outfile_name) return inputs.outfile_name;\n var suffix = inputs.b ? '.bam' : '.sam';\n suffix = inputs.suffix || suffix;\n return inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + suffix\n}\n" }, "scatter": [ "input_file" ], "in": { "input_file": "index_dups_marked_bams/indexed_file", "F": { "valueFrom": "${return 1024}" }, "suffix": { "valueFrom": ".dedup.bam" }, "b": { "valueFrom": "${return true}" }, "outfile_name": { "valueFrom": "${return inputs.input_file.basename.replace('dups_marked', 'dedup')}" } }, "out": [ "outfile" ] }, "sort_dedup_bams": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "n": { "type": "boolean", "default": false, "inputBinding": { "position": 1, "prefix": "-n" }, "doc": "Sort by read name" }, "nthreads": { "type": "int", "default": 1, "inputBinding": { "position": 1, "prefix": "-@" }, "doc": "Number of threads used in sorting" }, "input_file": { "type": "File", "inputBinding": { "position": 1000 }, "doc": "Aligned file to be sorted with samtools" }, "suffix": { "type": "string", "default": ".sorted.bam", "doc": "suffix of the transformed SAM/BAM file (including extension, e.g. .filtered.sam)" } }, "outputs": { "sorted_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "doc": "Sorted aligned file" } }, "baseCommand": [ "samtools", "sort" ], "stdout": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + inputs.suffix)" }, "scatter": [ "input_file" ], "in": { "nthreads": "nthreads", "input_file": "remove_duplicates/outfile" }, "out": [ "sorted_file" ] }, "index_dedup_bams": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "InitialWorkDirRequirement": { "listing": [ "$(inputs.input_file)" ] } }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "input_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Aligned file to be sorted with samtools" } }, "outputs": { "indexed_file": { "doc": "Indexed BAM file", "type": "File", "outputBinding": { "glob": "$(inputs.input_file.basename)" }, "secondaryFiles": ".bai" } }, "baseCommand": [ "samtools", "index" ], "arguments": [ { "valueFrom": "$(inputs.input_file.basename + '.bai')", "position": 2 } ] }, "scatter": [ "input_file" ], "in": { "input_file": "sort_dedup_bams/sorted_file" }, "out": [ "indexed_file" ] }, "mapped_reads_count": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Get number of processed reads from Bowtie log.", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "bowtie_log": { "type": "File", "inputBinding": {} } }, "outputs": { "output": { "type": "File", "outputBinding": { "glob": "$(inputs.bowtie_log.path.replace(/^.*[\\\\\\/]/, '') + '.read_count.mapped')" } } }, "baseCommand": "read-count-from-bowtie-log.sh", "stdout": 
"$(inputs.bowtie_log.path.replace(/^.*[\\\\\\/]/, '') + '.read_count.mapped')" }, "scatter": "bowtie_log", "in": { "bowtie_log": "bowtie-se/output_bowtie_log" }, "out": [ "output" ] }, "percent_uniq_reads": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Get number of processed reads from Bowtie log.", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "preseq_c_curve_outfile": { "type": "File", "inputBinding": {} } }, "outputs": { "output": { "type": "File", "outputBinding": { "glob": "$(inputs.preseq_c_curve_outfile.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + '.percentage_unique_reads.txt')" } } }, "baseCommand": "percent-uniq-reads-from-preseq.sh", "stdout": "$(inputs.preseq_c_curve_outfile.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + '.percentage_unique_reads.txt')" }, "scatter": "preseq_c_curve_outfile", "in": { "preseq_c_curve_outfile": "preseq-c-curve/output_file" }, "out": [ "output" ] }, "mapped_filtered_reads_count": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extract mapped reads from BAM file using Samtools flagstat command", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools" } }, "inputs": { "output_suffix": { "type": "string" }, "input_bam_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Aligned BAM file to filter" } }, "outputs": { "output_read_count": { "type": "File", "outputBinding": { "glob": "$(inputs.input_bam_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix)" }, "doc": "Samtools Flagstat report file" } }, "baseCommand": [ "samtools", "flagstat" ], "arguments": [ { "valueFrom": " | head -n1 | cut -f 1 -d ' '", "position": 10000, "shellQuote": false } ], "stdout": "$(inputs.input_bam_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix)" }, "scatter": "input_bam_file", "in": { "output_suffix": { "valueFrom": ".mapped_and_filtered.read_count.txt" }, "input_bam_file": "sort_dedup_bams/sorted_file" }, "out": [ "output_read_count" ] }, "bam_idxstats": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "InitialWorkDirRequirement": { "listing": [ "$(inputs.bam)" ] } }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "bam": { "type": "File", "inputBinding": { "position": 1 }, "secondaryFiles": [ ".bai" ], "doc": "Bam file (it should be indexed)" } }, "outputs": { "idxstats_file": { "type": "File", "doc": "Idxstats output file. 
TAB-delimited with each line consisting of reference\nsequence name, sequence length, # mapped reads and # unmapped reads\n", "outputBinding": { "glob": "$(inputs.bam.basename + \".idxstats\")" } } }, "baseCommand": [ "samtools", "idxstats" ], "stdout": "$(inputs.bam.basename + \".idxstats\")" }, "scatter": "bam", "in": { "bam": "index_bams/indexed_file" }, "out": [ "idxstats_file" ] }, "percent_mitochondrial_reads": { "run": { "class": "ExpressionTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "inputs": { "idxstats": { "type": "File", "doc": "Samtools idxstats file", "inputBinding": { "loadContents": true } }, "chrom": { "type": "string", "doc": "Query chromosome used to calculate percentage" }, "output_filename": { "type": "string?", "doc": "Save the percentage in a file of the given name" } }, "expression": "${\n var regExp = new RegExp(inputs.chrom + \"\\\\s\\\\d+\\\\s(\\\\d+)\\\\s(\\\\d+)\");\n var match = inputs.idxstats.contents.match(regExp);\n if (match){\n var chrom_mapped_reads = match[1];\n var total_reads = inputs.idxstats.contents.split(\"\\n\")\n .map(function(x){\n var rr = x.match(/.*\\s\\d+\\s(\\d+)\\s\\d+/);\n return (rr ? rr[1] : 0);\n })\n .reduce(function(a, b) { return Number(a) + Number(b); });\n\n var output = (100*chrom_mapped_reads/total_reads).toFixed(4) + \"%\" + \"\\n\";\n\n if (inputs.output_filename){\n return {\n percent_map : {\n \"class\": \"File\",\n \"basename\" : inputs.output_filename,\n \"contents\" : output,\n }\n }\n }\n return output;\n }\n}\n", "outputs": { "percent_map": { "type": [ "File", "string" ] } } }, "scatter": "idxstats", "in": { "idxstats": "bam_idxstats/idxstats_file", "chrom": { "valueFrom": "chrM" }, "output_filename": { "valueFrom": "${return inputs.idxstats.basename.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '').replace(/\\.[^/.]+$/, '').replace(/\\.[^/.]+$/, '.mitochondrial_percentage.txt')}" } }, "out": [ "percent_map" ] } }, "outputs": { "output_pbc_files": { "doc": "PCR Bottleneck Coefficient files.", "type": "File[]", "outputSource": "execute_pcr_bottleneck_coef/pbc_file" }, "output_read_count_mapped": { "doc": "Read counts of the mapped BAM files", "type": "File[]", "outputSource": "mapped_reads_count/output" }, "output_data_sorted_dedup_bam_files": { "doc": "BAM files without duplicate reads.", "type": "File[]", "outputSource": "index_dedup_bams/indexed_file" }, "output_data_sorted_dups_marked_bam_files": { "doc": "BAM files with marked duplicate reads.", "type": "File[]", "outputSource": "index_dups_marked_bams/indexed_file" }, "output_picard_mark_duplicates_files": { "doc": "Picard MarkDuplicates metrics files.", "type": "File[]", "outputSource": "mark_duplicates/output_metrics_file" }, "output_read_count_mapped_filtered": { "doc": "Read counts of the mapped and filtered BAM files", "type": "File[]", "outputSource": "mapped_filtered_reads_count/output_read_count" }, "output_percentage_uniq_reads": { "doc": "Percentage of unique reads from preseq c_curve output", "type": "File[]", "outputSource": "percent_uniq_reads/output" }, "output_bowtie_log": { "doc": "Bowtie log file.", "type": "File[]", "outputSource": "bowtie-se/output_bowtie_log" }, "output_preseq_c_curve_files": { "doc": "Preseq c_curve output files.", "type": "File[]", "outputSource": "preseq-c-curve/output_file" }, "output_percent_mitochondrial_reads": { "doc": "Percentage of mitochondrial reads.", "type": "File[]", "outputSource": "percent_mitochondrial_reads/percent_map" } } }, "in": { "input_fastq_files": 
"trimm/output_data_fastq_trimmed_files", "genome_sizes_file": "genome_sizes_file", "genome_ref_first_index_file": "genome_ref_first_index_file", "picard_jar_path": "picard_jar_path", "picard_java_opts": "picard_java_opts", "nthreads": "nthreads_map" }, "out": [ "output_data_sorted_dedup_bam_files", "output_data_sorted_dups_marked_bam_files", "output_picard_mark_duplicates_files", "output_pbc_files", "output_bowtie_log", "output_preseq_c_curve_files", "output_percentage_uniq_reads", "output_read_count_mapped", "output_percent_mitochondrial_reads" ] }, "peak_call": { "run": { "class": "Workflow", "cwlVersion": "v1.0", "doc": "ATAC-seq 04 quantification - SE", "requirements": [ { "class": "ScatterFeatureRequirement" }, { "class": "StepInputExpressionRequirement" }, { "class": "InlineJavascriptRequirement" } ], "inputs": { "input_bam_files": { "type": "File[]" }, "genome_effective_size": { "default": "hs", "doc": "Effective genome size used by MACS2. It can be numeric or a shortcuts:'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs", "type": "string" }, "input_genome_sizes": { "doc": "Two column tab-delimited file with chromosome size information", "type": "File" }, "as_narrowPeak_file": { "doc": "Definition narrowPeak file in AutoSql format (used in bedToBigBed)", "type": "File" }, "nthreads": { "default": 1, "type": "int" } }, "steps": { "spp": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/spp" } }, "inputs": { "nthreads": { "type": "int?", "inputBinding": { "prefix": "-p=", "separate": false }, "doc": "-p= , number of parallel processing nodes, default=0" }, "filtchr": { "type": "string?", "inputBinding": { "prefix": "-filtchr=", "separate": false }, "doc": "-filtchr= , Pattern to use to remove tags that map to specific chromosomes e.g. _ will remove all tags that map to chromosomes with _ in their name" }, "savr": { "type": "boolean?", "inputBinding": { "valueFrom": "${ if (self) return \"-savr=\" + inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp.regionPeak\"; return null}" }, "doc": "-savr= OR -savr RegionPeak file name (variable width peaks with regions of enrichment)" }, "rf": { "type": "boolean", "default": true, "inputBinding": { "prefix": "-rf" }, "doc": "overwrite (force remove) output files in case they exist. 
Default: true" }, "npeak": { "type": "int?", "inputBinding": { "prefix": "-npeak=", "separate": false }, "doc": "-npeak=, threshold on number of peaks to call" }, "savn": { "type": "boolean?", "inputBinding": { "valueFrom": "${ if (self) return \"-savn=\" + inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp.narrowPeak\"; return null}" }, "doc": "-savn= OR -savn NarrowPeak file name (fixed width peaks)" }, "s": { "type": "string?", "inputBinding": { "prefix": "-s=", "separate": false }, "doc": "-s=:: , strand shifts at which cross-correlation is evaluated, default=-500:5:1500" }, "fdr": { "type": "float?", "inputBinding": { "prefix": "-fdr=", "separate": false }, "doc": "-fdr= , false discovery rate threshold for peak calling" }, "savp": { "type": "boolean?", "inputBinding": { "valueFrom": "${ if (self) return \"-savp=\" + inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp_cross_corr.pdf\"; return null}" }, "doc": "save cross-correlation plot" }, "x": { "type": "string?", "inputBinding": { "prefix": "-x=", "separate": false }, "doc": "-x=:, strand shifts to exclude (This is mainly to avoid region around phantom peak) default=10:(readlen+10)" }, "control_bam": { "type": "File?", "inputBinding": { "prefix": "-i=", "separate": false }, "doc": ", full path and name (or URL) of tagAlign/BAM file (can be gzipped) (FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz)" }, "savd": { "type": "boolean?", "inputBinding": { "valueFrom": "${ if (self) return \"-savd=\" + inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp.Rdata\"; return null}" }, "doc": "-savd= OR -savd, save Rdata file" }, "input_bam": { "type": "File", "inputBinding": { "prefix": "-c=", "separate": false }, "doc": ", full path and name (or URL) of tagAlign/BAM file (can be gzipped)(FILE EXTENSION MUST BE tagAlign.gz, tagAlign, bam or bam.gz)" }, "speak": { "type": "string?", "inputBinding": { "prefix": "-speak=", "separate": false }, "doc": "-speak=, user-defined cross-correlation peak strandshift" } }, "outputs": { "output_spp_cross_corr": { "type": "File", "outputBinding": { "glob": "$(inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp_cross_corr.txt\")" }, "doc": "peakshift/phantomPeak results summary file" }, "output_spp_rdata": { "type": "File?", "outputBinding": { "glob": "$(inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp.Rdata\")" }, "doc": "Rdata file from the run_spp.R run" }, "output_spp_narrow_peak": { "type": "File?", "outputBinding": { "glob": "$(inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp.narrowPeak\")" }, "doc": "narrowPeak output file" }, "output_spp_region_peak": { "type": "File?", "outputBinding": { "glob": "$(inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp.regionPeak\")" }, "doc": "regionPeak output file" }, "output_spp_cross_corr_plot": { "type": "File?", "outputBinding": { "glob": "$(inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp_cross_corr.pdf\")" }, "doc": "peakshift/phantomPeak results summary plot" } }, "baseCommand": "run_spp.R", "arguments": [ { "valueFrom": "$(\"-out=\" + inputs.input_bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".spp_cross_corr.txt\")" }, { "valueFrom": "$(\"-tmpdir=\"+runtime.tmpdir)", "shellQuote": false } ] }, "scatterMethod": "dotproduct", 
"scatter": [ "input_bam" ], "in": { "input_bam": "input_bam_files", "nthreads": "nthreads", "savp": { "valueFrom": "${return true}" } }, "out": [ "output_spp_cross_corr", "output_spp_cross_corr_plot" ] }, "extract-peak-frag-length": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extracts best fragment length from SPP output text file", "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "input_spp_txt_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "output_best_frag_length": { "type": "float", "outputBinding": { "glob": "best_frag_length", "loadContents": true, "outputEval": "$(Number(self[0].contents.replace('\\n', '')))" } } }, "baseCommand": "extract-best-frag-length.sh", "stdout": "best_frag_length" }, "scatter": "input_spp_txt_file", "in": { "input_spp_txt_file": "spp/output_spp_cross_corr" }, "out": [ "output_best_frag_length" ] }, "peak-calling": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/macs2" } }, "inputs": { "control": { "type": "File?", "inputBinding": { "position": 2, "prefix": "--control" }, "doc": "Control sample file." }, "extsize": { "type": "float?", "inputBinding": { "position": 1, "prefix": "--extsize" }, "doc": "The arbitrary extension size in bp. When nomodel is true, MACS will use this value as fragment size to extend each read towards 3' end, then pile them up. It's exactly twice the number of obsolete SHIFTSIZE. In previous language, each read is moved 5'->3' direction to middle of fragment by 1/2 d, then extended to both direction with 1/2 d. This is equivalent to say each read is extended towards 5'->3' into a d size fragment. DEFAULT: 200. EXTSIZE and SHIFT can be combined when necessary. Check SHIFT option." }, "verbose": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--verbose" }, "doc": "VERBOSE_LEVEL Set verbose level of runtime message. 0: only show\ncritical message, 1: show additional warning message,\n2: show process information, 3: show debug messages.\nDEFAULT:2\n" }, "seed": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--seed" }, "doc": "SEED Set the random seed while down sampling data. Must be\na non-negative integer in order to be effective.\nDEFAULT: not set\n" }, "broad-cutoff": { "type": "float?", "inputBinding": { "position": 1, "prefix": "--broad-cutoff" }, "doc": "BROADCUTOFF\nCutoff for broad region. This option is not available\nunless --broad is set. If -p is set, this is a pvalue\ncutoff, otherwise, it's a qvalue cutoff. DEFAULT: 0.1\n" }, "bdg": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--bdg" }, "doc": "Whether or not to save extended fragment pileup, and local lambda tracks (two files) at every bp into a bedGraph file. DEFAULT: True" }, "ratio": { "type": "float?", "inputBinding": { "position": 1, "prefix": "--ratio" }, "doc": "RATIO When set, use a custom scaling ratio of ChIP/control\n(e.g. calculated using NCIS) for linear scaling.\nDEFAULT: ingore\n" }, "call-summits": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--call-summits" }, "doc": "If set, MACS will use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region. 
DEFAULT: False " }, "broad": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--broad" }, "doc": "If set, MACS will try to call broad peaks by linking nearby highly enriched regions. The linking region is controlled by another cutoff through --linking-cutoff. The maximum linking region length is 4 times of d from MACS. DEFAULT: False " }, "format": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-f" }, "doc": "-f {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE}, --format {AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE} Format of tag file, \"AUTO\", \"BED\" or \"ELAND\" or \"ELANDMULTI\" or \"ELANDEXPORT\" or \"SAM\" or \"BAM\" or \"BOWTIE\" or \"BAMPE\". The default AUTO option will let MACS decide which format the file is. Note that MACS can't detect \"BAMPE\" or \"BEDPE\" format with \"AUTO\", and you have to implicitly specify the format for \"BAMPE\" and \"BEDPE\". DEFAULT: \"AUTO\"." }, "buffer-size": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--buffer-size" }, "doc": "BUFFER_SIZE\nBuffer size for incrementally increasing internal\narray size to store reads alignment information. In\nmost cases, you don't have to change this parameter.\nHowever, if there are large number of\nchromosomes/contigs/scaffolds in your alignment, it's\nrecommended to specify a smaller buffer size in order\nto decrease memory usage (but it will take longer time\nto read alignment files). Minimum memory requested for\nreading an alignment file is about # of CHROMOSOME *\nBUFFER_SIZE * 2 Bytes. DEFAULT: 100000\n" }, "nolambda": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--nolambda" }, "doc": "If True, MACS will use fixed background lambda as local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure. " }, "cutoff-analysis": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--cutoff-analysis" }, "doc": "While set, MACS2 will analyze number or total length of peaks that can be called by different p-value cutoff then output a summary table to help user decide a better cutoff. The table will be saved in NAME_cutoff_analysis.txt file. Note, minlen and maxgap may affect the results. WARNING: May take ~30 folds longer time to finish. DEFAULT: False Post-processing options: " }, "treatment": { "type": "File[]", "inputBinding": { "position": 2, "prefix": "--treatment" }, "doc": "Treatment sample file(s). If multiple files are given as -t A B C, then they will all be read and pooled together. IMPORTANT: the first sample will be used as the outputs basename." }, "slocal": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--slocal" }, "doc": "SMALLLOCAL The small nearby region in basepairs to calculate\ndynamic lambda. This is used to capture the bias near\nthe peak summit region. Invalid if there is no control\ndata. If you set this to 0, MACS will skip slocal\nlambda calculation. *Note* that MACS will always\nperform a d-size local lambda calculation. The final\nlocal bias should be the maximum of the lambda value\nfrom d, slocal, and llocal size windows. DEFAULT: 1000\n" }, "nomodel": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--nomodel" }, "doc": "\t Whether or not to build the shifting model. If True, MACS will not build model. by default it means shifting size = 100, try to set extsize to change it. 
DEFAULT: False" }, "keep-dup": { "type": "string?", "inputBinding": { "position": 1, "prefix": "--keep-dup" }, "doc": "KEEPDUPLICATES\nIt controls the MACS behavior towards duplicate tags\nat the exact same location -- the same coordination\nand the same strand. The 'auto' option makes MACS\ncalculate the maximum tags at the exact same location\nbased on binomal distribution using 1e-5 as pvalue\ncutoff; and the 'all' option keeps every tags. If an\ninteger is given, at most this number of tags will be\nkept at the same location. The default is to keep one\ntag at the same location. Default: 1\n" }, "down-sample": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--down-sample" }, "doc": "When set, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. Warning: This option will make your result unstable and irreproducible since each time, random reads would be selected. Consider to use 'randsample' script instead. If used together with --SPMR, 1 million unique reads will be randomly picked. Caution: due to the implementation, the final number of selected reads may not be as you expected! DEFAULT: False " }, "trackline": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--trackline" }, "doc": "Tells MACS to include trackline with bedGraph files. To include this trackline while displaying bedGraph at UCSC genome browser, can show name and description of the file as well. However my suggestion is to convert bedGraph to bigWig, then show the smaller and faster binary bigWig file at UCSC genome browser, as well as downstream analysis. Require --bdg to be set. Default: Not include trackline. " }, "bw": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--bw" }, "doc": "BW Band width for picking regions to compute fragment\nsize. This value is only used while building the\nshifting model. DEFAULT: 300\n" }, "fe-cutoff": { "type": "float?", "inputBinding": { "position": 1, "prefix": "--fe-cutoff" }, "doc": "FECUTOFF When set, the value will be used to filter out peaks\nwith low fold-enrichment. Note, MACS2 use 1.0 as\npseudocount while calculating fold-enrichment.\nDEFAULT: 1.0\n" }, "fix-bimodal": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--fix-bimodal" }, "doc": "Whether turn on the auto pair model process. If set, when MACS failed to build paired model, it will use the nomodel settings, the --exsize parameter to extend each tags towards 3' direction. Not to use this automate fixation is a default behavior now. DEFAULT: False " }, "to-large": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--to-large" }, "doc": "When set, scale the small sample up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead to smaller p/qvalues and more specific results. Keep in mind that scaling down will bring down background noise more. DEFAULT: False " }, "g": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-g" }, "doc": "Effective genome size. It can be 1.0e+9 or 1000000000, or shortcuts:'hs' for human (2.7e9), 'mm' for mouse (1.87e9), 'ce' for C. elegans (9e7) and 'dm' for fruitfly (1.2e8), Default:hs." }, "shift": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--shift" }, "doc": "(NOT the legacy --shiftsize option!) The arbitrary shift in bp. Use discretion while setting it other than default value. 
When NOMODEL is set, MACS will use this value to move cutting ends (5') towards 5'->3' direction then apply EXTSIZE to extend them to fragments. When this value is negative, ends will be moved toward 3'->5' direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with EXTSIZE option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you can't set values other than 0 if format is BAMPE for paired-end data. DEFAULT: 0." }, "m": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-m" }, "doc": "MFOLD MFOLD, --mfold MFOLD MFOLD\nSelect the regions within MFOLD range of high-\nconfidence enrichment ratio against background to\nbuild model. Fold-enrichment in regions must be lower\nthan upper limit, and higher than the lower limit. Use\nas \"-m 10 30\". DEFAULT:5 50\n" }, "q": { "type": "float?", "inputBinding": { "position": 1, "prefix": "-q" }, "doc": "Minimum FDR (q-value) cutoff for peak detection. DEFAULT: 0.05. -q, and -p are mutually exclusive." }, "p": { "type": "float?", "inputBinding": { "position": 1, "prefix": "-p" }, "doc": "Pvalue cutoff for peak detection. DEFAULT: not set. -q, and -p are mutually exclusive. If pvalue cutoff is set, qvalue will not be calculated and will be reported as -1 in the final .xls file." }, "s": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-s" }, "doc": "TSIZE, --tsize TSIZE\nTag size. This will override the auto detected tag\nsize. DEFAULT: Not set\n" }, "llocal": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--llocal" }, "doc": "LARGELOCAL The large nearby region in basepairs to calculate\ndynamic lambda. This is used to capture the surround\nbias. If you set this to 0, MACS will skip llocal\nlambda calculation. *Note* that MACS will always\nperform a d-size local lambda calculation. The final\nlocal bias should be the maximum of the lambda value\nfrom d, slocal, and llocal size windows. DEFAULT:\n10000.\n" }, "SPMR": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--SPMR" }, "doc": "If True, MACS will save signal per million reads for fragment pileup profiles. Require --bdg to be set. Default: False " } }, "outputs": { "output_peak_file": { "type": "File", "outputBinding": { "glob": "$(inputs.treatment[0].path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '_peaks.*Peak')", "outputEval": "$(self[0])" }, "doc": "Peak calling output file in narrowPeak|broadPeak format." }, "output_peak_xls_file": { "type": "File", "outputBinding": { "glob": "$(inputs.treatment[0].path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '_peaks.xls')" }, "doc": "Peaks information/report file." }, "output_peak_summits_file": { "type": "File?", "outputBinding": { "glob": "$(inputs.treatment[0].path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '_summits.bed')" }, "doc": "Peaks summits bedfile." }, "output_ext_frag_bdg_file": { "type": "File?", "outputBinding": { "glob": "$(inputs.treatment[0].path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '_treat_pileup.bdg')" }, "doc": "Bedgraph with extended fragment pileup." 
} }, "baseCommand": [ "macs2", "callpeak" ], "arguments": [ { "valueFrom": "$(inputs.treatment[0].path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, ''))", "prefix": "-n", "position": 1 } ] }, "scatterMethod": "dotproduct", "scatter": [ "treatment" ], "in": { "q": { "valueFrom": "${return 0.1}" }, "bdg": { "valueFrom": "${return true}" }, "treatment": { "source": "input_bam_files", "valueFrom": "$([self])" }, "g": "genome_effective_size", "format": { "valueFrom": "BAM" }, "extsize": { "valueFrom": "${return 200}" }, "nomodel": { "valueFrom": "${return true}" }, "shift": { "valueFrom": "${return -100}" } }, "out": [ "output_peak_file", "output_peak_summits_file", "output_ext_frag_bdg_file", "output_peak_xls_file" ] }, "count-reads-filtered": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Count number of dedup-ed reads used in peak calling", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "peak_xls_file": { "type": "File", "inputBinding": { "position": 1 } } }, "outputs": { "read_count_file": { "type": "File", "outputBinding": { "glob": "$(inputs.peak_xls_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\_peaks\\.xls$/, '_read_count.txt'))" } } }, "baseCommand": "count-filtered-reads-macs2.sh", "stdout": "$(inputs.peak_xls_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\_peaks\\.xls$/, '_read_count.txt'))" }, "in": { "peak_xls_file": "peak-calling/output_peak_xls_file" }, "scatter": "peak_xls_file", "out": [ "read_count_file" ] }, "count-peaks": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Counts lines in a file and returns a suffixed file with that number", "requirements": { "InlineJavascriptRequirement": {} }, "inputs": { "output_suffix": { "type": "string", "default": ".count" }, "input_file": { "type": "File" } }, "outputs": { "output_counts": { "type": "File", "outputBinding": { "glob": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '') + inputs.output_suffix)" } } }, "baseCommand": [ "wc", "-l" ], "stdin": "$(inputs.input_file.path)", "stdout": "$(inputs.input_file.path.replace(/^.*[\\\\\\/]/, '') + inputs.output_suffix)" }, "in": { "output_suffix": { "valueFrom": ".peak_count.within_replicate.txt" }, "input_file": "peak-calling/output_peak_file" }, "scatter": "input_file", "out": [ "output_counts" ] }, "filter-reads-in-peaks": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Filter BAM file to only include reads overlapping with a BED file", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools:1.3" } }, "inputs": { "input_bam_file": { "type": "File", "inputBinding": { "position": 3 }, "doc": "Aligned BAM file to filter" }, "input_bedfile": { "type": "File", "inputBinding": { "position": 2, "prefix": "-L" }, "doc": "Bedfile used to only include reads overlapping this BED FILE" } }, "outputs": { "filtered_file": { "type": "File", "outputBinding": { "glob": "$(inputs.input_bam_file.path.replace(/^.*[\\\\\\/]/, '') + '.in_peaks.bam')" }, "doc": "Filtered aligned BAM file by BED coordinates file" } }, "baseCommand": [ "samtools", "view", "-b", "-h" ], "stdout": "$(inputs.input_bam_file.path.replace(/^.*[\\\\\\/]/, '') + '.in_peaks.bam')" }, "scatterMethod": "dotproduct", "scatter": [ "input_bam_file", "input_bedfile" ], "in": { "input_bam_file": "input_bam_files", "input_bedfile": "peak-calling/output_peak_file" }, "out": [ "filtered_file" ] }, 
"extract-count-reads-in-peaks": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Extract mapped reads from BAM file using Samtools flagstat command", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/samtools" } }, "inputs": { "output_suffix": { "type": "string" }, "input_bam_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Aligned BAM file to filter" } }, "outputs": { "output_read_count": { "type": "File", "outputBinding": { "glob": "$(inputs.input_bam_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix)" }, "doc": "Samtools Flagstat report file" } }, "baseCommand": [ "samtools", "flagstat" ], "arguments": [ { "valueFrom": " | head -n1 | cut -f 1 -d ' '", "position": 10000, "shellQuote": false } ], "stdout": "$(inputs.input_bam_file.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix)" }, "scatter": "input_bam_file", "in": { "output_suffix": { "valueFrom": ".read_count.within_replicate.txt" }, "input_bam_file": "filter-reads-in-peaks/filtered_file" }, "out": [ "output_read_count" ] }, "trunk-peak-score": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Trunk scores in ENCODE bed6+4 files", "hints": { "DockerRequirement": { "dockerPull": "reddylab/workflow-utils:ggr" } }, "inputs": { "peaks": { "type": "File", "inputBinding": { "position": 10000 } }, "sep": { "type": "string", "default": "\\t", "inputBinding": { "position": 2, "prefix": "-F" } } }, "outputs": { "trunked_scores_peaks": { "type": "File", "outputBinding": { "glob": "$(inputs.peaks.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.([^/.]+)$/, \"\\.trunked_scores\\.$1\"))" } } }, "baseCommand": "awk", "arguments": [ { "valueFrom": "BEGIN{OFS=FS}$5>1000{$5=1000}{print}", "position": 3 } ], "stdout": "$(inputs.peaks.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.([^/.]+)$/, \"\\.trunked_scores\\.$1\"))" }, "scatter": "peaks", "in": { "peaks": "peak-calling/output_peak_file" }, "out": [ "trunked_scores_peaks" ] }, "peaks-bed-to-bigbed": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "\"bedToBigBed v. 2.7 - Convert bed file to bigBed. (BigBed version: 4)\nusage:\n bedToBigBed in.bed chrom.sizes out.bb\nWhere in.bed is in one of the ascii bed formats, but not including track lines\nand chrom.sizes is two column: \nand out.bb is the output indexed big bed file.\nUse the script: fetchChromSizes to obtain the actual chrom.sizes information\nfrom UCSC, please do not make up a chrom sizes from your own information.\nThe in.bed file must be sorted by chromosome,start,\n to sort a bed file, use the unix sort command:\n sort -k1,1 -k2,2n unsorted.bed > sorted.bed\"\n", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dleehr/docker-hubutils" } }, "inputs": { "genome_sizes": { "type": "File", "inputBinding": { "position": 3 }, "doc": "genome_sizes is two column: .\n" }, "blockSize": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-blockSize=", "separate": false }, "doc": "-blockSize=N - Number of items to bundle in r-tree. Default 256\n" }, "bed": { "type": "File", "inputBinding": { "position": 2 }, "doc": "Input bed file" }, "itemsPerSlot": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-itemsPerSlot=", "separate": false }, "doc": "-itemsPerSlot=N - Number of data points bundled at lowest level. 
Default 512\n" }, "as": { "type": "File?", "inputBinding": { "position": 1, "prefix": "-as=", "separate": false }, "doc": "-as=fields.as - If you have non-standard \"bedPlus\" fields, it's great to put a definition\n of each field in a row in AutoSql format here.1)\n" }, "unc": { "type": "boolean?", "inputBinding": { "position": 1 }, "doc": "-unc - If set, do not use compression.\n" }, "extraIndex": { "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 1, "prefix": "-extraIndex=", "itemSeparator": "," }, "doc": "-extraIndex=fieldList - If set, make an index on each field in a comma separated list\n extraIndex=name and extraIndex=name,id are commonly used.\n" }, "tab": { "type": "boolean?", "inputBinding": { "position": 1 }, "doc": "-tab - If set, expect fields to be tab separated, normally expects white space separator.\n" }, "type": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-type=", "separate": false }, "doc": "-type=bedN[+[P]] :\n N is between 3 and 15,\n optional (+) if extra \"bedPlus\" fields,\n optional P specifies the number of extra fields. Not required, but preferred.\n Examples: -type=bed6 or -type=bed6+ or -type=bed6+3\n (see http://genome.ucsc.edu/FAQ/FAQformat.html#format1)\n" }, "output_suffix": { "type": "string", "default": ".bb" } }, "outputs": { "bigbed": { "type": "File", "outputBinding": { "glob": "$(inputs.bed.path.replace(/^.*[\\\\\\/]/, '')+ inputs.output_suffix)" } } }, "baseCommand": "bedToBigBed", "arguments": [ { "valueFrom": "$(inputs.bed.path.replace(/^.*[\\\\\\/]/, '') + inputs.output_suffix)", "position": 4 } ] }, "in": { "type": { "valueFrom": "bed6+4" }, "as": "as_narrowPeak_file", "genome_sizes": "input_genome_sizes", "bed": "trunk-peak-score/trunked_scores_peaks" }, "scatter": "bed", "out": [ "bigbed" ] } }, "outputs": { "output_filtered_read_count_file": { "doc": "Filtered read count reported by MACS2", "type": "File[]", "outputSource": "count-reads-filtered/read_count_file" }, "output_peak_summits_file": { "doc": "File containing peak summits", "type": "File[]", "outputSource": "peak-calling/output_peak_summits_file" }, "output_peak_file": { "doc": "peakshift/phantomPeak results file", "type": "File[]", "outputSource": "peak-calling/output_peak_file" }, "output_peak_count_within_replicate": { "doc": "Peak counts within replicate", "type": "File[]", "outputSource": "count-peaks/output_counts" }, "output_spp_cross_corr_plot": { "doc": "peakshift/phantomPeak results file", "type": "File[]", "outputSource": "spp/output_spp_cross_corr_plot" }, "output_spp_x_cross_corr": { "doc": "peakshift/phantomPeak results file", "type": "File[]", "outputSource": "spp/output_spp_cross_corr" }, "output_extended_peak_file": { "doc": "peakshift/phantomPeak extended fragment results file", "type": "File[]", "outputSource": "peak-calling/output_ext_frag_bdg_file" }, "output_peak_xls_file": { "doc": "Peak calling report file (*_peaks.xls file produced by MACS2)", "type": "File[]", "outputSource": "peak-calling/output_peak_xls_file" }, "output_peak_bigbed_file": { "doc": "Peaks in bigBed format", "type": "File[]", "outputSource": "peaks-bed-to-bigbed/bigbed" }, "output_read_in_peak_count_within_replicate": { "doc": "Reads peak counts within replicate", "type": "File[]", "outputSource": "extract-count-reads-in-peaks/output_read_count" } } }, "in": { "input_bam_files": "map/output_data_sorted_dedup_bam_files", "input_bam_format": { "valueFrom": "BAM" }, "genome_effective_size": "genome_effective_size", 
"input_genome_sizes": "genome_sizes_file", "as_narrowPeak_file": "as_narrowPeak_file", "nthreads": "nthreads_peakcall" }, "out": [ "output_spp_x_cross_corr", "output_spp_cross_corr_plot", "output_read_in_peak_count_within_replicate", "output_peak_file", "output_peak_bigbed_file", "output_peak_summits_file", "output_extended_peak_file", "output_peak_xls_file", "output_filtered_read_count_file", "output_peak_count_within_replicate" ] }, "quant": { "run": { "class": "Workflow", "cwlVersion": "v1.0", "doc": "ATAC-seq - Quantification", "requirements": [ { "class": "ScatterFeatureRequirement" }, { "class": "StepInputExpressionRequirement" }, { "class": "InlineJavascriptRequirement" } ], "inputs": { "input_bam_files": { "type": "File[]" }, "input_genome_sizes": { "type": "File" }, "nthreads": { "default": 1, "type": "int" } }, "steps": { "bamcoverage": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "usage: An example usage is:$ bamCoverage -b reads.bam -o coverage.bw\n\nThis tool takes an alignment of reads or fragments as input (BAM file) and\ngenerates a coverage track (bigWig or bedGraph) as output. The coverage is\ncalculated as the number of reads per bin, where bins are short consecutive\ncounting windows of a defined size. It is possible to extended the length of\nthe reads to better reflect the actual fragment length. *bamCoverage* offers\nnormalization by scaling factor, Reads Per Kilobase per Million mapped reads\n(RPKM), and 1x depth (reads per genome coverage, RPGC).\nRequired arguments:\n --bam BAM file, -b BAM file\n BAM file to process (default: None)\nOutput:\n --outFileName FILENAME, -o FILENAME\n Output file name. (default: None)\n --outFileFormat {bigwig,bedgraph}, -of {bigwig,bedgraph}\n Output file type. Either \"bigwig\" or \"bedgraph\".\n (default: bigwig)\nOptional arguments:\n --help, -h show this help message and exit\n --scaleFactor SCALEFACTOR\n The smooth length defines a window, larger than the\n binSize, to average the number of reads. For example,\n if the \u2013binSize is set to 20 and the \u2013smoothLength is\n set to 60, then, for each bin, the average of the bin\n and its left and right neighbors is considered.\n Any value smaller than \u2013binSize will be ignored and\n no smoothing will be applied. (default: 1.0)\n --MNase Determine nucleosome positions from MNase-seq data.\n Only 3 nucleotides at the center of each fragment are\n counted. The fragment ends are defined by the two mate\n reads. Only fragment lengthsbetween 130 - 200 bp are\n considered to avoid dinucleosomes or other\n artifacts.*NOTE*: Requires paired-end data. A bin size\n of 1 is recommended. (default: False)\n --filterRNAstrand {forward,reverse}\n Selects RNA-seq reads (single-end or paired-end) in\n the given strand. (default: None)\n --version show program's version number and exit\n --binSize INT bp, -bs INT bp\n Size of the bins, in bases, for the output of the\n bigwig/bedgraph file. (default: 50)\n --region CHR:START:END, -r CHR:START:END\n Region of the genome to limit the operation to - this\n is useful when testing parameters to reduce the\n computing time. The format is chr:start:end, for\n example --region chr10 or --region\n chr10:456700:891000. (default: None)\n --blackListFileName BED file, -bl BED file\n A BED file containing regions that should be excluded\n from all analyses. 
Currently this works by rejecting\n genomic chunks that happen to overlap an entry.\n Consequently, for BAM files, if a read partially\n overlaps a blacklisted region or a fragment spans over\n it, then the read/fragment might still be considered.\n (default: None)\n --numberOfProcessors INT, -p INT\n Number of processors to use. Type \"max/2\" to use half\n the maximum number of processors or \"max\" to use all\n available processors. (default: max/2)\n --verbose, -v Set to see processing messages. (default: False)\nRead coverage normalization options:\n --normalizeTo1x EFFECTIVE GENOME SIZE LENGTH\n Report read coverage normalized to 1x sequencing depth\n (also known as Reads Per Genomic Content (RPGC)).\n Sequencing depth is defined as: (total number of\n mapped reads * fragment length) / effective genome\n size. The scaling factor used is the inverse of the\n sequencing depth computed for the sample to match the\n 1x coverage. To use this option, the effective genome\n size has to be indicated after the option. The\n effective genome size is the portion of the genome\n that is mappable. Large fractions of the genome are\n stretches of NNNN that should be discarded. Also, if\n repetitive regions were not included in the mapping of\n reads, the effective genome size needs to be adjusted\n accordingly. Common values are: mm9: 2,150,570,000;\n hg19:2,451,960,000; dm3:121,400,000 and\n ce10:93,260,000. See Table 2 of http://www.plosone.org\n /article/info:doi/10.1371/journal.pone.0030377 or http\n ://www.nature.com/nbt/journal/v27/n1/fig_tab/nbt.1518_\n T1.html for several effective genome sizes. (default:\n None)\n--ignoreForNormalization IGNOREFORNORMALIZATION [IGNOREFORNORMALIZATION ...]\n A list of space-delimited chromosome names containing\n those chromosomes that should be excluded for\n computing the normalization. This is useful when\n considering samples with unequal coverage across\n chromosomes, like male samples. An usage examples is\n --ignoreForNormalization chrX chrM. (default: None)\n --skipNonCoveredRegions, --skipNAs\n This parameter determines if non-covered regions\n (regions without overlapping reads) in a BAM file\n should be skipped. The default is to treat those\n regions as having a value of zero. The decision to\n skip non-covered regions depends on the interpretation\n of the data. Non-covered regions may represent, for\n example, repetitive regions that should be skipped.\n (default: False)\n --smoothLength INT bp\n The smooth length defines a window, larger than the\n binSize, to average the number of reads. For example,\n if the --binSize is set to 20 and the --smoothLength\n is set to 60, then, for each bin, the average of the\n bin and its left and right neighbors is considered.\n Any value smaller than --binSize will be ignored and\n no smoothing will be applied. (default: None)\nRead processing options:\n --extendReads [INT bp], -e [INT bp]\n This parameter allows the extension of reads to\n fragment size. If set, each read is extended, without\n exception. *NOTE*: This feature is generally NOT\n recommended for spliced-read data, such as RNA-seq, as\n it would extend reads over skipped regions. *Single-\n end*: Requires a user specified value for the final\n fragment length. Reads that already exceed this\n fragment length will not be extended. *Paired-end*:\n Reads with mates are always extended to match the\n fragment size defined by the two read mates. 
Unmated\n reads, mate reads that map too far apart (>4x fragment\n length) or even map to different chromosomes are\n treated like single-end reads. The input of a fragment\n length value is optional. If no value is specified, it\n is estimated from the data (mean of the fragment size\n of all mate reads). (default: False)\n --ignoreDuplicates If set, reads that have the same orientation and start\n position will be considered only once. If reads are\n paired, the mate's position also has to coincide to\n ignore a read. (default: False)\n --minMappingQuality INT\n If set, only reads that have a mapping quality score\n of at least this are considered. (default: None)\n --centerReads By adding this option, reads are centered with respect\n to the fragment length. For paired-end data, the read\n is centered at the fragment length defined by the two\n ends of the fragment. For single-end data, the given\n fragment length is used. This option is useful to get\n a sharper signal around enriched regions. (default:\n False)\n --samFlagInclude INT Include reads based on the SAM flag. For example, to\n get only reads that are the first mate, use a flag of\n 64. This is useful to count properly paired reads only\n once, as otherwise the second mate will be also\n considered for the coverage. (default: None)\n --samFlagExclude INT Exclude reads based on the SAM flag. For example, to\n get only reads that map to the forward strand, use\n --samFlagExclude 16, where 16 is the SAM flag for\n reads that map to the reverse strand. (default: None)\n", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "reddylab/deeptools:3.0.1" } }, "inputs": { "verbose": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--verbose" }, "doc": "--verbose \nSet to see processing messages. (default: False)\n" }, "binSize": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--binSize" }, "doc": "INT bp\nSize of the bins, in bases, for the output of the\nbigwig/bedgraph file. (default: 50)\n" }, "MNase": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--MNase" }, "doc": "Determine nucleosome positions from MNase-seq data.\nOnly 3 nucleotides at the center of each fragment are\ncounted. The fragment ends are defined by the two mate\nreads. Only fragment lengthsbetween 130 - 200 bp are\nconsidered to avoid dinucleosomes or other\nartifacts.*NOTE*: Requires paired-end data. A bin size\nof 1 is recommended. (default: False)\n" }, "ignoreDuplicates": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--ignoreDuplicates" }, "doc": "If set, reads that have the same orientation and start\nposition will be considered only once. If reads are\npaired, the mate's position also has to coincide to\nignore a read. (default: False)\n" }, "numberOfProcessors": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--numberOfProcessors" }, "doc": "INT\nNumber of processors to use. Type \"max/2\" to use half\nthe maximum number of processors or \"max\" to use all\navailable processors. (default: max/2)\n" }, "outFileName": { "type": "string?", "doc": "FILENAME\nOutput file name. (default: input BAM filename with bigwig [*.bw] or bedgraph [*.bdg] extension.)\n" }, "smoothLength": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--smoothLength" }, "doc": "INT bp\nThe smooth length defines a window, larger than the\nbinSize, to average the number of reads. 
For example,\nif the --binSize is set to 20 and the --smoothLength\nis set to 60, then, for each bin, the average of the\nbin and its left and right neighbors is considered.\nAny value smaller than --binSize will be ignored and\nno smoothing will be applied. (default: None)\nRead processing options:\n" }, "version": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--version" }, "doc": "show program's version number and exit" }, "extendReads": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--extendReads" }, "doc": "INT bp\nThis parameter allows the extension of reads to\nfragment size. If set, each read is extended, without\nexception. *NOTE*: This feature is generally NOT\nrecommended for spliced-read data, such as RNA-seq, as\nit would extend reads over skipped regions. *Single-\nend*: Requires a user specified value for the final\nfragment length. Reads that already exceed this\nfragment length will not be extended. *Paired-end*:\nReads with mates are always extended to match the\nfragment size defined by the two read mates. Unmated\nreads, mate reads that map too far apart (>4x fragment\nlength) or even map to different chromosomes are\ntreated like single-end reads. The input of a fragment\nlength value is optional. If no value is specified, it\nis estimated from the data (mean of the fragment size\nof all mate reads). (default: False)\n" }, "centerReads": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--centerReads" }, "doc": "By adding this option, reads are centered with respect\nto the fragment length. For paired-end data, the read\nis centered at the fragment length defined by the two\nends of the fragment. For single-end data, the given\nfragment length is used. This option is useful to get\na sharper signal around enriched regions. (default:\nFalse)\n" }, "samFlagExclude": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--samFlagExclude" }, "doc": "INT \nExclude reads based on the SAM flag. For example, to\nget only reads that map to the forward strand, use\n--samFlagExclude 16, where 16 is the SAM flag for\nreads that map to the reverse strand. (default: None)\n" }, "samFlagInclude": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--samFlagInclude" }, "doc": "INT \nInclude reads based on the SAM flag. For example, to\nget only reads that are the first mate, use a flag of\n64. This is useful to count properly paired reads only\nonce, as otherwise the second mate will be also\nconsidered for the coverage. (default: None)\n" }, "filterRNAstrand": { "type": "string?", "inputBinding": { "position": 1, "prefix": "--filterRNAstrand" }, "doc": "{forward,reverse}\nSelects RNA-seq reads (single-end or paired-end) in\nthe given strand. (default: None)\n" }, "scaleFactor": { "type": "float?", "inputBinding": { "position": 1, "prefix": "--scaleFactor" }, "doc": "The smooth length defines a window, larger than the\nbinSize, to average the number of reads. For example,\nif the \u2013binSize is set to 20 and the \u2013smoothLength is\nset to 60, then, for each bin, the average of the bin\nand its left and right neighbors is considered.\nAny value smaller than \u2013binSize will be ignored and\nno smoothing will be applied. 
(default: 1.0)\n" }, "skipNonCoveredRegions": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "--skipNonCoveredRegions" }, "doc": "--skipNonCoveredRegions, --skipNAs\nThis parameter determines if non-covered regions\n(regions without overlapping reads) in a BAM file\nshould be skipped. The default is to treat those\nregions as having a value of zero. The decision to\nskip non-covered regions depends on the interpretation\nof the data. Non-covered regions may represent, for\nexample, repetitive regions that should be skipped.\n(default: False)\n" }, "outFileFormat": { "type": "string", "default": "bigwig", "inputBinding": { "position": 1, "prefix": "--outFileFormat" }, "doc": "{bigwig,bedgraph}, -of {bigwig,bedgraph}\nOutput file type. Either \"bigwig\" or \"bedgraph\".\n(default: bigwig)\n" }, "output_suffix": { "type": "string?", "doc": "Suffix used for output file (input BAM filename + suffix)" }, "region": { "type": "string?", "inputBinding": { "position": 1, "prefix": "--region" }, "doc": "CHR:START:END\nRegion of the genome to limit the operation to - this\nis useful when testing parameters to reduce the\ncomputing time. The format is chr:start:end, for\nexample --region chr10 or --region\nchr10:456700:891000. (default: None)\n" }, "normalizeUsing": { "type": "string?", "inputBinding": { "position": 1, "prefix": "--normalizeUsing" }, "doc": "Possible choices: RPKM, CPM, BPM, RPGC\n\nUse one of the entered methods to normalize the number of reads per bin.\nBy default, no normalization is performed. RPKM = Reads Per Kilobase per\nMillion mapped reads; CPM = Counts Per Million mapped reads, same as\nCPM in RNA-seq; BPM = Bins Per Million mapped reads, same as TPM in\nRNA-seq; RPGC = reads per genomic content (1x normalization);\nMapped reads are considered after blacklist filtering (if applied).\nRPKM (per bin) = number of reads per bin / (number of mapped reads (in millions) * bin length (kb)).\nCPM (per bin) = number of reads per bin / number of mapped reads (in millions).\nBPM (per bin) = number of reads per bin / sum of all reads per bin (in millions).\nRPGC (per bin) = number of reads per bin / scaling factor for 1x average coverage.\nThis scaling factor, in turn, is determined from the sequencing depth:\n(total number of mapped reads * fragment length) / effective genome size.\nThe scaling factor used is the inverse of the sequencing depth\ncomputed for the sample to match the 1x coverage.\nThis option requires \u2013effectiveGenomeSize.\nEach read is considered independently, if you want to only count one\nmate from a pair in paired-end data, then use\nthe \u2013samFlagInclude/\u2013samFlagExclude options.\n" }, "ignoreForNormalization": { "type": [ "null", { "type": "array", "items": "string" } ], "inputBinding": { "position": 1, "prefix": "--ignoreForNormalization" }, "doc": "A list of space-delimited chromosome names containing those chromosomes\nthat should be excluded for computing the normalization. This is useful\nwhen considering samples with unequal coverage across chromosomes, like\nmale samples. An usage examples is \u2013ignoreForNormalization chrX chrM.\n" }, "blackListFileName": { "type": "File?", "inputBinding": { "position": 1, "prefix": "--blackListFileName" }, "doc": "BED file\nA BED file containing regions that should be excluded\nfrom all analyses. 
Currently this works by rejecting\ngenomic chunks that happen to overlap an entry.\nConsequently, for BAM files, if a read partially\noverlaps a blacklisted region or a fragment spans over\nit, then the read/fragment might still be considered.\n(default: None)\n" }, "bam": { "type": "File", "secondaryFiles": [ ".bai" ], "inputBinding": { "position": 1, "prefix": "--bam" }, "doc": "BAM file to process " }, "minMappingQuality": { "type": "int?", "inputBinding": { "position": 1, "prefix": "--minMappingQuality" }, "doc": "INT\nIf set, only reads that have a mapping quality score\nof at least this are considered. (default: None)\n" } }, "outputs": { "output_bam_coverage": { "type": "File", "outputBinding": { "glob": "${ if (inputs.outFileName) return inputs.outFileName; if (inputs.output_suffix) return inputs.bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix; if (inputs.outFileFormat == \"bedgraph\") return inputs.bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".bdg\"; return inputs.bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".bw\"; }" } } }, "baseCommand": "bamCoverage", "arguments": [ { "valueFrom": "${ if (inputs.outFileName) return inputs.outFileName; if (inputs.output_suffix) return inputs.bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix; if (inputs.outFileFormat == \"bedgraph\") return inputs.bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".bdg\"; return inputs.bam.path.replace(/^.*[\\\\\\/]/, \"\").replace(/\\.[^/.]+$/, \"\") + \".bw\"; }", "prefix": "--outFileName", "position": 3 } ] }, "scatter": "bam", "in": { "bam": "input_bam_files", "numberOfProcessors": "nthreads", "normalizeUsing": { "valueFrom": "RPKM" }, "binSize": { "valueFrom": "${return 1}" }, "output_suffix": { "valueFrom": ".rpkm.bw" }, "extendReads": { "valueFrom": "${return 200}" } }, "out": [ "output_bam_coverage" ] }, "bedtools_genomecov": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Tool: bedtools genomecov (aka genomeCoverageBed)\nVersion: v2.25.0\nSummary: Compute the coverage of a feature file among a genome.\n\nUsage: bedtools genomecov [OPTIONS] -i -g \n\nOptions: \n\t-ibam\t\tThe input file is in BAM format.\n\t\t\tNote: BAM _must_ be sorted by position\n\n\t-d\t\tReport the depth at each genome position (with one-based coordinates).\n\t\t\tDefault behavior is to report a histogram.\n\n\t-dz\t\tReport the depth at each genome position (with zero-based coordinates).\n\t\t\tReports only non-zero positions.\n\t\t\tDefault behavior is to report a histogram.\n\n\t-bg\t\tReport depth in BedGraph format. For details, see:\n\t\t\tgenome.ucsc.edu/goldenPath/help/bedgraph.html\n\n\t-bga\t\tReport depth in BedGraph format, as above (-bg).\n\t\t\tHowever with this option, regions with zero \n\t\t\tcoverage are also reported. This allows one to\n\t\t\tquickly extract all regions of a genome with 0 \n\t\t\tcoverage by applying: \"grep -w 0$\" to the output.\n\n\t-split\t\tTreat \"split\" BAM or BED12 entries as distinct BED intervals.\n\t\t\twhen computing coverage.\n\t\t\tFor BAM files, this uses the CIGAR \"N\" and \"D\" operations \n\t\t\tto infer the blocks for computing coverage.\n\t\t\tFor BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds\n\t\t\tfields (i.e., columns 10,11,12).\n\n\t-strand\t\tCalculate coverage of intervals from a specific strand.\n\t\t\tWith BED files, requires at least 6 columns (strand is column 6). 
\n\t\t\t- (STRING): can be + or -\n\n\t-5\t\tCalculate coverage of 5\" positions (instead of entire interval).\n\n\t-3\t\tCalculate coverage of 3\" positions (instead of entire interval).\n\n\t-max\t\tCombine all positions with a depth >= max into\n\t\t\ta single bin in the histogram. Irrelevant\n\t\t\tfor -d and -bedGraph\n\t\t\t- (INTEGER)\n\n\t-scale\t\tScale the coverage by a constant factor.\n\t\t\tEach coverage value is multiplied by this factor before being reported.\n\t\t\tUseful for normalizing coverage by, e.g., reads per million (RPM).\n\t\t\t- Default is 1.0; i.e., unscaled.\n\t\t\t- (FLOAT)\n\n\t-trackline\tAdds a UCSC/Genome-Browser track line definition in the first line of the output.\n\t\t\t- See here for more details about track line definition:\n\t\t\t http://genome.ucsc.edu/goldenPath/help/bedgraph.html\n\t\t\t- NOTE: When adding a trackline definition, the output BedGraph can be easily\n\t\t\t uploaded to the Genome Browser as a custom track,\n\t\t\t BUT CAN NOT be converted into a BigWig file (w/o removing the first line).\n\n\t-trackopts\tWrites additional track line definition parameters in the first line.\n\t\t\t- Example:\n\t\t\t -trackopts 'name=\"My Track\" visibility=2 color=255,30,30'\n\t\t\t Note the use of single-quotes if you have spaces in your parameters.\n\t\t\t- (TEXT)\n\nNotes: \n\t(1) The genome file should tab delimited and structured as follows:\n\t \n\n\tFor example, Human (hg19):\n\tchr1\t249250621\n\tchr2\t243199373\n\t...\n\tchr18_gl000207_random\t4262\n\n\t(2) The input BED (-i) file must be grouped by chromosome.\n\t A simple \"sort -k 1,1 > .sorted\" will suffice.\n\n\t(3) The input BAM (-ibam) file must be sorted by position.\n\t A \"samtools sort \" should suffice.\n\nTips: \n\tOne can use the UCSC Genome Browser's MySQL database to extract\n\tchromosome sizes. For example, H. sapiens:\n\n\tmysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \\\n\t\"select chrom, size from hg19.chromInfo\" > hg19.genome", "requirements": { "InlineJavascriptRequirement": {}, "ShellCommandRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/bedtools" } }, "inputs": { "bga": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-bga" }, "doc": "\tReport depth in BedGraph format, as above (-bg).\nHowever with this option, regions with zero\ncoverage are also reported. This allows one to\nquickly extract all regions of a genome with 0\ncoverage by applying: \"grep -w 0$\" to the output.\n" }, "bg": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-bg" }, "doc": "\tReport depth in BedGraph format. For details, see:\ngenome.ucsc.edu/goldenPath/help/bedgraph.html\n" }, "d": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-d" }, "doc": "\tReport the depth at each genome position (with one-based coordinates).\nDefault behavior is to report a histogram.\n" }, "g": { "type": "File", "inputBinding": { "position": 3, "prefix": "-g" }, "doc": "" }, "max": { "type": "int?", "inputBinding": { "position": 1, "prefix": "-max" }, "doc": "\tCombine all positions with a depth >= max into\na single bin in the histogram. 
Irrelevant\nfor -d and -bedGraph\n- (INTEGER)\n" }, "trackopts": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-trackopts" }, "doc": "Writes additional track line definition parameters in the first line.\n- Example:\n-trackopts 'name=\"My Track\" visibility=2 color=255,30,30'\nNote the use of single-quotes if you have spaces in your parameters.\n- (TEXT)\n" }, "trackline": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-trackline" }, "doc": "Adds a UCSC/Genome-Browser track line definition in the first line of the output.\n- See here for more details about track line definition:\nhttp://genome.ucsc.edu/goldenPath/help/bedgraph.html\n- NOTE: When adding a trackline definition, the output BedGraph can be easily\nuploaded to the Genome Browser as a custom track,\nBUT CAN NOT be converted into a BigWig file (w/o removing the first line).\n" }, "3": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-3" }, "doc": "\tCalculate coverage of 3\" positions (instead of entire interval).\n" }, "scale": { "type": "float?", "inputBinding": { "position": 1, "prefix": "-scale" }, "doc": "\tScale the coverage by a constant factor.\nEach coverage value is multiplied by this factor before being reported.\nUseful for normalizing coverage by, e.g., reads per million (RPM).\n- Default is 1.0; i.e., unscaled.\n- (FLOAT)\n" }, "dz": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-dz" }, "doc": "\tReport the depth at each genome position (with zero-based coordinates).\nReports only non-zero positions.\nDefault behavior is to report a histogram.\n" }, "split": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-split" }, "doc": "\tTreat \"split\" BAM or BED12 entries as distinct BED intervals.\nwhen computing coverage.\nFor BAM files, this uses the CIGAR \"N\" and \"D\" operations\nto infer the blocks for computing coverage.\nFor BED12 files, this uses the BlockCount, BlockStarts, and BlockEnds\nfields (i.e., columns 10,11,12).\n" }, "ibam": { "type": "File", "inputBinding": { "position": 2, "prefix": "-ibam" }, "doc": "\tThe input file is in BAM format.\nNote: BAM _must_ be sorted by position\n" }, "5": { "type": "boolean?", "inputBinding": { "position": 1, "prefix": "-5" }, "doc": "\tCalculate coverage of 5\" positions (instead of entire interval).\n" }, "strand": { "type": "string?", "inputBinding": { "position": 1, "prefix": "-strand" }, "doc": "\tCalculate coverage of intervals from a specific strand.\nWith BED files, requires at least 6 columns (strand is column 6).\n- (STRING): can be + or -\n" } }, "outputs": { "output_bedfile": { "type": "File", "outputBinding": { "glob": "$(inputs.ibam.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.bdg')" } } }, "baseCommand": [ "bedtools", "genomecov" ], "stdout": "$(inputs.ibam.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, '') + '.bdg')" }, "scatter": "ibam", "in": { "bg": { "valueFrom": "${return true}" }, "g": "input_genome_sizes", "ibam": "input_bam_files" }, "out": [ "output_bedfile" ] }, "bedsort_genomecov": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "bedSort - Sort a .bed file by chrom,chromStart\nusage:\n bedSort in.bed out.bed\nin.bed and out.bed may be the same.\n", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dleehr/docker-hubutils" } }, "inputs": { "bed_file": { "type": "File", "inputBinding": { "position": 1 }, "doc": "Bed or bedGraph file to be sorted" } }, 
"outputs": { "bed_file_sorted": { "type": "File", "outputBinding": { "glob": "$(inputs.bed_file.path.replace(/^.*[\\\\\\/]/, '') + \"_sorted\")" } } }, "baseCommand": "bedSort", "arguments": [ { "valueFrom": "$(inputs.bed_file.path.replace(/^.*[\\\\\\/]/, '') + \"_sorted\")", "position": 2 } ] }, "scatter": "bed_file", "in": { "bed_file": "bedtools_genomecov/output_bedfile" }, "out": [ "bed_file_sorted" ] }, "bdg2bw-raw": { "run": { "class": "CommandLineTool", "cwlVersion": "v1.0", "doc": "Tool: bedGraphToBigWig v 4 - Convert a bedGraph file to bigWig format.", "requirements": { "InlineJavascriptRequirement": {} }, "hints": { "DockerRequirement": { "dockerPull": "dukegcb/bedgraphtobigwig" } }, "inputs": { "output_suffix": { "type": "string", "default": ".bw" }, "genome_sizes": { "type": "File", "inputBinding": { "position": 2 }, "doc": "\tgenome_sizes is two column: .\n" }, "bed_graph": { "type": "File", "inputBinding": { "position": 1 }, "doc": "\tbed_graph is a four column file in the format: \n" } }, "outputs": { "output_bigwig": { "type": "File", "outputBinding": { "glob": "$(inputs.bed_graph.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix)" } } }, "baseCommand": "bedGraphToBigWig", "arguments": [ { "valueFrom": "$(inputs.bed_graph.path.replace(/^.*[\\\\\\/]/, '').replace(/\\.[^/.]+$/, \"\") + inputs.output_suffix)", "position": 3 } ] }, "scatter": "bed_graph", "in": { "output_suffix": { "valueFrom": ".raw.bw" }, "genome_sizes": "input_genome_sizes", "bed_graph": "bedsort_genomecov/bed_file_sorted" }, "out": [ "output_bigwig" ] } }, "outputs": { "bigwig_raw_files": { "doc": "Raw reads bigWig (signal) files", "type": "File[]", "outputSource": "bdg2bw-raw/output_bigwig" }, "bigwig_norm_files": { "doc": "signal files of pileup reads in RPKM", "type": "File[]", "outputSource": "bamcoverage/output_bam_coverage" } } }, "in": { "input_bam_files": "map/output_data_sorted_dedup_bam_files", "input_genome_sizes": "genome_sizes_file", "nthreads": "nthreads_quant" }, "out": [ "bigwig_raw_files", "bigwig_norm_files" ] } }, "outputs": { "qc_fastqc_data_files": { "doc": "FastQC data files", "type": "File[]", "outputSource": "qc/output_fastqc_data_files" }, "qc_fastqc_report_files": { "doc": "FastQC reports in zip format", "type": "File[]", "outputSource": "qc/output_fastqc_report_files" }, "qc_count_raw_reads": { "doc": "Raw read counts of fastq files after QC", "type": "File[]", "outputSource": "qc/output_count_raw_reads" }, "qc_diff_counts": { "doc": "Diff file between number of raw reads and number of reads counted by FASTQC,", "type": "File[]", "outputSource": "qc/output_diff_counts" }, "trimm_fastq_files": { "doc": "FASTQ files after trimming", "type": "File[]", "outputSource": "trimm/output_data_fastq_trimmed_files" }, "trimm_raw_counts": { "doc": "Raw read counts of fastq files after trimming", "type": "File[]", "outputSource": "trimm/output_trimmed_fastq_read_count" }, "map_read_count_mapped": { "doc": "Read counts of the mapped BAM files", "type": "File[]", "outputSource": "map/output_read_count_mapped" }, "map_bowtie_log_files": { "doc": "Bowtie log file with mapping stats", "type": "File[]", "outputSource": "map/output_bowtie_log" }, "map_preseq_percentage_uniq_reads": { "doc": "Preseq percentage of uniq reads", "type": "File[]", "outputSource": "map/output_percentage_uniq_reads" }, "map_pbc_files": { "doc": "PCR Bottleneck Coefficient files (used to flag samples when pbc<0.5)", "type": "File[]", "outputSource": "map/output_pbc_files" }, 
"map_dedup_bam_files": { "doc": "Filtered BAM files (post-processing end point)", "type": "File[]", "outputSource": "map/output_data_sorted_dups_marked_bam_files" }, "map_mark_duplicates_files": { "doc": "Summary of duplicates removed with Picard tool MarkDuplicates (for multiple reads aligned to the same positions", "type": "File[]", "outputSource": "map/output_picard_mark_duplicates_files" }, "map_preseq_c_curve_files": { "doc": "Preseq c_curve output files", "type": "File[]", "outputSource": "map/output_preseq_c_curve_files" }, "map_percent_mitochondrial_reads": { "doc": "Percentage of mitochondrial reads", "type": "File[]", "outputSource": "map/output_percent_mitochondrial_reads" }, "peakcall_peak_file": { "doc": "Peaks in ENCODE Peak file format", "type": "File[]", "outputSource": "peak_call/output_peak_file" }, "peakcall_spp_x_cross_corr": { "doc": "SPP strand cross correlation summary", "type": "File[]", "outputSource": "peak_call/output_spp_x_cross_corr" }, "peakcall_peak_xls_file": { "doc": "Peak calling report file", "type": "File[]", "outputSource": "peak_call/output_peak_xls_file" }, "peakcall_peak_summits_file": { "doc": "Peaks summits in bedfile format", "type": "File[]", "outputSource": "peak_call/output_peak_summits_file" }, "peakcall_peak_count_within_replicate": { "doc": "Peak counts within replicate", "type": "File[]", "outputSource": "peak_call/output_peak_count_within_replicate" }, "peakcall_spp_x_cross_corr_plot": { "doc": "SPP strand cross correlation plot", "type": "File[]", "outputSource": "peak_call/output_spp_cross_corr_plot" }, "peakcall_filtered_read_count_file": { "doc": "Filtered read count after peak calling", "type": "File[]", "outputSource": "peak_call/output_filtered_read_count_file" }, "peakcall_extended_peak_file": { "doc": "Extended fragment peaks in ENCODE Peak file format", "type": "File[]", "outputSource": "peak_call/output_extended_peak_file" }, "peakcall_read_in_peak_count_within_replicate": { "doc": "Peak counts within replicate", "type": "File[]", "outputSource": "peak_call/output_read_in_peak_count_within_replicate" }, "peakcall_peak_bigbed_file": { "doc": "Peaks in bigBed format", "type": "File[]", "outputSource": "peak_call/output_peak_bigbed_file" }, "quant_bigwig_raw_files": { "doc": "Raw reads bigWig (signal) files", "type": "File[]", "outputSource": "quant/bigwig_raw_files" }, "quant_bigwig_norm_files": { "doc": "Normalized reads bigWig (signal) files", "type": "File[]", "outputSource": "quant/bigwig_norm_files" } }, "label": "ATAC-seq-pipeline-se", "$namespaces": { "sbg": "https://sevenbridges.com" }, "sbg:appVersion": [ "v1.0" ], "id": "https://api.sbgenomics.com/v2/apps/kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2/raw/", "sbg:id": "kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2", "sbg:revision": 2, "sbg:revisionNotes": "Uploaded using sbpack v2020.02.14. 
\nSource: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl", "sbg:modifiedOn": 1581699121, "sbg:modifiedBy": "kghosesbg", "sbg:createdOn": 1580500895, "sbg:createdBy": "kghosesbg", "sbg:project": "kghosesbg/sbpla-31744", "sbg:projectName": "SBPLA-31744", "sbg:sbgMaintained": false, "sbg:validationErrors": [ "Required input is not set: #qc.input_fastq_files", "Required input is not set: #qc.default_adapters_file", "Required input is not set: #qc.nthreads", "Required input is not set: #trimm.input_fastq_files", "Required input is not set: #trimm.input_adapters_files", "Required input is not set: #map.input_fastq_files", "Required input is not set: #map.genome_sizes_file", "Required input is not set: #map.genome_ref_first_index_file", "Required input is not set: #peak_call.input_bam_files", "Required input is not set: #peak_call.input_genome_sizes", "Required input is not set: #peak_call.as_narrowPeak_file", "Required input is not set: #quant.input_bam_files", "Required input is not set: #quant.input_genome_sizes" ], "sbg:contributors": [ "kghosesbg" ], "sbg:latestRevision": 2, "sbg:revisionsInfo": [ { "sbg:revision": 0, "sbg:modifiedBy": "kghosesbg", "sbg:modifiedOn": 1580500895, "sbg:revisionNotes": "Uploaded using sbpack. Source: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl" }, { "sbg:revision": 1, "sbg:modifiedBy": "kghosesbg", "sbg:modifiedOn": 1580742764, "sbg:revisionNotes": "Just moved a node" }, { "sbg:revision": 2, "sbg:modifiedBy": "kghosesbg", "sbg:modifiedOn": 1581699121, "sbg:revisionNotes": "Uploaded using sbpack v2020.02.14. \nSource: https://raw.githubusercontent.com/Duke-GCB/GGR-cwl/master/v1.0/ATAC-seq_pipeline/pipeline-se.cwl" } ], "sbg:image_url": "https://igor.sbgenomics.com/ns/brood/images/kghosesbg/sbpla-31744/ATAC-seq-pipeline-se/2.png", "sbg:publisher": "sbg", "sbg:content_hash": "ad9474546d1d7aba5aa20e3c7a03b5429e5f8ec1d18be92cbab7315600a6bce48" }
cwl-format-2022.02.18/tests/cwl/original-commented.cwl000066400000000000000000000007441420374476100224130ustar00rootroot00000000000000
# Top comment is preserved
class: CommandLineTool
cwlVersion: v1.0
inputs:
  in1:
    type: string
    inputBinding:
      position: 1
      valueFrom: A_$(inputs.in1)_B_${return inputs.in1}_C_$(inputs.in1)
baseCommand: echo
arguments:
- valueFrom: $(runtime)
outputs:
  out1:
    type: string
    outputBinding:
      glob: out.txt
      loadContents: true
      outputEval: $(self[0].contents)_D_$(runtime.cores)
stdout: out.txt
requirements:
  InlineJavascriptRequirement: {}
cwl-format-2022.02.18/tests/cwl/original-fragment.json000066400000000000000000000001151420374476100224170ustar00rootroot00000000000000
{ "no such field": 22, "A: this should go first": ["what", "a", "list"] }
cwl-format-2022.02.18/tests/cwl/original-no-comment.cwl000066400000000000000000000007071420374476100225130ustar00rootroot00000000000000
class: CommandLineTool
cwlVersion: v1.0
inputs:
  in1:
    type: string
    inputBinding:
      position: 1
      valueFrom: A_$(inputs.in1)_B_${return inputs.in1}_C_$(inputs.in1)
baseCommand: echo
arguments:
- valueFrom: $(runtime)
outputs:
  out1:
    type: string
    outputBinding:
      glob: out.txt
      loadContents: true
      outputEval: $(self[0].contents)_D_$(runtime.cores)
stdout: out.txt
requirements:
  InlineJavascriptRequirement: {}
cwl-format-2022.02.18/tests/cwl/original-other-runner.cwl000066400000000000000000000007441420374476100230700ustar00rootroot00000000000000
#!/usr/bin/env other-runner
class: CommandLineTool
cwlVersion: v1.0
inputs:
  in1:
    type: string
    inputBinding:
      position: 1
      valueFrom: A_$(inputs.in1)_B_${return inputs.in1}_C_$(inputs.in1)
baseCommand: echo
arguments:
- valueFrom: $(runtime)
outputs:
  out1:
    type: string
    outputBinding:
      glob: out.txt
      loadContents: true
      outputEval: $(self[0].contents)_D_$(runtime.cores)
stdout: out.txt
requirements:
  InlineJavascriptRequirement: {}
cwl-format-2022.02.18/tests/test_battery.py000066400000000000000000000015101420374476100204170ustar00rootroot00000000000000
# Copyright (c) 2020 Seven Bridges
import pathlib

from cwlformat.formatter import cwl_format, yaml, format_node

current_path = pathlib.Path(__file__).parent


def test_formatting_battery():
    path = current_path / "cwl"
    for raw_name in path.glob("original-*"):
        expected_name = raw_name.parent / pathlib.Path("formatted-" + "-".join(raw_name.stem.split("-")[1:]) + ".cwl")
        formatted_cwl = cwl_format(raw_name.open("r").read())
        expected_raw_cwl = expected_name.open("r").read()
        assert formatted_cwl == expected_raw_cwl


def test_node_conservation():
    path = current_path / "cwl"
    for raw_name in path.glob("original-*"):
        original_cwl = yaml.load(raw_name.open("r").read())
        formatted_cwl = format_node(original_cwl, node_path=[])
        assert formatted_cwl == original_cwl
cwl-format-2022.02.18/tests/test_explode.py000066400000000000000000000007771420374476100204150ustar00rootroot00000000000000
# Copyright (c) 2020 Seven Bridges
import pathlib

import ruamel.yaml

from cwlformat.explode import explode, CWLProcess

yaml = ruamel.yaml.YAML()

current_path = pathlib.Path(__file__).parent


def test_explode():
    path = current_path / "cwl"
    src_fp = path / "formatted-atac-seq-pipeline.cwl"
    fp_out = src_fp.parent / "expected-exploded-atac-seq.cwl"
    as_dict = yaml.load(src_fp.read_text())
    for exploded in explode(CWLProcess(as_dict, fp_out)):
        assert exploded.file_path.exists()
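The two test modules above double as the only executable documentation of cwl-format's public API in this archive: test_battery.py shows that cwlformat.formatter.cwl_format maps raw CWL text to formatted text, and test_explode.py shows that cwlformat.explode.explode takes a CWLProcess built from a parsed document plus a target path and yields processes whose file_path has been written to disk. What follows is an editor's sketch, not a file from the archive, of driving those same entry points outside pytest; it assumes only the call signatures visible in the tests, and "packed.cwl" is a hypothetical input file.

# Usage sketch (illustrative only; assumes the signatures seen in the tests).
import pathlib

import ruamel.yaml

from cwlformat.formatter import cwl_format
from cwlformat.explode import explode, CWLProcess

yaml = ruamel.yaml.YAML()

src = pathlib.Path("packed.cwl")  # hypothetical input CWL document

# Reformat the raw CWL text; cwl_format is str -> str, as in test_formatting_battery.
formatted_path = src.parent / (src.stem + ".formatted.cwl")
formatted_path.write_text(cwl_format(src.read_text()))

# Split the (possibly packed) workflow into one file per embedded process;
# as in test_explode, each yielded CWLProcess's file_path should then exist on disk.
as_dict = yaml.load(formatted_path.read_text())
for process in explode(CWLProcess(as_dict, formatted_path)):
    print("wrote", process.file_path)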