cwltool-1.0.20180302231433/
cwltool-1.0.20180302231433/gittaggers.py

import subprocess
import time

from setuptools.command.egg_info import egg_info


class EggInfoFromGit(egg_info):
    """Tag the build with git commit timestamp.

    If a build tag has already been set (e.g., "egg_info -b", building
    from source package), leave it alone.
    """

    def git_timestamp_tag(self):
        # Commit time (%ct, Unix epoch) of the newest first-parent commit
        # touching the current directory.
        gitinfo = subprocess.check_output(
            ['git', 'log', '--first-parent', '--max-count=1',
             '--format=format:%ct', '.']).strip()
        # Render it as a ".YYYYmmddHHMMSS" build-tag suffix in UTC.
        return time.strftime('.%Y%m%d%H%M%S', time.gmtime(int(gitinfo)))

    def tags(self):
        if self.tag_build is None:
            try:
                self.tag_build = self.git_timestamp_tag()
            except subprocess.CalledProcessError:
                # Not a git checkout; fall back to the default (untagged) build.
                pass
        return egg_info.tags(self)

cwltool-1.0.20180302231433/PKG-INFO

Metadata-Version: 1.1
Name: cwltool
Version: 1.0.20180302231433
Summary: Common workflow language reference implementation
Home-page: https://github.com/common-workflow-language/cwltool
Author: Common workflow language working group
Author-email: common-workflow-language@googlegroups.com
License: UNKNOWN
Download-URL: https://github.com/common-workflow-language/cwltool
Description-Content-Type: UNKNOWN
Description:

==================================================================
Common Workflow Language tool description reference implementation
==================================================================

CWL conformance tests: |Build Status|  Travis CI: |Unix Build Status|

.. |Unix Build Status| image:: https://img.shields.io/travis/common-workflow-language/cwltool/master.svg?label=unix%20build
   :target: https://travis-ci.org/common-workflow-language/cwltool

This is the reference implementation of the Common Workflow Language.  It is
intended to be feature complete and to provide comprehensive validation of
CWL files, as well as other tools related to working with CWL.

This is written and tested for Python ``2.7 and 3.x {x = 3, 4, 5, 6}``.

The reference implementation consists of two packages.  The ``cwltool``
package is the primary Python module containing the reference implementation
in the ``cwltool`` module and console executable by the same name.

The ``cwlref-runner`` package is optional and provides an additional entry
point under the alias ``cwl-runner``, which is the implementation-agnostic
name for the default CWL interpreter installed on a host.

Install
-------

It is highly recommended to set up a virtual environment before installing
`cwltool`:

.. code:: bash

   virtualenv -p python2 venv   # Create a virtual environment; can use `python3` as well
   source venv/bin/activate     # Activate the environment before installing `cwltool`

Installing the official package from PyPI (this will install the "cwltool"
package as well):

.. code:: bash

   pip install cwlref-runner

If installing alongside another CWL implementation:

.. code:: bash

   pip install cwltool

Or you can install from source:

.. code:: bash

   git clone https://github.com/common-workflow-language/cwltool.git # clone cwltool repo
   cd cwltool         # Switch to source directory
   pip install .      # Install `cwltool` from source
   cwltool --version  # Check that the installation works correctly

Remember, if co-installing multiple CWL implementations then you need to
maintain which implementation ``cwl-runner`` points to via a symbolic file
system link or `another facility `_.
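As an aside on installing from a git checkout: version numbers like
``1.0.20180302231433`` are produced by the ``EggInfoFromGit`` helper in
``gittaggers.py`` (shown at the top of this archive), which appends the
latest commit timestamp as a build tag.  A minimal sketch of how a
``setup.py`` might wire this in via ``cmdclass`` (an illustrative fragment,
not the project's actual ``setup.py``):

.. code:: python

   from setuptools import setup

   from gittaggers import EggInfoFromGit

   setup(
       name='cwltool',
       version='1.0',  # base version; the git commit timestamp is appended
                       # as a build tag, yielding e.g. 1.0.20180302231433
       cmdclass={'egg_info': EggInfoFromGit},
       # ... the usual metadata and dependency arguments go here ...
   )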
Running tests locally
---------------------

- Running basic tests ``(/tests)``:

To run the basic tests after installing `cwltool` execute the following:

.. code:: bash

   pip install pytest mock
   py.test --ignore cwltool/schemas/ --pyarg cwltool

To run various tests in all supported Python environments we use `tox `_.
To run the test suite in all supported Python environments, first download
the complete code repository (see the ``git clone`` instructions above) and
then run the following in the terminal: ``pip install tox; tox``

A list of all environments can be seen using ``tox --listenvs``, and a
specific test environment can be run using ``tox -e ``

- Running the entire suite of CWL conformance tests:

The GitHub repository for the CWL specifications contains a script that
tests a CWL implementation against a wide array of valid CWL files using
the `cwltest `_ program.

Instructions for running these tests can be found in the Common Workflow
Language Specification repository at
https://github.com/common-workflow-language/common-workflow-language/blob/master/CONFORMANCE_TESTS.md

Run on the command line
-----------------------

Simple command::

   cwl-runner [tool-or-workflow-description] [input-job-settings]

Or, if you have multiple CWL implementations installed and you want to
override the default cwl-runner, use::

   cwltool [tool-or-workflow-description] [input-job-settings]

Use with boot2docker
--------------------

boot2docker runs docker inside a virtual machine, and it only mounts
``/Users`` on it.  The default behavior of CWL is to create temporary
directories under e.g. ``/var``, which is not accessible to Docker
containers.

To run CWL successfully with boot2docker you need to set the
``--tmpdir-prefix`` and ``--tmp-outdir-prefix`` to somewhere under
``/Users``::

   $ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json

.. |Build Status| image:: https://ci.commonwl.org/buildStatus/icon?job=cwltool-conformance
   :target: https://ci.commonwl.org/job/cwltool-conformance/

Using user-space replacements for Docker
----------------------------------------

Some shared computing environments don't support Docker software containers
for technical or policy reasons.  As a workaround, the CWL reference runner
supports using alternative ``docker`` implementations on Linux with the
``--user-space-docker-cmd`` option.

One such "user space" friendly docker replacement is ``udocker``
https://github.com/indigo-dc/udocker and another is ``dx-docker``
https://wiki.dnanexus.com/Developer-Tutorials/Using-Docker-Images

udocker installation: https://github.com/indigo-dc/udocker/blob/master/doc/installation_manual.md#22-install-from-indigo-datacloud-repositories

dx-docker installation: start with the DNAnexus toolkit (see
https://wiki.dnanexus.com/Downloads for instructions).

Run `cwltool` just as you normally would, but with the new option, e.g. from
the conformance tests:

.. code:: bash

   cwltool --user-space-docker-cmd=udocker https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/empty.json

or

.. code:: bash

   cwltool --user-space-docker-cmd=dx-docker https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/empty.json
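Any of these command lines can also be driven from Python: as the thin
``cwltool.py`` wrapper included in this archive shows, ``cwltool.main.main()``
accepts an argument list directly.  A small sketch (the tool and job file
names here are placeholders):

.. code:: python

   from cwltool.main import main

   # Equivalent to: cwltool --user-space-docker-cmd=udocker my-tool.cwl my-job.json
   exit_code = main(["--user-space-docker-cmd=udocker",
                     "my-tool.cwl", "my-job.json"])
   print("cwltool exited with status", exit_code)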
``cwltool`` can use `Singularity `_ as a Docker container runtime, an
experimental feature.  Singularity will run software containers specified in
``DockerRequirement`` and therefore works with Docker images only; native
Singularity images are not supported.  To use Singularity as the Docker
container runtime, provide the ``--singularity`` command line option to
``cwltool``.

.. code:: bash

   cwltool --singularity https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/cat-job.json

Tool or workflow loading from remote or local locations
--------------------------------------------------------

``cwltool`` can run tool and workflow descriptions on both local and remote
systems via its support for HTTP[S] URLs.

Input job files and Workflow steps (via the `run` directive) can reference
CWL documents using absolute or relative local filesystem paths.  If a
relative path is referenced and that document isn't found in the current
directory, then the following locations will be searched:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem

Use with GA4GH Tool Registry API
--------------------------------

Cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.

By default, cwltool searches https://dockstore.org/ .  Use --add-tool-registry
to add other registries to the search path.

For example::

   cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats:master test.json

and (defaults to latest when a version is not specified)::

   cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats test.json

For this example, grab the test.json (and input file) from
https://github.com/CancerCollaboratory/dockstore-tool-bamstats

.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas

Import as a module
------------------

Add

.. code:: python

   import cwltool

to your script.

The easiest way to use cwltool to run a tool or workflow from Python is to
use a Factory:

.. code:: python

   import cwltool.factory
   fac = cwltool.factory.Factory()

   echo = fac.make("echo.cwl")
   result = echo(inp="foo")

   # result["out"] == "foo"

Leveraging SoftwareRequirements (Beta)
--------------------------------------

CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn use to resolve to packages in various package managers or
dependency management systems such as `Environment Modules `__.

Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
dependency; for this reason, be sure to specify the ``deps`` modifier when
installing cwltool.  For instance::

   $ pip install 'cwltool[deps]'

Installing cwltool in this fashion enables several new command line options.
The most general of these options is
``--beta-dependency-resolvers-configuration``.  This option allows one to
specify a dependency resolvers configuration file.  This file may be
specified as either XML or YAML and very simply describes various plugins
to enable in order to "resolve" ``SoftwareRequirement`` dependencies.
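Since the resolvers configuration file is plain YAML, it can be
sanity-checked before a run with ``ruamel.yaml``, which is already one of
cwltool's dependencies.  The following pre-flight check is a hypothetical
helper, not part of cwltool itself:

.. code:: python

   import ruamel.yaml

   def check_resolvers_conf(path):
       """Minimal check: the file must be a list of plugin dicts with a 'type' key."""
       with open(path) as handle:
           conf = ruamel.yaml.safe_load(handle)
       if not isinstance(conf, list):
           raise ValueError("resolver configuration must be a YAML list")
       for entry in conf:
           if not isinstance(entry, dict) or "type" not in entry:
               raise ValueError("each resolver entry needs at least a 'type' key")
       return conf

   print(check_resolvers_conf("dependency-resolvers-conf.yml"))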
To discuss some of these plugins and how to configure them, first consider
the following ``hint`` definition for an example CWL tool.

.. code:: yaml

   SoftwareRequirement:
     packages:
     - package: seqtk
       version:
       - r93

Now imagine deploying cwltool on a cluster with Software Modules installed,
where a ``seqtk`` module is available at version ``r93``.  This means
cluster users likely won't have the binary ``seqtk`` on their ``PATH`` by
default, but after sourcing this module with the command ``modulecmd sh load
seqtk/r93``, ``seqtk`` is available on the ``PATH``.  A simple dependency
resolvers configuration file, called ``dependency-resolvers-conf.yml`` for
instance, that would enable cwltool to source the correct module environment
before executing the above tool would simply be:

.. code:: yaml

   - type: modules

The outer list indicates that one plugin is being enabled; the plugin
parameters are defined as a dictionary for this one list item.  There is
only one required parameter for the plugin above: ``type``, which defines
the plugin type.  This parameter is required for all plugins.  The available
plugins and the parameters available for each are documented (incompletely)
`here `__.  Unfortunately, this documentation is in the context of Galaxy
tool ``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the
concepts map fairly directly.

cwltool is distributed with an example of such a seqtk tool and a sample
corresponding job.  It can be executed from the cwltool root, using a
dependency resolvers configuration file such as the above, with the
command::

   cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
       tests/seqtk_seq.cwl \
       tests/seqtk_seq_job.json

This example demonstrates both that cwltool can leverage existing software
installations and that it can handle workflows with dependencies on
different versions of the same software and libraries.  However, the above
example does require an existing module setup, so it is impossible to test
this example "out of the box" with cwltool.  For a more isolated test that
demonstrates all the same concepts, the resolver plugin type
``galaxy_packages`` can be used.

"Galaxy packages" are a lighter-weight alternative to Environment Modules.
They are really just defined by a way of laying out directories into
packages and versions in order to find little scripts that are sourced to
modify the environment.  They have been used for years in the Galaxy
community to adapt Galaxy tools to cluster environments, but they require
neither knowledge of Galaxy nor any special tools to set up.  These should
work just fine for CWL tools.

The cwltool source code repository's test directory is set up with a very
simple directory that defines a set of "Galaxy packages" (but really just
defines one package named ``random-lines``).  The directory layout is
simply::

   tests/test_deps_env/
     random-lines/
       1.0/
         env.sh

If the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root, and a
``SoftwareRequirement`` such as the following is encountered:

.. code:: yaml

   hints:
     SoftwareRequirement:
       packages:
       - package: 'random-lines'
         version:
         - '1.0'

then cwltool will simply find that ``env.sh`` file and source it before
executing the corresponding tool.  That ``env.sh`` script is only
responsible for modifying the job's ``PATH`` to add the required binaries.
This is a full example that works, since resolving "Galaxy packages" has no
external requirements; a sketch of the lookup appears below.
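Conceptually, resolving a "Galaxy package" amounts to locating
``<base_path>/<package>/<version>/env.sh`` and sourcing it ahead of the
tool's own command line.  A simplified illustration of the lookup (a
hypothetical helper, not cwltool's or galaxy-lib's actual code):

.. code:: python

   import os

   def find_env_script(base_path, package, version):
       """Mimic the galaxy_packages layout: <base_path>/<package>/<version>/env.sh."""
       script = os.path.join(base_path, package, version, "env.sh")
       return script if os.path.exists(script) else None

   script = find_env_script("tests/test_deps_env", "random-lines", "1.0")
   if script:
       # A resolved script is sourced before the tool command, conceptually:
       #   /bin/sh -c '. <env.sh> && <tool command line>'
       print("would source", script)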
Try it out by executing the following command from cwltool's root directory::

   cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
       tests/random_lines.cwl \
       tests/random_lines_job.json

The resolvers configuration file in the above example was simply:

.. code:: yaml

   - type: galaxy_packages
     base_path: ./tests/test_deps_env

It is possible that the ``SoftwareRequirement`` s in a given CWL tool will
not match the module names for a given cluster.  Such requirements can be
re-mapped to specific deployed packages and/or versions using another file
specified via the resolver plugin parameter `mapping_files`.  We will
demonstrate this using `galaxy_packages`, but the concepts apply equally
well to Environment Modules or Conda packages (described below), for
instance.

So consider the resolvers configuration file
(`tests/test_deps_env_resolvers_conf_rewrite.yml`):

.. code:: yaml

   - type: galaxy_packages
     base_path: ./tests/test_deps_env
     mapping_files: ./tests/test_deps_mapping.yml

And the corresponding mapping configuration file (`tests/test_deps_mapping.yml`):

.. code:: yaml

   - from:
       name: randomLines
       version: 1.0.0-rc1
     to:
       name: random-lines
       version: '1.0'

This is saying that if cwltool encounters a requirement of ``randomLines``
at version ``1.0.0-rc1`` in a tool, it should rewrite it to our specific
plugin as ``random-lines`` at version ``1.0``.  cwltool ships with such a
test tool, called ``random_lines_mapping.cwl``, that contains such a source
``SoftwareRequirement``.  To try out this example with mapping, execute the
following command from the cwltool root directory::

   cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
       tests/random_lines_mapping.cwl \
       tests/random_lines_job.json

The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools.  If instead a real package manager is
used, cwltool has the opportunity to install requirements as needed.  While
initial support for Homebrew/Linuxbrew plugins is available, the most
developed such plugin is for the `Conda `__ package manager.  Conda has the
nice properties of allowing multiple versions of a package to be installed
simultaneously, not requiring elevated permissions to install Conda itself
or packages using Conda, and being cross-platform.  For these reasons,
cwltool may run as a normal user, install its own Conda environment, and
manage multiple versions of Conda packages on both Linux and Mac OS X.

The Conda plugin can be endlessly configured, but a sensible set of
defaults, one that has proven a powerful stack for dependency management
within the Galaxy tool development ecosystem, can be enabled by simply
passing cwltool the ``--beta-conda-dependencies`` flag.

With this, we can use the seqtk example above without Docker and without any
externally managed services - cwltool should install everything it needs and
create an environment for the tool.  Try it out with the following command::

   cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json

The CWL specification allows URIs to be attached to ``SoftwareRequirement``
s that allow disambiguation of package names.  If the mapping files
described above allow deployers to adapt tools to their infrastructure, this
mechanism allows tools to adapt their requirements to multiple package
managers.  To demonstrate this within the context of the seqtk example, we
can simply break the package name we use and then specify a specific Conda
package as follows:

.. code:: yaml

   hints:
     SoftwareRequirement:
       packages:
       - package: seqtk_seq
         version:
         - '1.2'
         specs:
         - https://anaconda.org/bioconda/seqtk
         - https://packages.debian.org/sid/seqtk

The example can be executed using the command::

   cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
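Both the mapping files and these ``specs`` URIs come down to rewriting or
matching package identifiers.  The mapping-file semantics in particular
amount to a simple rewrite of ``(name, version)`` pairs; a toy illustration
(not galaxy-lib's actual implementation):

.. code:: python

   def apply_mapping(package, version, mappings):
       """Rewrite a (package, version) requirement according to mapping entries."""
       for entry in mappings:
           source = entry["from"]
           if source["name"] == package and str(source.get("version")) == str(version):
               return entry["to"]["name"], str(entry["to"]["version"])
       return package, version  # no mapping entry matched; leave unchanged

   mappings = [{"from": {"name": "randomLines", "version": "1.0.0-rc1"},
                "to": {"name": "random-lines", "version": "1.0"}}]
   print(apply_mapping("randomLines", "1.0.0-rc1", mappings))
   # -> ('random-lines', '1.0')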
The plugin framework for managing resolution of these software requirements
is maintained as part of `galaxy-lib `__ - a small, portable subset of the
Galaxy project.

More information on configuration and implementation can be found at the
following links:

- `Dependency Resolvers in Galaxy `__
- `Conda for [Galaxy] Tool Dependencies `__
- `Mapping Files - Implementation `__
- `Specifications - Implementation `__
- `Initial cwltool Integration Pull Request `__

Overriding workflow requirements at load time
---------------------------------------------

Sometimes a workflow needs additional requirements to run in a particular
environment or with a particular dataset.  To avoid the need to modify the
underlying workflow, cwltool supports requirement "overrides".

The format of the "overrides" object is a mapping of item identifier
(workflow, workflow step, or command line tool) to the process requirements
that should be applied.

.. code:: yaml

   cwltool:overrides:
     echo.cwl:
       requirements:
         EnvVarRequirement:
           envDef:
             MESSAGE: override_value

Overrides can be specified either on the command line, or as part of the job
input document.  Workflow steps are identified using the name of the
workflow file followed by the step name as a document fragment identifier
"#id".  Override identifiers are relative to the toplevel workflow document.

.. code:: bash

   cwltool --overrides overrides.yml my-tool.cwl my-job.yml

.. code:: yaml

   input_parameter1: value1
   input_parameter2: value2
   cwltool:overrides:
     workflow.cwl#step1:
       requirements:
         EnvVarRequirement:
           envDef:
             MESSAGE: override_value

.. code:: bash

   cwltool my-tool.cwl my-job-with-overrides.yml

CWL Tool Control Flow
---------------------

Technical outline of how cwltool works internally, for maintainers.

#. Use CWL ``load_tool()`` to load the document.

   #. Fetches the document from file or URL
   #. Applies preprocessing (syntax/identifier expansion and normalization)
   #. Validates the document based on cwlVersion
   #. If necessary, updates the document to the latest spec
   #. Constructs a Process object using the ``make_tool()`` callback.  This
      yields a CommandLineTool, Workflow, or ExpressionTool.  For workflows,
      this recursively constructs each workflow step.
   #. To construct custom types for CommandLineTool, Workflow, or
      ExpressionTool, provide a custom ``make_tool()``

#. Iterate on the ``job()`` method of the Process object to get back runnable
   jobs.

   #. ``job()`` is a generator method (uses the Python iterator protocol)
   #. Each time the ``job()`` method is invoked in an iteration, it returns
      one of: a runnable item (an object with a ``run()`` method), ``None``
      (indicating there is currently no work ready to run), or end of
      iteration (indicating the process is complete.)
   #. Invoke the runnable item by calling ``run()``.  This runs the tool and
      gets output.
   #. Output of a process is reported by an output callback.
   #. ``job()`` may be iterated over multiple times.  It will yield all the
      work that is currently ready to run and then yield None.

#. ``Workflow`` objects create corresponding ``WorkflowJob`` and
   ``WorkflowJobStep`` objects to hold the workflow state for the duration of
   the job invocation.

   #. The WorkflowJob iterates over each WorkflowJobStep and determines if
      the inputs to the step are ready.
   #. When a step is ready, it constructs an input object for that step and
      iterates on the ``job()`` method of the workflow job step.
   #. Each runnable item is yielded back up to the top-level run loop
   #. When a step job completes and receives an output callback, the job
      outputs are assigned to the output of the workflow step.
   #. When all steps are complete, the intermediate files are moved to a
      final workflow output, intermediate directories are deleted, and the
      output callback for the workflow is called.

#. ``CommandLineTool`` ``job()`` objects yield a single runnable object.

   #. The CommandLineTool ``job()`` method calls ``makeJobRunner()`` to
      create a ``CommandLineJob`` object
   #. The job method configures the CommandLineJob object by setting public
      attributes
   #. The job method iterates over file and directory inputs to the
      CommandLineTool and creates a "path map".
   #. Files are mapped from their "resolved" location to a "target" path
      where they will appear at tool invocation (for example, a location
      inside a Docker container.)  The target paths are used on the command
      line.
   #. Files are staged to target paths using either Docker volume binds (when
      using containers) or symlinks (if not).  This staging step enables
      files to be logically rearranged or renamed independent of their source
      layout.
   #. The ``run()`` method of CommandLineJob executes the command line tool
      or Docker container, waits for it to complete, collects output, and
      makes the output callback.
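The numbered protocol above can be condensed into a schematic run loop.
This is a simplified sketch of what an executor does, assuming the
``job(joborder, output_callback, **kwargs)`` shape described above; it is
not cwltool's actual executor, which also handles waiting, logging, and
error states:

.. code:: python

   def simple_executor(process, job_order, **kwargs):
       """Drain the job() generator: run what is ready, idle when nothing is."""
       final_output = {}

       def on_workflow_output(out, status):
           final_output.update(out or {})

       for runnable in process.job(job_order, on_workflow_output, **kwargs):
           if runnable is not None:
               runnable.run()  # executes the tool and fires its output callback
           # a None item means no work is ready yet; a real loop would wait
           # here for an output callback before iterating again

       return final_output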
Extension points
----------------

The following functions can be provided to main(), to load_tool(), or to the
executor to override or augment the listed behaviors.

executor
  ::

    executor(tool, job_order_object, **kwargs)
      (Process, Dict[Text, Any], **Any) -> Tuple[Dict[Text, Any], Text]

  A toplevel workflow execution loop; it should synchronously execute a
  process object and return an output object.

makeTool
  ::

    makeTool(toolpath_object, **kwargs)
      (Dict[Text, Any], **Any) -> Process

  Construct a Process object from a document.

selectResources
  ::

    selectResources(request)
      (Dict[Text, int]) -> Dict[Text, int]

  Take a resource request and turn it into a concrete resource assignment.

versionfunc
  ::

    ()
      () -> Text

  Return version string.

make_fs_access
  ::

    make_fs_access(basedir)
      (Text) -> StdFsAccess

  Return a file system access object.

fetcher_constructor
  ::

    fetcher_constructor(cache, session)
      (Dict[unicode, unicode], requests.sessions.Session) -> Fetcher

  Construct a Fetcher object with the supplied cache and HTTP session.

resolver
  ::

    resolver(document_loader, document)
      (Loader, Union[Text, dict[Text, Any]]) -> Text

  Resolve a relative document identifier to an absolute one that can be
  fetched.

logger_handler
  ::

    logger_handler
      logging.Handler

  Handler object for logging.
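For example, ``makeTool`` can be overridden through the ``Factory`` shown
earlier.  The sketch below assumes that ``Factory`` forwards a ``makeTool``
keyword argument and that the default implementation lives at
``cwltool.workflow.defaultMakeTool``; treat both as assumptions rather than
a guaranteed API:

.. code:: python

   import cwltool.factory
   from cwltool.workflow import defaultMakeTool  # assumed default location

   def logging_make_tool(toolpath_object, **kwargs):
       # Peek at each parsed document before the Process object is built,
       # then delegate to the default construction logic.
       print("constructing Process for:", toolpath_object.get("id"))
       return defaultMakeTool(toolpath_object, **kwargs)

   fac = cwltool.factory.Factory(makeTool=logging_make_tool)  # assumed kwarg
   echo = fac.make("echo.cwl")
   result = echo(inp="foo")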
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: OS Independent
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: Microsoft :: Windows :: Windows 10
Classifier: Operating System :: Microsoft :: Windows :: Windows 8.1
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Astronomy
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: Utilities

cwltool-1.0.20180302231433/cwltool.py

#!/usr/bin/env python
"""Convenience entry point for cwltool.

This can be used instead of the recommended method of `./setup.py install`
or `./setup.py develop` and then using the generated `cwltool` executable.
"""
from __future__ import absolute_import

import sys

from cwltool import main

if __name__ == "__main__":
    sys.exit(main.main(sys.argv[1:]))

cwltool-1.0.20180302231433/cwltool.egg-info/
cwltool-1.0.20180302231433/cwltool.egg-info/top_level.txt

cwltool

cwltool-1.0.20180302231433/cwltool.egg-info/PKG-INFO

(identical in content to cwltool-1.0.20180302231433/PKG-INFO above; the
duplicated text is omitted here)
cwltool-1.0.20180302231433/cwltool.egg-info/SOURCES.txt

MANIFEST.in Makefile README.rst cwltool.py gittaggers.py setup.cfg setup.py cwltool/__init__.py cwltool/__main__.py cwltool/argparser.py cwltool/builder.py cwltool/command_line_tool.py cwltool/cwlNodeEngine.js cwltool/cwlNodeEngineJSConsole.js cwltool/cwlrdf.py cwltool/docker.py cwltool/docker_id.py cwltool/draft2tool.py cwltool/errors.py cwltool/executors.py cwltool/expression.py cwltool/extensions.yml cwltool/factory.py cwltool/flatten.py cwltool/job.py cwltool/load_tool.py cwltool/loghandler.py cwltool/main.py cwltool/mutation.py cwltool/pack.py cwltool/pathmapper.py cwltool/process.py cwltool/resolver.py cwltool/sandboxjs.py cwltool/singularity.py cwltool/software_requirements.py cwltool/stdfsaccess.py cwltool/update.py cwltool/utils.py cwltool/workflow.py cwltool.egg-info/PKG-INFO cwltool.egg-info/SOURCES.txt cwltool.egg-info/dependency_links.txt cwltool.egg-info/entry_points.txt cwltool.egg-info/requires.txt cwltool.egg-info/top_level.txt cwltool.egg-info/zip-safe cwltool/schemas/draft-2/CommonWorkflowLanguage.yml cwltool/schemas/draft-2/cwl-avro.yml cwltool/schemas/draft-3/CommandLineTool-standalone.yml cwltool/schemas/draft-3/CommandLineTool.yml cwltool/schemas/draft-3/CommonWorkflowLanguage.yml cwltool/schemas/draft-3/Process.yml cwltool/schemas/draft-3/README.md cwltool/schemas/draft-3/UserGuide.yml cwltool/schemas/draft-3/Workflow.yml cwltool/schemas/draft-3/concepts.md cwltool/schemas/draft-3/contrib.md cwltool/schemas/draft-3/index.yml cwltool/schemas/draft-3/intro.md cwltool/schemas/draft-3/invocation.md cwltool/schemas/draft-3/userguide-intro.md cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_proc.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_schema.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_src.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_proc.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_schema.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_src.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/import_include.md cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res_proc.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res_schema.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res_src.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/metaschema.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/salad.md cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_proc.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_schema.yml cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_src.yml cwltool/schemas/v1.0/CommandLineTool-standalone.yml cwltool/schemas/v1.0/CommandLineTool.yml cwltool/schemas/v1.0/CommonWorkflowLanguage.yml cwltool/schemas/v1.0/Process.yml cwltool/schemas/v1.0/README.md cwltool/schemas/v1.0/UserGuide.yml cwltool/schemas/v1.0/Workflow.yml cwltool/schemas/v1.0/concepts.md cwltool/schemas/v1.0/contrib.md cwltool/schemas/v1.0/intro.md
cwltool/schemas/v1.0/invocation.md cwltool/schemas/v1.0/userguide-intro.md cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name_proc.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name_schema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name_src.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res_proc.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res_schema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res_src.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/import_include.md cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res_proc.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res_schema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res_src.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res_proc.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res_schema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res_src.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/metaschema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/metaschema_base.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/salad.md cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res_proc.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res_schema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res_src.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res_proc.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res_schema.yml cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res_src.yml cwltool/schemas/v1.1.0-dev1/CommandLineTool-standalone.yml cwltool/schemas/v1.1.0-dev1/CommandLineTool.yml cwltool/schemas/v1.1.0-dev1/CommonWorkflowLanguage.yml cwltool/schemas/v1.1.0-dev1/Process.yml cwltool/schemas/v1.1.0-dev1/README.md cwltool/schemas/v1.1.0-dev1/UserGuide.yml cwltool/schemas/v1.1.0-dev1/Workflow.yml cwltool/schemas/v1.1.0-dev1/concepts.md cwltool/schemas/v1.1.0-dev1/contrib.md cwltool/schemas/v1.1.0-dev1/intro.md cwltool/schemas/v1.1.0-dev1/invocation.md cwltool/schemas/v1.1.0-dev1/userguide-intro.md cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_proc.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_schema.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_src.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_proc.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_schema.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_src.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/import_include.md cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_proc.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_schema.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_src.yml 
cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/metaschema.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/metaschema_base.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/salad.md cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_proc.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_schema.yml cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_src.yml tests/2.fasta tests/2.fastq tests/__init__.py tests/echo-cwlrun-job.yaml tests/echo-job.yaml tests/echo.cwl tests/listing-job.yml tests/random_lines.cwl tests/random_lines_job.json tests/random_lines_mapping.cwl tests/seqtk_seq.cwl tests/seqtk_seq_job.json tests/seqtk_seq_with_docker.cwl tests/seqtk_seq_wrong_name.cwl tests/test_bad_outputs_wf.cwl tests/test_check.py tests/test_cwl_version.py tests/test_default_path.py tests/test_deps_env_resolvers_conf.yml tests/test_deps_env_resolvers_conf_rewrite.yml tests/test_deps_mapping.yml tests/test_docker_warning.py tests/test_examples.py tests/test_ext.py tests/test_fetch.py tests/test_http_input.py tests/test_iwdr.py tests/test_js_sandbox.py tests/test_override.py tests/test_pack.py tests/test_parallel.py tests/test_pathmapper.py tests/test_rdfprint.py tests/test_relax_path_checks.py tests/test_toolargparse.py tests/util.py tests/override/echo-job-ov.yml tests/override/echo-job-ov2.yml tests/override/echo-job.yml tests/override/echo-wf.cwl tests/override/echo.cwl tests/override/ov.yml tests/override/ov2.yml tests/override/ov3.yml tests/tmp1/tmp2/tmp3/.gitkeep tests/wf/badout1.cwl tests/wf/badout2.cwl tests/wf/badout3.cwl tests/wf/cat-tool.cwl tests/wf/cat.cwl tests/wf/count-lines1-wf.cwl tests/wf/default_path.cwl tests/wf/echo.cwl tests/wf/empty.ttl tests/wf/expect_packed.cwl tests/wf/formattest-job.json tests/wf/formattest.cwl tests/wf/hello-workflow.cwl tests/wf/hello.txt tests/wf/hello_single_tool.cwl tests/wf/iwdr-entry.cwl tests/wf/js_output.cwl tests/wf/js_output_workflow.cwl tests/wf/listing_deep.cwl tests/wf/listing_none.cwl tests/wf/listing_shallow.cwl tests/wf/listing_v1_0.cwl tests/wf/malformed_outputs.cwl tests/wf/missing_cwlVersion.cwl tests/wf/mut.cwl tests/wf/mut2.cwl tests/wf/mut3.cwl tests/wf/parseInt-tool.cwl tests/wf/revsort-job.json tests/wf/revsort.cwl tests/wf/revtool.cwl tests/wf/scatter-job2.json tests/wf/scatter-wf4.cwl tests/wf/scatterfail.cwl tests/wf/separate_without_prefix.cwl tests/wf/sorttool.cwl tests/wf/updatedir.cwl tests/wf/updatedir_inplace.cwl tests/wf/updateval.cwl tests/wf/updateval.py tests/wf/updateval_inplace.cwl tests/wf/vf-concat.cwl tests/wf/wc-job.json tests/wf/wc-tool.cwl tests/wf/wffail.cwl tests/wf/whale.txt tests/wf/wrong_cwlVersion.cwl

cwltool-1.0.20180302231433/cwltool.egg-info/dependency_links.txt

cwltool-1.0.20180302231433/cwltool.egg-info/requires.txt

setuptools
requests>=2.4.3
ruamel.yaml<0.15,>=0.12.4
rdflib<4.3.0,>=4.2.2
shellescape<3.5,>=3.4.1
schema-salad<3,>=2.6.20170927145003
typing>=3.5.3
six>=1.8.0

[deps]
galaxy-lib>=17.09.3

cwltool-1.0.20180302231433/cwltool.egg-info/entry_points.txt

[console_scripts]
cwltool = cwltool.main:main
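The ``entry_points.txt`` metadata above is what generates the ``cwltool``
console command at install time; it corresponds to a ``setup()`` declaration
along these lines (an illustrative fragment, not the project's actual
``setup.py``):

.. code:: python

   from setuptools import setup

   setup(
       name='cwltool',
       entry_points={
           # installs a `cwltool` executable that invokes cwltool.main:main
           'console_scripts': ['cwltool = cwltool.main:main'],
       },
       # ... remaining metadata and dependencies ...
   )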
cwltool-1.0.20180302231433/cwltool.egg-info/zip-safe0000644000175200017520000000000113247251316022404 0ustar mcrusoemcrusoe00000000000000 cwltool-1.0.20180302231433/Makefile0000644000175200017520000001460613247251315017245 0ustar mcrusoemcrusoe00000000000000# This file is part of cwltool, # https://github.com/common-workflow-language/cwltool/, and is # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # Contact: common-workflow-language@googlegroups.com # make pep8 to check for basic Python code compliance # make autopep8 to fix most pep8 errors # make pylint to check Python code for enhanced compliance including naming # and documentation # make coverage-report to check coverage of the python scripts by the tests MODULE=cwltool # `SHELL=bash` doesn't work for some, so don't use BASH-isms like # `[[` conditional expressions. PYSOURCES=$(wildcard ${MODULE}/**.py tests/*.py) setup.py DEVPKGS=pep8 diff_cover autopep8 pylint coverage pydocstyle flake8 pytest isort mock DEBDEVPKGS=pep8 python-autopep8 pylint python-coverage pydocstyle sloccount \ python-flake8 python-mock shellcheck VERSION=1.0.$(shell date +%Y%m%d%H%M%S --utc --date=`git log --first-parent \ --max-count=1 --format=format:%cI`) mkfile_dir := $(dir $(abspath $(lastword $(MAKEFILE_LIST)))) ## all : default task all: ./setup.py develop ## help : print this help message and exit help: Makefile @sed -n 's/^##//p' $< ## install-dep : install most of the development dependencies via pip install-dep: pip install --upgrade $(DEVPKGS) ## install-deb-dep: install most of the dev dependencies via apt-get install-deb-dep: sudo apt-get install $(DEBDEVPKGS) ## install : install the ${MODULE} module and schema-salad-tool install: FORCE pip install . 
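# Typical developer flow (illustrative only; all of these targets are
# defined in this file):
#   make install-dep   # install the development tools
#   make install       # install the cwltool module itself
#   make test          # run the test suite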
## dist : create a module package for distribution
dist: dist/${MODULE}-$(VERSION).tar.gz

dist/${MODULE}-$(VERSION).tar.gz: $(PYSOURCES)
	./setup.py sdist bdist_wheel

## clean : clean up all temporary / machine-generated files
clean: FORCE
	rm -f ${MODULE}/*.pyc tests/*.pyc
	./setup.py clean --all || true
	rm -Rf .coverage
	rm -f diff-cover.html

# Linting and code style related targets
## sort_imports : sort Python imports using isort: https://github.com/timothycrosley/isort
sort_imports:
	isort ${MODULE}/*.py tests/*.py setup.py

## pep8 : check Python code style
pep8: $(PYSOURCES)
	pep8 --exclude=_version.py --show-source --show-pep8 $^ || true

pep8_report.txt: $(PYSOURCES)
	pep8 --exclude=_version.py $^ > pep8_report.txt || true

diff_pep8_report: pep8_report.txt
	diff-quality --violations=pep8 pep8_report.txt

pep257: pydocstyle

## pydocstyle : check Python docstring style
pydocstyle: $(PYSOURCES)
	pydocstyle --ignore=D100,D101,D102,D103 $^ || true

pydocstyle_report.txt: $(PYSOURCES)
	pydocstyle setup.py $^ > pydocstyle_report.txt 2>&1 || true

diff_pydocstyle_report: pydocstyle_report.txt
	diff-quality --violations=pycodestyle $^

## autopep8 : fix most Python code indentation and formatting
autopep8: $(PYSOURCES)
	autopep8 --recursive --in-place --ignore E309 $^

# A command to automatically run autopep8 on appropriate files
## format : check/fix all code indentation and formatting (runs autopep8)
format: autopep8
	# Do nothing

## pylint : run static code analysis on Python code
pylint: $(PYSOURCES)
	pylint --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}" \
		$^ || true

pylint_report.txt: ${PYSOURCES}
	pylint --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}" \
		$^ > pylint_report.txt || true

diff_pylint_report: pylint_report.txt
	diff-quality --violations=pylint pylint_report.txt

.coverage: $(PYSOURCES) all
	export COVERAGE_PROCESS_START=${mkfile_dir}.coveragerc; \
		cd ${CWL}; ./run_test.sh RUNNER=cwltool
	coverage run setup.py test
	coverage combine ${CWL} ${CWL}/draft-3/ ./

coverage.xml: .coverage
	python-coverage xml

coverage.html: htmlcov/index.html

htmlcov/index.html: .coverage
	python-coverage html
	@echo Test coverage of the Python code is now in htmlcov/index.html

coverage-report: .coverage
	python-coverage report

diff-cover: coverage-gcovr.xml coverage.xml
	diff-cover coverage-gcovr.xml coverage.xml

diff-cover.html: coverage-gcovr.xml coverage.xml
	diff-cover coverage-gcovr.xml coverage.xml \
		--html-report diff-cover.html

## test : run the ${MODULE} test suite
test: FORCE
	./setup.py test

sloccount.sc: ${PYSOURCES} Makefile
	sloccount --duplicates --wide --details $^ > sloccount.sc

## sloccount : count lines of code
sloccount: ${PYSOURCES} Makefile
	sloccount $^

list-author-emails:
	@echo 'name, E-Mail Address'
	@git log --format='%aN,%aE' | sort -u | grep -v 'root'

mypy2: ${PYSOURCES}
	rm -Rf typeshed/2and3/ruamel/yaml
	ln -s $(shell python -c 'from __future__ import print_function; import ruamel.yaml; import os.path; print(os.path.dirname(ruamel.yaml.__file__))') \
		typeshed/2and3/ruamel/yaml
	rm -Rf typeshed/2and3/schema_salad
	ln -s $(shell python -c 'from __future__ import print_function; import schema_salad; import os.path; print(os.path.dirname(schema_salad.__file__))') \
		typeshed/2and3/schema_salad
	MYPYPATH=$$MYPYPATH:typeshed/2.7:typeshed/2and3 mypy --py2 --disallow-untyped-calls \
		--warn-redundant-casts \
		cwltool

mypy3: ${PYSOURCES}
	rm -Rf typeshed/2and3/ruamel/yaml
	ln -s $(shell python3 -c 'from __future__ import print_function; import ruamel.yaml; import os.path; print(os.path.dirname(ruamel.yaml.__file__))') \
		typeshed/2and3/ruamel/yaml
	rm -Rf typeshed/2and3/schema_salad
	ln -s $(shell python3 -c 'from __future__ import print_function; import schema_salad; import os.path; print(os.path.dirname(schema_salad.__file__))') \
		typeshed/2and3/schema_salad
	MYPYPATH=$$MYPYPATH:typeshed/3:typeshed/2and3 mypy --disallow-untyped-calls \
		--warn-redundant-casts \
		cwltool

release: FORCE
	./release-test.sh
	. testenv2/bin/activate && \
		testenv2/src/${MODULE}/setup.py sdist bdist_wheel && \
		pip install twine && \
		twine upload testenv2/src/${MODULE}/dist/* && \
		git tag ${VERSION} && git push --tags

FORCE:

# Use this to print the value of a Makefile variable
# Example `make print-VERSION`
# From https://www.cmcrossroads.com/article/printing-value-makefile-variable
print-% : ; @echo $* = $($*)
cwltool-1.0.20180302231433/setup.cfg0000644000175200017520000000050013247251336017415 0ustar mcrusoemcrusoe00000000000000[flake8]
ignore = E124,E128,E129,E201,E202,E225,E226,E231,E265,E271,E302,E303,F401,E402,E501,W503,E731,F811,F821,F841
exclude = cwltool/schemas

[bdist_wheel]
universal = 1

[aliases]
test = pytest

[tool:pytest]
addopts = --ignore cwltool/schemas
testpaths = tests

[egg_info]
tag_build = .20180302231433
tag_date = 0
cwltool-1.0.20180302231433/cwltool/0000755000175200017520000000000013247251336017264 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/utils.py0000644000175200017520000001273313247251316021002 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import

# no imports from cwltool allowed

import os
import shutil
import stat

import six
from six.moves import urllib
from six.moves import zip_longest
from typing import Any, Callable, Dict, List, Tuple, Text, Union

windows_default_container_id = "frolvlad/alpine-bash"


def aslist(l):  # type: (Any) -> List[Any]
    if isinstance(l, list):
        return l
    else:
        return [l]


def get_feature(self, feature):  # type: (Any, Any) -> Tuple[Any, bool]
    for t in reversed(self.requirements):
        if t["class"] == feature:
            return (t, True)
    for t in reversed(self.hints):
        if t["class"] == feature:
            return (t, False)
    return (None, None)


def copytree_with_merge(src, dst, symlinks=False, ignore=None):
    # type: (Text, Text, bool, Callable[..., Any]) -> None
    if not os.path.exists(dst):
        os.makedirs(dst)
        shutil.copystat(src, dst)
    lst = os.listdir(src)
    if ignore:
        excl = ignore(src, lst)
        lst = [x for x in lst if x not in excl]
    for item in lst:
        s = os.path.join(src, item)
        d = os.path.join(dst, item)
        if symlinks and os.path.islink(s):
            if os.path.lexists(d):
                os.remove(d)
            os.symlink(os.readlink(s), d)
            try:
                st = os.lstat(s)
                mode = stat.S_IMODE(st.st_mode)
                os.lchmod(d, mode)
            except Exception:
                pass  # lchmod is not available on all platforms (Unix only)
        elif os.path.isdir(s):
            copytree_with_merge(s, d, symlinks, ignore)
        else:
            shutil.copy2(s, d)


# Adjusts a Windows path (only) so it can be passed to a `docker run` command;
# Docker treats container paths as Unix paths, so convert C:\Users\foo to /C/Users/foo.
def docker_windows_path_adjust(path):
    # type: (Text) -> (Text)
    if path is not None and onWindows():
        sp = path.split(':')
        if len(sp) == 2:
            sp[0] = sp[0].capitalize()  # capitalize the Windows drive letter
            path = ':'.join(sp)
        path = path.replace(':', '').replace('\\', '/')
        return path if path[0] == '/' else '/' + path
    return path


# Changes a Docker path (only on Windows) back to a Windows path,
# e.g. /C/Users/foo becomes C:\Users\foo.
def docker_windows_reverse_path_adjust(path):
    # type: (Text) -> (Text)
    if path is not None and onWindows():
        if path[0] == '/':
            path = path[1:]
        else:
            raise ValueError("not a docker path")
        splitpath = path.split('/')
        splitpath[0] = splitpath[0] + ':'
        return '\\'.join(splitpath)
    return path


# On Docker on Windows, file URIs do not contain ':' in the path.
# To convert such a file URI to a Windows-compatible one, add ':' after the
# drive letter, so file:///E/var becomes file:///E:/var.
def docker_windows_reverse_fileuri_adjust(fileuri):
    # type: (Text) -> (Text)
    if fileuri is not None and onWindows():
        if urllib.parse.urlsplit(fileuri).scheme == "file":
            filesplit = fileuri.split("/")
            if filesplit[3][-1] != ':':
                filesplit[3] = filesplit[3] + ':'
                return '/'.join(filesplit)
            else:
                return fileuri
        else:
            raise ValueError("not a file URI")
    return fileuri


# Check if we are on a Windows OS.
def onWindows():
    # type: () -> (bool)
    return os.name == 'nt'


# On Windows, os.path.join uses a backslash to join paths; since these paths
# are used inside Docker, convert the separators to '/'.
def convert_pathsep_to_unix(path):
    # type: (Text) -> (Text)
    if path is not None and onWindows():
        return path.replace('\\', '/')
    return path


# Comparison function to be used in sorting. Python 3 doesn't allow sorting
# of mixed types like str() and int(); this function re-creates Python 2's
# sorting behaviour for heterogeneous lists of `int` and `str`.
def cmp_like_py2(dict1, dict2):  # type: (Dict[Text, Any], Dict[Text, Any]) -> int
    # extract lists from both dicts
    a, b = dict1["position"], dict2["position"]
    # iterate through both lists up to the length of the longer one
    for i, j in zip_longest(a, b):
        if i == j:
            continue
        # if the first list is smaller, it should come first in sorting
        if i is None:
            return -1
        # if the first list is longer, it should come later in the sort
        elif j is None:
            return 1
        # if either element is a str, compare both as str
        if isinstance(i, str) or isinstance(j, str):
            return 1 if str(i) > str(j) else -1
        # int comparison otherwise
        return 1 if i > j else -1
    # if both lists are equal
    return 0
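# Illustrative usage sketch (added for clarity; the sample bindings below are
# invented): a comparator like cmp_like_py2 is meant to be adapted via
# functools.cmp_to_key, e.g. when sorting bindings by their "position" lists:
#
#     from functools import cmp_to_key
#     bindings = [{"position": [2, "b"]}, {"position": [1]}, {"position": [2, 0]}]
#     bindings.sort(key=cmp_to_key(cmp_like_py2))
#     # order of the positions afterwards: [1], [2, 0], [2, "b"]
#
# On a shared prefix the shorter list sorts first; once a str is involved,
# both elements are compared by their string forms.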
# Utility function to convert any byte strings present to unicode strings;
# the input is a dict of nested dicts and lists.
def bytes2str_in_dicts(a):
    # type: (Union[Dict[Text, Any], List[Any], Any]) -> Union[Text, List[Any], Dict[Text, Any]]

    # if the input is a dict, recursively call for each value
    if isinstance(a, dict):
        for k, v in dict.items(a):
            a[k] = bytes2str_in_dicts(v)
        return a

    # if it is a list, iterate through it and recurse
    # into all of its elements
    if isinstance(a, list):
        for idx, value in enumerate(a):
            a[idx] = bytes2str_in_dicts(value)
        return a

    # if the value is bytes, return the decoded string,
    elif isinstance(a, bytes):
        return a.decode('utf-8')

    # otherwise simply return the element itself
    return a
cwltool-1.0.20180302231433/cwltool/cwlNodeEngine.js0000755000175200017520000000113313247251315022341 0ustar mcrusoemcrusoe00000000000000"use strict";
process.stdin.setEncoding("utf8");
var incoming = "";
process.stdin.on("data", function(chunk) {
  incoming += chunk;
  var i = incoming.indexOf("\n");
  if (i > -1) {
    try{
      var fn = JSON.parse(incoming.substr(0, i));
      incoming = incoming.substr(i+1);
      process.stdout.write(JSON.stringify(require("vm").runInNewContext(fn, {})) + "\n");
    } catch(e){
      console.error(e)
    }
    /*strings to indicate the process has finished*/
    console.log("r1cepzbhUTxtykz5XTC4");
    console.error("r1cepzbhUTxtykz5XTC4");
  }
});
process.stdin.on("end", process.exit);
cwltool-1.0.20180302231433/cwltool/workflow.py0000644000175200017520000012166113247251316021515 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import

import copy
import functools
import json
import logging
import random
import tempfile
from collections import namedtuple
from typing import Any, Callable, Dict, Generator, Iterable, List, Text, Union, cast

import schema_salad.validate as validate
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from schema_salad.sourceline import SourceLine, cmap

from . import command_line_tool, expression
from .errors import WorkflowException
from .load_tool import load_tool
from .process import Process, shortname, uniquename, get_overrides
from .utils import aslist
import six
from six.moves import range

_logger = logging.getLogger("cwltool")

WorkflowStateItem = namedtuple('WorkflowStateItem', ['parameter', 'value', 'success'])


def defaultMakeTool(toolpath_object,  # type: Dict[Text, Any]
                    **kwargs  # type: Any
                    ):
    # type: (...) -> Process
    if not isinstance(toolpath_object, dict):
        raise WorkflowException(u"Not a dict: '%s'" % toolpath_object)
    if "class" in toolpath_object:
        if toolpath_object["class"] == "CommandLineTool":
            return command_line_tool.CommandLineTool(toolpath_object, **kwargs)
        elif toolpath_object["class"] == "ExpressionTool":
            return command_line_tool.ExpressionTool(toolpath_object, **kwargs)
        elif toolpath_object["class"] == "Workflow":
            return Workflow(toolpath_object, **kwargs)

    raise WorkflowException(
        u"Missing or invalid 'class' field in %s, expecting one of: CommandLineTool, ExpressionTool, Workflow" %
        toolpath_object["id"])


def findfiles(wo, fn=None):  # type: (Any, List) -> List[Dict[Text, Any]]
    if fn is None:
        fn = []
    if isinstance(wo, dict):
        if wo.get("class") == "File":
            fn.append(wo)
            findfiles(wo.get("secondaryFiles", None), fn)
        else:
            for w in wo.values():
                findfiles(w, fn)
    elif isinstance(wo, list):
        for w in wo:
            findfiles(w, fn)
    return fn


def match_types(sinktype, src, iid, inputobj, linkMerge, valueFrom):
    # type: (Union[List[Text], Text], WorkflowStateItem, Text, Dict[Text, Any], Text, Text) -> bool
    if isinstance(sinktype, list):
        # Sink is union type
        for st in sinktype:
            if match_types(st, src, iid, inputobj, linkMerge, valueFrom):
                return True
    elif isinstance(src.parameter["type"], list):
        # Source is union type
        # Check that at least one source type is compatible with the sink.
        original_types = src.parameter["type"]
        for source_type in original_types:
            src.parameter["type"] = source_type
            match = match_types(
                sinktype, src, iid, inputobj, linkMerge, valueFrom)
            if match:
                src.parameter["type"] = original_types
                return True
        src.parameter["type"] = original_types
        return False
    elif linkMerge:
        if iid not in inputobj:
            inputobj[iid] = []
        if linkMerge == "merge_nested":
            inputobj[iid].append(src.value)
        elif linkMerge == "merge_flattened":
            if isinstance(src.value, list):
                inputobj[iid].extend(src.value)
            else:
                inputobj[iid].append(src.value)
        else:
            raise WorkflowException(u"Unrecognized linkMerge enum '%s'" % linkMerge)
        return True
    elif valueFrom is not None or can_assign_src_to_sink(src.parameter["type"], sinktype) or sinktype == "Any":
        # simply assign the value from state to input
        inputobj[iid] = copy.deepcopy(src.value)
        return True
    return False


def check_types(srctype, sinktype, linkMerge, valueFrom):
    # type: (Any, Any, Text, Text) -> Text
    """Check whether the given source and sink types match, returning "pass", "warning", or "exception"."""
    if valueFrom:
        return "pass"
    elif not linkMerge:
        if can_assign_src_to_sink(srctype, sinktype, strict=True):
            return "pass"
        elif can_assign_src_to_sink(srctype, sinktype, strict=False):
            return "warning"
        else:
            return "exception"
    elif linkMerge == "merge_nested":
        return check_types({"items": srctype, "type": "array"}, sinktype, None, None)
    elif linkMerge == "merge_flattened":
        return check_types(merge_flatten_type(srctype), sinktype, None, None)
    else:
        raise WorkflowException(u"Unrecognized linkMerge enum '%s'" % linkMerge)


def merge_flatten_type(src):
    # type: (Any) -> Any
    """Return the merge flattened type of the source type."""
    if isinstance(src, list):
        return [merge_flatten_type(t) for t in src]
    elif isinstance(src, dict) and src.get("type") == "array":
        return src
    else:
        return {"items": src, "type": "array"}


def can_assign_src_to_sink(src, sink, strict=False):  # type: (Any, Any, bool) -> bool
    """Check for identical type specifications, ignoring extra keys like inputBinding.
src: admissible source types sink: admissible sink types In non-strict comparison, at least one source type must match one sink type. In strict comparison, all source types must match at least one sink type. """ if src == "Any" or sink == "Any": return True if isinstance(src, dict) and isinstance(sink, dict): if src["type"] == "array" and sink["type"] == "array": return can_assign_src_to_sink(src["items"], sink["items"], strict) elif src["type"] == "record" and sink["type"] == "record": return _compare_records(src, sink, strict) return False elif isinstance(src, list): if strict: for t in src: if not can_assign_src_to_sink(t, sink): return False return True else: for t in src: if can_assign_src_to_sink(t, sink): return True return False elif isinstance(sink, list): for t in sink: if can_assign_src_to_sink(src, t): return True return False else: return src == sink def _compare_records(src, sink, strict=False): # type: (Dict[Text, Any], Dict[Text, Any], bool) -> bool """Compare two records, ensuring they have compatible fields. This handles normalizing record names, which will be relative to workflow step, so that they can be compared. """ def _rec_fields(rec): # type: (Dict[Text, Any]) -> Dict[Text, Any] out = {} for field in rec["fields"]: name = shortname(field["name"]) out[name] = field["type"] return out srcfields = _rec_fields(src) sinkfields = _rec_fields(sink) for key in six.iterkeys(sinkfields): if (not can_assign_src_to_sink( srcfields.get(key, "null"), sinkfields.get(key, "null"), strict) and sinkfields.get(key) is not None): _logger.info("Record comparison failure for %s and %s\n" "Did not match fields for %s: %s and %s" % (src["name"], sink["name"], key, srcfields.get(key), sinkfields.get(key))) return False return True def object_from_state(state, parms, frag_only, supportsMultipleInput, sourceField, incomplete=False): # type: (Dict[Text, WorkflowStateItem], List[Dict[Text, Any]], bool, bool, Text, bool) -> Dict[Text, Any] inputobj = {} # type: Dict[Text, Any] for inp in parms: iid = inp["id"] if frag_only: iid = shortname(iid) if sourceField in inp: connections = aslist(inp[sourceField]) if (len(connections) > 1 and not supportsMultipleInput): raise WorkflowException( "Workflow contains multiple inbound links to a single " "parameter but MultipleInputFeatureRequirement is not " "declared.") for src in connections: if src in state and state[src] is not None and (state[src].success == "success" or incomplete): if not match_types( inp["type"], state[src], iid, inputobj, inp.get("linkMerge", ("merge_nested" if len(connections) > 1 else None)), valueFrom=inp.get("valueFrom")): raise WorkflowException( u"Type mismatch between source '%s' (%s) and " "sink '%s' (%s)" % (src, state[src].parameter["type"], inp["id"], inp["type"])) elif src not in state: raise WorkflowException( u"Connect source '%s' on parameter '%s' does not " "exist" % (src, inp["id"])) elif not incomplete: return None if inputobj.get(iid) is None and "default" in inp: inputobj[iid] = copy.copy(inp["default"]) if iid not in inputobj and ("valueFrom" in inp or incomplete): inputobj[iid] = None if iid not in inputobj: raise WorkflowException(u"Value for %s not specified" % (inp["id"])) return inputobj class WorkflowJobStep(object): def __init__(self, step): # type: (Any) -> None self.step = step self.tool = step.tool self.id = step.id self.submitted = False self.completed = False self.iterable = None # type: Iterable self.name = uniquename(u"step %s" % shortname(self.id)) def job(self, joborder, output_callback, 
**kwargs): # type: (Dict[Text, Text], functools.partial[None], **Any) -> Generator kwargs["part_of"] = self.name kwargs["name"] = shortname(self.id) _logger.info(u"[%s] start", self.name) for j in self.step.job(joborder, output_callback, **kwargs): yield j class WorkflowJob(object): def __init__(self, workflow, **kwargs): # type: (Workflow, **Any) -> None self.workflow = workflow self.tool = workflow.tool self.steps = [WorkflowJobStep(s) for s in workflow.steps] self.state = None # type: Dict[Text, WorkflowStateItem] self.processStatus = None # type: Text self.did_callback = False if "outdir" in kwargs: self.outdir = kwargs["outdir"] elif "tmp_outdir_prefix" in kwargs: self.outdir = tempfile.mkdtemp(prefix=kwargs["tmp_outdir_prefix"]) else: # tmp_outdir_prefix defaults to tmp, so this is unlikely to be used self.outdir = tempfile.mkdtemp() self.name = uniquename(u"workflow %s" % kwargs.get("name", shortname(self.workflow.tool.get("id", "embedded")))) _logger.debug(u"[%s] initialized from %s", self.name, self.tool.get("id", "workflow embedded in %s" % kwargs.get("part_of"))) def do_output_callback(self, final_output_callback): # type: (Callable[[Any, Any], Any]) -> None supportsMultipleInput = bool(self.workflow.get_requirement("MultipleInputFeatureRequirement")[0]) try: wo = object_from_state(self.state, self.tool["outputs"], True, supportsMultipleInput, "outputSource", incomplete=True) except WorkflowException as e: _logger.error(u"[%s] Cannot collect workflow output: %s", self.name, e) wo = {} self.processStatus = "permanentFail" _logger.info(u"[%s] completed %s", self.name, self.processStatus) self.did_callback = True final_output_callback(wo, self.processStatus) def receive_output(self, step, outputparms, final_output_callback, jobout, processStatus): # type: (WorkflowJobStep, List[Dict[Text,Text]], Callable[[Any, Any], Any], Dict[Text,Text], Text) -> None for i in outputparms: if "id" in i: if i["id"] in jobout: self.state[i["id"]] = WorkflowStateItem(i, jobout[i["id"]], processStatus) else: _logger.error(u"[%s] Output is missing expected field %s", step.name, i["id"]) processStatus = "permanentFail" if _logger.isEnabledFor(logging.DEBUG): _logger.debug(u"[%s] produced output %s", step.name, json.dumps(jobout, indent=4)) if processStatus != "success": if self.processStatus != "permanentFail": self.processStatus = processStatus _logger.warning(u"[%s] completed %s", step.name, processStatus) else: _logger.info(u"[%s] completed %s", step.name, processStatus) step.completed = True self.made_progress = True completed = sum(1 for s in self.steps if s.completed) if completed == len(self.steps): self.do_output_callback(final_output_callback) def try_make_job(self, step, final_output_callback, **kwargs): # type: (WorkflowJobStep, Callable[[Any, Any], Any], **Any) -> Generator js_console = kwargs.get("js_console", False) debug = kwargs.get("debug", False) timeout = kwargs.get("eval_timeout") inputparms = step.tool["inputs"] outputparms = step.tool["outputs"] supportsMultipleInput = bool(self.workflow.get_requirement( "MultipleInputFeatureRequirement")[0]) try: inputobj = object_from_state( self.state, inputparms, False, supportsMultipleInput, "source") if inputobj is None: _logger.debug(u"[%s] job step %s not ready", self.name, step.id) return if step.submitted: return _logger.debug(u"[%s] starting %s", self.name, step.name) callback = functools.partial(self.receive_output, step, outputparms, final_output_callback) valueFrom = { i["id"]: i["valueFrom"] for i in step.tool["inputs"] if 
"valueFrom" in i} if len(valueFrom) > 0 and not bool(self.workflow.get_requirement("StepInputExpressionRequirement")[0]): raise WorkflowException( "Workflow step contains valueFrom but StepInputExpressionRequirement not in requirements") vfinputs = {shortname(k): v for k, v in six.iteritems(inputobj)} def postScatterEval(io): # type: (Dict[Text, Any]) -> Dict[Text, Any] shortio = {shortname(k): v for k, v in six.iteritems(io)} def valueFromFunc(k, v): # type: (Any, Any) -> Any if k in valueFrom: return expression.do_eval( valueFrom[k], shortio, self.workflow.requirements, None, None, {}, context=v, debug=debug, js_console=js_console, timeout=timeout) else: return v return {k: valueFromFunc(k, v) for k, v in io.items()} if "scatter" in step.tool: scatter = aslist(step.tool["scatter"]) method = step.tool.get("scatterMethod") if method is None and len(scatter) != 1: raise WorkflowException("Must specify scatterMethod when scattering over multiple inputs") kwargs["postScatterEval"] = postScatterEval tot = 1 emptyscatter = [shortname(s) for s in scatter if len(inputobj[s]) == 0] if emptyscatter: _logger.warning(u"[job %s] Notice: scattering over empty input in '%s'. All outputs will be empty.", step.name, "', '".join(emptyscatter)) if method == "dotproduct" or method is None: jobs = dotproduct_scatter(step, inputobj, scatter, cast( # known bug with mypy # https://github.com/python/mypy/issues/797 Callable[[Any], Any], callback), **kwargs) elif method == "nested_crossproduct": jobs = nested_crossproduct_scatter(step, inputobj, scatter, cast(Callable[[Any], Any], callback), # known bug in mypy # https://github.com/python/mypy/issues/797 **kwargs) elif method == "flat_crossproduct": jobs = cast(Generator, flat_crossproduct_scatter(step, inputobj, scatter, cast(Callable[[Any], Any], # known bug in mypy # https://github.com/python/mypy/issues/797 callback), 0, **kwargs)) else: if _logger.isEnabledFor(logging.DEBUG): _logger.debug(u"[job %s] job input %s", step.name, json.dumps(inputobj, indent=4)) inputobj = postScatterEval(inputobj) if _logger.isEnabledFor(logging.DEBUG): _logger.debug(u"[job %s] evaluated job input to %s", step.name, json.dumps(inputobj, indent=4)) jobs = step.job(inputobj, callback, **kwargs) step.submitted = True for j in jobs: yield j except WorkflowException: raise except Exception: _logger.exception("Unhandled exception") self.processStatus = "permanentFail" step.completed = True def run(self, **kwargs): _logger.info(u"[%s] start", self.name) def job(self, joborder, output_callback, **kwargs): # type: (Dict[Text, Any], Callable[[Any, Any], Any], **Any) -> Generator self.state = {} self.processStatus = "success" if "outdir" in kwargs: del kwargs["outdir"] for e, i in enumerate(self.tool["inputs"]): with SourceLine(self.tool["inputs"], e, WorkflowException, _logger.isEnabledFor(logging.DEBUG)): iid = shortname(i["id"]) if iid in joborder: self.state[i["id"]] = WorkflowStateItem(i, copy.deepcopy(joborder[iid]), "success") elif "default" in i: self.state[i["id"]] = WorkflowStateItem(i, copy.deepcopy(i["default"]), "success") else: raise WorkflowException( u"Input '%s' not in input object and does not have a default value." 
% (i["id"])) for s in self.steps: for out in s.tool["outputs"]: self.state[out["id"]] = None completed = 0 while completed < len(self.steps): self.made_progress = False for step in self.steps: if kwargs.get("on_error", "stop") == "stop" and self.processStatus != "success": break if not step.submitted: try: step.iterable = self.try_make_job(step, output_callback, **kwargs) except WorkflowException as e: _logger.error(u"[%s] Cannot make job: %s", step.name, e) _logger.debug("", exc_info=True) self.processStatus = "permanentFail" if step.iterable: try: for newjob in step.iterable: if kwargs.get("on_error", "stop") == "stop" and self.processStatus != "success": break if newjob: self.made_progress = True yield newjob else: break except WorkflowException as e: _logger.error(u"[%s] Cannot make job: %s", step.name, e) _logger.debug("", exc_info=True) self.processStatus = "permanentFail" completed = sum(1 for s in self.steps if s.completed) if not self.made_progress and completed < len(self.steps): if self.processStatus != "success": break else: yield None if not self.did_callback: self.do_output_callback(output_callback) class Workflow(Process): def __init__(self, toolpath_object, **kwargs): # type: (Dict[Text, Any], **Any) -> None super(Workflow, self).__init__(toolpath_object, **kwargs) kwargs["requirements"] = self.requirements kwargs["hints"] = self.hints makeTool = kwargs.get("makeTool") self.steps = [] # type: List[WorkflowStep] validation_errors = [] for n, step in enumerate(self.tool.get("steps", [])): try: self.steps.append(WorkflowStep(step, n, **kwargs)) except validate.ValidationException as v: if _logger.isEnabledFor(logging.DEBUG): _logger.exception("Validation failed at") validation_errors.append(v) if validation_errors: raise validate.ValidationException("\n".join(str(v) for v in validation_errors)) random.shuffle(self.steps) # statically validate data links instead of doing it at runtime. workflow_inputs = self.tool["inputs"] workflow_outputs = self.tool["outputs"] step_inputs = [] # type: List[Any] step_outputs = [] # type: List[Any] for step in self.steps: step_inputs.extend(step.tool["inputs"]) step_outputs.extend(step.tool["outputs"]) static_checker(workflow_inputs, workflow_outputs, step_inputs, step_outputs) def job(self, job_order, # type: Dict[Text, Text] output_callbacks, # type: Callable[[Any, Any], Any] **kwargs # type: Any ): # type: (...) -> Generator[Any, None, None] builder = self._init_job(job_order, **kwargs) wj = WorkflowJob(self, **kwargs) yield wj kwargs["part_of"] = u"workflow %s" % wj.name for w in wj.job(builder.job, output_callbacks, **kwargs): yield w def visit(self, op): op(self.tool) for s in self.steps: s.visit(op) def static_checker(workflow_inputs, workflow_outputs, step_inputs, step_outputs): # type: (List[Dict[Text, Any]], List[Dict[Text, Any]], List[Dict[Text, Any]], List[Dict[Text, Any]]) -> None """Check if all source and sink types of a workflow are compatible before run time. 
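    For example (an illustrative summary of the check_types rules above): a
    step input of type File fed from a workflow input of type string is an
    "exception"; a sink of type ["null", "File"] fed from a source of type
    File passes the strict check; a sink of type File fed from a source of
    type ["null", "File"] is only a "warning", because the source may still
    be null at run time.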
""" # source parameters: workflow_inputs and step_outputs # sink parameters: step_inputs and workflow_outputs # make a dictionary of source parameters, indexed by the "id" field src_parms = workflow_inputs + step_outputs src_dict = {} for parm in src_parms: src_dict[parm["id"]] = parm step_inputs_val = check_all_types(src_dict, step_inputs, "source") workflow_outputs_val = check_all_types(src_dict, workflow_outputs, "outputSource") warnings = step_inputs_val["warning"] + workflow_outputs_val["warning"] exceptions = step_inputs_val["exception"] + workflow_outputs_val["exception"] warning_msgs = [] exception_msgs = [] for warning in warnings: src = warning.src sink = warning.sink linkMerge = warning.linkMerge msg = SourceLine(src, "type").makeError( "Source '%s' of type %s is partially incompatible" % (shortname(src["id"]), json.dumps(src["type"]))) + "\n" + \ SourceLine(sink, "type").makeError( " with sink '%s' of type %s" % (shortname(sink["id"]), json.dumps(sink["type"]))) if linkMerge: msg += "\n" + SourceLine(sink).makeError(" source has linkMerge method %s" % linkMerge) warning_msgs.append(msg) for exception in exceptions: src = exception.src sink = exception.sink linkMerge = exception.linkMerge msg = SourceLine(src, "type").makeError( "Source '%s' of type %s is incompatible" % (shortname(src["id"]), json.dumps(src["type"]))) + "\n" + \ SourceLine(sink, "type").makeError( " with sink '%s' of type %s" % (shortname(sink["id"]), json.dumps(sink["type"]))) if linkMerge: msg += "\n" + SourceLine(sink).makeError(" source has linkMerge method %s" % linkMerge) exception_msgs.append(msg) for sink in step_inputs: if ('null' != sink["type"] and 'null' not in sink["type"] and "source" not in sink and "default" not in sink and "valueFrom" not in sink): msg = SourceLine(sink).makeError( "Required parameter '%s' does not have source, default, or valueFrom expression" % shortname(sink["id"])) exception_msgs.append(msg) all_warning_msg = "\n".join(warning_msgs) all_exception_msg = "\n".join(exception_msgs) if warnings: _logger.warning("Workflow checker warning:\n%s" % all_warning_msg) if exceptions: raise validate.ValidationException(all_exception_msg) SrcSink = namedtuple("SrcSink", ["src", "sink", "linkMerge"]) def check_all_types(src_dict, sinks, sourceField): # type: (Dict[Text, Any], List[Dict[Text, Any]], Text) -> Dict[Text, List[SrcSink]] # sourceField is either "soure" or "outputSource" """Given a list of sinks, check if their types match with the types of their sources. 
""" validation = {"warning": [], "exception": []} # type: Dict[Text, List[SrcSink]] for sink in sinks: if sourceField in sink: valueFrom = sink.get("valueFrom") if isinstance(sink[sourceField], list): srcs_of_sink = [src_dict[parm_id] for parm_id in sink[sourceField]] linkMerge = sink.get("linkMerge", ("merge_nested" if len(sink[sourceField]) > 1 else None)) else: parm_id = sink[sourceField] srcs_of_sink = [src_dict[parm_id]] linkMerge = None for src in srcs_of_sink: check_result = check_types(src["type"], sink["type"], linkMerge, valueFrom) if check_result == "warning": validation["warning"].append(SrcSink(src, sink, linkMerge)) elif check_result == "exception": validation["exception"].append(SrcSink(src, sink, linkMerge)) return validation class WorkflowStep(Process): def __init__(self, toolpath_object, pos, **kwargs): # type: (Dict[Text, Any], int, **Any) -> None if "id" in toolpath_object: self.id = toolpath_object["id"] else: self.id = "#step" + Text(pos) kwargs["requirements"] = (kwargs.get("requirements", []) + toolpath_object.get("requirements", []) + get_overrides(kwargs.get("overrides", []), self.id).get("requirements", [])) kwargs["hints"] = kwargs.get("hints", []) + toolpath_object.get("hints", []) try: if isinstance(toolpath_object["run"], dict): self.embedded_tool = kwargs.get("makeTool")(toolpath_object["run"], **kwargs) else: self.embedded_tool = load_tool( toolpath_object["run"], kwargs.get("makeTool"), kwargs, enable_dev=kwargs.get("enable_dev"), strict=kwargs.get("strict"), fetcher_constructor=kwargs.get("fetcher_constructor"), resolver=kwargs.get("resolver"), overrides=kwargs.get("overrides")) except validate.ValidationException as v: raise WorkflowException( u"Tool definition %s failed validation:\n%s" % (toolpath_object["run"], validate.indent(str(v)))) validation_errors = [] self.tool = toolpath_object = copy.deepcopy(toolpath_object) bound = set() for stepfield, toolfield in (("in", "inputs"), ("out", "outputs")): toolpath_object[toolfield] = [] for n, step_entry in enumerate(toolpath_object[stepfield]): if isinstance(step_entry, six.string_types): param = CommentedMap() # type: CommentedMap inputid = step_entry else: param = CommentedMap(six.iteritems(step_entry)) inputid = step_entry["id"] shortinputid = shortname(inputid) found = False for tool_entry in self.embedded_tool.tool[toolfield]: frag = shortname(tool_entry["id"]) if frag == shortinputid: #if the case that the step has a default for a parameter, #we do not want the default of the tool to override it step_default = None if "default" in param and "default" in tool_entry: step_default = param["default"] param.update(tool_entry) if step_default is not None: param["default"] = step_default found = True bound.add(frag) break if not found: if stepfield == "in": param["type"] = "Any" else: validation_errors.append( SourceLine(self.tool["out"], n).makeError( "Workflow step output '%s' does not correspond to" % shortname(step_entry)) + "\n" + SourceLine(self.embedded_tool.tool, "outputs").makeError( " tool output (expected '%s')" % ( "', '".join( [shortname(tool_entry["id"]) for tool_entry in self.embedded_tool.tool[toolfield]])))) param["id"] = inputid param.lc.line = toolpath_object[stepfield].lc.data[n][0] param.lc.col = toolpath_object[stepfield].lc.data[n][1] param.lc.filename = toolpath_object[stepfield].lc.filename toolpath_object[toolfield].append(param) missing = [] for i, tool_entry in enumerate(self.embedded_tool.tool["inputs"]): if shortname(tool_entry["id"]) not in bound: if "null" not in 
tool_entry["type"] and "default" not in tool_entry: missing.append(shortname(tool_entry["id"])) if missing: validation_errors.append(SourceLine(self.tool, "in").makeError( "Step is missing required parameter%s '%s'" % ("s" if len(missing) > 1 else "", "', '".join(missing)))) if validation_errors: raise validate.ValidationException("\n".join(validation_errors)) super(WorkflowStep, self).__init__(toolpath_object, **kwargs) if self.embedded_tool.tool["class"] == "Workflow": (feature, _) = self.get_requirement("SubworkflowFeatureRequirement") if not feature: raise WorkflowException( "Workflow contains embedded workflow but SubworkflowFeatureRequirement not in requirements") if "scatter" in self.tool: (feature, _) = self.get_requirement("ScatterFeatureRequirement") if not feature: raise WorkflowException("Workflow contains scatter but ScatterFeatureRequirement not in requirements") inputparms = copy.deepcopy(self.tool["inputs"]) outputparms = copy.deepcopy(self.tool["outputs"]) scatter = aslist(self.tool["scatter"]) method = self.tool.get("scatterMethod") if method is None and len(scatter) != 1: raise validate.ValidationException("Must specify scatterMethod when scattering over multiple inputs") inp_map = {i["id"]: i for i in inputparms} for s in scatter: if s not in inp_map: raise validate.ValidationException( SourceLine(self.tool, "scatter").makeError(u"Scatter parameter '%s' does not correspond to an input parameter of this " u"step, expecting '%s'" % (shortname(s), "', '".join(shortname(k) for k in inp_map.keys())))) inp_map[s]["type"] = {"type": "array", "items": inp_map[s]["type"]} if self.tool.get("scatterMethod") == "nested_crossproduct": nesting = len(scatter) else: nesting = 1 for r in range(0, nesting): for op in outputparms: op["type"] = {"type": "array", "items": op["type"]} self.tool["inputs"] = inputparms self.tool["outputs"] = outputparms def receive_output(self, output_callback, jobout, processStatus): # type: (Callable[...,Any], Dict[Text, Text], Text) -> None # _logger.debug("WorkflowStep output from run is %s", jobout) output = {} for i in self.tool["outputs"]: field = shortname(i["id"]) if field in jobout: output[i["id"]] = jobout[field] else: processStatus = "permanentFail" output_callback(output, processStatus) def job(self, job_order, # type: Dict[Text, Text] output_callbacks, # type: Callable[[Any, Any], Any] **kwargs # type: Any ): # type: (...) 
-> Generator[Any, None, None] for i in self.tool["inputs"]: p = i["id"] field = shortname(p) job_order[field] = job_order[i["id"]] del job_order[i["id"]] try: for t in self.embedded_tool.job(job_order, functools.partial( self.receive_output, output_callbacks), **kwargs): yield t except WorkflowException: _logger.error(u"Exception on step '%s'", kwargs.get("name")) raise except Exception as e: _logger.exception("Unexpected exception") raise WorkflowException(Text(e)) def visit(self, op): self.embedded_tool.visit(op) class ReceiveScatterOutput(object): def __init__(self, output_callback, dest): # type: (Callable[..., Any], Dict[Text,List[Text]]) -> None self.dest = dest self.completed = 0 self.processStatus = u"success" self.total = None # type: int self.output_callback = output_callback def receive_scatter_output(self, index, jobout, processStatus): # type: (int, Dict[Text, Text], Text) -> None for k, v in jobout.items(): self.dest[k][index] = v if processStatus != "success": if self.processStatus != "permanentFail": self.processStatus = processStatus self.completed += 1 if self.completed == self.total: self.output_callback(self.dest, self.processStatus) def setTotal(self, total): # type: (int) -> None self.total = total if self.completed == self.total: self.output_callback(self.dest, self.processStatus) def parallel_steps(steps, rc, kwargs): # type: (List[Generator], ReceiveScatterOutput, Dict[str, Any]) -> Generator while rc.completed < rc.total: made_progress = False for index in range(len(steps)): step = steps[index] if kwargs.get("on_error", "stop") == "stop" and rc.processStatus != "success": break try: for j in step: if kwargs.get("on_error", "stop") == "stop" and rc.processStatus != "success": break if j: made_progress = True yield j else: break except WorkflowException as e: _logger.error(u"Cannot make scatter job: %s", e) _logger.debug("", exc_info=True) rc.receive_scatter_output(index, {}, "permanentFail") if not made_progress and rc.completed < rc.total: yield None def dotproduct_scatter(process, joborder, scatter_keys, output_callback, **kwargs): # type: (WorkflowJobStep, Dict[Text, Any], List[Text], Callable[..., Any], **Any) -> Generator l = None for s in scatter_keys: if l is None: l = len(joborder[s]) elif l != len(joborder[s]): raise WorkflowException("Length of input arrays must be equal when performing dotproduct scatter.") output = {} # type: Dict[Text,List[Text]] for i in process.tool["outputs"]: output[i["id"]] = [None] * l rc = ReceiveScatterOutput(output_callback, output) steps = [] for n in range(0, l): jo = copy.copy(joborder) for s in scatter_keys: jo[s] = joborder[s][n] jo = kwargs["postScatterEval"](jo) steps.append(process.job(jo, functools.partial(rc.receive_scatter_output, n), **kwargs)) rc.setTotal(l) return parallel_steps(steps, rc, kwargs) def nested_crossproduct_scatter(process, joborder, scatter_keys, output_callback, **kwargs): # type: (WorkflowJobStep, Dict[Text, Any], List[Text], Callable[..., Any], **Any) -> Generator scatter_key = scatter_keys[0] l = len(joborder[scatter_key]) output = {} # type: Dict[Text,List[Text]] for i in process.tool["outputs"]: output[i["id"]] = [None] * l rc = ReceiveScatterOutput(output_callback, output) steps = [] for n in range(0, l): jo = copy.copy(joborder) jo[scatter_key] = joborder[scatter_key][n] if len(scatter_keys) == 1: jo = kwargs["postScatterEval"](jo) steps.append(process.job(jo, functools.partial(rc.receive_scatter_output, n), **kwargs)) else: # known bug with mypy, https://github.com/python/mypy/issues/797 
casted = cast(Callable[[Any], Any], functools.partial(rc.receive_scatter_output, n)) steps.append(nested_crossproduct_scatter(process, jo, scatter_keys[1:], casted, **kwargs)) rc.setTotal(l) return parallel_steps(steps, rc, kwargs) def crossproduct_size(joborder, scatter_keys): # type: (Dict[Text, Any], List[Text]) -> int scatter_key = scatter_keys[0] if len(scatter_keys) == 1: sum = len(joborder[scatter_key]) else: sum = 0 for n in range(0, len(joborder[scatter_key])): jo = copy.copy(joborder) jo[scatter_key] = joborder[scatter_key][n] sum += crossproduct_size(joborder, scatter_keys[1:]) return sum def flat_crossproduct_scatter(process, joborder, scatter_keys, output_callback, startindex, **kwargs): # type: (WorkflowJobStep, Dict[Text, Any], List[Text], Union[ReceiveScatterOutput,Callable[..., Any]], int, **Any) -> Union[List[Generator], Generator] scatter_key = scatter_keys[0] l = len(joborder[scatter_key]) rc = None # type: ReceiveScatterOutput if startindex == 0 and not isinstance(output_callback, ReceiveScatterOutput): output = {} # type: Dict[Text,List[Text]] for i in process.tool["outputs"]: output[i["id"]] = [None] * crossproduct_size(joborder, scatter_keys) rc = ReceiveScatterOutput(output_callback, output) elif isinstance(output_callback, ReceiveScatterOutput): rc = output_callback else: raise Exception("Unhandled code path. Please report this.") steps = [] put = startindex for n in range(0, l): jo = copy.copy(joborder) jo[scatter_key] = joborder[scatter_key][n] if len(scatter_keys) == 1: jo = kwargs["postScatterEval"](jo) steps.append(process.job(jo, functools.partial(rc.receive_scatter_output, put), **kwargs)) put += 1 else: add = flat_crossproduct_scatter(process, jo, scatter_keys[1:], rc, put, **kwargs) put += len(cast(List[Generator], add)) steps.extend(add) if startindex == 0 and not isinstance(output_callback, ReceiveScatterOutput): rc.setTotal(put) return parallel_steps(steps, rc, kwargs) else: return steps cwltool-1.0.20180302231433/cwltool/docker.py0000644000175200017520000002654313247251315021114 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import logging import os import re import shutil import subprocess import sys import tempfile from io import open import datetime import requests from typing import (Dict, List, Text, Any, MutableMapping) from .docker_id import docker_vm_id from .errors import WorkflowException from .job import ContainerCommandLineJob from .pathmapper import PathMapper, ensure_writable from .utils import docker_windows_path_adjust, onWindows _logger = logging.getLogger("cwltool") class DockerCommandLineJob(ContainerCommandLineJob): @staticmethod def get_image(dockerRequirement, pull_image, dry_run=False): # type: (Dict[Text, Text], bool, bool) -> bool found = False if "dockerImageId" not in dockerRequirement and "dockerPull" in dockerRequirement: dockerRequirement["dockerImageId"] = dockerRequirement["dockerPull"] for ln in subprocess.check_output( ["docker", "images", "--no-trunc", "--all"]).decode('utf-8').splitlines(): try: m = re.match(r"^([^ ]+)\s+([^ ]+)\s+([^ ]+)", ln) sp = dockerRequirement["dockerImageId"].split(":") if len(sp) == 1: sp.append("latest") elif len(sp) == 2: # if sp[1] doesn't match valid tag names, it is a part of repository if not re.match(r'[\w][\w.-]{0,127}', sp[1]): sp[0] = sp[0] + ":" + sp[1] sp[1] = "latest" elif len(sp) == 3: if re.match(r'[\w][\w.-]{0,127}', sp[2]): sp[0] = sp[0] + ":" + sp[1] sp[1] = sp[2] del sp[2] # check for repository:tag match or image id match if ((sp[0] == 
m.group(1) and sp[1] == m.group(2)) or dockerRequirement["dockerImageId"] == m.group(3)): found = True break except ValueError: pass if not found and pull_image: cmd = [] # type: List[Text] if "dockerPull" in dockerRequirement: cmd = ["docker", "pull", str(dockerRequirement["dockerPull"])] _logger.info(Text(cmd)) if not dry_run: subprocess.check_call(cmd, stdout=sys.stderr) found = True elif "dockerFile" in dockerRequirement: dockerfile_dir = str(tempfile.mkdtemp()) with open(os.path.join(dockerfile_dir, "Dockerfile"), "wb") as df: df.write(dockerRequirement["dockerFile"].encode('utf-8')) cmd = ["docker", "build", "--tag=%s" % str(dockerRequirement["dockerImageId"]), dockerfile_dir] _logger.info(Text(cmd)) if not dry_run: subprocess.check_call(cmd, stdout=sys.stderr) found = True elif "dockerLoad" in dockerRequirement: cmd = ["docker", "load"] _logger.info(Text(cmd)) if not dry_run: if os.path.exists(dockerRequirement["dockerLoad"]): _logger.info(u"Loading docker image from %s", dockerRequirement["dockerLoad"]) with open(dockerRequirement["dockerLoad"], "rb") as f: loadproc = subprocess.Popen(cmd, stdin=f, stdout=sys.stderr) else: loadproc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=sys.stderr) _logger.info(u"Sending GET request to %s", dockerRequirement["dockerLoad"]) req = requests.get(dockerRequirement["dockerLoad"], stream=True) n = 0 for chunk in req.iter_content(1024 * 1024): n += len(chunk) _logger.info("\r%i bytes" % (n)) loadproc.stdin.write(chunk) loadproc.stdin.close() rcode = loadproc.wait() if rcode != 0: raise WorkflowException("Docker load returned non-zero exit status %i" % (rcode)) found = True elif "dockerImport" in dockerRequirement: cmd = ["docker", "import", str(dockerRequirement["dockerImport"]), str(dockerRequirement["dockerImageId"])] _logger.info(Text(cmd)) if not dry_run: subprocess.check_call(cmd, stdout=sys.stderr) found = True return found def get_from_requirements(self, r, req, pull_image, dry_run=False): # type: (Dict[Text, Text], bool, bool, bool) -> Text if r: errmsg = None try: subprocess.check_output(["docker", "version"]) except subprocess.CalledProcessError as e: errmsg = "Cannot communicate with docker daemon: " + Text(e) except OSError as e: errmsg = "'docker' executable not found: " + Text(e) if errmsg: if req: raise WorkflowException(errmsg) else: return None if self.get_image(r, pull_image, dry_run): return r["dockerImageId"] else: if req: raise WorkflowException(u"Docker image %s not found" % r["dockerImageId"]) return None def add_volumes(self, pathmapper, runtime): # type: (PathMapper, List[Text]) -> None host_outdir = self.outdir container_outdir = self.builder.outdir for src, vol in pathmapper.items(): if not vol.staged: continue if vol.target.startswith(container_outdir+"/"): host_outdir_tgt = os.path.join( host_outdir, vol.target[len(container_outdir)+1:]) else: host_outdir_tgt = None if vol.type in ("File", "Directory"): if not vol.resolved.startswith("_:"): runtime.append(u"--volume=%s:%s:ro" % ( docker_windows_path_adjust(vol.resolved), docker_windows_path_adjust(vol.target))) elif vol.type == "WritableFile": if self.inplace_update: runtime.append(u"--volume=%s:%s:rw" % ( docker_windows_path_adjust(vol.resolved), docker_windows_path_adjust(vol.target))) else: shutil.copy(vol.resolved, host_outdir_tgt) ensure_writable(host_outdir_tgt) elif vol.type == "WritableDirectory": if vol.resolved.startswith("_:"): os.makedirs(host_outdir_tgt, 0o0755) else: if self.inplace_update: runtime.append(u"--volume=%s:%s:rw" % ( 
docker_windows_path_adjust(vol.resolved), docker_windows_path_adjust(vol.target))) else: shutil.copytree(vol.resolved, host_outdir_tgt) ensure_writable(host_outdir_tgt) elif vol.type == "CreateFile": if host_outdir_tgt: with open(host_outdir_tgt, "wb") as f: f.write(vol.resolved.encode("utf-8")) else: fd, createtmp = tempfile.mkstemp(dir=self.tmpdir) with os.fdopen(fd, "wb") as f: f.write(vol.resolved.encode("utf-8")) runtime.append(u"--volume=%s:%s:rw" % ( docker_windows_path_adjust(createtmp), docker_windows_path_adjust(vol.target))) def create_runtime(self, env, rm_container=True, record_container_id=False, cidfile_dir="", cidfile_prefix="", **kwargs): # type: (MutableMapping[Text, Text], bool, bool, Text, Text, **Any) -> List user_space_docker_cmd = kwargs.get("user_space_docker_cmd") if user_space_docker_cmd: runtime = [user_space_docker_cmd, u"run"] else: runtime = [u"docker", u"run", u"-i"] runtime.append(u"--volume=%s:%s:rw" % ( docker_windows_path_adjust(os.path.realpath(self.outdir)), self.builder.outdir)) runtime.append(u"--volume=%s:%s:rw" % ( docker_windows_path_adjust(os.path.realpath(self.tmpdir)), "/tmp")) self.add_volumes(self.pathmapper, runtime) if self.generatemapper: self.add_volumes(self.generatemapper, runtime) if user_space_docker_cmd: runtime = [x.replace(":ro", "") for x in runtime] runtime = [x.replace(":rw", "") for x in runtime] runtime.append(u"--workdir=%s" % ( docker_windows_path_adjust(self.builder.outdir))) if not user_space_docker_cmd: if not kwargs.get("no_read_only"): runtime.append(u"--read-only=true") if kwargs.get("custom_net", None) is not None: runtime.append(u"--net={0}".format(kwargs.get("custom_net"))) elif kwargs.get("disable_net", None): runtime.append(u"--net=none") if self.stdout: runtime.append("--log-driver=none") euid, egid = docker_vm_id() if not onWindows(): # MS Windows does not have getuid() or geteuid() functions euid, egid = euid or os.geteuid(), egid or os.getgid() if kwargs.get("no_match_user", None) is False \ and (euid, egid) != (None, None): runtime.append(u"--user=%d:%d" % (euid, egid)) if rm_container: runtime.append(u"--rm") runtime.append(u"--env=TMPDIR=/tmp") # spec currently says "HOME must be set to the designated output # directory." but spec might change to designated temp directory. 
        # runtime.append("--env=HOME=/tmp")
        runtime.append(u"--env=HOME=%s" % self.builder.outdir)

        # add parameters to docker to write a container ID file
        if record_container_id:
            if cidfile_dir != "":
                if not os.path.isdir(cidfile_dir):
                    _logger.error("--cidfile-dir %s error:\n%s", cidfile_dir,
                                  cidfile_dir + " is not a directory or "
                                  "the directory doesn't exist, please check it first")
                    exit(2)
                if not os.path.exists(cidfile_dir):
                    _logger.error("--cidfile-dir %s error:\n%s", cidfile_dir,
                                  "directory doesn't exist, please create it first")
                    exit(2)
            else:
                cidfile_dir = os.getcwd()
            cidfile_name = datetime.datetime.now().strftime("%Y%m%d%H%M%S-%f") + ".cid"
            if cidfile_prefix != "":
                cidfile_name = str(cidfile_prefix + "-" + cidfile_name)
            cidfile_path = os.path.join(cidfile_dir, cidfile_name)
            runtime.append(u"--cidfile=%s" % cidfile_path)

        for t, v in self.environment.items():
            runtime.append(u"--env=%s=%s" % (t, v))

        return runtime
cwltool-1.0.20180302231433/cwltool/loghandler.py0000644000175200017520000000025613247251315021755 0ustar mcrusoemcrusoe00000000000000import logging

_logger = logging.getLogger("cwltool")
defaultStreamHandler = logging.StreamHandler()
_logger.addHandler(defaultStreamHandler)
_logger.setLevel(logging.INFO)
cwltool-1.0.20180302231433/cwltool/software_requirements.py0000644000175200017520000001146513247251316024300 0ustar mcrusoemcrusoe00000000000000"""This module handles resolution of SoftwareRequirement hints.

This is accomplished mainly by adapting cwltool internals to galaxy-lib's
concept of "dependencies". Despite the name, galaxy-lib is a lightweight
library that can be used to map SoftwareRequirements in all sorts of ways -
Homebrew, Conda, custom scripts, environment modules. We'd be happy to find
ways to adapt new package managers and such as well.
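For reference, a tool description might carry such a hint like this (an
illustrative sketch only; the package name, version, and spec URI below are
invented for the example):

    hints:
      SoftwareRequirement:
        packages:
          - package: seqtk
            version: [ "1.2" ]
            specs: [ "https://anaconda.org/bioconda/seqtk" ]

The keys read by this module are ``packages`` and, per package, the
``package``, ``version``, and ``specs`` fields (see get_dependencies below).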
""" from __future__ import absolute_import import argparse import os import string from typing import (Any, Dict, List, Text) try: from galaxy.tools.deps.requirements import ToolRequirement, ToolRequirements from galaxy.tools import deps except ImportError: ToolRequirement = None # type: ignore ToolRequirements = None # type: ignore deps = None from .utils import get_feature SOFTWARE_REQUIREMENTS_ENABLED = deps is not None COMMAND_WITH_DEPENDENCIES_TEMPLATE = string.Template("""#!/bin/bash $handle_dependencies python "run_job.py" "job.json" """) class DependenciesConfiguration(object): def __init__(self, args): # type: (argparse.Namespace) -> None conf_file = getattr(args, "beta_dependency_resolvers_configuration", None) tool_dependency_dir = getattr(args, "beta_dependencies_directory", None) conda_dependencies = getattr(args, "beta_conda_dependencies", None) if conf_file is not None and os.path.exists(conf_file): self.use_tool_dependencies = True if not tool_dependency_dir: tool_dependency_dir = os.path.abspath(os.path.dirname(conf_file)) self.tool_dependency_dir = tool_dependency_dir self.dependency_resolvers_config_file = conf_file elif conda_dependencies: if not tool_dependency_dir: tool_dependency_dir = os.path.abspath("./cwltool_deps") self.tool_dependency_dir = tool_dependency_dir self.use_tool_dependencies = True self.dependency_resolvers_config_file = None else: self.use_tool_dependencies = False @property def config_dict(self): return { 'conda_auto_install': True, 'conda_auto_init': True, } def build_job_script(self, builder, command): # type: (Any, List[str]) -> Text ensure_galaxy_lib_available() tool_dependency_manager = deps.build_dependency_manager(self) # type: deps.DependencyManager dependencies = get_dependencies(builder) handle_dependencies = "" # str if dependencies: handle_dependencies = "\n".join(tool_dependency_manager.dependency_shell_commands(dependencies, job_directory=builder.tmpdir)) template_kwds = dict(handle_dependencies=handle_dependencies) # type: Dict[str, str] job_script = COMMAND_WITH_DEPENDENCIES_TEMPLATE.substitute(template_kwds) return job_script def get_dependencies(builder): # type: (Any) -> ToolRequirements (software_requirement, _) = get_feature(builder, "SoftwareRequirement") dependencies = [] # type: List[ToolRequirement] if software_requirement and software_requirement.get("packages"): packages = software_requirement.get("packages") for package in packages: version = package.get("version", None) if isinstance(version, list): if version: version = version[0] else: version = None specs = [{"uri": s} for s in package.get("specs", [])] dependencies.append(ToolRequirement.from_dict(dict( name=package["package"].split("#")[-1], version=version, type="package", specs=specs, ))) return ToolRequirements.from_list(dependencies) def get_container_from_software_requirements(args, builder): if args.beta_use_biocontainers: ensure_galaxy_lib_available() from galaxy.tools.deps.containers import ContainerRegistry, AppInfo, ToolInfo, DOCKER_CONTAINER_TYPE app_info = AppInfo( involucro_auto_init=True, enable_beta_mulled_containers=True, container_image_cache_path=".", ) # type: AppInfo container_registry = ContainerRegistry(app_info) # type: ContainerRegistry requirements = get_dependencies(builder) tool_info = ToolInfo(requirements=requirements) # type: ToolInfo container_description = container_registry.find_best_container_description([DOCKER_CONTAINER_TYPE], tool_info) if container_description: return container_description.identifier return None def 
ensure_galaxy_lib_available(): # type: () -> None if not SOFTWARE_REQUIREMENTS_ENABLED: raise Exception("Optional Python library galaxy-lib not available, it is required for this configuration.") cwltool-1.0.20180302231433/cwltool/expression.py0000644000175200017520000002236213247251315022037 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import copy import json import logging import re from typing import Any, AnyStr, Dict, List, Text, Union from .utils import docker_windows_path_adjust import six from six import u from . import sandboxjs from .errors import WorkflowException from .utils import bytes2str_in_dicts _logger = logging.getLogger("cwltool") def jshead(engineConfig, rootvars): # type: (List[Text], Dict[Text, Any]) -> Text # make sure all the byte strings are converted # to str in `rootvars` dict. # TODO: need to make sure the `rootvars dict` # contains no bytes type in the first place. if six.PY3: rootvars = bytes2str_in_dicts(rootvars) # type: ignore return u"\n".join(engineConfig + [u"var %s = %s;" % (k, json.dumps(v, indent=4)) for k, v in rootvars.items()]) # decode all raw strings to unicode seg_symbol = r"""\w+""" seg_single = r"""\['([^']|\\')+'\]""" seg_double = r"""\["([^"]|\\")+"\]""" seg_index = r"""\[[0-9]+\]""" segments = r"(\.%s|%s|%s|%s)" % (seg_symbol, seg_single, seg_double, seg_index) segment_re = re.compile(u(segments), flags=re.UNICODE) param_str = r"\((%s)%s*\)$" % (seg_symbol, segments) param_re = re.compile(u(param_str), flags=re.UNICODE) JSON = Union[Dict[Any, Any], List[Any], Text, int, float, bool, None] class SubstitutionError(Exception): pass def scanner(scan): # type: (Text) -> List[int] DEFAULT = 0 DOLLAR = 1 PAREN = 2 BRACE = 3 SINGLE_QUOTE = 4 DOUBLE_QUOTE = 5 BACKSLASH = 6 i = 0 stack = [DEFAULT] start = 0 while i < len(scan): state = stack[-1] c = scan[i] if state == DEFAULT: if c == '$': stack.append(DOLLAR) elif c == '\\': stack.append(BACKSLASH) elif state == BACKSLASH: stack.pop() if stack[-1] == DEFAULT: return [i - 1, i + 1] elif state == DOLLAR: if c == '(': start = i - 1 stack.append(PAREN) elif c == '{': start = i - 1 stack.append(BRACE) else: stack.pop() elif state == PAREN: if c == '(': stack.append(PAREN) elif c == ')': stack.pop() if stack[-1] == DOLLAR: return [start, i + 1] elif c == "'": stack.append(SINGLE_QUOTE) elif c == '"': stack.append(DOUBLE_QUOTE) elif state == BRACE: if c == '{': stack.append(BRACE) elif c == '}': stack.pop() if stack[-1] == DOLLAR: return [start, i + 1] elif c == "'": stack.append(SINGLE_QUOTE) elif c == '"': stack.append(DOUBLE_QUOTE) elif state == SINGLE_QUOTE: if c == "'": stack.pop() elif c == '\\': stack.append(BACKSLASH) elif state == DOUBLE_QUOTE: if c == '"': stack.pop() elif c == '\\': stack.append(BACKSLASH) i += 1 if len(stack) > 1: raise SubstitutionError( "Substitution error, unfinished block starting at position {}: {}".format(start, scan[start:])) else: return None def next_seg(parsed_string, remaining_string, current_value): # type: (Text, Text, Any) -> Any if remaining_string: m = segment_re.match(remaining_string) next_segment_str = m.group(0) key = None # type: Union[Text, int] if next_segment_str[0] == '.': key = next_segment_str[1:] elif next_segment_str[1] in ("'", '"'): key = next_segment_str[2:-2].replace("\\'", "'").replace('\\"', '"') if key: if isinstance(current_value, list) and key == "length" and not remaining_string[m.end(0):]: return len(current_value) if not isinstance(current_value, dict): raise WorkflowException("%s is a %s, cannot index on 
string '%s'" % (parsed_string, type(current_value).__name__, key)) if key not in current_value: raise WorkflowException("%s does not contain key '%s'" % (parsed_string, key)) else: try: key = int(next_segment_str[1:-1]) except ValueError as v: raise WorkflowException(u(str(v))) if not isinstance(current_value, list): raise WorkflowException("%s is a %s, cannot index on int '%s'" % (parsed_string, type(current_value).__name__, key)) if key >= len(current_value): raise WorkflowException("%s list index %i out of range" % (parsed_string, key)) try: return next_seg(parsed_string + remaining_string, remaining_string[m.end(0):], current_value[key]) except KeyError: raise WorkflowException("%s doesn't have property %s" % (parsed_string, key)) else: return current_value def evaluator(ex, jslib, obj, fullJS=False, timeout=None, force_docker_pull=False, debug=False, js_console=False): # type: (Text, Text, Dict[Text, Any], bool, int, bool, bool, bool) -> JSON m = param_re.match(ex) expression_parse_exception = None expression_parse_succeeded = False if m: first_symbol = m.group(1) first_symbol_end = m.end(1) if first_symbol_end + 1 == len(ex) and first_symbol == "null": return None try: if obj.get(first_symbol) is None: raise WorkflowException("%s is not defined" % first_symbol) return next_seg(first_symbol, ex[first_symbol_end:-1], obj[first_symbol]) except WorkflowException as w: expression_parse_exception = w else: expression_parse_succeeded = True if fullJS and not expression_parse_succeeded: return sandboxjs.execjs(ex, jslib, timeout=timeout, force_docker_pull=force_docker_pull, debug=debug, js_console=js_console) else: if expression_parse_exception is not None: raise sandboxjs.JavascriptException( "Syntax error in parameter reference '%s': %s. This could be due to using Javascript code without specifying InlineJavascriptRequirement." % \ (ex[1:-1], expression_parse_exception)) else: raise sandboxjs.JavascriptException( "Syntax error in parameter reference '%s'. This could be due to using Javascript code without specifying InlineJavascriptRequirement." 
% \ ex) def interpolate(scan, rootvars, timeout=None, fullJS=None, jslib="", force_docker_pull=False, debug=False, js_console=False, strip_whitespace=True): # type: (Text, Dict[Text, Any], int, bool, Union[str, Text], bool, bool, bool, bool) -> JSON if strip_whitespace: scan = scan.strip() parts = [] w = scanner(scan) while w: parts.append(scan[0:w[0]]) if scan[w[0]] == '$': e = evaluator(scan[w[0] + 1:w[1]], jslib, rootvars, fullJS=fullJS, timeout=timeout, force_docker_pull=force_docker_pull, debug=debug, js_console=js_console) if w[0] == 0 and w[1] == len(scan) and len(parts) <= 1: return e leaf = json.dumps(e, sort_keys=True) if leaf[0] == '"': leaf = leaf[1:-1] parts.append(leaf) elif scan[w[0]] == '\\': e = scan[w[1] - 1] parts.append(e) scan = scan[w[1]:] w = scanner(scan) parts.append(scan) return ''.join(parts) def do_eval(ex, jobinput, requirements, outdir, tmpdir, resources, context=None, pull_image=True, timeout=None, force_docker_pull=False, debug=False, js_console=False, strip_whitespace=True): # type: (Union[dict, AnyStr], Dict[Text, Union[Dict, List, Text]], List[Dict[Text, Any]], Text, Text, Dict[Text, Union[int, Text]], Any, bool, int, bool, bool, bool, bool) -> Any runtime = copy.copy(resources) runtime["tmpdir"] = docker_windows_path_adjust(tmpdir) runtime["outdir"] = docker_windows_path_adjust(outdir) rootvars = { u"inputs": jobinput, u"self": context, u"runtime": runtime} if isinstance(ex, (str, Text)) and ("$(" in ex or "${" in ex): fullJS = False jslib = u"" for r in reversed(requirements): if r["class"] == "InlineJavascriptRequirement": fullJS = True jslib = jshead(r.get("expressionLib", []), rootvars) break try: return interpolate(ex, rootvars, timeout=timeout, fullJS=fullJS, jslib=jslib, force_docker_pull=force_docker_pull, debug=debug, js_console=js_console, strip_whitespace=strip_whitespace) except Exception as e: raise WorkflowException("Expression evaluation error:\n%s" % e) else: return ex cwltool-1.0.20180302231433/cwltool/update.py0000644000175200017520000001375213247251316021126 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import copy import json import re import traceback from typing import (Any, Callable, Dict, Text, # pylint: disable=unused-import Tuple, Union) from copy import deepcopy import six from six.moves import urllib import schema_salad.validate from ruamel.yaml.comments import CommentedMap, CommentedSeq from schema_salad.ref_resolver import Loader from .utils import aslist def findId(doc, frg): # type: (Any, Any) -> Dict if isinstance(doc, dict): if "id" in doc and doc["id"] == frg: return doc else: for d in doc: f = findId(doc[d], frg) if f: return f if isinstance(doc, list): for d in doc: f = findId(d, frg) if f: return f return None def fixType(doc): # type: (Any) -> Any if isinstance(doc, list): for i, f in enumerate(doc): doc[i] = fixType(f) return doc if isinstance(doc, (str, Text)): if doc not in ( "null", "boolean", "int", "long", "float", "double", "string", "File", "record", "enum", "array", "Any") and "#" not in doc: return "#" + doc return doc digits = re.compile("\d+") def updateScript(sc): # type: (Text) -> Text sc = sc.replace("$job", "inputs") sc = sc.replace("$tmpdir", "runtime.tmpdir") sc = sc.replace("$outdir", "runtime.outdir") sc = sc.replace("$self", "self") return sc def _updateDev2Script(ent): # type: (Any) -> Any if isinstance(ent, dict) and "engine" in ent: if ent["engine"] == "https://w3id.org/cwl/cwl#JsonPointer": sp = ent["script"].split("/") if sp[0] in ("tmpdir", "outdir"): return 
u"$(runtime.%s)" % sp[0] else: if not sp[0]: sp.pop(0) front = sp.pop(0) sp = [Text(i) if digits.match(i) else "'" + i + "'" for i in sp] if front == "job": return u"$(inputs[%s])" % ']['.join(sp) elif front == "context": return u"$(self[%s])" % ']['.join(sp) else: sc = updateScript(ent["script"]) if sc[0] == "{": return "$" + sc else: return u"$(%s)" % sc else: return ent def traverseImport(doc, loader, baseuri, func): # type: (Any, Loader, Text, Callable[[Any, Loader, Text], Any]) -> Any if "$import" in doc: if doc["$import"][0] == "#": return doc["$import"] else: imp = urllib.parse.urljoin(baseuri, doc["$import"]) impLoaded = loader.fetch(imp) r = {} # type: Dict[Text, Any] if isinstance(impLoaded, list): r = {"$graph": impLoaded} elif isinstance(impLoaded, dict): r = impLoaded else: raise Exception("Unexpected code path.") r["id"] = imp _, frag = urllib.parse.urldefrag(imp) if frag: frag = "#" + frag r = findId(r, frag) return func(r, loader, imp) def v1_0dev4to1_0(doc, loader, baseuri): # type: (Any, Loader, Text) -> Tuple[Any, Text] """Public updater for v1.0.dev4 to v1.0.""" return (doc, "v1.0") def v1_0to1_1_0dev1(doc, loader, baseuri): # type: (Any, Loader, Text) -> Tuple[Any, Text] """Public updater for v1.0 to v1.1.0-dev1.""" return (doc, "v1.1.0-dev1") UPDATES = { "v1.0": None } # type: Dict[Text, Callable[[Any, Loader, Text], Tuple[Any, Text]]] DEVUPDATES = { "v1.0": v1_0to1_1_0dev1, "v1.1.0-dev1": None } # type: Dict[Text, Callable[[Any, Loader, Text], Tuple[Any, Text]]] ALLUPDATES = UPDATES.copy() ALLUPDATES.update(DEVUPDATES) LATEST = "v1.0" def identity(doc, loader, baseuri): # pylint: disable=unused-argument # type: (Any, Loader, Text) -> Tuple[Any, Union[Text, Text]] """The default, do-nothing, CWL document upgrade function.""" return (doc, doc["cwlVersion"]) def checkversion(doc, metadata, enable_dev): # type: (Union[CommentedSeq, CommentedMap], CommentedMap, bool) -> Tuple[Dict[Text, Any], Text] # pylint: disable=line-too-long """Checks the validity of the version of the give CWL document. Returns the document and the validated version string. """ cdoc = None # type: CommentedMap if isinstance(doc, CommentedSeq): lc = metadata.lc metadata = copy.copy(metadata) metadata.lc.data = copy.copy(lc.data) metadata.lc.filename = lc.filename metadata[u"$graph"] = doc cdoc = metadata elif isinstance(doc, CommentedMap): cdoc = doc else: raise Exception("Expected CommentedMap or CommentedSeq") version = cdoc[u"cwlVersion"] if version not in UPDATES: if version in DEVUPDATES: if enable_dev: pass else: raise schema_salad.validate.ValidationException( u"Version '%s' is a development or deprecated version.\n " "Update your document to a stable version (%s) or use " "--enable-dev to enable support for development and " "deprecated versions." 
% (version, ", ".join( list(UPDATES.keys())))) else: raise schema_salad.validate.ValidationException( u"Unrecognized version %s" % version) return (cdoc, version) def update(doc, loader, baseuri, enable_dev, metadata): # type: (Union[CommentedSeq, CommentedMap], Loader, Text, bool, Any) -> dict (cdoc, version) = checkversion(doc, metadata, enable_dev) nextupdate = identity # type: Callable[[Any, Loader, Text], Tuple[Any, Text]] while nextupdate: (cdoc, version) = nextupdate(cdoc, loader, baseuri) nextupdate = ALLUPDATES[version] cdoc[u"cwlVersion"] = version return cdoc cwltool-1.0.20180302231433/cwltool/process.py0000644000175200017520000010700213247251315021311 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import abc import copy import errno import functools import hashlib import json import logging import os import shutil import stat import tempfile import uuid from collections import Iterable from io import open from functools import cmp_to_key from typing import (Any, Callable, Dict, Generator, List, Set, Text, Tuple, Union, cast) import avro.schema import schema_salad.schema import schema_salad.validate as validate import six from pkg_resources import resource_stream from rdflib import Graph, URIRef from rdflib.namespace import OWL, RDFS from ruamel.yaml.comments import CommentedMap, CommentedSeq from schema_salad.ref_resolver import Loader, file_uri from schema_salad.sourceline import SourceLine from six.moves import urllib from .utils import cmp_like_py2 from .builder import Builder from .errors import UnsupportedRequirement, WorkflowException from .pathmapper import (PathMapper, adjustDirObjs, get_listing, normalizeFilesDirs, visit_class, trim_listing, ensure_writable) from .stdfsaccess import StdFsAccess from .utils import aslist, get_feature, copytree_with_merge, onWindows # if six.PY3: # AvroSchemaFromJSONData = avro.schema.SchemaFromJSONData # else: AvroSchemaFromJSONData = avro.schema.make_avsc_object class LogAsDebugFilter(logging.Filter): def __init__(self, name, parent): # type: (Text, logging.Logger) -> None name = str(name) super(LogAsDebugFilter, self).__init__(name) self.parent = parent def filter(self, record): return self.parent.isEnabledFor(logging.DEBUG) _logger = logging.getLogger("cwltool") _logger_validation_warnings = logging.getLogger("cwltool.validation_warnings") _logger_validation_warnings.setLevel(_logger.getEffectiveLevel()) _logger_validation_warnings.addFilter(LogAsDebugFilter("cwltool.validation_warnings", _logger)) supportedProcessRequirements = ["DockerRequirement", "SchemaDefRequirement", "EnvVarRequirement", "ScatterFeatureRequirement", "SubworkflowFeatureRequirement", "MultipleInputFeatureRequirement", "InlineJavascriptRequirement", "ShellCommandRequirement", "StepInputExpressionRequirement", "ResourceRequirement", "InitialWorkDirRequirement", "http://commonwl.org/cwltool#LoadListingRequirement", "http://commonwl.org/cwltool#InplaceUpdateRequirement"] cwl_files = ( "Workflow.yml", "CommandLineTool.yml", "CommonWorkflowLanguage.yml", "Process.yml", "concepts.md", "contrib.md", "intro.md", "invocation.md") salad_files = ('metaschema.yml', 'metaschema_base.yml', 'salad.md', 'field_name.yml', 'import_include.md', 'link_res.yml', 'ident_res.yml', 'vocab_res.yml', 'vocab_res.yml', 'field_name_schema.yml', 'field_name_src.yml', 'field_name_proc.yml', 'ident_res_schema.yml', 'ident_res_src.yml', 'ident_res_proc.yml', 'link_res_schema.yml', 'link_res_src.yml', 'link_res_proc.yml', 'vocab_res_schema.yml', 'vocab_res_src.yml', 
'vocab_res_proc.yml') SCHEMA_CACHE = {} # type: Dict[Text, Tuple[Loader, Union[avro.schema.Names, avro.schema.SchemaParseException], Dict[Text, Any], Loader]] SCHEMA_FILE = None # type: Dict[Text, Any] SCHEMA_DIR = None # type: Dict[Text, Any] SCHEMA_ANY = None # type: Dict[Text, Any] custom_schemas = {} # type: Dict[Text, Tuple[Text, Text]] def use_standard_schema(version): # type: (Text) -> None if version in custom_schemas: del custom_schemas[version] if version in SCHEMA_CACHE: del SCHEMA_CACHE[version] def use_custom_schema(version, name, text): # type: (Text, Text, Text) -> None custom_schemas[version] = (name, text) if version in SCHEMA_CACHE: del SCHEMA_CACHE[version] def get_schema(version): # type: (Text) -> Tuple[Loader, Union[avro.schema.Names, avro.schema.SchemaParseException], Dict[Text,Any], Loader] if version in SCHEMA_CACHE: return SCHEMA_CACHE[version] cache = {} # type: Dict[Text, Union[bytes, Text]] version = version.split("#")[-1] if '.dev' in version: version = ".".join(version.split(".")[:-1]) for f in cwl_files: try: res = resource_stream(__name__, 'schemas/%s/%s' % (version, f)) cache["https://w3id.org/cwl/" + f] = res.read() res.close() except IOError: pass for f in salad_files: try: res = resource_stream( __name__, 'schemas/%s/salad/schema_salad/metaschema/%s' % (version, f)) cache["https://w3id.org/cwl/salad/schema_salad/metaschema/" + f] = res.read() res.close() except IOError: pass if version in custom_schemas: cache[custom_schemas[version][0]] = custom_schemas[version][1] SCHEMA_CACHE[version] = schema_salad.schema.load_schema( custom_schemas[version][0], cache=cache) else: SCHEMA_CACHE[version] = schema_salad.schema.load_schema( "https://w3id.org/cwl/CommonWorkflowLanguage.yml", cache=cache) return SCHEMA_CACHE[version] def shortname(inputid): # type: (Text) -> Text d = urllib.parse.urlparse(inputid) if d.fragment: return d.fragment.split(u"/")[-1] else: return d.path.split(u"/")[-1] def checkRequirements(rec, supportedProcessRequirements): # type: (Any, Iterable[Any]) -> None if isinstance(rec, dict): if "requirements" in rec: for i, r in enumerate(rec["requirements"]): with SourceLine(rec["requirements"], i, UnsupportedRequirement): if r["class"] not in supportedProcessRequirements: raise UnsupportedRequirement(u"Unsupported requirement %s" % r["class"]) for d in rec: checkRequirements(rec[d], supportedProcessRequirements) if isinstance(rec, list): for d in rec: checkRequirements(d, supportedProcessRequirements) def adjustFilesWithSecondary(rec, op, primary=None): """Apply a mapping function to each File path in the object `rec`, propagating the primary file associated with a group of secondary files. 
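
An illustrative sketch (hypothetical paths): restaging a File together
with its index file:

    bam = {"class": "File", "path": "/data/a.bam",
           "secondaryFiles": [{"class": "File", "path": "/data/a.bam.bai"}]}
    adjustFilesWithSecondary(
        bam, lambda path, primary=None: "/stage/" + os.path.basename(path))
    # bam["path"] becomes "/stage/a.bam"; the .bai entry is then visited
    # with primary set to the primary File's rewritten path.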
""" if isinstance(rec, dict): if rec.get("class") == "File": rec["path"] = op(rec["path"], primary=primary) adjustFilesWithSecondary(rec.get("secondaryFiles", []), op, primary if primary else rec["path"]) else: for d in rec: adjustFilesWithSecondary(rec[d], op) if isinstance(rec, list): for d in rec: adjustFilesWithSecondary(d, op, primary) def stageFiles(pm, stageFunc=None, ignoreWritable=False, symLink=True): # type: (PathMapper, Callable[..., Any], bool, bool) -> None for f, p in pm.items(): if not p.staged: continue if not os.path.exists(os.path.dirname(p.target)): os.makedirs(os.path.dirname(p.target), 0o0755) if p.type in ("File", "Directory") and (os.path.exists(p.resolved)): if symLink: # Use symlink func if allowed if onWindows(): if p.type == "File": shutil.copy(p.resolved, p.target) elif p.type == "Directory": if os.path.exists(p.target) and os.path.isdir(p.target): shutil.rmtree(p.target) copytree_with_merge(p.resolved, p.target) else: os.symlink(p.resolved, p.target) elif stageFunc is not None: stageFunc(p.resolved, p.target) elif p.type == "Directory" and not os.path.exists(p.target) and p.resolved.startswith("_:"): os.makedirs(p.target, 0o0755) elif p.type == "WritableFile" and not ignoreWritable: shutil.copy(p.resolved, p.target) ensure_writable(p.target) elif p.type == "WritableDirectory" and not ignoreWritable: if p.resolved.startswith("_:"): os.makedirs(p.target, 0o0755) else: shutil.copytree(p.resolved, p.target) ensure_writable(p.target) elif p.type == "CreateFile": with open(p.target, "wb") as n: n.write(p.resolved.encode("utf-8")) ensure_writable(p.target) def collectFilesAndDirs(obj, out): # type: (Union[Dict[Text, Any], List[Dict[Text, Any]]], List[Dict[Text, Any]]) -> None if isinstance(obj, dict): if obj.get("class") in ("File", "Directory"): out.append(obj) else: for v in obj.values(): collectFilesAndDirs(v, out) if isinstance(obj, list): for l in obj: collectFilesAndDirs(l, out) def relocateOutputs(outputObj, outdir, output_dirs, action, fs_access, compute_checksum): # type: (Union[Dict[Text, Any], List[Dict[Text, Any]]], Text, Set[Text], Text, StdFsAccess, bool) -> Union[Dict[Text, Any], List[Dict[Text, Any]]] adjustDirObjs(outputObj, functools.partial(get_listing, fs_access, recursive=True)) if action not in ("move", "copy"): return outputObj def moveIt(src, dst): if action == "move": for a in output_dirs: if src.startswith(a+"/"): _logger.debug("Moving %s to %s", src, dst) if os.path.isdir(src) and os.path.isdir(dst): # merge directories for root, dirs, files in os.walk(src): for f in dirs+files: moveIt(os.path.join(root, f), os.path.join(dst, f)) else: shutil.move(src, dst) return if src != dst: _logger.debug("Copying %s to %s", src, dst) if os.path.isdir(src): if os.path.isdir(dst): shutil.rmtree(dst) elif os.path.isfile(dst): os.unlink(dst) shutil.copytree(src, dst) else: shutil.copy2(src, dst) outfiles = [] # type: List[Dict[Text, Any]] collectFilesAndDirs(outputObj, outfiles) pm = PathMapper(outfiles, "", outdir, separateDirs=False) stageFiles(pm, stageFunc=moveIt, symLink=False) def _check_adjust(f): f["location"] = file_uri(pm.mapper(f["location"])[1]) if "contents" in f: del f["contents"] return f visit_class(outputObj, ("File", "Directory"), _check_adjust) if compute_checksum: visit_class(outputObj, ("File",), functools.partial(compute_checksums, fs_access)) # If there are symlinks to intermediate output directories, we want to move # the real files into the final output location. 
If a file is linked more than once, # make an internal relative symlink. if action == "move": relinked = {} # type: Dict[Text, Text] for root, dirs, files in os.walk(outdir): for f in dirs+files: path = os.path.join(root, f) rp = os.path.realpath(path) if path != rp: if rp in relinked: if onWindows(): if os.path.isfile(path): shutil.copy(os.path.relpath(relinked[rp], path), path) elif os.path.exists(path) and os.path.isdir(path): shutil.rmtree(path) copytree_with_merge(os.path.relpath(relinked[rp], path), path) else: os.unlink(path) os.symlink(os.path.relpath(relinked[rp], path), path) else: for od in output_dirs: if rp.startswith(od+"/"): os.unlink(path) os.rename(rp, path) relinked[rp] = path break return outputObj def cleanIntermediate(output_dirs): # type: (Set[Text]) -> None for a in output_dirs: if os.path.exists(a) and empty_subtree(a): _logger.debug(u"Removing intermediate output directory %s", a) shutil.rmtree(a, True) def formatSubclassOf(fmt, cls, ontology, visited): # type: (Text, Text, Graph, Set[Text]) -> bool """Determine if `fmt` is a subclass of `cls`.""" if URIRef(fmt) == URIRef(cls): return True if ontology is None: return False if fmt in visited: return False visited.add(fmt) uriRefFmt = URIRef(fmt) for s, p, o in ontology.triples((uriRefFmt, RDFS.subClassOf, None)): # Find parent classes of `fmt` and search upward if formatSubclassOf(o, cls, ontology, visited): return True for s, p, o in ontology.triples((uriRefFmt, OWL.equivalentClass, None)): # Find equivalent classes of `fmt` and search horizontally if formatSubclassOf(o, cls, ontology, visited): return True for s, p, o in ontology.triples((None, OWL.equivalentClass, uriRefFmt)): # Find equivalent classes of `fmt` and search horizontally if formatSubclassOf(s, cls, ontology, visited): return True return False def checkFormat(actualFile, inputFormats, ontology): # type: (Union[Dict[Text, Any], List, Text], Union[List[Text], Text], Graph) -> None for af in aslist(actualFile): if not af: continue if "format" not in af: raise validate.ValidationException(u"Missing required 'format' for File %s" % af) for inpf in aslist(inputFormats): if af["format"] == inpf or formatSubclassOf(af["format"], inpf, ontology, set()): return raise validate.ValidationException( u"Incompatible file format %s required format(s) %s" % (af["format"], inputFormats)) def fillInDefaults(inputs, job): # type: (List[Dict[Text, Text]], Dict[Text, Union[Dict[Text, Any], List, Text]]) -> None for e, inp in enumerate(inputs): with SourceLine(inputs, e, WorkflowException, _logger.isEnabledFor(logging.DEBUG)): fieldname = shortname(inp[u"id"]) if job.get(fieldname) is not None: pass elif job.get(fieldname) is None and u"default" in inp: job[fieldname] = copy.copy(inp[u"default"]) elif job.get(fieldname) is None and u"null" in aslist(inp[u"type"]): job[fieldname] = None else: raise WorkflowException("Missing required input parameter '%s'" % shortname(inp["id"])) def avroize_type(field_type, name_prefix=""): # type: (Union[List[Dict[Text, Any]], Dict[Text, Any]], Text) -> Any """ adds missing information to a type so that CWL types are valid in schema_salad. 
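
For example (illustrative), an anonymous enum gains a generated name so
that schema_salad will accept it:

    t = {"type": "enum", "symbols": ["a", "b"]}
    avroize_type(t, name_prefix="step0_")
    # t["name"] is now "step0_" plus a random uuid4 string.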
""" if isinstance(field_type, list): for f in field_type: avroize_type(f, name_prefix) elif isinstance(field_type, dict): if field_type["type"] in ("enum", "record"): if "name" not in field_type: field_type["name"] = name_prefix + Text(uuid.uuid4()) if field_type["type"] == "record": avroize_type(field_type["fields"], name_prefix) if field_type["type"] == "array": avroize_type(field_type["items"], name_prefix) return field_type def get_overrides(overrides, toolid): # type: (List[Dict[Text, Any]], Text) -> Dict[Text, Any] req = {} # type: Dict[Text, Any] if not isinstance(overrides, list): raise validate.ValidationException("Expected overrides to be a list, but was %s" % type(overrides)) for ov in overrides: if ov["overrideTarget"] == toolid: req.update(ov) return req class Process(six.with_metaclass(abc.ABCMeta, object)): def __init__(self, toolpath_object, **kwargs): # type: (Dict[Text, Any], **Any) -> None """ kwargs: metadata: tool document metadata requirements: inherited requirements hints: inherited hints loader: schema_salad.ref_resolver.Loader used to load tool document avsc_names: CWL Avro schema object used to validate document strict: flag to determine strict validation (fail on unrecognized fields) """ self.metadata = kwargs.get("metadata", {}) # type: Dict[Text,Any] self.names = None # type: avro.schema.Names global SCHEMA_FILE, SCHEMA_DIR, SCHEMA_ANY # pylint: disable=global-statement if SCHEMA_FILE is None: get_schema("v1.0") SCHEMA_ANY = cast(Dict[Text, Any], SCHEMA_CACHE["v1.0"][3].idx["https://w3id.org/cwl/salad#Any"]) SCHEMA_FILE = cast(Dict[Text, Any], SCHEMA_CACHE["v1.0"][3].idx["https://w3id.org/cwl/cwl#File"]) SCHEMA_DIR = cast(Dict[Text, Any], SCHEMA_CACHE["v1.0"][3].idx["https://w3id.org/cwl/cwl#Directory"]) names = schema_salad.schema.make_avro_schema([SCHEMA_FILE, SCHEMA_DIR, SCHEMA_ANY], schema_salad.ref_resolver.Loader({}))[0] if isinstance(names, avro.schema.SchemaParseException): raise names else: self.names = names self.tool = toolpath_object self.requirements = (kwargs.get("requirements", []) + self.tool.get("requirements", []) + get_overrides(kwargs.get("overrides", []), self.tool["id"]).get("requirements", [])) self.hints = kwargs.get("hints", []) + self.tool.get("hints", []) self.formatgraph = None # type: Graph if "loader" in kwargs: self.formatgraph = kwargs["loader"].graph self.doc_loader = kwargs["loader"] self.doc_schema = kwargs["avsc_names"] checkRequirements(self.tool, supportedProcessRequirements) self.validate_hints(kwargs["avsc_names"], self.tool.get("hints", []), strict=kwargs.get("strict")) self.schemaDefs = {} # type: Dict[Text,Dict[Text, Any]] sd, _ = self.get_requirement("SchemaDefRequirement") if sd: sdtypes = sd["types"] av = schema_salad.schema.make_valid_avro(sdtypes, {t["name"]: t for t in avroize_type(sdtypes)}, set()) for i in av: self.schemaDefs[i["name"]] = i # type: ignore AvroSchemaFromJSONData(av, self.names) # type: ignore # Build record schema from inputs self.inputs_record_schema = { "name": "input_record_schema", "type": "record", "fields": []} # type: Dict[Text, Any] self.outputs_record_schema = { "name": "outputs_record_schema", "type": "record", "fields": []} # type: Dict[Text, Any] for key in ("inputs", "outputs"): for i in self.tool[key]: c = copy.copy(i) c["name"] = shortname(c["id"]) del c["id"] if "type" not in c: raise validate.ValidationException(u"Missing 'type' in " "parameter '%s'" % c["name"]) if "default" in c and "null" not in aslist(c["type"]): c["type"] = ["null"] + aslist(c["type"]) else: c["type"] = 
c["type"] c["type"] = avroize_type(c["type"], c["name"]) if key == "inputs": self.inputs_record_schema["fields"].append(c) elif key == "outputs": self.outputs_record_schema["fields"].append(c) try: self.inputs_record_schema = cast(Dict[six.text_type, Any], schema_salad.schema.make_valid_avro(self.inputs_record_schema, {}, set())) AvroSchemaFromJSONData(self.inputs_record_schema, self.names) except avro.schema.SchemaParseException as e: raise validate.ValidationException(u"Got error '%s' while " "processing inputs of %s:\n%s" % (Text(e), self.tool["id"], json.dumps(self.inputs_record_schema, indent=4))) try: self.outputs_record_schema = cast(Dict[six.text_type, Any], schema_salad.schema.make_valid_avro(self.outputs_record_schema, {}, set())) AvroSchemaFromJSONData(self.outputs_record_schema, self.names) except avro.schema.SchemaParseException as e: raise validate.ValidationException(u"Got error '%s' while " "processing outputs of %s:\n%s" % (Text(e), self.tool["id"], json.dumps(self.outputs_record_schema, indent=4))) def _init_job(self, joborder, **kwargs): # type: (Dict[Text, Text], **Any) -> Builder """ kwargs: eval_timeout: javascript evaluation timeout use_container: do/don't use Docker when DockerRequirement hint provided make_fs_access: make an FsAccess() object with given basedir basedir: basedir for FsAccess docker_outdir: output directory inside docker for this job docker_tmpdir: tmpdir inside docker for this job docker_stagedir: stagedir inside docker for this job outdir: outdir on host for this job tmpdir: tmpdir on host for this job stagedir: stagedir on host for this job select_resources: callback to select compute resources debug: enable debugging output js_console: enable javascript console output """ builder = Builder() builder.job = cast(Dict[Text, Union[Dict[Text, Any], List, Text]], copy.deepcopy(joborder)) # Validate job order try: fillInDefaults(self.tool[u"inputs"], builder.job) normalizeFilesDirs(builder.job) validate.validate_ex(self.names.get_name("input_record_schema", ""), builder.job, strict=False, logger=_logger_validation_warnings) except (validate.ValidationException, WorkflowException) as e: raise WorkflowException("Invalid job input record:\n" + Text(e)) builder.files = [] builder.bindings = CommentedSeq() builder.schemaDefs = self.schemaDefs builder.names = self.names builder.requirements = self.requirements builder.hints = self.hints builder.resources = {} builder.timeout = kwargs.get("eval_timeout") builder.debug = kwargs.get("debug") builder.js_console = kwargs.get("js_console") builder.mutation_manager = kwargs.get("mutation_manager") builder.make_fs_access = kwargs.get("make_fs_access") or StdFsAccess builder.fs_access = builder.make_fs_access(kwargs["basedir"]) builder.force_docker_pull = kwargs.get("force_docker_pull") loadListingReq, _ = self.get_requirement("http://commonwl.org/cwltool#LoadListingRequirement") if loadListingReq: builder.loadListing = loadListingReq.get("loadListing") dockerReq, is_req = self.get_requirement("DockerRequirement") defaultDocker = None if dockerReq is None and "default_container" in kwargs: defaultDocker = kwargs["default_container"] if (dockerReq or defaultDocker) and kwargs.get("use_container"): if dockerReq: # Check if docker output directory is absolute if dockerReq.get("dockerOutputDirectory") and dockerReq.get("dockerOutputDirectory").startswith('/'): builder.outdir = dockerReq.get("dockerOutputDirectory") else: builder.outdir = builder.fs_access.docker_compatible_realpath( dockerReq.get("dockerOutputDirectory") 
or kwargs.get("docker_outdir") or "/var/spool/cwl") elif defaultDocker: builder.outdir = builder.fs_access.docker_compatible_realpath( kwargs.get("docker_outdir") or "/var/spool/cwl") builder.tmpdir = builder.fs_access.docker_compatible_realpath(kwargs.get("docker_tmpdir") or "/tmp") builder.stagedir = builder.fs_access.docker_compatible_realpath(kwargs.get("docker_stagedir") or "/var/lib/cwl") else: builder.outdir = builder.fs_access.realpath(kwargs.get("outdir") or tempfile.mkdtemp()) if self.tool[u"class"] != 'Workflow': builder.tmpdir = builder.fs_access.realpath(kwargs.get("tmpdir") or tempfile.mkdtemp()) builder.stagedir = builder.fs_access.realpath(kwargs.get("stagedir") or tempfile.mkdtemp()) if self.formatgraph: for i in self.tool["inputs"]: d = shortname(i["id"]) if d in builder.job and i.get("format"): checkFormat(builder.job[d], builder.do_eval(i["format"]), self.formatgraph) builder.bindings.extend(builder.bind_input(self.inputs_record_schema, builder.job)) if self.tool.get("baseCommand"): for n, b in enumerate(aslist(self.tool["baseCommand"])): builder.bindings.append({ "position": [-1000000, n], "datum": b }) if self.tool.get("arguments"): for i, a in enumerate(self.tool["arguments"]): lc = self.tool["arguments"].lc.data[i] fn = self.tool["arguments"].lc.filename builder.bindings.lc.add_kv_line_col(len(builder.bindings), lc) if isinstance(a, dict): a = copy.copy(a) if a.get("position"): a["position"] = [a["position"], i] else: a["position"] = [0, i] builder.bindings.append(a) elif ("$(" in a) or ("${" in a): cm = CommentedMap(( ("position", [0, i]), ("valueFrom", a) )) cm.lc.add_kv_line_col("valueFrom", lc) cm.lc.filename = fn builder.bindings.append(cm) else: cm = CommentedMap(( ("position", [0, i]), ("datum", a) )) cm.lc.add_kv_line_col("datum", lc) cm.lc.filename = fn builder.bindings.append(cm) # use python2 like sorting of heterogeneous lists # (containing str and int types), # TODO: unify for both runtime if six.PY3: key = cmp_to_key(cmp_like_py2) else: # PY2 key = lambda dict: dict["position"] builder.bindings.sort(key=key) builder.resources = self.evalResources(builder, kwargs) builder.job_script_provider = kwargs.get("job_script_provider", None) return builder def evalResources(self, builder, kwargs): # type: (Builder, Dict[str, Any]) -> Dict[Text, Union[int, Text]] resourceReq, _ = self.get_requirement("ResourceRequirement") if resourceReq is None: resourceReq = {} request = { "coresMin": 1, "coresMax": 1, "ramMin": 1024, "ramMax": 1024, "tmpdirMin": 1024, "tmpdirMax": 1024, "outdirMin": 1024, "outdirMax": 1024 } for a in ("cores", "ram", "tmpdir", "outdir"): mn = None mx = None if resourceReq.get(a + "Min"): mn = builder.do_eval(resourceReq[a + "Min"]) if resourceReq.get(a + "Max"): mx = builder.do_eval(resourceReq[a + "Max"]) if mn is None: mn = mx elif mx is None: mx = mn if mn: request[a + "Min"] = mn request[a + "Max"] = mx if kwargs.get("select_resources"): return kwargs["select_resources"](request) else: return { "cores": request["coresMin"], "ram": request["ramMin"], "tmpdirSize": request["tmpdirMin"], "outdirSize": request["outdirMin"], } def validate_hints(self, avsc_names, hints, strict): # type: (Any, List[Dict[Text, Any]], bool) -> None for i, r in enumerate(hints): sl = SourceLine(hints, i, validate.ValidationException) with sl: if avsc_names.get_name(r["class"], "") is not None: plain_hint = dict((key, r[key]) for key in r if key not in self.doc_loader.identifiers) # strip identifiers validate.validate_ex( avsc_names.get_name(plain_hint["class"], 
""), plain_hint, strict=strict) else: _logger.info(sl.makeError(u"Unknown hint %s" % (r["class"]))) def get_requirement(self, feature): # type: (Any) -> Tuple[Any, bool] return get_feature(self, feature) def visit(self, op): # type: (Callable[[Dict[Text, Any]], None]) -> None op(self.tool) @abc.abstractmethod def job(self, job_order, # type: Dict[Text, Text] output_callbacks, # type: Callable[[Any, Any], Any] **kwargs # type: Any ): # type: (...) -> Generator[Any, None, None] return None def empty_subtree(dirpath): # type: (Text) -> bool # Test if a directory tree contains any files (does not count empty # subdirectories) for d in os.listdir(dirpath): d = os.path.join(dirpath, d) try: if stat.S_ISDIR(os.stat(d).st_mode): if empty_subtree(d) is False: return False else: return False except OSError as e: if e.errno == errno.ENOENT: pass else: raise return True _names = set() # type: Set[Text] def uniquename(stem, names=None): # type: (Text, Set[Text]) -> Text global _names if names is None: names = _names c = 1 u = stem while u in names: c += 1 u = u"%s_%s" % (stem, c) names.add(u) return u def nestdir(base, deps): # type: (Text, Dict[Text, Any]) -> Dict[Text, Any] dirname = os.path.dirname(base) + "/" subid = deps["location"] if subid.startswith(dirname): s2 = subid[len(dirname):] sp = s2.split('/') sp.pop() while sp: nx = sp.pop() deps = { "class": "Directory", "basename": nx, "listing": [deps] } return deps def mergedirs(listing): # type: (List[Dict[Text, Any]]) -> List[Dict[Text, Any]] r = [] # type: List[Dict[Text, Any]] ents = {} # type: Dict[Text, Any] for e in listing: if e["basename"] not in ents: ents[e["basename"]] = e elif e["class"] == "Directory" and e.get("listing"): ents[e["basename"]].setdefault("listing", []).extend(e["listing"]) for e in six.itervalues(ents): if e["class"] == "Directory" and "listing" in e: e["listing"] = mergedirs(e["listing"]) r.extend(six.itervalues(ents)) return r def scandeps(base, doc, reffields, urlfields, loadref, urljoin=urllib.parse.urljoin): # type: (Text, Any, Set[Text], Set[Text], Callable[[Text, Text], Any], Callable[[Text, Text], Text]) -> List[Dict[Text, Text]] r = [] # type: List[Dict[Text, Text]] deps = None # type: Dict[Text, Any] if isinstance(doc, dict): if "id" in doc: if doc["id"].startswith("file://"): df, _ = urllib.parse.urldefrag(doc["id"]) if base != df: r.append({ "class": "File", "location": df }) base = df if doc.get("class") in ("File", "Directory") and "location" in urlfields: u = doc.get("location", doc.get("path")) if u and not u.startswith("_:"): deps = { "class": doc["class"], "location": urljoin(base, u) } if doc["class"] == "Directory" and "listing" in doc: deps["listing"] = doc["listing"] if doc["class"] == "File" and "secondaryFiles" in doc: deps["secondaryFiles"] = doc["secondaryFiles"] deps = nestdir(base, deps) r.append(deps) else: if doc["class"] == "Directory" and "listing" in doc: r.extend(scandeps(base, doc["listing"], reffields, urlfields, loadref, urljoin=urljoin)) elif doc["class"] == "File" and "secondaryFiles" in doc: r.extend(scandeps(base, doc["secondaryFiles"], reffields, urlfields, loadref, urljoin=urljoin)) for k, v in six.iteritems(doc): if k in reffields: for u in aslist(v): if isinstance(u, dict): r.extend(scandeps(base, u, reffields, urlfields, loadref, urljoin=urljoin)) else: sub = loadref(base, u) subid = urljoin(base, u) deps = { "class": "File", "location": subid } sf = scandeps(subid, sub, reffields, urlfields, loadref, urljoin=urljoin) if sf: deps["secondaryFiles"] = sf deps = 
nestdir(base, deps) r.append(deps) elif k in urlfields and k != "location": for u in aslist(v): deps = { "class": "File", "location": urljoin(base, u) } deps = nestdir(base, deps) r.append(deps) elif k not in ("listing", "secondaryFiles"): r.extend(scandeps(base, v, reffields, urlfields, loadref, urljoin=urljoin)) elif isinstance(doc, list): for d in doc: r.extend(scandeps(base, d, reffields, urlfields, loadref, urljoin=urljoin)) if r: normalizeFilesDirs(r) r = mergedirs(r) return r def compute_checksums(fs_access, fileobj): if "checksum" not in fileobj: checksum = hashlib.sha1() with fs_access.open(fileobj["location"], "rb") as f: contents = f.read(1024 * 1024) while contents != b"": checksum.update(contents) contents = f.read(1024 * 1024) f.seek(0, 2) filesize = f.tell() fileobj["checksum"] = "sha1$%s" % checksum.hexdigest() fileobj["size"] = filesize cwltool-1.0.20180302231433/cwltool/load_tool.py0000644000175200017520000003533013247251315021613 0ustar mcrusoemcrusoe00000000000000"""Loads a CWL document.""" from __future__ import absolute_import # pylint: disable=unused-import import logging import os import re import uuid import hashlib import json import copy from typing import (Any, Callable, Dict, Iterable, List, Mapping, Optional, Text, Tuple, Union, cast) import requests.sessions from six import itervalues, string_types from six.moves import urllib import schema_salad.schema as schema from avro.schema import Names from ruamel.yaml.comments import CommentedMap, CommentedSeq from schema_salad.ref_resolver import ContextType, Fetcher, Loader, file_uri from schema_salad.sourceline import cmap, SourceLine from schema_salad.validate import ValidationException from . import process, update from .errors import WorkflowException from .process import Process, shortname, get_schema from .update import ALLUPDATES _logger = logging.getLogger("cwltool") jobloaderctx = { u"cwl": "https://w3id.org/cwl/cwl#", u"cwltool": "http://commonwl.org/cwltool#", u"path": {u"@type": u"@id"}, u"location": {u"@type": u"@id"}, u"format": {u"@type": u"@id"}, u"id": u"@id" } # type: ContextType overrides_ctx = { u"overrideTarget": {u"@type": u"@id"}, u"cwltool": "http://commonwl.org/cwltool#", u"overrides": { "@id": "cwltool:overrides", "mapSubject": "overrideTarget", }, "requirements": { "@id": "https://w3id.org/cwl/cwl#requirements", "mapSubject": "class" } } # type: ContextType FetcherConstructorType = Callable[[Dict[Text, Union[Text, bool]], requests.sessions.Session], Fetcher] loaders = {} # type: Dict[FetcherConstructorType, Loader] def default_loader(fetcher_constructor): # type: (Optional[FetcherConstructorType]) -> Loader if fetcher_constructor in loaders: return loaders[fetcher_constructor] else: loader = Loader(jobloaderctx, fetcher_constructor=fetcher_constructor) loaders[fetcher_constructor] = loader return loader def resolve_tool_uri(argsworkflow, # type: Text resolver=None, # type: Callable[[Loader, Union[Text, Dict[Text, Any]]], Text] fetcher_constructor=None, # type: FetcherConstructorType document_loader=None # type: Loader ): # type: (...) 
-> Tuple[Text, Text] uri = None # type: Text split = urllib.parse.urlsplit(argsworkflow) # In case of Windows path, urlsplit misjudge Drive letters as scheme, here we are skipping that if split.scheme and split.scheme in [u'http', u'https', u'file']: uri = argsworkflow elif os.path.exists(os.path.abspath(argsworkflow)): uri = file_uri(str(os.path.abspath(argsworkflow))) elif resolver: if document_loader is None: document_loader = default_loader(fetcher_constructor) # type: ignore uri = resolver(document_loader, argsworkflow) if uri is None: raise ValidationException("Not found: '%s'" % argsworkflow) if argsworkflow != uri: _logger.info("Resolved '%s' to '%s'", argsworkflow, uri) fileuri = urllib.parse.urldefrag(uri)[0] return uri, fileuri def fetch_document(argsworkflow, # type: Union[Text, Dict[Text, Any]] resolver=None, # type: Callable[[Loader, Union[Text, Dict[Text, Any]]], Text] fetcher_constructor=None # type: FetcherConstructorType ): # type: (...) -> Tuple[Loader, CommentedMap, Text] """Retrieve a CWL document.""" document_loader = default_loader(fetcher_constructor) # type: ignore uri = None # type: Text workflowobj = None # type: CommentedMap if isinstance(argsworkflow, string_types): uri, fileuri = resolve_tool_uri(argsworkflow, resolver=resolver, document_loader=document_loader) workflowobj = document_loader.fetch(fileuri) elif isinstance(argsworkflow, dict): uri = "#" + Text(id(argsworkflow)) workflowobj = cast(CommentedMap, cmap(argsworkflow, fn=uri)) else: raise ValidationException("Must be URI or object: '%s'" % argsworkflow) return document_loader, workflowobj, uri def _convert_stdstreams_to_files(workflowobj): # type: (Union[Dict[Text, Any], List[Dict[Text, Any]]]) -> None if isinstance(workflowobj, dict): if workflowobj.get('class') == 'CommandLineTool': for out in workflowobj.get('outputs', []): if type(out) is not CommentedMap: with SourceLine(workflowobj, "outputs", ValidationException, _logger.isEnabledFor(logging.DEBUG)): raise ValidationException("Output '%s' is not a valid OutputParameter." % out) for streamtype in ['stdout', 'stderr']: if out.get('type') == streamtype: if 'outputBinding' in out: raise ValidationException( "Not allowed to specify outputBinding when" " using %s shortcut." 
% streamtype) if streamtype in workflowobj: filename = workflowobj[streamtype] else: filename = Text(hashlib.sha1(json.dumps(workflowobj, sort_keys=True).encode('utf-8')).hexdigest()) workflowobj[streamtype] = filename out['type'] = 'File' out['outputBinding'] = cmap({'glob': filename}) for inp in workflowobj.get('inputs', []): if inp.get('type') == 'stdin': if 'inputBinding' in inp: raise ValidationException( "Not allowed to specify inputBinding when" " using stdin shortcut.") if 'stdin' in workflowobj: raise ValidationException( "Not allowed to specify stdin path when" " using stdin type shortcut.") else: workflowobj['stdin'] = \ "$(inputs.%s.path)" % \ inp['id'].rpartition('#')[2] inp['type'] = 'File' else: for entry in itervalues(workflowobj): _convert_stdstreams_to_files(entry) if isinstance(workflowobj, list): for entry in workflowobj: _convert_stdstreams_to_files(entry) def _add_blank_ids(workflowobj): # type: (Union[Dict[Text, Any], List[Dict[Text, Any]]]) -> None if isinstance(workflowobj, dict): if ("run" in workflowobj and isinstance(workflowobj["run"], dict) and "id" not in workflowobj["run"] and "$import" not in workflowobj["run"]): workflowobj["run"]["id"] = Text(uuid.uuid4()) for entry in itervalues(workflowobj): _add_blank_ids(entry) if isinstance(workflowobj, list): for entry in workflowobj: _add_blank_ids(entry) def validate_document(document_loader, # type: Loader workflowobj, # type: CommentedMap uri, # type: Text enable_dev=False, # type: bool strict=True, # type: bool preprocess_only=False, # type: bool fetcher_constructor=None, # type: FetcherConstructorType skip_schemas=None, # type: bool overrides=None, # type: List[Dict] metadata=None, # type: Optional[Dict] ): # type: (...) -> Tuple[Loader, Names, Union[Dict[Text, Any], List[Dict[Text, Any]]], Dict[Text, Any], Text] """Validate a CWL document.""" if isinstance(workflowobj, list): workflowobj = cmap({ "$graph": workflowobj }, fn=uri) if not isinstance(workflowobj, dict): raise ValueError("workflowjobj must be a dict, got '%s': %s" % (type(workflowobj), workflowobj)) jobobj = None if "cwl:tool" in workflowobj: job_loader = default_loader(fetcher_constructor) # type: ignore jobobj, _ = job_loader.resolve_all(workflowobj, uri) uri = urllib.parse.urljoin(uri, workflowobj["https://w3id.org/cwl/cwl#tool"]) del cast(dict, jobobj)["https://w3id.org/cwl/cwl#tool"] if "http://commonwl.org/cwltool#overrides" in jobobj: overrides.extend(resolve_overrides(jobobj, uri, uri)) del jobobj["http://commonwl.org/cwltool#overrides"] workflowobj = fetch_document(uri, fetcher_constructor=fetcher_constructor)[1] fileuri = urllib.parse.urldefrag(uri)[0] if "cwlVersion" not in workflowobj: if metadata and 'cwlVersion' in metadata: workflowobj['cwlVersion'] = metadata['cwlVersion'] else: raise ValidationException( "No cwlVersion found. 
" "Use the following syntax in your CWL document to declare the version: cwlVersion: .\n" "Note: if this is a CWL draft-2 (pre v1.0) document then it will need to be upgraded first.") if not isinstance(workflowobj["cwlVersion"], (str, Text)): raise Exception("'cwlVersion' must be a string, got %s" % type(workflowobj["cwlVersion"])) # strip out version workflowobj["cwlVersion"] = re.sub( r"^(?:cwl:|https://w3id.org/cwl/cwl#)", "", workflowobj["cwlVersion"]) if workflowobj["cwlVersion"] not in list(ALLUPDATES): # print out all the Supported Versions of cwlVersion versions = [] for version in list(ALLUPDATES): if "dev" in version: version += " (with --enable-dev flag only)" versions.append(version) versions.sort() raise ValidationException("The CWL reference runner no longer supports pre CWL v1.0 documents. " "Supported versions are: \n{}".format("\n".join(versions))) (sch_document_loader, avsc_names) = \ process.get_schema(workflowobj["cwlVersion"])[:2] if isinstance(avsc_names, Exception): raise avsc_names processobj = None # type: Union[CommentedMap, CommentedSeq, Text] document_loader = Loader(sch_document_loader.ctx, schemagraph=sch_document_loader.graph, idx=document_loader.idx, cache=sch_document_loader.cache, fetcher_constructor=fetcher_constructor, skip_schemas=skip_schemas) _add_blank_ids(workflowobj) workflowobj["id"] = fileuri processobj, new_metadata = document_loader.resolve_all(workflowobj, fileuri) if not isinstance(processobj, (CommentedMap, CommentedSeq)): raise ValidationException("Workflow must be a dict or list.") if not new_metadata: new_metadata = cast(CommentedMap, cmap( {"$namespaces": processobj.get("$namespaces", {}), "$schemas": processobj.get("$schemas", []), "cwlVersion": processobj["cwlVersion"]}, fn=fileuri)) _convert_stdstreams_to_files(workflowobj) if preprocess_only: return document_loader, avsc_names, processobj, new_metadata, uri schema.validate_doc(avsc_names, processobj, document_loader, strict) if new_metadata.get("cwlVersion") != update.LATEST: processobj = cast(CommentedMap, cmap(update.update( processobj, document_loader, fileuri, enable_dev, new_metadata))) if jobobj: new_metadata[u"cwl:defaults"] = jobobj if overrides: new_metadata[u"cwltool:overrides"] = overrides return document_loader, avsc_names, processobj, new_metadata, uri def make_tool(document_loader, # type: Loader avsc_names, # type: Names metadata, # type: Dict[Text, Any] uri, # type: Text makeTool, # type: Callable[..., Process] kwargs # type: dict ): # type: (...) 
-> Process """Make a Python CWL object.""" resolveduri = document_loader.resolve_ref(uri)[0] processobj = None if isinstance(resolveduri, list): for obj in resolveduri: if obj['id'].endswith('#main'): processobj = obj break if not processobj: raise WorkflowException( u"Tool file contains graph of multiple objects, must specify " "one of #%s" % ", #".join( urllib.parse.urldefrag(i["id"])[1] for i in resolveduri if "id" in i)) elif isinstance(resolveduri, dict): processobj = resolveduri else: raise Exception("Must resolve to list or dict") kwargs = kwargs.copy() kwargs.update({ "makeTool": makeTool, "loader": document_loader, "avsc_names": avsc_names, "metadata": metadata }) tool = makeTool(processobj, **kwargs) if "cwl:defaults" in metadata: jobobj = metadata["cwl:defaults"] for inp in tool.tool["inputs"]: if shortname(inp["id"]) in jobobj: inp["default"] = jobobj[shortname(inp["id"])] return tool def load_tool(argsworkflow, # type: Union[Text, Dict[Text, Any]] makeTool, # type: Callable[..., Process] kwargs=None, # type: Dict enable_dev=False, # type: bool strict=True, # type: bool resolver=None, # type: Callable[[Loader, Union[Text, Dict[Text, Any]]], Text] fetcher_constructor=None, # type: FetcherConstructorType overrides=None ): # type: (...) -> Process document_loader, workflowobj, uri = fetch_document(argsworkflow, resolver=resolver, fetcher_constructor=fetcher_constructor) document_loader, avsc_names, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri, enable_dev=enable_dev, strict=strict, fetcher_constructor=fetcher_constructor, overrides=overrides, metadata=kwargs.get('metadata', None) if kwargs else None) return make_tool(document_loader, avsc_names, metadata, uri, makeTool, kwargs if kwargs else {}) def resolve_overrides(ov, ov_uri, baseurl): # type: (CommentedMap, Text, Text) -> List[Dict[Text, Any]] ovloader = Loader(overrides_ctx) ret, _ = ovloader.resolve_all(ov, baseurl) if not isinstance(ret, CommentedMap): raise Exception("Expected CommentedMap, got %s" % type(ret)) cwl_docloader = get_schema("v1.0")[0] cwl_docloader.resolve_all(ret, ov_uri) return ret["overrides"] def load_overrides(ov, base_url): # type: (Text, Text) -> List[Dict[Text, Any]] ovloader = Loader(overrides_ctx) return resolve_overrides(ovloader.fetch(ov), ov, base_url) cwltool-1.0.20180302231433/cwltool/factory.py0000644000175200017520000000375713247251315021316 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import os from typing import Callable as tCallable from typing import Any, Dict, Text, Tuple, Union from . import load_tool, workflow from .argparser import get_default_args from .executors import SingleJobExecutor from .process import Process class WorkflowStatus(Exception): def __init__(self, out, status): # type: (Dict[Text,Any], Text) -> None super(WorkflowStatus, self).__init__("Completed %s" % status) self.out = out self.status = status class Callable(object): def __init__(self, t, factory): # type: (Process, Factory) -> None self.t = t self.factory = factory def __call__(self, **kwargs): # type: (**Any) -> Union[Text, Dict[Text, Text]] execkwargs = self.factory.execkwargs.copy() execkwargs["basedir"] = os.getcwd() out, status = self.factory.executor(self.t, kwargs, **execkwargs) if status != "success": raise WorkflowStatus(out, status) else: return out class Factory(object): def __init__(self, makeTool=workflow.defaultMakeTool, # type: tCallable[[Any], Process] # should be tCallable[[Dict[Text, Any], Any], Process] ? 
executor=None, # type: tCallable[...,Tuple[Dict[Text,Any], Text]] **execkwargs # type: Any ): # type: (...) -> None self.makeTool = makeTool if executor is None: executor = SingleJobExecutor() self.executor = executor kwargs = get_default_args() kwargs.pop("job_order") kwargs.pop("workflow") kwargs.pop("outdir") kwargs.update(execkwargs) self.execkwargs = kwargs def make(self, cwl): """Instantiate a CWL object from a CWl document.""" load = load_tool.load_tool(cwl, self.makeTool) if isinstance(load, int): raise Exception("Error loading tool") return Callable(load, self) cwltool-1.0.20180302231433/cwltool/__init__.py0000644000175200017520000000012213247251315021365 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import __author__ = 'peter.amstutz@curoverse.com' cwltool-1.0.20180302231433/cwltool/main.py0000755000175200017520000006366313247251315020600 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env python from __future__ import absolute_import from __future__ import print_function import argparse import collections import functools import json import logging import os import sys import warnings from typing import (IO, Any, Callable, Dict, List, Text, Tuple, Union, cast, Mapping, MutableMapping, Iterable) import pkg_resources # part of setuptools import ruamel.yaml as yaml import schema_salad.validate as validate import six from schema_salad.ref_resolver import Loader, file_uri, uri_file_path from schema_salad.sourceline import strip_dup_lineno from . import command_line_tool, workflow from .argparser import arg_parser, generate_parser, DEFAULT_TMP_PREFIX from .cwlrdf import printdot, printrdf from .errors import UnsupportedRequirement, WorkflowException from .executors import SingleJobExecutor, MultithreadedJobExecutor from .load_tool import (FetcherConstructorType, resolve_tool_uri, fetch_document, make_tool, validate_document, jobloaderctx, resolve_overrides, load_overrides) from .loghandler import defaultStreamHandler from .mutation import MutationManager from .pack import pack from .pathmapper import (adjustDirObjs, trim_listing, visit_class) from .process import (Process, normalizeFilesDirs, scandeps, shortname, use_custom_schema, use_standard_schema) from .resolver import ga4gh_tool_registries, tool_resolver from .software_requirements import (DependenciesConfiguration, get_container_from_software_requirements) from .stdfsaccess import StdFsAccess from .update import ALLUPDATES, UPDATES from .utils import onWindows, windows_default_container_id _logger = logging.getLogger("cwltool") def single_job_executor(t, # type: Process job_order_object, # type: Dict[Text, Any] **kwargs # type: Any ): # type: (...) -> Tuple[Dict[Text, Any], Text] warnings.warn("Use of single_job_executor function is deprecated. 
" "Use cwltool.executors.SingleJobExecutor class instead", DeprecationWarning) executor = SingleJobExecutor() return executor(t, job_order_object, **kwargs) def generate_example_input(inptype): # type: (Union[Text, Dict[Text, Any]]) -> Any defaults = { 'null': 'null', 'Any': 'null', 'boolean': False, 'int': 0, 'long': 0, 'float': 0.1, 'double': 0.1, 'string': 'default_string', 'File': { 'class': 'File', 'path': 'default/file/path' }, 'Directory': { 'class': 'Directory', 'path': 'default/directory/path' } } if (not isinstance(inptype, str) and not isinstance(inptype, collections.Mapping) and isinstance(inptype, collections.MutableSet)): if len(inptype) == 2 and 'null' in inptype: inptype.remove('null') return generate_example_input(inptype[0]) # TODO: indicate that this input is optional else: raise Exception("multi-types other than optional not yet supported" " for generating example input objects: %s" % inptype) if isinstance(inptype, collections.Mapping) and 'type' in inptype: if inptype['type'] == 'array': return [ generate_example_input(inptype['items']) ] elif inptype['type'] == 'enum': return 'valid_enum_value' # TODO: list valid values in a comment elif inptype['type'] == 'record': record = {} for field in inptype['fields']: record[shortname(field['name'])] = generate_example_input( field['type']) return record elif isinstance(inptype, str): return defaults.get(inptype, 'custom_type') # TODO: support custom types, complex arrays def generate_input_template(tool): # type: (Process) -> Dict[Text, Any] template = {} for inp in tool.tool["inputs"]: name = shortname(inp["id"]) inptype = inp["type"] template[name] = generate_example_input(inptype) return template def load_job_order(args, # type: argparse.Namespace stdin, # type: IO[Any] fetcher_constructor, # Fetcher overrides, # type: List[Dict[Text, Any]] tool_file_uri # type: Text ): # type: (...) -> Tuple[Dict[Text, Any], Text, Loader] job_order_object = None _jobloaderctx = jobloaderctx.copy() loader = Loader(_jobloaderctx, fetcher_constructor=fetcher_constructor) # type: ignore if len(args.job_order) == 1 and args.job_order[0][0] != "-": job_order_file = args.job_order[0] elif len(args.job_order) == 1 and args.job_order[0] == "-": job_order_object = yaml.round_trip_load(stdin) job_order_object, _ = loader.resolve_all(job_order_object, file_uri(os.getcwd()) + "/") else: job_order_file = None if job_order_object: input_basedir = args.basedir if args.basedir else os.getcwd() elif job_order_file: input_basedir = args.basedir if args.basedir else os.path.abspath(os.path.dirname(job_order_file)) job_order_object, _ = loader.resolve_ref(job_order_file, checklinks=False) if job_order_object and "http://commonwl.org/cwltool#overrides" in job_order_object: overrides.extend(resolve_overrides(job_order_object, file_uri(job_order_file), tool_file_uri)) del job_order_object["http://commonwl.org/cwltool#overrides"] if not job_order_object: input_basedir = args.basedir if args.basedir else os.getcwd() return (job_order_object, input_basedir, loader) def init_job_order(job_order_object, # type: MutableMapping[Text, Any] args, # type: argparse.Namespace t, # type: Process print_input_deps=False, # type: bool relative_deps=False, # type: bool stdout=sys.stdout, # type: IO[Any] make_fs_access=None, # type: Callable[[Text], StdFsAccess] loader=None, # type: Loader input_basedir="" # type: Text ): # (...) 
-> Tuple[Dict[Text, Any], Text] if not job_order_object: namemap = {} # type: Dict[Text, Text] records = [] # type: List[Text] toolparser = generate_parser( argparse.ArgumentParser(prog=args.workflow), t, namemap, records) if toolparser: if args.tool_help: toolparser.print_help() exit(0) cmd_line = vars(toolparser.parse_args(args.job_order)) for record_name in records: record = {} record_items = { k: v for k, v in six.iteritems(cmd_line) if k.startswith(record_name)} for key, value in six.iteritems(record_items): record[key[len(record_name) + 1:]] = value del cmd_line[key] cmd_line[str(record_name)] = record if cmd_line["job_order"]: try: job_order_object = cast(MutableMapping, loader.resolve_ref(cmd_line["job_order"])[0]) except Exception as e: _logger.error(Text(e), exc_info=args.debug) return 1 else: job_order_object = {"id": args.workflow} del cmd_line["job_order"] job_order_object.update({namemap[k]: v for k, v in cmd_line.items()}) if _logger.isEnabledFor(logging.DEBUG): _logger.debug(u"Parsed job order from command line: %s", json.dumps(job_order_object, indent=4)) else: job_order_object = None for inp in t.tool["inputs"]: if "default" in inp and (not job_order_object or shortname(inp["id"]) not in job_order_object): if not job_order_object: job_order_object = {} job_order_object[shortname(inp["id"])] = inp["default"] if not job_order_object and len(t.tool["inputs"]) > 0: if toolparser: print(u"\nOptions for {} ".format(args.workflow)) toolparser.print_help() _logger.error("") _logger.error("Input object required, use --help for details") exit(1) if print_input_deps: printdeps(job_order_object, loader, stdout, relative_deps, "", basedir=file_uri(str(input_basedir) + "/")) exit(0) def pathToLoc(p): if "location" not in p and "path" in p: p["location"] = p["path"] del p["path"] def addSizes(p): if 'location' in p: try: p["size"] = os.stat(p["location"][7:]).st_size # strip off file:// except OSError: pass elif 'contents' in p: p["size"] = len(p['contents']) else: return # best effort ns = {} # type: Dict[Text, Union[Dict[Any, Any], Text, Iterable[Text]]] ns.update(t.metadata.get("$namespaces", {})) ld = Loader(ns) def expand_formats(p): if "format" in p: p["format"] = ld.expand_url(p["format"], "") visit_class(job_order_object, ("File", "Directory"), pathToLoc) visit_class(job_order_object, ("File",), addSizes) visit_class(job_order_object, ("File",), expand_formats) adjustDirObjs(job_order_object, trim_listing) normalizeFilesDirs(job_order_object) if "cwl:tool" in job_order_object: del job_order_object["cwl:tool"] if "id" in job_order_object: del job_order_object["id"] return job_order_object def makeRelative(base, ob): u = ob.get("location", ob.get("path")) if ":" in u.split("/")[0] and not u.startswith("file://"): pass else: if u.startswith("file://"): u = uri_file_path(u) ob["location"] = os.path.relpath(u, base) def printdeps(obj, document_loader, stdout, relative_deps, uri, basedir=None): # type: (Mapping[Text, Any], Loader, IO[Any], bool, Text, Text) -> None deps = {"class": "File", "location": uri} # type: Dict[Text, Any] def loadref(b, u): return document_loader.fetch(document_loader.fetcher.urljoin(b, u)) sf = scandeps( basedir if basedir else uri, obj, {"$import", "run"}, {"$include", "$schemas", "location"}, loadref) if sf: deps["secondaryFiles"] = sf if relative_deps: if relative_deps == "primary": base = basedir if basedir else os.path.dirname(uri_file_path(str(uri))) elif relative_deps == "cwd": base = os.getcwd() else: raise Exception(u"Unknown relative_deps %s" % 
relative_deps) visit_class(deps, ("File", "Directory"), functools.partial(makeRelative, base)) stdout.write(json.dumps(deps, indent=4)) def print_pack(document_loader, processobj, uri, metadata): # type: (Loader, Union[Dict[Text, Any], List[Dict[Text, Any]]], Text, Dict[Text, Any]) -> str packed = pack(document_loader, processobj, uri, metadata) if len(packed["$graph"]) > 1: return json.dumps(packed, indent=4) else: return json.dumps(packed["$graph"][0], indent=4) def versionstring(): # type: () -> Text pkg = pkg_resources.require("cwltool") if pkg: return u"%s %s" % (sys.argv[0], pkg[0].version) else: return u"%s %s" % (sys.argv[0], "unknown version") def supportedCWLversions(enable_dev): # type: (bool) -> List[Text] # ALLUPDATES and UPDATES are dicts if enable_dev: versions = list(ALLUPDATES) else: versions = list(UPDATES) versions.sort() return versions def main(argsl=None, # type: List[str] args=None, # type: argparse.Namespace executor=None, # type: Callable[..., Tuple[Dict[Text, Any], Text]] makeTool=workflow.defaultMakeTool, # type: Callable[..., Process] selectResources=None, # type: Callable[[Dict[Text, int]], Dict[Text, int]] stdin=sys.stdin, # type: IO[Any] stdout=sys.stdout, # type: IO[Any] stderr=sys.stderr, # type: IO[Any] versionfunc=versionstring, # type: Callable[[], Text] job_order_object=None, # type: MutableMapping[Text, Any] make_fs_access=StdFsAccess, # type: Callable[[Text], StdFsAccess] fetcher_constructor=None, # type: FetcherConstructorType resolver=tool_resolver, logger_handler=None, custom_schema_callback=None # type: Callable[[], None] ): # type: (...) -> int _logger.removeHandler(defaultStreamHandler) if logger_handler: stderr_handler = logger_handler else: stderr_handler = logging.StreamHandler(stderr) _logger.addHandler(stderr_handler) try: if args is None: if argsl is None: argsl = sys.argv[1:] args = arg_parser().parse_args(argsl) # If on the Windows platform, a default Docker container is used if not explicitly provided by the user. if onWindows() and not args.default_container: # This Docker image is a minimal Alpine image with bash installed (size 6 MB). Source: https://github.com/frol/docker-alpine-bash args.default_container = windows_default_container_id # If the caller provided custom arguments, not every expected option may # be set, so fill in no-op defaults to avoid crashing when dereferencing # them in args. 
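        # A hedged sketch (illustrative only, not executed here): a caller
        # embedding cwltool can pass a hand-built, partially populated
        # Namespace, e.g.
        #
        #     import argparse
        #     from cwltool.main import main as cwltool_main
        #     ns = argparse.Namespace(workflow="echo.cwl",
        #                             job_order=["job.json"])
        #     exit_code = cwltool_main(args=ns)
        #
        # ("echo.cwl" and "job.json" are hypothetical paths.) The loop below
        # back-fills the options listed in the dictionary when the caller
        # omitted them, so the corresponding attribute lookups on `args`
        # later in this function do not fail.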
for k, v in six.iteritems({'print_deps': False, 'print_pre': False, 'print_rdf': False, 'print_dot': False, 'relative_deps': False, 'tmp_outdir_prefix': 'tmp', 'tmpdir_prefix': 'tmp', 'print_input_deps': False, 'cachedir': None, 'quiet': False, 'debug': False, 'timestamps': False, 'js_console': False, 'version': False, 'enable_dev': False, 'enable_ext': False, 'strict': True, 'skip_schemas': False, 'rdf_serializer': None, 'basedir': None, 'tool_help': False, 'workflow': None, 'job_order': None, 'pack': False, 'on_error': 'continue', 'relax_path_checks': False, 'validate': False, 'enable_ga4gh_tool_registry': False, 'ga4gh_tool_registries': [], 'find_default_container': None, 'make_template': False, 'overrides': None }): if not hasattr(args, k): setattr(args, k, v) if args.quiet: _logger.setLevel(logging.WARN) if args.debug: _logger.setLevel(logging.DEBUG) if args.timestamps: formatter = logging.Formatter("[%(asctime)s] %(message)s", "%Y-%m-%d %H:%M:%S") stderr_handler.setFormatter(formatter) if args.version: print(versionfunc()) return 0 else: _logger.info(versionfunc()) if args.print_supported_versions: print("\n".join(supportedCWLversions(args.enable_dev))) return 0 if not args.workflow: if os.path.isfile("CWLFile"): setattr(args, "workflow", "CWLFile") else: _logger.error("") _logger.error("CWL document required, no input file was provided") arg_parser().print_help() return 1 if args.relax_path_checks: command_line_tool.ACCEPTLIST_RE = command_line_tool.ACCEPTLIST_EN_RELAXED_RE if args.ga4gh_tool_registries: ga4gh_tool_registries[:] = args.ga4gh_tool_registries if not args.enable_ga4gh_tool_registry: del ga4gh_tool_registries[:] if custom_schema_callback: custom_schema_callback() elif args.enable_ext: res = pkg_resources.resource_stream(__name__, 'extensions.yml') use_custom_schema("v1.0", "http://commonwl.org/cwltool", res.read()) res.close() else: use_standard_schema("v1.0") uri, tool_file_uri = resolve_tool_uri(args.workflow, resolver=resolver, fetcher_constructor=fetcher_constructor) overrides = [] # type: List[Dict[Text, Any]] try: job_order_object, input_basedir, jobloader = load_job_order(args, stdin, fetcher_constructor, overrides, tool_file_uri) except Exception as e: _logger.error(Text(e), exc_info=args.debug) if args.overrides: overrides.extend(load_overrides(file_uri(os.path.abspath(args.overrides)), tool_file_uri)) try: document_loader, workflowobj, uri = fetch_document(uri, resolver=resolver, fetcher_constructor=fetcher_constructor) if args.print_deps: printdeps(workflowobj, document_loader, stdout, args.relative_deps, uri) return 0 document_loader, avsc_names, processobj, metadata, uri \ = validate_document(document_loader, workflowobj, uri, enable_dev=args.enable_dev, strict=args.strict, preprocess_only=args.print_pre or args.pack, fetcher_constructor=fetcher_constructor, skip_schemas=args.skip_schemas, overrides=overrides) if args.print_pre: stdout.write(json.dumps(processobj, indent=4)) return 0 overrides.extend(metadata.get("cwltool:overrides", [])) conf_file = getattr(args, "beta_dependency_resolvers_configuration", None) # Text use_conda_dependencies = getattr(args, "beta_conda_dependencies", None) # Text make_tool_kwds = vars(args) job_script_provider = None # type: Callable[[Any, List[str]], Text] if conf_file or use_conda_dependencies: dependencies_configuration = DependenciesConfiguration(args) # type: DependenciesConfiguration make_tool_kwds["job_script_provider"] = dependencies_configuration make_tool_kwds["find_default_container"] = 
functools.partial(find_default_container, args) make_tool_kwds["overrides"] = overrides tool = make_tool(document_loader, avsc_names, metadata, uri, makeTool, make_tool_kwds) if args.make_template: yaml.safe_dump(generate_input_template(tool), sys.stdout, default_flow_style=False, indent=4, block_seq_indent=2) return 0 if args.validate: _logger.info("Tool definition is valid") return 0 if args.pack: stdout.write(print_pack(document_loader, processobj, uri, metadata)) return 0 if args.print_rdf: stdout.write(printrdf(tool, document_loader.ctx, args.rdf_serializer)) return 0 if args.print_dot: printdot(tool, document_loader.ctx, stdout) return 0 except (validate.ValidationException) as exc: _logger.error(u"Tool definition failed validation:\n%s", exc, exc_info=args.debug) return 1 except (RuntimeError, WorkflowException) as exc: _logger.error(u"Tool definition failed initialization:\n%s", exc, exc_info=args.debug) return 1 except Exception as exc: _logger.error( u"I'm sorry, I couldn't load this CWL file%s", ", try again with --debug for more information.\nThe error was: " "%s" % exc if not args.debug else ". The error was:", exc_info=args.debug) return 1 if isinstance(tool, int): return tool # If on MacOS platform, TMPDIR must be set to be under one of the shared volumes in Docker for Mac # More info: https://dockstore.org/docs/faq if sys.platform == "darwin": tmp_prefix = "tmp_outdir_prefix" default_mac_path = "/private/tmp/docker_tmp" if getattr(args, tmp_prefix) and getattr(args, tmp_prefix) == DEFAULT_TMP_PREFIX: setattr(args, tmp_prefix, default_mac_path) for dirprefix in ("tmpdir_prefix", "tmp_outdir_prefix", "cachedir"): if getattr(args, dirprefix) and getattr(args, dirprefix) != DEFAULT_TMP_PREFIX: sl = "/" if getattr(args, dirprefix).endswith("/") or dirprefix == "cachedir" else "" setattr(args, dirprefix, os.path.abspath(getattr(args, dirprefix)) + sl) if not os.path.exists(os.path.dirname(getattr(args, dirprefix))): try: os.makedirs(os.path.dirname(getattr(args, dirprefix))) except Exception as e: _logger.error("Failed to create directory: %s", e) return 1 if args.cachedir: if args.move_outputs == "move": setattr(args, 'move_outputs', "copy") setattr(args, "tmp_outdir_prefix", args.cachedir) try: job_order_object = init_job_order(job_order_object, args, tool, print_input_deps=args.print_input_deps, relative_deps=args.relative_deps, stdout=stdout, make_fs_access=make_fs_access, loader=jobloader, input_basedir=input_basedir) except SystemExit as e: return e.code if not executor: if args.parallel: executor = MultithreadedJobExecutor() else: executor = SingleJobExecutor() if isinstance(job_order_object, int): return job_order_object try: setattr(args, 'basedir', input_basedir) del args.workflow del args.job_order (out, status) = executor(tool, job_order_object, logger=_logger, makeTool=makeTool, select_resources=selectResources, make_fs_access=make_fs_access, **vars(args)) # This is the workflow output, it needs to be written if out is not None: def locToPath(p): for field in ("path", "nameext", "nameroot", "dirname"): if field in p: del p[field] if p["location"].startswith("file://"): p["path"] = uri_file_path(p["location"]) visit_class(out, ("File", "Directory"), locToPath) # Unsetting the Generation fron final output object visit_class(out,("File",), MutationManager().unset_generation) if isinstance(out, six.string_types): stdout.write(out) else: stdout.write(json.dumps(out, indent=4)) stdout.write("\n") stdout.flush() if status != "success": _logger.warning(u"Final process status 
is %s", status) return 1 else: _logger.info(u"Final process status is %s", status) return 0 except (validate.ValidationException) as exc: _logger.error(u"Input object failed validation:\n%s", exc, exc_info=args.debug) return 1 except UnsupportedRequirement as exc: _logger.error( u"Workflow or tool uses unsupported feature:\n%s", exc, exc_info=args.debug) return 33 except WorkflowException as exc: _logger.error( u"Workflow error, try again with --debug for more " "information:\n%s", strip_dup_lineno(six.text_type(exc)), exc_info=args.debug) return 1 except Exception as exc: _logger.error( u"Unhandled error, try again with --debug for more information:\n" " %s", exc, exc_info=args.debug) return 1 finally: _logger.removeHandler(stderr_handler) _logger.addHandler(defaultStreamHandler) def find_default_container(args, builder): default_container = None if args.default_container: default_container = args.default_container elif args.beta_use_biocontainers: default_container = get_container_from_software_requirements(args, builder) return default_container if __name__ == "__main__": sys.exit(main(sys.argv[1:])) cwltool-1.0.20180302231433/cwltool/__main__.py0000644000175200017520000000013513247251315021352 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import sys from . import main sys.exit(main.main()) cwltool-1.0.20180302231433/cwltool/pathmapper.py0000644000175200017520000002756713247251315022015 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import collections import logging import os import stat import uuid from functools import partial from tempfile import NamedTemporaryFile import requests from cachecontrol import CacheControl from cachecontrol.caches import FileCache from typing import Any, Callable, Dict, Iterable, List, Set, Text, Tuple, Union, MutableMapping import schema_salad.validate as validate from schema_salad.ref_resolver import uri_file_path from schema_salad.sourceline import SourceLine from six.moves import urllib from .utils import convert_pathsep_to_unix from .stdfsaccess import StdFsAccess, abspath _logger = logging.getLogger("cwltool") MapperEnt = collections.namedtuple("MapperEnt", ["resolved", "target", "type", "staged"]) def adjustFiles(rec, op): # type: (Any, Union[Callable[..., Any], partial[Any]]) -> None """Apply a mapping function to each File path in the object `rec`.""" if isinstance(rec, dict): if rec.get("class") == "File": rec["path"] = op(rec["path"]) for d in rec: adjustFiles(rec[d], op) if isinstance(rec, list): for d in rec: adjustFiles(d, op) def visit_class(rec, cls, op): # type: (Any, Iterable, Union[Callable[..., Any], partial[Any]]) -> None """Apply a function to with "class" in cls.""" if isinstance(rec, dict): if "class" in rec and rec.get("class") in cls: op(rec) for d in rec: visit_class(rec[d], cls, op) if isinstance(rec, list): for d in rec: visit_class(d, cls, op) def adjustFileObjs(rec, op): # type: (Any, Union[Callable[..., Any], partial[Any]]) -> None """Apply an update function to each File object in the object `rec`.""" visit_class(rec, ("File",), op) def adjustDirObjs(rec, op): # type: (Any, Union[Callable[..., Any], partial[Any]]) -> None """Apply an update function to each Directory object in the object `rec`.""" visit_class(rec, ("Directory",), op) def normalizeFilesDirs(job): # type: (Union[List[Dict[Text, Any]], MutableMapping[Text, Any]]) -> None def addLocation(d): if "location" not in d: if d["class"] == "File" and ("contents" not in d): raise 
validate.ValidationException("Anonymous file object must have 'contents' and 'basename' fields.") if d["class"] == "Directory" and ("listing" not in d or "basename" not in d): raise validate.ValidationException( "Anonymous directory object must have 'listing' and 'basename' fields.") d["location"] = "_:" + Text(uuid.uuid4()) if "basename" not in d: d["basename"] = d["location"][2:] parse = urllib.parse.urlparse(d["location"]) path = parse.path # strip trailing slash if path.endswith("/"): if d["class"] != "Directory": raise validate.ValidationException( "location '%s' ends with '/' but is not a Directory" % d["location"]) path = path.rstrip("/") d["location"] = urllib.parse.urlunparse((parse.scheme, parse.netloc, path, parse.params, parse.query, parse.fragment)) if "basename" not in d: d["basename"] = os.path.basename(urllib.request.url2pathname(path)) if d["class"] == "File": d["nameroot"], d["nameext"] = os.path.splitext(d["basename"]) visit_class(job, ("File", "Directory"), addLocation) def dedup(listing): # type: (List[Any]) -> List[Any] marksub = set() def mark(d): marksub.add(d["location"]) for l in listing: if l["class"] == "Directory": for e in l.get("listing", []): adjustFileObjs(e, mark) adjustDirObjs(e, mark) dd = [] markdup = set() # type: Set[Text] for r in listing: if r["location"] not in marksub and r["location"] not in markdup: dd.append(r) markdup.add(r["location"]) return dd def get_listing(fs_access, rec, recursive=True): # type: (StdFsAccess, Dict[Text, Any], bool) -> None if "listing" in rec: return listing = [] loc = rec["location"] for ld in fs_access.listdir(loc): parse = urllib.parse.urlparse(ld) bn = os.path.basename(urllib.request.url2pathname(parse.path)) if fs_access.isdir(ld): ent = {u"class": u"Directory", u"location": ld, u"basename": bn} if recursive: get_listing(fs_access, ent, recursive) listing.append(ent) else: listing.append({"class": "File", "location": ld, "basename": bn}) rec["listing"] = listing def trim_listing(obj): """Remove 'listing' field from Directory objects that are file references. It redundant and potentially expensive to pass fully enumerated Directory objects around if not explicitly needed, so delete the 'listing' field when it is safe to do so. 
""" if obj.get("location", "").startswith("file://") and "listing" in obj: del obj["listing"] # Download http Files def downloadHttpFile(httpurl): # type: (Text) -> Text cache_session = None if "XDG_CACHE_HOME" in os.environ: directory = os.environ["XDG_CACHE_HOME"] elif "HOME" in os.environ: directory = os.environ["HOME"] else: directory = os.path.expanduser('~') cache_session = CacheControl( requests.Session(), cache=FileCache( os.path.join(directory, ".cache", "cwltool"))) r = cache_session.get(httpurl, stream=True) with NamedTemporaryFile(mode='wb', delete=False) as f: for chunk in r.iter_content(chunk_size=16384): if chunk: # filter out keep-alive new chunks f.write(chunk) r.close() return f.name def ensure_writable(path): # type: (Text) -> None if os.path.isdir(path): for root, dirs, files in os.walk(path): for name in files: j = os.path.join(root, name) st = os.stat(j) mode = stat.S_IMODE(st.st_mode) os.chmod(j, mode|stat.S_IWUSR) for name in dirs: j = os.path.join(root, name) st = os.stat(j) mode = stat.S_IMODE(st.st_mode) os.chmod(j, mode|stat.S_IWUSR) else: st = os.stat(path) mode = stat.S_IMODE(st.st_mode) os.chmod(path, mode|stat.S_IWUSR) class PathMapper(object): """Mapping of files from relative path provided in the file to a tuple of (absolute local path, absolute container path) The tao of PathMapper: The initializer takes a list of File and Directory objects, a base directory (for resolving relative references) and a staging directory (where the files are mapped to). The purpose of the setup method is to determine where each File or Directory should be placed on the target file system (relative to stagedir). If separatedirs=True, unrelated files will be isolated in their own directories under stagedir. If separatedirs=False, files and directories will all be placed in stagedir (with the possibility for name collisions...) The path map maps the "location" of the input Files and Directory objects to a tuple (resolved, target, type). The "resolved" field is the "real" path on the local file system (after resolving relative paths and traversing symlinks). The "target" is the path on the target file system (under stagedir). The type is the object type (one of File, Directory, CreateFile, WritableFile). The latter two (CreateFile, WritableFile) are used by InitialWorkDirRequirement to indicate files that are generated on the fly (CreateFile, in this case "resolved" holds the file contents instead of the path because they file doesn't exist) or copied into the output directory so they can be opened for update ("r+" or "a") (WritableFile). 
""" def __init__(self, referenced_files, basedir, stagedir, separateDirs=True): # type: (List[Any], Text, Text, bool) -> None self._pathmap = {} # type: Dict[Text, MapperEnt] self.stagedir = stagedir self.separateDirs = separateDirs self.setup(dedup(referenced_files), basedir) def visitlisting(self, listing, stagedir, basedir, copy=False, staged=False): # type: (List[Dict[Text, Any]], Text, Text, bool, bool) -> None for ld in listing: self.visit(ld, stagedir, basedir, copy=ld.get("writable", copy), staged=staged) def visit(self, obj, stagedir, basedir, copy=False, staged=False): # type: (Dict[Text, Any], Text, Text, bool, bool) -> None tgt = convert_pathsep_to_unix( os.path.join(stagedir, obj["basename"])) if obj["location"] in self._pathmap: return if obj["class"] == "Directory": if obj["location"].startswith("file://"): resolved = uri_file_path(obj["location"]) else: resolved = obj["location"] self._pathmap[obj["location"]] = MapperEnt(resolved, tgt, "WritableDirectory" if copy else "Directory", staged) if obj["location"].startswith("file://"): staged = False self.visitlisting(obj.get("listing", []), tgt, basedir, copy=copy, staged=staged) elif obj["class"] == "File": path = obj["location"] ab = abspath(path, basedir) if "contents" in obj and obj["location"].startswith("_:"): self._pathmap[obj["location"]] = MapperEnt(obj["contents"], tgt, "CreateFile", staged) else: with SourceLine(obj, "location", validate.ValidationException, _logger.isEnabledFor(logging.DEBUG)): deref = ab if urllib.parse.urlsplit(deref).scheme in ['http','https']: deref = downloadHttpFile(path) else: # Dereference symbolic links st = os.lstat(deref) while stat.S_ISLNK(st.st_mode): rl = os.readlink(deref) deref = rl if os.path.isabs(rl) else os.path.join( os.path.dirname(deref), rl) st = os.lstat(deref) self._pathmap[path] = MapperEnt(deref, tgt, "WritableFile" if copy else "File", staged) self.visitlisting(obj.get("secondaryFiles", []), stagedir, basedir, copy=copy, staged=staged) def setup(self, referenced_files, basedir): # type: (List[Any], Text) -> None # Go through each file and set the target to its own directory along # with any secondary files. 
stagedir = self.stagedir for fob in referenced_files: if self.separateDirs: stagedir = os.path.join(self.stagedir, "stg%s" % uuid.uuid4()) self.visit(fob, stagedir, basedir, copy=fob.get("writable"), staged=True) def mapper(self, src): # type: (Text) -> MapperEnt if u"#" in src: i = src.index(u"#") p = self._pathmap[src[:i]] return MapperEnt(p.resolved, p.target + src[i:], p.type, p.staged) else: return self._pathmap[src] def files(self): # type: () -> List[Text] return list(self._pathmap.keys()) def items(self): # type: () -> List[Tuple[Text, MapperEnt]] return list(self._pathmap.items()) def reversemap(self, target): # type: (Text) -> Tuple[Text, Text] for k, v in self._pathmap.items(): if v[1] == target: return (k, v[0]) return None def update(self, key, resolved, target, type, stage): # type: (Text, Text, Text, Text, bool) -> None self._pathmap[key] = MapperEnt(resolved, target, type, stage) def __contains__(self, key): return key in self._pathmap cwltool-1.0.20180302231433/cwltool/errors.py0000644000175200017520000000015113247251315021144 0ustar mcrusoemcrusoe00000000000000class WorkflowException(Exception): pass class UnsupportedRequirement(WorkflowException): pass cwltool-1.0.20180302231433/cwltool/executors.py0000644000175200017520000001323613247251315021661 0ustar mcrusoemcrusoe00000000000000import logging import tempfile import threading import os from abc import ABCMeta, abstractmethod from typing import Dict, Text, Any, Tuple, Set, List from .builder import Builder from .errors import WorkflowException from .mutation import MutationManager from .job import JobBase from .process import relocateOutputs, cleanIntermediate, Process from . import loghandler _logger = logging.getLogger("cwltool") class JobExecutor(object): __metaclass__ = ABCMeta def __init__(self): # type: (...) -> None self.final_output = [] # type: List self.final_status = [] # type: List self.output_dirs = set() # type: Set def __call__(self, *args, **kwargs): return self.execute(*args, **kwargs) def output_callback(self, out, processStatus): self.final_status.append(processStatus) self.final_output.append(out) @abstractmethod def run_jobs(self, t, # type: Process job_order_object, # type: Dict[Text, Any] logger, **kwargs # type: Any ): pass def execute(self, t, # type: Process job_order_object, # type: Dict[Text, Any] logger=_logger, **kwargs # type: Any ): # type: (...) 
-> Tuple[Dict[Text, Any], Text] if "basedir" not in kwargs: raise WorkflowException("Must provide 'basedir' in kwargs") finaloutdir = os.path.abspath(kwargs.get("outdir")) if kwargs.get("outdir") else None kwargs["outdir"] = tempfile.mkdtemp(prefix=kwargs["tmp_outdir_prefix"]) if kwargs.get( "tmp_outdir_prefix") else tempfile.mkdtemp() self.output_dirs.add(kwargs["outdir"]) kwargs["mutation_manager"] = MutationManager() jobReqs = None if "cwl:requirements" in job_order_object: jobReqs = job_order_object["cwl:requirements"] elif ("cwl:defaults" in t.metadata and "cwl:requirements" in t.metadata["cwl:defaults"]): jobReqs = t.metadata["cwl:defaults"]["cwl:requirements"] if jobReqs: for req in jobReqs: t.requirements.append(req) self.run_jobs(t, job_order_object, logger, **kwargs) if self.final_output and self.final_output[0] and finaloutdir: self.final_output[0] = relocateOutputs(self.final_output[0], finaloutdir, self.output_dirs, kwargs.get("move_outputs"), kwargs["make_fs_access"](""), kwargs["compute_checksum"]) if kwargs.get("rm_tmpdir"): cleanIntermediate(self.output_dirs) if self.final_output and self.final_status: return (self.final_output[0], self.final_status[0]) else: return (None, "permanentFail") class SingleJobExecutor(JobExecutor): def run_jobs(self, t, # type: Process job_order_object, # type: Dict[Text, Any] logger, **kwargs # type: Any ): jobiter = t.job(job_order_object, self.output_callback, **kwargs) try: for r in jobiter: if r: builder = kwargs.get("builder", None) # type: Builder if builder is not None: r.builder = builder if r.outdir: self.output_dirs.add(r.outdir) r.run(**kwargs) else: logger.error("Workflow cannot make any more progress.") break except WorkflowException: raise except Exception as e: logger.exception("Got workflow error") raise WorkflowException(Text(e)) class MultithreadedJobExecutor(JobExecutor): def __init__(self): super(MultithreadedJobExecutor, self).__init__() self.threads = set() self.exceptions = [] def run_job(self, job, # type: JobBase **kwargs # type: Any ): # type: (...) 
-> None def runner(): try: job.run(**kwargs) except WorkflowException as e: self.exceptions.append(e) except Exception as e: self.exceptions.append(WorkflowException(Text(e))) self.threads.remove(thread) thread = threading.Thread(target=runner) thread.daemon = True self.threads.add(thread) thread.start() def wait_for_next_completion(self): # type: () -> None if self.exceptions: raise self.exceptions[0] def run_jobs(self, t, # type: Process job_order_object, # type: Dict[Text, Any] logger, **kwargs # type: Any ): jobiter = t.job(job_order_object, self.output_callback, **kwargs) for r in jobiter: if r: builder = kwargs.get("builder", None) # type: Builder if builder is not None: r.builder = builder if r.outdir: self.output_dirs.add(r.outdir) self.run_job(r, **kwargs) else: if len(self.threads): self.wait_for_next_completion() else: logger.error("Workflow cannot make any more progress.") break while len(self.threads) > 0: self.wait_for_next_completion() cwltool-1.0.20180302231433/cwltool/job.py0000644000175200017520000004571213247251315020416 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import codecs import functools import io import json import logging import os import re import shutil import stat import subprocess import sys import tempfile from abc import ABCMeta, abstractmethod from io import open from threading import Lock import shellescape from typing import (IO, Any, Callable, Dict, Iterable, List, MutableMapping, Text, Union, cast) from .builder import Builder from .errors import WorkflowException from .pathmapper import PathMapper from .process import (UnsupportedRequirement, get_feature, stageFiles) from .utils import bytes2str_in_dicts from .utils import copytree_with_merge, onWindows _logger = logging.getLogger("cwltool") needs_shell_quoting_re = re.compile(r"""(^$|[\s|&;()<>\'"$@])""") job_output_lock = Lock() FORCE_SHELLED_POPEN = os.getenv("CWLTOOL_FORCE_SHELL_POPEN", "0") == "1" SHELL_COMMAND_TEMPLATE = """#!/bin/bash python "run_job.py" "job.json" """ PYTHON_RUN_SCRIPT = """ import json import os import sys import subprocess with open(sys.argv[1], "r") as f: popen_description = json.load(f) commands = popen_description["commands"] cwd = popen_description["cwd"] env = popen_description["env"] env["PATH"] = os.environ.get("PATH") stdin_path = popen_description["stdin_path"] stdout_path = popen_description["stdout_path"] stderr_path = popen_description["stderr_path"] if stdin_path is not None: stdin = open(stdin_path, "rb") else: stdin = subprocess.PIPE if stdout_path is not None: stdout = open(stdout_path, "wb") else: stdout = sys.stderr if stderr_path is not None: stderr = open(stderr_path, "wb") else: stderr = sys.stderr sp = subprocess.Popen(commands, shell=False, close_fds=True, stdin=stdin, stdout=stdout, stderr=stderr, env=env, cwd=cwd) if sp.stdin: sp.stdin.close() rcode = sp.wait() if stdin is not subprocess.PIPE: stdin.close() if stdout is not sys.stderr: stdout.close() if stderr is not sys.stderr: stderr.close() sys.exit(rcode) """ def deref_links(outputs): # type: (Any) -> None if isinstance(outputs, dict): if outputs.get("class") == "File": st = os.lstat(outputs["path"]) if stat.S_ISLNK(st.st_mode): outputs["basename"] = os.path.basename(outputs["path"]) outputs["path"] = os.readlink(outputs["path"]) else: for v in outputs.values(): deref_links(v) if isinstance(outputs, list): for v in outputs: deref_links(v) def relink_initialworkdir(pathmapper, host_outdir, container_outdir, inplace_update=False): # type: (PathMapper, Text, Text, 
bool) -> None for src, vol in pathmapper.items(): if not vol.staged: continue if vol.type in ("File", "Directory") or (inplace_update and vol.type in ("WritableFile", "WritableDirectory")): host_outdir_tgt = os.path.join(host_outdir, vol.target[len(container_outdir)+1:]) if os.path.islink(host_outdir_tgt) or os.path.isfile(host_outdir_tgt): os.remove(host_outdir_tgt) elif os.path.isdir(host_outdir_tgt): shutil.rmtree(host_outdir_tgt) if onWindows(): if vol.type in ("File", "WritableFile"): shutil.copy(vol.resolved, host_outdir_tgt) elif vol.type in ("Directory", "WritableDirectory"): copytree_with_merge(vol.resolved, host_outdir_tgt) else: os.symlink(vol.resolved, host_outdir_tgt) class JobBase(object): def __init__(self): # type: () -> None self.builder = None # type: Builder self.joborder = None # type: Dict[Text, Union[Dict[Text, Any], List, Text]] self.stdin = None # type: Text self.stderr = None # type: Text self.stdout = None # type: Text self.successCodes = None # type: Iterable[int] self.temporaryFailCodes = None # type: Iterable[int] self.permanentFailCodes = None # type: Iterable[int] self.requirements = None # type: List[Dict[Text, Text]] self.hints = None # type: Dict[Text,Text] self.name = None # type: Text self.command_line = None # type: List[Text] self.pathmapper = None # type: PathMapper self.make_pathmapper = None # type: Callable[..., PathMapper] self.generatemapper = None # type: PathMapper self.collect_outputs = None # type: Union[Callable[[Any], Any], functools.partial[Any]] self.output_callback = None # type: Callable[[Any, Any], Any] self.outdir = None # type: Text self.tmpdir = None # type: Text self.environment = None # type: MutableMapping[Text, Text] self.generatefiles = None # type: Dict[Text, Union[List[Dict[Text, Text]], Dict[Text, Text], Text]] self.stagedir = None # type: Text self.inplace_update = None # type: bool def _setup(self, kwargs): # type: (Dict) -> None if not os.path.exists(self.outdir): os.makedirs(self.outdir) for knownfile in self.pathmapper.files(): p = self.pathmapper.mapper(knownfile) if p.type == "File" and not os.path.isfile(p[0]) and p.staged: raise WorkflowException( u"Input file %s (at %s) not found or is not a regular " "file." 
% (knownfile, self.pathmapper.mapper(knownfile)[0])) if self.generatefiles["listing"]: make_path_mapper_kwargs = kwargs if "basedir" in make_path_mapper_kwargs: make_path_mapper_kwargs = make_path_mapper_kwargs.copy() del make_path_mapper_kwargs["basedir"] self.generatemapper = self.make_pathmapper(cast(List[Any], self.generatefiles["listing"]), self.builder.outdir, basedir=self.outdir, separateDirs=False, **make_path_mapper_kwargs) _logger.debug(u"[job %s] initial work dir %s", self.name, json.dumps({p: self.generatemapper.mapper(p) for p in self.generatemapper.files()}, indent=4)) def _execute(self, runtime, env, rm_tmpdir=True, move_outputs="move"): # type: (List[Text], MutableMapping[Text, Text], bool, Text) -> None scr, _ = get_feature(self, "ShellCommandRequirement") shouldquote = None # type: Callable[[Any], Any] if scr: shouldquote = lambda x: False else: shouldquote = needs_shell_quoting_re.search _logger.info(u"[job %s] %s$ %s%s%s%s", self.name, self.outdir, " \\\n ".join([shellescape.quote(Text(arg)) if shouldquote(Text(arg)) else Text(arg) for arg in (runtime + self.command_line)]), u' < %s' % self.stdin if self.stdin else '', u' > %s' % os.path.join(self.outdir, self.stdout) if self.stdout else '', u' 2> %s' % os.path.join(self.outdir, self.stderr) if self.stderr else '') outputs = {} # type: Dict[Text,Text] try: stdin_path = None if self.stdin: stdin_path = self.pathmapper.reversemap(self.stdin)[1] stderr_path = None if self.stderr: abserr = os.path.join(self.outdir, self.stderr) dnerr = os.path.dirname(abserr) if dnerr and not os.path.exists(dnerr): os.makedirs(dnerr) stderr_path = abserr stdout_path = None if self.stdout: absout = os.path.join(self.outdir, self.stdout) dn = os.path.dirname(absout) if dn and not os.path.exists(dn): os.makedirs(dn) stdout_path = absout commands = [Text(x) for x in (runtime + self.command_line)] job_script_contents = None # type: Text builder = getattr(self, "builder", None) # type: Builder if builder is not None: job_script_contents = builder.build_job_script(commands) rcode = _job_popen( commands, stdin_path=stdin_path, stdout_path=stdout_path, stderr_path=stderr_path, env=env, cwd=self.outdir, job_script_contents=job_script_contents, ) if self.successCodes and rcode in self.successCodes: processStatus = "success" elif self.temporaryFailCodes and rcode in self.temporaryFailCodes: processStatus = "temporaryFail" elif self.permanentFailCodes and rcode in self.permanentFailCodes: processStatus = "permanentFail" elif rcode == 0: processStatus = "success" else: processStatus = "permanentFail" if self.generatefiles["listing"]: relink_initialworkdir(self.generatemapper, self.outdir, self.builder.outdir, inplace_update=self.inplace_update) outputs = self.collect_outputs(self.outdir) outputs = bytes2str_in_dicts(outputs) # type: ignore except OSError as e: if e.errno == 2: if runtime: _logger.error(u"'%s' not found", runtime[0]) else: _logger.error(u"'%s' not found", self.command_line[0]) else: _logger.exception("Exception while running job") processStatus = "permanentFail" except WorkflowException as e: _logger.error(u"[job %s] Job error:\n%s" % (self.name, e)) processStatus = "permanentFail" except Exception as e: _logger.exception("Exception while running job") processStatus = "permanentFail" if processStatus != "success": _logger.warning(u"[job %s] completed %s", self.name, processStatus) else: _logger.info(u"[job %s] completed %s", self.name, processStatus) if _logger.isEnabledFor(logging.DEBUG): _logger.debug(u"[job %s] %s", self.name, 
json.dumps(outputs, indent=4)) with job_output_lock: self.output_callback(outputs, processStatus) if self.stagedir and os.path.exists(self.stagedir): _logger.debug(u"[job %s] Removing input staging directory %s", self.name, self.stagedir) shutil.rmtree(self.stagedir, True) if rm_tmpdir: _logger.debug(u"[job %s] Removing temporary directory %s", self.name, self.tmpdir) shutil.rmtree(self.tmpdir, True) class CommandLineJob(JobBase): def run(self, pull_image=True, rm_container=True, rm_tmpdir=True, move_outputs="move", **kwargs): # type: (bool, bool, bool, Text, **Any) -> None self._setup(kwargs) env = self.environment if not os.path.exists(self.tmpdir): os.makedirs(self.tmpdir) vars_to_preserve = kwargs.get("preserve_environment") if kwargs.get("preserve_entire_environment"): vars_to_preserve = os.environ if vars_to_preserve is not None: for key, value in os.environ.items(): if key in vars_to_preserve and key not in env: # On Windows, subprocess env can't handle unicode. env[key] = str(value) if onWindows() else value env["HOME"] = str(self.outdir) if onWindows() else self.outdir env["TMPDIR"] = str(self.tmpdir) if onWindows() else self.tmpdir if "PATH" not in env: env["PATH"] = str(os.environ["PATH"]) if onWindows() else os.environ["PATH"] if "SYSTEMROOT" not in env and "SYSTEMROOT" in os.environ: env["SYSTEMROOT"] = str(os.environ["SYSTEMROOT"]) if onWindows() else os.environ["SYSTEMROOT"] stageFiles(self.pathmapper, ignoreWritable=True, symLink=True) if self.generatemapper: stageFiles(self.generatemapper, ignoreWritable=self.inplace_update, symLink=True) relink_initialworkdir(self.generatemapper, self.outdir, self.builder.outdir, inplace_update=self.inplace_update) self._execute([], env, rm_tmpdir=rm_tmpdir, move_outputs=move_outputs) class ContainerCommandLineJob(JobBase): __metaclass__ = ABCMeta @abstractmethod def get_from_requirements(self, r, req, pull_image, dry_run=False): # type: (Dict[Text, Text], bool, bool, bool) -> Text pass @abstractmethod def create_runtime(self, env, rm_container, record_container_id, cidfile_dir, cidfile_prefix, **kwargs): # type: (MutableMapping[Text, Text], bool, bool, Text, Text, **Any) -> List pass def run(self, pull_image=True, rm_container=True, record_container_id=False, cidfile_dir="", cidfile_prefix="", rm_tmpdir=True, move_outputs="move", **kwargs): # type: (bool, bool, bool, Text, Text, bool, Text, **Any) -> None (docker_req, docker_is_req) = get_feature(self, "DockerRequirement") img_id = None env = None # type: MutableMapping[Text, Text] user_space_docker_cmd = kwargs.get("user_space_docker_cmd") if docker_req and user_space_docker_cmd: # For user-space docker implementations, a local image name or ID # takes precedence over a network pull if 'dockerImageId' in docker_req: img_id = str(docker_req["dockerImageId"]) elif 'dockerPull' in docker_req: img_id = str(docker_req["dockerPull"]) else: raise Exception("Docker image must be specified as " "'dockerImageId' or 'dockerPull' when using user " "space implementations of Docker") else: try: env = cast(MutableMapping[Text, Text], os.environ) if docker_req and kwargs.get("use_container"): img_id = str(self.get_from_requirements(docker_req, True, pull_image)) if img_id is None: if self.builder.find_default_container: default_container = self.builder.find_default_container() if default_container: img_id = str(default_container) env = cast(MutableMapping[Text, Text], os.environ) if docker_req and img_id is None and kwargs.get("use_container"): raise Exception("Docker image not available") except 
Exception as e: container = "Singularity" if kwargs.get("singularity") else "Docker" _logger.debug("%s error" % container, exc_info=True) if docker_is_req: raise UnsupportedRequirement( "%s is required to run this tool: %s" % (container, e)) else: raise WorkflowException( "{0} is not available for this tool, try " "--no-container to disable {0}, or install " "a user space Docker replacement like uDocker with " "--user-space-docker-cmd.: {1}".format(container, e)) self._setup(kwargs) runtime = self.create_runtime(env, rm_container, record_container_id, cidfile_dir, cidfile_prefix, **kwargs) runtime.append(img_id) self._execute(runtime, env, rm_tmpdir=rm_tmpdir, move_outputs=move_outputs) def _job_popen( commands, # type: List[Text] stdin_path, # type: Text stdout_path, # type: Text stderr_path, # type: Text env, # type: Union[MutableMapping[Text, Text], MutableMapping[str, str]] cwd, # type: Text job_dir=None, # type: Text job_script_contents=None, # type: Text ): # type: (...) -> int if not job_script_contents and not FORCE_SHELLED_POPEN: stdin = None # type: Union[IO[Any], int] stderr = None # type: IO[Any] stdout = None # type: IO[Any] if stdin_path is not None: stdin = open(stdin_path, "rb") else: stdin = subprocess.PIPE if stdout_path is not None: stdout = open(stdout_path, "wb") else: stdout = sys.stderr if stderr_path is not None: stderr = open(stderr_path, "wb") else: stderr = sys.stderr sp = subprocess.Popen(commands, shell=False, close_fds=not onWindows(), stdin=stdin, stdout=stdout, stderr=stderr, env=env, cwd=cwd) if sp.stdin: sp.stdin.close() rcode = sp.wait() if isinstance(stdin, io.IOBase): stdin.close() if stdout is not sys.stderr: stdout.close() if stderr is not sys.stderr: stderr.close() return rcode else: if job_dir is None: job_dir = tempfile.mkdtemp(prefix="cwltooljob") if not job_script_contents: job_script_contents = SHELL_COMMAND_TEMPLATE env_copy = {} key = None # type: Any for key in env: env_copy[key] = env[key] job_description = dict( commands=commands, cwd=cwd, env=env_copy, stdout_path=stdout_path, stderr_path=stderr_path, stdin_path=stdin_path, ) with open(os.path.join(job_dir, "job.json"), "wb") as f: json.dump(job_description, codecs.getwriter('utf-8')(f), ensure_ascii=False) # type: ignore try: job_script = os.path.join(job_dir, "run_job.bash") with open(job_script, "wb") as f: f.write(job_script_contents.encode('utf-8')) job_run = os.path.join(job_dir, "run_job.py") with open(job_run, "wb") as f: f.write(PYTHON_RUN_SCRIPT.encode('utf-8')) sp = subprocess.Popen( ["bash", job_script.encode("utf-8")], shell=False, cwd=job_dir, stdout=sys.stderr, # The nested script will output the paths to the correct files if they need stderr=sys.stderr, # to be captured. Else just write everything to stderr (same as above). stdin=subprocess.PIPE, ) if sp.stdin: sp.stdin.close() rcode = sp.wait() return rcode finally: shutil.rmtree(job_dir) cwltool-1.0.20180302231433/cwltool/argparser.py0000644000175200017520000004766213247251315021640 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import from __future__ import print_function import argparse import logging import os from typing import (Any, AnyStr, Dict, List, Sequence, Text, Union, cast) from . 
import loghandler from schema_salad.ref_resolver import file_uri from .process import (Process, shortname) from .resolver import ga4gh_tool_registries from .software_requirements import (SOFTWARE_REQUIREMENTS_ENABLED) _logger = logging.getLogger("cwltool") DEFAULT_TMP_PREFIX = "tmp" def arg_parser(): # type: () -> argparse.ArgumentParser parser = argparse.ArgumentParser( description='Reference executor for Common Workflow Language standards.') parser.add_argument("--basedir", type=Text) parser.add_argument("--outdir", type=Text, default=os.path.abspath('.'), help="Output directory, default current directory") parser.add_argument("--parallel", action="store_true", default=False, help="[experimental] Run jobs in parallel. " "Does not currently keep track of ResourceRequirements like the number of cores" "or memory and can overload this system") envgroup = parser.add_mutually_exclusive_group() envgroup.add_argument("--preserve-environment", type=Text, action="append", help="Preserve specific environment variable when " "running CommandLineTools. May be provided multiple " "times.", metavar="ENVVAR", default=["PATH"], dest="preserve_environment") envgroup.add_argument("--preserve-entire-environment", action="store_true", help="Preserve all environment variable when running " "CommandLineTools.", default=False, dest="preserve_entire_environment") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--rm-container", action="store_true", default=True, help="Delete Docker container used by jobs after they exit (default)", dest="rm_container") exgroup.add_argument("--leave-container", action="store_false", default=True, help="Do not delete Docker container used by jobs after they exit", dest="rm_container") cidgroup = parser.add_argument_group("Options for recording the Docker " "container identifier into a file") cidgroup.add_argument("--record-container-id", action="store_true", default=False, help="If enabled, store the Docker container ID into a file. " "See --cidfile-dir to specify the directory.", dest="record_container_id") cidgroup.add_argument("--cidfile-dir", type=Text, help="Directory for storing the Docker container ID file. " "The default is the current directory", default="", dest="cidfile_dir") cidgroup.add_argument("--cidfile-prefix", type=Text, help="Specify a prefix to the container ID filename. " "Final file name will be followed by a timestamp. 
" "The default is no prefix.", default="", dest="cidfile_prefix") parser.add_argument("--tmpdir-prefix", type=Text, help="Path prefix for temporary directories", default=DEFAULT_TMP_PREFIX) exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--tmp-outdir-prefix", type=Text, help="Path prefix for intermediate output directories", default=DEFAULT_TMP_PREFIX) exgroup.add_argument("--cachedir", type=Text, default="", help="Directory to cache intermediate workflow outputs to avoid recomputing steps.") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--rm-tmpdir", action="store_true", default=True, help="Delete intermediate temporary directories (default)", dest="rm_tmpdir") exgroup.add_argument("--leave-tmpdir", action="store_false", default=True, help="Do not delete intermediate temporary directories", dest="rm_tmpdir") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--move-outputs", action="store_const", const="move", default="move", help="Move output files to the workflow output directory and delete intermediate output directories (default).", dest="move_outputs") exgroup.add_argument("--leave-outputs", action="store_const", const="leave", default="move", help="Leave output files in intermediate output directories.", dest="move_outputs") exgroup.add_argument("--copy-outputs", action="store_const", const="copy", default="move", help="Copy output files to the workflow output directory, don't delete intermediate output directories.", dest="move_outputs") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--enable-pull", default=True, action="store_true", help="Try to pull Docker images", dest="enable_pull") exgroup.add_argument("--disable-pull", default=True, action="store_false", help="Do not try to pull Docker images", dest="enable_pull") parser.add_argument("--rdf-serializer", help="Output RDF serialization format used by --print-rdf (one of turtle (default), n3, nt, xml)", default="turtle") parser.add_argument("--eval-timeout", help="Time to wait for a Javascript expression to evaluate before giving an error, default 20s.", type=float, default=20) exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--print-rdf", action="store_true", help="Print corresponding RDF graph for workflow and exit") exgroup.add_argument("--print-dot", action="store_true", help="Print workflow visualization in graphviz format and exit") exgroup.add_argument("--print-pre", action="store_true", help="Print CWL document after preprocessing.") exgroup.add_argument("--print-deps", action="store_true", help="Print CWL document dependencies.") exgroup.add_argument("--print-input-deps", action="store_true", help="Print input object document dependencies.") exgroup.add_argument("--pack", action="store_true", help="Combine components into single document and print.") exgroup.add_argument("--version", action="store_true", help="Print version and exit") exgroup.add_argument("--validate", action="store_true", help="Validate CWL document only.") exgroup.add_argument("--print-supported-versions", action="store_true", help="Print supported CWL specs.") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--strict", action="store_true", help="Strict validation (unrecognized or out of place fields are error)", default=True, dest="strict") exgroup.add_argument("--non-strict", action="store_false", help="Lenient validation (ignore unrecognized fields)", default=True, dest="strict") parser.add_argument("--skip-schemas", 
action="store_true", help="Skip loading of schemas", default=False, dest="skip_schemas") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--verbose", action="store_true", help="Default logging") exgroup.add_argument("--quiet", action="store_true", help="Only print warnings and errors.") exgroup.add_argument("--debug", action="store_true", help="Print even more logging") parser.add_argument("--timestamps", action="store_true", help="Add " "timestamps to the errors, warnings, and " "notifications.") parser.add_argument("--js-console", action="store_true", help="Enable javascript console output") dockergroup = parser.add_mutually_exclusive_group() dockergroup.add_argument("--user-space-docker-cmd", metavar="CMD", help="(Linux/OS X only) Specify a user space docker " "command (like udocker or dx-docker) that will be " "used to call 'pull' and 'run'") dockergroup.add_argument("--singularity", action="store_true", default=False, help="[experimental] Use " "Singularity runtime for running containers. " "Requires Singularity v2.3.2+ and Linux with kernel " "version v3.18+ or with overlayfs support " "backported.") dockergroup.add_argument("--no-container", action="store_false", default=True, help="Do not execute jobs in a " "Docker container, even when `DockerRequirement` " "is specified under `hints`.", dest="use_container") dependency_resolvers_configuration_help = argparse.SUPPRESS dependencies_directory_help = argparse.SUPPRESS use_biocontainers_help = argparse.SUPPRESS conda_dependencies = argparse.SUPPRESS if SOFTWARE_REQUIREMENTS_ENABLED: dependency_resolvers_configuration_help = "Dependency resolver configuration file describing how to adapt 'SoftwareRequirement' packages to current system." dependencies_directory_help = "Defaut root directory used by dependency resolvers configuration." use_biocontainers_help = "Use biocontainers for tools without an explicitly annotated Docker container." conda_dependencies = "Short cut to use Conda to resolve 'SoftwareRequirement' packages." 
parser.add_argument("--beta-dependency-resolvers-configuration", default=None, help=dependency_resolvers_configuration_help) parser.add_argument("--beta-dependencies-directory", default=None, help=dependencies_directory_help) parser.add_argument("--beta-use-biocontainers", default=None, help=use_biocontainers_help, action="store_true") parser.add_argument("--beta-conda-dependencies", default=None, help=conda_dependencies, action="store_true") parser.add_argument("--tool-help", action="store_true", help="Print command line help for tool") parser.add_argument("--relative-deps", choices=['primary', 'cwd'], default="primary", help="When using --print-deps, print paths " "relative to primary file or current working directory.") parser.add_argument("--enable-dev", action="store_true", help="Enable loading and running development versions " "of CWL spec.", default=False) parser.add_argument("--enable-ext", action="store_true", help="Enable loading and running cwltool extensions " "to CWL spec.", default=False) parser.add_argument("--default-container", help="Specify a default docker container that will be used if the workflow fails to specify one.") parser.add_argument("--no-match-user", action="store_true", help="Disable passing the current uid to `docker run --user`") parser.add_argument("--disable-net", action="store_true", help="Use docker's default networking for containers;" " the default is to enable networking.") parser.add_argument("--custom-net", type=Text, help="Will be passed to `docker run` as the '--net' " "parameter. Implies '--enable-net'.") exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--enable-ga4gh-tool-registry", action="store_true", help="Enable resolution using GA4GH tool registry API", dest="enable_ga4gh_tool_registry", default=True) exgroup.add_argument("--disable-ga4gh-tool-registry", action="store_false", help="Disable resolution using GA4GH tool registry API", dest="enable_ga4gh_tool_registry", default=True) parser.add_argument("--add-ga4gh-tool-registry", action="append", help="Add a GA4GH tool registry endpoint to use for resolution, default %s" % ga4gh_tool_registries, dest="ga4gh_tool_registries", default=[]) parser.add_argument("--on-error", help="Desired workflow behavior when a step fails. One of 'stop' or 'continue'. 
" "Default is 'stop'.", default="stop", choices=("stop", "continue")) exgroup = parser.add_mutually_exclusive_group() exgroup.add_argument("--compute-checksum", action="store_true", default=True, help="Compute checksum of contents while collecting outputs", dest="compute_checksum") exgroup.add_argument("--no-compute-checksum", action="store_false", help="Do not compute checksum of contents while collecting outputs", dest="compute_checksum") parser.add_argument("--relax-path-checks", action="store_true", default=False, help="Relax requirements on path names to permit " "spaces and hash characters.", dest="relax_path_checks") exgroup.add_argument("--make-template", action="store_true", help="Generate a template input object") parser.add_argument("--force-docker-pull", action="store_true", default=False, help="Pull latest docker image even if" " it is locally present", dest="force_docker_pull") parser.add_argument("--no-read-only", action="store_true", default=False, help="Do not set root directory in the" " container as read-only", dest="no_read_only") parser.add_argument("--overrides", type=str, default=None, help="Read process requirement overrides from file.") parser.add_argument("workflow", type=Text, nargs="?", default=None, metavar='cwl_document', help="path or URL to a CWL Workflow, " "CommandLineTool, or ExpressionTool. If the `inputs_object` has a " "`cwl:tool` field indicating the path or URL to the cwl_document, " " then the `workflow` argument is optional.") parser.add_argument("job_order", nargs=argparse.REMAINDER, metavar='inputs_object', help="path or URL to a YAML or JSON " "formatted description of the required input values for the given " "`cwl_document`.") return parser def get_default_args(): # type: () -> Dict[str, Any] """ Get default values of cwltool's command line options """ ap = arg_parser() args = ap.parse_args() return vars(args) class FSAction(argparse.Action): objclass = None # type: Text def __init__(self, option_strings, dest, nargs=None, **kwargs): # type: (List[Text], Text, Any, **Any) -> None if nargs is not None: raise ValueError("nargs not allowed") super(FSAction, self).__init__(option_strings, dest, **kwargs) def __call__(self, parser, namespace, values, option_string=None): # type: (argparse.ArgumentParser, argparse.Namespace, Union[AnyStr, Sequence[Any], None], AnyStr) -> None setattr(namespace, self.dest, # type: ignore {"class": self.objclass, "location": file_uri(str(os.path.abspath(cast(AnyStr, values))))}) class FSAppendAction(argparse.Action): objclass = None # type: Text def __init__(self, option_strings, dest, nargs=None, **kwargs): # type: (List[Text], Text, Any, **Any) -> None if nargs is not None: raise ValueError("nargs not allowed") super(FSAppendAction, self).__init__(option_strings, dest, **kwargs) def __call__(self, parser, namespace, values, option_string=None): # type: (argparse.ArgumentParser, argparse.Namespace, Union[AnyStr, Sequence[Any], None], AnyStr) -> None g = getattr(namespace, self.dest # type: ignore ) if not g: g = [] setattr(namespace, self.dest, # type: ignore g) g.append( {"class": self.objclass, "location": file_uri(str(os.path.abspath(cast(AnyStr, values))))}) class FileAction(FSAction): objclass = "File" class DirectoryAction(FSAction): objclass = "Directory" class FileAppendAction(FSAppendAction): objclass = "File" class DirectoryAppendAction(FSAppendAction): objclass = "Directory" def add_argument(toolparser, name, inptype, records, description="", default=None): # type: (argparse.ArgumentParser, Text, Any, 
List[Text], Text, Any) -> None if len(name) == 1: flag = "-" else: flag = "--" required = True if isinstance(inptype, list): if inptype[0] == "null": required = False if len(inptype) == 2: inptype = inptype[1] else: _logger.debug(u"Can't make command line argument from %s", inptype) return None ahelp = description.replace("%", "%%") action = None # type: Union[argparse.Action, Text] atype = None # type: Any if inptype == "File": action = cast(argparse.Action, FileAction) elif inptype == "Directory": action = cast(argparse.Action, DirectoryAction) elif isinstance(inptype, dict) and inptype["type"] == "array": if inptype["items"] == "File": action = cast(argparse.Action, FileAppendAction) elif inptype["items"] == "Directory": action = cast(argparse.Action, DirectoryAppendAction) else: action = "append" elif isinstance(inptype, dict) and inptype["type"] == "enum": atype = Text elif isinstance(inptype, dict) and inptype["type"] == "record": records.append(name) for field in inptype['fields']: fieldname = name + "." + shortname(field['name']) fieldtype = field['type'] fielddescription = field.get("doc", "") add_argument( toolparser, fieldname, fieldtype, records, fielddescription) return if inptype == "string": atype = Text elif inptype == "int": atype = int elif inptype == "double": atype = float elif inptype == "float": atype = float elif inptype == "boolean": action = "store_true" if default: required = False if not atype and not action: _logger.debug(u"Can't make command line argument from %s", inptype) return None if inptype != "boolean": typekw = {'type': atype} else: typekw = {} toolparser.add_argument( # type: ignore flag + name, required=required, help=ahelp, action=action, default=default, **typekw) def generate_parser(toolparser, tool, namemap, records): # type: (argparse.ArgumentParser, Process, Dict[Text, Text], List[Text]) -> argparse.ArgumentParser toolparser.add_argument("job_order", nargs="?", help="Job input json file") namemap["job_order"] = "job_order" for inp in tool.tool["inputs"]: name = shortname(inp["id"]) namemap[name.replace("-", "_")] = name inptype = inp["type"] description = inp.get("doc", "") default = inp.get("default", None) add_argument(toolparser, name, inptype, records, description, default) return toolparser cwltool-1.0.20180302231433/cwltool/sandboxjs.py0000644000175200017520000002717013247251315021635 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import errno import json import logging import os import re import select import subprocess import threading import sys from io import BytesIO from typing import Any, Dict, List, Mapping, Text, Tuple, Union from .utils import onWindows from pkg_resources import resource_stream import six try: import queue # type: ignore except ImportError: import Queue as queue # type: ignore class JavascriptException(Exception): pass _logger = logging.getLogger("cwltool") JSON = Union[Dict[Text, Any], List[Any], Text, int, float, bool, None] localdata = threading.local() have_node_slim = False # minimum acceptable version of nodejs engine minimum_node_version_str = '0.10.26' def check_js_threshold_version(working_alias): # type: (str) -> bool """Checks that the nodeJS engine version on the system complies with the allowed minimum version.
https://github.com/nodejs/node/blob/master/CHANGELOG.md#nodejs-changelog """ # parse nodejs version into int Tuple: 'v4.2.6\n' -> [4, 2, 6] current_version_str = subprocess.check_output( [working_alias, "-v"]).decode('utf-8') current_version = [int(v) for v in current_version_str.strip().strip('v').split('.')] minimum_node_version = [int(v) for v in minimum_node_version_str.split('.')] if current_version >= minimum_node_version: return True else: return False def new_js_proc(force_docker_pull=False, js_console=False): # type: (bool, bool) -> subprocess.Popen cwl_node_engine_js = 'cwlNodeEngine.js' if js_console: cwl_node_engine_js = 'cwlNodeEngineJSConsole.js' _logger.warn("Running with support for javascript console in expressions (DO NOT USE IN PRODUCTION)") res = resource_stream(__name__, cwl_node_engine_js) nodecode = res.read().decode('utf-8') required_node_version, docker = (False,)*2 nodejs = None trynodes = ("nodejs", "node") for n in trynodes: try: if subprocess.check_output([n, "--eval", "process.stdout.write('t')"]).decode('utf-8') != "t": continue else: nodejs = subprocess.Popen([n, "--eval", nodecode], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) required_node_version = check_js_threshold_version(n) break except subprocess.CalledProcessError: pass except OSError as e: if e.errno == errno.ENOENT: pass else: raise if nodejs is None or nodejs is not None and required_node_version is False: try: nodeimg = "node:slim" global have_node_slim if not have_node_slim: dockerimgs = subprocess.check_output(["docker", "images", "-q", nodeimg]).decode('utf-8') # if output is an empty string if (len(dockerimgs.split("\n")) <= 1) or force_docker_pull: # pull node:slim docker container nodejsimg = subprocess.check_output(["docker", "pull", nodeimg]).decode('utf-8') _logger.info("Pulled Docker image %s %s", nodeimg, nodejsimg) have_node_slim = True nodejs = subprocess.Popen(["docker", "run", "--attach=STDIN", "--attach=STDOUT", "--attach=STDERR", "--sig-proxy=true", "--interactive", "--rm", nodeimg, "node", "--eval", nodecode], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE) docker = True except OSError as e: if e.errno == errno.ENOENT: pass else: raise except subprocess.CalledProcessError: pass # docker failed and nodejs not on system if nodejs is None: raise JavascriptException( u"cwltool requires Node.js engine to evaluate Javascript " "expressions, but couldn't find it. 
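# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): the version gate in
# check_js_threshold_version above works because Python compares lists of
# ints element-wise.  Extracted as a standalone helper:
#
#     def parse_node_version(v):          # e.g. v = 'v4.2.6\n'
#         return [int(part) for part in v.strip().strip('v').split('.')]
#
#     assert parse_node_version('v4.2.6\n') == [4, 2, 6]
#     assert parse_node_version('v4.2.6\n') >= parse_node_version('0.10.26')
#     assert not (parse_node_version('v0.10.25\n') >= parse_node_version('0.10.26'))
# ---------------------------------------------------------------------------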
Tried %s, docker run " "node:slim" % u", ".join(trynodes)) # docker failed, but nodejs is installed on system but the version is below the required version if docker is False and required_node_version is False: raise JavascriptException( u'cwltool requires minimum v{} version of Node.js engine.'.format(minimum_node_version_str), u'Try updating: https://docs.npmjs.com/getting-started/installing-node') return nodejs def execjs(js, jslib, timeout=None, force_docker_pull=False, debug=False, js_console=False): # type: (Union[Mapping, Text], Any, int, bool, bool, bool) -> JSON if not hasattr(localdata, "proc") or localdata.proc.poll() is not None or onWindows(): localdata.proc = new_js_proc(force_docker_pull=force_docker_pull, js_console=js_console) nodejs = localdata.proc fn = u"\"use strict\";\n%s\n(function()%s)()" %\ (jslib, js if isinstance(js, six.string_types) and len(js) > 1 and js[0] == '{' else ("{return (%s);}" % js)) killed = [] """ Kill the node process if it exceeds timeout limit""" def terminate(): try: killed.append(True) nodejs.kill() except OSError: pass if timeout is None: timeout = 20 tm = threading.Timer(timeout, terminate) tm.start() stdin_buf = BytesIO((json.dumps(fn) + "\n").encode('utf-8')) stdout_buf = BytesIO() stderr_buf = BytesIO() rselect = [nodejs.stdout, nodejs.stderr] # type: List[BytesIO] wselect = [nodejs.stdin] # type: List[BytesIO] PROCESS_FINISHED_STR = "r1cepzbhUTxtykz5XTC4\n" def process_finished(): # type: () -> bool return stdout_buf.getvalue().decode().endswith(PROCESS_FINISHED_STR) and \ stderr_buf.getvalue().decode().endswith(PROCESS_FINISHED_STR) # On windows systems standard input/output are not handled properly by the select module # (modules like pywin32, msvcrt, gevent don't work either) if sys.platform=='win32': READ_BYTES_SIZE = 512 # create queues for passing data to and from the reader and writer threads input_queue = queue.Queue() output_queue = queue.Queue() error_queue = queue.Queue() # To tell threads that output has ended and threads can safely exit no_more_output = threading.Lock() no_more_output.acquire() no_more_error = threading.Lock() no_more_error.acquire() # put the constructed command on the input queue, from which it is passed to nodejs's stdin def put_input(input_queue): while True: b = stdin_buf.read(READ_BYTES_SIZE) if b: input_queue.put(b) else: break # read from nodejs's stdout and continue until the output ends def get_output(output_queue): while not no_more_output.acquire(False): b = os.read(nodejs.stdout.fileno(), READ_BYTES_SIZE) if b: output_queue.put(b) # read from nodejs's stderr and continue until the error output ends def get_error(error_queue): while not no_more_error.acquire(False): b = os.read(nodejs.stderr.fileno(), READ_BYTES_SIZE) if b: error_queue.put(b) # Threads managing nodejs.stdin, nodejs.stdout and nodejs.stderr respectively input_thread = threading.Thread(target=put_input, args=(input_queue,)) input_thread.daemon=True input_thread.start() output_thread = threading.Thread(target=get_output, args=(output_queue,)) output_thread.daemon=True output_thread.start() error_thread = threading.Thread(target=get_error, args=(error_queue,)) error_thread.daemon=True error_thread.start() finished = False while not finished and tm.is_alive(): try: if nodejs.stdin in wselect: if not input_queue.empty(): os.write(nodejs.stdin.fileno(), input_queue.get()) elif not input_thread.is_alive(): wselect = [] if nodejs.stdout in rselect: if not output_queue.empty(): stdout_buf.write(output_queue.get()) if nodejs.stderr in rselect: if not
error_queue.empty(): stderr_buf.write(error_queue.get()) if process_finished() and error_queue.empty() and output_queue.empty(): finished = True no_more_output.release() no_more_error.release() except OSError as e: break else: while not process_finished() and tm.is_alive(): rready, wready, _ = select.select(rselect, wselect, []) try: if nodejs.stdin in wready: b = stdin_buf.read(select.PIPE_BUF) if b: os.write(nodejs.stdin.fileno(), b) for pipes in ((nodejs.stdout, stdout_buf), (nodejs.stderr, stderr_buf)): if pipes[0] in rready: b = os.read(pipes[0].fileno(), select.PIPE_BUF) if b: pipes[1].write(b) except OSError as e: break tm.cancel() stdin_buf.close() stdoutdata = stdout_buf.getvalue()[:-len(PROCESS_FINISHED_STR) - 1] stderrdata = stderr_buf.getvalue()[:-len(PROCESS_FINISHED_STR) - 1] def fn_linenum(): # type: () -> Text lines = fn.splitlines() ofs = 0 maxlines = 99 if len(lines) > maxlines: ofs = len(lines) - maxlines lines = lines[-maxlines:] return u"\n".join(u"%02i %s" % (i + ofs + 1, b) for i, b in enumerate(lines)) def stdfmt(data): # type: (Text) -> Text if "\n" in data: return "\n" + data.strip() return data nodejs.poll() if js_console: if len(stderrdata) > 0: _logger.info("Javascript console output:") _logger.info("----------------------------------------") _logger.info('\n'.join(re.findall(r'^[[](?:log|err)[]].*$', stderrdata.decode('utf-8'), flags=re.MULTILINE))) _logger.info("----------------------------------------") if debug: info = u"returncode was: %s\nscript was:\n%s\nstdout was: %s\nstderr was: %s\n" %\ (nodejs.returncode, fn_linenum(), stdfmt(stdoutdata.decode('utf-8')), stdfmt(stderrdata.decode('utf-8'))) else: info = u"Javascript expression was: %s\nstdout was: %s\nstderr was: %s" %\ (js, stdfmt(stdoutdata.decode('utf-8')), stdfmt(stderrdata.decode('utf-8'))) if nodejs.poll() not in (None, 0): if killed: raise JavascriptException(u"Long-running script killed after %s seconds: %s" % (timeout, info)) else: raise JavascriptException(info) else: try: # On windows currently a new instance of nodejs process is used due to problem with blocking on read operation on windows if onWindows(): nodejs.kill() return json.loads(stdoutdata.decode('utf-8')) except ValueError as e: raise JavascriptException(u"%s\nscript was:\n%s\nstdout was: '%s'\nstderr was: '%s'\n" % (e, fn_linenum(), stdoutdata, stderrdata)) cwltool-1.0.20180302231433/cwltool/mutation.py0000644000175200017520000000604113247251315021474 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import from collections import namedtuple from typing import Any, Callable, Dict, Generator, Iterable, List, Text, Union, cast from .errors import WorkflowException MutationState = namedtuple("MutationTracker", ["generation", "readers", "stepname"]) _generation = "http://commonwl.org/cwltool#generation" class MutationManager(object): """Lock manager for checking correctness of in-place update of files. Used to validate that in-place file updates happen sequentially, and that a file which is registered for in-place update cannot be read or updated by any other steps. 
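# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): how execjs above
# wraps a CWL expression before sending it to node.  A bare expression is
# placed in an IIFE; a code body that already starts with '{' is kept as-is:
#
#     jslib, js = "", "1+1"
#     body = js if len(js) > 1 and js[0] == '{' else ("{return (%s);}" % js)
#     fn = u"\"use strict\";\n%s\n(function()%s)()" % (jslib, body)
#     # fn == '"use strict";\n\n(function(){return (1+1);})()'
# ---------------------------------------------------------------------------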
""" def __init__(self): # type: () -> None self.generations = {} # type: Dict[Text, MutationState] def register_reader(self, stepname, obj): # type: (Text, Dict[Text, Any]) -> None loc = obj["location"] current = self.generations.get(loc, MutationState(0, [], "")) obj_generation = obj.get(_generation, 0) if obj_generation != current.generation: raise WorkflowException("[job %s] wants to read %s from generation %i but current generation is %s (last updated by %s)" % ( stepname, loc, obj_generation, current.generation, current.stepname)) current.readers.append(stepname) self.generations[loc] = current def release_reader(self, stepname, obj): # type: (Text, Dict[Text, Any]) -> None loc = obj["location"] current = self.generations.get(loc, MutationState(0, [], "")) obj_generation = obj.get(_generation, 0) if obj_generation != current.generation: raise WorkflowException("[job %s] wants to release reader on %s from generation %i but current generation is %s (last updated by %s)" % ( stepname, loc, obj_generation, current.generation, current.stepname)) self.generations[loc].readers.remove(stepname) def register_mutation(self, stepname, obj): # type: (Text, Dict[Text, Any]) -> None loc = obj["location"] current = self.generations.get(loc, MutationState(0,[], "")) obj_generation = obj.get(_generation, 0) if len(current.readers) > 0: raise WorkflowException("[job %s] wants to modify %s but has readers: %s" % ( stepname, loc, current.readers)) if obj_generation != current.generation: raise WorkflowException("[job %s] wants to modify %s from generation %i but current generation is %s (last updated by %s)" % ( stepname, loc, obj_generation, current.generation, current.stepname)) self.generations[loc] = MutationState(current.generation+1, current.readers, stepname) def set_generation(self, obj): # type: (Dict) -> None loc = obj["location"] current = self.generations.get(loc, MutationState(0,[], "")) obj[_generation] = current.generation def unset_generation(self, obj): # type: (Dict) -> None obj.pop(_generation, None) cwltool-1.0.20180302231433/cwltool/docker_id.py0000644000175200017520000001021113247251315021551 0ustar mcrusoemcrusoe00000000000000from __future__ import print_function from __future__ import absolute_import import subprocess from typing import List, Text, Tuple def docker_vm_id(): # type: () -> Tuple[int, int] """ Returns the User ID and Group ID of the default docker user inside the VM When a host is using boot2docker or docker-machine to run docker with boot2docker.iso (As on Mac OS X), the UID that mounts the shared filesystem inside the VirtualBox VM is likely different than the user's UID on the host. 
:return: A tuple containing numeric User ID and Group ID of the docker account inside the boot2docker VM """ if boot2docker_running(): return boot2docker_id() elif docker_machine_running(): return docker_machine_id() else: return (None, None) def check_output_and_strip(cmd): # type: (List[Text]) -> Text """ Passes a command list to subprocess.check_output, returning None if an expected exception is raised :param cmd: The command to execute :return: Stripped string output of the command, or None if error """ try: result = subprocess.check_output(cmd, stderr=subprocess.STDOUT) return result.strip() except (OSError, subprocess.CalledProcessError, TypeError, AttributeError): # OSError is raised if command doesn't exist # CalledProcessError is raised if command returns nonzero # AttributeError is raised if result cannot be strip()ped return None def docker_machine_name(): # type: () -> Text """ Get the machine name of the active docker-machine machine :return: Name of the active machine or None if error """ return check_output_and_strip(['docker-machine', 'active']) def cmd_output_matches(check_cmd, expected_status): # type: (List[Text], Text) -> bool """ Runs a command and compares output to expected :param check_cmd: Command list to execute :param expected_status: Expected output, e.g. "Running" or "poweroff" :return: Boolean value, indicating whether or not command result matched """ if check_output_and_strip(check_cmd) == expected_status: return True else: return False def boot2docker_running(): # type: () -> bool """ Checks if boot2docker CLI reports that boot2docker vm is running :return: True if vm is running, False otherwise """ return cmd_output_matches(['boot2docker', 'status'], 'running') def docker_machine_running(): # type: () -> bool """ Asks docker-machine for active machine and checks if its VM is running :return: True if vm is running, False otherwise """ machine_name = docker_machine_name() return cmd_output_matches(['docker-machine', 'status', machine_name], 'Running') def cmd_output_to_int(cmd): # type: (List[Text]) -> int """ Runs the provided command and returns the integer value of the result :param cmd: The command to run :return: Integer value of result, or None if an error occurred """ result = check_output_and_strip(cmd) # may return None if result is not None: try: return int(result) except ValueError: # ValueError is raised if int conversion fails return None return None def boot2docker_id(): # type: () -> Tuple[int, int] """ Gets the UID and GID of the docker user inside a running boot2docker vm :return: Tuple (UID, GID), or (None, None) if error (e.g. boot2docker not present or stopped) """ uid = cmd_output_to_int(['boot2docker', 'ssh', 'id', '-u']) gid = cmd_output_to_int(['boot2docker', 'ssh', 'id', '-g']) return (uid, gid) def docker_machine_id(): # type: () -> Tuple[int, int] """ Asks docker-machine for active machine and gets the UID of the docker user inside the vm :return: tuple (UID, GID), or (None, None) if error (e.g. 
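# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): every probe above is
# funnelled through check_output_and_strip, so a missing binary, a nonzero
# exit status, or unparseable output all collapse to None instead of raising:
#
#     uid = cmd_output_to_int(['boot2docker', 'ssh', 'id', '-u'])
#     if uid is None:
#         pass   # boot2docker absent or stopped; caller falls back to (None, None)
# ---------------------------------------------------------------------------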
docker-machine not present or stopped) """ machine_name = docker_machine_name() uid = cmd_output_to_int(['docker-machine', 'ssh', machine_name, "id -u"]) gid = cmd_output_to_int(['docker-machine', 'ssh', machine_name, "id -g"]) return (uid, gid) if __name__ == '__main__': print(docker_vm_id()) cwltool-1.0.20180302231433/cwltool/builder.py0000644000175200017520000003026113247251315021263 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import copy import os import logging from typing import Any, Callable, Dict, List, Text, Type, Union import six from six import iteritems, string_types import avro import schema_salad.validate as validate from schema_salad.sourceline import SourceLine from . import expression from .errors import WorkflowException from .mutation import MutationManager from .pathmapper import (PathMapper, get_listing, normalizeFilesDirs, visit_class) from .stdfsaccess import StdFsAccess from .utils import aslist, get_feature, docker_windows_path_adjust, onWindows _logger = logging.getLogger("cwltool") AvroSchemaFromJSONData = avro.schema.make_avsc_object CONTENT_LIMIT = 64 * 1024 def substitute(value, replace): # type: (Text, Text) -> Text if replace[0] == "^": return substitute(value[0:value.rindex('.')], replace[1:]) else: return value + replace class Builder(object): def __init__(self): # type: () -> None self.names = None # type: avro.schema.Names self.schemaDefs = None # type: Dict[Text, Dict[Text, Any]] self.files = None # type: List[Dict[Text, Text]] self.fs_access = None # type: StdFsAccess self.job = None # type: Dict[Text, Union[Dict[Text, Any], List, Text]] self.requirements = None # type: List[Dict[Text, Any]] self.hints = None # type: List[Dict[Text, Any]] self.outdir = None # type: Text self.tmpdir = None # type: Text self.resources = None # type: Dict[Text, Union[int, Text]] self.bindings = [] # type: List[Dict[Text, Any]] self.timeout = None # type: int self.pathmapper = None # type: PathMapper self.stagedir = None # type: Text self.make_fs_access = None # type: Type[StdFsAccess] self.debug = False # type: bool self.js_console = False # type: bool self.mutation_manager = None # type: MutationManager self.force_docker_pull = False # type: bool # One of "no_listing", "shallow_listing", "deep_listing" # Will be default "no_listing" for CWL v1.1 self.loadListing = "deep_listing" # type: Union[None, str] self.find_default_container = None # type: Callable[[], Text] self.job_script_provider = None # type: Any def build_job_script(self, commands): # type: (List[Text]) -> Text build_job_script_method = getattr(self.job_script_provider, "build_job_script", None) # type: Callable[[Builder, Union[List[str],List[Text]]], Text] if build_job_script_method: return build_job_script_method(self, commands) else: return None def bind_input(self, schema, datum, lead_pos=None, tail_pos=None): # type: (Dict[Text, Any], Any, Union[int, List[int]], List[int]) -> List[Dict[Text, Any]] if tail_pos is None: tail_pos = [] if lead_pos is None: lead_pos = [] bindings = [] # type: List[Dict[Text,Text]] binding = None # type: Dict[Text,Any] value_from_expression = False if "inputBinding" in schema and isinstance(schema["inputBinding"], dict): binding = copy.copy(schema["inputBinding"]) if "position" in binding: binding["position"] = aslist(lead_pos) + aslist(binding["position"]) + aslist(tail_pos) else: binding["position"] = aslist(lead_pos) + [0] + aslist(tail_pos) binding["datum"] = datum if "valueFrom" in binding: value_from_expression = True # Handle union types if 
isinstance(schema["type"], list): if not value_from_expression: for t in schema["type"]: if isinstance(t, (str, Text)) and self.names.has_name(t, ""): avsc = self.names.get_name(t, "") elif isinstance(t, dict) and "name" in t and self.names.has_name(t["name"], ""): avsc = self.names.get_name(t["name"], "") else: avsc = AvroSchemaFromJSONData(t, self.names) if validate.validate(avsc, datum): schema = copy.deepcopy(schema) schema["type"] = t return self.bind_input(schema, datum, lead_pos=lead_pos, tail_pos=tail_pos) raise validate.ValidationException(u"'%s' is not a valid union %s" % (datum, schema["type"])) elif isinstance(schema["type"], dict): if not value_from_expression: st = copy.deepcopy(schema["type"]) if binding and "inputBinding" not in st and st["type"] == "array" and "itemSeparator" not in binding: st["inputBinding"] = {} for k in ("secondaryFiles", "format", "streamable"): if k in schema: st[k] = schema[k] bindings.extend(self.bind_input(st, datum, lead_pos=lead_pos, tail_pos=tail_pos)) else: if schema["type"] in self.schemaDefs: schema = self.schemaDefs[schema["type"]] if schema["type"] == "record": for f in schema["fields"]: if f["name"] in datum: bindings.extend(self.bind_input(f, datum[f["name"]], lead_pos=lead_pos, tail_pos=f["name"])) else: datum[f["name"]] = f.get("default") if schema["type"] == "array": for n, item in enumerate(datum): b2 = None if binding: b2 = copy.deepcopy(binding) b2["datum"] = item itemschema = { u"type": schema["items"], u"inputBinding": b2 } for k in ("secondaryFiles", "format", "streamable"): if k in schema: itemschema[k] = schema[k] bindings.extend( self.bind_input(itemschema, item, lead_pos=n, tail_pos=tail_pos)) binding = None if schema["type"] == "File": self.files.append(datum) if binding: if binding.get("loadContents"): with self.fs_access.open(datum["location"], "rb") as f: datum["contents"] = f.read(CONTENT_LIMIT) if "secondaryFiles" in schema: if "secondaryFiles" not in datum: datum["secondaryFiles"] = [] for sf in aslist(schema["secondaryFiles"]): if isinstance(sf, dict) or "$(" in sf or "${" in sf: sfpath = self.do_eval(sf, context=datum) else: sfpath = substitute(datum["basename"], sf) for sfname in aslist(sfpath): found = False for d in datum["secondaryFiles"]: if not d.get("basename"): d["basename"] = d["location"][d["location"].rindex("/")+1:] if d["basename"] == sfname: found = True if not found: if isinstance(sfname, dict): datum["secondaryFiles"].append(sfname) else: datum["secondaryFiles"].append({ "location": datum["location"][0:datum["location"].rindex("/")+1]+sfname, "basename": sfname, "class": "File"}) normalizeFilesDirs(datum["secondaryFiles"]) def _capture_files(f): self.files.append(f) return f visit_class(datum.get("secondaryFiles", []), ("File", "Directory"), _capture_files) if schema["type"] == "Directory": ll = self.loadListing or (binding and binding.get("loadListing")) if ll and ll != "no_listing": get_listing(self.fs_access, datum, (ll == "deep_listing")) self.files.append(datum) # Position to front of the sort key if binding: for bi in bindings: bi["position"] = binding["position"] + bi["position"] bindings.append(binding) return bindings def tostr(self, value): # type: (Any) -> Text if isinstance(value, dict) and value.get("class") in ("File", "Directory"): if "path" not in value: raise WorkflowException(u"%s object missing \"path\": %s" % (value["class"], value)) # Path adjust for windows file path when passing to docker, docker accepts unix like path only (docker_req, docker_is_req) = get_feature(self, 
"DockerRequirement") if onWindows() and docker_req is not None: # docker_req is none only when there is no dockerRequirement mentioned in hints and Requirement return docker_windows_path_adjust(value["path"]) return value["path"] else: return Text(value) def generate_arg(self, binding): # type: (Dict[Text,Any]) -> List[Text] value = binding.get("datum") if "valueFrom" in binding: with SourceLine(binding, "valueFrom", WorkflowException, _logger.isEnabledFor(logging.DEBUG)): value = self.do_eval(binding["valueFrom"], context=value) prefix = binding.get("prefix") sep = binding.get("separate", True) if prefix is None and not sep: with SourceLine(binding, "separate", WorkflowException, _logger.isEnabledFor(logging.DEBUG)): raise WorkflowException("'separate' option can not be specified without prefix") l = [] # type: List[Dict[Text,Text]] if isinstance(value, list): if binding.get("itemSeparator") and value: l = [binding["itemSeparator"].join([self.tostr(v) for v in value])] elif binding.get("valueFrom"): value = [self.tostr(v) for v in value] return ([prefix] if prefix else []) + value elif prefix and value: return [prefix] else: return [] elif isinstance(value, dict) and value.get("class") in ("File", "Directory"): l = [value] elif isinstance(value, dict): return [prefix] if prefix else [] elif value is True and prefix: return [prefix] elif value is False or value is None or (value is True and not prefix): return [] else: l = [value] args = [] for j in l: if sep: args.extend([prefix, self.tostr(j)]) else: args.append(prefix + self.tostr(j)) return [a for a in args if a is not None] def do_eval(self, ex, context=None, pull_image=True, recursive=False, strip_whitespace=True): # type: (Union[Dict[Text, Text], Text], Any, bool, bool, bool) -> Any if recursive: if isinstance(ex, dict): return {k: self.do_eval(v, context, pull_image, recursive) for k, v in iteritems(ex)} if isinstance(ex, list): return [self.do_eval(v, context, pull_image, recursive) for v in ex] if context is None and type(ex) is str and "self" in ex: return None return expression.do_eval(ex, self.job, self.requirements, self.outdir, self.tmpdir, self.resources, context=context, pull_image=pull_image, timeout=self.timeout, debug=self.debug, js_console=self.js_console, force_docker_pull=self.force_docker_pull, strip_whitespace=strip_whitespace) cwltool-1.0.20180302231433/cwltool/singularity.py0000644000175200017520000001560513247251316022215 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import logging import os import re import shutil import subprocess import sys from io import open from typing import (Dict, List, Text, MutableMapping, Any) from .errors import WorkflowException from .job import ContainerCommandLineJob from .pathmapper import PathMapper, ensure_writable from .process import (UnsupportedRequirement) from .utils import docker_windows_path_adjust _logger = logging.getLogger("cwltool") class SingularityCommandLineJob(ContainerCommandLineJob): @staticmethod def get_image(dockerRequirement, pull_image, dry_run=False): # type: (Dict[Text, Text], bool, bool) -> bool found = False if "dockerImageId" not in dockerRequirement and "dockerPull" in dockerRequirement: match = re.search(pattern=r'([a-z]*://)', string=dockerRequirement["dockerPull"]) if match: dockerRequirement["dockerImageId"] = re.sub(pattern=r'([a-z]*://)', repl=r'', string=dockerRequirement["dockerPull"]) dockerRequirement["dockerImageId"] = re.sub(pattern=r'[:/]', repl=r'-', string=dockerRequirement["dockerImageId"]) + ".img" else: 
dockerRequirement["dockerImageId"] = re.sub(pattern=r'[:/]', repl=r'-', string=dockerRequirement["dockerPull"]) + ".img" dockerRequirement["dockerPull"] = "docker://" + dockerRequirement["dockerPull"] # check to see if the Singularity container is already downloaded if os.path.isfile(dockerRequirement["dockerImageId"]): _logger.info("Using local copy of Singularity image") found = True # if the .img file is not already present, pull the image elif pull_image: cmd = [] # type: List[Text] if "dockerPull" in dockerRequirement: cmd = ["singularity", "pull", "--name", str(dockerRequirement["dockerImageId"]), str(dockerRequirement["dockerPull"])] _logger.info(Text(cmd)) if not dry_run: subprocess.check_call(cmd, stdout=sys.stderr) found = True return found def get_from_requirements(self, r, req, pull_image, dry_run=False): # type: (Dict[Text, Text], bool, bool, bool) -> Text # returns the filename of the Singularity image (e.g. hello-world-latest.img) if r: errmsg = None try: subprocess.check_output(["singularity", "--version"]) except subprocess.CalledProcessError as e: errmsg = "Cannot execute 'singularity --version' " + Text(e) except OSError as e: errmsg = "'singularity' executable not found: " + Text(e) if errmsg: if req: raise WorkflowException(errmsg) else: return None if self.get_image(r, pull_image, dry_run): return os.path.abspath(r["dockerImageId"]) else: if req: raise WorkflowException(u"Container image %s not found" % r["dockerImageId"]) return None def add_volumes(self, pathmapper, runtime, stage_output): # type: (PathMapper, List[Text], bool) -> None host_outdir = self.outdir container_outdir = self.builder.outdir for src, vol in pathmapper.items(): if not vol.staged: continue if stage_output: containertgt = container_outdir + vol.target[len(host_outdir):] else: containertgt = vol.target if vol.target.startswith(container_outdir + "/"): host_outdir_tgt = os.path.join( host_outdir, vol.target[len(container_outdir) + 1:]) else: host_outdir_tgt = None if vol.type in ("File", "Directory"): if not vol.resolved.startswith("_:"): runtime.append(u"--bind") runtime.append("%s:%s:ro" % ( docker_windows_path_adjust(vol.resolved), docker_windows_path_adjust(containertgt))) elif vol.type == "WritableFile": if self.inplace_update: runtime.append(u"--bind") runtime.append("%s:%s:rw" % ( docker_windows_path_adjust(vol.resolved), docker_windows_path_adjust(containertgt))) else: shutil.copy(vol.resolved, host_outdir_tgt) ensure_writable(host_outdir_tgt) elif vol.type == "WritableDirectory": if vol.resolved.startswith("_:"): os.makedirs(host_outdir_tgt, 0o0755) else: if self.inplace_update: runtime.append(u"--bind") runtime.append("%s:%s:rw" % ( docker_windows_path_adjust(vol.resolved), docker_windows_path_adjust(containertgt))) else: shutil.copytree(vol.resolved, vol.target) elif vol.type == "CreateFile": createtmp = os.path.join(host_outdir, os.path.basename(vol.target)) with open(createtmp, "wb") as f: f.write(vol.resolved.encode("utf-8")) runtime.append(u"--bind") runtime.append( "%s:%s:ro" % (docker_windows_path_adjust(createtmp), docker_windows_path_adjust(vol.target))) def create_runtime(self, env, rm_container=True, record_container_id=False, cidfile_dir="", cidfile_prefix="", **kwargs): # type: (MutableMapping[Text, Text], bool, bool, Text, Text, **Any) -> List runtime = [u"singularity", u"--quiet", u"exec"] runtime.append(u"--bind") runtime.append( u"%s:%s:rw" % (docker_windows_path_adjust(os.path.realpath(self.outdir)), self.builder.outdir)) runtime.append(u"--bind") 
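# ---------------------------------------------------------------------------
# Illustrative note (not part of the original source): add_volumes and
# create_runtime above map each host path into the container with a
# `--bind src:dst:mode` triple.  For example (paths are hypothetical), the
# runtime list being assembled at this point ends up roughly as:
#
#     ["singularity", "--quiet", "exec",
#      "--bind", "/tmp/outXYZ:/var/spool/cwl:rw",
#      "--bind", "/tmp/tmpABC:/tmp:rw"]
# ---------------------------------------------------------------------------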
runtime.append(u"%s:%s:rw" % (docker_windows_path_adjust(os.path.realpath(self.tmpdir)), "/tmp")) self.add_volumes(self.pathmapper, runtime, stage_output=False) if self.generatemapper: self.add_volumes(self.generatemapper, runtime, stage_output=True) runtime.append(u"--pwd") runtime.append("%s" % (docker_windows_path_adjust(self.builder.outdir))) if kwargs.get("custom_net", None) is not None: raise UnsupportedRequirement( "Singularity implementation does not support networking") env["SINGULARITYENV_TMPDIR"] = "/tmp" env["SINGULARITYENV_HOME"] = self.builder.outdir for t, v in self.environment.items(): env["SINGULARITYENV_" + t] = v return runtime cwltool-1.0.20180302231433/cwltool/extensions.yml0000644000175200017520000000152113247251315022202 0ustar mcrusoemcrusoe00000000000000$base: http://commonwl.org/cwltool# $namespaces: cwl: "https://w3id.org/cwl/cwl#" $graph: - $import: https://w3id.org/cwl/CommonWorkflowLanguage.yml - name: LoadListingRequirement type: record extends: cwl:ProcessRequirement inVocab: false fields: class: type: string doc: "Always 'LoadListingRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" loadListing: type: - type: enum name: LoadListingEnum symbols: [no_listing, shallow_listing, deep_listing] - name: InplaceUpdateRequirement type: record inVocab: false extends: cwl:ProcessRequirement fields: class: type: string doc: "Always 'InplaceUpdateRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" inplaceUpdate: type: boolean cwltool-1.0.20180302231433/cwltool/command_line_tool.py0000644000175200017520000010054313247251315023320 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import copy import hashlib import locale import json import logging import os import re import shutil import tempfile from functools import partial, cmp_to_key from typing import (Any, Callable, Dict, Generator, List, Optional, Set, Text, Union, cast) from six import string_types, u import schema_salad.validate as validate import shellescape from schema_salad.ref_resolver import file_uri, uri_file_path from schema_salad.sourceline import SourceLine, indent from six.moves import urllib from .builder import CONTENT_LIMIT, Builder, substitute from .docker import DockerCommandLineJob from .errors import WorkflowException from .flatten import flatten from .job import CommandLineJob, JobBase from .pathmapper import (PathMapper, adjustDirObjs, adjustFileObjs, get_listing, trim_listing, visit_class) from .process import (Process, UnsupportedRequirement, _logger_validation_warnings, compute_checksums, normalizeFilesDirs, shortname, uniquename) from .singularity import SingularityCommandLineJob from .stdfsaccess import StdFsAccess from .utils import aslist, docker_windows_path_adjust, convert_pathsep_to_unix, windows_default_container_id, onWindows from six.moves import map ACCEPTLIST_EN_STRICT_RE = re.compile(r"^[a-zA-Z0-9._+-]+$") ACCEPTLIST_EN_RELAXED_RE = re.compile(r".*") # Accept anything ACCEPTLIST_RE = ACCEPTLIST_EN_STRICT_RE DEFAULT_CONTAINER_MSG="""We are on Microsoft Windows and not all components of this CWL description have a container specified. This means that these steps will be executed in the default container, which is %s. Note, this could affect portability if this CWL description relies on non-POSIX features or commands in this container. 
For best results add the following to your CWL description's hints section: hints: DockerRequirement: dockerPull: %s """ _logger = logging.getLogger("cwltool") class ExpressionTool(Process): def __init__(self, toolpath_object, **kwargs): # type: (Dict[Text, Any], **Any) -> None super(ExpressionTool, self).__init__(toolpath_object, **kwargs) class ExpressionJob(object): def __init__(self): # type: () -> None self.builder = None # type: Builder self.requirements = None # type: Dict[Text, Text] self.hints = None # type: Dict[Text, Text] self.collect_outputs = None # type: Callable[[Any], Any] self.output_callback = None # type: Callable[[Any, Any], Any] self.outdir = None # type: Text self.tmpdir = None # type: Text self.script = None # type: Dict[Text, Text] def run(self, **kwargs): # type: (**Any) -> None try: ev = self.builder.do_eval(self.script) normalizeFilesDirs(ev) self.output_callback(ev, "success") except Exception as e: _logger.warning(u"Failed to evaluate expression:\n%s", e, exc_info=kwargs.get('debug')) self.output_callback({}, "permanentFail") def job(self, job_order, # type: Dict[Text, Text] output_callbacks, # type: Callable[[Any, Any], Any] **kwargs # type: Any ): # type: (...) -> Generator[ExpressionTool.ExpressionJob, None, None] builder = self._init_job(job_order, **kwargs) j = ExpressionTool.ExpressionJob() j.builder = builder j.script = self.tool["expression"] j.output_callback = output_callbacks j.requirements = self.requirements j.hints = self.hints j.outdir = None j.tmpdir = None yield j def remove_path(f): # type: (Dict[Text, Any]) -> None if "path" in f: del f["path"] def revmap_file(builder, outdir, f): # type: (Builder, Text, Dict[Text, Any]) -> Union[Dict[Text, Any], None] """Remap a file from internal path to external path. For Docker, this maps from the path inside the container to the path outside the container. Recognizes files in the pathmapper or remaps internal output directories to the external directory. """ split = urllib.parse.urlsplit(outdir) if not split.scheme: outdir = file_uri(str(outdir)) # builder.outdir is the inner (container/compute node) output directory # outdir is the outer (host/storage system) output directory if "location" in f and "path" not in f: if f["location"].startswith("file://"): f["path"] = convert_pathsep_to_unix(uri_file_path(f["location"])) else: return f if "path" in f: path = f["path"] uripath = file_uri(path) del f["path"] if "basename" not in f: f["basename"] = os.path.basename(path) revmap_f = builder.pathmapper.reversemap(path) if revmap_f and not builder.pathmapper.mapper(revmap_f[0]).type.startswith("Writable"): f["location"] = revmap_f[1] elif uripath == outdir or uripath.startswith(outdir+os.sep): f["location"] = file_uri(path) elif path == builder.outdir or path.startswith(builder.outdir+os.sep): f["location"] = builder.fs_access.join(outdir, path[len(builder.outdir) + 1:]) elif not os.path.isabs(path): f["location"] = builder.fs_access.join(outdir, path) else: raise WorkflowException(u"Output file path %s must be within designated output directory (%s) or an input " u"file pass through."
% (path, builder.outdir)) return f raise WorkflowException(u"Output File object is missing both 'location' " "and 'path' fields: %s" % f) class CallbackJob(object): def __init__(self, job, output_callback, cachebuilder, jobcache): # type: (CommandLineTool, Callable[[Any, Any], Any], Builder, Text) -> None self.job = job self.output_callback = output_callback self.cachebuilder = cachebuilder self.outdir = jobcache def run(self, **kwargs): # type: (**Any) -> None self.output_callback(self.job.collect_output_ports( self.job.tool["outputs"], self.cachebuilder, self.outdir, kwargs.get("compute_checksum", True)), "success") # map files to assigned path inside a container. We need to also explicitly # walk over input as implicit reassignment doesn't reach everything in builder.bindings def check_adjust(builder, f): # type: (Builder, Dict[Text, Any]) -> Dict[Text, Any] f["path"] = docker_windows_path_adjust(builder.pathmapper.mapper(f["location"])[1]) f["dirname"], f["basename"] = os.path.split(f["path"]) if f["class"] == "File": f["nameroot"], f["nameext"] = os.path.splitext(f["basename"]) if not ACCEPTLIST_RE.match(f["basename"]): raise WorkflowException("Invalid filename: '%s' contains illegal characters" % (f["basename"])) return f def check_valid_locations(fs_access, ob): if ob["location"].startswith("_:"): pass if ob["class"] == "File" and not fs_access.isfile(ob["location"]): raise validate.ValidationException("Does not exist or is not a File: '%s'" % ob["location"]) if ob["class"] == "Directory" and not fs_access.isdir(ob["location"]): raise validate.ValidationException("Does not exist or is not a Directory: '%s'" % ob["location"]) class CommandLineTool(Process): def __init__(self, toolpath_object, **kwargs): # type: (Dict[Text, Any], **Any) -> None super(CommandLineTool, self).__init__(toolpath_object, **kwargs) self.find_default_container = kwargs.get("find_default_container", None) def makeJobRunner(self, use_container=True, **kwargs): # type: (Optional[bool], **Any) -> JobBase dockerReq, _ = self.get_requirement("DockerRequirement") if not dockerReq and use_container: if self.find_default_container: default_container = self.find_default_container(self) if default_container: self.requirements.insert(0, { "class": "DockerRequirement", "dockerPull": default_container }) dockerReq = self.requirements[0] if default_container == windows_default_container_id and use_container and onWindows(): _logger.warning(DEFAULT_CONTAINER_MSG % (windows_default_container_id, windows_default_container_id)) if dockerReq and use_container: if kwargs.get('singularity'): return SingularityCommandLineJob() else: return DockerCommandLineJob() else: for t in reversed(self.requirements): if t["class"] == "DockerRequirement": raise UnsupportedRequirement( "--no-container, but this CommandLineTool has " "DockerRequirement under 'requirements'.") return CommandLineJob() def makePathMapper(self, reffiles, stagedir, **kwargs): # type: (List[Any], Text, **Any) -> PathMapper return PathMapper(reffiles, kwargs["basedir"], stagedir, separateDirs=kwargs.get("separateDirs", True)) def updatePathmap(self, outdir, pathmap, fn): # type: (Text, PathMapper, Dict) -> None if "location" in fn and fn["location"] in pathmap: pathmap.update(fn["location"], pathmap.mapper(fn["location"]).resolved, os.path.join(outdir, fn["basename"]), ("Writable" if fn.get("writable") else "") + fn["class"], False) for sf in fn.get("secondaryFiles", []): self.updatePathmap(outdir, pathmap, sf) for ls in fn.get("listing", []): 
self.updatePathmap(os.path.join(outdir, fn["basename"]), pathmap, ls) def job(self, job_order, # type: Dict[Text, Text] output_callbacks, # type: Callable[[Any, Any], Any] **kwargs # type: Any ): # type: (...) -> Generator[Union[JobBase, CallbackJob], None, None] jobname = uniquename(kwargs.get("name", shortname(self.tool.get("id", "job")))) if kwargs.get("cachedir"): cacheargs = kwargs.copy() cacheargs["outdir"] = "/out" cacheargs["tmpdir"] = "/tmp" cacheargs["stagedir"] = "/stage" cachebuilder = self._init_job(job_order, **cacheargs) cachebuilder.pathmapper = PathMapper(cachebuilder.files, kwargs["basedir"], cachebuilder.stagedir, separateDirs=False) _check_adjust = partial(check_adjust, cachebuilder) visit_class([cachebuilder.files, cachebuilder.bindings], ("File", "Directory"), _check_adjust) cmdline = flatten(list(map(cachebuilder.generate_arg, cachebuilder.bindings))) (docker_req, docker_is_req) = self.get_requirement("DockerRequirement") if docker_req and kwargs.get("use_container"): dockerimg = docker_req.get("dockerImageId") or docker_req.get("dockerPull") elif kwargs.get("default_container", None) is not None and kwargs.get("use_container"): dockerimg = kwargs.get("default_container") else: dockerimg = None if dockerimg: cmdline = ["docker", "run", dockerimg] + cmdline keydict = {u"cmdline": cmdline} if "stdout" in self.tool: keydict["stdout"] = self.tool["stdout"] for location, f in cachebuilder.pathmapper.items(): if f.type == "File": checksum = next((e['checksum'] for e in cachebuilder.files if 'location' in e and e['location'] == location and 'checksum' in e and e['checksum'] != 'sha1$hash'), None) st = os.stat(f.resolved) if checksum: keydict[f.resolved] = [st.st_size, checksum] else: keydict[f.resolved] = [st.st_size, int(st.st_mtime * 1000)] interesting = {"DockerRequirement", "EnvVarRequirement", "CreateFileRequirement", "ShellCommandRequirement"} for rh in (self.requirements, self.hints): for r in reversed(rh): if r["class"] in interesting and r["class"] not in keydict: keydict[r["class"]] = r keydictstr = json.dumps(keydict, separators=(',', ':'), sort_keys=True) cachekey = hashlib.md5(keydictstr.encode('utf-8')).hexdigest() _logger.debug("[job %s] keydictstr is %s -> %s", jobname, keydictstr, cachekey) jobcache = os.path.join(kwargs["cachedir"], cachekey) jobcachepending = jobcache + ".pending" if os.path.isdir(jobcache) and not os.path.isfile(jobcachepending): if docker_req and kwargs.get("use_container"): cachebuilder.outdir = kwargs.get("docker_outdir") or "/var/spool/cwl" else: cachebuilder.outdir = jobcache _logger.info("[job %s] Using cached output in %s", jobname, jobcache) yield CallbackJob(self, output_callbacks, cachebuilder, jobcache) return else: _logger.info("[job %s] Output of job will be cached in %s", jobname, jobcache) shutil.rmtree(jobcache, True) os.makedirs(jobcache) kwargs["outdir"] = jobcache open(jobcachepending, "w").close() def rm_pending_output_callback(output_callbacks, jobcachepending, outputs, processStatus): if processStatus == "success": os.remove(jobcachepending) output_callbacks(outputs, processStatus) output_callbacks = cast( Callable[..., Any], # known bug in mypy # https://github.com/python/mypy/issues/797 partial(rm_pending_output_callback, output_callbacks, jobcachepending)) builder = self._init_job(job_order, **kwargs) reffiles = copy.deepcopy(builder.files) j = self.makeJobRunner(**kwargs) j.builder = builder j.joborder = builder.job j.make_pathmapper = self.makePathMapper j.stdin = None j.stderr = None j.stdout = None 
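# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): the cache key
# computed in the cachedir branch above is an md5 digest of a canonical JSON
# rendering of the command line, selected requirements, and input file stats,
# so identical invocations map to the same cache directory:
#
#     import hashlib, json
#     keydict = {u"cmdline": ["echo", "hello"],
#                u"/data/in.txt": [42, "sha1$deadbeef"]}  # size + checksum (example)
#     keydictstr = json.dumps(keydict, separators=(',', ':'), sort_keys=True)
#     cachekey = hashlib.md5(keydictstr.encode('utf-8')).hexdigest()
# ---------------------------------------------------------------------------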
j.successCodes = self.tool.get("successCodes") j.temporaryFailCodes = self.tool.get("temporaryFailCodes") j.permanentFailCodes = self.tool.get("permanentFailCodes") j.requirements = self.requirements j.hints = self.hints j.name = jobname debug = _logger.isEnabledFor(logging.DEBUG) if debug: _logger.debug(u"[job %s] initializing from %s%s", j.name, self.tool.get("id", ""), u" as part of %s" % kwargs["part_of"] if "part_of" in kwargs else "") _logger.debug(u"[job %s] %s", j.name, json.dumps(job_order, indent=4)) builder.pathmapper = None make_path_mapper_kwargs = kwargs if "stagedir" in make_path_mapper_kwargs: make_path_mapper_kwargs = make_path_mapper_kwargs.copy() del make_path_mapper_kwargs["stagedir"] builder.pathmapper = self.makePathMapper(reffiles, builder.stagedir, **make_path_mapper_kwargs) builder.requirements = j.requirements _check_adjust = partial(check_adjust, builder) visit_class([builder.files, builder.bindings], ("File", "Directory"), _check_adjust) initialWorkdir = self.get_requirement("InitialWorkDirRequirement")[0] j.generatefiles = {"class": "Directory", "listing": [], "basename": ""} if initialWorkdir: ls = [] # type: List[Dict[Text, Any]] if isinstance(initialWorkdir["listing"], (str, Text)): ls = builder.do_eval(initialWorkdir["listing"]) else: for t in initialWorkdir["listing"]: if "entry" in t: et = {u"entry": builder.do_eval(t["entry"], strip_whitespace=False)} if "entryname" in t: et["entryname"] = builder.do_eval(t["entryname"]) else: et["entryname"] = None et["writable"] = t.get("writable", False) ls.append(et) else: ls.append(builder.do_eval(t)) for i, t in enumerate(ls): if "entry" in t: if isinstance(t["entry"], string_types): ls[i] = { "class": "File", "basename": t["entryname"], "contents": t["entry"], "writable": t.get("writable") } else: if t.get("entryname") or t.get("writable"): t = copy.deepcopy(t) if t.get("entryname"): t["entry"]["basename"] = t["entryname"] t["entry"]["writable"] = t.get("writable") ls[i] = t["entry"] j.generatefiles[u"listing"] = ls for l in ls: self.updatePathmap(builder.outdir, builder.pathmapper, l) visit_class([builder.files, builder.bindings], ("File", "Directory"), _check_adjust) if debug: _logger.debug(u"[job %s] path mappings is %s", j.name, json.dumps({p: builder.pathmapper.mapper(p) for p in builder.pathmapper.files()}, indent=4)) if self.tool.get("stdin"): with SourceLine(self.tool, "stdin", validate.ValidationException, debug): j.stdin = builder.do_eval(self.tool["stdin"]) reffiles.append({"class": "File", "path": j.stdin}) if self.tool.get("stderr"): with SourceLine(self.tool, "stderr", validate.ValidationException, debug): j.stderr = builder.do_eval(self.tool["stderr"]) if os.path.isabs(j.stderr) or ".." in j.stderr: raise validate.ValidationException("stderr must be a relative path, got '%s'" % j.stderr) if self.tool.get("stdout"): with SourceLine(self.tool, "stdout", validate.ValidationException, debug): j.stdout = builder.do_eval(self.tool["stdout"]) if os.path.isabs(j.stdout) or ".." 
in j.stdout or not j.stdout: raise validate.ValidationException("stdout must be a relative path, got '%s'" % j.stdout) if debug: _logger.debug(u"[job %s] command line bindings is %s", j.name, json.dumps(builder.bindings, indent=4)) dockerReq = self.get_requirement("DockerRequirement")[0] if dockerReq and kwargs.get("use_container"): out_prefix = kwargs.get("tmp_outdir_prefix") j.outdir = kwargs.get("outdir") or tempfile.mkdtemp(prefix=out_prefix) tmpdir_prefix = kwargs.get('tmpdir_prefix') j.tmpdir = kwargs.get("tmpdir") or tempfile.mkdtemp(prefix=tmpdir_prefix) j.stagedir = tempfile.mkdtemp(prefix=tmpdir_prefix) else: j.outdir = builder.outdir j.tmpdir = builder.tmpdir j.stagedir = builder.stagedir inplaceUpdateReq = self.get_requirement("http://commonwl.org/cwltool#InplaceUpdateRequirement")[0] if inplaceUpdateReq: j.inplace_update = inplaceUpdateReq["inplaceUpdate"] normalizeFilesDirs(j.generatefiles) readers = {} muts = set() if builder.mutation_manager: def register_mut(f): muts.add(f["location"]) builder.mutation_manager.register_mutation(j.name, f) def register_reader(f): if f["location"] not in muts: builder.mutation_manager.register_reader(j.name, f) readers[f["location"]] = f for li in j.generatefiles["listing"]: li = cast(Dict[Text, Any], li) if li.get("writable") and j.inplace_update: adjustFileObjs(li, register_mut) adjustDirObjs(li, register_mut) else: adjustFileObjs(li, register_reader) adjustDirObjs(li, register_reader) adjustFileObjs(builder.files, register_reader) adjustFileObjs(builder.bindings, register_reader) adjustDirObjs(builder.files, register_reader) adjustDirObjs(builder.bindings, register_reader) j.environment = {} evr = self.get_requirement("EnvVarRequirement")[0] if evr: for t in evr["envDef"]: j.environment[t["envName"]] = builder.do_eval(t["envValue"]) shellcmd = self.get_requirement("ShellCommandRequirement")[0] if shellcmd: cmd = [] # type: List[Text] for b in builder.bindings: arg = builder.generate_arg(b) if b.get("shellQuote", True): arg = [shellescape.quote(a) for a in aslist(arg)] cmd.extend(aslist(arg)) j.command_line = ["/bin/sh", "-c", " ".join(cmd)] else: j.command_line = flatten(list(map(builder.generate_arg, builder.bindings))) j.pathmapper = builder.pathmapper j.collect_outputs = partial( self.collect_output_ports, self.tool["outputs"], builder, compute_checksum=kwargs.get("compute_checksum", True), jobname=jobname, readers=readers) j.output_callback = output_callbacks yield j def collect_output_ports(self, ports, builder, outdir, compute_checksum=True, jobname="", readers=None): # type: (Set[Dict[Text, Any]], Builder, Text, bool, Text, Dict[Text, Any]) -> Dict[Text, Union[Text, List[Any], Dict[Text, Any]]] ret = {} # type: Dict[Text, Union[Text, List[Any], Dict[Text, Any]]] debug = _logger.isEnabledFor(logging.DEBUG) try: fs_access = builder.make_fs_access(outdir) custom_output = fs_access.join(outdir, "cwl.output.json") if fs_access.exists(custom_output): with fs_access.open(custom_output, "r") as f: ret = json.load(f) if debug: _logger.debug(u"Raw output from %s: %s", custom_output, json.dumps(ret, indent=4)) else: for i, port in enumerate(ports): def makeWorkflowException(msg): return WorkflowException( u"Error collecting output for parameter '%s':\n%s" % (shortname(port["id"]), msg)) with SourceLine(ports, i, makeWorkflowException, debug): fragment = shortname(port["id"]) ret[fragment] = self.collect_output(port, builder, outdir, fs_access, compute_checksum=compute_checksum) if ret: revmap = partial(revmap_file, builder, outdir) 
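# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): the quoting rule
# used when j.command_line is assembled above under ShellCommandRequirement.
# Bindings with shellQuote: true (the default) are escaped with
# shellescape.quote; shellQuote: false fragments pass through verbatim so
# shell constructs like pipes survive:
#
#     import shellescape
#     cmd = ["cat", shellescape.quote("my file.txt"), "|", "wc", "-l"]
#     command_line = ["/bin/sh", "-c", " ".join(cmd)]
#     # -> /bin/sh -c cat 'my file.txt' | wc -l
# ---------------------------------------------------------------------------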
adjustDirObjs(ret, trim_listing) visit_class(ret, ("File", "Directory"), cast(Callable[[Any], Any], revmap)) visit_class(ret, ("File", "Directory"), remove_path) normalizeFilesDirs(ret) visit_class(ret, ("File", "Directory"), partial(check_valid_locations, fs_access)) if compute_checksum: adjustFileObjs(ret, partial(compute_checksums, fs_access)) validate.validate_ex(self.names.get_name("outputs_record_schema", ""), ret, strict=False, logger=_logger_validation_warnings) if ret is not None and builder.mutation_manager is not None: adjustFileObjs(ret, builder.mutation_manager.set_generation) return ret if ret is not None else {} except validate.ValidationException as e: raise WorkflowException("Error validating output record. " + Text(e) + "\n in " + json.dumps(ret, indent=4)) finally: if builder.mutation_manager and readers: for r in readers.values(): builder.mutation_manager.release_reader(jobname, r) def collect_output(self, schema, builder, outdir, fs_access, compute_checksum=True): # type: (Dict[Text, Any], Builder, Text, StdFsAccess, bool) -> Union[Dict[Text, Any], List[Union[Dict[Text, Any], Text]]] r = [] # type: List[Any] debug = _logger.isEnabledFor(logging.DEBUG) if "outputBinding" in schema: binding = schema["outputBinding"] globpatterns = [] # type: List[Text] revmap = partial(revmap_file, builder, outdir) if "glob" in binding: with SourceLine(binding, "glob", WorkflowException, debug): for gb in aslist(binding["glob"]): gb = builder.do_eval(gb) if gb: globpatterns.extend(aslist(gb)) for gb in globpatterns: if gb.startswith(outdir): gb = gb[len(outdir) + 1:] elif gb == ".": gb = outdir elif gb.startswith("/"): raise WorkflowException( "glob patterns must not start with '/'") try: prefix = fs_access.glob(outdir) r.extend([{"location": g, "path": fs_access.join(builder.outdir, g[len(prefix[0])+1:]), "basename": os.path.basename(g), "nameroot": os.path.splitext( os.path.basename(g))[0], "nameext": os.path.splitext( os.path.basename(g))[1], "class": "File" if fs_access.isfile(g) else "Directory"} for g in sorted(fs_access.glob( fs_access.join(outdir, gb)), key=cmp_to_key(cast( Callable[[Text, Text], int], locale.strcoll)))]) except (OSError, IOError) as e: _logger.warning(Text(e)) except: _logger.error("Unexpected error from fs_access", exc_info=True) raise for files in r: rfile = files.copy() revmap(rfile) if files["class"] == "Directory": ll = builder.loadListing or (binding and binding.get("loadListing")) if ll and ll != "no_listing": get_listing(fs_access, files, (ll == "deep_listing")) else: with fs_access.open(rfile["location"], "rb") as f: contents = b"" if binding.get("loadContents") or compute_checksum: contents = f.read(CONTENT_LIMIT) if binding.get("loadContents"): files["contents"] = contents if compute_checksum: checksum = hashlib.sha1() while contents != b"": checksum.update(contents) contents = f.read(1024 * 1024) files["checksum"] = "sha1$%s" % checksum.hexdigest() f.seek(0, 2) filesize = f.tell() files["size"] = filesize if "format" in schema: files["format"] = builder.do_eval(schema["format"], context=files) optional = False single = False if isinstance(schema["type"], list): if "null" in schema["type"]: optional = True if "File" in schema["type"] or "Directory" in schema["type"]: single = True elif schema["type"] == "File" or schema["type"] == "Directory": single = True if "outputEval" in binding: with SourceLine(binding, "outputEval", WorkflowException, debug): r = builder.do_eval(binding["outputEval"], context=r) if single: if not r and not optional: with 
SourceLine(binding, "glob", WorkflowException, debug): raise WorkflowException("Did not find output file with glob pattern: '{}'".format(globpatterns)) elif not r and optional: pass elif isinstance(r, list): if len(r) > 1: raise WorkflowException("Multiple matches for output item that is a single file.") else: r = r[0] if "secondaryFiles" in schema: with SourceLine(schema, "secondaryFiles", WorkflowException, debug): for primary in aslist(r): if isinstance(primary, dict): primary.setdefault("secondaryFiles", []) pathprefix = primary["path"][0:primary["path"].rindex("/")+1] for sf in aslist(schema["secondaryFiles"]): if isinstance(sf, dict) or "$(" in sf or "${" in sf: sfpath = builder.do_eval(sf, context=primary) subst = False else: sfpath = sf subst = True for sfitem in aslist(sfpath): if isinstance(sfitem, string_types): if subst: sfitem = {"path": substitute(primary["path"], sfitem)} else: sfitem = {"path": pathprefix+sfitem} if "path" in sfitem and "location" not in sfitem: revmap(sfitem) if fs_access.isfile(sfitem["location"]): sfitem["class"] = "File" primary["secondaryFiles"].append(sfitem) elif fs_access.isdir(sfitem["location"]): sfitem["class"] = "Directory" primary["secondaryFiles"].append(sfitem) # Ensure files point to local references outside of the run environment adjustFileObjs(r, cast( # known bug in mypy # https://github.com/python/mypy/issues/797 Callable[[Any], Any], revmap)) if not r and optional: r = None if (not r and isinstance(schema["type"], dict) and schema["type"]["type"] == "record"): out = {} for f in schema["type"]["fields"]: out[shortname(f["name"])] = self.collect_output( # type: ignore f, builder, outdir, fs_access, compute_checksum=compute_checksum) return out return r cwltool-1.0.20180302231433/cwltool/stdfsaccess.py0000644000175200017520000000452713247251316022151 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import glob import os from io import open from typing import BinaryIO, List, Union, Text, IO, overload from .utils import onWindows import six from six.moves import urllib from schema_salad.ref_resolver import file_uri, uri_file_path def abspath(src, basedir): # type: (Text, Text) -> Text if src.startswith(u"file://"): ab = six.text_type(uri_file_path(str(src))) elif urllib.parse.urlsplit(src).scheme in ['http','https']: return src else: if basedir.startswith(u"file://"): ab = src if os.path.isabs(src) else basedir+ '/'+ src else: ab = src if os.path.isabs(src) else os.path.join(basedir, src) return ab class StdFsAccess(object): def __init__(self, basedir): # type: (Text) -> None self.basedir = basedir def _abs(self, p): # type: (Text) -> Text return abspath(p, self.basedir) def glob(self, pattern): # type: (Text) -> List[Text] return [file_uri(str(self._abs(l))) for l in glob.glob(self._abs(pattern))]
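# ---------------------------------------------------------------------------
# Illustrative sketch (not part of the original source): the resolution rules
# implemented by abspath above -- file:// URIs become local paths, http(s)
# URLs pass through untouched, and bare relative paths are joined to basedir:
#
#     assert abspath("https://example.com/x.cwl", "/base") == "https://example.com/x.cwl"
#     assert abspath("data/in.txt", "/base") == "/base/data/in.txt"
#     assert abspath("/abs/in.txt", "/base") == "/abs/in.txt"
# ---------------------------------------------------------------------------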
@overload def open(self, fn, mode='rb'): # type: (Text, str) -> IO[bytes] pass @overload def open(self, fn, mode='r'): # type: (Text, str) -> IO[str] pass def open(self, fn, mode): return open(self._abs(fn), mode) def exists(self, fn): # type: (Text) -> bool return os.path.exists(self._abs(fn)) def isfile(self, fn): # type: (Text) -> bool return os.path.isfile(self._abs(fn)) def isdir(self, fn): # type: (Text) -> bool return os.path.isdir(self._abs(fn)) def listdir(self, fn): # type: (Text) -> List[Text] return [abspath(urllib.parse.quote(str(l)), fn) for l in os.listdir(self._abs(fn))] def join(self, path, *paths): # type: (Text, *Text) -> Text return os.path.join(path, *paths) def realpath(self, path): # type: (Text) -> Text return os.path.realpath(path) # On windows os.path.realpath appends unecessary Drive, here we would avoid that def docker_compatible_realpath(self, path): # type: (Text) -> Text if onWindows(): if path.startswith('/'): return path return '/'+path return self.realpath(path) cwltool-1.0.20180302231433/cwltool/flatten.py0000644000175200017520000000120013247251315021261 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import from typing import Any, Callable, List, cast # http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html def flatten(l, ltypes=(list, tuple)): # type: (Any, Any) -> List[Any] if l is None: return [] if not isinstance(l, ltypes): return [l] ltype = type(l) l = list(l) i = 0 while i < len(l): while isinstance(l[i], ltypes): if not l[i]: l.pop(i) i -= 1 break else: l[i:i + 1] = l[i] i += 1 return cast(Callable[[Any], List], ltype)(l) cwltool-1.0.20180302231433/cwltool/cwlrdf.py0000644000175200017520000001225713247251315021123 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import from typing import IO, Any, Dict, Text from rdflib import Graph from schema_salad.jsonld_context import makerdf from schema_salad.ref_resolver import ContextType from six.moves import urllib from .process import Process def gather(tool, ctx): # type: (Process, ContextType) -> Graph g = Graph() def visitor(t): makerdf(t["id"], t, ctx, graph=g) tool.visit(visitor) return g def printrdf(wf, ctx, sr): # type: (Process, ContextType, Text) -> Text return gather(wf, ctx).serialize(format=sr).decode('utf-8') def lastpart(uri): # type: (Any) -> Text uri = Text(uri) if "/" in uri: return uri[uri.rindex("/") + 1:] else: return uri def dot_with_parameters(g, stdout): # type: (Graph, IO[Any]) -> None qres = g.query( """SELECT ?step ?run ?runtype WHERE { ?step cwl:run ?run . ?run rdf:type ?runtype . }""") for step, run, runtype in qres: stdout.write(u'"%s" [label="%s"]\n' % (lastpart(step), "%s (%s)" % (lastpart(step), lastpart(run)))) qres = g.query( """SELECT ?step ?inp ?source WHERE { ?wf Workflow:steps ?step . ?step cwl:inputs ?inp . ?inp cwl:source ?source . }""") for step, inp, source in qres: stdout.write(u'"%s" [shape=box]\n' % (lastpart(inp))) stdout.write(u'"%s" -> "%s" [label="%s"]\n' % (lastpart(source), lastpart(inp), "")) stdout.write(u'"%s" -> "%s" [label="%s"]\n' % (lastpart(inp), lastpart(step), "")) qres = g.query( """SELECT ?step ?out WHERE { ?wf Workflow:steps ?step . ?step cwl:outputs ?out . }""") for step, out in qres: stdout.write(u'"%s" [shape=box]\n' % (lastpart(out))) stdout.write(u'"%s" -> "%s" [label="%s"]\n' % (lastpart(step), lastpart(out), "")) qres = g.query( """SELECT ?out ?source WHERE { ?wf cwl:outputs ?out . ?out cwl:source ?source . 
}""") for out, source in qres: stdout.write(u'"%s" [shape=octagon]\n' % (lastpart(out))) stdout.write(u'"%s" -> "%s" [label="%s"]\n' % (lastpart(source), lastpart(out), "")) qres = g.query( """SELECT ?inp WHERE { ?wf rdf:type cwl:Workflow . ?wf cwl:inputs ?inp . }""") for (inp,) in qres: stdout.write(u'"%s" [shape=octagon]\n' % (lastpart(inp))) def dot_without_parameters(g, stdout): # type: (Graph, IO[Any]) -> None dotname = {} # type: Dict[Text,Text] clusternode = {} stdout.write("compound=true\n") subworkflows = set() qres = g.query( """SELECT ?run WHERE { ?wf rdf:type cwl:Workflow . ?wf Workflow:steps ?step . ?step cwl:run ?run . ?run rdf:type cwl:Workflow . } ORDER BY ?wf""") for (run,) in qres: subworkflows.add(run) qres = g.query( """SELECT ?wf ?step ?run ?runtype WHERE { ?wf rdf:type cwl:Workflow . ?wf Workflow:steps ?step . ?step cwl:run ?run . ?run rdf:type ?runtype . } ORDER BY ?wf""") currentwf = None for wf, step, run, runtype in qres: if step not in dotname: dotname[step] = lastpart(step) if wf != currentwf: if currentwf is not None: stdout.write("}\n") if wf in subworkflows: if wf not in dotname: dotname[wf] = "cluster_" + lastpart(wf) stdout.write(u'subgraph "%s" { label="%s"\n' % (dotname[wf], lastpart(wf))) currentwf = wf clusternode[wf] = step else: currentwf = None if Text(runtype) != "https://w3id.org/cwl/cwl#Workflow": stdout.write(u'"%s" [label="%s"]\n' % (dotname[step], urllib.parse.urldefrag(Text(step))[1])) if currentwf is not None: stdout.write("}\n") qres = g.query( """SELECT DISTINCT ?src ?sink ?srcrun ?sinkrun WHERE { ?wf1 Workflow:steps ?src . ?wf2 Workflow:steps ?sink . ?src cwl:out ?out . ?inp cwl:source ?out . ?sink cwl:in ?inp . ?src cwl:run ?srcrun . ?sink cwl:run ?sinkrun . }""") for src, sink, srcrun, sinkrun in qres: attr = u"" if srcrun in clusternode: attr += u'ltail="%s"' % dotname[srcrun] src = clusternode[srcrun] if sinkrun in clusternode: attr += u' lhead="%s"' % dotname[sinkrun] sink = clusternode[sinkrun] stdout.write(u'"%s" -> "%s" [%s]\n' % (dotname[src], dotname[sink], attr)) def printdot(wf, ctx, stdout, include_parameters=False): # type: (Process, ContextType, Any, bool) -> None g = gather(wf, ctx) stdout.write("digraph {") # g.namespace_manager.qname(predicate) if include_parameters: dot_with_parameters(g, stdout) else: dot_without_parameters(g, stdout) stdout.write("}") cwltool-1.0.20180302231433/cwltool/resolver.py0000644000175200017520000000312013247251315021470 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import logging import os from six.moves import urllib from schema_salad.ref_resolver import file_uri _logger = logging.getLogger("cwltool") def resolve_local(document_loader, uri): if uri.startswith("/"): return None shares = [os.environ.get("XDG_DATA_HOME", os.path.join(os.path.expanduser('~'), ".local", "share"))] shares.extend(os.environ.get("XDG_DATA_DIRS", "/usr/local/share/:/usr/share/").split(":")) shares = [os.path.join(s, "commonwl", uri) for s in shares] shares.insert(0, os.path.join(os.getcwd(), uri)) _logger.debug("Search path is %s", shares) for s in shares: if os.path.exists(s): return file_uri(s) if os.path.exists("%s.cwl" % s): return file_uri(s) return None def tool_resolver(document_loader, uri): for r in [resolve_local, resolve_ga4gh_tool]: ret = r(document_loader, uri) if ret is not None: return ret return file_uri(os.path.abspath(uri), split_frag=True) ga4gh_tool_registries = ["https://dockstore.org:8443"] def resolve_ga4gh_tool(document_loader, uri): path, version = 
uri.partition(":")[::2] if not version: version = "latest" for reg in ga4gh_tool_registries: ds = "{0}/api/ga4gh/v1/tools/{1}/versions/{2}/plain-CWL/descriptor".format(reg, urllib.parse.quote(path, ""), urllib.parse.quote(version, "")) try: resp = document_loader.session.head(ds) resp.raise_for_status() return ds except Exception: pass return None cwltool-1.0.20180302231433/cwltool/draft2tool.py0000644000175200017520000000036613247251315021720 0ustar mcrusoemcrusoe00000000000000# Do wildcard import of command_line_tool from .command_line_tool import * _logger = logging.getLogger("cwltool") _logger.warning("'draft2tool.py' has been renamed to 'command_line_tool.py' " "and will be removed in the future.") cwltool-1.0.20180302231433/cwltool/cwlNodeEngineJSConsole.js0000644000175200017520000000173113247251315024122 0ustar mcrusoemcrusoe00000000000000"use strict"; function js_console_log(){ console.error("[log] "+require("util").format.apply(this, arguments).split("\n").join("\n[log] ")); } function js_console_err(){ console.error("[err] "+require("util").format.apply(this, arguments).split("\n").join("\n[err] ")); } process.stdin.setEncoding("utf8"); var incoming = ""; process.stdin.on("data", function(chunk) { incoming += chunk; var i = incoming.indexOf("\n"); if (i > -1) { try{ var fn = JSON.parse(incoming.substr(0, i)); incoming = incoming.substr(i+1); process.stdout.write(JSON.stringify(require("vm").runInNewContext(fn, { console: { log: js_console_log, error: js_console_err } })) + "\n"); } catch(e){ console.error(e) } /*strings to indicate the process has finished*/ console.log("r1cepzbhUTxtykz5XTC4"); console.error("r1cepzbhUTxtykz5XTC4"); } }); process.stdin.on("end", process.exit); cwltool-1.0.20180302231433/cwltool/pack.py0000644000175200017520000001513313247251315020554 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import copy import re from typing import Any, Callable, Dict, List, Set, Text, Union, cast from schema_salad.ref_resolver import Loader, SubLoader from six.moves import urllib from ruamel.yaml.comments import CommentedSeq, CommentedMap from .process import shortname, uniquename import six def flatten_deps(d, files): # type: (Any, Set[Text]) -> None if isinstance(d, list): for s in d: flatten_deps(s, files) elif isinstance(d, dict): if d["class"] == "File": files.add(d["location"]) if "secondaryFiles" in d: flatten_deps(d["secondaryFiles"], files) if "listing" in d: flatten_deps(d["listing"], files) def find_run(d, loadref, runs): # type: (Any, Callable[[Text, Text], Union[Dict, List, Text]], Set[Text]) -> None if isinstance(d, list): for s in d: find_run(s, loadref, runs) elif isinstance(d, dict): if "run" in d and isinstance(d["run"], six.string_types): if d["run"] not in runs: runs.add(d["run"]) find_run(loadref(None, d["run"]), loadref, runs) for s in d.values(): find_run(s, loadref, runs) def find_ids(d, ids): # type: (Any, Set[Text]) -> None if isinstance(d, list): for s in d: find_ids(s, ids) elif isinstance(d, dict): for i in ("id", "name"): if i in d and isinstance(d[i], six.string_types): ids.add(d[i]) for s in d.values(): find_ids(s, ids) def replace_refs(d, rewrite, stem, newstem): # type: (Any, Dict[Text, Text], Text, Text) -> None if isinstance(d, list): for s, v in enumerate(d): if isinstance(v, six.string_types): if v in rewrite: d[s] = rewrite[v] elif v.startswith(stem): d[s] = newstem + v[len(stem):] else: replace_refs(v, rewrite, stem, newstem) elif isinstance(d, dict): for s, v in d.items(): if isinstance(v, six.string_types):
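# String values are either mapped through the rewrite table directly or, when they share the old document stem, re-rooted under the new stem below.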
if v in rewrite: d[s] = rewrite[v] elif v.startswith(stem): id_ = v[len(stem):] # prevent appending newstems if tool is already packed if id_.startswith(newstem.strip("#")): d[s] = "#" + id_ else: d[s] = newstem + id_ replace_refs(v, rewrite, stem, newstem) def import_embed(d, seen): # type: (Any, Set[Text]) -> None if isinstance(d, list): for v in d: import_embed(v, seen) elif isinstance(d, dict): for n in ("id", "name"): if n in d: if d[n] in seen: this = d[n] d.clear() d["$import"] = this else: this = d[n] seen.add(this) break for k in sorted(d.keys()): import_embed(d[k], seen) def pack(document_loader, processobj, uri, metadata, rewrite_out=None): # type: (Loader, Union[Dict[Text, Any], List[Dict[Text, Any]]], Text, Dict[Text, Text], Dict[Text, Text]) -> Dict[Text, Any] document_loader = SubLoader(document_loader) document_loader.idx = {} if isinstance(processobj, dict): document_loader.idx[processobj["id"]] = CommentedMap(six.iteritems(processobj)) elif isinstance(processobj, list): path, frag = urllib.parse.urldefrag(uri) for po in processobj: if not frag: if po["id"].endswith("#main"): uri = po["id"] document_loader.idx[po["id"]] = CommentedMap(six.iteritems(po)) def loadref(b, u): # type: (Text, Text) -> Union[Dict, List, Text] return document_loader.resolve_ref(u, base_url=b)[0] ids = set() # type: Set[Text] find_ids(processobj, ids) runs = {uri} find_run(processobj, loadref, runs) for f in runs: find_ids(document_loader.resolve_ref(f)[0], ids) names = set() # type: Set[Text] if rewrite_out is None: rewrite = {} # type: Dict[Text, Text] else: rewrite = rewrite_out mainpath, _ = urllib.parse.urldefrag(uri) def rewrite_id(r, mainuri): # type: (Text, Text) -> None if r == mainuri: rewrite[r] = "#main" elif r.startswith(mainuri) and r[len(mainuri)] in ("#", "/"): if r[len(mainuri):].startswith("#main/"): rewrite[r] = "#" + uniquename(r[len(mainuri)+1:], names) else: rewrite[r] = "#" + uniquename("main/"+r[len(mainuri)+1:], names) else: path, frag = urllib.parse.urldefrag(r) if path == mainpath: rewrite[r] = "#" + uniquename(frag, names) else: if path not in rewrite: rewrite[path] = "#" + uniquename(shortname(path), names) sortedids = sorted(ids) for r in sortedids: rewrite_id(r, uri) packed = {"$graph": [], "cwlVersion": metadata["cwlVersion"] } # type: Dict[Text, Any] namespaces = metadata.get('$namespaces', None) schemas = set() # type: Set[Text] for r in sorted(runs): dcr, metadata = document_loader.resolve_ref(r) if isinstance(dcr, CommentedSeq): dcr = dcr[0] dcr = cast(CommentedMap, dcr) if not isinstance(dcr, dict): continue for doc in (dcr, metadata): if "$schemas" in doc: for s in doc["$schemas"]: schemas.add(s) if dcr.get("class") not in ("Workflow", "CommandLineTool", "ExpressionTool"): continue dc = cast(Dict[Text, Any], copy.deepcopy(dcr)) v = rewrite[r] dc["id"] = v for n in ("name", "cwlVersion", "$namespaces", "$schemas"): if n in dc: del dc[n] packed["$graph"].append(dc) if schemas: packed["$schemas"] = list(schemas) for r in rewrite: v = rewrite[r] replace_refs(packed, rewrite, r + "/" if "#" in r else r + "#", v + "/") import_embed(packed, set()) if len(packed["$graph"]) == 1: # duplicate 'cwlVersion' inside $graph when there is a single item # because we're printing contents inside '$graph' rather than whole dict packed["$graph"][0]["cwlVersion"] = packed["cwlVersion"] if namespaces: packed["$graph"][0]["$namespaces"] = dict(cast(Dict, namespaces)) return packed cwltool-1.0.20180302231433/cwltool/schemas/0000755000175200017520000000000013247251336020707 5ustar 
mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-2/0000755000175200017520000000000013247251336022146 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-2/CommonWorkflowLanguage.yml0000644000175200017520000021506013247251315027321 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" $graph: - name: "Common Workflow Language, Draft 2" type: documentation doc: | 7 July 2015 This version: * https://w3id.org/cwl/draft-2/ Current version: * https://w3id.org/cwl/ Authors: * Peter Amstutz, Curoverse * Nebojša Tijanić, Seven Bridges Genomics Contributors: * Luka Stojanovic, Seven Bridges Genomics * John Chilton, Galaxy Project, Pennsylvania State University * Michael R. Crusoe, Michigan State University * Hervé Ménager, Institut Pasteur * Maxim Mikheev, BioDatomics * Stian Soiland-Reyes [soiland-reyes@cs.manchester.ac.uk](mailto:soiland-reyes@cs.manchester.ac.uk), University of Manchester # Abstract A Workflow is an analysis task represented by a directed graph describing a sequence of operations that transform an input data set to output. This specification defines the Common Workflow Language (CWL), a vendor-neutral standard for representing workflows and concrete process steps intended to be portable across a variety of computing platforms. # Status of This Document This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "specification" directory at https://github.com/common-workflow-language/common-workflow-language The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0. # Introduction The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility. ## Introduction to draft 2 This specification represents the second milestone of the CWL group. Since draft-1, this draft introduces a number of major changes and additions: * Use of Avro schema (instead of JSON-schema) and JSON-LD for data modeling. * Significant refactoring of the Command Line Tool description. * Data and execution model for Workflows. * Extension mechanism through "hints" and "requirements". ## Purpose CWL is designed to express workflows for data-intensive science, such as Bioinformatics, Chemistry, Physics, and Astronomy. This specification is intended to define a data and execution model for Workflows and Command Line Tools that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems.
## References to Other Specifications * [JSON](http://json.org) * [JSON-LD](http://json-ld.org) * [JSON Pointer](https://tools.ietf.org/html/draft-ietf-appsawg-json-pointer-04) * [YAML](http://yaml.org) * [Avro](https://avro.apache.org/docs/current/spec.html) * [Uniform Resource Identifier (URI): Generic Syntax](https://tools.ietf.org/html/rfc3986) * [UTF-8](https://www.ietf.org/rfc/rfc2279.txt) * [Portable Operating System Interface (POSIX.1-2008)](http://pubs.opengroup.org/onlinepubs/9699919799/) * [Resource Description Framework (RDF)](http://www.w3.org/RDF/) ## Scope This document describes the CWL syntax, execution, and object model. It is not intended to document a specific implementation of CWL; however, it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation: **may**: Conforming CWL documents and CWL implementations are permitted but not required to behave as described. **must**: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it. **fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described. # Data model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**), where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **process** is a basic unit of computation which accepts input data, performs some computation, and produces output data. An **input object** is an object describing the inputs to an invocation of a process. An **output object** is an object describing the output of an invocation of a process. An **input schema** describes the valid format (required fields, data types) for an input object. An **output schema** describes the valid format for an output object. **Metadata** is information about workflows, tools, or input items that is not used directly in the computation. ## Syntax Documents containing CWL objects are serialized and loaded using YAML syntax and UTF-8 text encoding. A conforming implementation must accept all valid YAML documents. The CWL schema is defined using Avro Linked Data (avro-ld). Avro-ld is an extension of the Apache Avro schema language to support additional annotations mapping Avro fields to RDF predicates via JSON-LD. A CWL document may be validated by transforming the avro-ld schema to a base Apache Avro schema. An implementation may interpret a CWL document as [JSON-LD](http://json-ld.org) and convert a CWL document to a [Resource Description Framework (RDF)](http://www.w3.org/RDF/) using the CWL [JSON-LD Context](https://w3id.org/cwl/draft-2/context) (extracted from the avro-ld schema).
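(Non-normative aside: because CWL documents are plain YAML, they can be loaded with any YAML parser and handled as an ordinary object tree. The Python sketch below is illustrative only; the file name and fields are hypothetical and it assumes the PyYAML library is installed.)

```
import yaml  # assumes PyYAML is installed

with open("my_tool.cwl") as f:
    doc = yaml.safe_load(f)  # the document becomes nested dicts/lists

print(doc["class"])                      # e.g. "CommandLineTool"
print([i["id"] for i in doc["inputs"]])  # parameter identifiers such as "#input"
```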
The CWL [RDFS schema](https://w3id.org/cwl/draft-2/cwl.ttl) defines the classes and properties used by CWL as JSON-LD. The latest draft-2 schema is defined here: https://github.com/common-workflow-language/common-workflow-language/blob/master/schemas/draft-2/cwl-avro.yml ## Identifiers If an object contains an `id` field, that is used to uniquely identify the object in that document. The value of the `id` field must be unique over the entire document. The format of the `id` field is that of a [relative fragment identifier](https://tools.ietf.org/html/rfc3986#section-3.5), and must start with a hash `#` character. An implementation may choose to only honor references to object types for which the `id` field is explicitly listed in this specification. When loading a CWL document, an implementation may resolve relative identifiers to absolute URI references. For example, "my_tool.cwl" located in the directory "/home/example/work/" may be transformed to "file:///home/example/work/my_tool.cwl" and a relative fragment reference "#input" in this file may be transformed to "file:///home/example/work/my_tool.cwl#input". ## Document preprocessing An implementation must resolve `import` directives. An `import` directive is an object consisting of the field `import` specifying a URI. The URI referenced by `import` must be loaded as a CWL document (including recursive preprocessing) and then the `import` object is implicitly replaced by the external resource. URIs may include document fragments referring to objects identified by their `id` field, in which case the `import` directive is replaced by only the fragment object. An implementation must resolve `include` directives. An `include` directive is an object consisting of the field `include` specifying a URI. The URI referenced by `include` must be loaded as a UTF-8 encoded text document and the `include` directive is implicitly replaced by a string with the contents of the document. Because the loaded resource is unparsed, URIs used with `include` must not include fragments. ## Extensions and Metadata Implementation extensions not required for correct execution (for example, fields related to GUI rendering) may be stored in [process hints](#requirements_and_hints). Input metadata (for example, a lab sample identifier) may be explicitly represented within a workflow using input parameters which are propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata. Fields for tool and workflow metadata (for example, authorship for use in citations) are not defined in this specification. Future versions of this specification may define such fields. # Execution model ## Execution concepts A **parameter** is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation. A **command line tool** is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates. A **workflow** is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of other downstream steps to form a directed graph, and independent steps may run concurrently. A **runtime environment** is the actual hardware and software environment when executing a command line tool.
It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the Python interpreter or the JVM), libraries, modules, packages, utilities, and data files required to run the tool. A **workflow platform** is a specific hardware and software implementation capable of interpreting a CWL document and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output. It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for the CWL specification but may be handled by a specific workflow platform include: * Data security and permissions. * Scheduling tool invocations on remote cluster or cloud compute nodes. * Using virtual machines or operating system containers to manage the runtime (except as described in [DockerRequirement](#dockerrequirement)). * Using remote or distributed file systems to manage input and output files. * Translating or rewriting file paths. * Determining if a process has previously been executed, skipping it and reusing previous results. * Pausing and resuming of processes or workflows. Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of [process requirements](#processrequirement). ## Generic execution process The generic execution sequence of a CWL process (including both workflows and concrete process implementations) is as follows. 1. Load and validate CWL document, yielding a process object. 2. Load input object. 3. Validate the input object against the `inputs` schema for the process. 4. Validate that process requirements are met. 5. Perform any further setup required by the specific process type. 6. Execute the process. 7. Capture results of process execution into the output object. 8. Validate the output object against the `outputs` schema for the process. 9. Report the output object to the process caller. ## Requirements and hints A **[process requirement](#processrequirement)** modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. A **hint** is similar to a requirement; however, it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied. Requirements are inherited. A requirement specified in a Workflow applies to all workflow steps; a requirement specified on a workflow step will apply to the process implementation. If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in `requirements` on a process implementation such as CommandLineTool will take precedence over an entry in `requirements` specified in a workflow step, and an entry in `requirements` on a workflow step takes precedence over the workflow. Entries in `hints` are resolved the same way. Requirements override hints.
If a process implementation provides a process requirement in `hints` which is also provided in `requirements` by an enclosing workflow or workflow step, the enclosing `requirements` takes precedence. Process requirements are the primary mechanism for specifying extensions to the CWL core specification. ## Expressions An expression is a fragment of executable code which is evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow. An implementation must provide the predefined `cwl:JsonPointer` expression engine. This expression engine specifies a [JSON Pointer](https://tools.ietf.org/html/draft-ietf-appsawg-json-pointer-04) into an expression input object consisting of the `job` and `context` fields described below. An expression engine defined with [ExpressionEngineRequirement](#expressionenginerequirement) is a command line program that follows this protocol: * On standard input, receive a JSON object with the following fields: - **engineConfig**: A list of strings from the `engineConfig` field. Null if `engineConfig` is not specified. - **job**: The input object of the current Process (context dependent). - **context**: The specific value being transformed (context dependent). May be null. - **script**: The code fragment to evaluate. - **outdir**: When used in the context of a CommandLineTool, this is the designated output directory that will be used when executing the tool. Null if not applicable. - **tmpdir**: When used in the context of a CommandLineTool, this is the designated temporary directory that will be used when executing the tool. Null if not applicable. * On standard output, print a single JSON value (string, number, array, object, boolean, or null) for the return value. Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context and no outside data to leak into it. Implementations may apply limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code. The order in which expressions are evaluated within a process or workflow is undefined. ## Workflow graph A workflow describes a set of **steps** and the **dependencies** between those steps. When a process produces output that will be consumed by a second process, the first process is a dependency of the second process. When there is a dependency, the workflow engine must execute the dependency process and wait for it to successfully produce output before executing the dependent process. If two processes are defined in the workflow graph that are not directly or indirectly dependent, these processes are **independent**, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed. ## Success and failure A completed process must result in one of `success`, `temporaryFailure` or `permanentFailure` states. An implementation may choose to retry a process execution which resulted in `temporaryFailure`. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon `permanentFailure`.
* If any step of a workflow execution results in `permanentFailure`, then the workflow status is `permanentFailure`. * If one or more steps result in `temporaryFailure` and all other steps complete `success` or are not executed, then the workflow status is `temporaryFailure`. * If all workflow steps are executed and complete with `success`, then the workflow status is `success`. ## Executing CWL documents as scripts By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide `cwl-runner` as an alias for the platform's CWL implementation. # Sample CWL workflow revtool.cwl: ``` #!/usr/bin/env cwl-runner # # Simplest example command line program wrapper for the Unix tool "rev". # class: CommandLineTool description: "Reverse each line using the `rev` command" # The "inputs" array defines the structure of the input object that describes # the inputs to the underlying program. Here, there is one input field # defined that will be called "input" and will contain a "File" object. # # The input binding indicates that the input value should be turned into a # command line argument. In this example inputBinding is an empty object, # which indicates that the file name should be added to the command line at # a default location. inputs: - id: "#input" type: File inputBinding: {} # The "outputs" array defines the structure of the output object that # describes the outputs of the underlying program. Here, there is one # output field defined that will be called "output", which must be a "File" type, # and after the program executes, the output value will be the file # output.txt in the designated output directory. outputs: - id: "#output" type: File outputBinding: glob: output.txt # The actual program to execute. baseCommand: rev # Specify that the standard output stream must be redirected to a file called # output.txt in the designated output directory. stdout: output.txt ``` sorttool.cwl: ``` #!/usr/bin/env cwl-runner # # Example command line program wrapper for the Unix tool "sort" # demonstrating command line flags. class: CommandLineTool description: "Sort lines using the `sort` command" # This example is similar to the previous one, with an additional input # parameter called "reverse". It is a boolean parameter, which is # interpreted as a command line flag. The value of "prefix" is used as the # flag to put on the command line if "reverse" is true. If "reverse" is # false, no flag is added. # # This example also introduces the "position" field. This indicates the # sorting order of items on the command line. Lower numbers are placed # before higher numbers. Here, the "--reverse" flag (if present) will be # added to the command line before the input file path. inputs: - id: "#reverse" type: boolean inputBinding: position: 1 prefix: "--reverse" - id: "#input" type: File inputBinding: position: 2 outputs: - id: "#output" type: File outputBinding: glob: output.txt baseCommand: sort stdout: output.txt ``` revsort.cwl: ``` #!/usr/bin/env cwl-runner # # This is a two-step workflow which uses "revtool" and "sorttool" defined above. # class: Workflow description: "Reverse the lines in a document, then sort those lines." # Requirements specify prerequisites and extensions to the workflow. # In this example, DockerRequirement specifies a default Docker container # in which the command line tools will execute.
requirements: - class: DockerRequirement dockerPull: debian:8 # The inputs array defines the structure of the input object that describes # the inputs to the workflow. # # The "reverse_sort" input parameter demonstrates the "default" field. If the # field "reverse_sort" is not provided in the input object, the default value will # be used. inputs: - id: "#input" type: File description: "The input file to be processed." - id: "#reverse_sort" type: boolean default: true description: "If true, reverse (descending) sort" # The "outputs" array defines the structure of the output object that describes # the outputs of the workflow. # # Each output field must be connected to the output of one of the workflow # steps using the "connect" field. Here, the parameter "#output" of the # workflow comes from the "#sorted" output of the "sort" step. outputs: - id: "#output" type: File source: "#sorted.output" description: "The output with the lines reversed and sorted." # The "steps" array lists the executable steps that make up the workflow. # The tool to execute each step is listed in the "run" field. # # In the first step, the "inputs" field of the step connects the upstream # parameter "#input" of the workflow to the input parameter of the tool # "revtool.cwl#input". # # In the second step, the "inputs" field of the step connects the output # parameter "#reversed" from the first step to the input parameter of the # tool "sorttool.cwl#input". steps: - inputs: - { id: "#rev.input", source: "#input" } outputs: - { id: "#rev.output" } run: { import: revtool.cwl } - inputs: - { id: "#sorted.input", source: "#rev.output" } - { id: "#sorted.reverse", source: "#reverse_sort" } outputs: - { id: "#sorted.output" } run: { import: sorttool.cwl } ``` Sample input object: ``` { "input": { "class": "File", "path": "whale.txt" } } ``` Sample output object: ``` { "output": { "path": "/tmp/tmpdeI_p_/output.txt", "size": 1111, "class": "File", "checksum": "sha1$b9214658cc453331b62c2282b772a5c063dbd284" } } ``` - name: Reference type: documentation doc: This section specifies the core object types that make up a CWL document. - type: enum name: CWLVersions doc: "Version symbols for published CWL document versions." symbols: - cwl:draft-2 - name: Datatype type: enum docAfter: "#ProcessRequirement" symbols: - "null" - sld:boolean - sld:int - sld:long - sld:float - sld:double - sld:bytes - sld:string - sld:record - sld:enum - sld:array - sld:map - cwl:File - cwl:Any doc: | CWL data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. In addition, CWL defines [`File`](#file) as a special record type. ## Primitive types * **null**: no value * **boolean**: a binary value * **int**: 32-bit signed integer * **long**: 64-bit signed integer * **float**: single precision (32-bit) IEEE 754 floating-point number * **double**: double precision (64-bit) IEEE 754 floating-point number * **bytes**: sequence of uninterpreted 8-bit unsigned bytes * **string**: Unicode character sequence ## Complex types * **record**: An object with one or more fields defined by name and type * **enum**: A value from a finite set of symbolic values * **array**: An ordered sequence of values * **map**: An unordered collection of key/value pairs ## File type See [File](#file) below. ## Any type See [Any](#any) below. 
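(Non-normative aside: one convenient way to picture the primitive and complex types above is by their natural mapping onto Python values; the mapping below is illustrative only and not part of the schema.)

```
# Illustrative, non-normative mapping of CWL draft-2 types onto Python values.
CWL_TO_PYTHON = {
    "null": type(None),      # no value
    "boolean": bool,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "bytes": bytes,
    "string": str,
    "record": dict,          # named fields -> dictionary keys
    "enum": str,             # one of the declared symbols
    "array": list,
    "map": dict,             # unordered key/value pairs
}
```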
- name: File type: record docParent: "#Datatype" doc: | Represents a file (or group of files if `secondaryFiles` is specified) that must be accessible by tools using the standard POSIX file system call API such as open(2) and read(2). fields: - name: "class" type: type: enum name: "File_class" symbols: - cwl:File jsonldPredicate: "_id": "@type" "_type": "@vocab" doc: Must be `File` to indicate this object describes a file. - name: "path" type: "string" doc: The path to the file. - name: "checksum" type: ["null", "string"] doc: | Optional hash code for validating file integrity. Currently must be in the form "sha1$ + hexadecimal string" using the SHA-1 algorithm. - name: "size" type: ["null", "long"] doc: Optional file size. - name: "cwl:secondaryFiles" type: - "null" - type: array items: "#File" doc: | A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in `secondaryFiles` may itself include `secondaryFiles` for which the same rules apply. - name: Any type: enum docParent: "#Datatype" symbols: ["cwl:Any"] doc: | The **Any** type validates for any non-null value. - name: Schema type: record doc: "A schema defines a parameter type." docParent: "#Parameter" fields: - name: type doc: "The data type of this parameter." type: - "#Datatype" - "#Schema" - "string" - type: "array" items: [ "#Datatype", "#Schema", "string" ] jsonldPredicate: "_id": "sld:type" "_type": "@vocab" - name: fields type: - "null" - type: "array" items: "#Schema" jsonldPredicate: "_id": "sld:fields" "_container": "@list" doc: "When `type` is `record`, defines the fields of the record." - name: "symbols" type: - "null" - type: "array" items: "string" jsonldPredicate: "_id": "sld:symbols" "_container": "@list" doc: "When `type` is `enum`, defines the set of valid symbols." - name: items type: - "null" - "#Datatype" - "#Schema" - "string" - type: "array" items: [ "#Datatype", "#Schema", "string" ] jsonldPredicate: "_id": "sld:items" "_container": "@list" doc: "When `type` is `array`, defines the type of the array elements." - name: "values" type: - "null" - "#Datatype" - "#Schema" - "string" - type: "array" items: [ "#Datatype", "#Schema", "string" ] jsonldPredicate: "_id": "sld:values" "_container": "@list" doc: "When `type` is `map`, defines the value type for the key/value pairs." - name: Parameter type: record docParent: "#Process" abstract: true doc: | Define an input or output parameter to a process. fields: - name: type type: - "null" - "#Datatype" - "#Schema" - string - type: array items: - "#Datatype" - "#Schema" - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" doc: | Specify valid types of data that may be assigned to this parameter. - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this parameter object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this parameter object." - name: streamable type: ["null", "boolean"] doc: | Currently only applies if `type` is `File`. A value of `true` indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: `false`.
- name: cwl:default type: ["null", Any] doc: | The default value for this parameter if not provided in the input object. - name: JsonPointer type: enum docParent: "#Expression" symbols: - "cwl:JsonPointer" - type: record name: Expression docAfter: "#ExpressionTool" doc: | Define an expression that will be evaluated and used to modify the behavior of a tool or workflow. See [Expressions](#expressions) for more information about expressions and [ExpressionEngineRequirement](#expressionenginerequirement) for information on how to define an expression engine. fields: - name: engine type: - "#JsonPointer" - string doc: | Either `cwl:JsonPointer` or a reference to an ExpressionEngineRequirement defining which engine to use. jsonldPredicate: "_id": "cwl:engine" "_type": "@id" - name: script type: string doc: "The code to be executed by the expression engine." - name: Binding type: record docParent: "#Parameter" fields: - name: loadContents type: - "null" - boolean doc: | Only applies when `type` is `File`. Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for manipulation by expressions. - name: cwl:secondaryFiles type: - "null" - "string" - "#Expression" - type: "array" items: ["string", "#Expression"] doc: | Only applies when `type` is `File`. Describes files that must be included alongside the primary file. If the value is an expression, the context of the expression is the input or output File parameter to which this binding applies. If the value is a string, it specifies that the following pattern should be applied to the primary file: 1. If the string begins with one or more caret `^` characters, for each caret, remove the last file extension from the path (the last period `.` and all following characters). If there are no file extensions, the path is unchanged. 2. Append the remainder of the string to the end of the file path. - name: InputSchema type: record extends: "#Schema" docParent: "#InputParameter" specialize: - specializeFrom: "#Schema" specializeTo: "#InputSchema" fields: - name: cwl:inputBinding type: [ "null", "#Binding" ] doc: | Describes how to handle a value in the input object and convert it into a concrete form for execution, such as command line parameters. - name: OutputSchema type: record extends: "#Schema" docParent: "#OutputParameter" specialize: - specializeFrom: "#Schema" specializeTo: "#OutputSchema" - name: InputParameter type: record extends: "#Parameter" docAfter: "#Parameter" specialize: - specializeFrom: "#Schema" specializeTo: "#InputSchema" fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: "cwl:inputBinding" type: [ "null", "#Binding" ] doc: | Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters. - name: OutputParameter type: record extends: "#Parameter" docAfter: "#Parameter" specialize: - specializeFrom: "#Schema" specializeTo: "#OutputSchema" fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - type: record name: "FileDef" docParent: "#CreateFileRequirement" doc: | Define a file that must be placed in the designated output directory prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template. fields: - name: "filename" type: ["string", "#Expression"] doc: "The name of the file to create in the output directory."
- name: "fileContent" type: ["string", "#Expression"] doc: | If the value is a string literal or an expression which evaluates to a string, a new file must be created with the string as the file contents. If the value is an expression that evaluates to a File object, this indicates the referenced file should be added to the designated output directory prior to executing the tool. Files added in this way may be read-only, and may be provided by bind mounts or file system links to avoid unnecessary copying of the input file. - type: record name: EnvironmentDef docParent: "#EnvVarRequirement" doc: | Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input. fields: - name: "envName" type: "string" doc: The environment variable name - name: "envValue" type: ["string", "#Expression"] doc: The environment variable value - type: record name: SchemaDef extends: "#InputSchema" docParent: "#SchemaDefRequirement" specialize: - specializeFrom: "#InputSchema" specializeTo: "#SchemaDef" - specializeFrom: "#Binding" specializeTo: "#CommandLineBinding" fields: - name: name type: ["null", string] doc: "The type name being defined." - type: record name: ProcessRequirement docAfter: "#ExpressionTool" abstract: true doc: | A process requirement declares a prerequisite that may or must be fulfilled before executing a process. See [`Process.hints`](#process) and [`Process.requirements`](#process). Process requirements are the primary mechanism for specifying extensions to the CWL core specification. fields: - name: "class" type: "string" doc: "The specific requirement type." jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: Process abstract: true docAfter: "#ProcessRequirement" doc: | The base executable type in CWL is the `Process` object defined by the document. Note that the `Process` object is abstract and cannot be directly executed. fields: - name: id type: ["null", string] jsonldPredicate: "@id" doc: "The unique identifier for this process object." - name: cwl:inputs type: type: array items: "#InputParameter" doc: | Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: cwl:outputs type: type: array items: "#OutputParameter" doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: cwl:requirements type: - "null" - type: array items: "#ProcessRequirement" doc: > Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: - "null" - type: array items: Any doc: > Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints; however, the implementation may report a warning.
jsonldPredicate: _id: cwl:hints noLinkCheck: true - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: cwlVersion type: - "null" - "#CWLVersions" doc: "CWL document version" jsonldPredicate: "_id": "cwl:cwlVersion" "_type": "@vocab" - type: record name: CommandLineBinding extends: "#Binding" docParent: "#CommandInputParameter" doc: | When listed under `inputBinding` in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in `CommandLineTool.arguments`, the term "value" refers to the effective value after evaluating `valueFrom`. The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value. - **string**: Add `prefix` and the string to the command line. - **number**: Add `prefix` and the decimal representation to the command line. - **boolean**: If true, add `prefix` to the command line. If false, add nothing. - **File**: Add `prefix` and the value of [`File.path`](#file) to the command line. - **array**: If `itemSeparator` is specified, add `prefix` and join the array into a single string with `itemSeparator` separating the items. Otherwise first add `prefix`, then recursively process individual elements. - **object**: Add `prefix` only, and recursively add object fields for which `inputBinding` is specified. - **null**: Add nothing. fields: - name: "position" type: ["null", "int"] doc: "The sorting key. Default position is 0." - name: "prefix" type: [ "null", "string"] doc: "Command line prefix to add before the value." - name: "separate" type: ["null", boolean] doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument. - name: "itemSeparator" type: ["null", "string"] doc: | Join the array elements into a single string with the elements separated by `itemSeparator`. - name: "valueFrom" type: - "null" - "string" - "#Expression" doc: | If `valueFrom` is a constant string value, use this as the value and apply the binding rules above. If `valueFrom` is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the "context" of the expression will be the value of the input parameter. When a binding is part of the `CommandLineTool.arguments` field, the `valueFrom` field is required. - type: record name: CommandOutputBinding extends: "#Binding" docParent: "#CommandOutputParameter" doc: | Describes how to generate an output parameter based on the files produced by a CommandLineTool. The output parameter is generated by applying these operations in the following order: - glob - loadContents - outputEval fields: - name: glob type: - "null" - string - "#Expression" - type: array items: string doc: | Find files relative to the output directory, using POSIX glob(3) pathname matching. If provided an array, find files that match any pattern in the array.
If provided an expression, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Only files which actually exist will be matched and returned. - name: outputEval type: - "null" - "#Expression" doc: | Evaluate an expression to generate the output value. If `glob` was specified, the script `context` will be an array containing any files that were matched. Additionally, if `loadContents` is `true`, the File objects will include up to the first 64 KiB of file contents in the `contents` field. - type: record name: CommandInputSchema extends: "#InputSchema" docParent: "#CommandInputParameter" specialize: - specializeFrom: "#InputSchema" specializeTo: "#CommandInputSchema" - specializeFrom: "#Binding" specializeTo: "#CommandLineBinding" - type: record name: CommandOutputSchema extends: "#OutputSchema" docParent: "#CommandOutputParameter" specialize: - specializeFrom: "#OutputSchema" specializeTo: "#CommandOutputSchema" fields: - name: "cwl:outputBinding" type: [ "null", "#CommandOutputBinding" ] doc: | Describes how to handle the concrete outputs of a process step (such as files created by a program) and describe them in the process output parameter. - type: record name: CommandInputParameter extends: "#InputParameter" docParent: "#CommandLineTool" doc: An input parameter for a CommandLineTool. specialize: - specializeFrom: "#InputSchema" specializeTo: "#CommandInputSchema" - specializeFrom: "#Binding" specializeTo: "#CommandLineBinding" - type: record name: CommandOutputParameter extends: "#OutputParameter" docParent: "#CommandLineTool" doc: An output parameter for a CommandLineTool. specialize: - specializeFrom: "#OutputSchema" specializeTo: "#CommandOutputSchema" fields: - name: "cwl:outputBinding" type: [ "null", "#CommandOutputBinding" ] doc: | Describes how to handle the concrete outputs of a process step (such as files created by a program) and describe them in the process output parameter. - type: record name: CommandLineTool extends: "#Process" docAfter: "#Workflow" specialize: - specializeFrom: "#InputParameter" specializeTo: "#CommandInputParameter" - specializeFrom: "#OutputParameter" specializeTo: "#CommandOutputParameter" documentRoot: true doc: | A CommandLineTool process is a process implementation for executing a non-interactive application in a POSIX environment. To accommodate the enormous variety in syntax and semantics for input, runtime environment, invocation, and output of arbitrary programs, CommandLineTool uses an "input binding" that describes how to translate input parameters to an actual program invocation, and an "output binding" that describes how to generate output parameters from program output. # Input binding The tool command line is built by applying command line bindings to the input object. Bindings are listed either as part of an [input parameter](#commandinputparameter) using the `inputBinding` field, or separately using the `arguments` field of the CommandLineTool. The algorithm to build the command line is as follows. In this algorithm, the sort key is a list consisting of one or more numeric or string elements. Strings are sorted lexicographically based on UTF-8 encoding. 1. Collect `CommandLineBinding` objects from `arguments`. Assign a sorting key `[position, i]` where `position` is [`CommandLineBinding.position`](#commandlinebinding) and `i` is the index in the `arguments` list. 2. 
Collect `CommandLineBinding` objects from the `inputs` schema and associate them with values from the input object. Where the input type is a record, array, or map, recursively walk the schema and input object, collecting nested `CommandLineBinding` objects and associating them with values from the input object. 3. Create a sorting key by taking the value of the `position` field at each level leading to each leaf binding object. If `position` is not specified, it is not added to the sorting key. For bindings on arrays and maps, the sorting key must include the array index or map key following the position. If and only if two bindings have the same sort key, the tie must be broken using the ordering of the field or parameter name immediately containing the leaf binding. 4. Sort elements using the assigned sorting keys. Numeric entries sort before strings. 5. In the sorted order, apply the rules defined in [`CommandLineBinding`](#commandlinebinding) to convert bindings to actual command line elements. 6. Insert elements from `baseCommand` at the beginning of the command line. # Runtime environment All files listed in the input object must be made available in the runtime environment. The implementation may use a shared or distributed file system or transfer files via explicit download. Implementations may choose not to provide access to files not explicitly specified by the input object or process requirements. Output files produced by tool execution must be written to the **designated output directory**. The initial current working directory when executing the tool must be the designated output directory. When executing the tool, the child process must not inherit environment variables from the parent process. The tool must execute in a new, empty environment, containing only environment variables defined by [EnvVarRequirement](#envvarrequirement), the default environment of the Docker container specified in [DockerRequirement](#dockerrequirement) (if applicable), and `TMPDIR`. The `TMPDIR` environment variable must be set in the runtime environment to the **designated temporary directory**. Any files written to the designated temporary directory may be deleted by the workflow platform when the tool invocation is complete. An implementation may forbid the tool from writing to any location in the runtime environment file system other than the designated temporary directory and designated output directory. An implementation may provide read-only input files, and disallow in-place update of input files. The standard input stream and standard output stream may be redirected as described in the `stdin` and `stdout` fields. ## Extensions [DockerRequirement](#dockerrequirement), [CreateFileRequirement](#createfilerequirement), and [EnvVarRequirement](#envvarrequirement) are available as standard extensions to core command line tool semantics for defining the runtime environment. # Execution Once the command line is built and the runtime environment is created, the actual tool is executed. The standard error stream and standard output stream (unless redirected by setting `stdout`) may be captured by platform logging facilities for storage and reporting. Tools may be multithreaded or spawn child processes; however, when the parent process exits, the tool is considered finished regardless of whether any detached child processes are still running. Tools must not require any kind of console, GUI, or web based user interaction in order to start and run to completion. 
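(Non-normative aside: the sorting step of the input binding algorithm above can be pictured with the short Python sketch below. The bindings and sort keys shown are hypothetical, and the tie-breaking rule from step 3 is omitted for brevity.)

```
from functools import cmp_to_key

def cmp_elements(a, b):
    # Step 4 above: numeric entries sort before strings.
    if isinstance(a, (int, float)) and isinstance(b, str):
        return -1
    if isinstance(a, str) and isinstance(b, (int, float)):
        return 1
    return (a > b) - (a < b)

def cmp_keys(ka, kb):
    # Compare sort keys element by element; shorter keys sort first on a tie.
    for a, b in zip(ka, kb):
        c = cmp_elements(a, b)
        if c:
            return c
    return len(ka) - len(kb)

# Hypothetical (sort key, argument) pairs: [position, index-or-name] per binding.
bindings = [([2], "input.txt"), ([1], "--reverse"), ([0, 0], "arg0")]
for _key, arg in sorted(bindings, key=cmp_to_key(lambda x, y: cmp_keys(x[0], y[0]))):
    print(arg)  # arg0, --reverse, input.txt
```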
The exit code of the process indicates if the process completed successfully. By convention, an exit code of zero is treated as success and non-zero exit codes are treated as failure. This may be customized by providing the fields `successCodes`, `temporaryFailCodes`, and `permanentFailCodes`. An implementation may choose to default unspecified non-zero exit codes to either `temporaryFailure` or `permanentFailure`. # Output binding If the output directory contains a file named "cwl.output.json", that file must be loaded and used as the output object. Otherwise, the output object must be generated by walking the parameters listed in `outputs` and applying output bindings to the tool output. Output bindings are associated with output parameters using the `outputBinding` field. See [`CommandOutputBinding`](#commandoutputbinding) for details. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: baseCommand doc: | Specifies the program to execute. If the value is an array, the first element is the program to execute, and subsequent elements are placed at the beginning of the command line prior to any command line bindings. If the program includes a path separator character it must be an absolute path, otherwise it is an error. If the program does not include a path separator, search the `$PATH` variable in the runtime environment of the workflow runner to find the absolute path of the executable. type: - string - type: array items: string jsonldPredicate: "_id": "cwl:baseCommand" "_container": "@list" - name: arguments doc: | Command line bindings which are not directly associated with input parameters. type: - "null" - type: array items: [string, "#CommandLineBinding"] jsonldPredicate: "_id": "cwl:arguments" "_container": "@list" - name: stdin type: ["null", string, "#Expression"] doc: | A path to a file whose contents must be piped into the command's standard input stream. - name: stdout type: ["null", string, "#Expression"] doc: | Capture the command's standard output stream to a file written to the designated output directory. If `stdout` is a string, it specifies the file name to use. If `stdout` is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. - name: successCodes type: - "null" - type: array items: int doc: | Exit codes that indicate the process completed successfully. - name: temporaryFailCodes type: - "null" - type: array items: int doc: | Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results. - name: permanentFailCodes type: - "null" - type: array items: int doc: Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. - type: record name: ExpressionTool extends: "#Process" docAfter: "#CommandLineTool" documentRoot: true doc: | Execute an expression as a process step. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: expression type: "#Expression" doc: | The expression to execute. The expression must return a JSON object which matches the output parameters of the ExpressionTool.
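As an illustrative sketch (not part of the schema), an ExpressionTool document might look like the following; the `#js` engine identifier, the engine command, the script syntax, and the parameter names are all hypothetical, and a real document would need a corresponding expression engine implementation:

```
class: ExpressionTool
requirements:
  - class: ExpressionEngineRequirement
    id: "#js"                            # hypothetical engine identifier
    engineCommand: ["node", "engine.js"] # hypothetical engine program
inputs:
  - id: "#a"
    type: int
outputs:
  - id: "#doubled"
    type: int
expression:
  engine: "#js"
  # The script semantics are defined by the engine; the result must be a
  # JSON object matching `outputs`, e.g. {"doubled": 2}.
  script: "{doubled: $job.a * 2}"
```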
- name: LinkMergeMethod type: enum docParent: "#WorkflowStepInput" doc: The input link merge method, described in [WorkflowStepInput](#workflowstepinput). symbols: - merge_nested - merge_flattened - name: WorkflowOutputParameter type: record extends: "#OutputParameter" docParent: "#Workflow" doc: | Describe an output parameter of a workflow. The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter. fields: - name: source doc: | Specifies one or more workflow parameters that will provide this output value. jsonldPredicate: "_id": "cwl:source" "_type": "@id" type: - "null" - string - type: array items: string - name: cwl:linkMerge type: ["null", "#LinkMergeMethod"] doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - type: record name: WorkflowStepInput docParent: "#WorkflowStep" doc: | The input of a workflow step connects an upstream parameter (from the workflow inputs, or the outputs of other workflow steps) with the input parameters of the underlying process. ## Input object A WorkflowStepInput object must contain an `id` field in the form `#fieldname` or `#stepname.fieldname`. When the `id` field contains a period `.` the field name consists of the characters following the final period. This defines a field of the workflow step input object with the value of the `source` parameter(s). ## Merging If the sink parameter is an array, or named in a [workflow scatter](#workflowstep) operation, there may be multiple inbound data links listed in the `source` field. The values from the input links are merged depending on the method specified in the `linkMerge` field. If not specified, the default method is "merge_nested". * **merge_nested** The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list. * **merge_flattened** 1. The source and sink parameters must be compatible types, or the source type must be compatible with a single element from the "items" type of the destination array parameter. 2. Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements. fields: - name: id type: string jsonldPredicate: "@id" doc: "A unique identifier for this workflow input parameter." - name: source doc: | Specifies one or more workflow parameters that will provide input to the underlying process parameter. jsonldPredicate: "_id": "cwl:source" "_type": "@id" type: - "null" - string - type: array items: string - name: cwl:linkMerge type: ["null", "#LinkMergeMethod"] doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - name: cwl:default type: ["null", Any] doc: | The default value for this parameter if there is no `source` field. - type: record name: WorkflowStepOutput docParent: "#WorkflowStep" doc: | Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the `id` field) may be used as a `source` to connect with input parameters of other workflow steps, or with an output parameter of the process. fields: - name: id type: string jsonldPredicate: "@id" doc: | A unique identifier for this workflow output parameter.
This is the identifier to use in the `source` field of `WorkflowStepInput` to connect the output value to downstream parameters. - name: ScatterMethod type: enum docParent: "#WorkflowStep" doc: The scatter method, as described in [workflow step scatter](#workflowstep). symbols: - dotproduct - nested_crossproduct - flat_crossproduct - name: WorkflowStep type: record docParent: "#Workflow" doc: | A workflow step is an executable element of a workflow. It specifies the underlying process implementation (such as `CommandLineTool`) in the `run` field and connects the input and output parameters of the underlying process to workflow parameters. # Scatter/gather To use scatter/gather, [ScatterFeatureRequirement](#scatterfeaturerequirement) must be specified in the workflow or workflow step requirements. A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently. The `scatter` field specifies one or more input parameters which will be scattered. An input parameter may be listed more than once. The declared type of each input parameter is implicitly wrapped in an array for each time it appears in the `scatter` field. As a result, upstream parameters which are connected to scattered parameters may be arrays. All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array. If `scatter` declares more than one input parameter, `scatterMethod` describes how to decompose the input into a discrete set of jobs. * **dotproduct** specifies that each of the input arrays are aligned and one element taken from each array to construct each job. It is an error if all input arrays are not the same length. * **nested_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the `scatter` field. * **flat_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the `scatter` field. # Subworkflows To specify a nested workflow as part of a workflow step, [SubworkflowFeatureRequirement](#subworkflowfeaturerequirement) must be specified in the workflow or workflow step requirements. fields: - name: id type: ["null", string] jsonldPredicate: "@id" doc: "The unique identifier for this workflow step." - name: cwl:inputs type: type: array items: "#WorkflowStepInput" doc: | Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: cwl:outputs type: type: array items: "#WorkflowStepOutput" doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: cwl:requirements type: - "null" - type: array items: "#ProcessRequirement" doc: > Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step.
If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: - "null" - type: array items: Any doc: > Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. jsonldPredicate: _id: cwl:hints noLinkCheck: true - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: run type: "#Process" doc: | Specifies the process to run. - name: scatter type: - "null" - string - type: array items: string jsonldPredicate: "_id": "cwl:scatter" "_type": "@id" "_container": "@list" - name: scatterMethod doc: | Required if `scatter` is an array of more than one element. type: - "null" - "#ScatterMethod" jsonldPredicate: "_id": "cwl:scatterMethod" "_type": "@vocab" - name: Workflow type: record docParent: "#Reference" extends: "#Process" specialize: - specializeFrom: "#OutputParameter" specializeTo: "#WorkflowOutputParameter" documentRoot: true doc: | A workflow is a process consisting of one or more `steps`. Each step has input and output parameters defined by the `inputs` and `outputs` fields. A workflow executes as described in [execution model](#workflow_graph). # Dependencies Dependencies between parameters are expressed using the `source` field on [workflow step input parameters](#workflowstepinput) and [workflow output parameters](#workflowoutputparameter). The `source` field expresses the dependency of one parameter on another such that when a value is associated with the parameter specified by `source`, that value is propagated to the destination parameter. When all data links inbound to a given step are fulfilled, the step is ready to execute. # Extensions [ScatterFeatureRequirement](#scatterfeaturerequirement) and [SubworkflowFeatureRequirement](#subworkflowfeaturerequirement) are available as standard extensions to core workflow semantics. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: steps doc: | The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met. type: - type: array items: "#WorkflowStep" - type: record name: DockerRequirement extends: "#ProcessRequirement" doc: | Indicates that a workflow component should be run in a [Docker](http://docker.com) container, and specifies how to fetch or build the image. If a CommandLineTool lists `DockerRequirement` under `hints` or `requirements`, it may (or must) be run in the specified Docker container. The platform must first acquire or install the correct Docker image as specified by `dockerPull`, `dockerLoad` or `dockerFile`. The platform must execute the tool in the container using `docker run` with the appropriate Docker image and tool command line. The workflow platform may provide input files and the designated output directory through the use of volume bind mounts.
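For example, a tool might declare the following (a sketch; the image name and in-container path are illustrative):

```
hints:
  - class: DockerRequirement
    dockerPull: "debian:8"
    # Optionally place the designated output directory at a fixed
    # location inside the container.
    dockerOutputDirectory: "/var/spool/cwl"
```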
The platform may rewrite file paths in the input object to correspond to the Docker bind mounted locations. When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container. ## Interaction with other requirements If [EnvVarRequirement](#envvarrequirement) is specified alongside a DockerRequirement, the environment variables must be provided to Docker using `--env` or `--env-file` and interact with the container's preexisting environment as defined by Docker. fields: - name: dockerPull type: ["null", "string"] doc: "Specify a Docker image to retrieve using `docker pull`." - name: "dockerLoad" type: ["null", "string"] doc: "Specify an HTTP URL from which to download a Docker image using `docker load`." - name: dockerFile type: ["null", "string"] doc: "Supply the contents of a Dockerfile which will be built using `docker build`." - name: dockerImageId type: ["null", "string"] doc: | The image id that will be used for `docker run`. May be a human-readable image name or the image identifier hash. May be skipped if `dockerPull` is specified, in which case the `dockerPull` image id must be used. - name: dockerOutputDirectory type: ["null", "string"] doc: | Set the designated output directory to a specific location inside the Docker container. - type: record name: SubworkflowFeatureRequirement extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support nested workflows in the `run` field of [WorkflowStep](#workflowstep). - name: CreateFileRequirement type: record extends: "#ProcessRequirement" doc: | Define a list of files that must be created by the workflow platform in the designated output directory prior to executing the command line tool. See `FileDef` for details. fields: - name: fileDef type: type: "array" items: "#FileDef" doc: The list of files. - name: EnvVarRequirement type: record extends: "#ProcessRequirement" doc: | Define a list of environment variables which will be set in the execution environment of the tool. See `EnvironmentDef` for details. fields: - name: envDef type: type: "array" items: "#EnvironmentDef" doc: The list of environment variables. - name: ScatterFeatureRequirement type: record extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support the `scatter` and `scatterMethod` fields of [WorkflowStep](#workflowstep). - name: SchemaDefRequirement type: record extends: "#ProcessRequirement" doc: | This field consists of an array of type definitions which must be used when interpreting the `inputs` and `outputs` fields. When a symbolic type is encountered that is not in [`Datatype`](#datatype), the implementation must check if the type is defined in `schemaDefs` and use that definition. If the type is not found in `schemaDefs`, it is an error. The entries in `schemaDefs` must be processed in the order listed such that later schema definitions may refer to earlier schema definitions. fields: - name: types type: type: array items: "#SchemaDef" doc: The list of type definitions. - type: record name: ExpressionEngineRequirement extends: "#ProcessRequirement" doc: | Define an expression engine, as described in [Expressions](#expressions). fields: - name: id type: string doc: "Used to identify the expression engine in the `engine` field of Expressions."
jsonldPredicate: "@id" - name: cwl:requirements type: - "null" - type: array items: "#ProcessRequirement" doc: | Requirements to run this expression engine, such as DockerRequirement for specifying a container to run the engine. - name: engineCommand type: - "null" - string - type: array items: string doc: "The command line to invoke the expression engine." - name: engineConfig type: - "null" - type: array items: string doc: | Additional configuration or code fragments that will also be passed to the expression engine. The semantics of this field are defined by the underlying expression engine. Intended for uses such as providing function definitions that will be called from CWL expressions. cwltool-1.0.20180302231433/cwltool/schemas/draft-2/cwl-avro.yml0000644000175200017520000021224313247251315024424 0ustar mcrusoemcrusoe00000000000000- name: "Common Workflow Language, Draft 2" type: doc doc: | 7 July 2015 This version: * https://w3id.org/cwl/draft-2/ Current version: * https://w3id.org/cwl/ Authors: * Peter Amstutz , Curoverse * Nebojša Tijanić , Seven Bridges Genomics Contributers: * Luka Stojanovic , Seven Bridges Genomics * John Chilton , Galaxy Project, Pennsylvania State University * Michael R. Crusoe , Michigan State University * Hervé Ménager , Institut Pasteur * Maxim Mikheev , BioDatomics * Stian Soiland-Reyes [soiland-reyes@cs.manchester.ac.uk](mailto:soiland-reyes@cs.manchester.ac.uk), University of Manchester # Abstract A Workflow is an analysis task represented by a directed graph describing a sequence of operations that transform an input data set to output. This specification defines the Common Workflow Language (CWL), a vendor-neutral standard for representing workflows and concrete process steps intended to be portable across a variety of computing platforms. # Status of This Document This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "specification" directory at https://github.com/common-workflow-language/common-workflow-language The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0. # Introduction The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility. ## Introduction to draft 2 This specification represents the second milestone of the CWL group. Since draft-1, this draft introduces a number of major changes and additions: * Use of Avro schema (instead of JSON-schema) and JSON-LD for data modeling. * Significant refactoring of the Command Line Tool description. * Data and execution model for Workflows. * Extension mechanism though "hints" and "requirements". ## Purpose CWL is designed to express workflows for data-intensive science, such as Bioinformatics, Chemistry, Physics, and Astronomy. This specification is intended to define a data and execution model for Workflows and Command Line Tools that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems. 
## References to Other Specifications * [JSON](http://json.org) * [JSON-LD](http://json-ld.org) * [JSON Pointer](https://tools.ietf.org/html/draft-ietf-appsawg-json-pointer-04) * [YAML](http://yaml.org) * [Avro](https://avro.apache.org/docs/current/spec.html) * [Uniform Resource Identifier (URI): Generic Syntax](https://tools.ietf.org/html/rfc3986) * [UTF-8](https://www.ietf.org/rfc/rfc2279.txt) * [Portable Operating System Interface (POSIX.1-2008)](http://pubs.opengroup.org/onlinepubs/9699919799/) * [Resource Description Framework (RDF)](http://www.w3.org/RDF/) ## Scope This document describes the CWL syntax, execution, and object model. It is not intended to document a specific implementation of CWL, however it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation: **may**: Conforming CWL documents and CWL implementations are permitted but not required to behave as described. **must**: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it. **fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described. # Data model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**) and where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **process** is a basic unit of computation which accepts input data, performs some computation, and produces output data. An **input object** is an object describing the inputs to an invocation of a process. An **output object** is an object describing the output of an invocation of a process. An **input schema** describes the valid format (required fields, data types) for an input object. An **output schema** describes the valid format for an output object. **Metadata** is information about workflows, tools, or input items that is not used directly in the computation. ## Syntax Documents containing CWL objects are serialized and loaded using YAML syntax and UTF-8 text encoding. A conforming implementation must accept all valid YAML documents. The CWL schema is defined using Avro Linked Data (avro-ld). Avro-ld is an extension of the Apache Avro schema language to support additional annotations mapping Avro fields to RDF predicates via JSON-LD. A CWL document may be validated by transforming the avro-ld schema to a base Apache Avro schema. An implementation may interpret a CWL document as [JSON-LD](http://json-ld.org) and convert a CWL document to a [Resource Description Framework (RDF)](http://www.w3.org/RDF/) using the CWL [JSON-LD Context](https://w3id.org/cwl/draft-2/context) (extracted from the avro-ld schema).
The CWL [RDFS schema](https://w3id.org/cwl/draft-2/cwl.ttl) defines the classes and properties used by CWL as JSON-LD. The latest draft-2 schema is defined here: https://github.com/common-workflow-language/common-workflow-language/blob/master/schemas/draft-2/cwl-avro.yml ## Identifiers If an object contains an `id` field, that is used to uniquely identify the object in that document. The value of the `id` field must be unique over the entire document. The format of the `id` field is that of a [relative fragment identifier](https://tools.ietf.org/html/rfc3986#section-3.5), and must start with a hash `#` character. An implementation may choose to only honor references to object types for which the `id` field is explicitly listed in this specification. When loading a CWL document, an implementation may resolve relative identifiers to absolute URI references. For example, "my_tool.cwl" located in the directory "/home/example/work/" may be transformed to "file:///home/example/work/my_tool.cwl" and a relative fragment reference "#input" in this file may be transformed to "file:///home/example/work/my_tool.cwl#input". ## Document preprocessing An implementation must resolve `import` directives. An `import` directive is an object consisting of the field `import` specifying a URI. The URI referenced by `import` must be loaded as a CWL document (including recursive preprocessing) and then the `import` object is implicitly replaced by the external resource. URIs may include document fragments referring to objects identified by their `id` field, in which case the `import` directive is replaced by only the fragment object. An implementation must resolve `include` directives. An `include` directive is an object consisting of the field `include` specifying a URI. The URI referenced by `include` must be loaded as a UTF-8 encoded text document and the `include` directive is implicitly replaced by a string with the contents of the document. Because the loaded resource is unparsed, URIs used with `include` must not include fragments. ## Extensions and Metadata Implementation extensions not required for correct execution (for example, fields related to GUI rendering) may be stored in [process hints](#requirements_and_hints). Input metadata (for example, a lab sample identifier) may be explicitly represented within a workflow using input parameters which are propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata. Fields for tool and workflow metadata (for example, authorship for use in citations) are not defined in this specification. Future versions of this specification may define such fields. # Execution model ## Execution concepts A **parameter** is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation. A **command line tool** is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates. A **workflow** is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of other downstream steps to form a directed graph, and independent steps may run concurrently. A **runtime environment** is the actual hardware and software environment when executing a command line tool.
It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the Python interpreter or the JVM), libraries, modules, packages, utilities, and data files required to run the tool. A **workflow platform** is a specific hardware and software implementation capable of interpreting a CWL document and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output. It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for CWL specification but may be handled by a specific workflow platform include: * Data security and permissions. * Scheduling tool invocations on remote cluster or cloud compute nodes. * Using virtual machines or operating system containers to manage the runtime (except as described in [DockerRequirement](#dockerrequirement)). * Using remote or distributed file systems to manage input and output files. * Translating or rewriting file paths. * Determining if a process has previously been executed, skipping it and reusing previous results. * Pausing and resuming of processes or workflows. Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of [process requirements](#processrequirement). ## Generic execution process The generic execution sequence of a CWL process (including both workflows and concrete process implementations) is as follows. 1. Load and validate CWL document, yielding a process object. 2. Load input object. 3. Validate the input object against the `inputs` schema for the process. 4. Validate that process requirements are met. 5. Perform any further setup required by the specific process type. 6. Execute the process. 7. Capture results of process execution into the output object. 8. Validate the output object against the `outputs` schema for the process. 9. Report the output object to the process caller. ## Requirements and hints A **[process requirement](#processrequirement)** modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. A **hint** is similar to a requirement, however it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied. Requirements are inherited. A requirement specified in a Workflow applies to all workflow steps; a requirement specified on a workflow step will apply to the process implementation. If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in `requirements` on a process implementation such as CommandLineTool will take precedence over an entry in `requirements` specified in a workflow step, and an entry in `requirements` on a workflow step takes precedence over the workflow. Entries in `hints` are resolved the same way. Requirements override hints.
If a process implementation provides a process requirement in `hints` which is also provided in `requirements` by an enclosing workflow or workflow step, the enclosing `requirements` takes precedence. Process requirements are the primary mechanism for specifying extensions to the CWL core specification. ## Expressions An expression is a fragment of executable code which is evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow. An implementation must provide the predefined `cwl:JsonPointer` expression engine. This expression engine specifies a [JSON Pointer](https://tools.ietf.org/html/draft-ietf-appsawg-json-pointer-04) into an expression input object consisting of the `job` and `context` fields described below. An expression engine defined with [ExpressionEngineRequirement](#expressionenginerequirement) is a command line program conforming to the following protocol: * On standard input, receive a JSON object with the following fields: - **engineConfig**: A list of strings from the `engineConfig` field. Null if `engineConfig` is not specified. - **job**: The input object of the current Process (context dependent). - **context**: The specific value being transformed (context dependent). May be null. - **script**: The code fragment to evaluate. - **outdir**: When used in the context of a CommandLineTool, this is the designated output directory that will be used when executing the tool. Null if not applicable. - **tmpdir**: When used in the context of a CommandLineTool, this is the designated temporary directory that will be used when executing the tool. Null if not applicable. * On standard output, print a single JSON value (string, number, array, object, boolean, or null) for the return value. Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context, and permits no outside data to leak into the context. Implementations may apply limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code. The order in which expressions are evaluated within a process or workflow is undefined. ## Workflow graph A workflow describes a set of **steps** and the **dependencies** between those steps. When a process produces output that will be consumed by a second process, the first process is a dependency of the second process. When there is a dependency, the workflow engine must execute the dependency process and wait for it to successfully produce output before executing the dependent process. If two processes are defined in the workflow graph that are not directly or indirectly dependent, these processes are **independent**, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed. ## Success and failure A completed process must result in one of `success`, `temporaryFailure` or `permanentFailure` states. An implementation may choose to retry a process execution which resulted in `temporaryFailure`. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon `permanentFailure`.
* If any step of a workflow execution results in `permanentFailure`, then the workflow status is `permanentFailure`. * If one or more steps result in `temporaryFailure` and all other steps complete `success` or are not executed, then the workflow status is `temporaryFailure`. * If all workflow steps are executed and complete with `success`, then the workflow status is `success`. ## Executing CWL documents as scripts By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide `cwl-runner` as an alias for the platform's CWL implementation. # Sample CWL workflow revtool.cwl: ``` #!/usr/bin/env cwl-runner # # Simplest example command line program wrapper for the Unix tool "rev". # class: CommandLineTool description: "Reverse each line using the `rev` command" # The "inputs" array defines the structure of the input object that describes # the inputs to the underlying program. Here, there is one input field # defined that will be called "input" and will contain a "File" object. # # The input binding indicates that the input value should be turned into a # command line argument. In this example inputBinding is an empty object, # which indicates that the file name should be added to the command line at # a default location. inputs: - id: "#input" type: File inputBinding: {} # The "outputs" array defines the structure of the output object that # describes the outputs of the underlying program. Here, there is one # output field defined that will be called "output", must be a "File" type, # and after the program executes, the output value will be the file # output.txt in the designated output directory. outputs: - id: "#output" type: File outputBinding: glob: output.txt # The actual program to execute. baseCommand: rev # Specify that the standard output stream must be redirected to a file called # output.txt in the designated output directory. stdout: output.txt ``` sorttool.cwl: ``` #!/usr/bin/env cwl-runner # # Example command line program wrapper for the Unix tool "sort" # demonstrating command line flags. class: CommandLineTool description: "Sort lines using the `sort` command" # This example is similar to the previous one, with an additional input # parameter called "reverse". It is a boolean parameter, which is # interpreted as a command line flag. The value of "prefix" is used as the # flag to put on the command line if "reverse" is true. If "reverse" is # false, no flag is added. # # This example also introduces the "position" field. This indicates the # sorting order of items on the command line. Lower numbers are placed # before higher numbers. Here, the "--reverse" flag (if present) will be # added to the command line before the input file path. inputs: - id: "#reverse" type: boolean inputBinding: position: 1 prefix: "--reverse" - id: "#input" type: File inputBinding: position: 2 outputs: - id: "#output" type: File outputBinding: glob: output.txt baseCommand: sort stdout: output.txt ``` revsort.cwl: ``` #!/usr/bin/env cwl-runner # # This is a two-step workflow which uses "revtool" and "sorttool" defined above. # class: Workflow description: "Reverse the lines in a document, then sort those lines." # Requirements specify prerequisites and extensions to the workflow. # In this example, DockerRequirement specifies a default Docker container # in which the command line tools will execute.
requirements: - class: DockerRequirement dockerPull: debian:8 # The inputs array defines the structure of the input object that describes # the inputs to the workflow. # # The "reverse_sort" input parameter demonstrates the "default" field. If the # field "reverse_sort" is not provided in the input object, the default value will # be used. inputs: - id: "#input" type: File description: "The input file to be processed." - id: "#reverse_sort" type: boolean default: true description: "If true, reverse (descending) sort" # The "outputs" array defines the structure of the output object that describes # the outputs of the workflow. # # Each output field must be connected to the output of one of the workflow # steps using the "source" field. Here, the parameter "#output" of the # workflow comes from the "#sorted" output of the "sort" step. outputs: - id: "#output" type: File source: "#sorted.output" description: "The output with the lines reversed and sorted." # The "steps" array lists the executable steps that make up the workflow. # The tool to execute each step is listed in the "run" field. # # In the first step, the "inputs" field of the step connects the upstream # parameter "#input" of the workflow to the input parameter of the tool # "revtool.cwl#input". # # In the second step, the "inputs" field of the step connects the output # parameter "#rev.output" from the first step to the input parameter of the # tool "sorttool.cwl#input". steps: - inputs: - { id: "#rev.input", source: "#input" } outputs: - { id: "#rev.output" } run: { import: revtool.cwl } - inputs: - { id: "#sorted.input", source: "#rev.output" } - { id: "#sorted.reverse", source: "#reverse_sort" } outputs: - { id: "#sorted.output" } run: { import: sorttool.cwl } ``` Sample input object: ``` { "input": { "class": "File", "path": "whale.txt" } } ``` Sample output object: ``` { "output": { "path": "/tmp/tmpdeI_p_/output.txt", "size": 1111, "class": "File", "checksum": "sha1$b9214658cc453331b62c2282b772a5c063dbd284" } } ``` jsonldPrefixes: { "cwl": "https://w3id.org/cwl/cwl#", "avro": "https://w3id.org/cwl/avro#", "dct": "http://purl.org/dc/terms/", "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#", "rdfs": "http://www.w3.org/2000/01/rdf-schema#" } jsonldVocab: cwl - name: Reference type: doc doc: This section specifies the core object types that make up a CWL document. - name: Datatype type: enum docAfter: ProcessRequirement symbols: - "null" - boolean - int - long - float - double - bytes - string - record - enum - array - map - File - Any jsonldPrefix: avro jsonldPredicate: - symbol: File predicate: "cwl:File" - symbol: Any predicate: "cwl:Any" doc: | CWL data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. In addition, CWL defines [`File`](#file) as a special record type.
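For example, a parameter may declare an array of files with an inline Avro-style schema (a sketch; the parameter id is illustrative):

```
- id: "#sequences"
  type:
    type: array
    items: File
```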
## Primitive types * **null**: no value * **boolean**: a binary value * **int**: 32-bit signed integer * **long**: 64-bit signed integer * **float**: single precision (32-bit) IEEE 754 floating-point number * **double**: double precision (64-bit) IEEE 754 floating-point number * **bytes**: sequence of uninterpreted 8-bit unsigned bytes * **string**: Unicode character sequence ## Complex types * **record**: An object with one or more fields defined by name and type * **enum**: A value from a finite set of symbolic values * **array**: An ordered sequence of values * **map**: An unordered collection of key/value pairs ## File type See [File](#file) below. ## Any type See [Any](#any) below. - name: File type: record docParent: Datatype doc: | Represents a file (or group of files if `secondaryFiles` is specified) that must be accessible by tools using the standard POSIX file system call API such as open(2) and read(2). fields: - name: "class" type: type: enum name: "File_class" symbols: - File jsonldPredicate: "@id": "@type" "@type": "@vocab" doc: Must be `File` to indicate this object describes a file. - name: "path" type: "string" doc: The path to the file. - name: "checksum" type: ["null", "string"] doc: | Optional hash code for validating file integrity. Currently must be in the form "sha1$ + hexadecimal string" using the SHA-1 algorithm. - name: "size" type: ["null", "long"] doc: Optional file size. - name: "secondaryFiles" type: - "null" - type: array items: File doc: | A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in `secondaryFiles` may itself include `secondaryFiles` for which the same rules apply. - name: Any type: enum docParent: Datatype symbols: ["Any"] doc: | The **Any** type validates for any non-null value. - name: Schema type: record doc: "A schema defines a parameter type." docParent: Parameter fields: - name: type doc: "The data type of this parameter." type: - "Datatype" - "Schema" - "string" - type: "array" items: [ "Datatype", "Schema", "string" ] jsonldPredicate: "@id": "avro:type" "@type": "@vocab" - name: fields type: - "null" - type: "array" items: "Schema" jsonldPredicate: "@id": "avro:fields" "@container": "@list" doc: "When `type` is `record`, defines the fields of the record." - name: "symbols" type: - "null" - type: "array" items: "string" jsonldPredicate: "@id": "avro:symbols" "@container": "@list" doc: "When `type` is `enum`, defines the set of valid symbols." - name: items type: - "null" - "Datatype" - "Schema" - "string" - type: "array" items: [ "Datatype", "Schema", "string" ] jsonldPredicate: "@id": "avro:items" "@container": "@list" doc: "When `type` is `array`, defines the type of the array elements." - name: "values" type: - "null" - "Datatype" - "Schema" - "string" - type: "array" items: [ "Datatype", "Schema", "string" ] jsonldPredicate: "@id": "avro:values" "@container": "@list" doc: "When `type` is `map`, defines the value type for the key/value pairs." - name: Parameter type: record docParent: Process abstract: true doc: | Define an input or output parameter to a process. fields: - name: type type: - "null" - Datatype - Schema - string - type: array items: - Datatype - Schema - string jsonldPredicate: "@id": "avro:type" "@type": "@vocab" doc: | Specify valid types of data that may be assigned to this parameter.
- name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this parameter object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this parameter object." - name: streamable type: ["null", "boolean"] doc: | Currently only applies if `type` is `File`. A value of `true` indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: `false`. - name: default type: ["null", Any] doc: | The default value for this parameter if not provided in the input object. - name: JsonPointer type: enum docParent: Expression symbols: - "JsonPointer" jsonldPrefix: "cwl" - type: record name: Expression docAfter: ExpressionTool doc: | Define an expression that will be evaluated and used to modify the behavior of a tool or workflow. See [Expressions](#expressions) for more information about expressions and [ExpressionEngineRequirement](#expressionenginerequirement) for information on how to define an expression engine. fields: - name: engine type: - JsonPointer - string doc: | Either `cwl:JsonPointer` or a reference to an ExpressionEngineRequirement defining which engine to use. jsonldPredicate: "@id": "cwl:engine" "@type": "@id" - name: script type: string doc: "The code to be executed by the expression engine." - name: Binding type: record docParent: Parameter fields: - name: loadContents type: - "null" - boolean doc: | Only applies when `type` is `File`. Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for manipulation by expressions. - name: secondaryFiles type: - "null" - "string" - Expression - type: "array" items: ["string", "Expression"] doc: | Only applies when `type` is `File`. Describes files that must be included alongside the primary file. If the value is an expression, the context of the expression is the input or output File parameter to which this binding applies. If the value is a string, it specifies that the following pattern should be applied to the primary file: 1. If string begins with one or more caret `^` characters, for each caret, remove the last file extension from the path (the last period `.` and all following characters). If there are no file extensions, the path is unchanged. 2. Append the remainder of the string to the end of the file path. - name: InputSchema type: record extends: Schema docParent: InputParameter specialize: {Schema: InputSchema} fields: - name: inputBinding type: [ "null", "Binding" ] doc: | Describes how to handle a value in the input object and convert it into a concrete form for execution, such as command line parameters. - name: OutputSchema type: record extends: Schema docParent: OutputParameter specialize: {Schema: OutputSchema} - name: InputParameter type: record extends: Parameter docAfter: Parameter specialize: {Schema: InputSchema} fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: "inputBinding" type: [ "null", "Binding" ] doc: | Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters. - name: OutputParameter type: record extends: Parameter docAfter: Parameter specialize: {Schema: OutputSchema} fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object."
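As an illustration of the `secondaryFiles` string patterns described for `Binding` above, the following sketch (all file names hypothetical) requests index files derived from a primary file `reference.fa`:

```
inputs:
  - id: "#reference"
    type: File
    inputBinding:
      position: 1
      secondaryFiles:
        - ".fai"    # append only: reference.fa -> reference.fa.fai
        - "^.dict"  # strip one extension, then append: reference.fa -> reference.dict
```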
- type: record name: "FileDef" docParent: CreateFileRequirement doc: | Define a file that must be placed in the designated output directory prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template. fields: - name: "filename" type: ["string", "Expression"] doc: "The name of the file to create in the output directory." - name: "fileContent" type: ["string", "Expression"] doc: | If the value is a string literal or an expression which evaluates to a string, a new file must be created with the string as the file contents. If the value is an expression that evaluates to a File object, this indicates the referenced file should be added to the designated output directory prior to executing the tool. Files added in this way may be read-only, and may be provided by bind mounts or file system links to avoid unnecessary copying of the input file. - type: record name: EnvironmentDef docParent: EnvVarRequirement doc: | Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input. fields: - name: "envName" type: "string" doc: The environment variable name - name: "envValue" type: ["string", "Expression"] doc: The environment variable value - type: record name: SchemaDef extends: Schema docParent: SchemaDefRequirement fields: - name: name type: string doc: "The type name being defined." - type: record name: ProcessRequirement docAfter: ExpressionTool abstract: true doc: | A process requirement declares a prerequisite that may or must be fulfilled before executing a process. See [`Process.hints`](#process) and [`Process.requirements`](#process). Process requirements are the primary mechanism for specifying extensions to the CWL core specification. fields: - name: "class" type: "string" doc: "The specific requirement type." jsonldPredicate: "@id": "@type" "@type": "@vocab" - type: record name: Process abstract: true docAfter: ProcessRequirement doc: | The base executable type in CWL is the `Process` object defined by the document. Note that the `Process` object is abstract and cannot be directly executed. fields: - name: id type: ["null", string] jsonldPredicate: "@id" doc: "The unique identifier for this process object." - name: inputs type: type: array items: InputParameter doc: | Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used build a user interface for constructing the input object. - name: outputs type: type: array items: OutputParameter doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: - "null" - type: array items: ProcessRequirement doc: > Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. 
- name: hints type: - "null" - type: array items: Any doc: > Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - type: record name: CommandLineBinding extends: Binding docParent: CommandInputParameter doc: | When listed under `inputBinding` in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in `CommandLineTool.arguments`, the term "value" refers to the effective value after evaluating `valueFrom`. The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value. - **string**: Add `prefix` and the string to the command line. - **number**: Add `prefix` and decimal representation to command line. - **boolean**: If true, add `prefix` to the command line. If false, add nothing. - **File**: Add `prefix` and the value of [`File.path`](#file) to the command line. - **array**: If `itemSeparator` is specified, add `prefix` and join the array into a single string with `itemSeparator` separating the items. Otherwise first add `prefix`, then recursively process individual elements. - **object**: Add `prefix` only, and recursively add object fields for which `inputBinding` is specified. - **null**: Add nothing. fields: - name: "position" type: ["null", "int"] doc: "The sorting key. Default position is 0." - name: "prefix" type: [ "null", "string"] doc: "Command line prefix to add before the value." - name: "separate" type: ["null", boolean] doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument. - name: "itemSeparator" type: ["null", "string"] doc: | Join the array elements into a single string with the elements separated by `itemSeparator`. - name: "valueFrom" type: - "null" - "string" - "Expression" doc: | If `valueFrom` is a constant string value, use this as the value and apply the binding rules above. If `valueFrom` is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the "context" of the expression will be the value of the input parameter. When a binding is part of the `CommandLineTool.arguments` field, the `valueFrom` field is required. - type: record name: CommandOutputBinding extends: Binding docParent: CommandOutputParameter doc: | Describes how to generate an output parameter based on the files produced by a CommandLineTool. The output parameter is generated by applying these operations in the following order: - glob - loadContents - outputEval fields: - name: glob type: - "null" - string - Expression - type: array items: string doc: | Find files relative to the output directory, using POSIX glob(3) pathname matching.
If provided an array, find files that match any pattern in the array. If provided an expression, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Only files which actually exist will be matched and returned. - name: outputEval type: - "null" - Expression doc: | Evaluate an expression to generate the output value. If `glob` was specified, the script `context` will be an array containing any files that were matched. Additionally, if `loadContents` is `true`, the File objects will include up to the first 64 KiB of file contents in the `contents` field. - type: record name: CommandInputSchema extends: InputSchema docParent: CommandInputParameter specialize: InputSchema: CommandInputSchema Binding: CommandLineBinding - type: record name: CommandOutputSchema extends: OutputSchema docParent: CommandOutputParameter specialize: OutputSchema: CommandOutputSchema fields: - name: "outputBinding" type: [ "null", CommandOutputBinding ] doc: | Describes how to handle the concrete outputs of a process step (such as files created by a program) and describes them in the process output parameter. - type: record name: CommandInputParameter extends: InputParameter docParent: CommandLineTool doc: An input parameter for a CommandLineTool. specialize: InputSchema: CommandInputSchema Binding: CommandLineBinding - type: record name: CommandOutputParameter extends: OutputParameter docParent: CommandLineTool doc: An output parameter for a CommandLineTool. specialize: OutputSchema: CommandOutputSchema fields: - name: "outputBinding" type: [ "null", CommandOutputBinding ] doc: | Describes how to handle the concrete outputs of a process step (such as files created by a program) and describes them in the process output parameter. - type: record name: CommandLineTool extends: Process docAfter: Workflow specialize: InputParameter: CommandInputParameter OutputParameter: CommandOutputParameter doc: | A CommandLineTool process is a process implementation for executing a non-interactive application in a POSIX environment. To accommodate the enormous variety in syntax and semantics for input, runtime environment, invocation, and output of arbitrary programs, CommandLineTool uses an "input binding" that describes how to translate input parameters to an actual program invocation, and an "output binding" that describes how to generate output parameters from program output. # Input binding The tool command line is built by applying command line bindings to the input object. Bindings are listed either as part of an [input parameter](#commandinputparameter) using the `inputBinding` field, or separately using the `arguments` field of the CommandLineTool. The algorithm to build the command line is as follows. In this algorithm, the sort key is a list consisting of one or more numeric or string elements. Strings are sorted lexicographically based on UTF-8 encoding. 1. Collect `CommandLineBinding` objects from `arguments`. Assign a sorting key `[position, i]` where `position` is [`CommandLineBinding.position`](#commandlinebinding) and `i` is the index in the `arguments` list. 2. Collect `CommandLineBinding` objects from the `inputs` schema and associate them with values from the input object. Where the input type is a record, array, or map, recursively walk the schema and input object, collecting nested `CommandLineBinding` objects and associating them with values from the input object. 3.
Create a sorting key by taking the value of the `position` field at each level leading to each leaf binding object. If `position` is not specified, it is not added to the sorting key. For bindings on arrays and maps, the sorting key must include the array index or map key following the position. If and only if two bindings have the same sort key, the tie must be broken using the ordering of the field or parameter name immediately containing the leaf binding. 4. Sort elements using the assigned sorting keys. Numeric entries sort before strings. 5. In the sorted order, apply the rules defined in [`CommandLineBinding`](#commandlinebinding) to convert bindings to actual command line elements. 6. Insert elements from `baseCommand` at the beginning of the command line. # Runtime environment All files listed in the input object must be made available in the runtime environment. The implementation may use a shared or distributed file system or transfer files via explicit download. Implementations may choose not to provide access to files not explicitly specified by the input object or process requirements. Output files produced by tool execution must be written to the **designated output directory**. The initial current working directory when executing the tool must be the designated output directory. When executing the tool, the child process must not inherit environment variables from the parent process. The tool must execute in a new, empty environment, containing only environment variables defined by [EnvVarRequirement](#envvarrequirement), the default environment of the Docker container specified in [DockerRequirement](#dockerrequirement) (if applicable), and `TMPDIR`. The `TMPDIR` environment variable must be set in the runtime environment to the **designated temporary directory**. Any files written to the designated temporary directory may be deleted by the workflow platform when the tool invocation is complete. An implementation may forbid the tool from writing to any location in the runtime environment file system other than the designated temporary directory and designated output directory. An implementation may provide read-only input files, and disallow in-place update of input files. The standard input stream and standard output stream may be redirected as described in the `stdin` and `stdout` fields. ## Extensions [DockerRequirement](#dockerrequirement), [CreateFileRequirement](#createfilerequirement), and [EnvVarRequirement](#envvarrequirement) are available as standard extensions to core command line tool semantics for defining the runtime environment. # Execution Once the command line is built and the runtime environment is created, the actual tool is executed. The standard error stream and standard output stream (unless redirected by setting `stdout`) may be captured by platform logging facilities for storage and reporting. Tools may be multithreaded or spawn child processes; however, when the parent process exits, the tool is considered finished regardless of whether any detached child processes are still running. Tools must not require any kind of console, GUI, or web based user interaction in order to start and run to completion. The exit code of the process indicates if the process completed successfully. By convention, an exit code of zero is treated as success and non-zero exit codes are treated as failure. This may be customized by providing the fields `successCodes`, `temporaryFailCodes`, and `permanentFailCodes`. 
An implementation may choose to default unspecified non-zero exit codes to either `temporaryFailure` or `permanentFailure`. # Output binding If the output directory contains a file named "cwl.output.json", that file must be loaded and used as the output object. Otherwise, the output object must be generated by walking the parameters listed in `outputs` and applying output bindings to the tool output. Output bindings are associated with output parameters using the `outputBinding` field. See [`CommandOutputBinding`](#commandoutputbinding) for details. fields: - name: "class" jsonldPredicate: "@id": "@type" "@type": "@vocab" type: string - name: baseCommand doc: | Specifies the program to execute. If the value is an array, the first element is the program to execute, and subsequent elements are placed at the beginning of the command line prior to any command line bindings. If the program includes a path separator character it must be an absolute path, otherwise it is an error. If the program does not include a path separator, search the `$PATH` variable in the runtime environment of the workflow runner to find the absolute path of the executable. type: - string - type: array items: string jsonldPredicate: "@id": "cwl:baseCommand" "@container": "@list" - name: arguments doc: | Command line bindings which are not directly associated with input parameters. type: - "null" - type: array items: [string, CommandLineBinding] jsonldPredicate: "@id": "cwl:arguments" "@container": "@list" - name: stdin type: ["null", string, Expression] doc: | A path to a file whose contents must be piped into the command's standard input stream. - name: stdout type: ["null", string, Expression] doc: | Capture the command's standard output stream to a file written to the designated output directory. If `stdout` is a string, it specifies the file name to use. If `stdout` is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. - name: successCodes type: - "null" - type: array items: int doc: | Exit codes that indicate the process completed successfully. - name: temporaryFailCodes type: - "null" - type: array items: int doc: | Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results. - name: permanentFailCodes type: - "null" - type: array items: int doc: Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. - type: record name: ExpressionTool extends: Process docAfter: CommandLineTool doc: | Execute an expression as a process step. fields: - name: "class" jsonldPredicate: "@id": "@type" "@type": "@vocab" type: string - name: expression type: Expression doc: | The expression to execute. The expression must return a JSON object which matches the output parameters of the ExpressionTool. - name: LinkMergeMethod type: enum docParent: WorkflowStepInput doc: The input link merge method, described in [WorkflowStepInput](#workflowstepinput). symbols: - merge_nested - merge_flattened - name: WorkflowOutputParameter type: record extends: OutputParameter docParent: Workflow doc: | Describe an output parameter of a workflow. 
The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter. fields: - name: source doc: | Specifies one or more workflow parameters that will provide this output value. jsonldPredicate: "@id": "cwl:source" "@type": "@id" type: - "null" - string - type: array items: string - name: linkMerge type: ["null", LinkMergeMethod] doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - type: record name: WorkflowStepInput docParent: WorkflowStep doc: | The input of a workflow step connects an upstream parameter (from the workflow inputs, or the outputs of other workflow steps) with the input parameters of the underlying process. ## Input object A WorkflowStepInput object must contain an `id` field in the form `#fieldname` or `#stepname.fieldname`. When the `id` field contains a period `.` the field name consists of the characters following the final period. This defines a field of the workflow step input object with the value of the `source` parameter(s). ## Merging If the sink parameter is an array, or named in a [workflow scatter](#workflowstep) operation, there may be multiple inbound data links listed in the `source` field. The values from the input links are merged depending on the method specified in the `linkMerge` field. If not specified, the default method is "merge_nested". * **merge_nested** The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list. * **merge_flattened** 1. The source and sink parameters must be compatible types, or the source type must be compatible with a single element from the "items" type of the destination array parameter. 2. Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements. fields: - name: id type: string jsonldPredicate: "@id" doc: "A unique identifier for this workflow input parameter." - name: source doc: | Specifies one or more workflow parameters that will provide input to the underlying process parameter. jsonldPredicate: "@id": "cwl:source" "@type": "@id" type: - "null" - string - type: array items: string - name: linkMerge type: ["null", LinkMergeMethod] doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - name: default type: ["null", Any] doc: | The default value for this parameter if there is no `source` field. - type: record name: WorkflowStepOutput docParent: WorkflowStep doc: | Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the `id` field) may be used as a `source` to connect with input parameters of other workflow steps, or with an output parameter of the process. fields: - name: id type: string jsonldPredicate: "@id" doc: | A unique identifier for this workflow output parameter. This is the identifier to use in the `source` field of `WorkflowStepInput` to connect the output value to downstream parameters. - name: ScatterMethod type: enum docParent: WorkflowStep doc: The scatter method, as described in [workflow step scatter](#workflowstep). symbols: - dotproduct - nested_crossproduct - flat_crossproduct - name: WorkflowStep type: record docParent: Workflow doc: | A workflow step is an executable element of a workflow. 
It specifies the underlying process implementation (such as `CommandLineTool`) in the `run` field and connects the input and output parameters of the underlying process to workflow parameters. # Scatter/gather To use scatter/gather, [ScatterFeatureRequirement](#scatterfeaturerequirement) must be specified in the workflow or workflow step requirements. A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently. The `scatter` field specifies one or more input parameters which will be scattered. An input parameter may be listed more than once. The declared type of each input parameter is implicitly wrapped in an array for each time it appears in the `scatter` field. As a result, upstream parameters which are connected to scattered parameters may be arrays. All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array. If `scatter` declares more than one input parameter, `scatterMethod` describes how to decompose the input into a discrete set of jobs. * **dotproduct** specifies that each of the input arrays are aligned and one element taken from each array to construct each job. It is an error if all input arrays are not the same length. * **nested_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the `scatter` field. * **flat_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the `scatter` field. # Subworkflows To specify a nested workflow as part of a workflow step, [SubworkflowFeatureRequirement](#subworkflowfeaturerequirement) must be specified in the workflow or workflow step requirements. fields: - name: id type: ["null", string] jsonldPredicate: "@id" doc: "The unique identifier for this workflow step." - name: inputs type: type: array items: WorkflowStepInput doc: | Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: outputs type: type: array items: WorkflowStepOutput doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: - "null" - type: array items: ProcessRequirement doc: > Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: - "null" - type: array items: Any doc: > Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. 
It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: run type: Process doc: | Specifies the process to run. - name: scatter type: - "null" - string - type: array items: string jsonldPredicate: "@id": "cwl:scatter" "@type": "@id" "@container": "@list" - name: scatterMethod doc: | Required if `scatter` is an array of more than one element. type: - "null" - ScatterMethod jsonldPredicate: "@id": "cwl:scatterMethod" "@type": "@vocab" - name: Workflow type: record docParent: Reference extends: Process specialize: OutputParameter: WorkflowOutputParameter doc: | A workflow is a process consisting of one or more `steps`. Each step has input and output parameters defined by the `inputs` and `outputs` fields. A workflow executes as described in [execution model](#workflow_graph). # Dependencies Dependencies between parameters are expressed using the `source` field on [workflow step input parameters](#workflowstepinput) and [workflow output parameters](#workflowoutputparameter). The `source` field expresses the dependency of one parameter on another such that when a value is associated with the parameter specified by `source`, that value is propagated to the destination parameter. When all data links inbound to a given step are fulfilled, the step is ready to execute. # Extensions [ScatterFeatureRequirement](#scatterfeaturerequirement) and [SubworkflowFeatureRequirement](#subworkflowfeaturerequirement) are available as standard extensions to core workflow semantics. fields: - name: "class" jsonldPredicate: "@id": "@type" "@type": "@vocab" type: string - name: steps doc: | The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met. type: - type: array items: WorkflowStep - type: record name: DockerRequirement extends: ProcessRequirement doc: | Indicates that a workflow component should be run in a [Docker](http://docker.com) container, and specifies how to fetch or build the image. If a CommandLineTool lists `DockerRequirement` under `hints` or `requirements`, it may (or must) be run in the specified Docker container. The platform must first acquire or install the correct Docker image as specified by `dockerPull`, `dockerLoad` or `dockerFile`. The platform must execute the tool in the container using `docker run` with the appropriate Docker image and tool command line. The workflow platform may provide input files and the designated output directory through the use of volume bind mounts. The platform may rewrite file paths in the input object to correspond to the Docker bind mounted locations. When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container. 
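For example, a tool that should run inside a stock Debian image might declare the following (a minimal sketch; the image name is only illustrative):

```
requirements:
  - class: DockerRequirement
    dockerPull: debian:8
```
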
## Interaction with other requirements If [EnvVarRequirement](#envvarrequirement) is specified alongside a DockerRequirement, the environment variables must be provided to Docker using `--env` or `--env-file` and interact with the container's preexisting environment as defined by Docker. fields: - name: dockerPull type: ["null", "string"] doc: "Specify a Docker image to retrieve using `docker pull`." - name: "dockerLoad" type: ["null", "string"] doc: "Specify an HTTP URL from which to download a Docker image using `docker load`." - name: dockerFile type: ["null", "string"] doc: "Supply the contents of a Dockerfile which will be built using `docker build`." - name: dockerImageId type: ["null", "string"] doc: | The image id that will be used for `docker run`. May be a human-readable image name or the image identifier hash. May be skipped if `dockerPull` is specified, in which case the `dockerPull` image id must be used. - name: dockerOutputDirectory type: ["null", "string"] doc: | Set the designated output directory to a specific location inside the Docker container. - type: record name: SubworkflowFeatureRequirement extends: ProcessRequirement doc: | Indicates that the workflow platform must support nested workflows in the `run` field of [WorkflowStep](#workflowstep). - name: CreateFileRequirement type: record extends: ProcessRequirement doc: | Define a list of files that must be created by the workflow platform in the designated output directory prior to executing the command line tool. See `FileDef` for details. fields: - name: fileDef type: type: "array" items: "FileDef" doc: The list of files. - name: EnvVarRequirement type: record extends: ProcessRequirement doc: | Define a list of environment variables which will be set in the execution environment of the tool. See `EnvironmentDef` for details. fields: - name: envDef type: type: "array" items: "EnvironmentDef" doc: The list of environment variables. - name: ScatterFeatureRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support the `scatter` and `scatterMethod` fields of [WorkflowStep](#workflowstep). - name: SchemaDefRequirement type: record extends: ProcessRequirement doc: | This field consists of an array of type definitions which must be used when interpreting the `inputs` and `outputs` fields. When a symbolic type is encountered that is not in [`Datatype`](#datatype), the implementation must check if the type is defined in `schemaDefs` and use that definition. If the type is not found in `schemaDefs`, it is an error. The entries in `schemaDefs` must be processed in the order listed such that later schema definitions may refer to earlier schema definitions. fields: - name: types type: type: array items: SchemaDef doc: The list of type definitions. - type: record name: ExpressionEngineRequirement extends: ProcessRequirement doc: | Define an expression engine, as described in [Expressions](#expressions). fields: - name: id type: string doc: "Used to identify the expression engine in the `engine` field of Expressions." jsonldPredicate: "@id" - name: requirements type: - "null" - type: array items: ProcessRequirement doc: | Requirements to run this expression engine, such as DockerRequirement for specifying a container to run the engine. - name: engineCommand type: - "null" - string - type: array items: string doc: "The command line to invoke the expression engine." 
- name: engineConfig type: - "null" - type: array items: string doc: | Additional configuration or code fragments that will also be passed to the expression engine. The semantics of this field are defined by the underlying expression engine. Intended for uses such as providing function definitions that will be called from CWL expressions. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/0000755000175200017520000000000013247251336022147 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-3/UserGuide.yml0000644000175200017520000006715213247251315024576 0ustar mcrusoemcrusoe00000000000000- name: userguide type: documentation doc: - $include: userguide-intro.md - | # Wrapping Command Line Tools - | ## First example The simplest "hello world" program. This accepts one input parameter, writes a message to the terminal or job log, and produces no permanent output. CWL documents are written in [JSON](http://json.org) or [YAML](http://yaml.org), or a mix of the two. *1st-tool.cwl* ``` - $include: examples/1st-tool.cwl - | ``` Use a YAML object in a separate file to describe the input of a run: *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner 1st-tool.cwl echo-job.yml [job 140199012414352] $ echo 'Hello world!' Hello world! Final process status is success ``` What's going on here? Let's break it down: ``` cwlVersion: cwl:draft-3 class: CommandLineTool ``` The `cwlVersion` field indicates the version of the CWL spec used by the document. The `class` field indicates this document describes a command line tool. ``` baseCommand: echo ``` The `baseCommand` provides the name of the program that will actually run (echo). ``` inputs: - id: message type: string inputBinding: position: 1 ``` The `inputs` section describes the inputs of the tool. This is a list of input parameters and each parameter includes an identifier, a data type, and optionally an `inputBinding` which describes how this input parameter should appear on the command line. In this example, the `position` field indicates where it should appear on the command line. ``` outputs: [] ``` This tool has no formal output, so the `outputs` section is an empty list. - | ## Essential input parameters The `inputs` of a tool is a list of input parameters that control how to run the tool. Each parameter has an `id` for the name of the parameter, and `type` describing what types of values are valid for that parameter. Available primitive types are *string*, *int*, *long*, *float*, *double*, and *null*; complex types are *array* and *record*; in addition there are special types *File* and *Any*. The following example demonstrates some input parameters with different types and appearing on the command line in different ways: *inp.cwl* ``` - $include: examples/inp.cwl - | ``` *inp-job.yml* ``` - $include: examples/inp-job.yml - | ``` Notice that "example_file", as a `File` type, must be provided as an object with the fields `class: File` and `path`. 
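For example, the `example_file` entry in the job object might look like this (a sketch assuming the input file is named `whale.txt`, as in the next step):

```
example_file:
  class: File
  path: whale.txt
```
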
Next, create a whale.txt and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ touch whale.txt $ cwl-runner inp.cwl inp-job.yml [job 140020149614160] /home/example$ echo -f -i42 --example-string hello --file=/home/example/whale.txt -f -i42 --example-string hello --file=/home/example/whale.txt Final process status is success ``` The field `inputBinding` is optional and indicates whether and how the input parameter should appear on the tool's command line. If `inputBinding` is missing, the parameter does not appear on the command line. Let's look at each example in detail. ``` - id: example_flag type: boolean inputBinding: position: 1 prefix: -f ``` Boolean types are treated as a flag. If the input parameter "example_flag" is "true", then `prefix` will be added to the command line. If false, no flag is added. ``` - id: example_string type: string inputBinding: position: 3 prefix: --example-string ``` String types appear on the command line as literal values. The `prefix` is optional; if provided, it appears as a separate argument on the command line before the parameter. In the example above, this is rendered as `--example-string hello`. ``` - id: example_int type: int inputBinding: position: 2 prefix: -i separate: false ``` Integer (and floating point) types appear on the command line with decimal text representation. When the option `separate` is false (the default value is true), the prefix and value are combined into a single argument. In the example above, this is rendered as `-i42`. ``` - id: example_file type: ["null", File] inputBinding: prefix: --file= separate: false position: 4 ``` File types appear on the command line as the path to the file. When the parameter type is a list, this indicates several alternate types are valid for this parameter. The most common use is to provide "null" as an alternate parameter type, which indicates that the parameter is optional. In the example above, this is rendered as `--file=/home/example/whale.txt`. However, if the "example_file" parameter were not provided in the input, nothing would appear on the command line. Input files are read-only. If you wish to update an input file, you must first copy it to the output directory. The value of `position` is used to determine where the parameter should appear on the command line. Positions are relative to one another, not absolute. As a result, positions do not have to be sequential; three parameters with positions `[1, 3, 5]` will result in the same command line as `[1, 2, 3]`. More than one parameter can have the same position (ties are broken using the parameter name), and the position field itself is optional. The default position is 0. The `baseCommand` field always comes before parameters. - | ## Returning output files The `outputs` of a tool is a list of output parameters that should be returned after running the tool. Each parameter has an `id` for the name of the parameter, and `type` describing what types of values are valid for that parameter. When a tool runs under CWL, the starting working directory is the designated output directory. The underlying tool or script must record its results in the form of files created in the output directory. The output parameters returned by the CWL tool are either the output files themselves, or come from examining the content of those files. 
*tar.cwl* ``` - $include: examples/tar.cwl - | ``` *tar-job.yml* ``` - $include: examples/tar-job.yml - | ``` Next, create a tar file for the example and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ touch hello.txt && tar -cvf hello.tar hello.txt $ cwl-runner tar.cwl tar-job.yml [job 139868145165200] $ tar xf /home/example/hello.tar Final process status is success { "example_out": { "path": "hello.txt", "size": 13, "class": "File", "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b" } } ``` The field `outputBinding` describes how to set the value of each output parameter. ``` outputs: - id: example_out type: File outputBinding: glob: hello.txt ``` The `glob` field consists of the name of a file in the output directory. If you don't know the name of the file in advance, you can use a wildcard pattern such as `glob: '*.txt'`. - | ## Capturing a tool's standard output stream To capture a tool's standard output stream, add the `stdout` field with the name of the file where the output stream should go. Then use `glob` on `outputBinding` to return the file. *stdout.cwl* ``` - $include: examples/stdout.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner stdout.cwl echo-job.yml [job 140199012414352] $ echo 'Hello world!' > output.txt Final process status is success { "output": { "path": "output.txt", "size": 13, "class": "File", "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b" } } $ cat output.txt Hello world! ``` - | ## Parameter references In a previous example, we extracted a file using the "tar" program. However, that example was very limited because it assumed that the file we were interested in was called "hello.txt". In this example, you will see how to reference the value of input parameters dynamically from other fields. *tar-param.cwl* ``` - $include: examples/tar-param.cwl - | ``` *tar-param-job.yml* ``` - $include: examples/tar-param-job.yml - | ``` Create your input files and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ rm hello.tar || true && touch goodbye.txt && tar -cvf hello.tar goodbye.txt $ cwl-runner tar-param.cwl tar-param-job.yml [job 139868145165200] $ tar xf /home/example/hello.tar goodbye.txt Final process status is success { "example_out": { "path": "goodbye.txt", "size": 24, "class": "File", "checksum": "sha1$dd0a4c4c49ba43004d6611771972b6cf969c1c01" } } ``` Certain fields permit parameter references which are enclosed in `$(...)`. These are evaluated and replaced with the value being referenced. ``` outputs: - id: example_out type: File outputBinding: glob: $(inputs.extractfile) ``` References are written using a subset of Javascript syntax. In this example, `$(inputs.extractfile)`, `$(inputs["extractfile"])`, and `$(inputs['extractfile'])` are equivalent. The value of the "inputs" variable is the input object provided when the CWL tool was invoked. Note that because File parameters are objects, to get the path to an input file you must reference the path field on a file object; to reference the path to the tar file in the above example you would write `$(inputs.tarfile.path)`. - | ## Running tools inside Docker [Docker](http://docker.io) containers simplify software installation by providing a complete known-good runtime for software and its dependencies. 
However, containers are also purposefully isolated from the host system, so in order to run a tool inside a Docker container there is additional work to ensure that input files are available inside the container and output files can be recovered from the container. CWL can perform this work automatically, allowing you to use Docker to simplify your software management while avoiding the complexity of invoking and managing Docker containers. This example runs a simple Node.js script inside a Docker container. *docker.cwl* ``` - $include: examples/docker.cwl - | ``` *docker-job.yml* ``` - $include: examples/docker-job.yml - | ``` Provide a hello.js and invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ echo "console.log(\"Hello World\");" > hello.js $ cwl-runner docker.cwl docker-job.yml [job 140259721854416] /home/example$ docker run -i --volume=/home/example/hello.js:/var/lib/cwl/job369354770_examples/hello.js:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpDLs5hm:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp node:slim node /var/lib/cwl/job369354770_examples/hello.js Hello world! Final process status is success ``` Notice the CWL runner has constructed a Docker command line to run the script. One of the responsibilities of the CWL runner is to adjust the paths of input files to reflect the location where they appear inside the container. In this example, the path to the script `hello.js` is `/home/example/hello.js` outside the container but `/var/lib/cwl/job369354770_examples/hello.js` inside the container, as reflected in the invocation of the `node` command. - | ## Additional command line arguments and runtime parameters Sometimes tools require additional command line options that don't correspond exactly to input parameters. In this example, we will wrap the Java compiler to compile a Java source file to a class file. By default, `javac` will create the class files in the same directory as the source file. However, CWL input files (and the directories in which they appear) may be read-only, so we need to instruct javac to write the class file to the designated output directory instead. *arguments.cwl* ``` - $include: examples/arguments.cwl - | ``` *arguments-job.yml* ``` - $include: examples/arguments-job.yml - | ``` Now create a sample Java file and invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ echo "public class Hello {}" > Hello.java $ cwl-runner arguments.cwl arguments-job.yml [job 140051188854928] /home/example$ docker run -i --volume=/home/example/Hello.java:/var/lib/cwl/job710906416_example/Hello.java:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpdlQDWi:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job710906416_examples/Hello.java Final process status is success { "classfile": { "size": 416, "path": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File" } } ``` Here we use the `arguments` field to add an additional argument to the command line that isn't tied to a specific input parameter. ``` arguments: - prefix: "-d" valueFrom: $(runtime.outdir) ``` This example references a runtime parameter. Runtime parameters provide information about the hardware or software environment when the tool is actually executed. 
The `$(runtime.outdir)` parameter is the path to the designated output directory. Other parameters include `$(runtime.tmpdir)`, `$(runtime.ram)`, `$(runtime.cores)`, `$(runtime.outdirSize)`, and `$(runtime.tmpdirSize)`. See the [Runtime Environment](CommandLineTool.html#Runtime_environment) section of the CWL specification for details. - | ## Array inputs It is easy to represent arrays of input parameters on the command line. To specify an array parameter, the array definition is nested under the `type` field with `type: array` and `items` defining the valid data types that may appear in the array. *array-inputs.cwl* ``` - $include: examples/array-inputs.cwl - | ``` *array-inputs-job.yml* ``` - $include: examples/array-inputs-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner array-inputs.cwl array-inputs-job.yml [job 140334923640912] /home/example$ echo -A one two three -B=four -B=five -B=six -C=seven,eight,nine -A one two three -B=four -B=five -B=six -C=seven,eight,nine Final process status is success {} ``` The `inputBinding` can appear either on the outer array parameter definition or the inner array element definition, and these produce different behavior when constructing the command line, as shown above. In addition, the `itemSeparator` field, if provided, specifies that array values should be concatenated into a single argument separated by the item separator string. You can specify arrays of arrays, arrays of records, and other complex types. - | ## Array outputs You can also capture multiple output files into an array of files using `glob`. *array-outputs.cwl* ``` - $include: examples/array-outputs.cwl - | ``` *array-outputs-job.yml* ``` - $include: examples/array-outputs-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner array-outputs.cwl array-outputs-job.yml [job 140190876078160] /home/example$ touch foo.txt bar.dat baz.txt Final process status is success { "output": [ { "size": 0, "path": "/home/peter/work/common-workflow-language/draft-3/examples/foo.txt", "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", "class": "File" }, { "size": 0, "path": "/home/peter/work/common-workflow-language/draft-3/examples/baz.txt", "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", "class": "File" } ] } ``` - | ## Record inputs, dependent and mutually exclusive parameters Sometimes an underlying tool has several arguments that must be provided together (they are dependent) or several arguments that cannot be provided together (they are exclusive). You can use records and type unions to group parameters together to describe these two conditions. *record.cwl* ``` - $include: examples/record.cwl - | ``` *record-job1.yml* ``` - $include: examples/record-job1.yml - | ``` ``` $ cwl-runner record.cwl record-job1.yml Workflow error: Error validating input record, could not validate field `dependent_parameters` because missing required field `itemB` ``` In the first example, you can't provide `itemA` without also providing `itemB`. *record-job2.yml* ``` - $include: examples/record-job2.yml - | ``` ``` $ cwl-runner record.cwl record-job2.yml [job 140566927111376] /home/example$ echo -A one -B two -C three -A one -B two -C three Final process status is success {} ``` In the second example, `itemC` and `itemD` are exclusive, so only `itemC` is added to the command line and `itemD` is ignored. 
*record-job3.yml* ``` - $include: examples/record-job3.yml - | ``` ``` $ cwl-runner record.cwl record-job3.yml [job 140606932172880] /home/example$ echo -A one -B two -D four -A one -B two -D four Final process status is success {} ``` In the third example, only `itemD` is provided, so it appears on the command line. - | ## Environment variables Tools run in a restricted environment and do not inherit most environment variables from the parent process. You can set environment variables for the tool using `EnvVarRequirement`. *env.cwl* ``` - $include: examples/env.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner env.cwl echo-job.yml [job 140710387785808] /home/example$ env PATH=/bin:/usr/bin:/usr/local/bin HELLO=Hello world! TMPDIR=/tmp/tmp63Obpk Final process status is success {} ``` - | ## Javascript expressions If you need to manipulate input parameters, include the requirement `InlineJavascriptRequirement` and then anywhere a parameter reference is legal you can provide a fragment of Javascript that will be evaluated by the CWL runner. *expression.cwl* ``` - $include: examples/expression.cwl - | ``` ``` $ cwl-runner expression.cwl empty.yml [job 140000594593168] /home/example$ echo -A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1 -A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1 Final process status is success {} ``` You can only use expressions in certain fields. These are: `filename`, `fileContent`, `envValue`, `valueFrom`, `glob`, `outputEval`, `stdin`, `stdout`, `coresMin`, `coresMax`, `ramMin`, `ramMax`, `tmpdirMin`, `tmpdirMax`, `outdirMin`, and `outdirMax`. - | ## Creating files at runtime Sometimes you need to create a file on the fly from input parameters, such as tools which expect to read their input configuration from a file rather than from command line parameters. To do this, use `CreateFileRequirement`. *createfile.cwl* ``` - $include: examples/createfile.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwltool createfile.cwl echo-job.yml [job 140528604979344] /home/example$ cat example.conf CONFIGVAR=Hello world! Final process status is success {} ``` - | ## Staging input files in the output directory Normally, input files are located in a read-only directory separate from the output directory. This causes problems if the underlying tool expects to write its output files alongside the input file in the same directory. You can use `CreateFileRequirement` to stage input files into the output directory. In this example, we use a Javascript expression to extract the base name of the input file from its leading directory path. 
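A minimal sketch of the kind of expression this involves, assuming the input parameter is named `src` (the names in the actual example file may differ):

```
$(inputs.src.path.split('/').slice(-1)[0])
```

Here `inputs.src.path` is the full path of the input File object; splitting on `/` and taking the last element yields its base name.
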
*linkfile.cwl* ``` - $include: examples/linkfile.cwl - | ``` *arguments-job.yml* ``` - $include: examples/arguments-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner linkfile.cwl arguments-job.yml [job 139928309171664] /home/example$ docker run -i --volume=/home/example/Hello.java:/var/lib/cwl/job557617295_examples/Hello.java:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpmNbApw:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac Hello.java Final process status is success { "classfile": { "size": 416, "path": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File" } } ``` - | # Writing Workflows ## First workflow This workflow extracts a Java source file from a tar file and then compiles it. *1st-workflow.cwl* ``` - $include: examples/1st-workflow.cwl - | ``` Use a JSON object in a separate file to describe the input of a run: *1st-workflow-job.yml* ``` - $include: examples/1st-workflow-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner 1st-workflow.cwl 1st-workflow-job.yml [job untar] /tmp/tmp94qFiM$ tar xf /home/example/hello.tar Hello.java [step untar] completion status is success [job compile] /tmp/tmpu1iaKL$ docker run -i --volume=/tmp/tmp94qFiM/Hello.java:/var/lib/cwl/job301600808_tmp94qFiM/Hello.java:ro --volume=/tmp/tmpu1iaKL:/var/spool/cwl:rw --volume=/tmp/tmpfZnNdR:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job301600808_tmp94qFiM/Hello.java [step compile] completion status is success [workflow 1st-workflow.cwl] outdir is /home/example Final process status is success { "classout": { "path": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File", "size": 416 } } ``` What's going on here? Let's break it down: ``` cwlVersion: cwl:draft-3 class: Workflow ``` The `cwlVersion` field indicates the version of the CWL spec used by the document. The `class` field indicates this document describes a workflow. ``` inputs: - id: inp type: File - id: ex type: string ``` The `inputs` section describes the inputs of the workflow. This is a list of input parameters where each parameter consists of an identifier and a data type. These parameters can be used as sources for input to specific workflow steps. ``` outputs: - id: classout type: File source: "#compile/classfile" ``` The `outputs` section describes the outputs of the workflow. This is a list of output parameters where each parameter consists of an identifier and a data type. The `source` connects the output parameter `classfile` of the `compile` step to the workflow output parameter `classout`. ``` steps: - id: untar run: tar-param.cwl inputs: - id: tarfile source: "#inp" - id: extractfile source: "#ex" outputs: - id: example_out ``` The `steps` section describes the actual steps of the workflow. In this example, the first step extracts a file from a tar file, and the second step compiles the file from the first step using the Java compiler. Workflow steps are not necessarily run in the order they are listed; instead, the order is determined by the dependencies between steps (using `source`). In addition, workflow steps which do not depend on one another may run in parallel. 
The first step, `untar`, runs `tar-param.cwl` (described previously in [Parameter references](#Parameter_references)). This tool has two input parameters, `tarfile` and `extractfile`, and one output parameter, `example_out`. The `inputs` section of the workflow step connects these two input parameters to the inputs of the workflow, `#inp` and `#ex`, using `source`. This means that when the workflow step is executed, the values assigned to `#inp` and `#ex` will be used for the parameters `tarfile` and `extractfile` in order to run the tool. The `outputs` section of the workflow step lists the output parameters that are expected from the tool. ``` - id: compile run: arguments.cwl inputs: - id: src source: "#untar/example_out" outputs: - id: classfile ``` The second step, `compile`, depends on the results from the first step by connecting the input parameter `src` to the output parameter of `untar` using `#untar/example_out`. The output of this step, `classfile`, is connected to the `outputs` section for the Workflow, described above. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/userguide-intro.md0000644000175200017520000000236013247251315025614 0ustar mcrusoemcrusoe00000000000000# A Gentle Introduction to the Common Workflow Language Hello! This guide will introduce you to writing tool wrappers and workflows using the Common Workflow Language (CWL). This guide describes the current stable specification, draft 3. Note: This document is a work in progress. Not all features are covered, yet. # Introduction CWL is a way to describe command line tools and connect them together to create workflows. Because CWL is a specification and not a specific piece of software, tools and workflows described using CWL are portable across a variety of platforms that support the CWL standard. CWL has roots in "make" and many similar tools that determine order of execution based on dependencies between tasks. However, unlike "make", CWL tasks are isolated and you must be explicit about your inputs and outputs. The benefits of explicitness and isolation are flexibility, portability, and scalability: tools and workflows described with CWL can transparently leverage technologies such as Docker, be used with CWL implementations from different vendors, and are well suited for describing large-scale workflows in cluster, cloud, and high-performance computing environments where tasks are scheduled in parallel across many nodes. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/Process.yml0000644000175200017520000004010713247251315024307 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" $graph: - name: "Common Workflow Language, Draft 3" type: documentation doc: {$include: concepts.md} - $import: "salad/schema_salad/metaschema/metaschema.yml" - name: BaseTypesDoc type: documentation doc: | ## Base types docChild: - "#CWLType" - "#Process" - type: enum name: CWLVersions doc: "Version symbols for published CWL document versions." symbols: - cwl:draft-2 - cwl:draft-3.dev1 - cwl:draft-3.dev2 - cwl:draft-3.dev3 - cwl:draft-3.dev4 - cwl:draft-3.dev5 - cwl:draft-3 - name: CWLType type: enum extends: "sld:PrimitiveType" symbols: - cwl:File doc: - "Extends primitive types with the concept of a file as a first class type." 
- "File: A File object" - name: File type: record docParent: "#CWLType" doc: | Represents a file (or group of files if `secondaryFiles` is specified) that must be accessible by tools using standard POSIX file system call API such as open(2) and read(2). fields: - name: class type: type: enum name: File_class symbols: - cwl:File jsonldPredicate: "_id": "@type" "_type": "@vocab" doc: Must be `File` to indicate this object describes a file. - name: path type: string doc: The path to the file. jsonldPredicate: "_id": "cwl:path" "_type": "@id" - name: checksum type: ["null", string] doc: | Optional hash code for validating file integrity. Currently must be in the form "sha1$ + hexidecimal string" using the SHA-1 algorithm. - name: size type: ["null", long] doc: Optional file size. - name: "secondaryFiles" type: - "null" - type: array items: "#File" jsonldPredicate: "cwl:secondaryFiles" doc: | A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading primary document. A file object listed in `secondaryFiles` may itself include `secondaryFiles` for which the same rules apply. - name: format type: ["null", string] jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | The format of the file. This must be a URI of a concept node that represents the file format, preferrably defined within an ontology. If no ontology is available, file formats may be tested by exact match. Reasoning about format compatability must be done by checking that an input file format is the same, `owl:equivalentClass` or `rdfs:subClassOf` the format required by the input parameter. `owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if ` owl:equivalentClass ` and ` owl:subclassOf ` then infer ` owl:subclassOf `. File format ontologies may be provided in the "$schema" metadata at the root of the document. If no ontologies are specified in `$schema`, the runtime may perform exact file format matches. - name: SchemaBase type: record abstract: true fields: - name: secondaryFiles type: - "null" - "string" - "#Expression" - type: "array" items: ["string", "#Expression"] jsonldPredicate: "cwl:secondaryFiles" doc: | Only valid when `type: File` or is an array of `items: File`. Describes files that must be included alongside the primary file(s). If the value is an expression, the value of `self` in the expression must be the primary input or output File to which this binding applies. If the value is a string, it specifies that the following pattern should be applied to the primary file: 1. If string begins with one or more caret `^` characters, for each caret, remove the last file extension from the path (the last period `.` and all following characters). If there are no file extensions, the path is unchanged. 2. Append the remainder of the string to the end of the file path. - name: format type: - "null" - string - type: array items: string - "#Expression" jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | Only valid when `type: File` or is an array of `items: File`. For input parameters, this must be one or more URIs of a concept nodes that represents file formats which are allowed as input to this parameter, preferrably defined within an ontology. If no ontology is available, file formats may be tested by exact match. For output parameters, this is the file format that will be assigned to the output parameter. 
- name: streamable type: ["null", "boolean"] doc: | Only valid when `type: File` or is an array of `items: File`. A value of `true` indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: `false`. - name: Parameter type: record extends: "#SchemaBase" abstract: true doc: | Define an input or output parameter to a process. fields: - name: type type: - "null" - "#CWLType" - "sld:RecordSchema" - "sld:EnumSchema" - "sld:ArraySchema" - string - type: array items: - "#CWLType" - "sld:RecordSchema" - "sld:EnumSchema" - "sld:ArraySchema" - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" doc: | Specify valid types of data that may be assigned to this parameter. - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this parameter object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this parameter object." - type: enum name: Expression doc: | Not a real type. Indicates that a field must allow runtime parameter references. If [InlineJavascriptRequirement](#InlineJavascriptRequirement) is declared and supported by the platform, the field must also allow Javascript expressions. symbols: - cwl:ExpressionPlaceholder - name: InputBinding type: record abstract: true fields: - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | Only valid when `type: File` or is an array of `items: File`. Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for use by expressions. - name: OutputBinding type: record abstract: true - name: InputSchema extends: "#SchemaBase" type: record abstract: true - name: OutputSchema extends: "#SchemaBase" type: record abstract: true - name: InputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: "#InputRecordSchema" - specializeFrom: "sld:EnumSchema" specializeTo: "#InputEnumSchema" - specializeFrom: "sld:ArraySchema" specializeTo: "#InputArraySchema" fields: - name: inputBinding type: [ "null", "#InputBinding" ] jsonldPredicate: "cwl:inputBinding" - name: InputRecordSchema type: record extends: ["sld:RecordSchema", "#InputSchema"] specialize: - specializeFrom: "sld:RecordField" specializeTo: "#InputRecordField" - name: InputEnumSchema type: record extends: ["sld:EnumSchema", "#InputSchema"] fields: - name: inputBinding type: [ "null", "#InputBinding" ] jsonldPredicate: "cwl:inputBinding" - name: InputArraySchema type: record extends: ["sld:ArraySchema", "#InputSchema"] specialize: - specializeFrom: "sld:RecordSchema" specializeTo: "#InputRecordSchema" - specializeFrom: "sld:EnumSchema" specializeTo: "#InputEnumSchema" - specializeFrom: "sld:ArraySchema" specializeTo: "#InputArraySchema" fields: - name: inputBinding type: [ "null", "#InputBinding" ] jsonldPredicate: "cwl:inputBinding" - name: OutputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: "#OutputRecordSchema" - specializeFrom: "sld:EnumSchema" specializeTo: "#OutputEnumSchema" - specializeFrom: "sld:ArraySchema" specializeTo: "#OutputArraySchema" fields: - name: outputBinding type: [ "null", "#OutputBinding" ] jsonldPredicate: "cwl:outputBinding" - name: OutputRecordSchema type: record extends: ["sld:RecordSchema", "#OutputSchema"] docParent: 
"#OutputParameter" specialize: - specializeFrom: "sld:RecordField" specializeTo: "#OutputRecordField" - name: OutputEnumSchema type: record extends: ["sld:EnumSchema", "#OutputSchema"] docParent: "#OutputParameter" fields: - name: outputBinding type: [ "null", "#OutputBinding" ] jsonldPredicate: "cwl:outputBinding" - name: OutputArraySchema type: record extends: ["sld:ArraySchema", "#OutputSchema"] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: "#OutputRecordSchema" - specializeFrom: "sld:EnumSchema" specializeTo: "#OutputEnumSchema" - specializeFrom: "sld:ArraySchema" specializeTo: "#OutputArraySchema" fields: - name: outputBinding type: [ "null", "#OutputBinding" ] jsonldPredicate: "cwl:outputBinding" - name: InputParameter type: record extends: "#Parameter" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: "#InputRecordSchema" - specializeFrom: "sld:EnumSchema" specializeTo: "#InputEnumSchema" - specializeFrom: "sld:ArraySchema" specializeTo: "#InputArraySchema" fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: "inputBinding" type: [ "null", "#InputBinding" ] jsonldPredicate: "cwl:inputBinding" doc: | Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters. - name: default type: ["null", "Any"] jsonldPredicate: "cwl:default" doc: | The default value for this parameter if not provided in the input object. - name: OutputParameter type: record extends: "#Parameter" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: "#OutputRecordSchema" - specializeFrom: "sld:EnumSchema" specializeTo: "#OutputEnumSchema" - specializeFrom: "sld:ArraySchema" specializeTo: "#OutputArraySchema" fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: "outputBinding" type: [ "null", "#OutputBinding" ] jsonldPredicate: "cwl:outputBinding" doc: | Describes how to handle the outputs of a process. - type: record name: ProcessRequirement abstract: true doc: | A process requirement declares a prerequisite that may or must be fulfilled before executing a process. See [`Process.hints`](#process) and [`Process.requirements`](#process). Process requirements are the primary mechanism for specifying extensions to the CWL core specification. fields: - name: "class" type: "string" doc: "The specific requirement type." jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: Process abstract: true doc: | The base executable type in CWL is the `Process` object defined by the document. Note that the `Process` object is abstract and cannot be directly executed. fields: - name: id type: ["null", string] jsonldPredicate: "@id" doc: "The unique identifier for this process object." - name: inputs type: type: array items: "#InputParameter" jsonldPredicate: "cwl:inputs" doc: | Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: outputs type: type: array items: "#OutputParameter" jsonldPredicate: "cwl:outputs" doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. 
- name: requirements type: - "null" - type: array items: "#ProcessRequirement" jsonldPredicate: "cwl:requirements" doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: - "null" - type: array items: Any doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. jsonldPredicate: _id: cwl:hints noLinkCheck: true - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: cwlVersion type: - "null" - "#CWLVersions" doc: "CWL document version" jsonldPredicate: "_id": "cwl:cwlVersion" "_type": "@vocab" - name: InlineJavascriptRequirement type: record extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support inline Javascript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation. fields: - name: expressionLib type: - "null" - type: array items: string doc: | Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions. - name: SchemaDefRequirement type: record extends: "#ProcessRequirement" doc: | This requirement consists of an array of type definitions which must be used when interpreting the `inputs` and `outputs` fields. When a `type` field contains a URI, the implementation must check if the type is defined in `schemaDefs` and use that definition. If the type is not found in `schemaDefs`, it is an error. The entries in `schemaDefs` must be processed in the order listed such that later schema definitions may refer to earlier schema definitions. fields: - name: types type: type: array items: "#InputSchema" doc: The list of type definitions.
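# Illustrative sketch (comment only, not part of this schema file): a
# hypothetical tool description might use SchemaDefRequirement to declare a
# reusable record type. The names "PairedEnd", "read1" and "read2" are
# invented for illustration.
#
#   requirements:
#     - class: SchemaDefRequirement
#       types:
#         - name: PairedEnd
#           type: record
#           fields:
#             - name: read1
#               type: File
#             - name: read2
#               type: File
#   inputs:
#     - id: reads
#       type: "#PairedEnd"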
cwltool-1.0.20180302231433/cwltool/schemas/draft-3/CommonWorkflowLanguage.yml0000644000175200017520000000032113247251315027312 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" $graph: - $import: Process.yml - $import: CommandLineTool.yml - $import: Workflow.yml cwltool-1.0.20180302231433/cwltool/schemas/draft-3/index.yml0000644000175200017520000000030613247251315023775 0ustar mcrusoemcrusoe00000000000000# Common Workflow Language draft-3 specifications The CWL draft-3 specification consists of the following documents: * Command Line Tool description specification * Workflow description specification cwltool-1.0.20180302231433/cwltool/schemas/draft-3/invocation.md0000644000175200017520000001717113247251315024646 0ustar mcrusoemcrusoe00000000000000# Running a Command To accommodate the enormous variety in syntax and semantics for input, runtime environment, invocation, and output of arbitrary programs, a CommandLineTool defines an "input binding" that describes how to translate abstract input parameters to a concrete program invocation, and an "output binding" that describes how to generate output parameters from program output. ## Input binding The tool command line is built by applying command line bindings to the input object. Bindings are listed either as part of an [input parameter](#CommandInputParameter) using the `inputBinding` field, or separately using the `arguments` field of the CommandLineTool. The algorithm to build the command line is as follows. In this algorithm, the sort key is a list consisting of one or more numeric or string elements. Strings are sorted lexicographically based on UTF-8 encoding. 1. Collect `CommandLineBinding` objects from `arguments`. Assign a sorting key `[position, i]` where `position` is [`CommandLineBinding.position`](#CommandLineBinding) and `i` is the index in the `arguments` list. 2. Collect `CommandLineBinding` objects from the `inputs` schema and associate them with values from the input object. Where the input type is a record, array, or map, recursively walk the schema and input object, collecting nested `CommandLineBinding` objects and associating them with values from the input object. 3. Create a sorting key by taking the value of the `position` field at each level leading to each leaf binding object. If `position` is not specified, it is not added to the sorting key. For bindings on arrays and maps, the sorting key must include the array index or map key following the position. If and only if two bindings have the same sort key, the tie must be broken using the ordering of the field or parameter name immediately containing the leaf binding. 4. Sort elements using the assigned sorting keys. Numeric entries sort before strings. 5. In the sorted order, apply the rules defined in [`CommandLineBinding`](#CommandLineBinding) to convert bindings to actual command line elements. 6. Insert elements from `baseCommand` at the beginning of the command line. (A brief worked sketch of this algorithm appears below.) ## Runtime environment All files listed in the input object must be made available in the runtime environment. The implementation may use a shared or distributed file system or transfer files via explicit download. Implementations may choose not to provide access to files not explicitly specified in the input object or process requirements. Output files produced by tool execution must be written to the **designated output directory**.
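As a brief worked sketch of the input binding algorithm above (a hypothetical tool; the parameter names and the use of `echo` are invented for illustration):

```
cwlVersion: "cwl:draft-3"
class: CommandLineTool
baseCommand: echo
arguments:
  - valueFrom: "-n"
    position: 1
inputs:
  - id: first
    type: string
    inputBinding:
      position: 2
  - id: second
    type: string
    inputBinding:
      position: 3
outputs: []
```

Given the input object `{"first": "hello", "second": "world"}`, the binding from `arguments` is assigned sort key `[1, 0]` and the two input bindings are assigned `[2]` and `[3]`; after sorting and inserting `baseCommand`, the resulting command line is `echo -n hello world`.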
The initial current working directory when executing the tool must be the designated output directory. Files may also be written to the **designated temporary directory**. This directory must be isolated and not shared with other processes. Any files written to the designated temporary directory may be automatically deleted by the workflow platform immediately after the tool terminates. For compatibility, files may be written to the **system temporary directory** which must be located at `/tmp`. Because the system temporary directory may be shared with other processes on the system, files placed in the system temporary directory are not guaranteed to be deleted automatically. Correct tools must clean up temporary files written to the system temporary directory. A tool must not use the system temporary directory as a backchannel for communication with other tools. It is valid for the system temporary directory to be the same as the designated temporary directory. When executing the tool, the tool must execute in a new, empty environment with only the environment variables described below; the child process must not inherit environment variables from the parent process except as specified or at user option. * `HOME` must be set to the designated output directory. * `TMPDIR` must be set to the designated temporary directory. * `PATH` may be inherited from the parent process, except when run in a container that provides its own `PATH`. * Variables defined by [EnvVarRequirement](#EnvVarRequirement) * The default environment of the container, such as when using [DockerRequirement](#DockerRequirement) An implementation may forbid the tool from writing to any location in the runtime environment file system other than the designated temporary directory, system temporary directory, and designated output directory. An implementation may provide read-only input files, and disallow in-place update of input files. The designated temporary directory, system temporary directory and designated output directory may each reside on different mount points on different file systems. An implementation may forbid the tool from directly accessing network resources. Correct tools must not assume any network access. Future versions of the specification may incorporate optional process requirements that describe the networking needs of a tool. The `runtime` section available in [parameter references](#Parameter_references) and [expressions](#Expressions) contains the following fields. As noted earlier, an implementation may perform deferred resolution of runtime fields by providing opaque strings for any or all of the following fields; parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents. * `runtime.outdir`: an absolute path to the designated output directory * `runtime.tmpdir`: an absolute path to the designated temporary directory * `runtime.cores`: number of CPU cores reserved for the tool process * `runtime.ram`: amount of RAM in mebibytes (2**20) reserved for the tool process * `runtime.outdirSize`: reserved storage space available in the designated output directory * `runtime.tmpdirSize`: reserved storage space available in the designated temporary directory See [ResourceRequirement](#ResourceRequirement) for details on how to describe the hardware resources required by a tool.
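For example, a tool description might pass these runtime fields on the command line (an illustrative sketch assuming GNU `sort`, whose `--parallel` and `-T` options accept a thread count and a temporary directory; the input parameter name is invented):

```
baseCommand: sort
arguments:
  - prefix: "--parallel"
    valueFrom: $(runtime.cores)
  - prefix: "-T"
    valueFrom: $(runtime.tmpdir)
inputs:
  - id: unsorted
    type: File
    inputBinding:
      position: 1
outputs: []
```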
The standard input stream and standard output stream may be redirected as described in the `stdin` and `stdout` fields. ## Execution Once the command line is built and the runtime environment is created, the actual tool is executed. The standard error stream and standard output stream (unless redirected by setting `stdout`) may be captured by platform logging facilities for storage and reporting. Tools may be multithreaded or spawn child processes; however, when the parent process exits, the tool is considered finished regardless of whether any detached child processes are still running. Tools must not require any kind of console, GUI, or web-based user interaction in order to start and run to completion. The exit code of the process indicates if the process completed successfully. By convention, an exit code of zero is treated as success and non-zero exit codes are treated as failure. This may be customized by providing the fields `successCodes`, `temporaryFailCodes`, and `permanentFailCodes`. An implementation may choose to default unspecified non-zero exit codes to either `temporaryFailure` or `permanentFailure`. ## Output binding If the output directory contains a file named "cwl.output.json", that file must be loaded and used as the output object. Otherwise, the output object must be generated by walking the parameters listed in `outputs` and applying output bindings to the tool output. Output bindings are associated with output parameters using the `outputBinding` field. See [`CommandOutputBinding`](#CommandOutputBinding) for details. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/concepts.md0000644000175200017520000004240113247251315024305 0ustar mcrusoemcrusoe00000000000000## References to Other Specifications **Javascript Object Notation (JSON)**: http://json.org **JSON Linked Data (JSON-LD)**: http://json-ld.org **YAML**: http://yaml.org **Avro**: https://avro.apache.org/docs/current/spec.html **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 **Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/ **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ ## Scope This document describes CWL syntax, execution, and object model. It is not intended to document a specific CWL implementation, however it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation: **may**: Conforming CWL documents and CWL implementations are permitted but not required to behave as described. **must**: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it. **fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.
**deprecated**: Conforming software may implement a behavior for backwards compatibility. Portable CWL documents should not rely on deprecated behavior. Behavior marked as deprecated may be removed entirely from future revisions of the CWL specification. # Data model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**) and where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **process** is a basic unit of computation which accepts input data, performs some computation, and produces output data. An **input object** is an object describing the inputs to an invocation of a process. An **output object** is an object describing the output of an invocation of a process. An **input schema** describes the valid format (required fields, data types) for an input object. An **output schema** describes the valid format for an output object. **Metadata** is information about workflows, tools, or input items that is not used directly in the computation. ## Syntax CWL documents must consist of an object or array of objects represented using JSON or YAML syntax. Upon loading, a CWL implementation must apply the preprocessing steps described in the [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html). An implementation may formally validate the structure of a CWL document using SALAD schemas located at https://github.com/common-workflow-language/common-workflow-language/tree/master/draft-3 ## Identifiers If an object contains an `id` field, that field is used to uniquely identify the object in that document. The value of the `id` field must be unique over the entire document. Identifiers may be resolved relative to the document base and/or other identifiers following the rules described in the [Schema Salad specification](SchemaSalad.html#Identifier_resolution). An implementation may choose to only honor references to object types for which the `id` field is explicitly listed in this specification. ## Document preprocessing An implementation must resolve [$import](SchemaSalad.html#Import) and [$include](SchemaSalad.html#Import) directives as described in the [Schema Salad specification](SchemaSalad.html). ## Extensions and Metadata Input metadata (for example, a lab sample identifier) may be represented within a tool or workflow using input parameters which are explicitly propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata. Implementation extensions not required for correct execution (for example, fields related to GUI presentation) and metadata about the tool or workflow itself (for example, authorship for use in citations) may be provided as additional fields on any object. Such extension fields must use a namespace prefix listed in the `$namespaces` section of the document as described in the [Schema Salad specification](SchemaSalad.html#Explicit_context). Implementation extensions which modify execution semantics must be [listed in the `requirements` field](#Requirements_and_hints). # Execution model ## Execution concepts A **parameter** is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation.
A **command line tool** is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates. A **workflow** is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of other downstream steps to form a directed graph, and independent steps may run concurrently. A **runtime environment** is the actual hardware and software environment when executing a command line tool. It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the Python interpreter or the JVM), libraries, modules, packages, utilities, and data files required to run the tool. A **workflow platform** is a specific hardware and software implementation capable of interpreting CWL documents and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output. A workflow platform may choose to only implement the Command Line Tool Description part of the CWL specification. It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for the CWL specification but may be handled by a specific workflow platform include: * Data security and permissions. * Scheduling tool invocations on remote cluster or cloud compute nodes. * Using virtual machines or operating system containers to manage the runtime (except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)). * Using remote or distributed file systems to manage input and output files. * Transforming file paths. * Determining if a process has previously been executed, skipping it and reusing previous results. * Pausing, resuming or checkpointing processes or workflows. Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of [process requirements](#Requirements_and_hints). ## Generic execution process The generic execution sequence of a CWL process (including workflows and command line tools) is as follows. 1. Load, process and validate a CWL document, yielding a process object. 2. Load input object. 3. Validate the input object against the `inputs` schema for the process. 4. Validate that process requirements are met. 5. Perform any further setup required by the specific process type. 6. Execute the process. 7. Capture results of process execution into the output object. 8. Validate the output object against the `outputs` schema for the process. 9. Report the output object to the process caller. ## Requirements and hints A **process requirement** modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. A **hint** is similar to a requirement, however it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied. Requirements are inherited.
A requirement specified in a Workflow applies to all workflow steps; a requirement specified on a workflow step will apply to the process implementation. If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in `requirements` on a process implementation such as CommandLineTool will take precedence over an entry in `requirements` specified in a workflow step, and an entry in `requirements` on a workflow step takes precedence over the workflow. Entries in `hints` are resolved the same way. Requirements override hints. If a process implementation provides a process requirement in `hints` which is also provided in `requirements` by an enclosing workflow or workflow step, the enclosing `requirements` takes precedence. ## Parameter references Parameter references are denoted by the syntax `$(...)` and may be used in any field permitting the pseudo-type `Expression`, as specified by this document. Conforming implementations must support parameter references. Parameter references use the following subset of [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) syntax. In the following BNF grammar, character classes and grammar rules are denoted in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping, '|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes one or more repeats, all other characters are literal values.

symbol:: {Unicode alphanumeric}+
singleq:: [' (( {character - '} | \' ))* ']
doubleq:: [" (( {character - "} | \" ))* "]
index:: [ {decimal digit}+ ]
segment:: . {symbol} | {singleq} | {doubleq} | {index}
parameter:: $( {symbol} {segment}*)

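For example, the following are syntactically valid parameter references under this grammar (the parameter and field names are invented for illustration):

```
$(inputs.sample)
$(inputs['sample-id'])
$(inputs.reads[0])
$(runtime.outdir)
```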
Use the following algorithm to resolve a parameter reference: 1. Match the leading symbol as key 2. Look up the key in the parameter context (described below) to get the current value. It is an error if the key is not found in the parameter context. 3. If there are no subsequent segments, terminate and return the current value 4. Else, match the next segment 5. Extract the symbol, string, or index from the segment as key 6. Look up the key in the current value and assign it as the new current value. If the key is a symbol or string, the current value must be an object. If the key is an index, the current value must be an array or string. It is an error if the key does not match the required type, or the key is not found or out of range. 7. Repeat steps 3-6 The root namespace is the parameter context. The following parameters must be provided: * `inputs`: The input object to the current Process. * `self`: A context-specific value. The contextual values for 'self' are documented for specific fields elsewhere in this specification. If a contextual value of 'self' is not documented for a field, it must be 'null'. * `runtime`: An object containing configuration details. Specific to the process type. An implementation may provide opaque strings for any or all fields of `runtime`. These must be filled in by the platform after processing the Tool but before actual execution. Parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents. If the value of a field has no leading or trailing non-whitespace characters around a parameter reference, the effective value of the field becomes the value of the referenced parameter, preserving the return type. If the value of a field has non-whitespace leading or trailing characters around a parameter reference, it is subject to string interpolation. The effective value of the field is a string containing the leading characters; followed by the string value of the parameter reference; followed by the trailing characters. The string value of the parameter reference is its textual JSON representation with the following rules: * Leading and trailing quotes are stripped from strings * Object entries are sorted by key Multiple parameter references may appear in a single field. This case must be treated as string interpolation. After interpolating the first parameter reference, interpolation must be recursively applied to the trailing characters to yield the final string value. ## Expressions An expression is a fragment of [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) code which is evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow. To declare the use of expressions, the document must include the process requirement `InlineJavascriptRequirement`. Expressions may be used in any field permitting the pseudo-type `Expression`, as specified by this document. Expressions are denoted by the syntax `$(...)` or `${...}`. A code fragment wrapped in the `$(...)` syntax must be evaluated as an [ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11).
A code fragment wrapped in the `${...}` syntax must be evaluated as an [ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13) for an anonymous, zero-argument function. Expressions must return a valid JSON data type: one of null, string, number, boolean, array, object. Implementations must permit any syntactically valid Javascript and account for nesting of parentheses or braces, and for strings that may contain parentheses or braces, when scanning for expressions. The runtime must include any code defined in the ["expressionLib" field of InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to executing the actual expression. Before executing the expression, the runtime must initialize as global variables the fields of the parameter context described above. The effective value of the field after expression evaluation follows the same rules as parameter references discussed above. Multiple expressions may appear in a single field. Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context. Expressions also must be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2). The order in which expressions are evaluated is undefined except where otherwise noted in this document. An implementation may choose to implement parameter references by evaluating as a Javascript expression. The results of evaluating parameter references must be identical whether implemented by Javascript evaluation or some other means. Implementations may apply other limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code embedded in a CWL document. ## Success and failure A completed process must result in one of `success`, `temporaryFailure` or `permanentFailure` states. An implementation may choose to retry a process execution which resulted in `temporaryFailure`. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon `permanentFailure`. * If any step of a workflow execution results in `permanentFailure`, then the workflow status is `permanentFailure`. * If one or more steps result in `temporaryFailure` and all other steps complete `success` or are not executed, then the workflow status is `temporaryFailure`. * If all workflow steps are executed and complete with `success`, then the workflow status is `success`. ## Executing CWL documents as scripts By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide `cwl-runner` as an alias for the platform's CWL implementation. A CWL input object document may similarly begin with `#!/usr/bin/env cwl-runner` and be marked as executable. In this case, the input object must include the field `cwl:tool` supplying a URI to the default CWL document that should be executed using the fields of the input object as input parameters. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/README.md0000644000175200017520000000175313247251315023431 0ustar mcrusoemcrusoe00000000000000# Common Workflow Language Specifications, draft-3 The CWL specifications are divided up into several documents. The [User Guide](UserGuide.html) provides a gentle introduction to writing CWL command line tools and workflows.
The [Command Line Tool Description Specification](CommandLineTool.html) specifies the document schema and execution semantics for wrapping and executing command line tools. The [Workflow Description Specification](Workflow.html) specifies the document schema and execution semantics for composing workflows from components such as command line tools and other workflows. The [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html) specifies the preprocessing steps that must be applied when loading CWL documents and the schema language used to write the above specifications. If you use the CWL specifications or distribute CWL descriptions with a publication you should [cite the standard](https://dx.doi.org/10.6084/m9.figshare.3115156.v1). cwltool-1.0.20180302231433/cwltool/schemas/draft-3/contrib.md0000644000175200017520000000145113247251315024127 0ustar mcrusoemcrusoe00000000000000Authors: * Peter Amstutz , Arvados Project, Curoverse * Nebojša Tijanić , Seven Bridges Genomics Contributors: * Brad Chapman , Harvard Chan School of Public Health * John Chilton , Galaxy Project, Pennsylvania State University * Michael R. Crusoe , University of California, Davis * Andrey Kartashov , Cincinnati Children's Hospital * Dan Leehr , Duke University * Hervé Ménager , Institut Pasteur * Stian Soiland-Reyes [soiland-reyes@cs.manchester.ac.uk](mailto:soiland-reyes@cs.manchester.ac.uk), University of Manchester * Luka Stojanovic , Seven Bridges Genomics cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/0000755000175200017520000000000013247251336023233 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/0000755000175200017520000000000013247251336025637 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/0000755000175200017520000000000013247251336027746 5ustar mcrusoemcrusoe00000000000000././@LongLink0000000000000000000000000000014700000000000011217 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_schema.y0000644000175200017520000000040113247251315033533 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "base", "type": "string", "jsonldPredicate": "http://example.com/base" }] }] } ././@LongLink0000000000000000000000000000014600000000000011216 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_schema.ym0000644000175200017520000000035313247251315033607 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "id", "type": "string", "jsonldPredicate": "@id" }] }] } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_src.yml0000644000175200017520000000052213247251315033310 0ustar mcrusoemcrusoe00000000000000 { "id": "http://example.com/base", "form": { "id": "one", "things": [ { "id": "two" }, { "id": "#three", }, { "id": "four#five", }, { "id": "acid:six", } ] } }
cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name.yml0000644000175200017520000000246013247251315032553 0ustar mcrusoemcrusoe00000000000000- | ## Field name resolution The document schema declares the vocabulary of known field names. During preprocessing traversal, field names in the document which are not part of the schema vocabulary must be resolved to absolute URIs. Under "strict" validation, it is an error for a document to include fields which are not part of the vocabulary and not resolvable to absolute URIs. Field names which are not part of the vocabulary are resolved using the following rules: * If a field name URI begins with a namespace prefix declared in the document context (`@context`) followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `@context`. * If there is a vocabulary term which maps to the URI of a resolved field, the field name must be replaced with the vocabulary term. * If a field name URI is an absolute URI consisting of a scheme and path and is not part of the vocabulary, no processing occurs. Field name resolution is not relative. It must not be affected by the base URI. ### Field name resolution example Given the following schema: ``` - $include: field_name_schema.yml - | ``` Process the following example: ``` - $include: field_name_src.yml - | ``` This becomes: ``` - $include: field_name_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res_proc.yml0000644000175200017520000000063313247251315033321 0ustar mcrusoemcrusoe00000000000000{ "$base": "http://example.com/base", "link": "http://example.com/base/zero", "form": { "link": "http://example.com/one", "things": [ { "link": "http://example.com/two" }, { "link": "http://example.com/base#three" }, { "link": "http://example.com/four#five", }, { "link": "http://example.com/acid#six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/salad.md0000644000175200017520000002435313247251315031360 0ustar mcrusoemcrusoe00000000000000# Semantic Annotations for Linked Avro Data (SALAD) Author: * Peter Amstutz , Curoverse Contributors: * The developers of Apache Avro * The developers of JSON-LD * Nebojša Tijanić , Seven Bridges Genomics # Abstract Salad is a schema language for describing structured linked data documents in JSON or YAML documents. A Salad schema provides rules for preprocessing, structural validation, and link checking for documents described by a Salad schema. Salad builds on JSON-LD and the Apache Avro data serialization system, and extends Avro with features for rich data modeling such as inheritance, template specialization, object identifiers, and object references. Salad was developed to provide a bridge between the record oriented data modeling supported by Apache Avro and the Semantic Web. # Status of This Document This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "schema_salad" directory at https://github.com/common-workflow-language/schema_salad The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0. # Introduction The JSON data model is an extremely popular way to represent structured data.
It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity means that basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces. JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. Without a schema providing a well defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct. Several schema languages exist for describing and validating JSON data, such as the Apache Avro data serialization system, however none understand linked data. As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite significant overlap of content and obvious need for these documents to stay synchronized. Schema Salad is designed to address this gap. It provides a schema language and processing rules for describing structured JSON content permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of JSON-LD context and RDF schema, and production of RDF triples by applying the JSON-LD context. The schema language also provides for robust support of inline documentation. ## Introduction to draft 1 This is the first version of Schema Salad. It is developed concurrently with draft 3 of the Common Workflow Language for use in specifying the Common Workflow Language, however Schema Salad is intended to be useful to a broader audience. ## References to Other Specifications **Javascript Object Notation (JSON)**: http://json.org **JSON Linked Data (JSON-LD)**: http://json-ld.org **YAML**: http://yaml.org **Avro**: https://avro.apache.org/docs/current/spec.html **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ **UTF-8**: https://www.ietf.org/rfc/rfc2279.txt ## Scope This document describes the syntax, data model, algorithms, and schema language for working with Salad documents. It is not intended to document a specific implementation of Salad, however it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe Salad documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a Salad implementation: **may**: Conforming Salad documents and Salad implementations are permitted but not required to be interpreted as described. **must**: Conforming Salad documents and Salad implementations are required to be interpreted as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it.
**fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to process the document and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described. # Document model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of a unordered set of name/value pairs (referred to here as **fields**) and where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **document type** is a class of files that share a common structure and semantics. A **document schema** is a formal description of the grammar of a document type. A **base URI** is a context-dependent URI used to resolve relative references. An **identifier** is a URI that designates a single document or single object within a document. A **vocabulary** is the set of symbolic field names and enumerated symbols defined by a document schema, where each term maps to absolute URI. ## Syntax Conforming Salad documents are serialized and loaded using YAML syntax and UTF-8 text encoding. Salad documents are written using the JSON-compatible subset of YAML. Features of YAML such as headers and type tags that are not found in the standard JSON data model must not be used in conforming Salad documents. It is a fatal error if the document is not valid YAML. A Salad document must consist only of either a single root object or an array of objects. ## Document context ### Implied context The implicit context consists of the vocabulary defined by the schema and the base URI. By default, the base URI must be the URI that was used to load the document. It may be overridden by an explicit context. ### Explicit context If a document consists of a root object, this object may contain the fields `$base`, `$namespaces`, `$schemas`, and `$graph`: * `$base`: Must be a string. Set the base URI for the document used to resolve relative references. * `$namespaces`: Must be an object with strings as values. The keys of the object are namespace prefixes used in the document; the values of the object are the prefix expansions. * `$schemas`: Must be an array of strings. This field may list URI references to documents in RDF-XML format which will be queried for RDF schema data. The subjects and predicates described by the RDF schema may provide additional semantic context for the document, and may be used for validation of prefixed extension fields found in the document. Other directives beginning with `$` must be ignored. ## Document graph If a document consists of a single root object, this object may contain the field `$graph`. This field must be an array of objects. If present, this field holds the primary content of the document. A document that consists of array of objects at the root is an implicit graph. ## Document metadata If a document consists of a single root object, metadata about the document, such as authorship, may be declared in the root object. ## Document schema Document preprocessing, link validation and schema validation require a document schema. A schema may consist of: * At least one record definition object which defines valid fields that make up a record type. 
Record field definitions include the valid types that may be assigned to each field and annotations to indicate fields that represent identifiers and links, described below in "Semantic Annotations". * Any number of enumerated type objects which define a finite set of symbols that are valid values of the type. * Any number of documentation objects which allow in-line documentation of the schema. The schema for defining a salad schema (the metaschema) is described in detail in "Schema validation". ### Record field annotations In a document schema, record field definitions may include the field `jsonldPredicate`, which may be either a string or object. Implementations must preprocess fields according to the following rules: * If the value of `jsonldPredicate` is `@id`, the field is an identifier field. * If the value of `jsonldPredicate` is an object, and that object contains the field `_type` with the value `@id`, the field is a link field. * If the value of `jsonldPredicate` is an object, and that object contains the field `_type` with the value `@vocab`, the field is a vocabulary field, which is a subtype of link field. ## Document traversal To perform document preprocessing, link validation and schema validation, the document must be traversed starting from the fields or array items of the root object or array and recursively visiting each child item which contains an object or array. # Document preprocessing After processing the explicit context (if any), document preprocessing begins. Starting from the document root, object field values or array items which contain objects or arrays are recursively traversed depth-first. For each visited object, field names, identifier fields, link fields, vocabulary fields, and `$import` and `$include` directives must be processed as described in this section. The order of traversal of child nodes within a parent node is undefined. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res.yml0000644000175200017520000000341613247251315032300 0ustar mcrusoemcrusoe00000000000000- | ## Link resolution The schema may designate one or more fields as link fields which reference other objects. Processing must resolve links to absolute URIs using the following rules: * If a reference URI is prefixed with `#` it is a relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI. * If a reference URI does not contain a scheme and is not prefixed with `#` it is a path relative reference. If the reference URI contains `#` in any position other than the first character, the reference URI must be divided into a path portion and a fragment portion split on the first instance of `#`. The path portion is resolved relative to the base URI by the following rule: if the path portion of the base URI ends in a slash `/`, append the path portion of the reference URI to the path portion of the base URI. If the path portion of the base URI does not end in a slash, replace the final path segment with the path portion of the reference URI. Replace the fragment portion of the base URI with the fragment portion of the reference URI.
Link resolution must not affect the base URI used to resolve identifiers and other links. ### Link resolution example Given the following schema: ``` - $include: link_res_schema.yml - | ``` Process the following example: ``` - $include: link_res_src.yml - | ``` This becomes: ``` - $include: link_res_proc.yml - | ``` ././@LongLink0000000000000000000000000000014600000000000011216 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_schema.ym0000644000175200017520000000053113247251315033574 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "Colors", "type": "enum", "symbols": ["acid:red"] }, { "name": "ExampleType", "type": "record", "fields": [{ "name": "voc", "type": "string", "jsonldPredicate": { "_type": "@vocab" } }] }] } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_src.yml0000644000175200017520000000041313247251315033276 0ustar mcrusoemcrusoe00000000000000 { "form": { "things": [ { "voc": "red", }, { "voc": "http://example.com/acid#red", }, { "voc": "http://example.com/acid#blue", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/import_include.md0000644000175200017520000000554313247251315033311 0ustar mcrusoemcrusoe00000000000000## Import During preprocessing traversal, an implementation must resolve `$import` directives. An `$import` directive is an object consisting of exactly one field `$import` specifying resource by URI string. It is an error if there are additional fields in the `$import` object, such additional fields must be ignored. The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from `file`, `http` and `https` resources. The URI referenced by `$import` must be loaded and recursively preprocessed as a Salad document. The external imported document does not inherit the context of the importing document, and the default base URI for processing the imported document must be the URI used to retrieve the imported document. If the `$import` URI includes a document fragment, the fragment must be excluded from the base URI used to preprocess the imported document. Once loaded and processed, the `$import` node is replaced in the document structure by the object or array yielded from the import operation. URIs may reference document fragments which refer to specific an object in the target document. This indicates that the `$import` node must be replaced by only the object with the appropriate fragment identifier. It is a fatal error if an import directive refers to an external resource or resource fragment which does not exist or is not accessible. ### Import example import.yml: ``` { "hello": "world" } ``` parent.yml: ``` { "form": { "bar": { "$import": "import.yml" } } } ``` This becomes: ``` { "form": { "bar": { "hello": "world" } } } ``` ## Include During preprocessing traversal, an implementation must resolve `$include` directives. An `$include` directive is an object consisting of exactly one field `$include` specifying a URI string. It is an error if there are additional fields in the `$include` object, such additional fields must be ignored. The URI string must be resolved to an absolute URI using the link resolution rules described previously. 
The URI referenced by `$include` must be loaded as a text data. Implementations must support loading from `file`, `http` and `https` resources. Implementations may transcode the character encoding of the text data to match that of the parent document, but must not interpret or parse the text document in any other way. Once loaded, the `$include` node is replaced in the document structure by a string containing the text data loaded from the resource. It is a fatal error if an import directive refers to an external resource which does not exist or is not accessible. ### Include example parent.yml: ``` { "form": { "bar": { "$include": "include.txt" } } } ``` include.txt: ``` hello world ``` This becomes: ``` { "form": { "bar": "hello world" } } ``` cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/metaschema.yml0000644000175200017520000002565213247251315032607 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: "Semantic_Annotations_for_Linked_Avro_Data" type: documentation doc: - $include: salad.md - $import: field_name.yml - $import: ident_res.yml - $import: link_res.yml - $import: vocab_res.yml - $include: import_include.md - name: "Link_Validation" type: documentation doc: | # Link validation Once a document has been preprocessed, an implementation may validate links. The link validation traversal may visit fields which the schema designates as link fields and check that each URI references an existing object in the current document, an imported document, file system, or network resource. Failure to validate links may be a fatal error. Link validation behavior for individual fields may be modified by `identity` and `noLinkCheck` in the `jsonldPredicate` section of the field schema. - name: "Schema_validation" type: documentation doc: "" # - name: "JSON_LD_Context" # type: documentation # doc: | # # Generating JSON-LD Context # How to generate the json-ld context... - name: PrimitiveType type: enum symbols: - "sld:null" - "xsd:boolean" - "xsd:int" - "xsd:long" - "xsd:float" - "xsd:double" - "xsd:string" doc: - | Salad data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. - "null: no value" - "boolean: a binary value" - "int: 32-bit signed integer" - "long: 64-bit signed integer" - "float: single precision (32-bit) IEEE 754 floating-point number" - "double: double precision (64-bit) IEEE 754 floating-point number" - "string: Unicode character sequence" - name: "Any" type: enum symbols: ["#Any"] doc: | The **Any** type validates for any non-null value. - name: JsonldPredicate type: record doc: | Attached to a record field to define how the parent record field is handled for URI resolution and JSON-LD context generation. fields: - name: _id type: ["null", string] jsonldPredicate: _id: sld:_id _type: "@id" identity: true doc: | The predicate URI that this field corresponds to. Corresponds to JSON-LD `@id` directive. - name: _type type: ["null", string] doc: | The context type hint, corresponds to JSON-LD `@type` directive. * If the value of this field is `@id` and `identity` is false or unspecified, the parent field must be resolved using the link resolution rules. 
If `identity` is true, the parent field must be resolved using the identifier expansion rules. * If the value of this field is `@vocab`, the parent field must be resolved using the vocabulary resolution rules. - name: _container type: ["null", string] doc: | Structure hint, corresponds to JSON-LD `@container` directive. - name: identity type: ["null", boolean] doc: | If true and `_type` is `@id` this indicates that the parent field must be resolved according to identity resolution rules instead of link resolution rules. In addition, the field value is considered an assertion that the linked value exists; absence of an object in the loaded document with the URI is not an error. - name: noLinkCheck type: ["null", boolean] doc: | If true, this indicates that link validation traversal must stop at this field. This field (if it is a URI) or any fields under it (if it is an object or array) are not subject to link checking. - name: SpecializeDef type: record fields: - name: specializeFrom type: string doc: "The data type to be replaced" jsonldPredicate: _id: "sld:specializeFrom" _type: "@id" - name: specializeTo type: string doc: "The new data type to replace with" jsonldPredicate: _id: "sld:specializeTo" _type: "@id" - name: NamedType type: record abstract: true fields: - name: name type: string jsonldPredicate: "@id" doc: "The identifier for this type" - name: DocType type: record abstract: true fields: - name: doc type: - "null" - string - type: array items: string doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "sld:doc" - name: docParent type: ["null", string] doc: | Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under `docParent`. jsonldPredicate: _id: "sld:docParent" _type: "@id" - name: docChild type: - "null" - string - type: array items: string doc: | Hint to indicate that during documentation generation, documentation for `docChild` should appear in a subsection under this type. jsonldPredicate: _id: "sld:docChild" _type: "@id" - name: docAfter type: ["null", string] doc: | Hint to indicate that during documentation generation, documentation for this type should appear after the `docAfter` section at the same level. jsonldPredicate: _id: "sld:docAfter" _type: "@id" - name: SchemaDefinedType type: record extends: "#DocType" doc: | Abstract base for schema-defined types. abstract: true fields: - name: jsonldPredicate type: - "null" - string - "#JsonldPredicate" doc: | Annotate this type with linked data context. jsonldPredicate: "sld:jsonldPredicate" - name: documentRoot type: ["null", boolean] doc: | If true, indicates that the type is valid at the document root. At least one type in a schema must be tagged with `documentRoot: true`. - name: RecordField type: record doc: "A field of a record." fields: - name: name type: string jsonldPredicate: "@id" doc: | The name of the field - name: doc type: ["null", string] doc: | A documentation string for this field jsonldPredicate: "sld:doc" - name: type type: - "#PrimitiveType" - "#RecordSchema" - "#EnumSchema" - "#ArraySchema" - string - type: array items: - "#PrimitiveType" - "#RecordSchema" - "#EnumSchema" - "#ArraySchema" - string jsonldPredicate: _id: "sld:type" _type: "@vocab" doc: | The field type - name: SaladRecordField type: record extends: "#RecordField" doc: "A field of a record."
fields: - name: jsonldPredicate type: - "null" - string - "#JsonldPredicate" doc: | Annotate this type with linked data context. jsonldPredicate: "sld:jsonldPredicate" - name: RecordSchema type: record fields: - name: type doc: "Must be `record`" type: name: Record_symbol type: enum symbols: - "sld:record" jsonldPredicate: _id: "sld:type" _type: "@vocab" - name: "fields" type: - "null" - type: "array" items: "#RecordField" jsonldPredicate: "sld:fields" doc: "Defines the fields of the record." - name: SaladRecordSchema type: record extends: ["#NamedType", "#RecordSchema", "#SchemaDefinedType"] documentRoot: true specialize: - specializeFrom: "#RecordField" specializeTo: "#SaladRecordField" fields: - name: abstract type: ["null", boolean] doc: | If true, this record is abstract and may be used as a base for other records, but is not valid on its own. - name: extends type: - "null" - string - type: array items: string jsonldPredicate: _id: "sld:extends" _type: "@id" doc: | Indicates that this record inherits fields from one or more base records. - name: specialize type: - "null" - "#SpecializeDef" - type: array items: "#SpecializeDef" doc: | Only applies if `extends` is declared. Apply type specialization using the base record as a template. For each field inherited from the base record, replace any instance of the type `specializeFrom` with `specializeTo`. - name: EnumSchema type: record doc: | Define an enumerated type. fields: - name: type doc: "Must be `enum`" type: name: Enum_symbol type: enum symbols: - "sld:enum" jsonldPredicate: _id: "sld:type" _type: "@vocab" - name: "symbols" type: - type: "array" items: "string" jsonldPredicate: _id: "sld:symbols" _type: "@id" identity: true doc: "Defines the set of valid symbols." - name: SaladEnumSchema type: record extends: ["#EnumSchema", "#SchemaDefinedType"] documentRoot: true doc: | Define an enumerated type. fields: - name: extends type: - "null" - string - type: array items: string jsonldPredicate: _id: "sld:extends" _type: "@id" doc: | Indicates that this enum inherits symbols from a base enum. - name: ArraySchema type: record fields: - name: type doc: "Must be `array`" type: name: Array_symbol type: enum symbols: - "sld:array" jsonldPredicate: _id: "sld:type" _type: "@vocab" - name: items type: - "#PrimitiveType" - "#RecordSchema" - "#EnumSchema" - "#ArraySchema" - string - type: array items: - "#PrimitiveType" - "#RecordSchema" - "#EnumSchema" - "#ArraySchema" - string jsonldPredicate: _id: "sld:items" _type: "@vocab" doc: "Defines the type of the array elements." - name: Documentation type: record extends: ["#NamedType", "#DocType"] documentRoot: true doc: | A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation. 
fields: - name: type doc: "Must be `documentation`" type: name: Documentation_symbol type: enum symbols: - "sld:documentation" jsonldPredicate: _id: "sld:type" _type: "@vocab" cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_src.yml0000644000175200017520000000025313247251315033420 0ustar mcrusoemcrusoe00000000000000 { "base": "one", "form": { "http://example.com/base": "two", "http://example.com/three": "three", }, "acid:four": "four" } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res_src.yml0000644000175200017520000000047113247251315033145 0ustar mcrusoemcrusoe00000000000000{ "$base": "http://example.com/base", "link": "http://example.com/base/zero", "form": { "link": "one", "things": [ { "link": "two" }, { "link": "#three", }, { "link": "four#five", }, { "link": "acid:six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res.yml0000644000175200017520000000323413247251315032444 0ustar mcrusoemcrusoe00000000000000- | ## Identifier resolution The schema may designate one or more fields as identifier fields to identify specific objects. Processing must resolve relative identifiers to absolute identifiers using the following rules: * If an identifier URI is prefixed with `#` it is a URI relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI. * If an identifier URI does not contain a scheme and is not prefixed with `#` it is a parent relative fragment identifier. It is resolved relative to the base URI by the following rule: if the base URI does not contain a document fragment, set the fragment portion of the base URI to the identifier. If the base URI does contain a document fragment, append a slash `/` followed by the identifier field to the fragment portion of the base URI. * If an identifier URI begins with a namespace prefix declared in `$namespaces` followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `$namespaces`. * If an identifier URI is an absolute URI consisting of a scheme and path, no processing occurs. When preprocessing visits a node containing an identifier, that identifier must be used as the base URI to process child nodes. It is an error for more than one object in a document to have the same absolute URI.
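As a non-normative illustration, the resolution rules above can be sketched in Python (`resolve_identifier` is a hypothetical helper, not part of the schema-salad API; relative path references such as `four#five`, which require full RFC 3986 reference resolution, are omitted for brevity):

```
from urllib.parse import urlsplit, urlunsplit

def resolve_identifier(ident, base, namespaces):
    # Sketch of the rules above, checked in order of the bullets.
    prefix, colon, rest = ident.partition(":")
    if colon and prefix in namespaces:
        # Namespace prefix: "acid:six" -> "http://example.com/acid#six"
        return namespaces[prefix] + rest
    if colon:
        return ident  # absolute URI: no processing occurs
    scheme, netloc, path, query, frag = urlsplit(base)
    if ident.startswith("#"):
        # URI relative fragment identifier: set or replace the fragment.
        return urlunsplit((scheme, netloc, path, query, ident[1:]))
    # Parent relative fragment identifier: set the fragment, or append
    # a slash plus the identifier to an existing fragment.
    frag = ident if not frag else frag + "/" + ident
    return urlunsplit((scheme, netloc, path, query, frag))
```

With base `http://example.com/base#one` and `namespaces = {"acid": "http://example.com/acid#"}`, the identifier `two` resolves to `http://example.com/base#one/two` and `acid:six` to `http://example.com/acid#six`, matching the processed example below.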
### Identifier resolution example Given the following schema: ``` - $include: ident_res_schema.yml - | ``` Process the following example: ``` - $include: ident_res_src.yml - | ``` This becomes: ``` - $include: ident_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/link_res_schema.yml0000644000175200017520000000041013247251315033611 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "link", "type": "string", "jsonldPredicate": { "_type": "@id" } }] }] } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/ident_res_proc.yml0000644000175200017520000000056213247251315033470 0ustar mcrusoemcrusoe00000000000000{ "id": "http://example.com/base", "form": { "id": "http://example.com/base#one", "things": [ { "id": "http://example.com/base#one/two" }, { "id": "http://example.com/base#three" }, { "id": "http://example.com/four#five", }, { "id": "http://example.com/acid#six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res_proc.yml0000644000175200017520000000036313247251315033456 0ustar mcrusoemcrusoe00000000000000 { "form": { "things": [ { "voc": "red", }, { "voc": "red", }, { "voc": "http://example.com/acid#blue", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/vocab_res.yml0000644000175200017520000000142313247251315032431 0ustar mcrusoemcrusoe00000000000000- | ## Vocabulary resolution The schema may designate one or more vocabulary fields which use terms defined in the vocabulary. Processing must resolve vocabulary fields to either vocabulary terms or absolute URIs by first applying the link resolution rules defined above, then applying the following additional rule: * If a reference URI is a vocabulary field, and there is a vocabulary term which maps to the resolved URI, the reference must be replaced with the vocabulary term. ### Vocabulary resolution example Given the following schema: ``` - $include: vocab_res_schema.yml - | ``` Process the following example: ``` - $include: vocab_res_src.yml - | ``` This becomes: ``` - $include: vocab_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/draft-3/salad/schema_salad/metaschema/field_name_proc.yml0000644000175200017520000000025313247251315033574 0ustar mcrusoemcrusoe00000000000000 { "base": "one", "form": { "base": "two", "http://example.com/three": "three", }, "http://example.com/acid#four": "four" } cwltool-1.0.20180302231433/cwltool/schemas/draft-3/Workflow.yml0000644000175200017520000004177013247251315024512 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" $graph: - name: "WorkflowDoc" type: documentation doc: - | # Common Workflow Language (CWL) Workflow Description, draft 3 This version: * https://w3id.org/cwl/draft-3/ Current version: * https://w3id.org/cwl/ - "\n\n" - {$include: contrib.md} - "\n\n" - | # Abstract A Workflow is an analysis task represented by a directed graph describing a sequence of operations that transform an input data set to output. This specification defines the Common Workflow Language (CWL) Workflow description, a vendor-neutral standard for representing workflows intended to be portable across a variety of computing platforms.
- {$include: intro.md} - | ## Introduction to draft 3 This specification represents the third milestone of the CWL group. Since draft-2, this draft introduces the following changes and additions: * Greatly simplified naming within a document with scoped identifiers, as described in the [Schema Salad specification](SchemaSalad.html). * The draft-2 concept of pluggable expression engines has been replaced by a [streamlined expression syntax](#Parameter_references) and standardization on [Javascript](#Expressions). * [File](#File) objects can now include a `format` field to indicate the file type. * The addition of [MultipleInputFeatureRequirement](#MultipleInputFeatureRequirement). * The addition of [StepInputExpressionRequirement](#StepInputExpressionRequirement). * The separation of Workflow and CommandLineTool components into separate specifications. ## Purpose The Common Workflow Language Workflow Description expresses workflows for data-intensive science, such as Bioinformatics, Chemistry, Physics, and Astronomy. This specification is intended to define a data and execution model for Workflows that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems. - {$include: concepts.md} - type: record name: ExpressionTool extends: "#Process" documentRoot: true doc: | Execute an expression as a process step. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: expression type: [string, "#Expression"] doc: | The expression to execute. The expression must return a JSON object which matches the output parameters of the ExpressionTool. - name: LinkMergeMethod type: enum docParent: "#WorkflowStepInput" doc: The input link merge method, described in [WorkflowStepInput](#WorkflowStepInput). symbols: - merge_nested - merge_flattened - name: WorkflowOutputParameter type: record extends: ["#OutputParameter", "#Sink"] docParent: "#Workflow" doc: | Describe an output parameter of a workflow. The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter. - name: Sink type: record abstract: true fields: - name: source doc: | Specifies one or more workflow parameters that will provide input to the underlying process parameter. jsonldPredicate: "_id": "cwl:source" "_type": "@id" type: - "null" - string - type: array items: string - name: linkMerge type: ["null", "#LinkMergeMethod"] doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - type: record name: WorkflowStepInput extends: "#Sink" docParent: "#WorkflowStep" doc: | The input of a workflow step connects an upstream parameter (from the workflow inputs, or the outputs of other workflow steps) with the input parameters of the underlying process. ## Input object A WorkflowStepInput object must contain an `id` field in the form `#fieldname` or `#stepname.fieldname`. When the `id` field contains a period `.` the field name consists of the characters following the final period. This defines a field of the workflow step input object with the value of the `source` parameter(s). ## Merging To merge multiple inbound data links, [MultipleInputFeatureRequirement](#MultipleInputFeatureRequirement) must be specified in the workflow or workflow step requirements.
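As a rough, non-normative sketch of the two link merge methods described in detail below (`merge_inbound_links` is a hypothetical helper, not part of any CWL API):

```
def merge_inbound_links(link_values, link_merge="merge_nested"):
    # link_values holds the value delivered by each inbound link, in order.
    if link_merge == "merge_nested":
        # One array entry per inbound link, even for a single link.
        return list(link_values)
    if link_merge == "merge_flattened":
        merged = []
        for value in link_values:
            if isinstance(value, list):
                merged.extend(value)   # array sources are concatenated
            else:
                merged.append(value)   # single elements are appended
        return merged
    raise ValueError("unknown linkMerge method: %r" % (link_merge,))
```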
If the sink parameter is an array, or named in a [workflow scatter](#WorkflowStep) operation, there may be multiple inbound data links listed in the `source` field. The values from the input links are merged depending on the method specified in the `linkMerge` field. If not specified, the default method is "merge_nested". * **merge_nested** The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list. * **merge_flattened** 1. The source and sink parameters must be compatible types, or the source type must be compatible with a single element from the "items" type of the destination array parameter. 2. Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements. fields: - name: id type: string jsonldPredicate: "@id" doc: "A unique identifier for this workflow input parameter." - name: default type: ["null", Any] doc: | The default value for this parameter if there is no `source` field. jsonldPredicate: "cwl:default" - name: valueFrom type: - "null" - "string" - "#Expression" jsonldPredicate: "cwl:valueFrom" doc: | To use valueFrom, [StepInputExpressionRequirement](#StepInputExpressionRequirement) must be specified in the workflow or workflow step requirements. If `valueFrom` is a constant string value, use this as the value for this input parameter. If `valueFrom` is a parameter reference or expression, it must be evaluated to yield the actual value to be assigned to the input field. The `self` value in the parameter reference or expression must be the value of the parameter(s) specified in the `source` field, or null if there is no `source` field. The value of `inputs` in the parameter reference or expression is the input object to the workflow step after assigning the `source` values, but before evaluating any step with `valueFrom`. The order of evaluating `valueFrom` among step input parameters is undefined. - type: record name: WorkflowStepOutput docParent: "#WorkflowStep" doc: | Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the `id` field) may be used as a `source` to connect with input parameters of other workflow steps, or with an output parameter of the process. fields: - name: id type: string jsonldPredicate: "@id" doc: | A unique identifier for this workflow output parameter. This is the identifier to use in the `source` field of `WorkflowStepInput` to connect the output value to downstream parameters. - name: ScatterMethod type: enum docParent: "#WorkflowStep" doc: The scatter method, as described in [workflow step scatter](#WorkflowStep). symbols: - dotproduct - nested_crossproduct - flat_crossproduct - name: WorkflowStep type: record docParent: "#Workflow" doc: | A workflow step is an executable element of a workflow. It specifies the underlying process implementation (such as `CommandLineTool`) in the `run` field and connects the input and output parameters of the underlying process to workflow parameters. # Scatter/gather To use scatter/gather, [ScatterFeatureRequirement](#ScatterFeatureRequirement) must be specified in the workflow or workflow step requirements. A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently.
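The decomposition performed by the three `scatterMethod` values described below can be sketched non-normatively as follows (`scatter_jobs` is a hypothetical helper, not part of any CWL API):

```
import itertools

def scatter_jobs(job_order, scatter_names, scatter_method="dotproduct"):
    # job_order maps parameter names to values; the scattered parameters
    # named in scatter_names hold arrays.
    arrays = [job_order[name] for name in scatter_names]
    if scatter_method == "dotproduct":
        if len({len(a) for a in arrays}) > 1:
            raise ValueError("dotproduct requires equal-length input arrays")
        combos = zip(*arrays)  # arrays aligned element-by-element
    else:
        # nested_crossproduct and flat_crossproduct enumerate the same
        # Cartesian product of jobs; they differ only in whether the
        # output arrays are nested per scatter level or flattened.
        combos = itertools.product(*arrays)
    for combo in combos:
        yield {**job_order, **dict(zip(scatter_names, combo))}
```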
The `scatter` field specifies one or more input parameters which will be scattered. An input parameter may be listed more than once. The declared type of each input parameter is implicitly wrapped in an array for each time it appears in the `scatter` field. As a result, upstream parameters which are connected to scattered parameters may be arrays. All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array. If `scatter` declares more than one input parameter, `scatterMethod` describes how to decompose the input into a discrete set of jobs. * **dotproduct** specifies that each of the input arrays is aligned and one element taken from each array to construct each job. It is an error if all input arrays are not the same length. * **nested_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the `scatter` field. * **flat_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the `scatter` field. # Subworkflows To specify a nested workflow as part of a workflow step, [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) must be specified in the workflow or workflow step requirements. fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this workflow step." - name: inputs type: type: array items: "#WorkflowStepInput" jsonldPredicate: "cwl:inputs" doc: | Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: outputs type: type: array items: "#WorkflowStepOutput" jsonldPredicate: "cwl:outputs" doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: - "null" - type: array items: "#ProcessRequirement" jsonldPredicate: "cwl:requirements" doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: - "null" - type: array items: "Any" jsonldPredicate: "cwl:hints" doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. jsonldPredicate: _id: cwl:hints noLinkCheck: true - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: description type: - "null" - string jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object."
- name: run type: [string, "#Process"] jsonldPredicate: "_id": "cwl:run" "_type": "@id" doc: | Specifies the process to run. - name: scatter type: - "null" - string - type: array items: string jsonldPredicate: "_id": "cwl:scatter" "_type": "@id" "_container": "@list" - name: scatterMethod doc: | Required if `scatter` is an array of more than one element. type: - "null" - "#ScatterMethod" jsonldPredicate: "_id": "cwl:scatterMethod" "_type": "@vocab" - name: Workflow type: record extends: "#Process" documentRoot: true specialize: - specializeFrom: "#OutputParameter" specializeTo: "#WorkflowOutputParameter" doc: | A workflow describes a set of **steps** and the **dependencies** between those processes. When a process produces output that will be consumed by a second process, the first process is a dependency of the second process. When there is a dependency, the workflow engine must execute the preceding process and wait for it to successfully produce output before executing the dependent process. If two processes are defined in the workflow graph that are not directly or indirectly dependent, these processes are **independent**, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed. Dependencies between parameters are expressed using the `source` field on [workflow step input parameters](#WorkflowStepInput) and [workflow output parameters](#WorkflowOutputParameter). The `source` field expresses the dependency of one parameter on another such that when a value is associated with the parameter specified by `source`, that value is propagated to the destination parameter. When all data links inbound to a given step are fulfilled, the step is ready to execute. ## Workflow success and failure A completed process must result in one of `success`, `temporaryFailure` or `permanentFailure` states. An implementation may choose to retry a process execution which resulted in `temporaryFailure`. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon `permanentFailure`. * If any step of a workflow execution results in `permanentFailure`, then the workflow status is `permanentFailure`. * If one or more steps result in `temporaryFailure` and all other steps complete `success` or are not executed, then the workflow status is `temporaryFailure`. * If all workflow steps are executed and complete with `success`, then the workflow status is `success`. # Extensions [ScatterFeatureRequirement](#ScatterFeatureRequirement) and [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) are available as standard extensions to core workflow semantics. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: steps doc: | The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met. type: - type: array items: "#WorkflowStep" - type: record name: SubworkflowFeatureRequirement extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support nested workflows in the `run` field of [WorkflowStep](#WorkflowStep). - name: ScatterFeatureRequirement type: record extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support the `scatter` and `scatterMethod` fields of [WorkflowStep](#WorkflowStep).
- name: MultipleInputFeatureRequirement type: record extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support multiple inbound data links listed in the `source` field of [WorkflowStepInput](#WorkflowStepInput). - type: record name: StepInputExpressionRequirement extends: "#ProcessRequirement" doc: | Indicates that the workflow platform must support the `valueFrom` field of [WorkflowStepInput](#WorkflowStepInput).cwltool-1.0.20180302231433/cwltool/schemas/draft-3/CommandLineTool-standalone.yml0000644000175200017520000000006513247251315030042 0ustar mcrusoemcrusoe00000000000000- $import: Process.yml - $import: CommandLineTool.ymlcwltool-1.0.20180302231433/cwltool/schemas/draft-3/intro.md0000644000175200017520000000156713247251315023630 0ustar mcrusoemcrusoe00000000000000# Status of This Document This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "draft-3" directory at https://github.com/common-workflow-language/common-workflow-language The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0. # Introduction The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility. cwltool-1.0.20180302231433/cwltool/schemas/draft-3/CommandLineTool.yml0000644000175200017520000005635213247251315025726 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" $graph: - name: CommandLineToolDoc type: documentation doc: - | # Common Workflow Language (CWL) Command Line Tool Description, draft 3 This version: * https://w3id.org/cwl/draft-3/ Current version: * https://w3id.org/cwl/ - "\n\n" - {$include: contrib.md} - "\n\n" - | # Abstract A Command Line Tool is a non-interactive executable program that reads some input, performs a computation, and terminates after producing some output. Command line programs are a flexible unit of code sharing and reuse; unfortunately, the syntax and input/output semantics among command line programs are extremely heterogeneous. A common layer for describing the syntax and semantics of programs can reduce this incidental complexity by providing a consistent way to connect programs together. This specification defines the Common Workflow Language (CWL) Command Line Tool Description, a vendor-neutral standard for describing the syntax and input/output semantics of command line programs. - {$include: intro.md} - | ## Introduction to draft 3 This specification represents the third milestone of the CWL group. Since draft-2, this draft introduces the following major changes and additions: * Greatly simplified naming within a document with scoped identifiers, as described in the [Schema Salad specification](SchemaSalad.html). * The draft-2 concept of pluggable expression engines has been replaced by a [streamlined expression syntax](#Parameter_references) and standardization on [Javascript](#Expressions). * [File](#File) objects can now include a `format` field to indicate the file type.
* The addition of [ShellCommandRequirement](#ShellCommandRequirement). * The addition of [ResourceRequirement](#ResourceRequirement). * The separation of CommandLineTool and Workflow components into separate specifications. ## Purpose Standalone programs are a flexible and interoperable form of code reuse. Unlike monolithic applications, applications and analysis workflows which are composed of multiple separate programs can be written in multiple languages and execute concurrently on multiple hosts. However, POSIX does not dictate computer-readable grammar or semantics for program input and output, resulting in extremely heterogeneous command line grammar and input/output semantics among programs. This is a particular problem in distributed computing (multi-node compute clusters) and virtualized environments (such as Docker containers) where it is often necessary to provision resources such as input files before executing the program. Often this gap is filled by hard coding program invocation and implicitly assuming requirements will be met, or abstracting program invocation with wrapper scripts or descriptor documents. Unfortunately, where these approaches are application or platform specific they create a significant barrier to reproducibility and portability, as methods developed for one platform must be manually ported to be used on new platforms. Similarly, they create redundant work, as wrappers for popular tools must be rewritten for each application or platform in use. The Common Workflow Language Command Line Tool Description is designed to provide a common standard description of grammar and semantics for invoking programs used in data-intensive fields such as Bioinformatics, Chemistry, Physics, Astronomy, and Statistics. This specification defines a precise data and execution model for Command Line Tools that can be implemented on a variety of computing platforms, ranging from a single workstation to cluster, grid, cloud, and high performance computing platforms. - {$include: concepts.md} - {$include: invocation.md} - type: record name: FileDef doc: | Define a file that must be placed in the designated output directory prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template. fields: - name: "filename" type: ["string", "#Expression"] doc: "The name of the file to create in the output directory." - name: "fileContent" type: ["string", "#Expression"] doc: | If the value is a string literal or an expression which evaluates to a string, a new file must be created with the string as the file contents. If the value is an expression that evaluates to a File object, this indicates the referenced file should be added to the designated output directory prior to executing the tool. Files added in this way may be read-only, and may be provided by bind mounts or file system links to avoid unnecessary copying of the input file. - type: record name: EnvironmentDef doc: | Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input.
fields: - name: "envName" type: "string" doc: The environment variable name - name: "envValue" type: ["string", "#Expression"] doc: The environment variable value - type: record name: CommandLineBinding extends: "#InputBinding" doc: | When listed under `inputBinding` in the input schema, the term "value" refers to the the corresponding value in the input object. For binding objects listed in `CommandLineTool.arguments`, the term "value" refers to the effective value after evaluating `valueFrom`. The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value. - **string**: Add `prefix` and the string to the command line. - **number**: Add `prefix` and decimal representation to command line. - **boolean**: If true, add `prefix` to the command line. If false, add nothing. - **File**: Add `prefix` and the value of [`File.path`](#File) to the command line. - **array**: If `itemSeparator` is specified, add `prefix` and the join the array into a single string with `itemSeparator` separating the items. Otherwise first add `prefix`, then recursively process individual elements. - **object**: Add `prefix` only, and recursively add object fields for which `inputBinding` is specified. - **null**: Add nothing. fields: - name: "position" type: ["null", "int"] doc: "The sorting key. Default position is 0." - name: "prefix" type: [ "null", "string"] doc: "Command line prefix to add before the value." - name: "separate" type: ["null", boolean] doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument. - name: "itemSeparator" type: ["null", "string"] doc: | Join the array elements into a single string with the elements separated by by `itemSeparator`. - name: "valueFrom" type: - "null" - "string" - "#Expression" jsonldPredicate: "cwl:valueFrom" doc: | If `valueFrom` is a constant string value, use this as the value and apply the binding rules above. If `valueFrom` is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the value of `self` in the expression will be the value of the input parameter. When a binding is part of the `CommandLineTool.arguments` field, the `valueFrom` field is required. - name: shellQuote type: ["null", boolean] doc: | If `ShellCommandRequirement` is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). Use `shellQuote: false` to inject metacharacters for operations such as pipes. - type: record name: CommandOutputBinding extends: "#OutputBinding" doc: | Describes how to generate an output parameter based on the files produced by a CommandLineTool. The output parameter is generated by applying these operations in the following order: - glob - loadContents - outputEval fields: - name: glob type: - "null" - string - "#Expression" - type: array items: string doc: | Find files relative to the output directory, using POSIX glob(3) pathname matching. If provided an array, find files that match any pattern in the array. 
If provided an expression, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Must only match and return files which actually exist. - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | For each file matched in `glob`, read up to the first 64 KiB of text from the file and place it in the `contents` field of the file object for manipulation by `outputEval`. - name: outputEval type: - "null" - string - "#Expression" doc: | Evaluate an expression to generate the output value. If `glob` was specified, the value of `self` must be an array containing file objects that were matched. If no files were matched, `self` must be a zero length array; if a single file was matched, the value of `self` is an array of a single element. Additionally, if `loadContents` is `true`, the File objects must include up to the first 64 KiB of file contents in the `contents` field. - name: CommandInputRecordField type: record extends: "#InputRecordField" specialize: - specializeFrom: "#InputRecordSchema" specializeTo: "#CommandInputRecordSchema" - specializeFrom: "#InputEnumSchema" specializeTo: "#CommandInputEnumSchema" - specializeFrom: "#InputArraySchema" specializeTo: "#CommandInputArraySchema" - specializeFrom: "#InputBinding" specializeTo: "#CommandLineBinding" - name: CommandInputRecordSchema type: record extends: "#InputRecordSchema" specialize: - specializeFrom: "#InputRecordField" specializeTo: "#CommandInputRecordField" - name: CommandInputEnumSchema type: record extends: "#InputEnumSchema" specialize: - specializeFrom: "#InputBinding" specializeTo: "#CommandLineBinding" - name: CommandInputArraySchema type: record extends: "#InputArraySchema" specialize: - specializeFrom: "#InputRecordSchema" specializeTo: "#CommandInputRecordSchema" - specializeFrom: "#InputEnumSchema" specializeTo: "#CommandInputEnumSchema" - specializeFrom: "#InputArraySchema" specializeTo: "#CommandInputArraySchema" - specializeFrom: "#InputBinding" specializeTo: "#CommandLineBinding" - name: CommandOutputRecordField type: record extends: "#OutputRecordField" specialize: - specializeFrom: "#OutputRecordSchema" specializeTo: "#CommandOutputRecordSchema" - specializeFrom: "#OutputEnumSchema" specializeTo: "#CommandOutputEnumSchema" - specializeFrom: "#OutputArraySchema" specializeTo: "#CommandOutputArraySchema" - specializeFrom: "#OutputBinding" specializeTo: "#CommandOutputBinding" - name: CommandOutputRecordSchema type: record extends: "#OutputRecordSchema" specialize: - specializeFrom: "#OutputRecordField" specializeTo: "#CommandOutputRecordField" - name: CommandOutputEnumSchema type: record extends: "#OutputEnumSchema" specialize: - specializeFrom: "#OutputRecordSchema" specializeTo: "#CommandOutputRecordSchema" - specializeFrom: "#OutputEnumSchema" specializeTo: "#CommandOutputEnumSchema" - specializeFrom: "#OutputArraySchema" specializeTo: "#CommandOutputArraySchema" - specializeFrom: "#OutputBinding" specializeTo: "#CommandOutputBinding" - name: CommandOutputArraySchema type: record extends: "#OutputArraySchema" specialize: - specializeFrom: "#OutputRecordSchema" specializeTo: "#CommandOutputRecordSchema" - specializeFrom: "#OutputEnumSchema" specializeTo: "#CommandOutputEnumSchema" - specializeFrom: "#OutputArraySchema" specializeTo: "#CommandOutputArraySchema" - specializeFrom: "#OutputBinding" specializeTo: "#CommandOutputBinding" - type: record name: CommandInputParameter extends: "#InputParameter" doc: An input parameter
for a CommandLineTool. specialize: - specializeFrom: "#InputRecordSchema" specializeTo: "#CommandInputRecordSchema" - specializeFrom: "#InputEnumSchema" specializeTo: "#CommandInputEnumSchema" - specializeFrom: "#InputArraySchema" specializeTo: "#CommandInputArraySchema" - specializeFrom: "#InputBinding" specializeTo: "#CommandLineBinding" - type: record name: CommandOutputParameter extends: "#OutputParameter" doc: An output parameter for a CommandLineTool. specialize: - specializeFrom: "#OutputRecordSchema" specializeTo: "#CommandOutputRecordSchema" - specializeFrom: "#OutputEnumSchema" specializeTo: "#CommandOutputEnumSchema" - specializeFrom: "#OutputArraySchema" specializeTo: "#CommandOutputArraySchema" - specializeFrom: "#OutputBinding" specializeTo: "#CommandOutputBinding" - type: record name: CommandLineTool extends: "#Process" documentRoot: true specialize: - specializeFrom: "#InputParameter" specializeTo: "#CommandInputParameter" - specializeFrom: "#OutputParameter" specializeTo: "#CommandOutputParameter" doc: | This defines the schema of the CWL Command Line Tool Description document. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: baseCommand doc: | Specifies the program to execute. If the value is an array, the first element is the program to execute, and subsequent elements are placed at the beginning of the command line prior to any command line bindings. If the program includes a path separator character it must be an absolute path, otherwise it is an error. If the program does not include a path separator, search the `$PATH` variable in the runtime environment of the workflow runner to find the absolute path of the executable. type: - string - type: array items: string jsonldPredicate: "_id": "cwl:baseCommand" "_container": "@list" - name: arguments doc: | Command line bindings which are not directly associated with input parameters. type: - "null" - type: array items: [string, "#CommandLineBinding"] jsonldPredicate: "_id": "cwl:arguments" "_container": "@list" - name: stdin type: ["null", string, "#Expression"] doc: | A path to a file whose contents must be piped into the command's standard input stream. - name: stdout type: ["null", string, "#Expression"] doc: | Capture the command's standard output stream to a file written to the designated output directory. If `stdout` is a string, it specifies the file name to use. If `stdout` is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. - name: successCodes type: - "null" - type: array items: int doc: | Exit codes that indicate the process completed successfully. - name: temporaryFailCodes type: - "null" - type: array items: int doc: | Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results. - name: permanentFailCodes type: - "null" - type: array items: int doc: Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. - type: record name: DockerRequirement extends: "#ProcessRequirement" doc: | Indicates that a workflow component should be run in a [Docker](http://docker.com) container, and specifies how to fetch or build the image.
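As a non-normative sketch, an implementation's image-acquisition step, driven by the fields described below, might look like this (`docker_acquire_command` is a hypothetical helper; download staging, `dockerImageId` resolution, and error handling are omitted):

```
def docker_acquire_command(req):
    # req is a DockerRequirement-like mapping; returns an argv list.
    if req.get("dockerPull"):
        return ["docker", "pull", req["dockerPull"]]
    if req.get("dockerLoad"):
        # The image archive is first downloaded from the given HTTP URL.
        return ["docker", "load", "-i", "downloaded-image.tar"]
    if req.get("dockerFile"):
        # The Dockerfile contents are first written to a build context.
        return ["docker", "build", "-t", req.get("dockerImageId", "cwl_build"), "."]
    if req.get("dockerImport"):
        return ["docker", "import", req["dockerImport"]]
    raise ValueError("DockerRequirement does not name an image source")
```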
If a CommandLineTool lists `DockerRequirement` under `hints` or `requirements`, it may (or must) be run in the specified Docker container. The platform must first acquire or install the correct Docker image as specified by `dockerPull`, `dockerImport`, `dockerLoad` or `dockerFile`. The platform must execute the tool in the container using `docker run` with the appropriate Docker image and tool command line. The workflow platform may provide input files and the designated output directory through the use of volume bind mounts. The platform may rewrite file paths in the input object to correspond to the Docker bind mounted locations. When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container. ## Interaction with other requirements If [EnvVarRequirement](#EnvVarRequirement) is specified alongside a DockerRequirement, the environment variables must be provided to Docker using `--env` or `--env-file` and interact with the container's preexisting environment as defined by Docker. fields: - name: dockerPull type: ["null", "string"] doc: "Specify a Docker image to retrieve using `docker pull`." - name: "dockerLoad" type: ["null", "string"] doc: "Specify an HTTP URL from which to download a Docker image using `docker load`." - name: dockerFile type: ["null", "string"] doc: "Supply the contents of a Dockerfile which will be built using `docker build`." - name: dockerImport type: ["null", "string"] doc: "Provide an HTTP URL to download and gunzip a Docker image using `docker import`." - name: dockerImageId type: ["null", "string"] doc: | The image id that will be used for `docker run`. May be a human-readable image name or the image identifier hash. May be skipped if `dockerPull` is specified, in which case the `dockerPull` image id must be used. - name: dockerOutputDirectory type: ["null", "string"] doc: | Set the designated output directory to a specific location inside the Docker container. - name: CreateFileRequirement type: record extends: "#ProcessRequirement" doc: | Define a list of files that must be created by the workflow platform in the designated output directory prior to executing the command line tool. See `FileDef` for details. fields: - name: fileDef type: type: "array" items: "#FileDef" doc: The list of files. - name: EnvVarRequirement type: record extends: "#ProcessRequirement" doc: | Define a list of environment variables which will be set in the execution environment of the tool. See `EnvironmentDef` for details. fields: - name: envDef type: type: "array" items: "#EnvironmentDef" doc: The list of environment variables. - type: record name: ShellCommandRequirement extends: "#ProcessRequirement" doc: | Modify the behavior of CommandLineTool to generate a single string containing a shell command line. Each item in the argument list must be joined into a string separated by single spaces and quoted to prevent interpretation by the shell, unless `CommandLineBinding` for that argument contains `shellQuote: false`. If `shellQuote: false` is specified, the argument is joined into the command string without quoting, which allows the use of shell metacharacters such as `|` for pipes. - type: record name: ResourceRequirement extends: "#ProcessRequirement" doc: | Specify basic hardware resource requirements.
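The min/max defaulting rules described below can be summarized with a short non-normative sketch (`normalize_range` is a hypothetical helper):

```
def normalize_range(res_min, res_max, default=None):
    if res_min is None and res_max is None:
        return default, default  # an implementation may provide a default
    if res_min is None:
        res_min = res_max        # "min" defaults to "max"
    if res_max is None:
        res_max = res_min        # "max" defaults to "min"
    if res_min < 0 or res_max < res_min:
        raise ValueError("invalid resource range")
    return res_min, res_max
```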
"min" is the minimum amount of a resource that must be reserved to schedule a job. If "min" cannot be satisfied, the job should not be run. "max" is the maximum amount of a resource that the job shall be permitted to use. If a node has sufficient resources, multiple jobs may be scheduled on a single node provided each job's "max" resource requirements are met. If a job attempts to exceed its "max" resource allocation, an implementation may deny additional resources, which may result in job failure. If "min" is specified but "max" is not, then "max" == "min" If "max" is specified by "min" is not, then "min" == "max". It is an error if max < min. It is an error if the value of any of these fields is negative. If neither "min" nor "max" is specified for a resource, an implementation may provide a default. fields: - name: coresMin type: ["null", long, string, "#Expression"] doc: Minimum reserved number of CPU cores - name: coresMax type: ["null", int, string, "#Expression"] doc: Maximum reserved number of CPU cores - name: ramMin type: ["null", long, string, "#Expression"] doc: Minimum reserved RAM in mebibytes (2**20) - name: ramMax type: ["null", long, string, "#Expression"] doc: Maximum reserved RAM in mebibytes (2**20) - name: tmpdirMin type: ["null", long, string, "#Expression"] doc: Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: tmpdirMax type: ["null", long, string, "#Expression"] doc: Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: outdirMin type: ["null", long, string, "#Expression"] doc: Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) - name: outdirMax type: ["null", long, string, "#Expression"] doc: Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) cwltool-1.0.20180302231433/cwltool/schemas/v1.0/0000755000175200017520000000000013247251336021373 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.0/UserGuide.yml0000644000175200017520000010042013247251315024004 0ustar mcrusoemcrusoe00000000000000- name: userguide type: documentation doc: - $include: userguide-intro.md - | # Wrapping Command Line Tools - | ## First example The simplest "hello world" program. This accepts one input parameter, writes a message to the terminal or job log, and produces no permanent output. CWL documents are written in [JSON](http://json.org) or [YAML](http://yaml.org), or a mix of the two. *1st-tool.cwl* ``` - $include: examples/1st-tool.cwl - | ``` Use a YAML object in a separate file to describe the input of a run: *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner 1st-tool.cwl echo-job.yml [job 140199012414352] $ echo 'Hello world!' Hello world! Final process status is success ``` What's going on here? Let's break it down: ``` cwlVersion: v1.0 class: CommandLineTool ``` The `cwlVersion` field indicates the version of the CWL spec used by the document. The `class` field indicates this document describes a command line tool. ``` baseCommand: echo ``` The `baseCommand` provides the name of program that will actually run (echo) ``` inputs: message: type: string inputBinding: position: 1 ``` The `inputs` section describes the inputs of the tool. 
This is a list of input parameters and each parameter includes an identifier, a data type, and optionally an `inputBinding` which describes how this input parameter should appear on the command line. In this example, the `position` field indicates where it should appear on the command line. ``` outputs: [] ``` This tool has no formal output, so the `outputs` section is an empty list. - | ## Essential input parameters The `inputs` of a tool is a list of input parameters that control how to run the tool. Each parameter has an `id` for the name of the parameter, and `type` describing what types of values are valid for that parameter. Available primitive types are *string*, *int*, *long*, *float*, *double*, and *null*; complex types are *array* and *record*; in addition there are special types *File*, *Directory* and *Any*. The following example demonstrates some input parameters with different types and appearing on the command line in different ways: *inp.cwl* ``` - $include: examples/inp.cwl - | ``` *inp-job.yml* ``` - $include: examples/inp-job.yml - | ``` Notice that "example_file", as a `File` type, must be provided as an object with the fields `class: File` and `path`. Next, create a whale.txt and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ touch whale.txt $ cwl-runner inp.cwl inp-job.yml [job 140020149614160] /home/example$ echo -f -i42 --example-string hello --file=/home/example/whale.txt -f -i42 --example-string hello --file=/home/example/whale.txt Final process status is success ``` The field `inputBinding` is optional and indicates whether and how the input parameter should appear on the tool's command line. If `inputBinding` is missing, the parameter does not appear on the command line. Let's look at each example in detail. ``` example_flag: type: boolean inputBinding: position: 1 prefix: -f ``` Boolean types are treated as a flag. If the input parameter "example_flag" is "true", then `prefix` will be added to the command line. If false, no flag is added. ``` example_string: type: string inputBinding: position: 3 prefix: --example-string ``` String types appear on the command line as literal values. The `prefix` is optional; if provided, it appears as a separate argument on the command line before the parameter. In the example above, this is rendered as `--example-string hello`. ``` example_int: type: int inputBinding: position: 2 prefix: -i separate: false ``` Integer (and floating point) types appear on the command line with decimal text representation. When the option `separate` is false (the default value is true), the prefix and value are combined into a single argument. In the example above, this is rendered as `-i42`. ``` example_file: type: File? inputBinding: prefix: --file= separate: false position: 4 ``` File types appear on the command line as the path to the file. When the parameter type ends with a question mark `?` it indicates that the parameter is optional. In the example above, this is rendered as `--file=/home/example/whale.txt`. However, if the "example_file" parameter were not provided in the input, nothing would appear on the command line. Input files are read-only. If you wish to update an input file, you must first copy it to the output directory. The value of `position` is used to determine where the parameter should appear on the command line. Positions are relative to one another, not absolute.
As a result, positions do not have to be sequential; three parameters with positions `[1, 3, 5]` will result in the same command line as `[1, 2, 3]`. More than one parameter can have the same position (ties are broken using the parameter name), and the position field itself is optional. The default position is 0. The `baseCommand` field always comes before parameters. - | ## Returning output files The `outputs` of a tool is a list of output parameters that should be returned after running the tool. Each parameter has an `id` for the name of the parameter, and `type` describing what types of values are valid for that parameter. When a tool runs under CWL, the starting working directory is the designated output directory. The underlying tool or script must record its results in the form of files created in the output directory. The output parameters returned by the CWL tool are either the output files themselves, or come from examining the content of those files. *tar.cwl* ``` - $include: examples/tar.cwl - | ``` *tar-job.yml* ``` - $include: examples/tar-job.yml - | ``` Next, create a tar file for the example and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ touch hello.txt && tar -cvf hello.tar hello.txt $ cwl-runner tar.cwl tar-job.yml [job 139868145165200] $ tar xf /home/example/hello.tar Final process status is success { "example_out": { "location": "hello.txt", "size": 13, "class": "File", "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b" } } ``` The field `outputBinding` describes how to set the value of each output parameter. ``` outputs: example_out: type: File outputBinding: glob: hello.txt ``` The `glob` field consists of the name of a file in the output directory. If you don't know the name of the file in advance, you can use a wildcard pattern. - | ## Capturing a tool's standard output stream To capture a tool's standard output stream, add the `stdout` field with the name of the file where the output stream should go. Then add `type: stdout` on the corresponding output parameter. *stdout.cwl* ``` - $include: examples/stdout.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner stdout.cwl echo-job.yml [job 140199012414352] $ echo 'Hello world!' > output.txt Final process status is success { "output": { "location": "output.txt", "size": 13, "class": "File", "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b" } } $ cat output.txt Hello world! ``` - | ## Parameter references In a previous example, we extracted a file using the "tar" program. However, that example was very limited because it assumed that the file we were interested in was called "hello.txt". In this example, you will see how to reference the value of input parameters dynamically from other fields.
*tar-param.cwl* ``` - $include: examples/tar-param.cwl - | ``` *tar-param-job.yml* ``` - $include: examples/tar-param-job.yml - | ``` Create your input files and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ rm hello.tar || true && touch goodbye.txt && tar -cvf hello.tar goodbye.txt $ cwl-runner tar-param.cwl tar-param-job.yml [job 139868145165200] $ tar xf /home/example/hello.tar goodbye.txt Final process status is success { "example_out": { "location": "goodbye.txt", "size": 24, "class": "File", "checksum": "sha1$dd0a4c4c49ba43004d6611771972b6cf969c1c01" } } ``` Certain fields permit parameter references which are enclosed in `$(...)`. These are evaluated and replaced with the value being referenced. ``` outputs: example_out: type: File outputBinding: glob: $(inputs.extractfile) ``` References are written using a subset of Javascript syntax. In this example, `$(inputs.extractfile)`, `$(inputs["extractfile"])`, and `$(inputs['extractfile'])` are equivalent. The value of the "inputs" variable is the input object provided when the CWL tool was invoked. Note that because File parameters are objects, to get the path to an input file you must reference the path field on a file object; to reference the path to the tar file in the above example you would write `$(inputs.tarfile.path)`. - | ## Running tools inside Docker [Docker](http://docker.io) containers simplify software installation by providing a complete known-good runtime for software and its dependencies. However, containers are also purposefully isolated from the host system, so in order to run a tool inside a Docker container there is additional work to ensure that input files are available inside the container and output files can be recovered from the container. CWL can perform this work automatically, allowing you to use Docker to simplify your software management while avoiding the complexity of invoking and managing Docker containers. This example runs a simple Node.js script inside a Docker container. *docker.cwl* ``` - $include: examples/docker.cwl - | ``` *docker-job.yml* ``` - $include: examples/docker-job.yml - | ``` Provide a hello.js and invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ echo "console.log(\"Hello World\");" > hello.js $ cwl-runner docker.cwl docker-job.yml [job 140259721854416] /home/example$ docker run -i --volume=/home/example/hello.js:/var/lib/cwl/job369354770_examples/hello.js:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpDLs5hm:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp node:slim node /var/lib/cwl/job369354770_examples/hello.js Hello world! Final process status is success ``` Notice the CWL runner has constructed a Docker command line to run the script. One of the responsibilities of the CWL runner is to rewrite the paths of input files to reflect the location where they appear inside the container. In this example, the path to the script `hello.js` is `/home/example/hello.js` outside the container but `/var/lib/cwl/job369354770_examples/hello.js` inside the container, as reflected in the invocation of the `node` command. - | ## Additional command line arguments and runtime parameters Sometimes tools require additional command line options that don't correspond exactly to input parameters. In this example, we will wrap the Java compiler to compile a java source file to a class file.
By default, `javac` will create the class files in the same directory as the source file. However, CWL input files (and the directories in which they appear) may be read-only, so we need to instruct javac to write the class file to the designated output directory instead. *arguments.cwl* ``` - $include: examples/arguments.cwl - | ``` *arguments-job.yml* ``` - $include: examples/arguments-job.yml - | ``` Now create a sample Java file and invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ echo "public class Hello {}" > Hello.java $ cwl-runner arguments.cwl arguments-job.yml [job arguments.cwl] /tmp/tmpwYALo1$ docker \ run \ -i \ --volume=/home/peter/work/common-workflow-language/v1.0/examples/Hello.java:/var/lib/cwl/stg8939ac04-7443-4990-a518-1855b2322141/Hello.java:ro \ --volume=/tmp/tmpwYALo1:/var/spool/cwl:rw \ --volume=/tmp/tmpptIAJ8:/tmp:rw \ --workdir=/var/spool/cwl \ --read-only=true \ --user=1001 \ --rm \ --env=TMPDIR=/tmp \ --env=HOME=/var/spool/cwl \ java:7 \ javac \ -d \ /var/spool/cwl \ /var/lib/cwl/stg8939ac04-7443-4990-a518-1855b2322141/Hello.java Final process status is success { "classfile": { "size": 416, "location": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File" } } ``` Here we use the `arguments` field to add an additional argument to the command line that isn't tied to a specific input parameter. ``` arguments: ["-d", $(runtime.outdir)] ``` This example references a runtime parameter. Runtime parameters provide information about the hardware or software environment when the tool is actually executed. The `$(runtime.outdir)` parameter is the path to the designated output directory. Other parameters include `$(runtime.tmpdir)`, `$(runtime.ram)`, `$(runtime.cores)`, `$(runtime.outdirSize)`, and `$(runtime.tmpdirSize)`. See the [Runtime Environment](CommandLineTool.html#Runtime_environment) section of the CWL specification for details. - | ## Array inputs It is easy to add arrays of input parameters represented on the command line. To specify an array parameter, the array definition is nested under the `type` field with `type: array` and `items` defining the valid data types that may appear in the array. *array-inputs.cwl* ``` - $include: examples/array-inputs.cwl - | ``` *array-inputs-job.yml* ``` - $include: examples/array-inputs-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner array-inputs.cwl array-inputs-job.yml [job 140334923640912] /home/example$ echo -A one two three -B=four -B=five -B=six -C=seven,eight,nine -A one two three -B=four -B=five -B=six -C=seven,eight,nine Final process status is success {} ``` The `inputBinding` can appear either on the outer array parameter definition or the inner array element definition, and these produce different behavior when constructing the command line, as shown above. In addition, the `itemSeparator` field, if provided, specifies that array values should be concatenated into a single argument separated by the item separator string. You can specify arrays of arrays, arrays of records, and other complex types. - | ## Array outputs You can also capture multiple output files into an array of files using `glob`.
*array-outputs.cwl* ``` - $include: examples/array-outputs.cwl - | ``` *array-outputs-job.yml* ``` - $include: examples/array-outputs-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner array-outputs.cwl array-outputs-job.yml [job 140190876078160] /home/example$ touch foo.txt bar.dat baz.txt Final process status is success { "output": [ { "size": 0, "location": "examples/foo.txt", "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", "class": "File" }, { "size": 0, "location": "examples/baz.txt", "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", "class": "File" } ] } ``` - | ## Record inputs, dependent and mutually exclusive parameters Sometimes an underlying tool has several arguments that must be provided together (they are dependent) or several arguments that cannot be provided together (they are exclusive). You can use records and type unions to group parameters together to describe these two conditions. *record.cwl* ``` - $include: examples/record.cwl - | ``` *record-job1.yml* ``` - $include: examples/record-job1.yml - | ``` ``` $ cwl-runner record.cwl record-job1.yml Workflow error: Error validating input record, could not validate field `dependent_parameters` because missing required field `itemB` ``` In the first example, you can't provide `itemA` without also providing `itemB`. *record-job2.yml* ``` - $include: examples/record-job2.yml - | ``` ``` $ cwl-runner record.cwl record-job2.yml [job 140566927111376] /home/example$ echo -A one -B two -C three -A one -B two -C three Final process status is success {} ``` In the second example, `itemC` and `itemD` are exclusive, so only `itemC` is added to the command line and `itemD` is ignored. *record-job3.yml* ``` - $include: examples/record-job3.yml - | ``` ``` $ cwl-runner record.cwl record-job3.yml [job 140606932172880] /home/example$ echo -A one -B two -D four -A one -B two -D four Final process status is success {} ``` In the third example, only `itemD` is provided, so it appears on the command line. - | ## Environment variables Tools run in a restricted environment and do not inherit most environment variables from the parent process. You can set environment variables for the tool using `EnvVarRequirement`. *env.cwl* ``` - $include: examples/env.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner env.cwl echo-job.yml [job 140710387785808] /home/example$ env PATH=/bin:/usr/bin:/usr/local/bin HELLO=Hello world! TMPDIR=/tmp/tmp63Obpk Final process status is success {} ``` - | ## Javascript expressions If you need to manipulate input parameters, include the requirement `InlineJavascriptRequirement` and then, anywhere a parameter reference is legal, you can provide a fragment of Javascript that will be evaluated by the CWL runner. *expression.cwl* ``` - $include: examples/expression.cwl - | ``` As this tool does not require any `inputs` we can run it with an (almost) empty job file: *empty.yml* ``` {} - | ``` We can then run `expression.cwl`: ``` $ cwl-runner expression.cwl empty.yml [job 140000594593168] /home/example$ echo -A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1 -A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1 Final process status is success {} ``` You can only use expressions in certain fields.
These are: `filename`, `fileContent`, `envValue`, `valueFrom`, `glob`, `outputEval`, `stdin`, `stdout`, `coresMin`, `coresMax`, `ramMin`, `ramMax`, `tmpdirMin`, `tmpdirMax`, `outdirMin`, and `outdirMax`. - | ## Creating files at runtime Sometimes you need to create a file on the fly from input parameters, such as tools which expect to read their input configuration from a file rather than from command line parameters. To do this, use `InitialWorkDirRequirement`. *createfile.cwl* ``` - $include: examples/createfile.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwltool createfile.cwl echo-job.yml [job 140528604979344] /home/example$ cat example.conf CONFIGVAR=Hello world! Final process status is success {} ``` - | ## Staging input files in the output directory Normally, input files are located in a read-only directory separate from the output directory. This causes problems if the underlying tool expects to write its output files alongside the input file in the same directory. You can use `InitialWorkDirRequirement` to stage input files into the output directory. In this example, we use a Javascript expression to extract the base name of the input file from its leading directory path. *linkfile.cwl* ``` - $include: examples/linkfile.cwl - | ``` *arguments-job.yml* ``` - $include: examples/arguments-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner linkfile.cwl arguments-job.yml [job 139928309171664] /home/example$ docker run -i --volume=/home/example/Hello.java:/var/lib/cwl/job557617295_examples/Hello.java:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpmNbApw:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac Hello.java Final process status is success { "classfile": { "size": 416, "location": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File" } } ``` - | # Writing Workflows ## First workflow This workflow extracts a Java source file from a tar file and then compiles it. *1st-workflow.cwl* ``` - $include: examples/1st-workflow.cwl - | ``` Use a JSON object in a separate file to describe the input of a run: *1st-workflow-job.yml* ``` - $include: examples/1st-workflow-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java $ cwl-runner 1st-workflow.cwl 1st-workflow-job.yml [job untar] /tmp/tmp94qFiM$ tar xf /home/example/hello.tar Hello.java [step untar] completion status is success [job compile] /tmp/tmpu1iaKL$ docker run -i --volume=/tmp/tmp94qFiM/Hello.java:/var/lib/cwl/job301600808_tmp94qFiM/Hello.java:ro --volume=/tmp/tmpu1iaKL:/var/spool/cwl:rw --volume=/tmp/tmpfZnNdR:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job301600808_tmp94qFiM/Hello.java [step compile] completion status is success [workflow 1st-workflow.cwl] outdir is /home/example Final process status is success { "classout": { "location": "/home/example/Hello.class", "checksum": "sha1$e68df795c0686e9aa1a1195536bd900f5f417b18", "class": "File", "size": 416 } } ``` What's going on here?
Let's break it down: ``` cwlVersion: v1.0 class: Workflow ``` The `cwlVersion` field indicates the version of the CWL spec used by the document. The `class` field indicates this document describes a workflow. ``` inputs: inp: File ex: string ``` The `inputs` section describes the inputs of the workflow. This is a list of input parameters where each parameter consists of an identifier and a data type. These parameters can be used as sources for input to specific workflow steps. ``` outputs: classout: type: File outputSource: compile/classfile ``` The `outputs` section describes the outputs of the workflow. This is a list of output parameters where each parameter consists of an identifier and a data type. The `outputSource` connects the output parameter `classfile` of the `compile` step to the workflow output parameter `classout`. ``` steps: untar: run: tar-param.cwl in: tarfile: inp extractfile: ex out: [example_out] ``` The `steps` section describes the actual steps of the workflow. In this example, the first step extracts a file from a tar file, and the second step compiles the file from the first step using the Java compiler. Workflow steps are not necessarily run in the order they are listed; instead, the order is determined by the dependencies between steps (using `source`). In addition, workflow steps which do not depend on one another may run in parallel. The first step, `untar`, runs `tar-param.cwl` (described previously in [Parameter references](#Parameter_references)). This tool has two input parameters, `tarfile` and `extractfile`, and one output parameter, `example_out`. The `in` section of the workflow step connects these two input parameters to the inputs of the workflow, `inp` and `ex`, using `source`. This means that when the workflow step is executed, the values assigned to `inp` and `ex` will be used for the parameters `tarfile` and `extractfile` in order to run the tool. The `out` section of the workflow step lists the output parameters that are expected from the tool. ``` compile: run: arguments.cwl in: src: untar/example_out out: [classfile] ``` The second step `compile` depends on the results from the first step by connecting the input parameter `src` to the output parameter of `untar` using `untar/example_out`. The output of this step `classfile` is connected to the `outputs` section for the Workflow, described above. - | ## Nested workflows Workflows are ways to combine multiple tools to perform a larger operation. We can also think of a workflow as being a tool itself; a CWL workflow can be used as a step in another CWL workflow, if the workflow engine supports the `SubworkflowFeatureRequirement`: ``` requirements: - class: SubworkflowFeatureRequirement ``` Here's an example workflow that uses our `1st-workflow.cwl` as a nested workflow: ``` - $include: examples/nestedworkflows.cwl - | ``` A CWL `Workflow` can be used as a `step` just like a `CommandLineTool`; its CWL file is included with `run`. The workflow inputs (`inp` and `ex`) and outputs (`classout`) can then be mapped to become the step's inputs and outputs. ``` compile: run: 1st-workflow.cwl in: inp: source: create-tar/tar ex: default: "Hello.java" out: [classout] ``` Our `1st-workflow.cwl` was parameterized with workflow inputs, so when running it we had to provide a job file to denote the tar file and `*.java` filename. This is generally best practice, as it means it can be reused in multiple parent workflows, or even in multiple steps within the same workflow.
Here we use `default:` to hard-code `"Hello.java"` as the `ex` input; however, our workflow also requires a tar file at `inp`, which we will prepare in the `create-tar` step. At this point it is probably a good idea to refactor `1st-workflow.cwl` to have more specific input/output names, as those also appear in its usage as a tool. It is also possible to take a less generic approach and avoid external dependencies in the job file. So in this workflow we can generate a hard-coded `Hello.java` file using the previously mentioned `InitialWorkDirRequirement`, before adding it to a tar file. ``` create-tar: requirements: - class: InitialWorkDirRequirement listing: - entryname: Hello.java entry: | public class Hello { public static void main(String[] argv) { System.out.println("Hello from Java"); } } ``` In this case our step can assume `Hello.java` rather than be parameterized, so we can use a simpler `arguments` form as long as the CWL workflow engine supports the `ShellCommandRequirement`: ``` run: class: CommandLineTool requirements: - class: ShellCommandRequirement arguments: - shellQuote: false valueFrom: > tar cf hello.tar Hello.java ``` Note the use of `shellQuote: false` here; otherwise the shell will try to execute the quoted binary `"tar cf hello.tar Hello.java"`. Here the `>` block means that newlines are stripped, so it's possible to write the single command on multiple lines. Similarly, the `|` we used above will preserve newlines; combined with `ShellCommandRequirement`, this would allow embedding a shell script. Shell commands should, however, be used sparingly in CWL, as they mean you "jump out" of the workflow and no longer get reusable components, provenance or scalability. For reproducibility and portability it is recommended to only use shell commands together with a `DockerRequirement` hint, so that the commands are executed in a predictable shell environment. Did you notice that we didn't split out the `tar cf` tool to a separate file, but rather embedded it within the CWL Workflow file? This is generally not best practice, as the tool then can't be reused. The reason for doing it in this case is because the command line is hard-coded with filenames that only make sense within this workflow. In this example we had to prepare a tar file outside, but only because our inner workflow was designed to take that as an input. A better refactoring of the inner workflow would be to take a list of Java files to compile, which would simplify its usage as a tool step in other workflows. Nested workflows can be a powerful feature to generate higher-level functional and reusable workflow units - but just like for creating a CWL Tool description, care must be taken to improve their usability in multiple workflows. cwltool-1.0.20180302231433/cwltool/schemas/v1.0/userguide-intro.md0000644000175200017520000000236413247251315025044 0ustar mcrusoemcrusoe00000000000000# A Gentle Introduction to the Common Workflow Language Hello! This guide will introduce you to writing tool wrappers and workflows using the Common Workflow Language (CWL). This guide describes the current stable specification, version 1.0. Note: This document is a work in progress. Not all features are covered yet. # Introduction CWL is a way to describe command line tools and connect them together to create workflows. Because CWL is a specification and not a specific piece of software, tools and workflows described using CWL are portable across a variety of platforms that support the CWL standard.
CWL has roots in "make" and many similar tools that determine order of execution based on dependencies between tasks. However, unlike "make", CWL tasks are isolated and you must be explicit about your inputs and outputs. The benefits of explicitness and isolation are flexibility, portability, and scalability: tools and workflows described with CWL can transparently leverage technologies such as Docker, be used with CWL implementations from different vendors, and are well suited for describing large-scale workflows in cluster, cloud and high performance computing environments where tasks are scheduled in parallel across many nodes. cwltool-1.0.20180302231433/cwltool/schemas/v1.0/Process.yml0000644000175200017520000007676013247251315023541 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" $graph: - name: "Common Workflow Language, v1.0" type: documentation doc: {$include: concepts.md} - $import: "salad/schema_salad/metaschema/metaschema_base.yml" - name: BaseTypesDoc type: documentation doc: | ## Base types docChild: - "#CWLType" - "#Process" - type: enum name: CWLVersion doc: "Version symbols for published CWL document versions." symbols: - cwl:draft-2 - cwl:draft-3.dev1 - cwl:draft-3.dev2 - cwl:draft-3.dev3 - cwl:draft-3.dev4 - cwl:draft-3.dev5 - cwl:draft-3 - cwl:draft-4.dev1 - cwl:draft-4.dev2 - cwl:draft-4.dev3 - cwl:v1.0.dev4 - cwl:v1.0 - name: CWLType type: enum extends: "sld:PrimitiveType" symbols: - cwl:File - cwl:Directory doc: - "Extends primitive types with the concept of a file and directory as a builtin type." - "File: A File object" - "Directory: A Directory object" - name: File type: record docParent: "#CWLType" doc: | Represents a file (or group of files when `secondaryFiles` is provided) that will be accessible by tools using standard POSIX file system call API such as open(2) and read(2). Files are represented as objects with `class` of `File`. File objects have a number of properties that provide metadata about the file. The `location` property of a File is a URI that uniquely identifies the file. Implementations must support the file:// URI scheme and may support other schemes such as http://. The value of `location` may also be a relative reference, in which case it must be resolved relative to the URI of the document it appears in. Alternately to `location`, implementations must also accept the `path` property on File, which must be a filesystem path available on the same host as the CWL runner (for inputs) or the runtime environment of a command line tool execution (for command line tool outputs). If no `location` or `path` is specified, a file object must specify `contents` with the UTF-8 text content of the file. This is a "file literal". File literals do not correspond to external resources, but are created on disk with `contents` when needed for executing a tool. Where appropriate, expressions can return file literals to define new files at runtime. The maximum size of `contents` is 64 kilobytes. The `basename` property defines the filename on disk where the file is staged. This may differ from the resource name. If not provided, `basename` must be computed from the last path part of `location` and made available to expressions. The `secondaryFiles` property is a list of File or Directory objects that must be staged in the same directory as the primary file.
It is an error for file names to be duplicated in `secondaryFiles`. The `size` property is the size in bytes of the File. It must be computed from the resource and made available to expressions. The `checksum` field contains a cryptographic hash of the file content for use in verifying file contents. Implementations may, at user option, enable or disable computation of the `checksum` field for performance or other reasons. However, the ability to compute output checksums is required to pass the CWL conformance test suite. When executing a CommandLineTool, the files and secondary files may be staged to an arbitrary directory, but must use the value of `basename` for the filename. The `path` property must be the file path in the context of the tool execution runtime (local to the compute node, or within the executing container). All computed properties should be available to expressions. File literals also must be staged and `path` must be set. When collecting CommandLineTool outputs, `glob` matching returns file paths (with the `path` property) and the derived properties. This can all be modified by `outputEval`. Alternately, if the file `cwl.output.json` is present in the output, `outputBinding` is ignored. File objects in the output must provide either a `location` URI or a `path` property in the context of the tool execution runtime (local to the compute node, or within the executing container). When evaluating an ExpressionTool, file objects must be referenced via `location` (the expression tool does not have access to files on disk so `path` is meaningless) or as file literals. It is legal to return a file object with an existing `location` but a different `basename`. The `loadContents` field of ExpressionTool inputs behaves the same as on CommandLineTool inputs; however, it is not meaningful on the outputs. An ExpressionTool may forward file references from input to output by using the same value for `location`. fields: - name: class type: type: enum name: File_class symbols: - cwl:File jsonldPredicate: _id: "@type" _type: "@vocab" doc: Must be `File` to indicate this object describes a file. - name: location type: string? doc: | An IRI that identifies the file resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource; the implementation must use the IRI to retrieve file content. If an implementation is unable to retrieve the file content stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error. If the `location` field is not provided, the `contents` field must be provided. The implementation must assign a unique identifier for the `location` field. If the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, then follow the rules above. jsonldPredicate: _id: "@id" _type: "@id" - name: path type: string? doc: | The local host path where the File is available when a CommandLineTool is executed. This field must be set by the implementation. The final path component must match the value of `basename`. This field must not be used in any other context. The command line tool being executed must be able to access the file at `path` using the POSIX `open(2)` syscall. As a special case, if the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, and remove the `path` field.
If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, `<space>`, `<tab>`, and `<newline>`) or characters [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) then implementations may terminate the process with a `permanentFailure`. jsonldPredicate: "_id": "cwl:path" "_type": "@id" - name: basename type: string? doc: | The base name of the file, that is, the name of the file without any leading directory path. The base name must not contain a slash `/`. If not provided, the implementation must set this field based on the `location` field by taking the final path component after parsing `location` as an IRI. If `basename` is provided, it is not required to match the value from `location`. When this file is made available to a CommandLineTool, it must be named with `basename`, i.e. the final component of the `path` field must match `basename`. jsonldPredicate: "cwl:basename" - name: dirname type: string? doc: | The name of the directory containing the file, that is, the path leading up to the final slash in the path such that `dirname + '/' + basename == path`. The implementation must set this field based on the value of `path` prior to evaluating parameter references or expressions in a CommandLineTool document. This field must not be used in any other context. - name: nameroot type: string? doc: | The basename root such that `nameroot + nameext == basename`, and `nameext` is empty or begins with a period and contains at most one period. For the purposes of path splitting leading periods on the basename are ignored; a basename of `.cshrc` will have a nameroot of `.cshrc`. The implementation must set this field automatically based on the value of `basename` prior to evaluating parameter references or expressions. - name: nameext type: string? doc: | The basename extension such that `nameroot + nameext == basename`, and `nameext` is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; a basename of `.cshrc` will have an empty `nameext`. The implementation must set this field automatically based on the value of `basename` prior to evaluating parameter references or expressions. - name: checksum type: string? doc: | Optional hash code for validating file integrity. Currently must be in the form "sha1$ + hexadecimal string" using the SHA-1 algorithm. - name: size type: long? doc: Optional file size - name: "secondaryFiles" type: - "null" - type: array items: [File, Directory] jsonldPredicate: "cwl:secondaryFiles" doc: | A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in `secondaryFiles` may itself include `secondaryFiles` for which the same rules apply. - name: format type: string? jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | The format of the file: this must be an IRI of a concept node that represents the file format, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
Reasoning about format compatibility must be done by checking that an input file format is the same, `owl:equivalentClass` or `rdfs:subClassOf` the format required by the input parameter. `owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if `<B> owl:equivalentClass <C>` and `<B> owl:subclassOf <A>` then infer `<C> owl:subclassOf <A>`. File format ontologies may be provided in the "$schema" metadata at the root of the document. If no ontologies are specified in `$schema`, the runtime may perform exact file format matches. - name: contents type: string? doc: | File contents literal. Maximum of 64 KiB. If neither `location` nor `path` is provided, `contents` must be non-null. The implementation must assign a unique identifier for the `location` field. When the file is staged as input to a CommandLineTool, the value of `contents` must be written to a file. If `loadContents` of `inputBinding` or `outputBinding` is true and `location` is valid, the implementation must read up to the first 64 KiB of text from the file and place it in the "contents" field. - name: Directory type: record docAfter: "#File" doc: | Represents a directory to present to a command line tool. Directories are represented as objects with `class` of `Directory`. Directory objects have a number of properties that provide metadata about the directory. The `location` property of a Directory is a URI that uniquely identifies the directory. Implementations must support the file:// URI scheme and may support other schemes such as http://. Alternately to `location`, implementations must also accept the `path` property on Directory, which must be a filesystem path available on the same host as the CWL runner (for inputs) or the runtime environment of a command line tool execution (for command line tool outputs). A Directory object may have a `listing` field. This is a list of File and Directory objects that are contained in the Directory. For each entry in `listing`, the `basename` property defines the name of the File or Subdirectory when staged to disk. If `listing` is not provided, the implementation must have some way of fetching the Directory listing at runtime based on the `location` field. If a Directory does not have `location`, it is a Directory literal. A Directory literal must provide `listing`. Directory literals must be created on disk at runtime as needed. The resources in a Directory literal do not need to have any implied relationship in their `location`. For example, a Directory listing may contain two files located on different hosts. It is the responsibility of the runtime to ensure that those files are staged to disk appropriately. Secondary files associated with files in `listing` must also be staged to the same Directory. When executing a CommandLineTool, Directories must be recursively staged first and have local values of `path` assigned. Directory objects in CommandLineTool output must provide either a `location` URI or a `path` property in the context of the tool execution runtime (local to the compute node, or within the executing container). An ExpressionTool may forward file references from input to output by using the same value for `location`. Name conflicts (the same `basename` appearing multiple times in `listing` or in any entry in `secondaryFiles` in the listing) are a fatal error. fields: - name: class type: type: enum name: Directory_class symbols: - cwl:Directory jsonldPredicate: _id: "@type" _type: "@vocab" doc: Must be `Directory` to indicate this object describes a Directory. - name: location type: string? doc: | An IRI that identifies the directory resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource.
If the `listing` field is not set, the implementation must use the location IRI to retrieve directory listing. If an implementation is unable to retrieve the directory listing stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error. If the `location` field is not provided, the `listing` field must be provided. The implementation must assign a unique identifier for the `location` field. If the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, then follow the rules above. jsonldPredicate: _id: "@id" _type: "@id" - name: path type: string? doc: | The local path where the Directory is made available prior to executing a CommandLineTool. This must be set by the implementation. This field must not be used in any other context. The command line tool being executed must be able to access the directory at `path` using the POSIX `opendir(2)` syscall. If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, `<space>`, `<tab>`, and `<newline>`) or characters [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) then implementations may terminate the process with a `permanentFailure`. jsonldPredicate: _id: "cwl:path" _type: "@id" - name: basename type: string? doc: | The base name of the directory, that is, the name of the directory without any leading directory path. The base name must not contain a slash `/`. If not provided, the implementation must set this field based on the `location` field by taking the final path component after parsing `location` as an IRI. If `basename` is provided, it is not required to match the value from `location`. When this directory is made available to a CommandLineTool, it must be named with `basename`, i.e. the final component of the `path` field must match `basename`. jsonldPredicate: "cwl:basename" - name: listing type: - "null" - type: array items: [File, Directory] doc: | List of files or subdirectories contained in this directory. The name of each file or subdirectory is determined by the `basename` field of each `File` or `Directory` object. It is an error if a `File` shares a `basename` with any other entry in `listing`. If two or more `Directory` objects share the same `basename`, this must be treated as equivalent to a single subdirectory with the listings recursively merged. jsonldPredicate: _id: "cwl:listing" - name: SchemaBase type: record abstract: true fields: - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this object." - name: Parameter type: record extends: SchemaBase abstract: true doc: | Define an input or output parameter to a process. fields: - name: secondaryFiles type: - "null" - string - Expression - type: array items: [string, Expression] jsonldPredicate: "cwl:secondaryFiles" doc: | Only valid when `type: File` or is an array of `items: File`. Describes files that must be included alongside the primary file(s). All listed secondary files must be present. An implementation may fail workflow execution if a secondary file does not exist. If the value is an expression, the value of `self` in the expression must be the primary input or output File to which this binding applies.
If the value is a string, it specifies that the following pattern should be applied to the `location` of the primary file: 1. If the string begins with one or more caret `^` characters, for each caret, remove the last file extension from the path (the last period `.` and all following characters). If there are no file extensions, the path is unchanged. 2. Append the remainder of the string to the end of the file path. - name: streamable type: boolean? doc: | Only valid when `type: File` or is an array of `items: File`. A value of `true` indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: `false`. - name: doc type: - string? - string[]? doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "rdfs:comment" - type: enum name: Expression doc: | 'Expression' is not a real type. It indicates that a field must allow runtime parameter references. If [InlineJavascriptRequirement](#InlineJavascriptRequirement) is declared and supported by the platform, the field must also allow Javascript expressions. symbols: - cwl:ExpressionPlaceholder - name: InputBinding type: record abstract: true fields: - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | Only valid when `type: File` or is an array of `items: File`. Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for use by expressions. - name: OutputBinding type: record abstract: true - name: InputSchema extends: SchemaBase type: record abstract: true - name: OutputSchema extends: SchemaBase type: record abstract: true - name: InputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: InputRecordSchema type: record extends: ["sld:RecordSchema", InputSchema] specialize: - specializeFrom: "sld:RecordField" specializeTo: InputRecordField - name: InputEnumSchema type: record extends: ["sld:EnumSchema", InputSchema] fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: InputArraySchema type: record extends: ["sld:ArraySchema", InputSchema] specialize: - specializeFrom: "sld:RecordSchema" specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: OutputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: outputBinding type: OutputBinding?
jsonldPredicate: "cwl:outputBinding" - name: OutputRecordSchema type: record extends: ["sld:RecordSchema", "#OutputSchema"] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordField" specializeTo: OutputRecordField - name: OutputEnumSchema type: record extends: ["sld:EnumSchema", OutputSchema] docParent: "#OutputParameter" fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: OutputArraySchema type: record extends: ["sld:ArraySchema", OutputSchema] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: InputParameter type: record extends: Parameter fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: format type: - "null" - string - type: array items: string - Expression jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | Only valid when `type: File` or is an array of `items: File`. This must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match. - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" doc: | Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters. - name: default type: Any? jsonldPredicate: _id: cwl:default noLinkCheck: true doc: | The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is `null`. Default values are applied before evaluating expressions (e.g. dependent `valueFrom` fields). - name: type type: - "null" - CWLType - InputRecordSchema - InputEnumSchema - InputArraySchema - string - type: array items: - CWLType - InputRecordSchema - InputEnumSchema - InputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: OutputParameter type: record extends: Parameter fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" doc: | Describes how to handle the outputs of a process. - name: format type: - "null" - string - Expression jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | Only valid when `type: File` or is an array of `items: File`. This is the file format that will be assigned to the output parameter. - type: record name: ProcessRequirement abstract: true doc: | A process requirement declares a prerequisite that may or must be fulfilled before executing a process. See [`Process.hints`](#process) and [`Process.requirements`](#process). Process requirements are the primary mechanism for specifying extensions to the CWL core specification. - type: record name: Process abstract: true doc: | The base executable type in CWL is the `Process` object defined by the document. Note that the `Process` object is abstract and cannot be directly executed.
fields: - name: id type: string? jsonldPredicate: "@id" doc: "The unique identifier for this process object." - name: inputs type: type: array items: InputParameter jsonldPredicate: _id: "cwl:inputs" mapSubject: id mapPredicate: type doc: | Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of `null` (or the value of `default` for that parameter, if provided) for the purposes of validation and evaluation of expressions. - name: outputs type: type: array items: OutputParameter jsonldPredicate: _id: "cwl:outputs" mapSubject: id mapPredicate: type doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: ProcessRequirement[]? jsonldPredicate: _id: "cwl:requirements" mapSubject: class doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: Any[]? doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. jsonldPredicate: _id: cwl:hints noLinkCheck: true mapSubject: class - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: doc type: string? jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: cwlVersion type: CWLVersion? doc: | CWL document version. Always required at the document root. Not required for a Process embedded inside another Process. jsonldPredicate: "_id": "cwl:cwlVersion" "_type": "@vocab" - name: InlineJavascriptRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support inline Javascript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation. fields: - name: class type: string doc: "Always 'InlineJavascriptRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: expressionLib type: string[]? doc: | Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions. - name: SchemaDefRequirement type: record extends: ProcessRequirement doc: | This field consists of an array of type definitions which must be used when interpreting the `inputs` and `outputs` fields. When a `type` field contains an IRI, the implementation must check if the type is defined in `schemaDefs` and use that definition. If the type is not found in `schemaDefs`, it is an error. The entries in `schemaDefs` must be processed in the order listed such that later schema definitions may refer to earlier schema definitions.
fields: - name: class type: string doc: "Always 'SchemaDefRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: types type: type: array items: InputSchema doc: The list of type definitions. cwltool-1.0.20180302231433/cwltool/schemas/v1.0/CommonWorkflowLanguage.yml0000644000175200017520000000032113247251315026546 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" $graph: - $import: Process.yml - $import: CommandLineTool.yml - $import: Workflow.yml cwltool-1.0.20180302231433/cwltool/schemas/v1.0/invocation.md0000644000175200017520000001732113247251315024067 0ustar mcrusoemcrusoe00000000000000# Running a Command To accommodate the enormous variety in syntax and semantics for input, runtime environment, invocation, and output of arbitrary programs, a CommandLineTool defines an "input binding" that describes how to translate abstract input parameters to a concrete program invocation, and an "output binding" that describes how to generate output parameters from program output. ## Input binding The tool command line is built by applying command line bindings to the input object. Bindings are listed either as part of an [input parameter](#CommandInputParameter) using the `inputBinding` field, or separately using the `arguments` field of the CommandLineTool. The algorithm to build the command line is as follows. In this algorithm, the sort key is a list consisting of one or more numeric or string elements. Strings are sorted lexicographically based on UTF-8 encoding. 1. Collect `CommandLineBinding` objects from `arguments`. Assign a sorting key `[position, i]` where `position` is [`CommandLineBinding.position`](#CommandLineBinding) and `i` is the index in the `arguments` list. 2. Collect `CommandLineBinding` objects from the `inputs` schema and associate them with values from the input object. Where the input type is a record, array, or map, recursively walk the schema and input object, collecting nested `CommandLineBinding` objects and associating them with values from the input object. 3. Create a sorting key by taking the value of the `position` field at each level leading to each leaf binding object. If `position` is not specified, it is not added to the sorting key. For bindings on arrays and maps, the sorting key must include the array index or map key following the position. If and only if two bindings have the same sort key, the tie must be broken using the ordering of the field or parameter name immediately containing the leaf binding. 4. Sort elements using the assigned sorting keys. Numeric entries sort before strings. 5. In the sorted order, apply the rules defined in [`CommandLineBinding`](#CommandLineBinding) to convert bindings to actual command line elements. 6. Insert elements from `baseCommand` at the beginning of the command line. ## Runtime environment All files listed in the input object must be made available in the runtime environment. The implementation may use a shared or distributed file system or transfer files via explicit download to the host. Implementations may choose not to provide access to files not explicitly specified in the input object or process requirements. Output files produced by tool execution must be written to the **designated output directory**. The initial current working directory when executing the tool must be the designated output directory. Files may also be written to the **designated temporary directory**.
This directory must be isolated and not shared with other processes. Any files written to the designated temporary directory may be automatically deleted by the workflow platform immediately after the tool terminates. For compatibility, files may be written to the **system temporary directory** which must be located at `/tmp`. Because the system temporary directory may be shared with other processes on the system, files placed in the system temporary directory are not guaranteed to be deleted automatically. A tool must not use the system temporary directory as a backchannel for communication with other tools. It is valid for the system temporary directory to be the same as the designated temporary directory. When executing the tool, the tool must execute in a new, empty environment with only the environment variables described below; the child process must not inherit environment variables from the parent process except as specified or at user option. * `HOME` must be set to the designated output directory. * `TMPDIR` must be set to the designated temporary directory. * `PATH` may be inherited from the parent process, except when run in a container that provides its own `PATH`. * Variables defined by [EnvVarRequirement](#EnvVarRequirement) * The default environment of the container, such as when using [DockerRequirement](#DockerRequirement) An implementation may forbid the tool from writing to any location in the runtime environment file system other than the designated temporary directory, system temporary directory, and designated output directory. An implementation may provide read-only input files, and disallow in-place update of input files. The designated temporary directory, system temporary directory and designated output directory may each reside on different mount points on different file systems. An implementation may forbid the tool from directly accessing network resources. Correct tools must not assume any network access. Future versions of the specification may incorporate optional process requirements that describe the networking needs of a tool. The `runtime` section available in [parameter references](#Parameter_references) and [expressions](#Expressions) contains the following fields. As noted earlier, an implementation may perform deferred resolution of runtime fields by providing opaque strings for any or all of the following fields; parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents. * `runtime.outdir`: an absolute path to the designated output directory * `runtime.tmpdir`: an absolute path to the designated temporary directory * `runtime.cores`: number of CPU cores reserved for the tool process * `runtime.ram`: amount of RAM in mebibytes (2\*\*20) reserved for the tool process * `runtime.outdirSize`: reserved storage space available in the designated output directory * `runtime.tmpdirSize`: reserved storage space available in the designated temporary directory For `cores`, `ram`, `outdirSize` and `tmpdirSize`, if an implementation can't provide the actual value reserved at expression evaluation time, it should report back the minimal requested amount. See [ResourceRequirement](#ResourceRequirement) for details on how to describe the hardware resources required by a tool. The standard input stream and standard output stream may be redirected as described in the `stdin` and `stdout` fields.
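As a non-normative illustration, a tool description might combine a `ResourceRequirement` with references to these `runtime` fields as follows (the command name `my-tool` and its flags are hypothetical):

```
cwlVersion: v1.0
class: CommandLineTool
requirements:
  ResourceRequirement:
    coresMin: 2
    ramMin: 1024
baseCommand: my-tool
# The platform substitutes the reserved values (or resolves any opaque
# placeholders) for $(runtime.cores) and $(runtime.tmpdir) before invoking
# the tool.
arguments: ["--threads", $(runtime.cores), "--scratch", $(runtime.tmpdir)]
inputs: []
outputs: []
```

Because these fields may be opaque strings until the platform resolves them, the references above pass the values through unchanged rather than computing with them.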
## Execution Once the command line is built and the runtime environment is created, the actual tool is executed. The standard error stream and standard output stream (unless redirected by setting `stdout` or `stderr`) may be captured by platform logging facilities for storage and reporting. Tools may be multithreaded or spawn child processes; however, when the parent process exits, the tool is considered finished regardless of whether any detached child processes are still running. Tools must not require any kind of console, GUI, or web based user interaction in order to start and run to completion. The exit code of the process indicates if the process completed successfully. By convention, an exit code of zero is treated as success and non-zero exit codes are treated as failure. This may be customized by providing the fields `successCodes`, `temporaryFailCodes`, and `permanentFailCodes`. An implementation may choose to default unspecified non-zero exit codes to either `temporaryFailure` or `permanentFailure`. ## Output binding If the output directory contains a file named "cwl.output.json", that file must be loaded and used as the output object. Otherwise, the output object must be generated by walking the parameters listed in `outputs` and applying output bindings to the tool output. Output bindings are associated with output parameters using the `outputBinding` field. See [`CommandOutputBinding`](#CommandOutputBinding) for details. cwltool-1.0.20180302231433/cwltool/schemas/v1.0/concepts.md0000644000175200017520000004323013247251315023532 0ustar mcrusoemcrusoe00000000000000## References to other specifications **Javascript Object Notation (JSON)**: http://json.org **JSON Linked Data (JSON-LD)**: http://json-ld.org **YAML**: http://yaml.org **Avro**: https://avro.apache.org/docs/1.8.1/spec.html **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 **Internationalized Resource Identifiers (IRIs)**: https://tools.ietf.org/html/rfc3987 **Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/ **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ ## Scope This document describes CWL syntax, execution, and object model. It is not intended to document a CWL-specific implementation; however, it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation: **may**: Conforming CWL documents and CWL implementations are permitted but not required to behave as described. **must**: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it. **fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described. **deprecated**: Conforming software may implement a behavior for backwards compatibility.
Portable CWL documents should not rely on deprecated behavior. Behavior marked as deprecated may be removed entirely from future revisions of the CWL specification. # Data model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**) and where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **process** is a basic unit of computation which accepts input data, performs some computation, and produces output data. Examples include CommandLineTools, Workflows, and ExpressionTools. An **input object** is an object describing the inputs to an invocation of a process. An **output object** is an object describing the output resulting from an invocation of a process. An **input schema** describes the valid format (required fields, data types) for an input object. An **output schema** describes the valid format for an output object. **Metadata** is information about workflows, tools, or input items. ## Syntax CWL documents must consist of an object or array of objects represented using JSON or YAML syntax. Upon loading, a CWL implementation must apply the preprocessing steps described in the [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html). An implementation may formally validate the structure of a CWL document using SALAD schemas located at https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.0 ## Identifiers If an object contains an `id` field, that is used to uniquely identify the object in that document. The value of the `id` field must be unique over the entire document. Identifiers may be resolved relative to either the document base and/or other identifiers following the rules described in the [Schema Salad specification](SchemaSalad.html#Identifier_resolution). An implementation may choose to only honor references to object types for which the `id` field is explicitly listed in this specification. ## Document preprocessing An implementation must resolve [$import](SchemaSalad.html#Import) and [$include](SchemaSalad.html#Import) directives as described in the [Schema Salad specification](SchemaSalad.html). Another transformation defined in Schema Salad is simplification of data type definitions. Type `<type>` ending with `?` should be transformed to `[<type>, "null"]`. Type `<type>` ending with `[]` should be transformed to `{"type": "array", "items": <type>}` ## Extensions and metadata Input metadata (for example, a lab sample identifier) may be represented within a tool or workflow using input parameters which are explicitly propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata. Implementation extensions not required for correct execution (for example, fields related to GUI presentation) and metadata about the tool or workflow itself (for example, authorship for use in citations) may be provided as additional fields on any object. Such extension fields must use a namespace prefix listed in the `$namespaces` section of the document as described in the [Schema Salad specification](SchemaSalad.html#Explicit_context). Implementation extensions which modify execution semantics must be [listed in the `requirements` field](#Requirements_and_hints).
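As an illustrative sketch (the namespace, prefix, and field name below are hypothetical, not defined by this specification), an extension field might be declared like this:

```
cwlVersion: v1.0
class: CommandLineTool
$namespaces:
  ex: "http://example.com/cwl-extensions#"
# An extension field; engines that do not recognize the `ex:` namespace
# may ignore it, since it does not modify execution semantics.
ex:guiColor: blue
baseCommand: echo
inputs: []
outputs: []
```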
# Execution model ## Execution concepts A **parameter** is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation. A **CommandLineTool** is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates. A **workflow** is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of downstream steps to form a directed acyclic graph, and independent steps may run concurrently. A **runtime environment** is the actual hardware and software environment when executing a command line tool. It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the specific Python interpreter or the specific Java virtual machine), libraries, modules, packages, utilities, and data files required to run the tool. A **workflow platform** is a specific hardware and software implementation capable of interpreting CWL documents and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output. A workflow platform may choose to only implement the Command Line Tool Description part of the CWL specification. It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for the CWL specification but may be handled by a specific workflow platform include: * Data security and permissions * Scheduling tool invocations on remote cluster or cloud compute nodes. * Using virtual machines or operating system containers to manage the runtime (except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)). * Using remote or distributed file systems to manage input and output files. * Transforming file paths. * Determining if a process has previously been executed, and if so skipping it and reusing previous results. * Pausing, resuming or checkpointing processes or workflows. Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of [process requirements](#Requirements_and_hints). ## Generic execution process The generic execution sequence of a CWL process (including workflows and command line tools) is as follows. 1. Load, process and validate a CWL document, yielding a process object. 2. Load input object. 3. Validate the input object against the `inputs` schema for the process. 4. Validate process requirements are met. 5. Perform any further setup required by the specific process type. 6. Execute the process. 7. Capture results of process execution into the output object. 8. Validate the output object against the `outputs` schema for the process. 9. Report the output object to the process caller. ## Requirements and hints A **process requirement** modifies the semantics or runtime environment of a process.
If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option.

A **hint** is similar to a requirement; however, it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied.

Requirements are inherited. A requirement specified in a Workflow applies to all workflow steps; a requirement specified on a workflow step will apply to the process implementation of that step and any of its substeps.

If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in `requirements` on a process implementation such as CommandLineTool will take precedence over an entry in `requirements` specified in a workflow step, and an entry in `requirements` on a workflow step takes precedence over the workflow. Entries in `hints` are resolved the same way.

Requirements override hints. If a process implementation provides a process requirement in `hints` which is also provided in `requirements` by an enclosing workflow or workflow step, the enclosing `requirements` takes precedence.

## Parameter references

Parameter references are denoted by the syntax `$(...)` and may be used in any field permitting the pseudo-type `Expression`, as specified by this document. Conforming implementations must support parameter references. Parameter references use the following subset of [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) syntax, but they are designed to not require a Javascript engine for evaluation.

In the following BNF grammar, character classes and grammar rules are denoted in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping, '|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes one or more repeats, '/' escapes these special characters, and all other characters are literal values.

symbol:: {Unicode alphanumeric}+
singleq:: [' (( {character - '} | \' ))* ']
doubleq:: [" (( {character - "} | \" ))* "]
index:: [ {decimal digit}+ ]
segment:: . {symbol} | {singleq} | {doubleq} | {index}
parameter reference::$( {symbol} {segment}*)
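For illustration only, a reference matching this grammar can be scanned and resolved with a few lines of Python. This sketch anticipates the resolution algorithm described next; the names `SEGMENT` and `resolve` are ours, the character classes are narrowed to ASCII for brevity, and escape sequences inside quoted segments are omitted.

```
import re

# Non-normative sketch: parse the segments of a parameter reference body
# (the text between "$(" and ")") and look them up in a context object.
SEGMENT = re.compile(r"""\.([A-Za-z0-9_]+)    # .symbol
                        |\['([^']*)'\]        # ['singleq']
                        |\["([^"]*)"\]        # ["doubleq"]
                        |\[(\d+)\]            # [index]
                     """, re.VERBOSE)

def resolve(ref, context):
    """Resolve a reference body such as "inputs.files[1]" against a context."""
    match = re.match(r"[A-Za-z0-9_]+", ref)
    if match is None:
        raise ValueError("malformed reference: %r" % ref)
    value = context[match.group(0)]     # steps 1-2: leading symbol is the key
    pos = match.end()
    while pos < len(ref):               # steps 3-7: walk the remaining segments
        seg = SEGMENT.match(ref, pos)
        if seg is None:
            raise ValueError("malformed reference: %r" % ref)
        sym, sq, dq, idx = seg.groups()
        value = value[int(idx)] if idx is not None else value[sym or sq or dq]
        pos = seg.end()
    return value

ctx = {"inputs": {"threads": 4, "files": ["a.txt", "b.txt"]}}
assert resolve("inputs.threads", ctx) == 4
assert resolve("inputs.files[1]", ctx) == "b.txt"
```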

Use the following algorithm to resolve a parameter reference:

1. Match the leading symbol as the key.
2. Look up the key in the parameter context (described below) to get the current value. It is an error if the key is not found in the parameter context.
3. If there are no subsequent segments, terminate and return the current value.
4. Else, match the next segment.
5. Extract the symbol, string, or index from the segment as the key.
6. Look up the key in the current value and assign it as the new current value. If the key is a symbol or string, the current value must be an object. If the key is an index, the current value must be an array or string. It is an error if the key does not match the required type, or the key is not found or out of range.
7. Repeat steps 3-6.

The root namespace is the parameter context. The following parameters must be provided:

* `inputs`: The input object to the current Process.
* `self`: A context-specific value. The contextual values for 'self' are documented for specific fields elsewhere in this specification. If a contextual value of 'self' is not documented for a field, it must be 'null'.
* `runtime`: An object containing configuration details. Specific to the process type. An implementation may provide opaque strings for any or all fields of `runtime`. These must be filled in by the platform after processing the Tool but before actual execution. Parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents, except where noted otherwise.

If the value of a field has no leading or trailing non-whitespace characters around a parameter reference, the effective value of the field becomes the value of the referenced parameter, preserving the return type.

If the value of a field has non-whitespace leading or trailing characters around a parameter reference, it is subject to string interpolation. The effective value of the field is a string containing the leading characters, followed by the string value of the parameter reference, followed by the trailing characters. The string value of the parameter reference is its textual JSON representation with the following rules:

* Leading and trailing quotes are stripped from strings
* Object entries are sorted by key

Multiple parameter references may appear in a single field. This case must be treated as a string interpolation. After interpolating the first parameter reference, interpolation must be recursively applied to the trailing characters to yield the final string value.

## Expressions

An expression is a fragment of [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow.

To declare the use of expressions, the document must include the process requirement `InlineJavascriptRequirement`. Expressions may be used in any field permitting the pseudo-type `Expression`, as specified by this document.

Expressions are denoted by the syntax `$(...)` or `${...}`. A code fragment wrapped in the `$(...)` syntax must be evaluated as a [ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11).
A code fragment wrapped in the `${...}` syntax must be evaluated as a [ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13) for an anonymous, zero-argument function. Expressions must return a valid JSON data type: one of null, string, number, boolean, array, object. Other return values must result in a `permanentFailure`. Implementations must permit any syntactically valid Javascript and, when scanning for expressions, must account for nesting of parentheses or braces and for strings that may contain parentheses or braces.

The runtime must include any code defined in the ["expressionLib" field of InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to executing the actual expression.

Before executing the expression, the runtime must initialize as global variables the fields of the parameter context described above.

The effective value of the field after expression evaluation follows the same rules as parameter references discussed above. Multiple expressions may appear in a single field.

Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context. Expressions also must be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2).

The order in which expressions are evaluated is undefined except where otherwise noted in this document.

An implementation may choose to implement parameter references by evaluating as a Javascript expression. The results of evaluating parameter references must be identical whether implemented by Javascript evaluation or some other means.

Implementations may apply other limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code embedded in a CWL document.

Exceptions thrown from an expression must result in a `permanentFailure` of the process.

## Executing CWL documents as scripts

By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide `cwl-runner` as an alias for the platform's CWL implementation.

A CWL input object document may similarly begin with `#!/usr/bin/env cwl-runner` and be marked as executable. In this case, the input object must include the field `cwl:tool` supplying an IRI to the default CWL document that should be executed using the fields of the input object as input parameters.

## Discovering CWL documents on a local filesystem

To discover CWL documents, look in the following locations:

`/usr/share/commonwl/`

`/usr/local/share/commonwl/`

`$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`)

`$XDG_DATA_HOME` is from the [XDG Base Directory Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html)

cwltool-1.0.20180302231433/cwltool/schemas/v1.0/README.md0000644000175200017520000000150113247251315022644 0ustar mcrusoemcrusoe00000000000000# Common Workflow Language Specifications, v1.0

The CWL specifications are divided up into several documents.

The [User Guide](UserGuide.html) provides a gentle introduction to writing CWL command line tools and workflows.

The [Command Line Tool Description Specification](CommandLineTool.html) specifies the document schema and execution semantics for wrapping and executing command line tools.
The [Workflow Description Specification](Workflow.html) specifies the document schema and execution semantics for composing workflows from components such as command line tools and other workflows.

The [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html) specifies the preprocessing steps that must be applied when loading CWL documents and the schema language used to write the above specifications.

cwltool-1.0.20180302231433/cwltool/schemas/v1.0/contrib.md0000644000175200017520000000200313247251315023355 0ustar mcrusoemcrusoe00000000000000Authors:

* Peter Amstutz, Arvados Project, Curoverse
* Michael R. Crusoe, Common Workflow Language project
* Nebojša Tijanić, Seven Bridges Genomics

Contributors:

* Brad Chapman, Harvard Chan School of Public Health
* John Chilton, Galaxy Project, Pennsylvania State University
* Michael Heuer, UC Berkeley AMPLab
* Andrey Kartashov, Cincinnati Children's Hospital
* Dan Leehr, Duke University
* Hervé Ménager, Institut Pasteur
* Maya Nedeljkovich, Seven Bridges Genomics
* Matt Scales, Institute of Cancer Research, London
* Stian Soiland-Reyes [soiland-reyes@cs.manchester.ac.uk](mailto:soiland-reyes@cs.manchester.ac.uk), University of Manchester
* Luka Stojanovic, Seven Bridges Genomics

cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/0000755000175200017520000000000013247251336022457 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/0000755000175200017520000000000013247251336025063 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/0000755000175200017520000000000013247251336027172 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name_schema.yml0000644000175200017520000000040113247251315033310 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "base", "type": "string", "jsonldPredicate": "http://example.com/base" }] }] } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res_schema.yml0000644000175200017520000000035313247251315033207 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "id", "type": "string", "jsonldPredicate": "@id" }] }] } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res_src.yml0000644000175200017520000000052213247251315032534 0ustar mcrusoemcrusoe00000000000000 { "id": "http://example.com/base", "form": { "id": "one", "things": [ { "id": "two" }, { "id": "#three", }, { "id": "four#five", }, { "id": "acid:six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res_proc.yml0000644000175200017520000000061213247251315033271 0ustar mcrusoemcrusoe00000000000000[ { "extype": "string" }, { "extype": [ "null", "string" ] }, { "extype": { "type": "array", "items": "string" } }, { "extype": [ "null", { "type": "array", "items": "string" } ] } ] cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name.yml0000644000175200017520000000246013247251315031777 0ustar mcrusoemcrusoe00000000000000- | ## Field name resolution The document schema declares the vocabulary of known field names.
During preprocessing traversal, field names in the document which are not part of the schema vocabulary must be resolved to absolute URIs. Under "strict" validation, it is an error for a document to include fields which are not part of the vocabulary and not resolvable to absolute URIs. Field names which are not part of the vocabulary are resolved using the following rules: * If a field name URI begins with a namespace prefix declared in the document context (`@context`) followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `@context`. * If there is a vocabulary term which maps to the URI of a resolved field, the field name must be replaced with the vocabulary term. * If a field name URI is an absolute URI consisting of a scheme and path and is not part of the vocabulary, no processing occurs. Field name resolution is not relative. It must not be affected by the base URI. ### Field name resolution example Given the following schema: ``` - $include: field_name_schema.yml - | ``` Process the following example: ``` - $include: field_name_src.yml - | ``` This becomes: ``` - $include: field_name_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res.yml0000644000175200017520000000164513247251315031346 0ustar mcrusoemcrusoe00000000000000- | ## Identifier maps The schema may designate certain fields as having a `mapSubject`. If the value of the field is a JSON object, it must be transformed into an array of JSON objects. Each key-value pair from the source JSON object is a list item, each list item must be a JSON object, and the key is assigned to the field specified by `mapSubject`. Fields which have `mapSubject` specified may also supply a `mapPredicate`. If the value of a map item is not a JSON object, the item is transformed to a JSON object with the key assigned to the field specified by `mapSubject` and the value assigned to the field specified by `mapPredicate`. ### Identifier map example Given the following schema: ``` - $include: map_res_schema.yml - | ``` Process the following example: ``` - $include: map_res_src.yml - | ``` This becomes: ``` - $include: map_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res_proc.yml0000644000175200017520000000063313247251315032545 0ustar mcrusoemcrusoe00000000000000{ "$base": "http://example.com/base", "link": "http://example.com/base/zero", "form": { "link": "http://example.com/one", "things": [ { "link": "http://example.com/two" }, { "link": "http://example.com/base#three" }, { "link": "http://example.com/four#five", }, { "link": "http://example.com/acid#six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/salad.md0000644000175200017520000002512513247251315030602 0ustar mcrusoemcrusoe00000000000000# Semantic Annotations for Linked Avro Data (SALAD)

Author:

* Peter Amstutz, Curoverse

Contributors:

* The developers of Apache Avro
* The developers of JSON-LD
* Nebojša Tijanić, Seven Bridges Genomics

# Abstract

Salad is a schema language for describing structured linked data documents in JSON or YAML. A Salad schema provides rules for preprocessing, structural validation, and link checking for documents described by a Salad schema. Salad builds on JSON-LD and the Apache Avro data serialization system, and extends Avro with features for rich data modeling such as inheritance, template specialization, object identifiers, and object references.
Salad was developed to provide a bridge between the record-oriented data modeling supported by Apache Avro and the Semantic Web.

# Status of This Document

This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "schema_salad" repository at https://github.com/common-workflow-language/schema_salad

The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.

# Introduction

The JSON data model is an extremely popular way to represent structured data. It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages. However, this simplicity means that basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces.

JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context". JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language. Without a schema providing a well defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct.

Several schema languages exist for describing and validating JSON data, such as the Apache Avro data serialization system; however, none of them understand linked data. As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite significant overlap of content and obvious need for these documents to stay synchronized.

Schema Salad is designed to address this gap. It provides a schema language and processing rules for describing structured JSON content permitting URI resolution and strict document validation. The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of JSON-LD context and RDF schema, and production of RDF triples by applying the JSON-LD context. The schema language also provides for robust support of inline documentation.

## Introduction to v1.0

This is the second version of the Schema Salad specification. It is developed concurrently with v1.0 of the Common Workflow Language for use in specifying the Common Workflow Language; however, Schema Salad is intended to be useful to a broader audience. Compared to the draft-1 schema salad specification, the following changes have been made:

* Use of [mapSubject and mapPredicate](#Identifier_maps) to transform maps to lists of records.
* Resolution of the [Domain Specific Language for types](#Domain_Specific_Language_for_types)
* Consolidation of the formal [schema into section 5](#Schema).
## References to Other Specifications

**Javascript Object Notation (JSON)**: http://json.org

**JSON Linked Data (JSON-LD)**: http://json-ld.org

**YAML**: http://yaml.org

**Avro**: https://avro.apache.org/docs/current/spec.html

**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986

**Resource Description Framework (RDF)**: http://www.w3.org/RDF/

**UTF-8**: https://www.ietf.org/rfc/rfc2279.txt

## Scope

This document describes the syntax, data model, algorithms, and schema language for working with Salad documents. It is not intended to document a specific implementation of Salad; however, it may serve as a reference for the behavior of conforming implementations.

## Terminology

The terminology used to describe Salad documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a Salad implementation:

**may**: Conforming Salad documents and Salad implementations are permitted but not required to be interpreted as described.

**must**: Conforming Salad documents and Salad implementations are required to be interpreted as described; otherwise they are in error.

**error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it.

**fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to process the document and may report an error.

**at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

# Document model

## Data concepts

An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**) where the name is a string and the value is a string, number, boolean, array, or object.

A **document** is a file containing a serialized object, or an array of objects.

A **document type** is a class of files that share a common structure and semantics.

A **document schema** is a formal description of the grammar of a document type.

A **base URI** is a context-dependent URI used to resolve relative references.

An **identifier** is a URI that designates a single document or single object within a document.

A **vocabulary** is the set of symbolic field names and enumerated symbols defined by a document schema, where each term maps to an absolute URI.

## Syntax

Conforming Salad documents are serialized and loaded using YAML syntax and UTF-8 text encoding. Salad documents are written using the JSON-compatible subset of YAML. Features of YAML such as headers and type tags that are not found in the standard JSON data model must not be used in conforming Salad documents. It is a fatal error if the document is not valid YAML.

A Salad document must consist only of either a single root object or an array of objects.

## Document context

### Implied context

The implicit context consists of the vocabulary defined by the schema and the base URI. By default, the base URI must be the URI that was used to load the document. It may be overridden by an explicit context.

### Explicit context

If a document consists of a root object, this object may contain the fields `$base`, `$namespaces`, `$schemas`, and `$graph`:

* `$base`: Must be a string.
  Set the base URI for the document used to resolve relative references.

* `$namespaces`: Must be an object with strings as values. The keys of the object are namespace prefixes used in the document; the values of the object are the prefix expansions.

* `$schemas`: Must be an array of strings. This field may list URI references to documents in RDF-XML format which will be queried for RDF schema data. The subjects and predicates described by the RDF schema may provide additional semantic context for the document, and may be used for validation of prefixed extension fields found in the document.

Other directives beginning with `$` must be ignored.

## Document graph

If a document consists of a single root object, this object may contain the field `$graph`. This field must be an array of objects. If present, this field holds the primary content of the document. A document that consists of an array of objects at the root is an implicit graph.

## Document metadata

If a document consists of a single root object, metadata about the document, such as authorship, may be declared in the root object.

## Document schema

Document preprocessing, link validation and schema validation require a document schema. A schema may consist of:

* At least one record definition object which defines valid fields that make up a record type. Record field definitions include the valid types that may be assigned to each field and annotations to indicate fields that represent identifiers and links, described below in "Semantic Annotations".

* Any number of enumerated type objects which define a finite set of symbols that are valid values of the type.

* Any number of documentation objects which allow in-line documentation of the schema.

The schema for defining a salad schema (the metaschema) is described in detail in "Schema validation".

### Record field annotations

In a document schema, record field definitions may include the field `jsonldPredicate`, which may be either a string or object. Implementations must preprocess fields according to the following rules:

* If the value of `jsonldPredicate` is `@id`, the field is an identifier field.
* If the value of `jsonldPredicate` is an object which contains the field `_type` with the value `@id`, the field is a link field.
* If the value of `jsonldPredicate` is an object which contains the field `_type` with the value `@vocab`, the field is a vocabulary field, which is a subtype of link field.

## Document traversal

To perform document preprocessing, link validation and schema validation, the document must be traversed starting from the fields or array items of the root object or array and recursively visiting each child item which contains an object or array.

# Document preprocessing

After processing the explicit context (if any), document preprocessing begins. Starting from the document root, object field values or array items which contain objects or arrays are recursively traversed depth-first. For each visited object, field names, identifier fields, link fields, vocabulary fields, and `$import` and `$include` directives must be processed as described in this section. The order of traversal of child nodes within a parent node is undefined.
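The traversal just described is a straightforward recursive walk. The following is a non-normative Python sketch under the stated rules; the names `traverse` and `visit` are ours, not part of any Schema Salad API.

```
# Non-normative sketch of depth-first document traversal; `visit` is a
# hypothetical callback applied to every object encountered in the walk.
def traverse(node, visit):
    """Recursively visit each object in a parsed JSON/YAML document."""
    if isinstance(node, dict):
        visit(node)                      # process this object's fields first
        for value in node.values():
            traverse(value, visit)       # then recurse into child items
    elif isinstance(node, list):
        for item in node:
            traverse(item, visit)

doc = {"$graph": [{"id": "one", "things": [{"id": "two"}]}]}
found = []
traverse(doc, lambda obj: found.append(obj["id"]) if "id" in obj else None)
assert found == ["one", "two"]
```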
cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res.yml0000644000175200017520000000341613247251315031524 0ustar mcrusoemcrusoe00000000000000- | ## Link resolution The schema may designate one or more fields as link fields that reference other objects. Processing must resolve links to absolute URIs using the following rules: * If a reference URI is prefixed with `#` it is a relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI. * If a reference URI does not contain a scheme and is not prefixed with `#` it is a path relative reference. If the reference URI contains `#` in any position other than the first character, the reference URI must be divided into a path portion and a fragment portion split on the first instance of `#`. The path portion is resolved relative to the base URI by the following rule: if the path portion of the base URI ends in a slash `/`, append the path portion of the reference URI to the path portion of the base URI. If the path portion of the base URI does not end in a slash, replace the final path segment with the path portion of the reference URI. Replace the fragment portion of the base URI with the fragment portion of the reference URI. * If a reference URI begins with a namespace prefix declared in `$namespaces` followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `$namespaces`. * If a reference URI is an absolute URI consisting of a scheme and path, no processing occurs. Link resolution must not affect the base URI used to resolve identifiers and other links. ### Link resolution example Given the following schema: ``` - $include: link_res_schema.yml - | ``` Process the following example: ``` - $include: link_res_src.yml - | ``` This becomes: ``` - $include: link_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res_schema.yml0000644000175200017520000000053113247251315033174 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "Colors", "type": "enum", "symbols": ["acid:red"] }, { "name": "ExampleType", "type": "record", "fields": [{ "name": "voc", "type": "string", "jsonldPredicate": { "_type": "@vocab" } }] }] } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res_src.yml0000644000175200017520000000041313247251315032522 0ustar mcrusoemcrusoe00000000000000 { "form": { "things": [ { "voc": "red", }, { "voc": "http://example.com/acid#red", }, { "voc": "http://example.com/acid#blue", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/import_include.md0000644000175200017520000001076313247251315032535 0ustar mcrusoemcrusoe00000000000000## Import

During preprocessing traversal, an implementation must resolve `$import` directives. An `$import` directive is an object consisting of exactly one field `$import` specifying a resource by URI string. It is an error if there are additional fields in the `$import` object; such additional fields must be ignored.

The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from `file`, `http` and `https` resources. The URI referenced by `$import` must be loaded and recursively preprocessed as a Salad document.
The external imported document does not inherit the context of the importing document, and the default base URI for processing the imported document must be the URI used to retrieve the imported document. If the `$import` URI includes a document fragment, the fragment must be excluded from the base URI used to preprocess the imported document.

Once loaded and processed, the `$import` node is replaced in the document structure by the object or array yielded from the import operation. URIs may reference document fragments which refer to a specific object in the target document. This indicates that the `$import` node must be replaced by only the object with the appropriate fragment identifier.

It is a fatal error if an import directive refers to an external resource or resource fragment which does not exist or is not accessible.

### Import example

import.yml:
```
{ "hello": "world" }
```

parent.yml:
```
{ "form": { "bar": { "$import": "import.yml" } } }
```

This becomes:
```
{ "form": { "bar": { "hello": "world" } } }
```

## Include

During preprocessing traversal, an implementation must resolve `$include` directives. An `$include` directive is an object consisting of exactly one field `$include` specifying a URI string. It is an error if there are additional fields in the `$include` object; such additional fields must be ignored.

The URI string must be resolved to an absolute URI using the link resolution rules described previously. The URI referenced by `$include` must be loaded as text data. Implementations must support loading from `file`, `http` and `https` resources. Implementations may transcode the character encoding of the text data to match that of the parent document, but must not interpret or parse the text document in any other way.

Once loaded, the `$include` node is replaced in the document structure by a string containing the text data loaded from the resource.

It is a fatal error if an include directive refers to an external resource which does not exist or is not accessible.

### Include example

parent.yml:
```
{ "form": { "bar": { "$include": "include.txt" } } }
```

include.txt:
```
hello world
```

This becomes:
```
{ "form": { "bar": "hello world" } }
```

## Mixin

During preprocessing traversal, an implementation must resolve `$mixin` directives. An `$mixin` directive is an object consisting of the field `$mixin` specifying a resource by URI string. If there are additional fields in the `$mixin` object, these fields override fields in the object which is loaded from the `$mixin` URI.

The URI string must be resolved to an absolute URI using the link resolution rules described previously. Implementations must support loading from `file`, `http` and `https` resources. The URI referenced by `$mixin` must be loaded and recursively preprocessed as a Salad document. The external imported document must inherit the context of the importing document, however the file URI for processing the imported document must be the URI used to retrieve the imported document. The `$mixin` URI must not include a document fragment.

Once loaded and processed, the `$mixin` node is replaced in the document structure by the object or array yielded from the import operation. URIs may reference document fragments which refer to a specific object in the target document. This indicates that the `$mixin` node must be replaced by only the object with the appropriate fragment identifier.
It is a fatal error if a mixin directive refers to an external resource or resource fragment which does not exist or is not accessible.

### Mixin example

mixin.yml:
```
{ "hello": "world", "carrot": "orange" }
```

parent.yml:
```
{ "form": { "bar": { "$mixin": "mixin.yml", "carrot": "cake" } } }
```

This becomes:
```
{ "form": { "bar": { "hello": "world", "carrot": "cake" } } }
```

cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res_proc.yml0000644000175200017520000000026613247251315032367 0ustar mcrusoemcrusoe00000000000000{ "mapped": [ { "value": "daphne", "key": "fred" }, { "value": "scooby", "key": "shaggy" } ] }cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res_src.yml0000644000175200017520000000015713247251315033121 0ustar mcrusoemcrusoe00000000000000[{ "extype": "string" }, { "extype": "string?" }, { "extype": "string[]" }, { "extype": "string[]?" }] cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res_schema.yml0000644000175200017520000000045313247251315033571 0ustar mcrusoemcrusoe00000000000000{ "$graph": [ {"$import": "metaschema_base.yml"}, { "name": "TypeDSLExample", "type": "record", "documentRoot": true, "fields": [{ "name": "extype", "type": "string", "jsonldPredicate": { _type: "@vocab", "typeDSL": true } }] }] } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/metaschema.yml0000644000175200017520000002357013247251315032030 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: "Semantic_Annotations_for_Linked_Avro_Data" type: documentation doc: - $include: salad.md - $import: field_name.yml - $import: ident_res.yml - $import: link_res.yml - $import: vocab_res.yml - $include: import_include.md - $import: map_res.yml - $import: typedsl_res.yml - name: "Link_Validation" type: documentation doc: | # Link validation Once a document has been preprocessed, an implementation may validate links. The link validation traversal may visit fields which the schema designates as link fields and check that each URI references an existing object in the current document, an imported document, file system, or network resource. Failure to validate links may be a fatal error. Link validation behavior for individual fields may be modified by `identity` and `noLinkCheck` in the `jsonldPredicate` section of the field schema. - name: "Schema_validation" type: documentation doc: "" # - name: "JSON_LD_Context" # type: documentation # doc: | # # Generating JSON-LD Context # How to generate the json-ld context... - $import: metaschema_base.yml - name: JsonldPredicate type: record doc: | Attached to a record field to define how the parent record field is handled for URI resolution and JSON-LD context generation. fields: - name: _id type: string? jsonldPredicate: _id: sld:_id _type: "@id" identity: true doc: | The predicate URI that this field corresponds to. Corresponds to JSON-LD `@id` directive. - name: _type type: string? doc: | The context type hint, corresponds to JSON-LD `@type` directive. * If the value of this field is `@id` and `identity` is false or unspecified, the parent field must be resolved using the link resolution rules.
If `identity` is true, the parent field must be resolved using the identifier expansion rules. * If the value of this field is `@vocab`, the parent field must be resolved using the vocabulary resolution rules. - name: _container type: string? doc: | Structure hint, corresponds to JSON-LD `@container` directive. - name: identity type: boolean? doc: | If true and `_type` is `@id` this indicates that the parent field must be resolved according to identity resolution rules instead of link resolution rules. In addition, the field value is considered an assertion that the linked value exists; absence of an object in the loaded document with the URI is not an error. - name: noLinkCheck type: boolean? doc: | If true, this indicates that link validation traversal must stop at this field. This field (if it is a URI) or any fields under it (if it is an object or array) are not subject to link checking. - name: mapSubject type: string? doc: | If the value of the field is a JSON object, it must be transformed into an array of JSON objects, where each key-value pair from the source JSON object is a list item, the list items must be JSON objects, and the key is assigned to the field specified by `mapSubject`. - name: mapPredicate type: string? doc: | Only applies if `mapSubject` is also provided. If the value of the field is a JSON object, it is transformed as described in `mapSubject`, with the addition that when the value of a map item is not an object, the item is transformed to a JSON object with the key assigned to the field specified by `mapSubject` and the value assigned to the field specified by `mapPredicate`. - name: refScope type: int? doc: | If the field contains a relative reference, it must be resolved by searching for valid document references in each successive parent scope in the document fragment. For example, a reference of `foo` in the context `#foo/bar/baz` will first check for the existence of `#foo/bar/baz/foo`, followed by `#foo/bar/foo`, then `#foo/foo` and then finally `#foo`. The first valid URI in the search order shall be used as the fully resolved value of the identifier. The value of the refScope field is the specified number of levels from the containing identifier scope before starting the search, so if `refScope: 2` then "baz" and "bar" must be stripped to get the base `#foo` and search `#foo/foo` and then `#foo`. The last scope searched must be the top level scope before determining if the identifier cannot be resolved. - name: typeDSL type: boolean? doc: | Field must be expanded based on the Schema Salad type DSL. - name: SpecializeDef type: record fields: - name: specializeFrom type: string doc: "The data type to be replaced" jsonldPredicate: _id: "sld:specializeFrom" _type: "@id" refScope: 1 - name: specializeTo type: string doc: "The new data type to replace with" jsonldPredicate: _id: "sld:specializeTo" _type: "@id" refScope: 1 - name: NamedType type: record abstract: true docParent: "#Schema" fields: - name: name type: string jsonldPredicate: "@id" doc: "The identifier for this type" - name: inVocab type: boolean? doc: | By default or if "true", include the short name of this type in the vocabulary (the keys of the JSON-LD context). If false, do not include the short name in the vocabulary. - name: DocType type: record abstract: true docParent: "#Schema" fields: - name: doc type: - string? - string[]? doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "rdfs:comment" - name: docParent type: string?
doc: | Hint to indicate that during documentation generation, documentation for this type should appear in a subsection under `docParent`. jsonldPredicate: _id: "sld:docParent" _type: "@id" - name: docChild type: - string? - string[]? doc: | Hint to indicate that during documentation generation, documentation for `docChild` should appear in a subsection under this type. jsonldPredicate: _id: "sld:docChild" _type: "@id" - name: docAfter type: string? doc: | Hint to indicate that during documentation generation, documentation for this type should appear after the `docAfter` section at the same level. jsonldPredicate: _id: "sld:docAfter" _type: "@id" - name: SchemaDefinedType type: record extends: DocType doc: | Abstract base for schema-defined types. abstract: true fields: - name: jsonldPredicate type: - string? - JsonldPredicate? doc: | Annotate this type with linked data context. jsonldPredicate: sld:jsonldPredicate - name: documentRoot type: boolean? doc: | If true, indicates that the type is valid at the document root. At least one type in a schema must be tagged with `documentRoot: true`. - name: SaladRecordField type: record extends: RecordField doc: "A field of a record." fields: - name: jsonldPredicate type: - string? - JsonldPredicate? doc: | Annotate this type with linked data context. jsonldPredicate: "sld:jsonldPredicate" - name: SaladRecordSchema docParent: "#Schema" type: record extends: [NamedType, RecordSchema, SchemaDefinedType] documentRoot: true specialize: RecordField: SaladRecordField fields: - name: abstract type: boolean? doc: | If true, this record is abstract and may be used as a base for other records, but is not valid on its own. - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this record inherits fields from one or more base records. - name: specialize type: - SpecializeDef[]? doc: | Only applies if `extends` is declared. Apply type specialization using the base record as a template. For each field inherited from the base record, replace any instance of the type `specializeFrom` with `specializeTo`. jsonldPredicate: _id: "sld:specialize" mapSubject: specializeFrom mapPredicate: specializeTo - name: SaladEnumSchema docParent: "#Schema" type: record extends: [EnumSchema, SchemaDefinedType] documentRoot: true doc: | Define an enumerated type. fields: - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this enum inherits symbols from a base enum. - name: Documentation type: record docParent: "#Schema" extends: [NamedType, DocType] documentRoot: true doc: | A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation.
fields: - name: type doc: "Must be `documentation`" type: name: Documentation_symbol type: enum symbols: - "sld:documentation" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/metaschema_base.yml0000644000175200017520000000716013247251315033017 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: "Schema" type: documentation doc: | # Schema - name: PrimitiveType type: enum symbols: - "sld:null" - "xsd:boolean" - "xsd:int" - "xsd:long" - "xsd:float" - "xsd:double" - "xsd:string" doc: - | Salad data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. - "null: no value" - "boolean: a binary value" - "int: 32-bit signed integer" - "long: 64-bit signed integer" - "float: single precision (32-bit) IEEE 754 floating-point number" - "double: double precision (64-bit) IEEE 754 floating-point number" - "string: Unicode character sequence" - name: Any type: enum symbols: ["#Any"] docAfter: "#PrimitiveType" doc: | The **Any** type validates for any non-null value. - name: RecordField type: record doc: A field of a record. fields: - name: name type: string jsonldPredicate: "@id" doc: | The name of the field - name: doc type: string? doc: | A documentation string for this field jsonldPredicate: "rdfs:comment" - name: type type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string - type: array items: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string jsonldPredicate: _id: sld:type _type: "@vocab" typeDSL: true refScope: 2 doc: | The field type - name: RecordSchema type: record fields: type: doc: "Must be `record`" type: name: Record_symbol type: enum symbols: - "sld:record" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 fields: type: RecordField[]? jsonldPredicate: _id: sld:fields mapSubject: name mapPredicate: type doc: "Defines the fields of the record." - name: EnumSchema type: record doc: | Define an enumerated type. fields: type: doc: "Must be `enum`" type: name: Enum_symbol type: enum symbols: - "sld:enum" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 symbols: type: string[] jsonldPredicate: _id: "sld:symbols" _type: "@id" identity: true doc: "Defines the set of valid symbols." - name: ArraySchema type: record fields: type: doc: "Must be `array`" type: name: Array_symbol type: enum symbols: - "sld:array" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 items: type: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string - type: array items: - PrimitiveType - RecordSchema - EnumSchema - ArraySchema - string jsonldPredicate: _id: "sld:items" _type: "@vocab" refScope: 2 doc: "Defines the type of the array elements." 
cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name_src.yml0000644000175200017520000000025313247251315032644 0ustar mcrusoemcrusoe00000000000000 { "base": "one", "form": { "http://example.com/base": "two", "http://example.com/three": "three", }, "acid:four": "four" } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res_schema.yml0000644000175200017520000000100613247251315032655 0ustar mcrusoemcrusoe00000000000000{ "$graph": [{ "name": "MappedType", "type": "record", "documentRoot": true, "fields": [{ "name": "mapped", "type": { "type": "array", "items": "ExampleRecord" }, "jsonldPredicate": { "mapSubject": "key", "mapPredicate": "value" } }], }, { "name": "ExampleRecord", "type": "record", "fields": [{ "name": "key", "type": "string" }, { "name": "value", "type": "string" } ] }] } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/typedsl_res.yml0000644000175200017520000000137513247251315032255 0ustar mcrusoemcrusoe00000000000000- | ## Domain Specific Language for types Fields may be tagged `typeDSL: true`. If so, the field is expanded using the following micro-DSL for schema salad types: * If the type ends with a question mark `?` it is expanded to a union with `null` * If the type ends with square brackets `[]` it is expanded to an array with items of the preceding type symbol * The type may end with both `[]?` to indicate it is an optional array. * Identifier resolution is applied after type DSL expansion. ### Type DSL example Given the following schema: ``` - $include: typedsl_res_schema.yml - | ``` Process the following example: ``` - $include: typedsl_res_src.yml - | ``` This becomes: ``` - $include: typedsl_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res_src.yml0000644000175200017520000000047113247251315032371 0ustar mcrusoemcrusoe00000000000000{ "$base": "http://example.com/base", "link": "http://example.com/base/zero", "form": { "link": "one", "things": [ { "link": "two" }, { "link": "#three", }, { "link": "four#five", }, { "link": "acid:six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res.yml0000644000175200017520000000323413247251315031670 0ustar mcrusoemcrusoe00000000000000- | ## Identifier resolution The schema may designate one or more fields as identifier fields to identify specific objects. Processing must resolve relative identifiers to absolute identifiers using the following rules: * If an identifier URI is prefixed with `#` it is a URI relative fragment identifier. It is resolved relative to the base URI by setting or replacing the fragment portion of the base URI. * If an identifier URI does not contain a scheme and is not prefixed with `#` it is a parent relative fragment identifier. It is resolved relative to the base URI by the following rule: if the base URI does not contain a document fragment, set the fragment portion of the base URI. If the base URI does contain a document fragment, append a slash `/` followed by the identifier field to the fragment portion of the base URI. * If an identifier URI begins with a namespace prefix declared in `$namespaces` followed by a colon `:`, the prefix and colon must be replaced by the namespace declared in `$namespaces`. * If an identifier URI is an absolute URI consisting of a scheme and path, no processing occurs.
When preprocessing visits a node containing an identifier, that identifier must be used as the base URI to process child nodes. It is an error for more than one object in a document to have the same absolute URI. ### Identifier resolution example Given the following schema: ``` - $include: ident_res_schema.yml - | ``` Process the following example: ``` - $include: ident_res_src.yml - | ``` This becomes: ``` - $include: ident_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/link_res_schema.yml0000644000175200017520000000041013247251315033033 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "link", "type": "string", "jsonldPredicate": { "_type": "@id" } }] }] } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/ident_res_proc.yml0000644000175200017520000000056213247251315032714 0ustar mcrusoemcrusoe00000000000000{ "id": "http://example.com/base", "form": { "id": "http://example.com/base#one", "things": [ { "id": "http://example.com/base#one/two" }, { "id": "http://example.com/base#three" }, { "id": "http://example.com/four#five", }, { "id": "http://example.com/acid#six", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res_proc.yml0000644000175200017520000000036313247251315032702 0ustar mcrusoemcrusoe00000000000000 { "form": { "things": [ { "voc": "red", }, { "voc": "red", }, { "voc": "http://example.com/acid#blue", } ] } } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/vocab_res.yml0000644000175200017520000000142313247251315031655 0ustar mcrusoemcrusoe00000000000000- | ## Vocabulary resolution The schema may designate one or more vocabulary fields which use terms defined in the vocabulary. Processing must resolve vocabulary fields to either vocabulary terms or absolute URIs by first applying the link resolution rules defined above, then applying the following additional rule: * If a reference URI is a vocabulary field, and there is a vocabulary term which maps to the resolved URI, the reference must be replaced with the vocabulary term.
### Vocabulary resolution example Given the following schema: ``` - $include: vocab_res_schema.yml - | ``` Process the following example: ``` - $include: vocab_res_src.yml - | ``` This becomes: ``` - $include: vocab_res_proc.yml - | ``` cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/map_res_src.yml0000644000175200017520000000013113247251315032202 0ustar mcrusoemcrusoe00000000000000{ "mapped": { "shaggy": { "value": "scooby" }, "fred": "daphne" } }cwltool-1.0.20180302231433/cwltool/schemas/v1.0/salad/schema_salad/metaschema/field_name_proc.yml0000644000175200017520000000025313247251315033020 0ustar mcrusoemcrusoe00000000000000 { "base": "one", "form": { "base": "two", "http://example.com/three": "three", }, "http://example.com/acid#four": "four" } cwltool-1.0.20180302231433/cwltool/schemas/v1.0/Workflow.yml0000644000175200017520000005236613247251315023735 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" $graph: - name: "WorkflowDoc" type: documentation doc: - | # Common Workflow Language (CWL) Workflow Description, v1.0.1 This version: * https://w3id.org/cwl/v1.0/ Current version: * https://w3id.org/cwl/ - "\n\n" - {$include: contrib.md} - "\n\n" - | # Abstract One way to define a workflow is: an analysis task represented by a directed graph describing a sequence of operations that transform an input data set to output. This specification defines the Common Workflow Language (CWL) Workflow description, a vendor-neutral standard for representing workflows intended to be portable across a variety of computing platforms. - {$include: intro.md} - | ## Introduction to CWL Workflow standard v1.0.1 This specification represents the second stable release from the CWL group. Since v1.0, v1.0.1 introduces the following updates to the CWL Workflow standard. Documents should continue to use `cwlVersion: v1.0` and existing v1.0 documents remain valid; however, CWL documents that relied on previously undefined or underspecified behavior may have slightly different behavior in v1.0.1. * 12 March 2017: * Mark `default` as not required for link checking. * Add note that recursive subworkflows is not allowed. * Fix mistake in discussion of extracting field names from workflow step ids. * 23 July 2017: (v1.0.1) * Add clarification about scattering over empty arrays. * Clarify interpretation of `secondaryFiles` on inputs. * Expanded discussion of semantics of `File` and `Directory` types * Fixed typo "EMACScript" to "ECMAScript" * Clarified application of input parameter default values when the input is `null` or undefined. Since draft-3, v1.0 introduces the following changes and additions to the CWL Workflow standard: * The `inputs` and `outputs` fields have been renamed `in` and `out`. * Syntax simplifications: denoted by the `map<>` syntax. Example: `in` contains a list of items, each with an id. Now one can specify a mapping of that identifier to the corresponding `InputParameter`. ``` in: - id: one type: string doc: First input parameter - id: two type: int doc: Second input parameter ``` can be ``` in: one: type: string doc: First input parameter two: type: int doc: Second input parameter ``` * The common field `description` has been renamed to `doc`. ## Purpose The Common Workflow Language Workflow Description expresses workflows for data-intensive science, such as Bioinformatics, Chemistry, Physics, and Astronomy.
This specification is intended to define a data and execution model for Workflows that can be implemented on top of a variety of computing platforms, ranging from an individual workstation to cluster, grid, cloud, and high performance computing systems. - {$include: concepts.md} - name: ExpressionToolOutputParameter type: record extends: OutputParameter fields: - name: type type: - "null" - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string - type: array items: - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - type: record name: ExpressionTool extends: Process specialize: - specializeFrom: "#OutputParameter" specializeTo: "#ExpressionToolOutputParameter" documentRoot: true doc: | Execute an expression as a Workflow step. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: expression type: [string, Expression] doc: | The expression to execute. The expression must return a JSON object which matches the output parameters of the ExpressionTool. - name: LinkMergeMethod type: enum docParent: "#WorkflowStepInput" doc: The input link merge method, described in [WorkflowStepInput](#WorkflowStepInput). symbols: - merge_nested - merge_flattened - name: WorkflowOutputParameter type: record extends: OutputParameter docParent: "#Workflow" doc: | Describe an output parameter of a workflow. The parameter must be connected to one or more parameters defined in the workflow that will provide the value of the output parameter. fields: - name: outputSource doc: | Specifies one or more workflow parameters that supply the value of the output parameter. jsonldPredicate: "_id": "cwl:outputSource" "_type": "@id" refScope: 0 type: - string? - string[]? - name: linkMerge type: ["null", "#LinkMergeMethod"] jsonldPredicate: "cwl:linkMerge" doc: | The method to use to merge multiple sources into a single array. If not specified, the default method is "merge_nested". - name: type type: - "null" - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string - type: array items: - "#CWLType" - "#OutputRecordSchema" - "#OutputEnumSchema" - "#OutputArraySchema" - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: Sink type: record abstract: true fields: - name: source doc: | Specifies one or more workflow parameters that will provide input to the underlying step parameter. jsonldPredicate: "_id": "cwl:source" "_type": "@id" refScope: 2 type: - string? - string[]? - name: linkMerge type: LinkMergeMethod? jsonldPredicate: "cwl:linkMerge" doc: | The method to use to merge multiple inbound links into a single array. If not specified, the default method is "merge_nested". - type: record name: WorkflowStepInput extends: Sink docParent: "#WorkflowStep" doc: | The input of a workflow step connects an upstream parameter (from the workflow inputs, or the outputs of other workflow steps) with the input parameters of the underlying step. ## Input object A WorkflowStepInput object must contain an `id` field in the form `#fieldname` or `#prefix/fieldname`.
When the `id` field contains a slash `/`, the field name consists of the characters following the final slash (the prefix portion may contain one or more slashes to indicate scope). This defines a field of the workflow step input object with the value of the `source` parameter(s). ## Merging To merge multiple inbound data links, [MultipleInputFeatureRequirement](#MultipleInputFeatureRequirement) must be specified in the workflow or workflow step requirements. If the sink parameter is an array, or named in a [workflow scatter](#WorkflowStep) operation, there may be multiple inbound data links listed in the `source` field. The values from the input links are merged depending on the method specified in the `linkMerge` field. If not specified, the default method is "merge_nested". * **merge_nested** The input must be an array consisting of exactly one entry for each input link. If "merge_nested" is specified with a single link, the value from the link must be wrapped in a single-item list. * **merge_flattened** 1. The source and sink parameters must be compatible types, or the source type must be compatible with a single element from the "items" type of the destination array parameter. 2. Source parameters which are arrays are concatenated. Source parameters which are single element types are appended as single elements. fields: - name: id type: string jsonldPredicate: "@id" doc: "A unique identifier for this workflow input parameter." - name: default type: ["null", Any] doc: | The default value for this parameter to use if either there is no `source` field, or the value produced by the `source` is `null`. The default must be applied prior to scattering or evaluating `valueFrom`. jsonldPredicate: _id: "cwl:default" noLinkCheck: true - name: valueFrom type: - "null" - "string" - "#Expression" jsonldPredicate: "cwl:valueFrom" doc: | To use valueFrom, [StepInputExpressionRequirement](#StepInputExpressionRequirement) must be specified in the workflow or workflow step requirements. If `valueFrom` is a constant string value, use this as the value for this input parameter. If `valueFrom` is a parameter reference or expression, it must be evaluated to yield the actual value to be assigned to the input field. The `self` value in the parameter reference or expression must be the value of the parameter(s) specified in the `source` field, or null if there is no `source` field. The value of `inputs` in the parameter reference or expression must be the input object to the workflow step after assigning the `source` values, applying `default`, and then scattering. The order of evaluating `valueFrom` among step input parameters is undefined and the result of evaluating `valueFrom` on a parameter must not be visible to evaluation of `valueFrom` on other parameters. - type: record name: WorkflowStepOutput docParent: "#WorkflowStep" doc: | Associate an output parameter of the underlying process with a workflow parameter. The workflow parameter (given in the `id` field) may be used as a `source` to connect with input parameters of other workflow steps, or with an output parameter of the process. fields: - name: id type: string jsonldPredicate: "@id" doc: | A unique identifier for this workflow output parameter. This is the identifier to use in the `source` field of `WorkflowStepInput` to connect the output value to downstream parameters. - name: ScatterMethod type: enum docParent: "#WorkflowStep" doc: The scatter method, as described in [workflow step scatter](#WorkflowStep).
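  # Illustrative example (not part of the schema) for the WorkflowStepInput
  # merge methods described above, assuming steps `step_a` and `step_b` each
  # produce an array output named `out`:
  #
  #   in:
  #     messages:
  #       source: [step_a/out, step_b/out]
  #       linkMerge: merge_flattened
  #
  # With merge_flattened the two source arrays are concatenated into one
  # array; with merge_nested (the default) the result would instead be a
  # two-element array holding each source value unchanged.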
symbols: - dotproduct - nested_crossproduct - flat_crossproduct - name: WorkflowStep type: record docParent: "#Workflow" doc: | A workflow step is an executable element of a workflow. It specifies the underlying process implementation (such as `CommandLineTool` or another `Workflow`) in the `run` field and connects the input and output parameters of the underlying process to workflow parameters. # Scatter/gather To use scatter/gather, [ScatterFeatureRequirement](#ScatterFeatureRequirement) must be specified in the workflow or workflow step requirements. A "scatter" operation specifies that the associated workflow step or subworkflow should execute separately over a list of input elements. Each job making up a scatter operation is independent and may be executed concurrently. The `scatter` field specifies one or more input parameters which will be scattered. An input parameter may be listed more than once. The declared type of each input parameter implicitly becomes an array of items of the input parameter type. If a parameter is listed more than once, it becomes a nested array. As a result, upstream parameters which are connected to scattered parameters must be arrays. All output parameter types are also implicitly wrapped in arrays. Each job in the scatter results in an entry in the output array. If any scattered parameter is empty at runtime, all outputs are set to empty arrays and no work is done for the step. If `scatter` declares more than one input parameter, `scatterMethod` describes how to decompose the input into a discrete set of jobs. * **dotproduct** specifies that each of the input arrays is aligned and one element is taken from each array to construct each job. It is an error if all input arrays are not the same length. * **nested_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output must be nested arrays for each level of scattering, in the order that the input arrays are listed in the `scatter` field. * **flat_crossproduct** specifies the Cartesian product of the inputs, producing a job for every combination of the scattered inputs. The output arrays must be flattened to a single level, but otherwise listed in the order that the input arrays are listed in the `scatter` field. # Subworkflows To specify a nested workflow as part of a workflow step, [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) must be specified in the workflow or workflow step requirements. It is a fatal error if a workflow directly or indirectly invokes itself as a subworkflow (recursive workflows are not allowed). fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this workflow step." - name: in type: WorkflowStepInput[] jsonldPredicate: _id: "cwl:in" mapSubject: id mapPredicate: source doc: | Defines the input parameters of the workflow step. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. - name: out type: - type: array items: [string, WorkflowStepOutput] jsonldPredicate: _id: "cwl:out" _type: "@id" identity: true doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: ProcessRequirement[]?
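  # Illustrative sketch (not part of the schema) of the scatter fields
  # described above, assuming a hypothetical step `align` with inputs
  # `reads` and `reference`:
  #
  #   steps:
  #     align:
  #       run: align.cwl
  #       scatter: [reads, reference]
  #       scatterMethod: dotproduct
  #       in:
  #         reads: read_array
  #         reference: ref_array
  #       out: [alignment]
  #
  # With dotproduct and two input arrays of equal length N, N jobs run,
  # pairing element i of one array with element i of the other;
  # flat_crossproduct would instead run one job per combination and
  # flatten the output into a single array.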
jsonldPredicate: _id: "cwl:requirements" mapSubject: class doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this workflow step. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: Any[]? jsonldPredicate: _id: "cwl:hints" noLinkCheck: true mapSubject: class doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this workflow step. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: doc type: string? jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: run type: [string, Process] jsonldPredicate: "_id": "cwl:run" "_type": "@id" doc: | Specifies the process to run. - name: scatter type: - string? - string[]? jsonldPredicate: "_id": "cwl:scatter" "_type": "@id" "_container": "@list" refScope: 0 - name: scatterMethod doc: | Required if `scatter` is an array of more than one element. type: ScatterMethod? jsonldPredicate: "_id": "cwl:scatterMethod" "_type": "@vocab" - name: Workflow type: record extends: "#Process" documentRoot: true specialize: - specializeFrom: "#OutputParameter" specializeTo: "#WorkflowOutputParameter" doc: | A workflow describes a set of **steps** and the **dependencies** between those steps. When a step produces output that will be consumed by a second step, the first step is a dependency of the second step. When there is a dependency, the workflow engine must execute the preceding step and wait for it to successfully produce output before executing the dependent step. If two steps are defined in the workflow graph that are not directly or indirectly dependent, these steps are **independent**, and may execute in any order or execute concurrently. A workflow is complete when all steps have been executed. Dependencies between parameters are expressed using the `source` field on [workflow step input parameters](#WorkflowStepInput) and [workflow output parameters](#WorkflowOutputParameter). The `source` field expresses the dependency of one parameter on another such that when a value is associated with the parameter specified by `source`, that value is propagated to the destination parameter. When all data links inbound to a given step are fulfilled, the step is ready to execute. ## Workflow success and failure A completed step must result in one of `success`, `temporaryFailure` or `permanentFailure` states. An implementation may choose to retry a step execution which resulted in `temporaryFailure`. An implementation may choose to either continue running other steps of a workflow, or terminate immediately upon `permanentFailure`. * If any step of a workflow execution results in `permanentFailure`, then the workflow status is `permanentFailure`. * If one or more steps result in `temporaryFailure` and all other steps complete `success` or are not executed, then the workflow status is `temporaryFailure`. * If all workflow steps are executed and complete with `success`, then the workflow status is `success`.
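 As a minimal illustration of these dependency semantics (a sketch, not a normative example; `step1.cwl` and `step2.cwl` are assumed to exist), the following workflow makes `step2` dependent on `step1` through its `source` reference:

 ```
 cwlVersion: v1.0
 class: Workflow
 inputs:
   infile: File
 outputs:
   result:
     type: File
     outputSource: step2/out
 steps:
   step1:
     run: step1.cwl
     in:
       src: infile
     out: [converted]
   step2:
     run: step2.cwl
     in:
       src: step1/converted
     out: [out]
 ```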
# Extensions [ScatterFeatureRequirement](#ScatterFeatureRequirement) and [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) are available as standard [extensions](#Extensions_and_Metadata) to core workflow semantics. fields: - name: "class" jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: steps doc: | The individual steps that make up the workflow. Each step is executed when all of its input data links are fulfilled. An implementation may choose to execute the steps in a different order than listed and/or execute steps concurrently, provided that dependencies between steps are met. type: - type: array items: "#WorkflowStep" jsonldPredicate: mapSubject: id - type: record name: SubworkflowFeatureRequirement extends: ProcessRequirement doc: | Indicates that the workflow platform must support nested workflows in the `run` field of [WorkflowStep](#WorkflowStep). fields: - name: "class" type: "string" doc: "Always 'SubworkflowFeatureRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: ScatterFeatureRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support the `scatter` and `scatterMethod` fields of [WorkflowStep](#WorkflowStep). fields: - name: "class" type: "string" doc: "Always 'ScatterFeatureRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: MultipleInputFeatureRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support multiple inbound data links listed in the `source` field of [WorkflowStepInput](#WorkflowStepInput). fields: - name: "class" type: "string" doc: "Always 'MultipleInputFeatureRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: StepInputExpressionRequirement extends: ProcessRequirement doc: | Indicates that the workflow platform must support the `valueFrom` field of [WorkflowStepInput](#WorkflowStepInput). fields: - name: "class" type: "string" doc: "Always 'StepInputExpressionRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" cwltool-1.0.20180302231433/cwltool/schemas/v1.0/CommandLineTool-standalone.yml0000644000175200017520000000006513247251315027266 0ustar mcrusoemcrusoe00000000000000- $import: Process.yml - $import: CommandLineTool.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.0/intro.md0000644000175200017520000000156413247251315023053 0ustar mcrusoemcrusoe00000000000000# Status of this document This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language). The latest version of this document is available in the "v1.0" directory at https://github.com/common-workflow-language/common-workflow-language The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0. # Introduction The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows. The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility. 
cwltool-1.0.20180302231433/cwltool/schemas/v1.0/CommandLineTool.yml0000644000175200017520000007711713247251315025154 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" $graph: - name: CommandLineToolDoc type: documentation doc: - | # Common Workflow Language (CWL) Command Line Tool Description, v1.0.1 This version: * https://w3id.org/cwl/v1.0/ Current version: * https://w3id.org/cwl/ - "\n\n" - {$include: contrib.md} - "\n\n" - | # Abstract A Command Line Tool is a non-interactive executable program that reads some input, performs a computation, and terminates after producing some output. Command line programs are a flexible unit of code sharing and reuse, unfortunately the syntax and input/output semantics among command line programs is extremely heterogeneous. A common layer for describing the syntax and semantics of programs can reduce this incidental complexity by providing a consistent way to connect programs together. This specification defines the Common Workflow Language (CWL) Command Line Tool Description, a vendor-neutral standard for describing the syntax and input/output semantics of command line programs. - {$include: intro.md} - | ## Introduction to CWL Command Line Tool standard v1.0.1 This specification represents the second stable release from the CWL group. Since v1.0, v1.0.1 introduces the following updates to the CWL Command Line Tool standard. Documents should continue to use `cwlVersion: v1.0` and existing v1.0 documents remain valid, however CWL documents that relied on previously undefined or underspecified behavior may have slightly different behavior in v1.0.1. * 13 July 2016: Mark `baseCommand` as optional and update descriptive text. * 12 March 2017: * Mark `default` as not required for link checking. * Add note that files in InitialWorkDir must have path in output directory. * Add note that writable: true applies recursively. * 23 July 2017: (v1.0.1) * Add clarification about scattering over empty arrays. * Clarify interpretation of `secondaryFiles` on inputs. * Expanded discussion of semantics of `File` and `Directory` types * Fixed typo "EMACScript" to "ECMAScript" * Clarified application of input parameter default values when the input is `null` or undefined. * Clarified valid types and meaning of the format field on inputs versus outputs Since draft-3, v1.0 introduces the following changes and additions to the CWL Command Line Tool standard: * The [Directory](#Directory) type. * Syntax simplifcations: denoted by the `map<>` syntax. Example: inputs contains a list of items, each with an id. Now one can specify a mapping of that identifier to the corresponding `CommandInputParamater`. ``` inputs: - id: one type: string doc: First input parameter - id: two type: int doc: Second input parameter ``` can be ``` inputs: one: type: string doc: First input parameter two: type: int doc: Second input parameter ``` * [InitialWorkDirRequirement](#InitialWorkDirRequirement): list of files and subdirectories to be present in the output directory prior to execution. * Shortcuts for specifying the standard [output](#stdout) and/or [error](#stderr) streams as a (streamable) File output. * [SoftwareRequirement](#SoftwareRequirement) for describing software dependencies of a tool. * The common `description` field has been renamed to `doc`. ## Purpose Standalone programs are a flexible and interoperable form of code reuse. 
Unlike monolithic applications, applications and analysis workflows which are composed of multiple separate programs can be written in multiple languages and execute concurrently on multiple hosts. However, POSIX does not dictate computer-readable grammar or semantics for program input and output, resulting in extremely heterogeneous command line grammar and input/output semantics among programs. This is a particular problem in distributed computing (multi-node compute clusters) and virtualized environments (such as Docker containers) where it is often necessary to provision resources such as input files before executing the program. Often this gap is filled by hard coding program invocation and implicitly assuming requirements will be met, or abstracting program invocation with wrapper scripts or descriptor documents. Unfortunately, where these approaches are application or platform specific, they create a significant barrier to reproducibility and portability, as methods developed for one platform must be manually ported to be used on new platforms. Similarly, they create redundant work, as wrappers for popular tools must be rewritten for each application or platform in use. The Common Workflow Language Command Line Tool Description is designed to provide a common standard description of grammar and semantics for invoking programs used in data-intensive fields such as Bioinformatics, Chemistry, Physics, Astronomy, and Statistics. This specification defines a precise data and execution model for Command Line Tools that can be implemented on a variety of computing platforms, ranging from a single workstation to cluster, grid, cloud, and high performance computing platforms. - {$include: concepts.md} - {$include: invocation.md} - type: record name: EnvironmentDef doc: | Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input. fields: - name: envName type: string doc: The environment variable name - name: envValue type: [string, Expression] doc: The environment variable value - type: record name: CommandLineBinding extends: InputBinding doc: | When listed under `inputBinding` in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in `CommandLineTool.arguments`, the term "value" refers to the effective value after evaluating `valueFrom`. The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value. - **string**: Add `prefix` and the string to the command line. - **number**: Add `prefix` and decimal representation to the command line. - **boolean**: If true, add `prefix` to the command line. If false, add nothing. - **File**: Add `prefix` and the value of [`File.path`](#File) to the command line. - **array**: If `itemSeparator` is specified, add `prefix` and the array joined into a single string with `itemSeparator` separating the items. Otherwise first add `prefix`, then recursively process individual elements. - **object**: Add `prefix` only, and recursively add object fields for which `inputBinding` is specified. - **null**: Add nothing. fields: - name: position type: int? doc: "The sorting key. Default position is 0."
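      # Illustrative rendering (not part of the schema): given
      #
      #   inputs:
      #     threads:
      #       type: int
      #       inputBinding: {position: 1, prefix: -t, separate: false}
      #
      # an input object {"threads": 8} contributes the single argument `-t8`;
      # with the default `separate: true` it would contribute the two
      # arguments `-t 8`.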
- name: prefix type: string? doc: "Command line prefix to add before the value." - name: separate type: boolean? doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument. - name: itemSeparator type: string? doc: | Join the array elements into a single string with the elements separated by `itemSeparator`. - name: valueFrom type: - "null" - string - Expression jsonldPredicate: "cwl:valueFrom" doc: | If `valueFrom` is a constant string value, use this as the value and apply the binding rules above. If `valueFrom` is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the value of `self` in the expression will be the value of the input parameter. Input parameter defaults (as specified by the `InputParameter.default` field) must be applied before evaluating the expression. When a binding is part of the `CommandLineTool.arguments` field, the `valueFrom` field is required. - name: shellQuote type: boolean? doc: | If `ShellCommandRequirement` is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). Use `shellQuote: false` to inject metacharacters for operations such as pipes. - type: record name: CommandOutputBinding extends: OutputBinding doc: | Describes how to generate an output parameter based on the files produced by a CommandLineTool. The output parameter is generated by applying these operations in the following order: - glob - loadContents - outputEval fields: - name: glob type: - "null" - string - Expression - type: array items: string doc: | Find files relative to the output directory, using POSIX glob(3) pathname matching. If an array is provided, find files that match any pattern in the array. If an expression is provided, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Must only match and return files which actually exist. - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | For each file matched in `glob`, read up to the first 64 KiB of text from the file and place it in the `contents` field of the file object for manipulation by `outputEval`. - name: outputEval type: - "null" - string - Expression doc: | Evaluate an expression to generate the output value. If `glob` was specified, the value of `self` must be an array containing file objects that were matched. If no files were matched, `self` must be a zero length array; if a single file was matched, the value of `self` is an array of a single element. Additionally, if `loadContents` is `true`, the File objects must include up to the first 64 KiB of file contents in the `contents` field.
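  # Illustrative sketch (not part of the schema): an output parameter that
  # globs a hypothetical report file, loads its contents, and extracts a
  # value with outputEval:
  #
  #   outputs:
  #     line_count:
  #       type: int
  #       outputBinding:
  #         glob: report.txt
  #         loadContents: true
  #         outputEval: $(parseInt(self[0].contents))
  #
  # (An outputEval expression such as this one requires
  # InlineJavascriptRequirement.)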
- name: CommandInputRecordField type: record extends: InputRecordField specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandInputRecordSchema type: record extends: InputRecordSchema specialize: - specializeFrom: InputRecordField specializeTo: CommandInputRecordField - name: CommandInputEnumSchema type: record extends: InputEnumSchema specialize: - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandInputArraySchema type: record extends: InputArraySchema specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandOutputRecordField type: record extends: OutputRecordField specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - name: CommandOutputRecordSchema type: record extends: OutputRecordSchema specialize: - specializeFrom: OutputRecordField specializeTo: CommandOutputRecordField - name: CommandOutputEnumSchema type: record extends: OutputEnumSchema specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - name: CommandOutputArraySchema type: record extends: OutputArraySchema specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - type: record name: CommandInputParameter extends: InputParameter doc: An input parameter for a CommandLineTool. specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - type: record name: CommandOutputParameter extends: OutputParameter doc: An output parameter for a CommandLineTool. specialize: - specializeFrom: OutputBinding specializeTo: CommandOutputBinding fields: - name: type type: - "null" - CWLType - stdout - stderr - CommandOutputRecordSchema - CommandOutputEnumSchema - CommandOutputArraySchema - string - type: array items: - CWLType - CommandOutputRecordSchema - CommandOutputEnumSchema - CommandOutputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: stdout type: enum symbols: [ "cwl:stdout" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` output with no `outputBinding` set. 
The following ``` outputs: an_output_name: type: stdout stdout: a_stdout_file ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: a_stdout_file stdout: a_stdout_file ``` If there is no `stdout` name provided, a random filename will be created. For example, the following ``` outputs: an_output_name: type: stdout ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: random_stdout_filenameABCDEFG stdout: random_stdout_filenameABCDEFG ``` - name: stderr type: enum symbols: [ "cwl:stderr" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` output with no `outputBinding` set. The following ``` outputs: an_output_name: type: stderr stderr: a_stderr_file ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: a_stderr_file stderr: a_stderr_file ``` If there is no `stderr` name provided, a random filename will be created. For example, the following ``` outputs: an_output_name: type: stderr ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: random_stderr_filenameABCDEFG stderr: random_stderr_filenameABCDEFG ``` - type: record name: CommandLineTool extends: Process documentRoot: true specialize: - specializeFrom: InputParameter specializeTo: CommandInputParameter - specializeFrom: OutputParameter specializeTo: CommandOutputParameter doc: | This defines the schema of the CWL Command Line Tool Description document. fields: - name: class jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: baseCommand doc: | Specifies the program to execute. If an array, the first element of the array is the command to execute, and subsequent elements are mandatory command line arguments. The elements in `baseCommand` must appear before any command line bindings from `inputBinding` or `arguments`. If `baseCommand` is not provided or is an empty array, the first element of the command line produced after processing `inputBinding` or `arguments` must be used as the program to execute. If the program includes a path separator character it must be an absolute path, otherwise it is an error. If the program does not include a path separator, search the `$PATH` variable in the runtime environment of the workflow runner find the absolute path of the executable. type: - string? - string[]? jsonldPredicate: "_id": "cwl:baseCommand" "_container": "@list" - name: arguments doc: | Command line bindings which are not directly associated with input parameters. type: - "null" - type: array items: [string, Expression, CommandLineBinding] jsonldPredicate: "_id": "cwl:arguments" "_container": "@list" - name: stdin type: ["null", string, Expression] doc: | A path to a file whose contents must be piped into the command's standard input stream. - name: stderr type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stderr" doc: | Capture the command's standard error stream to a file written to the designated output directory. If `stderr` is a string, it specifies the file name to use. If `stderr` is an expression, the expression is evaluated and must return a string with the file name to use to capture stderr. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. 
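  # Illustrative sketch (not part of the schema): capturing both streams of
  # a tool to named files in the output directory:
  #
  #   stdout: captured-output.txt
  #   stderr: captured-error.txt
  #   outputs:
  #     out_file:
  #       type: stdout
  #     err_file:
  #       type: stderr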
- name: stdout type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stdout" doc: | Capture the command's standard output stream to a file written to the designated output directory. If `stdout` is a string, it specifies the file name to use. If `stdout` is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`) it is an error. - name: successCodes type: int[]? doc: | Exit codes that indicate the process completed successfully. - name: temporaryFailCodes type: int[]? doc: | Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results. - name: permanentFailCodes type: int[]? doc: Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. - type: record name: DockerRequirement extends: ProcessRequirement doc: | Indicates that a workflow component should be run in a [Docker](http://docker.com) container, and specifies how to fetch or build the image. If a CommandLineTool lists `DockerRequirement` under `hints` (or `requirements`), it may (or must) be run in the specified Docker container. The platform must first acquire or install the correct Docker image as specified by `dockerPull`, `dockerImport`, `dockerLoad` or `dockerFile`. The platform must execute the tool in the container using `docker run` with the appropriate Docker image and tool command line. The workflow platform may provide input files and the designated output directory through the use of volume bind mounts. The platform may rewrite file paths in the input object to correspond to the Docker bind mounted locations. When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container. ## Interaction with other requirements If [EnvVarRequirement](#EnvVarRequirement) is specified alongside a DockerRequirement, the environment variables must be provided to Docker using `--env` or `--env-file` and interact with the container's preexisting environment as defined by Docker. fields: - name: class type: string doc: "Always 'DockerRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: dockerPull type: string? doc: "Specify a Docker image to retrieve using `docker pull`." - name: dockerLoad type: string? doc: "Specify a HTTP URL from which to download a Docker image using `docker load`." - name: dockerFile type: string? doc: "Supply the contents of a Dockerfile which will be built using `docker build`." - name: dockerImport type: string? doc: "Provide a HTTP URL to download and gunzip a Docker image using `docker import`." - name: dockerImageId type: string? doc: | The image id that will be used for `docker run`. May be a human-readable image name or the image identifier hash. May be skipped if `dockerPull` is specified, in which case the `dockerPull` image id must be used. - name: dockerOutputDirectory type: string? doc: | Set the designated output directory to a specific location inside the Docker container.
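  # Illustrative sketch (not part of the schema): a typical DockerRequirement,
  # pulling a public image by tag:
  #
  #   requirements:
  #     - class: DockerRequirement
  #       dockerPull: debian:stretch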
- type: record name: SoftwareRequirement extends: ProcessRequirement doc: | A list of software packages that should be configured in the environment of the defined process. fields: - name: class type: string doc: "Always 'SoftwareRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: packages type: SoftwarePackage[] doc: "The list of software to be configured." jsonldPredicate: mapSubject: package mapPredicate: specs - name: SoftwarePackage type: record fields: - name: package type: string doc: "The common name of the software to be configured." - name: version type: string[]? doc: "The (optional) version of the software to configured." - name: specs type: string[]? doc: | Must be one or more IRIs identifying resources for installing or enabling the software. Implementations may provide resolvers which map well-known software spec IRIs to some configuration action. For example, an IRI `https://packages.debian.org/jessie/bowtie` could be resolved with `apt-get install bowtie`. An IRI `https://anaconda.org/bioconda/bowtie` could be resolved with `conda install -c bioconda bowtie`. Tools may also provide IRIs to index entries such as [RRID](http://www.identifiers.org/rrid/), such as `http://identifiers.org/rrid/RRID:SCR_005476` - name: Dirent type: record doc: | Define a file or subdirectory that must be placed in the designated output directory prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template. fields: - name: entryname type: ["null", string, Expression] jsonldPredicate: _id: cwl:entryname doc: | The name of the file or subdirectory to create in the output directory. If `entry` is a File or Directory, this overrides `basename`. Optional. - name: entry type: [string, Expression] jsonldPredicate: _id: cwl:entry doc: | If the value is a string literal or an expression which evaluates to a string, a new file must be created with the string as the file contents. If the value is an expression that evaluates to a `File` object, this indicates the referenced file should be added to the designated output directory prior to executing the tool. If the value is an expression that evaluates to a `Dirent` object, this indicates that the File or Directory in `entry` should be added to the designated output directory with the name in `entryname`. If `writable` is false, the file may be made available using a bind mount or file system link to avoid unnecessary copying of the input file. - name: writable type: boolean? doc: | If true, the file or directory must be writable by the tool. Changes to the file or directory must be isolated and not visible by any other CommandLineTool process. This may be implemented by making a copy of the original file or directory. Default false (files and directories read-only by default). A directory marked as `writable: true` implies that all files and subdirectories are recursively writable as well. - name: InitialWorkDirRequirement type: record extends: ProcessRequirement doc: Define a list of files and subdirectories that must be created by the workflow platform in the designated output directory prior to executing the command line tool. 
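  # Illustrative sketch (not part of the schema): using a Dirent under
  # InitialWorkDirRequirement to stage a literal configuration file into the
  # output directory before the tool runs:
  #
  #   requirements:
  #     - class: InitialWorkDirRequirement
  #       listing:
  #         - entryname: settings.conf
  #           entry: |
  #             verbose=true
  #           writable: false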
fields: - name: class type: string doc: "Always 'InitialWorkDirRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: listing type: - type: array items: [File, Directory, Dirent, string, Expression] - string - Expression jsonldPredicate: _id: "cwl:listing" doc: | The list of files or subdirectories that must be placed in the designated output directory prior to executing the command line tool. May be an expression. If so, the expression return value must validate as `{type: array, items: [File, Directory]}`. Files or Directories which are listed in the input parameters and appear in the `InitialWorkDirRequirement` listing must have their `path` set to their staged location in the designated output directory. If the same File or Directory appears more than once in the `InitialWorkDirRequirement` listing, the implementation must choose exactly one value for `path`; how this value is chosen is undefined. - name: EnvVarRequirement type: record extends: ProcessRequirement doc: | Define a list of environment variables which will be set in the execution environment of the tool. See `EnvironmentDef` for details. fields: - name: class type: string doc: "Always 'EnvVarRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: envDef type: EnvironmentDef[] doc: The list of environment variables. jsonldPredicate: mapSubject: envName mapPredicate: envValue - type: record name: ShellCommandRequirement extends: ProcessRequirement doc: | Modify the behavior of CommandLineTool to generate a single string containing a shell command line. Each item in the argument list must be joined into a string separated by single spaces and quoted to prevent interpretation by the shell, unless `CommandLineBinding` for that argument contains `shellQuote: false`. If `shellQuote: false` is specified, the argument is joined into the command string without quoting, which allows the use of shell metacharacters such as `|` for pipes. fields: - name: class type: string doc: "Always 'ShellCommandRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: ResourceRequirement extends: ProcessRequirement doc: | Specify basic hardware resource requirements. "min" is the minimum amount of a resource that must be reserved to schedule a job. If "min" cannot be satisfied, the job should not be run. "max" is the maximum amount of a resource that the job shall be permitted to use. If a node has sufficient resources, multiple jobs may be scheduled on a single node provided each job's "max" resource requirements are met. If a job attempts to exceed its "max" resource allocation, an implementation may deny additional resources, which may result in job failure. If "min" is specified but "max" is not, then "max" == "min". If "max" is specified but "min" is not, then "min" == "max". It is an error if max < min. It is an error if the value of any of these fields is negative. If neither "min" nor "max" is specified for a resource, an implementation may provide a default.
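  # Illustrative sketch (not part of the schema): requesting at least two
  # cores and 4 GiB of RAM (RAM values are in mebibytes):
  #
  #   requirements:
  #     - class: ResourceRequirement
  #       coresMin: 2
  #       ramMin: 4096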
fields: - name: class type: string doc: "Always 'ResourceRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: coresMin type: ["null", long, string, Expression] doc: Minimum reserved number of CPU cores - name: coresMax type: ["null", int, string, Expression] doc: Maximum reserved number of CPU cores - name: ramMin type: ["null", long, string, Expression] doc: Minimum reserved RAM in mebibytes (2**20) - name: ramMax type: ["null", long, string, Expression] doc: Maximum reserved RAM in mebibytes (2**20) - name: tmpdirMin type: ["null", long, string, Expression] doc: Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: tmpdirMax type: ["null", long, string, Expression] doc: Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: outdirMin type: ["null", long, string, Expression] doc: Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) - name: outdirMax type: ["null", long, string, Expression] doc: Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/0000755000175200017520000000000013247251336022367 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/UserGuide.yml0000644000175200017520000006742413247251315025020 0ustar mcrusoemcrusoe00000000000000- name: userguide type: documentation doc: - $include: userguide-intro.md - | # Wrapping Command Line Tools - | ## First example The simplest "hello world" program. This accepts one input parameter, writes a message to the terminal or job log, and produces no permanent output. CWL documents are written in [JSON](http://json.org) or [YAML](http://yaml.org), or a mix of the two. *1st-tool.cwl* ``` - $include: examples/1st-tool.cwl - | ``` Use a YAML object in a separate file to describe the input of a run: *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner 1st-tool.cwl echo-job.yml [job 140199012414352] $ echo 'Hello world!' Hello world! Final process status is success ``` What's going on here? Let's break it down: ``` cwlVersion: v1.1.0-dev1 class: CommandLineTool ``` The `cwlVersion` field indicates the version of the CWL spec used by the document. The `class` field indicates this document describes a command line tool. ``` baseCommand: echo ``` The `baseCommand` provides the name of program that will actually run (echo) ``` inputs: message: type: string inputBinding: position: 1 ``` The `inputs` section describes the inputs of the tool. This is a list of input parameters and each parameter includes an identifier, a data type, and optionally an `inputBinding` which describes how this input parameter should appear on the command line. In this example, the `position` field indicates where it should appear on the command line. ``` outputs: [] ``` This tool has no formal output, so the `outputs` section is an empty list. - | ## Essential input parameters The `inputs` of a tool is a list of input parameters that control how to run the tool. Each parameter has an `id` for the name of parameter, and `type` describing what types of values are valid for that parameter. 
Available primitive types are *string*, *int*, *long*, *float*, *double*, and *null*; complex types are *array* and *record*; in addition there are special types *File*, *Directory* and *Any*. The following example demonstrates some input parameters with different types and appearing on the command line in different ways: *inp.cwl* ``` - $include: examples/inp.cwl - | ``` *inp-job.yml* ``` - $include: examples/inp-job.yml - | ``` Notice that "example_file", as a `File` type, must be provided as an object with the fields `class: File` and `path`. Next, create a whale.txt and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ touch whale.txt $ cwl-runner inp.cwl inp-job.yml [job 140020149614160] /home/example$ echo -f -i42 --example-string hello --file=/home/example/whale.txt -f -i42 --example-string hello --file=/home/example/whale.txt Final process status is success ``` The field `inputBinding` is optional and indicates whether and how the input parameter should appear on the tool's command line. If `inputBinding` is missing, the parameter does not appear on the command line. Let's look at each example in detail. ``` example_flag: type: boolean inputBinding: position: 1 prefix: -f ``` Boolean types are treated as a flag. If the input parameter "example_flag" is "true", then `prefix` will be added to the command line. If false, no flag is added. ``` example_string: type: string inputBinding: position: 3 prefix: --example-string ``` String types appear on the command line as literal values. The `prefix` is optional; if provided, it appears as a separate argument on the command line before the parameter. In the example above, this is rendered as `--example-string hello`. ``` example_int: type: int inputBinding: position: 2 prefix: -i separate: false ``` Integer (and floating point) types appear on the command line with decimal text representation. When the option `separate` is false (the default value is true), the prefix and value are combined into a single argument. In the example above, this is rendered as `-i42`. ``` example_file: type: File? inputBinding: prefix: --file= separate: false position: 4 ``` File types appear on the command line as the path to the file. When the parameter type ends with a question mark `?` it indicates that the parameter is optional. In the example above, this is rendered as `--file=/home/example/whale.txt`. However, if the "example_file" parameter were not provided in the input, nothing would appear on the command line. Input files are read-only. If you wish to update an input file, you must first copy it to the output directory. The value of `position` is used to determine where the parameter should appear on the command line. Positions are relative to one another, not absolute. As a result, positions do not have to be sequential; three parameters with positions `[1, 3, 5]` will result in the same command line as `[1, 2, 3]`. More than one parameter can have the same position (ties are broken using the parameter name), and the position field itself is optional. The default position is 0. The `baseCommand` field always comes before parameters. - | ## Returning output files The `outputs` of a tool is a list of output parameters that should be returned after running the tool. Each parameter has an `id` for the name of parameter, and `type` describing what types of values are valid for that parameter. When a tool runs under CWL, the starting working directory is the designated output directory.
The underlying tool or script must record its results in the form of files created in the output directory. The output parameters returned by the CWL tool are either the output files themselves, or come from examining the content of those files. *tar.cwl* ``` - $include: examples/tar.cwl - | ``` *tar-job.yml* ``` - $include: examples/tar-job.yml - | ``` Next, create a tar file for the example and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ touch hello.txt && tar -cvf hello.tar hello.txt $ cwl-runner tar.cwl tar-job.yml [job 139868145165200] $ tar xf /home/example/hello.tar Final process status is success { "example_out": { "location": "hello.txt", "size": 13, "class": "File", "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b" } } ``` The field `outputBinding` describes how to set the value of each output parameter. ``` outputs: example_out: type: File outputBinding: glob: hello.txt ``` The `glob` field consists of the name of a file in the output directory. If you don't know the name of the file in advance, you can use a wildcard pattern. - | ## Capturing a tool's standard output stream To capture a tool's standard output stream, add the `stdout` field with the name of the file where the output stream should go. Then add `type: stdout` on the corresponding output parameter. *stdout.cwl* ``` - $include: examples/stdout.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner stdout.cwl echo-job.yml [job 140199012414352] $ echo 'Hello world!' > output.txt Final process status is success { "output": { "location": "output.txt", "size": 13, "class": "File", "checksum": "sha1$47a013e660d408619d894b20806b1d5086aab03b" } } $ cat output.txt Hello world! ``` - | ## Parameter references In a previous example, we extracted a file using the "tar" program. However, that example was very limited because it assumed that the file we were interested in was called "hello.txt". In this example, you will see how to reference the value of input parameters dynamically from other fields. *tar-param.cwl* ``` - $include: examples/tar-param.cwl - | ``` *tar-param-job.yml* ``` - $include: examples/tar-param-job.yml - | ``` Create your input files and invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ rm hello.tar || true && touch goodbye.txt && tar -cvf hello.tar goodbye.txt $ cwl-runner tar-param.cwl tar-param-job.yml [job 139868145165200] $ tar xf /home/example/hello.tar goodbye.txt Final process status is success { "example_out": { "location": "goodbye.txt", "size": 24, "class": "File", "checksum": "sha1$dd0a4c4c49ba43004d6611771972b6cf969c1c01" } } ``` Certain fields permit parameter references which are enclosed in `$(...)`. These are evaluated and replaced with the value being referenced. ``` outputs: example_out: type: File outputBinding: glob: $(inputs.extractfile) ``` References are written using a subset of Javascript syntax. In this example, `$(inputs.extractfile)`, `$(inputs["extractfile"])`, and `$(inputs['extractfile'])` are equivalent. The value of the "inputs" variable is the input object provided when the CWL tool was invoked. Note that because File parameters are objects, to get the path to an input file you must reference the path field on a file object; to reference the path to the tar file in the above example you would write `$(inputs.tarfile.path)`.
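 Parameter references may also be embedded inside a longer string; the reference is replaced by its value and the surrounding characters are kept. For example (an illustrative sketch, not one of the included example files), this glob matches a log file named after the extracted file:

 ```
 outputs:
   example_log:
     type: File
     outputBinding:
       glob: $(inputs.extractfile).log
 ```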
- | ## Running tools inside Docker [Docker](http://docker.io) containers simplify software installation by providing a complete known-good runtime for software and its dependencies. However, containers are also purposefully isolated from the host system, so in order to run a tool inside a Docker container there is additional work to ensure that input files are available inside the container and output files can be recovered from the container. CWL can perform this work automatically, allowing you to use Docker to simplify your software management while avoiding the complexity of invoking and managing Docker containers. This example runs a simple Node.js script inside a Docker container. *docker.cwl* ``` - $include: examples/docker.cwl - | ``` *docker-job.yml* ``` - $include: examples/docker-job.yml - | ``` Provide a hello.js and invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ echo "console.log(\"Hello World\");" > hello.js $ cwl-runner docker.cwl docker-job.yml [job 140259721854416] /home/example$ docker run -i --volume=/home/example/hello.js:/var/lib/cwl/job369354770_examples/hello.js:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpDLs5hm:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp node:slim node /var/lib/cwl/job369354770_examples/hello.js Hello world! Final process status is success ``` Notice the CWL runner has constructed a Docker command line to run the script. One of the responsibilities of the CWL runner is to rewrite the paths of input files to reflect the location where they appear inside the container. In this example, the path to the script `hello.js` is `/home/example/hello.js` outside the container but `/var/lib/cwl/job369354770_examples/hello.js` inside the container, as reflected in the invocation of the `node` command. - | ## Additional command line arguments and runtime parameters Sometimes tools require additional command line options that don't correspond exactly to input parameters. In this example, we will wrap the Java compiler to compile a java source file to a class file. By default, `javac` will create the class files in the same directory as the source file. However, CWL input files (and the directories in which they appear) may be read-only, so we need to instruct javac to write the class file to the designated output directory instead.
*arguments.cwl* ``` - $include: examples/arguments.cwl - | ``` *arguments-job.yml* ``` - $include: examples/arguments-job.yml - | ``` Now create a sample Java file and invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ echo "public class Hello {}" > Hello.java $ cwl-runner arguments.cwl arguments-job.yml [job arguments.cwl] /tmp/tmpwYALo1$ docker \ run \ -i \ --volume=/home/peter/work/common-workflow-language/v1.1.0-dev1/examples/Hello.java:/var/lib/cwl/stg8939ac04-7443-4990-a518-1855b2322141/Hello.java:ro \ --volume=/tmp/tmpwYALo1:/var/spool/cwl:rw \ --volume=/tmp/tmpptIAJ8:/tmp:rw \ --workdir=/var/spool/cwl \ --read-only=true \ --user=1001 \ --rm \ --env=TMPDIR=/tmp \ --env=HOME=/var/spool/cwl \ java:7 \ javac \ -d \ /var/spool/cwl \ /var/lib/cwl/stg8939ac04-7443-4990-a518-1855b2322141/Hello.java Final process status is success { "classfile": { "size": 416, "location": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File" } } ``` Here we use the `arguments` field to add an additional argument to the command line that isn't tied to a specific input parameter. ``` arguments: ["-d", $(runtime.outdir)] ``` This example references a runtime parameter. Runtime parameters provide information about the hardware or software environment when the tool is actually executed. The `$(runtime.outdir)` parameter is the path to the designated output directory. Other parameters include `$(runtime.tmpdir)`, `$(runtime.ram)`, `$(runtime.cores)`, `$(runtime.outdirSize)`, and `$(runtime.tmpdirSize)`. See the [Runtime Environment](CommandLineTool.html#Runtime_environment) section of the CWL specification for details. - | ## Array inputs It is easy to add arrays of input parameters to be represented on the command line. To specify an array parameter, the array definition is nested under the `type` field with `type: array` and `items` defining the valid data types that may appear in the array. *array-inputs.cwl* ``` - $include: examples/array-inputs.cwl - | ``` *array-inputs-job.yml* ``` - $include: examples/array-inputs-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner array-inputs.cwl array-inputs-job.yml [job 140334923640912] /home/example$ echo -A one two three -B=four -B=five -B=six -C=seven,eight,nine -A one two three -B=four -B=five -B=six -C=seven,eight,nine Final process status is success {} ``` The `inputBinding` can appear either on the outer array parameter definition or the inner array element definition, and these produce different behavior when constructing the command line, as shown above. In addition, the `itemSeparator` field, if provided, specifies that array values should be concatenated into a single argument separated by the item separator string. You can specify arrays of arrays, arrays of records, and other complex types.
*array-outputs.cwl* ``` - $include: examples/array-outputs.cwl - | ``` *array-outputs-job.yml* ``` - $include: examples/array-outputs-job.yml - | ``` Now invoke `cwl-runner` providing the tool wrapper and the input object on the command line: ``` $ cwl-runner array-outputs.cwl array-outputs-job.yml [job 140190876078160] /home/example$ touch foo.txt bar.dat baz.txt Final process status is success { "output": [ { "size": 0, "location": "/home/peter/work/common-workflow-language/draft-3/examples/foo.txt", "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", "class": "File" }, { "size": 0, "location": "/home/peter/work/common-workflow-language/draft-3/examples/baz.txt", "checksum": "sha1$da39a3ee5e6b4b0d3255bfef95601890afd80709", "class": "File" } ] } ``` - | ## Record inputs, dependent and mutually exclusive parameters Sometimes an underlying tool has several arguments that must be provided together (they are dependent) or several arguments that cannot be provided together (they are exclusive). You can use records and type unions to group parameters together to describe these two conditions. *record.cwl* ``` - $include: examples/record.cwl - | ``` *record-job1.yml* ``` - $include: examples/record-job1.yml - | ``` ``` $ cwl-runner record.cwl record-job1.yml Workflow error: Error validating input record, could not validate field `dependent_parameters` because missing required field `itemB` ``` In the first example, you can't provide `itemA` without also providing `itemB`. *record-job2.yml* ``` - $include: examples/record-job2.yml - | ``` ``` $ cwl-runner record.cwl record-job2.yml [job 140566927111376] /home/example$ echo -A one -B two -C three -A one -B two -C three Final process status is success {} ``` In the second example, `itemC` and `itemD` are exclusive, so only `itemC` is added to the command line and `itemD` is ignored. *record-job3.yml* ``` - $include: examples/record-job3.yml - | ``` ``` $ cwl-runner record.cwl record-job3.yml [job 140606932172880] /home/example$ echo -A one -B two -D four -A one -B two -D four Final process status is success {} ``` In the third example, only `itemD` is provided, so it appears on the command line. - | ## Environment variables Tools run in a restricted environment and do not inherit most environment variables from the parent process. You can set environment variables for the tool using `EnvVarRequirement`. *env.cwl* ``` - $include: examples/env.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner env.cwl echo-job.yml [job 140710387785808] /home/example$ env PATH=/bin:/usr/bin:/usr/local/bin HELLO=Hello world! TMPDIR=/tmp/tmp63Obpk Final process status is success {} ``` - | ## Javascript expressions If you need to manipulate input parameters, include the requirement `InlineJavascriptRequirement` and then anywhere a parameter reference is legal you can provide a fragment of Javascript that will be evaluated by the CWL runner. *expression.cwl* ``` - $include: examples/expression.cwl - | ``` As this tool does not require any `inputs` we can run it with an (almost) empty job file: *empty.yml* ``` {} | ``` We can then run `expression.cwl`: ``` $ cwl-runner expression.cwl empty.yml [job 140000594593168] /home/example$ echo -A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1 -A 2 -B baz -C 10 9 8 7 6 5 4 3 2 1 Final process status is success {} ``` You can only use expressions in certain fields. 
These are: `filename`, `fileContent`, `envValue`, `valueFrom`, `glob`, `outputEval`, `stdin`, `stdout`, `coresMin`, `coresMax`, `ramMin`, `ramMax`, `tmpdirMin`, `tmpdirMax`, `outdirMin`, and `outdirMax`. - | ## Creating files at runtime Sometimes you need to create a file on the fly from input parameters, such as a tool which expects to read its input configuration from a file rather than from command line parameters. To do this, use `InitialWorkDirRequirement`. *createfile.cwl* ``` - $include: examples/createfile.cwl - | ``` *echo-job.yml* ``` - $include: examples/echo-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwltool createfile.cwl echo-job.yml [job 140528604979344] /home/example$ cat example.conf CONFIGVAR=Hello world! Final process status is success {} ``` - | ## Staging input files in the output directory Normally, input files are located in a read-only directory separate from the output directory. This causes problems if the underlying tool expects to write its output files alongside the input file in the same directory. You can use `InitialWorkDirRequirement` to stage input files into the output directory. In this example, we use a Javascript expression to extract the base name of the input file from its leading directory path. *linkfile.cwl* ``` - $include: examples/linkfile.cwl - | ``` *arguments-job.yml* ``` - $include: examples/arguments-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ cwl-runner linkfile.cwl arguments-job.yml [job 139928309171664] /home/example$ docker run -i --volume=/home/example/Hello.java:/var/lib/cwl/job557617295_examples/Hello.java:ro --volume=/home/example:/var/spool/cwl:rw --volume=/tmp/tmpmNbApw:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac Hello.java Final process status is success { "classfile": { "size": 416, "location": "/home/example/Hello.class", "checksum": "sha1$2f7ac33c1f3aac3f1fec7b936b6562422c85b38a", "class": "File" } } ``` - | # Writing Workflows ## First workflow This workflow extracts a Java source file from a tar file and then compiles it. *1st-workflow.cwl* ``` - $include: examples/1st-workflow.cwl - | ``` Use a JSON object in a separate file to describe the input of a run: *1st-workflow-job.yml* ``` - $include: examples/1st-workflow-job.yml - | ``` Now invoke `cwl-runner` with the tool wrapper and the input object on the command line: ``` $ echo "public class Hello {}" > Hello.java && tar -cvf hello.tar Hello.java $ cwl-runner 1st-workflow.cwl 1st-workflow-job.yml [job untar] /tmp/tmp94qFiM$ tar xf /home/example/hello.tar Hello.java [step untar] completion status is success [job compile] /tmp/tmpu1iaKL$ docker run -i --volume=/tmp/tmp94qFiM/Hello.java:/var/lib/cwl/job301600808_tmp94qFiM/Hello.java:ro --volume=/tmp/tmpu1iaKL:/var/spool/cwl:rw --volume=/tmp/tmpfZnNdR:/tmp:rw --workdir=/var/spool/cwl --read-only=true --net=none --user=1001 --rm --env=TMPDIR=/tmp java:7 javac -d /var/spool/cwl /var/lib/cwl/job301600808_tmp94qFiM/Hello.java [step compile] completion status is success [workflow 1st-workflow.cwl] outdir is /home/example Final process status is success { "classout": { "location": "/home/example/Hello.class", "checksum": "sha1$e68df795c0686e9aa1a1195536bd900f5f417b18", "class": "File", "size": 416 } } ``` What's going on here?
Let's break it down: ``` cwlVersion: v1.1.0-dev1 class: Workflow ``` The `cwlVersion` field indicates the version of the CWL spec used by the document. The `class` field indicates this document describes a workflow. ``` inputs: inp: File ex: string ``` The `inputs` section describes the inputs of the workflow. This is a list of input parameters where each parameter consists of an identifier and a data type. These parameters can be used as sources for input to specific workflow steps. ``` outputs: classout: type: File outputSource: compile/classfile ``` The `outputs` section describes the outputs of the workflow. This is a list of output parameters where each parameter consists of an identifier and a data type. The `outputSource` connects the output parameter `classfile` of the `compile` step to the workflow output parameter `classout`. ``` steps: untar: run: tar-param.cwl in: tarfile: inp extractfile: ex outputs: [example_out] ``` The `steps` section describes the actual steps of the workflow. In this example, the first step extracts a file from a tar file, and the second step compiles the file from the first step using the Java compiler. Workflow steps are not necessarily run in the order they are listed; instead, the order is determined by the dependencies between steps (using `source`). In addition, workflow steps which do not depend on one another may run in parallel. The first step, `untar`, runs `tar-param.cwl` (described previously in [Parameter references](#Parameter_references)). This tool has two input parameters, `tarfile` and `extractfile`, and one output parameter, `example_out`. The `in` section of the workflow step connects these two input parameters to the inputs of the workflow, `inp` and `ex`, using `source`. This means that when the workflow step is executed, the values assigned to `inp` and `ex` will be used for the parameters `tarfile` and `extractfile` in order to run the tool. The `outputs` section of the workflow step lists the output parameters that are expected from the tool. ``` compile: run: arguments.cwl in: src: untar/example_out outputs: [classfile] ``` The second step, `compile`, depends on the results from the first step by connecting the input parameter `src` to the output parameter of `untar` using `untar/example_out`. The output of this step, `classfile`, is connected to the `outputs` section for the Workflow, described above. cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/userguide-intro.md0000644000175200017520000000240013247251316026030 0ustar mcrusoemcrusoe00000000000000# A Gentle Introduction to the Common Workflow Language Hello! This guide will introduce you to writing tool wrappers and workflows using the Common Workflow Language (CWL). This guide describes the current development specification, version 1.1.0-dev1. Note: This document is a work in progress. Not all features are covered yet. # Introduction CWL is a way to describe command line tools and connect them together to create workflows. Because CWL is a specification and not a specific piece of software, tools and workflows described using CWL are portable across a variety of platforms that support the CWL standard. CWL has roots in "make" and many similar tools that determine order of execution based on dependencies between tasks. However, unlike "make", CWL tasks are isolated and you must be explicit about your inputs and outputs.
The benefits of explicitness and isolation are flexibility, portability, and scalability: tools and workflows described with CWL can transparently leverage technologies such as Docker, be used with CWL implementations from different vendors, and are well suited for describing large-scale workflows in cluster, cloud and high performance computing environments where tasks are scheduled in parallel across many nodes. cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/Process.yml0000644000175200017520000006246113247251315024536 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" $graph: - name: "Common Workflow Language, v1.1.0-dev1" type: documentation doc: {$include: concepts.md} - $import: "salad/schema_salad/metaschema/metaschema_base.yml" - name: BaseTypesDoc type: documentation doc: | ## Base types docChild: - "#CWLType" - "#Process" - type: enum name: CWLVersion doc: "Version symbols for published CWL document versions." symbols: - cwl:draft-2 - cwl:draft-3.dev1 - cwl:draft-3.dev2 - cwl:draft-3.dev3 - cwl:draft-3.dev4 - cwl:draft-3.dev5 - cwl:draft-3 - cwl:draft-4.dev1 - cwl:draft-4.dev2 - cwl:draft-4.dev3 - cwl:v1.0.dev4 - cwl:v1.0 - cwl:v1.1.0-dev1 # a dash is required by the semver 2.0 rules - name: CWLType type: enum extends: "sld:PrimitiveType" symbols: - cwl:File - cwl:Directory doc: - "Extends primitive types with the concept of a file and directory as a builtin type." - "File: A File object" - "Directory: A Directory object" - name: File type: record docParent: "#CWLType" doc: | Represents a file (or group of files if `secondaryFiles` is specified) that must be accessible by tools using standard POSIX file system call API such as open(2) and read(2). fields: - name: class type: type: enum name: File_class symbols: - cwl:File jsonldPredicate: _id: "@type" _type: "@vocab" doc: Must be `File` to indicate this object describes a file. - name: location type: string? doc: | An IRI that identifies the file resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource; the implementation must use the IRI to retrieve file content. If an implementation is unable to retrieve the file content stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error. If the `location` field is not provided, the `contents` field must be provided. The implementation must assign a unique identifier for the `location` field. If the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, then follow the rules above. jsonldPredicate: _id: "@id" _type: "@id" - name: path type: string? doc: | The local host path where the File is available when a CommandLineTool is executed. This field must be set by the implementation. The final path component must match the value of `basename`. This field must not be used in any other context. The command line tool being executed must be able to access the file at `path` using the POSIX `open(2)` syscall. As a special case, if the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, and remove the `path` field.
If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, `<space>`, `<tab>`, and `<newline>`) or characters [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) then implementations may terminate the process with a `permanentFailure`. jsonldPredicate: "_id": "cwl:path" "_type": "@id" - name: basename type: string? doc: | The base name of the file, that is, the name of the file without any leading directory path. The base name must not contain a slash `/`. If not provided, the implementation must set this field based on the `location` field by taking the final path component after parsing `location` as an IRI. If `basename` is provided, it is not required to match the value from `location`. When this file is made available to a CommandLineTool, it must be named with `basename`, i.e. the final component of the `path` field must match `basename`. jsonldPredicate: "cwl:basename" - name: dirname type: string? doc: | The name of the directory containing the file, that is, the path leading up to the final slash in the path such that `dirname + '/' + basename == path`. The implementation must set this field based on the value of `path` prior to evaluating parameter references or expressions in a CommandLineTool document. This field must not be used in any other context. - name: nameroot type: string? doc: | The basename root such that `nameroot + nameext == basename`, and `nameext` is empty or begins with a period and contains at most one period. For the purposes of path splitting, leading periods on the basename are ignored; a basename of `.cshrc` will have a nameroot of `.cshrc`. The implementation must set this field automatically based on the value of `basename` prior to evaluating parameter references or expressions. - name: nameext type: string? doc: | The basename extension such that `nameroot + nameext == basename`, and `nameext` is empty or begins with a period and contains at most one period. Leading periods on the basename are ignored; a basename of `.cshrc` will have an empty `nameext`. The implementation must set this field automatically based on the value of `basename` prior to evaluating parameter references or expressions. - name: checksum type: string? doc: | Optional hash code for validating file integrity. Currently must be in the form "sha1$ + hexadecimal string" using the SHA-1 algorithm. - name: size type: long? doc: Optional file size - name: "secondaryFiles" type: - "null" - type: array items: [File, Directory] jsonldPredicate: "cwl:secondaryFiles" doc: | A list of additional files that are associated with the primary file and must be transferred alongside the primary file. Examples include indexes of the primary file, or external references which must be included when loading the primary document. A file object listed in `secondaryFiles` may itself include `secondaryFiles` for which the same rules apply. - name: format type: string? jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | The format of the file: this must be an IRI of a concept node that represents the file format, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match.
Reasoning about format compatibility must be done by checking that an input file format is the same, `owl:equivalentClass` or `rdfs:subClassOf` the format required by the input parameter. `owl:equivalentClass` is transitive with `rdfs:subClassOf`, e.g. if `<B> owl:equivalentClass <C>` and `<B> owl:subclassOf
<A>` then infer `<C> owl:subclassOf <A>`. File format ontologies may be provided in the "$schema" metadata at the root of the document. If no ontologies are specified in `$schema`, the runtime may perform exact file format matches. - name: contents type: string? doc: | File contents literal. Maximum of 64 KiB. If neither `location` nor `path` is provided, `contents` must be non-null. The implementation must assign a unique identifier for the `location` field. When the file is staged as input to CommandLineTool, the value of `contents` must be written to a file. If `loadContents` of `inputBinding` or `outputBinding` is true and `location` is valid, the implementation must read up to the first 64 KiB of text from the file and place it in the "contents" field. - name: Directory type: record docAfter: "#File" doc: | Represents a directory to present to a command line tool. fields: - name: class type: type: enum name: Directory_class symbols: - cwl:Directory jsonldPredicate: _id: "@type" _type: "@vocab" doc: Must be `Directory` to indicate this object describes a Directory. - name: location type: string? doc: | An IRI that identifies the directory resource. This may be a relative reference, in which case it must be resolved using the base IRI of the document. The location may refer to a local or remote resource. If the `listing` field is not set, the implementation must use the location IRI to retrieve the directory listing. If an implementation is unable to retrieve the directory listing stored at a remote resource (due to unsupported protocol, access denied, or other issue) it must signal an error. If the `location` field is not provided, the `listing` field must be provided. The implementation must assign a unique identifier for the `location` field. If the `path` field is provided but the `location` field is not, an implementation may assign the value of the `path` field to `location`, then follow the rules above. jsonldPredicate: _id: "@id" _type: "@id" - name: path type: string? doc: | The local path where the Directory is made available prior to executing a CommandLineTool. This must be set by the implementation. This field must not be used in any other context. The command line tool being executed must be able to access the directory at `path` using the POSIX `opendir(2)` syscall. If the `path` contains [POSIX shell metacharacters](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_02) (`|`,`&`, `;`, `<`, `>`, `(`,`)`, `$`,`` ` ``, `\`, `"`, `'`, `<space>`, `<tab>`, and `<newline>`) or characters [not allowed](http://www.iana.org/assignments/idna-tables-6.3.0/idna-tables-6.3.0.xhtml) for [Internationalized Domain Names for Applications](https://tools.ietf.org/html/rfc6452) then implementations may terminate the process with a `permanentFailure`. jsonldPredicate: _id: "cwl:path" _type: "@id" - name: basename type: string? doc: | The base name of the directory, that is, the name of the directory without any leading directory path. The base name must not contain a slash `/`. If not provided, the implementation must set this field based on the `location` field by taking the final path component after parsing `location` as an IRI. If `basename` is provided, it is not required to match the value from `location`. When this directory is made available to a CommandLineTool, it must be named with `basename`, i.e. the final component of the `path` field must match `basename`.
jsonldPredicate: "cwl:basename" - name: listing type: - "null" - type: array items: [File, Directory] doc: | List of files or subdirectories contained in this directory. The name of each file or subdirectory is determined by the `basename` field of each `File` or `Directory` object. It is an error if a `File` shares a `basename` with any other entry in `listing`. If two or more `Directory` objects share the same `basename`, this must be treated as equivalent to a single subdirectory with the listings recursively merged. jsonldPredicate: _id: "cwl:listing" - name: SchemaBase type: record abstract: true fields: - name: label type: - "null" - string jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this object." - name: Parameter type: record extends: SchemaBase abstract: true doc: | Define an input or output parameter to a process. fields: - name: secondaryFiles type: - "null" - string - Expression - type: array items: [string, Expression] jsonldPredicate: "cwl:secondaryFiles" doc: | Only valid when `type: File` or is an array of `items: File`. Describes files that must be included alongside the primary file(s). If the value is an expression, the value of `self` in the expression must be the primary input or output File to which this binding applies. If the value is a string, it specifies that the following pattern should be applied to the primary file: 1. If the string begins with one or more caret `^` characters, for each caret, remove the last file extension from the path (the last period `.` and all following characters). If there are no file extensions, the path is unchanged. 2. Append the remainder of the string to the end of the file path. - name: format type: - "null" - string - type: array items: string - Expression jsonldPredicate: _id: cwl:format _type: "@id" identity: true doc: | Only valid when `type: File` or is an array of `items: File`. For input parameters, this must be one or more IRIs of concept nodes that represent file formats which are allowed as input to this parameter, preferably defined within an ontology. If no ontology is available, file formats may be tested by exact match. For output parameters, this is the file format that will be assigned to the output parameter. - name: streamable type: boolean? doc: | Only valid when `type: File` or is an array of `items: File`. A value of `true` indicates that the file is read or written sequentially without seeking. An implementation may use this flag to indicate whether it is valid to stream file contents using a named pipe. Default: `false`. - name: doc type: - string? - string[]? doc: "A documentation string for this type, or an array of strings which should be concatenated." jsonldPredicate: "rdfs:comment" - type: enum name: Expression doc: | 'Expression' is not a real type. It indicates that a field must allow runtime parameter references. If [InlineJavascriptRequirement](#InlineJavascriptRequirement) is declared and supported by the platform, the field must also allow Javascript expressions. symbols: - cwl:ExpressionPlaceholder - name: InputBinding type: record abstract: true fields: - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | Only valid when `type: File` or is an array of `items: File`. Read up to the first 64 KiB of text from the file and place it in the "contents" field of the file object for use by expressions.
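# Illustrative (non-normative) sketch: a hypothetical tool input using
# `loadContents` so that an expression can read the staged file's text
# through the `contents` field of the File object:
#
#   inputs:
#     conf:
#       type: File
#       inputBinding:
#         loadContents: true
#         valueFrom: $(self.contents)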
- name: OutputBinding type: record abstract: true - name: InputSchema extends: SchemaBase type: record abstract: true - name: OutputSchema extends: SchemaBase type: record abstract: true - name: InputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: InputRecordSchema type: record extends: ["sld:RecordSchema", InputSchema] specialize: - specializeFrom: "sld:RecordField" specializeTo: InputRecordField - name: InputEnumSchema type: record extends: ["sld:EnumSchema", InputSchema] fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: InputArraySchema type: record extends: ["sld:ArraySchema", InputSchema] specialize: - specializeFrom: "sld:RecordSchema" specializeTo: InputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: InputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: InputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" - name: OutputRecordField type: record extends: "sld:RecordField" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: OutputRecordSchema type: record extends: ["sld:RecordSchema", "#OutputSchema"] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordField" specializeTo: OutputRecordField - name: OutputEnumSchema type: record extends: ["sld:EnumSchema", OutputSchema] docParent: "#OutputParameter" fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: OutputArraySchema type: record extends: ["sld:ArraySchema", OutputSchema] docParent: "#OutputParameter" specialize: - specializeFrom: "sld:RecordSchema" specializeTo: OutputRecordSchema - specializeFrom: "sld:EnumSchema" specializeTo: OutputEnumSchema - specializeFrom: "sld:ArraySchema" specializeTo: OutputArraySchema - specializeFrom: "sld:PrimitiveType" specializeTo: CWLType fields: - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" - name: InputParameter type: record extends: Parameter fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: inputBinding type: InputBinding? jsonldPredicate: "cwl:inputBinding" doc: | Describes how to handle the inputs of a process and convert them into a concrete form for execution, such as command line parameters. - name: default type: Any? jsonldPredicate: _id: cwl:default noLinkCheck: true doc: | The default value to use for this parameter if the parameter is missing from the input object, or if the value of the parameter in the input object is `null`. 
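# Illustrative (non-normative) sketch of `default` on a hypothetical input
# parameter: if the input object omits `threads`, or supplies null for it,
# the value 1 is used for validation and expression evaluation.
#
#   inputs:
#     threads:
#       type: int
#       default: 1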
- name: RegularInputParameter type: record extends: InputParameter fields: - name: type type: - "null" - CWLType - InputRecordSchema - InputEnumSchema - InputArraySchema - string - type: array items: - CWLType - InputRecordSchema - InputEnumSchema - InputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: OutputParameter type: record extends: Parameter fields: - name: id type: string jsonldPredicate: "@id" doc: "The unique identifier for this parameter object." - name: outputBinding type: OutputBinding? jsonldPredicate: "cwl:outputBinding" doc: | Describes how to handle the outputs of a process. - type: record name: ProcessRequirement abstract: true doc: | A process requirement declares a prerequisite that may or must be fulfilled before executing a process. See [`Process.hints`](#process) and [`Process.requirements`](#process). Process requirements are the primary mechanism for specifying extensions to the CWL core specification. - type: record name: Process abstract: true doc: | The base executable type in CWL is the `Process` object defined by the document. Note that the `Process` object is abstract and cannot be directly executed. fields: - name: id type: string? jsonldPredicate: "@id" doc: "The unique identifier for this process object." - name: inputs type: type: array items: InputParameter jsonldPredicate: _id: "cwl:inputs" mapSubject: id mapPredicate: type doc: | Defines the input parameters of the process. The process is ready to run when all required input parameters are associated with concrete values. Input parameters include a schema for each parameter which is used to validate the input object. It may also be used to build a user interface for constructing the input object. When accepting an input object, all input parameters must have a value. If an input parameter is missing from the input object, it must be assigned a value of `null` (or the value of `default` for that parameter, if provided) for the purposes of validation and evaluation of expressions. - name: outputs type: type: array items: OutputParameter jsonldPredicate: _id: "cwl:outputs" mapSubject: id mapPredicate: type doc: | Defines the parameters representing the output of the process. May be used to generate and/or validate the output object. - name: requirements type: ProcessRequirement[]? jsonldPredicate: _id: "cwl:requirements" mapSubject: class doc: | Declares requirements that apply to either the runtime environment or the workflow engine that must be met in order to execute this process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. - name: hints type: Any[]? doc: | Declares hints applying to either the runtime environment or the workflow engine that may be helpful in executing this process. It is not an error if an implementation cannot satisfy all hints, however the implementation may report a warning. jsonldPredicate: _id: cwl:hints noLinkCheck: true mapSubject: class - name: label type: string? jsonldPredicate: "rdfs:label" doc: "A short, human-readable label of this process object." - name: doc type: string? jsonldPredicate: "rdfs:comment" doc: "A long, human-readable description of this process object." - name: cwlVersion type: CWLVersion? doc: | CWL document version. 
Always required at the document root. Not required for a Process embedded inside another Process. jsonldPredicate: "_id": "cwl:cwlVersion" "_type": "@vocab" - name: InlineJavascriptRequirement type: record extends: ProcessRequirement doc: | Indicates that the workflow platform must support inline Javascript expressions. If this requirement is not present, the workflow platform must not perform expression interpolation. fields: - name: class type: string doc: "Always 'InlineJavascriptRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: expressionLib type: string[]? doc: | Additional code fragments that will also be inserted before executing the expression code. Allows for function definitions that may be called from CWL expressions. - name: SchemaDefRequirement type: record extends: ProcessRequirement doc: | This field consists of an array of type definitions which must be used when interpreting the `inputs` and `outputs` fields. When a `type` field contains an IRI, the implementation must check if the type is defined in `schemaDefs` and use that definition. If the type is not found in `schemaDefs`, it is an error. The entries in `schemaDefs` must be processed in the order listed such that later schema definitions may refer to earlier schema definitions. fields: - name: class type: string doc: "Always 'SchemaDefRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: types type: type: array items: InputSchema doc: The list of type definitions. cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/CommonWorkflowLanguage.yml0000644000175200017520000000032113247251315027532 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#" $namespaces: cwl: "https://w3id.org/cwl/cwl#" sld: "https://w3id.org/cwl/salad#" $graph: - $import: Process.yml - $import: CommandLineTool.yml - $import: Workflow.yml cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/invocation.md0000644000175200017520000001732113247251315025063 0ustar mcrusoemcrusoe00000000000000# Running a Command To accommodate the enormous variety in syntax and semantics for input, runtime environment, invocation, and output of arbitrary programs, a CommandLineTool defines an "input binding" that describes how to translate abstract input parameters to a concrete program invocation, and an "output binding" that describes how to generate output parameters from program output. ## Input binding The tool command line is built by applying command line bindings to the input object. Bindings are listed either as part of an [input parameter](#CommandInputParameter) using the `inputBinding` field, or separately using the `arguments` field of the CommandLineTool. The algorithm to build the command line is as follows. In this algorithm, the sort key is a list consisting of one or more numeric or string elements. Strings are sorted lexicographically based on UTF-8 encoding. 1. Collect `CommandLineBinding` objects from `arguments`. Assign a sorting key `[position, i]` where `position` is [`CommandLineBinding.position`](#CommandLineBinding) and `i` is the index in the `arguments` list. 2. Collect `CommandLineBinding` objects from the `inputs` schema and associate them with values from the input object. Where the input type is a record, array, or map, recursively walk the schema and input object, collecting nested `CommandLineBinding` objects and associating them with values from the input object. 3.
Create a sorting key by taking the value of the `position` field at each level leading to each leaf binding object. If `position` is not specified, it is not added to the sorting key. For bindings on arrays and maps, the sorting key must include the array index or map key following the position. If and only if two bindings have the same sort key, the tie must be broken using the ordering of the field or parameter name immediately containing the leaf binding. 4. Sort elements using the assigned sorting keys. Numeric entries sort before strings. 5. In the sorted order, apply the rules defined in [`CommandLineBinding`](#CommandLineBinding) to convert bindings to actual command line elements. 6. Insert elements from `baseCommand` at the beginning of the command line. ## Runtime environment All files listed in the input object must be made available in the runtime environment. The implementation may use a shared or distributed file system or transfer files via explicit download to the host. Implementations may choose not to provide access to files not explicitly specified in the input object or process requirements. Output files produced by tool execution must be written to the **designated output directory**. The initial current working directory when executing the tool must be the designated output directory. Files may also be written to the **designated temporary directory**. This directory must be isolated and not shared with other processes. Any files written to the designated temporary directory may be automatically deleted by the workflow platform immediately after the tool terminates. For compatibility, files may be written to the **system temporary directory**, which must be located at `/tmp`. Because the system temporary directory may be shared with other processes on the system, files placed in the system temporary directory are not guaranteed to be deleted automatically. A tool must not use the system temporary directory as a backchannel for communication with other tools. It is valid for the system temporary directory to be the same as the designated temporary directory. When executing the tool, the tool must execute in a new, empty environment with only the environment variables described below; the child process must not inherit environment variables from the parent process except as specified or at user option. * `HOME` must be set to the designated output directory. * `TMPDIR` must be set to the designated temporary directory. * `PATH` may be inherited from the parent process, except when run in a container that provides its own `PATH`. * Variables defined by [EnvVarRequirement](#EnvVarRequirement) * The default environment of the container, such as when using [DockerRequirement](#DockerRequirement) An implementation may forbid the tool from writing to any location in the runtime environment file system other than the designated temporary directory, system temporary directory, and designated output directory. An implementation may provide read-only input files, and disallow in-place update of input files. The designated temporary directory, system temporary directory and designated output directory may each reside on different mount points on different file systems. An implementation may forbid the tool from directly accessing network resources. Correct tools must not assume any network access. Future versions of the specification may incorporate optional process requirements that describe the networking needs of a tool.
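As an illustration of the resources described above, a tool might reserve resources and then refer to the amounts actually allocated (a hypothetical fragment; see [ResourceRequirement](#ResourceRequirement)):

```
requirements:
  ResourceRequirement:
    coresMin: 2
    ramMin: 2048
arguments: ["--threads", $(runtime.cores)]
```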
The `runtime` section available in [parameter references](#Parameter_references) and [expressions](#Expressions) contains the following fields. As noted earlier, an implementation may perform deferred resolution of runtime fields by providing opaque strings for any or all of the following fields; parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents. * `runtime.outdir`: an absolute path to the designated output directory * `runtime.tmpdir`: an absolute path to the designated temporary directory * `runtime.cores`: number of CPU cores reserved for the tool process * `runtime.ram`: amount of RAM in mebibytes (2\*\*20) reserved for the tool process * `runtime.outdirSize`: reserved storage space available in the designated output directory * `runtime.tmpdirSize`: reserved storage space available in the designated temporary directory For `cores`, `ram`, `outdirSize` and `tmpdirSize`, if an implementation can't provide the actual number of reserved cores at expression evaluation time, it should report back the minimal requested amount. See [ResourceRequirement](#ResourceRequirement) for details on how to describe the hardware resources required by a tool. The standard input stream and standard output stream may be redirected as described in the `stdin` and `stdout` fields. ## Execution Once the command line is built and the runtime environment is created, the actual tool is executed. The standard error stream and standard output stream (unless redirected by setting `stdout` or `stderr`) may be captured by platform logging facilities for storage and reporting. Tools may be multithreaded or spawn child processes; however, when the parent process exits, the tool is considered finished regardless of whether any detached child processes are still running. Tools must not require any kind of console, GUI, or web-based user interaction in order to start and run to completion. The exit code of the process indicates if the process completed successfully. By convention, an exit code of zero is treated as success and non-zero exit codes are treated as failure. This may be customized by providing the fields `successCodes`, `temporaryFailCodes`, and `permanentFailCodes`. An implementation may choose to default unspecified non-zero exit codes to either `temporaryFailure` or `permanentFailure`. ## Output binding If the output directory contains a file named "cwl.output.json", that file must be loaded and used as the output object. Otherwise, the output object must be generated by walking the parameters listed in `outputs` and applying output bindings to the tool output. Output bindings are associated with output parameters using the `outputBinding` field. See [`CommandOutputBinding`](#CommandOutputBinding) for details.
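For example (illustrative only; the parameter names are hypothetical), a tool that writes the following `cwl.output.json` into its output directory supplies the entire output object directly, and no output bindings are applied:

```
{
  "report": {
    "class": "File",
    "location": "report.txt"
  },
  "count": 42
}
```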
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/concepts.md0000644000175200017520000004333313247251315024532 0ustar mcrusoemcrusoe00000000000000## References to other specifications **Javascript Object Notation (JSON)**: http://json.org **JSON Linked Data (JSON-LD)**: http://json-ld.org **YAML**: http://yaml.org **Avro**: https://avro.apache.org/docs/current/spec.html **Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986 **Internationalized Resource Identifiers (IRIs)**: https://tools.ietf.org/html/rfc3987 **Portable Operating System Interface (POSIX.1-2008)**: http://pubs.opengroup.org/onlinepubs/9699919799/ **Resource Description Framework (RDF)**: http://www.w3.org/RDF/ ## Scope This document describes CWL syntax, execution, and object model. It is not intended to document a specific CWL implementation; however, it may serve as a reference for the behavior of conforming implementations. ## Terminology The terminology used to describe CWL documents is defined in the Concepts section of the specification. The terms defined in the following list are used in building those definitions and in describing the actions of a CWL implementation: **may**: Conforming CWL documents and CWL implementations are permitted but not required to behave as described. **must**: Conforming CWL documents and CWL implementations are required to behave as described; otherwise they are in error. **error**: A violation of the rules of this specification; results are undefined. Conforming implementations may detect and report an error and may recover from it. **fatal error**: A violation of the rules of this specification; results are undefined. Conforming implementations must not continue to execute the current process and may report an error. **at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described. **deprecated**: Conforming software may implement a behavior for backwards compatibility. Portable CWL documents should not rely on deprecated behavior. Behavior marked as deprecated may be removed entirely from future revisions of the CWL specification. # Data model ## Data concepts An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**) and where the name is a string and the value is a string, number, boolean, array, or object. A **document** is a file containing a serialized object, or an array of objects. A **process** is a basic unit of computation which accepts input data, performs some computation, and produces output data. Examples include CommandLineTools, Workflows, and ExpressionTools. An **input object** is an object describing the inputs to an invocation of a process. An **output object** is an object describing the output resulting from an invocation of a process. An **input schema** describes the valid format (required fields, data types) for an input object. An **output schema** describes the valid format for an output object. **Metadata** is information about workflows, tools, or input items. ## Syntax CWL documents must consist of an object or array of objects represented using JSON or YAML syntax. Upon loading, a CWL implementation must apply the preprocessing steps described in the [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html).
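For illustration, a minimal CWL document expressed in YAML syntax (a hypothetical one-step tool; the fields themselves are defined by the Command Line Tool Description specification):

```
cwlVersion: v1.1.0-dev1
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
```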
An implementation may formally validate the structure of a CWL document using SALAD schemas located at https://github.com/common-workflow-language/common-workflow-language/tree/master/v1.1.0-dev1 ## Identifiers If an object contains an `id` field, that is used to uniquely identify the object in that document. The value of the `id` field must be unique over the entire document. Identifiers may be resolved relative to the document base and/or other identifiers following the rules described in the [Schema Salad specification](SchemaSalad.html#Identifier_resolution). An implementation may choose to only honor references to object types for which the `id` field is explicitly listed in this specification. ## Document preprocessing An implementation must resolve [$import](SchemaSalad.html#Import) and [$include](SchemaSalad.html#Import) directives as described in the [Schema Salad specification](SchemaSalad.html). Another transformation defined in Schema Salad is simplification of data type definitions. Type `<T>` ending with `?` should be transformed to `[<T>, "null"]`. Type `<T>` ending with `[]` should be transformed to `{"type": "array", "items": <T>}` ## Extensions and metadata Input metadata (for example, a lab sample identifier) may be represented within a tool or workflow using input parameters which are explicitly propagated to output. Future versions of this specification may define additional facilities for working with input/output metadata. Implementation extensions not required for correct execution (for example, fields related to GUI presentation) and metadata about the tool or workflow itself (for example, authorship for use in citations) may be provided as additional fields on any object. Such extension fields must use a namespace prefix listed in the `$namespaces` section of the document as described in the [Schema Salad specification](SchemaSalad.html#Explicit_context). Implementation extensions which modify execution semantics must be [listed in the `requirements` field](#Requirements_and_hints). # Execution model ## Execution concepts A **parameter** is a named symbolic input or output of a process, with an associated datatype or schema. During execution, values are assigned to parameters to make the input object or output object used for concrete process invocation. A **CommandLineTool** is a process characterized by the execution of a standalone, non-interactive program which is invoked on some input, produces output, and then terminates. A **workflow** is a process characterized by multiple subprocess steps, where step outputs are connected to the inputs of downstream steps to form a directed acyclic graph, and independent steps may run concurrently. A **runtime environment** is the actual hardware and software environment when executing a command line tool. It includes, but is not limited to, the hardware architecture, hardware resources, operating system, software runtime (if applicable, such as the specific Python interpreter or the specific Java virtual machine), libraries, modules, packages, utilities, and data files required to run the tool. A **workflow platform** is a specific hardware and software implementation capable of interpreting CWL documents and executing the processes specified by the document. The responsibilities of the workflow platform may include scheduling process invocation, setting up the necessary runtime environment, making input data available, invoking the tool process, and collecting output.
A workflow platform may choose to only implement the Command Line Tool Description part of the CWL specification. It is intended that the workflow platform has broad leeway outside of this specification to optimize use of computing resources and enforce policies not covered by this specification. Some areas that are currently out of scope for the CWL specification but may be handled by a specific workflow platform include: * Data security and permissions * Scheduling tool invocations on remote cluster or cloud compute nodes. * Using virtual machines or operating system containers to manage the runtime (except as described in [DockerRequirement](CommandLineTool.html#DockerRequirement)). * Using remote or distributed file systems to manage input and output files. * Transforming file paths. * Determining if a process has previously been executed, and if so skipping it and reusing previous results. * Pausing, resuming or checkpointing processes or workflows. Conforming CWL processes must not assume anything about the runtime environment or workflow platform unless explicitly declared through the use of [process requirements](#Requirements_and_hints). ## Generic execution process The generic execution sequence of a CWL process (including workflows and command line tools) is as follows. 1. Load, process and validate a CWL document, yielding a process object. 2. Load input object. 3. Validate the input object against the `inputs` schema for the process. 4. Validate process requirements are met. 5. Perform any further setup required by the specific process type. 6. Execute the process. 7. Capture results of process execution into the output object. 8. Validate the output object against the `outputs` schema for the process. 9. Report the output object to the process caller. ## Requirements and hints A **process requirement** modifies the semantics or runtime environment of a process. If an implementation cannot satisfy all requirements, or a requirement is listed which is not recognized by the implementation, it is a fatal error and the implementation must not attempt to run the process, unless overridden at user option. A **hint** is similar to a requirement; however, it is not an error if an implementation cannot satisfy all hints. The implementation may report a warning if a hint cannot be satisfied. Requirements are inherited. A requirement specified in a Workflow applies to all workflow steps; a requirement specified on a workflow step will apply to the process implementation of that step and any of its substeps. If the same process requirement appears at different levels of the workflow, the most specific instance of the requirement is used, that is, an entry in `requirements` on a process implementation such as CommandLineTool will take precedence over an entry in `requirements` specified in a workflow step, and an entry in `requirements` on a workflow step takes precedence over the workflow. Entries in `hints` are resolved the same way. Requirements override hints. If a process implementation provides a process requirement in `hints` which is also provided in `requirements` by an enclosing workflow or workflow step, the enclosing `requirements` takes precedence. ## Parameter references Parameter references are denoted by the syntax `$(...)` and may be used in any field permitting the pseudo-type `Expression`, as specified by this document. Conforming implementations must support parameter references.
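For example (an illustrative fragment; the parameter name is hypothetical), an output binding may refer to the name of an input file:

```
outputBinding:
  glob: $(inputs.infile.basename).counts
```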
Parameter references use the following subset of [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) syntax, but they are designed to not require a Javascript engine for evaluation. In the following [BNF grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), character classes and grammar rules are denoted in '{}', '-' denotes exclusion from a character class, '(())' denotes grouping, '|' denotes alternates, trailing '*' denotes zero or more repeats, '+' denotes one or more repeats, '/' escapes these special characters, and all other characters are literal values.

```
symbol:: {Unicode alphanumeric}+
singleq:: [' (( {character - '} | \' ))* ']
doubleq:: [" (( {character - "} | \" ))* "]
index:: [ {decimal digit}+ ]
segment:: . {symbol} | {singleq} | {doubleq} | {index}
parameter reference:: $( {symbol} {segment}*)
```

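Some values that match this grammar (illustrative examples only):

```
$(inputs)
$(inputs.sample.basename)
$(inputs['sample'].nameext)
$(inputs.reads[0].size)
```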
Use the following algorithm to resolve a parameter reference: 1. Match the leading symbol as the key 2. Look up the key in the parameter context (described below) to get the current value. It is an error if the key is not found in the parameter context. 3. If there are no subsequent segments, terminate and return the current value 4. Else, match the next segment 5. Extract the symbol, string, or index from the segment as the key 6. Look up the key in the current value and assign it as the new current value. If the key is a symbol or string, the current value must be an object. If the key is an index, the current value must be an array or string. It is an error if the key does not match the required type, or the key is not found or out of range. 7. Repeat steps 3-6 The root namespace is the parameter context. The following parameters must be provided: * `inputs`: The input object to the current Process. * `self`: A context-specific value. The contextual values for 'self' are documented for specific fields elsewhere in this specification. If a contextual value of 'self' is not documented for a field, it must be 'null'. * `runtime`: An object containing configuration details. Specific to the process type. An implementation may provide opaque strings for any or all fields of `runtime`. These must be filled in by the platform after processing the Tool but before actual execution. Parameter references and expressions may only use the literal string value of the field and must not perform computation on the contents, except where noted otherwise. If the value of a field has no leading or trailing non-whitespace characters around a parameter reference, the effective value of the field becomes the value of the referenced parameter, preserving the return type. If the value of a field has non-whitespace leading or trailing characters around a parameter reference, it is subject to string interpolation. The effective value of the field is a string containing the leading characters, followed by the string value of the parameter reference, followed by the trailing characters. The string value of the parameter reference is its textual JSON representation with the following rules: * Leading and trailing quotes are stripped from strings * Object entries are sorted by key Multiple parameter references may appear in a single field. This case must be treated as a string interpolation. After interpolating the first parameter reference, interpolation must be recursively applied to the trailing characters to yield the final string value. ## Expressions An expression is a fragment of [Javascript/ECMAScript 5.1](http://www.ecma-international.org/ecma-262/5.1/) code evaluated by the workflow platform to affect the inputs, outputs, or behavior of a process. In the generic execution sequence, expressions may be evaluated during step 5 (process setup), step 6 (execute process), and/or step 7 (capture output). Expressions are distinct from regular processes in that they are intended to modify the behavior of the workflow itself rather than perform the primary work of the workflow. To declare the use of expressions, the document must include the process requirement `InlineJavascriptRequirement`. Expressions may be used in any field permitting the pseudo-type `Expression`, as specified by this document. Expressions are denoted by the syntax `$(...)` or `${...}`. A code fragment wrapped in the `$(...)` syntax must be evaluated as an [ECMAScript expression](http://www.ecma-international.org/ecma-262/5.1/#sec-11).
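For example (a hypothetical field value, assuming `InlineJavascriptRequirement` is declared):

```
coresMin: $(inputs.files.length > 4 ? 4 : inputs.files.length)
```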
A code fragment wrapped in the `${...}` syntax must be evaluated as an [ECMAScript function body](http://www.ecma-international.org/ecma-262/5.1/#sec-13) for an anonymous, zero-argument function. Expressions must return a valid JSON data type: one of null, string, number, boolean, array, object. Other return values must result in a `permanentFailure`. Implementations must permit any syntactically valid Javascript and, when scanning for expressions, must account for nesting of parentheses or braces and for strings that may contain parentheses or braces. The runtime must include any code defined in the ["expressionLib" field of InlineJavascriptRequirement](#InlineJavascriptRequirement) prior to executing the actual expression. Before executing the expression, the runtime must initialize as global variables the fields of the parameter context described above. The effective value of the field after expression evaluation follows the same rules as parameter references discussed above. Multiple expressions may appear in a single field. Expressions must be evaluated in an isolated context (a "sandbox") which permits no side effects to leak outside the context. Expressions also must be evaluated in [Javascript strict mode](http://www.ecma-international.org/ecma-262/5.1/#sec-4.2.2). The order in which expressions are evaluated is undefined except where otherwise noted in this document. An implementation may choose to implement parameter references by evaluating them as Javascript expressions. The results of evaluating parameter references must be identical whether implemented by Javascript evaluation or some other means. Implementations may apply other limits, such as process isolation, timeouts, and operating system containers/jails to minimize the security risks associated with running untrusted code embedded in a CWL document. Exceptions thrown from an expression must result in a `permanentFailure` of the process. ## Executing CWL documents as scripts By convention, a CWL document may begin with `#!/usr/bin/env cwl-runner` and be marked as executable (the POSIX "+x" permission bits) to enable it to be executed directly. A workflow platform may support this mode of operation; if so, it must provide `cwl-runner` as an alias for the platform's CWL implementation. A CWL input object document may similarly begin with `#!/usr/bin/env cwl-runner` and be marked as executable. In this case, the input object must include the field `cwl:tool` supplying an IRI to the default CWL document that should be executed using the fields of the input object as input parameters. ## Discovering CWL documents on a local filesystem To discover CWL documents, look in the following locations: `/usr/share/commonwl/` `/usr/local/share/commonwl/` `$XDG_DATA_HOME/commonwl/` (usually `$HOME/.local/share/commonwl`) `$XDG_DATA_HOME` is defined in the [XDG Base Directory Specification](http://standards.freedesktop.org/basedir-spec/basedir-spec-0.6.html) cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/README.md0000644000175200017520000000151013247251315023640 0ustar mcrusoemcrusoe00000000000000# Common Workflow Language Specifications, v1.1.0-dev1 The CWL specifications are divided up into several documents. The [User Guide](UserGuide.html) provides a gentle introduction to writing CWL command line tools and workflows. The [Command Line Tool Description Specification](CommandLineTool.html) specifies the document schema and execution semantics for wrapping and executing command line tools.
The [Workflow Description Specification](Workflow.html) specifies the document schema and execution semantics for composing workflows from components such as command line tools and other workflows. The [Semantic Annotations for Linked Avro Data (SALAD) Specification](SchemaSalad.html) specifies the preprocessing steps that must be applied when loading CWL documents and the schema language used to write the above specifications. cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/contrib.md0000644000175200017520000000200313247251315024341 0ustar mcrusoemcrusoe00000000000000Authors: * Peter Amstutz , Arvados Project, Curoverse * Michael R. Crusoe , Common Workflow Language project * Nebojša Tijanić , Seven Bridges Genomics Contributors: * Brad Chapman , Harvard Chan School of Public Health * John Chilton , Galaxy Project, Pennsylvania State University * Michael Heuer ,UC Berkeley AMPLab * Andrey Kartashov , Cincinnati Children's Hospital * Dan Leehr , Duke University * Hervé Ménager , Institut Pasteur * Maya Nedeljkovich , Seven Bridges Genomics * Matt Scales , Institute of Cancer Research, London * Stian Soiland-Reyes [soiland-reyes@cs.manchester.ac.uk](mailto:soiland-reyes@cs.manchester.ac.uk), University of Manchester * Luka Stojanovic , Seven Bridges Genomics cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/0000755000175200017520000000000013247251336023453 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/0000755000175200017520000000000013247251336026057 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/0000755000175200017520000000000013247251336030166 5ustar mcrusoemcrusoe00000000000000././@LongLink0000000000000000000000000000015300000000000011214 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_sche0000644000175200017520000000040113247251315033166 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "base", "type": "string", "jsonldPredicate": "http://example.com/base" }] }] } ././@LongLink0000000000000000000000000000015200000000000011213 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_schem0000644000175200017520000000035313247251315033242 0ustar mcrusoemcrusoe00000000000000{ "$namespaces": { "acid": "http://example.com/acid#" }, "$graph": [{ "name": "ExampleType", "type": "record", "fields": [{ "name": "id", "type": "string", "jsonldPredicate": "@id" }] }] } ././@LongLink0000000000000000000000000000014700000000000011217 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_src.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_src.y0000644000175200017520000000052213247251315033177 0ustar mcrusoemcrusoe00000000000000 { "id": "http://example.com/base", "form": { "id": "one", "things": [ { "id": "two" }, { "id": "#three", }, { "id": "four#five", }, { "id": "acid:six", } ] } } 
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name.yml0000644000175200017520000000246013247251315032773 0ustar mcrusoemcrusoe00000000000000- |
  ## Field name resolution

  The document schema declares the vocabulary of known field names.  During
  preprocessing traversal, field names in the document which are not part of
  the schema vocabulary must be resolved to absolute URIs.  Under "strict"
  validation, it is an error for a document to include fields which are not
  part of the vocabulary and not resolvable to absolute URIs.  Field names
  which are not part of the vocabulary are resolved using the following rules:

  * If a field name URI begins with a namespace prefix declared in the
  document context (`@context`) followed by a colon `:`, the prefix and colon
  must be replaced by the namespace declared in `@context`.

  * If there is a vocabulary term which maps to the URI of a resolved field,
  the field name must be replaced with the vocabulary term.

  * If a field name URI is an absolute URI consisting of a scheme and path and
  is not part of the vocabulary, no processing occurs.

  Field name resolution is not relative.  It must not be affected by the base
  URI.

  ### Field name resolution example

  Given the following schema:
  ```
- $include: field_name_schema.yml
- |
  ```
  Process the following example:
  ```
- $include: field_name_src.yml
- |
  ```
  This becomes:
  ```
- $include: field_name_proc.yml
- |
  ```
././@LongLink0000000000000000000000000000014700000000000011217 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_proc.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_proc.y0000644000175200017520000000063313247251315033210 0ustar mcrusoemcrusoe00000000000000{
  "$base": "http://example.com/base",
  "link": "http://example.com/base/zero",
  "form": {
    "link": "http://example.com/one",
    "things": [
      {
        "link": "http://example.com/two"
      },
      {
        "link": "http://example.com/base#three"
      },
      {
        "link": "http://example.com/four#five",
      },
      {
        "link": "http://example.com/acid#six",
      }
    ]
  }
}
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/salad.md0000644000175200017520000002435313247251315031600 0ustar mcrusoemcrusoe00000000000000# Semantic Annotations for Linked Avro Data (SALAD)

Author:

* Peter Amstutz, Curoverse

Contributors:

* The developers of Apache Avro
* The developers of JSON-LD
* Nebojša Tijanić, Seven Bridges Genomics

# Abstract

Salad is a schema language for describing structured linked data documents in JSON or YAML.  A Salad schema provides rules for preprocessing, structural validation, and link checking for documents described by that schema.  Salad builds on JSON-LD and the Apache Avro data serialization system, and extends Avro with features for rich data modeling such as inheritance, template specialization, object identifiers, and object references.  Salad was developed to provide a bridge between the record oriented data modeling supported by Apache Avro and the Semantic Web.

# Status of This Document

This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language).  The latest version of this document is available in the "schema_salad" directory at

https://github.com/common-workflow-language/schema_salad

The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.
# Introduction

The JSON data model is an extremely popular way to represent structured data.  It is attractive because of its relative simplicity and is a natural fit with the standard types of many programming languages.  However, this simplicity means that basic JSON lacks expressive features useful for working with complex data structures and document formats, such as schemas, object references, and namespaces.

JSON-LD is a W3C standard providing a way to describe how to interpret a JSON document as Linked Data by means of a "context".  JSON-LD provides a powerful solution for representing object references and namespaces in JSON based on standard web URIs, but is not itself a schema language.  Without a schema providing a well defined structure, it is difficult to process an arbitrary JSON-LD document as idiomatic JSON because there are many ways to express the same data that are logically equivalent but structurally distinct.

Several schema languages exist for describing and validating JSON data, such as the Apache Avro data serialization system; however, none of them understand linked data.  As a result, to fully take advantage of JSON-LD to build the next generation of linked data applications, one must maintain separate JSON schema, JSON-LD context, RDF schema, and human documentation, despite significant overlap of content and obvious need for these documents to stay synchronized.

Schema Salad is designed to address this gap.  It provides a schema language and processing rules for describing structured JSON content permitting URI resolution and strict document validation.  The schema language supports linked data through annotations that describe the linked data interpretation of the content, enables generation of JSON-LD context and RDF schema, and production of RDF triples by applying the JSON-LD context.  The schema language also provides for robust support of inline documentation.

## Introduction to draft 1

This is the first version of Schema Salad.  It is developed concurrently with draft 3 of the Common Workflow Language for use in specifying the Common Workflow Language; however, Schema Salad is intended to be useful to a broader audience.

## References to Other Specifications

**Javascript Object Notation (JSON)**: http://json.org

**JSON Linked Data (JSON-LD)**: http://json-ld.org

**YAML**: http://yaml.org

**Avro**: https://avro.apache.org/docs/current/spec.html

**Uniform Resource Identifier (URI) Generic Syntax**: https://tools.ietf.org/html/rfc3986

**Resource Description Framework (RDF)**: http://www.w3.org/RDF/

**UTF-8**: https://www.ietf.org/rfc/rfc2279.txt

## Scope

This document describes the syntax, data model, algorithms, and schema language for working with Salad documents.  It is not intended to document a specific implementation of Salad; however, it may serve as a reference for the behavior of conforming implementations.

## Terminology

The terminology used to describe Salad documents is defined in the Concepts section of the specification.  The terms defined in the following list are used in building those definitions and in describing the actions of a Salad implementation:

**may**: Conforming Salad documents and Salad implementations are permitted but not required to be interpreted as described.

**must**: Conforming Salad documents and Salad implementations are required to be interpreted as described; otherwise they are in error.

**error**: A violation of the rules of this specification; results are undefined.
Conforming implementations may detect and report an error and may recover from it.

**fatal error**: A violation of the rules of this specification; results are undefined.  Conforming implementations must not continue to process the document and may report an error.

**at user option**: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.

# Document model

## Data concepts

An **object** is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as **fields**) and where the name is a string and the value is a string, number, boolean, array, or object.

A **document** is a file containing a serialized object, or an array of objects.

A **document type** is a class of files that share a common structure and semantics.

A **document schema** is a formal description of the grammar of a document type.

A **base URI** is a context-dependent URI used to resolve relative references.

An **identifier** is a URI that designates a single document or single object within a document.

A **vocabulary** is the set of symbolic field names and enumerated symbols defined by a document schema, where each term maps to an absolute URI.

## Syntax

Conforming Salad documents are serialized and loaded using YAML syntax and UTF-8 text encoding.  Salad documents are written using the JSON-compatible subset of YAML.  Features of YAML such as headers and type tags that are not found in the standard JSON data model must not be used in conforming Salad documents.  It is a fatal error if the document is not valid YAML.

A Salad document must consist only of either a single root object or an array of objects.

## Document context

### Implied context

The implicit context consists of the vocabulary defined by the schema and the base URI.  By default, the base URI must be the URI that was used to load the document.  It may be overridden by an explicit context.

### Explicit context

If a document consists of a root object, this object may contain the fields `$base`, `$namespaces`, `$schemas`, and `$graph`:

  * `$base`: Must be a string.  Set the base URI for the document used to resolve relative references.

  * `$namespaces`: Must be an object with strings as values.  The keys of the object are namespace prefixes used in the document; the values of the object are the prefix expansions.

  * `$schemas`: Must be an array of strings.  This field may list URI references to documents in RDF-XML format which will be queried for RDF schema data.  The subjects and predicates described by the RDF schema may provide additional semantic context for the document, and may be used for validation of prefixed extension fields found in the document.

Other directives beginning with `$` must be ignored.

## Document graph

If a document consists of a single root object, this object may contain the field `$graph`.  This field must be an array of objects.  If present, this field holds the primary content of the document.  A document that consists of an array of objects at the root is an implicit graph.

## Document metadata

If a document consists of a single root object, metadata about the document, such as authorship, may be declared in the root object.

## Document schema

Document preprocessing, link validation and schema validation require a document schema.  A schema may consist of:

  * At least one record definition object which defines valid fields that make up a record type.
Record field definitions include the valid types that may be assigned to each field and annotations to indicate fields that represent identifiers and links, described below in "Semantic Annotations".

  * Any number of enumerated type objects which define a finite set of symbols that are valid values of the type.

  * Any number of documentation objects which allow in-line documentation of the schema.

The schema for defining a salad schema (the metaschema) is described in detail in "Schema validation".

### Record field annotations

In a document schema, record field definitions may include the field `jsonldPredicate`, which may be either a string or object.  Implementations must preprocess fields according to the following rules:

  * If the value of `jsonldPredicate` is `@id`, the field is an identifier field.

  * If the value of `jsonldPredicate` is an object, and that object contains the field `_type` with the value `@id`, the field is a link field.

  * If the value of `jsonldPredicate` is an object, and that object contains the field `_type` with the value `@vocab`, the field is a vocabulary field, which is a subtype of link field.

## Document traversal

To perform document preprocessing, link validation and schema validation, the document must be traversed starting from the fields or array items of the root object or array and recursively visiting each child item which contains an object or array.

# Document preprocessing

After processing the explicit context (if any), document preprocessing begins.  Starting from the document root, object field values or array items which contain objects or arrays are recursively traversed depth-first.  For each visited object, field names, identifier fields, link fields, vocabulary fields, and `$import` and `$include` directives must be processed as described in this section.  The order of traversal of child nodes within a parent node is undefined.
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res.yml0000644000175200017520000000341613247251315032520 0ustar mcrusoemcrusoe00000000000000- |
  ## Link resolution

  The schema may designate one or more fields as link fields that reference
  other objects.  Processing must resolve links to absolute URIs using the
  following rules:

  * If a reference URI is prefixed with `#` it is a relative fragment
  identifier.  It is resolved relative to the base URI by setting or replacing
  the fragment portion of the base URI.

  * If a reference URI does not contain a scheme and is not prefixed with `#`
  it is a path relative reference.  If the reference URI contains `#` in any
  position other than the first character, the reference URI must be divided
  into a path portion and a fragment portion split on the first instance of
  `#`.  The path portion is resolved relative to the base URI by the following
  rule: if the path portion of the base URI ends in a slash `/`, append the
  path portion of the reference URI to the path portion of the base URI.  If
  the path portion of the base URI does not end in a slash, replace the final
  path segment with the path portion of the reference URI.  Replace the
  fragment portion of the base URI with the fragment portion of the reference
  URI.

  * If a reference URI begins with a namespace prefix declared in
  `$namespaces` followed by a colon `:`, the prefix and colon must be replaced
  by the namespace declared in `$namespaces`.

  * If a reference URI is an absolute URI consisting of a scheme and path, no
  processing occurs.
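  As a worked sketch of these rules, consistent with the `link_res_src.yml`
  and `link_res_proc.yml` examples included elsewhere in this archive: with a
  base URI of `http://example.com/base` and a namespace declaration of
  `acid: http://example.com/acid#`, link values resolve as follows.

  ```
  "one"       -> "http://example.com/one"        (path relative reference)
  "#three"    -> "http://example.com/base#three" (relative fragment identifier)
  "four#five" -> "http://example.com/four#five"  (path plus fragment)
  "acid:six"  -> "http://example.com/acid#six"   (namespace prefix expansion)
  ```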
  Link resolution must not affect the base URI used to resolve identifiers
  and other links.

  ### Link resolution example

  Given the following schema:
  ```
- $include: link_res_schema.yml
- |
  ```
  Process the following example:
  ```
- $include: link_res_src.yml
- |
  ```
  This becomes:
  ```
- $include: link_res_proc.yml
- |
  ```
././@LongLink0000000000000000000000000000015200000000000011213 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_schem0000644000175200017520000000053113247251315033227 0ustar mcrusoemcrusoe00000000000000{
  "$namespaces": {
    "acid": "http://example.com/acid#"
  },
  "$graph": [{
    "name": "Colors",
    "type": "enum",
    "symbols": ["acid:red"]
  },
  {
    "name": "ExampleType",
    "type": "record",
    "fields": [{
      "name": "voc",
      "type": "string",
      "jsonldPredicate": {
        "_type": "@vocab"
      }
    }]
  }]
}
././@LongLink0000000000000000000000000000014700000000000011217 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_src.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_src.y0000644000175200017520000000041313247251315033165 0ustar mcrusoemcrusoe00000000000000{
  "form": {
    "things": [
      {
        "voc": "red",
      },
      {
        "voc": "http://example.com/acid#red",
      },
      {
        "voc": "http://example.com/acid#blue",
      }
    ]
  }
}
././@LongLink0000000000000000000000000000014700000000000011217 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/import_include.mdcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/import_include.0000644000175200017520000001076313247251315033210 0ustar mcrusoemcrusoe00000000000000## Import

During preprocessing traversal, an implementation must resolve `$import` directives.  An `$import` directive is an object consisting of exactly one field `$import` specifying a resource by URI string.  It is an error if there are additional fields in the `$import` object; such additional fields must be ignored.

The URI string must be resolved to an absolute URI using the link resolution rules described previously.  Implementations must support loading from `file`, `http` and `https` resources.  The URI referenced by `$import` must be loaded and recursively preprocessed as a Salad document.  The external imported document does not inherit the context of the importing document, and the default base URI for processing the imported document must be the URI used to retrieve the imported document.  If the `$import` URI includes a document fragment, the fragment must be excluded from the base URI used to preprocess the imported document.

Once loaded and processed, the `$import` node is replaced in the document structure by the object or array yielded from the import operation.

URIs may reference document fragments which refer to a specific object in the target document.  This indicates that the `$import` node must be replaced by only the object with the appropriate fragment identifier.

It is a fatal error if an import directive refers to an external resource or resource fragment which does not exist or is not accessible.
### Import example

import.yml:
```
{
  "hello": "world"
}
```

parent.yml:
```
{
  "form": {
    "bar": {
      "$import": "import.yml"
    }
  }
}
```

This becomes:

```
{
  "form": {
    "bar": {
      "hello": "world"
    }
  }
}
```

## Include

During preprocessing traversal, an implementation must resolve `$include` directives.  An `$include` directive is an object consisting of exactly one field `$include` specifying a URI string.  It is an error if there are additional fields in the `$include` object; such additional fields must be ignored.

The URI string must be resolved to an absolute URI using the link resolution rules described previously.  The URI referenced by `$include` must be loaded as text data.  Implementations must support loading from `file`, `http` and `https` resources.  Implementations may transcode the character encoding of the text data to match that of the parent document, but must not interpret or parse the text document in any other way.

Once loaded, the `$include` node is replaced in the document structure by a string containing the text data loaded from the resource.

It is a fatal error if an include directive refers to an external resource which does not exist or is not accessible.

### Include example

parent.yml:
```
{
  "form": {
    "bar": {
      "$include": "include.txt"
    }
  }
}
```

include.txt:
```
hello world
```

This becomes:

```
{
  "form": {
    "bar": "hello world"
  }
}
```

## Mixin

During preprocessing traversal, an implementation must resolve `$mixin` directives.  A `$mixin` directive is an object consisting of the field `$mixin` specifying a resource by URI string.  If there are additional fields in the `$mixin` object, these fields override fields in the object which is loaded from the `$mixin` URI.

The URI string must be resolved to an absolute URI using the link resolution rules described previously.  Implementations must support loading from `file`, `http` and `https` resources.  The URI referenced by `$mixin` must be loaded and recursively preprocessed as a Salad document.  The external imported document must inherit the context of the importing document; however, the file URI for processing the imported document must be the URI used to retrieve the imported document.  The `$mixin` URI must not include a document fragment.

Once loaded and processed, the `$mixin` node is replaced in the document structure by the object or array yielded from the import operation.

URIs may reference document fragments which refer to a specific object in the target document.  This indicates that the `$mixin` node must be replaced by only the object with the appropriate fragment identifier.

It is a fatal error if a mixin directive refers to an external resource or resource fragment which does not exist or is not accessible.
### Mixin example

mixin.yml:
```
{
  "hello": "world",
  "carrot": "orange"
}
```

parent.yml:
```
{
  "form": {
    "bar": {
      "$mixin": "mixin.yml",
      "carrot": "cake"
    }
  }
}
```

This becomes:

```
{
  "form": {
    "bar": {
      "hello": "world",
      "carrot": "cake"
    }
  }
}
```
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/metaschema.yml0000644000175200017520000002271713247251315033026 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/salad#"

$namespaces:
  sld: "https://w3id.org/cwl/salad#"
  dct: "http://purl.org/dc/terms/"
  rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"
  xsd: "http://www.w3.org/2001/XMLSchema#"

$graph:

- name: "Semantic_Annotations_for_Linked_Avro_Data"
  type: documentation
  doc:
    - $include: salad.md
    - $import: field_name.yml
    - $import: ident_res.yml
    - $import: link_res.yml
    - $import: vocab_res.yml
    - $include: import_include.md

- name: "Link_Validation"
  type: documentation
  doc: |
    # Link validation

    Once a document has been preprocessed, an implementation may validate
    links.  The link validation traversal may visit fields which the schema
    designates as link fields and check that each URI references an existing
    object in the current document, an imported document, file system, or
    network resource.  Failure to validate links may be a fatal error.  Link
    validation behavior for individual fields may be modified by `identity`
    and `noLinkCheck` in the `jsonldPredicate` section of the field schema.

- name: "Schema_validation"
  type: documentation
  doc: ""

# - name: "JSON_LD_Context"
#   type: documentation
#   doc: |
#     # Generating JSON-LD Context
#
#     How to generate the json-ld context...

- $import: metaschema_base.yml

- name: JsonldPredicate
  type: record
  doc: |
    Attached to a record field to define how the parent record field is
    handled for URI resolution and JSON-LD context generation.
  fields:
    - name: _id
      type: string?
      jsonldPredicate:
        _id: sld:_id
        _type: "@id"
        identity: true
      doc: |
        The predicate URI that this field corresponds to.
        Corresponds to JSON-LD `@id` directive.
    - name: _type
      type: string?
      doc: |
        The context type hint, corresponds to JSON-LD `@type` directive.

        * If the value of this field is `@id` and `identity` is false or
        unspecified, the parent field must be resolved using the link
        resolution rules.  If `identity` is true, the parent field must be
        resolved using the identifier expansion rules.

        * If the value of this field is `@vocab`, the parent field must be
        resolved using the vocabulary resolution rules.
    - name: _container
      type: string?
      doc: |
        Structure hint, corresponds to JSON-LD `@container` directive.
    - name: identity
      type: boolean?
      doc: |
        If true and `_type` is `@id` this indicates that the parent field must
        be resolved according to identity resolution rules instead of link
        resolution rules.  In addition, the field value is considered an
        assertion that the linked value exists; absence of an object in the
        loaded document with the URI is not an error.
    - name: noLinkCheck
      type: boolean?
      doc: |
        If true, this indicates that link validation traversal must stop at
        this field.  This field (if it is a URI) or any fields under it (if it
        is an object or array) are not subject to link checking.
    - name: mapSubject
      type: string?
      doc: |
        If the value of the field is a JSON object, it must be transformed
        into an array of JSON objects, where each key-value pair from the
        source JSON object is a list item; the list items must be JSON
        objects, and the key is assigned to the field specified by
        `mapSubject`.
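    # Illustrative sketch (not part of the metaschema): with a field declared
    # as
    #   jsonldPredicate: {mapSubject: class}
    # a map form such as
    #   requirements:
    #     InlineJavascriptRequirement: {}
    # is preprocessed into the equivalent array form
    #   requirements:
    #     - class: InlineJavascriptRequirement
    # i.e. each key of the source object becomes the value of the field named
    # by `mapSubject`.  (`mapPredicate`, defined next, covers map values that
    # are not objects.)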
    - name: mapPredicate
      type: string?
      doc: |
        Only applies if `mapSubject` is also provided.  If the value of the
        field is a JSON object, it is transformed as described in
        `mapSubject`, with the addition that when the value of a map item is
        not an object, the item is transformed to a JSON object with the key
        assigned to the field specified by `mapSubject` and the value assigned
        to the field specified by `mapPredicate`.
    - name: refScope
      type: int?
      doc: |
        If the field contains a relative reference, it must be resolved by
        searching for valid document references in each successive parent
        scope in the document fragment.  For example, a reference of `foo` in
        the context `#foo/bar/baz` will first check for the existence of
        `#foo/bar/baz/foo`, followed by `#foo/bar/foo`, then `#foo/foo` and
        then finally `#foo`.  The first valid URI in the search order shall be
        used as the fully resolved value of the identifier.  The value of the
        refScope field is the specified number of levels from the containing
        identifier scope before starting the search, so if `refScope: 2` then
        "baz" and "bar" must be stripped to get the base `#foo` and search
        `#foo/foo` and then `#foo`.  The last scope searched must be the top
        level scope before determining if the identifier cannot be resolved.
    - name: typeDSL
      type: boolean?
      doc: |
        Field must be expanded based on the Schema Salad type DSL.

- name: SpecializeDef
  type: record
  fields:
    - name: specializeFrom
      type: string
      doc: "The data type to be replaced"
      jsonldPredicate:
        _id: "sld:specializeFrom"
        _type: "@id"
        refScope: 1

    - name: specializeTo
      type: string
      doc: "The new data type to replace with"
      jsonldPredicate:
        _id: "sld:specializeTo"
        _type: "@id"
        refScope: 1

- name: NamedType
  type: record
  abstract: true
  fields:
    - name: name
      type: string
      jsonldPredicate: "@id"
      doc: "The identifier for this type"

- name: DocType
  type: record
  abstract: true
  fields:
    - name: doc
      type:
        - string?
        - string[]?
      doc: "A documentation string for this type, or an array of strings which should be concatenated."
      jsonldPredicate: "rdfs:comment"

    - name: docParent
      type: string?
      doc: |
        Hint to indicate that during documentation generation, documentation
        for this type should appear in a subsection under `docParent`.
      jsonldPredicate:
        _id: "sld:docParent"
        _type: "@id"

    - name: docChild
      type:
        - string?
        - string[]?
      doc: |
        Hint to indicate that during documentation generation, documentation
        for `docChild` should appear in a subsection under this type.
      jsonldPredicate:
        _id: "sld:docChild"
        _type: "@id"

    - name: docAfter
      type: string?
      doc: |
        Hint to indicate that during documentation generation, documentation
        for this type should appear after the `docAfter` section at the same
        level.
      jsonldPredicate:
        _id: "sld:docAfter"
        _type: "@id"

- name: SchemaDefinedType
  type: record
  extends: DocType
  doc: |
    Abstract base for schema-defined types.
  abstract: true
  fields:
    - name: jsonldPredicate
      type:
        - string?
        - JsonldPredicate?
      doc: |
        Annotate this type with linked data context.
      jsonldPredicate: sld:jsonldPredicate

    - name: documentRoot
      type: boolean?
      doc: |
        If true, indicates that the type is valid at the document root.  At
        least one type in a schema must be tagged with `documentRoot: true`.

- name: SaladRecordField
  type: record
  extends: RecordField
  doc: "A field of a record."
  fields:
    - name: jsonldPredicate
      type:
        - string?
        - JsonldPredicate?
      doc: |
        Annotate this type with linked data context.
jsonldPredicate: "sld:jsonldPredicate" - name: SaladRecordSchema type: record extends: [NamedType, RecordSchema, SchemaDefinedType] documentRoot: true specialize: RecordField: SaladRecordField fields: - name: abstract type: boolean? doc: | If true, this record is abstract and may be used as a base for other records, but is not valid on its own. - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this record inherits fields from one or more base records. - name: specialize type: - SpecializeDef[]? doc: | Only applies if `extends` is declared. Apply type specialization using the base record as a template. For each field inherited from the base record, replace any instance of the type `specializeFrom` with `specializeTo`. jsonldPredicate: _id: "sld:specialize" mapSubject: specializeFrom mapPredicate: specializeTo - name: SaladEnumSchema type: record extends: [EnumSchema, SchemaDefinedType] documentRoot: true doc: | Define an enumerated type. fields: - name: extends type: - string? - string[]? jsonldPredicate: _id: "sld:extends" _type: "@id" refScope: 1 doc: | Indicates that this enum inherits symbols from a base enum. - name: Documentation type: record extends: [NamedType, DocType] documentRoot: true doc: | A documentation section. This type exists to facilitate self-documenting schemas but has no role in formal validation. fields: - name: type doc: "Must be `documentation`" type: name: Documentation_symbol type: enum symbols: - "sld:documentation" jsonldPredicate: _id: "sld:type" _type: "@vocab" typeDSL: true refScope: 2 ././@LongLink0000000000000000000000000000015100000000000011212 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/metaschema_base.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/metaschema_base0000644000175200017520000000702413247251315033212 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/salad#" $namespaces: sld: "https://w3id.org/cwl/salad#" dct: "http://purl.org/dc/terms/" rdf: "http://www.w3.org/1999/02/22-rdf-syntax-ns#" rdfs: "http://www.w3.org/2000/01/rdf-schema#" xsd: "http://www.w3.org/2001/XMLSchema#" $graph: - name: PrimitiveType type: enum symbols: - "sld:null" - "xsd:boolean" - "xsd:int" - "xsd:long" - "xsd:float" - "xsd:double" - "xsd:string" doc: - | Salad data types are based on Avro schema declarations. Refer to the [Avro schema declaration documentation](https://avro.apache.org/docs/current/spec.html#schemas) for detailed information. - "null: no value" - "boolean: a binary value" - "int: 32-bit signed integer" - "long: 64-bit signed integer" - "float: single precision (32-bit) IEEE 754 floating-point number" - "double: double precision (64-bit) IEEE 754 floating-point number" - "string: Unicode character sequence" - name: Any type: enum symbols: ["#Any"] doc: | The **Any** type validates for any non-null value. - name: RecordField type: record doc: A field of a record. fields: - name: name type: string jsonldPredicate: "@id" doc: | The name of the field - name: doc type: string? 
      doc: |
        A documentation string for this field
      jsonldPredicate: "rdfs:comment"
    - name: type
      type:
        - PrimitiveType
        - RecordSchema
        - EnumSchema
        - ArraySchema
        - string
        - type: array
          items:
            - PrimitiveType
            - RecordSchema
            - EnumSchema
            - ArraySchema
            - string
      jsonldPredicate:
        _id: sld:type
        _type: "@vocab"
        typeDSL: true
        refScope: 2
      doc: |
        The field type

- name: RecordSchema
  type: record
  fields:
    type:
      doc: "Must be `record`"
      type:
        name: Record_symbol
        type: enum
        symbols:
          - "sld:record"
      jsonldPredicate:
        _id: "sld:type"
        _type: "@vocab"
        typeDSL: true
        refScope: 2
    fields:
      type: RecordField[]?
      jsonldPredicate:
        _id: sld:fields
        mapSubject: name
        mapPredicate: type
      doc: "Defines the fields of the record."

- name: EnumSchema
  type: record
  doc: |
    Define an enumerated type.
  fields:
    type:
      doc: "Must be `enum`"
      type:
        name: Enum_symbol
        type: enum
        symbols:
          - "sld:enum"
      jsonldPredicate:
        _id: "sld:type"
        _type: "@vocab"
        typeDSL: true
        refScope: 2
    symbols:
      type: string[]
      jsonldPredicate:
        _id: "sld:symbols"
        _type: "@id"
        identity: true
      doc: "Defines the set of valid symbols."

- name: ArraySchema
  type: record
  fields:
    type:
      doc: "Must be `array`"
      type:
        name: Array_symbol
        type: enum
        symbols:
          - "sld:array"
      jsonldPredicate:
        _id: "sld:type"
        _type: "@vocab"
        typeDSL: true
        refScope: 2
    items:
      type:
        - PrimitiveType
        - RecordSchema
        - EnumSchema
        - ArraySchema
        - string
        - type: array
          items:
            - PrimitiveType
            - RecordSchema
            - EnumSchema
            - ArraySchema
            - string
      jsonldPredicate:
        _id: "sld:items"
        _type: "@vocab"
        refScope: 2
      doc: "Defines the type of the array elements."
././@LongLink0000000000000000000000000000015000000000000011211 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_src.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_src.0000644000175200017520000000025313247251315033116 0ustar mcrusoemcrusoe00000000000000{
  "base": "one",
  "form": {
    "http://example.com/base": "two",
    "http://example.com/three": "three",
  },
  "acid:four": "four"
}
././@LongLink0000000000000000000000000000014600000000000011216 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_src.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_src.ym0000644000175200017520000000047113247251315033211 0ustar mcrusoemcrusoe00000000000000{
  "$base": "http://example.com/base",
  "link": "http://example.com/base/zero",
  "form": {
    "link": "one",
    "things": [
      {
        "link": "two"
      },
      {
        "link": "#three",
      },
      {
        "link": "four#five",
      },
      {
        "link": "acid:six",
      }
    ]
  }
}
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res.yml0000644000175200017520000000323413247251315032664 0ustar mcrusoemcrusoe00000000000000- |
  ## Identifier resolution

  The schema may designate one or more fields as identifier fields to identify
  specific objects.  Processing must resolve relative identifiers to absolute
  identifiers using the following rules:

  * If an identifier URI is prefixed with `#` it is a URI relative fragment
  identifier.  It is resolved relative to the base URI by setting or replacing
  the fragment portion of the base URI.

  * If an identifier URI does not contain a scheme and is not prefixed with
  `#` it is a parent relative fragment identifier.  It is resolved relative to
  the base URI by the following rule: if the base URI does not contain a
  document fragment, set the fragment portion of the base URI to the
  identifier.
  If the base URI does contain a document fragment, append a slash `/`
  followed by the identifier field to the fragment portion of the base URI.

  * If an identifier URI begins with a namespace prefix declared in
  `$namespaces` followed by a colon `:`, the prefix and colon must be replaced
  by the namespace declared in `$namespaces`.

  * If an identifier URI is an absolute URI consisting of a scheme and path,
  no processing occurs.

  When preprocessing visits a node containing an identifier, that identifier
  must be used as the base URI to process child nodes.

  It is an error for more than one object in a document to have the same
  absolute URI.

  ### Identifier resolution example

  Given the following schema:
  ```
- $include: ident_res_schema.yml
- |
  ```
  Process the following example:
  ```
- $include: ident_res_src.yml
- |
  ```
  This becomes:
  ```
- $include: ident_res_proc.yml
- |
  ```
././@LongLink0000000000000000000000000000015100000000000011212 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_schema.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/link_res_schema0000644000175200017520000000041013247251315033231 0ustar mcrusoemcrusoe00000000000000{
  "$namespaces": {
    "acid": "http://example.com/acid#"
  },
  "$graph": [{
    "name": "ExampleType",
    "type": "record",
    "fields": [{
      "name": "link",
      "type": "string",
      "jsonldPredicate": {
        "_type": "@id"
      }
    }]
  }]
}
././@LongLink0000000000000000000000000000015000000000000011211 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_proc.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/ident_res_proc.0000644000175200017520000000056213247251315033166 0ustar mcrusoemcrusoe00000000000000{
  "id": "http://example.com/base",
  "form": {
    "id": "http://example.com/base#one",
    "things": [
      {
        "id": "http://example.com/base#one/two"
      },
      {
        "id": "http://example.com/base#three"
      },
      {
        "id": "http://example.com/four#five",
      },
      {
        "id": "http://example.com/acid#six",
      }
    ]
  }
}
././@LongLink0000000000000000000000000000015000000000000011211 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_proc.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res_proc.0000644000175200017520000000036313247251315033154 0ustar mcrusoemcrusoe00000000000000{
  "form": {
    "things": [
      {
        "voc": "red",
      },
      {
        "voc": "red",
      },
      {
        "voc": "http://example.com/acid#blue",
      }
    ]
  }
}
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/vocab_res.yml0000644000175200017520000000142313247251315032651 0ustar mcrusoemcrusoe00000000000000- |
  ## Vocabulary resolution

  The schema may designate one or more vocabulary fields which use terms
  defined in the vocabulary.  Processing must resolve vocabulary fields to
  either vocabulary terms or absolute URIs by first applying the link
  resolution rules defined above, then applying the following additional rule:

  * If a reference URI is a vocabulary field, and there is a vocabulary term
  which maps to the resolved URI, the reference must be replaced with the
  vocabulary term.
  ### Vocabulary resolution example

  Given the following schema:
  ```
- $include: vocab_res_schema.yml
- |
  ```
  Process the following example:
  ```
- $include: vocab_res_src.yml
- |
  ```
  This becomes:
  ```
- $include: vocab_res_proc.yml
- |
  ```
././@LongLink0000000000000000000000000000015100000000000011212 Lustar 00000000000000cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_proc.ymlcwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/field_name_proc0000644000175200017520000000025313247251315033214 0ustar mcrusoemcrusoe00000000000000{
  "base": "one",
  "form": {
    "base": "two",
    "http://example.com/three": "three",
  },
  "http://example.com/acid#four": "four"
}
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/Workflow.yml0000644000175200017520000005026713247251315024731 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#"

$namespaces:
  cwl: "https://w3id.org/cwl/cwl#"
  rdfs: "http://www.w3.org/2000/01/rdf-schema#"

$graph:

- name: "WorkflowDoc"
  type: documentation
  doc:
    - |
      # Common Workflow Language (CWL) Workflow Description, v1.1.0-dev1

      This version:
        * https://w3id.org/cwl/v1.1.0-dev1/

      Current version:
        * https://w3id.org/cwl/
    - "\n\n"
    - {$include: contrib.md}
    - "\n\n"
    - |
      # Abstract

      One way to define a workflow is: an analysis task represented by a
      directed graph describing a sequence of operations that transform an
      input data set to output.  This specification defines the Common
      Workflow Language (CWL) Workflow description, a vendor-neutral standard
      for representing workflows intended to be portable across a variety of
      computing platforms.

    - {$include: intro.md}

    - |
      ## Introduction to v1.1.0-dev1

      This is the first development release of the 1.1.0 version of the CWL
      Workflow specification.

      ## Introduction to v1.0

      This specification represents the first full release from the CWL
      group.  Since draft-3, this draft introduces the following changes and
      additions:

        * The `inputs` and `outputs` fields have been renamed `in` and `out`.
        * Syntax simplifications: denoted by the `map<>` syntax.  Example:
          `in` contains a list of items, each with an id.  Now one can
          specify a mapping of that identifier to the corresponding
          `InputParameter`.
          ```
          in:
           - id: one
             type: string
             doc: First input parameter
           - id: two
             type: int
             doc: Second input parameter
          ```
          can be
          ```
          in:
           one:
            type: string
            doc: First input parameter
           two:
            type: int
            doc: Second input parameter
          ```
        * The common field `description` has been renamed to `doc`.

      ## Purpose

      The Common Workflow Language Workflow Description expresses workflows
      for data-intensive science, such as Bioinformatics, Chemistry, Physics,
      and Astronomy.  This specification is intended to define a data and
      execution model for Workflows that can be implemented on top of a
      variety of computing platforms, ranging from an individual workstation
      to cluster, grid, cloud, and high performance computing systems.

    - {$include: concepts.md}

- name: ExpressionToolOutputParameter
  type: record
  extends: OutputParameter
  fields:
    - name: type
      type:
        - "null"
        - CWLType
        - OutputRecordSchema
        - OutputEnumSchema
        - OutputArraySchema
        - string
        - type: array
          items:
            - CWLType
            - OutputRecordSchema
            - OutputEnumSchema
            - OutputArraySchema
            - string
      jsonldPredicate:
        "_id": "sld:type"
        "_type": "@vocab"
        refScope: 2
        typeDSL: True
      doc: |
        Specify valid types of data that may be assigned to this parameter.
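# Illustrative sketch (not part of this schema): a minimal document using the
# ExpressionTool type defined below.  Per the `expression` field, the
# expression returns a JSON object whose fields match the declared outputs.
#
#   class: ExpressionTool
#   requirements:
#     InlineJavascriptRequirement: {}
#   inputs:
#     n: int
#   outputs:
#     doubled: int
#   expression: '${ return {"doubled": inputs.n * 2}; }'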
- type: record
  name: ExpressionTool
  extends: Process
  specialize:
    - specializeFrom: InputParameter
      specializeTo: RegularInputParameter
    - specializeFrom: OutputParameter
      specializeTo: ExpressionToolOutputParameter
  documentRoot: true
  doc: |
    Execute an expression as a Workflow step.
  fields:
    - name: class
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"
      type: string
    - name: expression
      type: [string, Expression]
      doc: |
        The expression to execute.  The expression must return a JSON object
        which matches the output parameters of the ExpressionTool.

- name: LinkMergeMethod
  type: enum
  docParent: "#WorkflowStepInput"
  doc: The input link merge method, described in [WorkflowStepInput](#WorkflowStepInput).
  symbols:
    - merge_nested
    - merge_flattened

- name: WorkflowOutputParameter
  type: record
  extends: OutputParameter
  docParent: "#Workflow"
  doc: |
    Describe an output parameter of a workflow.  The parameter must be
    connected to one or more parameters defined in the workflow that will
    provide the value of the output parameter.
  fields:
    - name: outputSource
      doc: |
        Specifies one or more workflow parameters that supply the value of
        the output parameter.
      jsonldPredicate:
        "_id": "cwl:outputSource"
        "_type": "@id"
        refScope: 0
      type:
        - string?
        - string[]?
    - name: linkMerge
      type: ["null", LinkMergeMethod]
      jsonldPredicate: "cwl:linkMerge"
      doc: |
        The method to use to merge multiple sources into a single array.
        If not specified, the default method is "merge_nested".
    - name: type
      type:
        - "null"
        - CWLType
        - OutputRecordSchema
        - OutputEnumSchema
        - OutputArraySchema
        - string
        - type: array
          items:
            - CWLType
            - OutputRecordSchema
            - OutputEnumSchema
            - OutputArraySchema
            - string
      jsonldPredicate:
        "_id": "sld:type"
        "_type": "@vocab"
        refScope: 2
        typeDSL: True
      doc: |
        Specify valid types of data that may be assigned to this parameter.

- name: Sink
  type: record
  abstract: true
  fields:
    - name: source
      doc: |
        Specifies one or more workflow parameters that will provide input to
        the underlying step parameter.
      jsonldPredicate:
        "_id": "cwl:source"
        "_type": "@id"
        refScope: 2
      type:
        - string?
        - string[]?
    - name: linkMerge
      type: LinkMergeMethod?
      jsonldPredicate: "cwl:linkMerge"
      doc: |
        The method to use to merge multiple inbound links into a single array.
        If not specified, the default method is "merge_nested".

- type: record
  name: WorkflowStepInput
  extends: Sink
  docParent: "#WorkflowStep"
  doc: |
    The input of a workflow step connects an upstream parameter (from the
    workflow inputs, or the outputs of other workflow steps) with the input
    parameters of the underlying step.

    ## Input object

    A WorkflowStepInput object must contain an `id` field in the form
    `#fieldname` or `#prefix/fieldname`.  When the `id` field contains a slash
    `/`, the field name consists of the characters following the final slash
    (the prefix portion may contain one or more slashes to indicate scope).
    This defines a field of the workflow step input object with the value of
    the `source` parameter(s).

    ## Merging

    To merge multiple inbound data links,
    [MultipleInputFeatureRequirement](#MultipleInputFeatureRequirement) must
    be specified in the workflow or workflow step requirements.

    If the sink parameter is an array, or named in a [workflow
    scatter](#WorkflowStep) operation, there may be multiple inbound data
    links listed in the `source` field.  The values from the input links are
    merged depending on the method specified in the `linkMerge` field.  If not
    specified, the default method is "merge_nested".
    * **merge_nested**

      The input must be an array consisting of exactly one entry for each
      input link.  If "merge_nested" is specified with a single link, the
      value from the link must be wrapped in a single-item list.

    * **merge_flattened**

      1. The source and sink parameters must be compatible types, or the
         source type must be compatible with single element from the "items"
         type of the destination array parameter.
      2. Source parameters which are arrays are concatenated.
         Source parameters which are single element types are appended as
         single elements.

  fields:
    - name: id
      type: string
      jsonldPredicate: "@id"
      doc: "A unique identifier for this workflow input parameter."
    - name: default
      type: ["null", Any]
      doc: |
        The default value for this parameter to use if either there is no
        `source` field, or the value produced by the `source` is `null`.
      jsonldPredicate:
        _id: "cwl:default"
        noLinkCheck: true
    - name: valueFrom
      type:
        - "null"
        - string
        - Expression
      jsonldPredicate: "cwl:valueFrom"
      doc: |
        To use valueFrom,
        [StepInputExpressionRequirement](#StepInputExpressionRequirement) must
        be specified in the workflow or workflow step requirements.

        If `valueFrom` is a constant string value, use this as the value for
        this input parameter.

        If `valueFrom` is a parameter reference or expression, it must be
        evaluated to yield the actual value to be assigned to the input field.

        The value of `self` in the parameter reference or expression must be
        the value of the parameter(s) specified in the `source` field, or null
        if there is no `source` field.

        The value of `inputs` in the parameter reference or expression must be
        the input object to the workflow step after assigning the `source`
        values and then scattering.  The order of evaluating `valueFrom` among
        step input parameters is undefined and the result of evaluating
        `valueFrom` on a parameter must not be visible to evaluation of
        `valueFrom` on other parameters.

- type: record
  name: WorkflowStepOutput
  docParent: "#WorkflowStep"
  doc: |
    Associate an output parameter of the underlying process with a workflow
    parameter.  The workflow parameter (given in the `id` field) may be used
    as a `source` to connect with input parameters of other workflow steps,
    or with an output parameter of the process.
  fields:
    - name: id
      type: string
      jsonldPredicate: "@id"
      doc: |
        A unique identifier for this workflow output parameter.  This is the
        identifier to use in the `source` field of `WorkflowStepInput` to
        connect the output value to downstream parameters.

- name: ScatterMethod
  type: enum
  docParent: "#WorkflowStep"
  doc: The scatter method, as described in [workflow step scatter](#WorkflowStep).
  symbols:
    - dotproduct
    - nested_crossproduct
    - flat_crossproduct

- name: WorkflowStep
  type: record
  docParent: "#Workflow"
  doc: |
    A workflow step is an executable element of a workflow.  It specifies the
    underlying process implementation (such as `CommandLineTool` or another
    `Workflow`) in the `run` field and connects the input and output
    parameters of the underlying process to workflow parameters.

    # Scatter/gather

    To use scatter/gather,
    [ScatterFeatureRequirement](#ScatterFeatureRequirement) must be specified
    in the workflow or workflow step requirements.

    A "scatter" operation specifies that the associated workflow step or
    subworkflow should execute separately over a list of input elements.  Each
    job making up a scatter operation is independent and may be executed
    concurrently.

    The `scatter` field specifies one or more input parameters which will be
    scattered.  An input parameter may be listed more than once.
    The declared type of each input parameter is implicitly wrapped in an
    array for each time it appears in the `scatter` field.  As a result,
    upstream parameters which are connected to scattered parameters may be
    arrays.

    All output parameter types are also implicitly wrapped in arrays.  Each
    job in the scatter results in an entry in the output array.

    If `scatter` declares more than one input parameter, `scatterMethod`
    describes how to decompose the input into a discrete set of jobs.

      * **dotproduct** specifies that the input arrays are aligned and one
        element is taken from each array to construct each job.  It is an
        error if all input arrays are not the same length.

      * **nested_crossproduct** specifies the Cartesian product of the inputs,
        producing a job for every combination of the scattered inputs.  The
        output must be nested arrays for each level of scattering, in the
        order that the input arrays are listed in the `scatter` field.

      * **flat_crossproduct** specifies the Cartesian product of the inputs,
        producing a job for every combination of the scattered inputs.  The
        output arrays must be flattened to a single level, but otherwise
        listed in the order that the input arrays are listed in the `scatter`
        field.

    # Subworkflows

    To specify a nested workflow as part of a workflow step,
    [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) must be
    specified in the workflow or workflow step requirements.

    It is a fatal error if a workflow directly or indirectly invokes itself as
    a subworkflow (recursive workflows are not allowed).

  fields:
    - name: id
      type: string
      jsonldPredicate: "@id"
      doc: "The unique identifier for this workflow step."
    - name: in
      type: WorkflowStepInput[]
      jsonldPredicate:
        _id: "cwl:in"
        mapSubject: id
        mapPredicate: source
      doc: |
        Defines the input parameters of the workflow step.  The process is
        ready to run when all required input parameters are associated with
        concrete values.  Input parameters include a schema for each parameter
        which is used to validate the input object.  It may also be used to
        build a user interface for constructing the input object.
    - name: out
      type:
        - type: array
          items: [string, WorkflowStepOutput]
      jsonldPredicate:
        _id: "cwl:out"
        _type: "@id"
        identity: true
      doc: |
        Defines the parameters representing the output of the process.  May be
        used to generate and/or validate the output object.
    - name: requirements
      type: ProcessRequirement[]?
      jsonldPredicate:
        _id: "cwl:requirements"
        mapSubject: class
      doc: |
        Declares requirements that apply to either the runtime environment or
        the workflow engine that must be met in order to execute this workflow
        step.  If an implementation cannot satisfy all requirements, or a
        requirement is listed which is not recognized by the implementation,
        it is a fatal error and the implementation must not attempt to run the
        process, unless overridden at user option.
    - name: hints
      type: Any[]?
      jsonldPredicate:
        _id: "cwl:hints"
        noLinkCheck: true
        mapSubject: class
      doc: |
        Declares hints applying to either the runtime environment or the
        workflow engine that may be helpful in executing this workflow step.
        It is not an error if an implementation cannot satisfy all hints;
        however, the implementation may report a warning.
    - name: label
      type: string?
      jsonldPredicate: "rdfs:label"
      doc: "A short, human-readable label of this process object."
    - name: doc
      type: string?
      jsonldPredicate: "rdfs:comment"
      doc: "A long, human-readable description of this process object."
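    # Illustrative sketch (not part of this schema): a step that scatters a
    # single input over an array, assuming hypothetical file and parameter
    # names.
    #
    #   steps:
    #     align:
    #       run: align.cwl
    #       scatter: sample
    #       in:
    #         sample: samples
    #       out: [bam]
    #
    # With `samples: [s1, s2]`, two independent jobs are produced and the
    # step's `bam` output becomes an array with one entry per job, as
    # described under "Scatter/gather" above.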
    - name: run
      type: [string, Process]
      jsonldPredicate:
        "_id": "cwl:run"
        "_type": "@id"
      doc: |
        Specifies the process to run.
    - name: scatter
      type:
        - string?
        - string[]?
      jsonldPredicate:
        "_id": "cwl:scatter"
        "_type": "@id"
        "_container": "@list"
        refScope: 0
    - name: scatterMethod
      doc: |
        Required if `scatter` is an array of more than one element.
      type: ScatterMethod?
      jsonldPredicate:
        "_id": "cwl:scatterMethod"
        "_type": "@vocab"

- name: Workflow
  type: record
  extends: "#Process"
  documentRoot: true
  specialize:
    - specializeFrom: InputParameter
      specializeTo: RegularInputParameter
    - specializeFrom: OutputParameter
      specializeTo: WorkflowOutputParameter
  doc: |
    A workflow describes a set of **steps** and the **dependencies** between
    those steps.  When a step produces output that will be consumed by a
    second step, the first step is a dependency of the second step.  When
    there is a dependency, the workflow engine must execute the preceding step
    and wait for it to successfully produce output before executing the
    dependent step.  If two steps are defined in the workflow graph that are
    not directly or indirectly dependent, these steps are **independent**, and
    may execute in any order or execute concurrently.  A workflow is complete
    when all steps have been executed.

    Dependencies between parameters are expressed using the `source` field on
    [workflow step input parameters](#WorkflowStepInput) and [workflow output
    parameters](#WorkflowOutputParameter).

    The `source` field expresses the dependency of one parameter on another
    such that when a value is associated with the parameter specified by
    `source`, that value is propagated to the destination parameter.  When all
    data links inbound to a given step are fulfilled, the step is ready to
    execute.

    ## Workflow success and failure

    A completed step must result in one of `success`, `temporaryFailure` or
    `permanentFailure` states.  An implementation may choose to retry a step
    execution which resulted in `temporaryFailure`.  An implementation may
    choose to either continue running other steps of a workflow, or terminate
    immediately upon `permanentFailure`.

    * If any step of a workflow execution results in `permanentFailure`, then
      the workflow status is `permanentFailure`.

    * If one or more steps result in `temporaryFailure` and all other steps
      complete with `success` or are not executed, then the workflow status is
      `temporaryFailure`.

    * If all workflow steps are executed and complete with `success`, then the
      workflow status is `success`.

    # Extensions

    [ScatterFeatureRequirement](#ScatterFeatureRequirement) and
    [SubworkflowFeatureRequirement](#SubworkflowFeatureRequirement) are
    available as standard [extensions](#Extensions_and_Metadata) to core
    workflow semantics.

  fields:
    - name: "class"
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"
      type: string
    - name: steps
      doc: |
        The individual steps that make up the workflow.  Each step is executed
        when all of its input data links are fulfilled.  An implementation may
        choose to execute the steps in a different order than listed and/or
        execute steps concurrently, provided that dependencies between steps
        are met.
      type:
        - type: array
          items: "#WorkflowStep"
      jsonldPredicate:
        mapSubject: id

- type: record
  name: SubworkflowFeatureRequirement
  extends: ProcessRequirement
  doc: |
    Indicates that the workflow platform must support nested workflows in
    the `run` field of [WorkflowStep](#WorkflowStep).
  fields:
    - name: "class"
      type: "string"
      doc: "Always 'SubworkflowFeatureRequirement'"
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"

- name: ScatterFeatureRequirement
  type: record
  extends: ProcessRequirement
  doc: |
    Indicates that the workflow platform must support the `scatter` and
    `scatterMethod` fields of [WorkflowStep](#WorkflowStep).
  fields:
    - name: "class"
      type: "string"
      doc: "Always 'ScatterFeatureRequirement'"
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"

- name: MultipleInputFeatureRequirement
  type: record
  extends: ProcessRequirement
  doc: |
    Indicates that the workflow platform must support multiple inbound data
    links listed in the `source` field of
    [WorkflowStepInput](#WorkflowStepInput).
  fields:
    - name: "class"
      type: "string"
      doc: "Always 'MultipleInputFeatureRequirement'"
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"

- type: record
  name: StepInputExpressionRequirement
  extends: ProcessRequirement
  doc: |
    Indicates that the workflow platform must support the `valueFrom` field
    of [WorkflowStepInput](#WorkflowStepInput).
  fields:
    - name: "class"
      type: "string"
      doc: "Always 'StepInputExpressionRequirement'"
      jsonldPredicate:
        "_id": "@type"
        "_type": "@vocab"
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/CommandLineTool-standalone.yml0000644000175200017520000000006513247251315030262 0ustar mcrusoemcrusoe00000000000000- $import: Process.yml
- $import: CommandLineTool.yml
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/intro.md0000644000175200017520000000157313247251315024047 0ustar mcrusoemcrusoe00000000000000# Status of this document

This document is the product of the [Common Workflow Language working group](https://groups.google.com/forum/#!forum/common-workflow-language).  The latest stable version of this document is available in the "v1.0" directory at

https://github.com/common-workflow-language/common-workflow-language

The products of the CWL working group (including this document) are made available under the terms of the Apache License, version 2.0.

# Introduction

The Common Workflow Language (CWL) working group is an informal, multi-vendor working group consisting of various organizations and individuals that have an interest in portability of data analysis workflows.  The goal is to create specifications like this one that enable data scientists to describe analysis tools and workflows that are powerful, easy to use, portable, and support reproducibility.
cwltool-1.0.20180302231433/cwltool/schemas/v1.1.0-dev1/CommandLineTool.yml0000644000175200017520000007714013247251315026144 0ustar mcrusoemcrusoe00000000000000$base: "https://w3id.org/cwl/cwl#"

$namespaces:
  cwl: "https://w3id.org/cwl/cwl#"

$graph:

- name: CommandLineToolDoc
  type: documentation
  doc:
    - |
      # Common Workflow Language (CWL) Command Line Tool Description, v1.1.0-dev1

      This version:
        * https://w3id.org/cwl/v1.1.0-dev1/

      Current version:
        * https://w3id.org/cwl/
    - "\n\n"
    - {$include: contrib.md}
    - "\n\n"
    - |
      # Abstract

      A Command Line Tool is a non-interactive executable program that reads
      some input, performs a computation, and terminates after producing some
      output.  Command line programs are a flexible unit of code sharing and
      reuse; unfortunately, the syntax and input/output semantics among
      command line programs are extremely heterogeneous.  A common layer for
      describing the syntax and semantics of programs can reduce this
      incidental complexity by providing a consistent way to connect programs
      together.
This specification defines the Common Workflow Language (CWL) Command Line Tool Description, a vendor-neutral standard for describing the syntax and input/output semantics of command line programs. - {$include: intro.md} - | ## Introduction to v1.1.0-dev1 This is the in-progress first development version of the first maintenance release of the CWL CommandLineTool specification. Version 1.1 introduces the following additions: * Addition of `stdin` type shortcut for `CommandInputParameter`s. ## Introduction to v1.0 This specification represents the first full release from the CWL group. Since draft-3, version 1.0 introduces the following changes and additions: * The [Directory](#Directory) type. * Syntax simplifications: denoted by the `map<>` syntax. Example: inputs contains a list of items, each with an id. Now one can specify a mapping of that identifier to the corresponding `CommandInputParameter`. ``` inputs: - id: one type: string doc: First input parameter - id: two type: int doc: Second input parameter ``` can be ``` inputs: one: type: string doc: First input parameter two: type: int doc: Second input parameter ``` * [InitialWorkDirRequirement](#InitialWorkDirRequirement): list of files and subdirectories to be present in the output directory prior to execution. * Shortcuts for specifying the standard [output](#stdout) and/or [error](#stderr) streams as a (streamable) File output. * [SoftwareRequirement](#SoftwareRequirement) for describing software dependencies of a tool. * The common `description` field has been renamed to `doc`. ## Errata Post v1.0 release changes to the spec. * 13 July 2016: Mark `baseCommand` as optional and update descriptive text. ## Purpose Standalone programs are a flexible and interoperable form of code reuse. Unlike monolithic applications, applications and analysis workflows which are composed of multiple separate programs can be written in multiple languages and execute concurrently on multiple hosts. However, POSIX does not dictate computer-readable grammar or semantics for program input and output, resulting in extremely heterogeneous command line grammar and input/output semantics among programs. This is a particular problem in distributed computing (multi-node compute clusters) and virtualized environments (such as Docker containers) where it is often necessary to provision resources such as input files before executing the program. Often this gap is filled by hard-coding program invocation and implicitly assuming requirements will be met, or abstracting program invocation with wrapper scripts or descriptor documents. Unfortunately, where these approaches are application or platform specific, it creates a significant barrier to reproducibility and portability, as methods developed for one platform must be manually ported to be used on new platforms. Similarly, it creates redundant work, as wrappers for popular tools must be rewritten for each application or platform in use. The Common Workflow Language Command Line Tool Description is designed to provide a common standard description of grammar and semantics for invoking programs used in data-intensive fields such as Bioinformatics, Chemistry, Physics, Astronomy, and Statistics. This specification defines a precise data and execution model for Command Line Tools that can be implemented on a variety of computing platforms, ranging from a single workstation to cluster, grid, cloud, and high performance computing platforms.
- {$include: concepts.md} - {$include: invocation.md} - type: record name: EnvironmentDef doc: | Define an environment variable that will be set in the runtime environment by the workflow platform when executing the command line tool. May be the result of executing an expression, such as getting a parameter from input. fields: - name: envName type: string doc: The environment variable name - name: envValue type: [string, Expression] doc: The environment variable value - type: record name: CommandLineBinding extends: InputBinding doc: | When listed under `inputBinding` in the input schema, the term "value" refers to the corresponding value in the input object. For binding objects listed in `CommandLineTool.arguments`, the term "value" refers to the effective value after evaluating `valueFrom`. The binding behavior when building the command line depends on the data type of the value. If there is a mismatch between the type described by the input schema and the effective value, such as resulting from an expression evaluation, an implementation must use the data type of the effective value. - **string**: Add `prefix` and the string to the command line. - **number**: Add `prefix` and decimal representation to the command line. - **boolean**: If true, add `prefix` to the command line. If false, add nothing. - **File**: Add `prefix` and the value of [`File.path`](#File) to the command line. - **array**: If `itemSeparator` is specified, add `prefix` and join the array into a single string with `itemSeparator` separating the items. Otherwise first add `prefix`, then recursively process individual elements. - **object**: Add `prefix` only, and recursively add object fields for which `inputBinding` is specified. - **null**: Add nothing. fields: - name: position type: int? doc: "The sorting key. Default position is 0." - name: prefix type: string? doc: "Command line prefix to add before the value." - name: separate type: boolean? doc: | If true (default), then the prefix and value must be added as separate command line arguments; if false, prefix and value must be concatenated into a single command line argument. - name: itemSeparator type: string? doc: | Join the array elements into a single string with the elements separated by `itemSeparator`. - name: valueFrom type: - "null" - string - Expression jsonldPredicate: "cwl:valueFrom" doc: | If `valueFrom` is a constant string value, use this as the value and apply the binding rules above. If `valueFrom` is an expression, evaluate the expression to yield the actual value to use to build the command line and apply the binding rules above. If the inputBinding is associated with an input parameter, the value of `self` in the expression will be the value of the input parameter. When a binding is part of the `CommandLineTool.arguments` field, the `valueFrom` field is required. - name: shellQuote type: boolean? doc: | If `ShellCommandRequirement` is in the requirements for the current command, this controls whether the value is quoted on the command line (default is true). Use `shellQuote: false` to inject metacharacters for operations such as pipes. - type: record name: CommandOutputBinding extends: OutputBinding doc: | Describes how to generate an output parameter based on the files produced by a CommandLineTool.
The output parameter is generated by applying these operations in the following order: - glob - loadContents - outputEval fields: - name: glob type: - "null" - string - Expression - type: array items: string doc: | Find files relative to the output directory, using POSIX glob(3) pathname matching. If an array is provided, find files that match any pattern in the array. If an expression is provided, the expression must return a string or an array of strings, which will then be evaluated as one or more glob patterns. Must only match and return files which actually exist. - name: loadContents type: - "null" - boolean jsonldPredicate: "cwl:loadContents" doc: | For each file matched in `glob`, read up to the first 64 KiB of text from the file and place it in the `contents` field of the file object for manipulation by `outputEval`. - name: outputEval type: - "null" - string - Expression doc: | Evaluate an expression to generate the output value. If `glob` was specified, the value of `self` must be an array containing file objects that were matched. If no files were matched, `self` must be a zero length array; if a single file was matched, the value of `self` is an array of a single element. Additionally, if `loadContents` is `true`, the File objects must include up to the first 64 KiB of file contents in the `contents` field. - name: CommandInputRecordField type: record extends: InputRecordField specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandInputRecordSchema type: record extends: InputRecordSchema specialize: - specializeFrom: InputRecordField specializeTo: CommandInputRecordField - name: CommandInputEnumSchema type: record extends: InputEnumSchema specialize: - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandInputArraySchema type: record extends: InputArraySchema specialize: - specializeFrom: InputRecordSchema specializeTo: CommandInputRecordSchema - specializeFrom: InputEnumSchema specializeTo: CommandInputEnumSchema - specializeFrom: InputArraySchema specializeTo: CommandInputArraySchema - specializeFrom: InputBinding specializeTo: CommandLineBinding - name: CommandOutputRecordField type: record extends: OutputRecordField specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - name: CommandOutputRecordSchema type: record extends: OutputRecordSchema specialize: - specializeFrom: OutputRecordField specializeTo: CommandOutputRecordField - name: CommandOutputEnumSchema type: record extends: OutputEnumSchema specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - name: CommandOutputArraySchema type: record extends: OutputArraySchema specialize: - specializeFrom: OutputRecordSchema specializeTo: CommandOutputRecordSchema - specializeFrom: OutputEnumSchema specializeTo: CommandOutputEnumSchema - specializeFrom: 
OutputArraySchema specializeTo: CommandOutputArraySchema - specializeFrom: OutputBinding specializeTo: CommandOutputBinding - type: record name: CommandInputParameter extends: InputParameter doc: An input parameter for a CommandLineTool. specialize: - specializeFrom: InputBinding specializeTo: CommandLineBinding fields: - name: type type: - "null" - CWLType - stdin - CommandInputRecordSchema - CommandInputEnumSchema - CommandInputArraySchema - string - type: array items: - CWLType - CommandInputRecordSchema - CommandInputEnumSchema - CommandInputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - type: record name: CommandOutputParameter extends: OutputParameter doc: An output parameter for a CommandLineTool. specialize: - specializeFrom: OutputBinding specializeTo: CommandOutputBinding fields: - name: type type: - "null" - CWLType - stdout - stderr - CommandOutputRecordSchema - CommandOutputEnumSchema - CommandOutputArraySchema - string - type: array items: - CWLType - CommandOutputRecordSchema - CommandOutputEnumSchema - CommandOutputArraySchema - string jsonldPredicate: "_id": "sld:type" "_type": "@vocab" refScope: 2 typeDSL: True doc: | Specify valid types of data that may be assigned to this parameter. - name: stdin type: enum symbols: [ "cwl:stdin" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` input with no `inputBinding` set. `stdin` must not be specified at the `CommandLineTool` level. The following ``` inputs: an_input_name: type: stdin ``` is equivalent to ``` inputs: an_input_name: type: File streamable: true stdin: $(inputs.an_input_name.path) ``` - name: stdout type: enum symbols: [ "cwl:stdout" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` output with no `outputBinding` set. The following ``` outputs: an_output_name: type: stdout stdout: a_stdout_file ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: a_stdout_file stdout: a_stdout_file ``` If there is no `stdout` name provided, a random filename will be created. For example, the following ``` outputs: an_output_name: type: stdout ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: random_stdout_filenameABCDEFG stdout: random_stdout_filenameABCDEFG ``` - name: stderr type: enum symbols: [ "cwl:stderr" ] docParent: "#CommandOutputParameter" doc: | Only valid as a `type` for a `CommandLineTool` output with no `outputBinding` set. The following ``` outputs: an_output_name: type: stderr stderr: a_stderr_file ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: a_stderr_file stderr: a_stderr_file ``` If there is no `stderr` name provided, a random filename will be created. For example, the following ``` outputs: an_output_name: type: stderr ``` is equivalent to ``` outputs: an_output_name: type: File streamable: true outputBinding: glob: random_stderr_filenameABCDEFG stderr: random_stderr_filenameABCDEFG ``` - type: record name: CommandLineTool extends: Process documentRoot: true specialize: - specializeFrom: InputParameter specializeTo: CommandInputParameter - specializeFrom: OutputParameter specializeTo: CommandOutputParameter doc: | This defines the schema of the CWL Command Line Tool Description document.
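A minimal document conforming to this schema might look like the following (an illustrative sketch, shown with the released `cwlVersion: v1.0`):

```
cwlVersion: v1.0
class: CommandLineTool
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []
```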
fields: - name: class jsonldPredicate: "_id": "@type" "_type": "@vocab" type: string - name: baseCommand doc: | Specifies the program to execute. If an array, the first element of the array is the command to execute, and subsequent elements are mandatory command line arguments. The elements in `baseCommand` must appear before any command line bindings from `inputBinding` or `arguments`. If `baseCommand` is not provided or is an empty array, the first element of the command line produced after processing `inputBinding` or `arguments` must be used as the program to execute. If the program includes a path separator character, it must be an absolute path, otherwise it is an error. If the program does not include a path separator, search the `$PATH` variable in the runtime environment of the workflow runner to find the absolute path of the executable. type: - string? - string[]? jsonldPredicate: "_id": "cwl:baseCommand" "_container": "@list" - name: arguments doc: | Command line bindings which are not directly associated with input parameters. type: - "null" - type: array items: [string, Expression, CommandLineBinding] jsonldPredicate: "_id": "cwl:arguments" "_container": "@list" - name: stdin type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stdin" doc: | A path to a file whose contents must be piped into the command's standard input stream. - name: stderr type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stderr" doc: | Capture the command's standard error stream to a file written to the designated output directory. If `stderr` is a string, it specifies the file name to use. If `stderr` is an expression, the expression is evaluated and must return a string with the file name to use to capture stderr. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`), it is an error. - name: stdout type: ["null", string, Expression] jsonldPredicate: "https://w3id.org/cwl/cwl#stdout" doc: | Capture the command's standard output stream to a file written to the designated output directory. If `stdout` is a string, it specifies the file name to use. If `stdout` is an expression, the expression is evaluated and must return a string with the file name to use to capture stdout. If the return value is not a string, or the resulting path contains illegal characters (such as the path separator `/`), it is an error. - name: successCodes type: int[]? doc: | Exit codes that indicate the process completed successfully. - name: temporaryFailCodes type: int[]? doc: | Exit codes that indicate the process failed due to a possibly temporary condition, where executing the process with the same runtime environment and inputs may produce different results. - name: permanentFailCodes type: int[]? doc: Exit codes that indicate the process failed due to a permanent logic error, where executing the process with the same runtime environment and same inputs is expected to always fail. - type: record name: DockerRequirement extends: ProcessRequirement doc: | Indicates that a workflow component should be run in a [Docker](http://docker.com) container, and specifies how to fetch or build the image. If a CommandLineTool lists `DockerRequirement` under `hints` (or `requirements`), it may (or must) be run in the specified Docker container. The platform must first acquire or install the correct Docker image as specified by `dockerPull`, `dockerImport`, `dockerLoad` or `dockerFile`.
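For example, a tool can request a container image by name (an illustrative sketch; the image name is arbitrary):

```
hints:
  DockerRequirement:
    dockerPull: ubuntu:16.04
```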
The platform must execute the tool in the container using `docker run` with the appropriate Docker image and tool command line. The workflow platform may provide input files and the designated output directory through the use of volume bind mounts. The platform may rewrite file paths in the input object to correspond to the Docker bind mounted locations. When running a tool contained in Docker, the workflow platform must not assume anything about the contents of the Docker container, such as the presence or absence of specific software, except to assume that the generated command line represents a valid command within the runtime environment of the container. ## Interaction with other requirements If [EnvVarRequirement](#EnvVarRequirement) is specified alongside a DockerRequirement, the environment variables must be provided to Docker using `--env` or `--env-file` and interact with the container's preexisting environment as defined by Docker. fields: - name: class type: string doc: "Always 'DockerRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: dockerPull type: string? doc: "Specify a Docker image to retrieve using `docker pull`." - name: dockerLoad type: string? doc: "Specify an HTTP URL from which to download a Docker image using `docker load`." - name: dockerFile type: string? doc: "Supply the contents of a Dockerfile which will be built using `docker build`." - name: dockerImport type: string? doc: "Provide an HTTP URL to download and gunzip a Docker image using `docker import`." - name: dockerImageId type: string? doc: | The image id that will be used for `docker run`. May be a human-readable image name or the image identifier hash. May be skipped if `dockerPull` is specified, in which case the `dockerPull` image id must be used. - name: dockerOutputDirectory type: string? doc: | Set the designated output directory to a specific location inside the Docker container. - type: record name: SoftwareRequirement extends: ProcessRequirement doc: | A list of software packages that should be configured in the environment of the defined process. fields: - name: class type: string doc: "Always 'SoftwareRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: packages type: SoftwarePackage[] doc: "The list of software to be configured." jsonldPredicate: mapSubject: package mapPredicate: specs - name: SoftwarePackage type: record fields: - name: package type: string doc: "The common name of the software to be configured." - name: version type: string[]? doc: "The (optional) version of the software to be configured." - name: specs type: string[]? doc: | Must be one or more IRIs identifying resources for installing or enabling the software. Implementations may provide resolvers which map well-known software spec IRIs to some configuration action. For example, an IRI `https://packages.debian.org/jessie/bowtie` could be resolved with `apt-get install bowtie`. An IRI `https://anaconda.org/bioconda/bowtie` could be resolved with `conda install -c bioconda bowtie`. Tools may also provide IRIs to index entries such as [RRID](http://www.identifiers.org/rrid/), for example `http://identifiers.org/rrid/RRID:SCR_005476` - name: Dirent type: record doc: | Define a file or subdirectory that must be placed in the designated output directory prior to executing the command line tool. May be the result of executing an expression, such as building a configuration file from a template.
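For example, a `Dirent` can materialize a small configuration file from values in the input object (a sketch; the `message` input parameter is hypothetical):

```
requirements:
  InitialWorkDirRequirement:
    listing:
      - entryname: config.txt
        entry: |
          CONFIGVAR=$(inputs.message)
```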
fields: - name: entryname type: ["null", string, Expression] jsonldPredicate: _id: cwl:entryname doc: | The name of the file or subdirectory to create in the output directory. If `entry` is a File or Directory, this overrides `basename`. Optional. - name: entry type: [string, Expression] jsonldPredicate: _id: cwl:entry doc: | If the value is a string literal or an expression which evaluates to a string, a new file must be created with the string as the file contents. If the value is an expression that evaluates to a `File` object, this indicates the referenced file should be added to the designated output directory prior to executing the tool. If the value is an expression that evaluates to a `Dirent` object, this indicates that the File or Directory in `entry` should be added to the designated output directory with the name in `entryname`. If `writable` is false, the file may be made available using a bind mount or file system link to avoid unnecessary copying of the input file. - name: writable type: boolean? doc: | If true, the file or directory must be writable by the tool. Changes to the file or directory must be isolated and not visible to any other CommandLineTool process. This may be implemented by making a copy of the original file or directory. Default false (files and directories read-only by default). A directory marked as `writable: true` implies that all files and subdirectories are recursively writable as well. - name: InitialWorkDirRequirement type: record extends: ProcessRequirement doc: Define a list of files and subdirectories that must be created by the workflow platform in the designated output directory prior to executing the command line tool. fields: - name: class type: string doc: InitialWorkDirRequirement jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: listing type: - type: array items: [File, Directory, Dirent, string, Expression] - string - Expression jsonldPredicate: _id: "cwl:listing" doc: | The list of files or subdirectories that must be placed in the designated output directory prior to executing the command line tool. May be an expression. If so, the expression return value must validate as `{type: array, items: [File, Directory]}`. Files or Directories which are listed in the input parameters and appear in the `InitialWorkDirRequirement` listing must have their `path` set to their staged location in the designated output directory. If the same File or Directory appears more than once in the `InitialWorkDirRequirement` listing, the implementation must choose exactly one value for `path`; how this value is chosen is undefined. - name: EnvVarRequirement type: record extends: ProcessRequirement doc: | Define a list of environment variables which will be set in the execution environment of the tool. See `EnvironmentDef` for details. fields: - name: class type: string doc: "Always 'EnvVarRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: envDef type: EnvironmentDef[] doc: The list of environment variables. jsonldPredicate: mapSubject: envName mapPredicate: envValue - type: record name: ShellCommandRequirement extends: ProcessRequirement doc: | Modify the behavior of CommandLineTool to generate a single string containing a shell command line. Each item in the argument list must be joined into a string separated by single spaces and quoted to prevent interpretation by the shell, unless `CommandLineBinding` for that argument contains `shellQuote: false`.
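For example, a shell pipeline can be passed through unquoted (an illustrative sketch; the `words` input parameter is hypothetical):

```
requirements:
  ShellCommandRequirement: {}
arguments:
  - shellQuote: false
    valueFrom: |
      sort $(inputs.words.path) | uniq -c
```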
If `shellQuote: false` is specified, the argument is joined into the command string without quoting, which allows the use of shell metacharacters such as `|` for pipes. fields: - name: class type: string doc: "Always 'ShellCommandRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - type: record name: ResourceRequirement extends: ProcessRequirement doc: | Specify basic hardware resource requirements. "min" is the minimum amount of a resource that must be reserved to schedule a job. If "min" cannot be satisfied, the job should not be run. "max" is the maximum amount of a resource that the job shall be permitted to use. If a node has sufficient resources, multiple jobs may be scheduled on a single node provided each job's "max" resource requirements are met. If a job attempts to exceed its "max" resource allocation, an implementation may deny additional resources, which may result in job failure. If "min" is specified but "max" is not, then "max" == "min". If "max" is specified but "min" is not, then "min" == "max". It is an error if max < min. It is an error if the value of any of these fields is negative. If neither "min" nor "max" is specified for a resource, an implementation may provide a default. fields: - name: class type: string doc: "Always 'ResourceRequirement'" jsonldPredicate: "_id": "@type" "_type": "@vocab" - name: coresMin type: ["null", long, string, Expression] doc: Minimum reserved number of CPU cores - name: coresMax type: ["null", int, string, Expression] doc: Maximum reserved number of CPU cores - name: ramMin type: ["null", long, string, Expression] doc: Minimum reserved RAM in mebibytes (2**20) - name: ramMax type: ["null", long, string, Expression] doc: Maximum reserved RAM in mebibytes (2**20) - name: tmpdirMin type: ["null", long, string, Expression] doc: Minimum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: tmpdirMax type: ["null", long, string, Expression] doc: Maximum reserved filesystem based storage for the designated temporary directory, in mebibytes (2**20) - name: outdirMin type: ["null", long, string, Expression] doc: Minimum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) - name: outdirMax type: ["null", long, string, Expression] doc: Maximum reserved filesystem based storage for the designated output directory, in mebibytes (2**20) cwltool-1.0.20180302231433/MANIFEST.in0000644000175200017520000000174013247251315017336 0ustar mcrusoemcrusoe00000000000000include gittaggers.py Makefile cwltool.py include tests/* include tests/tmp1/tmp2/tmp3/.gitkeep include tests/wf/* include tests/override/* include cwltool/schemas/v1.0/*.yml include cwltool/schemas/draft-2/*.yml include cwltool/schemas/draft-3/*.yml include cwltool/schemas/draft-3/*.md include cwltool/schemas/draft-3/salad/schema_salad/metaschema/*.yml include cwltool/schemas/draft-3/salad/schema_salad/metaschema/*.md include cwltool/schemas/v1.0/*.yml include cwltool/schemas/v1.0/*.md include cwltool/schemas/v1.0/salad/schema_salad/metaschema/*.yml include cwltool/schemas/v1.0/salad/schema_salad/metaschema/*.md include cwltool/schemas/v1.1.0-dev1/*.yml include cwltool/schemas/v1.1.0-dev1/*.md include cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/*.yml include cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/*.md include cwltool/cwlNodeEngine.js include cwltool/cwlNodeEngineJSConsole.js include cwltool/extensions.yml global-exclude *~ global-exclude *.pyc
cwltool-1.0.20180302231433/README.rst0000644000175200017520000005631313247251315017275 0ustar mcrusoemcrusoe00000000000000================================================================== Common Workflow Language tool description reference implementation ================================================================== CWL conformance tests: |Build Status| Travis CI: |Unix Build Status| .. |Unix Build Status| image:: https://img.shields.io/travis/common-workflow-language/cwltool/master.svg?label=unix%20build :target: https://travis-ci.org/common-workflow-language/cwltool This is the reference implementation of the Common Workflow Language. It is intended to be feature complete and provide comprehensive validation of CWL files as well as provide other tools related to working with CWL. This is written and tested for Python ``2.7 and 3.x {x = 3, 4, 5, 6}`` The reference implementation consists of two packages. The ``cwltool`` package is the primary Python module containing the reference implementation in the ``cwltool`` module and console executable by the same name. The ``cwlref-runner`` package is optional and provides an additional entry point under the alias ``cwl-runner``, which is the implementation-agnostic name for the default CWL interpreter installed on a host. Install ------- It is highly recommended to set up a virtual environment before installing `cwltool`: .. code:: bash virtualenv -p python2 venv # Create a virtual environment, can use `python3` as well source venv/bin/activate # Activate environment before installing `cwltool` Installing the official package from PyPI (this will install the "cwltool" package as well): .. code:: bash pip install cwlref-runner If installing alongside another CWL implementation then: .. code:: bash pip install cwltool Or you can install from source: .. code:: bash git clone https://github.com/common-workflow-language/cwltool.git # clone cwltool repo cd cwltool # Switch to source directory pip install . # Install `cwltool` from source cwltool --version # Check if the installation works correctly Remember, if co-installing multiple CWL implementations then you need to maintain which implementation ``cwl-runner`` points to via a symbolic file system link or `another facility `_. Running tests locally --------------------- - Running basic tests ``(/tests)``: To run the basic tests after installing `cwltool`, execute the following: .. code:: bash pip install pytest mock py.test --ignore cwltool/schemas/ --pyarg cwltool To run various tests in all supported Python environments we use `tox `_.
To run the test suite in all supported Python environments, first download the complete code repository (see the ``git clone`` instructions above) and then run the following in the terminal: ``pip install tox; tox`` A list of all environments can be seen using: ``tox --listenvs`` and a specific test env can be run using: ``tox -e `` - Running the entire suite of CWL conformance tests: The GitHub repository for the CWL specifications contains a script that tests a CWL implementation against a wide array of valid CWL files using the `cwltest `_ program. Instructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/master/CONFORMANCE_TESTS.md Run on the command line ----------------------- Simple command:: cwl-runner [tool-or-workflow-description] [input-job-settings] Or if you have multiple CWL implementations installed and you want to override the default cwl-runner use:: cwltool [tool-or-workflow-description] [input-job-settings] Use with boot2docker -------------------- boot2docker runs Docker inside a virtual machine, and it only mounts ``Users`` on it. The default behavior of CWL is to create temporary directories under e.g. ``/Var``, which is not accessible to Docker containers. To run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix`` and ``--tmp-outdir-prefix`` to somewhere under ``/Users``:: $ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json .. |Build Status| image:: https://ci.commonwl.org/buildStatus/icon?job=cwltool-conformance :target: https://ci.commonwl.org/job/cwltool-conformance/ Using user-space replacements for Docker ---------------------------------------- Some shared computing environments don't support Docker software containers for technical or policy reasons. As a workaround, the CWL reference runner supports using alternative ``docker`` implementations on Linux with the ``--user-space-docker-cmd`` option. One such "user space" friendly docker replacement is ``udocker`` https://github.com/indigo-dc/udocker and another is ``dx-docker`` https://wiki.dnanexus.com/Developer-Tutorials/Using-Docker-Images udocker installation: https://github.com/indigo-dc/udocker/blob/master/doc/installation_manual.md#22-install-from-indigo-datacloud-repositories dx-docker installation: start with the DNAnexus toolkit (see https://wiki.dnanexus.com/Downloads for instructions). Run `cwltool` just as you normally would, but with the new option, e.g. from the conformance tests: .. code:: bash cwltool --user-space-docker-cmd=udocker https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/empty.json or .. code:: bash cwltool --user-space-docker-cmd=dx-docker https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/empty.json ``cwltool`` can use `Singularity `_ as a Docker container runtime, an experimental feature. Singularity will run software containers specified in ``DockerRequirement`` and therefore works with Docker images only; native Singularity images are not supported.
To use Singularity as the Docker container runtime, provide the ``--singularity`` command line option to ``cwltool``. .. code:: bash cwltool --singularity https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/master/v1.0/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/blob/master/v1.0/v1.0/cat-job.json Tool or workflow loading from remote or local locations ------------------------------------------------------- ``cwltool`` can run tool and workflow descriptions on both local and remote systems via its support for HTTP[S] URLs. Input job files and Workflow steps (via the `run` directive) can reference CWL documents using absolute or relative local filesystem paths. If a relative path is referenced and that document isn't found in the current directory, then the following locations will be searched: http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem Use with GA4GH Tool Registry API -------------------------------- Cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints. By default, cwltool searches https://dockstore.org/ . Use ``--add-tool-registry`` to add other registries to the search path. For example :: cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats:master test.json and (defaults to latest when a version is not specified) :: cwltool --non-strict quay.io/collaboratory/dockstore-tool-bamstats test.json For this example, grab the test.json (and input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats .. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas Import as a module ------------------ Add .. code:: python import cwltool to your script. The easiest way to use cwltool to run a tool or workflow from Python is to use a Factory: .. code:: python import cwltool.factory fac = cwltool.factory.Factory() echo = fac.make("echo.cwl") result = echo(inp="foo") # result["out"] == "foo" Leveraging SoftwareRequirements (Beta) -------------------------------------- CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool may in turn use to resolve to packages in various package managers or dependency management systems such as `Environment Modules `__. Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional dependency; for this reason, be sure to specify the ``deps`` modifier when installing cwltool. For instance:: $ pip install 'cwltool[deps]' Installing cwltool in this fashion enables several new command line options. The most general of these options is ``--beta-dependency-resolvers-configuration``. This option allows one to specify a dependency resolvers configuration file. This file may be specified as either XML or YAML and very simply describes the various plugins to enable for "resolving" ``SoftwareRequirement`` dependencies. To discuss some of these plugins and how to configure them, first consider the following ``hint`` definition for an example CWL tool. .. code:: yaml SoftwareRequirement: packages: - package: seqtk version: - r93 Now imagine deploying cwltool on a cluster with Software Modules installed and that a ``seqtk`` module is available at version ``r93``. This means cluster users likely won't have the binary ``seqtk`` on their ``PATH`` by default, but after sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is available on the ``PATH``.
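Before wiring this into cwltool, the module can be checked by hand (a sketch, assuming the Environment Modules setup described above):

.. code:: bash

    eval "$(modulecmd sh load seqtk/r93)"  # source the module environment
    which seqtk                            # seqtk should now be on the PATH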
A simple dependency resolvers configuration file, called ``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source the correct module environment before executing the above tool would simply be: .. code:: yaml - type: modules The outer list indicates that one plugin is being enabled; the plugin parameters are defined as a dictionary for this one list item. There is only one required parameter for the plugin above, ``type``, which defines the plugin type. This parameter is required for all plugins. The available plugins and the parameters available for each are documented (incompletely) `here `__. Unfortunately, this documentation is in the context of Galaxy tool ``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly. cwltool is distributed with an example of such a seqtk tool and a corresponding sample job. It can be executed from the cwltool root using a dependency resolvers configuration file such as the one above with the command:: cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \ tests/seqtk_seq.cwl \ tests/seqtk_seq_job.json This example demonstrates both that cwltool can leverage existing software installations and also handle workflows with dependencies on different versions of the same software and libraries. However, the above example does require an existing module setup, so it is impossible to test this example "out of the box" with cwltool. For a more isolated test that demonstrates all the same concepts, the resolver plugin type ``galaxy_packages`` can be used. "Galaxy packages" are a lighter-weight alternative to Environment Modules; they are really just a convention for laying out directories into packages and versions containing small scripts that are sourced to modify the environment. They have been used for years in the Galaxy community to adapt Galaxy tools to cluster environments but require neither knowledge of Galaxy nor any special tools to set up. These should work just fine for CWL tools. The cwltool source code repository's test directory is set up with a very simple directory that defines a set of "Galaxy packages" (but really just defines one package named ``random-lines``). The directory layout is simply:: tests/test_deps_env/ random-lines/ 1.0/ env.sh If the ``galaxy_packages`` plugin is enabled and pointed at the ``tests/test_deps_env`` directory in cwltool's root, and a ``SoftwareRequirement`` such as the following is encountered: .. code:: yaml hints: SoftwareRequirement: packages: - package: 'random-lines' version: - '1.0' then cwltool will simply find that ``env.sh`` file and source it before executing the corresponding tool. That ``env.sh`` script is only responsible for modifying the job's ``PATH`` to add the required binaries. This is a full example that works since resolving "Galaxy packages" has no external requirements. Try it out by executing the following command from cwltool's root directory:: cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \ tests/random_lines.cwl \ tests/random_lines_job.json The resolvers configuration file in the above example was simply: .. code:: yaml - type: galaxy_packages base_path: ./tests/test_deps_env It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not match the module names for a given cluster.
Such requirements can be re-mapped to specific deployed packages and/or versions using another file specified using the resolver plugin parameter `mapping_files`. We will demonstrate this using `galaxy_packages`, but the concepts apply equally well to Environment Modules or Conda packages (described below), for instance. So consider the resolvers configuration file (`tests/test_deps_env_resolvers_conf_rewrite.yml`): .. code:: yaml - type: galaxy_packages base_path: ./tests/test_deps_env mapping_files: ./tests/test_deps_mapping.yml And the corresponding mapping configuration file (`tests/test_deps_mapping.yml`): .. code:: yaml - from: name: randomLines version: 1.0.0-rc1 to: name: random-lines version: '1.0' This says that if cwltool encounters a requirement for ``randomLines`` at version ``1.0.0-rc1`` in a tool, it should rewrite it to our specific plugin as ``random-lines`` at version ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl`` that contains such a source ``SoftwareRequirement``. To try out this example with mapping, execute the following command from the cwltool root directory:: cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \ tests/random_lines_mapping.cwl \ tests/random_lines_job.json The previous examples demonstrated leveraging existing infrastructure to provide requirements for CWL tools. If instead a real package manager is used, cwltool has the opportunity to install requirements as needed. While initial support for Homebrew/Linuxbrew plugins is available, the most developed such plugin is for the `Conda `__ package manager. Conda has the nice properties of allowing multiple versions of a package to be installed simultaneously, not requiring elevated permissions to install Conda itself or packages using Conda, and being cross-platform. For these reasons, cwltool may run as a normal user, install its own Conda environment, and manage multiple versions of Conda packages on both Linux and Mac OS X. The Conda plugin can be endlessly configured, but a sensible set of defaults that has proven a powerful stack for dependency management within the Galaxy tool development ecosystem can be enabled by simply passing cwltool the ``--beta-conda-dependencies`` flag. With this we can use the seqtk example above without Docker and without any externally managed services - cwltool should install everything it needs and create an environment for the tool. Try it out with the following command:: cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s that allow disambiguation of package names. If the mapping files described above allow deployers to adapt tools to their infrastructure, this mechanism allows tools to adapt their requirements to multiple package managers. To demonstrate this within the context of the seqtk example, we can simply break the package name we use and then specify a specific Conda package as follows: .. code:: yaml hints: SoftwareRequirement: packages: - package: seqtk_seq version: - '1.2' specs: - https://anaconda.org/bioconda/seqtk - https://packages.debian.org/sid/seqtk The example can be executed using the command:: cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json The plugin framework for managing resolution of these software requirements is maintained as part of `galaxy-lib `__ - a small, portable subset of the Galaxy project.
More information on configuration and implementation can be found at the following links: - `Dependency Resolvers in Galaxy `__ - `Conda for [Galaxy] Tool Dependencies `__ - `Mapping Files - Implementation `__ - `Specifications - Implementation `__ - `Initial cwltool Integration Pull Request `__ Overriding workflow requirements at load time --------------------------------------------- Sometimes a workflow needs additional requirements to run in a particular environment or with a particular dataset. To avoid the need to modify the underlying workflow, cwltool supports requirement "overrides". The format of the "overrides" object is a mapping of item identifier (workflow, workflow step, or command line tool) to the process requirements that should be applied. .. code:: yaml cwltool:overrides: echo.cwl: requirements: EnvVarRequirement: envDef: MESSAGE: override_value Overrides can be specified either on the command line, or as part of the job input document. Workflow steps are identified using the name of the workflow file followed by the step name as a document fragment identifier "#id". Override identifiers are relative to the top-level workflow document. .. code:: bash cwltool --overrides overrides.yml my-tool.cwl my-job.yml .. code:: yaml input_parameter1: value1 input_parameter2: value2 cwltool:overrides: workflow.cwl#step1: requirements: EnvVarRequirement: envDef: MESSAGE: override_value .. code:: bash cwltool my-tool.cwl my-job-with-overrides.yml CWL Tool Control Flow --------------------- Technical outline of how cwltool works internally, for maintainers. #. Use CWL ``load_tool()`` to load document. #. Fetches the document from file or URL #. Applies preprocessing (syntax/identifier expansion and normalization) #. Validates the document based on cwlVersion #. If necessary, updates the document to latest spec #. Constructs a Process object using the ``make_tool()`` callback. This yields a CommandLineTool, Workflow, or ExpressionTool. For workflows, this recursively constructs each workflow step. #. To construct custom types for CommandLineTool, Workflow, or ExpressionTool, provide a custom ``make_tool()`` #. Iterate on the ``job()`` method of the Process object to get back runnable jobs. #. ``job()`` is a generator method (uses the Python iterator protocol) #. Each time the ``job()`` method is invoked in an iteration, it returns one of: a runnable item (an object with a ``run()`` method), ``None`` (indicating there is currently no work ready to run) or end of iteration (indicating the process is complete). #. Invoke the runnable item by calling ``run()``. This runs the tool and gets output. #. Output of a process is reported by an output callback. #. ``job()`` may be iterated over multiple times. It will yield all the work that is currently ready to run and then yield None. #. ``Workflow`` objects create corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation. #. The WorkflowJob iterates over each WorkflowJobStep and determines if the inputs to the step are ready. #. When a step is ready, it constructs an input object for that step and iterates on the ``job()`` method of the workflow job step. #. Each runnable item is yielded back up to the top-level run loop #. When a step job completes and receives an output callback, the job outputs are assigned to the output of the workflow step. #.
When all steps are complete, the intermediate files are moved to a final workflow output, intermediate directories are deleted, and the output callback for the workflow is called. #. ``CommandLineTool`` job() objects yield a single runnable object. #. The CommandLineTool ``job()`` method calls ``makeJobRunner()`` to create a ``CommandLineJob`` object. #. The job method configures the CommandLineJob object by setting public attributes. #. The job method iterates over the file and directory inputs to the CommandLineTool and creates a "path map". #. Files are mapped from their "resolved" location to a "target" path where they will appear at tool invocation (for example, a location inside a Docker container). The target paths are used on the command line. #. Files are staged to target paths using either Docker volume binds (when using containers) or symlinks (if not). This staging step enables files to be logically rearranged or renamed independent of their source layout. #. The ``run()`` method of CommandLineJob executes the command line tool or Docker container, waits for it to complete, collects output, and makes the output callback. Extension points ---------------- The following functions can be provided to main(), to load_tool(), or to the executor to override or augment the listed behaviors. executor :: executor(tool, job_order_object, **kwargs) (Process, Dict[Text, Any], **Any) -> Tuple[Dict[Text, Any], Text] A top-level workflow execution loop; it should synchronously execute a process object and return an output object. makeTool :: makeTool(toolpath_object, **kwargs) (Dict[Text, Any], **Any) -> Process Construct a Process object from a document. selectResources :: selectResources(request) (Dict[Text, int]) -> Dict[Text, int] Take a resource request and turn it into a concrete resource assignment. versionfunc :: () () -> Text Return a version string. make_fs_access :: make_fs_access(basedir) (Text) -> StdFsAccess Return a file system access object. fetcher_constructor :: fetcher_constructor(cache, session) (Dict[unicode, unicode], requests.sessions.Session) -> Fetcher Construct a Fetcher object with the supplied cache and HTTP session. resolver :: resolver(document_loader, document) (Loader, Union[Text, dict[Text, Any]]) -> Text Resolve a relative document identifier to an absolute one which can be fetched. logger_handler :: logger_handler logging.Handler Handler object for logging.
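As a minimal sketch of plugging in extension points (this assumes ``main()`` accepts the functions above as keyword arguments of the same names; the tool and job file names are hypothetical):

.. code:: python

    import cwltool.main

    def select_resources(request):
        # Trivial policy: grant each job exactly the resources it requested.
        return request

    def version():
        # Report a custom version string for this wrapper.
        return "my-cwltool-wrapper 0.1"

    # Run a tool with the two extension points plugged in.
    cwltool.main.main(
        ["my-tool.cwl", "my-job.yml"],
        selectResources=select_resources,
        versionfunc=version)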
cwltool-1.0.20180302231433/tests/0000755000175200017520000000000013247251336016743 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/2.fasta0000644000175200017520000000123713247251316020125 0ustar mcrusoemcrusoe00000000000000>Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other; gttcgatgcc taaaatacct tcttttgtcc ctacacagac cacagttttc ctaatggctt tacaccgact agaaattctt gtgcaagcac taattgaaag cggttggcct agagtgttac cggtttgtat agctgagcgc gtctcttgcc ctgatcaaag gttcattttc tctactttgg aagacgttgt ggaagaatac aacaagtacg agtctctccc ccctggtttg ctgattactg gatacagttg taataccctt cgcaacaccg cgtaactatc tatatgaatt attttccctt tattatatgt agtaggttcg tctttaatct tcctttagca agtcttttac tgttttcgac ctcaatgttc atgttcttag gttgttttgg ataatatgcg gtcagtttaa tcttcgttgt ttcttcttaa aatatttatt catggtttaa tttttggttt gtacttgttc aggggccagt tcattattta ctctgtttgt atacagcagt tcttttattt ttagtatgat tttaatttaa aacaattcta atggtcaaaa acwltool-1.0.20180302231433/tests/test_default_path.py0000644000175200017520000000120413247251316023007 0ustar mcrusoemcrusoe00000000000000import unittest from cwltool.load_tool import fetch_document, validate_document from .util import get_data from schema_salad.ref_resolver import Loader class TestDefaultPath(unittest.TestCase): # Testing that error is not raised when default path is not present def test_default_path(self): document_loader, workflowobj, uri = fetch_document( get_data("tests/wf/default_path.cwl")) document_loader, avsc_names, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) self.assertIsInstance(document_loader,Loader) self.assertIn("cwlVersion",processobj) cwltool-1.0.20180302231433/tests/test_override.py0000644000175200017520000000547713247251316022206 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest import cwltool.expression as expr import cwltool.pathmapper import cwltool.process import cwltool.workflow import pytest import json from cwltool.main import main from cwltool.utils import onWindows from six import StringIO from .util import get_data class TestOverride(unittest.TestCase): @pytest.mark.skipif(onWindows(), reason="Instance of Cwltool is used, On windows that invoke a default docker Container") def test_overrides(self): sio = StringIO() self.assertEquals(main([get_data('tests/override/echo.cwl'), get_data('tests/override/echo-job.yml')], stdout=sio), 0) self.assertEquals({"out": "zing hello1\n"}, json.loads(sio.getvalue())) sio = StringIO() self.assertEquals(main(["--overrides", get_data('tests/override/ov.yml'), get_data('tests/override/echo.cwl'), get_data('tests/override/echo-job.yml')], stdout=sio), 0) self.assertEquals({"out": "zing hello2\n"}, json.loads(sio.getvalue())) sio = StringIO() self.assertEquals(main([get_data('tests/override/echo.cwl'), get_data('tests/override/echo-job-ov.yml')], stdout=sio), 0) self.assertEquals({"out": "zing hello3\n"}, json.loads(sio.getvalue())) sio = StringIO() self.assertEquals(main([get_data('tests/override/echo-job-ov2.yml')], stdout=sio), 0) self.assertEquals({"out": "zing hello4\n"}, json.loads(sio.getvalue())) sio = StringIO() self.assertEquals(main(["--overrides", get_data('tests/override/ov.yml'), get_data('tests/override/echo-wf.cwl'), get_data('tests/override/echo-job.yml')], stdout=sio), 0) self.assertEquals({"out": "zing hello2\n"}, json.loads(sio.getvalue())) sio = StringIO() self.assertEquals(main(["--overrides", get_data('tests/override/ov2.yml'), get_data('tests/override/echo-wf.cwl'), get_data('tests/override/echo-job.yml')], 
stdout=sio), 0) self.assertEquals({"out": "zing hello5\n"}, json.loads(sio.getvalue())) sio = StringIO() self.assertEquals(main(["--overrides", get_data('tests/override/ov3.yml'), get_data('tests/override/echo-wf.cwl'), get_data('tests/override/echo-job.yml')], stdout=sio), 0) self.assertEquals({"out": "zing hello6\n"}, json.loads(sio.getvalue())) cwltool-1.0.20180302231433/tests/test_iwdr.py0000644000175200017520000000070613247251316021322 0ustar mcrusoemcrusoe00000000000000import unittest import cwltool import cwltool.factory from .util import get_data class TestInitialWorkDir(unittest.TestCase): def test_newline_in_entry(self): """ test that files in InitialWorkingDirectory are created with a newline character """ f = cwltool.factory.Factory() echo = f.make(get_data("tests/wf/iwdr-entry.cwl")) self.assertEqual(echo(message="hello"), {"out": "CONFIGVAR=hello\n"}) cwltool-1.0.20180302231433/tests/util.py0000644000175200017520000000120313247251316020264 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import os from pkg_resources import (Requirement, ResolutionError, # type: ignore resource_filename) def get_data(filename): filename = os.path.normpath( filename) # normalizing path depending on OS or else it will cause problem when joining path filepath = None try: filepath = resource_filename( Requirement.parse("cwltool"), filename) except ResolutionError: pass if not filepath or not os.path.isfile(filepath): filepath = os.path.join(os.path.dirname(__file__), os.pardir, filename) return filepath cwltool-1.0.20180302231433/tests/test_http_input.py0000644000175200017520000000222213247251316022546 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest import os import tempfile from cwltool.pathmapper import PathMapper class TestHttpInput(unittest.TestCase): def test_http_path_mapping(self): class SubPathMapper(PathMapper): def __init__(self, referenced_files, basedir, stagedir): super(SubPathMapper, self).__init__(referenced_files, basedir, stagedir) input_file_path = "https://raw.githubusercontent.com/common-workflow-language/cwltool/master/tests/2.fasta" tempdir = tempfile.mkdtemp() base_file = [{ "class": "File", "location": "https://raw.githubusercontent.com/common-workflow-language/cwltool/master/tests/2.fasta", "basename": "chr20.fa" }] path_map_obj = SubPathMapper(base_file, os.getcwd(), tempdir) self.assertIn(input_file_path,path_map_obj._pathmap) assert os.path.exists(path_map_obj._pathmap[input_file_path].resolved) == 1 with open(path_map_obj._pathmap[input_file_path].resolved) as f: self.assertIn(">Sequence 561 BP; 135 A; 106 C; 98 G; 222 T; 0 other;",f.read()) f.close()cwltool-1.0.20180302231433/tests/override/0000755000175200017520000000000013247251336020562 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/override/ov.yml0000644000175200017520000000016413247251316021730 0ustar mcrusoemcrusoe00000000000000cwltool:overrides: echo.cwl: requirements: EnvVarRequirement: envDef: MESSAGE: hello2 cwltool-1.0.20180302231433/tests/override/echo-wf.cwl0000755000175200017520000000032313247251316022620 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow inputs: m1: string outputs: out: type: string outputSource: step1/out steps: step1: in: m1: m1 out: [out] run: echo.cwl cwltool-1.0.20180302231433/tests/override/echo.cwl0000755000175200017520000000065313247251316022214 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool 
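# Note: the EnvVarRequirement hint below supplies a default MESSAGE value;
# the cwltool:overrides documents in this directory (ov.yml, echo-job-ov.yml,
# and friends) replace it at load time, as exercised by tests/test_override.py.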
requirements: ShellCommandRequirement: {} hints: EnvVarRequirement: envDef: MESSAGE: hello1 inputs: m1: string outputs: - id: out type: string outputBinding: glob: out.txt loadContents: true outputEval: $(self[0].contents) arguments: ["echo", $(inputs.m1), {shellQuote: false, valueFrom: "$MESSAGE"}] stdout: out.txt cwltool-1.0.20180302231433/tests/override/ov2.yml0000644000175200017520000000017713247251316022016 0ustar mcrusoemcrusoe00000000000000cwltool:overrides: "echo-wf.cwl#step1": requirements: EnvVarRequirement: envDef: MESSAGE: hello5 cwltool-1.0.20180302231433/tests/override/ov3.yml0000644000175200017520000000016713247251316022016 0ustar mcrusoemcrusoe00000000000000cwltool:overrides: echo-wf.cwl: requirements: EnvVarRequirement: envDef: MESSAGE: hello6 cwltool-1.0.20180302231433/tests/override/echo-job.yml0000644000175200017520000000001013247251316022760 0ustar mcrusoemcrusoe00000000000000m1: zingcwltool-1.0.20180302231433/tests/override/echo-job-ov.yml0000644000175200017520000000017513247251316023416 0ustar mcrusoemcrusoe00000000000000m1: zing cwltool:overrides: echo.cwl: requirements: EnvVarRequirement: envDef: MESSAGE: hello3 cwltool-1.0.20180302231433/tests/override/echo-job-ov2.yml0000644000175200017520000000022013247251316023467 0ustar mcrusoemcrusoe00000000000000m1: zing cwltool:overrides: echo.cwl: requirements: EnvVarRequirement: envDef: MESSAGE: hello4 cwl:tool: echo.cwl cwltool-1.0.20180302231433/tests/test_js_sandbox.py0000644000175200017520000000301213247251316022500 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest from mock import Mock import cwltool import cwltool.factory # we should modify the subprocess imported from cwltool.sandboxjs from cwltool.sandboxjs import (check_js_threshold_version, subprocess) from .util import get_data class Javascript_Sanity_Checks(unittest.TestCase): def setUp(self): self.check_output = subprocess.check_output def tearDown(self): subprocess.check_output = self.check_output def test_node_version(self): subprocess.check_output = Mock(return_value=b'v0.8.26\n') self.assertEquals(check_js_threshold_version('node'), False) subprocess.check_output = Mock(return_value=b'v0.10.25\n') self.assertEquals(check_js_threshold_version('node'), False) subprocess.check_output = Mock(return_value=b'v0.10.26\n') self.assertEquals(check_js_threshold_version('node'), True) subprocess.check_output = Mock(return_value=b'v4.4.2\n') self.assertEquals(check_js_threshold_version('node'), True) subprocess.check_output = Mock(return_value=b'v7.7.3\n') self.assertEquals(check_js_threshold_version('node'), True) def test_is_javascript_installed(self): pass class TestValueFrom(unittest.TestCase): def test_value_from_two_concatenated_expressions(self): f = cwltool.factory.Factory() echo = f.make(get_data("tests/wf/vf-concat.cwl")) self.assertEqual(echo(), {u"out": u"a sting\n"}) cwltool-1.0.20180302231433/tests/test_relax_path_checks.py0000644000175200017520000000235313247251316024024 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest import pytest from tempfile import NamedTemporaryFile from cwltool.main import main from cwltool.utils import onWindows class ToolArgparse(unittest.TestCase): script = ''' #!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool inputs: - id: input type: File inputBinding: position: 0 outputs: - id: output type: File outputBinding: glob: test.txt stdout: test.txt baseCommand: [cat] ''' @pytest.mark.skipif(onWindows(), reason="Instance of Cwltool is 
used, On windows that invoke a default docker Container") def test_spaces_in_input_files(self): with NamedTemporaryFile(mode='w', delete=False) as f: f.write(self.script) f.flush() f.close() with NamedTemporaryFile(prefix="test with spaces", delete=False) as spaces: spaces.close() self.assertEquals( main(["--debug", f.name, '--input', spaces.name]), 1) self.assertEquals( main(["--debug", "--relax-path-checks", f.name, '--input', spaces.name]), 0) if __name__ == '__main__': unittest.main() cwltool-1.0.20180302231433/tests/2.fastq0000644000175200017520000000037713247251316020151 0ustar mcrusoemcrusoe00000000000000@EAS54_6_R1_2_1_413_324 CCCTTCTTGTCTTCAGCGTTTCTCC + ;;3;;;;;;;;;;;;7;;;;;;;88 @EAS54_6_R1_2_1_540_792 TTGGCAGGCCAAGGCCGATGGATCA + ;;;;;;;;;;;7;;;;;-;;;3;83 @EAS54_6_R1_2_1_443_348 GTTGCTTCTGGCGTGGGTGGGGGGG +EAS54_6_R1_2_1_443_348 ;;;;;;;;;;;9;7;;.7;393333cwltool-1.0.20180302231433/tests/__init__.py0000644000175200017520000000000013247251316021040 0ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/test_toolargparse.py0000644000175200017520000000556013247251316023062 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest import pytest from tempfile import NamedTemporaryFile from cwltool.main import main from cwltool.utils import onWindows from .util import get_data class ToolArgparse(unittest.TestCase): script = ''' #!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool doc: "This tool is developed for SMC-RNA Challenge for detecting gene fusions (STAR fusion)" inputs: #Give it a list of input files - id: input type: File inputBinding: position: 0 outputs: - id: output type: File outputBinding: glob: test.txt stdout: test.txt baseCommand: [cat] ''' script2 = ''' #!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool inputs: - id: bdg type: "boolean" outputs: - id: output type: File outputBinding: glob: foo baseCommand: - echo - "ff" stdout: foo ''' script3 = ''' #!/usr/bin/env cwl-runner cwlVersion: v1.0 class: ExpressionTool inputs: foo: type: type: record fields: one: File two: string expression: $(inputs.foo.two) outputs: [] ''' @pytest.mark.skipif(onWindows(), reason="Instance of Cwltool is used, On windows that invoke a default docker Container") def test_help(self): with NamedTemporaryFile(mode='w', delete=False) as f: f.write(self.script) f.flush() f.close() self.assertEquals(main(["--debug", f.name, '--input', get_data('tests/echo.cwl')]), 0) self.assertEquals(main(["--debug", f.name, '--input', get_data('tests/echo.cwl')]), 0) @pytest.mark.skipif(onWindows(), reason="Instance of Cwltool is used, On windows that invoke a default docker Container") def test_bool(self): with NamedTemporaryFile(mode='w', delete=False) as f: f.write(self.script2) f.flush() f.close() try: self.assertEquals(main([f.name, '--help']), 0) except SystemExit as e: self.assertEquals(e.code, 0) def test_record_help(self): with NamedTemporaryFile(mode='w', delete=False) as f: f.write(self.script3) f.flush() f.close() try: self.assertEquals(main([f.name, '--help']), 0) except SystemExit as e: self.assertEquals(e.code, 0) def test_record(self): with NamedTemporaryFile(mode='w', delete=False) as f: f.write(self.script3) f.flush() f.close() try: self.assertEquals(main([f.name, '--foo.one', get_data('tests/echo.cwl'), '--foo.two', 'test']), 0) except SystemExit as e: self.assertEquals(e.code, 0) if __name__ == '__main__': unittest.main() cwltool-1.0.20180302231433/tests/test_parallel.py0000644000175200017520000000231013247251316022142 
0ustar mcrusoemcrusoe00000000000000import json import unittest import pytest import cwltool import cwltool.factory from cwltool.executors import MultithreadedJobExecutor from cwltool.utils import onWindows from .util import get_data class TestParallel(unittest.TestCase): @pytest.mark.skipif(onWindows(), reason="Unexplainable behavior: cwltool on AppVeyor does not recognize cwlVersion" "in count-lines1-wf.cwl") def test_sequential_workflow(self): test_file = "tests/wf/count-lines1-wf.cwl" f = cwltool.factory.Factory(executor=MultithreadedJobExecutor()) echo = f.make(get_data(test_file)) self.assertEqual(echo(file1= { "class": "File", "location": get_data("tests/wf/whale.txt") }), {"count_output": 16}) def test_scattered_workflow(self): test_file = "tests/wf/scatter-wf4.cwl" job_file = "tests/wf/scatter-job2.json" f = cwltool.factory.Factory(executor=MultithreadedJobExecutor()) echo = f.make(get_data(test_file)) with open(get_data(job_file)) as job: self.assertEqual(echo(**json.load(job)), {'out': ['foo one three', 'foo two four']}) cwltool-1.0.20180302231433/tests/test_examples.py0000644000175200017520000006235113247251316022177 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest import pytest import subprocess from os import path import sys from io import StringIO from cwltool.errors import WorkflowException from cwltool.utils import onWindows try: reload except: try: from imp import reload except: from importlib import reload import cwltool.expression as expr import cwltool.factory import cwltool.pathmapper import cwltool.process import cwltool.workflow import schema_salad.validate from cwltool.main import main from .util import get_data sys.argv = [''] class TestParamMatching(unittest.TestCase): def test_params(self): self.assertTrue(expr.param_re.match("(foo)")) self.assertTrue(expr.param_re.match("(foo.bar)")) self.assertTrue(expr.param_re.match("(foo['bar'])")) self.assertTrue(expr.param_re.match("(foo[\"bar\"])")) self.assertTrue(expr.param_re.match("(foo.bar.baz)")) self.assertTrue(expr.param_re.match("(foo['bar'].baz)")) self.assertTrue(expr.param_re.match("(foo['bar']['baz'])")) self.assertTrue(expr.param_re.match("(foo['b\\'ar']['baz'])")) self.assertTrue(expr.param_re.match("(foo['b ar']['baz'])")) self.assertTrue(expr.param_re.match("(foo_bar)")) self.assertFalse(expr.param_re.match("(foo.[\"bar\"])")) self.assertFalse(expr.param_re.match("(.foo[\"bar\"])")) self.assertFalse(expr.param_re.match("(foo [\"bar\"])")) self.assertFalse(expr.param_re.match("( foo[\"bar\"])")) self.assertFalse(expr.param_re.match("(foo[bar].baz)")) self.assertFalse(expr.param_re.match("(foo['bar\"].baz)")) self.assertFalse(expr.param_re.match("(foo['bar].baz)")) self.assertFalse(expr.param_re.match("{foo}")) self.assertFalse(expr.param_re.match("(foo.bar")) self.assertFalse(expr.param_re.match("foo.bar)")) self.assertFalse(expr.param_re.match("foo.b ar)")) self.assertFalse(expr.param_re.match("foo.b\'ar)")) self.assertFalse(expr.param_re.match("(foo+bar")) self.assertFalse(expr.param_re.match("(foo bar")) inputs = { "foo": { "bar": { "baz": "zab1" }, "b ar": { "baz": 2 }, "b'ar": { "baz": True }, 'b"ar': { "baz": None } }, "lst": ["A", "B"] } self.assertEqual(expr.interpolate("$(foo)", inputs), inputs["foo"]) for pattern in ("$(foo.bar)", "$(foo['bar'])", "$(foo[\"bar\"])"): self.assertEqual(expr.interpolate(pattern, inputs), inputs["foo"]["bar"]) for pattern in ("$(foo.bar.baz)", "$(foo['bar'].baz)", "$(foo['bar'][\"baz\"])", "$(foo.bar['baz'])"): 
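# Each of the four spellings above dereferences inputs["foo"]["bar"]["baz"], # so every pattern interpolates to the same string, "zab1":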
self.assertEqual(expr.interpolate(pattern, inputs), "zab1") self.assertEqual(expr.interpolate("$(foo['b ar'].baz)", inputs), 2) self.assertEqual(expr.interpolate("$(foo['b\\'ar'].baz)", inputs), True) self.assertEqual(expr.interpolate("$(foo[\"b'ar\"].baz)", inputs), True) self.assertEqual(expr.interpolate("$(foo['b\\\"ar'].baz)", inputs), None) self.assertEqual(expr.interpolate("$(lst[0])", inputs), "A") self.assertEqual(expr.interpolate("$(lst[1])", inputs), "B") self.assertEqual(expr.interpolate("$(lst.length)", inputs), 2) self.assertEqual(expr.interpolate("$(lst['length'])", inputs), 2) for pattern in ("-$(foo.bar)", "-$(foo['bar'])", "-$(foo[\"bar\"])"): self.assertEqual(expr.interpolate(pattern, inputs), """-{"baz": "zab1"}""") for pattern in ("-$(foo.bar.baz)", "-$(foo['bar'].baz)", "-$(foo['bar'][\"baz\"])", "-$(foo.bar['baz'])"): self.assertEqual(expr.interpolate(pattern, inputs), "-zab1") self.assertEqual(expr.interpolate("-$(foo['b ar'].baz)", inputs), "-2") self.assertEqual(expr.interpolate("-$(foo['b\\'ar'].baz)", inputs), "-true") self.assertEqual(expr.interpolate("-$(foo[\"b\\'ar\"].baz)", inputs), "-true") self.assertEqual(expr.interpolate("-$(foo['b\\\"ar'].baz)", inputs), "-null") for pattern in ("$(foo.bar) $(foo.bar)", "$(foo['bar']) $(foo['bar'])", "$(foo[\"bar\"]) $(foo[\"bar\"])"): self.assertEqual(expr.interpolate(pattern, inputs), """{"baz": "zab1"} {"baz": "zab1"}""") for pattern in ("$(foo.bar.baz) $(foo.bar.baz)", "$(foo['bar'].baz) $(foo['bar'].baz)", "$(foo['bar'][\"baz\"]) $(foo['bar'][\"baz\"])", "$(foo.bar['baz']) $(foo.bar['baz'])"): self.assertEqual(expr.interpolate(pattern, inputs), "zab1 zab1") self.assertEqual(expr.interpolate("$(foo['b ar'].baz) $(foo['b ar'].baz)", inputs), "2 2") self.assertEqual(expr.interpolate("$(foo['b\\'ar'].baz) $(foo['b\\'ar'].baz)", inputs), "true true") self.assertEqual(expr.interpolate("$(foo[\"b\\'ar\"].baz) $(foo[\"b\\'ar\"].baz)", inputs), "true true") self.assertEqual(expr.interpolate("$(foo['b\\\"ar'].baz) $(foo['b\\\"ar'].baz)", inputs), "null null") class TestFactory(unittest.TestCase): def test_factory(self): f = cwltool.factory.Factory() echo = f.make(get_data("tests/echo.cwl")) self.assertEqual(echo(inp="foo"), {"out": "foo\n"}) def test_default_args(self): f = cwltool.factory.Factory() assert f.execkwargs["use_container"] is True assert f.execkwargs["on_error"] == "stop" def test_redefined_args(self): f = cwltool.factory.Factory(use_container=False, on_error="continue") assert f.execkwargs["use_container"] is False assert f.execkwargs["on_error"] == "continue" def test_partial_scatter(self): f = cwltool.factory.Factory(on_error="continue") fail = f.make(get_data("tests/wf/scatterfail.cwl")) try: fail() except cwltool.factory.WorkflowStatus as e: self.assertEquals('sha1$e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e', e.out["out"][0]["checksum"]) self.assertIsNone(e.out["out"][1]) self.assertEquals('sha1$a3db5c13ff90a36963278c6a39e4ee3c22e2a436', e.out["out"][2]["checksum"]) else: self.fail("Should have raised WorkflowStatus") def test_partial_output(self): f = cwltool.factory.Factory(on_error="continue") fail = f.make(get_data("tests/wf/wffail.cwl")) try: fail() except cwltool.factory.WorkflowStatus as e: self.assertEquals('sha1$e5fa44f2b31c1fb553b6021e7360d07d5d91ff5e', e.out["out1"]["checksum"]) self.assertIsNone(e.out["out2"]) else: self.fail("Should have raised WorkflowStatus") class TestScanDeps(unittest.TestCase): def test_scandeps(self): obj = { "id": "file:///example/foo.cwl", "steps": [ { "id": 
"file:///example/foo.cwl#step1", "inputs": [{ "id": "file:///example/foo.cwl#input1", "default": { "class": "File", "location": "file:///example/data.txt" } }], "run": { "id": "file:///example/bar.cwl", "inputs": [{ "id": "file:///example/bar.cwl#input2", "default": { "class": "Directory", "location": "file:///example/data2", "listing": [{ "class": "File", "location": "file:///example/data3.txt", "secondaryFiles": [{ "class": "File", "location": "file:///example/data5.txt" }] }] }, }, { "id": "file:///example/bar.cwl#input3", "default": { "class": "Directory", "listing": [{ "class": "File", "location": "file:///example/data4.txt" }] } }, { "id": "file:///example/bar.cwl#input4", "default": { "class": "File", "contents": "file literal" } }] } } ] } def loadref(base, p): if isinstance(p, dict): return p else: raise Exception("test case can't load things") sc = cwltool.process.scandeps(obj["id"], obj, {"$import", "run"}, {"$include", "$schemas", "location"}, loadref) sc.sort(key=lambda k: k["basename"]) self.assertEquals([{ "basename": "bar.cwl", "nameroot": "bar", "class": "File", "nameext": ".cwl", "location": "file:///example/bar.cwl" }, { "basename": "data.txt", "nameroot": "data", "class": "File", "nameext": ".txt", "location": "file:///example/data.txt" }, { "basename": "data2", "class": "Directory", "location": "file:///example/data2", "listing": [{ "basename": "data3.txt", "nameroot": "data3", "class": "File", "nameext": ".txt", "location": "file:///example/data3.txt", "secondaryFiles": [{ "class": "File", "basename": "data5.txt", "location": "file:///example/data5.txt", "nameext": ".txt", "nameroot": "data5" }] }] }, { "basename": "data4.txt", "nameroot": "data4", "class": "File", "nameext": ".txt", "location": "file:///example/data4.txt" }], sc) sc = cwltool.process.scandeps(obj["id"], obj, set(("run"), ), set(), loadref) sc.sort(key=lambda k: k["basename"]) self.assertEquals([{ "basename": "bar.cwl", "nameroot": "bar", "class": "File", "nameext": ".cwl", "location": "file:///example/bar.cwl" }], sc) class TestDedup(unittest.TestCase): def test_dedup(self): ex = [{ "class": "File", "location": "file:///example/a" }, { "class": "File", "location": "file:///example/a" }, { "class": "File", "location": "file:///example/d" }, { "class": "Directory", "location": "file:///example/c", "listing": [{ "class": "File", "location": "file:///example/d" }] }] self.assertEquals([{ "class": "File", "location": "file:///example/a" }, { "class": "Directory", "location": "file:///example/c", "listing": [{ "class": "File", "location": "file:///example/d" }] }], cwltool.pathmapper.dedup(ex)) class TestTypeCompare(unittest.TestCase): def test_typecompare(self): self.assertTrue(cwltool.workflow.can_assign_src_to_sink( {'items': ['string', 'null'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'})) self.assertTrue(cwltool.workflow.can_assign_src_to_sink( {'items': ['string'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'})) self.assertTrue(cwltool.workflow.can_assign_src_to_sink( {'items': ['string', 'null'], 'type': 'array'}, {'items': ['string'], 'type': 'array'})) self.assertFalse(cwltool.workflow.can_assign_src_to_sink( {'items': ['string'], 'type': 'array'}, {'items': ['int'], 'type': 'array'})) def test_typecomparestrict(self): self.assertTrue(cwltool.workflow.can_assign_src_to_sink( ['string', 'null'], ['string', 'null'], strict=True)) self.assertTrue(cwltool.workflow.can_assign_src_to_sink( ['string'], ['string', 'null'], strict=True)) 
self.assertFalse(cwltool.workflow.can_assign_src_to_sink( ['string', 'int'], ['string', 'null'], strict=True)) self.assertTrue(cwltool.workflow.can_assign_src_to_sink( {'items': ['string'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, strict=True)) self.assertFalse(cwltool.workflow.can_assign_src_to_sink( {'items': ['string', 'int'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, strict=True)) def test_recordcompare(self): src = { 'fields': [{ 'type': {'items': 'string', 'type': 'array'}, 'name': u'file:///home/chapmanb/drive/work/cwl/test_bcbio_cwl/run_info-cwl-workflow/wf-variantcall.cwl#vc_rec/vc_rec/description' }, { 'type': {'items': 'File', 'type': 'array'}, 'name': u'file:///home/chapmanb/drive/work/cwl/test_bcbio_cwl/run_info-cwl-workflow/wf-variantcall.cwl#vc_rec/vc_rec/vrn_file' }], 'type': 'record', 'name': u'file:///home/chapmanb/drive/work/cwl/test_bcbio_cwl/run_info-cwl-workflow/wf-variantcall.cwl#vc_rec/vc_rec' } sink = { 'fields': [{ 'type': {'items': 'string', 'type': 'array'}, 'name': u'file:///home/chapmanb/drive/work/cwl/test_bcbio_cwl/run_info-cwl-workflow/steps/vc_output_record.cwl#vc_rec/vc_rec/description' }, { 'type': {'items': 'File', 'type': 'array'}, 'name': u'file:///home/chapmanb/drive/work/cwl/test_bcbio_cwl/run_info-cwl-workflow/steps/vc_output_record.cwl#vc_rec/vc_rec/vrn_file' }], 'type': 'record', 'name': u'file:///home/chapmanb/drive/work/cwl/test_bcbio_cwl/run_info-cwl-workflow/steps/vc_output_record.cwl#vc_rec/vc_rec'} self.assertTrue(cwltool.workflow.can_assign_src_to_sink(src, sink)) self.assertFalse(cwltool.workflow.can_assign_src_to_sink(src, {'items': 'string', 'type': 'array'})) def test_typecheck(self): self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], ['string', 'int', 'null'], linkMerge=None, valueFrom=None), "pass") self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], ['string', 'null'], linkMerge=None, valueFrom=None), "warning") self.assertEquals(cwltool.workflow.check_types( ['File', 'int'], ['string', 'null'], linkMerge=None, valueFrom=None), "exception") self.assertEquals(cwltool.workflow.check_types( {'items': ['string', 'int'], 'type': 'array'}, {'items': ['string', 'int', 'null'], 'type': 'array'}, linkMerge=None, valueFrom=None), "pass") self.assertEquals(cwltool.workflow.check_types( {'items': ['string', 'int'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, linkMerge=None, valueFrom=None), "warning") self.assertEquals(cwltool.workflow.check_types( {'items': ['File', 'int'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, linkMerge=None, valueFrom=None), "exception") # check linkMerge when sinktype is not an array self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], ['string', 'int', 'null'], linkMerge="merge_nested", valueFrom=None), "exception") # check linkMerge: merge_nested self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], {'items': ['string', 'int', 'null'], 'type': 'array'}, linkMerge="merge_nested", valueFrom=None), "pass") self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_nested", valueFrom=None), "warning") self.assertEquals(cwltool.workflow.check_types( ['File', 'int'], {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_nested", valueFrom=None), "exception") # check linkMerge: merge_nested and sinktype is "Any" self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], "Any", 
linkMerge="merge_nested", valueFrom=None), "pass") # check linkMerge: merge_flattened self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], {'items': ['string', 'int', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "pass") self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "warning") self.assertEquals(cwltool.workflow.check_types( ['File', 'int'], {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "exception") self.assertEquals(cwltool.workflow.check_types( {'items': ['string', 'int'], 'type': 'array'}, {'items': ['string', 'int', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "pass") self.assertEquals(cwltool.workflow.check_types( {'items': ['string', 'int'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "warning") self.assertEquals(cwltool.workflow.check_types( {'items': ['File', 'int'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "exception") # check linkMerge: merge_flattened and sinktype is "Any" self.assertEquals(cwltool.workflow.check_types( ['string', 'int'], "Any", linkMerge="merge_flattened", valueFrom=None), "pass") self.assertEquals(cwltool.workflow.check_types( {'items': ['string', 'int'], 'type': 'array'}, "Any", linkMerge="merge_flattened", valueFrom=None), "pass") # check linkMerge: merge_flattened when srctype is a list self.assertEquals(cwltool.workflow.check_types( [{'items': 'string', 'type': 'array'}], {'items': 'string', 'type': 'array'}, linkMerge="merge_flattened", valueFrom=None), "pass") # check valueFrom self.assertEquals(cwltool.workflow.check_types( {'items': ['File', 'int'], 'type': 'array'}, {'items': ['string', 'null'], 'type': 'array'}, linkMerge="merge_flattened", valueFrom="special value"), "pass") def test_lifting(self): # check that lifting the types of the process outputs to the workflow step # fails if the step 'out' doesn't match. with self.assertRaises(schema_salad.validate.ValidationException): f = cwltool.factory.Factory() echo = f.make(get_data("tests/test_bad_outputs_wf.cwl")) self.assertEqual(echo(inp="foo"), {"out": "foo\n"}) def test_malformed_outputs(self): # check that tool validation fails if one of the outputs is not a valid CWL type f = cwltool.factory.Factory() with self.assertRaises(schema_salad.validate.ValidationException): echo = f.make(get_data("tests/wf/malformed_outputs.cwl")) echo() def test_separate_without_prefix(self): # check that setting 'separate = false' on an inputBinding without prefix fails the workflow with self.assertRaises(WorkflowException): f = cwltool.factory.Factory() echo = f.make(get_data("tests/wf/separate_without_prefix.cwl")) echo() def test_checker(self): # check that the static checker raises exception when a source type # mismatches its sink type. with self.assertRaises(schema_salad.validate.ValidationException): f = cwltool.factory.Factory() f.make("tests/checker_wf/broken-wf.cwl") with self.assertRaises(schema_salad.validate.ValidationException): f = cwltool.factory.Factory() f.make("tests/checker_wf/broken-wf2.cwl") class TestPrintDot(unittest.TestCase): def test_print_dot(self): # Require that --enable-ext is provided. 
self.assertEquals(main(["--print-dot", get_data('tests/wf/revsort.cwl')]), 0) class TestCmdLine(unittest.TestCase): def get_main_output(self, new_args): process = subprocess.Popen([ sys.executable, "-m", "cwltool" ] + new_args, stdout=subprocess.PIPE, stderr=subprocess.PIPE) stdout, stderr = process.communicate() return process.returncode, stdout.decode(), stderr.decode() class TestJsConsole(TestCmdLine): def test_js_console_cmd_line_tool(self): for test_file in ("js_output.cwl", "js_output_workflow.cwl"): error_code, stdout, stderr = self.get_main_output(["--js-console", "--no-container", get_data("tests/wf/" + test_file)]) self.assertIn("[log] Log message", stderr) self.assertIn("[err] Error message", stderr) self.assertEquals(error_code, 0, stderr) def test_no_js_console(self): for test_file in ("js_output.cwl", "js_output_workflow.cwl"): error_code, stdout, stderr = self.get_main_output(["--no-container", get_data("tests/wf/" + test_file)]) self.assertNotIn("[log] Log message", stderr) self.assertNotIn("[err] Error message", stderr) @pytest.mark.skipif(onWindows(), reason="Instance of cwltool is used, on Windows it invokes a default docker container" "which is not supported on AppVeyor") class TestCache(TestCmdLine): def test_wf_without_container(self): test_file = "hello-workflow.cwl" error_code, stdout, stderr = self.get_main_output(["--cachedir", "cache", get_data("tests/wf/" + test_file), "--usermessage", "hello"]) self.assertIn("completed success", stderr) self.assertEquals(error_code, 0) @pytest.mark.skipif(onWindows(), reason="Instance of cwltool is used, on Windows it invokes a default docker container" "which is not supported on AppVeyor") class TestChecksum(TestCmdLine): def test_compute_checksum(self): f = cwltool.factory.Factory(compute_checksum=True, use_container=False) echo = f.make(get_data("tests/wf/cat-tool.cwl")) output = echo(file1={ "class": "File", "location": get_data("tests/wf/whale.txt") }, reverse=False ) self.assertEquals(output['output']["checksum"], "sha1$327fc7aedf4f6b69a42a7c8b808dc5a7aff61376") def test_no_compute_checksum(self): test_file = "tests/wf/wc-tool.cwl" job_file = "tests/wf/wc-job.json" error_code, stdout, stderr = self.get_main_output(["--no-compute-checksum", get_data(test_file), get_data(job_file)]) self.assertIn("completed success", stderr) self.assertEquals(error_code, 0) self.assertNotIn("checksum", stdout) if __name__ == '__main__': unittest.main() cwltool-1.0.20180302231433/tests/test_docker_warning.py0000644000175200017520000000177013247251316023353 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest from mock import mock from cwltool.utils import windows_default_container_id from cwltool.command_line_tool import DEFAULT_CONTAINER_MSG, CommandLineTool class TestDefaultDockerWarning(unittest.TestCase): # Test to check warning when default docker Container is used on Windows @mock.patch("cwltool.command_line_tool.onWindows",return_value = True) @mock.patch("cwltool.command_line_tool._logger") def test_default_docker_warning(self,mock_logger,mock_windows): class TestCommandLineTool(CommandLineTool): def __init__(self, **kwargs): self.requirements=[] self.hints=[] def find_default_container(args, builder): return windows_default_container_id TestObject = TestCommandLineTool() TestObject.makeJobRunner() mock_logger.warning.assert_called_with(DEFAULT_CONTAINER_MSG%(windows_default_container_id, windows_default_container_id)) 
cwltool-1.0.20180302231433/tests/random_lines_mapping.cwl0000755000175200017520000000102513247251316023636 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool id: "random_lines" doc: "Select random lines from a file" inputs: - id: seed type: int inputBinding: position: 1 prefix: -s - id: input1 type: File inputBinding: position: 2 - id: num_lines type: int inputBinding: position: 3 outputs: output1: type: stdout baseCommand: ["random-lines"] arguments: [] hints: SoftwareRequirement: packages: - package: randomLines version: - '1.0.0-rc1' cwltool-1.0.20180302231433/tests/test_fetch.py0000644000175200017520000000341713247251316021450 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest from six.moves import urllib import schema_salad.main import schema_salad.ref_resolver import schema_salad.schema from cwltool.load_tool import load_tool from cwltool.main import main from cwltool.workflow import defaultMakeTool class FetcherTest(unittest.TestCase): def test_fetcher(self): class TestFetcher(schema_salad.ref_resolver.Fetcher): def __init__(self, a, b): pass def fetch_text(self, url): # type: (unicode) -> unicode if url == "baz:bar/foo.cwl": return """ cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: [] outputs: [] """ else: raise RuntimeError("Not foo.cwl, was %s" % url) def check_exists(self, url): # type: (unicode) -> bool if url == "baz:bar/foo.cwl": return True else: return False def urljoin(self, base, url): urlsp = urllib.parse.urlsplit(url) if urlsp.scheme: return url basesp = urllib.parse.urlsplit(base) if basesp.scheme == "keep": return base + "/" + url return urllib.parse.urljoin(base, url) def test_resolver(d, a): if a.startswith("baz:bar/"): return a else: return "baz:bar/" + a load_tool("foo.cwl", defaultMakeTool, resolver=test_resolver, fetcher_constructor=TestFetcher) self.assertEquals(0, main(["--print-pre", "--debug", "foo.cwl"], resolver=test_resolver, fetcher_constructor=TestFetcher)) cwltool-1.0.20180302231433/tests/test_pack.py0000644000175200017520000001464413247251316021301 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import json import unittest import os from functools import partial import tempfile import pytest from six import StringIO import cwltool.pack import cwltool.workflow from cwltool.resolver import tool_resolver from cwltool import load_tool from cwltool.load_tool import fetch_document, validate_document from cwltool.main import makeRelative, main, print_pack from cwltool.pathmapper import adjustDirObjs, adjustFileObjs from cwltool.utils import onWindows from .util import get_data class TestPack(unittest.TestCase): maxDiff = None def test_pack(self): load_tool.loaders = {} document_loader, workflowobj, uri = fetch_document( get_data("tests/wf/revsort.cwl")) document_loader, avsc_names, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) packed = cwltool.pack.pack(document_loader, processobj, uri, metadata) with open(get_data("tests/wf/expect_packed.cwl")) as f: expect_packed = json.load(f) adjustFileObjs(packed, partial(makeRelative, os.path.abspath(get_data("tests/wf")))) adjustDirObjs(packed, partial(makeRelative, os.path.abspath(get_data("tests/wf")))) self.assertIn("$schemas", packed) del packed["$schemas"] del expect_packed["$schemas"] self.assertEqual(expect_packed, packed) def test_pack_missing_cwlVersion(self): """Test to ensure the generated pack output is not missing the `cwlVersion` in case 
of single tool workflow and single step workflow""" # Testing single tool workflow document_loader, workflowobj, uri = fetch_document( get_data("tests/wf/hello_single_tool.cwl")) document_loader, _, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) # generate pack output dict packed = json.loads(print_pack(document_loader, processobj, uri, metadata)) self.assertEqual('v1.0', packed["cwlVersion"]) # Testing single step workflow document_loader, workflowobj, uri = fetch_document( get_data("tests/wf/hello-workflow.cwl")) document_loader, _, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) # generate pack output dict packed = json.loads(print_pack(document_loader, processobj, uri, metadata)) self.assertEqual('v1.0', packed["cwlVersion"]) def test_pack_idempotence_tool(self): """Test to ensure that pack produces exactly the same document for an already packed document""" # Testing single tool self._pack_idempotently("tests/wf/hello_single_tool.cwl") def test_pack_idempotence_workflow(self): """Test to ensure that pack produces exactly the same document for an already packed document""" # Testing workflow self._pack_idempotently("tests/wf/count-lines1-wf.cwl") def _pack_idempotently(self, document): document_loader, workflowobj, uri = fetch_document( get_data(document)) document_loader, avsc_names, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) # generate pack output dict packed = json.loads(print_pack(document_loader, processobj, uri, metadata)) document_loader, workflowobj, uri2 = fetch_document(packed) document_loader, avsc_names, processobj, metadata, uri2 = validate_document( document_loader, workflowobj, uri) double_packed = json.loads(print_pack(document_loader, processobj, uri2, metadata)) self.assertEqual(packed, double_packed) @pytest.mark.skipif(onWindows(), reason="Instance of cwltool is used, on Windows it invokes a default docker container" "which is not supported on AppVeyor") def test_packed_workflow_execution(self): load_tool.loaders = {} test_wf = "tests/wf/count-lines1-wf.cwl" test_wf_job = "tests/wf/wc-job.json" document_loader, workflowobj, uri = fetch_document( get_data(test_wf), resolver=tool_resolver) document_loader, avsc_names, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) packed = json.loads(print_pack(document_loader, processobj, uri, metadata)) temp_packed_path = tempfile.mkstemp()[1] with open(temp_packed_path, 'w') as f: json.dump(packed, f) normal_output = StringIO() packed_output = StringIO() self.assertEquals(main(['--debug', get_data(temp_packed_path), get_data(test_wf_job)], stdout=packed_output), 0) self.assertEquals(main([get_data(test_wf), get_data(test_wf_job)], stdout=normal_output), 0) self.assertEquals(json.loads(packed_output.getvalue()), json.loads(normal_output.getvalue())) os.remove(temp_packed_path) @pytest.mark.skipif(onWindows(), reason="Instance of cwltool is used, on Windows it invokes a default docker container" "which is not supported on AppVeyor") def test_preserving_namespaces(self): test_wf = "tests/wf/formattest.cwl" test_wf_job = "tests/wf/formattest-job.json" document_loader, workflowobj, uri = fetch_document( get_data(test_wf)) document_loader, avsc_names, processobj, metadata, uri = validate_document( document_loader, workflowobj, uri) packed = json.loads(print_pack(document_loader, processobj, uri, metadata)) assert "$namespaces" in packed temp_packed_path = tempfile.mkstemp()[1] with 
open(temp_packed_path, 'w') as f: json.dump(packed, f) normal_output = StringIO() packed_output = StringIO() self.assertEquals(main(['--debug', get_data(temp_packed_path), get_data(test_wf_job)], stdout=packed_output), 0) self.assertEquals(main([get_data(test_wf), get_data(test_wf_job)], stdout=normal_output), 0) self.assertEquals(json.loads(packed_output.getvalue()), json.loads(normal_output.getvalue())) os.remove(temp_packed_path) cwltool-1.0.20180302231433/tests/test_deps_env_resolvers_conf_rewrite.yml0000644000175200017520000000015213247251316027176 0ustar mcrusoemcrusoe00000000000000- type: galaxy_packages base_path: ./tests/test_deps_env mapping_files: ./tests/test_deps_mapping.yml cwltool-1.0.20180302231433/tests/echo-cwlrun-job.yaml0000644000175200017520000000014713247251316022625 0ustar mcrusoemcrusoe00000000000000cwl:tool: echo.cwl cwl:requirements: - class: DockerRequirement dockerPull: debian inp: "Hoopla!" cwltool-1.0.20180302231433/tests/seqtk_seq_with_docker.cwl0000755000175200017520000000077013247251316024040 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool id: "seqtk_seq" doc: "Convert to FASTA (seqtk)" inputs: - id: input1 type: File inputBinding: position: 1 prefix: "-a" outputs: - id: output1 type: File outputBinding: glob: out baseCommand: ["seqtk", "seq"] arguments: [] stdout: out hints: SoftwareRequirement: packages: - package: seqtk version: - '1.2' DockerRequirement: dockerPull: quay.io/biocontainers/seqtk:1.2--0 cwltool-1.0.20180302231433/tests/seqtk_seq_wrong_name.cwl0000755000175200017520000000103313247251316023663 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool id: "seqtk_seq" doc: "Convert to FASTA (seqtk)" inputs: - id: input1 type: File inputBinding: position: 1 prefix: "-a" outputs: - id: output1 type: File outputBinding: glob: out baseCommand: ["seqtk", "seq"] arguments: [] stdout: out hints: SoftwareRequirement: packages: - package: seqtk_seq version: - '1.2' specs: - https://anaconda.org/bioconda/seqtk - https://packages.debian.org/sid/seqtk cwltool-1.0.20180302231433/tests/test_rdfprint.py0000644000175200017520000000045713247251316022210 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest from six import StringIO from cwltool.main import main from .util import get_data class RDF_Print(unittest.TestCase): def test_rdf_print(self): self.assertEquals(main(['--print-rdf', get_data('tests/wf/hello_single_tool.cwl')]), 0) cwltool-1.0.20180302231433/tests/echo-job.yaml0000644000175200017520000000012313247251316021307 0ustar mcrusoemcrusoe00000000000000cwl:requirements: - class: DockerRequirement dockerPull: debian inp: "Howdy!" 
cwltool-1.0.20180302231433/tests/echo.cwl0000755000175200017520000000045113247251316020371 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool inputs: - id: inp type: string inputBinding: {} outputs: - id: out type: string outputBinding: glob: out.txt loadContents: true outputEval: $(self[0].contents) baseCommand: echo stdout: out.txtcwltool-1.0.20180302231433/tests/seqtk_seq_job.json0000644000175200017520000000010513247251316022461 0ustar mcrusoemcrusoe00000000000000{ "input1": { "class": "File", "location": "2.fastq" } } cwltool-1.0.20180302231433/tests/random_lines_job.json0000644000175200017520000000014413247251316023137 0ustar mcrusoemcrusoe00000000000000{ "input1": { "class": "File", "location": "2.fastq" }, "seed": 5, "num_lines": 2 } cwltool-1.0.20180302231433/tests/listing-job.yml0000644000175200017520000000004613247251316021705 0ustar mcrusoemcrusoe00000000000000d: class: Directory location: tmp1cwltool-1.0.20180302231433/tests/test_pathmapper.py0000644000175200017520000000327213247251316022517 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest from cwltool.pathmapper import PathMapper, normalizeFilesDirs class TestPathMapper(unittest.TestCase): def test_subclass(self): class SubPathMapper(PathMapper): def __init__(self, referenced_files, basedir, stagedir, new): super(SubPathMapper, self).__init__(referenced_files, basedir, stagedir) self.new = new a = SubPathMapper([], '', '', "new") self.assertTrue(a.new, "new") def test_strip_trailing(self): d = { "class": "Directory", "location": "/foo/bar/" } normalizeFilesDirs(d) self.assertEqual( { "class": "Directory", "location": "/foo/bar", "basename": "bar" }, d) def test_basename_field_generation(self): base_file = { "class": "File", "location": "/foo/" } # (filename, expected: (nameroot, nameext)) testdata = [ ("foo.bar", ("foo", ".bar")), ("foo", ("foo", '')), (".foo", (".foo", '')), ("foo.", ("foo", '.')), ("foo.bar.baz", ("foo.bar", ".baz")) ] for filename, (nameroot, nameext) in testdata: file = dict(base_file) file["location"] = file["location"] + filename expected = dict(file) expected["basename"] = filename expected["nameroot"] = nameroot expected["nameext"] = nameext normalizeFilesDirs(file) self.assertEqual(file, expected) cwltool-1.0.20180302231433/tests/wf/0000755000175200017520000000000013247251336017357 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/wf/expect_packed.cwl0000644000175200017520000001000013247251316022655 0ustar mcrusoemcrusoe00000000000000{ "cwlVersion": "v1.0", "$schemas": ["file:///home/peter/work/cwltool/tests/wf/empty.ttl"], "$graph": [ { "inputs": [ { "doc": "The input file to be processed.", "type": "File", "id": "#main/input", "default": { "class": "File", "location": "hello.txt" } }, { "default": true, "doc": "If true, reverse (descending) sort", "type": "boolean", "id": "#main/reverse_sort" } ], "doc": "Reverse the lines in a document, then sort those lines.", "class": "Workflow", "steps": [ { "out": [ "#main/rev/output" ], "run": "#revtool.cwl", "id": "#main/rev", "in": [ { "source": "#main/input", "id": "#main/rev/input" } ] }, { "out": [ "#main/sorted/output" ], "run": "#sorttool.cwl", "id": "#main/sorted", "in": [ { "source": "#main/rev/output", "id": "#main/sorted/input" }, { "source": "#main/reverse_sort", "id": "#main/sorted/reverse" } ] } ], "outputs": [ { "outputSource": "#main/sorted/output", "type": "File", "id": "#main/output", "doc": "The output with the lines reversed and
sorted." } ], "id": "#main", "hints": [ { "dockerPull": "debian:8", "class": "DockerRequirement" } ] }, { "inputs": [ { "inputBinding": {}, "type": "File", "id": "#revtool.cwl/input" } ], "stdout": "output.txt", "doc": "Reverse each line using the `rev` command", "baseCommand": "rev", "class": "CommandLineTool", "outputs": [ { "outputBinding": { "glob": "output.txt" }, "type": "File", "id": "#revtool.cwl/output" } ], "id": "#revtool.cwl" }, { "inputs": [ { "inputBinding": { "position": 1, "prefix": "--reverse" }, "type": "boolean", "id": "#sorttool.cwl/reverse" }, { "inputBinding": { "position": 2 }, "type": "File", "id": "#sorttool.cwl/input" } ], "stdout": "output.txt", "doc": "Sort lines using the `sort` command", "baseCommand": "sort", "class": "CommandLineTool", "outputs": [ { "outputBinding": { "glob": "output.txt" }, "type": "File", "id": "#sorttool.cwl/output" } ], "id": "#sorttool.cwl" } ] } cwltool-1.0.20180302231433/tests/wf/iwdr-entry.cwl0000644000175200017520000000063613247251316022175 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 baseCommand: ["cat", "example.conf"] requirements: InitialWorkDirRequirement: listing: - entryname: example.conf entry: | CONFIGVAR=$(inputs.message) inputs: message: string outputs: out: type: string outputBinding: glob: example.conf loadContents: true outputEval: $(self[0].contents)cwltool-1.0.20180302231433/tests/wf/wrong_cwlVersion.cwl0000755000175200017520000000106513247251316023440 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v0.1 class: Workflow label: "Hello World" doc: "Outputs a message using echo" inputs: [] outputs: response: outputSource: step0/response type: File steps: step0: run: class: CommandLineTool inputs: message: type: string doc: "The message to print" default: "Hello World" inputBinding: position: 1 baseCommand: echo stdout: response.txt outputs: response: type: stdout in: [] out: [response] cwltool-1.0.20180302231433/tests/wf/wc-tool.cwl0000755000175200017520000000034413247251316021454 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 inputs: file1: File outputs: output: type: File outputBinding: { glob: output } baseCommand: [wc, -l] stdin: $(inputs.file1.path) stdout: output cwltool-1.0.20180302231433/tests/wf/badout2.cwl0000755000175200017520000000046013247251316021427 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 baseCommand: touch arguments: [file1] requirements: InlineJavascriptRequirement: {} inputs: [] outputs: out: type: Directory outputBinding: outputEval: | $({"class": "Directory", "path": runtime.outdir+"/file1"})cwltool-1.0.20180302231433/tests/wf/cat.cwl0000755000175200017520000000020213247251316020630 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 inputs: r: File outputs: [] arguments: [cat, $(inputs.r.path)]cwltool-1.0.20180302231433/tests/wf/revsort.cwl0000755000175200017520000000407113247251316021575 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner # # This is a two-step workflow which uses "revtool" and "sorttool" defined above. # class: Workflow doc: "Reverse the lines in a document, then sort those lines." cwlVersion: v1.0 # Requirements & hints specify prerequisites and extensions to the workflow. # In this example, DockerRequirement specifies a default Docker container # in which the command line tools will execute. 
hints: - class: DockerRequirement dockerPull: debian:8 # The inputs array defines the structure of the input object that describes # the inputs to the workflow. # # The "reverse_sort" input parameter demonstrates the "default" field. If the # field "reverse_sort" is not provided in the input object, the default value will # be used. inputs: input: type: File doc: "The input file to be processed." default: class: File location: hello.txt reverse_sort: type: boolean default: true doc: "If true, reverse (descending) sort" # The "outputs" array defines the structure of the output object that describes # the outputs of the workflow. # # Each output field must be connected to the output of one of the workflow # steps using the "connect" field. Here, the parameter "#output" of the # workflow comes from the "#sorted" output of the "sort" step. outputs: output: type: File outputSource: sorted/output doc: "The output with the lines reversed and sorted." # The "steps" array lists the executable steps that make up the workflow. # The tool to execute each step is listed in the "run" field. # # In the first step, the "inputs" field of the step connects the upstream # parameter "#input" of the workflow to the input parameter of the tool # "revtool.cwl#input" # # In the second step, the "inputs" field of the step connects the output # parameter "#reversed" from the first step to the input parameter of the # tool "sorttool.cwl#input". steps: rev: in: input: input out: [output] run: revtool.cwl sorted: in: input: rev/output reverse: reverse_sort out: [output] run: sorttool.cwl cwltool-1.0.20180302231433/tests/wf/js_output_workflow.cwl0000755000175200017520000000055413247251316024061 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: Workflow cwlVersion: v1.0 requirements: - class: InlineJavascriptRequirement inputs: [] outputs: [] steps: - id: js_log in: [] out: [] run: class: ExpressionTool inputs: [] outputs: [] expression: ${console.log("Log message");console.error("Error message");return ["python", "-c", "True"];}cwltool-1.0.20180302231433/tests/wf/updatedir.cwl0000755000175200017520000000047613247251316022057 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 requirements: InitialWorkDirRequirement: listing: - entry: $(inputs.r) entryname: inp writable: true inputs: r: Directory outputs: out: type: Directory outputBinding: glob: inp arguments: [touch, inp/blurb]cwltool-1.0.20180302231433/tests/wf/listing_shallow.cwl0000755000175200017520000000044413247251316023273 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 $namespaces: cwltool: http://commonwl.org/cwltool# requirements: cwltool:LoadListingRequirement: loadListing: shallow_listing inputs: d: Directory outputs: [] arguments: [echo, "$(inputs.d.listing[0].listing[0])"] cwltool-1.0.20180302231433/tests/wf/scatter-wf4.cwl0000755000175200017520000000154513247251316022237 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 $graph: - id: echo class: CommandLineTool inputs: echo_in1: type: string inputBinding: {} echo_in2: type: string inputBinding: {} outputs: echo_out: type: string outputBinding: glob: "step1_out" loadContents: true outputEval: $(self[0].contents) baseCommand: "echo" arguments: ["-n", "foo"] stdout: step1_out - id: main class: Workflow inputs: inp1: string[] inp2: string[] requirements: - class: ScatterFeatureRequirement steps: step1: scatter: [echo_in1, echo_in2] scatterMethod: dotproduct in:
echo_in1: inp1 echo_in2: inp2 out: [echo_out] run: "#echo" outputs: - id: out outputSource: step1/echo_out type: type: array items: string cwltool-1.0.20180302231433/tests/wf/wffail.cwl0000755000175200017520000000117713247251316021345 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: Workflow cwlVersion: v1.0 inputs: [] requirements: StepInputExpressionRequirement: {} outputs: out1: type: File outputSource: step1/out out2: type: File outputSource: step2/out out4: type: File outputSource: step4/out steps: step1: in: r: {default: "1"} out: [out] run: echo.cwl step2: in: r: {default: "2"} out: [out] run: echo.cwl step3: in: r: {default: "5"} out: [out] run: echo.cwl step4: in: r: source: step3/out valueFrom: $(inputs.r.basename) out: [out] run: echo.cwl cwltool-1.0.20180302231433/tests/wf/updatedir_inplace.cwl0000755000175200017520000000065713247251316023553 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 $namespaces: cwltool: http://commonwl.org/cwltool# requirements: InitialWorkDirRequirement: listing: - entry: $(inputs.r) entryname: inp writable: true cwltool:InplaceUpdateRequirement: inplaceUpdate: true inputs: r: Directory outputs: out: type: Directory outputBinding: glob: inp arguments: [touch, inp/blurb]cwltool-1.0.20180302231433/tests/wf/default_path.cwl0000755000175200017520000000032413247251316022526 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool inputs: - id: "file1" type: File default: class: File path: default.txt outputs: [] arguments: [cat, $(inputs.file1.path)] cwltool-1.0.20180302231433/tests/wf/updateval_inplace.cwl0000755000175200017520000000102413247251316023544 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 $namespaces: cwltool: "http://commonwl.org/cwltool#" requirements: InitialWorkDirRequirement: listing: - entry: $(inputs.r) writable: true cwltool:InplaceUpdateRequirement: inplaceUpdate: true inputs: r: File script: type: File default: class: File location: updateval.py outputs: out: type: File outputBinding: glob: $(inputs.r.basename) arguments: [python, $(inputs.script), $(inputs.r.basename)]cwltool-1.0.20180302231433/tests/wf/mut.cwl0000755000175200017520000000035613247251316020700 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow inputs: a: File outputs: [] steps: step1: in: r: a out: [] run: updateval_inplace.cwl step2: in: r: a out: [] run: updateval_inplace.cwl cwltool-1.0.20180302231433/tests/wf/updateval.cwl0000755000175200017520000000064113247251316022055 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 requirements: InitialWorkDirRequirement: listing: - entry: $(inputs.r) writable: true inputs: r: File script: type: File default: class: File location: updateval.py outputs: out: type: File outputBinding: glob: $(inputs.r.basename) arguments: [python, $(inputs.script), $(inputs.r.basename)]cwltool-1.0.20180302231433/tests/wf/updateval.py0000644000175200017520000000014313247251316021712 0ustar mcrusoemcrusoe00000000000000import sys f = open(sys.argv[1], "r+") val = int(f.read()) f.seek(0) f.write(str(val+1)) f.close() cwltool-1.0.20180302231433/tests/wf/empty.ttl0000644000175200017520000000000013247251316021226 0ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/wf/js_output.cwl0000755000175200017520000000043213247251316022122 0ustar 
mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 requirements: - class: InlineJavascriptRequirement inputs: [] outputs: [] arguments: - valueFrom: ${console.log("Log message");console.error("Error message");return ["python", "-c", "True"]} shellQuote: falsecwltool-1.0.20180302231433/tests/wf/missing_cwlVersion.cwl0000755000175200017520000000104413247251316023752 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: Workflow label: "Hello World" doc: "Outputs a message using echo" inputs: [] outputs: response: outputSource: step0/response type: File steps: step0: run: class: CommandLineTool inputs: message: type: string doc: "The message to print" default: "Hello World" inputBinding: position: 1 baseCommand: echo stdout: response.txt outputs: response: type: stdout in: [] out: [response] cwltool-1.0.20180302231433/tests/wf/whale.txt0000644000175200017520000000212713247251316021220 0ustar mcrusoemcrusoe00000000000000Call me Ishmael. Some years ago--never mind how long precisely--having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world. It is a way I have of driving off the spleen and regulating the circulation. Whenever I find myself growing grim about the mouth; whenever it is a damp, drizzly November in my soul; whenever I find myself involuntarily pausing before coffin warehouses, and bringing up the rear of every funeral I meet; and especially whenever my hypos get such an upper hand of me, that it requires a strong moral principle to prevent me from deliberately stepping into the street, and methodically knocking people's hats off--then, I account it high time to get to sea as soon as I can. This is my substitute for pistol and ball. With a philosophical flourish Cato throws himself upon his sword; I quietly take to the ship. There is nothing surprising in this. If they but knew it, almost all men in their degree, some time or other, cherish very nearly the same feelings towards the ocean with me. 
cwltool-1.0.20180302231433/tests/wf/scatter-job2.json0000755000175200017520000000007613247251316022555 0ustar mcrusoemcrusoe00000000000000{ "inp1": ["one", "two"], "inp2": ["three", "four"] } cwltool-1.0.20180302231433/tests/wf/echo.cwl0000755000175200017520000000110113247251316020776 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 inputs: r: string script: type: string default: | from __future__ import print_function import sys print(sys.argv[1]) if sys.argv[1] == "2": exit(1) else: f = open("foo"+sys.argv[1]+".txt", "wb") content = sys.argv[1]+"\n" f.write(content.encode('utf-8')) if sys.argv[1] == "5": exit(1) outputs: out: type: File outputBinding: glob: foo$(inputs.r).txt arguments: [python, -c, $(inputs.script), $(inputs.r)] cwltool-1.0.20180302231433/tests/wf/parseInt-tool.cwl0000755000175200017520000000042713247251316022632 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: ExpressionTool requirements: - class: InlineJavascriptRequirement cwlVersion: v1.0 inputs: file1: type: File inputBinding: { loadContents: true } outputs: output: int expression: "$({'output': parseInt(inputs.file1.contents)})" cwltool-1.0.20180302231433/tests/wf/malformed_outputs.cwl0000644000175200017520000000012513247251316023633 0ustar mcrusoemcrusoe00000000000000cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: [] outputs: - cwltool-1.0.20180302231433/tests/wf/vf-concat.cwl0000644000175200017520000000055613247251316021752 0ustar mcrusoemcrusoe00000000000000cwlVersion: v1.0 class: CommandLineTool requirements: - class: InlineJavascriptRequirement baseCommand: echo inputs: - id: parameter type: string? inputBinding: valueFrom: $("a ")$("sting") outputs: out: type: string outputBinding: glob: output.txt loadContents: true outputEval: $(self[0].contents) stdout: output.txt cwltool-1.0.20180302231433/tests/wf/listing_deep.cwl0000755000175200017520000000044313247251316022536 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 $namespaces: cwltool: "http://commonwl.org/cwltool#" requirements: cwltool:LoadListingRequirement: loadListing: deep_listing inputs: d: Directory outputs: [] arguments: [echo, "$(inputs.d.listing[0].listing[0])"] cwltool-1.0.20180302231433/tests/wf/badout3.cwl0000755000175200017520000000045313247251316021432 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 baseCommand: touch arguments: [file1] requirements: InlineJavascriptRequirement: {} inputs: [] outputs: out: type: Directory outputBinding: outputEval: | $({"class": "File", "path": runtime.outdir+"/file1"})cwltool-1.0.20180302231433/tests/wf/listing_v1_0.cwl0000755000175200017520000000023613247251316022366 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 inputs: d: Directory outputs: [] arguments: [echo, "$(inputs.d.listing[0].listing[0])"] cwltool-1.0.20180302231433/tests/wf/hello_single_tool.cwl0000755000175200017520000000025013247251316023565 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: message: type: string inputBinding: position: 1 outputs: [] cwltool-1.0.20180302231433/tests/wf/listing_none.cwl0000755000175200017520000000042413247251316022557 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 $namespaces: cwltool: http://commonwl.org/cwltool# requirements: 
cwltool:LoadListingRequirement: loadListing: no_listing inputs: d: Directory outputs: [] arguments: [echo, "$(inputs.d.listing[0])"] cwltool-1.0.20180302231433/tests/wf/mut3.cwl0000755000175200017520000000043113247251316020755 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow inputs: a: File outputs: [] steps: step1: in: r: a out: [] run: cat.cwl step2: in: r: a out: [] run: cat.cwl step3: in: r: a out: [] run: updateval_inplace.cwl cwltool-1.0.20180302231433/tests/wf/badout1.cwl0000755000175200017520000000044613247251316021432 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 baseCommand: touch arguments: [file1] requirements: InlineJavascriptRequirement: {} inputs: [] outputs: out: type: File outputBinding: outputEval: | $({"class": "File", "path": runtime.outdir+"/file2"})cwltool-1.0.20180302231433/tests/wf/cat-tool.cwl0000644000175200017520000000034113247251316021604 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: CommandLineTool cwlVersion: v1.0 inputs: file1: File outputs: output: type: File outputBinding: { glob: output } baseCommand: [cat] stdin: $(inputs.file1.path) stdout: output cwltool-1.0.20180302231433/tests/wf/revtool.cwl0000755000175200017520000000250713247251316021565 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner # # Simplest example command line program wrapper for the Unix tool "rev". # class: CommandLineTool cwlVersion: v1.0 doc: "Reverse each line using the `rev` command" $schemas: - empty.ttl # The "inputs" array defines the structure of the input object that describes # the inputs to the underlying program. Here, there is one input field # defined that will be called "input" and will contain a "File" object. # # The input binding indicates that the input value should be turned into a # command line argument. In this example inputBinding is an empty object, # which indicates that the file name should be added to the command line at # a default location. inputs: input: type: File inputBinding: {} # The "outputs" array defines the structure of the output object that # describes the outputs of the underlying program. Here, there is one # output field defined that will be called "output", which must be of "File" type, # and after the program executes, the output value will be the file # output.txt in the designated output directory. outputs: output: type: File outputBinding: glob: output.txt # The actual program to execute. baseCommand: rev # Specify that the standard output stream must be redirected to a file called # output.txt in the designated output directory. 
stdout: output.txt cwltool-1.0.20180302231433/tests/wf/hello-workflow.cwl0000755000175200017520000000122113247251316023036 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow label: "Hello World" doc: "Outputs a message using echo" inputs: usermessage: string outputs: response: outputSource: step0/response type: File steps: step0: run: class: CommandLineTool inputs: message: type: string doc: "The message to print" default: "Hello World" inputBinding: position: 1 baseCommand: echo arguments: - "-n" - "-e" stdout: response.txt outputs: response: type: stdout in: message: usermessage out: [response]cwltool-1.0.20180302231433/tests/wf/separate_without_prefix.cwl0000644000175200017520000000052113247251316025026 0ustar mcrusoemcrusoe00000000000000cwlVersion: v1.0 class: CommandLineTool baseCommand: echo inputs: src: type: string default: string inputBinding: position: 1 separate: false stdout: output.txt outputs: output: type: string outputBinding: glob: output.txt loadContents: true outputEval: $(self[0].contents) cwltool-1.0.20180302231433/tests/wf/scatterfail.cwl0000755000175200017520000000127713247251316022377 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner class: Workflow cwlVersion: v1.0 requirements: ScatterFeatureRequirement: {} SubworkflowFeatureRequirement: {} inputs: range: type: string[] default: ["1", "2", "3"] outputs: out: type: File[] outputSource: step1/out steps: step1: in: r: range scatter: r out: [out] run: class: Workflow id: subtool inputs: r: string outputs: out: type: File outputSource: sstep1/out steps: sstep1: in: r: r out: [out] run: echo.cwl sstep2: in: r: sstep1/out out: [] run: cat.cwl cwltool-1.0.20180302231433/tests/wf/mut2.cwl0000755000175200017520000000045313247251316020760 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow inputs: a: File outputs: out: type: File outputSource: step2/out steps: step1: in: r: a out: [out] run: updateval_inplace.cwl step2: in: r: step1/out out: [out] run: updateval_inplace.cwl cwltool-1.0.20180302231433/tests/wf/wc-job.json0000644000175200017520000000012213247251316021424 0ustar mcrusoemcrusoe00000000000000{ "file1": { "class": "File", "location": "whale.txt" } } cwltool-1.0.20180302231433/tests/wf/sorttool.cwl0000755000175200017520000000207613247251316021761 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner # Example command line program wrapper for the Unix tool "sort" # demonstrating command line flags. class: CommandLineTool doc: "Sort lines using the `sort` command" cwlVersion: v1.0 # This example is similar to the previous one, with an additional input # parameter called "reverse". It is a boolean parameter, which is # interpreted as a command line flag. The value of "prefix" is used as the # flag to put on the command line if "reverse" is true; if "reverse" is # false, no flag is added. # # This example also introduces the "position" field. This indicates the # sorting order of items on the command line. Lower numbers are placed # before higher numbers. Here, the "--reverse" flag (if present) will be # added to the command line before the input file path. 
inputs: - id: reverse type: boolean inputBinding: position: 1 prefix: "--reverse" - id: input type: File inputBinding: position: 2 outputs: - id: output type: File outputBinding: glob: output.txt baseCommand: sort stdout: output.txt cwltool-1.0.20180302231433/tests/wf/count-lines1-wf.cwl0000755000175200017520000000046513247251316023027 0ustar mcrusoemcrusoe00000000000000class: Workflow cwlVersion: v1.0 inputs: file1: type: File outputs: count_output: type: int outputSource: step2/output steps: step1: run: wc-tool.cwl in: file1: file1 out: [output] step2: run: parseInt-tool.cwl in: file1: step1/output out: [output] cwltool-1.0.20180302231433/tests/wf/formattest-job.json0000644000175200017520000000017013247251316023206 0ustar mcrusoemcrusoe00000000000000{ "input": { "class": "File", "location": "whale.txt", "format": "edam:format_2330" } } cwltool-1.0.20180302231433/tests/wf/hello.txt0000644000175200017520000000004313247251316021216 0ustar mcrusoemcrusoe00000000000000Hello world testing one two three. cwltool-1.0.20180302231433/tests/wf/revsort-job.json0000644000175200017520000000010613247251316022521 0ustar mcrusoemcrusoe00000000000000{ "input": { "class": "File", "location": "whale.txt" } } cwltool-1.0.20180302231433/tests/wf/formattest.cwl0000755000175200017520000000060313247251316022256 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner $namespaces: edam: "http://edamontology.org/" cwlVersion: v1.0 class: CommandLineTool doc: "Reverse each line using the `rev` command" inputs: input: type: File inputBinding: {} format: edam:format_2330 outputs: output: type: File outputBinding: glob: output.txt format: edam:format_2330 baseCommand: rev stdout: output.txtcwltool-1.0.20180302231433/tests/test_bad_outputs_wf.cwl0000755000175200017520000000105613247251316023541 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: Workflow inputs: [] outputs: b: type: string outputSource: step2/c steps: step1: in: [] out: [c] run: class: CommandLineTool id: subtool inputs: [] outputs: b: type: string outputBinding: outputEval: "qq" baseCommand: echo step2: in: a: step1/c out: [c] run: class: CommandLineTool id: subtool inputs: a: string outputs: b: string baseCommand: echocwltool-1.0.20180302231433/tests/test_deps_env_resolvers_conf.yml0000644000175200017520000000007413247251316025440 0ustar mcrusoemcrusoe00000000000000- type: galaxy_packages base_path: ./tests/test_deps_env cwltool-1.0.20180302231433/tests/seqtk_seq.cwl0000755000175200017520000000065613247251316021461 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool id: "seqtk_seq" doc: "Convert to FASTA (seqtk)" inputs: - id: input1 type: File inputBinding: position: 1 prefix: "-a" outputs: - id: output1 type: File outputBinding: glob: out baseCommand: ["seqtk", "seq"] arguments: [] stdout: out hints: SoftwareRequirement: packages: - package: seqtk version: - r93 cwltool-1.0.20180302231433/tests/test_cwl_version.py0000644000175200017520000000074313247251316022710 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest from cwltool.main import main from .util import get_data class CWL_Version_Checks(unittest.TestCase): # no cwlVersion in the workflow def test_missing_cwl_version(self): self.assertEqual(main([get_data('tests/wf/missing_cwlVersion.cwl')]), 1) # using cwlVersion: v0.1 in the workflow def test_incorrect_cwl_version(self): self.assertEqual(main([get_data('tests/wf/wrong_cwlVersion.cwl')]), 1) 
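Both version checks above lean on the ``get_data`` helper imported from the suite's ``util`` module, which is not reproduced in this section. A plausible stand-in, under the assumption that the helper only needs to resolve fixture paths relative to the repository root (the real implementation may differ):

.. code:: python

    # Hypothetical stand-in for the tests' get_data helper; illustrative only.
    import os

    def get_data(filename):
        # Resolve a fixture path like 'tests/wf/missing_cwlVersion.cwl'
        # relative to the directory above this file, so tests pass no matter
        # what the current working directory is.
        root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
        return os.path.join(root, filename)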
cwltool-1.0.20180302231433/tests/tmp1/0000755000175200017520000000000013247251336017624 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/tmp1/tmp2/0000755000175200017520000000000013247251336020506 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/tmp1/tmp2/tmp3/0000755000175200017520000000000013247251336021371 5ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/tmp1/tmp2/tmp3/.gitkeep0000644000175200017520000000000013247251316023006 0ustar mcrusoemcrusoe00000000000000cwltool-1.0.20180302231433/tests/test_deps_mapping.yml0000644000175200017520000000014513247251316023171 0ustar mcrusoemcrusoe00000000000000- from: name: randomLines version: 1.0.0-rc1 to: name: random-lines version: '1.0' cwltool-1.0.20180302231433/tests/random_lines.cwl0000755000175200017520000000102213247251316022120 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env cwl-runner cwlVersion: v1.0 class: CommandLineTool id: "random_lines" doc: "Select random lines from a file" inputs: - id: seed type: int inputBinding: position: 1 prefix: -s - id: input1 type: File inputBinding: position: 2 - id: num_lines type: int inputBinding: position: 3 outputs: output1: type: stdout baseCommand: ["random-lines"] arguments: [] hints: SoftwareRequirement: packages: - package: 'random-lines' version: - '1.0' cwltool-1.0.20180302231433/tests/test_ext.py0000644000175200017520000001364713247251316021165 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import os import shutil import tempfile import unittest import pytest import cwltool.expression as expr import cwltool.pathmapper import cwltool.process import cwltool.workflow from cwltool.main import main from cwltool.utils import onWindows from .util import get_data @pytest.mark.skipif(onWindows(), reason="cwltool is used directly; on Windows it would invoke a default Docker container") class TestListing(unittest.TestCase): def test_missing_enable_ext(self): # Require that --enable-ext is provided. self.assertEqual(main([get_data('tests/wf/listing_deep.cwl'), get_data('tests/listing-job.yml')]), 1) def test_listing_deep(self): # Should succeed. self.assertEqual(main(["--enable-ext", get_data('tests/wf/listing_deep.cwl'), get_data('tests/listing-job.yml')]), 0) def test_listing_shallow(self): # This fails on purpose, because it tries to access listing in a subdirectory the same way that listing_deep does, # but it shouldn't be expanded. self.assertEqual(main(["--enable-ext", get_data('tests/wf/listing_shallow.cwl'), get_data('tests/listing-job.yml')]), 1) def test_listing_none(self): # This fails on purpose, because it tries to access listing but it shouldn't be there. self.assertEqual(main(["--enable-ext", get_data('tests/wf/listing_none.cwl'), get_data('tests/listing-job.yml')]), 1) def test_listing_v1_0(self): # Default behavior in 1.0 is deep expansion. 
self.assertEqual(main([get_data('tests/wf/listing_v1_0.cwl'), get_data('tests/listing-job.yml')]), 0) # def test_listing_v1_1(self): # # Default behavior in 1.1 will be no expansion # self.assertEqual(main([get_data('tests/wf/listing_v1_1.cwl'), get_data('tests/listing-job.yml')]), 1) @pytest.mark.skipif(onWindows(), reason="InplaceUpdate uses symlinks, which do not work on Windows without admin privileges") class TestInplaceUpdate(unittest.TestCase): def test_updateval(self): try: tmp = tempfile.mkdtemp() with open(os.path.join(tmp, "value"), "w") as f: f.write("1") out = tempfile.mkdtemp() self.assertEqual(main(["--outdir", out, get_data('tests/wf/updateval.cwl'), "-r", os.path.join(tmp, "value")]), 0) with open(os.path.join(tmp, "value"), "r") as f: self.assertEqual("1", f.read()) with open(os.path.join(out, "value"), "r") as f: self.assertEqual("2", f.read()) finally: shutil.rmtree(tmp) shutil.rmtree(out) def test_updateval_inplace(self): try: tmp = tempfile.mkdtemp() with open(os.path.join(tmp, "value"), "w") as f: f.write("1") out = tempfile.mkdtemp() self.assertEqual(main(["--enable-ext", "--leave-outputs", "--outdir", out, get_data('tests/wf/updateval_inplace.cwl'), "-r", os.path.join(tmp, "value")]), 0) with open(os.path.join(tmp, "value"), "r") as f: self.assertEqual("2", f.read()) self.assertFalse(os.path.exists(os.path.join(out, "value"))) finally: shutil.rmtree(tmp) shutil.rmtree(out) def test_write_write_conflict(self): try: tmp = tempfile.mkdtemp() with open(os.path.join(tmp, "value"), "w") as f: f.write("1") self.assertEqual(main(["--enable-ext", get_data('tests/wf/mut.cwl'), "-a", os.path.join(tmp, "value")]), 1) with open(os.path.join(tmp, "value"), "r") as f: self.assertEqual("2", f.read()) finally: shutil.rmtree(tmp) def test_sequencing(self): try: tmp = tempfile.mkdtemp() with open(os.path.join(tmp, "value"), "w") as f: f.write("1") self.assertEqual(main(["--enable-ext", get_data('tests/wf/mut2.cwl'), "-a", os.path.join(tmp, "value")]), 0) with open(os.path.join(tmp, "value"), "r") as f: self.assertEqual("3", f.read()) finally: shutil.rmtree(tmp) # def test_read_write_conflict(self): # try: # tmp = tempfile.mkdtemp() # with open(os.path.join(tmp, "value"), "w") as f: # f.write("1") # self.assertEqual(main(["--enable-ext", get_data('tests/wf/mut3.cwl'), "-a", os.path.join(tmp, "value")]), 0) # finally: # shutil.rmtree(tmp) def test_updatedir(self): try: tmp = tempfile.mkdtemp() with open(os.path.join(tmp, "value"), "w") as f: f.write("1") out = tempfile.mkdtemp() self.assertFalse(os.path.exists(os.path.join(tmp, "blurb"))) self.assertFalse(os.path.exists(os.path.join(out, "blurb"))) self.assertEqual(main(["--outdir", out, get_data('tests/wf/updatedir.cwl'), "-r", tmp]), 0) self.assertFalse(os.path.exists(os.path.join(tmp, "blurb"))) self.assertTrue(os.path.exists(os.path.join(out, "inp/blurb"))) finally: shutil.rmtree(tmp) shutil.rmtree(out) def test_updatedir_inplace(self): try: tmp = tempfile.mkdtemp() with open(os.path.join(tmp, "value"), "w") as f: f.write("1") out = tempfile.mkdtemp() self.assertFalse(os.path.exists(os.path.join(tmp, "blurb"))) self.assertFalse(os.path.exists(os.path.join(out, "blurb"))) self.assertEqual(main(["--enable-ext", "--leave-outputs", "--outdir", out, get_data('tests/wf/updatedir_inplace.cwl'), "-r", tmp]), 0) self.assertTrue(os.path.exists(os.path.join(tmp, "blurb"))) self.assertFalse(os.path.exists(os.path.join(out, "inp/blurb"))) finally: shutil.rmtree(tmp) shutil.rmtree(out) 
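Every method in TestInplaceUpdate above repeats the same mkdtemp/write/rmtree boilerplate around a single "value" file. A sketch of how that setup could be factored into a reusable context manager; this helper does not exist in the repository and is shown only to make the recurring pattern explicit:

.. code:: python

    # Hypothetical helper mirroring the fixture setup repeated in
    # TestInplaceUpdate; pure standard library, cleans up even on failure.
    import contextlib
    import os
    import shutil
    import tempfile

    @contextlib.contextmanager
    def value_file(initial="1"):
        # Create a scratch directory holding a single "value" file seeded
        # with `initial`, yield its path, and remove the directory afterwards.
        tmp = tempfile.mkdtemp()
        try:
            path = os.path.join(tmp, "value")
            with open(path, "w") as f:
                f.write(initial)
            yield path
        finally:
            shutil.rmtree(tmp)

Usage would follow the tests above, e.g. ``with value_file() as path: main(["--enable-ext", get_data('tests/wf/mut2.cwl'), "-a", path])``, keeping each test focused on the assertion rather than the cleanup.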
cwltool-1.0.20180302231433/tests/test_check.py0000644000175200017520000000133313247251316021427 0ustar mcrusoemcrusoe00000000000000from __future__ import absolute_import import unittest import cwltool.expression as expr import cwltool.factory import cwltool.pathmapper import cwltool.process import cwltool.workflow import pytest from cwltool.main import main from cwltool.utils import onWindows from .util import get_data class TestCheck(unittest.TestCase): @pytest.mark.skipif(onWindows(), reason="cwltool is used directly; on Windows it would invoke a default Docker container") def test_output_checking(self): self.assertEqual(main([get_data('tests/wf/badout1.cwl')]), 1) self.assertEqual(main([get_data('tests/wf/badout2.cwl')]), 1) self.assertEqual(main([get_data('tests/wf/badout3.cwl')]), 1) cwltool-1.0.20180302231433/setup.py0000755000175200017520000001072613247251316017322 0ustar mcrusoemcrusoe00000000000000#!/usr/bin/env python import os import sys import setuptools.command.egg_info as egg_info_cmd from setuptools import setup SETUP_DIR = os.path.dirname(__file__) README = os.path.join(SETUP_DIR, 'README.rst') try: import gittaggers tagger = gittaggers.EggInfoFromGit except ImportError: tagger = egg_info_cmd.egg_info needs_pytest = {'pytest', 'test', 'ptr'}.intersection(sys.argv) pytest_runner = ['pytest-runner'] if needs_pytest else [] setup(name='cwltool', version='1.0', description='Common workflow language reference implementation', long_description=open(README).read(), author='Common workflow language working group', author_email='common-workflow-language@googlegroups.com', url="https://github.com/common-workflow-language/cwltool", download_url="https://github.com/common-workflow-language/cwltool", # platforms='', # empty as is conveyed by the classifier below # license='', # empty as is conveyed by the classifier below packages=["cwltool", 'cwltool.tests'], package_dir={'cwltool.tests': 'tests'}, package_data={'cwltool': ['schemas/draft-2/*.yml', 'schemas/draft-3/*.yml', 'schemas/draft-3/*.md', 'schemas/draft-3/salad/schema_salad/metaschema/*.yml', 'schemas/draft-3/salad/schema_salad/metaschema/*.md', 'schemas/v1.0/*.yml', 'schemas/v1.0/*.md', 'schemas/v1.0/salad/schema_salad/metaschema/*.yml', 'schemas/v1.0/salad/schema_salad/metaschema/*.md', 'schemas/v1.1.0-dev1/*.yml', 'schemas/v1.1.0-dev1/*.md', 'schemas/v1.1.0-dev1/salad/schema_salad/metaschema/*.yml', 'schemas/v1.1.0-dev1/salad/schema_salad/metaschema/*.md', 'cwlNodeEngine.js', 'cwlNodeEngineJSConsole.js', 'extensions.yml']}, include_package_data=True, install_requires=[ 'setuptools', 'requests >= 2.4.3', 'ruamel.yaml >= 0.12.4, < 0.15', 'rdflib >= 4.2.2, < 4.3.0', 'shellescape >= 3.4.1, < 3.5', 'schema-salad >= 2.6.20170927145003, < 3', 'typing >= 3.5.3', 'six >= 1.8.0', ], extras_require={ 'deps': ["galaxy-lib >= 17.09.3"] }, setup_requires=[] + pytest_runner, test_suite='tests', tests_require=['pytest', 'mock >= 2.0.0',], entry_points={ 'console_scripts': ["cwltool=cwltool.main:main"] }, zip_safe=True, cmdclass={'egg_info': tagger}, classifiers=[ 'Development Status :: 5 - Production/Stable', 'Environment :: Console', 'Intended Audience :: Developers', 'Intended Audience :: Science/Research', 'Intended Audience :: Healthcare Industry', 'License :: OSI Approved :: Apache Software License', 'Natural Language :: English', 'Operating System :: MacOS :: MacOS X', 'Operating System :: POSIX', 'Operating System :: POSIX :: Linux', 'Operating System :: OS Independent', 'Operating System :: Microsoft :: Windows', 'Operating System 
:: Microsoft :: Windows :: Windows 10', 'Operating System :: Microsoft :: Windows :: Windows 8.1', # 'Operating System :: Microsoft :: Windows :: Windows 8', # not tested # 'Operating System :: Microsoft :: Windows :: Windows 7', # not tested 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.4', 'Programming Language :: Python :: 3.5', 'Programming Language :: Python :: 3.6', 'Topic :: Scientific/Engineering', 'Topic :: Scientific/Engineering :: Bio-Informatics', 'Topic :: Scientific/Engineering :: Astronomy', 'Topic :: Scientific/Engineering :: Atmospheric Science', 'Topic :: Scientific/Engineering :: Information Analysis', 'Topic :: Scientific/Engineering :: Medical Science Apps.', 'Topic :: System :: Distributed Computing', 'Topic :: Utilities', ] )
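Note that setup.py above applies the git-timestamp egg_info tagger only when the gittaggers module can be imported, and falls back to the stock egg_info command otherwise, such as when building from an sdist like this one. One way to confirm which version string a given installation ended up with (assuming cwltool is installed in the current environment):

.. code:: python

    # Query the installed distribution's metadata; pkg_resources ships with
    # setuptools, which is already declared in install_requires above.
    import pkg_resources

    dist = pkg_resources.get_distribution("cwltool")
    print(dist.version)  # e.g. '1.0.20180302231433' when built from a git checkout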