cwltool-3.1.20220224085855/CODE_OF_CONDUCT.md
CWL Code of Conduct
===================
The CWL Project is dedicated to providing a harassment-free experience for
everyone. We do not tolerate harassment of participants in any form.
This code of conduct applies to all CWL Project spaces both online and off: the
Google Group, the Gitter chat room, the Google Hangouts chats, and any other
CWL spaces. Anyone who violates this code of conduct may be sanctioned or
expelled from these spaces at the discretion of the CWL Leadership Team.
Some CWL Project spaces may have additional rules in place, which will be
made clearly available to participants. Participants are responsible for
knowing and abiding by these rules.
Harassment includes, but is not limited to:
- Offensive comments related to gender, gender identity and expression, sexual
orientation, disability, mental illness, neuro(a)typicality, physical
appearance, body size, age, race, or religion.
- Unwelcome comments regarding a person’s lifestyle choices and practices,
including those related to food, health, parenting, drugs, and employment.
- Deliberate misgendering or use of [dead](https://www.quora.com/What-is-deadnaming/answer/Nancy-C-Walker)
or rejected names.
- Gratuitous or off-topic sexual images or behaviour in spaces where they’re not
appropriate.
- Physical contact and simulated physical contact (eg, textual descriptions like
“\*hug\*” or “\*backrub\*”) without consent or after a request to stop.
- Threats of violence.
- Incitement of violence towards any individual, including encouraging a person
to commit suicide or to engage in self-harm.
- Deliberate intimidation.
- Stalking or following.
- Harassing photography or recording, including logging online activity for
harassment purposes.
- Sustained disruption of discussion.
- Unwelcome sexual attention.
- Pattern of inappropriate social contact, such as requesting/assuming
inappropriate levels of intimacy with others.
- Continued one-on-one communication after requests to cease.
- Deliberate “outing” of any aspect of a person’s identity without their consent
except as necessary to protect vulnerable people from intentional abuse.
- Publication of non-harassing private communication.
The CWL Project prioritizes marginalized people’s safety over privileged
people’s comfort. The CWL Leadership Team will not act on complaints regarding:
- ‘Reverse’ -isms, including ‘reverse racism,’ ‘reverse sexism,’ and ‘cisphobia’
- Reasonable communication of boundaries, such as “leave me alone,” “go away,” or
“I’m not discussing this with you.”
- Communicating in a [tone](http://geekfeminism.wikia.com/wiki/Tone_argument)
you don’t find congenial
Reporting
---------
If you are being harassed by a member of the CWL Project, notice that someone
else is being harassed, or have any other concerns, please contact the CWL
Leadership Team at leadership@commonwl.org. If the person who is harassing
you is on the team, they will recuse themselves from handling your incident. We
will respond as promptly as we can.
This code of conduct applies to CWL Project spaces, but if you are being
harassed by a member of CWL Project outside our spaces, we still want to
know about it. We will take all good-faith reports of harassment by CWL Project
members, especially the CWL Leadership Team, seriously. This includes harassment
outside our spaces and harassment that took place at any point in time. The
abuse team reserves the right to exclude people from the CWL Project based on
their past behavior, including behavior outside CWL Project spaces and
behavior towards people who are not in the CWL Project.
In order to protect volunteers from abuse and burnout, we reserve the right to
reject any report we believe to have been made in bad faith. Reports intended
to silence legitimate criticism may be deleted without response.
We will respect confidentiality requests for the purpose of protecting victims
of abuse. At our discretion, we may publicly name a person about whom we’ve
received harassment complaints, or privately warn third parties about them, if
we believe that doing so will increase the safety of CWL Project members or
the general public. We will not name harassment victims without their
affirmative consent.
Consequences
------------
Participants asked to stop any harassing behavior are expected to comply
immediately.
If a participant engages in harassing behavior, the CWL Leadership Team may
take any action they deem appropriate, up to and including expulsion from all
CWL Project spaces and identification of the participant as a harasser to other
CWL Project members or the general public.
This anti-harassment policy is based on the [example policy from the Geek
Feminism wiki](http://geekfeminism.wikia.com/wiki/Community_anti-harassment/Policy),
created by the Geek Feminism community.
CWL Leadership Team
-------------------
As a stopgap measure until a more formal governance structure is adopted, the
following individuals make up the leadership of the CWL Project: Peter Amstutz,
John Chilton, Michael R. Crusoe, and Nebojša Tijanić.
To report an issue with anyone on the team you can escalate to Ward Vandewege
(Curoverse) ward@curoverse.com, Anton Nekrutenko (Galaxy)
anton AT bx DOT psu DOT edu, C. Titus Brown (UC Davis) ctbrown@ucdavis.edu, or
Brandi Davis-Dusenbery (Seven Bridges Genomics) brandi@sbgenomics.com.
cwltool-3.1.20220224085855/CONTRIBUTING.md
Style guide:
- PEP-8
- Python 3.7+ compatible code
- PEP-484 type hints
- Vertically align the type hints in function definitions
Development is done using ``git``; we encourage you to get familiar with it.
Here's a rough guide (improvements are welcome!).
To get the code and start working on the changes you can start a console and:
- Clone the cwltool repository: ``git clone https://github.com/common-workflow-language/cwltool.git``
- Switch to the cwltool directory: ``cd cwltool``
In order to contribute to the development of ``cwltool``, the source code needs to pass the tests before your changes are accepted.
There are a couple of ways to test the code with your changes: let ``tox`` manage installation and test running in virtual environments, or do it manually (preferably in a virtual environment):
- Install ``tox``, preferably using your OS's package manager; otherwise it can be installed with ``pip install --user -U tox``
- Make your changes to the code and add tests for new cool things you're adding!
- Run the tests with the ``tox`` command. It is recommended to pass some parameters, as plain ``tox`` will try to run all the checks in all available Python interpreters.
- The important tests to run are ``unit tests`` and ``type tests``.
To run these two in Python 3.7, we can tell tox to run only those tests by running: ``tox -e py37-unit,py37-mypy2,py37-mypy3``.
- Run ``tox -l`` to see all available tests and runtimes.
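As a rough sketch of a typical ``tox`` session (assuming a Python 3.7 interpreter is available on your ``PATH``):

```bash
# install tox into your user site-packages
pip install --user -U tox
# list the available test environments
tox -l
# run only the Python 3.7 unit and type-checking environments
tox -e py37-unit,py37-mypy2,py37-mypy3
```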
For the more traditional workflow:
- Create a virtual environment: ``python3 -m venv cwltool``
- To begin using the virtual environment, it needs to be activated: ``source cwltool/bin/activate``
- To check if you have the virtual environment set up, run ``which python``; it should point to the Python executable in your virtualenv
- Install cwltool: ``pip install -e .``
- Check the version, which may differ from the version installed system-wide: ``cwltool --version``
- Make your changes to the code and add tests for new cool things you're adding!
- Run the unit tests: ``python setup.py test``
- After you're done working on ``cwltool``, you can deactivate the virtual environment: ``deactivate``
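Putting the traditional workflow together, a minimal end-to-end session might look like the sketch below (naming the virtual environment ``env`` is an arbitrary choice that avoids clashing with the ``cwltool`` source directory):

```bash
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
python3 -m venv env        # create a virtual environment named 'env'
source env/bin/activate    # activate it
pip install -e .           # editable install of cwltool
cwltool --version          # confirm the virtualenv's cwltool is found
python setup.py test       # run the unit tests
deactivate                 # leave the virtual environment when done
```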
When the tests are passing, you can simply commit your changes and create a PR on the ``cwltool`` repo.
cwltool-3.1.20220224085855/LICENSE.txt
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
cwltool-3.1.20220224085855/MANIFEST.in
include README.rst CODE_OF_CONDUCT.md CONTRIBUTING.md
include MANIFEST.in
include LICENSE.txt
include *requirements.txt mypy.ini tox.ini
include gittaggers.py Makefile cwltool.py
recursive-include typeshed *.pyi
include tests/*
include tests/tmp1/tmp2/tmp3/.gitkeep
include tests/tmp4/alpha/*
include tests/wf/*
include tests/wf/operation/*
include tests/override/*
include tests/reloc/*.cwl
include tests/reloc/dir1/*
include tests/reloc/dir2/*
include tests/checker_wf/*
include tests/subgraph/*
include tests/input_deps/*
include tests/trs/*
include tests/wf/generator/*
include cwltool/py.typed
include cwltool/schemas/v1.0/*.yml
include cwltool/schemas/v1.0/*.md
include cwltool/schemas/v1.0/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.0/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.1/*.yml
include cwltool/schemas/v1.1/*.md
include cwltool/schemas/v1.1/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.1/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.1.0-dev1/*.yml
include cwltool/schemas/v1.1.0-dev1/*.md
include cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.1.0-dev1/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.2.0-dev2/*.yml
include cwltool/schemas/v1.2.0-dev2/*.md
include cwltool/schemas/v1.2.0-dev2/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.2.0-dev2/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.2.0-dev3/*.yml
include cwltool/schemas/v1.2.0-dev3/*.md
include cwltool/schemas/v1.2.0-dev3/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.2.0-dev3/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.2.0-dev4/*.yml
include cwltool/schemas/v1.2.0-dev4/*.md
include cwltool/schemas/v1.2.0-dev4/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.2.0-dev4/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.2.0-dev5/*.yml
include cwltool/schemas/v1.2.0-dev5/*.md
include cwltool/schemas/v1.2.0-dev5/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.2.0-dev5/salad/schema_salad/metaschema/*.md
include cwltool/schemas/v1.2/*.yml
include cwltool/schemas/v1.2/*.md
include cwltool/schemas/v1.2/salad/schema_salad/metaschema/*.yml
include cwltool/schemas/v1.2/salad/schema_salad/metaschema/*.md
include cwltool/cwlNodeEngine.js
include cwltool/cwlNodeEngineJSConsole.js
include cwltool/cwlNodeEngineWithContext.js
include cwltool/extensions.yml
include cwltool/extensions-v1.1.yml
include cwltool/jshint/jshint_wrapper.js
include cwltool/jshint/jshint.js
include cwltool/hello.simg
include cwltool/rdfqueries/*.sparql
prune cwltool/schemas/v1.0/salad/typeshed
prune cwltool/schemas/v1.0/salad/schema_salad/tests
prune cwltool/schemas/v1.1.0-dev1/salad/typeshed
prune cwltool/schemas/v1.1.0-dev1/salad/schema_salad/tests
prune cwltool/schemas/presentations
prune cwltool/schemas/site
prune cwltool/schemas/v1.0/examples
prune cwltool/schemas/v1.0/v1.0
prune cwltool/schemas/v1.1.0-dev1/examples
prune cwltool/schemas/v1.1.0-dev1/v1.1.0-dev1
recursive-exclude cwltool/schemas *.py
exclude debian.img
global-exclude *~
global-exclude *.pyc
cwltool-3.1.20220224085855/Makefile
# This file is part of cwltool,
# https://github.com/common-workflow-language/cwltool/, and is
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# Contact: common-workflow-language@googlegroups.com
# make format to fix most python formatting errors
# make pylint to check Python code for enhanced compliance including naming
# and documentation
# make coverage-report to check coverage of the python scripts by the tests
MODULE=cwltool
# `SHELL=bash` doesn't work for some, so don't use BASH-isms like
# `[[` conditional expressions.
PYSOURCES=$(wildcard ${MODULE}/**.py tests/*.py) setup.py
DEVPKGS=diff_cover pylint pep257 pydocstyle tox tox-pyenv \
isort wheel autoflake pyupgrade bandit -rlint-requirements.txt\
-rtest-requirements.txt -rmypy-requirements.txt
DEBDEVPKGS=pep8 python-autopep8 pylint python-coverage pydocstyle sloccount \
python-flake8 python-mock shellcheck
VERSION=3.1.$(shell TZ=UTC git log --first-parent --max-count=1 \
--format=format:%cd --date=format-local:%Y%m%d%H%M%S)
mkfile_dir := $(dir $(abspath $(lastword $(MAKEFILE_LIST))))
UNAME_S=$(shell uname -s)
## all : default task
all: dev
## help : print this help message and exit
help: Makefile
@sed -n 's/^##//p' $<
## install-dep : install most of the development dependencies via pip
install-dep: install-dependencies
install-dependencies: FORCE
pip install --upgrade $(DEVPKGS)
pip install -r requirements.txt
install-doc-dep:
pip install -r docs/requirements.txt
## install-deb-dep: install most of the dev dependencies via apt-get
install-deb-dep:
sudo apt-get install $(DEBDEVPKGS)
## install : install the ${MODULE} module and schema-salad-tool
install: FORCE
pip install .[deps]
## dev : install the ${MODULE} module in dev mode
dev: install-dep
pip install -e .[deps]
## dist : create a module package for distribution
dist: dist/${MODULE}-$(VERSION).tar.gz
check-python3:
# Check that the default python version is python 3
python --version 2>&1 | grep "Python 3"
dist/${MODULE}-$(VERSION).tar.gz: check-python3 $(SOURCES)
python setup.py sdist bdist_wheel
## docs : make the docs
docs: FORCE
cd docs && $(MAKE) html
## clean : clean up all temporary / machine-generated files
clean: check-python3 FORCE
rm -f ${MODULE}/*.pyc tests/*.pyc *.so ${MODULE}/*.so
rm -Rf ${MODULE}/__pycache__/
python setup.py clean --all || true
rm -Rf .coverage
rm -f diff-cover.html
# Linting and code style related targets
## sorting imports using isort: https://github.com/timothycrosley/isort
sort_imports: $(PYSOURCES)
isort $^
remove_unused_imports: $(PYSOURCES)
autoflake --in-place --remove-all-unused-imports $^
pep257: pydocstyle
## pydocstyle : check Python code style
pydocstyle: $(PYSOURCES)
pydocstyle --add-ignore=D100,D101,D102,D103 $^ || true
pydocstyle_report.txt: $(PYSOURCES)
pydocstyle setup.py $^ > $@ 2>&1 || true
diff_pydocstyle_report: pydocstyle_report.txt
diff-quality --compare-branch=main --violations=pydocstyle --fail-under=100 $^
## format : check/fix all code indentation and formatting (runs black)
format:
black --exclude cwltool/schemas setup.py cwltool.py cwltool tests
format-check:
black --diff --check --exclude cwltool/schemas setup.py cwltool.py cwltool tests
## pylint : run static code analysis on Python code
pylint: $(PYSOURCES)
pylint --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}" \
$^ -j0|| true
pylint_report.txt: $(PYSOURCES)
pylint --msg-template="{path}:{line}: [{msg_id}({symbol}), {obj}] {msg}" \
$^ -j0> $@ || true
diff_pylint_report: pylint_report.txt
diff-quality --compare-branch=main --violations=pylint pylint_report.txt
.coverage: testcov
coverage: .coverage
coverage report
coverage.xml: .coverage
coverage xml
coverage.html: htmlcov/index.html
htmlcov/index.html: .coverage
coverage html
@echo Test coverage of the Python code is now in htmlcov/index.html
coverage-report: .coverage
coverage report
diff-cover: coverage.xml
diff-cover --compare-branch=main $^
diff-cover.html: coverage.xml
diff-cover --compare-branch=main $^ --html-report $@
## test : run the ${MODULE} test suite
test: check-python3 $(PYSOURCES)
python -m pytest -rs ${PYTEST_EXTRA}
## testcov : run the ${MODULE} test suite and collect coverage
testcov: check-python3 $(PYSOURCES)
python -m pytest -rs --cov --cov-config=.coveragerc --cov-report= ${PYTEST_EXTRA}
sloccount.sc: $(PYSOURCES) Makefile
sloccount --duplicates --wide --details $^ > $@
## sloccount : count lines of code
sloccount: $(PYSOURCES) Makefile
sloccount $^
list-author-emails:
@echo 'name, E-Mail Address'
@git log --format='%aN,%aE' | sort -u | grep -v 'root'
mypy3: mypy
mypy: $(filter-out setup.py gittagger.py,$(PYSOURCES))
if ! test -f $(shell python -c 'import ruamel.yaml; import os.path; print(os.path.dirname(ruamel.yaml.__file__))')/py.typed ; \
then \
rm -Rf typeshed/ruamel/yaml ; \
ln -s $(shell python -c 'import ruamel.yaml; import os.path; print(os.path.dirname(ruamel.yaml.__file__))') \
typeshed/ruamel/ ; \
fi # if the minimally required ruamel.yaml version is 0.15.99 or greater, then the above can be removed
MYPYPATH=$$MYPYPATH:typeshed mypy $^
mypyc: $(PYSOURCES)
MYPYPATH=typeshed CWLTOOL_USE_MYPYC=1 pip install --verbose -e . \
&& pytest -rs -vv ${PYTEST_EXTRA}
shellcheck: FORCE
shellcheck build-cwltool-docker.sh cwl-docker.sh release-test.sh conformance-test.sh \
cwltool-in-docker.sh
pyupgrade: $(PYSOURCES)
pyupgrade --exit-zero-even-if-changed --py36-plus $^
release-test: check-python3 FORCE
git diff-index --quiet HEAD -- || ( echo You have uncommitted changes, please commit them and try again; false )
./release-test.sh
release: release-test
. testenv2/bin/activate && \
python testenv2/src/${MODULE}/setup.py sdist bdist_wheel && \
pip install twine && \
twine upload testenv2/src/${MODULE}/dist/* && \
git tag ${VERSION} && git push --tags
flake8: $(PYSOURCES)
flake8 $^
FORCE:
# Use this to print the value of a Makefile variable
# Example `make print-VERSION`
# From https://www.cmcrossroads.com/article/printing-value-makefile-variable
print-% : ; @echo $* = $($*)
cwltool-3.1.20220224085855/PKG-INFO
Metadata-Version: 2.1
Name: cwltool
Version: 3.1.20220224085855
Summary: Common workflow language reference implementation
Home-page: https://github.com/common-workflow-language/cwltool
Author: Common workflow language working group
Author-email: common-workflow-language@googlegroups.com
License: UNKNOWN
Download-URL: https://github.com/common-workflow-language/cwltool
Description: ==================================================================
Common Workflow Language tool description reference implementation
==================================================================
|Linux Status| |Coverage Status| |Docs Status|
PyPI: |PyPI Version| |PyPI Downloads Month| |Total PyPI Downloads|
Conda: |Conda Version| |Conda Installs|
Debian: |Debian Testing package| |Debian Stable package|
Quay.io (Docker): |Quay.io Container|
.. |Linux Status| image:: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml/badge.svg?branch=main
:target: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml
.. |Debian Stable package| image:: https://badges.debian.net/badges/debian/stable/cwltool/version.svg
:target: https://packages.debian.org/stable/cwltool
.. |Debian Testing package| image:: https://badges.debian.net/badges/debian/testing/cwltool/version.svg
:target: https://packages.debian.org/testing/cwltool
.. |Coverage Status| image:: https://img.shields.io/codecov/c/github/common-workflow-language/cwltool.svg
:target: https://codecov.io/gh/common-workflow-language/cwltool
.. |PyPI Version| image:: https://badge.fury.io/py/cwltool.svg
:target: https://badge.fury.io/py/cwltool
.. |PyPI Downloads Month| image:: https://pepy.tech/badge/cwltool/month
:target: https://pepy.tech/project/cwltool
.. |Total PyPI Downloads| image:: https://static.pepy.tech/personalized-badge/cwltool?period=total&units=international_system&left_color=black&right_color=orange&left_text=Total%20PyPI%20Downloads
:target: https://pepy.tech/project/cwltool
.. |Conda Version| image:: https://anaconda.org/conda-forge/cwltool/badges/version.svg
:target: https://anaconda.org/conda-forge/cwltool
.. |Conda Installs| image:: https://anaconda.org/conda-forge/cwltool/badges/downloads.svg
:target: https://anaconda.org/conda-forge/cwltool
.. |Quay.io Container| image:: https://quay.io/repository/commonwl/cwltool/status
:target: https://quay.io/repository/commonwl/cwltool
.. |Docs Status| image:: https://readthedocs.org/projects/cwltool/badge/?version=latest
:target: https://cwltool.readthedocs.io/en/latest/?badge=latest
:alt: Documentation Status
This is the reference implementation of the Common Workflow Language. It is
intended to be feature complete and provide comprehensive validation of CWL
files as well as provide other tools related to working with CWL.
This is written and tested for
`Python `_ ``3.x {x = 6, 7, 8, 9, 10}``
The reference implementation consists of two packages. The ``cwltool`` package
is the primary Python module containing the reference implementation in the
``cwltool`` module and console executable by the same name.
The ``cwlref-runner`` package is optional and provides an additional entry point
under the alias ``cwl-runner``, which is the implementation-agnostic name for the
default CWL interpreter installed on a host.
``cwltool`` is provided by the CWL project, `a member project of Software Freedom Conservancy `_
and our `many contributors `_.
Install
-------
``cwltool`` packages
^^^^^^^^^^^^^^^^^^^^
Your operating system may offer cwltool directly. For `Debian `_, `Ubuntu `_,
and similar Linux distributions, try
.. code:: bash
sudo apt-get install cwltool
If you encounter an error, first try to update package information by using
.. code:: bash
sudo apt-get update
If you are running macOS or another UNIX and you want to use packages prepared by the conda-forge project, then
please follow the install instructions for `conda-forge `_ (if you haven't already) and then
.. code:: bash
conda install -c conda-forge cwltool
All of the above methods of installing ``cwltool`` use packages that might contain bugs already fixed in newer versions or be missing desired features.
If the packaged version of ``cwltool`` available to you is too old, then we recommend installing using ``pip`` and ``venv``
.. code:: bash
python3 -m venv env # Create a virtual environment named 'env' in the current directory
source env/bin/activate # Activate environment before installing `cwltool`
Then install the latest ``cwlref-runner`` package from PyPI (which will install the latest ``cwltool`` package as
well)
.. code:: bash
pip install cwlref-runner
If installing alongside another CWL implementation (like ``toil-cwl-runner`` or ``arvados-cwl-runner``) then instead run
.. code:: bash
pip install cwltool
MS Windows users
^^^^^^^^^^^^^^^^
1. Install `"Windows Subsystem for Linux 2" (WSL2) and Docker Desktop `_
2. Install `Debian from the Microsoft Store `_
3. Set Debian as your default WSL 2 distro: ``wsl --set-default debian``.
4. Return to the Docker Desktop, choose `Settings → Resources → WSL Integration `_ and under "Enable integration with additional distros" select "Debian".
5. Reboot if you have not yet already.
6. Launch Debian and follow the Linux instructions above (``apt-get install cwltool`` or use the ``venv`` method)
Network problems from within WSL2? Try `these instructions `_ followed by ``wsl --shutdown``.
``cwltool`` development version
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Or you can skip the direct ``pip`` commands above and install the latest development version of ``cwltool``:
.. code:: bash
git clone https://github.com/common-workflow-language/cwltool.git # clone (copy) the cwltool git repository
cd cwltool # Change to source directory that git clone just downloaded
pip install .[deps] # Installs ``cwltool`` from source
cwltool --version # Check if the installation works correctly
Remember, if co-installing multiple CWL implementations, then you need to
maintain which implementation ``cwl-runner`` points to via a symbolic file
system link or `another facility `_.
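For example, a minimal sketch of such a symbolic link (the destination directory ``~/.local/bin`` is an assumption; use any directory that appears on your ``PATH``):
.. code:: bash
    # point the implementation-agnostic name at cwltool
    ln -s "$(command -v cwltool)" ~/.local/bin/cwl-runner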
Recommended Software
^^^^^^^^^^^^^^^^^^^^
You may also want to have the following installed:
- `node.js `_
- Docker, udocker, or Singularity (optional)
Without these, some examples in the CWL tutorials at http://www.commonwl.org/user_guide/ may not work.
Run on the command line
-----------------------
Simple command::
cwl-runner [tool-or-workflow-description] [input-job-settings]
Or if you have multiple CWL implementations installed and you want to override
the default cwl-runner then use::
cwltool [tool-or-workflow-description] [input-job-settings]
You can set cwltool options in the environment with ``CWLTOOL_OPTIONS``;
these will be inserted at the beginning of the command line::
export CWLTOOL_OPTIONS="--debug"
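For example, the sketch below (the workflow and job file names are placeholders) makes every subsequent invocation run with debug logging:
.. code:: bash
    export CWLTOOL_OPTIONS="--debug"
    # behaves as if invoked as: cwltool --debug my-workflow.cwl my-job.yml
    cwltool my-workflow.cwl my-job.yml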
Use with boot2docker on macOS
-----------------------------
boot2docker runs Docker inside a virtual machine, and it only mounts ``Users``
on it. The default behavior of CWL is to create temporary directories under e.g.
``/Var`` which is not accessible to Docker containers.
To run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix``
and ``--tmp-outdir-prefix`` to somewhere under ``/Users``::
$ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json
Using uDocker
-------------
Some shared computing environments don't support Docker software containers for technical or policy reasons.
As a workaround, the CWL reference runner supports using alternative ``docker`` implementations on Linux
with the ``--user-space-docker-cmd`` option.
One such "user space" friendly docker replacement is ``udocker`` https://github.com/indigo-dc/udocker.
udocker installation: https://github.com/indigo-dc/udocker/blob/master/doc/installation_manual.md#22-install-from-udockertools-tarball
Run `cwltool` just as you usually would, but with the new option, e.g., from the conformance tests
.. code:: bash
cwltool --user-space-docker-cmd=udocker https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/main/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/empty.json
``cwltool`` can also use `Singularity `_ version 2.6.1
or later as a Docker container runtime.
``cwltool`` with Singularity will run software containers specified in
``DockerRequirement`` and therefore works with Docker images only; native
Singularity images are not supported. To use Singularity as the Docker container
runtime, provide ``--singularity`` command line option to ``cwltool``.
With Singularity, ``cwltool`` can pass all CWL v1.0 conformance tests, except
those involving Docker container ENTRYPOINTs.
Example
.. code:: bash
cwltool --singularity https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/main/v1.0/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/blob/main/v1.0/v1.0/cat-job.json
Running a tool or workflow from remote or local locations
---------------------------------------------------------
``cwltool`` can run tool and workflow descriptions on both local and remote
systems via its support for HTTP[S] URLs.
Input job files and Workflow steps (via the `run` directive) can reference CWL
documents using absolute or relative local filesystem paths. If a relative path
is referenced and that document isn't found in the current directory, then the
following locations will be searched:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem
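For example, a workflow description can be run straight from a URL while the input job file stays local (the URL and file names below are placeholders, not a real workflow):
.. code:: bash
    cwltool https://example.com/workflows/my-workflow.cwl my-local-job.yml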
You can also use `cwldep `
to manage dependencies on external tools and workflows.
Overriding workflow requirements at load time
---------------------------------------------
Sometimes a workflow needs additional requirements to run in a particular
environment or with a particular dataset. To avoid the need to modify the
underlying workflow, cwltool supports requirement "overrides".
The format of the "overrides" object is a mapping of item identifier (workflow,
workflow step, or command line tool) to the process requirements that should be applied.
.. code:: yaml
cwltool:overrides:
echo.cwl:
requirements:
EnvVarRequirement:
envDef:
MESSAGE: override_value
Overrides can be specified either on the command line, or as part of the job
input document. Workflow steps are identified using the name of the workflow
file followed by the step name as a document fragment identifier "#id".
Override identifiers are relative to the top-level workflow document.
.. code:: bash
cwltool --overrides overrides.yml my-tool.cwl my-job.yml
.. code:: yaml
input_parameter1: value1
input_parameter2: value2
cwltool:overrides:
workflow.cwl#step1:
requirements:
EnvVarRequirement:
envDef:
MESSAGE: override_value
.. code:: bash
cwltool my-tool.cwl my-job-with-overrides.yml
Combining parts of a workflow into a single document
----------------------------------------------------
Use ``--pack`` to combine a workflow made up of multiple files into a
single compound document. This operation takes all the CWL files
referenced by a workflow and builds a new CWL document with all
Process objects (CommandLineTool and Workflow) in a list in the
``$graph`` field. Cross references (such as ``run:`` and ``source:``
fields) are updated to internal references within the new packed
document. The top-level workflow is named ``#main``.
.. code:: bash
cwltool --pack my-wf.cwl > my-packed-wf.cwl
Running only part of a workflow
-------------------------------
You can run a partial workflow with the ``--target`` (``-t``) option. This
takes the name of an output parameter, workflow step, or input
parameter in the top-level workflow. You may provide multiple
targets.
.. code:: bash
cwltool --target step3 my-wf.cwl
If a target is an output parameter, it will run only the steps
that contribute to that output. If a target is a workflow step, it
will run the workflow starting from that step. If a target is an
input parameter, it will only run the steps connected to
that input.
Use ``--print-targets`` to get a listing of the targets of a workflow.
To see which steps will run, use ``--print-subgraph`` with
``--target`` to get a printout of the workflow subgraph for the
selected targets.
.. code:: bash
cwltool --print-targets my-wf.cwl
cwltool --target step3 --print-subgraph my-wf.cwl > my-wf-starting-from-step3.cwl
Visualizing a CWL document
--------------------------
The ``--print-dot`` option will print a file suitable for the Graphviz ``dot`` program. Here is a Bash one-liner to generate a Scalable Vector Graphics (SVG) file:
.. code:: bash
cwltool --print-dot my-wf.cwl | dot -Tsvg > my-wf.svg
Modeling a CWL document as RDF
------------------------------
CWL documents can be expressed as RDF triple graphs.
.. code:: bash
cwltool --print-rdf --rdf-serializer=turtle mywf.cwl
Environment Variables in cwltool
--------------------------------
This reference implementation supports several ways of setting
environment variables for tools, in addition to the standard
``EnvVarRequirement``. The sequence of steps applied to create the
environment is:
0. If the ``--preserve-entire-environment`` flag is present, then begin with the current
environment, else begin with an empty environment.
1. Add any variables specified by ``--preserve-environment`` option(s).
2. Set ``TMPDIR`` and ``HOME`` per `the CWL v1.0+ CommandLineTool specification `_.
3. Apply any ``EnvVarRequirement`` from the ``CommandLineTool`` description.
4. Apply any manipulations required by any ``cwltool:MPIRequirement`` extensions.
5. Substitute any secrets required by ``Secrets`` extension.
6. Modify the environment in response to ``SoftwareRequirement`` (see below).
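As a sketch of how these steps combine (the tool, job, and variable names are placeholders), the following run starts from an empty environment but carries two variables through from the host:
.. code:: bash
    export MY_DATA_DIR=/data/run42   # hypothetical host variable
    cwltool --preserve-environment PATH \
            --preserve-environment MY_DATA_DIR \
            my-tool.cwl my-job.yml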
Leveraging SoftwareRequirements (Beta)
--------------------------------------
CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn use to resolve to packages in various package managers or
dependency management systems such as `Environment Modules
`__.
Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
dependency; for this reason, be sure to specify the ``deps`` modifier when
installing cwltool. For instance::
$ pip install 'cwltool[deps]'
Installing cwltool in this fashion enables several new command line options.
The most general of these options is ``--beta-dependency-resolvers-configuration``.
This option allows one to specify a dependency resolver's configuration file.
This file may be specified as either XML or YAML and very simply describes various
plugins to enable to "resolve" ``SoftwareRequirement`` dependencies.
Using these hints will allow cwltool to modify the environment in
which your tool runs, for example by loading one or more Environment
Modules. The environment is constructed as above, and then it
may be modified by the selected tool resolver. This currently means that
you cannot override any environment variables set by the selected tool
resolver. Note that the environment given to the configured dependency
resolver has the variable `_CWLTOOL` set to `1` to allow introspection.
To discuss some of these plugins and how to configure them, first consider the
following ``hint`` definition for an example CWL tool.
.. code:: yaml
SoftwareRequirement:
packages:
- package: seqtk
version:
- r93
Now imagine deploying cwltool on a cluster with Software Modules installed
and that a ``seqtk`` module is available at version ``r93``. This means cluster
users likely won't have the binary ``seqtk`` on their ``PATH`` by default, but after
sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
available on the ``PATH``. A simple dependency resolvers configuration file, called
``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
the correct module environment before executing the above tool would simply be:
.. code:: yaml
- type: modules
The outer list indicates that one plugin is being enabled; the plugin parameters are
defined as a dictionary for this one list item. There is only one required parameter
for the plugin above: ``type``, which defines the plugin type. This parameter
is required for all plugins. The available plugins and the parameters
available for each are documented (incompletely) `here
`__.
Unfortunately, this documentation is in the context of Galaxy tool
``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly.
cwltool is distributed with an example of such a seqtk tool and a corresponding sample
job. It can be executed from the cwltool root using a dependency resolvers
configuration file such as the above one using the command::
cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
tests/seqtk_seq.cwl \
tests/seqtk_seq_job.json
This example demonstrates both that cwltool can leverage
existing software installations and also handle workflows with dependencies
on different versions of the same software and libraries. However, the above
example does require an existing module setup so it is impossible to test this example
"out of the box" with cwltool. For a more isolated test that demonstrates all
the same concepts - the resolver plugin type ``galaxy_packages`` can be used.
"Galaxy packages" are a lighter-weight alternative to Environment Modules that are
really just defined by a way to lay out directories into packages and versions
to find little scripts that are sourced to modify the environment. They have
been used for years in the Galaxy community to adapt Galaxy tools to cluster
environments but require neither knowledge of Galaxy nor any special tools to
set up. These should work just fine for CWL tools.
The cwltool source code repository's test directory is set up with a very simple
directory that defines a set of "Galaxy packages" (but really just defines one
package named ``random-lines``). The directory layout is simply::
tests/test_deps_env/
random-lines/
1.0/
env.sh
If the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
such as the following is encountered:
.. code:: yaml
hints:
SoftwareRequirement:
packages:
- package: 'random-lines'
version:
- '1.0'
Then cwltool will simply find that ``env.sh`` file and source it before executing
the corresponding tool. That ``env.sh`` script is only responsible for modifying
the job's ``PATH`` to add the required binaries.
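A minimal sketch of what such an ``env.sh`` could contain (this exact content is an assumption, not a copy of the file shipped in the test directory):
.. code:: bash
    #!/bin/sh
    # prepend this package's executables to the PATH so the tool can find them
    PATH="/path/to/tests/test_deps_env/random-lines/1.0/bin:$PATH"
    export PATH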
This is a full example that works since resolving "Galaxy packages" has no
external requirements. Try it out by executing the following command from cwltool's
root directory::
cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
tests/random_lines.cwl \
tests/random_lines_job.json
The resolvers configuration file in the above example was simply:
.. code:: yaml
- type: galaxy_packages
base_path: ./tests/test_deps_env
It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not
match the module names for a given cluster. Such requirements can be re-mapped
to specific deployed packages or versions using another file specified using
the resolver plugin parameter `mapping_files`. We will
demonstrate this using `galaxy_packages`, but the concepts apply equally well
to Environment Modules or Conda packages (described below), for instance.
So consider the resolvers configuration file
(`tests/test_deps_env_resolvers_conf_rewrite.yml`):
.. code:: yaml
- type: galaxy_packages
base_path: ./tests/test_deps_env
mapping_files: ./tests/test_deps_mapping.yml
And the corresponding mapping configuration file (`tests/test_deps_mapping.yml`):
.. code:: yaml
- from:
name: randomLines
version: 1.0.0-rc1
to:
name: random-lines
version: '1.0'
This says that if cwltool encounters a requirement of ``randomLines`` at version
``1.0.0-rc1`` in a tool, it should be rewritten to our specific plugin as ``random-lines`` at
version ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl``
that contains such a source ``SoftwareRequirement``. To try out this example with
mapping, execute the following command from the cwltool root directory::
cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
tests/random_lines_mapping.cwl \
tests/random_lines_job.json
The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools. If instead a real package manager is used,
cwltool has the opportunity to install requirements as needed. While initial
support for Homebrew/Linuxbrew plugins is available, the most developed such
plugin is for the `Conda `__ package manager. Conda has the nice properties
of allowing multiple versions of a package to be installed simultaneously,
not requiring elevated permissions to install Conda itself or packages using
Conda, and being cross-platform. For these reasons, cwltool may run as a normal
user, install its own Conda environment and manage multiple versions of Conda packages
on Linux and Mac OS X.
The Conda plugin can be endlessly configured, but a sensible set of defaults
that has proven a powerful stack for dependency management within the Galaxy tool
development ecosystem can be enabled by simply passing cwltool the
``--beta-conda-dependencies`` flag.
With this, we can use the seqtk example above without Docker or any externally managed services - cwltool should install everything it needs
and create an environment for the tool. Try it out with the following command::
cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
that allow disambiguation of package names. If the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this within the context of the seqtk example, we can simply break the package name we
use and then specify a specific Conda package as follows:
.. code:: yaml
hints:
SoftwareRequirement:
packages:
- package: seqtk_seq
version:
- '1.2'
specs:
- https://anaconda.org/bioconda/seqtk
- https://packages.debian.org/sid/seqtk
The example can be executed using the command::
cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
The plugin framework for managing the resolution of these software requirements
is maintained as part of `galaxy-tool-util `__ - a small,
portable subset of the Galaxy project. More information on configuration and implementation can be found
at the following links:
- `Dependency Resolvers in Galaxy `__
- `Conda for [Galaxy] Tool Dependencies `__
- `Mapping Files - Implementation `__
- `Specifications - Implementation `__
- `Initial cwltool Integration Pull Request `__
Use with GA4GH Tool Registry API
--------------------------------
Cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.
By default, cwltool searches https://dockstore.org/ . Use ``--add-tool-registry`` to add other registries to the search path.
For example ::
cwltool quay.io/collaboratory/dockstore-tool-bamstats:develop test.json
and (defaults to latest when a version is not specified) ::
cwltool quay.io/collaboratory/dockstore-tool-bamstats test.json
For this example, grab the test.json (and input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats ::
wget https://dockstore.org/api/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-bamstats/versions/develop/PLAIN-CWL/descriptor/test.json
wget https://github.com/CancerCollaboratory/dockstore-tool-bamstats/raw/develop/rna.SRR948778.bam
.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas
Running MPI-based tools that need to be launched
------------------------------------------------
Cwltool supports an extension to the CWL spec
``http://commonwl.org/cwltool#MPIRequirement``. When the tool
definition has this in its ``requirements``/``hints`` section, and
cwltool has been run with ``--enable-ext``, then the tool's command
line will be extended with the commands needed to launch it with
``mpirun`` or similar. You can specify the number of processes to
start as either a literal integer or an expression (that will result
in an integer). For example::
#!/usr/bin/env cwl-runner
cwlVersion: v1.1
class: CommandLineTool
$namespaces:
cwltool: "http://commonwl.org/cwltool#"
requirements:
cwltool:MPIRequirement:
processes: $(inputs.nproc)
inputs:
nproc:
type: int
Interaction with containers: the MPIRequirement currently prepends its
commands to the front of the command line that is constructed. If you
wish to run a containerized application in parallel, for simple use
cases, this does work with Singularity, depending upon the platform
setup. However, this combination should be considered "alpha" -- please
do report any issues you have! This does not work with Docker at the
moment. (More precisely, you get `n` copies of the same single process
image run at the same time that cannot communicate with each other.)
The host-specific parameters are configured in a simple YAML file
(specified with the ``--mpi-config-file`` flag). The allowed keys are
given in the following table; all are optional.
+----------------+------------------+----------+------------------------------+
| Key | Type | Default | Description |
+================+==================+==========+==============================+
| runner | str | "mpirun" | The primary command to use. |
+----------------+------------------+----------+------------------------------+
| nproc_flag | str | "-n" | Flag to set number of |
| | | | processes to start. |
+----------------+------------------+----------+------------------------------+
| default_nproc | int | 1 | Default number of processes. |
+----------------+------------------+----------+------------------------------+
| extra_flags | List[str] | [] | A list of any other flags to |
| | | | be added to the runner's |
| | | | command line before |
| | | | the ``baseCommand``. |
+----------------+------------------+----------+------------------------------+
| env_pass | List[str] | [] | A list of environment |
| | | | variables that should be |
| | | | passed from the host |
| | | | environment through to the |
| | | | tool (e.g., giving the |
| | | | node list as set by your |
| | | | scheduler). |
+----------------+------------------+----------+------------------------------+
| env_pass_regex | List[str] | [] | A list of python regular |
| | | | expressions that will be |
| | | | matched against the host's |
| | | | environment. Those that match|
| | | | will be passed through. |
+----------------+------------------+----------+------------------------------+
| env_set | Mapping[str,str] | {} | A dictionary whose keys are |
| | | | the environment variables set|
| | | | and the values being the |
| | | | values. |
+----------------+------------------+----------+------------------------------+
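For example, here is a sketch of a complete configuration for a hypothetical cluster whose launcher is ``srun`` (every value below is a placeholder to adapt to your site):
.. code:: bash
    cat > mpi-config.yml <<'EOF'
    runner: srun              # use srun instead of the default mpirun
    nproc_flag: "-n"          # flag that sets the process count
    default_nproc: 4          # used when the tool gives no process count
    extra_flags: ["--exclusive"]
    env_pass: [SLURM_JOB_ID]
    env_pass_regex: ["SLURM_.*"]
    env_set:
      OMP_NUM_THREADS: "1"
    EOF
    cwltool --enable-ext --mpi-config-file mpi-config.yml my-mpi-tool.cwl my-job.yml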
===========
Development
===========
Running tests locally
---------------------
- Running basic tests ``(/tests)``:
To run the basic tests after installing `cwltool` execute the following:
.. code:: bash
pip install -rtest-requirements.txt
pytest ## N.B. This requires node.js or docker to be available
To run various tests in all supported Python environments, we use `tox `_. To run the test suite in all supported Python environments
first clone the complete code repository (see the ``git clone`` instructions above) and then run
the following in the terminal:
``pip install tox; tox -p``
A list of all environments can be seen using:
``tox --listenvs``
and a specific test environment can be run using:
``tox -e <environment name>``
and additionally run a specific test using this format:
``tox -e py310-unit -- -v tests/test_examples.py::test_scandeps``
- Running the entire suite of CWL conformance tests:
The GitHub repository for the CWL specifications contains a script that tests a CWL
implementation against a wide array of valid CWL files using the `cwltest `_
program
Instructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/main/CONFORMANCE_TESTS.md .
Import as a module
------------------
Add
.. code:: python
import cwltool
to your script.
The easiest way to use cwltool to run a tool or workflow from Python is to use a Factory
.. code:: python
import cwltool.factory
fac = cwltool.factory.Factory()
echo = fac.make("echo.cwl")
result = echo(inp="foo")
# result["out"] == "foo"
CWL Tool Control Flow
---------------------
Technical outline of how cwltool works internally, for maintainers.
#. Use CWL ``load_tool()`` to load document.
#. Fetches the document from file or URL
#. Applies preprocessing (syntax/identifier expansion and normalization)
#. Validates the document based on cwlVersion
#. If necessary, updates the document to the latest spec
#. Constructs a Process object using ``make_tool()``` callback. This yields a
CommandLineTool, Workflow, or ExpressionTool. For workflows, this
recursively constructs each workflow step.
#. To construct custom types for CommandLineTool, Workflow, or
ExpressionTool, provide a custom ``make_tool()``
#. Iterate on the ``job()`` method of the Process object to get back runnable jobs.
#. ``job()`` is a generator method (uses the Python iterator protocol)
#. Each time the ``job()`` method is invoked in an iteration, it returns one
of: a runnable item (an object with a ``run()`` method), ``None`` (indicating
there is currently no work ready to run) or end of iteration (indicating
the process is complete.)
#. Invoke the runnable item by calling ``run()``. This runs the tool and gets output.
#. An output callback reports the output of a process.
#. ``job()`` may be iterated over multiple times. It will yield all the work
that is currently ready to run and then yield None.
#. ``Workflow`` objects create a corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation.
#. The WorkflowJob iterates over each WorkflowJobStep and determines if the
inputs the step are ready.
#. When a step is ready, it constructs an input object for that step and
iterates on the ``job()`` method of the workflow job step.
#. Each runnable item is yielded back up to top-level run loop
#. When a step job completes and receives an output callback, the
job outputs are assigned to the output of the workflow step.
#. When all steps are complete, the intermediate files are moved to a final
workflow output, intermediate directories are deleted, and the workflow's output callback is called.
#. ``CommandLineTool`` job() objects yield a single runnable object.
#. The CommandLineTool ``job()`` method calls ``make_job_runner()`` to create a
``CommandLineJob`` object
#. The job method configures the CommandLineJob object by setting public
attributes
#. The job method iterates over file and directories inputs to the
CommandLineTool and creates a "path map".
#. Files are mapped from their "resolved" location to a "target" path where
they will appear at tool invocation (for example, a location inside a
Docker container.) The target paths are used on the command line.
#. Files are staged to targets paths using either Docker volume binds (when
using containers) or symlinks (if not). This staging step enables files
to be logically rearranged or renamed independent of their source layout.
#. The ``run()`` method of CommandLineJob executes the command line tool or
Docker container, waits for it to complete, collects output, and makes
the output callback.
Extension points
----------------
The following functions can be passed to main() to override or augment
the listed behaviors.
executor
::
executor(tool, job_order_object, runtimeContext, logger)
(Process, Dict[Text, Any], RuntimeContext) -> Tuple[Dict[Text, Any], Text]
An implementation of the top-level workflow execution loop should
synchronously run a process object to completion and return the
output object.
versionfunc
::
()
() -> Text
Return version string.
logger_handler
::
logger_handler
logging.Handler
Handler object for logging.
The following functions can be set in LoadingContext to override or
augment the listed behaviors.
fetcher_constructor
::
fetcher_constructor(cache, session)
(Dict[unicode, unicode], requests.sessions.Session) -> Fetcher
Construct a Fetcher object with the supplied cache and HTTP session.
resolver
::
resolver(document_loader, document)
(Loader, Union[Text, dict[Text, Any]]) -> Text
Resolve a relative document identifier to an absolute one that can be fetched.
The following functions can be set in RuntimeContext to override or
augment the listed behaviors.
construct_tool_object
::
construct_tool_object(toolpath_object, loadingContext)
(MutableMapping[Text, Any], LoadingContext) -> Process
Hook to construct a Process object (eg CommandLineTool) object from a document.
select_resources
::
selectResources(request)
(Dict[str, int], RuntimeContext) -> Dict[Text, int]
Take a resource request and turn it into a concrete resource assignment.
make_fs_access
::
make_fs_access(basedir)
(Text) -> StdFsAccess
Return a file system access object.
In addition, when providing custom subclasses of Process objects, you can override the following methods:
CommandLineTool.make_job_runner
::
make_job_runner(RuntimeContext)
(RuntimeContext) -> Type[JobBase]
Create and return a job runner object (this implements concrete execution of a command line tool).
Workflow.make_workflow_step
::
make_workflow_step(toolpath_object, pos, loadingContext, parentworkflowProv)
(Dict[Text, Any], int, LoadingContext, Optional[ProvenanceProfile]) -> WorkflowStep
Create and return a workflow step object.
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Astronomy
Classifier: Topic :: Scientific/Engineering :: Atmospheric Science
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Classifier: Topic :: System :: Distributed Computing
Classifier: Topic :: Utilities
Requires-Python: >=3.6, <4
Description-Content-Type: text/x-rst
Provides-Extra: deps
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/README.rst 0000644 0001750 0001750 00000110412 00000000000 015142 0 ustar 00peter peter ==================================================================
Common Workflow Language tool description reference implementation
==================================================================
|Linux Status| |Coverage Status| |Docs Status|
PyPI: |PyPI Version| |PyPI Downloads Month| |Total PyPI Downloads|
Conda: |Conda Version| |Conda Installs|
Debian: |Debian Testing package| |Debian Stable package|
Quay.io (Docker): |Quay.io Container|
.. |Linux Status| image:: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml/badge.svg?branch=main
:target: https://github.com/common-workflow-language/cwltool/actions/workflows/ci-tests.yml
.. |Debian Stable package| image:: https://badges.debian.net/badges/debian/stable/cwltool/version.svg
:target: https://packages.debian.org/stable/cwltool
.. |Debian Testing package| image:: https://badges.debian.net/badges/debian/testing/cwltool/version.svg
:target: https://packages.debian.org/testing/cwltool
.. |Coverage Status| image:: https://img.shields.io/codecov/c/github/common-workflow-language/cwltool.svg
:target: https://codecov.io/gh/common-workflow-language/cwltool
.. |PyPI Version| image:: https://badge.fury.io/py/cwltool.svg
:target: https://badge.fury.io/py/cwltool
.. |PyPI Downloads Month| image:: https://pepy.tech/badge/cwltool/month
:target: https://pepy.tech/project/cwltool
.. |Total PyPI Downloads| image:: https://static.pepy.tech/personalized-badge/cwltool?period=total&units=international_system&left_color=black&right_color=orange&left_text=Total%20PyPI%20Downloads
:target: https://pepy.tech/project/cwltool
.. |Conda Version| image:: https://anaconda.org/conda-forge/cwltool/badges/version.svg
:target: https://anaconda.org/conda-forge/cwltool
.. |Conda Installs| image:: https://anaconda.org/conda-forge/cwltool/badges/downloads.svg
:target: https://anaconda.org/conda-forge/cwltool
.. |Quay.io Container| image:: https://quay.io/repository/commonwl/cwltool/status
:target: https://quay.io/repository/commonwl/cwltool
.. |Docs Status| image:: https://readthedocs.org/projects/cwltool/badge/?version=latest
:target: https://cwltool.readthedocs.io/en/latest/?badge=latest
:alt: Documentation Status
This is the reference implementation of the Common Workflow Language. It is
intended to be feature complete and provide comprehensive validation of CWL
files as well as provide other tools related to working with CWL.
This is written and tested for
`Python <https://www.python.org/>`_ ``3.x {x = 6, 7, 8, 9, 10}``.
The reference implementation consists of two packages. The ``cwltool`` package
is the primary Python module containing the reference implementation in the
``cwltool`` module and a console executable by the same name.
The ``cwlref-runner`` package is optional and provides an additional entry point
under the alias ``cwl-runner``, which is the implementation-agnostic name for the
default CWL interpreter installed on a host.
``cwltool`` is provided by the CWL project, `a member project of Software Freedom Conservancy <https://sfconservancy.org/>`_
and our `many contributors <https://github.com/common-workflow-language/cwltool/graphs/contributors>`_.
Install
-------
``cwltool`` packages
^^^^^^^^^^^^^^^^^^^^
Your operating system may offer cwltool directly. For Debian, Ubuntu,
and similar Linux distributions, try
.. code:: bash
sudo apt-get install cwltool
If you encounter an error, first try to update package information by using
.. code:: bash
sudo apt-get update
If you are running macOS or other UNIXes and you want to use packages prepared by the conda-forge project, then
please follow the install instructions for `conda-forge <https://conda-forge.org/>`_ (if you haven't already) and then
.. code:: bash
conda install -c conda-forge cwltool
All of the above methods of installing ``cwltool`` use packages that might contain bugs already fixed in newer versions or be missing desired features.
If the packaged version of ``cwltool`` available to you is too old, then we recommend installing using ``pip`` and ``venv``
.. code:: bash
python3 -m venv env # Create a virtual environment named 'env' in the current directory
source env/bin/activate # Activate environment before installing `cwltool`
Then install the latest ``cwlref-runner`` package from PyPI (which will install the latest ``cwltool`` package as
well)
.. code:: bash
pip install cwlref-runner
If installing alongside another CWL implementation (like ``toil-cwl-runner`` or ``arvados-cwl-runner``) then instead run
.. code:: bash
pip install cwltool
MS Windows users
^^^^^^^^^^^^^^^^
1. Install `"Windows Subsystem for Linux 2" (WSL2) and Docker Desktop `_
2. Install `Debian from the Microsoft Store `_
3. Set Debian as your default WSL 2 distro: ``wsl --set-default debian``.
4. Return to the Docker Desktop, choose `Settings → Resources → WSL Integration `_ and under "Enable integration with additional distros" select "Debian",
5. Reboot if you have not yet already.
6. Launch Debian and follow the Linux instructions above (``apt-get install cwltool`` or use the ``venv`` method)
Network problems from within WSL2? Try `these instructions `_ followed by ``wsl --shutdown``.
``cwltool`` development version
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Or you can skip the direct ``pip`` commands above and install the latest development version of ``cwltool``:
.. code:: bash
git clone https://github.com/common-workflow-language/cwltool.git # clone (copy) the cwltool git repository
cd cwltool # Change to source directory that git clone just downloaded
pip install .[deps] # Installs ``cwltool`` from source
cwltool --version # Check if the installation works correctly
Remember, if co-installing multiple CWL implementations, then you need to
maintain which implementation ``cwl-runner`` points to via a symbolic file
system link or another facility.
Recommended Software
^^^^^^^^^^^^^^^^^^^^
You may also want to have the following installed:
- `node.js <https://nodejs.org/>`_
- Docker, udocker, or Singularity (optional)
Without these, some examples in the CWL tutorials at http://www.commonwl.org/user_guide/ may not work.
Run on the command line
-----------------------
Simple command::
cwl-runner [tool-or-workflow-description] [input-job-settings]
Or if you have multiple CWL implementations installed and you want to override
the default cwl-runner then use::
cwltool [tool-or-workflow-description] [input-job-settings]
You can set cwltool options in the environment with ``CWLTOOL_OPTIONS``;
these will be inserted at the beginning of the command line::
export CWLTOOL_OPTIONS="--debug"
Use with boot2docker on macOS
-----------------------------
boot2docker runs Docker inside a virtual machine, and it only mounts ``Users``
on it. The default behavior of CWL is to create temporary directories under e.g.
``/Var`` which is not accessible to Docker containers.
To run CWL successfully with boot2docker you need to set the ``--tmpdir-prefix``
and ``--tmp-outdir-prefix`` to somewhere under ``/Users``::
$ cwl-runner --tmp-outdir-prefix=/Users/username/project --tmpdir-prefix=/Users/username/project wc-tool.cwl wc-job.json
Using uDocker
-------------
Some shared computing environments don't support Docker software containers for technical or policy reasons.
As a workaround, the CWL reference runner supports using alternative ``docker`` implementations on Linux
with the ``--user-space-docker-cmd`` option.
One such "user space" friendly docker replacement is ``udocker`` https://github.com/indigo-dc/udocker.
udocker installation: https://github.com/indigo-dc/udocker/blob/master/doc/installation_manual.md#22-install-from-udockertools-tarball
Run `cwltool` just as you usually would, but with the new option, e.g., from the conformance tests
.. code:: bash
cwltool --user-space-docker-cmd=udocker https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/main/v1.0/v1.0/test-cwl-out2.cwl https://github.com/common-workflow-language/common-workflow-language/raw/main/v1.0/v1.0/empty.json
``cwltool`` can also use Singularity version 2.6.1
or later as a Docker container runtime.
``cwltool`` with Singularity will run software containers specified in
``DockerRequirement`` and therefore works with Docker images only; native
Singularity images are not supported. To use Singularity as the Docker container
runtime, provide the ``--singularity`` command line option to ``cwltool``.
With Singularity, ``cwltool`` can pass all CWL v1.0 conformance tests, except
those involving Docker container ENTRYPOINTs.
Example
.. code:: bash
cwltool --singularity https://raw.githubusercontent.com/common-workflow-language/common-workflow-language/main/v1.0/v1.0/v1.0/cat3-tool-mediumcut.cwl https://github.com/common-workflow-language/common-workflow-language/blob/main/v1.0/v1.0/cat-job.json
Running a tool or workflow from remote or local locations
---------------------------------------------------------
``cwltool`` can run tool and workflow descriptions on both local and remote
systems via its support for HTTP[S] URLs.
Input job files and Workflow steps (via the `run` directive) can reference CWL
documents using absolute or relative local filesystem paths. If a relative path
is referenced and that document isn't found in the current directory, then the
following locations will be searched:
http://www.commonwl.org/v1.0/CommandLineTool.html#Discovering_CWL_documents_on_a_local_filesystem
You can also use `cwldep <https://github.com/common-workflow-language/cwldep>`_
to manage dependencies on external tools and workflows.
Overriding workflow requirements at load time
---------------------------------------------
Sometimes a workflow needs additional requirements to run in a particular
environment or with a particular dataset. To avoid the need to modify the
underlying workflow, cwltool supports requirement "overrides".
The format of the "overrides" object is a mapping of item identifier (workflow,
workflow step, or command line tool) to the process requirements that should be applied.
.. code:: yaml
cwltool:overrides:
echo.cwl:
requirements:
EnvVarRequirement:
envDef:
MESSAGE: override_value
Overrides can be specified either on the command line, or as part of the job
input document. Workflow steps are identified using the name of the workflow
file followed by the step name as a document fragment identifier "#id".
Override identifiers are relative to the top-level workflow document.
.. code:: bash
cwltool --overrides overrides.yml my-tool.cwl my-job.yml
.. code:: yaml
input_parameter1: value1
input_parameter2: value2
cwltool:overrides:
workflow.cwl#step1:
requirements:
EnvVarRequirement:
envDef:
MESSAGE: override_value
.. code:: bash
cwltool my-tool.cwl my-job-with-overrides.yml
Combining parts of a workflow into a single document
----------------------------------------------------
Use ``--pack`` to combine a workflow made up of multiple files into a
single compound document. This operation takes all the CWL files
referenced by a workflow and builds a new CWL document with all
Process objects (CommandLineTool and Workflow) in a list in the
``$graph`` field. Cross references (such as ``run:`` and ``source:``
fields) are updated to internal references within the new packed
document. The top-level workflow is named ``#main``.
.. code:: bash
cwltool --pack my-wf.cwl > my-packed-wf.cwl
Running only part of a workflow
-------------------------------
You can run a partial workflow with the ``--target`` (``-t``) option. This
takes the name of an output parameter, workflow step, or input
parameter in the top-level workflow. You may provide multiple
targets.
.. code:: bash
cwltool --target step3 my-wf.cwl
If a target is an output parameter, it will run only the steps
that contribute to that output. If a target is a workflow step, it
will run the workflow starting from that step. If a target is an
input parameter, it will only run the steps connected to
that input.
Use ``--print-targets`` to get a listing of the targets of a workflow.
To see which steps will run, use ``--print-subgraph`` with
``--target`` to get a printout of the workflow subgraph for the
selected targets.
.. code:: bash
cwltool --print-targets my-wf.cwl
cwltool --target step3 --print-subgraph my-wf.cwl > my-wf-starting-from-step3.cwl
Visualizing a CWL document
--------------------------
The ``--print-dot`` option will print a file suitable for the Graphviz ``dot`` program. Here is a bash one-liner to generate a Scalable Vector Graphics (SVG) file:
.. code:: bash
cwltool --print-dot my-wf.cwl | dot -Tsvg > my-wf.svg
Modeling a CWL document as RDF
------------------------------
CWL documents can be expressed as RDF triple graphs.
.. code:: bash
cwltool --print-rdf --rdf-serializer=turtle mywf.cwl
Environment Variables in cwltool
--------------------------------
This reference implementation supports several ways of setting
environment variables for tools, in addition to the standard
``EnvVarRequirement``. The sequence of steps applied to create the
environment is:
0. If the ``--preserve-entire-environment`` flag is present, then begin with the current
environment, else begin with an empty environment.
1. Add any variables specified by ``--preserve-environment`` option(s).
2. Set ``TMPDIR`` and ``HOME`` per the CWL v1.0+ CommandLineTool specification.
3. Apply any ``EnvVarRequirement`` from the ``CommandLineTool`` description.
4. Apply any manipulations required by any ``cwltool:MPIRequirement`` extensions.
5. Substitute any secrets required by the ``Secrets`` extension.
6. Modify the environment in response to ``SoftwareRequirement`` (see below).
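For example, assuming a Slurm-style scheduler (the variable name below is illustrative), the following invocation starts from an empty environment, preserves only ``PATH`` and ``SLURM_JOB_ID``, and then applies any ``EnvVarRequirement`` from the tool on top::

    cwltool --preserve-environment PATH --preserve-environment SLURM_JOB_ID my-tool.cwl my-job.yml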
Leveraging SoftwareRequirements (Beta)
--------------------------------------
CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
may in turn use to resolve to packages in various package managers or
dependency management systems such as `Environment Modules
<http://modules.sourceforge.net/>`__.
Utilizing ``SoftwareRequirement`` hints with cwltool requires an optional
dependency; for this reason, be sure to specify the ``deps`` modifier when
installing cwltool. For instance::
$ pip install 'cwltool[deps]'
Installing cwltool in this fashion enables several new command line options.
The most general of these options is ``--beta-dependency-resolvers-configuration``.
This option allows one to specify a dependency resolver's configuration file.
This file may be specified as either XML or YAML and simply describes the
plugins to enable for resolving ``SoftwareRequirement`` dependencies.
Using these hints will allow cwltool to modify the environment in
which your tool runs, for example by loading one or more Environment
Modules. The environment is constructed as above, and then it may be
modified by the selected tool resolver. This currently means that
you cannot override any environment variables set by the selected tool
resolver. Note that the environment given to the configured dependency
resolver has the variable ``_CWLTOOL`` set to ``1`` to allow introspection.
To discuss some of these plugins and how to configure them, first consider the
following ``hint`` definition for an example CWL tool.
.. code:: yaml
SoftwareRequirement:
packages:
- package: seqtk
version:
- r93
Now imagine deploying cwltool on a cluster with Software Modules installed
and that a ``seqtk`` module is available at version ``r93``. This means cluster
users likely won't have the binary ``seqtk`` on their ``PATH`` by default, but after
sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
available on the ``PATH``. A simple dependency resolvers configuration file, called
``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
the correct module environment before executing the above tool would simply be:
.. code:: yaml
- type: modules
The outer list indicates that one plugin is being enabled; its parameters are
defined as a dictionary for that list item. There is only one required parameter
for the plugin above: ``type``, which defines the plugin type. This parameter
is required for all plugins. The available plugins and the parameters
available for each are documented (incompletely) in the Galaxy project
documentation.
Unfortunately, this documentation is in the context of Galaxy tool
``requirement`` s instead of CWL ``SoftwareRequirement`` s, but the concepts map fairly directly.
cwltool is distributed with an example of such a seqtk tool and a corresponding
sample job. It can be executed from the cwltool root using a dependency resolvers
configuration file such as the one above with the command::
cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
tests/seqtk_seq.cwl \
tests/seqtk_seq_job.json
This example demonstrates both that cwltool can leverage
existing software installations and also handle workflows with dependencies
on different versions of the same software and libraries. However, the above
example does require an existing module setup, so it is impossible to test this example
"out of the box" with cwltool. For a more isolated test that demonstrates all
the same concepts, the resolver plugin type ``galaxy_packages`` can be used.
"Galaxy packages" are a lighter-weight alternative to Environment Modules that are
really just defined by a way to lay out directories into packages and versions
to find little scripts that are sourced to modify the environment. They have
been used for years in Galaxy community to adapt Galaxy tools to cluster
environments but require neither knowledge of Galaxy nor any special tools to
setup. These should work just fine for CWL tools.
The cwltool source code repository's test directory is set up with a very simple
directory that defines a set of "Galaxy packages" (but really just defines one
package named ``random-lines``). The directory layout is simply::
tests/test_deps_env/
random-lines/
1.0/
env.sh
If the ``galaxy_packages`` plugin is enabled and pointed at the
``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
such as the following is encountered.
.. code:: yaml
hints:
SoftwareRequirement:
packages:
- package: 'random-lines'
version:
- '1.0'
Then cwltool will simply find that ``env.sh`` file and source it before executing
the corresponding tool. That ``env.sh`` script is only responsible for modifying
the job's ``PATH`` to add the required binaries.
This is a full example that works since resolving "Galaxy packages" has no
external requirements. Try it out by executing the following command from cwltool's
root directory::
cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
tests/random_lines.cwl \
tests/random_lines_job.json
The resolvers configuration file in the above example was simply:
.. code:: yaml
- type: galaxy_packages
base_path: ./tests/test_deps_env
It is possible that the ``SoftwareRequirement`` s in a given CWL tool will not
match the module names for a given cluster. Such requirements can be re-mapped
to specific deployed packages or versions using another file specified using
the resolver plugin parameter ``mapping_files``. We will
demonstrate this using ``galaxy_packages``, but the concepts apply equally well
to Environment Modules or Conda packages (described below), for instance.
So consider the resolvers configuration file
(``tests/test_deps_env_resolvers_conf_rewrite.yml``):
.. code:: yaml
- type: galaxy_packages
base_path: ./tests/test_deps_env
mapping_files: ./tests/test_deps_mapping.yml
And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):
.. code:: yaml
- from:
name: randomLines
version: 1.0.0-rc1
to:
name: random-lines
version: '1.0'
This says that if cwltool encounters a requirement for ``randomLines`` at version
``1.0.0-rc1`` in a tool, it should rewrite it to our specific plugin as ``random-lines`` at
version ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl``
that contains such a source ``SoftwareRequirement``. To try out this example with
mapping, execute the following command from the cwltool root directory::
cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
tests/random_lines_mapping.cwl \
tests/random_lines_job.json
The previous examples demonstrated leveraging existing infrastructure to
provide requirements for CWL tools. If instead a real package manager is used,
cwltool has the opportunity to install requirements as needed. While initial
support for Homebrew/Linuxbrew plugins is available, the most developed such
plugin is for the `Conda <https://conda.io/>`__ package manager. Conda has the nice properties
of allowing multiple versions of a package to be installed simultaneously,
not requiring elevated permissions to install Conda itself or packages using
Conda, and being cross-platform. For these reasons, cwltool may run as a normal
user, install its own Conda environment and manage multiple versions of Conda packages
on Linux and Mac OS X.
The Conda plugin can be endlessly configured, but a sensible set of defaults
that has proven a powerful stack for dependency management within the Galaxy tool
development ecosystem can be enabled by simply passing cwltool the
``--beta-conda-dependencies`` flag.
With this, we can use the seqtk example above without Docker or any externally managed services;
cwltool should install everything it needs and create an environment for the tool.
Try it out with the following command::
cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
The CWL specification allows URIs to be attached to ``SoftwareRequirement`` s
that allow disambiguation of package names. If the mapping files described above
allow deployers to adapt tools to their infrastructure, this mechanism allows
tools to adapt their requirements to multiple package managers. To demonstrate
this within the context of the seqtk example, we can deliberately break the package name
and then specify a specific Conda package as follows:
.. code:: yaml
hints:
SoftwareRequirement:
packages:
- package: seqtk_seq
version:
- '1.2'
specs:
- https://anaconda.org/bioconda/seqtk
- https://packages.debian.org/sid/seqtk
The example can be executed using the command::
cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
The plugin framework for managing the resolution of these software requirements
is maintained as part of `galaxy-tool-util <https://pypi.org/project/galaxy-tool-util/>`__, a small,
portable subset of the Galaxy project. More information on configuration and implementation can be found
at the following links:
- `Dependency Resolvers in Galaxy `__
- `Conda for [Galaxy] Tool Dependencies `__
- `Mapping Files - Implementation `__
- `Specifications - Implementation `__
- `Initial cwltool Integration Pull Request `__
Use with GA4GH Tool Registry API
--------------------------------
Cwltool can launch tools directly from `GA4GH Tool Registry API`_ endpoints.
By default, cwltool searches https://dockstore.org/. Use ``--add-ga4gh-tool-registry`` to add other registries to the search path.
For example ::
cwltool quay.io/collaboratory/dockstore-tool-bamstats:develop test.json
and (defaults to latest when a version is not specified) ::
cwltool quay.io/collaboratory/dockstore-tool-bamstats test.json
For this example, grab the test.json (and input file) from https://github.com/CancerCollaboratory/dockstore-tool-bamstats ::
wget https://dockstore.org/api/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-bamstats/versions/develop/PLAIN-CWL/descriptor/test.json
wget https://github.com/CancerCollaboratory/dockstore-tool-bamstats/raw/develop/rna.SRR948778.bam
.. _`GA4GH Tool Registry API`: https://github.com/ga4gh/tool-registry-schemas
Running MPI-based tools that need to be launched
------------------------------------------------
Cwltool supports an extension to the CWL spec
``http://commonwl.org/cwltool#MPIRequirement``. When the tool
definition has this in its ``requirements``/``hints`` section, and
cwltool has been run with ``--enable-ext``, then the tool's command
line will be extended with the commands needed to launch it with
``mpirun`` or similar. You can specify the number of processes to
start as either a literal integer or an expression (that will result
in an integer). For example::
#!/usr/bin/env cwl-runner
cwlVersion: v1.1
class: CommandLineTool
$namespaces:
cwltool: "http://commonwl.org/cwltool#"
requirements:
cwltool:MPIRequirement:
processes: $(inputs.nproc)
inputs:
nproc:
type: int
Interaction with containers: the MPIRequirement currently prepends its
commands to the front of the command line that is constructed. If you
wish to run a containerized application in parallel, for simple use
cases, this does work with Singularity, depending upon the platform
setup. However, this combination should be considered "alpha" -- please
do report any issues you have! This does not work with Docker at the
moment. (More precisely, you get ``n`` copies of the same single-process
image run at the same time that cannot communicate with each other.)
The host-specific parameters are configured in a simple YAML file
(specified with the ``--mpi-config-file`` flag). The allowed keys are
given in the following table; all are optional.
+----------------+------------------+----------+------------------------------+
| Key            | Type             | Default  | Description                  |
+================+==================+==========+==============================+
| runner         | str              | "mpirun" | The primary command to use.  |
+----------------+------------------+----------+------------------------------+
| nproc_flag     | str              | "-n"     | Flag to set number of        |
|                |                  |          | processes to start.          |
+----------------+------------------+----------+------------------------------+
| default_nproc  | int              | 1        | Default number of processes. |
+----------------+------------------+----------+------------------------------+
| extra_flags    | List[str]        | []       | A list of any other flags to |
|                |                  |          | be added to the runner's     |
|                |                  |          | command line before          |
|                |                  |          | the ``baseCommand``.         |
+----------------+------------------+----------+------------------------------+
| env_pass       | List[str]        | []       | A list of environment        |
|                |                  |          | variables that should be     |
|                |                  |          | passed from the host         |
|                |                  |          | environment through to the   |
|                |                  |          | tool (e.g., giving the       |
|                |                  |          | node list as set by your     |
|                |                  |          | scheduler).                  |
+----------------+------------------+----------+------------------------------+
| env_pass_regex | List[str]        | []       | A list of Python regular     |
|                |                  |          | expressions that will be     |
|                |                  |          | matched against the host's   |
|                |                  |          | environment. Those that      |
|                |                  |          | match will be passed         |
|                |                  |          | through.                     |
+----------------+------------------+----------+------------------------------+
| env_set        | Mapping[str,str] | {}       | A dictionary of environment  |
|                |                  |          | variable names and the       |
|                |                  |          | values to set them to.       |
+----------------+------------------+----------+------------------------------+
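As an illustration, a configuration for an Open MPI style launcher on a Slurm-managed cluster might look like the following. The values shown are examples to adapt, not defaults; only the keys come from the table above.

.. code:: yaml

   runner: mpiexec
   nproc_flag: "-n"
   default_nproc: 1
   extra_flags: ["--bind-to", "core"]
   env_pass: ["SLURM_JOB_NODELIST"]
   env_pass_regex: ["OMPI_.*"]
   env_set:
     OMP_NUM_THREADS: "1"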
===========
Development
===========
Running tests locally
---------------------
- Running basic tests ``(/tests)``:
To run the basic tests after installing `cwltool` execute the following:
.. code:: bash
pip install -rtest-requirements.txt
pytest ## N.B. This requires node.js or docker to be available
To run the test suite in all supported Python environments, we use `tox <https://tox.readthedocs.io/>`_.
First clone the complete code repository (see the ``git clone`` instructions above) and then run
the following in the terminal:
``pip install tox; tox -p``
A list of all environments can be seen using:
``tox --listenvs``
and a specific test environment can be run using:
``tox -e <environment-name>``
and additionally run a specific test using this format:
``tox -e py310-unit -- -v tests/test_examples.py::test_scandeps``
- Running the entire suite of CWL conformance tests:
The GitHub repository for the CWL specifications contains a script that tests a CWL
implementation against a wide array of valid CWL files using the `cwltest <https://github.com/common-workflow-language/cwltest>`_
program.
Instructions for running these tests can be found in the Common Workflow Language Specification repository at https://github.com/common-workflow-language/common-workflow-language/blob/main/CONFORMANCE_TESTS.md .
Import as a module
------------------
Add
.. code:: python
import cwltool
to your script.
The easiest way to use cwltool to run a tool or workflow from Python is to use a Factory:
.. code:: python
import cwltool.factory
fac = cwltool.factory.Factory()
echo = fac.make("echo.cwl")
result = echo(inp="foo")
# result["out"] == "foo"
CWL Tool Control Flow
---------------------
Technical outline of how cwltool works internally, for maintainers.
#. Use CWL ``load_tool()`` to load document.
#. Fetches the document from file or URL
#. Applies preprocessing (syntax/identifier expansion and normalization)
#. Validates the document based on cwlVersion
#. If necessary, updates the document to the latest spec
#. Constructs a Process object using the ``make_tool()`` callback. This yields a
CommandLineTool, Workflow, or ExpressionTool. For workflows, this
recursively constructs each workflow step.
#. To construct custom types for CommandLineTool, Workflow, or
ExpressionTool, provide a custom ``make_tool()``
#. Iterate on the ``job()`` method of the Process object to get back runnable jobs.
#. ``job()`` is a generator method (uses the Python iterator protocol)
#. Each time the ``job()`` method is invoked in an iteration, it returns one
of: a runnable item (an object with a ``run()`` method), ``None`` (indicating
there is currently no work ready to run), or end of iteration (indicating
the process is complete).
#. Invoke the runnable item by calling ``run()``. This runs the tool and gets output.
#. An output callback reports the output of a process.
#. ``job()`` may be iterated over multiple times. It will yield all the work
that is currently ready to run and then yield None.
#. ``Workflow`` objects create corresponding ``WorkflowJob`` and ``WorkflowJobStep`` objects to hold the workflow state for the duration of the job invocation.
#. The WorkflowJob iterates over each WorkflowJobStep and determines if the
inputs to the step are ready.
#. When a step is ready, it constructs an input object for that step and
iterates on the ``job()`` method of the workflow job step.
#. Each runnable item is yielded back up to the top-level run loop
#. When a step job completes and receives an output callback, the
job outputs are assigned to the output of the workflow step.
#. When all steps are complete, the intermediate files are moved to a final
workflow output, intermediate directories are deleted, and the workflow's output callback is called.
#. The ``job()`` method of ``CommandLineTool`` yields a single runnable object.
#. The CommandLineTool ``job()`` method calls ``make_job_runner()`` to create a
``CommandLineJob`` object.
#. The job method configures the CommandLineJob object by setting public
attributes
#. The job method iterates over file and directory inputs to the
CommandLineTool and creates a "path map".
#. Files are mapped from their "resolved" location to a "target" path where
they will appear at tool invocation (for example, a location inside a
Docker container). The target paths are used on the command line.
#. Files are staged to target paths using either Docker volume binds (when
using containers) or symlinks (if not). This staging step enables files
to be logically rearranged or renamed independent of their source layout.
#. The ``run()`` method of CommandLineJob executes the command line tool or
Docker container, waits for it to complete, collects output, and makes
the output callback.
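The outline above can be condensed into a small driver. This is an illustrative sketch only, not the real executor: it assumes ``load_tool()``, ``job()``, and ``run()`` behave exactly as described in the list above, and it simply skips ``None`` yields instead of waiting for pending jobs as a production executor must.

.. code:: python

   from cwltool.context import LoadingContext, RuntimeContext
   from cwltool.load_tool import load_tool

   outputs = {}

   def on_done(out, status):
       # Output callback: record the process output and final status.
       outputs["out"], outputs["status"] = out, status

   tool = load_tool("echo.cwl", LoadingContext())  # fetch, validate, construct
   runtime_context = RuntimeContext()
   for runnable in tool.job({"inp": "foo"}, on_done, runtime_context):
       if runnable is not None:
           runnable.run(runtime_context)  # execute one runnable item
       # A ``None`` here means "no work ready yet"; a real executor would
       # block on in-flight jobs before iterating again.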
Extension points
----------------
The following functions can be passed to main() to override or augment
the listed behaviors.
executor
::
executor(tool, job_order_object, runtimeContext, logger)
(Process, Dict[Text, Any], RuntimeContext, logging.Logger) -> Tuple[Dict[Text, Any], Text]
An implementation of the top-level workflow execution loop should
synchronously run a process object to completion and return the
output object.
versionfunc
::
versionfunc()
() -> Text
Return version string.
logger_handler
::
logger_handler
logging.Handler
Handler object for logging.
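For instance, a thin wrapper around the stock executor could be passed to main() as follows. This is a hedged sketch: it assumes main() accepts ``executor`` as a keyword argument and that ``cwltool.executors.SingleJobExecutor`` is the default implementation.

.. code:: python

   import sys

   import cwltool.main
   from cwltool.executors import SingleJobExecutor

   def logging_executor(tool, job_order_object, runtime_context, logger):
       # Log which process is about to run, then defer to the stock executor.
       logger.info("running %s", tool.tool.get("id"))
       return SingleJobExecutor()(tool, job_order_object, runtime_context, logger)

   if __name__ == "__main__":
       sys.exit(cwltool.main.main(sys.argv[1:], executor=logging_executor))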
The following functions can be set in LoadingContext to override or
augment the listed behaviors.
fetcher_constructor
::
fetcher_constructor(cache, session)
(Dict[Text, Text], requests.sessions.Session) -> Fetcher
Construct a Fetcher object with the supplied cache and HTTP session.
resolver
::
resolver(document_loader, document)
(Loader, Union[Text, Dict[Text, Any]]) -> Text
Resolve a relative document identifier to an absolute one that can be fetched.
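A pass-through resolver, for example, could be installed like this (illustrative; it assumes ``LoadingContext`` accepts a dictionary of attribute overrides, and the function name is hypothetical):

.. code:: python

   from cwltool.context import LoadingContext

   def identity_resolver(document_loader, document):
       # A real resolver would map a relative identifier to an absolute,
       # fetchable URI; this one returns the identifier unchanged.
       return document

   loading_context = LoadingContext({"resolver": identity_resolver})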
The following functions can be set in RuntimeContext to override or
augment the listed behaviors.
construct_tool_object
::
construct_tool_object(toolpath_object, loadingContext)
(MutableMapping[Text, Any], LoadingContext) -> Process
Hook to construct a Process object (e.g. CommandLineTool) from a document.
select_resources
::
select_resources(request, runtime_context)
(Dict[str, int], RuntimeContext) -> Dict[Text, int]
Take a resource request and turn it into a concrete resource assignment.
make_fs_access
::
make_fs_access(basedir)
(Text) -> StdFsAccess
Return a file system access object.
In addition, when providing custom subclasses of Process objects, you can override the following methods:
CommandLineTool.make_job_runner
::
make_job_runner(RuntimeContext)
(RuntimeContext) -> Type[JobBase]
Create and return a job runner object (this implements concrete execution of a command line tool).
Workflow.make_workflow_step
::
make_workflow_step(toolpath_object, pos, loadingContext, parentworkflowProv)
(Dict[Text, Any], int, LoadingContext, Optional[ProvenanceProfile]) -> WorkflowStep
Create and return a workflow step object.
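As a minimal subclassing sketch (the method name follows the text above; the import path and class name are assumptions):

.. code:: python

   from cwltool.command_line_tool import CommandLineTool

   class TracingCommandLineTool(CommandLineTool):
       def make_job_runner(self, runtimeContext):
           # Inspect, wrap, or swap the job runner class before returning it.
           runner_class = super().make_job_runner(runtimeContext)
           return runner_class

Such a subclass would typically be returned from a custom ``construct_tool_object`` hook (see above).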
././@PaxHeader 0000000 0000000 0000000 00000000033 00000000000 010211 x ustar 00 27 mtime=1645718441.067677
cwltool-3.1.20220224085855/cwltool/ 0000755 0001750 0001750 00000000000 00000000000 015137 5 ustar 00peter peter ././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/__init__.py 0000644 0001750 0001750 00000000430 00000000000 017245 0 ustar 00peter peter """Reference implementation of the CWL standards."""
__author__ = "pamstutz@veritasgenetics.com"
CWL_CONTENT_TYPES = [
"text/plain",
"application/json",
"text/vnd.yaml",
"text/yaml",
"text/x-yaml",
"application/x-yaml",
"application/octet-stream",
]
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/__main__.py 0000644 0001750 0001750 00000000121 00000000000 017223 0 ustar 00peter peter """Default entrypoint for the cwltool module."""
from . import main
main.run()
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/argparser.py 0000644 0001750 0001750 00000070700 00000000000 017503 0 ustar 00peter peter """Command line argument parsing for cwltool."""
import argparse
import os
import urllib
from typing import (
Any,
AnyStr,
Callable,
Dict,
List,
MutableMapping,
MutableSequence,
Optional,
Sequence,
Type,
Union,
cast,
)
from schema_salad.ref_resolver import file_uri
from .loghandler import _logger
from .process import Process, shortname
from .resolver import ga4gh_tool_registries
from .software_requirements import SOFTWARE_REQUIREMENTS_ENABLED
from .utils import DEFAULT_TMP_PREFIX
def arg_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Reference executor for Common Workflow Language standards. "
"Not for production use."
)
parser.add_argument("--basedir", type=str)
parser.add_argument(
"--outdir",
type=str,
default=os.path.abspath("."),
help="Output directory. The default is the current directory.",
)
parser.add_argument(
"--log-dir",
type=str,
default="",
help="Log your tools stdout/stderr to this location outside of container "
"This will only log stdout/stderr if you specify stdout/stderr in their respective fields or capture it as an output",
)
parser.add_argument(
"--parallel",
action="store_true",
default=False,
help="[experimental] Run jobs in parallel. ",
)
envgroup = parser.add_mutually_exclusive_group()
envgroup.add_argument(
"--preserve-environment",
type=str,
action="append",
help="Preserve specific environment variable when running "
"CommandLineTools. May be provided multiple times. By default PATH is "
"preserved when not running in a container.",
metavar="ENVVAR",
default=[],
dest="preserve_environment",
)
envgroup.add_argument(
"--preserve-entire-environment",
action="store_true",
help="Preserve all environment variables when running CommandLineTools "
"without a software container.",
default=False,
dest="preserve_entire_environment",
)
containergroup = parser.add_mutually_exclusive_group()
containergroup.add_argument(
"--rm-container",
action="store_true",
default=True,
help="Delete Docker container used by jobs after they exit (default)",
dest="rm_container",
)
containergroup.add_argument(
"--leave-container",
action="store_false",
default=True,
help="Do not delete Docker container used by jobs after they exit",
dest="rm_container",
)
cidgroup = parser.add_argument_group(
"Options for recording the Docker container identifier into a file."
)
cidgroup.add_argument(
# Disabled as containerid is now saved by default
"--record-container-id",
action="store_true",
default=False,
help=argparse.SUPPRESS,
dest="record_container_id",
)
cidgroup.add_argument(
"--cidfile-dir",
type=str,
help="Store the Docker container ID into a file in the specified directory.",
default=None,
dest="cidfile_dir",
)
cidgroup.add_argument(
"--cidfile-prefix",
type=str,
help="Specify a prefix to the container ID filename. "
"Final file name will be followed by a timestamp. "
"The default is no prefix.",
default=None,
dest="cidfile_prefix",
)
parser.add_argument(
"--tmpdir-prefix",
type=str,
help="Path prefix for temporary directories. If --tmpdir-prefix is not "
"provided, then the prefix for temporary directories is influenced by "
"the value of the TMPDIR, TEMP, or TMP environment variables. Taking "
"those into consideration, the current default is {}.".format(
DEFAULT_TMP_PREFIX
),
default=DEFAULT_TMP_PREFIX,
)
intgroup = parser.add_mutually_exclusive_group()
intgroup.add_argument(
"--tmp-outdir-prefix",
type=str,
help="Path prefix for intermediate output directories. Defaults to the "
"value of --tmpdir-prefix.",
default="",
)
intgroup.add_argument(
"--cachedir",
type=str,
default="",
help="Directory to cache intermediate workflow outputs to avoid "
"recomputing steps. Can be very helpful in the development and "
"troubleshooting of CWL documents.",
)
tmpgroup = parser.add_mutually_exclusive_group()
tmpgroup.add_argument(
"--rm-tmpdir",
action="store_true",
default=True,
help="Delete intermediate temporary directories (default)",
dest="rm_tmpdir",
)
tmpgroup.add_argument(
"--leave-tmpdir",
action="store_false",
default=True,
help="Do not delete intermediate temporary directories",
dest="rm_tmpdir",
)
outgroup = parser.add_mutually_exclusive_group()
outgroup.add_argument(
"--move-outputs",
action="store_const",
const="move",
default="move",
help="Move output files to the workflow output directory and delete "
"intermediate output directories (default).",
dest="move_outputs",
)
outgroup.add_argument(
"--leave-outputs",
action="store_const",
const="leave",
default="move",
help="Leave output files in intermediate output directories.",
dest="move_outputs",
)
outgroup.add_argument(
"--copy-outputs",
action="store_const",
const="copy",
default="move",
help="Copy output files to the workflow output directory and don't "
"delete intermediate output directories.",
dest="move_outputs",
)
pullgroup = parser.add_mutually_exclusive_group()
pullgroup.add_argument(
"--enable-pull",
default=True,
action="store_true",
help="Try to pull Docker images",
dest="pull_image",
)
pullgroup.add_argument(
"--disable-pull",
default=True,
action="store_false",
help="Do not try to pull Docker images",
dest="pull_image",
)
parser.add_argument(
"--rdf-serializer",
help="Output RDF serialization format used by --print-rdf (one of "
"turtle (default), n3, nt, xml)",
default="turtle",
)
parser.add_argument(
"--eval-timeout",
help="Time to wait for a Javascript expression to evaluate before giving "
"an error, default 20s.",
type=float,
default=20,
)
provgroup = parser.add_argument_group(
"Options for recording provenance information of the execution"
)
provgroup.add_argument(
"--provenance",
help="Save provenance to specified folder as a "
"Research Object that captures and aggregates "
"workflow execution and data products.",
type=str,
)
provgroup.add_argument(
"--enable-user-provenance",
default=False,
action="store_true",
help="Record user account info as part of provenance.",
dest="user_provenance",
)
provgroup.add_argument(
"--disable-user-provenance",
default=False,
action="store_false",
help="Do not record user account info in provenance.",
dest="user_provenance",
)
provgroup.add_argument(
"--enable-host-provenance",
default=False,
action="store_true",
help="Record host info as part of provenance.",
dest="host_provenance",
)
provgroup.add_argument(
"--disable-host-provenance",
default=False,
action="store_false",
help="Do not record host info in provenance.",
dest="host_provenance",
)
provgroup.add_argument(
"--orcid",
help="Record user ORCID identifier as part of "
"provenance, e.g. https://orcid.org/0000-0002-1825-0097 "
"or 0000-0002-1825-0097. Alternatively the environment variable "
"ORCID may be set.",
dest="orcid",
default=os.environ.get("ORCID", ""),
type=str,
)
provgroup.add_argument(
"--full-name",
help="Record full name of user as part of provenance, "
"e.g. Josiah Carberry. You may need to use shell quotes to preserve "
"spaces. Alternatively the environment variable CWL_FULL_NAME may "
"be set.",
dest="cwl_full_name",
default=os.environ.get("CWL_FULL_NAME", ""),
type=str,
)
printgroup = parser.add_mutually_exclusive_group()
printgroup.add_argument(
"--print-rdf",
action="store_true",
help="Print corresponding RDF graph for workflow and exit",
)
printgroup.add_argument(
"--print-dot",
action="store_true",
help="Print workflow visualization in graphviz format and exit",
)
printgroup.add_argument(
"--print-pre",
action="store_true",
help="Print CWL document after preprocessing.",
)
printgroup.add_argument(
"--print-deps", action="store_true", help="Print CWL document dependencies."
)
printgroup.add_argument(
"--print-input-deps",
action="store_true",
help="Print input object document dependencies.",
)
printgroup.add_argument(
"--pack",
action="store_true",
help="Combine components into single document and print.",
)
printgroup.add_argument(
"--version", action="store_true", help="Print version and exit"
)
printgroup.add_argument(
"--validate", action="store_true", help="Validate CWL document only."
)
printgroup.add_argument(
"--print-supported-versions",
action="store_true",
help="Print supported CWL specs.",
)
printgroup.add_argument(
"--print-subgraph",
action="store_true",
help="Print workflow subgraph that will execute. Can combined with "
"--target or --single-step",
)
printgroup.add_argument(
"--print-targets", action="store_true", help="Print targets (output parameters)"
)
printgroup.add_argument(
"--make-template", action="store_true", help="Generate a template input object"
)
strictgroup = parser.add_mutually_exclusive_group()
strictgroup.add_argument(
"--strict",
action="store_true",
help="Strict validation (unrecognized or out of place fields are error)",
default=True,
dest="strict",
)
strictgroup.add_argument(
"--non-strict",
action="store_false",
help="Lenient validation (ignore unrecognized fields)",
default=True,
dest="strict",
)
parser.add_argument(
"--skip-schemas",
action="store_true",
help="Skip loading of schemas",
default=False,
dest="skip_schemas",
)
doccachegroup = parser.add_mutually_exclusive_group()
doccachegroup.add_argument(
"--no-doc-cache",
action="store_false",
help="Disable disk cache for documents loaded over HTTP",
default=True,
dest="doc_cache",
)
doccachegroup.add_argument(
"--doc-cache",
action="store_true",
help="Enable disk cache for documents loaded over HTTP",
default=True,
dest="doc_cache",
)
volumegroup = parser.add_mutually_exclusive_group()
volumegroup.add_argument("--verbose", action="store_true", help="Default logging")
volumegroup.add_argument(
"--quiet", action="store_true", help="Only print warnings and errors."
)
volumegroup.add_argument(
"--debug", action="store_true", help="Print even more logging"
)
parser.add_argument(
"--strict-memory-limit",
action="store_true",
help="When running with "
"software containers and the Docker engine, pass either the "
"calculated memory allocation from ResourceRequirements or the "
"default of 1 gigabyte to Docker's --memory option.",
)
parser.add_argument(
"--strict-cpu-limit",
action="store_true",
help="When running with "
"software containers and the Docker engine, pass either the "
"calculated cpu allocation from ResourceRequirements or the "
"default of 1 core to Docker's --cpu option. "
"Requires docker version >= v1.13.",
)
parser.add_argument(
"--timestamps",
action="store_true",
help="Add timestamps to the errors, warnings, and notifications.",
)
parser.add_argument(
"--js-console", action="store_true", help="Enable javascript console output"
)
parser.add_argument(
"--disable-js-validation",
action="store_true",
help="Disable javascript validation.",
)
parser.add_argument(
"--js-hint-options-file",
type=str,
help="File of options to pass to jshint. "
'This includes the added option "includewarnings". ',
)
dockergroup = parser.add_mutually_exclusive_group()
dockergroup.add_argument(
"--user-space-docker-cmd",
metavar="CMD",
help="(Linux/OS X only) Specify the path to udocker. Implies --udocker",
)
dockergroup.add_argument(
"--udocker",
help="(Linux/OS X only) Use the udocker runtime for running containers "
"(equivalent to --user-space-docker-cmd=udocker).",
action="store_const",
const="udocker",
dest="user_space_docker_cmd",
)
dockergroup.add_argument(
"--singularity",
action="store_true",
default=False,
help="[experimental] Use "
"Singularity runtime for running containers. "
"Requires Singularity v2.6.1+ and Linux with kernel "
"version v3.18+ or with overlayfs support "
"backported.",
)
dockergroup.add_argument(
"--podman",
action="store_true",
default=False,
help="[experimental] Use " "Podman runtime for running containers. ",
)
dockergroup.add_argument(
"--no-container",
action="store_false",
default=True,
help="Do not execute jobs in a "
"Docker container, even when `DockerRequirement` "
"is specified under `hints`.",
dest="use_container",
)
dependency_resolvers_configuration_help = argparse.SUPPRESS
dependencies_directory_help = argparse.SUPPRESS
use_biocontainers_help = argparse.SUPPRESS
conda_dependencies = argparse.SUPPRESS
if SOFTWARE_REQUIREMENTS_ENABLED:
dependency_resolvers_configuration_help = (
"Dependency resolver "
"configuration file describing how to adapt 'SoftwareRequirement' "
"packages to current system."
)
dependencies_directory_help = (
"Default root directory used by dependency resolvers configuration."
)
use_biocontainers_help = (
"Use biocontainers for tools without an "
"explicitly annotated Docker container."
)
conda_dependencies = (
"Short cut to use Conda to resolve 'SoftwareRequirement' packages."
)
parser.add_argument(
"--beta-dependency-resolvers-configuration",
default=None,
help=dependency_resolvers_configuration_help,
)
parser.add_argument(
"--beta-dependencies-directory", default=None, help=dependencies_directory_help
)
parser.add_argument(
"--beta-use-biocontainers",
default=None,
help=use_biocontainers_help,
action="store_true",
)
parser.add_argument(
"--beta-conda-dependencies",
default=None,
help=conda_dependencies,
action="store_true",
)
parser.add_argument(
"--tool-help", action="store_true", help="Print command line help for tool"
)
parser.add_argument(
"--relative-deps",
choices=["primary", "cwd"],
default="primary",
help="When using --print-deps, print paths "
"relative to primary file or current working directory.",
)
parser.add_argument(
"--enable-dev",
action="store_true",
help="Enable loading and running unofficial development versions of "
"the CWL standards.",
default=False,
)
parser.add_argument(
"--enable-ext",
action="store_true",
help="Enable loading and running 'cwltool:' extensions to the CWL standards.",
default=False,
)
colorgroup = parser.add_mutually_exclusive_group()
colorgroup.add_argument(
"--enable-color",
action="store_true",
help="Enable logging color (default enabled)",
default=True,
)
colorgroup.add_argument(
"--disable-color",
action="store_false",
dest="enable_color",
help="Disable colored logging (default false)",
)
parser.add_argument(
"--default-container",
help="Specify a default software container to use for any "
"CommandLineTool without a DockerRequirement.",
)
parser.add_argument(
"--no-match-user",
action="store_true",
help="Disable passing the current uid to `docker run --user`",
)
parser.add_argument(
"--custom-net",
type=str,
help="Passed to `docker run` as the '--net' parameter when "
"NetworkAccess is true, which is its default setting.",
)
parser.add_argument(
"--disable-validate",
dest="do_validate",
action="store_false",
default=True,
help=argparse.SUPPRESS,
)
reggroup = parser.add_mutually_exclusive_group()
reggroup.add_argument(
"--enable-ga4gh-tool-registry",
action="store_true",
help="Enable tool resolution using GA4GH tool registry API",
dest="enable_ga4gh_tool_registry",
default=True,
)
reggroup.add_argument(
"--disable-ga4gh-tool-registry",
action="store_false",
help="Disable tool resolution using GA4GH tool registry API",
dest="enable_ga4gh_tool_registry",
default=True,
)
parser.add_argument(
"--add-ga4gh-tool-registry",
action="append",
help="Add a GA4GH tool registry endpoint to use for resolution, default %s"
% ga4gh_tool_registries,
dest="ga4gh_tool_registries",
default=[],
)
parser.add_argument(
"--on-error",
help="Desired workflow behavior when a step fails. One of 'stop' (do "
"not submit any more steps) or 'continue' (may submit other steps that "
"are not downstream from the error). Default is 'stop'.",
default="stop",
choices=("stop", "continue"),
)
checkgroup = parser.add_mutually_exclusive_group()
checkgroup.add_argument(
"--compute-checksum",
action="store_true",
default=True,
help="Compute checksum of contents while collecting outputs",
dest="compute_checksum",
)
checkgroup.add_argument(
"--no-compute-checksum",
action="store_false",
help="Do not compute checksum of contents while collecting outputs",
dest="compute_checksum",
)
parser.add_argument(
"--relax-path-checks",
action="store_true",
default=False,
help="Relax requirements on path names to permit "
"spaces and hash characters.",
dest="relax_path_checks",
)
parser.add_argument(
"--force-docker-pull",
action="store_true",
default=False,
help="Pull latest software container image even if it is locally present",
dest="force_docker_pull",
)
parser.add_argument(
"--no-read-only",
action="store_true",
default=False,
help="Do not set root directory in the container as read-only",
dest="no_read_only",
)
parser.add_argument(
"--overrides",
type=str,
default=None,
help="Read process requirement overrides from file.",
)
subgroup = parser.add_mutually_exclusive_group()
subgroup.add_argument(
"--target",
"-t",
action="append",
help="Only execute steps that contribute to listed targets (can be "
"provided more than once).",
)
subgroup.add_argument(
"--single-step",
type=str,
default=None,
help="Only executes a single step in a workflow. The input object must "
"match that step's inputs. Can be combined with --print-subgraph.",
)
subgroup.add_argument(
"--single-process",
type=str,
default=None,
help="Only executes the underlying Process (CommandLineTool, "
"ExpressionTool, or sub-Workflow) for the given step in a workflow. "
"This will not include any step-level processing: 'scatter', 'when'; "
"and there will be no processing of step-level 'default', or 'valueFrom' "
"input modifiers. However, requirements/hints from the step or parent "
"workflow(s) will be inherited as usual."
"The input object must match that Process's inputs.",
)
parser.add_argument(
"--mpi-config-file",
type=str,
default=None,
help="Platform specific configuration for MPI (parallel launcher, its "
"flag etc). See README section 'Running MPI-based tools' for details "
"of the format.",
)
parser.add_argument(
"workflow",
type=str,
nargs="?",
default=None,
metavar="cwl_document",
help="path or URL to a CWL Workflow, "
"CommandLineTool, or ExpressionTool. If the `inputs_object` has a "
"`cwl:tool` field indicating the path or URL to the cwl_document, "
" then the `cwl_document` argument is optional.",
)
parser.add_argument(
"job_order",
nargs=argparse.REMAINDER,
metavar="inputs_object",
help="path or URL to a YAML or JSON "
"formatted description of the required input values for the given "
"`cwl_document`.",
)
return parser
def get_default_args() -> Dict[str, Any]:
"""Get default values of cwltool's command line options."""
ap = arg_parser()
args = ap.parse_args([])
return vars(args)
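# Example (an illustrative sketch): fetching cwltool's defaults without
# parsing a real command line. The values noted below mirror the
# declarations in arg_parser() above.
#
#     defaults = get_default_args()
#     assert defaults["on_error"] == "stop"
#     assert defaults["use_container"] is True  # flipped by --no-container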
class FSAction(argparse.Action):
objclass = None  # type: Optional[str]
def __init__(
self,
option_strings: List[str],
dest: str,
nargs: Any = None,
urljoin: Callable[[str, str], str] = urllib.parse.urljoin,
base_uri: str = "",
**kwargs: Any,
) -> None:
"""Fail if nargs is used."""
if nargs is not None:
raise ValueError("nargs not allowed")
self.urljoin = urljoin
self.base_uri = base_uri
super().__init__(option_strings, dest, **kwargs)
def __call__(
self,
parser: argparse.ArgumentParser,
namespace: argparse.Namespace,
values: Union[str, Sequence[Any], None],
option_string: Optional[str] = None,
) -> None:
setattr(
namespace,
self.dest,
{
"class": self.objclass,
"location": self.urljoin(self.base_uri, cast(str, values)),
},
)
class FSAppendAction(argparse.Action):
objclass = None  # type: Optional[str]
def __init__(
self,
option_strings: List[str],
dest: str,
nargs: Any = None,
urljoin: Callable[[str, str], str] = urllib.parse.urljoin,
base_uri: str = "",
**kwargs: Any,
) -> None:
"""Initialize."""
if nargs is not None:
raise ValueError("nargs not allowed")
self.urljoin = urljoin
self.base_uri = base_uri
super().__init__(option_strings, dest, **kwargs)
def __call__(
self,
parser: argparse.ArgumentParser,
namespace: argparse.Namespace,
values: Union[str, Sequence[Any], None],
option_string: Optional[str] = None,
) -> None:
g = getattr(namespace, self.dest)
if not g:
g = []
setattr(namespace, self.dest, g)
g.append(
{
"class": self.objclass,
"location": self.urljoin(self.base_uri, cast(str, values)),
}
)
class FileAction(FSAction):
objclass = "File"
class DirectoryAction(FSAction):
objclass = "Directory"
class FileAppendAction(FSAppendAction):
objclass = "File"
class DirectoryAppendAction(FSAppendAction):
objclass = "Directory"
def add_argument(
toolparser: argparse.ArgumentParser,
name: str,
inptype: Any,
records: List[str],
description: str = "",
default: Any = None,
input_required: bool = True,
urljoin: Callable[[str, str], str] = urllib.parse.urljoin,
base_uri: str = "",
) -> None:
if len(name) == 1:
flag = "-"
else:
flag = "--"
# if input_required is false, don't make the command line
# parameter required.
required = default is None and input_required
if isinstance(inptype, MutableSequence):
if len(inptype) == 1:
inptype = inptype[0]
elif len(inptype) == 2 and inptype[0] == "null":
required = False
inptype = inptype[1]
elif len(inptype) == 2 and inptype[1] == "null":
required = False
inptype = inptype[0]
else:
_logger.debug("Can't make command line argument from %s", inptype)
return None
ahelp = description.replace("%", "%%")
action = None # type: Optional[Union[Type[argparse.Action], str]]
atype = None # type: Any
typekw = {} # type: Dict[str, Any]
if inptype == "File":
action = FileAction
elif inptype == "Directory":
action = DirectoryAction
elif isinstance(inptype, MutableMapping) and inptype["type"] == "array":
if inptype["items"] == "File":
action = FileAppendAction
elif inptype["items"] == "Directory":
action = DirectoryAppendAction
else:
action = "append"
elif isinstance(inptype, MutableMapping) and inptype["type"] == "enum":
atype = str
elif isinstance(inptype, MutableMapping) and inptype["type"] == "record":
records.append(name)
for field in inptype["fields"]:
fieldname = name + "." + shortname(field["name"])
fieldtype = field["type"]
fielddescription = field.get("doc", "")
add_argument(
toolparser,
fieldname,
fieldtype,
records,
fielddescription,
default=default.get(shortname(field["name"]), None)
if default
else None,
input_required=required,
)
return
elif inptype == "string":
atype = str
elif inptype == "int":
atype = int
elif inptype == "long":
atype = int
elif inptype == "double":
atype = float
elif inptype == "float":
atype = float
elif inptype == "boolean":
action = "store_true"
else:
_logger.debug("Can't make command line argument from %s", inptype)
return None
if action in (FileAction, DirectoryAction, FileAppendAction, DirectoryAppendAction):
typekw["urljoin"] = urljoin
typekw["base_uri"] = base_uri
if inptype != "boolean":
typekw["type"] = atype
toolparser.add_argument(
flag + name,
required=required,
help=ahelp,
action=action, # type: ignore
default=default,
**typekw,
)
def generate_parser(
toolparser: argparse.ArgumentParser,
tool: Process,
namemap: Dict[str, str],
records: List[str],
input_required: bool = True,
urljoin: Callable[[str, str], str] = urllib.parse.urljoin,
base_uri: str = "",
) -> argparse.ArgumentParser:
toolparser.description = tool.tool.get("doc", None)
toolparser.add_argument("job_order", nargs="?", help="Job input json file")
namemap["job_order"] = "job_order"
for inp in tool.tool["inputs"]:
name = shortname(inp["id"])
namemap[name.replace("-", "_")] = name
inptype = inp["type"]
description = inp.get("doc", "")
default = inp.get("default", None)
add_argument(
toolparser,
name,
inptype,
records,
description,
default,
input_required,
urljoin,
base_uri,
)
return toolparser
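# Example (a rough sketch; ``_StubTool`` is a hypothetical stand-in for a
# real cwltool Process, which would normally come from load_tool):
#
#     class _StubTool:
#         tool = {
#             "doc": "demo tool",
#             "inputs": [
#                 {"id": "#main/threads", "type": "int", "doc": "thread count"}
#             ],
#         }
#
#     ap = argparse.ArgumentParser()
#     generate_parser(ap, _StubTool(), {}, [])  # adds a required --threads flag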
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/builder.py 0000644 0001750 0001750 00000073602 00000000000 017147 0 ustar 00peter peter import copy
import logging
import math
from typing import (
IO,
Any,
Callable,
Dict,
List,
MutableMapping,
MutableSequence,
Optional,
Set,
Union,
cast,
)
from rdflib import Graph, URIRef
from rdflib.namespace import OWL, RDFS
from ruamel.yaml.comments import CommentedMap
from schema_salad.avro.schema import Names, Schema, make_avsc_object
from schema_salad.exceptions import ValidationException
from schema_salad.sourceline import SourceLine
from schema_salad.utils import convert_to_dict, json_dumps
from schema_salad.validate import validate
from typing_extensions import TYPE_CHECKING, Type # pylint: disable=unused-import
from . import expression
from .errors import WorkflowException
from .loghandler import _logger
from .mutation import MutationManager
from .software_requirements import DependenciesConfiguration
from .stdfsaccess import StdFsAccess
from .utils import (
CONTENT_LIMIT,
CWLObjectType,
CWLOutputType,
HasReqsHints,
aslist,
get_listing,
normalizeFilesDirs,
visit_class,
)
if TYPE_CHECKING:
from .pathmapper import PathMapper
from .provenance_profile import ProvenanceProfile # pylint: disable=unused-import
INPUT_OBJ_VOCAB: Dict[str, str] = {
"Any": "https://w3id.org/cwl/salad#Any",
"File": "https://w3id.org/cwl/cwl#File",
"Directory": "https://w3id.org/cwl/cwl#Directory",
}
def content_limit_respected_read_bytes(f): # type: (IO[bytes]) -> bytes
contents = f.read(CONTENT_LIMIT + 1)
if len(contents) > CONTENT_LIMIT:
raise WorkflowException(
"file is too large, loadContents limited to %d bytes" % CONTENT_LIMIT
)
return contents
def content_limit_respected_read(f): # type: (IO[bytes]) -> str
return content_limit_respected_read_bytes(f).decode("utf-8")
def substitute(value, replace): # type: (str, str) -> str
if replace.startswith("^"):
try:
return substitute(value[0 : value.rindex(".")], replace[1:])
except ValueError:
# No extension to remove
return value + replace.lstrip("^")
return value + replace
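# Example (an illustrative sketch): how secondaryFiles patterns are applied.
# Each leading "^" strips one extension from the primary basename.
#
#     substitute("reads.bam", ".bai")    # -> "reads.bam.bai"
#     substitute("reads.bam", "^.bai")   # -> "reads.bai"
#     substitute("reads.bam", "^^.idx")  # -> "reads.idx" (no second extension
#                                        #    left to strip, so extra carets
#                                        #    are dropped)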
def formatSubclassOf(
fmt: str, cls: str, ontology: Optional[Graph], visited: Set[str]
) -> bool:
"""Determine if `fmt` is a subclass of `cls`."""
if URIRef(fmt) == URIRef(cls):
return True
if ontology is None:
return False
if fmt in visited:
return False
visited.add(fmt)
uriRefFmt = URIRef(fmt)
for _s, _p, o in ontology.triples((uriRefFmt, RDFS.subClassOf, None)):
# Find parent classes of `fmt` and search upward
if formatSubclassOf(o, cls, ontology, visited):
return True
for _s, _p, o in ontology.triples((uriRefFmt, OWL.equivalentClass, None)):
# Find equivalent classes of `fmt` and search horizontally
if formatSubclassOf(o, cls, ontology, visited):
return True
for s, _p, _o in ontology.triples((None, OWL.equivalentClass, uriRefFmt)):
# Find equivalent classes of `fmt` and search horizontally
if formatSubclassOf(s, cls, ontology, visited):
return True
return False
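# Example (a minimal sketch built directly with rdflib; the URIs below are
# hypothetical):
#
#     g = Graph()
#     g.add((URIRef("http://example.org/bam"),
#            RDFS.subClassOf,
#            URIRef("http://example.org/alignment")))
#     formatSubclassOf("http://example.org/bam",
#                      "http://example.org/alignment", g, set())  # -> True
#     formatSubclassOf("http://example.org/bam",
#                      "http://example.org/other", g, set())      # -> False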
def check_format(
actual_file: Union[CWLObjectType, List[CWLObjectType]],
input_formats: Union[List[str], str],
ontology: Optional[Graph],
) -> None:
"""Confirm that the format present is valid for the allowed formats."""
for afile in aslist(actual_file):
if not afile:
continue
if "format" not in afile:
raise ValidationException(
f"File has no 'format' defined: {json_dumps(afile, indent=4)}"
)
for inpf in aslist(input_formats):
if afile["format"] == inpf or formatSubclassOf(
afile["format"], inpf, ontology, set()
):
return
raise ValidationException(
f"File has an incompatible format: {json_dumps(afile, indent=4)}"
)
class Builder(HasReqsHints):
def __init__(
self,
job: CWLObjectType,
files: List[CWLObjectType],
bindings: List[CWLObjectType],
schemaDefs: MutableMapping[str, CWLObjectType],
names: Names,
requirements: List[CWLObjectType],
hints: List[CWLObjectType],
resources: Dict[str, Union[int, float]],
mutation_manager: Optional[MutationManager],
formatgraph: Optional[Graph],
make_fs_access: Type[StdFsAccess],
fs_access: StdFsAccess,
job_script_provider: Optional[DependenciesConfiguration],
timeout: float,
debug: bool,
js_console: bool,
force_docker_pull: bool,
loadListing: str,
outdir: str,
tmpdir: str,
stagedir: str,
cwlVersion: str,
container_engine: str,
) -> None:
"""Initialize this Builder."""
super().__init__()
self.job = job
self.files = files
self.bindings = bindings
self.schemaDefs = schemaDefs
self.names = names
self.requirements = requirements
self.hints = hints
self.resources = resources
self.mutation_manager = mutation_manager
self.formatgraph = formatgraph
self.make_fs_access = make_fs_access
self.fs_access = fs_access
self.job_script_provider = job_script_provider
self.timeout = timeout
self.debug = debug
self.js_console = js_console
self.force_docker_pull = force_docker_pull
# One of "no_listing", "shallow_listing", "deep_listing"
self.loadListing = loadListing
self.outdir = outdir
self.tmpdir = tmpdir
self.stagedir = stagedir
self.cwlVersion = cwlVersion
self.pathmapper = None # type: Optional[PathMapper]
self.prov_obj = None # type: Optional[ProvenanceProfile]
self.find_default_container = None # type: Optional[Callable[[], str]]
self.container_engine = container_engine
def build_job_script(self, commands: List[str]) -> Optional[str]:
if self.job_script_provider is not None:
return self.job_script_provider.build_job_script(self, commands)
return None
def bind_input(
self,
schema: CWLObjectType,
datum: Union[CWLObjectType, List[CWLObjectType]],
discover_secondaryFiles: bool,
lead_pos: Optional[Union[int, List[int]]] = None,
tail_pos: Optional[Union[str, List[int]]] = None,
) -> List[MutableMapping[str, Union[str, List[int]]]]:
debug = _logger.isEnabledFor(logging.DEBUG)
if tail_pos is None:
tail_pos = []
if lead_pos is None:
lead_pos = []
bindings = [] # type: List[MutableMapping[str, Union[str, List[int]]]]
binding = (
{}
) # type: Union[MutableMapping[str, Union[str, List[int]]], CommentedMap]
value_from_expression = False
if "inputBinding" in schema and isinstance(
schema["inputBinding"], MutableMapping
):
binding = CommentedMap(schema["inputBinding"].items())
bp = list(aslist(lead_pos))
if "position" in binding:
position = binding["position"]
if isinstance(position, str):  # no need to test the CWL version;
    # the v1.0 schema only allows ints, so a string must be an expression
result = self.do_eval(position, context=datum)
if not isinstance(result, int):
raise SourceLine(
schema["inputBinding"], "position", WorkflowException, debug
).makeError(
"'position' expressions must evaluate to an int, "
f"not a {type(result)}. Expression {position} "
f"resulted in '{result}'."
)
binding["position"] = result
bp.append(result)
else:
bp.extend(aslist(binding["position"]))
else:
bp.append(0)
bp.extend(aslist(tail_pos))
binding["position"] = bp
binding["datum"] = datum
if "valueFrom" in binding:
value_from_expression = True
# Handle union types
if isinstance(schema["type"], MutableSequence):
bound_input = False
for t in schema["type"]:
avsc = None # type: Optional[Schema]
if isinstance(t, str) and self.names.has_name(t, None):
avsc = self.names.get_name(t, None)
elif (
isinstance(t, MutableMapping)
and "name" in t
and self.names.has_name(cast(str, t["name"]), None)
):
avsc = self.names.get_name(cast(str, t["name"]), None)
if not avsc:
avsc = make_avsc_object(convert_to_dict(t), self.names)
if validate(avsc, datum, vocab=INPUT_OBJ_VOCAB):
schema = copy.deepcopy(schema)
schema["type"] = t
if not value_from_expression:
return self.bind_input(
schema,
datum,
lead_pos=lead_pos,
tail_pos=tail_pos,
discover_secondaryFiles=discover_secondaryFiles,
)
else:
self.bind_input(
schema,
datum,
lead_pos=lead_pos,
tail_pos=tail_pos,
discover_secondaryFiles=discover_secondaryFiles,
)
bound_input = True
if not bound_input:
raise ValidationException(
"'{}' is not a valid union {}".format(datum, schema["type"])
)
elif isinstance(schema["type"], MutableMapping):
st = copy.deepcopy(schema["type"])
if (
binding
and "inputBinding" not in st
and "type" in st
and st["type"] == "array"
and "itemSeparator" not in binding
):
st["inputBinding"] = {}
for k in ("secondaryFiles", "format", "streamable"):
if k in schema:
st[k] = schema[k]
if value_from_expression:
self.bind_input(
st,
datum,
lead_pos=lead_pos,
tail_pos=tail_pos,
discover_secondaryFiles=discover_secondaryFiles,
)
else:
bindings.extend(
self.bind_input(
st,
datum,
lead_pos=lead_pos,
tail_pos=tail_pos,
discover_secondaryFiles=discover_secondaryFiles,
)
)
else:
if schema["type"] == "org.w3id.cwl.salad.Any":
if isinstance(datum, dict):
if datum.get("class") == "File":
schema["type"] = "org.w3id.cwl.cwl.File"
elif datum.get("class") == "Directory":
schema["type"] = "org.w3id.cwl.cwl.Directory"
else:
schema["type"] = "record"
schema["fields"] = [
{"name": field_name, "type": "Any"}
for field_name in datum.keys()
]
elif isinstance(datum, list):
schema["type"] = "array"
schema["items"] = "Any"
if schema["type"] in self.schemaDefs:
schema = self.schemaDefs[cast(str, schema["type"])]
if schema["type"] == "record":
datum = cast(CWLObjectType, datum)
for f in cast(List[CWLObjectType], schema["fields"]):
name = cast(str, f["name"])
if name in datum and datum[name] is not None:
bindings.extend(
self.bind_input(
f,
cast(CWLObjectType, datum[name]),
lead_pos=lead_pos,
tail_pos=name,
discover_secondaryFiles=discover_secondaryFiles,
)
)
else:
datum[name] = f.get("default")
if schema["type"] == "array":
for n, item in enumerate(cast(MutableSequence[CWLObjectType], datum)):
b2 = None
if binding:
b2 = cast(CWLObjectType, copy.deepcopy(binding))
b2["datum"] = item
itemschema = {
"type": schema["items"],
"inputBinding": b2,
} # type: CWLObjectType
for k in ("secondaryFiles", "format", "streamable"):
if k in schema:
itemschema[k] = schema[k]
bindings.extend(
self.bind_input(
itemschema,
item,
lead_pos=n,
tail_pos=tail_pos,
discover_secondaryFiles=discover_secondaryFiles,
)
)
binding = {}
def _capture_files(f: CWLObjectType) -> CWLObjectType:
self.files.append(f)
return f
if schema["type"] == "org.w3id.cwl.cwl.File":
datum = cast(CWLObjectType, datum)
self.files.append(datum)
loadContents_sourceline = (
None
) # type: Union[None, MutableMapping[str, Union[str, List[int]]], CWLObjectType]
if binding and binding.get("loadContents"):
loadContents_sourceline = binding
elif schema.get("loadContents"):
loadContents_sourceline = schema
if loadContents_sourceline and loadContents_sourceline["loadContents"]:
with SourceLine(
loadContents_sourceline,
"loadContents",
WorkflowException,
debug,
):
try:
with self.fs_access.open(
cast(str, datum["location"]), "rb"
) as f2:
datum["contents"] = content_limit_respected_read(f2)
except Exception as e:
raise Exception(
"Reading {}\n{}".format(datum["location"], e)
)
if "secondaryFiles" in schema:
if "secondaryFiles" not in datum:
datum["secondaryFiles"] = []
sf_schema = aslist(schema["secondaryFiles"])
elif not discover_secondaryFiles:
sf_schema = [] # trust the inputs
else:
sf_schema = aslist(schema["secondaryFiles"])
for num, sf_entry in enumerate(sf_schema):
if "required" in sf_entry and sf_entry["required"] is not None:
required_result = self.do_eval(
sf_entry["required"], context=datum
)
if not (
isinstance(required_result, bool)
or required_result is None
):
if sf_schema == schema["secondaryFiles"]:
sf_item: Any = sf_schema[num]
else:
sf_item = sf_schema
raise SourceLine(
sf_item, "required", WorkflowException, debug
).makeError(
"The result of a expression in the field "
"'required' must "
f"be a bool or None, not a {type(required_result)}. "
f"Expression '{sf_entry['required']}' resulted "
f"in '{required_result}'."
)
sf_required = required_result
else:
sf_required = True
if "$(" in sf_entry["pattern"] or "${" in sf_entry["pattern"]:
sfpath = self.do_eval(sf_entry["pattern"], context=datum)
else:
sfpath = substitute(
cast(str, datum["basename"]), sf_entry["pattern"]
)
for sfname in aslist(sfpath):
if not sfname:
continue
found = False
if isinstance(sfname, str):
d_location = cast(str, datum["location"])
if "/" in d_location:
sf_location = (
d_location[0 : d_location.rindex("/") + 1]
+ sfname
)
else:
sf_location = d_location + sfname
sfbasename = sfname
elif isinstance(sfname, MutableMapping):
sf_location = sfname["location"]
sfbasename = sfname["basename"]
else:
raise SourceLine(
sf_entry, "pattern", WorkflowException, debug
).makeError(
"Expected secondaryFile expression to "
"return type 'str', a 'File' or 'Directory' "
"dictionary, or a list of the same. Received "
f"'{type(sfname)} from '{sf_entry['pattern']}'."
)
for d in cast(
MutableSequence[MutableMapping[str, str]],
datum["secondaryFiles"],
):
if not d.get("basename"):
d["basename"] = d["location"][
d["location"].rindex("/") + 1 :
]
if d["basename"] == sfbasename:
found = True
if not found:
def addsf(
files: MutableSequence[CWLObjectType],
newsf: CWLObjectType,
) -> None:
for f in files:
if f["location"] == newsf["location"]:
f["basename"] = newsf["basename"]
return
files.append(newsf)
if isinstance(sfname, MutableMapping):
addsf(
cast(
MutableSequence[CWLObjectType],
datum["secondaryFiles"],
),
sfname,
)
elif discover_secondaryFiles and self.fs_access.exists(
sf_location
):
addsf(
cast(
MutableSequence[CWLObjectType],
datum["secondaryFiles"],
),
{
"location": sf_location,
"basename": sfname,
"class": "File",
},
)
elif sf_required:
raise SourceLine(
schema,
"secondaryFiles",
WorkflowException,
debug,
).makeError(
"Missing required secondary file '%s' from file object: %s"
% (sfname, json_dumps(datum, indent=4))
)
normalizeFilesDirs(
cast(MutableSequence[CWLObjectType], datum["secondaryFiles"])
)
if "format" in schema:
eval_format: Any = self.do_eval(schema["format"])
if isinstance(eval_format, str):
evaluated_format: Union[str, List[str]] = eval_format
elif isinstance(eval_format, MutableSequence):
for index, entry in enumerate(eval_format):
message = None
if not isinstance(entry, str):
message = (
"An expression in the 'format' field must "
"evaluate to a string, or list of strings. "
"However a non-string item was received: "
f"'{entry}' of type '{type(entry)}'. "
f"The expression was '{schema['format']}' and "
f"its fully evaluated result is '{eval_format}'."
)
if expression.needs_parsing(entry):
message = (
"For inputs, 'format' field can either "
"contain a single CWL Expression or CWL Parameter "
"Reference, a single format string, or a list of "
"format strings. But the list cannot contain CWL "
"Expressions or CWL Parameter References. List "
f"entry number {index+1} contains the following "
"unallowed CWL Parameter Reference or Expression: "
f"'{entry}'."
)
if message:
raise SourceLine(
schema["format"], index, WorkflowException, debug
).makeError(message)
evaluated_format = cast(List[str], eval_format)
else:
raise SourceLine(
schema, "format", WorkflowException, debug
).makeError(
"An expression in the 'format' field must "
"evaluate to a string, or list of strings. "
"However the type of the expression result was "
f"{type(eval_format)}. "
f"The expression was '{schema['format']}' and "
f"its fully evaluated result is 'eval_format'."
)
try:
check_format(
datum,
evaluated_format,
self.formatgraph,
)
except ValidationException as ve:
raise WorkflowException(
"Expected value of '%s' to have format %s but\n "
" %s" % (schema["name"], schema["format"], ve)
) from ve
visit_class(
datum.get("secondaryFiles", []),
("File", "Directory"),
_capture_files,
)
if schema["type"] == "org.w3id.cwl.cwl.Directory":
datum = cast(CWLObjectType, datum)
ll = schema.get("loadListing") or self.loadListing
if ll and ll != "no_listing":
get_listing(
self.fs_access,
datum,
(ll == "deep_listing"),
)
self.files.append(datum)
if schema["type"] == "Any":
visit_class(datum, ("File", "Directory"), _capture_files)
# Position to front of the sort key
if binding:
for bi in bindings:
bi["position"] = cast(List[int], binding["position"]) + cast(
List[int], bi["position"]
)
bindings.append(binding)
return bindings
def tostr(self, value: Union[MutableMapping[str, str], Any]) -> str:
if isinstance(value, MutableMapping) and value.get("class") in (
"File",
"Directory",
):
if "path" not in value:
raise WorkflowException(
'{} object missing "path": {}'.format(value["class"], value)
)
return value["path"]
else:
return str(value)
def generate_arg(self, binding: CWLObjectType) -> List[str]:
value = binding.get("datum")
debug = _logger.isEnabledFor(logging.DEBUG)
if "valueFrom" in binding:
with SourceLine(
binding,
"valueFrom",
WorkflowException,
debug,
):
value = self.do_eval(cast(str, binding["valueFrom"]), context=value)
prefix = cast(Optional[str], binding.get("prefix"))
sep = binding.get("separate", True)
if prefix is None and not sep:
with SourceLine(
binding,
"separate",
WorkflowException,
debug,
):
raise WorkflowException(
"'separate' option can not be specified without prefix"
)
argl = [] # type: MutableSequence[CWLOutputType]
if isinstance(value, MutableSequence):
if binding.get("itemSeparator") and value:
itemSeparator = cast(str, binding["itemSeparator"])
argl = [itemSeparator.join([self.tostr(v) for v in value])]
elif binding.get("valueFrom"):
value = [self.tostr(v) for v in value]
return cast(List[str], ([prefix] if prefix else [])) + cast(
List[str], value
)
elif prefix and value:
return [prefix]
else:
return []
elif isinstance(value, MutableMapping) and value.get("class") in (
"File",
"Directory",
):
argl = cast(MutableSequence[CWLOutputType], [value])
elif isinstance(value, MutableMapping):
return [prefix] if prefix else []
elif value is True and prefix:
return [prefix]
elif value is False or value is None or (value is True and not prefix):
return []
else:
argl = [value]
args = []
for j in argl:
if sep:
args.extend([prefix, self.tostr(j)])
else:
args.append(self.tostr(j) if prefix is None else prefix + self.tostr(j))
return [a for a in args if a is not None]
def do_eval(
self,
ex: Optional[CWLOutputType],
context: Optional[Any] = None,
recursive: bool = False,
strip_whitespace: bool = True,
) -> Optional[CWLOutputType]:
if recursive:
if isinstance(ex, MutableMapping):
return {k: self.do_eval(v, context, recursive) for k, v in ex.items()}
if isinstance(ex, MutableSequence):
return [self.do_eval(v, context, recursive) for v in ex]
resources = self.resources
if self.resources and "cores" in self.resources:
cores = resources["cores"]
resources = copy.copy(resources)
resources["cores"] = int(math.ceil(cores))
return expression.do_eval(
ex,
self.job,
self.requirements,
self.outdir,
self.tmpdir,
resources,
context=context,
timeout=self.timeout,
debug=self.debug,
js_console=self.js_console,
force_docker_pull=self.force_docker_pull,
strip_whitespace=strip_whitespace,
cwlVersion=self.cwlVersion,
container_engine=self.container_engine,
)
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/checker.py 0000644 0001750 0001750 00000044677 00000000000 017137 0 ustar 00peter peter """Static checking of CWL workflow connectivity."""
from collections import namedtuple
from typing import (
Any,
Dict,
List,
MutableMapping,
MutableSequence,
Optional,
Sized,
Union,
cast,
)
from schema_salad.exceptions import ValidationException
from schema_salad.sourceline import SourceLine, bullets, strip_dup_lineno
from schema_salad.utils import json_dumps
from .errors import WorkflowException
from .loghandler import _logger
from .process import shortname
from .utils import CWLObjectType, CWLOutputAtomType, CWLOutputType, SinkType, aslist
def _get_type(tp):
# type: (Any) -> Any
if isinstance(tp, MutableMapping):
if tp.get("type") not in ("array", "record", "enum"):
return tp["type"]
return tp
def check_types(
srctype: SinkType,
sinktype: SinkType,
linkMerge: Optional[str],
valueFrom: Optional[str],
) -> str:
"""
Check whether the source and sink types are compatible.
Returns "pass", "warning", or "exception".
"""
if valueFrom is not None:
return "pass"
if linkMerge is None:
if can_assign_src_to_sink(srctype, sinktype, strict=True):
return "pass"
if can_assign_src_to_sink(srctype, sinktype, strict=False):
return "warning"
return "exception"
if linkMerge == "merge_nested":
return check_types(
{"items": _get_type(srctype), "type": "array"},
_get_type(sinktype),
None,
None,
)
if linkMerge == "merge_flattened":
return check_types(
merge_flatten_type(_get_type(srctype)), _get_type(sinktype), None, None
)
raise WorkflowException(f"Unrecognized linkMerge enum '{linkMerge}'")
def merge_flatten_type(src: SinkType) -> CWLOutputType:
"""Return the merge flattened type of the source type."""
if isinstance(src, MutableSequence):
return [merge_flatten_type(cast(SinkType, t)) for t in src]
if isinstance(src, MutableMapping) and src.get("type") == "array":
return src
return {"items": src, "type": "array"}
def can_assign_src_to_sink(
src: SinkType, sink: Optional[SinkType], strict: bool = False
) -> bool:
"""
Check for identical type specifications, ignoring extra keys like inputBinding.
src: admissible source types
sink: admissible sink types
In non-strict comparison, at least one source type must match one sink type.
In strict comparison, all source types must match at least one sink type.
"""
if src == "Any" or sink == "Any":
return True
if isinstance(src, MutableMapping) and isinstance(sink, MutableMapping):
if sink.get("not_connected") and strict:
return False
if src["type"] == "array" and sink["type"] == "array":
return can_assign_src_to_sink(
cast(MutableSequence[CWLOutputAtomType], src["items"]),
cast(MutableSequence[CWLOutputAtomType], sink["items"]),
strict,
)
if src["type"] == "record" and sink["type"] == "record":
return _compare_records(src, sink, strict)
if src["type"] == "File" and sink["type"] == "File":
for sinksf in cast(List[CWLObjectType], sink.get("secondaryFiles", [])):
if not [
1
for srcsf in cast(
List[CWLObjectType], src.get("secondaryFiles", [])
)
if sinksf == srcsf
]:
if strict:
return False
return True
return can_assign_src_to_sink(
cast(SinkType, src["type"]), cast(Optional[SinkType], sink["type"]), strict
)
if isinstance(src, MutableSequence):
if strict:
for this_src in src:
if not can_assign_src_to_sink(cast(SinkType, this_src), sink):
return False
return True
for this_src in src:
if can_assign_src_to_sink(cast(SinkType, this_src), sink):
return True
return False
if isinstance(sink, MutableSequence):
for this_sink in sink:
if can_assign_src_to_sink(src, cast(SinkType, this_sink)):
return True
return False
return bool(src == sink)
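# Example (an illustrative sketch): scalar and union comparisons.
#
#     can_assign_src_to_sink("File", "File")            # -> True
#     can_assign_src_to_sink("File", "Directory")       # -> False
#     can_assign_src_to_sink(["File", "null"], "File")  # -> True (lenient:
#                                                       #    one match suffices)
#     can_assign_src_to_sink(["File", "null"], "File", strict=True)
#     # -> False ("null" does not match "File")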
def _compare_records(
src: CWLObjectType, sink: CWLObjectType, strict: bool = False
) -> bool:
"""
Compare two records, ensuring they have compatible fields.
This handles normalizing record names, which may be relative to the
workflow step, so that they can be compared.
"""
def _rec_fields(
rec,
): # type: (MutableMapping[str, Any]) -> MutableMapping[str, Any]
out = {}
for field in rec["fields"]:
name = shortname(field["name"])
out[name] = field["type"]
return out
srcfields = _rec_fields(src)
sinkfields = _rec_fields(sink)
for key in sinkfields.keys():
if (
not can_assign_src_to_sink(
srcfields.get(key, "null"), sinkfields.get(key, "null"), strict
)
and sinkfields.get(key) is not None
):
_logger.info(
"Record comparison failure for %s and %s\n"
"Did not match fields for %s: %s and %s",
src["name"],
sink["name"],
key,
srcfields.get(key),
sinkfields.get(key),
)
return False
return True
def missing_subset(fullset: List[Any], subset: List[Any]) -> List[Any]:
missing = []
for i in subset:
if i not in fullset:
missing.append(i)
return missing
def static_checker(
workflow_inputs: List[CWLObjectType],
workflow_outputs: MutableSequence[CWLObjectType],
step_inputs: MutableSequence[CWLObjectType],
step_outputs: List[CWLObjectType],
param_to_step: Dict[str, CWLObjectType],
) -> None:
"""Check if all source and sink types of a workflow are compatible before run time."""
# source parameters: workflow_inputs and step_outputs
# sink parameters: step_inputs and workflow_outputs
# make a dictionary of source parameters, indexed by the "id" field
src_parms = workflow_inputs + step_outputs
src_dict = {} # type: Dict[str, CWLObjectType]
for parm in src_parms:
src_dict[cast(str, parm["id"])] = parm
step_inputs_val = check_all_types(src_dict, step_inputs, "source", param_to_step)
workflow_outputs_val = check_all_types(
src_dict, workflow_outputs, "outputSource", param_to_step
)
warnings = step_inputs_val["warning"] + workflow_outputs_val["warning"]
exceptions = step_inputs_val["exception"] + workflow_outputs_val["exception"]
warning_msgs = []
exception_msgs = []
for warning in warnings:
src = warning.src
sink = warning.sink
linkMerge = warning.linkMerge
sinksf = sorted(
p["pattern"]
for p in sink.get("secondaryFiles", [])
if p.get("required", True)
)
srcsf = sorted(p["pattern"] for p in src.get("secondaryFiles", []))
# Every secondaryFile required by the sink should be declared
# by the source
missing = missing_subset(srcsf, sinksf)
if missing:
msg1 = "Parameter '{}' requires secondaryFiles {} but".format(
shortname(sink["id"]),
missing,
)
msg3 = SourceLine(src, "id").makeError(
"source '%s' does not provide those secondaryFiles."
% (shortname(src["id"]))
)
msg4 = SourceLine(src.get("_tool_entry", src), "secondaryFiles").makeError(
"To resolve, add missing secondaryFiles patterns to definition of '%s' or"
% (shortname(src["id"]))
)
msg5 = SourceLine(
sink.get("_tool_entry", sink), "secondaryFiles"
).makeError(
"mark missing secondaryFiles in definition of '%s' as optional."
% shortname(sink["id"])
)
msg = SourceLine(sink).makeError(
"{}\n{}".format(msg1, bullets([msg3, msg4, msg5], " "))
)
elif sink.get("not_connected"):
if not sink.get("used_by_step"):
msg = SourceLine(sink, "type").makeError(
"'%s' is not an input parameter of %s, expected %s"
% (
shortname(sink["id"]),
param_to_step[sink["id"]]["run"],
", ".join(
shortname(cast(str, s["id"]))
for s in cast(
List[Dict[str, Union[str, bool]]],
param_to_step[sink["id"]]["inputs"],
)
if not s.get("not_connected")
),
)
)
else:
msg = ""
else:
msg = (
SourceLine(src, "type").makeError(
"Source '%s' of type %s may be incompatible"
% (shortname(src["id"]), json_dumps(src["type"]))
)
+ "\n"
+ SourceLine(sink, "type").makeError(
" with sink '%s' of type %s"
% (shortname(sink["id"]), json_dumps(sink["type"]))
)
)
if linkMerge is not None:
msg += "\n" + SourceLine(sink).makeError(
" source has linkMerge method %s" % linkMerge
)
if warning.message is not None:
msg += "\n" + SourceLine(sink).makeError(" " + warning.message)
if msg:
warning_msgs.append(msg)
for exception in exceptions:
src = exception.src
sink = exception.sink
linkMerge = exception.linkMerge
extra_message = exception.message
msg = (
SourceLine(src, "type").makeError(
"Source '%s' of type %s is incompatible"
% (shortname(src["id"]), json_dumps(src["type"]))
)
+ "\n"
+ SourceLine(sink, "type").makeError(
" with sink '%s' of type %s"
% (shortname(sink["id"]), json_dumps(sink["type"]))
)
)
if extra_message is not None:
msg += "\n" + SourceLine(sink).makeError(" " + extra_message)
if linkMerge is not None:
msg += "\n" + SourceLine(sink).makeError(
" source has linkMerge method %s" % linkMerge
)
exception_msgs.append(msg)
for sink in step_inputs:
if (
"null" != sink["type"]
and "null" not in sink["type"]
and "source" not in sink
and "default" not in sink
and "valueFrom" not in sink
):
msg = SourceLine(sink).makeError(
"Required parameter '%s' does not have source, default, or valueFrom expression"
% shortname(sink["id"])
)
exception_msgs.append(msg)
all_warning_msg = strip_dup_lineno("\n".join(warning_msgs))
all_exception_msg = strip_dup_lineno("\n" + "\n".join(exception_msgs))
if all_warning_msg:
_logger.warning("Workflow checker warning:\n%s", all_warning_msg)
if exceptions:
raise ValidationException(all_exception_msg)
SrcSink = namedtuple("SrcSink", ["src", "sink", "linkMerge", "message"])
def check_all_types(
src_dict: Dict[str, CWLObjectType],
sinks: MutableSequence[CWLObjectType],
sourceField: str,
param_to_step: Dict[str, CWLObjectType],
) -> Dict[str, List[SrcSink]]:
"""
Given a list of sinks, check that their types match the types of their sources.
sourceField is either "source" or "outputSource".
"""
validation = {"warning": [], "exception": []} # type: Dict[str, List[SrcSink]]
for sink in sinks:
if sourceField in sink:
valueFrom = cast(Optional[str], sink.get("valueFrom"))
pickValue = cast(Optional[str], sink.get("pickValue"))
extra_message = None
if pickValue is not None:
extra_message = "pickValue is: %s" % pickValue
if isinstance(sink[sourceField], MutableSequence):
linkMerge = cast(
Optional[str],
sink.get(
"linkMerge",
(
"merge_nested"
if len(cast(Sized, sink[sourceField])) > 1
else None
),
),
) # type: Optional[str]
if pickValue in ["first_non_null", "the_only_non_null"]:
linkMerge = None
srcs_of_sink = [] # type: List[CWLObjectType]
for parm_id in cast(MutableSequence[str], sink[sourceField]):
srcs_of_sink += [src_dict[parm_id]]
if (
is_conditional_step(param_to_step, parm_id)
and pickValue is None
):
validation["warning"].append(
SrcSink(
src_dict[parm_id],
sink,
linkMerge,
message="Source is from conditional step, but pickValue is not used",
)
)
else:
parm_id = cast(str, sink[sourceField])
if parm_id not in src_dict:
raise SourceLine(sink, sourceField, ValidationException).makeError(
f"{sourceField} not found: {parm_id}"
)
srcs_of_sink = [src_dict[parm_id]]
linkMerge = None
if pickValue is not None:
validation["warning"].append(
SrcSink(
src_dict[parm_id],
sink,
linkMerge,
message="pickValue is used but only a single input source is declared",
)
)
if is_conditional_step(param_to_step, parm_id):
src_typ = aslist(srcs_of_sink[0]["type"])
snk_typ = sink["type"]
if "null" not in src_typ:
src_typ = ["null"] + cast(List[Any], src_typ)
if "null" not in cast(
Union[List[str], CWLObjectType], snk_typ
): # Given our type names this works even if not a list
validation["warning"].append(
SrcSink(
src_dict[parm_id],
sink,
linkMerge,
message="Source is from conditional step and may produce `null`",
)
)
srcs_of_sink[0]["type"] = src_typ
for src in srcs_of_sink:
check_result = check_types(src, sink, linkMerge, valueFrom)
if check_result == "warning":
validation["warning"].append(
SrcSink(src, sink, linkMerge, message=extra_message)
)
elif check_result == "exception":
validation["exception"].append(
SrcSink(src, sink, linkMerge, message=extra_message)
)
return validation
def circular_dependency_checker(step_inputs: List[CWLObjectType]) -> None:
"""Check if a workflow has circular dependency."""
adjacency = get_dependency_tree(step_inputs)
vertices = adjacency.keys()
processed: List[str] = []
cycles: List[List[str]] = []
for vertex in vertices:
if vertex not in processed:
traversal_path = [vertex]
processDFS(adjacency, traversal_path, processed, cycles)
if cycles:
exception_msg = "The following steps have circular dependency:\n"
cyclestrs = [str(cycle) for cycle in cycles]
exception_msg += "\n".join(cyclestrs)
raise ValidationException(exception_msg)
def get_dependency_tree(step_inputs: List[CWLObjectType]) -> Dict[str, List[str]]:
"""Get the dependency tree in the form of adjacency list."""
adjacency = {} # adjacency list of the dependency tree
for step_input in step_inputs:
if "source" in step_input:
if isinstance(step_input["source"], list):
vertices_in = [
get_step_id(cast(str, src)) for src in step_input["source"]
]
else:
vertices_in = [get_step_id(cast(str, step_input["source"]))]
vertex_out = get_step_id(cast(str, step_input["id"]))
for vertex_in in vertices_in:
if vertex_in not in adjacency:
adjacency[vertex_in] = [vertex_out]
elif vertex_out not in adjacency[vertex_in]:
adjacency[vertex_in].append(vertex_out)
if vertex_out not in adjacency:
adjacency[vertex_out] = []
return adjacency
def processDFS(
adjacency: Dict[str, List[str]],
traversal_path: List[str],
processed: List[str],
cycles: List[List[str]],
) -> None:
"""Perform depth first search."""
tip = traversal_path[-1]
for vertex in adjacency[tip]:
if vertex in traversal_path:
i = traversal_path.index(vertex)
cycles.append(traversal_path[i:])
elif vertex not in processed:
traversal_path.append(vertex)
processDFS(adjacency, traversal_path, processed, cycles)
processed.append(tip)
traversal_path.pop()
def get_step_id(field_id: str) -> str:
"""Extract step id from either input or output fields."""
if "/" in field_id.split("#")[1]:
step_id = "/".join(field_id.split("/")[:-1])
else:
step_id = field_id.split("#")[0]
return step_id
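# Example (an illustrative sketch): two steps feeding each other form a
# cycle that the checker reports. The ids below are hypothetical.
#
#     step_inputs = [
#         {"id": "wf#step1/in", "source": "wf#step2/out"},
#         {"id": "wf#step2/in", "source": "wf#step1/out"},
#     ]
#     get_dependency_tree(step_inputs)
#     # -> {"wf#step2": ["wf#step1"], "wf#step1": ["wf#step2"]}
#     circular_dependency_checker(step_inputs)  # raises ValidationException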
def is_conditional_step(param_to_step: Dict[str, CWLObjectType], parm_id: str) -> bool:
source_step = param_to_step.get(parm_id)
if source_step is not None:
if source_step.get("when") is not None:
return True
return False
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/command_line_tool.py 0000644 0001750 0001750 00000203311 00000000000 021173 0 ustar 00peter peter """Implementation of CommandLineTool."""
import copy
import hashlib
import json
import locale
import logging
import os
import re
import shutil
import threading
import urllib
import urllib.parse
from enum import Enum
from functools import cmp_to_key, partial
from typing import (
Any,
Callable,
Dict,
Generator,
List,
Mapping,
MutableMapping,
MutableSequence,
Optional,
Pattern,
Set,
TextIO,
Union,
cast,
)
import shellescape
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from schema_salad.avro.schema import Schema
from schema_salad.exceptions import ValidationException
from schema_salad.ref_resolver import file_uri, uri_file_path
from schema_salad.sourceline import SourceLine
from schema_salad.utils import json_dumps
from schema_salad.validate import validate_ex
from typing_extensions import TYPE_CHECKING, Type
from .builder import (
INPUT_OBJ_VOCAB,
Builder,
content_limit_respected_read_bytes,
substitute,
)
from .context import LoadingContext, RuntimeContext, getdefault
from .docker import DockerCommandLineJob
from .errors import UnsupportedRequirement, WorkflowException
from .flatten import flatten
from .job import CommandLineJob, JobBase
from .loghandler import _logger
from .mpi import MPIRequirementName
from .mutation import MutationManager
from .pathmapper import PathMapper
from .process import (
Process,
_logger_validation_warnings,
compute_checksums,
shortname,
uniquename,
)
from .singularity import SingularityCommandLineJob
from .stdfsaccess import StdFsAccess
from .udocker import UDockerCommandLineJob
from .update import ORDERED_VERSIONS, ORIGINAL_CWLVERSION
from .utils import (
CWLObjectType,
CWLOutputType,
DirectoryType,
JobsGeneratorType,
OutputCallbackType,
adjustDirObjs,
adjustFileObjs,
aslist,
get_listing,
normalizeFilesDirs,
random_outdir,
shared_file_lock,
trim_listing,
upgrade_lock,
visit_class,
)
if TYPE_CHECKING:
from .provenance_profile import ProvenanceProfile # pylint: disable=unused-import
class PathCheckingMode(Enum):
"""
What characters are allowed in path names.
We have the strict (default) mode and the relaxed mode.
"""
STRICT = re.compile(r"^[\w.+\,\-:@\]^\u2600-\u26FF\U0001f600-\U0001f64f]+$")
# accepts names that contain one or more of the following:
# "\w" unicode word characters; this includes most characters
# that can be part of a word in any language, as well
# as numbers and the underscore
# "." a literal period
# "+" a literal plus sign
# "\," a literal comma
# "\-" a literal minus sign
# ":" a literal colon
# "@" a literal at-symbol
# "\]" a literal end-square-bracket
# "^" a literal caret symbol
# \u2600-\u26FF matches a single character in the range between
# ☀ (index 9728) and ⛿ (index 9983)
# \U0001f600-\U0001f64f matches a single character in the range between
# 😀 (index 128512) and 🙏 (index 128591)
# Note: the following characters are intentionally not included:
#
# 1. reserved words in POSIX:
# ! { }
#
# 2. POSIX metacharacters listed in the CWL standard as okay to reject
# | & ; < > ( ) $ ` " '
# (In accordance with
# https://www.commonwl.org/v1.0/CommandLineTool.html#File under "path" )
#
# 3. POSIX path separator
# \
# (also listed at
# https://www.commonwl.org/v1.0/CommandLineTool.html#File under "path")
#
# 4. Additional POSIX metacharacters
# * ? [ # ˜ = %
# TODO: switch to https://pypi.org/project/regex/ and use
# `\p{Extended_Pictographic}` instead of the manual emoji ranges
RELAXED = re.compile(r".*") # Accept anything
class ExpressionJob:
"""Job for ExpressionTools."""
def __init__(
self,
builder: Builder,
script: str,
output_callback: Optional[OutputCallbackType],
requirements: List[CWLObjectType],
hints: List[CWLObjectType],
outdir: Optional[str] = None,
tmpdir: Optional[str] = None,
) -> None:
"""Initializet this ExpressionJob."""
self.builder = builder
self.requirements = requirements
self.hints = hints
self.output_callback = output_callback
self.outdir = outdir
self.tmpdir = tmpdir
self.script = script
self.prov_obj = None # type: Optional[ProvenanceProfile]
def run(
self,
runtimeContext: RuntimeContext,
tmpdir_lock: Optional[threading.Lock] = None,
) -> None:
try:
normalizeFilesDirs(self.builder.job)
ev = self.builder.do_eval(self.script)
normalizeFilesDirs(
cast(
Optional[
Union[
MutableSequence[MutableMapping[str, Any]],
MutableMapping[str, Any],
DirectoryType,
]
],
ev,
)
)
if self.output_callback:
self.output_callback(cast(Optional[CWLObjectType], ev), "success")
except WorkflowException as err:
_logger.warning(
"Failed to evaluate expression:\n%s",
str(err),
exc_info=runtimeContext.debug,
)
if self.output_callback:
self.output_callback({}, "permanentFail")
class ExpressionTool(Process):
def job(
self,
job_order: CWLObjectType,
output_callbacks: Optional[OutputCallbackType],
runtimeContext: RuntimeContext,
) -> Generator[ExpressionJob, None, None]:
builder = self._init_job(job_order, runtimeContext)
job = ExpressionJob(
builder,
self.tool["expression"],
output_callbacks,
self.requirements,
self.hints,
)
job.prov_obj = runtimeContext.prov_obj
yield job
class AbstractOperation(Process):
def job(
self,
job_order: CWLObjectType,
output_callbacks: Optional[OutputCallbackType],
runtimeContext: RuntimeContext,
) -> JobsGeneratorType:
raise WorkflowException("Abstract operation cannot be executed.")
def remove_path(f): # type: (CWLObjectType) -> None
if "path" in f:
del f["path"]
def revmap_file(
builder: Builder, outdir: str, f: CWLObjectType
) -> Optional[CWLObjectType]:
"""
Remap a file from internal path to external path.
For Docker, this maps from the path inside the container to the path
outside the container. Recognizes files in the pathmapper or remaps
internal output directories to the external directory.
"""
# builder.outdir is the inner (container/compute node) output directory
# outdir is the outer (host/storage system) output directory
if outdir.startswith("/"):
# local file path, turn it into a file:// URI
outdir = file_uri(outdir)
# note: outer outdir should already be a URI and should not be URI
# quoted any further.
if "location" in f and "path" not in f:
location = cast(str, f["location"])
if location.startswith("file://"):
f["path"] = uri_file_path(location)
else:
return f
if "dirname" in f:
del f["dirname"]
if "path" in f:
path = cast(str, f["path"])
uripath = file_uri(path)
del f["path"]
if "basename" not in f:
f["basename"] = os.path.basename(path)
if not builder.pathmapper:
raise ValueError(
"Do not call revmap_file using a builder that doesn't have a pathmapper."
)
revmap_f = builder.pathmapper.reversemap(path)
if revmap_f and not builder.pathmapper.mapper(revmap_f[0]).type.startswith(
"Writable"
):
f["location"] = revmap_f[1]
elif (
uripath == outdir
or uripath.startswith(outdir + os.sep)
or uripath.startswith(outdir + "/")
):
f["location"] = uripath
elif (
path == builder.outdir
or path.startswith(builder.outdir + os.sep)
or path.startswith(builder.outdir + "/")
):
joined_path = builder.fs_access.join(
outdir, urllib.parse.quote(path[len(builder.outdir) + 1 :])
)
f["location"] = joined_path
else:
raise WorkflowException(
"Output file path %s must be within designated output directory (%s) or an input "
"file pass through." % (path, builder.outdir)
)
return f
raise WorkflowException(
"Output File object is missing both 'location' and 'path' fields: %s" % f
)
class CallbackJob:
"""Callback Job class, used by CommandLine.job()."""
def __init__(
self,
job: "CommandLineTool",
output_callback: Optional[OutputCallbackType],
cachebuilder: Builder,
jobcache: str,
) -> None:
"""Initialize this CallbackJob."""
self.job = job
self.output_callback = output_callback
self.cachebuilder = cachebuilder
self.outdir = jobcache
self.prov_obj = None # type: Optional[ProvenanceProfile]
def run(
self,
runtimeContext: RuntimeContext,
tmpdir_lock: Optional[threading.Lock] = None,
) -> None:
if self.output_callback:
self.output_callback(
self.job.collect_output_ports(
self.job.tool["outputs"],
self.cachebuilder,
self.outdir,
getdefault(runtimeContext.compute_checksum, True),
),
"success",
)
def check_adjust(
accept_re: Pattern[str], builder: Builder, file_o: CWLObjectType
) -> CWLObjectType:
"""
Map files to assigned path inside a container.
We also need to explicitly walk over the input, as implicit reassignment
doesn't reach everything in builder.bindings.
"""
if not builder.pathmapper:
raise ValueError(
"Do not call check_adjust using a builder that doesn't have a pathmapper."
)
file_o["path"] = path = builder.pathmapper.mapper(cast(str, file_o["location"]))[1]
basename = cast(str, file_o.get("basename"))
dn, bn = os.path.split(path)
if file_o.get("dirname") != dn:
file_o["dirname"] = str(dn)
if basename != bn:
file_o["basename"] = basename = str(bn)
if file_o["class"] == "File":
nr, ne = os.path.splitext(basename)
if file_o.get("nameroot") != nr:
file_o["nameroot"] = str(nr)
if file_o.get("nameext") != ne:
file_o["nameext"] = str(ne)
if not accept_re.match(basename):
raise WorkflowException(
f"Invalid filename: '{file_o['basename']}' contains illegal characters"
)
return file_o
def check_valid_locations(fs_access: StdFsAccess, ob: CWLObjectType) -> None:
location = cast(str, ob["location"])
if location.startswith("_:"):
pass
if ob["class"] == "File" and not fs_access.isfile(location):
raise ValidationException("Does not exist or is not a File: '%s'" % location)
if ob["class"] == "Directory" and not fs_access.isdir(location):
raise ValidationException(
"Does not exist or is not a Directory: '%s'" % location
)
OutputPortsType = Dict[str, Optional[CWLOutputType]]
class ParameterOutputWorkflowException(WorkflowException):
def __init__(self, msg: str, port: CWLObjectType, **kwargs: Any) -> None:
"""Exception for when there was an error collecting output for a parameter."""
super().__init__(
"Error collecting output for parameter '%s': %s"
% (shortname(cast(str, port["id"])), msg),
kwargs,
)
class CommandLineTool(Process):
def __init__(
self, toolpath_object: CommentedMap, loadingContext: LoadingContext
) -> None:
"""Initialize this CommandLineTool."""
super().__init__(toolpath_object, loadingContext)
self.prov_obj = loadingContext.prov_obj
self.path_check_mode = (
PathCheckingMode.RELAXED
if loadingContext.relax_path_checks
else PathCheckingMode.STRICT
) # type: PathCheckingMode
def make_job_runner(self, runtimeContext: RuntimeContext) -> Type[JobBase]:
dockerReq, dockerRequired = self.get_requirement("DockerRequirement")
mpiReq, mpiRequired = self.get_requirement(MPIRequirementName)
if not dockerReq and runtimeContext.use_container:
if runtimeContext.find_default_container is not None:
default_container = runtimeContext.find_default_container(self)
if default_container is not None:
dockerReq = {
"class": "DockerRequirement",
"dockerPull": default_container,
}
if mpiRequired:
self.hints.insert(0, dockerReq)
dockerRequired = False
else:
self.requirements.insert(0, dockerReq)
dockerRequired = True
if dockerReq is not None and runtimeContext.use_container:
if mpiReq is not None:
_logger.warning("MPIRequirement with containers is a beta feature")
if runtimeContext.singularity:
return SingularityCommandLineJob
elif runtimeContext.user_space_docker_cmd:
return UDockerCommandLineJob
if mpiReq is not None:
if mpiRequired:
if dockerRequired:
raise UnsupportedRequirement(
"No support for Docker and MPIRequirement both being required"
)
else:
_logger.warning(
"MPI has been required while Docker is hinted, discarding Docker hint(s)"
)
self.hints = [
h for h in self.hints if h["class"] != "DockerRequirement"
]
return CommandLineJob
else:
if dockerRequired:
_logger.warning(
"Docker has been required while MPI is hinted, discarding MPI hint(s)"
)
self.hints = [
h for h in self.hints if h["class"] != MPIRequirementName
]
else:
raise UnsupportedRequirement(
"Both Docker and MPI have been hinted - don't know what to do"
)
return DockerCommandLineJob
if dockerRequired:
raise UnsupportedRequirement(
"--no-container, but this CommandLineTool has "
"DockerRequirement under 'requirements'."
)
return CommandLineJob
def make_path_mapper(
self,
reffiles: List[CWLObjectType],
stagedir: str,
runtimeContext: RuntimeContext,
separateDirs: bool,
) -> PathMapper:
return PathMapper(reffiles, runtimeContext.basedir, stagedir, separateDirs)
def updatePathmap(
self, outdir: str, pathmap: PathMapper, fn: CWLObjectType
) -> None:
if not isinstance(fn, MutableMapping):
raise WorkflowException(
"Expected File or Directory object, was %s" % type(fn)
)
basename = cast(str, fn["basename"])
if "location" in fn:
location = cast(str, fn["location"])
if location in pathmap:
pathmap.update(
location,
pathmap.mapper(location).resolved,
os.path.join(outdir, basename),
("Writable" if fn.get("writable") else "") + cast(str, fn["class"]),
False,
)
for sf in cast(List[CWLObjectType], fn.get("secondaryFiles", [])):
self.updatePathmap(outdir, pathmap, sf)
for ls in cast(List[CWLObjectType], fn.get("listing", [])):
self.updatePathmap(
os.path.join(outdir, cast(str, fn["basename"])), pathmap, ls
)
def _initialworkdir(self, j: JobBase, builder: Builder) -> None:
initialWorkdir, _ = self.get_requirement("InitialWorkDirRequirement")
if initialWorkdir is None:
return
debug = _logger.isEnabledFor(logging.DEBUG)
cwl_version = cast(Optional[str], self.metadata.get(ORIGINAL_CWLVERSION, None))
classic_dirent: bool = cwl_version is not None and (
ORDERED_VERSIONS.index(cwl_version) < ORDERED_VERSIONS.index("v1.2.0-dev2")
)
classic_listing = cwl_version and ORDERED_VERSIONS.index(
cwl_version
) < ORDERED_VERSIONS.index("v1.1.0-dev1")
ls = [] # type: List[CWLObjectType]
if isinstance(initialWorkdir["listing"], str):
# "listing" is just a string (must be an expression) so
# just evaluate it and use the result as if it was in
# listing
ls_evaluated = builder.do_eval(initialWorkdir["listing"])
fail: Any = False
fail_suffix: str = ""
if not isinstance(ls_evaluated, MutableSequence):
fail = ls_evaluated
else:
ls_evaluated2 = cast(
MutableSequence[Union[None, CWLOutputType]], ls_evaluated
)
for entry in ls_evaluated2:
if entry == None: # noqa
if classic_dirent:
fail = entry
fail_suffix = (
" Dirent.entry cannot return 'null' before CWL "
"v1.2. Please consider using 'cwl-upgrader' to "
"upgrade your document to CWL version v1.2."
)
elif isinstance(entry, MutableSequence):
if classic_listing:
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
"InitialWorkDirRequirement.listing expressions "
"cannot return arrays of Files or Directories "
"before CWL v1.1. Please "
"considering using 'cwl-upgrader' to upgrade "
"your document to CWL v1.1' or later."
)
else:
for entry2 in entry:
if not (
    isinstance(entry2, MutableMapping)
    and entry2.get("class") in ("File", "Directory")
):
    fail = (
        f"an array with an item ('{entry2}') that is "
        "not a File nor a Directory object."
    )
elif not (
    isinstance(entry, MutableMapping)
    and (
        entry.get("class") in ("File", "Directory")
        or "entry" in entry
    )
):
fail = entry
if fail is not False:
message = (
"Expression in a 'InitialWorkdirRequirement.listing' field "
"must return a list containing zero or more of: File or "
"Directory objects; Dirent objects"
)
if classic_dirent:
message += ". "
else:
message += "; null; or arrays of File or Directory objects. "
message += f"Got '{fail}' among the results from "
message += f"'{initialWorkdir['listing'].strip()}'." + fail_suffix
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(message)
ls = cast(List[CWLObjectType], ls_evaluated)
else:
# "listing" is an array of either expressions or Dirent so
# evaluate each item
for t in cast(
MutableSequence[Union[str, CWLObjectType]],
initialWorkdir["listing"],
):
if isinstance(t, Mapping) and "entry" in t:
# Dirent
entry_field = cast(str, t["entry"])
# the schema guarantees that 'entry' is a string, so the cast is safe
entry = builder.do_eval(entry_field, strip_whitespace=False)
if entry is None:
continue
if isinstance(entry, MutableSequence):
if classic_listing:
raise SourceLine(
t, "entry", WorkflowException, debug
).makeError(
"'entry' expressions are not allowed to evaluate "
"to an array of Files or Directories until CWL "
"v1.2. Consider using 'cwl-upgrader' to upgrade "
"your document to CWL version 1.2."
)
# Nested list. If it is a list of File or
# Directory objects, add it to the
# file list, otherwise JSON serialize it if CWL v1.2.
filelist = True
for e in entry:
if not isinstance(e, MutableMapping) or e.get(
"class"
) not in ("File", "Directory"):
filelist = False
break
if filelist:
if "entryname" in t:
raise SourceLine(
t, "entryname", WorkflowException, debug
).makeError(
"'entryname' is invalid when 'entry' returns list of File or Directory"
)
for e in entry:
ec = cast(CWLObjectType, e)
ec["writeable"] = t.get("writable", False)
ls.extend(cast(List[CWLObjectType], entry))
continue
et = {} # type: CWLObjectType
if isinstance(entry, Mapping) and entry.get("class") in (
"File",
"Directory",
):
et["entry"] = cast(CWLOutputType, entry)
else:
if isinstance(entry, str):
et["entry"] = entry
else:
if classic_dirent:
raise SourceLine(
t, "entry", WorkflowException, debug
).makeError(
"'entry' expression resulted in "
"something other than number, object or "
"array besides a single File or Dirent object. "
"In CWL v1.2+ this would be serialized to a JSON object. "
"However this is a {cwl_version} document. "
"If that is the desired result then please "
"consider using 'cwl-upgrader' to upgrade "
"your document to CWL version 1.2. "
f"Result of '{entry_field}' was '{entry}'."
)
et["entry"] = json_dumps(entry, sort_keys=True)
if "entryname" in t:
entryname_field = cast(str, t["entryname"])
if "${" in entryname_field or "$(" in entryname_field:
en = builder.do_eval(cast(str, t["entryname"]))
if not isinstance(en, str):
raise SourceLine(
t, "entryname", WorkflowException, debug
).makeError(
"'entryname' expression must result a string. "
f"Got '{en}' from '{entryname_field}'"
)
et["entryname"] = en
else:
et["entryname"] = entryname_field
else:
et["entryname"] = None
et["writable"] = t.get("writable", False)
ls.append(et)
else:
# Expression, must return a Dirent, File, Directory
# or array of such.
initwd_item = builder.do_eval(t)
if not initwd_item:
continue
if isinstance(initwd_item, MutableSequence):
ls.extend(cast(List[CWLObjectType], initwd_item))
else:
ls.append(cast(CWLObjectType, initwd_item))
for i, t2 in enumerate(ls):
if not isinstance(t2, Mapping):
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
"Entry at index %s of listing is not a record, was %s"
% (i, type(t2))
)
if "entry" not in t2:
continue
# Dirent
if isinstance(t2["entry"], str):
if not t2["entryname"]:
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError("Entry at index %s of listing missing entryname" % (i))
ls[i] = {
"class": "File",
"basename": t2["entryname"],
"contents": t2["entry"],
"writable": t2.get("writable"),
}
continue
if not isinstance(t2["entry"], Mapping):
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
"Entry at index %s of listing is not a record, was %s"
% (i, type(t2["entry"]))
)
if t2["entry"].get("class") not in ("File", "Directory"):
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
"Entry at index %s of listing is not a File or Directory object, was %s"
% (i, t2)
)
if t2.get("entryname") or t2.get("writable"):
t2 = copy.deepcopy(t2)
t2entry = cast(CWLObjectType, t2["entry"])
if t2.get("entryname"):
t2entry["basename"] = t2["entryname"]
t2entry["writable"] = t2.get("writable")
ls[i] = cast(CWLObjectType, t2["entry"])
for i, t3 in enumerate(ls):
if t3.get("class") not in ("File", "Directory"):
# Check that every item is a File or Directory object now
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
f"Entry at index {i} of listing is not a Dirent, File or "
f"Directory object, was {t2}."
)
if "basename" not in t3:
continue
basename = os.path.normpath(cast(str, t3["basename"]))
t3["basename"] = basename
if basename.startswith("../"):
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
f"Name '{basename}' at index {i} of listing is invalid, "
"cannot start with '../'"
)
if basename.startswith("/"):
# only if DockerRequirement in requirements
if cwl_version and ORDERED_VERSIONS.index(
cwl_version
) < ORDERED_VERSIONS.index("v1.2.0-dev4"):
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
f"Name '{basename}' at index {i} of listing is invalid, "
"paths starting with '/' are only permitted in CWL 1.2 "
"and later. Consider changing the absolute path to a relative "
"path, or upgrade the CWL description to CWL v1.2 using "
"https://pypi.org/project/cwl-upgrader/"
)
req, is_req = self.get_requirement("DockerRequirement")
if is_req is not True:
raise SourceLine(
initialWorkdir, "listing", WorkflowException, debug
).makeError(
f"Name '{basename}' at index {i} of listing is invalid, "
"name can only start with '/' when DockerRequirement "
"is in 'requirements'."
)
with SourceLine(initialWorkdir, "listing", WorkflowException, debug):
j.generatefiles["listing"] = ls
for entry in ls:
if "basename" in entry:
basename = cast(str, entry["basename"])
entry["dirname"] = os.path.join(
builder.outdir, os.path.dirname(basename)
)
entry["basename"] = os.path.basename(basename)
normalizeFilesDirs(entry)
self.updatePathmap(
cast(Optional[str], entry.get("dirname")) or builder.outdir,
cast(PathMapper, builder.pathmapper),
entry,
)
if "listing" in entry:
def remove_dirname(d: CWLObjectType) -> None:
if "dirname" in d:
del d["dirname"]
visit_class(
entry["listing"],
("File", "Directory"),
remove_dirname,
)
visit_class(
[builder.files, builder.bindings],
("File", "Directory"),
partial(check_adjust, self.path_check_mode.value, builder),
)
def job(
self,
job_order: CWLObjectType,
output_callbacks: Optional[OutputCallbackType],
runtimeContext: RuntimeContext,
) -> Generator[Union[JobBase, CallbackJob], None, None]:
workReuse, _ = self.get_requirement("WorkReuse")
enableReuse = workReuse.get("enableReuse", True) if workReuse else True
jobname = uniquename(
runtimeContext.name or shortname(self.tool.get("id", "job"))
)
if runtimeContext.cachedir and enableReuse:
cachecontext = runtimeContext.copy()
cachecontext.outdir = "/out"
cachecontext.tmpdir = "/tmp" # nosec
cachecontext.stagedir = "/stage"
cachebuilder = self._init_job(job_order, cachecontext)
cachebuilder.pathmapper = PathMapper(
cachebuilder.files,
runtimeContext.basedir,
cachebuilder.stagedir,
separateDirs=False,
)
_check_adjust = partial(
check_adjust, self.path_check_mode.value, cachebuilder
)
visit_class(
[cachebuilder.files, cachebuilder.bindings],
("File", "Directory"),
_check_adjust,
)
cmdline = flatten(
list(map(cachebuilder.generate_arg, cachebuilder.bindings))
)
docker_req, _ = self.get_requirement("DockerRequirement")
if docker_req is not None and runtimeContext.use_container:
dockerimg = docker_req.get("dockerImageId") or docker_req.get(
"dockerPull"
)
elif (
runtimeContext.default_container is not None
and runtimeContext.use_container
):
dockerimg = runtimeContext.default_container
else:
dockerimg = None
if dockerimg is not None:
cmdline = ["docker", "run", dockerimg] + cmdline
# not really run using docker, just for hashing purposes
keydict = {
"cmdline": cmdline
} # type: Dict[str, Union[MutableSequence[Union[str, int]], CWLObjectType]]
for shortcut in ["stdin", "stdout", "stderr"]:
if shortcut in self.tool:
keydict[shortcut] = self.tool[shortcut]
def calc_checksum(location: str) -> Optional[str]:
for e in cachebuilder.files:
if (
"location" in e
and e["location"] == location
and "checksum" in e
and e["checksum"] != "sha1$hash"
):
return cast(Optional[str], e["checksum"])
return None
for location, fobj in cachebuilder.pathmapper.items():
if fobj.type == "File":
checksum = calc_checksum(location)
fobj_stat = os.stat(fobj.resolved)
if checksum is not None:
keydict[fobj.resolved] = [fobj_stat.st_size, checksum]
else:
keydict[fobj.resolved] = [
fobj_stat.st_size,
int(fobj_stat.st_mtime * 1000),
]
interesting = {
"DockerRequirement",
"EnvVarRequirement",
"InitialWorkDirRequirement",
"ShellCommandRequirement",
"NetworkAccess",
}
for rh in (self.original_requirements, self.original_hints):
for r in reversed(rh):
cls = cast(str, r["class"])
if cls in interesting and cls not in keydict:
keydict[cls] = r
keydictstr = json_dumps(keydict, separators=(",", ":"), sort_keys=True)
cachekey = hashlib.md5(keydictstr.encode("utf-8")).hexdigest() # nosec
_logger.debug(
"[job %s] keydictstr is %s -> %s", jobname, keydictstr, cachekey
)
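# A sketch of the cache key's shape (hypothetical values): cachekey is an MD5
# over the canonical JSON of the command line, any stdin/stdout/stderr
# shortcuts, each input file's size plus checksum (or mtime when no checksum
# is known), and the "interesting" requirements collected above, e.g.
#   {"cmdline": ["echo", "hi"], "/data/in.txt": [42, "sha1$..."]}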
jobcache = os.path.join(runtimeContext.cachedir, cachekey)
# Create a lockfile to manage cache status.
jobcachepending = f"{jobcache}.status"
jobcachelock = None
jobstatus = None
# Opens the file for read/write, or creates an empty file.
jobcachelock = open(jobcachepending, "a+")
# get the shared lock to ensure no other process is trying
# to write to this cache
shared_file_lock(jobcachelock)
jobcachelock.seek(0)
jobstatus = jobcachelock.read()
if os.path.isdir(jobcache) and jobstatus == "success":
if docker_req and runtimeContext.use_container:
cachebuilder.outdir = (
runtimeContext.docker_outdir or random_outdir()
)
else:
cachebuilder.outdir = jobcache
_logger.info("[job %s] Using cached output in %s", jobname, jobcache)
yield CallbackJob(self, output_callbacks, cachebuilder, jobcache)
# we're done with the cache so release lock
jobcachelock.close()
return
else:
_logger.info(
"[job %s] Output of job will be cached in %s", jobname, jobcache
)
# turn shared lock into an exclusive lock since we'll
# be writing the cache directory
upgrade_lock(jobcachelock)
shutil.rmtree(jobcache, True)
os.makedirs(jobcache)
runtimeContext = runtimeContext.copy()
runtimeContext.outdir = jobcache
def update_status_output_callback(
output_callbacks: OutputCallbackType,
jobcachelock: TextIO,
outputs: Optional[CWLObjectType],
processStatus: str,
) -> None:
# save status to the lockfile then release the lock
jobcachelock.seek(0)
jobcachelock.truncate()
jobcachelock.write(processStatus)
jobcachelock.close()
output_callbacks(outputs, processStatus)
output_callbacks = partial(
update_status_output_callback, output_callbacks, jobcachelock
)
builder = self._init_job(job_order, runtimeContext)
reffiles = copy.deepcopy(builder.files)
j = self.make_job_runner(runtimeContext)(
builder,
builder.job,
self.make_path_mapper,
self.requirements,
self.hints,
jobname,
)
j.prov_obj = self.prov_obj
j.successCodes = self.tool.get("successCodes", [])
j.temporaryFailCodes = self.tool.get("temporaryFailCodes", [])
j.permanentFailCodes = self.tool.get("permanentFailCodes", [])
debug = _logger.isEnabledFor(logging.DEBUG)
if debug:
_logger.debug(
"[job %s] initializing from %s%s",
j.name,
self.tool.get("id", ""),
" as part of %s" % runtimeContext.part_of
if runtimeContext.part_of
else "",
)
_logger.debug("[job %s] %s", j.name, json_dumps(builder.job, indent=4))
builder.pathmapper = self.make_path_mapper(
reffiles, builder.stagedir, runtimeContext, True
)
builder.requirements = j.requirements
_check_adjust = partial(check_adjust, self.path_check_mode.value, builder)
visit_class(
[builder.files, builder.bindings], ("File", "Directory"), _check_adjust
)
self._initialworkdir(j, builder)
if debug:
_logger.debug(
"[job %s] path mappings is %s",
j.name,
json_dumps(
{
p: builder.pathmapper.mapper(p)
for p in builder.pathmapper.files()
},
indent=4,
),
)
if self.tool.get("stdin"):
with SourceLine(self.tool, "stdin", ValidationException, debug):
stdin_eval = builder.do_eval(self.tool["stdin"])
if not (isinstance(stdin_eval, str) or stdin_eval is None):
raise ValidationException(
f"'stdin' expression must return a string or null. Got '{stdin_eval}' "
f"for '{self.tool['stdin']}'."
)
j.stdin = stdin_eval
if j.stdin:
reffiles.append({"class": "File", "path": j.stdin})
if self.tool.get("stderr"):
with SourceLine(self.tool, "stderr", ValidationException, debug):
stderr_eval = builder.do_eval(self.tool["stderr"])
if not isinstance(stderr_eval, str):
raise ValidationException(
f"'stderr' expression must return a string. Got '{stderr_eval}' "
f"for '{self.tool['stderr']}'."
)
j.stderr = stderr_eval
if j.stderr:
if os.path.isabs(j.stderr) or ".." in j.stderr:
raise ValidationException(
"stderr must be a relative path, got '%s'" % j.stderr
)
if self.tool.get("stdout"):
with SourceLine(self.tool, "stdout", ValidationException, debug):
stdout_eval = builder.do_eval(self.tool["stdout"])
if not isinstance(stdout_eval, str):
raise ValidationException(
f"'stdout' expression must return a string. Got '{stdout_eval}' "
f"for '{self.tool['stdout']}'."
)
j.stdout = stdout_eval
if j.stdout:
if os.path.isabs(j.stdout) or ".." in j.stdout:
raise ValidationException(
"stdout must be a relative path, got '%s'" % j.stdout
)
if debug:
_logger.debug(
"[job %s] command line bindings is %s",
j.name,
json_dumps(builder.bindings, indent=4),
)
dockerReq, _ = self.get_requirement("DockerRequirement")
if dockerReq is not None and runtimeContext.use_container:
j.outdir = runtimeContext.get_outdir()
j.tmpdir = runtimeContext.get_tmpdir()
j.stagedir = runtimeContext.create_tmpdir()
else:
j.outdir = builder.outdir
j.tmpdir = builder.tmpdir
j.stagedir = builder.stagedir
inplaceUpdateReq, _ = self.get_requirement("InplaceUpdateRequirement")
if inplaceUpdateReq is not None:
j.inplace_update = cast(bool, inplaceUpdateReq["inplaceUpdate"])
normalizeFilesDirs(j.generatefiles)
readers = {} # type: Dict[str, CWLObjectType]
muts = set() # type: Set[str]
if builder.mutation_manager is not None:
def register_mut(f: CWLObjectType) -> None:
mm = cast(MutationManager, builder.mutation_manager)
muts.add(cast(str, f["location"]))
mm.register_mutation(j.name, f)
def register_reader(f: CWLObjectType) -> None:
mm = cast(MutationManager, builder.mutation_manager)
if cast(str, f["location"]) not in muts:
mm.register_reader(j.name, f)
readers[cast(str, f["location"])] = copy.deepcopy(f)
for li in j.generatefiles["listing"]:
if li.get("writable") and j.inplace_update:
adjustFileObjs(li, register_mut)
adjustDirObjs(li, register_mut)
else:
adjustFileObjs(li, register_reader)
adjustDirObjs(li, register_reader)
adjustFileObjs(builder.files, register_reader)
adjustFileObjs(builder.bindings, register_reader)
adjustDirObjs(builder.files, register_reader)
adjustDirObjs(builder.bindings, register_reader)
timelimit, _ = self.get_requirement("ToolTimeLimit")
if timelimit is not None:
with SourceLine(timelimit, "timelimit", ValidationException, debug):
limit_field = cast(Dict[str, Union[str, int]], timelimit)["timelimit"]
if isinstance(limit_field, str):
timelimit_eval = builder.do_eval(limit_field)
if timelimit_eval and not isinstance(timelimit_eval, int):
raise WorkflowException(
"'timelimit' expression must evaluate to a long/int. Got "
f"'{timelimit_eval}' for expression '{limit_field}'."
)
else:
timelimit_eval = limit_field
if not isinstance(timelimit_eval, int) or timelimit_eval < 0:
raise WorkflowException(
f"timelimit must be an integer >= 0, got: {timelimit_eval}"
)
j.timelimit = timelimit_eval
networkaccess, _ = self.get_requirement("NetworkAccess")
if networkaccess is not None:
with SourceLine(networkaccess, "networkAccess", ValidationException, debug):
networkaccess_field = networkaccess["networkAccess"]
if isinstance(networkaccess_field, str):
networkaccess_eval = builder.do_eval(networkaccess_field)
if not isinstance(networkaccess_eval, bool):
raise WorkflowException(
"'networkAccess' expression must evaluate to a bool. "
f"Got '{networkaccess_eval}' for expression '{networkaccess_field}'."
)
else:
networkaccess_eval = networkaccess_field
if not isinstance(networkaccess_eval, bool):
raise WorkflowException(
"networkAccess must be a boolean, got: {networkaccess_eval}."
)
j.networkaccess = networkaccess_eval
# Build a mapping to hold any EnvVarRequirement
required_env = {}
evr, _ = self.get_requirement("EnvVarRequirement")
if evr is not None:
for eindex, t3 in enumerate(cast(List[Dict[str, str]], evr["envDef"])):
env_value_field = t3["envValue"]
if "${" in env_value_field or "$(" in env_value_field:
env_value_eval = builder.do_eval(env_value_field)
if not isinstance(env_value_eval, str):
raise SourceLine(
evr["envDef"], eindex, WorkflowException, debug
).makeError(
"'envValue expression must evaluate to a str. "
f"Got '{env_value_eval}' for expression '{env_value_field}'."
)
env_value = env_value_eval
else:
env_value = env_value_field
required_env[t3["envName"]] = env_value
# Construct the env
j.prepare_environment(runtimeContext, required_env)
shellcmd, _ = self.get_requirement("ShellCommandRequirement")
if shellcmd is not None:
cmd = [] # type: List[str]
for b in builder.bindings:
arg = builder.generate_arg(b)
if b.get("shellQuote", True):
arg = [shellescape.quote(a) for a in aslist(arg)]
cmd.extend(aslist(arg))
j.command_line = ["/bin/sh", "-c", " ".join(cmd)]
else:
j.command_line = flatten(list(map(builder.generate_arg, builder.bindings)))
j.pathmapper = builder.pathmapper
j.collect_outputs = partial(
self.collect_output_ports,
self.tool["outputs"],
builder,
compute_checksum=getdefault(runtimeContext.compute_checksum, True),
jobname=jobname,
readers=readers,
)
j.output_callback = output_callbacks
mpi, _ = self.get_requirement(MPIRequirementName)
if mpi is not None:
np = cast( # From the schema for MPIRequirement.processes
Union[int, str],
mpi.get("processes", runtimeContext.mpi_config.default_nproc),
)
if isinstance(np, str):
np_eval = builder.do_eval(np)
if not isinstance(np_eval, int):
raise SourceLine(
mpi, "processes", WorkflowException, debug
).makeError(
f"{MPIRequirementName} needs 'processes' expression to "
f"evaluate to an int, got '{np_eval}' for expression '{np}'."
)
np = np_eval
j.mpi_procs = np
yield j
def collect_output_ports(
self,
ports: Union[CommentedSeq, Set[CWLObjectType]],
builder: Builder,
outdir: str,
rcode: int,
compute_checksum: bool = True,
jobname: str = "",
readers: Optional[MutableMapping[str, CWLObjectType]] = None,
) -> OutputPortsType:
ret = {} # type: OutputPortsType
debug = _logger.isEnabledFor(logging.DEBUG)
cwl_version = self.metadata.get(ORIGINAL_CWLVERSION, None)
if cwl_version != "v1.0":
builder.resources["exitCode"] = rcode
try:
fs_access = builder.make_fs_access(outdir)
custom_output = fs_access.join(outdir, "cwl.output.json")
if fs_access.exists(custom_output):
with fs_access.open(custom_output, "r") as f:
ret = json.load(f)
if debug:
_logger.debug(
"Raw output from %s: %s",
custom_output,
json_dumps(ret, indent=4),
)
else:
for i, port in enumerate(ports):
with SourceLine(
ports,
i,
partial(ParameterOutputWorkflowException, port=port),
debug,
):
fragment = shortname(port["id"])
ret[fragment] = self.collect_output(
port,
builder,
outdir,
fs_access,
compute_checksum=compute_checksum,
)
if ret:
revmap = partial(revmap_file, builder, outdir)
adjustDirObjs(ret, trim_listing)
visit_class(ret, ("File", "Directory"), revmap)
visit_class(ret, ("File", "Directory"), remove_path)
normalizeFilesDirs(ret)
visit_class(
ret,
("File", "Directory"),
partial(check_valid_locations, fs_access),
)
if compute_checksum:
adjustFileObjs(ret, partial(compute_checksums, fs_access))
expected_schema = cast(
Schema, self.names.get_name("outputs_record_schema", None)
)
validate_ex(
expected_schema,
ret,
strict=False,
logger=_logger_validation_warnings,
vocab=INPUT_OBJ_VOCAB,
)
if ret is not None and builder.mutation_manager is not None:
adjustFileObjs(ret, builder.mutation_manager.set_generation)
return ret if ret is not None else {}
except ValidationException as e:
raise WorkflowException(
"Error validating output record. "
+ str(e)
+ "\n in "
+ json_dumps(ret, indent=4)
) from e
finally:
if builder.mutation_manager and readers:
for r in readers.values():
builder.mutation_manager.release_reader(jobname, r)
def collect_output(
self,
schema: CWLObjectType,
builder: Builder,
outdir: str,
fs_access: StdFsAccess,
compute_checksum: bool = True,
) -> Optional[CWLOutputType]:
r = [] # type: List[CWLOutputType]
empty_and_optional = False
debug = _logger.isEnabledFor(logging.DEBUG)
result: Optional[CWLOutputType] = None
if "outputBinding" in schema:
binding = cast(
MutableMapping[str, Union[bool, str, List[str]]],
schema["outputBinding"],
)
globpatterns = [] # type: List[str]
revmap = partial(revmap_file, builder, outdir)
if "glob" in binding:
with SourceLine(binding, "glob", WorkflowException, debug):
for gb in aslist(binding["glob"]):
gb = builder.do_eval(gb)
if gb:
gb_eval_fail = False
if not isinstance(gb, str):
if isinstance(gb, list):
for entry in gb:
if not isinstance(entry, str):
gb_eval_fail = True
else:
gb_eval_fail = True
if gb_eval_fail:
raise WorkflowException(
"Resolved glob patterns must be strings "
f"or list of strings, not "
f"'{gb}' from '{binding['glob']}'"
)
globpatterns.extend(aslist(gb))
for gb in globpatterns:
if gb.startswith(builder.outdir):
gb = gb[len(builder.outdir) + 1 :]
elif gb == ".":
gb = outdir
elif gb.startswith("/"):
raise WorkflowException(
"glob patterns must not start with '/'"
)
try:
prefix = fs_access.glob(outdir)
sorted_glob_result = sorted(
fs_access.glob(fs_access.join(outdir, gb)),
key=cmp_to_key(
cast(
Callable[[str, str], int],
locale.strcoll,
)
),
)
r.extend(
[
{
"location": g,
"path": fs_access.join(
builder.outdir,
urllib.parse.unquote(
g[len(prefix[0]) + 1 :]
),
),
"basename": decoded_basename,
"nameroot": os.path.splitext(decoded_basename)[
0
],
"nameext": os.path.splitext(decoded_basename)[
1
],
"class": "File"
if fs_access.isfile(g)
else "Directory",
}
for g, decoded_basename in zip(
sorted_glob_result,
map(
lambda x: os.path.basename(
urllib.parse.unquote(x)
),
sorted_glob_result,
),
)
]
)
except OSError as e:
_logger.warning(str(e))
except Exception:
_logger.error(
"Unexpected error from fs_access", exc_info=True
)
raise
for files in cast(List[Dict[str, Optional[CWLOutputType]]], r):
rfile = files.copy()
revmap(rfile)
if files["class"] == "Directory":
ll = binding.get("loadListing") or builder.loadListing
if ll and ll != "no_listing":
get_listing(fs_access, files, (ll == "deep_listing"))
else:
if binding.get("loadContents"):
with fs_access.open(
cast(str, rfile["location"]), "rb"
) as f:
files["contents"] = content_limit_respected_read_bytes(
f
).decode("utf-8")
if compute_checksum:
with fs_access.open(
cast(str, rfile["location"]), "rb"
) as f:
checksum = hashlib.sha1() # nosec
contents = f.read(1024 * 1024)
while contents != b"":
checksum.update(contents)
contents = f.read(1024 * 1024)
files["checksum"] = "sha1$%s" % checksum.hexdigest()
files["size"] = fs_access.size(cast(str, rfile["location"]))
optional = False
single = False
if isinstance(schema["type"], MutableSequence):
if "null" in schema["type"]:
optional = True
if "File" in schema["type"] or "Directory" in schema["type"]:
single = True
elif schema["type"] == "File" or schema["type"] == "Directory":
single = True
if "outputEval" in binding:
with SourceLine(binding, "outputEval", WorkflowException, debug):
result = builder.do_eval(
cast(CWLOutputType, binding["outputEval"]), context=r
)
else:
result = cast(CWLOutputType, r)
if single:
with SourceLine(binding, "glob", WorkflowException, debug):
if not result and not optional:
raise WorkflowException(
f"Did not find output file with glob pattern: '{globpatterns}'."
)
elif not result and optional:
pass
elif isinstance(result, MutableSequence):
if len(result) > 1:
raise WorkflowException(
"Multiple matches for output item that is a single file."
)
else:
result = cast(CWLOutputType, result[0])
if "secondaryFiles" in schema:
with SourceLine(schema, "secondaryFiles", WorkflowException, debug):
for primary in aslist(result):
if isinstance(primary, MutableMapping):
primary.setdefault("secondaryFiles", [])
pathprefix = primary["path"][
0 : primary["path"].rindex(os.sep) + 1
]
for sf in aslist(schema["secondaryFiles"]):
if "required" in sf:
with SourceLine(
schema["secondaryFiles"],
"required",
WorkflowException,
debug,
):
sf_required_eval = builder.do_eval(
sf["required"], context=primary
)
if not (
isinstance(sf_required_eval, bool)
or sf_required_eval is None
):
raise WorkflowException(
"Expressions in the field "
"'required' must evaluate to a "
"Boolean (true or false) or None. "
f"Got '{sf_required_eval}' for "
f"'{sf['required']}'."
)
sf_required: bool = sf_required_eval or False
else:
sf_required = False
if "$(" in sf["pattern"] or "${" in sf["pattern"]:
sfpath = builder.do_eval(
sf["pattern"], context=primary
)
else:
sfpath = substitute(
primary["basename"], sf["pattern"]
)
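# substitute() applies the CWL secondaryFiles pattern rules, e.g. (a sketch):
# substitute("reads.bam", ".bai") -> "reads.bam.bai", while a leading "^"
# strips one extension first: substitute("reads.bam", "^.bai") -> "reads.bai".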
for sfitem in aslist(sfpath):
if not sfitem:
continue
if isinstance(sfitem, str):
sfitem = {"path": pathprefix + sfitem}
if (
not fs_access.exists(sfitem["path"])
and sf_required
):
raise WorkflowException(
"Missing required secondary file '%s'"
% (sfitem["path"])
)
if "path" in sfitem and "location" not in sfitem:
revmap(sfitem)
if fs_access.isfile(sfitem["location"]):
sfitem["class"] = "File"
primary["secondaryFiles"].append(sfitem)
elif fs_access.isdir(sfitem["location"]):
sfitem["class"] = "Directory"
primary["secondaryFiles"].append(sfitem)
if "format" in schema:
format_field = cast(str, schema["format"])
if "$(" in format_field or "${" in format_field:
for index, primary in enumerate(aslist(result)):
format_eval = builder.do_eval(format_field, context=primary)
if not isinstance(format_eval, str):
message = (
f"'format' expression must evaluate to a string. "
f"Got '{format_eval}' from '{format_field}'."
)
if isinstance(result, list):
message += f" 'self' had the value of the index {index} result: '{primary}'."
raise SourceLine(
schema, "format", WorkflowException, debug
).makeError(message)
primary["format"] = format_eval
else:
for primary in aslist(result):
primary["format"] = format_field
# Ensure files point to local references outside of the run environment
adjustFileObjs(result, revmap)
if not result and optional:
# Don't convert zero or empty string to None
if result in [0, ""]:
return result
# For [] or None, return None
else:
return None
if (
not result
and not empty_and_optional
and isinstance(schema["type"], MutableMapping)
and schema["type"]["type"] == "record"
):
out = {}
for field in cast(List[CWLObjectType], schema["type"]["fields"]):
out[shortname(cast(str, field["name"]))] = self.collect_output(
field, builder, outdir, fs_access, compute_checksum=compute_checksum
)
return out
return result
cwltool-3.1.20220224085855/cwltool/context.py
"""Shared context objects that replace use of kwargs."""
import copy
import os
import shutil
import tempfile
import threading
from typing import IO, Any, Callable, Dict, Iterable, List, Optional, TextIO, Union
# move to a regular typing import when Python 3.3-3.6 is no longer supported
from ruamel.yaml.comments import CommentedMap
from schema_salad.avro.schema import Names
from schema_salad.ref_resolver import Loader
from schema_salad.utils import FetcherCallableType
from typing_extensions import TYPE_CHECKING
from .builder import Builder
from .mpi import MpiConfig
from .mutation import MutationManager
from .pathmapper import PathMapper
from .secrets import SecretStore
from .software_requirements import DependenciesConfiguration
from .stdfsaccess import StdFsAccess
from .utils import DEFAULT_TMP_PREFIX, CWLObjectType, HasReqsHints, ResolverType
if TYPE_CHECKING:
from .process import Process
from .provenance import ResearchObject # pylint: disable=unused-import
from .provenance_profile import ProvenanceProfile
class ContextBase:
"""Shared kwargs based initilizer for {Runtime,Loading}Context."""
def __init__(self, kwargs: Optional[Dict[str, Any]] = None) -> None:
"""Initialize."""
if kwargs:
for k, v in kwargs.items():
if hasattr(self, k):
setattr(self, k, v)
def make_tool_notimpl(
toolpath_object: CommentedMap, loadingContext: "LoadingContext"
) -> "Process":
raise NotImplementedError()
default_make_tool = make_tool_notimpl
def log_handler(
outdir: str,
base_path_logs: str,
stdout_path: Optional[str],
stderr_path: Optional[str],
) -> None:
"""Move logs from log location to final output."""
if outdir != base_path_logs:
if stdout_path:
new_stdout_path = stdout_path.replace(base_path_logs, outdir)
shutil.copy2(stdout_path, new_stdout_path)
if stderr_path:
new_stderr_path = stderr_path.replace(base_path_logs, outdir)
shutil.copy2(stderr_path, new_stderr_path)
def set_log_dir(outdir: str, log_dir: str, subdir_name: str) -> str:
"""Default handler for setting the log directory."""
if log_dir == "":
return outdir
else:
return log_dir + "/" + subdir_name
class LoadingContext(ContextBase):
def __init__(self, kwargs: Optional[Dict[str, Any]] = None) -> None:
"""Initialize the LoadingContext from the kwargs."""
self.debug = False # type: bool
self.metadata = {} # type: CWLObjectType
self.requirements = None # type: Optional[List[CWLObjectType]]
self.hints = None # type: Optional[List[CWLObjectType]]
self.overrides_list = [] # type: List[CWLObjectType]
self.loader = None # type: Optional[Loader]
self.avsc_names = None # type: Optional[Names]
self.disable_js_validation = False # type: bool
self.js_hint_options_file: Optional[str] = None
self.do_validate = True # type: bool
self.enable_dev = False # type: bool
self.strict = True # type: bool
self.resolver = None # type: Optional[ResolverType]
self.fetcher_constructor = None # type: Optional[FetcherCallableType]
self.construct_tool_object = default_make_tool
self.research_obj = None # type: Optional[ResearchObject]
self.orcid = "" # type: str
self.cwl_full_name = "" # type: str
self.host_provenance = False # type: bool
self.user_provenance = False # type: bool
self.prov_obj = None # type: Optional[ProvenanceProfile]
self.do_update = None # type: Optional[bool]
self.jobdefaults = None # type: Optional[CommentedMap]
self.doc_cache = True # type: bool
self.relax_path_checks = False # type: bool
self.singularity = False # type: bool
self.podman = False # type: bool
super().__init__(kwargs)
def copy(self):
# type: () -> LoadingContext
return copy.copy(self)
class RuntimeContext(ContextBase):
def __init__(self, kwargs: Optional[Dict[str, Any]] = None) -> None:
"""Initialize the RuntimeContext from the kwargs."""
select_resources_callable = Callable[ # pylint: disable=unused-variable
[Dict[str, Union[int, float]], RuntimeContext],
Dict[str, Union[int, float]],
]
self.user_space_docker_cmd = "" # type: Optional[str]
self.secret_store = None # type: Optional[SecretStore]
self.no_read_only = False # type: bool
self.custom_net = None # type: Optional[str]
self.no_match_user = False # type: bool
self.preserve_environment = None # type: Optional[Iterable[str]]
self.preserve_entire_environment = False # type: bool
self.use_container = True # type: bool
self.force_docker_pull = False # type: bool
self.tmp_outdir_prefix = "" # type: str
self.tmpdir_prefix = DEFAULT_TMP_PREFIX # type: str
self.tmpdir = "" # type: str
self.rm_tmpdir = True # type: bool
self.pull_image = True # type: bool
self.rm_container = True # type: bool
self.move_outputs = "move" # type: str
self.log_dir = "" # type: str
self.set_log_dir = set_log_dir
self.log_dir_handler = log_handler
self.streaming_allowed: bool = False
self.singularity = False # type: bool
self.podman = False # type: bool
self.debug = False # type: bool
self.compute_checksum = True # type: bool
self.name = "" # type: str
self.default_container = "" # type: Optional[str]
self.find_default_container = (
None
) # type: Optional[Callable[[HasReqsHints], Optional[str]]]
self.cachedir = None # type: Optional[str]
self.outdir = None # type: Optional[str]
self.stagedir = "" # type: str
self.part_of = "" # type: str
self.basedir = "" # type: str
self.toplevel = False # type: bool
self.mutation_manager = None # type: Optional[MutationManager]
self.make_fs_access = StdFsAccess
self.path_mapper = PathMapper
self.builder = None # type: Optional[Builder]
self.docker_outdir = "" # type: str
self.docker_tmpdir = "" # type: str
self.docker_stagedir = "" # type: str
self.js_console = False # type: bool
self.job_script_provider = None # type: Optional[DependenciesConfiguration]
self.select_resources = None # type: Optional[select_resources_callable]
self.eval_timeout = 20 # type: float
self.postScatterEval = (
None
) # type: Optional[Callable[[CWLObjectType], Optional[CWLObjectType]]]
self.on_error = "stop" # type: str
self.strict_memory_limit = False # type: bool
self.strict_cpu_limit = False # type: bool
self.cidfile_dir = None # type: Optional[str]
self.cidfile_prefix = None # type: Optional[str]
self.workflow_eval_lock = None # type: Optional[threading.Condition]
self.research_obj = None # type: Optional[ResearchObject]
self.orcid = "" # type: str
self.cwl_full_name = "" # type: str
self.process_run_id = None # type: Optional[str]
self.prov_obj = None # type: Optional[ProvenanceProfile]
self.mpi_config = MpiConfig() # type: MpiConfig
self.default_stdout = None # type: Optional[Union[IO[bytes], TextIO]]
self.default_stderr = None # type: Optional[Union[IO[bytes], TextIO]]
super().__init__(kwargs)
if self.tmp_outdir_prefix == "":
self.tmp_outdir_prefix = self.tmpdir_prefix
def get_outdir(self) -> str:
"""Return self.outdir or create one with self.tmp_outdir_prefix."""
if self.outdir:
return self.outdir
return self.create_outdir()
def get_tmpdir(self) -> str:
"""Return self.tmpdir or create one with self.tmpdir_prefix."""
if self.tmpdir:
return self.tmpdir
return self.create_tmpdir()
def get_stagedir(self) -> str:
"""Return self.stagedir or create one with self.tmpdir_prefix."""
if self.stagedir:
return self.stagedir
tmp_dir, tmp_prefix = os.path.split(self.tmpdir_prefix)
return tempfile.mkdtemp(prefix=tmp_prefix, dir=tmp_dir)
def create_tmpdir(self) -> str:
"""Create a temporary directory that respects self.tmpdir_prefix."""
tmp_dir, tmp_prefix = os.path.split(self.tmpdir_prefix)
return tempfile.mkdtemp(prefix=tmp_prefix, dir=tmp_dir)
def create_outdir(self) -> str:
"""Create a temporary directory that respects self.tmp_outdir_prefix."""
out_dir, out_prefix = os.path.split(self.tmp_outdir_prefix)
return tempfile.mkdtemp(prefix=out_prefix, dir=out_dir)
def copy(self):
# type: () -> RuntimeContext
return copy.copy(self)
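# A minimal usage sketch: both context classes are attribute bags initialized
# from kwargs, and ContextBase.__init__ only sets keys that already exist as
# attributes, so unknown keys are silently ignored:
#   rc = RuntimeContext({"use_container": False, "bogus_key": 1})
#   rc.use_container -> False; hasattr(rc, "bogus_key") -> False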
def getdefault(val, default):
# type: (Any, Any) -> Any
if val is None:
return default
else:
return val
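# e.g. getdefault(None, 5) -> 5, but getdefault(0, 5) -> 0; only None
# triggers the default.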
cwltool-3.1.20220224085855/cwltool/cuda.py
import subprocess # nosec
import xml.dom.minidom # nosec
from typing import Tuple, cast
from .loghandler import _logger
from .utils import CWLObjectType
def cuda_version_and_device_count() -> Tuple[str, int]:
try:
out = subprocess.check_output(["nvidia-smi", "-q", "-x"]) # nosec
except Exception as e:
_logger.warning("Error checking CUDA version with nvidia-smi: %s", e)
return ("", 0)
dm = xml.dom.minidom.parseString(out) # nosec
ag = dm.getElementsByTagName("attached_gpus")[0].firstChild
cv = dm.getElementsByTagName("cuda_version")[0].firstChild
return (cv.data, int(ag.data))
def cuda_check(cuda_req: CWLObjectType, requestCount: int) -> int:
try:
vmin = float(str(cuda_req["cudaVersionMin"]))
version, devices = cuda_version_and_device_count()
if version == "":
# nvidia-smi not detected, or failed some other way
return 0
versionf = float(version)
if versionf < vmin:
_logger.warning(
"CUDA version '%s' is less than minimum version '%s'", version, vmin
)
return 0
if requestCount > devices:
_logger.warning(
"Requested %d GPU devices but only %d available", requestCount, devices
)
return 0
return requestCount
except Exception as e:
_logger.warning("Error checking CUDA requirements: %s", e)
return 0
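# A hedged usage sketch (cuda_req is a plain mapping carrying the
# cudaVersionMin field):
#   cuda_check({"cudaVersionMin": "11.2"}, requestCount=2)
# returns 2 when nvidia-smi reports CUDA >= 11.2 and at least 2 attached
# GPUs, otherwise 0.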
cwltool-3.1.20220224085855/cwltool/cwlNodeEngine.js
"use strict";
process.stdin.setEncoding("utf8");
var incoming = "";
process.stdin.on("data", function(chunk) {
incoming += chunk;
var i = incoming.indexOf("\n");
if (i > -1) {
try{
var fn = JSON.parse(incoming.substr(0, i));
incoming = incoming.substr(i+1);
process.stdout.write(JSON.stringify(require("vm").runInNewContext(fn, {})) + "\n");
}
catch(e){
console.error(e)
}
/*strings to indicate the process has finished*/
console.log("r1cepzbhUTxtykz5XTC4");
console.error("r1cepzbhUTxtykz5XTC4");
}
});
process.stdin.on("end", process.exit);
cwltool-3.1.20220224085855/cwltool/cwlNodeEngineJSConsole.js
"use strict";
function js_console_log(){
console.error("[log] "+require("util").format.apply(this, arguments).split("\n").join("\n[log] "));
}
function js_console_err(){
console.error("[err] "+require("util").format.apply(this, arguments).split("\n").join("\n[err] "));
}
process.stdin.setEncoding("utf8");
var incoming = "";
process.stdin.on("data", function(chunk) {
incoming += chunk;
var i = incoming.indexOf("\n");
if (i > -1) {
try{
var fn = JSON.parse(incoming.substr(0, i));
incoming = incoming.substr(i+1);
process.stdout.write(JSON.stringify(require("vm").runInNewContext(fn, {
console: {
log: js_console_log,
error: js_console_err
}
})) + "\n");
}
catch(e){
console.error(e)
}
/*strings to indicate the process has finished*/
console.log("r1cepzbhUTxtykz5XTC4");
console.error("r1cepzbhUTxtykz5XTC4");
}
});
process.stdin.on("end", process.exit);
cwltool-3.1.20220224085855/cwltool/cwlNodeEngineWithContext.js
"use strict";
process.stdin.setEncoding("utf8");
var incoming = "";
var firstInput = true;
var context = {};
process.stdin.on("data", function(chunk) {
incoming += chunk;
var i = incoming.indexOf("\n");
while (i > -1) {
try{
var input = incoming.substr(0, i);
incoming = incoming.substr(i+1);
var fn = JSON.parse(input);
if(firstInput){
context = require("vm").runInNewContext(fn, {});
}
else{
process.stdout.write(JSON.stringify(require("vm").runInNewContext(fn, context)) + "\n");
}
}
catch(e){
console.error(e);
}
if(firstInput){
firstInput = false;
}
else{
/*strings to indicate the process has finished*/
console.log("r1cepzbhUTxtykz5XTC4");
console.error("r1cepzbhUTxtykz5XTC4");
}
i = incoming.indexOf("\n");
}
});
process.stdin.on("end", process.exit);
cwltool-3.1.20220224085855/cwltool/cwlrdf.py
import urllib
from codecs import StreamWriter
from typing import Any, Dict, Optional, TextIO, Union, cast
from rdflib import Graph
from ruamel.yaml.comments import CommentedMap
from schema_salad.jsonld_context import makerdf
from schema_salad.utils import ContextType
from .cwlviewer import CWLViewer
from .process import Process
def gather(tool: Process, ctx: ContextType) -> Graph:
g = Graph()
def visitor(t: CommentedMap) -> None:
makerdf(t["id"], t, ctx, graph=g)
tool.visit(visitor)
return g
def printrdf(wflow: Process, ctx: ContextType, style: str) -> str:
"""Serialize the CWL document into a string, ready for printing."""
rdf = gather(wflow, ctx).serialize(format=style, encoding="utf-8")
if not rdf:
return ""
return cast(str, rdf.decode("utf-8"))
def lastpart(uri: Any) -> str:
uri2 = str(uri)
if "/" in uri2:
return uri2[uri2.rindex("/") + 1 :]
return uri2
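# e.g. lastpart("http://example.com/workflows/main") -> "main"; values
# without a "/" are returned unchanged.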
def dot_with_parameters(g: Graph, stdout: Union[TextIO, StreamWriter]) -> None:
qres = g.query(
"""SELECT ?step ?run ?runtype
WHERE {
?step cwl:run ?run .
?run rdf:type ?runtype .
}"""
)
for step, run, _ in qres:
stdout.write(
'"%s" [label="%s"]\n'
% (lastpart(step), f"{lastpart(step)} ({lastpart(run)})")
)
qres = g.query(
"""SELECT ?step ?inp ?source
WHERE {
?wf Workflow:steps ?step .
?step cwl:inputs ?inp .
?inp cwl:source ?source .
}"""
)
for step, inp, source in qres:
stdout.write('"%s" [shape=box]\n' % (lastpart(inp)))
stdout.write(
'"{}" -> "{}" [label="{}"]\n'.format(lastpart(source), lastpart(inp), "")
)
stdout.write(
'"{}" -> "{}" [label="{}"]\n'.format(lastpart(inp), lastpart(step), "")
)
qres = g.query(
"""SELECT ?step ?out
WHERE {
?wf Workflow:steps ?step .
?step cwl:outputs ?out .
}"""
)
for step, out in qres:
stdout.write('"%s" [shape=box]\n' % (lastpart(out)))
stdout.write(
'"{}" -> "{}" [label="{}"]\n'.format(lastpart(step), lastpart(out), "")
)
qres = g.query(
"""SELECT ?out ?source
WHERE {
?wf cwl:outputs ?out .
?out cwl:source ?source .
}"""
)
for out, source in qres:
stdout.write('"%s" [shape=octagon]\n' % (lastpart(out)))
stdout.write(
'"{}" -> "{}" [label="{}"]\n'.format(lastpart(source), lastpart(out), "")
)
qres = g.query(
"""SELECT ?inp
WHERE {
?wf rdf:type cwl:Workflow .
?wf cwl:inputs ?inp .
}"""
)
for (inp,) in qres:
stdout.write('"%s" [shape=octagon]\n' % (lastpart(inp)))
def dot_without_parameters(g: Graph, stdout: Union[TextIO, StreamWriter]) -> None:
dotname = {} # type: Dict[str,str]
clusternode = {}
stdout.write("compound=true\n")
subworkflows = set()
qres = g.query(
"""SELECT ?run
WHERE {
?wf rdf:type cwl:Workflow .
?wf Workflow:steps ?step .
?step cwl:run ?run .
?run rdf:type cwl:Workflow .
} ORDER BY ?wf"""
)
for (run,) in qres:
subworkflows.add(run)
qres = g.query(
"""SELECT ?wf ?step ?run ?runtype
WHERE {
?wf rdf:type cwl:Workflow .
?wf Workflow:steps ?step .
?step cwl:run ?run .
?run rdf:type ?runtype .
} ORDER BY ?wf"""
)
currentwf = None # type: Optional[str]
for wf, step, _run, runtype in qres:
if step not in dotname:
dotname[step] = lastpart(step)
if wf != currentwf:
if currentwf is not None:
stdout.write("}\n")
if wf in subworkflows:
if wf not in dotname:
dotname[wf] = "cluster_" + lastpart(wf)
stdout.write(f'subgraph "{dotname[wf]}" {{ label="{lastpart(wf)}"\n')
currentwf = wf
clusternode[wf] = step
else:
currentwf = None
if str(runtype) != "https://w3id.org/cwl/cwl#Workflow":
stdout.write(
'"%s" [label="%s"]\n'
% (dotname[step], urllib.parse.urldefrag(str(step))[1])
)
if currentwf is not None:
stdout.write("}\n")
qres = g.query(
"""SELECT DISTINCT ?src ?sink ?srcrun ?sinkrun
WHERE {
?wf1 Workflow:steps ?src .
?wf2 Workflow:steps ?sink .
?src cwl:out ?out .
?inp cwl:source ?out .
?sink cwl:in ?inp .
?src cwl:run ?srcrun .
?sink cwl:run ?sinkrun .
}"""
)
for src, sink, srcrun, sinkrun in qres:
attr = ""
if srcrun in clusternode:
attr += 'ltail="%s"' % dotname[srcrun]
src = clusternode[srcrun]
if sinkrun in clusternode:
attr += ' lhead="%s"' % dotname[sinkrun]
sink = clusternode[sinkrun]
stdout.write(f'"{dotname[src]}" -> "{dotname[sink]}" [{attr}]\n')
def printdot(
wf: Process,
ctx: ContextType,
stdout: Union[TextIO, StreamWriter],
) -> None:
cwl_viewer = CWLViewer(printrdf(wf, ctx, "n3")) # type: CWLViewer
stdout.write(cwl_viewer.dot())
cwltool-3.1.20220224085855/cwltool/cwlviewer.py
"""Visualize a CWL workflow."""
from pathlib import Path
from urllib.parse import urlparse
import pydot
import rdflib
_queries_dir = (Path(__file__).parent / "rdfqueries").resolve()
_get_inner_edges_query_path = _queries_dir / "get_inner_edges.sparql"
_get_input_edges_query_path = _queries_dir / "get_input_edges.sparql"
_get_output_edges_query_path = _queries_dir / "get_output_edges.sparql"
_get_root_query_path = _queries_dir / "get_root.sparql"
class CWLViewer:
"""Produce similar images with the https://github.com/common-workflow-language/cwlviewer."""
def __init__(self, rdf_description: str):
"""Create a viewer object based on the rdf description of the workflow."""
self._dot_graph: pydot.Graph = CWLViewer._init_dot_graph()
self._rdf_graph: rdflib.graph.Graph = self._load_cwl_graph(rdf_description)
self._root_graph_uri: str = self._get_root_graph_uri()
self._set_inner_edges()
self._set_input_edges()
self._set_output_edges()
def _load_cwl_graph(self, rdf_description: str) -> rdflib.graph.Graph:
rdf_graph = rdflib.Graph()
rdf_graph.parse(data=rdf_description, format="n3")
return rdf_graph
def _set_inner_edges(self) -> None:
with open(_get_inner_edges_query_path) as f:
get_inner_edges_query = f.read()
inner_edges = self._rdf_graph.query(
get_inner_edges_query, initBindings={"root_graph": self._root_graph_uri}
)
for inner_edge_row in inner_edges:
source_label = (
inner_edge_row["source_label"]
if inner_edge_row["source_label"] is not None
else urlparse(inner_edge_row["source_step"]).fragment
)
n = pydot.Node(
"",
fillcolor="lightgoldenrodyellow",
style="filled",
label=source_label,
shape="record",
)
n.set_name(str(inner_edge_row["source_step"]))
self._dot_graph.add_node(n)
target_label = (
inner_edge_row["target_label"]
if inner_edge_row["target_label"] is not None
else urlparse(inner_edge_row["target_step"]).fragment
)
n = pydot.Node(
"",
fillcolor="lightgoldenrodyellow",
style="filled",
label=target_label,
shape="record",
)
n.set_name(str(inner_edge_row["target_step"]))
self._dot_graph.add_node(n)
self._dot_graph.add_edge(
pydot.Edge(
str(inner_edge_row["source_step"]),
str(inner_edge_row["target_step"]),
)
)
def _set_input_edges(self) -> None:
with open(_get_input_edges_query_path) as f:
get_input_edges_query = f.read()
inputs_subgraph = pydot.Subgraph(graph_name="cluster_inputs")
self._dot_graph.add_subgraph(inputs_subgraph)
inputs_subgraph.set("rank", "same")
inputs_subgraph.create_attribute_methods(["style"])
inputs_subgraph.set("style", "dashed")
inputs_subgraph.set("label", "Workflow Inputs")
input_edges = self._rdf_graph.query(
get_input_edges_query, initBindings={"root_graph": self._root_graph_uri}
)
for input_row in input_edges:
n = pydot.Node(
"",
fillcolor="#94DDF4",
style="filled",
label=urlparse(input_row["input"]).fragment,
shape="record",
)
n.set_name(str(input_row["input"]))
inputs_subgraph.add_node(n)
self._dot_graph.add_edge(
pydot.Edge(str(input_row["input"]), str(input_row["step"]))
)
def _set_output_edges(self) -> None:
with open(_get_output_edges_query_path) as f:
get_output_edges = f.read()
outputs_graph = pydot.Subgraph(graph_name="cluster_outputs")
self._dot_graph.add_subgraph(outputs_graph)
outputs_graph.set("rank", "same")
outputs_graph.create_attribute_methods(["style"])
outputs_graph.set("style", "dashed")
outputs_graph.set("label", "Workflow Outputs")
outputs_graph.set("labelloc", "b")
output_edges = self._rdf_graph.query(
get_output_edges, initBindings={"root_graph": self._root_graph_uri}
)
for output_edge_row in output_edges:
n = pydot.Node(
"",
fillcolor="#94DDF4",
style="filled",
label=urlparse(output_edge_row["output"]).fragment,
shape="record",
)
n.set_name(str(output_edge_row["output"]))
outputs_graph.add_node(n)
self._dot_graph.add_edge(
pydot.Edge(output_edge_row["step"], output_edge_row["output"])
)
def _get_root_graph_uri(self) -> rdflib.URIRef:
with open(_get_root_query_path) as f:
get_root_query = f.read()
root = list(
self._rdf_graph.query(
get_root_query,
)
)
if len(root) != 1:
raise RuntimeError(
"Cannot identify root workflow! Note that only Workflows can be visualized."
)
workflow = root[0]["workflow"] # type: rdflib.URIRef
return workflow
@classmethod
def _init_dot_graph(cls) -> pydot.Graph:
graph = pydot.Graph(graph_type="digraph", simplify=False)
graph.set("bgcolor", "#eeeeee")
graph.set("clusterrank", "local")
graph.set("labelloc", "bottom")
graph.set("labelloc", "bottom")
graph.set("labeljust", "right")
return graph
def get_dot_graph(self) -> pydot.Graph:
"""Get the dot graph object."""
return self._dot_graph
def dot(self) -> str:
"""Get the graph as graphviz."""
return str(self._dot_graph.to_string())
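# A hedged usage sketch: given an n3 serialization of a workflow's RDF (for
# example the output of printrdf(wf, ctx, "n3") in cwlrdf.py):
#   viewer = CWLViewer(rdf_description)
#   print(viewer.dot()) # Graphviz source, ready for e.g. `dot -Tsvg`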
cwltool-3.1.20220224085855/cwltool/docker.py
"""Enables Docker software containers via the {u,}docker or podman runtimes."""
import csv
import datetime
import math
import os
import re
import shutil
import subprocess # nosec
import sys
import threading
from io import StringIO # pylint: disable=redefined-builtin
from typing import Callable, Dict, List, MutableMapping, Optional, Set, Tuple, cast
import requests
from .builder import Builder
from .context import RuntimeContext
from .cuda import cuda_check
from .docker_id import docker_vm_id
from .errors import WorkflowException
from .job import ContainerCommandLineJob
from .loghandler import _logger
from .pathmapper import MapperEnt, PathMapper
from .utils import CWLObjectType, create_tmp_dir, ensure_writable
_IMAGES = set() # type: Set[str]
_IMAGES_LOCK = threading.Lock()
__docker_machine_mounts = None # type: Optional[List[str]]
__docker_machine_mounts_lock = threading.Lock()
def _get_docker_machine_mounts() -> List[str]:
global __docker_machine_mounts
if __docker_machine_mounts is None:
with __docker_machine_mounts_lock:
if "DOCKER_MACHINE_NAME" not in os.environ:
__docker_machine_mounts = []
else:
__docker_machine_mounts = [
"/" + line.split(None, 1)[0]
for line in subprocess.check_output( # nosec
[
"docker-machine",
"ssh",
os.environ["DOCKER_MACHINE_NAME"],
"mount",
"-t",
"vboxsf",
],
universal_newlines=True,
).splitlines()
]
return __docker_machine_mounts
def _check_docker_machine_path(path: Optional[str]) -> None:
if path is None:
return
mounts = _get_docker_machine_mounts()
found = False
for mount in mounts:
if path.startswith(mount):
found = True
break
if not found and mounts:
name = os.environ.get("DOCKER_MACHINE_NAME", "???")
raise WorkflowException(
"Input path {path} is not in the list of host paths mounted "
"into the Docker virtual machine named {name}. Already mounted "
"paths: {mounts}.\n"
"See https://docs.docker.com/toolbox/toolbox_install_windows/"
"#optional-add-shared-directories for instructions on how to "
"add this path to your VM.".format(path=path, name=name, mounts=mounts)
)
class DockerCommandLineJob(ContainerCommandLineJob):
"""Runs a CommandLineJob in a sofware container using the Docker engine."""
def __init__(
self,
builder: Builder,
joborder: CWLObjectType,
make_path_mapper: Callable[..., PathMapper],
requirements: List[CWLObjectType],
hints: List[CWLObjectType],
name: str,
) -> None:
"""Initialize a command line builder using the Docker software container engine."""
super().__init__(builder, joborder, make_path_mapper, requirements, hints, name)
@staticmethod
def get_image(
docker_requirement: Dict[str, str],
pull_image: bool,
force_pull: bool,
tmp_outdir_prefix: str,
) -> bool:
"""
Retrieve the relevant Docker container image.
Returns True upon success.
"""
found = False
if (
"dockerImageId" not in docker_requirement
and "dockerPull" in docker_requirement
):
docker_requirement["dockerImageId"] = docker_requirement["dockerPull"]
with _IMAGES_LOCK:
if docker_requirement["dockerImageId"] in _IMAGES:
return True
for line in (
subprocess.check_output( # nosec
["docker", "images", "--no-trunc", "--all"]
)
.decode("utf-8")
.splitlines()
):
try:
match = re.match(r"^([^ ]+)\s+([^ ]+)\s+([^ ]+)", line)
split = docker_requirement["dockerImageId"].split(":")
if len(split) == 1:
split.append("latest")
elif len(split) == 2:
# if split[1] doesn't match valid tag names, it is a part of repository
if not re.match(r"[\w][\w.-]{0,127}", split[1]):
split[0] = split[0] + ":" + split[1]
split[1] = "latest"
elif len(split) == 3:
if re.match(r"[\w][\w.-]{0,127}", split[2]):
split[0] = split[0] + ":" + split[1]
split[1] = split[2]
del split[2]
# check for repository:tag match or image id match
if match and (
(split[0] == match.group(1) and split[1] == match.group(2))
or docker_requirement["dockerImageId"] == match.group(3)
):
found = True
break
except ValueError:
pass
if (force_pull or not found) and pull_image:
cmd = [] # type: List[str]
if "dockerPull" in docker_requirement:
cmd = ["docker", "pull", str(docker_requirement["dockerPull"])]
_logger.info(str(cmd))
subprocess.check_call(cmd, stdout=sys.stderr) # nosec
found = True
elif "dockerFile" in docker_requirement:
dockerfile_dir = create_tmp_dir(tmp_outdir_prefix)
with open(os.path.join(dockerfile_dir, "Dockerfile"), "w") as dfile:
dfile.write(docker_requirement["dockerFile"])
cmd = [
"docker",
"build",
"--tag=%s" % str(docker_requirement["dockerImageId"]),
dockerfile_dir,
]
_logger.info(str(cmd))
subprocess.check_call(cmd, stdout=sys.stderr) # nosec
found = True
elif "dockerLoad" in docker_requirement:
cmd = ["docker", "load"]
_logger.info(str(cmd))
if os.path.exists(docker_requirement["dockerLoad"]):
_logger.info(
"Loading docker image from %s",
docker_requirement["dockerLoad"],
)
with open(docker_requirement["dockerLoad"], "rb") as dload:
loadproc = subprocess.Popen( # nosec
cmd, stdin=dload, stdout=sys.stderr
)
else:
loadproc = subprocess.Popen( # nosec
cmd, stdin=subprocess.PIPE, stdout=sys.stderr
)
assert loadproc.stdin is not None # nosec
_logger.info(
"Sending GET request to %s", docker_requirement["dockerLoad"]
)
req = requests.get(docker_requirement["dockerLoad"], stream=True)
size = 0
for chunk in req.iter_content(1024 * 1024):
size += len(chunk)
_logger.info("\r%i bytes", size)
loadproc.stdin.write(chunk)
loadproc.stdin.close()
rcode = loadproc.wait()
if rcode != 0:
raise WorkflowException(
"Docker load returned non-zero exit status %i" % (rcode)
)
found = True
elif "dockerImport" in docker_requirement:
cmd = [
"docker",
"import",
str(docker_requirement["dockerImport"]),
str(docker_requirement["dockerImageId"]),
]
_logger.info(str(cmd))
subprocess.check_call(cmd, stdout=sys.stderr) # nosec
found = True
if found:
with _IMAGES_LOCK:
_IMAGES.add(docker_requirement["dockerImageId"])
return found
def get_from_requirements(
self,
r: CWLObjectType,
pull_image: bool,
force_pull: bool,
tmp_outdir_prefix: str,
) -> Optional[str]:
if not shutil.which("docker"):
raise WorkflowException("docker executable is not available")
if self.get_image(
cast(Dict[str, str], r), pull_image, force_pull, tmp_outdir_prefix
):
return cast(Optional[str], r["dockerImageId"])
raise WorkflowException("Docker image %s not found" % r["dockerImageId"])
@staticmethod
def append_volume(
runtime: List[str], source: str, target: str, writable: bool = False
) -> None:
"""Add binding arguments to the runtime list."""
options = [
"type=bind",
"source=" + source,
"target=" + target,
]
if not writable:
options.append("readonly")
output = StringIO()
csv.writer(output).writerow(options)
mount_arg = output.getvalue().strip()
runtime.append(f"--mount={mount_arg}")
# Unlike "--volume", "--mount" will fail if the volume doesn't already exist.
if not os.path.exists(source):
os.makedirs(source)
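# For example (hypothetical paths), append_volume(runtime, "/host/data",
# "/var/lib/cwl/data") appends:
#   --mount=type=bind,source=/host/data,target=/var/lib/cwl/data,readonly
# The csv writer adds quoting only when a path contains commas or quotes.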
def add_file_or_directory_volume(
self, runtime: List[str], volume: MapperEnt, host_outdir_tgt: Optional[str]
) -> None:
"""Append volume a file/dir mapping to the runtime option list."""
if not volume.resolved.startswith("_:"):
_check_docker_machine_path(volume.resolved)
self.append_volume(runtime, volume.resolved, volume.target)
def add_writable_file_volume(
self,
runtime: List[str],
volume: MapperEnt,
host_outdir_tgt: Optional[str],
tmpdir_prefix: str,
) -> None:
"""Append a writable file mapping to the runtime option list."""
if self.inplace_update:
self.append_volume(runtime, volume.resolved, volume.target, writable=True)
else:
if host_outdir_tgt:
# shortcut, just copy to the output directory
# which is already going to be mounted
if not os.path.exists(os.path.dirname(host_outdir_tgt)):
os.makedirs(os.path.dirname(host_outdir_tgt))
shutil.copy(volume.resolved, host_outdir_tgt)
else:
tmpdir = create_tmp_dir(tmpdir_prefix)
file_copy = os.path.join(tmpdir, os.path.basename(volume.resolved))
shutil.copy(volume.resolved, file_copy)
self.append_volume(runtime, file_copy, volume.target, writable=True)
ensure_writable(host_outdir_tgt or file_copy)
def add_writable_directory_volume(
self,
runtime: List[str],
volume: MapperEnt,
host_outdir_tgt: Optional[str],
tmpdir_prefix: str,
) -> None:
"""Append a writable directory mapping to the runtime option list."""
if volume.resolved.startswith("_:"):
# Synthetic directory that needs creating first
if not host_outdir_tgt:
new_dir = os.path.join(
create_tmp_dir(tmpdir_prefix),
os.path.basename(volume.target),
)
self.append_volume(runtime, new_dir, volume.target, writable=True)
elif not os.path.exists(host_outdir_tgt):
os.makedirs(host_outdir_tgt)
else:
if self.inplace_update:
self.append_volume(
runtime, volume.resolved, volume.target, writable=True
)
else:
if not host_outdir_tgt:
tmpdir = create_tmp_dir(tmpdir_prefix)
new_dir = os.path.join(tmpdir, os.path.basename(volume.resolved))
shutil.copytree(volume.resolved, new_dir)
self.append_volume(runtime, new_dir, volume.target, writable=True)
else:
shutil.copytree(volume.resolved, host_outdir_tgt)
ensure_writable(host_outdir_tgt or new_dir)
def _required_env(self) -> Dict[str, str]:
# spec currently says "HOME must be set to the designated output
# directory." but spec might change to designated temp directory.
# runtime.append("--env=HOME=/tmp")
return {
"TMPDIR": self.CONTAINER_TMPDIR,
"HOME": self.builder.outdir,
}
def create_runtime(
self, env: MutableMapping[str, str], runtimeContext: RuntimeContext
) -> Tuple[List[str], Optional[str]]:
any_path_okay = self.builder.get_requirement("DockerRequirement")[1] or False
user_space_docker_cmd = runtimeContext.user_space_docker_cmd
if user_space_docker_cmd:
if "udocker" in user_space_docker_cmd and not runtimeContext.debug:
runtime = [user_space_docker_cmd, "--quiet", "run"]
# udocker 1.1.1 will output diagnostic messages to stdout
# without this
else:
runtime = [user_space_docker_cmd, "run"]
elif runtimeContext.podman:
runtime = ["podman", "run", "-i", "--userns=keep-id"]
else:
runtime = ["docker", "run", "-i"]
self.append_volume(
runtime, os.path.realpath(self.outdir), self.builder.outdir, writable=True
)
self.append_volume(
runtime, os.path.realpath(self.tmpdir), self.CONTAINER_TMPDIR, writable=True
)
self.add_volumes(
self.pathmapper,
runtime,
any_path_okay=True,
secret_store=runtimeContext.secret_store,
tmpdir_prefix=runtimeContext.tmpdir_prefix,
)
if self.generatemapper is not None:
self.add_volumes(
self.generatemapper,
runtime,
any_path_okay=any_path_okay,
secret_store=runtimeContext.secret_store,
tmpdir_prefix=runtimeContext.tmpdir_prefix,
)
if user_space_docker_cmd:
runtime = [x.replace(":ro", "") for x in runtime]
runtime = [x.replace(":rw", "") for x in runtime]
runtime.append("--workdir=%s" % (self.builder.outdir))
if not user_space_docker_cmd:
if not runtimeContext.no_read_only:
runtime.append("--read-only=true")
if self.networkaccess:
if runtimeContext.custom_net:
runtime.append(f"--net={runtimeContext.custom_net}")
else:
runtime.append("--net=none")
if self.stdout is not None:
runtime.append("--log-driver=none")
euid, egid = docker_vm_id()
euid, egid = euid or os.geteuid(), egid or os.getgid()
if runtimeContext.no_match_user is False and (
euid is not None and egid is not None
):
runtime.append("--user=%d:%d" % (euid, egid))
if runtimeContext.rm_container:
runtime.append("--rm")
if self.builder.resources.get("cudaDeviceCount"):
runtime.append("--gpus=" + str(self.builder.resources["cudaDeviceCount"]))
cidfile_path = None # type: Optional[str]
# add parameters to docker to write a container ID file
if runtimeContext.user_space_docker_cmd is None:
if runtimeContext.cidfile_dir:
cidfile_dir = runtimeContext.cidfile_dir
if not os.path.exists(str(cidfile_dir)):
_logger.error(
"--cidfile-dir %s error:\n%s",
cidfile_dir,
"directory doesn't exist, please create it first",
)
exit(2)
if not os.path.isdir(cidfile_dir):
_logger.error(
"--cidfile-dir %s error:\n%s",
cidfile_dir,
cidfile_dir + " is not a directory, please check it first",
)
exit(2)
else:
cidfile_dir = runtimeContext.create_tmpdir()
cidfile_name = datetime.datetime.now().strftime("%Y%m%d%H%M%S-%f") + ".cid"
if runtimeContext.cidfile_prefix is not None:
cidfile_name = str(runtimeContext.cidfile_prefix + "-" + cidfile_name)
cidfile_path = os.path.join(cidfile_dir, cidfile_name)
runtime.append("--cidfile=%s" % cidfile_path)
for key, value in self.environment.items():
runtime.append(f"--env={key}={value}")
res_req, _ = self.builder.get_requirement("ResourceRequirement")
if runtimeContext.strict_memory_limit and not user_space_docker_cmd:
ram = self.builder.resources["ram"]
runtime.append("--memory=%dm" % ram)
elif not user_space_docker_cmd:
if res_req and ("ramMin" in res_req or "ramMax" in res_req):
_logger.warning(
"[job %s] Skipping Docker software container '--memory' limit "
"despite presence of ResourceRequirement with ramMin "
"and/or ramMax setting. Consider running with "
"--strict-memory-limit for increased portability "
"assurance.",
self.name,
)
if runtimeContext.strict_cpu_limit and not user_space_docker_cmd:
cpus = math.ceil(self.builder.resources["cores"])
runtime.append(f"--cpus={cpus}")
elif not user_space_docker_cmd:
if res_req and ("coresMin" in res_req or "coresMax" in res_req):
_logger.warning(
"[job %s] Skipping Docker software container '--cpus' limit "
"despite presence of ResourceRequirement with coresMin "
"and/or coresMax setting. Consider running with "
"--strict-cpu-limit for increased portability "
"assurance.",
self.name,
)
return runtime, cidfile_path
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/docker_id.py 0000644 0001750 0001750 00000010274 00000000000 017440 0 ustar 00peter peter """Helper functions for docker."""
import subprocess # nosec
from typing import List, Optional, Tuple
def docker_vm_id() -> Tuple[Optional[int], Optional[int]]:
"""
Return the User ID and Group ID of the default docker user inside the VM.
When a host is using boot2docker or docker-machine to run docker with
boot2docker.iso (as on Mac OS X), the UID that mounts the shared filesystem
inside the VirtualBox VM is likely different from the user's UID on the host.
:return: A tuple containing numeric User ID and Group ID of the docker account inside
the boot2docker VM
"""
if boot2docker_running():
return boot2docker_id()
if docker_machine_running():
return docker_machine_id()
return (None, None)
def check_output_and_strip(cmd: List[str]) -> Optional[str]:
"""
Pass a command list to subprocess.check_output.
Return None if an expected exception is raised.
:param cmd: The command to execute
:return: Stripped string output of the command, or None if error
"""
try:
result = subprocess.check_output( # nosec
cmd, stderr=subprocess.STDOUT, universal_newlines=True
)
return result.strip()
except (OSError, subprocess.CalledProcessError, TypeError, AttributeError):
# OSError is raised if the command doesn't exist
# CalledProcessError is raised if the command returns nonzero
# TypeError and AttributeError are raised if the result cannot be strip()ped
return None
def docker_machine_name() -> Optional[str]:
"""
Get the machine name of the active docker-machine machine.
:return: Name of the active machine or None if error
"""
return check_output_and_strip(["docker-machine", "active"])
def cmd_output_matches(check_cmd: List[str], expected_status: str) -> bool:
"""
Run a command and compare its output to the expected value.
:param check_cmd: Command list to execute
:param expected_status: Expected output, e.g. "Running" or "poweroff"
:return: Boolean value indicating whether the command output matched
"""
return check_output_and_strip(check_cmd) == expected_status
def boot2docker_running() -> bool:
"""
Check if the boot2docker CLI reports that the boot2docker VM is running.
:return: True if vm is running, False otherwise
"""
return cmd_output_matches(["boot2docker", "status"], "running")
def docker_machine_running() -> bool:
"""
Ask docker-machine for the active machine and check if its VM is running.
:return: True if vm is running, False otherwise
"""
machine_name = docker_machine_name()
if not machine_name:
return False
return cmd_output_matches(["docker-machine", "status", machine_name], "Running")
def cmd_output_to_int(cmd: List[str]) -> Optional[int]:
"""
Run the provided command and return the integer value of the result.
:param cmd: The command to run
:return: Integer value of result, or None if an error occurred
"""
result = check_output_and_strip(cmd) # may return None
if result is not None:
try:
return int(result)
except ValueError:
# ValueError is raised if int conversion fails
return None
return None
def boot2docker_id() -> Tuple[Optional[int], Optional[int]]:
"""
Get the UID and GID of the docker user inside a running boot2docker vm.
:return: Tuple (UID, GID), or (None, None) if error (e.g. boot2docker not present or stopped)
"""
uid = cmd_output_to_int(["boot2docker", "ssh", "id", "-"])
gid = cmd_output_to_int(["boot2docker", "ssh", "id", "-g"])
return (uid, gid)
def docker_machine_id() -> Tuple[Optional[int], Optional[int]]:
"""
Ask docker-machine for the active machine and get the UID and GID of the
docker user inside the VM.
:return: tuple (UID, GID), or (None, None) if error (e.g. docker-machine not present or stopped)
"""
machine_name = docker_machine_name()
if not machine_name:
return (None, None)
uid = cmd_output_to_int(["docker-machine", "ssh", machine_name, "id -"])
gid = cmd_output_to_int(["docker-machine", "ssh", machine_name, "id -g"])
return (uid, gid)
if __name__ == "__main__":
print(docker_vm_id())
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/env_to_stdout.py 0000644 0001750 0001750 00000001626 00000000000 020412 0 ustar 00peter peter r"""Python script that acts like (GNU coreutils) env -0.
When run as a script, it prints the environment as
`(VARNAME=value\0)*`.
Ideally we would just use `env -0`, because python (thanks to PEPs 538
and 540) will set zero to two environment variables to better handle
Unicode-locale interactions; however, BSD family implementations of
`env` do not all support the `-0` flag so we supply this script that
produces equivalent output.
"""
import os
from typing import Dict
def deserialize_env(data: str) -> Dict[str, str]:
"""Deserialize the output of `env -0` to dictionary."""
ans = {}
for item in data.strip("\0").split("\0"):
key, val = item.split("=", 1)
ans[key] = val
return ans
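# Illustrative example (not in the original source):
# deserialize_env("PATH=/usr/bin\0HOME=/home/u\0") returns
# {"PATH": "/usr/bin", "HOME": "/home/u"}.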
def main() -> None:
"""Print the null-separated enviroment to stdout."""
for k, v in os.environ.items():
print(f"{k}={v}", end="\0")
if __name__ == "__main__":
main()
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/errors.py 0000644 0001750 0001750 00000000516 00000000000 017027 0 ustar 00peter peter class WorkflowException(Exception):
pass
class UnsupportedRequirement(WorkflowException):
pass
class ArgumentException(Exception):
"""Mismatched command line arguments provided."""
class GraphTargetMissingException(WorkflowException):
"""When a $graph is encountered and there is no target and no main/#main."""
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/executors.py 0000644 0001750 0001750 00000043750 00000000000 017543 0 ustar 00peter peter """Single and multi-threaded executors."""
import datetime
import functools
import logging
import math
import os
import threading
from abc import ABCMeta, abstractmethod
from threading import Lock
from typing import (
Dict,
Iterable,
List,
MutableSequence,
Optional,
Set,
Tuple,
Union,
cast,
)
import psutil
from schema_salad.exceptions import ValidationException
from schema_salad.sourceline import SourceLine
from .command_line_tool import CallbackJob, ExpressionJob
from .context import RuntimeContext, getdefault
from .errors import WorkflowException
from .job import JobBase
from .loghandler import _logger
from .mutation import MutationManager
from .process import Process, cleanIntermediate, relocateOutputs
from .provenance_profile import ProvenanceProfile
from .task_queue import TaskQueue
from .update import ORIGINAL_CWLVERSION
from .utils import CWLObjectType, JobsType
from .workflow import Workflow
from .workflow_job import WorkflowJob, WorkflowJobStep
TMPDIR_LOCK = Lock()
class JobExecutor(metaclass=ABCMeta):
"""Abstract base job executor."""
def __init__(self) -> None:
"""Initialize."""
self.final_output = [] # type: MutableSequence[Optional[CWLObjectType]]
self.final_status = [] # type: List[str]
self.output_dirs = set() # type: Set[str]
def __call__(
self,
process: Process,
job_order_object: CWLObjectType,
runtime_context: RuntimeContext,
logger: logging.Logger = _logger,
) -> Tuple[Optional[CWLObjectType], str]:
return self.execute(process, job_order_object, runtime_context, logger)
def output_callback(
self, out: Optional[CWLObjectType], process_status: str
) -> None:
"""Collect the final status and outputs."""
self.final_status.append(process_status)
self.final_output.append(out)
@abstractmethod
def run_jobs(
self,
process: Process,
job_order_object: CWLObjectType,
logger: logging.Logger,
runtime_context: RuntimeContext,
) -> None:
"""Execute the jobs for the given Process."""
def execute(
self,
process: Process,
job_order_object: CWLObjectType,
runtime_context: RuntimeContext,
logger: logging.Logger = _logger,
) -> Tuple[Optional[CWLObjectType], str]:
"""Execute the process."""
if not runtime_context.basedir:
raise WorkflowException("Must provide 'basedir' in runtimeContext")
def check_for_abstract_op(tool: CWLObjectType) -> None:
if tool["class"] == "Operation":
raise SourceLine(
tool, "class", WorkflowException, runtime_context.debug
).makeError("Workflow has unrunnable abstract Operation")
process.visit(check_for_abstract_op)
finaloutdir = None # type: Optional[str]
original_outdir = runtime_context.outdir
if isinstance(original_outdir, str):
finaloutdir = os.path.abspath(original_outdir)
runtime_context = runtime_context.copy()
outdir = runtime_context.create_outdir()
self.output_dirs.add(outdir)
runtime_context.outdir = outdir
runtime_context.mutation_manager = MutationManager()
runtime_context.toplevel = True
runtime_context.workflow_eval_lock = threading.Condition(threading.RLock())
job_reqs = None # type: Optional[List[CWLObjectType]]
if "https://w3id.org/cwl/cwl#requirements" in job_order_object:
if process.metadata.get(ORIGINAL_CWLVERSION) == "v1.0":
raise WorkflowException(
"`cwl:requirements` in the input object is not part of CWL "
"v1.0. You can adjust to use `cwltool:overrides` instead; or you "
"can set the cwlVersion to v1.1"
)
job_reqs = cast(
List[CWLObjectType],
job_order_object["https://w3id.org/cwl/cwl#requirements"],
)
elif (
"cwl:defaults" in process.metadata
and "https://w3id.org/cwl/cwl#requirements"
in cast(CWLObjectType, process.metadata["cwl:defaults"])
):
if process.metadata.get(ORIGINAL_CWLVERSION) == "v1.0":
raise WorkflowException(
"`cwl:requirements` in the input object is not part of CWL "
"v1.0. You can adjust to use `cwltool:overrides` instead; or you "
"can set the cwlVersion to v1.1"
)
job_reqs = cast(
Optional[List[CWLObjectType]],
cast(CWLObjectType, process.metadata["cwl:defaults"])[
"https://w3id.org/cwl/cwl#requirements"
],
)
if job_reqs is not None:
for req in job_reqs:
process.requirements.append(req)
self.run_jobs(process, job_order_object, logger, runtime_context)
if (
self.final_output
and self.final_output[0] is not None
and finaloutdir is not None
):
self.final_output[0] = relocateOutputs(
self.final_output[0],
finaloutdir,
self.output_dirs,
runtime_context.move_outputs,
runtime_context.make_fs_access(""),
getdefault(runtime_context.compute_checksum, True),
path_mapper=runtime_context.path_mapper,
)
if runtime_context.rm_tmpdir:
if not runtime_context.cachedir:
output_dirs = self.output_dirs # type: Iterable[str]
else:
output_dirs = filter(
lambda x: not x.startswith(runtime_context.cachedir), # type: ignore
self.output_dirs,
)
cleanIntermediate(output_dirs)
if self.final_output and self.final_status:
if (
runtime_context.research_obj is not None
and isinstance(
process, (JobBase, Process, WorkflowJobStep, WorkflowJob)
)
and process.parent_wf
):
process_run_id = None # type: Optional[str]
name = "primary"
process.parent_wf.generate_output_prov(
self.final_output[0], process_run_id, name
)
process.parent_wf.document.wasEndedBy(
process.parent_wf.workflow_run_uri,
None,
process.parent_wf.engine_uuid,
datetime.datetime.now(),
)
process.parent_wf.finalize_prov_profile(name=None)
return (self.final_output[0], self.final_status[0])
return (None, "permanentFail")
class SingleJobExecutor(JobExecutor):
"""Default single-threaded CWL reference executor."""
def run_jobs(
self,
process: Process,
job_order_object: CWLObjectType,
logger: logging.Logger,
runtime_context: RuntimeContext,
) -> None:
process_run_id = None # type: Optional[str]
# define provenance profile for single commandline tool
if (
not isinstance(process, Workflow)
and runtime_context.research_obj is not None
):
process.provenance_object = ProvenanceProfile(
runtime_context.research_obj,
full_name=runtime_context.cwl_full_name,
host_provenance=False,
user_provenance=False,
orcid=runtime_context.orcid,
# single tool execution, so RO UUID = wf UUID = tool UUID
run_uuid=runtime_context.research_obj.ro_uuid,
fsaccess=runtime_context.make_fs_access(""),
)
process.parent_wf = process.provenance_object
jobiter = process.job(job_order_object, self.output_callback, runtime_context)
try:
for job in jobiter:
if job is not None:
if runtime_context.builder is not None and hasattr(job, "builder"):
job.builder = runtime_context.builder # type: ignore
if job.outdir is not None:
self.output_dirs.add(job.outdir)
if runtime_context.research_obj is not None:
if not isinstance(process, Workflow):
prov_obj = process.provenance_object
else:
prov_obj = job.prov_obj
if prov_obj:
runtime_context.prov_obj = prov_obj
prov_obj.fsaccess = runtime_context.make_fs_access("")
prov_obj.evaluate(
process,
job,
job_order_object,
runtime_context.research_obj,
)
process_run_id = prov_obj.record_process_start(process, job)
runtime_context = runtime_context.copy()
runtime_context.process_run_id = process_run_id
job.run(runtime_context)
else:
logger.error("Workflow cannot make any more progress.")
break
except (
ValidationException,
WorkflowException,
): # pylint: disable=try-except-raise
raise
except Exception as err:
logger.exception("Got workflow error")
raise WorkflowException(str(err)) from err
class MultithreadedJobExecutor(JobExecutor):
"""
Experimental multi-threaded CWL executor.
Does simple resource accounting: it will not start a job unless the
required cores and RAM are available, but it makes no attempt to
optimize usage.
"""
def __init__(self) -> None:
"""Initialize."""
super().__init__()
self.exceptions = [] # type: List[WorkflowException]
self.pending_jobs = [] # type: List[JobsType]
self.pending_jobs_lock = threading.Lock()
self.max_ram = int(psutil.virtual_memory().available / 2**20) # type: ignore[no-untyped-call]
self.max_cores = float(psutil.cpu_count())
self.allocated_ram = float(0)
self.allocated_cores = float(0)
def select_resources(
self, request: Dict[str, Union[int, float]], runtime_context: RuntimeContext
) -> Dict[str, Union[int, float]]: # pylint: disable=unused-argument
"""Naïve check for available cpu cores and memory."""
result: Dict[str, Union[int, float]] = {}
maxrsc = {"cores": self.max_cores, "ram": self.max_ram}
for rsc in ("cores", "ram"):
rsc_min = request[rsc + "Min"]
if rsc_min > maxrsc[rsc]:
raise WorkflowException(
f"Requested at least {rsc_min} {rsc} but only "
f"{maxrsc[rsc]} available"
)
rsc_max = request[rsc + "Max"]
if rsc_max < maxrsc[rsc]:
result[rsc] = math.ceil(rsc_max)
else:
result[rsc] = maxrsc[rsc]
result["tmpdirSize"] = math.ceil(request["tmpdirMin"])
result["outdirSize"] = math.ceil(request["outdirMin"])
if "cudaDeviceCount" in request:
result["cudaDeviceCount"] = request["cudaDeviceCount"]
return result
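# Illustrative example (not in the original source): on a host with
# max_cores=8 and max_ram=16384 (MiB), a request of
# {"coresMin": 1, "coresMax": 4, "ramMin": 256, "ramMax": 1024,
#  "tmpdirMin": 1024, "outdirMin": 1024}
# yields {"cores": 4, "ram": 1024, "tmpdirSize": 1024, "outdirSize": 1024}.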
def _runner(self, job, runtime_context, TMPDIR_LOCK):
# type: (Union[JobBase, WorkflowJob, CallbackJob, ExpressionJob], RuntimeContext, threading.Lock) -> None
"""Job running thread."""
try:
_logger.debug(
"job: {}, runtime_context: {}, TMPDIR_LOCK: {}".format(
job, runtime_context, TMPDIR_LOCK
)
)
job.run(runtime_context, TMPDIR_LOCK)
except WorkflowException as err:
_logger.exception(f"Got workflow error: {err}")
self.exceptions.append(err)
except Exception as err: # pylint: disable=broad-except
_logger.exception(f"Got workflow error: {err}")
self.exceptions.append(WorkflowException(str(err)))
finally:
if runtime_context.workflow_eval_lock:
with runtime_context.workflow_eval_lock:
if isinstance(job, JobBase):
ram = job.builder.resources["ram"]
self.allocated_ram -= ram
cores = job.builder.resources["cores"]
self.allocated_cores -= cores
runtime_context.workflow_eval_lock.notifyAll()
def run_job(
self,
job: Optional[JobsType],
runtime_context: RuntimeContext,
) -> None:
"""Execute a single Job in a seperate thread."""
if job is not None:
with self.pending_jobs_lock:
self.pending_jobs.append(job)
with self.pending_jobs_lock:
n = 0
while (n + 1) <= len(self.pending_jobs):
# Simple greedy resource allocation strategy. Go
# through pending jobs in the order they were
# generated and add them to the queue only if there
# are resources available.
job = self.pending_jobs[n]
if isinstance(job, JobBase):
ram = job.builder.resources["ram"]
cores = job.builder.resources["cores"]
if ram > self.max_ram or cores > self.max_cores:
_logger.error(
'Job "%s" cannot be run, requests more resources (%s) '
"than available on this host (max ram %d, max cores %d",
job.name,
job.builder.resources,
self.allocated_ram,
self.allocated_cores,
self.max_ram,
self.max_cores,
)
self.pending_jobs.remove(job)
return
if (
self.allocated_ram + ram > self.max_ram
or self.allocated_cores + cores > self.max_cores
):
_logger.debug(
'Job "%s" cannot run yet, resources (%s) are not '
"available (already allocated ram is %d, allocated cores is %d, "
"max ram %d, max cores %d",
job.name,
job.builder.resources,
self.allocated_ram,
self.allocated_cores,
self.max_ram,
self.max_cores,
)
n += 1
continue
if isinstance(job, JobBase):
ram = job.builder.resources["ram"]
self.allocated_ram += ram
cores = job.builder.resources["cores"]
self.allocated_cores += cores
self.taskqueue.add(
functools.partial(self._runner, job, runtime_context, TMPDIR_LOCK),
runtime_context.workflow_eval_lock,
)
self.pending_jobs.remove(job)
def wait_for_next_completion(self, runtime_context):
# type: (RuntimeContext) -> None
"""Wait for jobs to finish."""
if runtime_context.workflow_eval_lock is not None:
runtime_context.workflow_eval_lock.wait(timeout=3)
if self.exceptions:
raise self.exceptions[0]
def run_jobs(
self,
process: Process,
job_order_object: CWLObjectType,
logger: logging.Logger,
runtime_context: RuntimeContext,
) -> None:
self.taskqueue = TaskQueue(
threading.Lock(), psutil.cpu_count()
) # type: TaskQueue
try:
jobiter = process.job(
job_order_object, self.output_callback, runtime_context
)
if runtime_context.workflow_eval_lock is None:
raise WorkflowException(
"runtimeContext.workflow_eval_lock must not be None"
)
runtime_context.workflow_eval_lock.acquire()
for job in jobiter:
if job is not None:
if isinstance(job, JobBase):
job.builder = runtime_context.builder or job.builder
if job.outdir is not None:
self.output_dirs.add(job.outdir)
self.run_job(job, runtime_context)
if job is None:
if self.taskqueue.in_flight > 0:
self.wait_for_next_completion(runtime_context)
else:
logger.error("Workflow cannot make any more progress.")
break
self.run_job(None, runtime_context)
while self.taskqueue.in_flight > 0:
self.wait_for_next_completion(runtime_context)
self.run_job(None, runtime_context)
runtime_context.workflow_eval_lock.release()
finally:
self.taskqueue.drain()
self.taskqueue.join()
class NoopJobExecutor(JobExecutor):
"""Do nothing executor, for testing purposes only."""
def run_jobs(
self,
process: Process,
job_order_object: CWLObjectType,
logger: logging.Logger,
runtime_context: RuntimeContext,
) -> None:
pass
def execute(
self,
process: Process,
job_order_object: CWLObjectType,
runtime_context: RuntimeContext,
logger: Optional[logging.Logger] = None,
) -> Tuple[Optional[CWLObjectType], str]:
return {}, "success"
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/expression.py 0000644 0001750 0001750 00000032245 00000000000 017716 0 ustar 00peter peter """Parse CWL expressions."""
import copy
import json
import re
from typing import (
Any,
Dict,
List,
Mapping,
MutableMapping,
MutableSequence,
Optional,
Tuple,
Union,
cast,
)
from schema_salad.utils import json_dumps
from .errors import WorkflowException
from .loghandler import _logger
from .sandboxjs import JavascriptException, default_timeout, execjs
from .utils import CWLObjectType, CWLOutputType, bytes2str_in_dicts
def jshead(engine_config: List[str], rootvars: CWLObjectType) -> str:
# make sure all the byte strings are converted
# to str in `rootvars` dict.
return "\n".join(
engine_config
+ [f"var {k} = {json_dumps(v, indent=4)};" for k, v in rootvars.items()]
)
# decode all raw strings to unicode
seg_symbol = r"""\w+"""
seg_single = r"""\['([^']|\\')+'\]"""
seg_double = r"""\["([^"]|\\")+"\]"""
seg_index = r"""\[[0-9]+\]"""
segments = rf"(\.{seg_symbol}|{seg_single}|{seg_double}|{seg_index})"
segment_re = re.compile(segments, flags=re.UNICODE)
param_str = rf"\(({seg_symbol}){segments}*\)$"
param_re = re.compile(param_str, flags=re.UNICODE)
class SubstitutionError(Exception):
pass
def scanner(scan: str) -> Optional[Tuple[int, int]]:
DEFAULT = 0
DOLLAR = 1
PAREN = 2
BRACE = 3
SINGLE_QUOTE = 4
DOUBLE_QUOTE = 5
BACKSLASH = 6
i = 0
stack = [DEFAULT]
start = 0
while i < len(scan):
state = stack[-1]
c = scan[i]
if state == DEFAULT:
if c == "$":
stack.append(DOLLAR)
elif c == "\\":
stack.append(BACKSLASH)
elif state == BACKSLASH:
stack.pop()
if stack[-1] == DEFAULT:
return (i - 1, i + 1)
elif state == DOLLAR:
if c == "(":
start = i - 1
stack.append(PAREN)
elif c == "{":
start = i - 1
stack.append(BRACE)
else:
stack.pop()
i -= 1
elif state == PAREN:
if c == "(":
stack.append(PAREN)
elif c == ")":
stack.pop()
if stack[-1] == DOLLAR:
return (start, i + 1)
elif c == "'":
stack.append(SINGLE_QUOTE)
elif c == '"':
stack.append(DOUBLE_QUOTE)
elif state == BRACE:
if c == "{":
stack.append(BRACE)
elif c == "}":
stack.pop()
if stack[-1] == DOLLAR:
return (start, i + 1)
elif c == "'":
stack.append(SINGLE_QUOTE)
elif c == '"':
stack.append(DOUBLE_QUOTE)
elif state == SINGLE_QUOTE:
if c == "'":
stack.pop()
elif c == "\\":
stack.append(BACKSLASH)
elif state == DOUBLE_QUOTE:
if c == '"':
stack.pop()
elif c == "\\":
stack.append(BACKSLASH)
i += 1
if len(stack) > 1 and not (len(stack) == 2 and stack[1] in (BACKSLASH, DOLLAR)):
raise SubstitutionError(
"Substitution error, unfinished block starting at position {}: '{}' stack was {}".format(
start, scan[start:], stack
)
)
return None
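# Illustrative example (not in the original source): for the string
# "echo $(inputs.msg)", scanner returns (5, 18), the half-open span of the
# "$(inputs.msg)" parameter reference; it returns None once no expression
# or escape remains.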
def next_seg(
parsed_string: str, remaining_string: str, current_value: CWLOutputType
) -> CWLOutputType:
if remaining_string:
m = segment_re.match(remaining_string)
if not m:
return current_value
next_segment_str = m.group(1)
key = None # type: Optional[Union[str, int]]
if next_segment_str[0] == ".":
key = next_segment_str[1:]
elif next_segment_str[1] in ("'", '"'):
key = next_segment_str[2:-2].replace("\\'", "'").replace('\\"', '"')
if key is not None:
if (
isinstance(current_value, MutableSequence)
and key == "length"
and not remaining_string[m.end(1) :]
):
return len(current_value)
if not isinstance(current_value, MutableMapping):
raise WorkflowException(
"%s is a %s, cannot index on string '%s'"
% (parsed_string, type(current_value).__name__, key)
)
if key not in current_value:
raise WorkflowException(f"{parsed_string} does not contain key '{key}'")
else:
try:
key = int(next_segment_str[1:-1])
except ValueError as v:
raise WorkflowException(str(v)) from v
if not isinstance(current_value, MutableSequence):
raise WorkflowException(
"%s is a %s, cannot index on int '%s'"
% (parsed_string, type(current_value).__name__, key)
)
if key and key >= len(current_value):
raise WorkflowException(
"%s list index %i out of range" % (parsed_string, key)
)
if isinstance(current_value, Mapping):
try:
return next_seg(
parsed_string + remaining_string,
remaining_string[m.end(1) :],
cast(CWLOutputType, current_value[cast(str, key)]),
)
except KeyError:
raise WorkflowException(f"{parsed_string} doesn't have property {key}")
elif isinstance(current_value, list) and isinstance(key, int):
try:
return next_seg(
parsed_string + remaining_string,
remaining_string[m.end(1) :],
current_value[key],
)
except KeyError:
raise WorkflowException(f"{parsed_string} doesn't have property {key}")
else:
raise WorkflowException(f"{parsed_string} doesn't have property {key}")
else:
return current_value
def evaluator(
ex: str,
jslib: str,
obj: CWLObjectType,
timeout: float,
fullJS: bool = False,
force_docker_pull: bool = False,
debug: bool = False,
js_console: bool = False,
container_engine: str = "docker",
) -> Optional[CWLOutputType]:
match = param_re.match(ex)
expression_parse_exception = None
if match is not None:
first_symbol = match.group(1)
first_symbol_end = match.end(1)
if first_symbol_end + 1 == len(ex) and first_symbol == "null":
return None
try:
if first_symbol not in obj:
raise WorkflowException("%s is not defined" % first_symbol)
return next_seg(
first_symbol,
ex[first_symbol_end:-1],
cast(CWLOutputType, obj[first_symbol]),
)
except WorkflowException as werr:
expression_parse_exception = werr
if fullJS:
return execjs(
ex,
jslib,
timeout,
force_docker_pull=force_docker_pull,
debug=debug,
js_console=js_console,
container_engine=container_engine,
)
else:
if expression_parse_exception is not None:
raise JavascriptException(
"Syntax error in parameter reference '%s': %s. This could be "
"due to using Javascript code without specifying "
"InlineJavascriptRequirement." % (ex[1:-1], expression_parse_exception)
)
else:
raise JavascriptException(
"Syntax error in parameter reference '%s'. This could be due "
"to using Javascript code without specifying "
"InlineJavascriptRequirement." % ex
)
def _convert_dumper(string: str) -> str:
return f"{json.dumps(string)} + "
def interpolate(
scan: str,
rootvars: CWLObjectType,
timeout: float = default_timeout,
fullJS: bool = False,
jslib: str = "",
force_docker_pull: bool = False,
debug: bool = False,
js_console: bool = False,
strip_whitespace: bool = True,
escaping_behavior: int = 2,
convert_to_expression: bool = False,
container_engine: str = "docker",
) -> Optional[CWLOutputType]:
"""
Interpolate and evaluate.
Note: only call with convert_to_expression=True on CWL Expressions in $()
form that need interpolation.
"""
if strip_whitespace:
scan = scan.strip()
parts = []
if convert_to_expression:
dump = _convert_dumper
parts.append("${return ")
else:
dump = lambda x: x
w = scanner(scan)
while w:
if convert_to_expression:
parts.append(f'"{scan[0 : w[0]]}" + ')
else:
parts.append(scan[0 : w[0]])
if scan[w[0]] == "$":
if not convert_to_expression:
e = evaluator(
scan[w[0] + 1 : w[1]],
jslib,
rootvars,
timeout,
fullJS=fullJS,
force_docker_pull=force_docker_pull,
debug=debug,
js_console=js_console,
container_engine=container_engine,
)
if w[0] == 0 and w[1] == len(scan) and len(parts) <= 1:
return e
leaf = json_dumps(e, sort_keys=True)
if leaf[0] == '"':
leaf = json.loads(leaf)
parts.append(leaf)
else:
parts.append(
"function(){var item ="
+ scan[w[0] : w[1]][2:-1]
+ '; if (typeof(item) === "string"){ return item; } else { return JSON.stringify(item); }}() + '
)
elif scan[w[0]] == "\\":
if escaping_behavior == 1:
# Old behavior. Just skip the next character.
e = scan[w[1] - 1]
parts.append(dump(e))
elif escaping_behavior == 2:
# Backslash quoting requires a three character lookahead.
e = scan[w[0] : w[1] + 1]
if e in ("\\$(", "\\${"):
# Suppress start of a parameter reference, drop the
# backslash.
parts.append(dump(e[1:]))
w = (w[0], w[1] + 1)
elif e[1] == "\\":
# Double backslash, becomes a single backslash
parts.append(dump("\\"))
else:
# Some other text, add it as-is (including the
# backslash) and resume scanning.
parts.append(dump(e[:2]))
else:
raise Exception("Unknown escaping behavior %s" % escaping_behavior)
scan = scan[w[1] :]
w = scanner(scan)
if convert_to_expression:
parts.append(f'"{scan}"')
parts.append(";}")
else:
parts.append(scan)
return "".join(parts)
def needs_parsing(snippet: Any) -> bool:
return isinstance(snippet, str) and ("$(" in snippet or "${" in snippet)
def do_eval(
ex: Optional[CWLOutputType],
jobinput: CWLObjectType,
requirements: List[CWLObjectType],
outdir: Optional[str],
tmpdir: Optional[str],
resources: Dict[str, Union[float, int]],
context: Optional[CWLOutputType] = None,
timeout: float = default_timeout,
force_docker_pull: bool = False,
debug: bool = False,
js_console: bool = False,
strip_whitespace: bool = True,
cwlVersion: str = "",
container_engine: str = "docker",
) -> Optional[CWLOutputType]:
runtime = cast(MutableMapping[str, Union[int, str, None]], copy.deepcopy(resources))
runtime["tmpdir"] = tmpdir if tmpdir else None
runtime["outdir"] = outdir if outdir else None
rootvars = cast(
CWLObjectType,
bytes2str_in_dicts({"inputs": jobinput, "self": context, "runtime": runtime}),
)
if isinstance(ex, str) and needs_parsing(ex):
fullJS = False
jslib = ""
for r in reversed(requirements):
if r["class"] == "InlineJavascriptRequirement":
fullJS = True
jslib = jshead(cast(List[str], r.get("expressionLib", [])), rootvars)
break
try:
return interpolate(
ex,
rootvars,
timeout=timeout,
fullJS=fullJS,
jslib=jslib,
force_docker_pull=force_docker_pull,
debug=debug,
js_console=js_console,
strip_whitespace=strip_whitespace,
escaping_behavior=1
if cwlVersion
in (
"v1.0",
"v1.1.0-dev1",
"v1.1",
"v1.2.0-dev1",
"v1.2.0-dev2",
"v1.2.0-dev3",
)
else 2,
container_engine=container_engine,
)
except Exception as e:
_logger.exception(e)
raise WorkflowException("Expression evaluation error:\n%s" % str(e)) from e
else:
return ex
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/extensions-v1.1.yml 0000644 0001750 0001750 00000006731 00000000000 020553 0 ustar 00peter peter $base: http://commonwl.org/cwltool#
$namespaces:
cwl: "https://w3id.org/cwl/cwl#"
$graph:
- $import: https://w3id.org/cwl/CommonWorkflowLanguage.yml
- name: Secrets
type: record
inVocab: false
extends: cwl:ProcessRequirement
fields:
class:
type: string
doc: "Always 'Secrets'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
secrets:
type: string[]
doc: |
List one or more input parameters that are sensitive (such as passwords)
which will be deliberately obscured from logging.
jsonldPredicate:
"_type": "@id"
refScope: 0
- name: ProcessGenerator
type: record
inVocab: true
extends: cwl:Process
documentRoot: true
fields:
- name: class
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
type: string
- name: run
type: [string, cwl:Process]
jsonldPredicate:
_id: "cwl:run"
_type: "@id"
subscope: run
doc: |
Specifies the process to run.
- name: MPIRequirement
type: record
inVocab: false
extends: cwl:ProcessRequirement
doc: |
Indicates that a process requires an MPI runtime.
fields:
- name: class
type: string
doc: "Always 'MPIRequirement'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
- name: processes
type: [int, cwl:Expression]
doc: |
The number of MPI processes to start. If you give a string,
this will be evaluated as a CWL Expression and it must
evaluate to an integer.
- name: CUDARequirement
type: record
extends: cwl:ProcessRequirement
inVocab: false
doc: |
Require support for NVIDIA CUDA (GPU hardware acceleration).
fields:
class:
type: string
doc: 'cwltool:CUDARequirement'
jsonldPredicate:
_id: "@type"
_type: "@vocab"
cudaVersionMin:
type: string
doc: |
Minimum CUDA version to run the software, in X.Y format. This
corresponds to a CUDA SDK release. When running directly on
the host (not in a container) the host must have a compatible
CUDA SDK (matching the exact version, or, starting with CUDA
11.3, matching major version). When run in a container, the
container image should provide the CUDA runtime, and the host
driver is injected into the container. In this case, because
CUDA drivers are backwards compatible, it is possible to
use an older SDK with a newer driver across major versions.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for
details.
cudaComputeCapability:
type:
- 'string'
- 'string[]'
doc: |
CUDA hardware capability required to run the software, in X.Y
format.
* If this is a single value, it defines only the minimum
compute capability. GPUs with higher capability are also
accepted.
* If it is an array value, then only select GPUs with compute
capabilities that explicitly appear in the array.
cudaDeviceCountMin:
type: ['null', int, cwl:Expression]
default: 1
doc: |
Minimum number of GPU devices to request. If not specified,
same as `cudaDeviceCountMax`. If neither are specified,
default 1.
cudaDeviceCountMax:
type: ['null', int, cwl:Expression]
doc: |
Maximum number of GPU devices to request. If not specified,
same as `cudaDeviceCountMin`.
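# Illustrative example (not part of the schema): a tool requesting one GPU
# with at least CUDA 11.4 and compute capability 3.0:
#
#   requirements:
#     cwltool:CUDARequirement:
#       cudaVersionMin: "11.4"
#       cudaComputeCapability: "3.0"
#       cudaDeviceCountMin: 1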
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/extensions.yml 0000644 0001750 0001750 00000015205 00000000000 020064 0 ustar 00peter peter $base: http://commonwl.org/cwltool#
$namespaces:
cwl: "https://w3id.org/cwl/cwl#"
$graph:
- $import: https://w3id.org/cwl/CommonWorkflowLanguage.yml
- name: LoadListingRequirement
type: record
extends: cwl:ProcessRequirement
inVocab: false
fields:
class:
type: string
doc: "Always 'LoadListingRequirement'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
loadListing:
type:
- type: enum
name: LoadListingEnum
symbols: [no_listing, shallow_listing, deep_listing]
- name: InplaceUpdateRequirement
type: record
inVocab: false
extends: cwl:ProcessRequirement
fields:
class:
type: string
doc: "Always 'InplaceUpdateRequirement'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
inplaceUpdate:
type: boolean
- name: Secrets
type: record
inVocab: false
extends: cwl:ProcessRequirement
fields:
class:
type: string
doc: "Always 'Secrets'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
secrets:
type: string[]
doc: |
List one or more input parameters that are sensitive (such as passwords)
which will be deliberately obscured from logging.
jsonldPredicate:
"_type": "@id"
refScope: 0
- name: TimeLimit
type: record
inVocab: false
extends: cwl:ProcessRequirement
doc: |
Set an upper limit on the execution time of a CommandLineTool or
ExpressionTool. A tool execution which exceeds the time limit may
be preemptively terminated and considered failed. May also be
used by batch systems to make scheduling decisions.
fields:
- name: class
type: string
doc: "Always 'TimeLimit'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
- name: timelimit
type: [long, string]
doc: |
The time limit, in seconds. A time limit of zero means no
time limit. Negative time limits are an error.
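# Illustrative example (not part of the schema): applying a 60-second limit
# to a tool that declares the cwltool extension namespace:
#
#   requirements:
#     cwltool:TimeLimit:
#       timelimit: 60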
- name: WorkReuse
type: record
inVocab: false
extends: cwl:ProcessRequirement
doc: |
For implementations that support reusing output from past work (on
the assumption that same code and same input produce same
results), control whether to enable or disable the reuse behavior
for a particular tool or step (to accommodate situations where that
assumption is incorrect). A reused step is not executed but
instead returns the same output as the original execution.
If `enableReuse` is not specified, correct tools should assume it
is enabled by default.
fields:
- name: class
type: string
doc: "Always 'WorkReuse'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
- name: enableReuse
type: [boolean, string]
#default: true
- name: NetworkAccess
type: record
inVocab: false
extends: cwl:ProcessRequirement
doc: |
Indicate whether a process requires outgoing IPv4/IPv6 network
access. Choice of IPv4 or IPv6 is implementation and site
specific, correct tools must support both.
If `networkAccess` is false or not specified, tools must not
assume network access, except for localhost (the loopback device).
If `networkAccess` is true, the tool must be able to make outgoing
connections to network resources. Resources may be on a private
subnet or the public Internet. However, implementations and sites
may apply their own security policies to restrict what is
accessible by the tool.
Enabling network access does not imply a publicly routable IP
address or the ability to accept inbound connections.
fields:
- name: class
type: string
doc: "Always 'NetworkAccess'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
- name: networkAccess
type: [boolean, string]
- name: ProcessGenerator
type: record
inVocab: true
extends: cwl:Process
documentRoot: true
fields:
- name: class
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
type: string
- name: run
type: [string, cwl:Process]
jsonldPredicate:
_id: "cwl:run"
_type: "@id"
doc: |
Specifies the process to run.
- name: MPIRequirement
type: record
inVocab: false
extends: cwl:ProcessRequirement
doc: |
Indicates that a process requires an MPI runtime.
fields:
- name: class
type: string
doc: "Always 'MPIRequirement'"
jsonldPredicate:
"_id": "@type"
"_type": "@vocab"
- name: processes
type: [int, cwl:Expression]
doc: |
The number of MPI processes to start. If you give a string,
this will be evaluated as a CWL Expression and it must
evaluate to an integer.
- name: CUDARequirement
type: record
extends: cwl:ProcessRequirement
inVocab: false
doc: |
Require support for NVIDIA CUDA (GPU hardware acceleration).
fields:
class:
type: string
doc: 'cwltool:CUDARequirement'
jsonldPredicate:
_id: "@type"
_type: "@vocab"
cudaVersionMin:
type: string
doc: |
Minimum CUDA version to run the software, in X.Y format. This
corresponds to a CUDA SDK release. When running directly on
the host (not in a container) the host must have a compatible
CUDA SDK (matching the exact version, or, starting with CUDA
11.3, matching major version). When run in a container, the
container image should provide the CUDA runtime, and the host
driver is injected into the container. In this case, because
CUDA drivers are backwards compatible, it is possible to
use an older SDK with a newer driver across major versions.
See https://docs.nvidia.com/deploy/cuda-compatibility/ for
details.
cudaComputeCapability:
type:
- 'string'
- 'string[]'
doc: |
CUDA hardware capability required to run the software, in X.Y
format.
* If this is a single value, it defines only the minimum
compute capability. GPUs with higher capability are also
accepted.
* If it is an array value, then only select GPUs with compute
capabilities that explicitly appear in the array.
cudaDeviceCountMin:
type: ['null', int, cwl:Expression]
default: 1
doc: |
Minimum number of GPU devices to request. If not specified,
same as `cudaDeviceCountMax`. If neither are specified,
default 1.
cudaDeviceCountMax:
type: ['null', int, cwl:Expression]
doc: |
Maximum number of GPU devices to request. If not specified,
same as `cudaDeviceCountMin`.
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/factory.py 0000644 0001750 0001750 00000004530 00000000000 017162 0 ustar 00peter peter import os
from typing import Any, Dict, Optional, Union
from . import load_tool
from .context import LoadingContext, RuntimeContext
from .errors import WorkflowException
from .executors import JobExecutor, SingleJobExecutor
from .process import Process
from .utils import CWLObjectType
class WorkflowStatus(Exception):
def __init__(self, out: Optional[CWLObjectType], status: str) -> None:
"""Signaling exception for the status of a Workflow."""
super().__init__("Completed %s" % status)
self.out = out
self.status = status
class Callable:
"""Result of Factory.make()."""
def __init__(self, t: Process, factory: "Factory") -> None:
"""Initialize."""
self.t = t
self.factory = factory
def __call__(self, **kwargs):
# type: (**Any) -> Union[str, Optional[CWLObjectType]]
runtime_context = self.factory.runtime_context.copy()
runtime_context.basedir = os.getcwd()
out, status = self.factory.executor(self.t, kwargs, runtime_context)
if status != "success":
raise WorkflowStatus(out, status)
else:
return out
class Factory:
"""Easy way to load a CWL document for execution."""
loading_context: LoadingContext
runtime_context: RuntimeContext
def __init__(
self,
executor: Optional[JobExecutor] = None,
loading_context: Optional[LoadingContext] = None,
runtime_context: Optional[RuntimeContext] = None,
) -> None:
if executor is None:
executor = SingleJobExecutor()
self.executor = executor
if runtime_context is None:
self.runtime_context = RuntimeContext()
else:
self.runtime_context = runtime_context
if loading_context is None:
self.loading_context = LoadingContext()
self.loading_context.singularity = self.runtime_context.singularity
self.loading_context.podman = self.runtime_context.podman
else:
self.loading_context = loading_context
def make(self, cwl: Union[str, Dict[str, Any]]) -> Callable:
"""Instantiate a CWL object from a CWl document."""
load = load_tool.load_tool(cwl, self.loading_context)
if isinstance(load, int):
raise WorkflowException("Error loading tool")
return Callable(load, self)
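# Illustrative usage sketch (not in the original source), assuming a
# hypothetical echo.cwl CommandLineTool with an "inp" input parameter:
#
#   fac = Factory()
#   echo = fac.make("echo.cwl")
#   result = echo(inp="foo")  # returns the output object on success,
#                             # raises WorkflowStatus otherwise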
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/flatten.py 0000644 0001750 0001750 00000001300 00000000000 017140 0 ustar 00peter peter from typing import Any, Callable, List, cast
# http://rightfootin.blogspot.com/2006/09/more-on-python-flatten.html
def flatten(thing, ltypes=(list, tuple)):
# type: (Any, Any) -> List[Any]
if thing is None:
return []
if not isinstance(thing, ltypes):
return [thing]
ltype = type(thing)
thing_list = list(thing)
i = 0
while i < len(thing_list):
while isinstance(thing_list[i], ltypes):
if not thing_list[i]:
thing_list.pop(i)
i -= 1
break
else:
thing_list[i : i + 1] = thing_list[i]
i += 1
return cast(Callable[[Any], List[Any]], ltype)(thing_list)
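# Illustrative example (not in the original source):
# flatten([1, [2, [3, 4]], (), 5]) returns [1, 2, 3, 4, 5]; empty nested
# lists/tuples are dropped and element order is preserved.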
././@PaxHeader 0000000 0000000 0000000 00000000026 00000000000 010213 x ustar 00 22 mtime=1645718428.0
cwltool-3.1.20220224085855/cwltool/hello.simg 0000755 0001750 0001750 00000010037 00000000000 017127 0 ustar 00peter peter #!/usr/bin/env run-singularity