Workflow Definition Language (WDL) and Common Workflow Language (CWL) are high-level languages for describing how to run a sequence of programs to perform a data analysis task. A workflow consists of a series of steps that are connected by input/output dependencies.
CWL is the product of a community-based open source standards process, and workflows written in CWL are portable across a number of different software platforms (e.g. Arvados, Toil, CWL-Airflow, Seven Bridges). WDL is also open source, but it is based largely around a single implementation (Cromwell). However, some workflows that are important to the bioinformatics community are only maintained in WDL.
The goal of this project is to develop a translator that takes a WDL workflow and produces an equivalent workflow in CWL. When executed with the same input, the translated workflow should produce equivalent results to the original workflow. An ideal demonstration of capability would be to translate the Broad Institute WDL Analysis Research Pipelines (WARP) Whole Genome Germline Single Sample workflow, run it on a scale-out, production CWL runner (such as Arvados or Toil), and show that the results are equivalent.
More background reading on CWL:
- A recent paper: https://arxiv.org/abs/2105.07028 (Full PDF)
- https://www.commonwl.org/user_guide/
This project uses the CWL parser and objects from cwl_utils.parser.cwl_v1_2. miniwdl is used for WDL parsing, and while we target OpenWDL 1.1, earlier versions of (Open)WDL seem to work thanks to the flexibility of the miniwdl parser.
For some discussion comparing the two languages (mainly from the perspective of translating in the other direction, CWL to WDL), see this document:
https://github.com/dnanexus/dxCompiler/blob/main/doc/CWL_v1.2.0_to_WDL_v1.md
Python 3.7+
These instructions assume a Linux / macOS operating system.
git clone https://github.com/common-workflow-lab/wdl-cwl-translator/
cd wdl-cwl-translator
python3 -m venv env
source env/bin/activate
pip install -U pip setuptools wheel
pip install -e .
To print the CWL version to your terminal (stdout):
wdl2cwl path_to_wdl_file
To write the CWL version to a file instead:
wdl2cwl path_to_workflow.wdl --output path_to_new_workflow.cwl
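When several WDL files need translating, the same command line can be assembled programmatically. The following is a minimal sketch, not part of the project: the helper name build_wdl2cwl_command and the directory names are hypothetical, and only the wdl2cwl invocation with --output shown above is assumed.

```python
import shlex
from pathlib import Path
from typing import List


def build_wdl2cwl_command(wdl_path: Path, output_dir: Path) -> List[str]:
    """Build a wdl2cwl command that writes <name>.cwl into output_dir.

    Hypothetical helper: it only assembles the argument list and does not
    run the translator itself.
    """
    cwl_path = output_dir / wdl_path.with_suffix(".cwl").name
    return ["wdl2cwl", str(wdl_path), "--output", str(cwl_path)]


# "workflows/align.wdl" and "translated" are placeholder paths.
command = build_wdl2cwl_command(Path("workflows/align.wdl"), Path("translated"))

# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

To actually run the translation, the list can be passed to subprocess.run(command, check=True) inside the virtual environment where wdl2cwl is installed.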
WDL features not yet supported
- Advanced Scatter
WDL types not yet supported
- Non-static Map types
- Nested structs
- Pair
- Object
OpenWDL 1.1 standard library functions to be implemented
- floor
- min
- max
- stderr
- read_map
- read_object
- read_objects
- read_json
- write_lines
- write_tsv
- write_map
- write_object
- write_objects
- write_json
- range
- transpose
- zip
- unzip
- cross
- prefix
- suffix
- squote
- as_pairs
- as_map
- keys
- collect_by_keys
Many of the above are straightforward to implement, but we haven't needed them yet. If you are unable to translate a particular WDL document due to the lack of a standard library function, please open an issue and share your example!
- Dynamic specification of Docker containers.
As of CWL v1.2, CWL's DockerRequirement has no support for dynamic specifications, only fixed values. If a WDL task has a runtime.docker that references an input with a default value, then wdl2cwl does try to copy that default value to the CWL DockerRequirement.dockerPull.
If changing the software container is needed, there are several workarounds:
- Use workflow runner/engine provided overrides: many CWL runners (including those based upon the CWL reference runner, cwltool) support overriding requirements at any level at run time. See https://github.com/common-workflow-language/cwltool#overriding-workflow-requirements-at-load-time
- Manually override the DockerRequirement in hints by specifying your own container at the CWL workflow step level under requirements.
- Manually edit the CommandLineTool definition yourself.
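As an illustration of the first workaround, the sketch below builds an overrides document for cwltool. It assumes the cwltool:overrides layout described at the link above (a mapping from tool identifier to a requirements mapping); the function name, the tool file name, and the container image are hypothetical, so check the cwltool documentation before relying on it.

```python
import json
from typing import Dict


def docker_override(tool_id: str, image: str) -> Dict[str, dict]:
    """Return an overrides document that swaps the container for one tool.

    Assumption: the "cwltool:overrides" layout from the cwltool README,
    keyed by the tool file (or workflow step) being overridden.
    """
    return {
        "cwltool:overrides": {
            tool_id: {
                "requirements": {
                    "DockerRequirement": {"dockerPull": image},
                }
            }
        }
    }


# "my_tool.cwl" and the image name are placeholders.
overrides = docker_override("my_tool.cwl", "example.org/my_image:1.0")

# JSON is also valid YAML, so this text can be saved as an overrides file.
print(json.dumps(overrides, indent=2))
```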
(Open)WDL assumes that users will configure localization by placing input files in the same directory. Descriptions that require this will need modification before conversion to CWL, as CWL has explicit constructs for achieving localization (secondaryFiles, InitialWorkDirRequirement, and/or explicit staging).
See this example for one method using explicit staging of input files in the command block to achieve the localization required by the tool(s) being called.
If you are converting a WDL workflow to the CWL format and the original WDL document is the "source of truth", then avoid making manual changes to the CWL, as you will need to maintain those changes whenever the source WDL document(s) change.
If instead you want to convert from WDL to CWL and then continue to modify the CWL directly, we have the following advice:
Consider swapping the wdl2cwl translation of the WDL tasks for community-maintained CWL descriptions for popular tools when possible. Follow the instructions on usage and update the run line to refer to a local path or a "raw" GitHub URL of the community-maintained tool description. You may need to adjust a few input names to match. Of course, we are happy to receive your enhancements and additional CWL bio* tool descriptions!
For the resulting CWL Workflow and any CWL CommandLineTools not swapped for idiomatic CWL descriptions, consider using the following CWL features absent in WDL:
- Consider collapsing nested WDL scatters into a single multi-dimensional CWL scatter.
- Use secondaryFiles instead of implicit file co-localization when you have a file and its index(es).
- Add format specifiers to input and output Files and arrays of Files, both at the Workflow and CommandLineTool levels. This improves the type checking of the workflow and helps anyone wanting to re-use or adapt the individual CommandLineTools.
- In addition to the coresMin in ResourceRequirement, consider setting the coresMax if the tool is known not to benefit from additional cores beyond a certain number.
- Retrieve the actual number of cores allocated via $(runtime.cores) to pass to your tools.
- Need the absolute path of the working directory? You can use $(runtime.outdir).
- Only running a single command and redirecting a file into it (my_tool < input_file)? You can change the input to be type: stdin instead of type: File and drop the < input_file as a shortcut.
- Specify the underlying tool(s) required beyond a DockerRequirement via SoftwareRequirement. This makes for good documentation, helps give credit to the authors of the tool(s), and makes it easier for those who want to run with local software, conda packages, and other non-containerized environments.
- Move any environment variable settings (export FOO=bar) present in the script.bash to an EnvVarRequirement. Be careful: if the script.bash runs many commands and the environment variables are not set at the beginning, that may be because they are not appropriate for all the commands. Test to confirm that they are safe to move to an EnvVarRequirement, and if you aren't sure, leave them there. Per-tool invocations with environment variables like FOO=bar name_of_tool.pl --option are also candidates if (1) there are no other tools invoked, (2) they all have the same environment variables set, or (3) the other tools ignore the environment variables.
- Many WDL command sections create output directories and perform other "housekeeping" that is not necessary in CWL, like symlinking files to change names or otherwise arrange the input files. Output directories that don't themselves become a Directory type are likely removable. If a specific arrangement of input files is needed, or additional files need to be created dynamically, then consider using InitialWorkDirRequirement.
- Some WDL command sections include copying input files to obtain writable versions. This can be quite slow on many systems, and from a CWL perspective it is better to use InitialWorkDirRequirement to achieve the same results by marking those inputs as being writable: true.
- Most CWL runners provide methods to monitor task execution in real time, so monitoring scripts and other similar techniques can be removed.
- If the script.bash (which comes from the WDL command section) meets the following criteria, then consider removing it (and the InitialWorkDirRequirement if otherwise unused) in favor of directly calling your tool using baseCommand with the name of the executable and any static command line arguments, and arguments with the remaining mix of dynamic and static command line arguments.
  - No bash features like for loops and if statements.
  - A single tool is invoked just once; check for un-escaped semicolons (;), which mean there are multiple commands on a single line.
  - No input/output redirection; use stdin, stdout, and stderr as need be.
  - No pipelining (|).
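The criteria in the last item can be approximated with a quick text scan before you inspect a script by hand. This is a rough heuristic sketch, not part of wdl2cwl: the function name is hypothetical, quoting is handled only approximately, and CWL expressions embedded in the script may trigger false positives, so treat its result as a hint.

```python
import re
from typing import List

# Escaped characters, single-quoted strings, and double-quoted strings.
_QUOTED = re.compile(r"\\.|'[^']*'|\"(?:\\.|[^\"\\])*\"")
# Lines that start with a bash control-flow keyword.
_CONTROL_FLOW = re.compile(r"^\s*(for|while|until|if|case|select|function)\b")


def reasons_to_keep_script(script: str) -> List[str]:
    """Return the reasons a script.bash is NOT a simple single-tool call.

    An empty list suggests the script may be replaceable by baseCommand
    plus arguments. Hypothetical helper; heuristic only.
    """
    reasons = []
    # Join backslash-continued lines so each command is one logical line.
    joined = script.replace("\\\n", " ")
    commands = []
    for raw in joined.splitlines():
        # Drop quoted text so characters inside quotes are not miscounted.
        line = _QUOTED.sub("", raw).strip()
        if line and not line.startswith("#"):
            commands.append(line)
    if len(commands) != 1:
        reasons.append("expected exactly one command, found %d" % len(commands))
    for line in commands:
        if _CONTROL_FLOW.match(line):
            reasons.append("bash control flow")
        if ";" in line or "&&" in line:
            reasons.append("multiple commands on one line")
        if "|" in line:
            reasons.append("pipeline")
        if "<" in line or ">" in line:
            reasons.append("input/output redirection")
    return reasons


print(reasons_to_keep_script("my_tool --threads 4 input.bam\n"))
print(reasons_to_keep_script("sort input.txt | uniq\n"))
```

The first call reports no reasons, while the second reports the pipeline, which is the kind of script that should stay as it is.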
make install-dep
make test # just the unit tests
make help # to list major makefile targets
make diff_pydocstyle_report # run a diff to show how many changes were made in the docstyle
tox # all the code checks
tox -l # list of all configured tox environments
tox -e py39-pydocstyle # perform only pydocstyle tests (py39 is the version of the python interpreter you have installed)
- Find a WDL workflow. Given below are some links that can be used to find these workflows:
- Use the translator to convert the WDL file and save the result to a .cwl file at a specified location: python wdl2cwl/main.py path_to_wdl_file -o specified_location
- If a problem is encountered during the translation or if the WDL workflow has a feature that has not been implemented yet, submit a new issue with a description of the issue at https://github.com/common-workflow-lab/wdl-cwl-translator/issues
- Check whether the outputs of the WDL workflow and the resultant CWL file are equivalent when given the required inputs. The required inputs can usually be found in the repository of the workflow itself through a keyword search (e.g. the name of the workflow, or the name of the tool used in the command section). The WDL workflow can be run using a workflow runner like miniwdl (see the documentation at https://github.com/chanzuckerberg/miniwdl). The CWL file can be run using cwltool (see the documentation at https://github.com/common-workflow-language/cwltool).
- Add the WDL workflow to wdl2cwl/tests/wdl_files and the resultant CWL file to wdl2cwl/tests/cwl_files. Include the licence and the original location of the WDL file as a comment at the beginning of the document.
- Add the name of the added WDL file to wdl2cwl/tests/test_cwl.py as an argument under the @pytest.mark.parametrize() function.
- Please run the code checks via tox, and fix as many issues as you can on your own. make format will fix many things for you!
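For the equivalence check in the steps above, one simple approach is to compare the output files of the two runs by checksum. The sketch below is an assumption-laden illustration rather than project tooling: the function names are hypothetical, you are expected to collect the output paths from each runner yourself, and byte-for-byte identity is stricter than equivalence (outputs that embed timestamps, for example, will differ).

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_outputs(
    wdl_files: Dict[str, Path], cwl_files: Dict[str, Path]
) -> Dict[str, bool]:
    """Map each output name to True when both runs produced identical bytes.

    Hypothetical helper: the keys are output names chosen by the caller and
    the values are paths to the files produced by the WDL and CWL runs.
    An output present in only one run is reported as False.
    """
    report = {}
    for name in sorted(set(wdl_files) | set(cwl_files)):
        if name in wdl_files and name in cwl_files:
            report[name] = file_digest(wdl_files[name]) == file_digest(cwl_files[name])
        else:
            report[name] = False
    return report
```

Any output reported as False deserves a closer look with a format-aware comparison before concluding that the translation is wrong.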