At LUMC we use pytest-workflow to test our workflows. It copies all files to a temporary directory, runs the workflow there, and reports the results back, so the source directory with your workflow and test files is not affected. No cleanup is needed: pytest-workflow removes all files in the temporary directory by default. It works with miniwdl and Cromwell, and if you have some other workflow engine it will work with that too. Configuration is very easy: just define the command that runs the workflow in a YAML file, as well as a list of tests that you want to run on the results, as in the sketch below. Let us know what you think. Issues on the pytest-workflow repository are welcome.
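For illustration, a minimal pytest-workflow test file might look like the following sketch; the workflow file, inputs, and checked paths are all hypothetical:

```yaml
# tests/test_my_workflow.yml -- pytest-workflow collects YAML files whose
# names start with "test". Workflow name, inputs, and paths are hypothetical.
- name: my-workflow-produces-expected-output
  command: miniwdl run my_workflow.wdl input_file=tests/data/sample.txt
  exit_code: 0
  files:
    - path: "_LAST/out/result/output.txt"
      contains:
        - "expected line"
```

Running `pytest` in the repository then copies the files to a temporary directory, runs the command there, and checks the listed assertions, as described above.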
---
I often find myself testing and running workflows, then needing to take some action to reset state if the workflow fails (e.g. if I update an SQL table, or copy files to a certain location, I may want to remove the rows I inserted and the files I added if the full workflow fails to complete).
Currently, when a workflow fails, I manually run a series of commands to reset the state before retrying the workflow. It would be helpful if I could define a task that runs only if one of the upstream tasks fails, so that I can automatically "reset" if something goes wrong. Alternatively, a more general task that runs after all other workflow tasks, whether or not any of them fail, may also be helpful in other contexts (e.g. `finally call cleanupTask {}`, where the task has info on whether it was called due to success of the rest of the workflow or failure of a certain task).

I've considered adding traps in the command block of each WDL task, but if tasks run in parallel this can get complex (e.g. maybe I want to trap an error by removing the contents of some bucket location while, at the same time, that bucket is being written to by a parallel task). In reality, the cleanup task should run after all other tasks have completed, if any task has failed.
Currently I've implemented this by setting the `continueOnReturnCode` runtime attribute to `true` on all upstream tasks, collecting the return code from each task, and feeding them all to my final cleanup task; if any `rc` is nonzero, I perform my cleanup operations (see the sketch below). Unfortunately, this means the full workflow has to run (including tasks that may not make sense if an upstream task fails) before my cleanup task starts, rather than entering that task directly after any upstream failure. I've also considered having a cleanup task after each major pipeline step, but more or less doubling the number of tasks does not seem like a very elegant solution.
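For concreteness, here is a minimal WDL 1.0 sketch of that workaround, assuming Cromwell's `continueOnReturnCode` runtime attribute; the payload and cleanup scripts are hypothetical:

```wdl
version 1.0

task step_a {
  command <<<
    run_step_a.sh        # hypothetical payload; may fail
    rc=$?
    echo "$rc" > rc.txt  # record the payload's exit status for downstream tasks
    exit "$rc"
  >>>
  runtime {
    # Cromwell treats any return code as success, so the workflow continues
    continueOnReturnCode: true
  }
  output {
    Int rc = read_int("rc.txt")
  }
}

task cleanup {
  input {
    Array[Int] rcs
  }
  command <<<
    # If any upstream return code is nonzero, undo side effects
    for rc in ~{sep=" " rcs}; do
      if [ "$rc" -ne 0 ]; then
        undo_side_effects.sh  # hypothetical: drop inserted rows, delete copied files
        break
      fi
    done
  >>>
}

workflow pipeline {
  call step_a
  # ... further steps, each recording its rc the same way ...
  call cleanup { input: rcs = [step_a.rc] }
}
```

Because `cleanup` takes the `rc` outputs as inputs, the engine schedules it after all the collected steps, which reproduces the "runs last, whether or not anything failed" behaviour, at the cost described above.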