Job arrays #3892
Conversation
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
If a task fails and is retried with increased resources, it will be batched with other tasks that may still be on their first attempt. In that case, the array job resources will depend on whichever task happens to be first in the batch. One solution is to take the max value of cpus, memory, time, etc. across all tasks in an array job. That would be "safe" but likely much more expensive -- if a single task requests twice the resources, suddenly the entire array job does as well. Another solution is to further separate batches by configuration, to ensure that they are uniform. We could go crazy and separate batches by the full tuple of resource settings.
We could also just provide config options for these things.
The point of these config options is that there is a trade-off between bandwidth and latency when batching tasks like this, so users should ideally have the ability to manage that trade-off in a way that best fits their use case. If someone doesn't use retry with dynamic resources, then they don't need to group by attempt, and vice versa.
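To make the grouping idea concrete, here is a minimal sketch (hypothetical names and key format, not the actual Nextflow implementation) of collecting tasks into uniform batches keyed by their resource requests and attempt number, so that a retried task with increased resources never shares an array with first-attempt tasks:

```java
import java.util.*;

public class ArrayBatcher {
    // Hypothetical task spec: name plus the directives that must be uniform
    record Task(String name, int cpus, String memory, int attempt) {}

    // Group tasks into batches keyed by (cpus, memory, attempt), so every
    // child of an array job has identical resource requests
    public static Map<String, List<String>> batch(List<Task> tasks) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (Task t : tasks) {
            String key = t.cpus() + "|" + t.memory() + "|" + t.attempt();
            batches.computeIfAbsent(key, k -> new ArrayList<>()).add(t.name());
        }
        return batches;
    }

    public static void main(String[] args) {
        var tasks = List.of(
            new Task("foo", 2, "4 GB", 1),
            new Task("bar", 2, "4 GB", 1),
            new Task("baz", 4, "8 GB", 2)); // retried task: separate batch
        System.out.println(batch(tasks));   // two batches: {foo,bar} and {baz}
    }
}
```

Adding more directives to the key trades smaller (higher-latency) batches for stricter uniformity, which is exactly the trade-off the proposed config options would let users tune.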
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
modules/nextflow/src/main/groovy/nextflow/executor/ArrayExecutor.groovy
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchExecutor.groovy
plugins/nf-amazon/src/main/nextflow/cloud/aws/batch/AwsBatchExecutor.groovy
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
plugins/nf-google/src/main/nextflow/cloud/google/batch/GoogleBatchTaskHandler.groovy
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
umm, this still shows
Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Yeah, I didn't think the volatile would help. Really, there is no point in checking if the current thread is interrupted...
That's the recommended pattern from the Java Concurrency in Practice bible
This reverts commit 247b721.
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@bentsherman all solved with Google Batch logs and child ids?
Google Batch logs are working. Why would you check if the thread you are currently in is interrupted? If it's interrupted, then you wouldn't get the chance to check if it's interrupted, you would just be interrupted...
Anyhow, it is worth covering in a separate PR
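For reference, the pattern under discussion can be sketched as follows (a minimal illustration, not Nextflow's actual code): a worker loop polls its own interrupted status, so an interrupt delivered while it is computing (rather than blocked) still stops it promptly, and an interrupt delivered during a blocking call is restored so the same loop check sees it.

```java
public class PollingWorker implements Runnable {
    volatile boolean stopped = false;

    @Override
    public void run() {
        // JCiP-recommended pattern: poll the interrupted flag so the loop
        // exits even when the thread is busy rather than blocked
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Thread.sleep(5); // blocking call: interrupt surfaces here...
            } catch (InterruptedException e) {
                // ...as an exception; restore the flag so the loop condition
                // observes the interrupt and terminates cleanly
                Thread.currentThread().interrupt();
            }
        }
        stopped = true;
    }
}
```

Interrupting the thread from outside (e.g. `t.interrupt()`) makes the loop terminate either via the flag check or via the restored `InterruptedException`, whichever fires first.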
Should be solved with:
think so
Fascinating, the dir content when using a job array (only for nerds :))
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Reverted some renaming of the launch <> submit methods, because there were still some inconsistencies and to prevent breaking the xpack deps
Think we are finally ready to merge this. Great effort 👏 👏
Job array is a capability provided by some batch schedulers that allows spawning multiple copies of the same job in an efficient manner. Nextflow supports this capability via the `array <some value>` process directive, which determines the (max) number of jobs in the array. For example:

```
process foo {
    array 10

    '''
    your_task
    '''
}
```

or in the Nextflow config file:

```
process.array = 10
```

Currently this feature is supported by the following executors:

* Slurm
* SGE
* PBS
* PBS Pro
* LSF
* AWS Batch
* Google Batch

Signed-off-by: Ben Sherman <bentshermann@gmail.com>
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>
Signed-off-by: Herman Singh <herman@massmatrix.bio>
Signed-off-by: Dr Marco Claudio De La Pierre <marco.delapierre@gmail.com>
Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Co-authored-by: Abhinav Sharma <abhi18av@users.noreply.github.com>
Co-authored-by: Mahesh Binzer-Panchal <mahesh.binzer-panchal@nbis.se>
Co-authored-by: Herman Singh <kartstig@gmail.com>
Co-authored-by: Dr Marco Claudio De La Pierre <marco.delapierre@gmail.com>
Closes #1477 (and possibly #1427)
Summary of changes:
* Adds the `array` directive to submit tasks as array jobs of a given size.
* `TaskArrayCollector` collects tasks into arrays and submits each array job to the underlying executor when it is ready. The executor must implement the `TaskArrayAware` interface. Each process has its own array job collector. When all input channels to a process have received the "poison pill", the process is "closed" and the array job collector is notified so that it can submit any remaining tasks. All subsequent tasks (e.g. retries) will be submitted as individual tasks.
* `TaskArray` is a special type of `TaskRun` for an array job that holds the list of child task handlers. For an executor that supports array jobs, the task handler can check whether its task is a `TaskArray` in order to apply job-specific behavior.
* `TaskHandler` has a few more methods, which the array job collector uses to create the array job script. This script simply defines the list of child work directories, selects a work dir based on the index, and launches the child task using an executor-specific launch command.
* `TaskPollingMonitor` has been modified to handle both array jobs and regular tasks. The array job is handled like any other task, but is discarded once it has been submitted. The task stats reported by Nextflow are the same with or without array jobs -- array jobs are not included in the task stats.

Here's the pipeline I'm using as the e2e test:
TODO:
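As an illustration of the array job script logic described in the summary above (a sketch only; the real wrapper is an executor-specific shell script, and the paths and environment variable here are made up): the list of child work directories is fixed at submission time, and the scheduler-provided array index selects which child's `.command.run` to launch.

```java
import java.util.List;

public class ArrayLauncher {
    // Build the launch command for one child task of the array job
    static String launchCommand(List<String> workDirs, int index) {
        return "bash " + workDirs.get(index) + "/.command.run";
    }

    public static void main(String[] args) {
        // Hypothetical child work dirs, baked into the wrapper at submit time
        List<String> dirs = List.of("/work/12/abc", "/work/34/def", "/work/56/ghi");
        // e.g. SLURM_ARRAY_TASK_ID on Slurm; defaults to 0 for this demo
        int idx = Integer.parseInt(
            System.getenv().getOrDefault("SLURM_ARRAY_TASK_ID", "0"));
        System.out.println(launchCommand(dirs, idx));
    }
}
```

Each grid executor substitutes its own index variable and launch command; only the indexing scheme is common across them.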