Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Batch configurable startTask #4841

Conversation

adamrtalbot
Copy link
Collaborator

@adamrtalbot adamrtalbot commented Mar 22, 2024

Azure Batch: Configurable startTask per pool

Additions:

  • Make Azure Batch pool startTask configurable.
  • Can be used to modifiy the existing worker node. Use cases:
    • Install more recent version of AzCopy
    • Install monitoring software like Azure Insights
    • Use NVME SSDs for storage
    • Flash Dragen firmware to FPGA chips

Pool specific so you can differentiate between different hardware requirements.

Closes #4259

Also spotted a bug with the dev version and the container configuration so fixed it (first commit, can reorganise if we want it in a second PR). Closed with #4854

To use it, you can try this config:

process {
    executor = 'azurebatch'
}

azure {
    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
        accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
    }
    batch {
        location = "$AZURE_LOCATION"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
        autoPoolMode = true
        deletePoolsOnCompletion = true
        pools {
            auto {
                startTask = "echo 'HELLO WORLD'"
            }
        }
    }
}

…rmanently

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Additions:
 - Make Azure Batch pool startTask configurable.
 - Can be used to modifiy the existing worker node. Use cases:
    - Install more recent version of AzCopy
    - Install monitoring software like Azure Insights
    - Use NVME SSDs for storage
    - Flash Dragen firmware to FPGA chips

Pool specific so you can differentiate between different hardware requirements.

Closes nextflow-io#4259

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Copy link

netlify bot commented Mar 22, 2024

Deploy Preview for nextflow-docs-staging canceled.

Name Link
🔨 Latest commit c8cc1e5
🔍 Latest deploy log https://app.netlify.com/sites/nextflow-docs-staging/deploys/661928ab1655b80008d24d02

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
…eful than them

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot
Copy link
Collaborator Author

I'm not getting these errors locally and they appear to refer to slurping the Google credentials as JSON.

@bentsherman
Copy link
Member

I can't reproduce the error locally either. Probably just a flaky google test. I think other PRs have been having a similar issue. The Azure tests are passing which is a good sign, but the integration tests haven't run yet

@@ -690,7 +690,7 @@ class AzBatchService implements Closeable {
.withFilePath('azcopy')

def poolStartTask = new StartTask()
.withCommandLine('bash -c "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/" ')
.withCommandLine(startTaskCmd(spec.opts))
.withResourceFiles(resourceFiles)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if the startTask should be added to the default one instead of overriding it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my use cases was to remove the azcopy install. Adding the installer feels unexpected, leaving it out is simple so problem solving should be easier.

And with Fusion you don't need azcopy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to allow the customisation of the startTask script, however the control over the azcopy should be left to nextflow.

I've added an option to disable explicitly or implicitly when fusion is enabled. See here

#4883

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems sensible!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% clear how the two will interact though. We should probably do something like this:

install_azcopy_cmd = "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/"
custom_start_cmd.  = config.startTask ?: ""

final_start_task   = 'bash -c "$custom_start_cmd; $install_azcopy_cmd"'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of, but it should be taken in consideration that that now the azcopy command can be absent. See here

if( config.batch().getCopyToolInstallMode() != CopyToolInstallMode.node )
return null
final resourceFiles = new ArrayList(10)
resourceFiles << new ResourceFile()
.withHttpUrl(AZCOPY_URL)
.withFilePath('azcopy')
return new StartTask()
.withCommandLine('bash -c "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/" ')
.withResourceFiles(resourceFiles)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK implemented now: 33969d3

@adamrtalbot adamrtalbot requested a review from a team as a code owner March 28, 2024 13:26
Signed-off-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
@adamrtalbot
Copy link
Collaborator Author

adamrtalbot commented Apr 5, 2024

Update: Combine azcopy + custom start task:

process {
    executor = 'azurebatch'
}

azure {
    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
        accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
    }
    batch {
        location = "$AZURE_LOCATION"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
        autoPoolMode = true
        deletePoolsOnCompletion = true
        pools {
            auto {
                startTask = "echo 'HELLO WORLD3'"
            }
        }
    }
}

Result:
image

To omit azcopy and use a start task:

process {
    executor = 'azurebatch'
}

azure {
    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
        accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
    }
    batch {
        location = "$AZURE_LOCATION"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
        autoPoolMode = true
        deletePoolsOnCompletion = true
        pools {
            auto {
                startTask = "echo 'HELLO WORLD2'"
            }
        }
        copyToolInstallMode = 'off'
    }
}

Result:
image

Comment on lines 684 to 702
// protected String startTaskCmd(AzPoolOpts opts) {

// def startCmd = []

// // If enabled, append azcopy installer to start task command
// // TODO If statement here to handle if this is disabled.
// if( config.batch().getCopyToolInstallMode() != CopyToolInstallMode.node )
// startCmd << 'chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/'

// // Get any custom start task command
// if ( opts.startTask ) {
// startCmd << opts.startTask
// }

// final startTaskCmd = "bash -c \"${startCmd.join(';')}\""
// log.debug "Start task command:\n$startTaskCmd"

// return startTaskCmd
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the commented code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Done now.

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
.withFilePath('azcopy')
log.debug "Adding azcopy installer to start task"

final startTaskCmd = "bash -c \"${startCmd.join('; ')}\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if thre's no startTask script and copyToolInstallMode != node? this is going to be empty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah correct. It should return null instead of an empty task (though I guess an empty task will work first time).

Copy link
Member

@pditommaso pditommaso Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a unit test for createStartTask to check different conditions.

Copy link
Collaborator Author

@adamrtalbot adamrtalbot Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now returns null if startTask empty which prevents start task being added:

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests added: f9708b0

@adamrtalbot adamrtalbot force-pushed the azure_batch_configurable_start_task branch from f9708b0 to b65c7cc Compare April 5, 2024 21:35
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot adamrtalbot force-pushed the azure_batch_configurable_start_task branch from b65c7cc to cd416e4 Compare April 5, 2024 21:41
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@adamrtalbot adamrtalbot force-pushed the azure_batch_configurable_start_task branch from 95eb558 to c145e45 Compare April 8, 2024 15:27
@adamrtalbot
Copy link
Collaborator Author

Added ability to run the startTask as root user, necessary for modifying the pool hardware.

Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
docs/config.md Outdated
Comment on lines 421 to 422
`azure.batch.pools.<name>.startTaskPrivileged`
: Enable the task to run with elevated access. Ignored if `runAs` is set (default: `false`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would make sense to run it always as privileged ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to if you want to add some basic software to the PATH (azcopy), so I think configurable makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, I'd ask to have this organised as

azure.batch.pools.<name>.startTask.script
azure.batch.pools.<name>.startTask.privileged

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you'd say that. I don't know how, give me some time to work it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's enough to add a small class

class AzStartTask {
  String script
  boolean unprivilged

  AzStartTask(Map opts) { ... todo }
}

and create it using the usual pattern in the config

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be merged to this branch from here, once finished adamrtalbot#1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes in now. startTask is a map:

azure {
    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
        accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
    }
    batch {
        location = "$AZURE_LOCATION"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
        autoPoolMode = true
        deletePoolsOnCompletion = true
        pools {
            auto {
                startTask {
                    script = "echo 'HELLO WORLD'"
                    privileged = true
                }
            }
        }
    }
}

@pditommaso pditommaso marked this pull request as draft April 10, 2024 18:51
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
adamrtalbot and others added 2 commits April 12, 2024 13:26
…ts.groovy

Co-authored-by: Paolo Di Tommaso <paolo.ditommaso@gmail.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
Signed-off-by: Adam Talbot <12817534+adamrtalbot@users.noreply.github.com>
@pditommaso
Copy link
Member

Solved via 27d01e3

@pditommaso pditommaso closed this Apr 14, 2024
@adamrtalbot
Copy link
Collaborator Author

Thanks @pditommaso!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Azure Batch: Configurable startTask
3 participants