Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure Batch configurable startTask #4841

Closed
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
da3a436
Fix Azure Batch containerConfiguration which needs Docker type set pe…
adamrtalbot Mar 22, 2024
413ea8d
Azure Batch: Configurable startTask per pool
adamrtalbot Mar 22, 2024
d714263
Bump azure plugin version in docs
adamrtalbot Mar 22, 2024
59d0b31
comma
adamrtalbot Mar 22, 2024
a99e616
Move startTasks docs below registry and vnet because it seems less us…
adamrtalbot Mar 22, 2024
baaac49
Merge branch 'master' into azure_batch_configurable_start_task
adamrtalbot Mar 22, 2024
e0bf9f7
Merge branch 'master' into azure_batch_configurable_start_task
adamrtalbot Mar 25, 2024
16476f8
Merge branch 'master' into azure_batch_configurable_start_task
adamrtalbot Mar 27, 2024
ad4f2a1
Merge branch 'master' into azure_batch_configurable_start_task
bentsherman Mar 28, 2024
be15693
Fix test when missing google secret
pditommaso Apr 3, 2024
8ae8eae
Merge branch 'master' into azure_batch_configurable_start_task
bentsherman Apr 3, 2024
119c3ca
Merge branch 'master' into azure_batch_configurable_start_task
pditommaso Apr 3, 2024
b42d983
Merge branch 'master' into azure_batch_configurable_start_task
adamrtalbot Apr 5, 2024
33969d3
Combine azcopy and custom start task based on options
adamrtalbot Apr 5, 2024
0abaf47
Remove old code.
adamrtalbot Apr 5, 2024
7c2af6a
Prevent startTask being added to Azure Batch pool if not configured
adamrtalbot Apr 5, 2024
cd416e4
Add tests for Azure Batch createStartTask method
adamrtalbot Apr 5, 2024
c145e45
Remove startTaskResourceFile object from test
adamrtalbot Apr 8, 2024
e5c29cf
Merge branch 'master' into azure_batch_configurable_start_task
adamrtalbot Apr 8, 2024
128e612
Added ability to run Azure Batch node startTask as privileged (root) …
adamrtalbot Apr 9, 2024
98ce78d
Escalate startTaskPrivileged to pool admin instead of task admin
adamrtalbot Apr 9, 2024
5497860
Add startTask config item to Azure
adamrtalbot Apr 11, 2024
54d07f4
Some changes but still not working
adamrtalbot Apr 11, 2024
c61e10e
Remove Map()
adamrtalbot Apr 11, 2024
b53386c
Try 'as Map'
adamrtalbot Apr 11, 2024
b5ce455
Merge branch 'master' into azure_batch_configurable_start_task
adamrtalbot Apr 11, 2024
21d780c
tests for new config option
adamrtalbot Apr 11, 2024
cb24ef2
Docs for Azure startTask options
adamrtalbot Apr 11, 2024
5b2d406
Update plugins/nf-azure/src/main/nextflow/cloud/azure/config/AzPoolOp…
adamrtalbot Apr 12, 2024
c8cc1e5
Remove getStartTask method
adamrtalbot Apr 12, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions docs/azure.md
Original file line number Diff line number Diff line change
Expand Up @@ -380,6 +380,19 @@ The value of the setting must be the identifier of a subnet available in the vir
Batch Authentication with Shared Keys does not allow to link external resources (like Virtual Networks) to the pool. Therefore, Active Directory Authentication must be used in conjunction with the `virtualNetwork` setting.
:::

### Start Task

Nextflow uses azcopy to stage files in and out of the worker nodes. To do this, it installs azcopy to a shared directory by running a start task. If you have additional requirements for the worker nodes, you can modify this start task by setting the property `startTask`. This is the default shell script:

```shell
bash -c "chmod +x azcopy && mkdir $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/"
```

:::{warning}
If you modify the `startTask` causing azcopy to be unavailable on the worker machine your tasks be unable to correctly stage files in and out and they may fail.
:::


## Microsoft Entra (formerly Active Directory Authentication)

:::{versionadded} 22.11.0-edge
Expand Down
7 changes: 7 additions & 0 deletions docs/config.md
Original file line number Diff line number Diff line change
Expand Up @@ -414,6 +414,13 @@ The following settings are available:
: *New in `nf-azure` version `0.11.0`*
: Specify the ID of the Compute Node agent SKU which the pool identified with `<name>` supports (default: `batch.node.centos 8`).

`azure.batch.pools.<name>.startTask`
: *New in `nf-azure` version `1.5.2`*
: Specify the startTask that is executed as the node joins the Azure Batch node pool (default: `bash -c "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/"`).

`azure.batch.pools.<name>.startTaskPrivileged`
: Enable the task to run with elevated access. Ignored if `runAs` is set (default: `false`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would make sense to run it always as privileged ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to if you want to add some basic software to the PATH (azcopy), so I think configurable makes sense.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then, I'd ask to have this organised as

azure.batch.pools.<name>.startTask.script
azure.batch.pools.<name>.startTask.privileged

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought you'd say that. I don't know how, give me some time to work it out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's enough to add a small class

class AzStartTask {
  String script
  boolean unprivilged

  AzStartTask(Map opts) { ... todo }
}

and create it using the usual pattern in the config

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be merged to this branch from here, once finished adamrtalbot#1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes in now. startTask is a map:

azure {
    storage {
        accountName = "$AZURE_STORAGE_ACCOUNT_NAME"
        accountKey = "$AZURE_STORAGE_ACCOUNT_KEY"
    }
    batch {
        location = "$AZURE_LOCATION"
        accountName = "$AZURE_BATCH_ACCOUNT_NAME"
        accountKey = "$AZURE_BATCH_ACCOUNT_KEY"
        autoPoolMode = true
        deletePoolsOnCompletion = true
        pools {
            auto {
                startTask {
                    script = "echo 'HELLO WORLD'"
                    privileged = true
                }
            }
        }
    }
}


`azure.batch.pools.<name>.virtualNetwork`
: :::{versionadded} 23.03.0-edge
:::
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -430,7 +430,7 @@ class AzBatchService implements Closeable {

return new TaskAddParameter()
.withId(taskId)
.withUserIdentity(userIdentity(pool.opts.privileged, pool.opts.runAs))
.withUserIdentity(userIdentity(pool.opts.privileged, pool.opts.runAs, AutoUserScope.TASK))
.withContainerSettings(containerOpts)
.withCommandLine(cmd)
.withResourceFiles(resourceFileUrls(task,sas))
Expand Down Expand Up @@ -660,7 +660,7 @@ class AzBatchService implements Closeable {
*
* https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/batch/batch-docker-container-workloads.md#:~:text=Run%20container%20applications%20on%20Azure,compatible%20containers%20on%20the%20nodes.
*/
final containerConfig = new ContainerConfiguration().withType(DOCKER_COMPATIBLE);
final containerConfig = new ContainerConfiguration().withType(DOCKER_COMPATIBLE)
final registryOpts = config.registry()

if( registryOpts && registryOpts.isConfigured() ) {
Expand All @@ -681,18 +681,40 @@ class AzBatchService implements Closeable {
.withContainerConfiguration(containerConfig)
}

protected StartTask createStartTask() {
if( config.batch().getCopyToolInstallMode() != CopyToolInstallMode.node )
return null

final resourceFiles = new ArrayList(10)
resourceFiles << new ResourceFile()
.withHttpUrl(AZCOPY_URL)
.withFilePath('azcopy')
protected StartTask createStartTask(AzPoolOpts opts) {

def startCmd = [] as ArrayList
final resourceFiles = [] as ArrayList

return new StartTask()
.withCommandLine('bash -c "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/" ')
.withResourceFiles(resourceFiles)
// Get any custom start task command
if ( opts.startTask ) {
startCmd << opts.startTask
log.debug "Adding custom start task to command: ${opts.startTask}"
}

// If enabled, append azcopy installer to start task command
if( config.batch().getCopyToolInstallMode() == CopyToolInstallMode.node ) {
startCmd << 'chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/'
resourceFiles << new ResourceFile()
.withHttpUrl(AZCOPY_URL)
.withFilePath('azcopy')
log.debug "Adding azcopy installer to start task"
}

final startTaskCmd = "bash -c \"${startCmd.join('; ')}\""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if thre's no startTask script and copyToolInstallMode != node? this is going to be empty.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah correct. It should return null instead of an empty task (though I guess an empty task will work first time).

Copy link
Member

@pditommaso pditommaso Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add a unit test for createStartTask to check different conditions.

Copy link
Collaborator Author

@adamrtalbot adamrtalbot Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now returns null if startTask empty which prevents start task being added:

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests added: f9708b0

log.debug "Start task final command: $startTaskCmd"

// If there is no start task contents we return a null to indicate no start task
if ( startCmd.isEmpty() && resourceFiles.isEmpty() ) {
return null
} else {
// else return a StartTask object with the start task command and resource files
return new StartTask()
.withCommandLine(startTaskCmd)
.withResourceFiles(resourceFiles)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if the startTask should be added to the default one instead of overriding it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of my use cases was to remove the azcopy install. Adding the installer feels unexpected, leaving it out is simple so problem solving should be easier.

And with Fusion you don't need azcopy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to allow the customisation of the startTask script, however the control over the azcopy should be left to nextflow.

I've added an option to disable explicitly or implicitly when fusion is enabled. See here

#4883

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems sensible!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% clear how the two will interact though. We should probably do something like this:

install_azcopy_cmd = "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/"
custom_start_cmd.  = config.startTask ?: ""

final_start_task   = 'bash -c "$custom_start_cmd; $install_azcopy_cmd"'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of, but it should be taken in consideration that that now the azcopy command can be absent. See here

if( config.batch().getCopyToolInstallMode() != CopyToolInstallMode.node )
return null
final resourceFiles = new ArrayList(10)
resourceFiles << new ResourceFile()
.withHttpUrl(AZCOPY_URL)
.withFilePath('azcopy')
return new StartTask()
.withCommandLine('bash -c "chmod +x azcopy && mkdir \$AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy \$AZ_BATCH_NODE_SHARED_DIR/bin/" ')
.withResourceFiles(resourceFiles)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK implemented now: 33969d3

.withUserIdentity(userIdentity(opts.startTaskPrivileged, null, AutoUserScope.POOL))
}
}

protected void createPool(AzVmPoolSpec spec) {
Expand All @@ -706,7 +728,7 @@ class AzBatchService implements Closeable {
// https://docs.microsoft.com/en-us/azure/batch/batch-parallel-node-tasks
.withTaskSlotsPerNode(spec.vmType.numberOfCores)

final startTask = createStartTask()
final startTask = createStartTask(spec.opts)
if( startTask ) {
poolParams .withStartTask(startTask)
}
Expand Down Expand Up @@ -860,14 +882,14 @@ class AzBatchService implements Closeable {
}
}

protected UserIdentity userIdentity(boolean privileged, String runAs) {
protected UserIdentity userIdentity(boolean privileged, String runAs, AutoUserScope scope) {
UserIdentity identity = new UserIdentity()
if (runAs) {
identity.withUserName(runAs)
} else {
identity.withAutoUser(new AutoUserSpecification()
.withElevationLevel(privileged ? ElevationLevel.ADMIN : ElevationLevel.NON_ADMIN)
.withScope(AutoUserScope.TASK))
.withScope(scope))
}
return identity
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,9 @@ class AzPoolOpts implements CacheFunnel {
Duration scaleInterval
Integer maxVmCount

String startTask
boolean startTaskPrivileged

String schedulePolicy // spread | pack
String registry
String userName
Expand All @@ -86,6 +89,8 @@ class AzPoolOpts implements CacheFunnel {
this.schedulePolicy = opts.schedulePolicy
this.scaleInterval = opts.scaleInterval as Duration ?: DEFAULT_SCALE_INTERVAL
this.maxVmCount = opts.maxVmCount as Integer ?: vmCount *3
this.startTask = opts.startTask
this.startTaskPrivileged = opts.startTaskPrivileged ?: false
this.registry = opts.registry
this.userName = opts.userName
this.password = opts.password
Expand All @@ -111,6 +116,8 @@ class AzPoolOpts implements CacheFunnel {
hasher.putUnencodedChars(schedulePolicy ?: '')
hasher.putUnencodedChars(virtualNetwork ?: '')
hasher.putBoolean(lowPriority)
hasher.putUnencodedChars(startTask ?: '')
hasher.putBoolean(startTaskPrivileged)
return hasher
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -225,6 +225,81 @@ class AzBatchServiceTest extends Specification {
}


def 'should configure default startTask' () {
given:
def CONFIG = [batch:[copyToolInstallMode: 'node']]
def exec = Mock(AzBatchExecutor) {getConfig() >> new AzConfig(CONFIG) }
def svc = new AzBatchService(exec)

when:
def configuredStartTask = svc.createStartTask( new AzPoolOpts() )
then:
configuredStartTask.commandLine == 'bash -c "chmod +x azcopy && mkdir $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/"'
}

def 'should configure custom startTask' () {
given:
def CONFIG = [batch:[copyToolInstallMode: 'node']]
def exec = Mock(AzBatchExecutor) {getConfig() >> new AzConfig(CONFIG) }
def svc = new AzBatchService(exec)

when:
def configuredStartTask = svc.createStartTask( new AzPoolOpts(startTask: 'echo hello-world') )
then:
configuredStartTask.commandLine == 'bash -c "echo hello-world; chmod +x azcopy && mkdir $AZ_BATCH_NODE_SHARED_DIR/bin/ && cp azcopy $AZ_BATCH_NODE_SHARED_DIR/bin/"'
}

def 'should configure not install AzCopy because copyToolInstallMode is off' () {
given:
def CONFIG = [batch:[copyToolInstallMode: 'off']]
def exec = Mock(AzBatchExecutor) {getConfig() >> new AzConfig(CONFIG) }
def svc = new AzBatchService(exec)

when:
def configuredStartTask = svc.createStartTask( new AzPoolOpts(startTask: 'echo hello-world') )
then:
configuredStartTask.commandLine == 'bash -c "echo hello-world"'
configuredStartTask.resourceFiles == []
}

def 'should configure not install AzCopy because copyToolInstallMode is task' () {
given:
def CONFIG = [batch:[copyToolInstallMode: 'task']]
def exec = Mock(AzBatchExecutor) {getConfig() >> new AzConfig(CONFIG) }
def svc = new AzBatchService(exec)

when:
def configuredStartTask = svc.createStartTask( new AzPoolOpts(startTask: 'echo hello-world') )
then:
configuredStartTask.commandLine == 'bash -c "echo hello-world"'
configuredStartTask.resourceFiles == []
}

def 'should create null startTask because no options are enabled' () {
given:
def CONFIG = [batch:[copyToolInstallMode: 'off']]
def exec = Mock(AzBatchExecutor) {getConfig() >> new AzConfig(CONFIG) }
def svc = new AzBatchService(exec)

when:
def configuredStartTask = svc.createStartTask( new AzPoolOpts() )
then:
configuredStartTask == null
}

def 'should configure privileged startTask' () {
given:
def CONFIG = [batch:[copyToolInstallMode: 'node']]
def exec = Mock(AzBatchExecutor) {getConfig() >> new AzConfig(CONFIG) }
def svc = new AzBatchService(exec)
and:

when:
def configuredStartTask = svc.createStartTask( new AzPoolOpts(startTaskPrivileged: true) )
then:
configuredStartTask.userIdentity().autoUser().elevationLevel().value == 'admin'
}

def 'should check scaling formula' () {
given:
def exec = Mock(AzBatchExecutor) { getConfig() >> new AzConfig([:]) }
Expand Down Expand Up @@ -359,7 +434,7 @@ class AzBatchServiceTest extends Specification {
then:
1 * svc.guessBestVm(LOC, CPUS, MEM, TYPE) >> VM
and:
spec.poolId == 'nf-pool-6e9cf97d3d846621464131d3842265ce-Standard_X1'
spec.poolId == 'nf-pool-289d374ac1622e709cf863bce2570cab-Standard_X1'
spec.metadata == [foo: 'bar']

}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,8 @@ class AzPoolOptsTest extends Specification {
!opts.password
!opts.virtualNetwork
!opts.lowPriority
!opts.startTask
!opts.startTaskPrivileged
}

def 'should create pool with custom options' () {
Expand All @@ -68,7 +70,9 @@ class AzPoolOptsTest extends Specification {
userName: 'some-user',
password: 'some-pwd',
virtualNetwork: 'some-vnet',
lowPriority: true
lowPriority: true,
startTask: 'echo hello-world',
startTaskPrivileged: true
])
then:
opts.runAs == 'foo'
Expand All @@ -89,6 +93,8 @@ class AzPoolOptsTest extends Specification {
opts.password == 'some-pwd'
opts.virtualNetwork == 'some-vnet'
opts.lowPriority
opts.startTask == 'echo hello-world'
opts.startTaskPrivileged
}

}
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,8 @@ class AzureConfigTest extends Specification {
runAs: 'root',
scaleFormula: 'x + y + z',
scaleInterval: '15 min',
schedulePolicy: 'pack' ]]
schedulePolicy: 'pack',
startTask: 'echo hello-world']]
]] ]
}

Expand Down Expand Up @@ -145,6 +146,7 @@ class AzureConfigTest extends Specification {
cfg.batch().pool('myPool').scaleInterval == Duration.of('15 min')
cfg.batch().pool('myPool').privileged == true
cfg.batch().pool('myPool').runAs == 'root'
cfg.batch().pool('myPool').startTask == 'echo hello-world'
}

def 'should get azure batch endpoint from account and location' () {
Expand All @@ -171,7 +173,7 @@ class AzureConfigTest extends Specification {
cfg.batch().location == LOCATION
}

def 'should get azure batch file share root point' () {
def 'should get azure batch file share root point' () {

given:
def KEY = 'xyz1343'
Expand All @@ -188,12 +190,12 @@ class AzureConfigTest extends Specification {
endpoint: ENDPOINT,
location: LOCATION,
pools: [
myPool1: [ fileShareRootPath: '/somewhere/over/the/rainbow' ],
myPool2: [ sku:'batch.node.centos 8' ],
myPool3: [ sku:'batch.node.ubuntu 20.04' ],
myPool4: [ sku:'batch.node.debian 10', fileShareRootPath: '/mounting/here' ],
myPool5: [ : ]]
]] ]
myPool1: [ fileShareRootPath: '/somewhere/over/the/rainbow' ],
myPool2: [ sku:'batch.node.centos 8' ],
myPool3: [ sku:'batch.node.ubuntu 20.04' ],
myPool4: [ sku:'batch.node.debian 10', fileShareRootPath: '/mounting/here' ],
myPool5: [ : ]]
]] ]
}
def cfg = AzConfig.getConfig(session)
then:
Expand Down
Loading