Apply Azure Batch setting to all pools via config #4236

Open
adamrtalbot opened this issue Aug 29, 2023 · 2 comments
Comments

@adamrtalbot
Collaborator

New feature

Azure Batch has a number of settings that apply to a pool, such as autoScale, vmType, sku, etc. However, each pool must currently be configured explicitly, so shared settings have to be repeated for every pool. For example, to make all pools use low-priority VMs (scale formula taken from the Microsoft blog):

// Scale formula to use low-priority nodes only.
lowPriorityScaleFormula = '''
    lifespan = time() - time("{{poolCreationTime}}");
    interval = TimeInterval_Minute * {{scaleInterval}};
    $samples = $PendingTasks.GetSamplePercent(interval);
    $tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
    $targetVMs = $tasks > 0 ? $tasks : max(0, $TargetLowPriorityNodes/2);
    targetPoolSize = max(0, min($targetVMs, {{maxVmCount}}));
    $TargetLowPriorityNodes = lifespan < interval ? {{vmCount}} : targetPoolSize;
    $TargetDedicatedNodes = 0;
    $NodeDeallocationOption = taskcompletion;
'''

azure {
    batch {
        pools {
            Standard_E2d_v4 {
                autoScale = true
                vmType = 'Standard_E2d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E8d_v4 {
                autoScale = true
                vmType = 'Standard_E8d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E16d_v4 {
                autoScale = true
                vmType = 'Standard_E16d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E32d_v4 {
                autoScale = true
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
                scaleFormula = lowPriorityScaleFormula
            }
        }
    }
}

I'd like to simplify this so that shared settings apply to all pools, with the config looking something like this:

azure {
    batch {
        pools {
            // magically apply the following to all pools
            '*' {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E2d_v4 {
                vmType = 'Standard_E2d_v4'
            }
            Standard_E8d_v4 {
                vmType = 'Standard_E8d_v4'
            }
            Standard_E16d_v4 {
                vmType = 'Standard_E16d_v4'
            }
            Standard_E32d_v4 {
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
            }
        }
    }
}

Usage scenario

Being able to apply generic configuration allows users to specify organisation-wide setups or redeployable pipelines. For example, a Tower pipeline could be freely moved between Tower Forge compute environments without having to rewrite the config every time.

Suggest implementation

Two options I can think of:

  • A special default pool, similar to the special auto pool, whose settings apply to every pool unless overridden. In the above example we would use this:
            default {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
  • Use pattern matching similar to withName in a process selector. The above example would be:
            '.*' {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
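
To make the intended override semantics of the first option concrete, here is a minimal sketch in plain Groovy. It is not Nextflow code: `resolvePools` is a hypothetical helper, and the reserved pool name `default` is only the proposal from this issue, not an existing feature. It shows shared settings being merged into each named pool, with per-pool values taking precedence.

```groovy
// Hypothetical helper (not part of Nextflow): merge the settings of a
// reserved 'default' entry into every other pool, letting each pool's
// own settings override the shared ones.
Map resolvePools(Map pools, String defaultKey = 'default') {
    Map defaults = (pools[defaultKey] ?: [:]) as Map
    pools.findAll { name, opts -> name != defaultKey }
         .collectEntries { name, opts -> [(name): defaults + (opts as Map)] }
}

// Illustrative pool settings mirroring the example above
// (scaleFormula omitted for brevity).
def pools = [
    'default'       : [autoScale: true, vmCount: 2, maxVmCount: 20],
    Standard_E2d_v4 : [vmType: 'Standard_E2d_v4'],
    Standard_E32d_v4: [vmType: 'Standard_E32d_v4', maxVmCount: 10],
]

def resolved = resolvePools(pools)

// Standard_E2d_v4 inherits maxVmCount = 20 from the defaults;
// Standard_E32d_v4 keeps its own maxVmCount = 10.
resolved.each { name, opts -> println "${name}: ${opts}" }
```

The right-hand operand of `+` wins on key collisions, so a pool only needs to list the settings that differ from the shared ones.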
@bentsherman
Member

See also #4186 for the publishDir directive.

@adamrtalbot
Collaborator Author

Note that when using the autoPools feature of Nextflow, you should be able to assign these settings to the auto pool to achieve this. If you want to split the work across multiple machine sizes, you may be able to do it as described in this comment:

#4304 (comment)
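
As a rough sketch of the auto pool approach, the config might look like the following. This assumes auto pool mode is switched on with `azure.batch.autoPoolMode` and that `lowPriorityScaleFormula` is defined as in the original post; it has not been verified against a live Batch account.

```groovy
azure {
    batch {
        // Assumption: let Nextflow create pools automatically.
        autoPoolMode = true
        pools {
            // Settings on the special 'auto' pool are used for the
            // automatically created pools.
            auto {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
        }
    }
}
```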
