nextflow-k8s-operator (NeKO) is a tool that lets you run your Nextflow pipelines natively in a Kubernetes environment, and manage and monitor them with the Kubernetes client (kubectl).
NeKO takes the burden of orchestrating your pipelines off your shoulders. You can focus on doing science.
The Operator is a Kubernetes pattern that uses custom resources to handle applications in a k8s-native way. For example, nextflow-k8s-operator defines a NextflowLaunch resource that encompasses the entire launch of a Nextflow pipeline, including the Nextflow configuration, environment settings, and pipeline parameters. This approach makes runs reproducible and easy to maintain.
Under the hood of an operator is the controller - a program running in the background, taking care of all the tasks that a human operator would typically do: spawning pods, validating the configuration, monitoring runs, etc.
controller - a program that connects to the Kubernetes API, runs in the background, and manages the custom resources defined by the operator; in our case, the controller is a binary called manager that you will find in the bin/ subdirectory after NeKO has been built (see: Installation).
launch - a (reusable) artifact that makes a "recipe" for a single run of a Nextflow pipeline; since they are Kubernetes custom resources, Nextflow launches are defined as yaml files (see: Usage).
driver - the main pod launched by the controller, it contains an instance of Nextflow; a single launch spawns only one driver pod, which creates any number of worker pods, doing the actual work.
worker - a pod that does the "heavy lifting" of the Nextflow pipeline; one or more workers are launched from within the driver.
The controller can run either as a pod on the Kubernetes cluster, or as a standalone program on the local machine. (NOTE: although a connection to the Kubernetes cluster is required, the controller itself can run on any machine, including your personal computer.)
Regardless of the mode of execution, start with installing the custom resources used by NeKO:
make install
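To confirm that the custom resource definition has been registered, you can list the CRDs and look for the API group used in the samples below (this is only a quick sanity check; the exact CRD name may differ):
kubectl get crd | grep batch.mnm.bio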
You will need go installed, e.g.:
sudo apt install golang-go
Build the Docker image:
make docker-build IMG=mycontroller:latest
A Docker image named mycontroller will be created. You can push it to an image registry:
make docker-push IMG=mycontroller:latest
Finally, deploy the controller to the Kubernetes cluster:
make deploy IMG=mycontroller:latest
To uninstall NeKO, run:
make undeploy uninstall
Simple helm installation for nextflow-k8s-operator
To add the nextflow-k8s-operator helm repo, run:
helm repo add nextflow-k8s-operator https://nextflow-k8s-operator-helm.s3.eu-north-1.amazonaws.com
To install a release named nextflow-k8s-operator, run:
helm install nextflow-k8s-operator nextflow-k8s-operator/nextflow-k8s-operator
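You can then check the state of the release with the standard Helm command:
helm status nextflow-k8s-operator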
Build the controller:
make
Following a successful build, run the controller:
bin/manager
The controller's execution log will be visible in the terminal.
If you experience connection problems after a fresh installation of Kubernetes and NeKO, rebooting the machine may help.
When launching pipelines on a Kubernetes cluster, users may stumble upon a permissions problem, usually manifesting itself as 403 errors in the logs, accompanied by the name of the service account used.
This problem can be solved by binding a more powerful role to that account, for example:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: nextflow-role
  labels:
    app: nextflow-role
rules:
  - apiGroups:
      - ""
      - apps
      - autoscaling
      - batch
      - extensions
      - policy
      - rbac.authorization.k8s.io
    resources:
      - pods
      - pods/status
      - pods/log
      - persistentvolumes
      - persistentvolumeclaims
      - configmaps
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: nextflow-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: nextflow-role
subjects:
  - kind: ServiceAccount
    name: default # this is the name of the service account in question
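Assuming the manifest above is saved as, for example, nextflow-role.yaml (the filename is arbitrary), apply it in the namespace where your launches run:
kubectl apply -f nextflow-role.yaml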
To start an already prepared Nextflow launch, run: kubectl apply -f my-launch.yaml, where my-launch.yaml is the name of your definition file.
As an example, see hello.yaml; the file contains the definition of a simple launch (see: The essentials in Configuring your pipelines for explanation) as well as definitions of a PV-PVC pair. The example pipeline is launched with:
kubectl apply -f config/samples/hello.yaml
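You can then watch the launch like any other custom resource; the plural resource name below is an assumption following the usual Kubernetes naming convention for the NextflowLaunch kind:
kubectl get nextflowlaunches
kubectl describe nextflowlaunch hello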
If your pipeline has finished successfully, you can see the results it yielded by viewing the logs of the driver pod (if the name of your launch is hello, the pod will be named hello-xxxxxxxx, where xxxxxxxx is a random hash assigned to the pod), for example: kubectl logs hello-7a69dc11.
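If you don't remember the generated pod name, a plain pod listing will show both the driver and the worker pods:
kubectl get pods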
If you're running the pipeline on a remote cluster, though, it is possible that your job has failed due to the restrictions imposed on the user by the environment. Typical issues include so-called taints which require setting respective tolerations in your launch's definition, and insufficient permissions assigned to the service account running the launch. Both topics are described in detail elsewhere in this document.
Either way, if you don't need the launch on your cluster anymore, remove it with kubectl delete -f hello.yaml. (NOTE: if the pipeline has saved any artifacts to the persistent volume, they will be safe!)
It may happen that your job fails (for example, due to misconfiguration or an I/O problem). Or, in a cloud setting, your driver pod may "disappear" when a node is shut down. In these cases, it is possible to relaunch the computations (in the latter case the relaunch will be done automatically), but if you don't want to lose the work so far, you should make sure that the following settings are in your YAML file (see the next section for the details):
"-resume"
must be innextflow.args
,- Nextflow home (
NXF_HOME
environment variable) must be in a persistent location (e.g., on the PVC).
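A minimal sketch of such a launch, based on the hello example; the paths are placeholders and the exact layout of the nextflow section follows the options described in the next section:
apiVersion: batch.mnm.bio/v1alpha1
kind: NextflowLaunch
metadata:
  name: hello
spec:
  pipeline:
    source: hello
  k8s:
    storageClaimName: hello-pvc
  nextflow:
    args: ["-resume"]
    home: /workspace/nxf-home   # a persistent location, e.g. on the mounted PVC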
As has been mentioned, both the configuration of the computational pipeline and the Nextflow environment are defined in a yaml file, as a Kubernetes custom resource called NextflowLaunch.
To accommodate the complex and demanding runtime environment that Kubernetes is, some k8s-specific configuration options are available in addition to the settings provided by Nextflow.
The simplest example of a valid Nextflow launch definition is shown below:
apiVersion: batch.mnm.bio/v1alpha1
kind: NextflowLaunch
metadata:
  name: hello
spec:
  pipeline:
    source: hello
  k8s:
    storageClaimName: hello-pvc
This launch, when run, will download and execute Nextflow's "hello world" pipeline, available at https://github.com/nextflow-io/hello . This is defined in the pipeline.source section of the yaml file.
In general, the same rules apply as when running a Nextflow pipeline from the command line.
Optionally, the pipeline can be downloaded from a branch other than the main branch, or from a Git tag/revision. This can be achieved by declaring the branch/revision in the pipeline.revision section.
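For instance, to pull a specific branch of the hello pipeline, the declaration could look like this (my-branch is just a placeholder):
spec:
  pipeline:
    source: hello
    revision: my-branch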
The other required setting is k8s.storageClaimName. This is the name of the persistent volume claim that both the driver and the workers will mount and use. It can be mounted at any mount point and freely used by the pipeline and other scripts running in the pod.
Let's move on to the configuration options that NeKO provides.
These sections (defined within spec in the yaml file; see above) are equivalents of the respective Nextflow scopes. A short example is shown below:
spec:
  k8s:
    storageClaimName: my-pvc
    storageMountPath: /my-workspace
  params:
    manifest: my_manifest.json
    outputDir: /my-workspace/output
  env:
    SHELL: zsh
Here, a PVC called my-pvc will be mounted at /my-workspace.
NOTE: unless defined explicitly, the vital directories are set as follows:
storageMountPath: /workspace
launchDir: <storageMountPath>/<launch_name>
workDir: <launchDir>/work
This is similar to Nextflow defaults.
For all available configuration options, see https://www.nextflow.io/docs/edge/config.html#scope-k8s .
In the example, two pipeline parameters are defined: manifest and outputDir.
For reference, see https://www.nextflow.io/docs/edge/config.html#scope-params .
Similarly, it is possible to set environment variables (SHELL in the example) in the worker pods (for setting variables in the driver pod, see: Configuring the driver).
For reference, see https://www.nextflow.io/docs/edge/config.html#scope-env .
The pod directive is part of the process scope. Several k8s-specific settings are available, including more sophisticated ways of setting environment variables (from secrets, config maps, etc.).
For a full list, see https://www.nextflow.io/docs/edge/process.html#process-pod .
Most of these options can be defined as simple key-value maps, for example:
spec:
  pod:
    - label: foo
      value: bar
    - imagePullSecret: my-secret
However, some pod configuration options are free-form maps which, for technical reasons, cannot be easily implemented in a Kubernetes CRD (custom resource definition). Hence, nextflow-k8s-operator provides a dedicated syntax for declarations which are not simple key-value pairs. For example (note the (map) marker):
spec:
  pod:
    - toleration: (map)
      key: nextflow
      operator: Equal
      value: "true"
      effect: NoSchedule
translates to:
pod = [
  [
    toleration: [
      key: 'nextflow',
      operator: 'Equal',
      value: 'true',
      effect: 'NoSchedule',
    ],
  ],
]
By default, a predefined version of Nextflow is used as a driver for the launches (it can be changed in the creators.go file, but the code has to be recompiled and re-run afterwards).
To enable more flexibility in the selection of the runtime environment, the nextflow section provides options for customizing the Nextflow environment used for the driver pod (a combined example is shown after the list of options below):
nextflow.image: the name of the Nextflow Docker image used by the driver (not the workers), without a version tag. By default, nextflow/nextflow. Please note that it is possible to use software other than Nextflow by choosing a non-Nextflow image!
nextflow.version: version tag for the Docker image.
nextflow.command: a custom command launching Nextflow in the driver pod. By default, nextflow run is executed with some command-line parameters. This is a good place to add custom invocations to the Nextflow command, or to execute some other script pre-launch. (NOTE: see examples of command declarations in Kubernetes pod definitions for reference.)
nextflow.args: if you want to keep the default command line and only add some arguments to it (for example, -resume), it's better to specify them in this section (in the same way you'd specify the command declaration in nextflow.command). Your arguments will be appended to the original command.
nextflow.home: this changes the path to Nextflow's home directory. Point it to a persistent volume if you want to keep the Nextflow environment between launches.
nextflow.logPath: custom path for the log file. (NOTE: it should include the filename as well.)
nextflow.scmSecretName: this important setting allows for downloading pipelines from private (or otherwise restricted) repositories. It points to a Kubernetes secret holding the contents of the Nextflow SCM configuration file (see https://www.nextflow.io/docs/latest/sharing.html#scm-configuration-file ). To create the secret, use make_scm_secret.sh | kubectl apply -f - .
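Putting a few of these options together, a customized driver environment might be declared roughly as follows; the image, version tag, secret name, and paths are placeholders, and the exact YAML layout is an assumption based on the option names above:
spec:
  nextflow:
    image: nextflow/nextflow
    version: "24.04.4"
    args: ["-resume"]
    home: /workspace/nxf-home
    logPath: /workspace/logs/nextflow.log
    scmSecretName: my-scm-secret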
The options described in the previous sections impact only the worker pods (which are handled by the Nextflow process, just like when Nextflow is launched from the command line). To enable the configuration of the driver pod, some configuration options have been added to the launch definition that are not present in Nextflow. These include:
driver.env: definitions of environment variables for the driver. This section is identical to the env section in any Kubernetes pod definition (this includes using secrets and config maps as sources for the variables); an example is shown at the end of this section.
driver.tolerations: driver pod tolerations. Defined exactly like in a pod definition.
An example of the same toleration set both for the driver and the workers is shown below.
spec:
  pod:
    - toleration: (map)
      key: core
      operator: Equal
      value: "true"
      effect: NoSchedule
  driver:
    tolerations:
      - key: core
        operator: Equal
        value: "true"
        effect: NoSchedule
driver.labels: labels used for the driver pod. Defined like in the metadata section of a pod definition.
driver.resources: computational resources required for the driver pod. For example:
spec:
  driver:
    resources:
      requests:
        memory: "16Gi"
        cpu: "4"
      limits:
        memory: "32Gi"
        cpu: "8"
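As an illustration of driver.env using the standard Kubernetes env syntax (the variable names and the secret are made up for the example):
spec:
  driver:
    env:
      - name: MY_TOKEN
        valueFrom:
          secretKeyRef:
            name: my-secret
            key: token
      - name: NXF_OPTS
        value: "-Xmx4g"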
nextflow-k8s-operator has been created with Kubebuilder.