BinderHub is a cloud-based, multi-server technology used for hosting reproducible computing environments and interactive Jupyter Notebooks built from code repositories.
This repository contains a set of scripts to automatically deploy a BinderHub onto Microsoft Azure, and connect either a Docker Hub account/organisation or an Azure Container Registry, so that you can host your own Binder service.
You will require a Microsoft Azure account and subscription. A Free Trial subscription can be obtained here. You will be asked to provide a credit card for verification purposes. You will not be charged. Your resources will be frozen once your subscription expires, then deleted if you do not reactivate your account within a given time period. If you are building a BinderHub as a service for an organisation, your institution may already have an Azure account. You should contact your IT Services for further information regarding permissions and access (see the Service Principal Creation section below).
Please read our Code of Conduct and Contributing Guidelines.
Table of Contents:
- Usage
- "Deploy to Azure" Button
- Running the Container Locally
- Customising your BinderHub Deployment
- Developers Guide
- Contributors
This repo can either be run locally or as "Platform as a Service" through the "Deploy to Azure" button, as described in the "Deploy to Azure" Button section.
To use these scripts locally, clone this repo and change into the directory.
git clone https://github.com/alan-turing-institute/binderhub-deploy.git
cd binderhub-deploy
To make the scripts executable and then run them, do the following:
cd src
chmod 700 <script-name>.sh
./<script-name>.sh
[NOTE: The above command is UNIX specific. If you are running Windows 10, this blog post discusses using a bash shell in Windows.]
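The per-script `chmod` above can also be applied to all of the scripts in one go. In this sketch, the `mkdir`/`touch` lines create stand-in files purely so the snippet is self-contained; in practice you would run it from a fresh clone of the repo:

```shell
# Stand-in files so this sketch is self-contained; a real clone of the
# repo already contains the deployment scripts in src/.
mkdir -p src
touch src/setup.sh src/deploy.sh src/logs.sh src/info.sh src/teardown.sh

# Make every deployment script executable in a single command
chmod 700 src/*.sh

ls -l src/
```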
To build the BinderHub, you should run `setup.sh` first (to install the required command line tools), then `deploy.sh` (which will build the BinderHub).
Once the BinderHub is deployed, you can run `logs.sh` and `info.sh` to get the JupyterHub logs and IP addresses, respectively.
`teardown.sh` should only be used to delete your BinderHub deployment.
You need to create a file called `config.json` which has the format described in the code block below.
Fill the quotation marks with your desired namespaces, etc.
`config.json` is git-ignored so that sensitive information, such as passwords and Service Principals, cannot be pushed to GitHub.
- For a list of available data centre regions, see here. This should be a region and not a location, for example "West Europe" or "Central US". These can be equivalently written as `westeurope` and `centralus`, respectively.
- For a list of available Linux Virtual Machines, see here. This should be something like, for example, `Standard_D2s_v3`.
- The versions of the BinderHub Helm Chart can be found here and are of the form `0.2.0-<commit-hash>`. It is advised to select the most recent version unless you specifically require an older one.
- If you are deploying an Azure Container Registry, find out more about the SKU tiers here.
{
"container_registry": "", // Choose Docker Hub or ACR with 'dockerhub' or 'azurecr' values, respectively.
"enable_https": "false", // Choose whether to enable HTTPS with cert-manager. Boolean.
"acr": {
"registry_name": null, // Name to give the ACR. This must be alpha-numerical and unique to Azure.
"sku": "Basic" // The SKU capacity and pricing tier for the ACR
},
"azure": {
"subscription": "", // Azure subscription name or ID (a hex-string)
"res_grp_name": "", // Azure Resource Group name
"location": "", // Azure Data Centre region
"node_count": 1, // Number of nodes to deploy. 3 is preferrable for a stable cluster, but may be liable to caps.
"vm_size": "Standard_D2s_v3", // Azure virtual machine type to deploy
"sp_app_id": null, // Azure service principal ID (optional)
"sp_app_key": null, // Azure service principal password (optional)
"sp_tenant_id": null, // Azure tenant ID (optional)
"log_to_blob_storage": false // Store logs in blob storage when not running from a container
},
"binderhub": {
"name": "", // Name of your BinderHub
"version": "", // Helm chart version to deploy, should be 0.2.0-<commit-hash>
"image_prefix": "" // The prefix to preppend to Docker images (e.g. "binder-prod")
},
"docker": {
"username": null, // Docker username (can be supplied at runtime)
"password": null, // Docker password (can be supplied at runtime)
"org": null // A Docker Hub organisation to push images to (optional)
},
"https:": {
"certmanager_version": null, // Version of cert-manager to install
"contact_email": null, // Contact email for Let's Encrypt
"domain_name": null, // Domain name to issue certificates for
"nginx_version": null // Version on nginx-ingress to install
}
}
You can copy `template-config.json` should you require.
Please note that all entries in `template-config.json` must be surrounded by double quotation marks (`"`), with the exception of `node_count`.
If you have signed up for an Azure Free Trial subscription, you are not allowed to deploy more than 4 cores.
How many cores you deploy depends on your choice of `node_count` and `vm_size`.
For example, a `Standard_D2s_v3` machine has 2 cores.
Therefore, setting `node_count` to 2 will deploy 4 cores, and you will have reached the core quota of your Free Trial subscription.
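The quota arithmetic above can be checked with a quick shell calculation before editing `config.json` (the core count per VM here is just the `Standard_D2s_v3` example from the text):

```shell
# Cores deployed = node_count x cores-per-VM.
# A Standard_D2s_v3 VM has 2 cores (the "2" in the size name).
NODE_COUNT=2
CORES_PER_NODE=2

TOTAL_CORES=$((NODE_COUNT * CORES_PER_NODE))
echo "Total cores: ${TOTAL_CORES}"

# On a Free Trial subscription the quota is 4 cores, so this
# configuration uses the whole allowance.
```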
To select either a Docker Hub account/organisation or an Azure Container Registry (ACR), you must set the top-level `container_registry` key in `config.json` to either `dockerhub` or `azurecr`, respectively.
This will tell `deploy.sh` which variables and YAML templates to use.
Then fill in the values under either the `dockerhub` or `acr` key as required.
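As a quick sanity check before running `deploy.sh`, you could confirm that `container_registry` holds one of the two accepted values. This is only an illustrative sketch: the temporary file stands in for your real `config.json`, and the `grep`-based parsing is an assumption, not how the deployment scripts actually read the file:

```shell
# Stand-in for your real config.json (illustrative only)
cat > /tmp/config-check.json <<'EOF'
{
  "container_registry": "dockerhub"
}
EOF

# Extract the top-level container_registry value
REGISTRY=$(grep -o '"container_registry": *"[^"]*"' /tmp/config-check.json | cut -d'"' -f4)

case "$REGISTRY" in
  dockerhub|azurecr) echo "container_registry OK: $REGISTRY" ;;
  *) echo "container_registry must be 'dockerhub' or 'azurecr'" >&2; exit 1 ;;
esac
```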
Using a Docker Hub account/organisation has the benefit of being relatively simple to set up. However, all the BinderHub images pushed there will be publicly available. For a few extra steps, deploying an ACR will allow the BinderHub images to be pushed to a private repository.
Service Principal:
In the Service Principal Creation section, we cover how to create a Service Principal in order to deploy a BinderHub.
When following these steps, the `--role` argument of `Contributor` should be replaced with `Owner`.
This is because the Service Principal will need the `AcrPush` role in order to push images to the ACR, and the `Contributor` role does not have permission to create new role assignments.
This script checks whether the required command line tools are already installed.
If any are missing, the script uses the system package manager or `curl` to install the command line interfaces (CLIs).
The CLIs to be installed are:
Any dependencies that are not automatically installed by these packages will also be installed.
If you have a domain name that you would like your BinderHub to be hosted at, the package can configure a DNS Zone to host the records for your domain name and configure the BinderHub to use these addresses rather than raw IP addresses.
HTTPS certificates will also be requested for the DNS records using `cert-manager` and Let's Encrypt.
While the package tries to automate as much as possible, when enabling HTTPS there are still a few steps that the user will have to do manually.
- **Delegate the domain to the name servers.**
  The script will return four name servers that are hosting the DNS Zone; these will be saved to the log file `name-servers.log`. Your parent domain's NS records need to be updated to delegate to these name servers (see the Azure documentation). How this is achieved will differ depending on your domain registrar.
- **Point the A records to the Load Balancer IP address.**
  Two A records are created, one for the Binder page and one for the JupyterHub, and these need to be set to the public IP address of the cluster's load balancer. The package tries to complete this step automatically but often fails due to the long-running nature of Azure's update process. It is recommended to wait some time (overnight is best) and then run `set-a-records.sh`. Alternatively, there are manual instructions for setting the A records in the Azure Portal.
- **Switch from Let's Encrypt staging to production.**
  Let's Encrypt provides a staging environment to test against, and this is the environment the package will request certificates from. Once you have verified that the staging certificates have been issued correctly, you must switch to requesting certificates from Let's Encrypt's production environment to receive trusted certificates. Instructions for switching environments.
This script reads in values from `config.json` and deploys a Kubernetes cluster.
It then creates `config.yaml` and `secret.yaml` files, which are used to install the BinderHub using the templates in the `templates` folder.
If you have chosen a Docker Hub account/organisation, the script will ask for your Docker ID and password if you haven't supplied them in the config file.
The ID is your Docker username, NOT the associated email.
If you have provided a Docker organisation in `config.json`, then your Docker ID MUST be a member of this organisation.
If you have chosen an ACR, the script will create one and assign the `AcrPush` role to your Service Principal.
The registry server and Service Principal credentials will then be parsed into `config.yaml` and `secret.yaml` so that the BinderHub can connect to the ACR.
If you have requested HTTPS to be enabled, the script will create a DNS Zone and A records for the Binder and JupyterHub endpoints.
The `nginx-ingress` and `cert-manager` Helm charts will also be installed to provide a load balancer and automated requests for certificates from Let's Encrypt, respectively.
Both a JupyterHub and a BinderHub are installed via a Helm Chart onto the deployed Kubernetes cluster, and the `config.yaml` file is updated with the JupyterHub IP address.
`config.yaml` and `secret.yaml` are both git-ignored so that secrets cannot be pushed back to GitHub.
The script also outputs log files (`<file-name>.log`) for each stage of the deployment.
These files are also git-ignored.
If the `azure.log_to_blob_storage` value in `config.json` is set to `true` and the script is running from the command line, the log files will be stored in blob storage.
[NOTE: This script is only relevant if you are deploying a BinderHub with a domain name and HTTPS certificates.]
This script reads in values from `config.json` and tries to set the Kubernetes public IP address to the `binder` and `hub` A records in the DNS Zone.
This script will print the JupyterHub logs to the terminal to assist with debugging issues with the BinderHub.
It reads from `config.json` in order to get the BinderHub name.
This script will print the pod status of the Kubernetes cluster and the IP addresses of both the JupyterHub and BinderHub to the terminal.
It reads the BinderHub name from `config.json`.
This script will automatically upgrade the Helm Chart deployment configuring the BinderHub and then print the Kubernetes pods.
It reads the BinderHub name and Helm Chart version from `config.json`.
This script will purge the Helm Chart release, delete the Kubernetes namespace and then delete the Azure Resource Group containing the computational resources.
It will read the namespaces from `config.json`.
The user should check the Azure Portal to verify that the resources have been deleted.
It will also purge the cluster information from your `kubectl` configuration file.
To deploy BinderHub to Azure in a single click (and some form-filling), use the deploy button below.
You will be asked to provide a Service Principal in the form launched when you click the "Deploy to Azure" button above.
[NOTE: The following instructions can also be run in a local terminal session.
They will require the Azure command line to be installed, so make sure to run `setup.sh` first.]
To create a Service Principal, go to the Azure Portal (and login!) and open the Cloud Shell:
You may be asked to create storage when you open the shell. This is expected, click "Create".
Make sure the shell is set to Bash, not PowerShell.
Set the subscription you'd like to deploy your BinderHub on.
az account set --subscription <subscription>
This image shows the command being executed for an "Azure Pass - Sponsorship" subscription.
You will need the subscription ID, which you can retrieve by running:
az account list --refresh --output table
Next, create the Service Principal with the following command. Make sure to give it a sensible name!
az ad sp create-for-rbac \
--name binderhub-sp \
--role Contributor \
--scope /subscriptions/<subscription ID from above>
NOTE: If you are deploying an ACR rather than connecting to Docker Hub, then this command should be:
az ad sp create-for-rbac \
--name binderhub-sp \
--role Owner \
--scope /subscriptions/<subscription ID from above>
The fields `appId`, `password` and `tenant` are the required pieces of information.
These should be copied into the "Service Principal App ID", "Service Principal App Key" and "Service Principal Tenant ID" fields in the form, respectively.
Keep this information safe as the password cannot be recovered after this step!
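For reference, the JSON returned by `az ad sp create-for-rbac` looks roughly like the sample below (the values are placeholders, and the exact set of fields may vary between Azure CLI versions). The `grep`/`cut` extraction is just an illustrative way to pull a field out of a saved copy of the output:

```shell
# Placeholder copy of the Service Principal creation output
# (never commit the real output -- the password cannot be recovered later!)
cat > /tmp/sp.json <<'EOF'
{
  "appId": "11111111-2222-3333-4444-555555555555",
  "displayName": "binderhub-sp",
  "password": "<redacted>",
  "tenant": "66666666-7777-8888-9999-000000000000"
}
EOF

# appId    -> "Service Principal App ID" field
# password -> "Service Principal App Key" field
# tenant   -> "Service Principal Tenant ID" field
APP_ID=$(grep '"appId"' /tmp/sp.json | cut -d'"' -f4)
echo "SP_APP_ID=${APP_ID}"
```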
To monitor the progress of the blue-button deployment, go to the Azure portal and select "Resource Groups" from the left hand pane. Then in the central pane select the resource group you chose to deploy into.
This will give you a right hand pane containing the resources within the group. You may need to "refresh" until you see a new container instance.
When it appears, select it and then in the new pane go to "Settings->Containers". You should see your new container listed.
Select it, then in the lower right hand pane select "Logs". You may need to "refresh" this to display the logs until the container starts up. The logs are also not auto-updating, so keep refreshing them to see progress.
When BinderHub is deployed using the "Deploy to Azure" button (or with a local container), output logs, YAML files, and ssh keys are pushed to an Azure storage account to preserve them once the container exits. The storage account is created in the same resource group as the Kubernetes cluster, and files are pushed into a storage blob within the account.
Both the storage blob name and the storage account name are derived from the name you gave to your BinderHub instance, but may be modified and/or have a random seed appended.
To find the storage account name, navigate to your resource group by selecting "Resource Groups" in the left-most panel of the Azure Portal, then clicking on the resource group containing your BinderHub instance.
Along with any pre-existing resources (for example, if you re-used an existing resource group), you should see three new resources: a container instance, a Kubernetes service, and a storage account.
Make a note of the name of the storage account (referred to in the following commands as `ACCOUNT_NAME`), then select this storage account.
In the new pane that opens, select "Blobs" from the "Services" section.
You should see a single blob listed.
Make a note of the name of this blob, which will be `BLOB_NAME` in the following commands.
The Azure CLI can be used to fetch files from the blob (either in the cloud shell in the Azure Portal, or in a local terminal session if you've run `setup.sh` first).
Files are fetched into a local directory, which must already exist, referred to as `OUTPUT_DIRECTORY` in the following commands.
You can run `setup.sh` to install the Azure CLI or use the cloud shell on the Azure Portal.
To fetch all files:
az storage blob download-batch \
--account-name <ACCOUNT_NAME> \
--source <BLOB_NAME> \
--pattern "*" \
--destination "<OUTPUT_DIRECTORY>"
The `--pattern` argument can be used to fetch particular files, for example all log files:
az storage blob download-batch \
--account-name <ACCOUNT_NAME> \
--source <BLOB_NAME> \
--pattern "*.log" \
--destination "<OUTPUT_DIRECTORY>"
To fetch a single file, specify `REMOTE_FILENAME` for the name of the file in blob storage, and `LOCAL_FILENAME` for the filename it will be fetched into:
az storage blob download \
--account-name <ACCOUNT_NAME> \
--container-name <BLOB_NAME> \
--name <REMOTE_FILENAME> \
--file <LOCAL_FILENAME>
For full documentation, see the `az storage blob` documentation.
Once the deployment has succeeded and you've downloaded the log files, visit the IP address of your Binder page to check that it's working.
The Binder IP address can be found by running the following:
cat <OUTPUT_DIRECTORY>/binder-ip.log
A good repository to test your BinderHub with is `binder-examples/requirements`.
The third way to deploy BinderHub to Azure is to pull the Docker image and run it directly, passing the values you would have entered in `config.json` as environment variables.
You will need the Docker CLI installed. Installation instructions can be found here.
First, pull the `binderhub-setup` image.
docker pull sgibson91/binderhub-setup:<TAG>
where `<TAG>` is your chosen image tag.
A list of available tags can be found here.
It is recommended to use the most recent version number.
The `latest` tag is the most recent build from the default branch and may be subject to fluctuations.
Then, run the container with the following arguments, replacing the `<>` fields as necessary:
docker run \
-e "AKS_NODE_COUNT=1" \ # Required
-e "AKS_NODE_VM_SIZE=Standard_D2s_v3" \ # Required
-e "AZURE_SUBSCRIPTION=<Azure Subscription ID>" \ # Required
-e "BINDERHUB_CONTAINER_MODE=true" \ # Required
-e "BINDERHUB_NAME=<Chosen BinderHub name>" \ # Required
-e "BINDERHUB_VERSION=<Chosen BinderHub version>" \ # Required
-e "CONTAINER_REGISTRY=<dockerhub or azurecr>" \ # Required
-e "DOCKER_IMAGE_PREFIX=binder-dev" \ # Required
-e "DOCKERHUB_ORGANISATION=<Docker organisation>" \
-e "DOCKERHUB_PASSWORD=<Docker password>" \
-e "DOCKERHUB_USERNAME=<Docker ID>" \
-e "REGISTRY_NAME=<Registry Name>" \
-e "REGISTRY_SKU=Basic" \
-e "RESOURCE_GROUP_LOCATION=westeurope" \ # Required
-e "RESOURCE_GROUP_NAME=<Chosen Resource Group name>" \ # Required
-e "SP_APP_ID=<Service Principal ID>" \ # Required
-e "SP_APP_KEY=<Service Principal Key>" \ # Required
-e "SP_TENANT_ID=<Service Principal Tenant ID>" \ # Required
-it sgibson91/binderhub-setup:<TAG>
The output will be printed to your terminal and the files will be pushed to blob storage, as in the button deployment. See the Retrieving Deployment Output from Azure section for how to return these files.
Customising your BinderHub deployment is as simple as editing `config.yaml` and/or `secret.yaml` and then upgrading the BinderHub Helm Chart.
The Helm Chart can be upgraded by running `upgrade.sh` (make sure you have the CLIs installed by running `setup.sh` first).
The Jupyter guide to customising the underlying JupyterHub can be found here.
The BinderHub guide for changing the landing page logo can be found here.
The Docker image will automatically be built by Docker Hub when new pushes are made to `main`.
However, a developer may wish to build the image to test deployments before merging code.
Firstly, make sure `config.json` has been removed from the repository; otherwise, secrets within the file may be built into the image.
The command to build a Docker image from the root of the repo is as follows.
docker build -t <DOCKER_USERNAME>/binderhub-setup:<TAG> .
It is not necessary to push this image to a container registry. But if you choose to do so, the command is as follows.
docker push <REGISTRY-HOST>/<DOCKER-USERNAME>/binderhub-setup:<TAG>
Docker Hub will automatically build the image from the repo with every push to `main` and tag it as `latest`.
To release a specific version, update the Azure ARM template with the new/desired version on line 166 and the block starting at line 170. We follow SemVer versioning format.
Once the Pull Request containing the new code/version/release has been merged, run the following commands, where `vX.Y.Z` is the new/desired version release.
git checkout main
git pull
git tag -a vX.Y.Z                  # For an annotated tag
git tag -a vX.Y.Z -m "<message>"   # For an annotated tag with an inline message
git tag vX.Y.Z                     # For a lightweight tag (no extra data)
git push --tags
This will trigger Docker Hub to build an image with the SemVer version as a tag.
See the following documentation for information on tagging:
- https://git-scm.com/book/en/v2/Git-Basics-Tagging
- https://dev.to/neshaz/a-tutorial-for-tagging-releases-in-git-147e
Please read our Code of Conduct and Contributing Guidelines to get you started!
Thanks goes to these wonderful people (emoji key):