There are two main ways to help improve these images:
- File an issue
- Directly make modifications to a Dockerfile and submit a pull request
If you find these instructions unclear or think something would be a helpful addition, make modifications and submit a PR! Additions to the test suite, including the tests specified in notebook_test.py, would also be much appreciated!
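For example, a new check might look something like the sketch below. This is only a rough illustration: it assumes the local scheduler/worker pair from the testing steps later in this document is reachable at localhost:8786, and the function name `test_worker_roundtrip` is hypothetical, not something already in notebook_test.py.

```python
# rough sketch of an additional test -- assumes a scheduler/worker pair is
# already running at localhost:8786 (see the local testing steps below);
# test_worker_roundtrip is a hypothetical name, not an existing function
import dask.distributed as dd


def test_worker_roundtrip():
    client = dd.Client('localhost:8786')
    try:
        # a trivial task exercises serialization and the scheduler/worker link
        assert client.submit(lambda x: x + 1, 1).result(timeout=30) == 2
    finally:
        client.close()


if __name__ == '__main__':
    test_worker_roundtrip()
    print('worker roundtrip ok')
```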
One tricky thing about this setup is that it is extremely important that the notebook and worker images match. If you modify one Dockerfile, make the corresponding change to the other so they don't get out of sync.
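Once you have a client connected to a running scheduler (see the testing steps below), dask's built-in version check is a quick way to catch a mismatched pairing. A minimal sketch, assuming dask.distributed is available locally and a scheduler/worker pair is listening on localhost:8786:

```python
# minimal sketch: confirm that client, scheduler, and worker package versions
# agree -- a mismatch is a strong hint the notebook and worker images are out
# of sync. Assumes a scheduler/worker pair is already running on localhost:8786.
import dask.distributed as dd

client = dd.Client('localhost:8786')

# check=True raises an error if versions differ across client, scheduler,
# and workers; with check=False it just returns the version report
client.get_versions(check=True)

client.close()
```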
On Travis, we test the containers to make sure the images build. Note that this is woefully insufficient for ensuring a safe deployment to the cluster! We're looking for ways to run more comprehensive tests automatically, but in the meantime, see the testing strategies below to really give your changes a test drive!
The main purpose of our Travis tests is actually deployment. See the deployment section for more details.
- Make sure you have docker installed and initialized
- Create an image locally:

  ```bash
  docker build worker -t rhodium/worker:my-tag-name
  docker build notebook -t rhodium/notebook:my-tag-name
  ```

  or pull one from dockerhub:

  ```bash
  docker pull rhodium/notebook:dev
  docker pull rhodium/worker:dev
  ```

- Set an environment variable to preserve this tag name, e.g.:

  ```bash
  NOTEBOOK_TAG=rhodium/notebook:dev
  WORKER_TAG=rhodium/worker:dev
  ```
Once you've set up your machine, run the following to programmatically test the notebook:worker pairing:
```bash
# start scheduler in notebook server
docker run --net="host" -d $NOTEBOOK_TAG start.sh /opt/conda/bin/dask-scheduler --port 8786 --bokeh-port 8787 &

# at this point you could test the connection by starting a
# python session and connecting to the scheduler with
#
#     import dask.distributed as dd
#     client = dd.Client('localhost:8786')
#     client
#
# you should have a valid connection, but no workers yet

# start worker
docker run --net="host" -d $WORKER_TAG /opt/conda/bin/dask-worker localhost:8786 --worker-port 8666 --nanny-port 8785 &

# at this point, if you run `client` again, you should see
# a worker connected to the client! You can now run any
# python commands to test the connection. The rest of these
# commands attempt to establish such a connection from within
# a notebook image.

# notebook server for user connection
docker create --name tester --net="host" $NOTEBOOK_TAG

# copy test suite to test image
docker cp notebook_test.py tester:/usr/bin

# start the tester image
docker start tester

# run test suite
docker exec tester python /usr/bin/notebook_test.py
```
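If you'd rather poke at the cluster interactively instead of (or in addition to) running the test suite, something along these lines should show the registered worker. This is a sketch that assumes dask.distributed is installed on your machine and that the scheduler and worker containers started above are still running:

```python
# quick interactive sanity check from a local python session -- assumes
# dask.distributed is installed on the host and the scheduler started above
# is still listening on localhost:8786
import dask.distributed as dd

client = dd.Client('localhost:8786')

# scheduler_info() reports connected workers; expect one entry for the
# worker container started above
workers = client.scheduler_info()['workers']
print('{} worker(s) registered: {}'.format(len(workers), list(workers)))

client.close()
```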
The following steps will build a local cluster with a worker:notebook already paired. Log into the jupyterhub cluster and make sure it works the way you expect. Note that this does not use a kubernetes cluster, just a single worker.
```bash
# start jupyterlab in the notebook server
docker run -p 8888:8888 -p 8786:8786 -p 8787:8787 $NOTEBOOK_TAG start.sh jupyter lab --port 8888 --no-browser --allow-root

# go to https://localhost:8888 to view the page. you'll need to enter the
# token from the docker log in order to log in

# open a jupyterlab terminal window and start a dask-scheduler
dask-scheduler --port 8786 --bokeh-port 8787

# in a new terminal window on your laptop, make sure you set your tag
# environment variables, then start a worker container using whatever worker
# image you want, and get it to connect to the scheduler
docker run -p 8666:8666 -p 8785:8785 --net="host" $WORKER_TAG dask-worker localhost:8786 --worker-port 8666 --nanny-port 8785

# you should see the worker report that it is registered to the scheduler,
# and on the scheduler terminal on jupyterlab it should report the
# registered worker
```

In a notebook on jupyterlab, connect to the scheduler (don't create a dask cluster... the scheduler is already set up):

```python
import dask.distributed as dd
client = dd.Client('localhost:8786')

# do whatever you want in the notebook and test the connection to the worker.
```
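For example, a small computation like the sketch below confirms that tasks are actually being executed on the worker; the numbers are arbitrary and the snippet simply reuses the same scheduler address as above:

```python
# run in a notebook cell; connects to the scheduler started above and submits
# a trivial batch of tasks so you can see them execute on the worker
import dask.distributed as dd

client = dd.Client('localhost:8786')

futures = client.map(lambda x: x ** 2, range(10))
results = client.gather(futures)
print(sum(results))  # expect 285
```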
To stop and remove the running containers (e.g. before testing a new build):

```bash
docker stop $(docker ps -q)
docker rm $(docker ps -qa)
```
A worker:notebook pair is deployed to dockerhub whenever a build is successful on the `dev` and `master` branches.
Each worker and notebook image is tagged with one or more of several tags:
- A version number (e.g. `0.2.1`) is assigned to each major release (tag). This is only pushed to docker once and is permanently stable. Use release tags on production deployments. To create a release, draft a new release on github.
- A commit hash (e.g. `fbb4a04f3829f12c81812573c54d8e52aba324ce`) is assigned to every `dev` or `master` build. These are completely stable and will not be updated ever. Use these tags to ensure you always get this specific image.
- Every build on the `dev` branch will be assigned the `dev` tag. Use the `dev` tag to get the latest development release.
- Every build on the `master` branch will be assigned both the `dev` tag and the `latest` tag. Use the `latest` tag to get the latest stable release.
Once an image has been built and deployed to dockerhub, it can be tested on a test cluster. See the helm chart repo for more information.