This repository is an up-to-date collection of minimal jaynes usage examples. You can mix and match configurations from the included use cases for your particular infrastructure. You can find the up-to-date copy of this guide here: https://github.com/geyang/jaynes-starter-kit
First, let's install Jaynes! This tutorial is written against version 0.7.2:
pip install jaynes
I would also recommend taking a look at params-proto, a pythonic hyperparameter + argument-parsing library that makes parameter management declarative and error-free. We use params-proto and its sweep utility, params_proto.hyper, in our parameter-sweep example. To install params-proto, run
pip install params-proto waterbear
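To make the declarative idea concrete, here is a stdlib-only sketch of the pattern: defaults declared once as class attributes, plus a cartesian-product sweep over overrides. This is only an illustration of the concept; the actual params-proto and params_proto.hyper APIs differ, so refer to their documentation for real usage.

```python
from itertools import product

class Args:
    # declare defaults once, in one place, as plain class attributes
    seed = 100
    lr = 3e-4
    batch_size = 32

def sweep(**grid):
    """Yield one override dict per point in the cartesian grid."""
    keys = list(grid)
    for values in product(*grid.values()):
        yield dict(zip(keys, values))

# e.g. 2 seeds x 2 learning rates -> 4 configurations
configs = list(sweep(seed=[100, 200], lr=[1e-3, 3e-4]))
```

Each dict in `configs` can then be applied on top of the declared defaults before launching a job.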
For detailed documentation on each use case, refer to the in-depth tutorial below. Each folder contains a complete example. To run one, follow the instructions in its README.
01_ssh_docker_configuration
├── README.md
├── launch_entry.py
└── .jaynes.yml
- SSH Launch Modes
- Working with Diverse Compute Resources
- SLURM
- AWS
- GCP
- God Mode
Reporting Issues (on the Jaynes Repo/issues)
Let's collect all issues on the main jaynes repo's issue page, so that people can search for things more easily!
Jaynes offers a way to transparently debug a launch via verbose mode, in which it prints out all of the local and remote scripts that it generates. To debug a launch script, set verbose to true either in the yaml file or through the jaynes.config call. To debug on the remote host where you intend to run your job, you can often copy and paste the generated script to see the error messages.
Debugging Steps:
- Turn on verbose mode by setting verbose=True in the jaynes call:
#! launch_entry.py
import jaynes
jaynes.config(verbose=True)
or
#! .jaynes.yml
verbose: true
runner:
- ....
- Launch
#! launch_entry.py
if __name__ == "__main__":
jaynes.run(train_fn, *args, **kwargs)
# if in SLURM or SSH mode:
jaynes.listen() # to listen to the stdout/stderr pipe-back
- Debug: Suppose you have an error message. You can copy and paste the script run by jaynes, which is printed out in the console either locally or on the EC2 instance you just launched, to debug the specifics of it.
- Share with lab mates: When you are done, you can share this repo with others who use the same infrastructure, so that they can run their code there too.
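Putting the steps above together, a complete launch_entry.py can look like the sketch below. The train_fn body and the --launch command-line flag are placeholders for this example; jaynes.config, jaynes.run, and jaynes.listen are the calls shown earlier in this guide.

```python
import sys

def train_fn(seed=100, lr=3e-4):
    """A stand-in training function; replace with your own."""
    msg = f"training with seed={seed}, lr={lr}"
    print(msg)
    return msg

if __name__ == "__main__" and "--launch" in sys.argv:
    import jaynes

    jaynes.config(verbose=True)      # print the generated local/remote scripts
    jaynes.run(train_fn, seed=200)   # launch the function with its arguments
    jaynes.listen()                  # pipe back stdout/stderr (SSH/SLURM modes)
```

Guarding the launch behind a flag keeps the entry function importable from other modules without triggering a launch.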
Machine learning infrastructure is an evolving problem, and it will take the rest of the community to maintain, adopt, and standardize it.
Below are a few areas that currently stand in need of contributions (now mostly done):
- [done] - Documentation on Configuration Schema issue #2
- [done] - GCE Support issue #3
- [done] - Pure SSH Host Support issue #4
- [done] - SLURM SBatch Support issue #5
- SLURM Singularity Support issue #6