Hyper-Tune: Efficient Hyper-parameter Tuning at Scale
Note that in our experiments, the operating system is Ubuntu 18.04.3 LTS. We use xgboost==1.3.1 and torch==1.7.1 (torchvision==0.7.0, CUDA Version 10.1.243). The configuration space is defined using ConfigSpace==0.4.18. The multi-fidelity surrogate in our method is implemented based on the probabilistic random forest in SMAC3, which depends on pyrfr==0.8.0. All of these are included in requirements.txt.
In our paper, we use PyTorch to train neural networks on 32 RTX 2080Ti GPUs, and the experiments are conducted on ten machines with 640 AMD EPYC 7702P CPU cores in total (64 cores, 128 threads each).
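For reference, the pinned dependencies mentioned above would appear in `requirements.txt` roughly as follows (a partial sketch; the shipped file may pin additional packages):

```
xgboost==1.3.1
torch==1.7.1
torchvision==0.7.0
ConfigSpace==0.4.18
pyrfr==0.8.0
```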
- Preparation: Python == 3.7
- Install SWIG:

apt-get install swig3.0
ln -s /usr/bin/swig3.0 /usr/bin/swig

- Install requirements:

cat requirements.txt | xargs -n 1 pip install
- Download 4 datasets (Covertype, Pokerhand, Hepmass, Higgs) for the XGBoost tuning experiments.
- Preprocess the data (including train-test split) with `test/preprocess_data.py`.
- Put the datasets (`.npy` files) under `./datasets/` (see the sketch below).
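A minimal shell sketch of this preparation (assuming `preprocess_data.py` writes the `.npy` files to the working directory; adjust the paths to wherever the script actually emits them):

```
# Hypothetical layout: the preprocessing script emits .npy files into the working directory.
mkdir -p datasets
python test/preprocess_data.py   # train-test split, produces .npy files
mv *.npy datasets/               # place the processed datasets under ./datasets/
```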
- Follow the instructions at https://github.com/D-X-Y/NAS-Bench-201 and download the NAS-Bench-201 benchmark file (`NAS-Bench-201-v1_1-096897.pth`).
- Download `cifar10.zip` (preprocessed) from Google Drive or Baidu-Wangpan (code: t47a). (Note that at present, we only provide downloading from Baidu-Wangpan because Google Drive is not an anonymous service.)
- Unzip `cifar10.zip` and put it under `./datasets/img_datasets/` (the path should be `./datasets/img_datasets/cifar10/`), as shown below.
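For example, the placement step could look like this (assuming `cifar10.zip` sits in the repository root):

```
mkdir -p datasets/img_datasets
unzip cifar10.zip -d datasets/img_datasets/
# The resulting path should be ./datasets/img_datasets/cifar10/
```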
- We implement the LSTM based on https://github.com/salesforce/awd-lstm-lm to conduct our experiments. Please follow the instructions in the project README and use `getdata.sh` to acquire the Penn Treebank dataset.
- Put the dataset (`.txt` files) under `./test/awd_lstm_lm/data/penn/` (see the sketch below).
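A sketch of the placement step (assuming `getdata.sh` from awd-lstm-lm downloaded the Penn Treebank files into `data/penn/`; adjust if your copy lives elsewhere):

```
mkdir -p test/awd_lstm_lm/data/penn
cp data/penn/*.txt test/awd_lstm_lm/data/penn/
```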
- `tuner/`: the implemented method and the compared baselines.
- `test/`: the Python scripts used in the experiments, and useful tools.

See `tuner/__init__.py` for the name of each baseline method (the keys of `mth_dict`).
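For a quick look at the available method names from the command line (assuming `mth_dict` is importable from the `tuner` package, as the note above suggests):

```
python -c "from tuner import mth_dict; print(list(mth_dict.keys()))"
```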
Compared methods are listed as follows:

Method | String of `${method_name}`
---|---
Batch BO | `bo`
Successive Halving | `sh`
Hyperband | `hyperband`
BOHB | `bohb`
MFES-HB | `mfeshb`
A-Random | `arandom`
A-BO | `abo`
A-REA | `area`
ASHA | `asha`
A-BOHB | `abohb_aws` (see the Note below)
A-Hyperband | `ahyperband`
Ours | `tuner`
Note: To run A-BOHB (`abohb_aws`), which is implemented in AutoGluon (https://github.com/awslabs/autogluon), please install the corresponding environment and follow the instructions at the end of this document.
Exp settings:
- n_workers=8, rep=10.
- `cifar10-valid`: runtime_limit=86400
- `cifar100`: runtime_limit=172800
- `ImageNet16-120`: runtime_limit=432000
Compared methods: `bo`, `sh`, `hyperband`, `bohb`, `mfeshb`, `arandom`, `area`, `abo`, `asha`, `ahyperband`, `abohb_aws` (see the end of this document), `tuner`.
To conduct the simulation experiment shown in Figure 5, the script is as follows. Please specify `${dataset_name}`, `${runtime_limit}`, and `${method_name}`:
python test/nas_benchmarks/benchmark_nasbench201.py --data_path './NAS-Bench-201-v1_1-096897.pth' --dataset ${dataset_name} --runtime_limit ${runtime_limit} --mths ${method_name} --R 27 --n_workers 8 --rep 10
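For example, to run BOHB on `cifar100` with the runtime limit listed above:

python test/nas_benchmarks/benchmark_nasbench201.py --data_path './NAS-Bench-201-v1_1-096897.pth' --dataset cifar100 --runtime_limit 172800 --mths bohb --R 27 --n_workers 8 --rep 10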
Exp settings:
- n_workers=8, rep=10.
- `covtype` (Covertype): runtime_limit=10800
- `pokerhand` (Pokerhand): runtime_limit=7200
- `hepmass` (Hepmass): runtime_limit=43200
- `HIGGS` (Higgs): runtime_limit=43200
Compared methods: `bo`, `sh`, `hyperband`, `bohb`, `mfeshb`, `arandom`, `abo`, `asha`, `ahyperband`, `abohb_aws` (see the end of this document), `tuner`.
To conduct the experiment shown in Figure 7, the script is as follows. Please specify `${dataset_name}`, `${runtime_limit}`, and `${method_name}`:
python test/benchmark_xgb.py --datasets ${dataset_name} --runtime_limit ${runtime_limit} --mth ${method_name} --R 27 --n_workers 8 --rep 10
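For example, to run BOHB on `covtype` with the runtime limit listed above:

python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth bohb --R 27 --n_workers 8 --rep 10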
Please make sure there are enough CPUs on the machine.
Exp settings:
- n_workers=4, rep=10.
- `penn` (Penn Treebank, for LSTM): runtime_limit=172800
- `cifar10` (for ResNet): runtime_limit=172800
Compared methods: `sh`, `hyperband`, `bohb`, `mfeshb`, `asha`, `ahyperband`, `abohb_aws` (see the end of this document), `tuner`.
To conduct the experiment shown in Figure 6(a), the script is as follows:
python test/awd_lstm_lm/benchmark_lstm.py --dataset penn --runtime_limit ${runtime_limit} --mth ${method_name} --R 27 --n_workers 4 --rep 10
To conduct the experiment shown in Figure 6(b), the script is as follows:
python test/resnet/benchmark_resnet.py --dataset cifar10 --runtime_limit ${runtime_limit} --mth ${method_name} --R 27 --n_workers 4 --rep 10
Please specify `${runtime_limit}` and `${method_name}`.
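For example, to run our method (`tuner`) on Penn Treebank with the runtime limit listed above:

python test/awd_lstm_lm/benchmark_lstm.py --dataset penn --runtime_limit 172800 --mth tuner --R 27 --n_workers 4 --rep 10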
Exp settings:
- n_workers=8, rep=10.
- `hartmann` (noised math function): runtime_limit=1080
- `noise_alpha`: 0, 100, 10000 (corresponding to 0, 40, 4000 in our paper)

Compared methods: `asha` (with different initial resources), `abohb_aws` (see the end of this document), `tuner`.
To conduct the simulation experiment shown in Figure 8, the scripts are as follows. Please specify `${noise_alpha}`:

- Run `asha` with different initial resources (e.g., `--R 9` means the initial resource is 1/9):
python test/math_benchmarks/benchmark_math.py --dataset hartmann --noise_alpha ${noise_alpha} --runtime_limit 1080 --mths asha --R 1 --n_workers 8 --rep 10
python test/math_benchmarks/benchmark_math.py --dataset hartmann --noise_alpha ${noise_alpha} --runtime_limit 1080 --mths asha --R 3 --n_workers 8 --rep 10
python test/math_benchmarks/benchmark_math.py --dataset hartmann --noise_alpha ${noise_alpha} --runtime_limit 1080 --mths asha --R 9 --n_workers 8 --rep 10
python test/math_benchmarks/benchmark_math.py --dataset hartmann --noise_alpha ${noise_alpha} --runtime_limit 1080 --mths asha --R 27 --n_workers 8 --rep 10
- Run `tuner`:
python test/math_benchmarks/benchmark_math.py --dataset hartmann --noise_alpha ${noise_alpha} --runtime_limit 1080 --mths tuner --R 27 --n_workers 8 --rep 10
Exp settings:
- rep=10.
- `n_workers`: 1, 2, 4, 8, 16, 32, 64 (128, 256 for Counting Ones)

Compared method: `tuner` (with different `n_workers`).
To conduct the experiment shown in Figure 10, the scripts are as follows. Please specify `${n_workers}`:
- NAS-Bench-201 on cifar100: runtime_limit=172800
python test/nas_benchmarks/benchmark_nasbench201.py --data_path './NAS-Bench-201-v1_1-096897.pth' --dataset cifar100 --runtime_limit 172800 --mths tuner --R 27 --n_workers ${n_workers} --rep 10
- Counting Ones function on 32+32 dimensions: runtime_limit=5400
python test/math_benchmarks/benchmark_math.py --dataset counting-32-32 --runtime_limit 5400 --mths tuner --noise 0 --R 27 --n_workers ${n_workers} --rep 10
- XGBoost on Covertype: runtime_limit=10800
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth tuner --R 27 --n_workers ${n_workers} --rep 10
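For example, the XGBoost run with `n_workers=4` would be:

python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth tuner --R 27 --n_workers 4 --rep 10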
Note: if you do not have enough CPUs on one machine to run the experiment with `n_workers=16` (which requires 16*16 CPUs), you can run it on multiple machines using the following commands:
- First, start the master node with some local workers (e.g., 8 local workers when 16 workers are needed in total).
python test/benchmark_xgb.py --n_jobs 16 --datasets covtype --runtime_limit 10800 --mth tuner --R 27 --n_workers 16 --max_local_workers 8 --port 13579 --rep 1 --start_id 0
- Then, start the worker nodes with the remaining workers (e.g., 1 worker node with 8 workers). Please specify the IP and port of the master node.
python test/benchmark_xgb_worker.py --n_jobs 16 --parallel async --dataset covtype --R 27 --n_workers 8 --ip ${master_ip} --port 13579
- In this example, the experiment is conducted only once. Please specify `--start_id` to run the experiment multiple times with different random seeds, as sketched below.
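For instance, ten repetitions in this distributed setting could be driven by looping the master command over seeds (a sketch only; each iteration also requires restarting the worker nodes as above):

```
# Sketch: one distributed run per random seed via --start_id (restart workers each iteration).
for i in $(seq 0 9); do
  python test/benchmark_xgb.py --n_jobs 16 --datasets covtype --runtime_limit 10800 \
    --mth tuner --R 27 --n_workers 16 --max_local_workers 8 --port 13579 \
    --rep 1 --start_id $i
done
```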
In the ablation study, the compared methods are as follows:

Method | String of `${method_name}`
---|---
A-Hyperband with bracket selection | `ahyperband_bs`
A-BOHB* (our implementation) | `abohb`
A-BOHB* with bracket selection | `abohb_bs`
Ours without bracket selection | `tuner_exp1`
Delayed ASHA | `asha_delayed`
A-Hyperband with delayed ASHA | `ahyperband_delayed`
Ours with original ASHA | `tuner_exp2`
To conduct the simulation experiment shown in Figure 9(a), the script is as follows.
python test/nas_benchmarks/benchmark_nasbench201.py --data_path './NAS-Bench-201-v1_1-096897.pth' --dataset cifar10-valid --runtime_limit 86400 --mths ahyperband,ahyperband_bs,abohb,abohb_bs,tuner_exp1,tuner --R 27 --n_workers 8 --rep 10
To conduct the simulation experiment shown in Figure 9(b), the script is as follows.
python test/nas_benchmarks/benchmark_nasbench201.py --data_path './NAS-Bench-201-v1_1-096897.pth' --dataset ImageNet16-120 --runtime_limit 432000 --mths ahyperband,ahyperband_bs,abohb,abohb_bs,tuner_exp1,tuner --R 27 --n_workers 8 --rep 10
To conduct the experiment shown in Figure 9(c), the scripts are as follows.
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth asha --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth asha_delayed --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth ahyperband --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth ahyperband_delayed --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth tuner_exp2 --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets covtype --runtime_limit 10800 --mth tuner --R 27 --n_workers 8 --rep 10
To conduct the experiment shown in Figure 9(d), the scripts are as follows.
python test/benchmark_xgb.py --datasets pokerhand --runtime_limit 7200 --mth asha --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets pokerhand --runtime_limit 7200 --mth asha_delayed --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets pokerhand --runtime_limit 7200 --mth ahyperband --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets pokerhand --runtime_limit 7200 --mth ahyperband_delayed --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets pokerhand --runtime_limit 7200 --mth tuner_exp2 --R 27 --n_workers 8 --rep 10
python test/benchmark_xgb.py --datasets pokerhand --runtime_limit 7200 --mth tuner --R 27 --n_workers 8 --rep 10
To run the baseline method A-BOHB (`abohb_aws`) in the experiments, please install AutoGluon (https://github.com/awslabs/autogluon). We provide scripts in `test/autogluon_abohb/`; usage is as follows.
Note: AutoGluon uses `--num_cpus` and `--num_gpus` to infer the number of workers; `--n_workers` is only used for method naming. Please set `--num_cpus` and `--num_gpus` appropriately for your machine to limit the number of workers.
Note: Please specify `--start_id` to run the experiment multiple times with different random seeds.
- Run NAS-Bench-201. Please specify `${runtime_limit}` and `${dataset}`:
python test/autogluon_abohb/benchmark_autogluon_abohb_nasbench201.py --R 27 --reduction_factor 3 --brackets 4 --num_cpus 16 --n_workers 8 --timeout ${runtime_limit} --dataset ${dataset} --rep 1 --start_id 0
- Run XGBoost. Please specify `${runtime_limit}` and `${dataset}`:
python test/autogluon_abohb/benchmark_autogluon_abohb_xgb.py --R 27 --reduction_factor 3 --brackets 4 --num_cpus 16 --n_workers 8 --n_jobs 16 --timeout ${runtime_limit} --dataset ${dataset} --rep 1 --start_id 0
- Run LSTM:
python test/autogluon_abohb/benchmark_autogluon_abohb_lstm.py --R 27 --reduction_factor 3 --brackets 4 --num_gpus 1 --n_workers 4 --timeout 172800 --dataset penn --rep 1 --start_id 0
- Run ResNet:
python test/autogluon_abohb/benchmark_autogluon_abohb_resnet.py --R 27 --reduction_factor 3 --brackets 4 --num_gpus 1 --n_workers 4 --timeout 172800 --dataset cifar10 --rep 1 --start_id 0
- Run noised Hartmann. Please specify `${noise_alpha}`:
python test/autogluon_abohb/benchmark_autogluon_abohb_math.py --R 27 --reduction_factor 3 --brackets 4 --num_cpus 16 --n_workers 8 --timeout 1080 --dataset hartmann --noise_alpha ${noise_alpha} --rep 1 --start_id 0
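To cover the three noise levels from the experiment settings above (`noise_alpha` in {0, 100, 10000}), the runs can be scripted in a simple loop, e.g.:

```
# Run the AutoGluon A-BOHB Hartmann benchmark once per noise level.
for alpha in 0 100 10000; do
  python test/autogluon_abohb/benchmark_autogluon_abohb_math.py --R 27 --reduction_factor 3 \
    --brackets 4 --num_cpus 16 --n_workers 8 --timeout 1080 --dataset hartmann \
    --noise_alpha $alpha --rep 1 --start_id 0
done
```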