This is the official implementation for the paper Modeling Label Space Interactions in Multi-label Classification using Box Embeddings.
Install the dependencies:
pip install -r requirements.txt
Then execute download_data.sh to download the data.
Since there are 12 datasets, 8 models, and 10 runs with different random seeds for each dataset-model pair (960 runs in total), we recommend using the Weights and Biases (wandb) server to log all the metrics.
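As a quick check on the size of the experiment grid, the sketch below enumerates every (dataset, model, seed) combination. The dataset and model names are placeholders invented for illustration; the real identifiers are defined in the repository.

```python
from itertools import product

# Placeholder identifiers (assumptions, not the repository's real names).
datasets = [f"dataset_{i}" for i in range(12)]
models = [f"model_{i}" for i in range(8)]
seeds = range(10)

# One run per (dataset, model, seed) triple.
runs = list(product(datasets, models, seeds))
print(len(runs))  # 960
```

At this scale, keeping track of results by hand is impractical, which is why a central logging server is recommended.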
- Create a wandb account. It is free!
- Create a new wandb project.
- Login to wandb on your machine:
wandb login
- Edit run.sh to use your wandb username and project name in wandb_entity and wandb_project, respectively.
- Execute run.sh for different dataset and model settings.
- Query the Weights and Biases server for the results. See the section below describing the methods to query the results.
See the official project page for the runs used to report the results in the paper.
You can tune your own hyper-parameters for any model using wandb sweeps.
- Create a model config file (jsonnet). See the model_configs directory for examples.
- Create a sweep config. See the example_sweep_configs folder for example sweep configs, and the wandb docs for further details.
- Create a sweep by executing:
wandb sweep path-to-sweep-config.yaml
- Start agents for the sweep by executing:
wandb agent <USERNAME/PROJECTNAME/SWEEPID>
- Check the progress using the dashboard at https://wandb.ai/USERNAME/PROJECT/sweeps/SWEEPID
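To give a rough idea of what a sweep config contains, here is a minimal sketch written as a Python dictionary. The top-level keys (method, metric, parameters) follow the wandb sweep configuration format; the metric name and the hyper-parameter names are hypothetical and are not taken from this repository. The files in example_sweep_configs remain the reference.

```python
import json

# Hypothetical sweep definition: metric and parameter names are assumptions.
sweep_config = {
    "method": "random",  # wandb also supports "grid" and "bayes"
    "metric": {"name": "validation_MAP", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"values": [1e-4, 1e-3, 1e-2]},
        "box_dim": {"values": [64, 128, 256]},
    },
}

# Print the structure; the same keys would appear in a YAML sweep file.
print(json.dumps(sweep_config, indent=2))
```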
There are three ways to see the results.
- Use the command line utility wandb-utils, installed using:
pip install wandb-utils==0.1.2
Once installed, execute the following command:
wandb-utils -e box-mlc -p box-mlc-iclr-2022 all-data \
filter-df --pd-eval "test_CMAP=rmax(df.test_MAP_max_n, df.test_MAP_min_n)" \
filter-df --pd-eval "_model=df.tags.str.extract(r'model@([^\|]+)',expand=False)" \
filter-df --pd-eval "_dataset=df.tags.str.extract(r'dataset@([^\|]+)',expand=False)" \
filter-df --pd-eval "df.groupby(['_model', '_dataset'], as_index=False).mean()" \
filter-df -f test_MAP -f test_CMAP -f test_constraint_violation -f _model -f _dataset \
print
To use this for your own project, replace box-mlc and box-mlc-iclr-2022 with your own username and project name, respectively.
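The tag extraction and per-group averaging in the command above can be illustrated with a small standard-library sketch. It assumes each run carries a single tags string containing model@NAME and dataset@NAME entries separated by |, as the regular expressions in the command suggest. The run records, the helper names tag_value and mean_by_model_and_dataset, and the metric values are made up for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Made-up run records; real ones are fetched from the wandb server.
example_runs = [
    {"tags": "model@model_a|dataset@dataset_a", "test_MAP": 0.5},
    {"tags": "dataset@dataset_a|model@model_a", "test_MAP": 1.0},
    {"tags": "model@model_b|dataset@dataset_a", "test_MAP": 0.25},
]


def tag_value(tags, key):
    """Return VALUE from a 'key@VALUE' tag, or None if the tag is absent."""
    match = re.search(rf"{key}@([^|]+)", tags)
    return match.group(1) if match else None


def mean_by_model_and_dataset(runs, metric):
    """Average `metric` over all runs sharing the same (model, dataset) pair."""
    groups = defaultdict(list)
    for run in runs:
        key = (tag_value(run["tags"], "model"), tag_value(run["tags"], "dataset"))
        groups[key].append(run[metric])
    return {key: mean(values) for key, values in groups.items()}


averages = mean_by_model_and_dataset(example_runs, "test_MAP")
for (model, dataset), value in sorted(averages.items()):
    print(model, dataset, value)
```

Averaging over the runs of each dataset-model pair collapses the different random seeds into one number per pair.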
There is a bug in sklearn that makes mean_average_precision return NaN when there are no true positives. Apply the one-line fix mentioned in this PR.
This fix has not been merged yet, so it has to be patched manually. Even after the fix, if the dataset has many instances with no true labels, the resulting warning is shown repeatedly. To show it only once, you can do the following, based on the Python docs:
export PYTHONWARNINGS=once:::sklearn.metrics[.*]
@inproceedings{
patel2022modeling,
title={Modeling Label Space Interactions in Multi-label Classification using Box Embeddings},
author={Dhruvesh Patel and Pavitra Dangati and Jay-Yoon Lee and Michael Boratko and Andrew McCallum},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=tyTH9kOxcvh}
}