Modeling label space interactions using box embeddings

This is the official implementation for the paper "Modeling Label Space Interactions in Multi-label Classification using Box Embeddings" (ICLR 2022).

Setup

Install the requirements

pip install -r requirements.txt

Download data

Execute download_data.sh.

Weights and Biases account

Since there are 12 datasets, 8 models, and 10 runs with different random seeds for each dataset-model pair (960 runs in total), we recommend using the Weights and Biases server to log all the metrics.

Create a wandb account. It is free!
Create a new wandb project.
Login to wandb on your machine:

wandb login

Reproducing the metrics reported in the paper

Edit run.sh to use your wandb username and project name in wandb_entity and wandb_project, respectively.
Execute run.sh for different dataset and model settings.
Query the Weights and Biases server for the results. See the section below describing the methods to query the results.

See the official project page for the runs used to report the results in the paper.

Tuning your own hyper-parameters

You can tune the hyper-parameters of any model using wandb sweeps.

Create a model config file (jsonnet). See model_configs directory for examples.
Create a sweep config. See example_sweep_configs folder for example sweep configs, and wandb docs for further details.
Create a sweep by executing wandb sweep path-to-sweep-config.yaml.
Start agents for the sweep by executing wandb agent <USERNAME/PROJECTNAME/SWEEPID>.
Check the progress using the dashboard at https://wandb.ai/USERNAME/PROJECT/sweeps/SWEEPID.
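The sweep steps above can also be driven from Python. The following is a minimal sketch, not the configuration used for the paper: the program name `train.py`, the metric name `validation_MAP`, and the parameter names `lr` and `dropout` are hypothetical placeholders, so replace them with the values from your own model config and the files in example_sweep_configs.

```python
from typing import Any, Dict


def build_sweep_config(program: str) -> Dict[str, Any]:
    """Return a wandb sweep config as a plain dict.

    Every name below (metric and parameters) is a hypothetical placeholder.
    """
    return {
        "method": "random",  # random search over the parameter space
        "metric": {"name": "validation_MAP", "goal": "maximize"},
        "program": program,
        "parameters": {
            "lr": {"distribution": "uniform", "min": 0.0001, "max": 0.01},
            "dropout": {"values": [0.1, 0.3, 0.5]},
        },
    }


def create_sweep(entity: str, project: str, program: str) -> str:
    """Register the sweep on the wandb server and return its sweep id."""
    import wandb  # imported lazily so the config can be built without wandb

    return wandb.sweep(build_sweep_config(program), entity=entity, project=project)


if __name__ == "__main__":
    print(build_sweep_config("train.py"))
```

The id returned by `create_sweep` plays the role of SWEEPID in the `wandb agent` command above.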

Querying the results

There are three ways to see the results:

Use the command line utility called wandb-utils installed using pip install wandb-utils==0.1.2. Once installed, execute the following command:

wandb-utils -e box-mlc -p box-mlc-iclr-2022 all-data \
filter-df --pd-eval "test_CMAP=rmax(df.test_MAP_max_n, df.test_MAP_min_n)" \
filter-df --pd-eval "_model=df.tags.str.extract(r'model@([^\|]+)',expand=False)" \
filter-df --pd-eval "_dataset=df.tags.str.extract(r'dataset@([^\|]+)',expand=False)" \
filter-df --pd-eval "df.groupby(['_model', '_dataset'], as_index=False).mean()" \
filter-df -f test_MAP -f test_CMAP -f test_constraint_violation  -f _model -f _dataset \
print

To use this for your own project, replace box-mlc and box-mlc-iclr-2022 with your own username and project name.

Use wandb's Python API directly.
Use the wandb dashboard.
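For the second option, a rough Python equivalent of the wandb-utils command above might look like the sketch below. It rests on several assumptions: that each run carries tags of the form `model@NAME` and `dataset@NAME` (inferred from the regular expressions in the command), that `rmax` means a row-wise maximum, and that the metrics live in each run's summary. The helper names `tag_value`, `summarize`, and `fetch_records` are made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Mapping, Optional, Tuple

METRICS = ("test_MAP", "test_CMAP", "test_constraint_violation")
RAW_KEYS = ("test_MAP", "test_MAP_max_n", "test_MAP_min_n", "test_constraint_violation")


def tag_value(tags: Iterable[str], prefix: str) -> Optional[str]:
    """Return NAME from the first tag shaped like 'prefix@NAME', else None."""
    for tag in tags:
        if tag.startswith(prefix + "@"):
            return tag.split("@", 1)[1]
    return None


def summarize(
    records: Iterable[Tuple[List[str], Mapping[str, object]]]
) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Average each metric over all runs sharing a (model, dataset) pair."""
    grouped = defaultdict(lambda: defaultdict(list))
    for tags, summary in records:
        model, dataset = tag_value(tags, "model"), tag_value(tags, "dataset")
        if model is None or dataset is None:
            continue  # skip runs without both tags
        values = dict(summary)
        # Assumption: test_CMAP is the larger of the two MAP variants.
        parts = [values.get("test_MAP_max_n"), values.get("test_MAP_min_n")]
        parts = [p for p in parts if isinstance(p, (int, float))]
        if parts:
            values["test_CMAP"] = max(parts)
        for name in METRICS:
            value = values.get(name)
            if isinstance(value, (int, float)):
                grouped[(model, dataset)][name].append(value)
    return {
        key: {name: mean(vals) for name, vals in metrics.items()}
        for key, metrics in grouped.items()
    }


def fetch_records(entity: str, project: str) -> Iterator[Tuple[List[str], Dict[str, object]]]:
    """Yield (tags, summary values) for every run in a wandb project."""
    import wandb  # imported lazily; requires `wandb login` beforehand

    for run in wandb.Api().runs(f"{entity}/{project}"):
        yield list(run.tags), {key: run.summary.get(key) for key in RAW_KEYS}
```

Calling `summarize(fetch_records("box-mlc", "box-mlc-iclr-2022"))` would then return one averaged row per model-dataset pair, provided the tag and metric names match the assumptions above.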

NaN issue in MAP

There is a bug in sklearn that makes mean_average_precision return NaN when there are no true positives. Apply the one-line fix mentioned in this PR. The fix has not been merged yet, so it has to be patched manually. Even after the fix, if the dataset has many instances with no true labels, you might want to stop the same warning from being shown repeatedly. Based on the Python docs, you can do this as follows:

export PYTHONWARNINGS=once:::sklearn.metrics[.*]

Cite

@inproceedings{
patel2022modeling,
title={Modeling Label Space Interactions in Multi-label Classification using Box Embeddings},
author={Dhruvesh Patel and Pavitra Dangati and Jay-Yoon Lee and Michael Boratko and Andrew McCallum},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=tyTH9kOxcvh}
}

