SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection

Mingxuan Liu · Tyler L. Hayes · Elisa Ricci · Gabriela Csurka · Riccardo Volpi

CVPR 2024 ✨Highlight✨

Paper | ArXiv | Code | Poster (coming soon)

Installation

Requirements:

Linux or macOS with Python ≥ 3.8
PyTorch ≥ 1.8.2, installed by following the instructions at pytorch.org. Please check that your PyTorch version matches the one required by Detectron2.
Detectron2: follow Detectron2 installation instructions.
OpenAI API (optional, only needed if you want to construct hierarchies using LLMs)

Setup environment

# Clone this project repository under your workspace folder
git clone https://github.com/naver/shine.git --recurse-submodules
cd shine
# Create conda environment and install the dependencies
conda env create -n shine -f shine.yml
# Activate the working environment
conda activate shine
# Install Detectron2 under your workspace folder
# (Please follow Detectron2 official instructions)
cd ..
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
pip install -e .

Our project uses two submodules, CenterNet2 and Deformable-DETR. If you cloned without --recurse-submodules, run git submodule init and then git submodule update.
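To confirm that the submodules were actually fetched, one option is to look for submodule directories that are still empty. The helper below is a minimal sketch and not part of the SHiNe repository: list_empty_submodules is a hypothetical name, and it assumes the usual .gitmodules layout with one path = ... entry per submodule.

```shell
# Hypothetical helper (not part of the SHiNe repository).
# Prints every submodule path listed in .gitmodules whose directory
# is missing or empty, i.e. a submodule that was never fetched.
list_empty_submodules() {
  repo_root="${1:-.}"
  # Nothing to report if the repository declares no submodules.
  [ -f "$repo_root/.gitmodules" ] || return 0
  # Extract the value of each "path = <dir>" entry.
  sed -n 's/^[[:space:]]*path[[:space:]]*=[[:space:]]*//p' "$repo_root/.gitmodules" |
    while IFS= read -r sub; do
      if [ ! -d "$repo_root/$sub" ] || [ -z "$(ls -A "$repo_root/$sub" 2>/dev/null)" ]; then
        echo "$sub"
      fi
    done
}
```

If list_empty_submodules prints anything when run from the repository root, git submodule update --init --recursive combines the init and update steps (and also covers nested submodules).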

Set your OpenAI API key as an environment variable (optional, only needed if you want to generate hierarchies)

export OPENAI_API_KEY=YOUR_OpenAI_Key
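Before launching any of the LLM-based hierarchy scripts, it can help to confirm that the variable is visible in the current shell. The function below is a small sketch under that assumption; check_openai_key is a hypothetical name, not something the repository provides, and it never prints the key itself.

```shell
# Hypothetical helper (not part of the SHiNe repository).
# Returns 0 if OPENAI_API_KEY is set and non-empty, 1 otherwise.
check_openai_key() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; LLM-based hierarchy generation will not work." >&2
    return 1
  fi
  echo "OPENAI_API_KEY is set."
}
```

A typical use would be check_openai_key && bash <script>, so that the script only starts when the key is available.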

OvOD Models Preparation

SHiNe is training-free, so we only need to download off-the-shelf OvOD models and apply SHiNe on top of them. Download the models and place them (or symlink them via ln -s) under the models folder in this repository as follows:

SHiNe
    └── models
          ├── codet
          │   ├── CoDet_OVLVIS_R5021k_4x_ft4x.pth
          │   └── CoDet_OVLVIS_SwinB_4x_ft4x.pth
          ├── detic
          │   ├── coco_ovod
          │   │   ├── BoxSup_OVCOCO_CLIP_R50_1x.pth
          │   │   ├── Detic_OVCOCO_CLIP_R50_1x_caption.pth
          │   │   ├── Detic_OVCOCO_CLIP_R50_1x_max-size.pth
          │   │   └── Detic_OVCOCO_CLIP_R50_1x_max-size_caption.pth
          │   ├── cross_eval
          │   │   ├── BoxSup-C2_L_CLIP_SwinB_896b32_4x.pth
          │   │   ├── BoxSup-C2_LCOCO_CLIP_SwinB_896b32_4x.pth
          │   │   ├── Detic_LCOCOI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │   │   ├── Detic_LI21k_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │   │   └── Detic_LI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │   ├── lvis_ovod
          │   │   ├── BoxSup-C2_Lbase_CLIP_R5021k_640b64_4x.pth
          │   │   ├── BoxSup-C2_Lbase_CLIP_SwinB_896b32_4x.pth
          │   │   ├── Detic_LbaseCCcapimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          │   │   ├── Detic_LbaseCCimg_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          │   │   ├── Detic_LbaseI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          │   │   └── Detic_LbaseI_CLIP_SwinB_896b32_4x_ft4x_max-size.pth
          │   └── lvis_std
          │       ├── BoxSup-C2_L_CLIP_R5021k_640b64_4x.pth
          │       ├── BoxSup-DeformDETR_L_R50_4x.pth
          │       ├── Detic_DeformDETR_LI_R50_4x_ft4x.pth
          │       └── Detic_LI_CLIP_R5021k_640b64_4x_ft4x_max-size.pth
          └── vldet
              ├── lvis_base.pth
              ├── lvis_base_swinB.pth
              ├── lvis_vldet.pth
              └── lvis_vldet_swinB.pth
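Since a misplaced checkpoint usually only surfaces when an evaluation script starts, a quick check of the layout can save time. The sketch below is an assumption-laden illustration rather than repository tooling: check_checkpoints is a hypothetical helper, and you pass it whichever relative paths from the tree above your experiments need.

```shell
# Hypothetical helper (not part of the SHiNe repository).
# Usage: check_checkpoints MODELS_ROOT REL_PATH...
# Prints each checkpoint that is missing under MODELS_ROOT and
# returns non-zero if at least one is missing.
check_checkpoints() {
  root="$1"
  shift
  missing=0
  for rel in "$@"; do
    # -e follows symlinks, so soft-linked checkpoints count as present.
    if [ ! -e "$root/$rel" ]; then
      echo "missing: $root/$rel"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, check_checkpoints models codet/CoDet_OVLVIS_SwinB_4x_ft4x.pth vldet/lvis_vldet_swinB.pth run from the repository root reports any of the two files that is not in place.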

Datasets Preparation

You can download the datasets:

iNat: 17.8 GB
FSOD: 14.7 GB
ImageNet-1k Val: 6.2 GB
COCO: 37.2 GB
LVIS: 1.8 GB

Place them (or symlink them via ln -s) under the datasets folder in this repository as follows:

SHiNe
    └── datasets
          ├── inat
          ├── fsod
          ├── imagenet2012
          ├── coco
          └── lvis
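If the datasets already live elsewhere on disk, symlinking avoids copying tens of gigabytes. The following is a minimal sketch of that step, assuming the data was downloaded to a directory of your choice; link_dataset is a hypothetical helper and the source location is whatever path you used.

```shell
# Hypothetical helper (not part of the SHiNe repository).
# Usage: link_dataset SOURCE_DIR NAME [DATASETS_ROOT]
# Creates DATASETS_ROOT/NAME as a symlink to SOURCE_DIR.
# SOURCE_DIR should be an absolute path so the link resolves from anywhere.
link_dataset() {
  src="$1"
  name="$2"
  root="${3:-datasets}"
  if [ ! -d "$src" ]; then
    echo "source directory not found: $src" >&2
    return 1
  fi
  # Refuse to touch a real (non-symlink) file or directory at the target.
  if [ -e "$root/$name" ] && [ ! -L "$root/$name" ]; then
    echo "refusing to replace existing path: $root/$name" >&2
    return 1
  fi
  mkdir -p "$root"
  # -s: symbolic link; -f: replace an existing link; -n: do not descend into it.
  ln -sfn "$src" "$root/$name"
}
```

Calling link_dataset /absolute/path/to/inat inat from the repository root would then produce datasets/inat, matching the layout above.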

Run SHiNe on OvOD

Example of applying SHiNe to Detic for the OvOD task on the iNat dataset:

# Vanilla OvOD (baseline)
bash scripts_local/Detic/inat/swin/baseline/inat_detic_SwinB_LVIS-IN-21K-COCO_baseline.sh
 
# SHiNe using dataset-provided hierarchy
bash scripts_local/Detic/inat/swin/shine_gt/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_gt.sh

# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Detic/inat/swin/shine_llm/inat_detic_SwinB_LVIS-IN-21K-COCO_shine_llm.sh
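The three commands above share a naming pattern, so they can be run back to back for a side-by-side comparison. The sketch below assumes that pattern, which is inferred only from these three examples and may not hold for other models or datasets; ovod_script and run_ovod_variants are hypothetical helpers, not part of the repository.

```shell
# Hypothetical helpers (not part of the SHiNe repository).
# Builds the path of the Detic/iNat/Swin-B script for one variant.
# The pattern is inferred from the three example commands only.
ovod_script() {
  variant="$1"  # baseline, shine_gt, or shine_llm
  echo "scripts_local/Detic/inat/swin/${variant}/inat_detic_SwinB_LVIS-IN-21K-COCO_${variant}.sh"
}

# Runs the three variants in order, stopping at the first script
# that does not exist or that exits with an error.
run_ovod_variants() {
  for variant in baseline shine_gt shine_llm; do
    script="$(ovod_script "$variant")"
    if [ ! -f "$script" ]; then
      echo "script not found: $script" >&2
      return 1
    fi
    echo "running: $script"
    bash "$script" || return 1
  done
}
```

Running run_ovod_variants from the repository root executes the baseline first, then the two SHiNe variants.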

Run SHiNe on Zero-shot classification

Example of applying SHiNe to the CLIP zero-shot transfer task on the ImageNet-1k dataset:

# Vanilla CLIP Zero-shot transfer (baseline)
bash scripts_local/Classification/imagenet1k/baseline/imagenet1k_vitL14_baseline.sh

# SHiNe using WordNet hierarchy
bash scripts_local/Classification/imagenet1k/shine_wordnet/imagenet1k_vitL14_shine_wordnet.sh

# SHiNe using LLM-generated synthetic hierarchy
bash scripts_local/Classification/imagenet1k/shine_llm/imagenet1k_vitL14_shine_llm.sh

SHiNe Construction (optional)

Example of constructing the SHiNe classifier for the OvOD task on the iNat dataset:

# SHiNe using dataset-provided hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_gt.sh
# SHiNe using LLM-generated synthetic hierarchy
bash scripts_build_nexus/inat/build_inat_nexus_llm.sh

Hierarchy Tree Planting (optional)

Examples of building hierarchy trees using either dataset-provided or LLM-generated hierarchy entities.

Dataset-provided Hierarchy

# Build hierarchy tree for iNat using dataset-provided hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_gt.sh

# Build hierarchy tree for ImageNet-1k using WordNet hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_wordnet.sh

LLM-generated Hierarchy

# Build hierarchy tree for iNat using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/inat/plant_inat_tree_llm.sh

# Build hierarchy tree for ImageNet-1k using LLM-generated synthetic hierarchy
bash scripts_plant_hrchy/imagenet1k/plant_imagenet1k_tree_llm.sh

License

This project is licensed under the terms given in the LICENSE file.

Citation

If you find our work useful for your research, please cite our paper using the following BibTeX entry:

@inproceedings{liu2024shine,
  title={{SH}i{N}e: Semantic Hierarchy Nexus for Open-vocabulary Object Detection},
  author={Liu, Mingxuan and Hayes, Tyler L. and Ricci, Elisa and Csurka, Gabriela and Volpi, Riccardo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024},
}

Acknowledgment

SHiNe is built upon the awesome works iNat, FSOD, BREEDS, Hierarchy-CLIP, Detic, VLDet, and CoDet. We sincerely thank them for their work and contributions.
