-
OpenScene: 3D Scene Understanding with Open Vocabularies (CVPR 2023) [Paper]
-
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation (CVPR 2023) [Paper]
-
Learning to Generate Language-supervised and Open-vocabulary Scene Graph using Pre-trained Visual-Semantic Space (CVPR 2023) [Paper]
-
Side Adapter Network for Open-Vocabulary Semantic Segmentation (CVPR 2023) [Paper]
-
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models (CVPR 2023) [Paper]
-
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations (CVPR 2023) [Paper]
-
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP (CVPR 2023) [Paper]
-
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers (CVPR 2023) [Paper]
-
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection (CVPR 2023) [Paper]
-
Aligning Bag of Regions for Open-Vocabulary Object Detection (CVPR 2023) [Paper]
-
Open-set Fine-grained Retrieval via Prompting Vision-Language Evaluator (CVPR 2023) [Paper]
-
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning (CVPR 2023) [Paper]
-
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation (CVPR 2023) [Paper]
-
GLIGEN: Open-Set Grounded Text-to-Image Generation (CVPR 2023) [Paper]
-
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment (CVPR 2023) [Paper]
-
OvarNet: Towards Open-vocabulary Object Attribute Recognition (CVPR 2023) [Paper]
-
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding (CVPR 2023) [Paper]
-
Open-vocabulary Attribute Detection (CVPR 2023) [Paper]
-
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision (CVPR 2023) [Paper]
-
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs (CVPR 2023) [Paper]
-
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023) [Paper]
-
OVTrack: Open-Vocabulary Multiple Object Tracking (CVPR 2023) [Paper]
-
Learning to Detect and Segment for Open Vocabulary Object Detection (CVPR 2023) [Paper]
- Open-vocabulary Object Detection via Vision and Language Knowledge Distillation (ICLR 2023) [Paper] [Code]
Datasets: LVIS, PASCAL VOC, COCO, Objects365
Task: Object Detection
-
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation (ICCV 2023) [Paper]
-
Open-vocabulary Panoptic Segmentation with Embedding Modulation (ICCV 2023) [Paper]
-
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation (ICML 2023) [Paper]
-
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (ICML 2023) [Paper]
-
Open-Vocabulary Universal Image Segmentation with MaskCLIP (ICML 2023) [Paper]
-
Multi-Modal Classifiers for Open-Vocabulary Object Detection (ICML 2023) [Paper]
- Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models (CVPRw 2023) [Paper]
-
A Language-Guided Benchmark for Weakly Supervised Open Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
-
Aligning Bag of Regions for Open-Vocabulary Object Detection (Arxiv 2023) [Paper]
-
From Occlusion to Insight: Object Search in Semantic Shelves using Large Language Models (Arxiv 2023) [Paper]
-
Side Adapter Network for Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
-
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets (Arxiv 2023) [Paper]
-
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation (CVPR 2022) [Paper] [Code]
Datasets: MS COCO
Task: Object Detection
-
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling (CVPR 2022) [Paper] [Code]
Datasets: MS-COCO, Open Images, Conceptual Captions
Task: Instance segmentation
-
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model (CVPR 2022) [Paper] [Code]
Datasets: LVIS v1, Pascal VOC Dataset, COCO, Objects365 Dataset
Task: Object detection and instance segmentation
-
NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge (CVPR 2022) [Paper]
Datasets: COCO, Nocaps
Task: Novel Object Captioning
-
Patching open-vocabulary models by interpolating weights (NeurIPS 2022) [Paper] [Code]
Datasets: Cars, DTD, EuroSAT, GTSRB, KITTI, MNIST, RESISC45, SUN397, and SVHN (patching tasks); CIFAR10, CIFAR100, Food101, ImageNet, and STL10 (supported tasks)
Task: Model Patching
-
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection (NeurIPS 2022) [Paper] [Code]
Datasets: COCO, LVIS v1.0, OpenImages, Objects365
Task: Object Detection
-
Paraphrasing Is All You Need for Novel Object Captioning (NeurIPS 2022) [Paper]
Datasets: Open Images V4, COCO Captions 2017
Task: Image Captioning
-
PromptDet: Towards Open-vocabulary Detection using Uncurated Images (ECCV 2022) [Paper] [Code]
Datasets: LVIS, LAION-400M and LAION-Novel, COCO
Task: Object Detection
-
Scaling Open-vocabulary Image Segmentation with Image-level Labels (ECCV 2022) [Paper]
Datasets: COCO, Localized Narratives (Loc. Narr.); test: PASCAL Context, PASCAL VOC, ADE20K
Task: Instance segmentation
-
Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning (ECCV 2022) [Paper]
Datasets: Visual Genome (VG), GQA, Open-Image
Task: Scene Graph Generation
-
Simple Open-Vocabulary Object Detection with Vision Transformers (ECCV 2022) [Paper] [Code]
Datasets: OpenImages V4 (OI), Objects 365 (O365), and/or Visual Genome (VG); evaluation: COCO, LVIS, and O365
Task: Object Detection
-
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels (ECCV 2022) [Paper] [Code]
Datasets: COCO Caption, Visual-Genome, and SBU Caption (Object names: COCO, PASCAL VOC, Objects365 and LVIS)
Task: Object Detection
-
Open-Vocabulary DETR with Conditional Matching (ECCV 2022 Oral) [Paper] [Code]
Datasets: LVIS, COCO
Task: Object Detection
-
Improving Closed and Open-Vocabulary Attribute Prediction using Transformers (ECCV 2022) [Paper] [Code]
Datasets: VAW (closed-set), LSA common, LSA common→rare, HICO
Task: Attribute Prediction
-
A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model (ECCV 2022) [Paper] [Code]
Datasets: COCO Stuff; Pascal VOC 2012; Cityscapes; Pascal Context; ADE20K
Task: Semantic Segmentation
-
A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility (ECCV 2022) [Paper] [Code]
Datasets: MoTIF
Task: Vision-Language Navigation (Apps)
-
Acknowledging the Unknown for Multi-label Learning with Single Positive Labels (ECCV 2022) [Paper] [Code]
Datasets: PASCAL VOC 2012 (VOC), MS-COCO 2014 (COCO), NUS-WIDE (NUS), and CUB-200-2011 (CUB)
Task: Single Positive Multi-label Learning
-
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning (AAAI 2022) [Paper]
Datasets: OVIS40; OVIS1600
Task: Visual Instance Search
-
Open Vocabulary Electroencephalography-to-Text Decoding and Zero-Shot Sentiment Classification (AAAI 2022) [Paper] [Code]
Datasets: ZuCo
Task: Brain Signals Language Decoding
-
From Node To Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection (WACV 2022) [Paper] [Code]
Datasets: MSCOCO
Task: Object Detection
-
Trading-Off Information Modalities in Zero-Shot Classification (WACV 2022) [Paper] [Code]
Datasets: Caltech UCSD Birds 200-2011 (CUB), Animals with Attributes 1 and 2 (AWA1 & AWA2), attribute Pascal & Yahoo (APY), SUN attributes (SUN) and Oxford flowers (FLO)
Task: Image Classification
-
Partially-Supervised Novel Object Captioning Using Context from Paired Data (BMVC 2022) [Paper]
Datasets: MS COCO
Task: Object Captioning
-
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models (BMVC 2022) [Paper] [Code]
Datasets: PASCAL-5i, COCO-20i, FSS-1000, Mosaic-4
Task: Semantic Segmentation
- Describing Sets of Images with Textual-PCA (EMNLP 2022) [Paper] [Code]
Datasets: CelebA; Stanford Cars; COCO-Horses; LSUN-Church
Task: Text Generation for Sets of Images
- Open-Vocabulary Object Detection Using Captions (CVPR 2021) [Paper] [Code]
Datasets: COCO Objects, COCO Captions
Task: Object Detection
-
A Latent Morphology Model for Open-Vocabulary Neural Machine Translation (ICLR 2020 Spotlight) [Paper] [Code]
Datasets: Arabic (AR), Czech (CS) and Turkish (TR)
Task: Neural Machine Translation
-
Open Vocabulary Learning on Source Code with a Graph-Structured Cache (ICML 2019) [Paper]
Datasets: Java source code
Task: Java source code Learning
-
Visual Question Generation for Class Acquisition of Unknown Objects (ECCV 2018) [Paper] [Code]
Datasets: Visual Genome, ILSVRC2012, ILSVRC2010, WordNet
Task: Visual Question Generation, Object Detection
-
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input (ECCV 2018) [Paper] [Code]
Datasets: Places Audio Caption, ADE20k, MSCOCO
Task: Audio-Visual Associative Localizations
-
Image Captioning with Unseen Objects (BMVC 2019) [Paper]
Datasets: COCO
Task: Image Captioning
-
nocaps: novel object captioning at scale (ICCV 2019) [Paper] [Code]
Datasets: nocaps, COCO Captions
Task: Image Captioning
-
Pointing Novel Objects in Image Captioning (CVPR 2019) [Paper]
Datasets: held-out COCO, ImageNet
Task: Image Captioning
-
Learning User Representations for Open Vocabulary Image Hashtag Prediction (CVPR 2020) [Paper]
Datasets: YFCC100M
Task: Image Hashtag Prediction
-
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions (ECCV 2020) [Paper] [Code]
Datasets: BSDS500, Conceptual Captions
Task: Image Manipulation
-
Open-Category Human-Object Interaction Pre-training via Language Modeling Framework (CVPR 2023) [Paper]
-
Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training (CVPR 2023) [Paper]
- The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition (ICLR 2023) [Paper]
Datasets: CIFAR100, LSUN, MiTv2, UCF101, HMDB51
Task: Image and Video Classification
-
Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (Arxiv 2023) [Paper]
-
TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
-
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
-
Segment Everything Everywhere All at Once (Arxiv 2023) [Paper]
-
Towards Open-Vocabulary Video Instance Segmentation (Arxiv 2023) [Paper]
-
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks (Arxiv 2023) [Paper]
-
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition (Arxiv 2023) [Paper]
-
V3Det: Vast Vocabulary Visual Detection Dataset (Arxiv 2023) [Paper]
-
Token Merging for Fast Stable Diffusion (Arxiv 2023) [Paper]
-
Going Beyond Nouns With Vision & Language Models Using Synthetic Data (Arxiv 2023) [Paper]
-
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks (Arxiv 2023) [Paper]
-
ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection (CVPR 2023) [Paper]
-
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection (Arxiv 2023) [Paper]
-
Three ways to improve feature alignment for open vocabulary detection (Arxiv 2023) [Paper]
-
Zero-guidance Segmentation Using Zero Segment Labels (Arxiv 2023) [Paper]
-
Open-Vocabulary Object Detection using Pseudo Caption Labels (Arxiv 2023) [Paper]
-
Uni-Fusion: Universal Continuous Mapping (Arxiv 2023) [Paper]
- Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features (Arxiv 2022) [Paper]