Contents

Open Vocabulary

2023 Papers

CVPR

  • OpenScene: 3D Scene Understanding with Open Vocabularies (CVPR 2023) [Paper]

  • Open-Vocabulary Point-Cloud Object Detection without 3D Annotation (CVPR 2023) [Paper]

  • Learning to Generate Language-supervised and Open-vocabulary Scene Graph using Pre-trained Visual-Semantic Space (CVPR 2023) [Paper]

  • Side Adapter Network for Open-Vocabulary Semantic Segmentation (CVPR 2023) [Paper]

  • Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models (CVPR 2023) [Paper]

  • Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations (CVPR 2023) [Paper]

  • Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP (CVPR 2023) [Paper]

  • Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers (CVPR 2023) [Paper]

  • Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection (CVPR 2023) [Paper]

  • Aligning Bag of Regions for Open-Vocabulary Object Detection (CVPR 2023) [Paper]

  • Open-set Fine-grained Retrieval via Prompting Vision-Language Evaluator (CVPR 2023) [Paper]

  • Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning (CVPR 2023) [Paper]

  • FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation (CVPR 2023) [Paper]

  • GLIGEN: Open-Set Grounded Text-to-Image Generation (CVPR 2023) [Paper]

  • DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment (CVPR 2023) [Paper]

  • OvarNet: Towards Open-vocabulary Object Attribute Recognition (CVPR 2023) [Paper]

  • PLA: Language-Driven Open-Vocabulary 3D Scene Understanding (CVPR 2023) [Paper]

  • Open-vocabulary Attribute Detection (CVPR 2023) [Paper]

  • Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision (CVPR 2023) [Paper]

  • Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs (CVPR 2023) [Paper]

  • CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023) [Paper]

  • OVTrack: Open-Vocabulary Multiple Object Tracking (CVPR 2023) [Paper]

  • Learning to Detect and Segment for Open Vocabulary Object Detection (CVPR 2023) [Paper]


ICLR

  • Open-vocabulary Object Detection via Vision and Language Knowledge Distillation (ICLR 2023) [Paper] [Code]
    Datasets: LVIS, PASCAL VOC, COCO, Objects365
    Task: Object Detection

ICCV

  • Global Knowledge Calibration for Fast Open-Vocabulary Segmentation (ICCV 2023) [Paper]

  • Open-vocabulary Panoptic Segmentation with Embedding Modulation (ICCV 2023) [Paper]

ICML

  • SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation (ICML 2023) [Paper]

  • Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (ICML 2023) [Paper]

  • Open-Vocabulary Universal Image Segmentation with MaskCLIP (ICML 2023) [Paper]

  • Multi-Modal Classifiers for Open-Vocabulary Object Detection (ICML 2023) [Paper]

CVPRw

  • Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models (CVPRw 2023) [Paper]

Arxiv & Others

  • A Language-Guided Benchmark for Weakly Supervised Open Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]

  • Aligning Bag of Regions for Open-Vocabulary Object Detection (Arxiv 2023) [Paper]

  • From Occlusion to Insight: Object Search in Semantic Shelves using Large Language Models (Arxiv 2023) [Paper]

  • Side Adapter Network for Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]

  • CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets (Arxiv 2023) [Paper]

2022 Papers

CVPR

  • Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation (CVPR 2022) [Paper] [Code]
    Datasets: MS COCO
    Task: Object Detection

  • Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling (CVPR 2022) [Paper] [Code]
    Datasets: MS-COCO, Open Images, Conceptual Captions
    Task: Instance segmentation

  • Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model (CVPR 2022) [Paper] [Code]
    Datasets: LVIS v1, Pascal VOC Dataset, COCO, Objects365 Dataset
    Task: Object detection and instance segmentation

  • NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge (CVPR 2022) [Paper]
    Datasets: COCO, Nocaps
    Task: Novel Object Captioning

NeurIPS

  • Patching open-vocabulary models by interpolating weights (NeurIPS 2022) [Paper] [Code]
    Datasets: Patching tasks: Cars, DTD, EuroSAT, GTSRB, KITTI, MNIST, RESISC45, SUN397, SVHN. Supported tasks: CIFAR10, CIFAR100, Food101, ImageNet, STL10
    Task: Model Patching

  • Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection (NeurIPS 2022) [Paper] [Code]
    Datasets: COCO, LVIS v1.0, OpenImages, Objects365
    Task: Object Detection

  • Paraphrasing Is All You Need for Novel Object Captioning (NeurIPS 2022) [Paper]
    Datasets: Open Images V4, COCO Captions 2017
    Task: Image Captioning

ECCV

  • PromptDet: Towards Open-vocabulary Detection using Uncurated Images (ECCV 2022) [Paper] [Code]
    Datasets: LVIS, LAION-400M and LAION-Novel, COCO
    Task: Object Detection

  • Scaling Open-vocabulary Image Segmentation with Image-level Labels (ECCV 2022) [Paper]
    Datasets: COCO, Localized Narratives (Loc. Narr.) - Evaluation: PASCAL Context, PASCAL VOC, ADE20k
    Task: Instance segmentation

  • Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning (ECCV 2022) [Paper]
    Datasets: Visual Genome (VG), GQA, Open Images
    Task: Scene Graph Generation

  • Simple Open-Vocabulary Object Detection with Vision Transformers (ECCV 2022) [Paper] [Code]
    Datasets: OpenImages V4 (OI), Objects 365 (O365), and/or Visual Genome (VG) - Evaluation: COCO, LVIS, and O365
    Task: Object Detection

  • Open Vocabulary Object Detection with Pseudo Bounding-Box Labels (ECCV 2022) [Paper] [Code]
    Datasets: COCO Caption, Visual-Genome, and SBU Caption (Object names: COCO, PASCAL VOC, Objects365 and LVIS)
    Task: Object Detection

  • Open-Vocabulary DETR with Conditional Matching (ECCV 2022 Oral) [Paper] [Code]
    Datasets: LVIS, COCO
    Task: Object Detection

  • Improving Closed and Open-Vocabulary Attribute Prediction using Transformers (ECCV 2022) [Paper] [Code]
    Datasets: VAW (closed-set), LSA common, LSA common→rare, HICO
    Task: Attribute Prediction

  • A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model (ECCV 2022) [Paper] [Code]
    Datasets: COCO Stuff; Pascal VOC 2012; Cityscapes; Pascal Context; ADE20K
    Task: Semantic Segmentation

  • A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility (ECCV 2022) [Paper] [Code]
    Datasets: MoTIF
    Task: Vision-Language Navigation (Apps)

  • Acknowledging the Unknown for Multi-label Learning with Single Positive Labels (ECCV 2022) [Paper] [Code]
    Datasets: PASCAL VOC 2012 (VOC), MS-COCO 2014 (COCO), NUS-WIDE (NUS), and CUB-200-2011 (CUB)
    Task: Single Positive Multi-label Learning

AAAI

  • OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning (AAAI 2022) [Paper]
    Datasets: OVIS40; OVIS1600
    Task: Visual Instance Search

  • Open Vocabulary Electroencephalography-to-Text Decoding and Zero-Shot Sentiment Classification (AAAI 2022) [Paper] [Code]
    Datasets: ZuCo
    Task: Brain Signals Language Decoding

WACV

  • From Node To Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection (WACV 2022) [Paper] [Code]
    Datasets: MSCOCO
    Task: Object Detection

  • Trading-Off Information Modalities in Zero-Shot Classification (WACV 2022) [Paper] [Code]
    Datasets: Caltech UCSD Birds 200-2011 (CUB), Animals with Attributes 1 and 2 (AWA1 & AWA2), attribute Pascal & Yahoo (APY), SUN attributes (SUN) and Oxford flowers (FLO)
    Task: Image Classification

BMVC

  • Partially-Supervised Novel Object Captioning Using Context from Paired Data (BMVC 2022) [Paper]
    Datasets: MS COCO
    Task: Object Captioning

  • Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models (BMVC 2022) [Paper] [Code]
    Datasets: PASCAL-5i, COCO-20i, FSS-1000, Mosaic-4
    Task: Semantic Segmentation

Arxiv & Others

  • Describing Sets of Images with Textual-PCA (EMNLP 2022) [Paper] [Code]
    Datasets: CelebA; Stanford Cars; COCO-Horses; LSUN-Church
    Task: Text Generation for Sets of Images

2021 Papers

CVPR

  • Open-Vocabulary Object Detection Using Captions (CVPR 2021) [Paper] [Code]
    Datasets: COCO Objects, COCO Captions
    Task: Object Detection

Older Papers

  • A Latent Morphology Model for Open-Vocabulary Neural Machine Translation (ICLR 2020 Spotlight) [Paper] [Code]
    Datasets: Arabic (AR), Czech (CS) and Turkish (TR)
    Task: Neural Machine Translation

  • Open Vocabulary Learning on Source Code with a Graph-Structured Cache (ICML 2019) [Paper]
    Datasets: Java source code
    Task: Java source code Learning

  • Visual Question Generation for Class Acquisition of Unknown Objects (ECCV 2018) [Paper] [Code]
    Datasets: Visual Genome, ILSVRC2012, ILSVRC2010, WordNet
    Task: Visual Question Generation, Object Detection

  • Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input (ECCV 2018) [Paper] [Code]
    Datasets: Places Audio Caption, ADE20k, MSCOCO
    Task: Audio-Visual Associative Localizations

  • Image Captioning with Unseen Objects (BMVC 2019) [Paper]
    Datasets: COCO
    Task: Image Captioning

  • nocaps: novel object captioning at scale (ICCV 2019) [Paper] [Code]
    Datasets: nocaps, COCO Captions
    Task: Image Captioning

  • Pointing Novel Objects in Image Captioning (CVPR 2019) [Paper]
    Datasets: held-out COCO, ImageNet
    Task: Image Captioning

  • Learning User Representations for Open Vocabulary Image Hashtag Prediction (CVPR 2020) [Paper]
    Datasets: YFCC100M
    Task: Image Hashtag Prediction

  • Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions (ECCV 2020) [Paper] [Code]
    Datasets: BSDS500, Conceptual Captions
    Task: Image Manipulation

Open Vocabulary Videos

2023 Papers

CVPR

  • Open-Category Human-Object Interaction Pre-training via Language Modeling Framework (CVPR 2023) [Paper]

  • Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training (CVPR 2023) [Paper]

  • OVTrack: Open-Vocabulary Multiple Object Tracking (CVPR 2023) [Paper]

ICLR

  • The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition (ICLR 2023) [Paper]
    Datasets: CIFAR100, LSUN, MiTv2, UCF101, HMDB51
    Task: Image and Video Classification

ICML

  • Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (ICML 2023) [Paper]

Arxiv & Others

  • Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (Arxiv 2023) [Paper]

  • TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]

  • MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]

  • Segment Everything Everywhere All at Once (Arxiv 2023) [Paper]

  • Towards Open-Vocabulary Video Instance Segmentation (Arxiv 2023) [Paper]

  • CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks (Arxiv 2023) [Paper]

  • Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition (Arxiv 2023) [Paper]

  • V3Det: Vast Vocabulary Visual Detection Dataset (Arxiv 2023) [Paper]

  • Token Merging for Fast Stable Diffusion (Arxiv 2023) [Paper]

  • Going Beyond Nouns With Vision & Language Models Using Synthetic Data (Arxiv 2023) [Paper]

  • MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks (Arxiv 2023) [Paper]

  • ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection (CVPR 2023) [Paper]

  • Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection (Arxiv 2023) [Paper]

  • Three ways to improve feature alignment for open vocabulary detection (Arxiv 2023) [Paper]

  • Zero-guidance Segmentation Using Zero Segment Labels (Arxiv 2023) [Paper]

  • Open-Vocabulary Object Detection using Pseudo Caption Labels (Arxiv 2023) [Paper]

  • Uni-Fusion: Universal Continuous Mapping (Arxiv 2023) [Paper]

2022 Papers

Arxiv & Others

  • Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features (Arxiv 2022) [Paper]