This is a repository for Bird's Eye View Perception, including 3D object detection, segmentation, online-mapping and occupancy prediction.
- 2023.05.09: An initial version of recent papers or projects.
- 2023.05.12: Added papers on 3D object detection.
- 2023.05.14: Added papers on BEV segmentation, HD-map construction, occupancy prediction and motion planning.
- Survey
- 3D Object Detection
- BEV Segmentation
- Tracking
- Perception Prediction Planning
- Mapping
- LaneGraph
- Locate
- Occupancy Prediction
- Challenge
- Dataset
- World Model
- Other
- Vision-Centric BEV Perception: A Survey (Arxiv 2022) [Paper] [Github]
- Delving into the Devils of Bird’s-eye-view Perception: A Review, Evaluation and Recipe (Arxiv 2022) [Paper] [Github]
- RaLiBEV: Radar and LiDAR BEV Fusion Learning for Anchor Box Free Object Detection System (Arxiv 2023) [Paper]
- Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection (CVPR 2023) [paper] [Github]
- MaskBEV: Joint Object Detection and Footprint Completion for Bird’s-eye View 3D Point Clouds (IROS 2023) [Paper] [Github]
- LXL: LiDAR Excluded Lean 3D Object Detection with 4D Imaging Radar and Camera Fusion (Arxiv 2023) [Paper]
- CRAFT: Camera-Radar 3D Object Detection with Spatio-Contextual Fusion Transformer (Arxiv 2022) [Paper]
- RadSegNet: A Reliable Approach to Radar Camera Fusion (Arxiv 2022) [paper]
- Bridging the View Disparity of Radar and Camera Features for Multi-modal Fusion 3D Object Detection (IEEE TIV 2023) [Paper]
- CRN: Camera Radar Net for Accurate, Robust, Efficient 3D Perception (ICLRW 2023) [Paper]
- RC-BEVFusion: A Plug-In Module for Radar-Camera Bird’s Eye View Feature Fusion (Arxiv 2023) [Paper]
- RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection (CVPR 2024) [Paper] [Github]
- UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection (Arxiv 2024) [paper]
- Semantic bevfusion: rethink lidar-camera fusion in unified bird’s-eye view representation for 3d object detection (Arxiv 2022) [Paper]
- Sparse Dense Fusion for 3D Object Detection (Arxiv 2023) [Paper]
- EA-BEV: Edge-aware Bird's-Eye-View Projector for 3D Object Detection (Arxiv 2023) [Paper] [Github]
- MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection (CVPR 2023) [paper] [Github]
- FULLER: Unified Multi-modality Multi-task 3D Perception via Multi-level Gradient Calibration (Arxiv 2023) [Paper]
- Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection (Arxiv 2023) [paper]
- SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection (ICCV 2023) [Paper] [Github]
- 3DifFusionDet: Diffusion Model for 3D Object Detection with Robust LiDAR-Camera Fusion (Arxiv 2023) [Paper]
- FusionViT: Hierarchical 3D Object Detection via LiDAR-Camera Vision Transformer Fusion (Arxiv 2023) [paper]
- Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers (Arxiv 2023) [Paper]
- PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection (Arxiv 2024) [Paper]
- Learned Multimodal Compression for Autonomous Driving (IEEE MMSP 2024) [Paper]
- Co-Fix3D: Enhancing 3D Object Detection with Collaborative Refinement (Arxiv 2024) [Paper]
- MGTANet: Encoding Sequential LiDAR Points Using Long Short-Term Motion-Guided Temporal Attention for 3D Object Detection (AAAI 2023)[paper][Github]
- PARTNER: Level up the Polar Representation for LiDAR 3D Object Detection (Arxiv 2023) [Paper]
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection (Arxiv 2023) [Paper]
- SEED: A Simple and Effective 3D DETR in Point Clouds (ECCV 2024) [Paper] [Github]
- Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles (IROS 2019) [Paper] [Project Page]
- Orthographic Feature Transform for Monocular 3D Object Detection (BMVC 2019) [Paper] [Github]
- BEV-MODNet: Monocular Camera-based Bird's Eye View Moving Object Detection for Autonomous Driving (ITSC 2021) [Paper] [Project Page]
- Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 2021) [Paper] [Github]
- PersDet: Monocular 3D Detection in Perspective Bird’s-Eye-View (Arxiv 2022) [Paper]
- Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving (CVPR 2022) [Paper]
- Monocular 3D Object Detection with Depth from Motion (ECCV 2022) [paper][Github]
- MonoNeRD: NeRF-like Representations for Monocular 3D Object Detection (ICCV 2023) [Paper] [Github]
- S3-MonoDETR: Supervised Shape&Scale-perceptive Deformable Transformer for Monocular 3D Object Detection (Arxiv 2023) [Paper] [Github]
- MonoGAE: Roadside Monocular 3D Object Detection with Ground-Aware Embeddings (Arxiv 2023) [Paper]
- YOLO-BEV: Generating Bird's-Eye View in the Same Way as 2D Object Detection (Arxiv 2023) [Paper]
- UniMODE: Unified Monocular 3D Object Detection (CVPR 2024) [Paper]
- Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving (Arxiv 2024) [paper] [Github]
- MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method (Arxiv 2024) [Paper]
- MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors (Arxiv 2024) [Paper]
- Object DGCNN: 3D Object Detection using Dynamic Graphs (NIPS 2021) [Paper][Github]
- BEVDet: High-Performance Multi-Camera 3D Object Detection in Bird-Eye-View (Arxiv 2022) [Paper] [Github]
- DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (CORL 2021) [Paper] [Github]
- BEVFusion: A Simple and Robust LiDAR-Camera Fusion Framework (NeurIPS 2022) [Paper][Github]
- Unifying Voxel-based Representation with Transformer for 3D Object Detection (NeurIPS 2022) [paper][Github]
- Polar Parametrization for Vision-based Surround-View 3D Detection (Arxiv 2022) [Paper] [Github]
- SRCN3D: Sparse R-CNN 3D Surround-View Camera Object Detection and Tracking for Autonomous Driving (Arxiv 2022) [Paper] [Github]
- BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection (Arxiv 2022) [Paper] [Github]
- BEVStereo: Enhancing Depth Estimation in Multi-view 3D Object Detection with Dynamic Temporal Stereo (Arxiv 2022) [Paper][Github]
- MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones (Arxiv 2022) [Paper] [Github]
- Focal-PETR: Embracing Foreground for Efficient Multi-Camera 3D Object Detection (Arxiv 2022) [Paper]
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention (Arxiv 2022) [Paper]
- Multi-Camera Calibration Free BEV Representation for 3D Object Detection (Arxiv 2022) [Paper]
- SemanticBEVFusion: Rethink LiDAR-Camera Fusion in Unified Bird's-Eye View Representation for 3D Object Detection (IROS 2023) [Paper]
- BEV-SAN: Accurate BEV 3D Object Detection via Slice Attention Networks (Arxiv 2022) [Paper]
- STS: Surround-view Temporal Stereo for Multi-view 3D Detection (Arxiv 2022) [Paper]
- BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for BEV 3D Object Detection (Arxiv 2022) [Paper]
- AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (IJCAI 2022) [Paper]
- Graph-DETR3D: Rethinking Overlapping Regions for Multi-View 3D Object Detection (ACM MM 2022) [paper][Github]
- ORA3D: Overlap Region Aware Multi-view 3D Object Detection (BMVC 2022) [Paper] [Project Page]
- AutoAlignV2: Deformable Feature Aggregation for Dynamic Multi-Modal 3D Object Detection (ECCV 2022) [Paper][Github]
- CenterFormer: Center-based Transformer for 3D Object Detection (ECCV 2022) [paper][Github]
- SpatialDETR: Robust Scalable Transformer-Based 3D Object Detection from Multi-View Camera Images with Global Cross-Sensor Attention (ECCV 2022) [Paper][Github]
- Position Embedding Transformation for Multi-View 3D Object Detection (ECCV 2022) [Paper] [Github]
- BEVDepth: Acquisition of Reliable Depth for Multi-view 3D Object Detection (AAAI 2023) [Paper] [Github]
- PolarFormer: Multi-camera 3D Object Detection with Polar Transformers (AAAI 2023) [Paper][Github]
- A Simple Baseline for Multi-Camera 3D Object Detection (AAAI 2023) [Paper][Github]
- Cross Modal Transformer via Coordinates Encoding for 3D Object Detection (Arxiv 2023) [Paper] [Github]
- Sparse4D: Multi-view 3D Object Detection with Sparse Spatial-Temporal Fusion (Arxiv 2023) [Paper] [Github]
- BEVSimDet: Simulated Multi-modal Distillation in Bird's-Eye View for Multi-view 3D Object Detection (Arxiv 2023) [Paper][Github]
- BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo (Arxiv 2023) [Paper]
- BSH-Det3D: Improving 3D Object Detection with BEV Shape Heatmap (Arxiv 2023) [Paper] [Github]
- DORT: Modeling Dynamic Objects in Recurrent for Multi-Camera 3D Object Detection and Tracking (Arxiv 2023) [Paper] [Github]
- Geometric-aware Pretraining for Vision-centric 3D Object Detection (Arxiv 2023) [Paper] [Github]
- Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception (Arxiv 2023) [Paper]
- OA-BEV: Bringing Object Awareness to Bird's-Eye-View Representation for Multi-Camera 3D Object Detection (Arxiv 2023) [Paper]
- Temporal Enhanced Training of Multi-view 3D Object Detector via Historical Object Prediction (ICCV 2023) [Paper] [Github]
- VIMI: Vehicle-Infrastructure Multi-view Intermediate Fusion for Camera-based 3D Object Detection (Arxiv 2023) [Paper]
- Object as Query: Equipping Any 2D Object Detector with 3D Detection Ability (Arxiv 2023) [Paper]
- VoxelFormer: Bird’s-Eye-View Feature Generation based on Dual-view Attention for Multi-view 3D Object Detection (Arxiv 2023) [Paper] [Github]
- TiG-BEV: Multi-view BEV 3D Object Detection via Target Inner-Geometry Learning (Arxiv 2023) [Paper] [Github]
- CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection (ICRA 2023) [Paper][Github]
- SOLOFusion: Time Will Tell: New Outlooks and A Baseline for Temporal Multi-View 3D Object Detection (ICLR 2023) [paper][Github]
- BEVDistill: Cross-Modal BEV Distillation for Multi-View 3D Object Detection (ICLR 2023) [Paper][Github]
- UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View (CVPR 2023)[Paper][Github]
- Understanding the Robustness of 3D Object Detection with Bird's-Eye-View Representations in Autonomous Driving (CVPR 2023) [Paper]
- Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection (CVPR 2023) [Paper] [Github]
- AeDet: Azimuth-invariant Multi-view 3D Object Detection (CVPR 2023) [Paper] [Github] [Project]
- BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection (CVPR 2023) [Paper] [Github]
- CAPE: Camera View Position Embedding for Multi-View 3D Object Detection (CVPR 2023) [Paper] [Github]
- FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection (CVPR 2023) [Paper] [Github]
- Sparse4D v2: Recurrent Temporal Fusion with Sparse Model (Arxiv 2023) [Paper] [Github]
- DA-BEV: Depth Aware BEV Transformer for 3D Object Detection (Arxiv 2023) [Paper]
- BEV-IO: Enhancing Bird’s-Eye-View 3D Detection with Instance Occupancy (Arxiv 2023) [Paper]
- OCBEV: Object-Centric BEV Transformer for Multi-View 3D Object Detection (Arxiv) [Paper]
- SA-BEV: Generating Semantic-Aware Bird’s-Eye-View Feature for Multi-view 3D Object Detection (ICCV 2023) [Paper] [Github]
- Predict to Detect: Prediction-guided 3D Object Detection using Sequential Images (Arxiv 2023) [paper]
- DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting (Arxiv 2023) [Paper]
- Far3D: Expanding the Horizon for Surround-view 3D Object Detection (Arxiv 2023) [Paper]
- HeightFormer: Explicit Height Modeling without Extra Data for Camera-only 3D Object Detection in Bird’s Eye View (Arxiv 2023) [paper]
- Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection (ICCV 2023) [Paper] [Github]
- 3DPPE: 3D Point Positional Encoding for Multi-Camera 3D Object Detection Transformers (ICCV 2023) [Paper] [Github] [Github]
- FB-BEV: BEV Representation from Forward-Backward View Transformations (ICCV 2023) [paper] [Github]
- QD-BEV: Quantization-aware View-guided Distillation for Multi-view 3D Object Detection (ICCV 2023) [Paper]
- SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos (ICCV 2023) [Paper] [Github]
- NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection (ICCV 2023) [paper] [Github]
- DistillBEV: Boosting Multi-Camera 3D Object Detection with Cross-Modal Knowledge Distillation (ICCV 2023) [paper]
- BEVHeight++: Toward Robust Visual Centric 3D Object Detection (Arxiv 2023) [paper]
- UniBEV: Multi-modal 3D Object Detection with Uniform BEV Encoders for Robustness against Missing Sensor Modalities (Arxiv 2023) [Paper]
- Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving (Arxiv 2023) [Paper]
- Pixel-Aligned Recurrent Queries for Multi-View 3D Object Detection (ICCV 2023) [Paper] [Github] [Project]
- CoBEVFusion: Cooperative Perception with LiDAR-Camera Bird's-Eye View Fusion (Arxiv 2023) [paper]
- DynamicBEV: Leveraging Dynamic Queries and Temporal Context for 3D Object Detection (Arxiv 2023) [paper]
- Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing (Arxiv 2023) [Paper]
- Leveraging Vision-Centric Multi-Modal Expertise for 3D Object Detection (NeurIPS 2023) [Paper] [Github]
- M&M3D: Multi-Dataset Training and Efficient Network for Multi-view 3D Object Detection (Arxiv 2023) [Paper]
- Sparse4D v3: Advancing End-to-End 3D Detection and Tracking (Arxiv 2023) [Paper] [Github]
- BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection (Arxiv 2023) [paper]
- Towards Efficient 3D Object Detection in Bird’s-Eye-View Space for Autonomous Driving: A Convolutional-Only Approach [Paper]
- Residual Graph Convolutional Network for Bird's-Eye-View Semantic Segmentation (Arxiv 2023) [Paper]
- Diffusion-Based Particle-DETR for BEV Perception (Arxiv 2023) [paper]
- M-BEV: Masked BEV Perception for Robust Autonomous Driving (Arxiv 2023) [Paper]
- Explainable Multi-Camera 3D Object Detection with Transformer-Based Saliency Maps (Arxiv 2023) [Paper]
- WidthFormer: Toward Efficient Transformer-based BEV View Transformation (Arxiv 2023) [Paper] [Github]
- UniVision: A Unified Framework for Vision-Centric 3D Perception (Arxiv 2024) [Paper]
- DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception (Arxiv 2024) [Paper]
- Towards Scenario Generalization for Vision-based Roadside 3D Object Detection (Arxiv 2024) [Paper] [Github]
- CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow (CVPR 2024) [Paper]
- GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection (Arxiv 2024) [paper]
- Lifting Multi-View Detection and Tracking to the Bird's Eye View (Arxiv 2024) [paper] [Github]
- DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection (Arxiv 2024) [Paper]
- BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection (CVPR 2024) [Paper] [Github]
- OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection (ECCV 2024) [Paper] [Github]
- FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection (ECCV 2024) [Paper]
- PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View (Arxiv 2024) [Paper]
- GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection (Arxiv 2024) [Paper]
- Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression (ECCV 2024) [Paper] [Github]
- MambaBEV: An efficient 3D detection model with Mamba2 (Arxiv 2024) [Paper]
- ROA-BEV: 2D Region-Oriented Attention for BEV-based 3D Object (Arxiv 2024) [Paper]
- PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (Arxiv 2023) [Paper] [Github]
- X-Align: Cross-Modal Cross-View Alignment for Bird’s-Eye-View Segmentation (WACV 2023) [Paper]
- BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation (ICRA 2023) [Paper] [Github] [Project]
- UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving (Arxiv 2023) [Paper]
- BEVFusion4D: Learning LiDAR-Camera Fusion Under Bird's-Eye-View via Cross-Modality Guidance and Temporal Aggregation (Arxiv 2023) [paper]
- Towards Better 3D Knowledge Transfer via Masked Image Modeling for Multi-view 3D Understanding (Arxiv 2023) [paper]
- LiDAR2Map: In Defense of LiDAR-Based Semantic Map Construction Using Online Camera Distillation (CVPR 2023) [Paper] [Github]
- BEV-Guided Multi-Modality Fusion for Driving Perception (CVPR 2023) [Paper] [Github]
- FusionFormer: A Multi-sensory Fusion in Bird’s-Eye-View and Temporal Consistent Transformer for 3D Object Detection (Arxiv 2023) [paper]
- UniTR: A Unified and Efficient Multi-Modal Transformer for Bird’s-Eye-View Representation (ICCV 2023) [Paper] [Github]
- BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird’s Eye View Map Construction (Arxiv 2023) [Paper]
- BEVCar: Camera-Radar Fusion for BEV Map and Object Segmentation (Arxiv 2024) [paper]
- OE-BevSeg: An Object Informed and Environment Aware Multimodal Framework for Bird's-eye-view Vehicle Semantic Segmentation (Arxiv 2024) [Paper]
- BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment (IROS 2024) [Paper] [Project]
- LidarMultiNet: Unifying LiDAR Semantic Segmentation, 3D Object Detection, and Panoptic Segmentation in a Single Multi-task Network (Arxiv 2022) [paper]
- SVQNet: Sparse Voxel-Adjacent Query Network for 4D Spatio-Temporal LiDAR Semantic Segmentation (Arxiv 2023) [Paper]
- BEVContrast: Self-Supervision in BEV Space for Automotive Lidar Point Clouds (3DV 2023) [Paper] [Github]
- Learning to Look around Objects for Top-View Representations of Outdoor Scenes (ECCV 2018) [paper]
- A Parametric Top-View Representation of Complex Road Scenes (CVPR 2019) [Paper]
- Monocular Semantic Occupancy Grid Mapping with Convolutional Variational Encoder-Decoder Networks (ICRA 2019 IEEE RA-L 2019) [Paper] [Github]
- Short-Term Prediction and Multi-Camera Fusion on Semantic Grids (ICCVW 2019) [paper]
- Predicting Semantic Map Representations from Images using Pyramid Occupancy Networks (CVPR 2020) [Paper] [Github]
- MonoLayout: Amodal scene layout from a single image (WACV 2020) [Paper] [Github]
- Bird’s Eye View Segmentation Using Lifted 2D Semantic Features (BMVC 2021) [Paper]
- Enabling Spatio-temporal aggregation in Birds-Eye-View Vehicle Estimation (ICRA 2021) [Paper] [mp4]
- Projecting Your View Attentively: Monocular Road Scene Layout Estimation via Cross-view Transformation (CVPR 2021) [Paper] [Github]
- ViT BEVSeg: A Hierarchical Transformer Network for Monocular Birds-Eye-View Segmentation (IEEE IJCNN 2022) [paper]
- Bird's-Eye-View Panoptic Segmentation Using Monocular Frontal View Images (IEEE RA-L 2022) [Paper] [Github] [Project]
- Understanding Bird's-Eye View of Road Semantics using an Onboard Camera (ICRA 2022) [Paper] [Github]
- “The Pedestrian next to the Lamppost” Adaptive Object Graphs for Better Instantaneous Mapping (CVPR 2022) [Paper]
- Weakly But Deeply Supervised Occlusion-Reasoned Parametric Road Layouts (CVPR 2022) [Paper]
- Translating Images into Maps (ICRA 2022) [Paper] [Github]
- GitNet: Geometric Prior-based Transformation for Birds-Eye-View Segmentation (ECCV 2022) [Paper]
- SBEVNet: End-to-End Deep Stereo Layout Estimation (WACV 2022) [Paper]
- BEVSegFormer: Bird’s Eye View Semantic Segmentation From Arbitrary Camera Rigs (WACV 2023) [Paper]
- DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception (Arxiv 2023) [Paper] [Github]
- HFT: Lifting Perspective Representations via Hybrid Feature Transformation (ICRA 2023) [Paper] [Github]
- SkyEye: Self-Supervised Bird's-Eye-View Semantic Mapping Using Monocular Frontal View Images (Arxiv 2023) [Paper]
- Calibration-free BEV Representation for Infrastructure Perception (Arxiv 2023) [Paper]
- Semi-Supervised Learning for Visual Bird’s Eye View Semantic Segmentation (Arxiv 2023) [Paper]
- DualCross: Cross-Modality Cross-Domain Adaptation for Monocular BEV Perception (Arxiv 2023) [paper] [github] [Project]
- CoBEV: Elevating Roadside 3D Object Detection with Depth and Height Complementarity (Arxiv 2023) [Paper]
- SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects (CVPR 2024) [Paper] [Github]
- DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning (Arxiv 2024) [Paper] [Github]
- Improved Single Camera BEV Perception Using Multi-Camera Training (ITSC 2024) [Paper]
- Focus on BEV: Self-calibrated Cycle View Transformation for Monocular Birds-Eye-View Segmentation (Arxiv 2024) [Paper]
- A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View (IEEE ITSC 2020) [Paper] [Github]
- Cross-view Semantic Segmentation for Sensing Surroundings (IROS 2020 IEEE RA-L 2020) [Paper] [Github] [Project]
- Lift, Splat, Shoot: Encoding Images from Arbitrary Camera Rigs by Implicitly Unprojecting to 3D (ECCV 2020) [Paper] [Github] [Project]
- Cross-view Transformers for real-time Map-view Semantic Segmentation (CVPR 2022) [Paper] [Github]
- Scene Representation in Bird’s-Eye View from Surrounding Cameras with Transformers (CVPRW 2022) [Paper]
- M2BEV: Multi-Camera Joint 3D Detection and Segmentation with Unified Birds-Eye View Representation (Arxiv 2022) [Paper] [Project]
- BEVerse: Unified Perception and Prediction in Birds-Eye-View for Vision-Centric Autonomous Driving (Arxiv 2022) [Paper] [Github]
- Efficient and Robust 2D-to-BEV Representation Learning via Geometry-guided Kernel Transformer (Arxiv 2022) [Paper] [Github]
- A Simple Baseline for BEV Perception Without LiDAR (Arxiv 2022) [Paper] [Github] [Project Page]
- UniFusion: Unified Multi-view Fusion Transformer for Spatial-Temporal Representation in Bird's-Eye-View (ICCV 2023) [Paper] [Github]
- LaRa: Latents and Rays for Multi-Camera Bird’s-Eye-View Semantic Segmentation (CORL 2022) [Paper] [Github]
- CoBEVT: Cooperative Bird’s Eye View Semantic Segmentation with Sparse Transformers (CORL 2022) [Paper] [Github]
- Vision-based Uneven BEV Representation Learning with Polar Rasterization and Surface Estimation (CORL 2022) [Paper] [Github]
- BEVFormer: a Cutting-edge Baseline for Camera-based Detection (ECCV 2022) [Paper] [Github]
- JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes (ECCV 2022) [Paper] [Github]
- Learning Ego 3D Representation as Ray Tracing (ECCV 2022) [Paper] [Github]
- Fast-BEV: Towards Real-time On-vehicle Bird's-Eye View Perception (NIPS 2022 Workshop) [Paper] or [Paper] [Github]
- Fast-BEV: A Fast and Strong Bird's-Eye View Perception Baseline (Arxiv 2023) [Paper] [Github]
- BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision (CVPR 2023) [Paper]
- MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (CVPR 2023) [Paper]
- Bi-Mapper: Holistic BEV Semantic Mapping for Autonomous Driving (Arxiv 2023) [paper] [Github]
- MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception (ICCV 2023) [Paper] [Github]
- MetaBEV: Solving Sensor Failures for BEV Detection and Map Segmentation (ICCV 2023) [paper] [Github] [Project]
- One Training for Multiple Deployments: Polar-based Adaptive BEV Perception for Autonomous Driving (Arxiv 2023) [Paper]
- RoboBEV: Towards Robust Bird's Eye View Perception under Corruptions (Arxiv 2023) [paper] [Github] [Project]
- X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation (Arxiv 2023) [Paper]
- PowerBEV: A Powerful Yet Lightweight Framework for Instance Prediction in Bird’s-Eye View (Arxiv 2023) [paper]
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird’s-Eye View (ICCV 2023) [Paper]
- Towards Viewpoint Robustness in Bird’s Eye View Segmentation (ICCV 2023) [Paper] [Project]
- PointBeV: A Sparse Approach to BeV Predictions (Arxiv 2023) [paper] [Github]
- DualBEV: CNN is All You Need in View Transformation (Arxiv 2024) [Paper]
- MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning (Arxiv 2024) [paper]
- HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras (Arxiv 2024) [Paper] [Github]
- Improving Bird's Eye View Semantic Segmentation by Task Decomposition (CVPR 2024) [Paper] [Github]
- SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation (CVPR 2024) [Paper] [Github]
- RoadBEV: Road Surface Reconstruction in Bird's Eye View (Arxiv 2024) [Paper] [Github]
- TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation (Arxiv 2024) [Paper]
- DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model (Arxiv 2024) [Paper]
- Bird's-Eye View to Street-View: A Survey (Arxiv 2024) [Paper]
- LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping (Arxiv 2024) [Paper]
- Navigation Instruction Generation with BEV Perception and Large Language Models (ECCV 2024) [paper] [Github]
- GaussianBeV: 3D Gaussian Representation meets Perception Models for BeV Segmentation (Arxiv 2024) [Paper]
- MaskBEV: Towards A Unified Framework for BEV Detection and Map Segmentation (ACM MM 2024) [paper]
- Robust Bird’s Eye View Segmentation by Adapting DINOv2 (ECCV 2024 Workshop) [Paper]
- Unveiling the Black Box: Independent Functional Module Evaluation for Bird’s-Eye-View Perception Model (Arxiv 2024) [Paper]
- RopeBEV: A Multi-Camera Roadside Perception Network in Bird's-Eye-View (Arxiv 2024) [Paper]
- OneBEV: Using One Panoramic Image for Bird's-Eye-View Semantic Mapping (ACCV 2024) [Paper] [Github]
- ROAD-Waymo: Action Awareness at Scale for Autonomous Driving (NeurIPS 2024) [Paper] [Github]
- Driving among Flatmobiles: Bird-Eye-View occupancy grids from a monocular camera for holistic trajectory planning (WACV 2021) [Paper]
- HOPE: Hierarchical Spatial-temporal Network for Occupancy Flow Prediction (CVPRW 2022) [paper]
- FIERY: Future Instance Prediction in Bird’s-Eye View from Surround Monocular Cameras (ICCV 2021) [Paper] [Github] [Project]
- NEAT: Neural Attention Fields for End-to-End Autonomous Driving (ICCV 2021) [Paper] [Github]
- ST-P3: End-to-end Vision-based Autonomous Driving via Spatial-Temporal Feature Learning (ECCV 2022) [Paper] [Github]
- StretchBEV: Stretching Future Instance Prediction Spatially and Temporally (ECCV 2022) [Paper] [Github] [Project]
- TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving (CVPR 2023) [Paper] [Github]
- Planning-oriented Autonomous Driving (CVPR 2023, Occupancy Prediction) [paper] [Github] [Project]
- Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving (CVPR 2023) [Paper] [Github]
- ReasonNet: End-to-End Driving with Temporal and Global Reasoning (CVPR 2023) [Paper]
- LiDAR-BEVMTN: Real-Time LiDAR Bird’s-Eye View Multi-Task Perception Network for Autonomous Driving (Arxiv 2023) [paper]
- FusionAD: Multi-modality Fusion for Prediction and Planning Tasks of Autonomous Driving (Arxiv 2023) [Paper]
- VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning (Arxiv 2024) [Paper] [Project]
- SparseAD: Sparse Query-Centric Paradigm for Efficient End-to-End Autonomous Driving (Arxiv 2024) [Paper]
- SparseDrive: End-to-End Autonomous Driving via Sparse Scene Representation (Arxiv 2024) [paper] [Github]
- DUALAD: Disentangling the Dynamic and Static World for End-to-End Driving (CVPR 2024) [Paper]
- Solving Motion Planning Tasks with a Scalable Generative Model (ECCV 2024) [Paper] [Github]
- Hierarchical Recurrent Attention Networks for Structured Online Maps (CVPR 2018) [Paper]
- End-to-End Deep Structured Models for Drawing Crosswalks (ECCV 2018) [Paper]
- Probabilistic Semantic Mapping for Urban Autonomous Driving Applications (IROS 2020) [Paper] [Github]
- Convolutional Recurrent Network for Road Boundary Extraction (CVPR 2019) [Paper]
- Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RA-L 2021) [Paper]
- M^2-3DLaneNet: Multi-Modal 3D Lane Detection (Arxiv 2022) [paper] [Github]
- HDMapNet: An Online HD Map Construction and Evaluation Framework (ICRA 2022) [paper] [Github] [Project]
- SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation (Arxiv 2023) [paper] [Github]
- VMA: Divide-and-Conquer Vectorized Map Annotation System for Large-Scale Driving Scene (Arxiv 2023) [Paper]
- THMA: Tencent HD Map AI System for Creating HD Map Annotations (AAAI 2023) [paper]
- RoadTracer: Automatic Extraction of Road Networks from Aerial Images (CVPR 2018) [Paper] [Github]
- DAGMapper: Learning to Map by Discovering Lane Topology (ICCV 2019) [paper]
- End-to-end Lane Detection through Differentiable Least-Squares Fitting (ICCVW 2019) [paper]
- VecRoad: Point-based Iterative Graph Exploration for Road Graphs Extraction (CVPR 2020) [Paper] [Github] [Project]
- Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [paper] [Github]
- iCurb: Imitation Learning-based Detection of Road Curbs using Aerial Images for Autonomous Driving (ICRA 2021 IEEE RA-L) [paper] [Github] [Project]
- HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps (CVPR 2021) [paper]
- Structured Bird’s-Eye-View Traffic Scene Understanding from Onboard Images (ICCV 2021) [Paper] [Github]
- RNGDet: Road Network Graph Detection by Transformer in Aerial Images (IEEE TGRS 2022) [Paper] [Project]
- RNGDet++: Road Network Graph Detection by Transformer with Instance Segmentation and Multi-scale Features Enhancement (IEEE RA-L 2022) [Paper] [Github] [Project]
- SPIN Road Mapper: Extracting Roads from Aerial Images via Spatial and Interaction Space Graph Reasoning for Autonomous Driving (ICRA 2022) [paper] [Github]
- Laneformer: Object-aware Row-Column Transformers for Lane Detection (AAAI 2022) [Paper]
- Lane-Level Street Map Extraction from Aerial Imagery (WACV 2022) [Paper] [Github]
- Reconstruct from Top View: A 3D Lane Detection Approach based on Geometry Structure Prior (CVPRW 2022) [paper]
- PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images (CVPR 2022) [Paper] [Github]
- Topology Preserving Local Road Network Estimation from Single Onboard Camera Image (CVPR 2022) [Paper] [Github]
- TD-Road: Top-Down Road Network Extraction with Holistic Graph Construction (ECCV 2022) [Paper]
- CLiNet: Joint Detection of Road Network Centerlines in 2D and 3D (IEEE IVS 2023) [Paper]
- Polygonizer: An auto-regressive building delineator (ICLRW 2023) [Paper]
- CurveFormer: 3D Lane Detection by Curve Propagation with Curve Queries and Attention (ICRA 2023) [Paper]
- Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection (CVPR 2023) [paper] [Github]
- Learning and Aggregating Lane Graphs for Urban Automated Driving (Arxiv 2023) [paper]
- Online Lane Graph Extraction from Onboard Video (Arxiv 2023) [paper] [Github]
- Video Killed the HD-Map: Predicting Driving Behavior Directly From Drone Images (Arxiv 2023) [Paper]
- Prior Based Online Lane Graph Extraction from Single Onboard Camera Image (Arxiv 2023) [Paper]
- Online Monocular Lane Mapping Using Catmull-Rom Spline (Arxiv 2023) [Paper] [Github]
- Improving Online Lane Graph Extraction by Object-Lane Clustering (ICCV 2023) [Paper]
- LATR: 3D Lane Detection from Monocular Images with Transformer (ICCV 2023) [Paper] [Github]
- Patched Line Segment Learning for Vector Road Mapping (Arxiv 2023) [paper]
- Sparse Point Guided 3D Lane Detection (ICCV 2023) [Paper] [Github]
- Recursive Video Lane Detection (ICCV 2023) [Paper] [Github]
- Occlusion-Aware 2D and 3D Centerline Detection for Urban Driving via Automatic Label Generation (Arxiv 2023) [Paper]
- Building Lane-Level Maps from Aerial Images (Arxiv 2023) [paper]
- LaneCPP: Continuous 3D Lane Detection using Physical Priors (CVPR 2024) [Paper]
- DeepAerialMapper: Deep Learning-based Semi-automatic HD Map Creation for Highly Automated Vehicles (Arxiv 2024) [Paper] [Github]
- PersFormer: a New Baseline for 3D Laneline Detection (ECCV 2022) [Paper] [Github]
- Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [paper] [Github]
- VAD: Vectorized Scene Representation for Efficient Autonomous Driving (Arxiv 2023) [paper] [Github]
- InstaGraM: Instance-level Graph Modeling for Vectorized HD Map Learning (Arxiv 2023) [Paper]
- VectorMapNet: End-to-end Vectorized HD Map Learning (Arxiv 2023) [Paper] [Github] [Project]
- Road Genome: A Topology Reasoning Benchmark for Scene Understanding in Autonomous Driving (Arxiv 2023) [Paper] [Github]
- Topology Reasoning for Driving Scenes (Arxiv 2023) [paper] [Github]
- MV-Map: Offboard HD-Map Generation with Multi-view Consistency (Arxiv 2023) [paper] [Github]
- CenterLineDet: Road Lane CenterLine Graph Detection With Vehicle-Mounted Sensors by Transformer for High-definition Map Creation (ICRA 2023) [paper] [Github]
- Structured Modeling and Learning for Online Vectorized HD Map Construction (ICLR 2023) [paper] [Github]
- Neural Map Prior for Autonomous Driving (CVPR 2023) [Paper]
- An Efficient Transformer for Simultaneous Learning of BEV and Lane Representations in 3D Lane Detection (Arxiv 2023) [paper]
- TopoMask: Instance-Mask-Based Formulation for the Road Topology Problem via Transformer-Based Architecture (Arxiv 2023) [paper]
- PolyDiffuse: Polygonal Shape Reconstruction via Guided Set Diffusion Models (Arxiv 2023) [paper] [Github] [Project]
- Online Map Vectorization for Autonomous Driving: A Rasterization Perspective (Arxiv 2023) [Paper]
- NeMO: Neural Map Growing System for Spatiotemporal Fusion in Bird’s-Eye-View and BDD-Map Benchmark (Arxiv 2023) [Paper]
- MachMap: End-to-End Vectorized Solution for Compact HD-Map Construction (CVPR 2023 Workshop) [Paper]
- Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction (Arxiv 2023) [paper]
- End-to-End Vectorized HD-map Construction with Piecewise Bézier Curve (CVPR 2023) [Paper] [Github]
- GroupLane: End-to-End 3D Lane Detection with Channel-wise Grouping (Arxiv 2023) [Paper]
- MapTRv2: An End-to-End Framework for Online Vectorized HD Map Construction (Arxiv 2023) [Paper]
- LATR: 3D Lane Detection from Monocular Images with Transformer (Arxiv 2023) [Paper]
- InsightMapper: A Closer Look at Inner-Instance Information for Vectorized High-Definition Mapping (Arxiv 2023) [Paper] [Project] [Github]
- HD Map Generation from Noisy Multi-Route Vehicle Fleet Data on Highways with Expectation Maximization (Arxiv 2023) [Paper]
- StreamMapNet: Streaming Mapping Network for Vectorized Online HD Map Construction (WACV 2024) [Paper] [Github]
- PivotNet: Vectorized Pivot Learning for End-to-end HD Map Construction (ICCV 2023) [Paper]
- Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach (ICCV 2023) [paper]
- TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning (Arxiv 2023) [paper] [Github]
- ScalableMap: Scalable Map Learning for Online Long-Range Vectorized HD Map Construction (CoRL 2023) [Paper] [Github]
- Mind the map! Accounting for existing map information when estimating online HDMaps from sensor data (Arxiv 2023) [Paper]
- Augmenting Lane Perception and Topology Understanding with Standard Definition Navigation Maps (Arxiv 2023) [Paper] [Github]
- P-MapNet: Far-Seeing Map Constructor Enhanced by Both SDMap and HDMap Priors (ICLR 2024 submitted paper) [Openreview] [Paper]
- Online Vectorized HD Map Construction using Geometry (Arxiv 2023) [paper] [Github]
- LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving (Arxiv 2023) [paper] [Github]
- 3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching (Arxiv 2024) [Paper]
- MapNeXt: Revisiting Training and Scaling Practices for Online Vectorized HD Map Construction (Arxiv 2024) [Paper]
- Stream Query Denoising for Vectorized HD Map Construction (Arxiv 2024) [Paper]
- ADMap: Anti-disturbance framework for reconstructing online vectorized HD map (Arxiv 2024) [Paper]
- PLCNet: Patch-wise Lane Correction Network for Automatic Lane Correction in High-definition Maps (Arxiv 2024) [Paper]
- LaneGraph2Seq: Lane Topology Extraction with Language Model via Vertex-Edge Encoding and Connectivity Enhancement (AAAI 2024) [paper]
- VI-Map: Infrastructure-Assisted Real-Time HD Mapping for Autonomous Driving (Arxiv 2024) [Paper]
- CurveFormer++: 3D Lane Detection by Curve Propagation with Temporal Curve Queries and Attention (Arxiv 2024) [Paper]
- Lane2Seq: Towards Unified Lane Detection via Sequence Generation (CVPR 2024) [Paper]
- Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction (Arxiv 2024) [Paper] [Github]
- MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping (Arxiv 2024) [paper] [Github]
- Producing and Leveraging Online Map Uncertainty in Trajectory Prediction (CVPR 2024) [Paper] [Github]
- MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction (CVPR 2024) [Paper] [Github]
- HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction (CVPR 2024) [Paper]
- SemVecNet: Generalizable Vector Map Generation for Arbitrary Sensor Configurations (Arxiv 2024) [Paper]
- DTCLMapper: Dual Temporal Consistent Learning for Vectorized HD Map Construction (Arxiv 2024) [Paper]
- Addressing Diverging Training Costs using Local Restoration for Precise Bird's Eye View Map Construction (Arxiv 2024) [Paper]
- Is Your HD Map Constructor Reliable under Sensor Corruptions? (Arxiv 2024) [Paper] [Github] [Project]
- DuMapNet: An End-to-End Vectorization System for City-Scale Lane-Level Map Generation (KDD 2024) [Paper]
- LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction (Arxiv 2024) [Paper]
- Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention (ECCV 2024) [Paper] [Github]
- BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight (Arxiv 2024) [Paper]
- Map It Anywhere (MIA): Empowering Bird's Eye View Mapping using Large-scale Public Data (Arxiv 2024) [Paper]
- MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation (ECCV 2024) [Paper]
- Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks (Arxiv 2024) [Paper] [Github]
- Generation of Training Data from HD Maps in the Lanelet2 Framework (Arxiv 2024) [Paper]
- PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction (Arxiv 2024) [paper] [Github]
- CAMAv2: A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2024) [Paper]
- HeightLane: BEV Heightmap guided 3D Lane Detection (Arxiv 2024) [paper]
- PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors (Arxiv 2024) [Paper]
- Local map Construction Methods with SD map: A Novel Survey (Arxiv 2024) [Paper]
- Enhancing Vectorized Map Perception with Historical Rasterized Maps (ECCV 2024) [Paper] [Github]
- GenMapping: Unleashing the Potential of Inverse Perspective Mapping for Robust Online HD Map Construction (Arxiv 2024) [Paper] [Github]
- GlobalMapNet: An Online Framework for Vectorized Global HD Map Construction (Arxiv 2024) [paper](https://arxiv.org/abs/2409.10063)
- MemFusionMap: Working Memory Fusion for Online Vectorized HD Map Construction (Arxiv 2024) [Paper]
- MGMapNet: Multi-Granularity Representation Learning for End-to-End Vectorized HD Map Construction (Arxiv 2024) [paper]
- Exploring Semi-Supervised Learning for Online Mapping (Arxiv 2024) [Paper]
- OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction (Arxiv 2024) [Paper] [Github] [Project]
- HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning (WACV 2025) [Paper] [Github]
- Lane Graph Estimation for Scene Understanding in Urban Driving (IEEE RAL 2021) [Paper]
- AutoGraph: Predicting Lane Graphs from Traffic Observations (IEEE RAL 2023) [Paper]
- Learning and Aggregating Lane Graphs for Urban Automated Driving (CVPR 2023) [Paper]
- TopoLogic: An Interpretable Pipeline for Lane Topology Reasoning on Driving Scenes (Arxiv 2024) [Paper]
- Enhancing 3D Lane Detection and Topology Reasoning with 2D Lane Priors (Arxiv 2024) [Paper]
- Learning Lane Graphs from Aerial Imagery Using Transformers (Arxiv 2024) [Paper]
- TopoMaskV2: Enhanced Instance-Mask-Based Formulation for the Road Topology Problem (Arxiv 2024) [Paper]
- LMT-Net: Lane Model Transformer Network for Automated HD Mapping from Sparse Vehicle Observations (ITSC 2024) [Paper]
- Behavioral Topology (BeTop), a multi-agent behavior formulation for interactive motion prediction and planning (NeurIPS 2024) [Paper] [Github]
- Exploring Point-BEV Fusion for 3D Point Cloud Object Tracking with Transformer (Arxiv 2022) [Paper] [Github]
- EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View (Arxiv 2023) [paper] [Github]
- Traj-MAE: Masked Autoencoders for Trajectory Prediction (Arxiv 2023) [Paper]
- Trajectory Forecasting through Low-Rank Adaptation of Discrete Latent Codes (Arxiv 2024) [Paper]
- MapsTP: HD Map Images Based Multimodal Trajectory Prediction for Automated Vehicles (Arxiv 2024) [Paper]
- Perception Helps Planning: Facilitating Multi-Stage Lane-Level Integration via Double-Edge Structures (Arxiv 2024) [Paper]
- Hierarchical and Decoupled BEV Perception Learning Framework for Autonomous Driving (Arxiv 2024) [Paper]
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions (Arxiv 2024) [Paper]
- BEV-Locator: An End-to-end Visual Semantic Localization Network Using Multi-View Images (Arxiv 2022) [paper]
- BEV-SLAM: Building a Globally-Consistent World Map Using Monocular Vision (IROS 2022) [Paper]
- U-BEV: Height-aware Bird’s-Eye-View Segmentation and Neural Map-based Relocalization (Arxiv 2023) [Paper]
- Monocular Localization with Semantics Map for Autonomous Vehicles (Arxiv 2024) [Paper]
- Semantic Scene Completion from a Single Depth Image (CVPR 2017) [Paper]
- Occupancy Networks: Learning 3D Reconstruction in Function Space (CVPR 2019) [Paper] [Github]
- S3CNet: A Sparse Semantic Scene Completion Network for LiDAR Point Clouds (CoRL 2020) [Paper]
- 3D Semantic Scene Completion: a Survey (IJCV 2021) [Paper]
- Semantic Scene Completion using Local Deep Implicit Functions on LiDAR Data (Arxiv 2021) [Paper]
- Sparse Single Sweep LiDAR Point Cloud Segmentation via Learning Contextual Shape Priors from Scene Completion (AAAI 2021) [Paper]
- Anisotropic Convolutional Networks for 3D Semantic Scene Completion (CVPR 2020) [Paper]
- Estimation of Appearance and Occupancy Information in Bird’s Eye View from Surround Monocular Images (Arxiv 2022) [paper] [Project]
- Semantic Segmentation-assisted Scene Completion for LiDAR Point Clouds (IROS 2021) [Paper] [Github]
- Grid-Centric Traffic Scenario Perception for Autonomous Driving: A Comprehensive Review (Arxiv 2023) [paper]
- LMSCNet: Lightweight Multiscale 3D Semantic Completion (IC 3DV 2020) [Paper] [Github]
- MonoScene: Monocular 3D Semantic Scene Completion (CVPR 2022) [Paper] [Github] [Project]
- OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction (ICCV 2023) [Paper] [Github]
- A Simple Attempt for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [Paper] [Github]
- OccDepth: A Depth-aware Method for 3D Semantic Occupancy Network (Arxiv 2023) [Paper] [Github]
- OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception (Arxiv 2023) [paper] [Github]
- Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving (Arxiv 2023) [Paper] [Github] [Project]
- Occ-BEV: Multi-Camera Unified Pre-training via 3D Scene Reconstruction (Arxiv 2023) [Paper] [Github]
- StereoScene: BEV-Assisted Stereo Matching Empowers 3D Semantic Scene Completion (Arxiv 2023) [paper] [Github]
- Learning Occupancy for Monocular 3D Object Detection (Arxiv 2023) [Paper] [Github]
- OVO: Open-Vocabulary Occupancy (Arxiv 2023) [Paper]
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving (Arxiv 2023) [paper] [Github] [Project]
- Scene as Occupancy (Arxiv 2023) [Paper](https://arxiv.org/pdf/2306.02851.pdf) [Github]
- Diffusion Probabilistic Models for Scene-Scale 3D Categorical Data (Arxiv 2023) [Paper] [Github]
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation (Arxiv 2023) [Paper] [Github]
- UniOcc: Unifying Vision-Centric 3D Occupancy Prediction with Geometric and Semantic Rendering (Arxiv 2023) [paper]
- SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving (NeurIPS 2023 D&B track) [paper] [paper]
- StereoVoxelNet: Real-Time Obstacle Detection Based on Occupancy Voxels from a Stereo Camera Using Deep Neural Networks (ICRA 2023) [Paper](https://arxiv.org/pdf/2209.08459.pdf) [Github] [Project]
- Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction (CVPR 2023) [Paper] [Github]
- VoxFormer: a Cutting-edge Baseline for 3D Semantic Occupancy Prediction (CVPR 2023) [paper] [Github]
- Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting (CVPR 2023) [Paper] [Github] [Project]
- SSCBench: A Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving (Arxiv 2023) [paper] [Github]
- SSC-RS: Elevate LiDAR Semantic Scene Completion with Representation Separation and BEV Fusion (IROS 2023) [Paper] [Github]
- CVSformer: Cross-View Synthesis Transformer for Semantic Scene Completion (Arxiv 2023) [paper]
- Symphonize 3D Semantic Scene Completion with Contextual Instance Queries (Arxiv 2023) [Paper] [Github]
- Occupancy-MAE: Self-supervised Pre-training Large-scale LiDAR Point Clouds with Masked Occupancy Autoencoders (Arxiv 2023) [paper]
- UniWorld: Autonomous Driving Pre-training via World Models (Arxiv 2023) [Paper] [Github]
- PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction (Arxiv 2023) [paper] [Github]
- SOGDet: Semantic-Occupancy Guided Multi-view 3D Object Detection (Arxiv 2023) [paper] [Github]
- OccupancyDETR: Making Semantic Scene Completion as Straightforward as Object Detection (Arxiv 2023) [Paper] [Github]
- PointSSC: A Cooperative Vehicle-Infrastructure Point Cloud Benchmark for Semantic Scene Completion (Arxiv 2023) [Paper]
- SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving (Arxiv 2023) [paper]
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space (Arxiv 2023) [Github]
- Anisotropic Convolutional Networks for 3D Semantic Scene Completion (CVPR 2020) [Github] [Project]
- RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision (Arxiv 2023) [paper] [Github]
- LiDAR-based 4D Occupancy Completion and Forecasting (Arxiv 2023) [Paper] [Github]
- SOccDPT: Semi-Supervised 3D Semantic Occupancy from Dense Prediction Transformers trained under memory constraints (Arxiv 2023) [Paper]
- SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction (Arxiv 2023) [Paper] [Github]
- FlashOcc: Fast and Memory-Efficient Occupancy Prediction via Channel-to-Height Plugin (Arxiv 2023) [paper]
- Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications (Arxiv 2023) [paper] [Github]
- OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving (Arxiv 2023) [paper] [Github]
- DepthSSC: Depth-Spatial Alignment and Dynamic Voxel Resolution for Monocular 3D Semantic Scene Completion (Arxiv 2023) [Paper]
- A Simple Framework for 3D Occupancy Estimation in Autonomous Driving (Arxiv 2023) [Paper] [Github]
- OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries (Arxiv 2023) [Paper]
- COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction (Arxiv 2023) [Paper]
- OccNeRF: Self-Supervised Multi-Camera Occupancy Prediction with Neural Radiance Fields (Arxiv 2023) [paper] [Github]
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation (Arxiv 2023) [paper]
- PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (Arxiv 2023) [paper] [Project] [Github]
- POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images (Arxiv 2024) [Paper] [Github]
- S2TPVFormer: Spatio-Temporal Tri-Perspective View for temporally coherent 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper]
- InverseMatrixVT3D: An Efficient Projection Matrix-Based Approach for 3D Occupancy Prediction (Arxiv 2024) [Paper]
- V2VSSC: A 3D Semantic Scene Completion Benchmark for Perception with Vehicle to Vehicle Communication (Arxiv 2024) [Paper]
- OccFlowNet: Towards Self-supervised Occupancy Estimation via Differentiable Rendering and Occupancy Flow (Arxiv 2024) [Paper]
- OccFusion: A Straightforward and Effective Multi-Sensor Fusion Framework for 3D Occupancy Prediction (Arxiv 2024) [Paper]
- OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction (Arxiv 2024) [Paper]
- FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View (ICRA 2024) [Paper]
- OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [paper]
- PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness (CVPR 2024) [Paper] [Github]
- Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution (Arxiv 2024) [paper]
- OccFiner: Offboard Occupancy Refinement with Hybrid Propagation (Arxiv 2024) [Paper]
- MonoOcc: Digging into Monocular Semantic Occupancy Prediction (ICLR 2024) [Paper]
- OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [paper]
- Urban Scene Diffusion through Semantic Occupancy Map (Arxiv 2024) [Paper]
- Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
- SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction (CVPR 2024) [Paper]
- Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation (CVPR 2024) [paper] [Github]
- OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks (Arxiv 2023) [Paper]
- OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving (Arxiv 2024) [paper]
- ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers (Arxiv 2024) [paper]
- A Survey on Occupancy Perception for Autonomous Driving: The Information Fusion Perspective (Arxiv 2024) [Paper]
- Vision-based 3D occupancy prediction in autonomous driving: a review and outlook (Arxiv 2024) [Paper]
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision (Arxiv 2024) [Paper]
- RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar (Arxiv 2024) [paper]
- GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [Paper] [Github]
- EFFOcc: A Minimal Baseline for EFficient Fusion-based 3D Occupancy Network (Arxiv 2024) [Paper] [Github]
- PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving (3DV 2024) [paper]
- UnO: Unsupervised Occupancy Fields for Perception and Forecasting (Arxiv 2024) [paper]
- Context and Geometry Aware Voxel Transformer for Semantic Scene Completion (Arxiv 2024) [Paper] [Github]
- Occupancy as Set of Points (ECCV 2024) [Paper] [Github]
- Lift, Splat, Map: Lifting Foundation Masks for Label-Free Semantic Scene Completion (Arxiv 2024) [Paper]
- Let Occ Flow: Self-Supervised 3D Occupancy Flow Prediction (Arxiv 2024) [Paper]
- Monocular Occupancy Prediction for Scalable Indoor Scenes (ECCV 2024) [Paper] [Github]
- LangOcc: Self-Supervised Open Vocabulary Occupancy Estimation via Volume Rendering (Arxiv 2024) [Paper]
- VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction (Arxiv 2024) [paper]
- Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection (Arxiv 2024) [paper] [Github]
- OccMamba: Semantic Occupancy Prediction with State Space Models (Arxiv 2024) [paper]
- HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction (IEEE RAL 2024) [paper]
- Semi-supervised 3D Semantic Scene Completion with 2D Vision Foundation Model Guidance (Arxiv 2024) [Paper]
- MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering (Arxiv 2024) [paper] [Project] [Github]
- GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting (Arxiv 2024) [paper] [Github]
- AdaOcc: Adaptive-Resolution Occupancy Prediction (Arxiv 2024) [Paper]
- Diffusion-Occ: 3D Point Cloud Completion via Occupancy Diffusion (Arxiv 2024) [Paper]
- UltimateDO: An Efficient Framework to Marry Occupancy Prediction with 3D Object Detection via Channel2height (Arxiv 2024) [paper]
- COCO-Occ: A Benchmark for Occluded Panoptic Segmentation and Image Understanding (Arxiv 2024) [Paper]
- CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction (ECCV 2024) [Paper] [Github]
- ReliOcc: Towards Reliable Semantic Occupancy Prediction via Uncertainty Learning (Arxiv 2024) [Paper]
- DiffSSC: Semantic LiDAR Scan Completion using Denoising Diffusion Probabilistic Models (Arxiv 2024) [Paper]
- SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs (Arxiv 2024) [Paper]
- OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity (Arxiv 2024) [Paper] [Github] [Project]
- DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction (Arxiv 2024) [Paper] [Github]
- OCC-MLLM: Empowering Multimodal Large Language Model For the Understanding of Occluded Objects (Arxiv 2024) [Paper]
- OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction (Arxiv 2024) [paper]
- FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation (CVPR 2023 3D Occupancy Prediction Challenge Workshop) [paper] [Github]
- Separated RoadTopoFormer (Arxiv 2023) [Paper]
- OCTraN: 3D Occupancy Convolutional Transformer Network in Unstructured Traffic Scenarios (CVPR 2023 Workshop) [Paper] [Github]
- AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction (CVPR 2024 Workshop) [Paper]
- Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement (Arxiv 2024) [Paper]
- The 1st-place Solution for CVPR 2023 OpenLane Topology in Autonomous Driving Challenge [Paper]
- MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report (CVPR 2024 Challenge) [Paper]
- Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark (CVPR 2023) [paper] [Github]
- SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions (IV 2024) [Paper] [Project] [Github]
- WayveScenes101: A Dataset and Benchmark for Novel View Synthesis in Autonomous Driving (Arxiv 2024) [paper] [Project] [Github]
- WildOcc: A Benchmark for Off-Road 3D Semantic Occupancy Prediction (Arxiv 2024) [Paper] [Github]
- End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2024) [Paper] [Github]
- Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving (ICRA 2024) [paper] [Github] [Project]
- Language Prompt for Autonomous Driving (Arxiv 2023) [Paper] [Github]
- MotionLM: Multi-Agent Motion Forecasting as Language Modeling (Arxiv 2023) [paper]
- GAIA-1: A Generative World Model for Autonomous Driving (Arxiv 2023) [paper]
- DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving (Arxiv 2023) [paper]
- Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving (Arxiv 2023) [Paper] [Github]
- Learning to Drive Anywhere (CoRL 2023) [Paper]
- Language-Conditioned Path Planning (Arxiv 2023) [paper]
- DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model (Arxiv 2023) [Paper] [Project]
- GPT-Driver: Learning to Drive with GPT (Arxiv 2023) [Paper]
- LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving (Arxiv 2023) [paper]
- Towards End-to-End Embodied Decision Making via Multi-Modal Large Language Model: Explorations with GPT4-Vision and Beyond (Arxiv 2023) [Paper]
- DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model (Arxiv 2023) [Paper]
- UniPAD: A Universal Pre-training Paradigm for Autonomous Driving (Arxiv 2023) [paper] [Github]
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm (Arxiv 2023) [Paper]
- Uni3D: Exploring Unified 3D Representation at Scale (Arxiv 2023) [Paper] [Github]
- Video Language Planning (Arxiv 2023) [paper] [Github]
- RoboLLM: Robotic Vision Tasks Grounded on Multimodal Large Language Models (Arxiv 2023) [Paper]
- DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning (Arxiv 2023) [Paper] [Paper] [Project]
- Vision Language Models in Autonomous Driving and Intelligent Transportation Systems (Arxiv 2023) [Paper]
- ADAPT: Action-aware Driving Caption Transformer (ICRA 2023) [Paper] [Github]
- Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models (Arxiv 2023) [Paper] [Project]
- Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion (Arxiv 2023) [Paper]
- ADriver-I: A General World Model for Autonomous Driving (Arxiv 2023) [Paper]
- HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving (Arxiv 2023) [Paper]
- On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving (Arxiv 2023) [paper]
- GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning (Arxiv 2023) [Paper]
- Applications of Large Scale Foundation Models for Autonomous Driving (Arxiv 2023) [Paper]
- Dolphins: Multimodal Language Model for Driving (Arxiv 2023) [Paper] [Project]
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving (Arxiv 2023) [paper] [Github] [Project]
- Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving? (Arxiv 2023) [Paper] [Github]
- NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations (Arxiv 2023) [paper] [Github]
- DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving (Arxiv 2023) [Paper] [Github]
- DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes (Arxiv 2023) [Paper] [Project]
- Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving (Arxiv 2023) [Paper] [Github]
- Dialogue-based generation of self-driving simulation scenarios using Large Language Models (Arxiv 2023) [Paper] [Github]
- Panacea: Panoramic and Controllable Video Generation for Autonomous Driving (Arxiv 2023) [paper] [Project] [Github]
- LingoQA: Video Question Answering for Autonomous Driving (Arxiv 2023) [paper] [Github]
- DriveLM: Driving with Graph Visual Question Answering (Arxiv 2023) [Paper] [Github]
- LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding (Arxiv 2023) [Paper] [Project]
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models (Arxiv 2023) [Paper] [Github]
- Visual Point Cloud Forecasting enables Scalable Autonomous Driving (Arxiv 2023) [Paper] [Github]
- WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation (Arxiv 2023) [Paper] [Github]
- Holistic Autonomous Driving Understanding by Bird's-Eye-View Injected Multi-Modal Large Models (Arxiv 2024) [Paper] [Github]
- DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving (Arxiv 2024) [Paper]
- A Survey on Multimodal Large Language Models for Autonomous Driving (WACVW 2024) [Paper]
- VLP: Vision Language Planning for Autonomous Driving (Arxiv 2023) [Paper]
- Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities (Arxiv 2024) [Paper]
- MapGPT: Map-Guided Prompting for Unified Vision-and-Language Navigation (Arxiv 2024) [Paper]
- Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents (Arxiv 2024) [Paper]
- DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (Arxiv 2024) [Paper] [Github]
- GenAD: Generative End-to-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
- Generalized Predictive Model for Autonomous Driving (CVPR 2024) [Paper]
- AMP: Autoregressive Motion Prediction Revisited with Next Token Prediction for Autonomous Driving (Arxiv 2024) [paper]
- DriveCoT: Integrating Chain-of-Thought Reasoning with End-to-End Driving (Arxiv 2024) [Paper]
- SubjectDrive: Scaling Generative Data in Autonomous Driving via Subject Control (Arxiv 2024) [Paper] [Project]
- DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation (Arxiv 2024) [Paper] [Project] [Github]
- DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models (ICLR 2024) [Paper] [Paper]
- OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning (Arxiv 2024) [Paper]
- GAD-Generative Learning for HD Map-Free Autonomous Driving (Arxiv 2024) [Paper]
- Guiding Attention in End-to-End Driving Models (Arxiv 2024) [Paper]
- Probing Multimodal LLMs as World Models for Driving (Arxiv 2024) [Paper]
- Traj-LLM: A New Exploration for Empowering Trajectory Prediction with Pre-trained Large Language Models (Arxiv 2024) [Paper]
- Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving (Arxiv 2024) [Paper]
- Unified End-to-End V2X Cooperative Autonomous Driving (Arxiv 2024) [paper]
- DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving (Arxiv 2024) [paper]
- MaskFuser: Masked Fusion of Joint Multi-Modal Tokenization for End-to-End Autonomous Driving (Arxiv 2024) [Paper]
- MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes (Arxiv 2024) [Paper]
- Language-Image Models with 3D Understanding (Arxiv 2024) [paper] [Project]
- Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? (Arxiv 2024) [Paper]
- GFlow: Recovering 4D World from Monocular Video (Arxiv 2024) [Paper] [Github]
- Benchmarking and Improving Bird's Eye View Perception Robustness in Autonomous Driving (Arxiv 2024) [Paper] [Github]
- Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability (Arxiv 2024) [Paper] [Github] [Project]
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving (Arxiv 2024) [Paper] [Github] [Project]
- DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences (Arxiv 2024) [Paper] [Github]
- AD-H: Autonomous Driving with Hierarchical Agents (Arxiv 2024) [Paper]
- Bench2Drive: Towards Multi-Ability Benchmarking of Closed-Loop End-To-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
- A Superalignment Framework in Autonomous Driving with Large Language Models (Arxiv 2024) [Paper]
- Enhancing End-to-End Autonomous Driving with Latent World Model (Arxiv 2024) [Paper]
- SimGen: Simulator-conditioned Driving Scene Generation (Arxiv 2024) [paper]
- Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset (Arxiv 2024) [paper] [Project]
- WonderWorld: Interactive 3D Scene Generation from a Single Image (Arxiv 2024) [Paper]
- CarLLaVA: Vision language models for camera-only closed-loop driving (Arxiv 2024) [Paper]
- End-to-End Autonomous Driving without Costly Modularization and 3D Manual Annotation (Arxiv 2024) [paper]
- BEVWorld: A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space (Arxiv 2024) [Paper] [Github]
- Exploring the Causality of End-to-End Autonomous Driving (Arxiv 2024) [paper] [Github]
- SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving (Arxiv 2024) [Paper]
- DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving (Arxiv 2024) [Paper] [Github]
- Leveraging LLMs for Enhanced Open-Vocabulary 3D Scene Understanding in Autonomous Driving (Arxiv 2024) [Paper]
- Open 3D World in Autonomous Driving (Arxiv 2024) [Paper]
- CoVLA: Comprehensive Vision-Language-Action Dataset for Autonomous Driving (Arxiv 2024) [Paper]
- Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving (Arxiv 2024) [Paper]
- DriveGenVLM: Real-world Video Generation for Vision Language Model based Autonomous Driving (Arxiv 2024) [Paper]
- OccLLaMA: An Occupancy-Language-Action Generative World Model for Autonomous Driving (Arxiv 2024) [Paper]
- Can LVLMs Obtain a Driver's License? A Benchmark Towards Reliable AGI for Autonomous Driving (Arxiv 2024) [Paper]
- ContextVLM: Zero-Shot and Few-Shot Context Understanding for Autonomous Driving using Vision Language Models (ITSC 2024) [Paper]
- MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving (Arxiv 2024) [paper]
- RenderWorld: World Model with Self-Supervised 3D Label (Arxiv 2024) [Paper]
- Video Token Sparsification for Efficient Multimodal LLMs in Autonomous Driving (Arxiv 2024) [Paper]
- DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input (Arxiv 2024) [Paper] [Project] [Github]
- METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance (Arxiv 2024) [paper]
- Does End-to-End Autonomous Driving Really Need Perception Tasks? (Arxiv 2024) [Paper]
- Learning to Drive via Asymmetric Self-Play (Arxiv 2024) [Paper]
- Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models (Arxiv 2024) [paper]
- ScVLM: A Vision-Language Model for Driving Safety Critical Event Understanding (Arxiv 2024) [Paper]
- HE-Drive: Human-Like End-to-End Driving with Vision Language Models (Arxiv 2024) [Paper] [Project] [Paper]
- UniDrive: Towards Universal Driving Perception Across Camera Configurations (Arxiv 2024) [Paper] [Github]
- DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation (Arxiv 2024) [paper] [Github] [Project]
- DrivingDojo Dataset: Advancing Interactive and Knowledge-Enriched Driving World Model (NeurIPS 2024) [Paper] [Project] [Github]
- EMMA: End-to-End Multimodal Model for Autonomous Driving (Arxiv 2024) [Paper] [Github]
- Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving (Arxiv 2024) [Paper] [Github]
- X-Drive: Cross-modality consistent multi-sensor data synthesis for driving scenarios (Arxiv 2024) [Paper] [Github]
- Semantic MapNet: Building Allocentric Semantic Maps and Representations from Egocentric Views (AAAI 2021) [Paper] [Github] [Project]
- Trans4Map: Revisiting Holistic Bird’s-Eye-View Mapping from Egocentric Images to Allocentric Semantics with Vision Transformers (WACV 2023) [Paper]
- ViewBirdiformer: Learning to recover ground-plane crowd trajectories and ego-motion from a single ego-centric view (Arxiv 2022) [paper]
- 360BEV: Panoramic Semantic Mapping for Indoor Bird's-Eye View (Arxiv 2023) [Paper] [Github] [Project]
- F2BEV: Bird's Eye View Generation from Surround-View Fisheye Camera Images for Automated Driving (Arxiv 2023) [Paper]
- NVAutoNet: Fast and Accurate 360° 3D Visual Perception For Self Driving (Arxiv 2023) [Paper]
- FedBEVT: Federated Learning Bird's Eye View Perception Transformer in Road Traffic Systems (Arxiv 2023) [Paper]
- Aligning Bird-Eye View Representation of PointCloud Sequences using Scene Flow (IEEE IV 2023) [Paper] [Github]
- MotionBEV: Attention-Aware Online LiDAR Moving Object Segmentation with Bird’s Eye View based Appearance and Motion Features (Arxiv 2023) [Paper]
- WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models (Arxiv 2023) [Paper] [Github] [Project]
- Leveraging BEV Representation for 360-degree Visual Place Recognition (Arxiv 2023) [Paper]
- NMR: Neural Manifold Representation for Autonomous Driving (Arxiv 2023) [Paper]
- V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer (ECCV 2022) [Paper] [Github]
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection (CVPR 2022) [Paper] [Github]
- Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task (CVPR 2022) [Paper] [Github] [Project]
- A Motion and Accident Prediction Benchmark for V2X Autonomous Driving (Arxiv 2023) [Paper] [Project]
- BEVBert: Multimodal Map Pre-training for Language-guided Navigation (ICCV 2023) [Paper]
- V2X-Seq: A Large-Scale Sequential Dataset for Vehicle-Infrastructure Cooperative Perception and Forecasting (Arxiv 2023) [Paper] [Github] [Project]
- BUOL: A Bottom-Up Framework with Occupancy-aware Lifting for Panoptic 3D Scene Reconstruction From A Single Image (CVPR 2023) [paper] [Github]
- BEVScope: Enhancing Self-Supervised Depth Estimation Leveraging Bird’s-Eye-View in Dynamic Scenarios (Arxiv 2023) [Paper]
- Bird’s-Eye-View Scene Graph for Vision-Language Navigation (Arxiv 2023) [paper]
- OpenAnnotate3D: Open-Vocabulary Auto-Labeling System for Multi-modal 3D Data (Arxiv 2023) [paper]
- Hidden Biases of End-to-End Driving Models (ICCV 2023) [Paper] [Github](https://github.com/autonomousvision/carla_garage)
- EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps (Arxiv 2023) [Paper]
- End-to-end Autonomous Driving: Challenges and Frontiers (Arxiv 2023) [paper] [Github]
- BEVPlace: Learning LiDAR-based Place Recognition using Bird’s Eye View Images (ICCV 2023) [paper]
- I2P-Rec: Recognizing Images on Large-scale Point Cloud Maps through Bird’s Eye View Projections (IROS 2023) [Paper]
- Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving (Arxiv 2023) [Paper] [Project]
- BEV-DG: Cross-Modal Learning under Bird’s-Eye View for Domain Generalization of 3D Semantic Segmentation (ICCV 2023) [paper]
- MapPrior: Bird’s-Eye View Map Layout Estimation with Generative Models (ICCV 2023) [Paper] [Github] [Project]
- Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding (ECCV 2020) [Paper] [Github]
- Occ2Net: Robust Image Matching Based on 3D Occupancy Estimation for Occluded Regions (ICCV 2023) [Paper]
- QUEST: Query Stream for Vehicle-Infrastructure Cooperative Perception (Arxiv 2023) [paper]
- Complementing Onboard Sensors with Satellite Map: A New Perspective for HD Map Construction (Arxiv 2023) [Paper]
- SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection (Arxiv 2023) [paper]
- Rethinking Integration of Prediction and Planning in Deep Learning-Based Automated Driving Systems: A Review (Arxiv 2023) [Paper]
- BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving (Arxiv 2023) [paper]
- BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation (Arxiv 2023) [Paper]
- Towards Vehicle-to-everything Autonomous Driving: A Survey on Collaborative Perception (Arxiv 2023) [paper]
- PRED: Pre-training via Semantic Rendering on LiDAR Point Clouds (Arxiv 2023) [paper]
- BEVTrack: A Simple Baseline for 3D Single Object Tracking in Bird's-Eye-View (Arxiv 2023) [Paper] [Github]
- BEV-CV: Birds-Eye-View Transform for Cross-View Geo-Localisation (Arxiv 2023) [Paper]
- UC-NeRF: Neural Radiance Field for Under-Calibrated Multi-View Cameras in Autonomous Driving (Arxiv 2023) [paper] [Project] [Github]
- All for One, and One for All: UrbanSyn Dataset, the third Musketeer of Synthetic Driving Scenes (Arxiv 2023) [paper]
- BEVSeg2TP: Surround View Camera Bird’s-Eye-View Based Joint Vehicle Segmentation and Ego Vehicle Trajectory Prediction (Arxiv 2023) [Paper]
- BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout (Arxiv 2023) [Paper]
- EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI (Arxiv 2023) [Paper] [Github]
- A Vision-Centric Approach for Static Map Element Annotation (Arxiv 2023) [paper]
- C-BEV: Contrastive Bird’s Eye View Training for Cross-View Image Retrieval and 3-DoF Pose Estimation (Arxiv 2023) [paper]
- Self-Supervised Bird's Eye View Motion Prediction with Cross-Modality Signals (Arxiv 2024) [Paper]
- GeoDecoder: Empowering Multimodal Map Understanding (Arxiv 2024) [Paper]
- Fisheye Camera and Ultrasonic Sensor Fusion For Near-Field Obstacle Perception in Bird’s-Eye-View (Arxiv 2024) [Paper]
- Text2Street: Controllable Text-to-image Generation for Street Views (Arxiv 2024) [paper]
- Zero-BEV: Zero-shot Projection of Any First-Person Modality to BEV Maps (Arxiv 2024) [Paper]
- BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues (Arxiv 2024) [paper]
- OpenOcc: Open Vocabulary 3D Scene Reconstruction via Occupancy Representation (Arxiv 2024) [paper]
- Bosch Street Dataset: A Multi-Modal Dataset with Imaging Radar for Automated Driving (Arxiv 2024) [paper]
- Streetscapes: Large-scale Consistent Street View Generation Using Autoregressive Video Diffusion (Arxiv 2024) [Paper] [Github]
- M2DA: Multi-Modal Fusion Transformer Incorporating Driver Attention for Autonomous Driving (Arxiv 2024) [Paper]
- MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors (Arxiv 2024) [Paper]
- Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization (Arxiv 2024) [Paper]
- MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps (Arxiv 2024) [Paper]
- Neural Semantic Map-Learning for Autonomous Vehicles (Arxiv 2024) [Paper]
- AutoSplat: Constrained Gaussian Splatting for Autonomous Driving Scene Reconstruction (Arxiv 2024) [Paper] [Project]
- MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllability and Generalizability (Arxiv 2024) [paper] [Github]
- SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm (Arxiv 2024) [Paper] [Github]
- UrbanWorld: An Urban World Model for 3D City Generation (Arxiv 2024) [Paper]
- From Bird's-Eye to Street View: Crafting Diverse and Condition-Aligned Images with Latent Diffusion Model (ICRA 2024) [Paper]
- Learning Content-Aware Multi-Modal Joint Input Pruning via Bird's-Eye-View Representation (Arxiv 2024) [paper]
- DriveScape: Towards High-Resolution Controllable Multi-View Driving Video Generation (Arxiv 2024) [Paper] [Project]
- BEVal: A Cross-dataset Evaluation Study of BEV Segmentation Models for Autonomous Driving (Arxiv 2024) [Paper]