Generative AI for Autonomous Driving
Contributions are welcome: please open an issue or pull request for any missing papers, datasets, or methods. We update the repository regularly.
Single-Vehicle Perception Datasets
Dataset
Data Source
Sampling Rate
Camera Type
LiDAR
Radar
HD Map
Annotation Type
KITTI (2012)
Karlsruhe, Germany
10 Hz
Stereo (2 cameras)
✅
3D Bounding Boxes
Cityscapes (2016)
50 German Cities
N/A
Stereo (2 cameras)
2D Segmentation
ApolloScape (2018)
Various Cities in China
N/A
Stereo (2 cameras)
✅
✅
Semantic Segmentation
Honda H3D (2019)
Bay Area, USA
N/A
Frontal View (1 camera)
✅
3D Bounding Boxes
nuScenes (2019)
Boston, Pittsburgh, Singapore
2 Hz
Surround View (6 cameras)
✅
✅
3D Bounding Boxes
Waymo Open Dataset (2019)
Multiple US Cities
10 Hz
Frontal/Side (5 cameras)
✅
✅
3D Bounding Boxes
Argoverse (2019)
Miami and Pittsburgh
10 Hz
Surround View
✅
✅
3D Bounding Boxes
PandaSet (2020)
San Francisco
N/A
Surround View (7 cameras)
✅
3D Bounding Boxes, Segmentation
Audi A2D2 (2020)
Various Cities in Germany
10 Hz
Surround View (6 cameras)
✅
3D Bounding Boxes
ONCE Dataset (2021)
Various Cities in China
10 Hz
Surround View (7 cameras)
✅
3D Bounding Boxes
Motion Forecasting and Cooperative Driving Datasets
Dataset
Data Source
Sampling Rate
Camera Type
LiDAR
HD Map
Annotation Type
HighD (2018)
German Highways
N/A
Drone (Bird's-eye View)
Agent 2D Bounding Boxes
INTERACTION (2019)
US, China, EU Intersections
10 Hz
Drone and Fixed Cameras
✅
Agent Trajectories
PIE (2019)
Toronto, Canada
30 Hz
Frontal View (1 camera)
Pedestrian Bounding Boxes, Intention Labels
Argoverse 1 & 2 (2019, 2022)
Miami and Pittsburgh
10 Hz
Surround View
✅
✅
Agent Trajectories
Lyft Level 5 (2020)
Palo Alto, USA
10 Hz
Surround View
✅
✅
Agent 3D Bounding Boxes
rounD (2020)
German Roundabouts
N/A
Drone (Bird's-eye View)
Vehicle 2D Bounding Boxes
Waymo Open Motion (2021)
Multiple US Cities
10 Hz
None
Vehicle, Pedestrian, Cyclist Trajectories
nuPlan (2021)
Multiple US Cities
10 Hz
Surround View
✅
✅
Agent 3D Bounding Boxes
LOKI (2021)
Japan Intersections
5 Hz
Vehicle Cameras
✅
✅
3D Bounding Boxes, Intention Labels
DAIR-V2X (2021)
China Intersections
N/A
Vehicle and Roadside Cameras
✅
3D Bounding Boxes
exiD (2022)
German Highway Exits
N/A
Drone (Bird's-eye View)
Vehicle 2D Bounding Boxes
V2X-Seq (2023)
Urban Intersections
10 Hz
Vehicle and Roadside Cameras
✅
✅
3D Agent Bounding Boxes
V2V4Real (2023)
Ohio, USA
10 Hz
Surround View
✅
3D Bounding Boxes
UniOcc (2025)
Various Cities in US
10 Hz
Surround View
✅
3D Occupancy Grids
Simulation Based Datasets
| Dataset | Data Source | Modality | QA Type | # QA Pairs |
|---|---|---|---|---|
| BDD-X (2018) | Dashcam Recordings | Videos (40s clips) | Ego Intention, Scene Description | 7K |
| DRAMA (2023) | Japan Driving Videos | Video | Risk Object, Ego Intention, Ego Actions, Reasoning | 170K |
| Rank2Tell (2024) | US Driving Videos | Video | Object Importance, Ego Intention, Ego Actions, Reasoning | 300K |
| LingoQA (2024) | Driving Videos (4s clips) | Video | Scene Description, Recommended Actions, Reasoning | 419K |
| NuScenes-QA (2024) | nuScenes | Same as nuScenes | Scene Description | 460K |
| DriveLM (2024) | nuScenes, CARLA | Same as nuScenes | Multi-step Reasoning | 360K |
| NuPlanQA (2025; not released as of April 2025) | nuPlan | Same as nuPlan | Perception, Spatial Reasoning, Ego Intentions | 1M |
| NuInstruct (2024) | nuScenes | Same as nuScenes | Instruction–Response Pairs Across 17 Task Types | 91K |
| doScenes (2024) | nuScenes | Same as nuScenes | Free-Form Driving Instructions and Scene Reference Points | 4K |
| MAPLM (2024) | Chinese Cities | Image, LiDAR | Detailed Map Description (Lanes, Road, Signs) | 61K |
| NuScenes-MQA (2024) | nuScenes | Same as nuScenes | Scene Captioning, Visual QA | 1.5M |
| DriveBench (2025) | nuScenes | Same as DriveLM | Visual QA | 20K |
| Method | Venue | Dataset | Modeling Type | Backbone | Control Variables |
|---|---|---|---|---|---|
| GeoDiffusion | ICLR'24 | nuScenes, COCO-Stuff | Diffusion, VAE | U-Net | Object Box, Camera Pose, Text |
| DetDiffusion | CVPR'24 | COCO-Stuff | Diffusion, VAE | U-Net | Object Box, Perception, Text |
| BEVGen | IEEE RA-L'24 | nuScenes, Argoverse 2 | VQ-VAE | Transformer | BEV Map, Object Box, Text |
| BEVControl | arXiv'23 | nuScenes | VAE | CNN, Transformer, CLIP | BEV Sketch, Text |
| MagicDrive | ICLR'24 | nuScenes | Diffusion, VAE | U-Net | Road Map, Object Box, Camera Pose |
| MagicDrive3D | arXiv'24 | nuScenes | 3DGS, Diffusion, VAE | U-Net | BEV Map, Object Box, Camera Pose |
| Drive-WM | CVPR'24 | Driving Data | Diffusion, VAE | U-Net | Map, Text |
| SimGen | NeurIPS'24 | YouTube | Diffusion, SDEdit | U-Net | BEV, Text |
| DatasetDM | NeurIPS'23 | - | Diffusion, LLM, VAE | U-Net, ControlNet | Text |
| DriveGAN | CVPR'21 | RWD | GAN, VAE | CNN, LSTM, MLP | Steering, Speed, Scene Features |
| LightDiff | CVPR'24 | nuScenes | VAE, Diffusion | U-Net | Lighting Conditions |
| Streetscapes | SIGGRAPH'24 | Google Street View | Diffusion | ControlNet | Road Map, Height Map, Camera Pose |
| WoVoGen | ECCV'24 | Urban Driving | Diffusion, AutoEncoder | CNN, CLIP | Text, World Volumes, Ego Actions |
| HoloDrive | arXiv'24 | nuScenes | VAE, Diffusion | U-Net, Attention | Text, 2D Layout |
| WeatherDG | arXiv'24 | Cityscapes | Diffusion, LLM | VAE, U-Net | Text |
| UrbanArchitect | arXiv'24 | nuScenes | Diffusion, ControlNet | VAE | Text, 3D Layout |
Decompositional Generation
| Method | Venue | Dataset | Modeling Type | Backbone | Control Variables |
|---|---|---|---|---|---|
| ChatSim | CVPR'24 | Waymo Open Dataset | LLM, NeRF | MLP, Transformer | 3D Assets |
| UrbanGIRAFFE | ICCV'23 | KITTI-360, CLEVR-W | NeRF | MLP | Camera Pose, Panoptic Prior |
| Sat2Scene | CVPR'24 | HoliCity, OmniCity | NeRF | MLP | Satellite Images, Layout, 3D Constraints |
| Block-NeRF | CVPR'22 | Block-NeRF Dataset | NeRF | MLP | Spatial Block Layout, 3D Constraints |
| S-NeRF | CVPR'23 | nuScenes, Waymo Open Dataset | NeRF | MLP | Camera Path, 3D Constraints |
| NF-LDM | CVPR'23 | VizDoom, Replica, AVD | Diffusion, NeRF | MLP | Scene Embedding, 3D Constraints |
| Panoptic NeRF | IEEE 3DIMPVT'22 | KITTI-360 | NeRF | MLP | Semantic Segmentation, 3D Constraints |
| Neural Point Light Field | CVPR'22 | Waymo Open Dataset | NeRF | MLP | Camera Pose, 3D Constraints |
| Neural Scene Graphs | CVPR'21 | KITTI | NeRF | MLP | Object Graph Topology, 3D Constraints |
| UniSim | CVPR'23 | PandaSet | NeRF | MLP | Agent Profile, 3D Constraints |
| CADSim | CoRL'23 | MVMC, PandaSet | Differentiable CAD Rendering | MLP | CAD Geometry, 3D Constraints |
| Method | Venue | Dataset | Modeling Type | Backbone | Control Mechanism | Generation Type |
|---|---|---|---|---|---|---|
| LiDMs | CVPR'24 | nuScenes, KITTI-360 | Diffusion | CNN, U-Net | Multi-modal conditions | Scene Generation |
| RangeLDM | ECCV'24 | KITTI-360, nuScenes | Diffusion, VAE | CNN, U-Net | Partial Point Cloud | Scene Completion, Generation |
| LidarDM | ICRA'25 | KITTI-360, WOD | Diffusion, VAE | CNN | Semantic Map | LiDAR Simulation & Raycasting |
| DynamicCity | ICLR'25 | Occ3D, CarlaSC | Diffusion, VAE | Transformer, CNN | Layout, Trajectory, Text, Inpainting | 4D Occupancy Scene Generation |
| GenMM | arXiv'24 | BDD100K, WOD | Diffusion | U-Net, Transformer | 3D Bounding Boxes, Reference Image | Object-Level Manipulation |
| Text2LiDAR | ECCV'24 | KITTI-360, nuScenes | Diffusion | Transformer | Text | Full Scene Generation |
| UltraLiDAR | CVPR'23 | PandaSet, KITTI | VQ-VAE | Transformer | Sparse Point Cloud | Scene Completion, Generation |
| LidarGRIT | CVPR-W'24 | KITTI-360, KITTI odometry | VQ-VAE | Transformer | Unconditional | Scene Generation |
| NeRF-LiDAR | CVPR'24 | nuScenes | NeRF | U-Net, MLP | Camera Poses, Multi-view Images | LiDAR Simulation |
| LiDAR4D | CVPR'24 | KITTI, nuScenes | NeRF | U-Net, MLP | Camera Poses, Multi-view LiDAR Point Cloud | LiDAR Simulation |
| DyNFL | CVPR'24 | WOD | Neural SDF | MLP | LiDAR Scans, 3D Bounding Boxes | LiDAR Simulation |
| LiDARsim | CVPR'20 | LiDARsim Dataset | Physics-based Raycasting | Raycasting Engine, U-Net | 3D backgrounds, Dynamic Object Meshes | LiDAR Simulation |
| PCGen | ICRA'23 | WOD | FPA Raycasting | Raycasting Engine, MLP | Reconstructed Scenario | LiDAR Simulation |
| LiDARGEN | ECCV'22 | KITTI-360, nuScenes | Score-Based | U-Net | Sparse Point Cloud | Scene Generation |
| Yue et al. | ACM'18 | KITTI | Physics-based Raycasting | Raycasting Engine | Pre-defined In-game Scene Parameters | LiDAR Simulation |
Trajectory Generation Methods
| Method | Venue | Dataset | Modeling Type | Backbone |
|---|---|---|---|---|
| Kim et al. | IEEE Access'21 | Real-world Driving | CVAE | DeepConvLSTM |
| Barbié et al. | JRM'19 | Synthetic | CVAE | RNN |
| CGNS | IROS'19 | ETH/UCY, SDD | GAN | CNN |
| EvolveGraph | NeurIPS'20 | ETH/UCY, SDD, H3D | Autoregressive | GNN |
| STG-DAT | T-ITS'21 | ETH/UCY, SDD | CVAE | GNN |
| PathGAN | ETRI'21 | iSUN | GAN | CNN |
| MID | CVPR'22 | ETH/UCY, Stanford Drone | Diffusion | Transformer |
| LED | CVPR'23 | ETH/UCY | Diffusion | Leapfrog |
| SingularTrajectory | CVPR'24 | Multiple Benchmarks | Diffusion | SVD |
| Diffusion-Planner | ICLR'25 | nuPlan | Diffusion | Transformer |
| GPT-Driver | NeurIPS'23 | nuScenes | LLM | Transformer |
| DriveLM | ECCV'24 | nuScenes | VLM | Transformer |
| LMDrive | CVPR'24 | CARLA | LLM | Transformer |
| OpenEMMA | WACV'25 | nuScenes | VLM | Transformer |
| DESIRE | CVPR'17 | KITTI, Stanford Drone | CVAE | RNN |
| Trajectron | ICCV'19 | ETH/UCY | CVAE | Graph RNN |
| Trajectron++ | ECCV'20 | ETH/UCY, nuScenes | CVAE | Constrained Graph RNN |
| Social GAN | CVPR'18 | ETH/UCY | GAN | RNN |
| SoPhie | CVPR'19 | ETH/UCY | GAN | Cross Attention |
| Social-BiGAT | NeurIPS'19 | ETH/UCY | Bicycle-GAN | Graph Attention Network |
| MotionDiffuser | CVPR'23 | WOMD | Diffusion | Transformer |
| SDT | OpenReview'24 | AV2 | Diffusion | Transformer |
| Westny et al. | arXiv'24 | rounD, highD | Diffusion | GNN |
| LMTrajectory | CVPR'24 | ETH/UCY | LLM | Transformer |
| TrafficSim | CVPR'21 | ATG4D (private) | CVAE | GNN |
| TrafficBots | ICRA'23 | WOMD | CVAE | MLP |
| DJINN | NeurIPS'23 | INTERACTION | Diffusion | Transformer |
| Scenario Diffusion | NeurIPS'23 | AV2 | Diffusion | U-Net |
| BehaviorGPT | NeurIPS'25 | WOMD | Autoregressive | Transformer |
| FSDrive | NeurIPS'25 | nuScenes | Autoregressive, LLM, VLA | Transformer |
3D Occupancy Generation Methods
| Method | Venue+Year | Dataset | Modeling Type | Backbone | Control Mechanism | Generation Type | Code |
|---|---|---|---|---|---|---|---|
| UrbanDiffusion | arXiv'24 | nuScenes via Occ3D | VQ-VAE | Diffusion | BEV Layout | Static Scene | Not Released |
| DOME | arXiv'24 | nuScenes via Occ3D | VAE | DiT | Ego Trajectory | Scene and Agent Only | Not Released |
| OccWorld | ECCV'24 | nuScenes via Occ3D | VQ-VAE | Transformer | Past Occupancy | Scene and Agent | GitHub |
| OccSORA [Redacted] | arXiv'24 | nuScenes via Occ3D | VQ-VAE | DiT | Ego Trajectory, Past Occupancy | Scene and Agent | GitHub* |
| OccLLaMA | arXiv'24 | nuScenes via Occ3D | VQ-VAE | LLaMA | Language | Scene and Agent | Not Released |
| UnO | CVPR'24 | nuScenes, Argoverse 2 | Not Specified | Transformer | Past Occupancy | Semantic LiDAR | Not Released |
| DynamicCity | ICLR'25 | CARLA | VAE | DiT | Ego Trajectory | Scene and Agent | GitHub |
Video-based Scene Generation Methods
Note:
For the "Condition" column:
I = Image, T = Text, E = BEV, B = Bounding Boxes/Layout,
D = Depth, C = Camera, M = Maps, A = Driver Action,
O = Optical Flow, J = Trajectory, S = Subject, H = High-level instructions (Command, Goal Point).
Conditions in brackets are optional.
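The legend above can be read mechanically. The following is a minimal Python sketch of a decoder for the "Condition" strings used in the table below; the dictionary is copied from the note, while the names `CONDITION_CODES` and `decode_condition` are hypothetical helpers introduced only for illustration and are not part of this repository.

```python
# Letter codes copied from the note above.
# CONDITION_CODES and decode_condition are illustrative names, not repository code.
CONDITION_CODES = {
    "I": "Image", "T": "Text", "E": "BEV", "B": "Bounding Boxes/Layout",
    "D": "Depth", "C": "Camera", "M": "Maps", "A": "Driver Action",
    "O": "Optical Flow", "J": "Trajectory", "S": "Subject",
    "H": "High-level instructions (Command, Goal Point)",
}


def decode_condition(code):
    """Split a Condition string into (required, optional) lists of names.

    Letters inside brackets are treated as optional, as stated in the note.
    """
    required, optional = [], []
    in_brackets = False
    for ch in code:
        if ch == "(":
            in_brackets = True
        elif ch == ")":
            in_brackets = False
        elif ch in CONDITION_CODES:
            target = optional if in_brackets else required
            target.append(CONDITION_CODES[ch])
        else:
            raise ValueError(f"unknown condition code: {ch!r}")
    return required, optional


if __name__ == "__main__":
    # "T(ECI)" is the Condition entry listed for DriveDreamer-2.
    print(decode_condition("T(ECI)"))
```

For example, `decode_condition("T(ECI)")` yields text as the only required condition, with BEV, camera, and image as optional ones.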
Method
Year
Modeling
Backbone
Frames
FPS
Condition
Closed-loop
LLMs
Code
Panacea
CVPR'24
Diffusion
ControlNet
8
2
ITEBDCM
Github
Delphi
CoRR'24
Diffusion
U-Net
40
2
TEBC
✅
N/A
DriveDreamer
ECCV'24
Diffusion
U-Net, Transformer
32
12
ITMBA
Github
DriveDreamer-2
ArXiv'24
Diffusion
U-Net
8
4
T(ECI)
✅
Github
DriveScape
ArXiv'24
Diffusion
U-Net
30
2-10
IMEB
N/A
DriveArena
CoRR'24
Diffusion, AR
U-Net
N/A
12
TBCM
✅
Github
DriveGen
ArXiv'24
Diffusion
U-Net
-
-
ITB
Github
DrivingDiffusion
ECCV'24
Diffusion
U-Net
-
-
ITBO
Github
Vista
CoRR'24
Diffusion, AR
U-Net
25
10
I(AHJ)
Github
SubjectDrive
CoRR'24
Diffusion
ControlNet
8
2
ITSB
N/A
GenAD
CVPR'24
Diffusion
Transformer
8
2
ITAJ
✅
N/A
DrivingWorld
ArXiv'24
AR
Transformer, GPT
400
10
IJ
✅
Github
Doe-1
ArXiv'24
N/A
N/A
-
2
ITJ
✅
✅
Github
ChatSim
CVPR'24
Agent
N/A
40
10
IT
✅
Github
ProphetDWM
ArXiv'25
Diffusion
U-Net
10
4
ITA
N/A
LongDWM *
ArXiv'25
Diffusion
Transformer
13
10
ITJ
Github
*: Not released as of June 2025.
Note:
In the "Condition" column:
M = Maps, I = Images/Videos, B = 3D Bounding Boxes/Layout, J = Trajectory, T = Text, O = Opacity, C = Camera, A = Driving Action.
* means not presented in the original paper but supported later.
† means reconstruction models with a generative prior.

| Method | Venue | Task | Modeling Type | Backbone | Condition | Output | Code |
|---|---|---|---|---|---|---|---|
| InfiniCube | ArXiv'24 | 4D Gen. | 3DGS, DiT | 3D U-Net, ControlNet | MBJT | Video, 3DGS | N/A |
| WoVoGen | ECCV'24 | 4D Gen. | Diffusion | 3D U-Net, Transformer | MOTA | Video | Github |
| DriveX | ArXiv'24 | 4D Gen. | Diffusion | U-Net | MOTA | Video, 3DGS | Github |
| ChatSim | CVPR'24 | 4D Gen. | NeRF, 3DGS* | Transformer | IT | Video | Github |
| MagicDrive3D | CoRR'24 | 4D Gen. | 3DGS | MLP | TEBJ | Video, 3DGS | Github |
| DreamDrive | ArXiv'24 | 4D Gen. | 3DGS, Diffusion | MLP | IJ | Video, 3DGS | N/A |
| OmniRe | ICLR'25 | 4D Rec. | 3DGS, Graph | N/A | I(CD) | 3DGS, SMPL | Github |
| 4DGF | NeurIPS'24 | 4D Rec. | 3DGS, Graph | N/A | IC(D) | 3DGS | Github |
| StreetGaussian | ECCV'24 | 4D Rec. | 3DGS | N/A | ICD | 3DGS | Github |
| DrivingGaussian | CVPR'24 | 4D Rec. | 3DGS | N/A, Graph | ICD | 3DGS | N/A |
| SGD | CoRR'24 | 4D Rec.† | 3DGS | U-Net, ControlNet | ITCD | 3DGS | N/A |
| EmerNeRF | ICLR'24 | 4D Rec. | NeRF | MLP | ICD | NeRF | Github |
| VastGaussian | CVPR'24 | 3D Rec. | 3DGS | CNN | IC | 3DGS | N/A |
| CityGaussian | ECCV'24 | 3D Rec. | 3DGS | N/A | IC | 3DGS | Github |
| DNMP | ICCV'23 | 3D Rec. | Voxel, Mesh | MLP | ICD | Voxel, Mesh | Github |
| S-NeRF | ICLR'23 | 3D Rec. | NeRF | MLP | ICD | NeRF | Github |
| BlockNeRF | CVPR'22 | 3D Rec. | NeRF | MLP | IC | NeRF | N/A |
| UrbanNeRF | CVPR'22 | 3D Rec. | NeRF | MLP | ICD | NeRF | N/A |
| Julian et al. | CVPR'21 | 4D Rec. | NeRF, Graph | MLP | IC | NeRF | Github |
| STORM | ICLR'25 | 4D Rec. | 3DGS | Transformer | IC | 3DGS | Github |
For each method below, we note the supported editing operations (insertion, removal, manipulation) and the output formats (camera, LiDAR).
Method
Modeling Type
Insertion
Removal
Manipulation
Camera
LiDAR
Code
UniSim
NeRF
✔️
✔️
✔️
✔️
✔️
N/A
DrivingGaussian
3DGS
✔️
✔️
Github
StreetGaussian
3DGS
✔️
✔️
✔️
✔️
N/A
Generative LiDAR
Generative Inpainting
✔️
✔️
✔️
✔️
✔️
N/A
DriveEditor
SAM, Video Diffusion
✔️
✔️
✔️
✔️
N/A
LLM-based Autonomous Driving Systems
In the table below, QA stands for question answering, DM for decision making, ED for environment description, SU for scene understanding, and DC for driving context.
| Method | Venue | Interaction | Task | Scenario | Backbone | Strategy | Input | Output | Code |
|---|---|---|---|---|---|---|---|---|---|
| DiLu | ArXiv'23 | Prompting | QA | DM | GPT-4 | ReAct | ED | Action | Github |
| Drive-Like-A-Human | WACV'24 | Prompting | QA | DM | GPT-3.5 | ReAct | ED | Action | Github |
| Driving-with-LLMs | ICRA'24 | Fine-tuning | QA | SU | LLaMA-7b | None | Question | Answer | Github |
| LaMPilot | CVPR'24 | Prompting | QA | SU | General LLMs | PoT | Instruction, DC | Code | Github |
| LLaDA | CVPR'24 | Prompting | QA | DM | GPT-4 | CoT | Intended Command | Action | Github |
| GPT-Driver | NeurIPS'23 | Fine-tuning | Planning | E2E | GPT-3.5 | CoT | Instruction, DC | Object, Action, Trajectory | Github |
| Talk2Drive | ITSC'24 | Prompting | Planning | E2E | GPT-4 | CoT | Instruction, DC | Executable Controls | Github |
| Agent-Driver | COLM'24 | Prompting | Planning | E2E | GPT-3.5 | ReAct | Observation | Object, Action, Trajectory | Github |
MLLM-based Autonomous Driving Systems
In the table below, VQA stands for visual question answering, SU for scene understanding, DS for driving scene, MVF for multi-view frame, and TC for transportation context.
| Method | Venue | Interaction | Task | Scenario | Backbone | Strategy | Input | Output | Code |
|---|---|---|---|---|---|---|---|---|---|
| HiLM-D | ArXiv'23 | Prompting | VQA | SU | MiniGPT-4 | None | Question, DS (Video) | Answer | N/A |
| DriveLM | ECCV'24 | Fine-tuning | VQA | SU | BLIP-2 | CoT | Question, DS (Image) | Answer | Github |
| Dolphins | ECCV'24 | Fine-tuning | VQA | SU | OpenFlamingo | CoT | Question, DS (Video) | Answer | Github |
| EM-VLM4AD | CVPR'24 | Fine-tuning | VQA | SU | T5/T5-Large | None | Question, DS (MVF) | Answer | Github |
| LLM-Augmented-MTR | IROS'24 | Prompting | VQA | SU | GPT-4V | CoT | Instruction, TC-Map | Context Understanding | Github |
| LMDrive | CVPR'24 | Fine-tuning | Planning | E2E | LLaVA-v1.5 | CoT | Instruction, DS (MVF), LiDAR | Control Signal | Github |
| LeGo-Drive | IROS'24 | Fine-tuning | Planning | E2E | CLIP | None | Instruction, DS (Image) | Trajectory | Github |
| RAG-Driver | ArXiv'24 | Fine-tuning | Planning | E2E | ViT-B/32, Vicuna-1.5 | RAG | Instruction, DS (Video) | Action, Trajectory | Github |
| DriveVLM | CoRL'24 | Fine-tuning | Planning | E2E | Qwen-VL | CoT | Instruction, DS (Video) | Action, Trajectory | N/A |
| EMMA | ArXiv'24 | Fine-tuning | Planning | E2E | Gemini 1.0 Nano-1 | CoT | Instruction, DS (MVF) | Object, Action, Trajectory | N/A |
| OpenEMMA | WACV'25 | Prompting | Planning | E2E | General MLLMs | CoT | Instruction, DS (Image) | Object, Action, Trajectory | Github |
| FSDrive | NeurIPS'25 | Fine-tuning | Planning | E2E | Qwen2-VL 2B | Visual CoT | Instruction, DS (MVF) | Future Frame, Trajectory | Github |
If you find this repository useful for your research, please consider citing the following paper:
@article{wang2025generative,
  title={Generative AI for Autonomous Driving: Frontiers and Opportunities},
  author={Yuping Wang and Shuo Xing and Cui Can and Renjie Li and Hongyuan Hua and Kexin Tian and Zhaobin Mo and Xiangbo Gao and Keshu Wu and Sulong Zhou and Hengxu You and Juntong Peng and Junge Zhang and Zehao Wang and Rui Song and Mingxuan Yan and Walter Zimmer and Xingcheng Zhou and Peiran Li and Zhaohan Lu and Chia-Ju Chen and Yue Huang and Ryan A. Rossi and Lichao Sun and Hongkai Yu and Zhiwen Fan and Frank Hao Yang and Yuhao Kang and Ross Greer and Chenxi Liu and Eun Hak Lee and Xuan Di and Xinyue Ye and Liu Ren and Alois Knoll and Xiaopeng Li and Shuiwang Ji and Masayoshi Tomizuka and Marco Pavone and Tianbao Yang and Jing Du and Ming-Hsuan Yang and Hua Wei and Ziran Wang and Yang Zhou and Jiachen Li and Zhengzhong Tu},
  year={2025},
  eprint={2505.08854},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}