| Main Authors: | Li, Haodong, Liu, Shaoteng, Lin, Zhe, Chandraker, Manmohan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.07775 |
Similar Items
Tuned Contrastive Learning
by: Animesh, Chaitanya, et al.
Published: (2023)
Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
by: Kalluri, Tarun, et al.
Published: (2024)
HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles
by: Wang, Yifan, et al.
Published: (2026)
Locally Orderless Images for Optimization in Differentiable Rendering
by: Mehta, Ishit, et al.
Published: (2025)
UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
by: Kalluri, Tarun, et al.
Published: (2024)
Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles
by: Lin, Chuang, et al.
Published: (2024)
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
by: Khan, Zaid, et al.
Published: (2024)
Instantaneous Perception of Moving Objects in 3D
by: Liu, Di, et al.
Published: (2024)
Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
by: Aich, Abhishek, et al.
Published: (2024)
Taming Self-Training for Open-Vocabulary Object Detection
by: Zhao, Shiyu, et al.
Published: (2023)
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
by: Liu, Kunhao, et al.
Published: (2025)
Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)
HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes
by: Soroco, Mauricio, et al.
Published: (2026)
AutoScape: Geometry-Consistent Long-Horizon Scene Generation
by: Chen, Jiacheng, et al.
Published: (2025)
What to Test Next: Interpretable Coverage Gap Discovery in Driving VLMs
by: Aich, Abhishek, et al.
Published: (2026)
DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation
by: Ye, Bo, et al.
Published: (2026)
LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
by: Sun, Shanlin, et al.
Published: (2024)
PhyCo: Learning Controllable Physical Priors for Generative Motion
by: Narayanan, Sriram, et al.
Published: (2026)
iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)
Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
by: Kalluri, Tarun, et al.
Published: (2023)
Train Short, Inference Long: Training-free Horizon Extension for Autoregressive Video Generation
by: Li, Jia, et al.
Published: (2026)
Generating Enhanced Negatives for Training Language-Based Object Detectors
by: Zhao, Shiyu, et al.
Published: (2023)
RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time
by: Ghosh, Anurag, et al.
Published: (2026)
LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning
by: Sharan, S P, et al.
Published: (2023)
LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents
by: He, Yun, et al.
Published: (2025)
End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
by: Guo, Yuwei, et al.
Published: (2025)
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
by: Lin, Zhiwei, et al.
Published: (2024)
NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code
by: Jain, Seemandhar, et al.
Published: (2026)
BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation
by: Hu, Panwen, et al.
Published: (2025)
Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation
by: Yao, Manyi, et al.
Published: (2024)
Pathwise Test-Time Correction for Autoregressive Long Video Generation
by: Xiang, Xunzhi, et al.
Published: (2026)
Generative Video Propagation
by: Liu, Shaoteng, et al.
Published: (2024)
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
by: Liu, Ye, et al.
Published: (2024)
Training-Free Efficient Video Generation via Dynamic Token Carving
by: Zhang, Yuechen, et al.
Published: (2025)
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
by: Chen, Xiuyuan, et al.
Published: (2023)
Efficient Autoregressive Video Diffusion with Dummy Head
by: Guo, Hang, et al.
Published: (2026)
Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
by: Liang, Tianming, et al.
Published: (2024)
SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
by: Chang, Wei-Jer, et al.
Published: (2023)
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
by: Ko, Dohwan, et al.
Published: (2025)
Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
by: Han, Yizhao, et al.
Published: (2026)