Library Catalog

Bibliographic Details
Main Authors:	Li, Haodong; Liu, Shaoteng; Lin, Zhe; Chandraker, Manmohan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2602.07775

Similar Items

Tuned Contrastive Learning
by: Animesh, Chaitanya, et al.
Published: (2023)

Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos
by: Kalluri, Tarun, et al.
Published: (2024)

HorizonForge: Driving Scene Editing with Any Trajectories and Any Vehicles
by: Wang, Yifan, et al.
Published: (2026)

Locally Orderless Images for Optimization in Differentiable Rendering
by: Mehta, Ishit, et al.
Published: (2025)

UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
by: Kalluri, Tarun, et al.
Published: (2024)

Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles
by: Lin, Chuang, et al.
Published: (2024)

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
by: Khan, Zaid, et al.
Published: (2024)

Instantaneous Perception of Moving Objects in 3D
by: Liu, Di, et al.
Published: (2024)

Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation
by: Aich, Abhishek, et al.
Published: (2024)

Taming Self-Training for Open-Vocabulary Object Detection
by: Zhao, Shiyu, et al.
Published: (2023)

Rolling Forcing: Autoregressive Long Video Diffusion in Real Time
by: Liu, Kunhao, et al.
Published: (2025)

Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion
by: Huang, Xun, et al.
Published: (2025)

HorizonWeaver: Generalizable Multi-Level Semantic Editing for Driving Scenes
by: Soroco, Mauricio, et al.
Published: (2026)

AutoScape: Geometry-Consistent Long-Horizon Scene Generation
by: Chen, Jiacheng, et al.
Published: (2025)

What to Test Next: Interpretable Coverage Gap Discovery in Driving VLMs
by: Aich, Abhishek, et al.
Published: (2026)

DySink: Dynamic Frame Sinks for Autoregressive Long Video Generation
by: Ye, Bo, et al.
Published: (2026)

LidaRF: Delving into Lidar for Neural Radiance Field on Street Scenes
by: Sun, Shanlin, et al.
Published: (2024)

PhyCo: Learning Controllable Physical Priors for Generative Motion
by: Narayanan, Sriram, et al.
Published: (2026)

iFinder: Structured Zero-Shot Vision-Based LLM Grounding for Dash-Cam Video Reasoning
by: Yao, Manyi, et al.
Published: (2025)

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision
by: Kalluri, Tarun, et al.
Published: (2023)

Train Short, Inference Long: Training-free Horizon Extension for Autoregressive Video Generation
by: Li, Jia, et al.
Published: (2026)

Generating Enhanced Negatives for Training Language-Based Object Detectors
by: Zhao, Shiyu, et al.
Published: (2023)

RAD-LAD: Rule and Language Grounded Autonomous Driving in Real-Time
by: Ghosh, Anurag, et al.
Published: (2026)

LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning
by: Sharan, S P, et al.
Published: (2023)

LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents
by: He, Yun, et al.
Published: (2025)

End-to-End Training for Autoregressive Video Diffusion via Self-Resampling
by: Guo, Yuwei, et al.
Published: (2025)

Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
by: Lin, Zhiwei, et al.
Published: (2024)

NERFIFY: A Multi-Agent Framework for Turning NeRF Papers into Code
by: Jain, Seemandhar, et al.
Published: (2026)

BridgeIV: Bridging Customized Image and Video Generation through Test-Time Autoregressive Identity Propagation
by: Hu, Panwen, et al.
Published: (2025)

Image-Specific Adaptation of Transformer Encoders for Compute-Efficient Segmentation
by: Yao, Manyi, et al.
Published: (2024)

Pathwise Test-Time Correction for Autoregressive Long Video Generation
by: Xiang, Xunzhi, et al.
Published: (2026)

Generative Video Propagation
by: Liu, Shaoteng, et al.
Published: (2024)

E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
by: Liu, Ye, et al.
Published: (2024)

Training-Free Efficient Video Generation via Dynamic Token Carving
by: Zhang, Yuechen, et al.
Published: (2025)

AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
by: Chen, Xiuyuan, et al.
Published: (2023)

Efficient Autoregressive Video Diffusion with Dummy Head
by: Guo, Hang, et al.
Published: (2026)

Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels
by: Liang, Tianming, et al.
Published: (2024)

SAFE-SIM: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
by: Chang, Wei-Jer, et al.
Published: (2023)

ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
by: Ko, Dohwan, et al.
Published: (2025)

Entropy-Guided k-Guard Sampling for Long-Horizon Autoregressive Video Generation
by: Han, Yizhao, et al.
Published: (2026)