:: Library Catalog

Bibliographic Details
Main Authors:	Yang, Yu; Liao, Yue; Mei, Jianbiao; Wang, Baisen; Yang, Xuemeng; Wen, Licheng; Zhang, Jiangning; Li, Xiangtai; Lv, Liang; Chen, Hanlin; Shi, Botian; Liu, Yong; Yan, Shuicheng; Lee, Gim Hee
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2603.08403

Similar Items

Vision-Centric 4D Occupancy Forecasting and Planning via Implicit Residual World Models
by: Mei, Jianbiao, et al.
Published: (2025)

EvolveR: Self-Evolving LLM Agents through an Experience-Driven Lifecycle
by: Wu, Rong, et al.
Published: (2025)

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera
by: Ma, Yukai, et al.
Published: (2024)

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks
by: Yang, Cheng, et al.
Published: (2025)

DreamForge: Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes
by: Mei, Jianbiao, et al.
Published: (2024)

The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios
by: Fu, Daocheng, et al.
Published: (2026)

KG-TRACES: Enhancing Large Language Models with Knowledge Graph-constrained Trajectory Reasoning and Attribution Supervision
by: Wu, Rong, et al.
Published: (2025)

LeapVAD: A Leap in Autonomous Driving via Cognitive Perception and Dual-Process Thinking
by: Ma, Yukai, et al.
Published: (2025)

UR-Bench: A Benchmark for Multi-Hop Reasoning over Ultra-High-Resolution Images
by: Li, Siqi, et al.
Published: (2025)

RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection
by: Fu, Daocheng, et al.
Published: (2025)

X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
by: Yang, Yu, et al.
Published: (2025)

O$^2$-Searcher: A Searching-based Agent Model for Open-Domain Open-Ended Question Answering
by: Mei, Jianbiao, et al.
Published: (2025)

DriveArena: A Closed-loop Generative Simulation Platform for Autonomous Driving
by: Yang, Xuemeng, et al.
Published: (2024)

Continuously Learning, Adapting, and Improving: A Dual-Process Approach to Autonomous Driving
by: Mei, Jianbiao, et al.
Published: (2024)

SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
by: Jiang, Wencan, et al.
Published: (2026)

GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
by: Wang, Yunsong, et al.
Published: (2024)

ChatSplat: 3D Conversational Gaussian Splatting
by: Chen, Hanlin, et al.
Published: (2024)

SPIRAL: Symbolic LLM Planning via Grounded and Reflective Search
by: Zhang, Yifan, et al.
Published: (2025)

Robust Dreamer: Deviation-Aware Latent Gaussian Memory for Action-Controlled AR Video Generation
by: Chen, Hanlin, et al.
Published: (2026)

FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes
by: Wang, Yunsong, et al.
Published: (2024)

NeuSG: Neural Implicit Surface Reconstruction with 3D Gaussian Splatting Guidance
by: Chen, Hanlin, et al.
Published: (2023)

UNIKD: UNcertainty-filtered Incremental Knowledge Distillation for Neural Implicit Representation
by: Guo, Mengqi, et al.
Published: (2022)

FreeSplat++: Generalizable 3D Gaussian Splatting for Efficient Indoor Scene Reconstruction
by: Wang, Yunsong, et al.
Published: (2025)

DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries
by: Zhou, Yikang, et al.
Published: (2024)

AdaVideoRAG: Omni-Contextual Adaptive Retrieval-Augmented Efficient Long Video Understanding
by: Xue, Zhucun, et al.
Published: (2025)

Point Cloud Mamba: Point Cloud Learning via State Space Model
by: Zhang, Tao, et al.
Published: (2024)

MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
by: Zhang, Guibin, et al.
Published: (2025)

Explore In-Context Segmentation via Latent Diffusion Models
by: Wang, Chaoyang, et al.
Published: (2024)

Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
by: Low, Weng Fei, et al.
Published: (2024)

M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition
by: Wang, Mengmeng, et al.
Published: (2024)

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation
by: Han, Yue, et al.
Published: (2023)

EATFormer: Improving Vision Transformer Inspired by Evolutionary Algorithm
by: Zhang, Jiangning, et al.
Published: (2022)

MemVerse: Multimodal Memory for Lifelong Learning Agents
by: Liu, Junming, et al.
Published: (2025)

VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation
by: Zhou, Hanyu, et al.
Published: (2025)

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
by: Chen, Yang, et al.
Published: (2025)

Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model
by: Huang, Kuan-Chih, et al.
Published: (2024)

Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving
by: Yang, Yu, et al.
Published: (2024)

Solving nonconvex Hamilton-Jacobi-Isaacs equations with PINN-based policy iteration
by: Yang, Hee Jun, et al.
Published: (2025)

Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
by: Yu, Xinlei, et al.
Published: (2025)

Decouple and Track: Benchmarking and Improving Video Diffusion Transformers for Motion Transfer
by: Shi, Qingyu, et al.
Published: (2025)