Library Catalog

Bibliographic Details
Main Authors:	Zhen, Haoyu; Gao, Zixian; Sun, Qiao; Zhao, Yilin; Yang, Yuncong; Du, Yilun; Guo, Pengsheng; Wang, Tsun-Hsuan; Qiao, Yi-Ling; Gan, Chuang
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition; Robotics
Online Access:	https://arxiv.org/abs/2604.06168

Similar Items

TesserAct: Learning 4D Embodied World Models
by: Zhen, Haoyu, et al.
Published: (2025)

3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
by: Yang, Yuncong, et al.
Published: (2024)

AdaWorld: Learning Adaptable World Models with Latent Actions
by: Gao, Shenyuan, et al.
Published: (2025)

3D-VLA: A 3D Vision-Language-Action Generative World Model
by: Zhen, Haoyu, et al.
Published: (2024)

DiffE2E: Rethinking End-to-End Driving with a Hybrid Action Diffusion and Supervised Policy
by: Zhao, Rui, et al.
Published: (2025)

Learning 3D Persistent Embodied World Models
by: Zhou, Siyuan, et al.
Published: (2025)

See Less, Drive Better: Generalizable End-to-End Autonomous Driving via Foundation Models Stochastic Patch Selection
by: Mallak, Amir, et al.
Published: (2026)

End2Race: Efficient End-to-End Imitation Learning for Real-Time F1Tenth Racing
by: Qiao, Zhijie, et al.
Published: (2025)

ExACT: An End-to-End Autonomous Excavator System Using Action Chunking With Transformers
by: Chen, Liangliang, et al.
Published: (2024)

Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving
by: Hua, Zhihua, et al.
Published: (2026)

Flex: End-to-End Text-Instructed Visual Navigation from Foundation Model Features
by: Chahine, Makram, et al.
Published: (2024)

LightEMMA: Lightweight End-to-End Multimodal Model for Autonomous Driving
by: Qiao, Zhijie, et al.
Published: (2025)

MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
by: Yang, Yuncong, et al.
Published: (2025)

O-ConNet: Geometry-Aware End-to-End Inference of Over-Constrained Spatial Mechanisms
by: Sun, Haoyu, et al.
Published: (2026)

End-to-End Humanoid Robot Safe and Comfortable Locomotion Policy
by: Wang, Zifan, et al.
Published: (2025)

Bridging Perception and Planning: Towards End-to-End Planning for Signal Temporal Logic Tasks
by: Ye, Bowen, et al.
Published: (2025)

Fast-SmartWay: Panoramic-Free End-to-End Zero-Shot Vision-and-Language Navigation
by: Shi, Xiangyu, et al.
Published: (2025)

A Knowledge-Driven Diffusion Policy for End-to-End Autonomous Driving Based on Expert Routing
by: Xu, Chengkai, et al.
Published: (2025)

ImaginationPolicy: Towards Generalizable, Precise and Reliable End-to-End Policy for Robotic Manipulation
by: Lu, Dekun, et al.
Published: (2025)

Efficient Fusion and Task Guided Embedding for End-to-end Autonomous Driving
by: Guo, Yipin, et al.
Published: (2024)

DriveMoE: Mixture-of-Experts for Vision-Language-Action Model in End-to-End Autonomous Driving
by: Yang, Zhenjie, et al.
Published: (2025)

PhyScensis: Physics-Augmented LLM Agents for Complex Physical Scene Arrangement
by: Wang, Yian, et al.
Published: (2026)

Grounding Video Models to Actions through Goal Conditioned Exploration
by: Luo, Yunhao, et al.
Published: (2024)

RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills
by: Lin, Chunru, et al.
Published: (2025)

AnchDrive: Bootstrapping Diffusion Policies with Hybrid Trajectory Anchors for End-to-End Driving
by: Chai, Jinhao, et al.
Published: (2025)

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
by: Chi, Cheng, et al.
Published: (2023)

UBSoft: A Simulation Platform for Robotic Skill Learning in Unbounded Soft Environments
by: Lin, Chunru, et al.
Published: (2024)

HiST-VLA: A Hierarchical Spatio-Temporal Vision-Language-Action Model for End-to-End Autonomous Driving
by: Wang, Yiru, et al.
Published: (2026)

End-to-End Multi-Task Policy Learning from NMPC for Quadruped Locomotion
by: Sajja, Anudeep, et al.
Published: (2025)

The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space
by: Chuang, Bing-Cheng, et al.
Published: (2026)

RoboDreamer: Learning Compositional World Models for Robot Imagination
by: Zhou, Siyuan, et al.
Published: (2024)

YOPOv2-Tracker: An End-to-End Agile Tracking and Navigation Framework from Perception to Action
by: Lu, Junjie, et al.
Published: (2025)

SAGE: State-Aware Guided End-to-End Policy for Multi-Stage Sequential Tasks via Hidden Markov Decision Process
by: Wu, BinXu, et al.
Published: (2025)

Articulate AnyMesh: Open-Vocabulary 3D Articulated Objects Modeling
by: Qiu, Xiaowen, et al.
Published: (2025)

DVDP: An End-to-End Policy for Mobile Robot Visual Docking with RGB-D Perception
by: Min, Haohan, et al.
Published: (2025)

VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-end Autonomous Driving
by: Guo, Ziang, et al.
Published: (2025)

DIAL: Decoupling Intent and Action via Latent World Modeling for End-to-End VLA
by: Chen, Yi, et al.
Published: (2026)

Model-Based Policy Adaptation for Closed-Loop End-to-End Autonomous Driving
by: Lin, Haohong, et al.
Published: (2025)

Percept-WAM: Perception-Enhanced World-Awareness-Action Model for Robust End-to-End Autonomous Driving
by: Han, Jianhua, et al.
Published: (2025)

Raising Body Ownership in End-to-End Visuomotor Policy Learning via Robot-Centric Pooling
by: Zhuang, Zheyu, et al.
Published: (2024)