| Main Authors: | Lin, Jieru, Yu, Zhiwei, Karlsson, Börje F. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.17649 |
Similar Items
Towards Long-horizon Embodied Agents with Tool-Aligned Vision-Language-Action Models
by: Lei, Zixing, et al.
Published: (2026)
World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks
by: Lin, Zuyao, et al.
Published: (2026)
OccSim: Multi-kilometer Simulation with Long-horizon Occupancy World Models
by: Liu, Tianran, et al.
Published: (2026)
PInVerify: An Offline Embodied Benchmark for Active Instance Verification
by: Jiang, Yuhang
Published: (2026)
Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model
by: Xu, Wenjiang, et al.
Published: (2025)
Universal Actions for Enhanced Embodied Foundation Models
by: Zheng, Jinliang, et al.
Published: (2025)
RANGER: A Monocular Zero-Shot Semantic Navigation Framework through Visual Contextual Adaptation
by: Yu, Ming-Ming, et al.
Published: (2025)
NoisyEQA: Benchmarking Embodied Question Answering Against Noisy Queries
by: Wu, Tao, et al.
Published: (2024)
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
by: Yang, Xiuyu, et al.
Published: (2025)
Rethinking Video Generation Model for the Embodied World
by: Deng, Yufan, et al.
Published: (2026)
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs
by: Wu, Xin, et al.
Published: (2026)
EmbodieDreamer: Advancing Real2Sim2Real Transfer for Policy Training via Embodied World Modeling
by: Wang, Boyuan, et al.
Published: (2025)
DeformPAM: Data-Efficient Learning for Long-horizon Deformable Object Manipulation via Preference-based Action Alignment
by: Chen, Wendi, et al.
Published: (2024)
EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models
by: Bai, Yu, et al.
Published: (2026)
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
by: Hung, Chia-Yu, et al.
Published: (2025)
Embodied Navigation at the Art Gallery
by: Bigazzi, Roberto, et al.
Published: (2022)
SocialNav: Training Human-Inspired Foundation Model for Socially-Aware Embodied Navigation
by: Chen, Ziyi, et al.
Published: (2025)
WoMAP: World Models For Embodied Open-Vocabulary Object Localization
by: Yin, Tenny, et al.
Published: (2025)
The Safety Challenge of World Models for Embodied AI Agents: A Review
by: Baraldi, Lorenzo, et al.
Published: (2025)
Embodied Spatial Intelligence: from Implicit Scene Modeling to Spatial Reasoning
by: Fang, Jiading
Published: (2025)
Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test
by: Fan, Chun-Kai, et al.
Published: (2026)
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
by: Zhong, Fangwei, et al.
Published: (2024)
Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent
by: Yu, Che Rin, et al.
Published: (2025)
Embodied Uncertainty-Aware Object Segmentation
by: Fang, Xiaolin, et al.
Published: (2024)
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
by: Wang, Qineng, et al.
Published: (2025)
Neural Brain: A Neuroscience-inspired Framework for Embodied Agents
by: Liu, Jian, et al.
Published: (2025)
BOSS: Benchmark for Observation Space Shift in Long-Horizon Task
by: Yang, Yue, et al.
Published: (2025)
Prune-Then-Plan: Step-Level Calibration for Stable Frontier Exploration in Embodied Question Answering
by: Frahm, Noah, et al.
Published: (2025)
RynnEC: Bringing MLLMs into Embodied World
by: Dang, Ronghao, et al.
Published: (2025)
OctoNav: Towards Generalist Embodied Navigation
by: Gao, Chen, et al.
Published: (2025)
TrackVLA++: Unleashing Reasoning and Memory Capabilities in VLA Models for Embodied Visual Tracking
by: Liu, Jiahang, et al.
Published: (2025)
ABot-N0: Technical Report on the VLA Foundation Model for Versatile Embodied Navigation
by: Chu, Zedong, et al.
Published: (2026)
HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents
by: Tomilin, Tristan, et al.
Published: (2025)
Dejavu: Towards Experience Feedback Learning for Embodied Intelligence
by: Wu, Shaokai, et al.
Published: (2025)
Vision-Language Navigation with Embodied Intelligence: A Survey
by: Gao, Peng, et al.
Published: (2024)
LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
by: Saxena, Pranav, et al.
Published: (2025)
PRISM: A Multi-View Multi-Capability Retail Video Dataset for Embodied Vision-Language Models
by: Rouhi, Amirreza, et al.
Published: (2026)
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI
by: Choi, Suhwan, et al.
Published: (2025)
LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models
by: Shen, Boyang, et al.
Published: (2026)
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
by: Zhou, Yang, et al.
Published: (2024)