| Main Authors: | Ren, Pengzhen, Li, Min, Luo, Zhen, Song, Xinshuai, Chen, Ziwei, Liufu, Weijia, Yang, Yixuan, Zheng, Hao, Xu, Rongtao, Huang, Zitong, Ding, Tongsheng, Xie, Luyang, Zhang, Kaidong, Fu, Changfei, Liu, Yang, Lin, Liang, Zheng, Feng, Liang, Xiaodan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2412.05789 |
Similar Items
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
by: Zhang, Kaidong, et al.
Published: (2025)
Surfer: Progressive Reasoning with World Models for Robotic Manipulation
by: Ren, Pengzhen, et al.
Published: (2023)
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
by: Zhang, Kaidong, et al.
Published: (2024)
ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes
by: Yang, Yixuan, et al.
Published: (2025)
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
by: Xu, Rongtao, et al.
Published: (2025)
All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents
by: Wang, Zhiqiang, et al.
Published: (2024)
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
by: Yang, Yixuan, et al.
Published: (2025)
MagicSeg: Open-World Segmentation Pretraining via Counterfactural Diffusion-Based Auto-Generation
by: Cai, Kaixin, et al.
Published: (2026)
Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation
by: Xie, Pengzhen, et al.
Published: (2025)
RoboPearls: Editable Video Simulation for Robot Manipulation
by: Tang, Tao, et al.
Published: (2025)
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
by: Cai, Kaixin, et al.
Published: (2023)
A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model
by: Zhang, Kaidong, et al.
Published: (2026)
RoboReflect: A Robotic Reflective Reasoning Framework for Grasping Ambiguous-Condition Objects
by: Luo, Zhen, et al.
Published: (2025)
3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering
by: Xu, Rongtao, et al.
Published: (2025)
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
by: Mao, Weijia, et al.
Published: (2025)
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
by: Mao, Weijia, et al.
Published: (2025)
Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling
by: Cao, Meng, et al.
Published: (2025)
InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models
by: Yan, Yu, et al.
Published: (2024)
EchoVLA: Synergistic Declarative Memory for VLA-Driven Mobile Manipulation
by: Lin, Min, et al.
Published: (2025)
MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments
by: Liu, Yang, et al.
Published: (2024)
Interactive World Simulator for Robot Policy Training and Evaluation
by: Wang, Yixuan, et al.
Published: (2026)
Information-Theoretic Authenticated PIR: From PIR-RV To APIR
by: Ke, Pengzhen, et al.
Published: (2026)
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
by: Wang, Hao, et al.
Published: (2024)
Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models
by: Ma, Ziqi, et al.
Published: (2026)
Structured Preference Optimization for Vision-Language Long-Horizon Task Planning
by: Liang, Xiwen, et al.
Published: (2025)
Associations of visual, hearing, and dual sensory impairment with motoric cognitive risk syndrome: Observational and Mendelian randomization analyses
by: Haixu Liang, et al.
Published: (2024)
Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions
by: Yang, Yicheng, et al.
Published: (2026)
RefComp: A Reference-guided Unified Framework for Unpaired Point Cloud Completion
by: Yang, Yixuan, et al.
Published: (2025)
GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
by: Lu, Guanxing, et al.
Published: (2025)
SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning
by: Chen, Ziwei, et al.
Published: (2025)
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
by: Liu, Yang, et al.
Published: (2024)
3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands
by: Huang, Xuan, et al.
Published: (2024)
Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning
by: Liang, Qiwei, et al.
Published: (2025)
GLaD: Geometric Latent Distillation for Vision-Language-Action Models
by: Guo, Minghao, et al.
Published: (2025)
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
by: Zhao, Rui, et al.
Published: (2025)
Multi-human Interactive Talking Dataset
by: Zhu, Zeyu, et al.
Published: (2025)
On the Identifiability of Sparse ICA without Assuming Non-Gaussianity
by: Ng, Ignavier, et al.
Published: (2024)
RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
by: Luo, Jingzhou, et al.
Published: (2026)
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
by: Song, Xinshuai, et al.
Published: (2024)