:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Pengzhen, Li, Min, Luo, Zhen, Song, Xinshuai, Chen, Ziwei, Liufu, Weijia, Yang, Yixuan, Zheng, Hao, Xu, Rongtao, Huang, Zitong, Ding, Tongsheng, Xie, Luyang, Zhang, Kaidong, Fu, Changfei, Liu, Yang, Lin, Liang, Zheng, Feng, Liang, Xiaodan
Format:	Preprint
Published:	2024
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2412.05789

Similar Items

RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
by: Zhang, Kaidong, et al.
Published: (2025)

Surfer: Progressive Reasoning with World Models for Robotic Manipulation
by: Ren, Pengzhen, et al.
Published: (2023)

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
by: Zhang, Kaidong, et al.
Published: (2024)

ArtiWorld: LLM-Driven Articulation of 3D Objects in Scenes
by: Yang, Yixuan, et al.
Published: (2025)

RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
by: Liufu, Weijia, et al.
Published: (2026)

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
by: Xu, Rongtao, et al.
Published: (2025)

All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents
by: Wang, Zhiqiang, et al.
Published: (2024)

OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
by: Yang, Yixuan, et al.
Published: (2025)

MagicSeg: Open-World Segmentation Pretraining via Counterfactural Diffusion-Based Auto-Generation
by: Cai, Kaixin, et al.
Published: (2026)

Graph Your Way to Inspiration: Integrating Co-Author Graphs with Retrieval-Augmented Generation for Large Language Model Based Scientific Idea Generation
by: Xie, Pengzhen, et al.
Published: (2025)

RoboPearls: Editable Video Simulation for Robot Manipulation
by: Tang, Tao, et al.
Published: (2025)

MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
by: Cai, Kaixin, et al.
Published: (2023)

A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model
by: Zhang, Kaidong, et al.
Published: (2026)

RoboReflect: A Robotic Reflective Reasoning Framework for Grasping Ambiguous-Condition Objects
by: Luo, Zhen, et al.
Published: (2025)

3D-MoRe: Unified Modal-Contextual Reasoning for Embodied Question Answering
by: Xu, Rongtao, et al.
Published: (2025)

UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
by: Mao, Weijia, et al.
Published: (2025)

UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning
by: Mao, Weijia, et al.
Published: (2025)

Seeing through Imagination: Learning Scene Geometry via Implicit Spatial World Modeling
by: Cao, Meng, et al.
Published: (2025)

InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models
by: Yan, Yu, et al.
Published: (2024)

EchoVLA: Synergistic Declarative Memory for VLA-Driven Mobile Manipulation
by: Lin, Min, et al.
Published: (2025)

MEIA: Multimodal Embodied Perception and Interaction in Unknown Environments
by: Liu, Yang, et al.
Published: (2024)

Interactive World Simulator for Robot Policy Training and Evaluation
by: Wang, Yixuan, et al.
Published: (2026)

Information-Theoretic Authenticated PIR: From PIR-RV To APIR
by: Ke, Pengzhen, et al.
Published: (2026)

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
by: Wang, Hao, et al.
Published: (2024)

Out of Sight, Out of Mind? Evaluating State Evolution in Video World Models
by: Ma, Ziqi, et al.
Published: (2026)

Structured Preference Optimization for Vision-Language Long-Horizon Task Planning
by: Liang, Xiwen, et al.
Published: (2025)

Associations of visual, hearing, and dual sensory impairment with motoric cognitive risk syndrome: Observational and Mendelian randomization analyses
by: Liang, Haixu, et al.
Published: (2024)

Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions
by: Yang, Yicheng, et al.
Published: (2026)

RefComp: A Reference-guided Unified Framework for Unpaired Point Cloud Completion
by: Yang, Yixuan, et al.
Published: (2025)

GWM: Towards Scalable Gaussian World Models for Robotic Manipulation
by: Lu, Guanxing, et al.
Published: (2025)

SCORP: Scene-Consistent Object Refinement via Proxy Generation and Tuning
by: Chen, Ziwei, et al.
Published: (2025)

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
by: Liu, Yang, et al.
Published: (2024)

3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands
by: Huang, Xuan, et al.
Published: (2024)

Bootstrap Dynamic-Aware 3D Visual Representation for Scalable Robot Learning
by: Liang, Qiwei, et al.
Published: (2025)

GLaD: Geometric Latent Distillation for Vision-Language-Action Models
by: Guo, Minghao, et al.
Published: (2025)

DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
by: Zhao, Rui, et al.
Published: (2025)

Multi-human Interactive Talking Dataset
by: Zhu, Zeyu, et al.
Published: (2025)

On the Identifiability of Sparse ICA without Assuming Non-Gaussianity
by: Ng, Ignavier, et al.
Published: (2024)

RoVLA: Multi-Consistency Constraints for Robust Vision-Language-Action Models
by: Luo, Jingzhou, et al.
Published: (2026)

Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
by: Song, Xinshuai, et al.
Published: (2024)