:: Library Catalog

Bibliographic Details
Main Authors:	Lei, Huashuo, Song, Wenxuan, Zhang, Huarui, Pei, Jieyuan, Chen, Jiayi, Yan, Haodong, Zhao, Han, Ding, Pengxiang, Zhang, Zhipeng, Huang, Lida, Wang, Donglin, Wang, Yan, Li, Haoang
Format:	Preprint
Published:	2026
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2605.10921

Similar Items

Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline
by: Song, Wenxuan, et al.
Published: (2026)

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
by: Song, Wenxuan, et al.
Published: (2025)

CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
by: Song, Wenxuan, et al.
Published: (2025)

Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
by: Chen, Jiayi, et al.
Published: (2025)

Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance
by: Song, Wenxuan, et al.
Published: (2026)

Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
by: Li, Fuhao, et al.
Published: (2025)

CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
by: Song, Wenxuan, et al.
Published: (2026)

VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
by: Zhao, Han, et al.
Published: (2025)

Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
by: Bai, Shuanghao, et al.
Published: (2025)

GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
by: Song, Wenxuan, et al.
Published: (2024)

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)

PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
by: Song, Wenxuan, et al.
Published: (2025)

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification
by: Cui, Can, et al.
Published: (2024)

MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
by: Lei, Haodong, et al.
Published: (2026)

FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
by: Zhao, Han, et al.
Published: (2026)

DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching
by: Chen, Jiayi, et al.
Published: (2026)

Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
by: Bai, Shuanghao, et al.
Published: (2025)

Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
by: Sun, Mingyang, et al.
Published: (2025)

Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning
by: Sun, Mingyang, et al.
Published: (2025)

ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
by: Zhang, Hongyin, et al.
Published: (2025)

FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2025)

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
by: Zhang, Haozhen, et al.
Published: (2026)

Mem-W: Latent Memory-Native GUI Agents
by: Zhang, Guibin, et al.
Published: (2026)

VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
by: Zhao, Wei, et al.
Published: (2025)

EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective
by: Wang, Yuyao, et al.
Published: (2026)

EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory
by: Shen, Ye, et al.
Published: (2026)

GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
by: Zhang, Hongyin, et al.
Published: (2025)

MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
by: Liu, Guangyi, et al.
Published: (2026)

Rethinking Target Label Conditioning in Adversarial Attacks: A 2D Tensor-Guided Generative Approach
by: Liu, Hangyu, et al.
Published: (2025)

MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models
by: Zhao, Han, et al.
Published: (2025)

Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
by: Bai, Shuanghao, et al.
Published: (2025)

VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation
by: Zhong, Zhide, et al.
Published: (2026)

RationalVLA: A Rational Vision-Language-Action Model with Dual System
by: Song, Wenxuan, et al.
Published: (2025)

RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
by: Atreya, Pranav, et al.
Published: (2025)

CloneMem: Benchmarking Long-Term Memory for AI Clones
by: Hu, Sen, et al.
Published: (2026)

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
by: Chen, Jiayi, et al.
Published: (2026)

Expressive Forecasting of 3D Whole-body Human Motions
by: Ding, Pengxiang, et al.
Published: (2023)

MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
by: Zhang, Guibin, et al.
Published: (2025)

MemEvolve: Meta-Evolution of Agent Memory Systems
by: Zhang, Guibin, et al.
Published: (2025)

CUBic: Coordinated Unified Bimanual Perception and Control Framework
by: Wang, Xingyu, et al.
Published: (2026)