| Main Authors: | Lei, Huashuo, Song, Wenxuan, Zhang, Huarui, Pei, Jieyuan, Chen, Jiayi, Yan, Haodong, Zhao, Han, Ding, Pengxiang, Zhang, Zhipeng, Huang, Lida, Wang, Donglin, Wang, Yan, Li, Haoang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.10921 |
Similar Items
Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline
by: Song, Wenxuan, et al.
Published: (2026)
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
by: Song, Wenxuan, et al.
Published: (2025)
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
by: Song, Wenxuan, et al.
Published: (2025)
Unified Diffusion VLA: Vision-Language-Action Model via Joint Discrete Denoising Diffusion Process
by: Chen, Jiayi, et al.
Published: (2025)
Fast-dVLA: Accelerating Discrete Diffusion VLA to Real-Time Performance
by: Song, Wenxuan, et al.
Published: (2026)
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
by: Li, Fuhao, et al.
Published: (2025)
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
by: Song, Wenxuan, et al.
Published: (2026)
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
by: Zhao, Han, et al.
Published: (2025)
Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
by: Bai, Shuanghao, et al.
Published: (2025)
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
by: Song, Wenxuan, et al.
Published: (2024)
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)
PD-VLA: Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
by: Song, Wenxuan, et al.
Published: (2025)
ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification
by: Cui, Can, et al.
Published: (2024)
MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought
by: Lei, Haodong, et al.
Published: (2026)
FRAPPE: Infusing World Modeling into Generalist Policies via Multiple Future Representation Alignment
by: Zhao, Han, et al.
Published: (2026)
DFM-VLA: Iterative Action Refinement for Robot Manipulation via Discrete Flow Matching
by: Chen, Jiayi, et al.
Published: (2026)
Embodied Robot Manipulation in the Era of Foundation Models: Planning and Learning Perspectives
by: Bai, Shuanghao, et al.
Published: (2025)
Score-Based Diffusion Policy Compatible with Reinforcement Learning via Optimal Transport
by: Sun, Mingyang, et al.
Published: (2025)
Iterative Refinement of Flow Policies in Probability Space for Online Reinforcement Learning
by: Sun, Mingyang, et al.
Published: (2025)
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
by: Zhang, Hongyin, et al.
Published: (2025)
FlowVLA: Visual Chain of Thought-based Motion Reasoning for Vision-Language-Action Models
by: Zhong, Zhide, et al.
Published: (2025)
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents
by: Zhang, Haozhen, et al.
Published: (2026)
Mem-W: Latent Memory-Native GUI Agents
by: Zhang, Guibin, et al.
Published: (2026)
VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
by: Zhao, Wei, et al.
Published: (2025)
EvoMemBench: Benchmarking Agent Memory from a Self-Evolving Perspective
by: Wang, Yuyao, et al.
Published: (2026)
EvolMem: A Cognitive-Driven Benchmark for Multi-Session Dialogue Memory
by: Shen, Ye, et al.
Published: (2026)
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
by: Zhang, Hongyin, et al.
Published: (2025)
MemGUI-Bench: Benchmarking Memory of Mobile GUI Agents in Dynamic Environments
by: Liu, Guangyi, et al.
Published: (2026)
Rethinking Target Label Conditioning in Adversarial Attacks: A 2D Tensor-Guided Generative Approach
by: Liu, Hangyu, et al.
Published: (2025)
MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models
by: Zhao, Han, et al.
Published: (2025)
Rethinking Latent Redundancy in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
by: Bai, Shuanghao, et al.
Published: (2025)
VLA-OPD: Bridging Offline SFT and Online RL for Vision-Language-Action Models via On-Policy Distillation
by: Zhong, Zhide, et al.
Published: (2026)
RationalVLA: A Rational Vision-Language-Action Model with Dual System
by: Song, Wenxuan, et al.
Published: (2025)
RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
by: Atreya, Pranav, et al.
Published: (2025)
CloneMem: Benchmarking Long-Term Memory for AI Clones
by: Hu, Sen, et al.
Published: (2026)
MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents
by: Chen, Jiayi, et al.
Published: (2026)
Expressive Forecasting of 3D Whole-body Human Motions
by: Ding, Pengxiang, et al.
Published: (2023)
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
by: Zhang, Guibin, et al.
Published: (2025)
MemEvolve: Meta-Evolution of Agent Memory Systems
by: Zhang, Guibin, et al.
Published: (2025)
CUBic: Coordinated Unified Bimanual Perception and Control Framework
by: Wang, Xingyu, et al.
Published: (2026)