| Main Authors: | Haresh, Sanjay, Dijkman, Daniel, Bhattacharyya, Apratim, Memisevic, Roland |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.21013 |
Similar Items
ClevrSkills: Compositional Language and Visual Reasoning in Robotics
by: Haresh, Sanjay, et al.
Published: (2024)
Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
by: Bendikas, Rokas, et al.
Published: (2025)
Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
by: Bhattacharyya, Apratim, et al.
Published: (2025)
Delayed Attention Training Improves Length Generalization in Transformer-RNN Hybrids
by: Phan, Buu, et al.
Published: (2025)
Information-driven Affordance Discovery for Efficient Robotic Manipulation
by: Mazzaglia, Pietro, et al.
Published: (2023)
Affordance Field Intervention: Enabling VLAs to Escape Memory Traps in Robotic Manipulation
by: Xu, Siyu, et al.
Published: (2025)
Information-driven Affordance Discovery for Efficient Robotic Manipulation
by: Mazzaglia, Pietro, et al.
Published: (2024)
The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks with Significantly Lower Energy Consumption
by: Duggan, Timothy, et al.
Published: (2026)
Aligning Robot Navigation Behaviors with Human Intentions and Preferences
by: Karnan, Haresh
Published: (2024)
Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
by: Zhao, Jianchao, et al.
Published: (2026)
Running VLAs at Real-time Speed
by: Ma, Yunchao, et al.
Published: (2025)
From Code to Action: Hierarchical Learning of Diffusion-VLM Policies
by: Peschl, Markus, et al.
Published: (2025)
SITCOM: Scaling Inference-Time COMpute for VLAs
by: Saxena, Ayudh, et al.
Published: (2025)
Look, Remember and Reason: Grounded reasoning in videos with language models
by: Bhattacharyya, Apratim, et al.
Published: (2023)
Can Vision-Language Models Answer Face to Face Questions in the Real-World?
by: Pourreza, Reza, et al.
Published: (2025)
Shallow-π: Knowledge Distillation for Flow-based VLAs
by: Jeon, Boseong, et al.
Published: (2026)
Primitive Subspaces Mediate Few-Shot Transfer in VLAs
by: Singh, Anya, et al.
Published: (2026)
VLAs are Confined yet Capable of Generalizing to Novel Instructions
by: Li, Quanyi
Published: (2025)
How VLAs (Really) Work In Open-World Environments
by: Rasouli, Amir, et al.
Published: (2026)
How Do VLAs Effectively Inherit from VLMs?
by: Zhang, Chuheng, et al.
Published: (2025)
cVLA: Towards Efficient Camera-Space VLAs
by: Argus, Max, et al.
Published: (2025)
FASTER: Rethinking Real-Time Flow VLAs
by: Lu, Yuxiang, et al.
Published: (2026)
Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
by: Hancock, Asher J., et al.
Published: (2025)
When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs
by: Fang, Yu, et al.
Published: (2026)
VLA-0: Building State-of-the-Art VLAs with Zero Modification
by: Goyal, Ankit, et al.
Published: (2025)
Enhancing Hallucination Detection through Noise Injection
by: Liu, Litian, et al.
Published: (2025)
Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate
by: Yang, Chen, et al.
Published: (2026)
Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
by: Priyadershi, Abhinaw, et al.
Published: (2026)
RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
by: Chen, Tianxing, et al.
Published: (2026)
Differentiate-and-Inject: Enhancing VLAs via Functional Differentiation Induced by In-Parameter Structural Reasoning
by: Hou, Jingyi, et al.
Published: (2026)
Do World Action Models Generalize Better than VLAs? A Robustness Study
by: Zhang, Zhanguang, et al.
Published: (2026)
FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation
by: Liu, Litao, et al.
Published: (2024)
Task-Driven Manipulation with Reconfigurable Parallel Robots
by: Morton, Daniel, et al.
Published: (2024)
MemoAct: Atkinson-Shiffrin-Inspired Memory-Augmented Visuomotor Policy for Robotic Manipulation
by: Tan, Liufan, et al.
Published: (2026)
Task-Driven Co-Design of Mobile Manipulators
by: Schneider, Raphael, et al.
Published: (2024)
Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
by: Niu, Jiahui, et al.
Published: (2026)
Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
by: Mohanty, Ayush, et al.
Published: (2024)
Hybrid Training for Vision-Language-Action Models
by: Mazzaglia, Pietro, et al.
Published: (2025)
LEGS: Fine-Tuning Teleop-Free VLAs for Humanoid Loco-manipulation in an Embodied Gaussian Splatting World
by: Kim, Hojune, et al.
Published: (2026)
VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
by: Tang, Jiaming, et al.
Published: (2025)