Library Catalog

Bibliographic Details
Main Authors:	Haresh, Sanjay, Dijkman, Daniel, Bhattacharyya, Apratim, Memisevic, Roland
Format:	Preprint
Published:	2026
Subjects:	Robotics
Online Access:	https://arxiv.org/abs/2602.21013

Similar Items

ClevrSkills: Compositional Language and Visual Reasoning in Robotics
by: Haresh, Sanjay, et al.
Published: (2024)

Focusing on What Matters: Object-Agent-centric Tokenization for Vision Language Action models
by: Bendikas, Rokas, et al.
Published: (2025)

Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance?
by: Bhattacharyya, Apratim, et al.
Published: (2025)

Delayed Attention Training Improves Length Generalization in Transformer–RNN Hybrids
by: Phan, Buu, et al.
Published: (2025)

Information-driven Affordance Discovery for Efficient Robotic Manipulation
by: Mazzaglia, Pietro, et al.
Published: (2023)

Affordance Field Intervention: Enabling VLAs to Escape Memory Traps in Robotic Manipulation
by: Xu, Siyu, et al.
Published: (2025)

Information-driven Affordance Discovery for Efficient Robotic Manipulation
by: Mazzaglia, Pietro, et al.
Published: (2024)

The Price Is Not Right: Neuro-Symbolic Methods Outperform VLAs on Structured Long-Horizon Manipulation Tasks with Significantly Lower Energy Consumption
by: Duggan, Timothy, et al.
Published: (2026)

Aligning Robot Navigation Behaviors with Human Intentions and Preferences
by: Karnan, Haresh
Published: (2024)

Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs
by: Zhao, Jianchao, et al.
Published: (2026)

Running VLAs at Real-time Speed
by: Ma, Yunchao, et al.
Published: (2025)

From Code to Action: Hierarchical Learning of Diffusion-VLM Policies
by: Peschl, Markus, et al.
Published: (2025)

SITCOM: Scaling Inference-Time COMpute for VLAs
by: Saxena, Ayudh, et al.
Published: (2025)

Look, Remember and Reason: Grounded reasoning in videos with language models
by: Bhattacharyya, Apratim, et al.
Published: (2023)

Can Vision-Language Models Answer Face to Face Questions in the Real-World?
by: Pourreza, Reza, et al.
Published: (2025)

Shallow-π: Knowledge Distillation for Flow-based VLAs
by: Jeon, Boseong, et al.
Published: (2026)

Primitive Subspaces Mediate Few-Shot Transfer in VLAs
by: Singh, Anya, et al.
Published: (2026)

VLAs are Confined yet Capable of Generalizing to Novel Instructions
by: Li, Quanyi
Published: (2025)

How VLAs (Really) Work In Open-World Environments
by: Rasouli, Amir, et al.
Published: (2026)

How Do VLAs Effectively Inherit from VLMs?
by: Zhang, Chuheng, et al.
Published: (2025)

cVLA: Towards Efficient Camera-Space VLAs
by: Argus, Max, et al.
Published: (2025)

FASTER: Rethinking Real-Time Flow VLAs
by: Lu, Yuxiang, et al.
Published: (2026)

Actions as Language: Fine-Tuning VLMs into VLAs Without Catastrophic Forgetting
by: Hancock, Asher J., et al.
Published: (2025)

When Vision Overrides Language: Evaluating and Mitigating Counterfactual Failures in VLAs
by: Fang, Yu, et al.
Published: (2026)

VLA-0: Building State-of-the-Art VLAs with Zero Modification
by: Goyal, Ankit, et al.
Published: (2025)

Enhancing Hallucination Detection through Noise Injection
by: Liu, Litian, et al.
Published: (2025)

Realtime-VLA V2: Learning to Run VLAs Fast, Smooth, and Accurate
by: Yang, Chen, et al.
Published: (2026)

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs
by: Priyadershi, Abhinaw, et al.
Published: (2026)

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design
by: Chen, Tianxing, et al.
Published: (2026)

Differentiate-and-Inject: Enhancing VLAs via Functional Differentiation Induced by In-Parameter Structural Reasoning
by: Hou, Jingyi, et al.
Published: (2026)

Do World Action Models Generalize Better than VLAs? A Robustness Study
by: Zhang, Zhanguang, et al.
Published: (2026)

FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation
by: Liu, Litao, et al.
Published: (2024)

Task-Driven Manipulation with Reconfigurable Parallel Robots
by: Morton, Daniel, et al.
Published: (2024)

MemoAct: Atkinson-Shiffrin-Inspired Memory-Augmented Visuomotor Policy for Robotic Manipulation
by: Tan, Liufan, et al.
Published: (2026)

Task-Driven Co-Design of Mobile Manipulators
by: Schneider, Raphael, et al.
Published: (2024)

Realtime-VLA FLASH: Speculative Inference Framework for Diffusion-based VLAs
by: Niu, Jiahui, et al.
Published: (2026)

Prognostic Framework for Robotic Manipulators Operating Under Dynamic Task Severities
by: Mohanty, Ayush, et al.
Published: (2024)

Hybrid Training for Vision-Language-Action Models
by: Mazzaglia, Pietro, et al.
Published: (2025)

LEGS: Fine-Tuning Teleop-Free VLAs for Humanoid Loco-manipulation in an Embodied Gaussian Splatting World
by: Kim, Hojune, et al.
Published: (2026)

VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
by: Tang, Jiaming, et al.
Published: (2025)