| Main Authors: | Guruprasad, Pranav; Sikka, Harshvardhan; Song, Jaewoo; Wang, Yangyue; Liang, Paul Pu |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.05821 |
Similar Items
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
by: Guruprasad, Pranav, et al.
Published: (2025)
An Open-Source Software Toolkit & Benchmark Suite for the Evaluation and Adaptation of Multimodal Action Models
by: Guruprasad, Pranav, et al.
Published: (2025)
GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models
by: Wang, Yangyue, et al.
Published: (2026)
Benchmarking the Generality of Vision-Language-Action Models
by: Guruprasad, Pranav, et al.
Published: (2025)
Improving Vision-Language-Action Model with Online Reinforcement Learning
by: Guo, Yanjiang, et al.
Published: (2025)
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model
by: Zhou, Zhongyi, et al.
Published: (2025)
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
by: Niu, Dantong, et al.
Published: (2024)
PVI: Plug-in Visual Injection for Vision-Language-Action Models
by: Zhang, Zezhou, et al.
Published: (2026)
A Survey on Efficient Vision-Language-Action Models
by: Yu, Zhaoshu, et al.
Published: (2025)
Tactile Modality Fusion for Vision-Language-Action Models
by: Morissette, Charlotte, et al.
Published: (2026)
Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos
by: Li, Qixiu, et al.
Published: (2025)
VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
by: Wang, Beichen, et al.
Published: (2024)
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies
by: Liang, Zhixuan, et al.
Published: (2025)
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
by: Li, Qixiu, et al.
Published: (2024)
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
by: Kim, Moo Jin, et al.
Published: (2025)
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
by: Kawaharazuka, Kento, et al.
Published: (2025)
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos
by: Yang, Ruihan, et al.
Published: (2025)
Test-Time Training for Visual Foresight Vision-Language-Action Models
by: Park, Sangwu, et al.
Published: (2026)
Pedestrian Trajectory Prediction with Missing Data: Datasets, Imputation, and Benchmarking
by: Chib, Pranav Singh, et al.
Published: (2024)
FlowHijack: A Dynamics-Aware Backdoor Attack on Flow-Matching Vision-Language-Action Models
by: An, Xinyuan, et al.
Published: (2026)
PointVLA: Injecting the 3D World into Vision-Language-Action Models
by: Li, Chengmeng, et al.
Published: (2025)
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving
by: Xing, Shuo, et al.
Published: (2024)
Hybrid Training for Vision-Language-Action Models
by: Mazzaglia, Pietro, et al.
Published: (2025)
The Compression Gap: Why Discrete Tokenization Limits Vision-Language-Action Model Scaling
by: Shiba, Takuya
Published: (2026)
Cross-Platform Scaling of Vision-Language-Action Models from Edge to Cloud GPUs
by: Taherin, Amir, et al.
Published: (2025)
Universal Pose Pretraining for Generalizable Vision-Language-Action Policies
by: Lin, Haitao, et al.
Published: (2026)
From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors
by: Zhang, Zhengshen, et al.
Published: (2025)
PerAct2: Benchmarking and Learning for Robotic Bimanual Manipulation Tasks
by: Grotz, Markus, et al.
Published: (2024)
MARVL: Multi-Stage Guidance for Robotic Manipulation via Vision-Language Models
by: Zhou, Xunlan, et al.
Published: (2026)
ESPIRE: A Diagnostic Benchmark for Embodied Spatial Reasoning of Vision-Language Models
by: Zhao, Yanpeng, et al.
Published: (2026)
Robustness Evaluation of Machine Learning Models for Robot Arm Action Recognition in Noisy Environments
by: Motamedi, Elaheh, et al.
Published: (2024)
Interactive Post-Training for Vision-Language-Action Models
by: Tan, Shuhan, et al.
Published: (2025)
Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
by: Zhang, Wenbo, et al.
Published: (2025)
VLA-Cache: Efficient Vision-Language-Action Manipulation via Adaptive Token Caching
by: Xu, Siyu, et al.
Published: (2025)
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
by: Cheang, Chi-Lam, et al.
Published: (2024)
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
by: Zhao, Qingqing, et al.
Published: (2025)
AVA-VLA: Improving Vision-Language-Action models with Active Visual Attention
by: Xiao, Lei, et al.
Published: (2025)
Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge
by: Larchenko, Ilia, et al.
Published: (2025)
Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos
by: Luo, Hao, et al.
Published: (2025)
GenSim: Generating Robotic Simulation Tasks via Large Language Models
by: Wang, Lirui, et al.
Published: (2023)