| Main Authors: | Wu, Heran, Zhou, Zirun, Zhang, Jingfeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.06547 |
Similar Items
Mechanistic interpretability for steering vision-language-action models
by: Häon, Bear, et al.
Published: (2025)
FATE-VLA: Failure-aware test generation for vision-language-action models
by: Kanwal, Arusa, et al.
Published: (2026)
MoManipVLA: Transferring Vision-language-action Models for General Mobile Manipulation
by: Wu, Zhenyu, et al.
Published: (2025)
A vision-language model and platform for temporally mapping surgery from video
by: Kiyasseh, Dani
Published: (2026)
Purely vision-based collective movement of robots
by: Mezey, David, et al.
Published: (2024)
FlySearch: Exploring how vision-language models explore
by: Pardyl, Adam, et al.
Published: (2025)
Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models
by: Christensen, Kim Alexander, et al.
Published: (2025)
Rethinking the Practicality of Vision-language-action Model: A Comprehensive Benchmark and An Improved Baseline
by: Song, Wenxuan, et al.
Published: (2026)
The active visual sensing methods for robotic welding: review, tutorial and prospect
by: Wang, ZhenZhou
Published: (2024)
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model
by: Li, Fuhao, et al.
Published: (2025)
Openfly: A comprehensive platform for aerial vision-language navigation
by: Gao, Yunpeng, et al.
Published: (2025)
Memorized action chunking with Transformers: Imitation learning for vision-based tissue surface scanning
by: Yang, Bochen, et al.
Published: (2024)
Large language model-based task planning for service robots: A review
by: Bian, Shaohan, et al.
Published: (2025)
CottonSim: A vision-guided autonomous robotic system for cotton harvesting in Gazebo simulation
by: Thayananthan, Thevathayarajh, et al.
Published: (2025)
Assist-as-needed Hip Exoskeleton Control for Gait Asymmetry Correction via Human-in-the-loop Optimization
by: Qian, Yuepeng, et al.
Published: (2025)
One to rule them all: natural language to bind communication, perception and action
by: Colombani, Simone, et al.
Published: (2024)
Joint Moment Estimation for Hip Exoskeleton Control: A Generalized Moment Feature Generation Method
by: Zhang, Yuanwen, et al.
Published: (2024)
Concept-Based Dictionary Learning for Inference-Time Safety in Vision Language Action Models
by: Wen, Siqi, et al.
Published: (2026)
Using large language models for embodied planning introduces systematic safety risks
by: Zhang, Tao, et al.
Published: (2026)
Robots that learn to evaluate models of collective behavior
by: Hocke, Mathis, et al.
Published: (2026)
A vision-based robotic system for precision pollination of apples
by: Bhattarai, Uddhav, et al.
Published: (2024)
Training microrobots to swim by a large language model
by: Xu, Zhuoqun, et al.
Published: (2024)
Value-guided action planning with JEPA world models
by: Destrade, Matthieu, et al.
Published: (2025)
YOLOv10 with Kolmogorov-Arnold networks and vision-language foundation models for interpretable object detection and trustworthy multimodal AI in computer vision perception
by: Impraimakis, Marios, et al.
Published: (2026)
FlightBench: Benchmarking Learning-based Methods for Ego-vision-based Quadrotors Navigation
by: Yu, Shu-Ang, et al.
Published: (2024)
Sparsh: Self-supervised touch representations for vision-based tactile sensing
by: Higuera, Carolina, et al.
Published: (2024)
A transparency-based action model implemented in a robotic physical trainer for improved HRI
by: Naama, Aharony, et al.
Published: (2024)
The Better You Learn, The Smarter You Prune: Towards Efficient Vision-language-action Models via Differentiable Token Pruning
by: Jiang, Titong, et al.
Published: (2025)
Ontological grounding for sound and natural robot explanations via large language models
by: Olivares-Alarcos, Alberto, et al.
Published: (2026)
Two-stream network-driven vision-based tactile sensor for object feature extraction and fusion perception
by: Huang, Muxing, et al.
Published: (2025)
A physics-based sensor simulation environment for lunar ground operations
by: Batagoda, Nevindu M., et al.
Published: (2024)
A SysML-based language for evaluating the integrity of simulation and physical embodiments of Cyber-Physical systems
by: Dudek, Wojciech, et al.
Published: (2023)
PROSKILL: A formal skill language for acting in robotics
by: Ingrand, Félix
Published: (2024)
Biomechanically consistent real-time action recognition for human-robot interaction
by: Li, Wanchen, et al.
Published: (2025)
A physics-informed, vision-based method to reconstruct all deformation modes in slender bodies
by: Kim, Seung Hyun, et al.
Published: (2021)
Collision avoidance from monocular vision trained with novel view synthesis
by: Tordjman--Levavasseur, Valentin, et al.
Published: (2025)
CLUE: Crossmodal disambiguation via Language-vision Understanding with attEntion
by: Abrini, Mouad, et al.
Published: (2026)
Traversability analysis with vision and terrain probing for safe legged robot navigation
by: Haddeler, Garen, et al.
Published: (2022)
Bio-inspired reconfigurable stereo vision for robotics using omnidirectional cameras
by: Chen, Suchang, et al.
Published: (2024)
AIM: Intent-Aware Unified world action Modeling with Spatial Value Maps
by: Fan, Liaoyuan, et al.
Published: (2026)