Library Catalog



Bibliographic Details
Main authors:	Zhang, Yanxin; He, Liang; Kang, Zeyi; Ming, Zuheng; Zhao, Kaixing
Format:	Preprint
Published:	2025
Subjects:	Robotics
Online access:	https://arxiv.org/abs/2509.18005

Similar items

LCMF: Lightweight Cross-Modality Mambaformer for Embodied Robotics VQA
by: Kang, Zeyi, et al.
Published: (2025)

Confusion-Aware In-Context-Learning for Vision-Language Models in Robotic Manipulation
by: He, Yayun, et al.
Published: (2026)

VLA-InfoEntropy: A Training-Free Vision-Attention Information Entropy Approach for Vision-Language-Action Models Inference Acceleration and Success
by: Liu, Chuhang, et al.
Published: (2026)

Tri-Select: A Multi-Stage Visual Data Selection Framework for Mobile Visual Crowdsensing
by: Zhang, Jiayu, et al.
Published: (2025)

ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
by: Tie, Chenrui, et al.
Published: (2024)

VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer
by: Hu, Songqiao, et al.
Published: (2025)

StereoVLA: Enhancing Vision-Language-Action Models with Stereo Vision
by: Deng, Shengliang, et al.
Published: (2025)

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
by: Han, Xiaofeng, et al.
Published: (2025)

Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model
by: Hu, Zhaofeng, et al.
Published: (2025)

History-Conditioned Spatio-Temporal Visual Token Pruning for Efficient Vision-Language Navigation
by: Wang, Qitong, et al.
Published: (2026)

A Vision-Language-Action-Critic Model for Robotic Real-World Reinforcement Learning
by: Zhai, Shaopeng, et al.
Published: (2025)

Enhancing Robotic Arm Activity Recognition with Vision Transformers and Wavelet-Transformed Channel State Information
by: Zandi, Rojin, et al.
Published: (2024)

PANav: Toward Privacy-Aware Robot Navigation via Vision-Language Models
by: Yu, Bangguo, et al.
Published: (2024)

VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
by: Zhao, Wentao, et al.
Published: (2024)

HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments
by: Xiong, Songsong, et al.
Published: (2024)

Robotic Strawberry Harvesting with Robust Vision and Deep Reinforcement Learning based Sim-to-Real Control
by: Bashir, Al, et al.
Published: (2026)

ReTac-ACT: A State-Gated Vision-Tactile Fusion Transformer for Precision Assembly
by: Ruan, Minchi, et al.
Published: (2026)

Bridging Language, Vision and Action: Multimodal VAEs in Robotic Manipulation Tasks
by: Sejnova, Gabriela, et al.
Published: (2024)

VLAS: Vision-Language-Action Model With Speech Instructions For Customized Robot Manipulation
by: Zhao, Wei, et al.
Published: (2025)

AG-MPBS: a Mobility-Aware Prediction and Behavior-Based Scheduling Framework for Air-Ground Unmanned Systems
by: Shao, Tianhao, et al.
Published: (2025)

STRONG-VLA: Decoupled Robustness Learning for Vision-Language-Action Models under Multimodal Perturbations
by: Xie, Yuhan, et al.
Published: (2026)

ME$^3$-BEV: Mamba-Enhanced Deep Reinforcement Learning for End-to-End Autonomous Driving with BEV-Perception
by: Lu, Siyi, et al.
Published: (2025)

Multimodal Behavior Tree Generation: A Small Vision-Language Model for Robot Task Planning
by: Battistini, Cristiano, et al.
Published: (2026)

CATNAV: Cached Vision-Language Traversability for Efficient Zero-Shot Robot Navigation
by: Potnis, Aditya, et al.
Published: (2026)

MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
by: Zhang, Rongyu, et al.
Published: (2025)

Vision-Language Model-based Physical Reasoning for Robot Liquid Perception
by: Lai, Wenqiang, et al.
Published: (2024)

DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
by: Yue, Yang, et al.
Published: (2024)

RoboReward: General-Purpose Vision-Language Reward Models for Robotics
by: Lee, Tony, et al.
Published: (2026)

Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks
by: Guruprasad, Pranav, et al.
Published: (2024)

InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models
by: Yan, Yu, et al.
Published: (2024)

CL3R: 3D Reconstruction and Contrastive Learning for Enhanced Robotic Manipulation Representations
by: Cui, Wenbo, et al.
Published: (2025)

Safe-VLN: Collision Avoidance for Vision-and-Language Navigation of Autonomous Robots Operating in Continuous Environments
by: Yue, Lu, et al.
Published: (2023)

GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning
by: Nguyen, Huy Hoang, et al.
Published: (2024)

Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
by: Zhao, Wei, et al.
Published: (2025)

T-araVLN: Translator for Agricultural Robotic Agents on Vision-and-Language Navigation
by: Zhao, Xiaobei, et al.
Published: (2025)

PrefMMT: Modeling Human Preferences in Preference-based Reinforcement Learning with Multimodal Transformers
by: Zhao, Dezhong, et al.
Published: (2024)

CLIP-RT: Learning Language-Conditioned Robotic Policies from Natural Language Supervision
by: Kang, Gi-Cheon, et al.
Published: (2024)

RLinf-VLA: A Unified and Efficient Framework for Reinforcement Learning of Vision-Language-Action Models
by: Zang, Hongzhi, et al.
Published: (2025)

CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot
by: Lykov, Artem, et al.
Published: (2024)

FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies
by: Reuss, Moritz, et al.
Published: (2025)