:: Library Catalog

Bibliographic Details
Main author:	Chu, Wenhui
Format:	Preprint
Published:	2026
Subjects:	Robotics; Computer Vision and Pattern Recognition
Online access:	https://arxiv.org/abs/2605.25495

Similar Items

OW-Rep: Open World Object Detection with Instance Representation Learning
by: Lee, Sunoh, et al.
Published: (2024)

Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking
by: Käppeler, Markus, et al.
Published: (2025)

Simulations of MRI Guided and Powered Ferric Applicators for Tetherless Delivery of Therapeutic Interventions
by: Chu, Wenhui, et al.
Published: (2026)

Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection
by: Tang, Rui, et al.
Published: (2025)

What Is The Best 3D Scene Representation for Robotics? From Geometric to Foundation Models
by: Deng, Tianchen, et al.
Published: (2025)

DeFM: Learning Foundation Representations from Depth for Robotics
by: Patel, Manthan, et al.
Published: (2026)

Turning Adaptation into Assets: Cross-Domain Bridging for Online Vision-Language Navigation
by: Hu, Zixuan, et al.
Published: (2026)

Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration
by: Wei, Lai, et al.
Published: (2024)

Attention-Guided Integration of CLIP and SAM for Precise Object Masking in Robotic Manipulation
by: Muttaqien, Muhammad A., et al.
Published: (2025)

ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
by: Zhang, Ying, et al.
Published: (2025)

Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams
by: Holden, Lachlan, et al.
Published: (2026)

VLS: Steering Pretrained Robot Policies via Vision-Language Models
by: Liu, Shuo, et al.
Published: (2026)

Diffusion-VLA: Generalizable and Interpretable Robot Foundation Model via Self-Generated Reasoning
by: Wen, Junjie, et al.
Published: (2024)

Bridging the Indoor-Outdoor Gap: Vision-Centric Instruction-Guided Embodied Navigation for the Last Meters
by: Zhao, Yuxiang, et al.
Published: (2026)

SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
by: Zhang, Junjie, et al.
Published: (2024)

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
by: Han, Xiaofeng, et al.
Published: (2025)

Understanding the Impact of Geometric Foundation Models on Vision-Language-Action Models
by: Yang, Yurou, et al.
Published: (2026)

NoTVLA: Semantics-Preserving Robot Adaptation via Narrative Action Interfaces
by: Huang, Zheng, et al.
Published: (2025)

ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
by: Dey, Sombit, et al.
Published: (2024)

More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery
by: Dong, Wenzhen, et al.
Published: (2025)

Theia: Distilling Diverse Vision Foundation Models for Robot Learning
by: Shang, Jinghuan, et al.
Published: (2024)

HarvestFlex: Strawberry Harvesting via Vision-Language-Action Policy Adaptation in the Wild
by: Zhao, Ziyang, et al.
Published: (2026)

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving
by: Jiang, Bo, et al.
Published: (2024)

Unifying Scene Representation and Hand-Eye Calibration with 3D Foundation Models
by: Zhi, Weiming, et al.
Published: (2024)

STORM: Search-Guided Generative World Models for Robotic Manipulation
by: Lin, Wenjun, et al.
Published: (2025)

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
by: Ding, Pengxiang, et al.
Published: (2023)

F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions
by: Lv, Qi, et al.
Published: (2025)

QueryAdapter: Rapid Adaptation of Vision-Language Models in Response to Natural Language Queries
by: Chapman, Nicolas Harvey, et al.
Published: (2025)

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation
by: Hanyu, Taisei, et al.
Published: (2025)

Belief Consistency Between Foundation-Model Evidence and Geometric Perception in Persistent Robotic Maps
by: Heckman, Christoffer, et al.
Published: (2026)

ABot-PhysWorld: Interactive World Foundation Model for Robotic Manipulation with Physics Alignment
by: Chen, Yuzhi, et al.
Published: (2026)

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks
by: Chen, Yixiang, et al.
Published: (2026)

RealD$^2$iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion
by: Liang, Xiujian, et al.
Published: (2025)

FM-Fusion: Instance-aware Semantic Mapping Boosted by Vision-Language Foundation Models
by: Liu, Chuhao, et al.
Published: (2024)

Observe Then Act: Asynchronous Active Vision-Action Model for Robotic Manipulation
by: Wang, Guokang, et al.
Published: (2024)

NaVILA: Legged Robot Vision-Language-Action Model for Navigation
by: Cheng, An-Chieh, et al.
Published: (2024)

What Matters in Building Vision-Language-Action Models for Generalist Robots
by: Li, Xinghang, et al.
Published: (2024)

Vision Language Action Models in Robotic Manipulation: A Systematic Review
by: Din, Muhayy Ud, et al.
Published: (2025)

TUGS: Physics-based Compact Representation of Underwater Scenes by Tensorized Gaussian
by: Lian, Shijie, et al.
Published: (2025)

Generalized Robot 3D Vision-Language Model with Fast Rendering and Pre-Training Vision-Language Alignment
by: Liu, Kangcheng, et al.
Published: (2023)