Library Catalog

Bibliographic Details
Main Authors:	Nguyen, Son Hai; Wang, Diwei; Jang, Jinhyeok; Seo, Hyewon
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2509.16452

Similar Items

Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model
by: Wang, Diwei, et al.
Published: (2024)

Learning from Oblivion: Predicting Knowledge Overflowed Weights via Retrodiction of Forgetting
by: Jang, Jinhyeok, et al.
Published: (2025)

AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs
by: Wang, Diwei, et al.
Published: (2025)

Conformal Predictions for Human Action Recognition with Vision-Language Models
by: Tim, Bary, et al.
Published: (2025)

Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
by: Nguyen-Truong, Hai, et al.
Published: (2024)

Rigidity-Aware 3D Gaussian Deformation from a Single Image
by: Kim, Jinhyeok, et al.
Published: (2025)

RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving
by: Wang, Yujin, et al.
Published: (2025)

VLADriver-RAG: Retrieval-Augmented Vision-Language-Action Models for Autonomous Driving
by: Zhao, Rui, et al.
Published: (2026)

WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
by: Le, Huy, et al.
Published: (2023)

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation
by: Li, Zaijing, et al.
Published: (2026)

Explainable Adversarial-Robust Vision-Language-Action Model for Robotic Manipulation
by: Kim, Ju-Young, et al.
Published: (2025)

ROSA: Harnessing Robot States for Vision-Language and Action Alignment
by: Wen, Yuqing, et al.
Published: (2025)

Robotic State Recognition with Image-to-Text Retrieval Task of Pre-Trained Vision-Language Model and Black-Box Optimization
by: Kawaharazuka, Kento, et al.
Published: (2024)

AugVLA-3D: Depth-Driven Feature Augmentation for Vision-Language-Action Models
by: Rao, Zhifeng, et al.
Published: (2026)

VLM4VLA: Revisiting Vision-Language-Models in Vision-Language-Action Models
by: Zhang, Jianke, et al.
Published: (2026)

When Robots Obey the Patch: Universal Transferable Patch Attacks on Vision-Language-Action Models
by: Lu, Hui, et al.
Published: (2025)

ViCLIP-OT: The First Foundation Vision-Language Model for Vietnamese Image-Text Retrieval with Optimal Transport
by: Tran, Quoc-Khang, et al.
Published: (2026)

A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks
by: Jung, Hoin, et al.
Published: (2024)

Bridging the Missing-Modality Gap: Improving Text-Only Calibration of Vision Language Models
by: Kim, Mingyeong, et al.
Published: (2026)

Learning Vision-Language-Action World Models for Autonomous Driving
by: Wang, Guoqing, et al.
Published: (2026)

RT-VLM: Re-Thinking Vision Language Model with 4-Clues for Real-World Object Recognition Robustness
by: Park, Junghyun, et al.
Published: (2025)

Knowledge-Augmented Vision Language Models for Underwater Bioacoustic Spectrogram Analysis
by: Nihal, Ragib Amin, et al.
Published: (2025)

Skeleton-Based Action Recognition with Spatial-Structural Graph Convolution
by: Wang, Jingyao, et al.
Published: (2024)

Enhancing Action Recognition by Leveraging the Hierarchical Structure of Actions and Textual Context
by: Benavent-Lledo, Manuel, et al.
Published: (2024)

Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
by: Yu, Seonghoon, et al.
Published: (2025)

Variational Contrastive Learning for Skeleton-based Action Recognition
by: Nguyen, Dang Dinh, et al.
Published: (2026)

KGAlign: Joint Semantic-Structural Knowledge Encoding for Multimodal Fake News Detection
by: La, Tuan-Vinh, et al.
Published: (2025)

A Novel Framework for Automated Explain Vision Model Using Vision-Language Models
by: Nguyen, Phu-Vinh, et al.
Published: (2025)

Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization
by: Kawaharazuka, Kento, et al.
Published: (2024)

Diffusion Model in Latent Space for Medical Image Segmentation Task
by: Ngoc, Huynh Trinh, et al.
Published: (2025)

VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
by: Gao, Chongkai, et al.
Published: (2025)

Toward an Artificial General Teacher: Procedural Geometry Data Generation and Visual Grounding with Vision-Language Models
by: Nguyen-Truong, Hai, et al.
Published: (2026)

ALAM: Algebraically Consistent Latent Action Model for Vision-Language-Action Models
by: Tang, Zuojin, et al.
Published: (2026)

Don't Miss the Forest for the Trees: Attentional Vision Calibration for Large Vision Language Models
by: Woo, Sangmin, et al.
Published: (2024)

Towards Cross-modal Backward-compatible Representation Learning for Vision-Language Models
by: Jang, Young Kyun, et al.
Published: (2024)

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
by: Chen, Peng, et al.
Published: (2025)

VLM See, Robot Do: Human Demo Video to Robot Action Plan via Vision Language Model
by: Wang, Beichen, et al.
Published: (2024)

When Alignment Fails: Multimodal Adversarial Attacks on Vision-Language-Action Models
by: Yan, Yuping, et al.
Published: (2025)

Survey on Vision-Language-Action Models
by: Adilkhanov, Adilzhan, et al.
Published: (2025)

LIBERO-X: Robustness Litmus for Vision-Language-Action Models
by: Wang, Guodong, et al.
Published: (2026)