| Main Author: | Nguyen, Van Quang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.24020 |
Similar Items
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
by: Nguyen, Quang-Binh, et al.
Published: (2025)
MADTempo: An Interactive System for Multi-Event Temporal Video Retrieval with Query Augmentation
by: Vu, Huu-An, et al.
Published: (2025)
KTVIC: A Vietnamese Image Captioning Dataset on the Life Domain
by: Pham, Anh-Cuong, et al.
Published: (2024)
FurniMAS: Language-Guided Furniture Decoration using Multi-Agent System
by: Nguyen, Toan, et al.
Published: (2025)
MambaU-Lite: A Lightweight Model based on Mamba and Integrated Channel-Spatial Attention for Skin Lesion Segmentation
by: Nguyen, Thi-Nhu-Quynh, et al.
Published: (2024)
VLURes: Benchmarking VLM Visual and Linguistic Understanding in Low-Resource Languages
by: Atuhurra, Jesse, et al.
Published: (2025)
SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
by: Nguyen, Hung, et al.
Published: (2024)
Learning Generative Interactive Environments By Trained Agent Exploration
by: Kazemi, Naser, et al.
Published: (2024)
V-Math: An Agentic Approach to the Vietnamese National High School Graduation Mathematics Exams
by: Nguyen, Duong Q., et al.
Published: (2025)
360° Image Perception with MLLMs: A Comprehensive Benchmark and a Training-Free Method
by: Tran, Huyen T. T., et al.
Published: (2026)
VRD-IU: Lessons from Visually Rich Document Intelligence and Understanding
by: Ding, Yihao, et al.
Published: (2025)
V2P-Bench: Evaluating Video-Language Understanding with Visual Prompts for Better Human-Model Interaction
by: Zhao, Yiming, et al.
Published: (2025)
Non-verbal Real-time Human-AI Interaction in Constrained Robotic Environments
by: Costea, Dragos, et al.
Published: (2026)
Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach
by: La Quang, Hai, et al.
Published: (2026)
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
by: Tien, Dong Nguyen, et al.
Published: (2025)
Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning
by: Ke, Xueyi, et al.
Published: (2025)
Generation and Detection of Sign Language Deepfakes - A Linguistic and Visual Analysis
by: Naeem, Shahzeb, et al.
Published: (2024)
UniHOI: Unified Human-Object Interaction Understanding via Unified Token Space
by: Yang, Panqi, et al.
Published: (2025)
A Multimodal Framework for Aligning Human Linguistic Descriptions with Visual Perceptual Data
by: Bingham, Joseph
Published: (2026)
PARTONOMY: Large Multimodal Models with Part-Level Visual Understanding
by: Blume, Ansel, et al.
Published: (2025)
Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech
by: Liu, Rui, et al.
Published: (2024)
Reinforced Embodied Active Defense: Exploiting Adaptive Interaction for Robust Visual Perception in Adversarial 3D Environments
by: Yang, Xiao, et al.
Published: (2025)
SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities
by: Nguyen, Dung Thuy, et al.
Published: (2025)
CLiViS: Unleashing Cognitive Map through Linguistic-Visual Synergy for Embodied Visual Reasoning
by: Li, Kailing, et al.
Published: (2025)
GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
by: Quang, Ngoc Bui Lam, et al.
Published: (2025)
Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
by: Xiong, Yuqi, et al.
Published: (2026)
VDC: Versatile Data Cleanser based on Visual-Linguistic Inconsistency by Multimodal Large Language Models
by: Zhu, Zihao, et al.
Published: (2023)
Towards Understanding Visual Grounding in Visual Language Models
by: Pantazopoulos, Georgios, et al.
Published: (2025)
PerspectiveNet: Multi-View Perception for Dynamic Scene Understanding
by: Nguyen, Vinh
Published: (2024)
Semi-Supervised Semantic Segmentation using Redesigned Self-Training for White Blood Cells
by: Luu, Vinh Quoc, et al.
Published: (2024)
LangXAI: Integrating Large Vision Models for Generating Textual Explanations to Enhance Explainability in Visual Perception Tasks
by: Nguyen, Truong Thanh Hung, et al.
Published: (2024)
Solving Scene Understanding for Autonomous Navigation in Unstructured Environments
by: Renji, Naveen Mathews, et al.
Published: (2025)
Contrastive Integrated Gradients: A Feature Attribution-Based Method for Explaining Whole Slide Image Classification
by: Vu, Anh Mai, et al.
Published: (2025)
Human-Object Interaction from Human-Level Instructions
by: Wu, Zhen, et al.
Published: (2024)
Aligning Machine and Human Visual Representations across Abstraction Levels
by: Muttenthaler, Lukas, et al.
Published: (2024)
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
by: Nguyen-Truong, Hai, et al.
Published: (2024)
A Survey of Video Datasets for Grounded Event Understanding
by: Sanders, Kate, et al.
Published: (2024)
HA-VLN 2.0: An Open Benchmark and Leaderboard for Human-Aware Navigation in Discrete and Continuous Environments with Dynamic Multi-Human Interactions
by: Dong, Yifei, et al.
Published: (2025)
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
by: Nguyen, Le Thien Phuc, et al.
Published: (2025)
VITAL: Interactive Few-Shot Imitation Learning via Visual Human-in-the-Loop Corrections
by: Kasaei, Hamidreza, et al.
Published: (2024)