| Main Authors: | Anh, Duy Le Dinh, Irawan, Patrick Amadeus, Van Vo, Tuan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.10039 |
Similar Items
Shape2Animal: Creative Animal Generation from Natural Silhouettes
by: Tran, Quoc-Duy, et al.
Published: (2025)
CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization
by: Le, Anh-Duy, et al.
Published: (2026)
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
by: Irawan, Patrick Amadeus, et al.
Published: (2024)
LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation
by: Irawan, Patrick Amadeus, et al.
Published: (2026)
TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT
by: Anh, Duy Le Dinh, et al.
Published: (2024)
DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving
by: Vo, Hao, et al.
Published: (2026)
Language-driven Grasp Detection with Mask-guided Attention
by: Van Vo, Tuan, et al.
Published: (2024)
More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
by: Tran, Luong, et al.
Published: (2025)
M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
by: Anugraha, David, et al.
Published: (2025)
iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer
by: Vo, Dinh-Khoi, et al.
Published: (2024)
A Hybrid Vision Transformer Approach for Mathematical Expression Recognition
by: Le, Anh Duy, et al.
Published: (2026)
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
by: Bui, Anh, et al.
Published: (2024)
ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning
by: Van Vo, Tuan, et al.
Published: (2026)
Vision Language Models are Confused Tourists
by: Irawan, Patrick Amadeus, et al.
Published: (2025)
UniSemAlign: Text-Prototype Alignment with a Foundation Encoder for Semi-Supervised Histopathology Segmentation
by: Thai, Le-Van, et al.
Published: (2026)
SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification
by: Vo, Dinh-Khoi, et al.
Published: (2025)
EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions
by: Vo, Dinh-Khoi, et al.
Published: (2025)
Language-driven Grasp Detection
by: Vuong, An Dinh, et al.
Published: (2024)
Adaptive Subspace Projection for Generative Personalization
by: Nguyen, Van-Anh, et al.
Published: (2026)
[De|Re]constructing VLMs' Reasoning in Counting
by: Alghisi, Simone, et al.
Published: (2025)
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
by: Vuong, Tung-Long, et al.
Published: (2025)
AlertTrap: A study on object detection in remote insects trap monitoring system using on-the-edge deep learning platform
by: Le, An D., et al.
Published: (2021)
CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base
by: Nguyen, Cong-Duy, et al.
Published: (2025)
Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey
by: Nguyen-Le, Hong-Hanh, et al.
Published: (2024)
PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal
by: Vo, Dinh-Khoi, et al.
Published: (2026)
World2Act: Latent Action Post-Training from World Model Dynamics
by: Vuong, An Dinh, et al.
Published: (2026)
Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models
by: Cao, Tri, et al.
Published: (2026)
Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection
by: Yuan, Jicheng, et al.
Published: (2024)
ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
by: Nguyen, Minh Anh, et al.
Published: (2026)
Frequency Attention for Knowledge Distillation
by: Pham, Cuong, et al.
Published: (2024)
Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
by: Anh, Duy Le Dinh, et al.
Published: (2024)
Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation
by: Nguyen, Cong-Duy, et al.
Published: (2025)
SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam
by: Vo, Tue, et al.
Published: (2025)
DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation
by: Vu, Anh M., et al.
Published: (2025)
Enhancing Dataset Distillation via Non-Critical Region Refinement
by: Tran, Minh-Tuan, et al.
Published: (2025)
Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
by: Nguyen, Thong, et al.
Published: (2025)
Z-GMOT: Zero-shot Generic Multiple Object Tracking
by: Tran, Kim Hoang, et al.
Published: (2023)
Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time Adaptation
by: Nguyen-Le, Hong-Hanh, et al.
Published: (2025)
VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras
by: Yuan, Jicheng, et al.
Published: (2024)