Library Catalog

Bibliographic Details
Main Authors:	Anh, Duy Le Dinh; Irawan, Patrick Amadeus; Van Vo, Tuan
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2604.10039

Similar Items

Shape2Animal: Creative Animal Generation from Natural Silhouettes
by: Tran, Quoc-Duy, et al.
Published: (2025)

CONSTANT: Towards High-Quality One-Shot Handwriting Generation with Patch Contrastive Enhancement and Style-Aware Quantization
by: Le, Anh-Duy, et al.
Published: (2026)

Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
by: Irawan, Patrick Amadeus, et al.
Published: (2024)

LinguDistill: Recovering Linguistic Ability in Vision-Language Models via Selective Cross-Modal Distillation
by: Irawan, Patrick Amadeus, et al.
Published: (2026)

TP-GMOT: Tracking Generic Multiple Object by Textual Prompt with Motion-Appearance Cost (MAC) SORT
by: Anh, Duy Le Dinh, et al.
Published: (2024)

DRIVESPATIAL: A Benchmark for Spatiotemporal Intelligence in VLMs for Autonomous Driving
by: Vo, Hao, et al.
Published: (2026)

Language-driven Grasp Detection with Mask-guided Attention
by: Van Vo, Tuan, et al.
Published: (2024)

More Reliable Pseudo-labels, Better Performance: A Generalized Approach to Single Positive Multi-label Learning
by: Tran, Luong, et al.
Published: (2025)

M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG
by: Anugraha, David, et al.
Published: (2025)

iCONTRA: Toward Thematic Collection Design Via Interactive Concept Transfer
by: Vo, Dinh-Khoi, et al.
Published: (2024)

A Hybrid Vision Transformer Approach for Mathematical Expression Recognition
by: Le, Anh Duy, et al.
Published: (2026)

Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
by: Bui, Anh, et al.
Published: (2024)

ReFineVLA: Multimodal Reasoning-Aware Generalist Robotic Policies via Teacher-Guided Fine-Tuning
by: Van Vo, Tuan, et al.
Published: (2026)

Vision Language Models are Confused Tourists
by: Irawan, Patrick Amadeus, et al.
Published: (2025)

UniSemAlign: Text-Prototype Alignment with a Foundation Encoder for Semi-Supervised Histopathology Segmentation
by: Thai, Le-Van, et al.
Published: (2026)

SAMURAI: Shape-Aware Multimodal Retrieval for 3D Object Identification
by: Vo, Dinh-Khoi, et al.
Published: (2025)

EVENT-Retriever: Event-Aware Multimodal Image Retrieval for Realistic Captions
by: Vo, Dinh-Khoi, et al.
Published: (2025)

Language-driven Grasp Detection
by: Vuong, An Dinh, et al.
Published: (2024)

Adaptive Subspace Projection for Generative Personalization
by: Nguyen, Van-Anh, et al.
Published: (2026)

[De|Re]constructing VLMs' Reasoning in Counting
by: Alghisi, Simone, et al.
Published: (2025)

Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
by: Vuong, Tung-Long, et al.
Published: (2025)

AlertTrap: A study on object detection in remote insects trap monitoring system using on-the-edge deep learning platform
by: Le, An D., et al.
Published: (2021)

CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base
by: Nguyen, Cong-Duy, et al.
Published: (2025)

Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey
by: Nguyen-Le, Hong-Hanh, et al.
Published: (2024)

PANDORA: Pixel-wise Attention Dissolution and Latent Guidance for Zero-Shot Object Removal
by: Vo, Dinh-Khoi, et al.
Published: (2026)

World2Act: Latent Action Post-Training from World Model Dynamics
by: Vuong, An Dinh, et al.
Published: (2026)

Tracking the Truth: Object-Centric Spatio-Temporal Monitoring for Video Large Language Models
by: Cao, Tri, et al.
Published: (2026)

Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection
by: Yuan, Jicheng, et al.
Published: (2024)

ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
by: Nguyen, Minh Anh, et al.
Published: (2026)

Frequency Attention for Knowledge Distillation
by: Pham, Cuong, et al.
Published: (2024)

Enhanced Kalman with Adaptive Appearance Motion SORT for Grounded Generic Multiple Object Tracking
by: Anh, Duy Le Dinh, et al.
Published: (2024)

Enhancing Multimodal Entity Linking with Jaccard Distance-based Conditional Contrastive Learning and Contextual Visual Augmentation
by: Nguyen, Cong-Duy, et al.
Published: (2025)

SuoiAI: Building a Dataset for Aquatic Invertebrates in Vietnam
by: Vo, Tue, et al.
Published: (2025)

DualProtoSeg: Simple and Efficient Design with Text- and Image-Guided Prototype Learning for Weakly Supervised Histopathology Image Segmentation
by: Vu, Anh M., et al.
Published: (2025)

Enhancing Dataset Distillation via Non-Critical Region Refinement
by: Tran, Minh-Tuan, et al.
Published: (2025)

Temporal-Oriented Recipe for Transferring Large Vision-Language Model to Video Understanding
by: Nguyen, Thong, et al.
Published: (2025)

Z-GMOT: Zero-shot Generic Multiple Object Tracking
by: Tran, Kim Hoang, et al.
Published: (2023)

Think Twice before Adaptation: Improving Adaptability of DeepFake Detection via Online Test-Time Adaptation
by: Nguyen-Le, Hong-Hanh, et al.
Published: (2025)

VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)

Co-Learning: Towards Semi-Supervised Object Detection with Road-side Cameras
by: Yuan, Jicheng, et al.
Published: (2024)