:: Library Catalog

Bibliographic Details
Main Authors:	Le, Cuong; Le, Huy-Phuong; Le, Duc; Duong, Minh-Thien; Nguyen, Van-Binh; Le, My-Ha
Format:	Preprint
Published:	2025
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2507.01340

Similar Items

Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos
by: Le, Cuong, et al.
Published: (2024)

Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios
by: Phan, Van-Hoang-Anh, et al.
Published: (2025)

Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets
by: Le, Huy M., et al.
Published: (2025)

GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
by: Quang, Ngoc Bui Lam, et al.
Published: (2025)

QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
by: Le, Cuong, et al.
Published: (2026)

Dual Strategies for Test-Time Adaptation
by: Phuong, Nam Nguyen, et al.
Published: (2026)

Learning Human Motion with Temporally Conditional Mamba
by: Nguyen, Quang, et al.
Published: (2025)

EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
by: Nguyen, Quang, et al.
Published: (2025)

ConstStyle: Robust Domain Generalization with Unified Style Transformation
by: Tran, Nam Duong, et al.
Published: (2025)

DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration
by: Nguyen, Nhi Ngoc-Yen, et al.
Published: (2024)

Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective
by: Le, Khiem, et al.
Published: (2024)

Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking
by: Do, Nhat-Tan, et al.
Published: (2026)

BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates
by: Nguyen, Phuong-Anh, et al.
Published: (2026)

MissBench: Benchmarking Multimodal Affective Analysis under Imbalanced Missing Modalities
by: Pham, Tien Anh, et al.
Published: (2026)

TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving
by: Che, Quang-Huy, et al.
Published: (2024)

OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event Grounding
by: Nguyen, Hieu, et al.
Published: (2025)

Multimodal Contextualized Support for Enhancing Video Retrieval System
by: Nguyen-Le, Quoc-Bao, et al.
Published: (2024)

U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025
by: Le, Duc-Nhuan, et al.
Published: (2026)

Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition
by: Nguyen, Duc-Anh, et al.
Published: (2023)

SADL: An Effective In-Context Learning Method for Compositional Visual QA
by: Dang, Long Hoang, et al.
Published: (2024)

Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition
by: Man, Thi Bich Phuong, et al.
Published: (2025)

Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models
by: Ho, Nhan, et al.
Published: (2026)

Deep Learning for Automated Identification of Vietnamese Timber Species: A Tool for Ecological Monitoring and Conservation
by: Song, Tianyu, et al.
Published: (2025)

GeoSearch: Augmenting Worldwide Geolocalization with Web-Scale Reverse Image Search and Image Matching
by: Le-Duc, Tung-Duong, et al.
Published: (2026)

WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
by: Le, Huy, et al.
Published: (2023)

From Visual Explanations to Counterfactual Explanations with Latent Diffusion
by: Luu, Tung, et al.
Published: (2025)

VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)

MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)

Region in Context: Text-condition Image editing with Human-like semantic reasoning
by: Vu, Thuy Phuong, et al.
Published: (2025)

Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting
by: Le, Van-Duc, et al.
Published: (2023)

Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments
by: Nguyen, Hieu, et al.
Published: (2024)

Enhancing person re-identification via Uncertainty Feature Fusion Method and Auto-weighted Measure Combination
by: Che, Quang-Huy, et al.
Published: (2024)

SwiftPie: Lightning-fast Subject-driven Image Personalization via One step Diffusion
by: Duong, Huy, et al.
Published: (2026)

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation
by: Le, Minh Khoa, et al.
Published: (2026)

Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles
by: Le-Phan, Minh-Khoa, et al.
Published: (2026)

EDGER: EDge-Guided with HEatmap Refinement for Generalizable Image Forgery Localization
by: Le-Phan, Minh-Khoa, et al.
Published: (2026)

SHREC 2025: Retrieval of Optimal Objects for Multi-modal Enhanced Language and Spatial Assistance (ROOMELSA)
by: Nguyen, Trong-Thuan, et al.
Published: (2025)

Leveraging feature communication in federated learning for remote sensing image classification
by: Duong, Anh-Kiet, et al.
Published: (2024)

Describe Anything Model for Visual Question Answering on Text-rich Images
by: Vu, Yen-Linh, et al.
Published: (2025)

Gradient Alignment for Cross-Domain Face Anti-Spoofing
by: Le, Binh M., et al.
Published: (2024)