| Main Authors: | Le, Cuong, Le, Huy-Phuong, Le, Duc, Duong, Minh-Thien, Nguyen, Van-Binh, Le, My-Ha |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.01340 |
Similar Items
Optimal-state Dynamics Estimation for Physics-based Human Motion Capture from Videos
by: Le, Cuong, et al.
Published: (2024)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios
by: Phan, Van-Hoang-Anh, et al.
Published: (2025)
Fusionista2.0: Efficiency Retrieval System for Large-Scale Datasets
by: Le, Huy M., et al.
Published: (2025)
GMAT: Grounded Multi-Agent Clinical Description Generation for Text Encoder in Vision-Language MIL for Whole Slide Image Classification
by: Quang, Ngoc Bui Lam, et al.
Published: (2025)
QuaMo: Quaternion Motions for Vision-based 3D Human Kinematics Capture
by: Le, Cuong, et al.
Published: (2026)
Dual Strategies for Test-Time Adaptation
by: Phuong, Nam Nguyen, et al.
Published: (2026)
Learning Human Motion with Temporally Conditional Mamba
by: Nguyen, Quang, et al.
Published: (2025)
EgoMusic-driven Human Dance Motion Estimation with Skeleton Mamba
by: Nguyen, Quang, et al.
Published: (2025)
ConstStyle: Robust Domain Generalization with Unified Style Transformation
by: Tran, Nam Duong, et al.
Published: (2025)
DS@BioMed at ImageCLEFmedical Caption 2024: Enhanced Attention Mechanisms in Medical Caption Generation through Concept Detection Integration
by: Nguyen, Nhi Ngoc-Yen, et al.
Published: (2024)
Exploring the Practicality of Federated Learning: A Survey Towards the Communication Perspective
by: Le, Khiem, et al.
Published: (2024)
Time-series Meets Complex Motion Modeling: Robust and Computational-effective Motion Predictor for Multi-object Tracking
by: Do, Nhat-Tan, et al.
Published: (2026)
BALM: A Model-Agnostic Framework for Balanced Multimodal Learning under Imbalanced Missing Rates
by: Nguyen, Phuong-Anh, et al.
Published: (2026)
MissBench: Benchmarking Multimodal Affective Analysis under Imbalanced Missing Modalities
by: Pham, Tien Anh, et al.
Published: (2026)
TwinLiteNet+: An Enhanced Multi-Task Segmentation Model for Autonomous Driving
by: Che, Quang-Huy, et al.
Published: (2024)
OpenEvents V1: Large-Scale Benchmark Dataset for Multimodal Event Grounding
by: Nguyen, Hieu, et al.
Published: (2025)
Multimodal Contextualized Support for Enhancing Video Retrieval System
by: Nguyen-Le, Quoc-Bao, et al.
Published: (2024)
U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025
by: Le, Duc-Nhuan, et al.
Published: (2026)
Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition
by: Nguyen, Duc-Anh, et al.
Published: (2023)
SADL: An Effective In-Context Learning Method for Compositional Visual QA
by: Dang, Long Hoang, et al.
Published: (2024)
Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition
by: Man, Thi Bich Phuong, et al.
Published: (2025)
Seeing Through the Tool: A Controlled Benchmark for Occlusion Robustness in Foundation Segmentation Models
by: Ho, Nhan, et al.
Published: (2026)
Deep Learning for Automated Identification of Vietnamese Timber Species: A Tool for Ecological Monitoring and Conservation
by: Song, Tianyu, et al.
Published: (2025)
GeoSearch: Augmenting Worldwide Geolocalization with Web-Scale Reverse Image Search and Image Matching
by: Le-Duc, Tung-Duong, et al.
Published: (2026)
WAVER: Writing-style Agnostic Text-Video Retrieval via Distilling Vision-Language Models Through Open-Vocabulary Knowledge
by: Le, Huy, et al.
Published: (2023)
From Visual Explanations to Counterfactual Explanations with Latent Diffusion
by: Luu, Tung, et al.
Published: (2025)
VietMEAgent: Culturally-Aware Few-Shot Multimodal Explanation for Vietnamese Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
MedXplain-VQA: Multi-Component Explainable Medical Visual Question Answering
by: Nguyen, Hai-Dang, et al.
Published: (2025)
Region in Context: Text-condition Image editing with Human-like semantic reasoning
by: Vu, Thuy Phuong, et al.
Published: (2025)
Spatiotemporal Graph Convolutional Recurrent Neural Network Model for Citywide Air Pollution Forecasting
by: Le, Van-Duc, et al.
Published: (2023)
Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments
by: Nguyen, Hieu, et al.
Published: (2024)
Enhancing person re-identification via Uncertainty Feature Fusion Method and Auto-weighted Measure Combination
by: Che, Quang-Huy, et al.
Published: (2024)
SwiftPie: Lightning-fast Subject-driven Image Personalization via One step Diffusion
by: Duong, Huy, et al.
Published: (2026)
FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation
by: Le, Minh Khoa, et al.
Published: (2026)
Robust Deepfake Detection: Mitigating Spatial Attention Drift via Calibrated Complementary Ensembles
by: Le-Phan, Minh-Khoa, et al.
Published: (2026)
EDGER: EDge-Guided with HEatmap Refinement for Generalizable Image Forgery Localization
by: Le-Phan, Minh-Khoa, et al.
Published: (2026)
SHREC 2025: Retrieval of Optimal Objects for Multi-modal Enhanced Language and Spatial Assistance (ROOMELSA)
by: Nguyen, Trong-Thuan, et al.
Published: (2025)
Leveraging feature communication in federated learning for remote sensing image classification
by: Duong, Anh-Kiet, et al.
Published: (2024)
Describe Anything Model for Visual Question Answering on Text-rich Images
by: Vu, Yen-Linh, et al.
Published: (2025)
Gradient Alignment for Cross-Domain Face Anti-Spoofing
by: Le, Binh M., et al.
Published: (2024)