Library Catalog

Bibliographic Details
Main Authors:	Le, Dat; Nguyen, Khoa; Wang, Xin; Hu, Shu
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2605.07232

Similar Items

Learning to mask: Towards generalized face forgery detection
by: Fei, Jianwei, et al.
Published: (2022)

Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy
by: Nguyen, Hoang-Quan, et al.
Published: (2024)

PhotoHolmes: a Python library for forgery detection in digital images
by: O'Flaherty, Julián, et al.
Published: (2024)

Deep video representation learning: a survey
by: Ravanbakhsh, Elham, et al.
Published: (2024)

Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
by: Truong, Thanh-Dat, et al.
Published: (2023)

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
by: Nguyen, Hoang-Quan, et al.
Published: (2023)

Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)

A multi-center analysis of deep learning methods for video polyp detection and segmentation
by: Ghatwary, Noha, et al.
Published: (2026)

ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
by: Truong, Thanh-Dat, et al.
Published: (2024)

BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding
by: Nguyen, Xuan-Bac, et al.
Published: (2025)

VirDA: Reusing Backbone for Unsupervised Domain Adaptation with Visual Reprogramming
by: Nguyen, Duy, et al.
Published: (2025)

Understanding normalization in contrastive representation learning and out-of-distribution detection
by: Le-Gia, Tai, et al.
Published: (2023)

MANGO: Multimodal Attention-based Normalizing Flow Approach to Fusion Learning
by: Truong, Thanh-Dat, et al.
Published: (2025)

Adaptive thresholding pattern for fingerprint forgery detection
by: Farzadpour, Zahra, et al.
Published: (2025)

Domain Generalization through Spatial Relation Induction over Visual Primitives
by: Nguyen, Dat, et al.
Published: (2026)

Multi-modal, multi-scale representation learning for satellite imagery analysis just needs a good ALiBi
by: Kage, Patrick, et al.
Published: (2026)

Venomancer: Towards Imperceptible and Target-on-Demand Backdoor Attacks in Federated Learning
by: Nguyen, Son, et al.
Published: (2024)

Timealign: A multi-modal object detection method for time misalignment fusing in autonomous driving
by: Song, Zhihang, et al.
Published: (2024)

BIMA: Bijective Maximum Likelihood Learning Approach to Hallucination Prediction and Mitigation in Large Vision-Language Models
by: Tran, Huu-Thien, et al.
Published: (2025)

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
by: Zheng, Mingzhe, et al.
Published: (2025)

A comparison of extended object tracking with multi-modal sensors in indoor environment
by: Shuai, Jiangtao, et al.
Published: (2024)

Amodal Instance Segmentation with Diffusion Shape Prior Estimation
by: Tran, Minh, et al.
Published: (2024)

Phantom: Subject-consistent video generation via cross-modal alignment
by: Liu, Lijie, et al.
Published: (2025)

Towards Robust and Fair Vision Learning in Open-World Environments
by: Truong, Thanh-Dat
Published: (2024)

FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
by: Truong, Thanh-Dat, et al.
Published: (2023)

Synthetic images aid the recognition of human-made art forgeries
by: Ostmeyer, Johann, et al.
Published: (2023)

FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
by: Plou, Carlos, et al.
Published: (2025)

CONDA: Continual Unsupervised Domain Adaptation Learning in Visual Perception for Self-Driving Cars
by: Truong, Thanh-Dat, et al.
Published: (2022)

ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation
by: Tran, Minh, et al.
Published: (2024)

SHREC 2025: Retrieval of Optimal Objects for Multi-modal Enhanced Language and Spatial Assistance (ROOMELSA)
by: Nguyen, Trong-Thuan, et al.
Published: (2025)

FrameDiT: Diffusion Transformer with Matrix Attention for Efficient Video Generation
by: Le, Minh Khoa, et al.
Published: (2026)

DINTR: Tracking via Diffusion-based Interpolation
by: Nguyen, Pha, et al.
Published: (2024)

HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
by: Nguyen, Trong-Thuan, et al.
Published: (2023)

MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation
by: Kinakh, Vitaliy, et al.
Published: (2023)

OmViD: Omni-supervised active learning for video action detection
by: Rana, Aayush, et al.
Published: (2025)

Directed-Tokens: A Robust Multi-Modality Alignment Approach to Large Language-Vision Models
by: Truong, Thanh-Dat, et al.
Published: (2025)

VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention
by: Zheng, Mingzhe, et al.
Published: (2024)

A multi-modal vision-language model for generalizable annotation-free pathology localization
by: Yang, Hao, et al.
Published: (2024)

EAGLE: Efficient Adaptive Geometry-based Learning in Cross-view Understanding
by: Truong, Thanh-Dat, et al.
Published: (2024)

Can multimodal representation learning by alignment preserve modality-specific information?
by: Thoreau, Romain, et al.
Published: (2025)