:: Library Catalog

Bibliographic Details
Main Authors:	Pham, Chau; Phan, Hoang; Doermann, David; Tian, Yunjie
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2412.17610

Similar Items

AutoEdit: Automatic Hyperparameter Tuning for Image Editing
by: Pham, Chau, et al.
Published: (2025)

YOLOv12: Attention-Centric Real-Time Object Detectors
by: Tian, Yunjie, et al.
Published: (2025)

Score-Control for Hallucination Reduction in Diffusion Models
by: Bhosale, Mahesh, et al.
Published: (2026)

PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
by: Bhosale, Mahesh, et al.
Published: (2025)

Artemis: Towards Referential Understanding in Complex Videos
by: Qiu, Jihao, et al.
Published: (2024)

FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants
by: Bhosale, Mahesh, et al.
Published: (2026)

ChartReformer: Natural Language-Driven Chart Image Editing
by: Yan, Pengyu, et al.
Published: (2024)

Building Vision Models upon Heat Conduction
by: Wang, Zhaozhi, et al.
Published: (2024)

Personalization Toolkit: Training Free Personalization of Large Vision Language Models
by: Seifi, Soroush, et al.
Published: (2025)

Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
by: Pham, Chau, et al.
Published: (2024)

H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
by: Pham, Nhi, et al.
Published: (2024)

Leaf-Based Plant Disease Detection and Explainable AI
by: Sagar, Saurav, et al.
Published: (2023)

When Large Vision-Language Models Meet Person Re-Identification
by: Wang, Qizao, et al.
Published: (2024)

Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models
by: Saim, Mohammad, et al.
Published: (2025)

Contextualized Visual Personalization in Vision-Language Models
by: Oh, Yeongtak, et al.
Published: (2026)

ETLNet: An Efficient TCN-BiLSTM Network for Road Anomaly Detection Using Smartphone Sensors
by: Ansari, Mohd Faiz, et al.
Published: (2024)

Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios
by: Phan, Van-Hoang-Anh, et al.
Published: (2025)

TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation
by: Yan, Pengyu, et al.
Published: (2026)

ScenePilot-4K: A Large-Scale First-Person Dataset and Benchmark for Vision-Language Models in Autonomous Driving
by: Wang, Yujin, et al.
Published: (2026)

Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
by: Pham, Tan-Hanh, et al.
Published: (2025)

LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
by: Pham, Chau, et al.
Published: (2023)

ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
by: Pham, Chau, et al.
Published: (2025)

Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)

PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models
by: Hoang-Xuan, Nhat, et al.
Published: (2025)

The Abstraction Gap in Vision-Language Causal Reasoning
by: Hoang, Chinh, et al.
Published: (2026)

DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization
by: Chau, Phan Phuong Mai, et al.
Published: (2024)

Spatial Transform Decoupling for Oriented Object Detection
by: Yu, Hongtian, et al.
Published: (2023)

UlcerGPT: A Multimodal Approach Leveraging Large Language and Vision Models for Diabetic Foot Ulcer Image Transcription
by: Basiri, Reza, et al.
Published: (2024)

Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets
by: Lê, Hoàng-Ân, et al.
Published: (2024)

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
by: Tian, Xiaoyu, et al.
Published: (2024)

Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)

Unveiling Concept Attribution in Diffusion Models
by: Nguyen, Quang H., et al.
Published: (2024)

Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
by: Zhai, Yuanhao, et al.
Published: (2024)

Large Language Models for Video Surveillance Applications
by: De Silva, Ulindu, et al.
Published: (2025)

CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
by: Bhosale, Mahesh, et al.
Published: (2026)

Rethinking Overlooked Aspects in Vision-Language Models
by: Liu, Yuan, et al.
Published: (2024)

Large Vision-Language Models Get Lost in Attention
by: Xi, Gongli, et al.
Published: (2026)

Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)

FedVLM: Scalable Personalized Vision-Language Models through Federated Learning
by: Mitra, Arkajyoti, et al.
Published: (2025)

Vision-Language Model-Guided Deep Unrolling Enables Personalized, Fast MRI
by: Ju, Fangmao, et al.
Published: (2026)