| Main Authors: | Pham, Chau, Phan, Hoang, Doermann, David, Tian, Yunjie |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2412.17610 |
Similar Items
AutoEdit: Automatic Hyperparameter Tuning for Image Editing
by: Pham, Chau, et al.
Published: (2025)
YOLOv12: Attention-Centric Real-Time Object Detectors
by: Tian, Yunjie, et al.
Published: (2025)
Score-Control for Hallucination Reduction in Diffusion Models
by: Bhosale, Mahesh, et al.
Published: (2026)
PathDiff: Histopathology Image Synthesis with Unpaired Text and Mask Conditions
by: Bhosale, Mahesh, et al.
Published: (2025)
Artemis: Towards Referential Understanding in Complex Videos
by: Qiu, Jihao, et al.
Published: (2024)
FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants
by: Bhosale, Mahesh, et al.
Published: (2026)
ChartReformer: Natural Language-Driven Chart Image Editing
by: Yan, Pengyu, et al.
Published: (2024)
Building Vision Models upon Heat Conduction
by: Wang, Zhaozhi, et al.
Published: (2024)
Personalization Toolkit: Training Free Personalization of Large Vision Language Models
by: Seifi, Soroush, et al.
Published: (2025)
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
by: Pham, Chau, et al.
Published: (2024)
H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models
by: Pham, Nhi, et al.
Published: (2024)
Leaf-Based Plant Disease Detection and Explainable AI
by: Sagar, Saurav, et al.
Published: (2023)
When Large Vision-Language Models Meet Person Re-Identification
by: Wang, Qizao, et al.
Published: (2024)
Anatomy of a Feeling: Narrating Embodied Emotions via Large Vision-Language Models
by: Saim, Mohammad, et al.
Published: (2025)
Contextualized Visual Personalization in Vision-Language Models
by: Oh, Yeongtak, et al.
Published: (2026)
ETLNet: An Efficient TCN-BiLSTM Network for Road Anomaly Detection Using Smartphone Sensors
by: Ansari, Mohd Faiz, et al.
Published: (2024)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios
by: Phan, Van-Hoang-Anh, et al.
Published: (2025)
TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation
by: Yan, Pengyu, et al.
Published: (2026)
ScenePilot-4K: A Large-Scale First-Person Dataset and Benchmark for Vision-Language Models in Autonomous Driving
by: Wang, Yujin, et al.
Published: (2026)
Multimodal Chain of Continuous Thought for Latent-Space Reasoning in Vision-Language Models
by: Pham, Tan-Hanh, et al.
Published: (2025)
LP-OVOD: Open-Vocabulary Object Detection by Linear Probing
by: Pham, Chau, et al.
Published: (2023)
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
by: Pham, Chau, et al.
Published: (2025)
Insect-Foundation: A Foundation Model and Large Multimodal Dataset for Vision-Language Insect Understanding
by: Truong, Thanh-Dat, et al.
Published: (2025)
PAS: Prelim Attention Score for Detecting Object Hallucinations in Large Vision-Language Models
by: Hoang-Xuan, Nhat, et al.
Published: (2025)
The Abstraction Gap in Vision-Language Causal Reasoning
by: Hoang, Chinh, et al.
Published: (2026)
DocSum: Domain-Adaptive Pre-training for Document Abstractive Summarization
by: Chau, Phan Phuong Mai, et al.
Published: (2024)
Spatial Transform Decoupling for Oriented Object Detection
by: Yu, Hongtian, et al.
Published: (2023)
UlcerGPT: A Multimodal Approach Leveraging Large Language and Vision Models for Diabetic Foot Ulcer Image Transcription
by: Basiri, Reza, et al.
Published: (2024)
Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets
by: Lê, Hoàng-Ân, et al.
Published: (2024)
DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
by: Tian, Xiaoyu, et al.
Published: (2024)
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)
Unveiling Concept Attribution in Diffusion Models
by: Nguyen, Quang H., et al.
Published: (2024)
Motion Consistency Model: Accelerating Video Diffusion with Disentangled Motion-Appearance Distillation
by: Zhai, Yuanhao, et al.
Published: (2024)
Large Language Models for Video Surveillance Applications
by: De Silva, Ulindu, et al.
Published: (2025)
CRAFT: Critic-Refined Adaptive Key-Frame Targeting for Multimodal Video Question Answering
by: Bhosale, Mahesh, et al.
Published: (2026)
Rethinking Overlooked Aspects in Vision-Language Models
by: Liu, Yuan, et al.
Published: (2024)
Large Vision-Language Models Get Lost in Attention
by: Xi, Gongli, et al.
Published: (2026)
Phantom of Latent for Large Language and Vision Models
by: Lee, Byung-Kwan, et al.
Published: (2024)
FedVLM: Scalable Personalized Vision-Language Models through Federated Learning
by: Mitra, Arkajyoti, et al.
Published: (2025)
Vision-Language Model-Guided Deep Unrolling Enables Personalized, Fast MRI
by: Ju, Fangmao, et al.
Published: (2026)