:: Library Catalog

Bibliographic Details
Main Authors:	Li, Zhuohang; Yan, Chao; Jackson, Nicholas J.; Cui, Wendi; Li, Bo; Zhang, Jiaxin; Malin, Bradley A.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language; Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2502.20560

Similar Items

"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
by: Gu, Jihao, et al.
Published: (2025)

Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)

DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
by: Cui, Wendi, et al.
Published: (2024)

Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models
by: Wang, Weihang, et al.
Published: (2025)

RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
by: Xia, Peng, et al.
Published: (2024)

Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies
by: Pathak, Surendra, et al.
Published: (2026)

Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)

Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding
by: Ma, Ruiqi, et al.
Published: (2025)

SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)

Visual In-Context Learning for Large Vision-Language Models
by: Zhou, Yucheng, et al.
Published: (2024)

PUMGPT: A Large Vision-Language Model for Product Understanding
by: Xue, Wei, et al.
Published: (2023)

Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)

FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)

Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification
by: Singh, Vikash, et al.
Published: (2026)

SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
by: Zhang, Jiaxin, et al.
Published: (2025)

Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)

Grounded Visual Factualization: Factual Anchor-Based Finetuning for Enhancing MLLM Factual Consistency
by: Morbiato, Filippo, et al.
Published: (2025)

debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias
by: Sasse, Kuleen, et al.
Published: (2024)

NPHardEval4V: Dynamic Evaluation of Large Vision-Language Models with Effects of Vision
by: Li, Xiang, et al.
Published: (2024)

Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
by: Irawan, Patrick Amadeus, et al.
Published: (2024)

VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
by: Li, Lei, et al.
Published: (2024)

Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
by: Liu, Zikang, et al.
Published: (2025)

Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)

VEGAS: Mitigating Hallucinations in Large Vision-Language Models via Vision-Encoder Attention Guided Adaptive Steering
by: Wang, Zihu, et al.
Published: (2025)

Benchmarking Deflection and Hallucination in Large Vision-Language Models
by: Moratelli, Nicholas, et al.
Published: (2026)

Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)

ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
by: Yuan, Zhengqing, et al.
Published: (2023)

Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
by: Lei, Xuanyu, et al.
Published: (2024)

Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
by: Lu, Yifan, et al.
Published: (2025)

PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)

Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
by: Li, Yue, et al.
Published: (2025)

Intriguing Properties of Large Language and Vision Models
by: Lee, Young-Jun, et al.
Published: (2024)

Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
by: Li, Bin, et al.
Published: (2025)

Open-Source Image Editing Models Are Zero-Shot Vision Learners
by: Liu, Wei, et al.
Published: (2026)

An Examination of the Compositionality of Large Generative Vision-Language Models
by: Ma, Teli, et al.
Published: (2023)

AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)

TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models
by: Ye, Jinlun, et al.
Published: (2026)

Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)

Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)
by: Han, Bin, et al.
Published: (2024)

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
by: Qin, Luozheng, et al.
Published: (2025)