| Main Authors: | Li, Zhuohang; Yan, Chao; Jackson, Nicholas J.; Cui, Wendi; Li, Bo; Zhang, Jiaxin; Malin, Bradley A. |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2502.20560 |
Similar Items
"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models
by: Gu, Jihao, et al.
Published: (2025)
Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings
by: Agrawal, Aakriti, et al.
Published: (2025)
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
by: Cui, Wendi, et al.
Published: (2024)
Diving into Mitigating Hallucinations from a Vision Perspective for Large Vision-Language Models
by: Wang, Weihang, et al.
Published: (2025)
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models
by: Xia, Peng, et al.
Published: (2024)
Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies
by: Pathak, Surendra, et al.
Published: (2026)
Toward Interactive Regional Understanding in Vision-Large Language Models
by: Lee, Jungbeom, et al.
Published: (2024)
Watch Closely: Mitigating Object Hallucinations in Large Vision-Language Models with Disentangled Decoding
by: Ma, Ruiqi, et al.
Published: (2025)
SynthVLM: Towards High-Quality and Efficient Synthesis of Image-Caption Datasets for Vision-Language Models
by: Liu, Zheng, et al.
Published: (2024)
Visual In-Context Learning for Large Vision-Language Models
by: Zhou, Yucheng, et al.
Published: (2024)
PUMGPT: A Large Vision-Language Model for Product Understanding
by: Xue, Wei, et al.
Published: (2023)
Vision-centric Token Compression in Large Language Model
by: Xing, Ling, et al.
Published: (2025)
FlightGPT: Towards Generalizable and Interpretable UAV Vision-and-Language Navigation with Vision-Language Models
by: Cai, Hengxing, et al.
Published: (2025)
Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification
by: Singh, Vikash, et al.
Published: (2026)
SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable
by: Zhang, Jiaxin, et al.
Published: (2025)
Mitigating Hallucinations in Large Vision-Language Models with Internal Fact-based Contrastive Decoding
by: Wang, Chao, et al.
Published: (2025)
Grounded Visual Factualization: Factual Anchor-Based Finetuning for Enhancing MLLM Factual Consistency
by: Morbiato, Filippo, et al.
Published: (2025)
debiaSAE: Benchmarking and Mitigating Vision-Language Model Bias
by: Sasse, Kuleen, et al.
Published: (2024)
NPHardEval4V: Dynamic Evaluation of Large Vision-Language Models with Effects of Vision
by: Li, Xiang, et al.
Published: (2024)
Towards Efficient and Robust VQA-NLE Data Generation with Large Vision-Language Models
by: Irawan, Patrick Amadeus, et al.
Published: (2024)
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
by: Li, Lei, et al.
Published: (2024)
Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models
by: Liu, Zikang, et al.
Published: (2025)
Where do Large Vision-Language Models Look at when Answering Questions?
by: Xing, Xiaoying, et al.
Published: (2025)
VEGAS: Mitigating Hallucinations in Large Vision-Language Models via Vision-Encoder Attention Guided Adaptive Steering
by: Wang, Zihu, et al.
Published: (2025)
Benchmarking Deflection and Hallucination in Large Vision-Language Models
by: Moratelli, Nicholas, et al.
Published: (2026)
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
by: Jiang, Lei, et al.
Published: (2025)
ArtGPT-4: Towards Artistic-understanding Large Vision-Language Models with Enhanced Adapter
by: Yuan, Zhengqing, et al.
Published: (2023)
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
by: Lei, Xuanyu, et al.
Published: (2024)
Mitigating Hallucinations in Large Vision-Language Models by Self-Injecting Hallucinations
by: Lu, Yifan, et al.
Published: (2025)
PROGRESSLM: Towards Progress Reasoning in Vision-Language Models
by: Zhang, Jianshu, et al.
Published: (2026)
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
by: Li, Yue, et al.
Published: (2025)
Intriguing Properties of Large Language and Vision Models
by: Lee, Young-Jun, et al.
Published: (2024)
Instruction-Aligned Visual Attention for Mitigating Hallucinations in Large Vision-Language Models
by: Li, Bin, et al.
Published: (2025)
Open-Source Image Editing Models Are Zero-Shot Vision Learners
by: Liu, Wei, et al.
Published: (2026)
An Examination of the Compositionality of Large Generative Vision-Language Models
by: Ma, Teli, et al.
Published: (2023)
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
by: Wu, Yuhang, et al.
Published: (2024)
TTL: Test-time Textual Learning for OOD Detection with Pretrained Vision-Language Models
by: Ye, Jinlun, et al.
Published: (2026)
Seeing No Evil: Blinding Large Vision-Language Models to Safety Instructions via Adversarial Attention Hijacking
by: Li, Jingru, et al.
Published: (2026)
Towards Zero-Shot Annotation of the Built Environment with Vision-Language Models (Vision Paper)
by: Han, Bin, et al.
Published: (2024)
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
by: Qin, Luozheng, et al.
Published: (2025)