:: Library Catalog

Bibliographic Details
Main Authors:	Fazli, Mehrdad; Wei, Bowen; Zhu, Ziwei
Format:	Preprint
Published:	2026
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2601.05939

Similar Items

Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration
by: Fazli, Mehrdad, et al.
Published: (2025)

Long Context Transfer from Language to Vision
by: Zhang, Peiyuan, et al.
Published: (2024)

ViSpec: Accelerating Vision-Language Models with Vision-Aware Speculative Decoding
by: Kang, Jialiang, et al.
Published: (2025)

Context-Aware Autoregressive Models for Multi-Conditional Image Generation
by: Chen, Yixiao, et al.
Published: (2025)

On the Faithfulness of Vision Transformer Explanations
by: Wu, Junyi, et al.
Published: (2024)

VideoSAVi: Self-Aligned Video Language Models without Human Supervision
by: Kulkarni, Yogesh, et al.
Published: (2024)

Vision-Language Binding in In-Context Image Generation
by: Ge, Chris, et al.
Published: (2026)

VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
by: Zheng, Dian, et al.
Published: (2025)

Stencil: Subject-Driven Generation with Context Guidance
by: Chen, Gordon, et al.
Published: (2025)

FaithScore: Fine-grained Evaluations of Hallucinations in Large Vision-Language Models
by: Jing, Liqiang, et al.
Published: (2023)

Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning
by: Liu, Shih-Wen, et al.
Published: (2025)

Bridging Vision and Language for Robust Context-Aware Surgical Point Tracking: The VL-SurgPT Dataset and Benchmark
by: Zhou, Rulin, et al.
Published: (2025)

ChartQA-X: Generating Explanations for Visual Chart Reasoning
by: Hegde, Shamanthak, et al.
Published: (2025)

Branch, or Layer? Zeroth-Order Optimization for Continual Learning of Vision-Language Models
by: Liu, Ziwei, et al.
Published: (2025)

Dude: Dual Distribution-Aware Context Prompt Learning For Large Vision-Language Model
by: Nguyen, Duy M. H., et al.
Published: (2024)

Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations
by: Moll, Johannes, et al.
Published: (2025)

VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models
by: Qiu, Haoyi, et al.
Published: (2024)

Large Vision-Language Models as Emotion Recognizers in Context Awareness
by: Lei, Yuxuan, et al.
Published: (2024)

SAKED: Mitigating Hallucination in Large Vision-Language Models via Stability-Aware Knowledge Enhanced Decoding
by: Li, Zhaoxu, et al.
Published: (2026)

Context Diffusion: In-Context Aware Image Generation
by: Najdenkoska, Ivona, et al.
Published: (2023)

Training Long-Context Vision-Language Models Effectively with Generalization Beyond 128K Context
by: Wang, Zhaowei, et al.
Published: (2026)

Towards Multimodal In-Context Learning for Vision & Language Models
by: Doveh, Sivan, et al.
Published: (2024)

CASCADE: Context-Aware Relaxation for Speculative Image Decoding
by: Yildirim, Selin, et al.
Published: (2026)

VidHalluc: Evaluating Temporal Hallucinations in Multimodal Large Language Models for Video Understanding
by: Li, Chaoyu, et al.
Published: (2024)

Explanation-Driven Counterfactual Testing for Faithfulness in Vision-Language Model Explanations
by: Ding, Sihao, et al.
Published: (2025)

Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models
by: Qi, Jianing, et al.
Published: (2025)

IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding
by: Zhu, Lanyun, et al.
Published: (2024)

MMSpec: Benchmarking Speculative Decoding for Vision-Language Models
by: Shen, Hui, et al.
Published: (2026)

Negation-Aware Test-Time Adaptation for Vision-Language Models
by: Han, Haochen, et al.
Published: (2025)

EgoVITA: Learning to Plan and Verify for Egocentric Video Reasoning
by: Kulkarni, Yogesh, et al.
Published: (2025)

AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
by: Kulkarni, Yogesh, et al.
Published: (2025)

VideoPASTA: 7K Preference Pairs That Matter for Video-LLM Alignment
by: Kulkarni, Yogesh, et al.
Published: (2025)

Fine-Grained Post-Training Quantization for Large Vision Language Models with Quantization-Aware Integrated Gradients
by: Xiang, Ziwei, et al.
Published: (2026)

Context-Aware Token Selection and Packing for Enhanced Vision Transformer
by: Zhang, Tianyi, et al.
Published: (2024)

HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification
by: Ouyang, Shuyi, et al.
Published: (2024)

Step-Level Visual Grounding Faithfulness Predicts Out-of-Distribution Generalization in Long-Horizon Vision-Language Models
by: Rahman, Md Ashikur, et al.
Published: (2026)

Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
by: Zhang, Ce, et al.
Published: (2025)

TraceVision: Trajectory-Aware Vision-Language Model for Human-Like Spatial Understanding
by: Yang, Fan, et al.
Published: (2026)

Optimizing Vision-Language Interactions Through Decoder-Only Models
by: Tanaka, Kaito, et al.
Published: (2024)

BFA++: Hierarchical Best-Feature-Aware Token Prune for Multi-View Vision Language Action Model
by: Li, Haosheng, et al.
Published: (2026)