Bibliographic Details
Main Authors: Wang, Chao; Fu, Weiwei; Zhou, Yang
Format: Preprint
Published: 2025
Subjects: Computer Vision and Pattern Recognition; Artificial Intelligence
Online Access: https://arxiv.org/abs/2503.04457
author Wang, Chao
Fu, Weiwei
Zhou, Yang
contents Vision-language models (VLMs) have advanced rapidly by building on the capabilities of large language models (LLMs) across diverse tasks. A critical challenge remains: hallucination, in which a model overconfidently describes objects or attributes that are absent from the image, a problem exacerbated by the tendency of VLMs to rely on linguistic priors. This limitation reduces model reliability in high-stakes applications. In this work, we observe that logits exhibit a continuity across timesteps that can be used to enhance their consistency, and we introduce a straightforward and efficient method, Cross-Temporal Prediction Connection (TPC), which enhances the semantic consistency of logits by connecting them temporally across timesteps. TPC amplifies information flow and improves coherence, effectively reducing hallucination. Extensive experiments show that TPC surpasses representative existing methods in both accuracy and efficiency while remaining robust in open-ended text generation tasks.
format Preprint
id arxiv_https___arxiv_org_abs_2503_04457
institution arXiv
publishDate 2025
record_format arxiv
title TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction
topic Computer Vision and Pattern Recognition
Artificial Intelligence
url https://arxiv.org/abs/2503.04457