MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Rawte, Vipula, Tonmoy, S. M Towhidul Islam, Rajbangshi, Krishnav, Nag, Shravani, Chadha, Aman, Sheth, Amit P., Das, Amitava
Natura:	Preprint
Pubblicazione:	2024
Soggetti:	Computation and Language Artificial Intelligence
Accesso online:	https://arxiv.org/abs/2403.19113
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866911818316775424
author	Rawte, Vipula Tonmoy, S. M Towhidul Islam Rajbangshi, Krishnav Nag, Shravani Chadha, Aman Sheth, Amit P. Das, Amitava
author_facet	Rawte, Vipula Tonmoy, S. M Towhidul Islam Rajbangshi, Krishnav Nag, Shravani Chadha, Aman Sheth, Amit P. Das, Amitava
contents	The widespread adoption of Large Language Models (LLMs) has facilitated numerous benefits. However, hallucination is a significant concern. In response, Retrieval Augmented Generation (RAG) has emerged as a highly promising paradigm to improve LLM outputs by grounding them in factual information. RAG relies on textual entailment (TE) or similar methods to check if the text produced by LLMs is supported or contradicted, compared to retrieved documents. This paper argues that conventional TE methods are inadequate for spotting hallucinations in content generated by LLMs. For instance, consider a prompt about the 'USA's stance on the Ukraine war''. The AI-generated text states, ...U.S. President Barack Obama says the U.S. will not put troops in Ukraine...'' However, during the war the U.S. president is Joe Biden which contradicts factual reality. Moreover, current TE systems are unable to accurately annotate the given text and identify the exact portion that is contradicted. To address this, we introduces a new type of TE called ``Factual Entailment (FE).'', aims to detect factual inaccuracies in content generated by LLMs while also highlighting the specific text segment that contradicts reality. We present FACTOID (FACTual enTAILment for hallucInation Detection), a benchmark dataset for FE. We propose a multi-task learning (MTL) framework for FE, incorporating state-of-the-art (SoTA) long text embeddings such as e5-mistral-7b-instruct, along with GPT-3, SpanBERT, and RoFormer. The proposed MTL architecture for FE achieves an avg. 40\% improvement in accuracy on the FACTOID benchmark compared to SoTA TE methods. As FE automatically detects hallucinations, we assessed 15 modern LLMs and ranked them using our proposed Auto Hallucination Vulnerability Index (HVI_auto). This index quantifies and offers a comparative scale to evaluate and rank LLMs according to their hallucinations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2403_19113
institution	arXiv
publishDate	2024
record_format	arxiv
spellingShingle	FACTOID: FACtual enTailment fOr hallucInation Detection Rawte, Vipula Tonmoy, S. M Towhidul Islam Rajbangshi, Krishnav Nag, Shravani Chadha, Aman Sheth, Amit P. Das, Amitava Computation and Language Artificial Intelligence The widespread adoption of Large Language Models (LLMs) has facilitated numerous benefits. However, hallucination is a significant concern. In response, Retrieval Augmented Generation (RAG) has emerged as a highly promising paradigm to improve LLM outputs by grounding them in factual information. RAG relies on textual entailment (TE) or similar methods to check if the text produced by LLMs is supported or contradicted, compared to retrieved documents. This paper argues that conventional TE methods are inadequate for spotting hallucinations in content generated by LLMs. For instance, consider a prompt about the 'USA's stance on the Ukraine war''. The AI-generated text states, ...U.S. President Barack Obama says the U.S. will not put troops in Ukraine...'' However, during the war the U.S. president is Joe Biden which contradicts factual reality. Moreover, current TE systems are unable to accurately annotate the given text and identify the exact portion that is contradicted. To address this, we introduces a new type of TE called ``Factual Entailment (FE).'', aims to detect factual inaccuracies in content generated by LLMs while also highlighting the specific text segment that contradicts reality. We present FACTOID (FACTual enTAILment for hallucInation Detection), a benchmark dataset for FE. We propose a multi-task learning (MTL) framework for FE, incorporating state-of-the-art (SoTA) long text embeddings such as e5-mistral-7b-instruct, along with GPT-3, SpanBERT, and RoFormer. The proposed MTL architecture for FE achieves an avg. 40\% improvement in accuracy on the FACTOID benchmark compared to SoTA TE methods. As FE automatically detects hallucinations, we assessed 15 modern LLMs and ranked them using our proposed Auto Hallucination Vulnerability Index (HVI_auto). This index quantifies and offers a comparative scale to evaluate and rank LLMs according to their hallucinations.
title	FACTOID: FACtual enTailment fOr hallucInation Detection
topic	Computation and Language Artificial Intelligence
url	https://arxiv.org/abs/2403.19113

Documenti analoghi