Bibliographic Details
Main Authors:	Poey, Ian; Liu, Jiajun; Zhong, Qishuai; Chenailler, Adrien
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2411.03920

_version_	1866912107396595712
author	Poey, Ian; Liu, Jiajun; Zhong, Qishuai; Chenailler, Adrien
author_facet	Poey, Ian; Liu, Jiajun; Zhong, Qishuai; Chenailler, Adrien
contents	Real-time detection of out-of-context LLM outputs is crucial for enterprises looking to safely adopt RAG applications. In this work, we train lightweight models to discriminate LLM-generated text that is semantically out-of-context from retrieved text documents. We preprocess a combination of summarisation and semantic textual similarity datasets to construct training data using minimal resources. We find that DeBERTa is not only the best-performing model under this pipeline, but it is also fast and does not require additional text preprocessing or feature engineering. While emerging work demonstrates that generative LLMs can also be fine-tuned and used in complex data pipelines to achieve state-of-the-art performance, we note that speed and resource limits are important considerations for on-premise deployment.
format	Preprint
id	arxiv_https___arxiv_org_abs_2411_03920
institution	arXiv
publishDate	2024
record_format	arxiv
title	RAGulator: Lightweight Out-of-Context Detectors for Grounded Text Generation
topic	Computation and Language
url	https://arxiv.org/abs/2411.03920
