MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Bar-Shalom, Guy; Frasca, Fabrizio; Lim, Derek; Gelberg, Yoav; Ziser, Yftah; El-Yaniv, Ran; Chechik, Gal; Maron, Haggai
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2503.14043

_version_	1866909815802953728
author	Bar-Shalom, Guy; Frasca, Fabrizio; Lim, Derek; Gelberg, Yoav; Ziser, Yftah; El-Yaniv, Ran; Chechik, Gal; Maron, Haggai
author_facet	Bar-Shalom, Guy; Frasca, Fabrizio; Lim, Derek; Gelberg, Yoav; Ziser, Yftah; El-Yaniv, Ran; Chechik, Gal; Maron, Haggai
contents	The automated detection of hallucinations and training data contamination is pivotal to the safe deployment of Large Language Models (LLMs). These tasks are particularly challenging in settings where no access to model internals is available. Current approaches in this setup typically leverage only the probabilities of actual tokens in the text, relying on simple task-specific heuristics. Crucially, they overlook the information contained in the full sequence of next-token probability distributions. We propose to go beyond hand-crafted decision rules by learning directly from the complete observable output of LLMs -- consisting not only of next-token probabilities, but also the full sequence of next-token distributions. We refer to this as the LLM Output Signature (LOS), and treat it as a reference data type for detecting hallucinations and data contamination. To that end, we introduce LOS-Net, a lightweight attention-based architecture trained on an efficient encoding of the LOS, which can provably approximate a broad class of existing techniques for both tasks. Empirically, LOS-Net achieves superior performance across diverse benchmarks and LLMs, while maintaining extremely low detection latency. Furthermore, it demonstrates promising transfer capabilities across datasets and LLMs. Full code is available at https://github.com/BarSGuy/Beyond-next-token-probabilities.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_14043
institution	arXiv
publishDate	2025
record_format	arxiv
title	Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions
topic	Machine Learning
url	https://arxiv.org/abs/2503.14043
