| Main Authors: | Snel, Jakob; Oh, Seong Joon |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning; Artificial Intelligence |
| Online Access: | https://arxiv.org/abs/2507.20836 |
| _version_ | 1866915534519402496 |
|---|---|
| author | Snel, Jakob; Oh, Seong Joon |
| author_facet | Snel, Jakob; Oh, Seong Joon |
| contents | Large Language Models (LLMs) hallucinate, and detecting these cases is key to ensuring trust. While many approaches address hallucination detection at the response or span level, recent work explores token-level detection, enabling more fine-grained intervention. However, the distribution of hallucination signal across sequences of hallucinated tokens remains unexplored. We leverage token-level annotations from the RAGTruth corpus and find that the first hallucinated token is far more detectable than later ones. This structural property holds across models, suggesting that first hallucination tokens play a key role in token-level hallucination detection. Our code is available at https://github.com/jakobsnl/RAGTruth_Xtended. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2507_20836 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | First Hallucination Tokens Are Different from Conditional Ones |
| topic | Machine Learning; Artificial Intelligence |
| url | https://arxiv.org/abs/2507.20836 |
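
The abstract distinguishes the first token of a hallucinated sequence from the tokens that follow it (the "conditional" ones in the title). As a minimal sketch of that distinction, the snippet below splits token positions into those two groups. It assumes a per-token binary label list (1 = annotated as hallucinated, 0 = not). The function name `split_first_and_conditional` and this label format are illustrative assumptions; they are not taken from the authors' repository or from the RAGTruth data format.

```python
from typing import List, Tuple


def split_first_and_conditional(labels: List[int]) -> Tuple[List[int], List[int]]:
    """Split hallucinated token positions into first vs. later ones.

    labels[i] is assumed to be 1 if token i is annotated as hallucinated
    and 0 otherwise (a hypothetical format chosen for illustration).
    Returns (first, conditional): indices that open a run of hallucinated
    tokens, and indices that continue a run already in progress.
    """
    first: List[int] = []
    conditional: List[int] = []
    prev = 0  # label of the previous token; 0 before the sequence starts
    for i, lab in enumerate(labels):
        if lab == 1:
            if prev == 0:
                first.append(i)        # run starts here
            else:
                conditional.append(i)  # run continues
        prev = lab
    return first, conditional


labels = [0, 1, 1, 1, 0, 0, 1, 1]
first, conditional = split_first_and_conditional(labels)
print(first)        # [1, 6]
print(conditional)  # [2, 3, 7]
```

With positions grouped this way, any per-token detection score could be compared between the two groups, which is the kind of comparison the abstract describes.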