Staff View: DiFR: Inference Verification Despite Nondeterminism :: Library Catalog

Bibliographic Details
Main Authors:	Karvonen, Adam; Reuter, Daniel; Rinberg, Roy; Marks, Luke; Garriga-Alonso, Adrià; Warr, Keri
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2511.20621

_version_	1866912729070043136
author	Karvonen, Adam; Reuter, Daniel; Rinberg, Roy; Marks, Luke; Garriga-Alonso, Adrià; Warr, Keri
author_facet	Karvonen, Adam; Reuter, Daniel; Rinberg, Roy; Marks, Luke; Garriga-Alonso, Adrià; Warr, Keri
contents	As demand for LLM inference grows, it is becoming increasingly important that providers and their customers can verify that inference processes are performed correctly, without errors or tampering. However, re-running the same inference process twice often leads to different results due to benign numerical noise, making it difficult to distinguish legitimate variation from actual problems. To address this problem, we introduce Token-DiFR (Token-Divergence-From-Reference), a method for verifying inference outputs by comparing generated tokens against predictions made by a trusted reference implementation conditioned on the same random seed. Sampling seed synchronization tightly constrains valid outputs, leaving providers minimal room to deviate from correct inference, which allows output tokens themselves to serve as auditable evidence of correctness at zero additional cost to the provider. Token-DiFR reliably identifies sampling errors, simulated bugs, and model quantization, detecting 4-bit quantization with AUC > 0.999 within 300 output tokens. For applications requiring sample-efficient forward-pass verification, we additionally introduce Activation-DiFR, a scheme that uses random orthogonal projections to compress activations into compact fingerprints for subsequent verification. Activation-DiFR detects 4-bit quantization with AUC > 0.999 using just 2 output tokens, while reducing communication overhead by 25-75% relative to existing methods. We release an open-source integration with vLLM to accelerate practical deployment of verifiable inference.
format	Preprint
id	arxiv_https___arxiv_org_abs_2511_20621
institution	arXiv
publishDate	2025
record_format	arxiv
title	DiFR: Inference Verification Despite Nondeterminism
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2511.20621

Similar Items