Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Hsieh, He-Yen, Wang, Hong, Kung, H. T.
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2512.00670
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866914174842437632
author	Hsieh, He-Yen Wang, Hong Kung, H. T.
author_facet	Hsieh, He-Yen Wang, Hong Kung, H. T.
contents	Diffusion-based large language models (dLLMs) refine token generations through iterative denoising, but answers often stabilize before all steps complete. We propose EDIT (Early Diffusion Inference Termination), an inference-time criterion that adaptively stops denoising once sufficient reasoning stability relative to training-time reasoning is detected. EDIT monitors the alignment between token activations and a reasoning map derived from AdamW-aggregated LoRA updates captured during supervised fine-tuning (SFT). During training, optimization dynamics generate rich metadata about parameter importance that in prior methods is typically discarded upon model release. We preserve this information as a compact representation of learned reasoning pathways. During inference, alignment scores are converted to a distribution over the tokens already unmasked at the current denoising step, and convergence is detected when KL divergence between consecutive steps falls below a threshold on the matched unmasked (visible) tokens. Across reasoning benchmarks, EDIT reduces diffusion steps by 11.8% to 68.3% while preserving or improving accuracy in most settings, with approximately 0.02% storage overhead (about 1.5-2 MB for all QKV modules across 32 blocks in an 8 GB model). By utilizing training-gradient dynamics, our work opens a new research direction for reducing dLLM inference time and cost.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_00670
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients Hsieh, He-Yen Wang, Hong Kung, H. T. Artificial Intelligence Diffusion-based large language models (dLLMs) refine token generations through iterative denoising, but answers often stabilize before all steps complete. We propose EDIT (Early Diffusion Inference Termination), an inference-time criterion that adaptively stops denoising once sufficient reasoning stability relative to training-time reasoning is detected. EDIT monitors the alignment between token activations and a reasoning map derived from AdamW-aggregated LoRA updates captured during supervised fine-tuning (SFT). During training, optimization dynamics generate rich metadata about parameter importance that in prior methods is typically discarded upon model release. We preserve this information as a compact representation of learned reasoning pathways. During inference, alignment scores are converted to a distribution over the tokens already unmasked at the current denoising step, and convergence is detected when KL divergence between consecutive steps falls below a threshold on the matched unmasked (visible) tokens. Across reasoning benchmarks, EDIT reduces diffusion steps by 11.8% to 68.3% while preserving or improving accuracy in most settings, with approximately 0.02% storage overhead (about 1.5-2 MB for all QKV modules across 32 blocks in an 8 GB model). By utilizing training-gradient dynamics, our work opens a new research direction for reducing dLLM inference time and cost.
title	EDIT: Early Diffusion Inference Termination for dLLMs Based on Dynamics of Training Gradients
topic	Artificial Intelligence
url	https://arxiv.org/abs/2512.00670

Similar Items