Bibliographic Details
Main Authors: Ding, Fangyu, Ding, Ding, Chen, Sijin, Wang, Kaibo, Xu, Peng, Feng, Zijin, Bai, Haoli, Han, Kai, Yan, Youliang, Yuan, Binhang, Sun, Jiacheng
Format: Preprint
Published: 2026
Subjects: Computation and Language, Artificial Intelligence, Machine Learning
Online Access: https://arxiv.org/abs/2603.23507
author Ding, Fangyu
Ding, Ding
Chen, Sijin
Wang, Kaibo
Xu, Peng
Feng, Zijin
Bai, Haoli
Han, Kai
Yan, Youliang
Yuan, Binhang
Sun, Jiacheng
contents While Masked Diffusion Language Models (MDLMs), which rely on token masking and unmasking, have shown promise in language modeling, their computational efficiency and generation flexibility remain constrained by the masking paradigm. In this paper, we propose Deletion-Insertion Diffusion language models (DID), which rigorously formulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes of current MDLMs. DID improves training and inference efficiency by eliminating two major sources of computational overhead in MDLMs: computation on non-informative 1) <MASK> tokens inherent to the paradigm, and 2) <PAD> tokens introduced in variable-length settings. Furthermore, DID offers greater flexibility by 1) natively supporting variable-length sequences without fixed-length padding, and 2) providing an intrinsic self-correction mechanism during generation, since insertion dynamically adjusts token positions. To train DID, we design a score-based approach that assigns scores to token insertion operations, and we derive appropriate training objectives. These objectives involve subsequence counting problems, which we solve efficiently with a parallelized dynamic programming algorithm. Our experiments in both fixed-length and variable-length settings demonstrate the advantage of DID over MDLM baselines and existing insertion-based LMs in modeling performance, sampling quality, and training/inference speed, without any hyperparameter tuning.
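Illustration only, not taken from the paper: the abstract does not say exactly which subsequence counts the DID objectives require. One natural reading is counting how many ways a shorter (more heavily deleted) sequence can be embedded as a subsequence of a longer one, i.e. how many distinct deletion patterns connect the two. The Python sketch below is the textbook O(mn) dynamic program for that count. The function name count_subsequence_embeddings is hypothetical, and the loop is sequential, unlike the parallelized algorithm the authors describe.

```python
from typing import Sequence


def count_subsequence_embeddings(short: Sequence, long: Sequence) -> int:
    """Count index sets i_1 < ... < i_m with long[i_k] == short[k] for all k.

    Hypothetical helper (not from the paper): equivalently, the number of
    distinct deletion patterns that turn `long` into `short`.
    Time O(len(short) * len(long)), memory O(len(short)).
    """
    m = len(short)
    # ways[j] = number of embeddings of short[:j] into the prefix of `long`
    # processed so far; the empty prefix embeds in exactly one way.
    ways = [1] + [0] * m
    for token in long:
        # Sweep j downward so the current position of `long` is matched
        # at most once within any single embedding.
        for j in range(m, 0, -1):
            if short[j - 1] == token:
                ways[j] += ways[j - 1]
    return ways[m]


if __name__ == "__main__":
    # Embeddings of [1, 2] in [1, 2, 1, 2]: positions (0,1), (0,3), (2,3).
    print(count_subsequence_embeddings([1, 2], [1, 2, 1, 2]))  # → 3
```

The one-dimensional rolling array keeps memory linear in the shorter sequence. Each outer step depends on the previous one, which is why a production version would need a different decomposition to parallelize well.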
format Preprint
id arxiv_https___arxiv_org_abs_2603_23507
institution arXiv
publishDate 2026
record_format arxiv
title Beyond Masks: Efficient, Flexible Diffusion Language Models via Deletion-Insertion Processes
topic Computation and Language
Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2603.23507