Staff View: What Makes Diffusion Language Models Super Data Learners? :: Library Catalog

Bibliographic Details
Main Authors:	Gao, Zitian; Luo, Haoming; Chen, Lynx; Liu, Jason Klein; Tao, Ran; Zhou, Joey; Dai, Bryan
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.04071

_version_	1866909825524301824
author	Gao, Zitian; Luo, Haoming; Chen, Lynx; Liu, Jason Klein; Tao, Ran; Zhou, Joey; Dai, Bryan
author_facet	Gao, Zitian; Luo, Haoming; Chen, Lynx; Liu, Jason Klein; Tao, Ran; Zhou, Joey; Dai, Bryan
contents	Recent studies have shown that diffusion language models achieve remarkable data efficiency under limited-data constraints, yet the underlying mechanisms remain unclear. In this work, we perform extensive ablation experiments to disentangle the sources of this efficiency. Our results show that random masking of input tokens plays the dominant role. We further show that similar gains can be obtained through MLP dropout and weight decay, indicating that stochastic regularization broadly enhances data efficiency in multi-epoch training. Our code is available at https://github.com/zitian-gao/data-efficiency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_04071
institution	arXiv
publishDate	2025
record_format	arxiv
spellingShingle	What Makes Diffusion Language Models Super Data Learners? Gao, Zitian Luo, Haoming Chen, Lynx Liu, Jason Klein Tao, Ran Zhou, Joey Dai, Bryan Computation and Language Recent studies have shown that diffusion language models achieve remarkable data efficiency under limited-data constraints, yet the underlying mechanisms remain unclear. In this work, we perform extensive ablation experiments to disentangle the sources of this efficiency. Our results show that random masking of input tokens plays the dominant role. We further show that similar gains can be obtained through MLP dropout and weight decay, indicating that stochastic regularization broadly enhances data efficiency in multi-epoch training. Our code is available at https://github.com/zitian-gao/data-efficiency.
title	What Makes Diffusion Language Models Super Data Learners?
topic	Computation and Language
url	https://arxiv.org/abs/2510.04071

Similar Items