Staff View: Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement :: Library Catalog

Bibliographic Details
Main Authors:	Zhong, Qimin; Liao, Hao; Qin, Haiming; Zhou, Mingyang; Mao, Rui; Chen, Wei; Chao, Naipeng
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2604.06155

_version_	1866917422042185728
author	Zhong, Qimin; Liao, Hao; Qin, Haiming; Zhou, Mingyang; Mao, Rui; Chen, Wei; Chao, Naipeng
author_facet	Zhong, Qimin; Liao, Hao; Qin, Haiming; Zhou, Mingyang; Mao, Rui; Chen, Wei; Chao, Naipeng
contents	Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) relies on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical perspective on the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, in which discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method, Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories. Experiments on synthetic graphs and the real-world Manhattan Taxi Ride data show that LSE-MTP effectively bridges the gap between discrete tokens and continuous state representations, enhancing representation alignment, reducing structural hallucinations, and improving robustness to perturbations.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_06155
institution	arXiv
publishDate	2026
record_format	arxiv
title	Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement
topic	Machine Learning; Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2604.06155

Similar Items