Bibliographic Details
Main Authors: Zhong, Qimin; Liao, Hao; Qin, Haiming; Zhou, Mingyang; Mao, Rui; Chen, Wei; Chao, Naipeng
Format: Preprint
Published: 2026
Subjects: Machine Learning; Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2604.06155
author Zhong, Qimin
Liao, Hao
Qin, Haiming
Zhou, Mingyang
Mao, Rui
Chen, Wei
Chao, Naipeng
contents Whether Large Language Models (LLMs) develop coherent internal world models remains a core debate. While conventional Next-Token Prediction (NTP) focuses on one-step-ahead supervision, Multi-Token Prediction (MTP) has shown promise in learning more structured representations. In this work, we provide a theoretical analysis of the gradient inductive bias of MTP, supported by empirical evidence, showing that MTP promotes convergence toward internal belief states by inducing representational contractivity via gradient coupling. However, we reveal that standard MTP often suffers from structural hallucinations, in which discrete token supervision encourages illegal shortcuts in latent space that violate environmental constraints. To address this, we propose a novel method, Latent Semantic Enhancement MTP (LSE-MTP), which anchors predictions to ground-truth hidden state trajectories. Experiments on synthetic graphs and the real-world Manhattan Taxi Ride dataset show that LSE-MTP effectively bridges the gap between discrete tokens and continuous state representations, enhancing representation alignment, reducing structural hallucinations, and improving robustness to perturbations.
format Preprint
id arxiv_https___arxiv_org_abs_2604_06155
institution arXiv
publishDate 2026
record_format arxiv
title Toward Consistent World Models with Multi-Token Prediction and Latent Semantic Enhancement
topic Machine Learning
Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2604.06155