Bibliographic Details
Main Authors: Deng, Jingcheng, Pang, Liang, Wei, Zihao, Xu, Shicheng, Duan, Zenghao, Xu, Kun, Song, Yang, Shen, Huawei, Cheng, Xueqi
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2510.15522
author Deng, Jingcheng
Pang, Liang
Wei, Zihao
Xu, Shicheng
Duan, Zenghao
Xu, Kun
Song, Yang
Shen, Huawei
Cheng, Xueqi
contents Latent reasoning offers a computation-efficient alternative to Chain-of-Thought but often suffers from performance degradation due to distributional misalignment and ambiguous chain definitions. Ideally, latent reasoning should function as a superposition of multiple reasoning paths. To realize this, we introduce Latent-SFT, a unified framework addressing challenges at three levels: token, chain, and learning. First, we define the Latent-Vocab to constrain hidden states within the pre-trained vocab-space. Second, we construct the Latent-Chain via Induction-Supervision Masking to ensure semantic compactness and sufficiency. Third, we employ Latent-Optim with stochastic Gumbel-Softmax to guide the model toward generalizable solutions. Empirical results demonstrate that Latent-SFT consistently outperforms explicit SFT across six mathematical benchmarks (e.g., GSM8k, AIME24) while achieving a 2.7x to 5.5x reduction in reasoning length. Analysis confirms that our method effectively captures a superposition of diverse reasoning trajectories rather than merely compressing a single path.
format Preprint
id arxiv_https___arxiv_org_abs_2510_15522
institution arXiv
publishDate 2025
record_format arxiv
title LLM Latent Reasoning as Chain of Superposition
topic Computation and Language
url https://arxiv.org/abs/2510.15522
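
The abstract describes latent tokens that are constrained to the pre-trained vocabulary space and trained with a stochastic Gumbel-Softmax. This record gives no formulas, so the Python sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes a latent token can be modeled as a probability-weighted mixture of vocabulary embeddings, with the mixture weights drawn by Gumbel-Softmax. The names `gumbel_softmax`, `latent_token`, `tau`, and the toy embedding table are hypothetical.

```python
import numpy as np


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Return a relaxed (soft) one-hot sample for the given logits.

    Assumption: this is the standard Gumbel-Softmax relaxation; the
    record does not say how the paper parameterizes it.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Gumbel(0, 1) noise by inverse transform: g = -log(-log(u)).
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    y = (logits + g) / tau
    # Subtract the max before exponentiating for numerical stability.
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)


def latent_token(logits, embedding, tau=1.0, rng=None):
    """Mix vocabulary embeddings with Gumbel-Softmax weights.

    Hypothetical reading of "superposition": the result is a convex
    combination of rows of `embedding`, so it stays inside the span
    of the vocabulary embeddings.
    """
    weights = gumbel_softmax(logits, tau=tau, rng=rng)
    return weights @ embedding, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([2.0, 1.0, 0.1, -1.0])            # toy scores over 4 tokens
    embedding = np.arange(12, dtype=float).reshape(4, 3)  # toy 4 x 3 embedding table
    vector, weights = latent_token(logits, embedding, tau=0.5, rng=rng)
    print("weights:", weights)
    print("latent vector:", vector)
```

A lower `tau` pushes the weights toward a single token (closer to explicit decoding), while a higher `tau` spreads them over several tokens. That spread is the sense in which this sketch treats a latent token as a superposition.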