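| Main Authors: | Deng, Jingcheng; Pang, Liang; Wei, Zihao; Xu, Shicheng; Duan, Zenghao; Xu, Kun; Song, Yang; Shen, Huawei; Cheng, Xueqi |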
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2510.15522 |
| _version_ | 1866914292761100288 |
|---|---|
| author | Deng, Jingcheng; Pang, Liang; Wei, Zihao; Xu, Shicheng; Duan, Zenghao; Xu, Kun; Song, Yang; Shen, Huawei; Cheng, Xueqi |
| author_facet | Deng, Jingcheng; Pang, Liang; Wei, Zihao; Xu, Shicheng; Duan, Zenghao; Xu, Kun; Song, Yang; Shen, Huawei; Cheng, Xueqi |
| contents | Latent reasoning offers a computation-efficient alternative to Chain-of-Thought but often suffers from performance degradation due to distributional misalignment and ambiguous chain definitions. Ideally, latent reasoning should function as a superposition of multiple reasoning paths. To realize this, we introduce Latent-SFT, a unified framework addressing challenges at three levels: token, chain, and learning. First, we define the Latent-Vocab to constrain hidden states within the pre-trained vocab-space. Second, we construct the Latent-Chain via Induction-Supervision Masking to ensure semantic compactness and sufficiency. Third, we employ Latent-Optim with stochastic Gumbel-Softmax to guide the model toward generalizable solutions. Empirical results demonstrate that Latent-SFT consistently outperforms explicit SFT across six mathematical benchmarks (e.g., GSM8k, AIME24) while achieving a 2.7x to 5.5x reduction in reasoning length. Analysis confirms that our method effectively captures a superposition of diverse reasoning trajectories rather than merely compressing a single path. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2510_15522 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | LLM Latent Reasoning as Chain of Superposition |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2510.15522 |