Saved in:
| Main Authors: | , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.10195 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866909040938844160 |
|---|---|
| author | Zhong, Shuzhang Huang, Haochen Qiu, Shengxuan Zuo, Pengfei Wang, Runsheng Li, Meng |
| author_facet | Zhong, Shuzhang Huang, Haochen Qiu, Shengxuan Zuo, Pengfei Wang, Runsheng Li, Meng |
| contents | Tree-of-Thought (ToT) reasoning structures Large Language Model (LLM) inference as a tree-based search, demonstrating strong potential for solving complex mathematical and programming tasks. However, its efficiency is constrained by the reward dependency barrier -- a synchronization bottleneck caused by sequential reward-guided exploration that limits search parallelism and introduces substantial latency. Prior system optimizations, mainly designed for linear Chain-of-Thought (CoT) reasoning, cannot address these challenges, leaving the efficiency of ToT underexplored.
To enhance ToT reasoning efficiency, we observe that the reasoning paths can be explored speculatively to break the reward synchronization barrier. Therefore, in this paper, we propose SPEX and introduce three key techniques: (i) intra-query speculative path selection to predict and expand high-potential branches of ToT, (ii) inter-query budget allocation to balance speculative resource allocation across queries dynamically, and (iii) adaptive early termination to prune deep and redundant branches for a skewed search tree.
We implement SPEX on top of the SGLang framework and evaluate it across diverse ToT algorithms and LLMs. Extensive experiments show that SPEX achieves $1.2 \sim 3 \times$ speedup for different ToT reasoning algorithms. Moreover, SPEX synergizes with token-level speculative decoding, achieving cumulative speedups of up to $4.1\times$. Ablation studies further confirm the contributions of each technique. Overall, SPEX represents a significant step toward efficient and scalable ToT reasoning, unlocking the parallelism required for high-performance inference-time scaling for LLMs. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2605_10195 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration Zhong, Shuzhang Huang, Haochen Qiu, Shengxuan Zuo, Pengfei Wang, Runsheng Li, Meng Machine Learning Tree-of-Thought (ToT) reasoning structures Large Language Model (LLM) inference as a tree-based search, demonstrating strong potential for solving complex mathematical and programming tasks. However, its efficiency is constrained by the reward dependency barrier -- a synchronization bottleneck caused by sequential reward-guided exploration that limits search parallelism and introduces substantial latency. Prior system optimizations, mainly designed for linear Chain-of-Thought (CoT) reasoning, cannot address these challenges, leaving the efficiency of ToT underexplored. To enhance ToT reasoning efficiency, we observe that the reasoning paths can be explored speculatively to break the reward synchronization barrier. Therefore, in this paper, we propose SPEX and introduce three key techniques: (i) intra-query speculative path selection to predict and expand high-potential branches of ToT, (ii) inter-query budget allocation to balance speculative resource allocation across queries dynamically, and (iii) adaptive early termination to prune deep and redundant branches for a skewed search tree. We implement SPEX on top of the SGLang framework and evaluate it across diverse ToT algorithms and LLMs. Extensive experiments show that SPEX achieves $1.2 \sim 3 \times$ speedup for different ToT reasoning algorithms. Moreover, SPEX synergizes with token-level speculative decoding, achieving cumulative speedups of up to $4.1\times$. Ablation studies further confirm the contributions of each technique. Overall, SPEX represents a significant step toward efficient and scalable ToT reasoning, unlocking the parallelism required for high-performance inference-time scaling for LLMs. |
| title | Breaking the Reward Barrier: Accelerating Tree-of-Thought Reasoning via Speculative Exploration |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2605.10195 |