Staff View: :: Library Catalog

Saved in:

Bibliographic Details
Main Authors:	Tian, Jiayi, Su, Yupeng, Solgi, Ryan, Kundu, Souvik, Zhang, Zheng
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2604.16694
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1866908975590539264
author	Tian, Jiayi Su, Yupeng Solgi, Ryan Kundu, Souvik Zhang, Zheng
author_facet	Tian, Jiayi Su, Yupeng Solgi, Ryan Kundu, Souvik Zhang, Zheng
contents	Large reasoning models (LRMs) enhance problem-solving capabilities by generating explicit multi-step chains of thought (CoT) reasoning; however, they incur substantial inference latency and computational overhead. To mitigate this issue, recent works have explored model collaboration paradigms, where small reasoning models (SRMs) generate intermediate reasoning steps to achieve a better accuracy--latency trade-off. Despite recent progress, effectively and efficiently detecting and mitigating SRM failures in collaborative systems remains a key challenge. To address this issue, we analyze SRM inference in both the generated text and hidden-state spaces, and identify three types of failure modes: \textit{overconfidence}, \textit{uncertainty}, and \textit{heavy revalidation}. Building on these insights, we propose \textbf{RankGuide}, a framework that improves the efficiency and effectiveness of SRM--LRM collaboration through tensor-rank-guided routing and steering. Specifically, RankGuide leverages a routing signal that incorporates tensor-rank signals derived from consecutive hidden states to detect when SRMs are likely to fail and selectively invoke LRMs. In addition, we introduce a tensor-rank-filtered steering vector extraction method to modulate the reasoning trajectory of SRMs, thereby improving their generation quality. By improving both routing and steering through tensor-rank signals, RankGuide enables SRM--LRM collaborative systems to achieve more efficient reasoning with fewer steps and improved accuracy. Experiments on multiple reasoning benchmarks demonstrate the efficacy of RankGuide in reducing latency by up to $1.75\times$ compared to LRM, while maintaining competitive accuracy relative to prior methods.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_16694
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning Tian, Jiayi Su, Yupeng Solgi, Ryan Kundu, Souvik Zhang, Zheng Artificial Intelligence Large reasoning models (LRMs) enhance problem-solving capabilities by generating explicit multi-step chains of thought (CoT) reasoning; however, they incur substantial inference latency and computational overhead. To mitigate this issue, recent works have explored model collaboration paradigms, where small reasoning models (SRMs) generate intermediate reasoning steps to achieve a better accuracy--latency trade-off. Despite recent progress, effectively and efficiently detecting and mitigating SRM failures in collaborative systems remains a key challenge. To address this issue, we analyze SRM inference in both the generated text and hidden-state spaces, and identify three types of failure modes: \textit{overconfidence}, \textit{uncertainty}, and \textit{heavy revalidation}. Building on these insights, we propose \textbf{RankGuide}, a framework that improves the efficiency and effectiveness of SRM--LRM collaboration through tensor-rank-guided routing and steering. Specifically, RankGuide leverages a routing signal that incorporates tensor-rank signals derived from consecutive hidden states to detect when SRMs are likely to fail and selectively invoke LRMs. In addition, we introduce a tensor-rank-filtered steering vector extraction method to modulate the reasoning trajectory of SRMs, thereby improving their generation quality. By improving both routing and steering through tensor-rank signals, RankGuide enables SRM--LRM collaborative systems to achieve more efficient reasoning with fewer steps and improved accuracy. Experiments on multiple reasoning benchmarks demonstrate the efficacy of RankGuide in reducing latency by up to $1.75\times$ compared to LRM, while maintaining competitive accuracy relative to prior methods.
title	RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2604.16694

Similar Items