Staff View: TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks :: Library Catalog


Bibliographic Details
Main Authors:	Kapoor, Vansh; Gupta, Aman; Chen, Hao; Beniwal, Anurag; Huang, Jing; Kumar, Aviral
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language; Machine Learning
Online Access:	https://arxiv.org/abs/2601.10245

_version_	1866908964618240000
author	Kapoor, Vansh; Gupta, Aman; Chen, Hao; Beniwal, Anurag; Huang, Jing; Kumar, Aviral
author_facet	Kapoor, Vansh; Gupta, Aman; Chen, Hao; Beniwal, Anurag; Huang, Jing; Kumar, Aviral
contents	Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical steps–those likely to derail the solution–to larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step-level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive policies that reason about long-horizon accuracy-cost trade-offs and uncertainty in step-level correctness estimates. On MATH-500, even the simplest thresholding strategy surpasses prior routing methods with 5x higher cost efficiency, while more advanced policies match the strong, expensive model's performance using 80% fewer expensive model tokens. On harder benchmarks such as AIME, TRIM achieves up to 6x higher cost efficiency. All methods generalize effectively across math reasoning tasks, demonstrating that step-level difficulty represents fundamental characteristics of reasoning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_10245
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks Kapoor, Vansh Gupta, Aman Chen, Hao Beniwal, Anurag Huang, Jing Kumar, Aviral Artificial Intelligence Computation and Language Machine Learning Multi-step reasoning tasks like mathematical problem solving are vulnerable to cascading failures, where a single incorrect step leads to complete solution breakdown. Current LLM routing methods assign entire queries to one model, treating all reasoning steps as equal. We propose TRIM (Targeted routing in multi-step reasoning tasks), which routes only critical steps–those likely to derail the solution–to larger models while letting smaller models handle routine continuations. Our key insight is that targeted step-level interventions can fundamentally transform inference efficiency by confining expensive calls to precisely those steps where stronger models prevent cascading errors. TRIM operates at the step-level: it uses process reward models to identify erroneous steps and makes routing decisions based on step-level uncertainty and budget constraints. We develop several routing strategies within TRIM, ranging from a simple threshold-based policy to more expressive policies that reason about long-horizon accuracy-cost trade-offs and uncertainty in step-level correctness estimates. On MATH-500, even the simplest thresholding strategy surpasses prior routing methods with 5x higher cost efficiency, while more advanced policies match the strong, expensive model's performance using 80% fewer expensive model tokens. On harder benchmarks such as AIME, TRIM achieves up to 6x higher cost efficiency. All methods generalize effectively across math reasoning tasks, demonstrating that step-level difficulty represents fundamental characteristics of reasoning.
title	TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
topic	Artificial Intelligence Computation and Language Machine Learning
url	https://arxiv.org/abs/2601.10245
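
Illustrative note: the abstract in the contents field describes a simple threshold-based policy in which a process reward model scores each step and low-scoring steps are handed to a larger model. The following is a minimal sketch of that idea, not the authors' implementation. The function name route_steps, the callables small_model, large_model and prm_score, the 0.5 threshold, the step cap and the "ANSWER:" stop marker are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper):
# a step generator maps (question, steps so far) to the next step, and
# a scorer maps (question, steps so far, candidate step) to a value in [0, 1].
StepFn = Callable[[str, List[str]], str]
ScoreFn = Callable[[str, List[str], str], float]


def route_steps(
    question: str,
    small_model: StepFn,
    large_model: StepFn,
    prm_score: ScoreFn,
    threshold: float = 0.5,   # assumed cut-off, for illustration only
    max_steps: int = 8,       # assumed cap on solution length
    stop_marker: str = "ANSWER:",
) -> Tuple[List[str], int]:
    """Build a solution step by step, escalating low-scoring steps."""
    steps: List[str] = []
    escalations = 0
    for _ in range(max_steps):
        # The small model always proposes the next step first.
        candidate = small_model(question, steps)
        # If the scorer rates the step below the threshold, regenerate
        # that single step with the large model instead.
        if prm_score(question, steps, candidate) < threshold:
            candidate = large_model(question, steps)
            escalations += 1
        steps.append(candidate)
        if candidate.startswith(stop_marker):
            break
    return steps, escalations


# Toy stand-ins so the sketch runs without any real models.
def toy_small(question: str, steps: List[str]) -> str:
    return ["step 1", "bad step", "ANSWER: 4"][len(steps)]


def toy_large(question: str, steps: List[str]) -> str:
    return f"fixed step {len(steps) + 1}"


def toy_prm(question: str, steps: List[str], candidate: str) -> float:
    return 0.1 if candidate.startswith("bad") else 0.9


steps, escalations = route_steps("2 + 2 = ?", toy_small, toy_large, toy_prm)
print(steps, escalations)
```

With the toy stand-ins, only the second step falls below the threshold, so the large model is called once and the small model handles the rest. The more expressive policies mentioned in the abstract (long-horizon accuracy-cost trade-offs, uncertainty in correctness estimates) are not covered by this sketch.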
