Bibliographic Details
Main Authors: Wang, Peihao; Yang, Shan; Wang, Xijun; Xiao, Tesi; Liu, Xin; Yu, Changlong; Lou, Yu; Li, Pan; Wang, Zhangyang; Lin, Ming; Vidal, René
Format: Preprint
Published: 2026
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2603.09221
author Wang, Peihao
Yang, Shan
Wang, Xijun
Xiao, Tesi
Liu, Xin
Yu, Changlong
Lou, Yu
Li, Pan
Wang, Zhangyang
Lin, Ming
Vidal, René
contents Associative memory has long underpinned the design of sequential models. Beyond recall, humans reason by projecting future states and selecting goal-directed actions, a capability that modern language models increasingly require but do not natively encode. While prior work uses reinforcement learning or test-time training, planning remains external to the model architecture. We formulate reasoning as optimal control and introduce the Test-Time Control (TTC) layer, which performs finite-horizon LQR planning over latent states at inference time, represents a value function within neural architectures, and leverages it as the nested objective to enable planning before prediction. To ensure scalability, we derive a hardware-efficient LQR solver based on a symplectic formulation and implement it as a fused CUDA kernel, enabling parallel execution with minimal overhead. Integrated as an adapter into pretrained LLMs, TTC layers improve mathematical reasoning performance by up to +27.8% on MATH-500 and yield 2-3x Pass@8 improvements on AMC and AIME, demonstrating that embedding optimal control as an architectural component provides an effective and scalable mechanism for reasoning beyond test-time training.
format Preprint
id arxiv_https___arxiv_org_abs_2603_09221
institution arXiv
publishDate 2026
record_format arxiv
title Beyond Test-Time Memory: State-Space Optimal Control for LLM Reasoning
topic Machine Learning
url https://arxiv.org/abs/2603.09221