Staff View: Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning :: Library Catalog

Bibliographic Details
Main Authors:	Luo, Hongyin; Morgan, Nathaniel; Li, Tina; Zhao, Derek; Ngo, Ai Vy; Schroeder, Philip; Yang, Lijie; Ben-Kish, Assaf; O'Brien, Jack; Glass, James
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2507.16784

_version_	1866913953835122688
author	Luo, Hongyin; Morgan, Nathaniel; Li, Tina; Zhao, Derek; Ngo, Ai Vy; Schroeder, Philip; Yang, Lijie; Ben-Kish, Assaf; O'Brien, Jack; Glass, James
author_facet	Luo, Hongyin; Morgan, Nathaniel; Li, Tina; Zhao, Derek; Ngo, Ai Vy; Schroeder, Philip; Yang, Lijie; Ben-Kish, Assaf; O'Brien, Jack; Glass, James
contents	To break the context limits of large language models (LLMs) that bottleneck reasoning accuracy and efficiency, we propose the Thread Inference Model (TIM), a family of LLMs trained for recursive and decompositional problem solving, and TIMRUN, an inference runtime enabling long-horizon structured reasoning beyond context limits. Together, TIM hosted on TIMRUN supports virtually unlimited working memory and multi-hop tool calls within a single language model inference, overcoming output limits, positional-embedding constraints, and GPU-memory bottlenecks. Performance is achieved by modeling natural language as reasoning trees measured by both length and depth instead of linear sequences. The reasoning trees consist of tasks with thoughts, recursive subtasks, and conclusions, based on the concept we proposed in Schroeder et al., 2025. During generation, we maintain a working memory that retains only the key-value states of the most relevant context tokens, selected by a rule-based subtask-pruning mechanism, enabling reuse of positional embeddings and GPU memory pages throughout reasoning. Experimental results show that our system sustains high inference throughput, even when manipulating up to 90% of the KV cache in GPU memory. It also delivers accurate reasoning on mathematical tasks and handles information retrieval challenges that require long-horizon reasoning and multi-hop tool use.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_16784
institution	arXiv
publishDate	2025
record_format	arxiv
title	Beyond Context Limits: Subconscious Threads for Long-Horizon Reasoning
topic	Computation and Language
url	https://arxiv.org/abs/2507.16784

Similar Items