Bibliographic Details
Main Authors: Xie, Wenxuan, Wang, Yujia, Tan, Xin, Lu, Chaochao, Hu, Xia, Wang, Xuhong
Format: Preprint
Published: 2026
Subjects: Computation and Language, Artificial Intelligence
Online Access: https://arxiv.org/abs/2602.10021
_version_ 1866914320153051136
author Xie, Wenxuan
Wang, Yujia
Tan, Xin
Lu, Chaochao
Hu, Xia
Wang, Xuhong
contents The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy. Extensive experiments show that DRIFT significantly improves performance on long-context tasks, outperforming strong baselines among comparably sized models. Our approach provides a scalable and efficient paradigm for extending the effective context window and reasoning capabilities of LLMs. Our code is available at https://github.com/Lancelot-Xie/DRIFT.
format Preprint
id arxiv_https___arxiv_org_abs_2602_10021
institution arXiv
publishDate 2026
record_format arxiv
title Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2602.10021