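Staff View: Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference :: Library Catalog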

Bibliographic Details
Main Authors:	Xie, Wenxuan; Wang, Yujia; Tan, Xin; Lu, Chaochao; Hu, Xia; Wang, Xuhong
Format:	Preprint
Published:	2026
Subjects:	Computation and Language; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.10021

_version_	1866914320153051136
author	Xie, Wenxuan; Wang, Yujia; Tan, Xin; Lu, Chaochao; Hu, Xia; Wang, Xuhong
author_facet	Xie, Wenxuan; Wang, Yujia; Tan, Xin; Lu, Chaochao; Hu, Xia; Wang, Xuhong
contents	The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Existing solutions, ranging from non-parametric Retrieval-Augmented Generation (RAG) to parametric knowledge editing, are often constrained in practice by finite context windows, retriever noise, or the risk of catastrophic forgetting. In this paper, we propose DRIFT, a novel dual-model architecture designed to explicitly decouple knowledge extraction from the reasoning process. Unlike static prompt compression, DRIFT employs a lightweight knowledge model to dynamically compress document chunks into implicit fact tokens conditioned on the query. These dense representations are projected into the reasoning model's embedding space, replacing raw, redundant text while maintaining inference accuracy. Extensive experiments show that DRIFT significantly improves performance on long-context tasks, outperforming strong baselines among comparably sized models. Our approach provides a scalable and efficient paradigm for extending the effective context window and reasoning capabilities of LLMs. Our code is available at https://github.com/Lancelot-Xie/DRIFT.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_10021
institution	arXiv
publishDate	2026
record_format	arxiv
title	Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference
topic	Computation and Language; Artificial Intelligence
url	https://arxiv.org/abs/2602.10021

Similar Items