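Staff View: Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models :: Library Catalog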

Bibliographic Details
Main Authors:	Li, Rongji; Xu, Jian; Chen, Yi; Chen, Xueqing; Yang, Yisheng; Wang, Jiayi; Chen, Xingyu; Xie, Chunyu; Leng, Dawei; Zhang, Xu-Yao
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2601.08209

_version_	1866913028546494464
author	Li, Rongji; Xu, Jian; Chen, Yi; Chen, Xueqing; Yang, Yisheng; Wang, Jiayi; Chen, Xingyu; Xie, Chunyu; Leng, Dawei; Zhang, Xu-Yao
author_facet	Li, Rongji; Xu, Jian; Chen, Yi; Chen, Xueqing; Yang, Yisheng; Wang, Jiayi; Chen, Xingyu; Xie, Chunyu; Leng, Dawei; Zhang, Xu-Yao
contents	In domains such as materials science, biomedicine, and finance, high-stakes deployment of large language models (LLMs) requires injecting private, domain-specific knowledge that is proprietary, fast-evolving, and under-represented in public pretraining. However, the two dominant paradigms for private knowledge injection each have clear drawbacks: fine-tuning is expensive to iterate, and continual updates can induce catastrophic forgetting and general-capability regression; retrieval-augmented generation (RAG) keeps the base model intact but remains brittle on specialized private corpora because of chunk-induced evidence fragmentation, retrieval mismatch, and long-context pressure. Inspired by how multimodal LLMs align heterogeneous modalities into a shared semantic space, we propose Generation-Augmented Generation (GAG), which treats private expertise as an auxiliary modality and injects it into a frozen base model through a compact, constant-budget latent interface. Concretely, GAG distills question-conditioned specialist knowledge from lightweight domain experts into multi-slot latent memories, integrates multi-layer expert signals via per-slot cross-layer fusion, and aligns them to the frozen base model through gated residual projection, while supporting scalable mixed-domain deployment with reliable selective activation. In a unified mixed-domain evaluation spanning two scientific private-domain QA benchmarks (catalytic materials and immunology adjuvant) together with general-domain queries, GAG consistently outperforms strong retrieval-based and parameter-efficient fine-tuning baselines on specialist QA, while preserving general-domain capability, achieving highly reliable routing, and offering a favorable efficiency-effectiveness trade-off. Code and datasets are provided in the supplementary material, and the code is publicly available at https://github.com/360CVGroup/GAG.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_08209
institution	arXiv
publishDate	2026
record_format	arxiv
title	Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models
topic	Computation and Language
url	https://arxiv.org/abs/2601.08209
