Bibliographic Details
Main Authors: Li, Rongji; Xu, Jian; Chen, Yi; Chen, Xueqing; Yang, Yisheng; Wang, Jiayi; Chen, Xingyu; Xie, Chunyu; Leng, Dawei; Zhang, Xu-Yao
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2601.08209
author Li, Rongji
Xu, Jian
Chen, Yi
Chen, Xueqing
Yang, Yisheng
Wang, Jiayi
Chen, Xingyu
Xie, Chunyu
Leng, Dawei
Zhang, Xu-Yao
contents In domains such as materials science, biomedicine, and finance, high-stakes deployment of large language models (LLMs) requires injecting private, domain-specific knowledge that is proprietary, fast-evolving, and under-represented in public pretraining data. However, the two dominant paradigms for private knowledge injection each have clear drawbacks: fine-tuning is expensive to iterate, and continual updates can induce catastrophic forgetting and general-capability regression; retrieval-augmented generation (RAG) keeps the base model intact but remains brittle on specialized private corpora because of chunk-induced evidence fragmentation, retrieval mismatch, and long-context pressure. Inspired by how multimodal LLMs align heterogeneous modalities into a shared semantic space, we propose Generation-Augmented Generation (GAG), which treats private expertise as an auxiliary modality and injects it into a frozen base model through a compact, constant-budget latent interface. Concretely, GAG distills question-conditioned specialist knowledge from lightweight domain experts into multi-slot latent memories, integrates multi-layer expert signals via per-slot cross-layer fusion, and aligns them to the frozen base model through gated residual projection, while supporting scalable mixed-domain deployment with reliable selective activation. In a unified mixed-domain evaluation spanning two scientific private-domain QA benchmarks (catalytic materials and immunology adjuvant) together with general-domain queries, GAG consistently outperforms strong retrieval-based and parameter-efficient fine-tuning baselines on specialist QA, while preserving general-domain capability, achieving highly reliable routing, and offering a favorable efficiency-effectiveness trade-off. Code and datasets are provided in the supplementary material, and the code is publicly available at https://github.com/360CVGroup/GAG.
format Preprint
id arxiv_https___arxiv_org_abs_2601_08209
institution arXiv
publishDate 2026
record_format arxiv
title Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models
topic Computation and Language
url https://arxiv.org/abs/2601.08209