Bibliographic Details
Main Author: Adeola, Maximus
Format: Digital resource
Language:
Published: Zenodo 2026
Online Access: https://doi.org/10.5281/zenodo.18502053
_version_ 1866901345528709120
author Adeola, Maximus
author_facet Adeola, Maximus
contents <p>Conversational systems built on Large Language Models (LLMs) face an escalating challenge: as dialogue history grows, context windows expand exponentially, drastically increasing inference costs with each new message. Compounding this issue is the quadratic complexity of self-attention mechanisms (O(N<sup>2</sup>)), which limits the practical context capacity of even state-of-the-art models. I present CRAiG (Contextual Retrieval Augmented Generation), a novel architecture in which a lightweight External Attention Mechanism (EAM), a 43-million-parameter model, is trained to operate atop any generative LLM, intelligently curating the most relevant context for each prompt. By decoupling context selection from generation, CRAiG enables models to handle large conversational histories (up to 3.6 million tokens) while processing only a constant, manageable subset of information at inference time. Through a three-stage training process incorporating teacher-supervised learning, Semantic Phase Shift Augmentation (SPSA), and Natural Language Inference (NLI) optimization, CRAiG achieved 68.53% accuracy on LongBench v2, surpassing state-of-the-art commercial models including Gemini 3 Pro (65.6%) and Claude Sonnet 4.5 (61.8%), while reducing token consumption by up to 93%. My approach demonstrates exceptional performance on domain-specific tasks, reaching 79.59% accuracy on code repository understanding and 75.31% on long in-context learning. The entire research project, from data collection to final training, cost under $19 USD, demonstrating the cost-effectiveness and accessibility of this method.</p>
format Digital resource
id zenodo_https___doi_org_10_5281_zenodo_18502053
institution Zenodo
language
publishDate 2026
publisher Zenodo
record_format zenodo
spellingShingle CRAiG: Contextual Retrieval Augmented Generation
Adeola, Maximus
title CRAiG: Contextual Retrieval Augmented Generation
url https://doi.org/10.5281/zenodo.18502053