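Staff View: Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework :: Library Catalog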

Bibliographic Details
Main Authors:	Narita, Kenichirou; Peng, Siqi; Fukui, Taku; Yamada, Moyuru; Munakata, Satoshi; Takahashi, Satoru
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.02640

_version_	1866917382592659456
author	Narita, Kenichirou Peng, Siqi Fukui, Taku Yamada, Moyuru Munakata, Satoshi Takahashi, Satoru
author_facet	Narita, Kenichirou Peng, Siqi Fukui, Taku Yamada, Moyuru Munakata, Satoshi Takahashi, Satoru
contents	Performance evaluation of Retrieval-Augmented Generation (RAG) systems within enterprise environments is governed by multi-dimensional and composite factors extending far beyond simple final accuracy checks. These factors include reasoning complexity, retrieval difficulty, the diverse structure of documents, and stringent requirements for operational explainability. Existing academic benchmarks fail to systematically diagnose these interlocking challenges, resulting in a critical gap where models achieving high performance scores fail to meet the expected reliability in practical deployment. To bridge this discrepancy, this research proposes a multi-dimensional diagnostic framework by defining a four-axis difficulty taxonomy and integrating it into an enterprise RAG benchmark to diagnose potential system weaknesses.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_02640
institution	arXiv
publishDate	2026
record_format	arxiv
title	Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework
topic	Computation and Language
url	https://arxiv.org/abs/2604.02640

Similar Items