Bibliographic Details
Main Authors: Narita, Kenichirou; Peng, Siqi; Fukui, Taku; Yamada, Moyuru; Munakata, Satoshi; Takahashi, Satoru
Format: Preprint
Published: 2026
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2604.02640
_version_ 1866917382592659456
author Narita, Kenichirou
Peng, Siqi
Fukui, Taku
Yamada, Moyuru
Munakata, Satoshi
Takahashi, Satoru
contents Performance evaluation of Retrieval-Augmented Generation (RAG) systems within enterprise environments is governed by multi-dimensional and composite factors extending far beyond simple final accuracy checks. These factors include reasoning complexity, retrieval difficulty, the diverse structure of documents, and stringent requirements for operational explainability. Existing academic benchmarks fail to systematically diagnose these interlocking challenges, resulting in a critical gap where models achieving high performance scores fail to meet the expected reliability in practical deployment. To bridge this discrepancy, this research proposes a multi-dimensional diagnostic framework by defining a four-axis difficulty taxonomy and integrating it into an enterprise RAG benchmark to diagnose potential system weaknesses.
format Preprint
id arxiv_https___arxiv_org_abs_2604_02640
institution arXiv
publishDate 2026
record_format arxiv
title Overcoming the "Impracticality" of RAG: Proposing a Real-World Benchmark and Multi-Dimensional Diagnostic Framework
topic Computation and Language
url https://arxiv.org/abs/2604.02640