Staff View: WixQA: A Multi-Dataset Benchmark for Enterprise Retrieval-Augmented Generation

Bibliographic Details
Main Authors:	Cohen, Dvir; Burg, Lin; Pykhnivskyi, Sviatoslav; Gur, Hagit; Kovynov, Stanislav; Atzmon, Olga; Barkan, Gilad
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2505.08643

_version_	1866915286455681024
author	Cohen, Dvir; Burg, Lin; Pykhnivskyi, Sviatoslav; Gur, Hagit; Kovynov, Stanislav; Atzmon, Olga; Barkan, Gilad
author_facet	Cohen, Dvir; Burg, Lin; Pykhnivskyi, Sviatoslav; Gur, Hagit; Kovynov, Stanislav; Atzmon, Olga; Barkan, Gilad
contents	Retrieval-Augmented Generation (RAG) is a cornerstone of modern question answering (QA) systems, enabling grounded answers based on external knowledge. Although recent progress has been driven by open-domain datasets, enterprise QA systems need datasets that mirror the concrete, domain-specific issues users raise in day-to-day support scenarios. Critically, evaluating end-to-end RAG systems requires benchmarks comprising not only question-answer pairs but also the specific knowledge base (KB) snapshot from which answers were derived. To address this need, we introduce WixQA, a benchmark suite featuring QA datasets precisely grounded in the released KB corpus, enabling holistic evaluation of retrieval and generation components. WixQA includes three distinct QA datasets derived from Wix.com customer support interactions and grounded in a snapshot of the public Wix Help Center KB: (i) WixQA-ExpertWritten, 200 real user queries with expert-authored, multi-step answers; (ii) WixQA-Simulated, 200 expert-validated QA pairs distilled from user dialogues; and (iii) WixQA-Synthetic, 6,222 LLM-generated QA pairs, with one pair systematically derived from each article in the knowledge base. We release the KB snapshot alongside the datasets under the MIT license and provide comprehensive baseline results, forming a unique benchmark for evaluating enterprise RAG systems in realistic enterprise environments.
format	Preprint
id	arxiv_https___arxiv_org_abs_2505_08643
institution	arXiv
publishDate	2025
record_format	arxiv
title	WixQA: A Multi-Dataset Benchmark for Enterprise Retrieval-Augmented Generation
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2505.08643
