MARC view :: Library Catalog

Bibliographic details
Main authors:	Wang, Qi; Cao, Yixuan; Liu, Yifan; Zhao, Jiangtao; Luo, Ping
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval
Online access:	https://arxiv.org/abs/2507.00477

_version_	1866908646548439040
author	Wang, Qi; Cao, Yixuan; Liu, Yifan; Zhao, Jiangtao; Luo, Ping
author_facet	Wang, Qi; Cao, Yixuan; Liu, Yifan; Zhao, Jiangtao; Luo, Ping
contents	A Retrieval-Augmented Generation (RAG)-based question-answering (QA) system enhances a large language model's knowledge by retrieving relevant documents based on user queries. Discrepancies between user queries and document phrasings often necessitate query rewriting. However, in specialized domains, the rewriter model may struggle due to limited domain-specific knowledge. To resolve this, we propose the R&R (Read the doc before Rewriting) rewriter, which involves continual pre-training on professional documents, akin to how students prepare for open-book exams by reviewing textbooks. Additionally, it can be combined with supervised fine-tuning for improved results. Experiments on multiple datasets demonstrate that R&R excels in professional QA across multiple domains, effectively bridging the query-document gap, while maintaining good performance in general scenarios, thus advancing the application of RAG-based QA systems in specialized fields.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_00477
institution	arXiv
publishDate	2025
record_format	arxiv
title	Read the Docs Before Rewriting: Equip Rewriter with Domain Knowledge via Continual Pre-training
topic	Information Retrieval
url	https://arxiv.org/abs/2507.00477
