Saved in:
Bibliographic Details
Main Authors: Narimissa, Esmaeil, Raithel, David
Format: Preprint
Published: 2024
Subjects:
Online Access:https://arxiv.org/abs/2409.08479
Tags: Add Tag
No Tags, Be the first to tag this record!
_version_ 1866910613523922944
author Narimissa, Esmaeil
Raithel, David
author_facet Narimissa, Esmaeil
Raithel, David
contents The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple document-splitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance. This approach establishes a refined standard for evaluating the precision of RAG systems, with future research focusing on optimizing chunk and overlap sizes to improve retrieval accuracy and efficiency.
format Preprint
id arxiv_https___arxiv_org_abs_2409_08479
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods
Narimissa, Esmaeil
Raithel, David
Information Retrieval
Artificial Intelligence
I.2.7; H.3.3
The performance of Retrieval-Augmented Generation (RAG) systems in information retrieval is significantly influenced by the characteristics of the documents being processed. In this study, the structured nature of textbooks, the conciseness of articles, and the narrative complexity of novels are shown to require distinct retrieval strategies. A comparative evaluation of multiple document-splitting methods reveals that the Recursive Character Splitter outperforms the Token-based Splitter in preserving contextual integrity. A novel evaluation technique is introduced, utilizing an open-source model to generate a comprehensive dataset of question-and-answer pairs, simulating realistic retrieval scenarios to enhance testing efficiency and metric reliability. The evaluation employs weighted scoring metrics, including SequenceMatcher, BLEU, METEOR, and BERT Score, to assess the system's accuracy and relevance. This approach establishes a refined standard for evaluating the precision of RAG systems, with future research focusing on optimizing chunk and overlap sizes to improve retrieval accuracy and efficiency.
title Exploring Information Retrieval Landscapes: An Investigation of a Novel Evaluation Techniques and Comparative Document Splitting Methods
topic Information Retrieval
Artificial Intelligence
I.2.7; H.3.3
url https://arxiv.org/abs/2409.08479