Bibliographic Details
Main Authors: An, Kaikai, Yang, Fangkai, Li, Liqun, Lu, Junting, Cheng, Sitao, Si, Shuzheng, Wang, Lu, Zhao, Pu, Cao, Lele, Lin, Qingwei, Rajmohan, Saravan, Zhang, Dongmei, Chang, Baobao
Format: Preprint
Published: 2024
Online Access: https://arxiv.org/abs/2406.13372
Summary:
  • Recent advances in retrieval-augmented generation (RAG) have substantially improved question-answering systems, particularly for factoid '5Ws' questions. However, significant challenges remain for '1H' questions, specifically how-to questions, which are integral to decision-making and require dynamic, step-by-step responses. The key limitation lies in the prevalent data organization paradigm, the chunk, which typically divides documents into fixed-size segments and thereby disrupts the logical coherence and connections within the context. To address this, we propose Thread, a novel data organization paradigm that enables systems to handle how-to questions more effectively. Specifically, we introduce a new knowledge granularity, the 'logic unit' (LU), in which large language models transform documents into more structured and loosely interconnected LUs. Extensive experiments in both open-domain and industrial settings show that Thread significantly outperforms existing paradigms, improving the success rate on how-to questions by 21% to 33%. Additionally, Thread adapts well to diverse document formats, reducing the amount of retrieved information by up to 75% compared to chunk, and generalizes better to '5Ws' questions, such as multi-hop questions, outperforming other paradigms.