| Main Authors: | , , , , , , , , , , , , |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2406.13372 |
Table of Contents:
- Recent advances in retrieval-augmented generation (RAG) have substantially improved question-answering systems, particularly for factoid '5Ws' questions. However, significant challenges remain for '1H' questions, specifically how-to questions, which are integral to decision-making and require dynamic, step-by-step responses. The key limitation lies in the prevalent data organization paradigm, chunk, which commonly divides documents into fixed-size segments and thereby disrupts the logical coherence and connections within the context. To address this, we propose Thread, a novel data organization paradigm that enables systems to handle how-to questions more effectively. Specifically, we introduce a new knowledge granularity, the 'logic unit' (LU), in which large language models transform documents into more structured, loosely interconnected LUs. Extensive experiments in both open-domain and industrial settings show that Thread significantly outperforms existing paradigms, improving the success rate on how-to questions by 21% to 33%. Thread also adapts well to diverse document formats, reducing the amount of retrieved information by up to 75% compared to chunk, and generalizes better than other paradigms to '5Ws' questions, such as multi-hop questions.
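
The limitation the abstract attributes to chunk can be illustrated with a minimal sketch. It assumes the simplest form of fixed-size chunking (consecutive character windows); the function name `fixed_size_chunks`, the sample text, and the window size of 40 are hypothetical and not taken from the paper. It does not attempt to reproduce Thread or the logic-unit format, which the abstract does not specify.

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive segments of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


# Hypothetical two-step how-to document (illustrative only).
doc = "Step 1: open the settings panel. Step 2: enable sync, then restart the app."

# A 40-character window cuts the text without regard to step boundaries.
chunks = fixed_size_chunks(doc, 40)

for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
```

Here the boundary falls right after the label "Step 2:", so the label and its instruction end up in different chunks. A retriever that returns only one of the two chunks would see an incomplete step.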