MARC21: :: Library Catalog

Salvato in:

Dettagli Bibliografici
Autori principali:	Zhao, Ziwen, Yang, Menglin
Natura:	Preprint
Pubblicazione:	2026
Soggetti:	Machine Learning Artificial Intelligence Information Retrieval
Accesso online:	https://arxiv.org/abs/2605.00529
Tags:	Aggiungi Tag Nessun Tag, puoi essere il primo ad aggiungerne!!

_version_	1866909007856271360
author	Zhao, Ziwen Yang, Menglin
author_facet	Zhao, Ziwen Yang, Menglin
contents	Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $Ψ$-RAG, a tree-RAG framework with two key components. First, a hierarchical abstract tree index built through an iterative "merging and collapse" process that adapts to data distributions without a priori assumption. Second, a multi-granular retrieval agent that intelligently interacts with the knowledge base with reorganized queries and an agent-powered hybrid retriever. $Ψ$-RAG supports diverse tasks from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_00529
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation Zhao, Ziwen Yang, Menglin Machine Learning Artificial Intelligence Information Retrieval Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $Ψ$-RAG, a tree-RAG framework with two key components. First, a hierarchical abstract tree index built through an iterative "merging and collapse" process that adapts to data distributions without a priori assumption. Second, a multi-granular retrieval agent that intelligently interacts with the knowledge base with reorganized queries and an agent-powered hybrid retriever. $Ψ$-RAG supports diverse tasks from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
title	Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation
topic	Machine Learning Artificial Intelligence Information Retrieval
url	https://arxiv.org/abs/2605.00529

Documenti analoghi