Bibliographic Details
Main Authors: Xia, Yuan; Zhou, Jingbo; Shi, Zhenhui; Chen, Jun; Huang, Haifeng
Format: Preprint
Published: 2024
Subjects: Computation and Language; Artificial Intelligence
Online Access: https://arxiv.org/abs/2407.19813
author Xia, Yuan
Zhou, Jingbo
Shi, Zhenhui
Chen, Jun
Huang, Haifeng
contents The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advances, challenges persist in implementing RALMs, particularly concerning their reliability and traceability. Specifically, retrieving irrelevant documents may lead to unhelpful responses or even degrade the performance of LLMs, while the lack of proper citations in generated outputs makes it difficult to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework constructs self-reasoning trajectories through three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We evaluate our framework on four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset). The results show that our method outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.
format Preprint
id arxiv_https___arxiv_org_abs_2407_19813
institution arXiv
publishDate 2024
record_format arxiv
title Improving Retrieval Augmented Language Model with Self-Reasoning
topic Computation and Language
Artificial Intelligence
url https://arxiv.org/abs/2407.19813