Staff View: FineRef: Fine-Grained Error Reflection and Correction for Long-Form Generation with Citations :: Library Catalog

Bibliographic Details
Main Authors:	Peng, Yixing; Zhang, Licheng; Fang, Shancheng; Liu, Yi; Gu, Peijian; Wang, Quan
Format:	Preprint
Published:	2025
Subjects:	Information Retrieval; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.18437

_version_	1866918349465714688
author	Peng, Yixing; Zhang, Licheng; Fang, Shancheng; Liu, Yi; Gu, Peijian; Wang, Quan
author_facet	Peng, Yixing; Zhang, Licheng; Fang, Shancheng; Liu, Yi; Gu, Peijian; Wang, Quan
contents	Generating with citations is crucial for trustworthy Large Language Models (LLMs), yet even advanced LLMs often produce mismatched or irrelevant citations. Existing methods over-optimize citation fidelity while overlooking relevance to the user query, which degrades answer quality and robustness in real-world settings with noisy or irrelevant retrieved content. Moreover, the prevailing single-pass paradigm struggles to deliver optimal answers in long-form generation that requires multiple citations. To address these limitations, we propose FineRef, a framework based on Fine-grained error Reflection, which explicitly teaches the model to self-identify and correct two key citation errors, mismatch and irrelevance, on a per-citation basis. FineRef follows a two-stage training strategy. The first stage instills an "attempt-reflect-correct" behavioral pattern via supervised fine-tuning, using fine-grained and controllable reflection data constructed by specialized lightweight models. An online self-reflective bootstrapping strategy is designed to improve generalization by iteratively enriching training data with verified, self-improving examples. To further enhance the self-reflection and correction capability, the second stage applies process-level reinforcement learning with a multi-dimensional reward scheme that promotes reflection accuracy, answer quality, and correction gain. Experiments on the ALCE benchmark demonstrate that FineRef significantly improves both citation performance and answer accuracy. Our 7B model outperforms GPT-4 by up to 18% in Citation F1 and 4% in EM Recall, while also surpassing the state-of-the-art model across key evaluation metrics. FineRef also exhibits strong generalization and robustness in domain transfer settings and noisy retrieval scenarios.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_18437
institution	arXiv
publishDate	2025
record_format	arxiv
title	FineRef: Fine-Grained Error Reflection and Correction for Long-Form Generation with Citations
topic	Information Retrieval; Artificial Intelligence
url	https://arxiv.org/abs/2602.18437
