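Staff View: LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information :: Library Catalog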

Bibliographic Details
Main Authors:	Ping, Bowen; Zeng, Jiali; Meng, Fandong; Wang, Shuo; Zhou, Jie; Zhang, Shanghang
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2502.02095

_version_	1866908371704086528
author	Ping, Bowen; Zeng, Jiali; Meng, Fandong; Wang, Shuo; Zhou, Jie; Zhang, Shanghang
author_facet	Ping, Bowen; Zeng, Jiali; Meng, Fandong; Wang, Shuo; Zhou, Jie; Zhang, Shanghang
contents	Long-form generation is crucial for writing academic papers and for repo-level code generation. Despite this, current models, including GPT-4o, still exhibit unsatisfactory performance. Existing methods that utilize preference learning with outcome supervision often fail to provide detailed feedback for extended contexts. This shortcoming can lead to content that does not fully satisfy query requirements, resulting in issues such as length deviations and diminished quality. In this paper, we propose enhancing long-form generation by incorporating process supervision. We employ Monte Carlo Tree Search to gather stepwise preference pairs, utilizing a global memory pool to maintain consistency. To address the issue of suboptimal candidate selection, we integrate external critiques to refine and improve the quality of the preference pairs. Finally, we apply step-level DPO using the collected stepwise preference pairs. Experimental results show that our method improves length and quality on long-form generation benchmarks, with almost lossless performance on general benchmarks across various model backbones.
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_02095
institution	arXiv
publishDate	2025
record_format	arxiv
title	LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information
topic	Computation and Language
url	https://arxiv.org/abs/2502.02095
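
Illustrative note on the abstract: the final step it describes, step-level DPO over stepwise preference pairs, can be sketched as below. This is a minimal sketch, not the paper's implementation. It assumes the standard DPO objective applied to a single generation step, where the preferred and rejected steps share the same prefix (query plus earlier steps). The names `StepPair`, `step_dpo_loss`, and `mean_step_dpo_loss`, and the default `beta=0.1`, are hypothetical; the record does not give the exact objective or hyperparameters.

```python
import math
from dataclasses import dataclass


@dataclass
class StepPair:
    """Summed token log-probabilities for one preferred/rejected step pair.

    Both steps are assumed to continue the same prefix (hypothetical layout).
    """
    policy_chosen: float    # log pi_theta(chosen step | prefix)
    policy_rejected: float  # log pi_theta(rejected step | prefix)
    ref_chosen: float       # log pi_ref(chosen step | prefix)
    ref_rejected: float     # log pi_ref(rejected step | prefix)


def step_dpo_loss(pair: StepPair, beta: float = 0.1) -> float:
    """Standard DPO loss, -log(sigmoid(beta * margin)), for a single step pair."""
    margin = (pair.policy_chosen - pair.ref_chosen) - (
        pair.policy_rejected - pair.ref_rejected
    )
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def mean_step_dpo_loss(pairs: list[StepPair], beta: float = 0.1) -> float:
    """Average the per-step loss over a non-empty batch of step pairs."""
    return sum(step_dpo_loss(p, beta) for p in pairs) / len(pairs)
```

When the policy and reference model agree on both steps, the margin is zero and the loss is log 2; the loss falls below that as the policy shifts probability toward the preferred step relative to the reference.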

Similar Items