Staff View: CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards :: Library Catalog

Bibliographic Details
Main Authors:	Lin, Zhiming; Zhao, Kai; Zhang, Sophie; Yu, Peilai; Xiao, Canran
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2512.23971

_version_	1866908738536865792
author	Lin, Zhiming; Zhao, Kai; Zhang, Sophie; Yu, Peilai; Xiao, Canran
author_facet	Lin, Zhiming; Zhao, Kai; Zhang, Sophie; Yu, Peilai; Xiao, Canran
contents	Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly annotations. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes errorful inputs from clean text, computes cluster-consensus rewards via semantic similarity and candidate agreement, and optimizes the policy with PPO. It outperforms supervised baselines by 10-13 F1 points and strong LLM fine-tunes by 5-8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking LLM potential in noisy text pipelines.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_23971
institution	arXiv
publishDate	2025
record_format	arxiv
title	CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards
topic	Computation and Language
url	https://arxiv.org/abs/2512.23971
