Bibliographic Details
Main Authors: Lin, Zhiming; Zhao, Kai; Zhang, Sophie; Yu, Peilai; Xiao, Canran
Format: Preprint
Published: 2025
Subjects: Computation and Language
Online Access: https://arxiv.org/abs/2512.23971
author Lin, Zhiming
Zhao, Kai
Zhang, Sophie
Yu, Peilai
Xiao, Canran
contents Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly annotations. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes errorful inputs from clean text, computes cluster-consensus rewards via semantic similarity and candidate agreement, and optimizes the policy with PPO. It outperforms supervised baselines by 10-13 F1 points and strong LLM fine-tunes by 5-8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking LLM potential in noisy text pipelines.
format Preprint
id arxiv_https___arxiv_org_abs_2512_23971
institution arXiv
publishDate 2025
record_format arxiv
title CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards
topic Computation and Language
url https://arxiv.org/abs/2512.23971
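note The abstract names two label-free steps: synthesizing errorful inputs from clean text, and scoring candidate corrections by similarity and agreement. The sketch below is a rough illustration of that idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the tiny CONFUSION table, the function names corrupt and consensus_rewards, the use of character-level difflib similarity as a stand-in for semantic similarity, and the 0.5 mixing weight. The PPO update is not shown.

```python
import random
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical confusion set (assumption): each key maps to characters
# that are plausibly mistaken for it. A real system would use a much
# larger phonetic/visual confusion resource.
CONFUSION = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}


def corrupt(text, rate, rng):
    """Synthesize an errorful input by swapping confusable characters.

    Each character found in CONFUSION is replaced with probability `rate`.
    """
    out = []
    for ch in text:
        if ch in CONFUSION and rng.random() < rate:
            out.append(rng.choice(CONFUSION[ch]))
        else:
            out.append(ch)
    return "".join(out)


def similarity(a, b):
    # Character-level ratio; a stand-in (assumption) for the semantic
    # similarity mentioned in the abstract.
    return SequenceMatcher(None, a, b).ratio()


def consensus_rewards(candidates, weight=0.5):
    """Score each candidate without any reference label.

    The reward mixes (a) mean similarity to the other candidates and
    (b) the fraction of candidates that are identical to this one.
    """
    n = len(candidates)
    counts = Counter(candidates)
    rewards = []
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        if others:
            mean_sim = sum(similarity(cand, o) for o in others) / len(others)
        else:
            mean_sim = 0.0
        agreement = counts[cand] / n
        rewards.append(weight * mean_sim + (1 - weight) * agreement)
    return rewards


if __name__ == "__main__":
    rng = random.Random(0)
    clean = "我在家做饭"
    noisy = corrupt(clean, 1.0, rng)
    # Pretend these are corrections sampled from a policy for `noisy`.
    sampled = [clean, clean, noisy]
    for cand, reward in zip(sampled, consensus_rewards(sampled)):
        print(cand, round(reward, 3))
```

In this toy setup the candidate that most samples agree on receives the higher reward, which is the signal a policy-gradient method such as PPO would then optimize.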