| Main Authors: | Lin, Zhiming; Zhao, Kai; Zhang, Sophie; Yu, Peilai; Xiao, Canran |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Computation and Language |
| Online Access: | https://arxiv.org/abs/2512.23971 |
| _version_ | 1866908738536865792 |
|---|---|
| author | Lin, Zhiming; Zhao, Kai; Zhang, Sophie; Yu, Peilai; Xiao, Canran |
| contents | Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly annotations. We introduce CEC-Zero, a zero-supervision reinforcement learning framework that addresses this by enabling LLMs to correct their own mistakes. CEC-Zero synthesizes errorful inputs from clean text, computes cluster-consensus rewards via semantic similarity and candidate agreement, and optimizes the policy with PPO. It outperforms supervised baselines by 10 to 13 $F_1$ points and strong LLM fine-tunes by 5 to 8 points across 9 benchmarks, with theoretical guarantees of unbiased rewards and convergence. CEC-Zero establishes a label-free paradigm for robust, scalable CSC, unlocking LLM potential in noisy text pipelines. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2512_23971 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards |
| topic | Computation and Language |
| url | https://arxiv.org/abs/2512.23971 |
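
The abstract describes two label-free steps: synthesizing errorful inputs from clean text, and rewarding candidates by how well they agree with one another. The toy sketch below illustrates that general idea only and is not the paper's implementation. The confusion set, the function names, the 0.9 threshold, and the use of character-level `difflib` similarity in place of a semantic similarity model are all assumptions made for illustration. The PPO update that would consume these rewards is omitted.

```python
import random
from difflib import SequenceMatcher

# Hypothetical toy confusion set (characters that are easily confused).
# A real system would use a much larger, curated resource.
CONFUSION = {"的": ["地", "得"], "在": ["再"], "做": ["作"]}


def corrupt(clean: str, rate: float, rng: random.Random) -> str:
    """Swap each confusable character for a look-alike with probability `rate`."""
    out = []
    for ch in clean:
        if ch in CONFUSION and rng.random() < rate:
            out.append(rng.choice(CONFUSION[ch]))
        else:
            out.append(ch)
    return "".join(out)


def similarity(a: str, b: str) -> float:
    """Character-level stand-in (assumption) for a semantic similarity score."""
    return SequenceMatcher(None, a, b).ratio()


def consensus_rewards(candidates: list[str], threshold: float = 0.9) -> list[float]:
    """Reward each candidate by the fraction of candidates (itself included)
    whose similarity to it reaches `threshold`."""
    n = len(candidates)
    return [
        sum(similarity(c, other) >= threshold for other in candidates) / n
        for c in candidates
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    noisy = corrupt("我在家做饭", rate=1.0, rng=rng)
    # Candidates would normally be sampled from the policy given `noisy`;
    # here they are hard-coded for illustration.
    candidates = ["我在家做饭", "我在家做饭", "我再家做饭"]
    print(noisy, consensus_rewards(candidates))
```

In this sketch, a candidate that most other samples agree with earns a higher reward than an outlier, so no gold label is needed to rank the outputs.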