Saved in:
| Main Authors: | , , , , , , , , , , |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2605.24005 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| _version_ | 1866910275428417536 |
|---|---|
| author | Chen, Yanyu Jiang, Jiyue Yu, Dianzhi Wu, Zheng Liu, Jiahong Han, Jiaming Guo, Xiao Qi, Jinhu Li, Yu Zhang, Yifei King, Irwin |
| author_facet | Chen, Yanyu Jiang, Jiyue Yu, Dianzhi Wu, Zheng Liu, Jiahong Han, Jiaming Guo, Xiao Qi, Jinhu Li, Yu Zhang, Yifei King, Irwin |
| contents | The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, mining valid supervision faces three challenges: (1) Label Noise via Mimetic Bias, where rewards prioritize statistical likelihood over logical truth, creating a "correctness illusion" that masks compounding errors; (2) Coarse-Grained Supervision, where sparse global outcomes (e.g., in GRPO) fail to provide granular guidance, treating reasoning chains as monolithic; and (3) Distributional Collapse, where signals fail to generalize without amplifying pre-training biases. To address these, we introduce LC-ERD (Logic-Consistent Endogenous Reward Decomposition), a framework framing self-alignment as latent structure mining. We derive a Variational Logic Potential by aggregating consensus from the model's Latent Logic Expertise (LLE) to denoise the reasoning manifold, and introduce a Multi-Agent Value Decomposition protocol based on the IGM principle to quantify individual step utility. Experiments show LC-ERD delivers a robust self-evolution path, uncovering trade-offs between logic consistency and accuracy while identifying high-value reasoning patterns missed by standard rewards. Our code is available at https://github.com/LC-ERD-repo/LC-ERD. |
| format | Preprint |
| id |
arxiv_https___arxiv_org_abs_2605_24005 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| spellingShingle | LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition Chen, Yanyu Jiang, Jiyue Yu, Dianzhi Wu, Zheng Liu, Jiahong Han, Jiaming Guo, Xiao Qi, Jinhu Li, Yu Zhang, Yifei King, Irwin Artificial Intelligence Computation and Language The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, mining valid supervision faces three challenges: (1) Label Noise via Mimetic Bias, where rewards prioritize statistical likelihood over logical truth, creating a "correctness illusion" that masks compounding errors; (2) Coarse-Grained Supervision, where sparse global outcomes (e.g., in GRPO) fail to provide granular guidance, treating reasoning chains as monolithic; and (3) Distributional Collapse, where signals fail to generalize without amplifying pre-training biases. To address these, we introduce LC-ERD (Logic-Consistent Endogenous Reward Decomposition), a framework framing self-alignment as latent structure mining. We derive a Variational Logic Potential by aggregating consensus from the model's Latent Logic Expertise (LLE) to denoise the reasoning manifold, and introduce a Multi-Agent Value Decomposition protocol based on the IGM principle to quantify individual step utility. Experiments show LC-ERD delivers a robust self-evolution path, uncovering trade-offs between logic consistency and accuracy while identifying high-value reasoning patterns missed by standard rewards. Our code is available at https://github.com/LC-ERD-repo/LC-ERD. |
| title | LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition |
| topic | Artificial Intelligence Computation and Language |
| url | https://arxiv.org/abs/2605.24005 |