Bibliographic Details
Main Authors: Chen, Yanyu; Jiang, Jiyue; Yu, Dianzhi; Wu, Zheng; Liu, Jiahong; Han, Jiaming; Guo, Xiao; Qi, Jinhu; Li, Yu; Zhang, Yifei; King, Irwin
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence; Computation and Language
Online Access: https://arxiv.org/abs/2605.24005
author Chen, Yanyu
Jiang, Jiyue
Yu, Dianzhi
Wu, Zheng
Liu, Jiahong
Han, Jiaming
Guo, Xiao
Qi, Jinhu
Li, Yu
Zhang, Yifei
King, Irwin
contents The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, mining valid supervision faces three challenges: (1) Label Noise via Mimetic Bias, where rewards prioritize statistical likelihood over logical truth, creating a "correctness illusion" that masks compounding errors; (2) Coarse-Grained Supervision, where sparse global outcomes (e.g., in GRPO) fail to provide granular guidance, treating reasoning chains as monolithic; and (3) Distributional Collapse, where signals fail to generalize without amplifying pre-training biases. To address these, we introduce LC-ERD (Logic-Consistent Endogenous Reward Decomposition), a framework framing self-alignment as latent structure mining. We derive a Variational Logic Potential by aggregating consensus from the model's Latent Logic Expertise (LLE) to denoise the reasoning manifold, and introduce a Multi-Agent Value Decomposition protocol based on the IGM principle to quantify individual step utility. Experiments show LC-ERD delivers a robust self-evolution path, uncovering trade-offs between logic consistency and accuracy while identifying high-value reasoning patterns missed by standard rewards. Our code is available at https://github.com/LC-ERD-repo/LC-ERD.
format Preprint
id arxiv_https___arxiv_org_abs_2605_24005
institution arXiv
publishDate 2026
record_format arxiv
title LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition
topic Artificial Intelligence
Computation and Language
url https://arxiv.org/abs/2605.24005