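Staff View: LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition :: Library Catalog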

Bibliographic Details
Main Authors:	Chen, Yanyu; Jiang, Jiyue; Yu, Dianzhi; Wu, Zheng; Liu, Jiahong; Han, Jiaming; Guo, Xiao; Qi, Jinhu; Li, Yu; Zhang, Yifei; King, Irwin
Format:	Preprint
Published:	2026
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2605.24005

author	Chen, Yanyu; Jiang, Jiyue; Yu, Dianzhi; Wu, Zheng; Liu, Jiahong; Han, Jiaming; Guo, Xiao; Qi, Jinhu; Li, Yu; Zhang, Yifei; King, Irwin
contents	The evolution of Large Language Model (LLM) reasoning is bottlenecked by the scarcity of high-quality process data. While self-alignment via endogenous rewards offers a solution, mining valid supervision faces three challenges: (1) Label Noise via Mimetic Bias, where rewards prioritize statistical likelihood over logical truth, creating a "correctness illusion" that masks compounding errors; (2) Coarse-Grained Supervision, where sparse global outcomes (e.g., in GRPO) fail to provide granular guidance, treating reasoning chains as monolithic; and (3) Distributional Collapse, where signals fail to generalize without amplifying pre-training biases. To address these, we introduce LC-ERD (Logic-Consistent Endogenous Reward Decomposition), a framework that frames self-alignment as latent structure mining. We derive a Variational Logic Potential by aggregating consensus from the model's Latent Logic Expertise (LLE) to denoise the reasoning manifold, and introduce a Multi-Agent Value Decomposition protocol based on the Individual-Global-Max (IGM) principle to quantify individual step utility. Experiments show LC-ERD delivers a robust self-evolution path, uncovering trade-offs between logic consistency and accuracy while identifying high-value reasoning patterns missed by standard rewards. Our code is available at https://github.com/LC-ERD-repo/LC-ERD.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_24005
institution	arXiv
publishDate	2026
record_format	arxiv
title	LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition
topic	Artificial Intelligence; Computation and Language
url	https://arxiv.org/abs/2605.24005
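
The abstract contrasts outcome-level rewards (as in GRPO), where one scalar is credited to an entire reasoning chain, with step-level credit obtained through value decomposition. The sketch below illustrates that contrast under stated assumptions; it is not the authors' method. Consensus is approximated by majority-vote agreement within a sampled group (a self-consistency proxy, not the paper's Variational Logic Potential), and step credit uses a VDN-style additive decomposition, one standard way to satisfy the Individual-Global-Max (IGM) condition. All function names are hypothetical.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev


def consensus_rewards(answers):
    """Hypothetical endogenous reward: fraction of the sampled group that
    agrees with each final answer (majority-vote proxy, not LC-ERD's potential)."""
    counts = Counter(answers)
    n = len(answers)
    return [counts[a] / n for a in answers]


def grpo_advantages(rewards, eps=1e-8):
    """Group-relative outcome advantage: (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_steps(advantage, num_steps):
    """Outcome-level supervision: every step in the chain gets identical credit."""
    return [advantage] * num_steps


def additive_joint_value(step_utils, choice):
    """Hypothetical VDN-style decomposition: joint value is the sum of the
    per-step utilities selected by `choice` (one option index per step)."""
    return sum(utils[c] for utils, c in zip(step_utils, choice))


def satisfies_igm(step_utils):
    """Check IGM for the additive decomposition: choosing the best option at
    each step independently reaches the joint maximum found by brute force."""
    greedy = tuple(max(range(len(u)), key=u.__getitem__) for u in step_utils)
    best = max(
        product(*(range(len(u)) for u in step_utils)),
        key=lambda c: additive_joint_value(step_utils, c),
    )
    return additive_joint_value(step_utils, greedy) == additive_joint_value(step_utils, best)
```

For example, with sampled answers ["42", "42", "41", "42"], consensus_rewards gives [0.75, 0.75, 0.25, 0.75]; grpo_advantages then assigns each chain a single score, which broadcast_to_steps copies to every step regardless of which step was responsible. An additive per-step decomposition instead lets each step carry its own utility, and because the sum is maximized by maximizing each term, per-step greedy choices agree with the joint optimum, which is what IGM requires.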
