Staff View: AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop Feedback :: Library Catalog

Bibliographic Details
Main Authors:	Gao, Zhitao; Ma, Jie; Li, Xuhong; Li, Pengyu; Qu, Ning; Wu, Yaqiang; Liu, Hui; Liu, Jun
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.03084

_version_	1866911420465020928
author	Gao, Zhitao; Ma, Jie; Li, Xuhong; Li, Pengyu; Qu, Ning; Wu, Yaqiang; Liu, Hui; Liu, Jun
author_facet	Gao, Zhitao; Ma, Jie; Li, Xuhong; Li, Pengyu; Qu, Ning; Wu, Yaqiang; Liu, Hui; Liu, Jun
contents	Large Language Models (LLMs) have achieved significant success in complex reasoning but remain bottlenecked by reliance on expert-annotated data and external verifiers. While existing self-evolution paradigms aim to bypass these constraints, they often fail to identify the optimal learning zone and risk reinforcing collective hallucinations and incorrect priors through flawed internal feedback. To address these challenges, we propose Autonomous Evolutionary Reasoning Optimization (AERO), an unsupervised framework that achieves autonomous reasoning evolution by internalizing self-questioning, answering, and criticism within a synergistic dual-loop system. Inspired by the Zone of Proximal Development (ZPD) theory, AERO utilizes entropy-based positioning to target the "solvability gap" and employs Independent Counterfactual Correction for robust verification. Furthermore, we introduce a Staggered Training Strategy to synchronize capability growth across functional roles and prevent curriculum collapse. Extensive evaluations across nine benchmarks spanning three domains demonstrate that AERO achieves average performance improvements of 4.57% on Qwen3-4B-Base and 5.10% on Qwen3-8B-Base, outperforming competitive baselines. Code is available at https://github.com/mira-ai-lab/AERO.
format	Preprint
id	arxiv_https___arxiv_org_abs_2602_03084
institution	arXiv
publishDate	2026
record_format	arxiv
title	AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop Feedback
topic	Computation and Language
url	https://arxiv.org/abs/2602.03084

Similar Items