Staff View: ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping :: Library Catalog


Bibliographic Details
Main Authors:	Chen, Shuang; Guo, Yue; Ye, Yimeng; Huang, Shijue; Hu, Wenbo; Li, Haoxi; Zhang, Manyuan; Chen, Jiayu; Guo, Song; Peng, Nanyun
Format:	Preprint
Published:	2025
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2510.08457

_version_	1866912639826788352
author	Chen, Shuang; Guo, Yue; Ye, Yimeng; Huang, Shijue; Hu, Wenbo; Li, Haoxi; Zhang, Manyuan; Chen, Jiayu; Guo, Song; Peng, Nanyun
contents	Recent advances in multimodal large reasoning models (MLRMs) have substantially improved their ability to solve complex textual and visual tasks. However, these models tend to overthink simple problems, producing unnecessarily lengthy reasoning traces, while under-exploring challenging ones, leading to missed solutions. To address this imbalance, we propose ARES, a unified open-source framework for adaptive reasoning that dynamically allocates exploration effort based on task difficulty. Our approach is motivated by two key empirical findings: (i) while single-token entropy is noisy, high window-entropy (HWE) tokens (token-level entropies averaged over a sliding window) can reliably capture reasoning-critical moments; and (ii) reducing HWE usage benefits easy problems, while increasing it is essential for solving hard ones. Building on these insights, ARES introduces a two-stage training pipeline. In the Adaptive Cold-Start stage, we curate multimodal and textual data paired with reasoning traces whose length is proportional to problem difficulty, equipping the model with initial difficulty awareness. In the second stage, we develop Adaptive Entropy Policy Optimization (AEPO), which uses HWE tokens as exploration triggers to decide when to explore, and a hierarchical entropy reward with dynamic KL control to decide how much to explore. Extensive experiments demonstrate that ARES achieves superior performance and reasoning efficiency across diverse mathematical, logical, and multimodal benchmarks, while closing the gap to leading commercial systems at significantly lower inference cost.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_08457
institution	arXiv
publishDate	2025
record_format	arxiv
title	ARES: Multimodal Adaptive Reasoning via Difficulty-Aware Token-Level Entropy Shaping
topic	Computation and Language
url	https://arxiv.org/abs/2510.08457
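The window-entropy idea in the abstract (token-level entropies averaged over a sliding window, with high-window-entropy tokens acting as exploration triggers) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the trailing window, natural-log entropy, and fixed `threshold` are hypothetical choices, and the function names `token_entropies`, `window_entropies`, and `hwe_positions` are illustrative only.

```python
import math
from typing import List, Sequence


def token_entropies(prob_rows: Sequence[Sequence[float]]) -> List[float]:
    """Shannon entropy (in nats) of each next-token probability distribution."""
    return [-sum(p * math.log(p) for p in row if p > 0.0) for row in prob_rows]


def window_entropies(entropies: Sequence[float], window: int) -> List[float]:
    """Average token entropy over a trailing sliding window.

    Assumption: the window ends at the current token and is shorter at the
    start of the sequence; the actual window placement in ARES may differ.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, h in enumerate(entropies):
        running += h
        if i >= window:
            running -= entropies[i - window]  # drop the token leaving the window
        out.append(running / min(i + 1, window))
    return out


def hwe_positions(entropies: Sequence[float], window: int, threshold: float) -> List[int]:
    """Indices whose window entropy exceeds a (hypothetical) fixed threshold."""
    return [i for i, w in enumerate(window_entropies(entropies, window)) if w > threshold]


if __name__ == "__main__":
    # A sustained high-entropy stretch is flagged; a lone spike is smoothed away.
    print(hwe_positions([0.0, 0.0, 3.0, 3.0], window=2, threshold=1.0))
    print(hwe_positions([0.0, 0.0, 5.0, 0.0, 0.0], window=3, threshold=2.0))
```

With `window=3` and `threshold=2.0`, the lone spike in `[0, 0, 5, 0, 0]` averages to about 1.67 and is not flagged, whereas a single-token rule with the same threshold would flag position 2. This mirrors the abstract's observation that single-token entropy is noisy while windowed entropy is more stable.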
