Affichage MARC: :: Library Catalog

Enregistré dans:

Détails bibliographiques
Auteurs principaux:	Wang, Hanbin, Song, Jingwei, Li, Jinpeng, Mi, Fei, Shang, Lifeng
Format:	Preprint
Publié:	2026
Sujets:	Artificial Intelligence
Accès en ligne:	https://arxiv.org/abs/2601.07238
Tags:	Ajouter un tag Pas de tags, Soyez le premier à ajouter un tag!

_version_	1866915722890838016
author	Wang, Hanbin Song, Jingwei Li, Jinpeng Mi, Fei Shang, Lifeng
author_facet	Wang, Hanbin Song, Jingwei Li, Jinpeng Mi, Fei Shang, Lifeng
contents	Large reasoning models (LRMs) exhibit diverse high-level reasoning patterns (e.g., direct solution, reflection-and-verification, and exploring multiple solutions), yet prevailing training recipes implicitly bias models toward a limited set of dominant patterns. Through a systematic analysis, we identify substantial accuracy variance across these patterns on mathematics and science benchmarks, revealing that a model's default reasoning pattern is often sub-optimal for a given problem. To address this, we introduce Group Pattern Selection Optimization (GPSO), a reinforcement learning framework that extends GRPO by incorporating multi-pattern rollouts, verifier-guided optimal pattern selection per problem, and attention masking during optimization to prevent the leakage of explicit pattern suffixes into the learned policy. By exploring a portfolio of diverse reasoning strategies and optimizing the policy on the most effective ones, GPSO enables the model to internalize the mapping from problem characteristics to optimal reasoning patterns. Extensive experiments demonstrate that GPSO delivers consistent and substantial performance gains across various model backbones and benchmarks, effectively mitigating pattern sub-optimality and fostering more robust, adaptable reasoning. All data and codes are available at https://github.com/wanghanbinpanda/GPSO.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_07238
institution	arXiv
publishDate	2026
record_format	arxiv
spellingShingle	Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning Wang, Hanbin Song, Jingwei Li, Jinpeng Mi, Fei Shang, Lifeng Artificial Intelligence Large reasoning models (LRMs) exhibit diverse high-level reasoning patterns (e.g., direct solution, reflection-and-verification, and exploring multiple solutions), yet prevailing training recipes implicitly bias models toward a limited set of dominant patterns. Through a systematic analysis, we identify substantial accuracy variance across these patterns on mathematics and science benchmarks, revealing that a model's default reasoning pattern is often sub-optimal for a given problem. To address this, we introduce Group Pattern Selection Optimization (GPSO), a reinforcement learning framework that extends GRPO by incorporating multi-pattern rollouts, verifier-guided optimal pattern selection per problem, and attention masking during optimization to prevent the leakage of explicit pattern suffixes into the learned policy. By exploring a portfolio of diverse reasoning strategies and optimizing the policy on the most effective ones, GPSO enables the model to internalize the mapping from problem characteristics to optimal reasoning patterns. Extensive experiments demonstrate that GPSO delivers consistent and substantial performance gains across various model backbones and benchmarks, effectively mitigating pattern sub-optimality and fostering more robust, adaptable reasoning. All data and codes are available at https://github.com/wanghanbinpanda/GPSO.
title	Group Pattern Selection Optimization: Let LRMs Pick the Right Pattern for Reasoning
topic	Artificial Intelligence
url	https://arxiv.org/abs/2601.07238

Documents similaires