Bibliographic Details
Main Author: Rho, Donghwan
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2510.03394
_version_ 1866917014735421440
author Rho, Donghwan
author_facet Rho, Donghwan
contents Reinforcement learning with verifiable rewards (RLVR) is a promising approach for training large language models (LLMs) with stronger reasoning abilities. It has also been applied to a variety of logic puzzles. In this work, we study the Korean word-chain game using RLVR. We show that rule-derived rewards can naturally conflict, and demonstrate through experiments that a curriculum-learning scheme mitigates these conflicts. Our findings motivate further studies of puzzle tasks in diverse languages.
format Preprint
id arxiv_https___arxiv_org_abs_2510_03394
institution arXiv
publishDate 2025
record_format arxiv
title Studying the Korean Word-Chain Game with RLVR: Mitigating Reward Conflicts via Curriculum Learning
topic Machine Learning
Computation and Language
url https://arxiv.org/abs/2510.03394