Staff View: Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification

Bibliographic Details
Main Authors:	Jeong, Jinhong; Park, Junghun; Yu, Youngjae
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2604.05302

_version_	1866918450383814656
author	Jeong, Jinhong; Park, Junghun; Yu, Youngjae
author_facet	Jeong, Jinhong; Park, Junghun; Yu, Youngjae
contents	Text simplification supports second language (L2) learning by providing comprehensible input, consistent with the Input Hypothesis. However, constructing personalized parallel corpora is costly, while existing large language model (LLM)-based readability control methods rely on pre-labeled sentence corpora and primarily target English. We propose Re-RIGHT, a unified reinforcement learning framework for adaptive multilingual text simplification without parallel corpus supervision. We first show that prompting-based lexical simplification at target proficiency levels (CEFR, JLPT, TOPIK, and HSK) performs poorly at easier levels and for non-English languages, even with state-of-the-art LLMs such as GPT-5.2 and Gemini 2.5. To address this, we collect 43K vocabulary-level data across four languages (English, Japanese, Korean, and Chinese) and train a compact 4B policy model using Re-RIGHT, which integrates three reward modules: vocabulary coverage, semantic preservation, and coherence. Compared to the stronger LLM baselines, Re-RIGHT achieves higher lexical coverage at target proficiency levels while maintaining original meaning and fluency.
format	Preprint
id	arxiv_https___arxiv_org_abs_2604_05302
institution	arXiv
publishDate	2026
record_format	arxiv
title	Right at My Level: A Unified Multilingual Framework for Proficiency-Aware Text Simplification
topic	Computation and Language
url	https://arxiv.org/abs/2604.05302
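
The abstract in the contents field names three reward modules (vocabulary coverage, semantic preservation, and coherence) without giving formulas. The sketch below is only an illustration of how such a reward could be combined, not the authors' implementation. The function names, the regex tokenizer, the word list, and the equal weights are all assumptions; in particular, Japanese, Korean, and Chinese would need a proper word segmenter, and the semantic and coherence scores are taken as precomputed numbers in [0, 1].

```python
import re


def vocabulary_coverage(text: str, allowed: set) -> float:
    """Fraction of word tokens in `text` found in the `allowed` word list.

    Assumption: tokens are runs of letters, lowercased. This is a
    placeholder for a real, language-specific tokenizer.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    return sum(token in allowed for token in tokens) / len(tokens)


def combined_reward(coverage: float, semantic: float, coherence: float,
                    weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted average of the three module scores.

    Assumption: a plain weighted mean with hypothetical equal weights;
    the record does not say how the modules are actually combined.
    """
    w_cov, w_sem, w_coh = weights
    total = w_cov + w_sem + w_coh
    return (w_cov * coverage + w_sem * semantic + w_coh * coherence) / total


# Hypothetical word list standing in for a target proficiency level.
beginner_words = {"the", "cat", "sat", "on"}
coverage = vocabulary_coverage("The cat sat on the mat.", beginner_words)
reward = combined_reward(coverage, semantic=0.9, coherence=0.8)
```

Here five of the six tokens are in the word list, so the coverage term rewards the output for staying mostly within the target level while the other two terms guard meaning and fluency.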