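Staff View: ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Improving Classification Tasks :: Library Catalog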

Bibliographic Details
Main Authors:	Xu, Tianxiang; Zhu, Xiaoyan; Lai, Xin; Wang, Jiayin
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2605.17458

_version_	1866909052524560384
author	Xu, Tianxiang; Zhu, Xiaoyan; Lai, Xin; Wang, Jiayin
author_facet	Xu, Tianxiang; Zhu, Xiaoyan; Lai, Xin; Wang, Jiayin
contents	Text classification models are typically trained via supervised fine-tuning (SFT). However, SFT essentially performs behavior cloning from instance-wise labels and thus fails to adequately capture relative preference relations among samples, which limits the model's ability to shape decision boundaries and calibrate predictive confidence. In this paper, we propose ClaHF, a human feedback-inspired reinforcement learning (RL) framework for text classification that integrates preference modeling and RL optimization into the classification pipeline without requiring additional human annotations. Unlike prior work that relies solely on instance-wise supervision, ClaHF constructs multiple candidate predictions together with their relative ranking relations, and jointly models the Top-1 preference and the ordering among non-optimal candidates within a reward model (RM). This design converts conventional label supervision into preference signals that are directly applicable to policy optimization. We conduct systematic evaluations on eight classification tasks spanning three categories of scenarios. Results demonstrate that ClaHF consistently improves both classification performance and confidence calibration across diverse language models (LMs). The data and code are available at https://anonymous.4open.science/r/ClaHF.
format	Preprint
id	arxiv_https___arxiv_org_abs_2605_17458
institution	arXiv
publishDate	2026
record_format	arxiv
title	ClaHF: A Human Feedback-inspired Reinforcement Learning Framework for Improving Classification Tasks
topic	Machine Learning
url	https://arxiv.org/abs/2605.17458

Similar Items