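Staff View: A Multi-Component Reward Function with Policy Gradient for Automated Feature Selection with Dynamic Regularization and Bias Mitigation :: Library Catalog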

Bibliographic Details
Main Authors:	Khadka, Sudip; Paudel, L. S.
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computers and Society
Online Access:	https://arxiv.org/abs/2510.09705

_version_	1866914087449919488
author	Khadka, Sudip; Paudel, L. S.
author_facet	Khadka, Sudip; Paudel, L. S.
contents	Static feature exclusion strategies often fail to prevent bias when hidden dependencies influence model predictions. To address this issue, we explore a reinforcement learning (RL) framework that integrates bias mitigation and automated feature selection within a single learning process. Unlike traditional heuristic-driven filter or wrapper approaches, our RL agent adaptively selects features using a reward signal that explicitly integrates predictive performance with fairness considerations. This dynamic formulation allows the model to balance generalization, accuracy, and equity throughout the training process, rather than relying exclusively on pre-processing adjustments or post hoc correction mechanisms. In this paper, we describe the construction of a multi-component reward function, the specification of the agent's action space over feature subsets, and the integration of this system with ensemble learning. We aim to provide a flexible and generalizable way to select features in environments where predictors are correlated and biases can inadvertently re-emerge.
format	Preprint
id	arxiv_https___arxiv_org_abs_2510_09705
institution	arXiv
publishDate	2025
record_format	arxiv
title	A Multi-Component Reward Function with Policy Gradient for Automated Feature Selection with Dynamic Regularization and Bias Mitigation
topic	Machine Learning; Computers and Society
url	https://arxiv.org/abs/2510.09705
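
The abstract describes a reward signal that combines predictive performance with fairness, and an agent that acts over feature subsets. The record does not give the formula, so the following is only a minimal sketch under stated assumptions: fairness is measured as a demographic parity gap between two groups, a sparsity penalty stands in for regularization, the policy is an independent Bernoulli distribution over features updated with a REINFORCE-style step, and all names and weights (`multi_component_reward`, `reinforce_step`, `w_acc`, `w_fair`, `w_sparse`) are hypothetical rather than taken from the paper.

```python
import numpy as np


def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rates between groups 0 and 1.

    Assumes binary predictions and that both groups are non-empty.
    """
    y_pred = np.asarray(y_pred, dtype=float)
    group = np.asarray(group)
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())


def multi_component_reward(y_true, y_pred, group, mask,
                           w_acc=1.0, w_fair=1.0, w_sparse=0.1):
    """Hypothetical reward: accuracy minus fairness and sparsity penalties."""
    accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))
    fairness_gap = demographic_parity_gap(y_pred, group)
    fraction_selected = float(np.mean(mask))  # share of features kept
    return w_acc * accuracy - w_fair * fairness_gap - w_sparse * fraction_selected


def sample_mask(theta, rng):
    """Sample a binary feature mask from independent Bernoulli(sigmoid(theta))."""
    probs = 1.0 / (1.0 + np.exp(-theta))
    return (rng.random(theta.shape) < probs).astype(float)


def reinforce_step(theta, mask, reward, baseline=0.0, lr=0.1):
    """One REINFORCE update of the per-feature logits.

    For a Bernoulli policy with sigmoid logits, the gradient of the
    log-probability of the sampled mask is (mask - probs).
    """
    probs = 1.0 / (1.0 + np.exp(-theta))
    return theta + lr * (reward - baseline) * (np.asarray(mask) - probs)
```

In use, each episode would sample a mask with `sample_mask`, train and evaluate a model on the selected features, score it with `multi_component_reward`, and pass that score to `reinforce_step`. Raising `w_fair` relative to `w_acc` would push the policy toward subsets with a smaller parity gap at some cost in accuracy.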

Similar Items