:: Library Catalog

Bibliographic Details
Main Authors:	Mathew, Sheryl; Harshit, N
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2508.19567

Similar Items

Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
by: Zhao, Kangwen, et al.
Published: (2025)

Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection
by: Zhang, Jiarui, et al.
Published: (2024)

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)

Mitigating Gender Bias in Depression Detection via Counterfactual Inference
by: Hu, Mingxuan, et al.
Published: (2025)

Diagnosing and Mitigating System Bias in Self-Rewarding RL
by: Tan, Chuyi, et al.
Published: (2025)

Maximum Entropy Reinforcement Learning with Diffusion Policy
by: Dong, Xiaoyi, et al.
Published: (2025)

Tabular and Deep Reinforcement Learning for Gittins Index
by: Dhankhar, Harshit, et al.
Published: (2024)

Explaining Learned Reward Functions with Counterfactual Trajectories
by: Wehner, Jan, et al.
Published: (2024)

Counterfactually Safe Reinforcement Learning
by: Li, Jingyi, et al.
Published: (2026)

Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
by: Hsu, Sheryl, et al.
Published: (2024)

AGR: Age Group fairness Reward for Bias Mitigation in LLMs
by: Cao, Shuirong, et al.
Published: (2024)

Training Data Efficiency in Multimodal Process Reward Models
by: Li, Jinyuan, et al.
Published: (2026)

Certifying Counterfactual Bias in LLMs
by: Chaudhary, Isha, et al.
Published: (2024)

Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective
by: Wang, Yu, et al.
Published: (2024)

They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
by: Magid, Salma Abdel, et al.
Published: (2024)

JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates
by: Yi, Kai, et al.
Published: (2026)

Reinforcement Learning for Monetary Policy Under Macroeconomic Uncertainty: Analyzing Tabular and Function Approximation Methods
by: Wang, Tony, et al.
Published: (2025)

Fisher-Guided Selective Forgetting: Mitigating The Primacy Bias in Deep Reinforcement Learning
by: Falzari, Massimiliano, et al.
Published: (2025)

IR$^3$: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking
by: Beigi, Mohammad, et al.
Published: (2026)

Reward-Conditioned Reinforcement Learning
by: Nauman, Michal, et al.
Published: (2026)

Improving Real-Time Concept Drift Detection using a Hybrid Transformer-Autoencoder Framework
by: Harshit, N, et al.
Published: (2025)

Deep Reinforcement Learning with Hybrid Intrinsic Reward Model
by: Yuan, Mingqi, et al.
Published: (2025)

Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models
by: Yang, Shidong, et al.
Published: (2026)

SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
by: Sikchi, Harshit, et al.
Published: (2023)

Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
by: Tang, Yuting, et al.
Published: (2024)

Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)

Reward Model Ensembles Help Mitigate Overoptimization
by: Coste, Thomas, et al.
Published: (2023)

Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment
by: K, Mounvik, et al.
Published: (2026)

Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
by: Sikchi, Harshit, et al.
Published: (2023)

Evaluating the Impact of Pulse Oximetry Bias in Machine Learning under Counterfactual Thinking
by: Martins, Inês, et al.
Published: (2024)

DCAST: Diverse Class-Aware Self-Training Mitigates Selection Bias for Fairer Learning
by: Tepeli, Yasin I., et al.
Published: (2024)

Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
by: Miao, Yuchun, et al.
Published: (2025)

Adversarial Training of Reward Models
by: Bukharin, Alexander, et al.
Published: (2025)

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
by: Eisenstein, Jacob, et al.
Published: (2023)

Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
by: Hu, Zijing, et al.
Published: (2025)

An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
by: Xu, Haoran, et al.
Published: (2025)

IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal Hallucination
by: Li, Yuanshuai, et al.
Published: (2026)

Counterfactual Explanations for Continuous Action Reinforcement Learning
by: Dong, Shuyang, et al.
Published: (2025)

Counterfactual Fairness through Transforming Data Orthogonal to Bias
by: Chen, Shuyi, et al.
Published: (2024)

The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
by: Li, Long, et al.
Published: (2025)