| Main Authors: | Mathew, Sheryl, Harshit, N |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.19567 |
Similar Items
Bias Fitting to Mitigate Length Bias of Reward Model in RLHF
by: Zhao, Kangwen, et al.
Published: (2025)
Relative Counterfactual Contrastive Learning for Mitigating Pretrained Stance Bias in Stance Detection
by: Zhang, Jiarui, et al.
Published: (2024)
Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)
Mitigating Gender Bias in Depression Detection via Counterfactual Inference
by: Hu, Mingxuan, et al.
Published: (2025)
Diagnosing and Mitigating System Bias in Self-Rewarding RL
by: Tan, Chuyi, et al.
Published: (2025)
Maximum Entropy Reinforcement Learning with Diffusion Policy
by: Dong, Xiaoyi, et al.
Published: (2025)
Tabular and Deep Reinforcement Learning for Gittins Index
by: Dhankhar, Harshit, et al.
Published: (2024)
Explaining Learned Reward Functions with Counterfactual Trajectories
by: Wehner, Jan, et al.
Published: (2024)
Counterfactually Safe Reinforcement Learning
by: Li, Jingyi, et al.
Published: (2026)
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
by: Hsu, Sheryl, et al.
Published: (2024)
AGR: Age Group fairness Reward for Bias Mitigation in LLMs
by: Cao, Shuirong, et al.
Published: (2024)
Training Data Efficiency in Multimodal Process Reward Models
by: Li, Jinyuan, et al.
Published: (2026)
Certifying Counterfactual Bias in LLMs
by: Chaudhary, Isha, et al.
Published: (2024)
Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective
by: Wang, Yu, et al.
Published: (2024)
They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias
by: Magid, Salma Abdel, et al.
Published: (2024)
JacQuant: STE-Free Quantization-Aware Training via Learned Jacobian Surrogates
by: Yi, Kai, et al.
Published: (2026)
Reinforcement Learning for Monetary Policy Under Macroeconomic Uncertainty: Analyzing Tabular and Function Approximation Methods
by: Wang, Tony, et al.
Published: (2025)
Fisher-Guided Selective Forgetting: Mitigating The Primacy Bias in Deep Reinforcement Learning
by: Falzari, Massimiliano, et al.
Published: (2025)
IR$^3$: Contrastive Inverse Reinforcement Learning for Interpretable Detection and Mitigation of Reward Hacking
by: Beigi, Mohammad, et al.
Published: (2026)
Reward-Conditioned Reinforcement Learning
by: Nauman, Michal, et al.
Published: (2026)
Improving Real-Time Concept Drift Detection using a Hybrid Transformer-Autoencoder Framework
by: Harshit, N, et al.
Published: (2025)
Deep Reinforcement Learning with Hybrid Intrinsic Reward Model
by: Yuan, Mingqi, et al.
Published: (2025)
Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models
by: Yang, Shidong, et al.
Published: (2026)
SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
by: Sikchi, Harshit, et al.
Published: (2023)
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
by: Tang, Yuting, et al.
Published: (2024)
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
by: Xu, Ran, et al.
Published: (2026)
Reward Model Ensembles Help Mitigate Overoptimization
by: Coste, Thomas, et al.
Published: (2023)
Web-Scale Multimodal Summarization using CLIP-Based Semantic Alignment
by: K, Mounvik, et al.
Published: (2026)
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
by: Sikchi, Harshit, et al.
Published: (2023)
Evaluating the Impact of Pulse Oximetry Bias in Machine Learning under Counterfactual Thinking
by: Martins, Inês, et al.
Published: (2024)
DCAST: Diverse Class-Aware Self-Training Mitigates Selection Bias for Fairer Learning
by: Tepeli, Yasin I., et al.
Published: (2024)
Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking
by: Miao, Yuchun, et al.
Published: (2025)
Adversarial Training of Reward Models
by: Bukharin, Alexander, et al.
Published: (2025)
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking
by: Eisenstein, Jacob, et al.
Published: (2023)
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
by: Hu, Zijing, et al.
Published: (2025)
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
by: Xu, Haoran, et al.
Published: (2025)
IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal Hallucination
by: Li, Yuanshuai, et al.
Published: (2026)
Counterfactual Explanations for Continuous Action Reinforcement Learning
by: Dong, Shuyang, et al.
Published: (2025)
Counterfactual Fairness through Transforming Data Orthogonal to Bias
by: Chen, Shuyi, et al.
Published: (2024)
The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward
by: Li, Long, et al.
Published: (2025)