Library Catalog

Bibliographic Details
Main Authors:	Valieva, Khadichabonu; Banerjee, Bikramjit
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2409.08724

Similar Items

Intrinsic-Energy Joint Embedding Predictive Architectures Induce Quasimetric Spaces
by: Kobanda, Anthony, et al.
Published: (2026)

Pretrain Value, Not Reward: Decoupled Value Policy Optimization
by: Huang, Chenghua, et al.
Published: (2025)

DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks
by: Mu, Tongzhou, et al.
Published: (2024)

Repairing Reward Functions with Feedback to Mitigate Reward Hacking
by: Hatgis-Kessell, Stephane, et al.
Published: (2025)

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
by: Liu, Yuyang, et al.
Published: (2025)

Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping
by: Nazir, Mohammad Saif, et al.
Published: (2025)

Value of Information and Reward Specification in Active Inference and POMDPs
by: Wei, Ran
Published: (2024)

Value-Free Policy Optimization via Reward Partitioning
by: Faye, Bilal, et al.
Published: (2025)

Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs
by: Banerjee, Debangshu, et al.
Published: (2023)

Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search
by: Li, Jiamian
Published: (2024)

Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)

What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?
by: Shihab, Ibne Farabi, et al.
Published: (2025)

Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions
by: Ishihara, Yu, et al.
Published: (2025)

Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)

Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction
by: Wickramasinghe, Sithumi, et al.
Published: (2025)

Explaining Learned Reward Functions with Counterfactual Trajectories
by: Wehner, Jan, et al.
Published: (2024)

AI Alignment with Changing and Influenceable Reward Functions
by: Carroll, Micah, et al.
Published: (2024)

RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards
by: Zargarbashi, Fatemeh, et al.
Published: (2024)

Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping
by: Li, Jiaxing, et al.
Published: (2024)

Universal Value-Function Uncertainties
by: Zanger, Moritz A., et al.
Published: (2025)

Value Internalization: Learning and Generalizing from Social Reward
by: Rong, Frieda, et al.
Published: (2024)

Uncertainty-Aware Reward-Free Exploration with General Function Approximation
by: Zhang, Junkai, et al.
Published: (2024)

Learning Causally Invariant Reward Functions from Diverse Demonstrations
by: Ovinnikov, Ivan, et al.
Published: (2024)

Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
by: Frans, Kevin, et al.
Published: (2024)

Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov
by: Hyde, Gregory, et al.
Published: (2024)

STARC: A General Framework For Quantifying Differences Between Reward Functions
by: Skalse, Joar, et al.
Published: (2023)

Inferring Transition Dynamics from Value Functions
by: Adamczyk, Jacob
Published: (2025)

Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
by: Li, Hao, et al.
Published: (2023)

A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes
by: Middelhuis, Jeroen, et al.
Published: (2025)

Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
by: de la Rosa, Raul, et al.
Published: (2026)

Massively Scaling Explicit Policy-conditioned Value Functions
by: Bohlinger, Nico, et al.
Published: (2025)

Batch Active Learning of Reward Functions from Human Preferences
by: Bıyık, Erdem, et al.
Published: (2024)

A Generalized Acquisition Function for Preference-based Reward Learning
by: Ellis, Evan, et al.
Published: (2024)

Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm
by: Vakili, Sattar, et al.
Published: (2024)

Reward Models Inherit Value Biases from Pretraining
by: Christian, Brian, et al.
Published: (2026)

MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
by: Baur, Raphaël, et al.
Published: (2026)

RF-Agent: Automated Reward Function Design via Language Agent Tree Search
by: Gao, Ning, et al.
Published: (2026)

Attention-Based Reward Shaping for Sparse and Delayed Rewards
by: Holmes, Ian, et al.
Published: (2025)

Reward Hacking Mitigation using Verifiable Composite Rewards
by: Tarek, Mirza Farhan Bin, et al.
Published: (2025)

Intrinsic Reward Policy Optimization for Sparse-Reward Environments
by: Cho, Minjae, et al.
Published: (2026)