| Main Authors: | Valieva, Khadichabonu; Banerjee, Bikramjit |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2409.08724 |
Similar Items
Intrinsic-Energy Joint Embedding Predictive Architectures Induce Quasimetric Spaces
by: Kobanda, Anthony, et al.
Published: (2026)
Pretrain Value, Not Reward: Decoupled Value Policy Optimization
by: Huang, Chenghua, et al.
Published: (2025)
DrS: Learning Reusable Dense Rewards for Multi-Stage Tasks
by: Mu, Tongzhou, et al.
Published: (2024)
Repairing Reward Functions with Feedback to Mitigate Reward Hacking
by: Hatgis-Kessell, Stephane, et al.
Published: (2025)
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
by: Liu, Yuyang, et al.
Published: (2025)
Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping
by: Nazir, Mohammad Saif, et al.
Published: (2025)
Value of Information and Reward Specification in Active Inference and POMDPs
by: Wei, Ran
Published: (2024)
Value-Free Policy Optimization via Reward Partitioning
by: Faye, Bilal, et al.
Published: (2025)
Bad Values but Good Behavior: Learning Highly Misspecified Bandits and MDPs
by: Banerjee, Debangshu, et al.
Published: (2023)
Improve Value Estimation of Q Function and Reshape Reward with Monte Carlo Tree Search
by: Li, Jiamian
Published: (2024)
Beyond Binary: Turning Partial Success into Dense Verifiable Rewards for Reinforcement Learning in Code Generation
by: Wang, Longwen, et al.
Published: (2026)
What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?
by: Shihab, Ibne Farabi, et al.
Published: (2025)
Constraints as Rewards: Reinforcement Learning for Robots without Reward Functions
by: Ishihara, Yu, et al.
Published: (2025)
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
by: Xu, Yuanda, et al.
Published: (2026)
Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction
by: Wickramasinghe, Sithumi, et al.
Published: (2025)
Explaining Learned Reward Functions with Counterfactual Trajectories
by: Wehner, Jan, et al.
Published: (2024)
AI Alignment with Changing and Influenceable Reward Functions
by: Carroll, Micah, et al.
Published: (2024)
RobotKeyframing: Learning Locomotion with High-Level Objectives via Mixture of Dense and Sparse Rewards
by: Zargarbashi, Fatemeh, et al.
Published: (2024)
Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping
by: Li, Jiaxing, et al.
Published: (2024)
Universal Value-Function Uncertainties
by: Zanger, Moritz A., et al.
Published: (2025)
Value Internalization: Learning and Generalizing from Social Reward
by: Rong, Frieda, et al.
Published: (2024)
Uncertainty-Aware Reward-Free Exploration with General Function Approximation
by: Zhang, Junkai, et al.
Published: (2024)
Learning Causally Invariant Reward Functions from Diverse Demonstrations
by: Ovinnikov, Ivan, et al.
Published: (2024)
Unsupervised Zero-Shot Reinforcement Learning via Functional Reward Encodings
by: Frans, Kevin, et al.
Published: (2024)
Detecting Hidden Triggers: Mapping Non-Markov Reward Functions to Markov
by: Hyde, Gregory, et al.
Published: (2024)
STARC: A General Framework For Quantifying Differences Between Reward Functions
by: Skalse, Joar, et al.
Published: (2023)
Inferring Transition Dynamics from Value Functions
by: Adamczyk, Jacob
Published: (2025)
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
by: Li, Hao, et al.
Published: (2023)
A Rollout-Based Algorithm and Reward Function for Resource Allocation in Business Processes
by: Middelhuis, Jeroen, et al.
Published: (2025)
Adapting the Behavior of Reinforcement Learning Agents to Changing Action Spaces and Reward Functions
by: de la Rosa, Raul, et al.
Published: (2026)
Massively Scaling Explicit Policy-conditioned Value Functions
by: Bohlinger, Nico, et al.
Published: (2025)
Batch Active Learning of Reward Functions from Human Preferences
by: Bıyık, Erdem, et al.
Published: (2024)
A Generalized Acquisition Function for Preference-based Reward Learning
by: Ellis, Evan, et al.
Published: (2024)
Kernel-Based Function Approximation for Average Reward Reinforcement Learning: An Optimist No-Regret Algorithm
by: Vakili, Sattar, et al.
Published: (2024)
Reward Models Inherit Value Biases from Pretraining
by: Christian, Brian, et al.
Published: (2026)
MAVRL: Learning Reward Functions from Multiple Feedback Types with Amortized Variational Inference
by: Baur, Raphaël, et al.
Published: (2026)
RF-Agent: Automated Reward Function Design via Language Agent Tree Search
by: Gao, Ning, et al.
Published: (2026)
Attention-Based Reward Shaping for Sparse and Delayed Rewards
by: Holmes, Ian, et al.
Published: (2025)
Reward Hacking Mitigation using Verifiable Composite Rewards
by: Tarek, Mirza Farhan Bin, et al.
Published: (2025)
Intrinsic Reward Policy Optimization for Sparse-Reward Environments
by: Cho, Minjae, et al.
Published: (2026)