| Main Authors: | Xie, Yuqing, Chen, Jiayu, Tang, Wenhao, Zhang, Ya, Yu, Chao, Wang, Yu |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.15120 |
Similar Items
Automatic Reward Shaping from Confounded Offline Data
by: Li, Mingxuan, et al.
Published: (2025)
AED: Automatic Discovery of Effective and Diverse Vulnerabilities for Autonomous Driving Policy with Large Language Models
by: Qiu, Le, et al.
Published: (2025)
Extracting Heuristics from Large Language Models for Reward Shaping in Reinforcement Learning
by: Bhambri, Siddhant, et al.
Published: (2024)
An Offline Adaptation Framework for Constrained Multi-Objective Reinforcement Learning
by: Lin, Qian, et al.
Published: (2024)
Confounding Robust Continuous Control via Automatic Reward Shaping
by: Juliani, Mateo, et al.
Published: (2026)
Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines
by: Zheng, Xuejing, et al.
Published: (2024)
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
by: Xie, Tianbao, et al.
Published: (2023)
MASP: Scalable GNN-based Planning for Multi-Agent Navigation
by: Yang, Xinyi, et al.
Published: (2023)
Reward-Robust RLHF in LLMs
by: Yan, Yuzi, et al.
Published: (2024)
PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model
by: Lin, Baijiong, et al.
Published: (2025)
Automatically Finding Reward Model Biases
by: Wang, Atticus, et al.
Published: (2026)
Boosting LLM Reasoning via Human-Inspired Reward Shaping
by: Lin, Wenze, et al.
Published: (2026)
Reward-free Alignment for Conflicting Objectives
by: Chen, Peter, et al.
Published: (2026)
Language Models as Efficient Reward Function Searchers for Custom-Environment Multi-Objective Reinforcement
by: Xie, Guanwen, et al.
Published: (2024)
Zero-Shot LLMs in Human-in-the-Loop RL: Replacing Human Feedback for Reward Shaping
by: Nazir, Mohammad Saif, et al.
Published: (2025)
Enhancing Inverse Reinforcement Learning through Encoding Dynamic Information in Reward Shaping
by: Zhan, Simon Sinong, et al.
Published: (2024)
Bootstrapped Reward Shaping
by: Adamczyk, Jacob, et al.
Published: (2025)
On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks
by: Skalse, Joar, et al.
Published: (2024)
VolleyBots: A Testbed for Multi-Drone Volleyball Game Combining Motion Control and Strategic Play
by: Xu, Zelai, et al.
Published: (2025)
Multi-Task Reward Learning from Human Ratings
by: Wu, Mingkang, et al.
Published: (2025)
Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards
by: Pavlenko, Kirill, et al.
Published: (2026)
Attention-Based Reward Shaping for Sparse and Delayed Rewards
by: Holmes, Ian, et al.
Published: (2025)
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
by: Zhou, Zhanhui, et al.
Published: (2023)
Reward Shaping to Mitigate Reward Hacking in RLHF
by: Fu, Jiayi, et al.
Published: (2025)
Neural-Network-Driven Reward Prediction as a Heuristic: Advancing Q-Learning for Mobile Robot Path Planning
by: Ji, Yiming, et al.
Published: (2024)
MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization
by: Wu, Boyuan
Published: (2025)
Revisiting the Learning Objectives of Vision-Language Reward Models
by: Roy, Simon, et al.
Published: (2025)
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
by: Wang, Haoxiang, et al.
Published: (2024)
Planning-Augmented Sampling with Early Guidance for High-Reward Discovery
by: Zhu, Rui, et al.
Published: (2025)
A Survey of Automatic Prompt Engineering: An Optimization Perspective
by: Li, Wenwu, et al.
Published: (2025)
Adaptive Online Mirror Descent for Tchebycheff Scalarization in Multi-Objective Learning
by: Liu, Meitong, et al.
Published: (2024)
Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization
by: Yu, Jiajun, et al.
Published: (2025)
Divide and Learn: Multi-Objective Combinatorial Optimization at Scale
by: Singh, Esha, et al.
Published: (2026)
The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping
by: Liu, Yang, et al.
Published: (2026)
BAMDP Shaping: a Unified Framework for Intrinsic Motivation and Reward Shaping
by: Lidayan, Aly, et al.
Published: (2024)
Shaping Sparse Rewards in Reinforcement Learning: A Semi-supervised Approach
by: Li, Wenyun, et al.
Published: (2025)
Combining Automated Optimisation of Hyperparameters and Reward Shape
by: Dierkes, Julian, et al.
Published: (2024)
Accelerating Multi-Objective Collaborative Optimization of Doped Thermoelectric Materials via Artificial Intelligence
by: Zeng, Yuxuan, et al.
Published: (2025)
In-Context Multi-Objective Optimization
by: Zhang, Xinyu, et al.
Published: (2025)
Bellman Error Centering
by: Chen, Xingguo, et al.
Published: (2025)