:: Library Catalog

Bibliographic Details
Main Authors:	Subramani, Rohan; Williams, Marcus; Heitmann, Max; Holm, Halfdan; Griffin, Charlie; Skalse, Joar
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2310.11840

Similar Items

Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification
by: Skalse, Joar, et al.
Published: (2024)

Partial Identifiability and Misspecification in Inverse Reinforcement Learning
by: Skalse, Joar, et al.
Published: (2024)

On the Limitations of Markovian Rewards to Express Multi-Objective, Risk-Sensitive, and Modal Tasks
by: Skalse, Joar, et al.
Published: (2024)

Partial Identifiability in Inverse Reinforcement Learning For Agents With Non-Exponential Discounting
by: Skalse, Joar, et al.
Published: (2024)

Defining and Characterizing Reward Hacking
by: Skalse, Joar, et al.
Published: (2022)

The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret
by: Fluri, Lukas, et al.
Published: (2024)

STARC: A General Framework For Quantifying Differences Between Reward Functions
by: Skalse, Joar, et al.
Published: (2023)

Password-Activated Shutdown Protocols for Misaligned Frontier Agents
by: Williams, Kai, et al.
Published: (2025)

Automating Formal Verification with Reinforcement Learning and Recursive Inference
by: Tan, Max
Published: (2026)

Multi-objective Reinforcement learning from AI Feedback
by: Williams, Marcus
Published: (2024)

Scalable Multi-Objective Robot Reinforcement Learning through Gradient Conflict Resolution
by: Munn, Humphrey, et al.
Published: (2025)

EXPO: Stable Reinforcement Learning with Expressive Policies
by: Dong, Perry, et al.
Published: (2025)

Balancing Expressivity and Robustness: Constrained Rational Activations for Reinforcement Learning
by: Surdej, Rafał, et al.
Published: (2025)

Towards Formalizing Reinforcement Learning Theory
by: Zhang, Shangtong
Published: (2025)

Polychromic Objectives for Reinforcement Learning
by: Hamid, Jubayer Ibn, et al.
Published: (2025)

Pareto Set Learning for Multi-Objective Reinforcement Learning
by: Liu, Erlong, et al.
Published: (2025)

Expressive Temporal Specifications for Reward Monitoring
by: Adalat, Omar, et al.
Published: (2025)

Games for AI Control: Models of Safety Evaluations of AI Deployment Protocols
by: Griffin, Charlie, et al.
Published: (2024)

Preference-based Multi-Objective Reinforcement Learning
by: Mu, Ni, et al.
Published: (2025)

Expressive Value Learning for Scalable Offline Reinforcement Learning
by: Espinosa-Dice, Nicolas, et al.
Published: (2025)

Towards Provable Emergence of In-Context Reinforcement Learning
by: Wang, Jiuqi, et al.
Published: (2025)

Optimistic Reinforcement Learning with Quantile Objectives
by: Alipour-Vaezi, Mohammad, et al.
Published: (2025)

Objective-Specific Privileged Bases via Full-Prefix Matryoshka Learning
by: Talukder, Arghamitra, et al.
Published: (2026)

On Generalization Across Environments In Multi-Objective Reinforcement Learning
by: Teoh, Jayden, et al.
Published: (2025)

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
by: Doo, JaeHyeok, et al.
Published: (2026)

Active Learning of Molecular Data for Task-Specific Objectives
by: Ghosh, Kunal, et al.
Published: (2024)

Reinforcement Learning with $ω$-Regular Objectives and Constraints
by: Wagner, Dominik, et al.
Published: (2025)

Multi-Objective Reinforcement Learning for Water Management
by: Osika, Zuzanna, et al.
Published: (2025)

Demonstration Guided Multi-Objective Reinforcement Learning
by: Lu, Junlin, et al.
Published: (2024)

The Formalism-Implementation Gap in Reinforcement Learning Research
by: Castro, Pablo Samuel
Published: (2025)

Emergence of Chemotactic Strategies with Multi-Agent Reinforcement Learning
by: Tovey, Samuel, et al.
Published: (2024)

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
by: Chen, Ying-Tu, et al.
Published: (2026)

Theoretical Study of Conflict-Avoidant Multi-Objective Reinforcement Learning
by: Wang, Yudan, et al.
Published: (2024)

Reward Dimension Reduction for Scalable Multi-Objective Reinforcement Learning
by: Park, Giseung, et al.
Published: (2025)

Breaking the Bias Barrier in Concave Multi-Objective Reinforcement Learning
by: Ganesh, Swetha, et al.
Published: (2026)

Constrained Multi-Objective Reinforcement Learning with Max-Min Criterion
by: Park, Giseung, et al.
Published: (2026)

Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care
by: Bansal, Aryaman, et al.
Published: (2025)

Multi-Objective Reinforcement Learning for Generating Covalent Inhibitor Candidates
by: Gil, Renee
Published: (2026)

Convergence and Emergence of In-Context Reinforcement Learning with Chain of Thought
by: Xie, Zixuan, et al.
Published: (2026)

Subversion Strategy Eval: Can language models statelessly strategize to subvert control protocols?
by: Mallen, Alex, et al.
Published: (2024)