| Main Authors: | Bahlous-Boldi, Ryan; Ding, Li; Spector, Lee; Niekum, Scott |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2406.15599 |
Similar Items
Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
by: Xu, Haoran, et al.
Published: (2025)
Evaluation-Aware Reinforcement Learning
by: Deshmukh, Shripad Vilasrao, et al.
Published: (2025)
Contrastive Preference Learning: Learning from Human Feedback without RL
by: Hejna, Joey, et al.
Published: (2023)
Dominated Novelty Search: Rethinking Local Competition in Quality-Diversity
by: Bahlous-Boldi, Ryan, et al.
Published: (2025)
A Dual Approach to Imitation Learning from Observations with Offline Datasets
by: Sikchi, Harshit, et al.
Published: (2024)
Vector Policy Optimization: Training for Diversity Improves Test-Time Search
by: Bahlous-Boldi, Ryan, et al.
Published: (2026)
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
by: Sikchi, Harshit, et al.
Published: (2023)
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
by: Siththaranjan, Anand, et al.
Published: (2023)
Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control
by: Chittepu, Yaswanth, et al.
Published: (2026)
Learning Action-based Representations Using Invariance
by: Rudolph, Max, et al.
Published: (2024)
Learning Pareto-Optimal Rewards from Noisy Preferences: A Framework for Multi-Objective Inverse Reinforcement Learning
by: Cherukuri, Kalyan, et al.
Published: (2025)
SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
by: Sikchi, Harshit, et al.
Published: (2023)
Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
by: Chittepu, Yaswanth, et al.
Published: (2025)
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
by: Chuck, Caleb, et al.
Published: (2025)
A Descriptive and Normative Theory of Human Beliefs in RLHF
by: Dandekar, Sylee, et al.
Published: (2025)
syftr: Pareto-Optimal Generative AI
by: Conway, Alexander, et al.
Published: (2025)
Pareto Continual Learning: Preference-Conditioned Learning and Adaption for Dynamic Stability-Plasticity Trade-off
by: Lai, Song, et al.
Published: (2025)
Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects
by: Wang, Yingrong, et al.
Published: (2024)
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning
by: Lou, Chenwei, et al.
Published: (2025)
Preference Guided Iterated Pareto Referent Optimisation for Accessible Route Planning
by: Speziali, Paolo, et al.
Published: (2026)
Automated Discovery of Functional Actual Causes in Complex Environments
by: Chuck, Caleb, et al.
Published: (2024)
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
by: Jajoo, Pranaya, et al.
Published: (2026)
Domain Generalization via Pareto Optimal Gradient Matching
by: Do, Khoi, et al.
Published: (2025)
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
by: Rafailov, Rafael, et al.
Published: (2024)
Learning to Select In-Context Demonstration Preferred by Large Language Model
by: Zhang, Zheng, et al.
Published: (2025)
In-Context Learning for Pure Exploration in Continuous Spaces
by: Russo, Alessio, et al.
Published: (2026)
Learning Pareto-Optimal Pandemic Intervention Policies with MORL
by: Chen, Marian, et al.
Published: (2025)
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
by: Cho, Taehyun, et al.
Published: (2025)
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
by: Bhatnagar, Aadyot, et al.
Published: (2026)
Locally Pareto-Optimal Interpretations for Black-Box Machine Learning Models
by: Joshi, Aniruddha, et al.
Published: (2025)
Digital Red Queen: Adversarial Program Evolution in Core War with LLMs
by: Kumar, Akarsh, et al.
Published: (2026)
In-Context Learning for Pure Exploration
by: Russo, Alessio, et al.
Published: (2025)
APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
by: Li, Zhuo, et al.
Published: (2025)
Pareto Optimal Algorithmic Recourse in Multi-cost Function
by: Chen, Wen-Ling, et al.
Published: (2025)
COPR: Continual Human Preference Learning via Optimal Policy Regularization
by: Zhang, Han, et al.
Published: (2024)
Clear Preferences Leave Traces: Reference Model-Guided Sampling for Preference Learning
by: Diwan, Nirav, et al.
Published: (2025)
Optimal Transport for LLM Reward Modeling from Noisy Preference
by: Pan, Licheng, et al.
Published: (2026)
In-Context Reward Adaptation for Robust Preference Modeling
by: Sun, Zhenyu, et al.
Published: (2026)
Fast Adaptation with Behavioral Foundation Models
by: Sikchi, Harshit, et al.
Published: (2025)