| Main Authors: | Schwarzer, Will, Niekum, Scott |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2605.15134 |
Similar Items
Evaluation-Aware Reinforcement Learning
by: Deshmukh, Shripad Vilasrao, et al.
Published: (2025)
Supervised Reward Inference
by: Schwarzer, Will, et al.
Published: (2025)
Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
by: Chittepu, Yaswanth, et al.
Published: (2025)
Bayesian Robust Optimization for Imitation Learning
by: Brown, Daniel S., et al.
Published: (2020)
Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models
by: Jajoo, Pranaya, et al.
Published: (2026)
On the Benefits of Inducing Local Lipschitzness for Robust Generative Adversarial Imitation Learning
by: Memarian, Farzan, et al.
Published: (2021)
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
by: Tripathi, Tuhina, et al.
Published: (2025)
Safe RLHF Beyond Expectation: Stochastic Dominance for Universal Spectral Risk Control
by: Chittepu, Yaswanth, et al.
Published: (2026)
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
by: Sikchi, Harshit, et al.
Published: (2023)
A Dual Approach to Imitation Learning from Observations with Offline Datasets
by: Sikchi, Harshit, et al.
Published: (2024)
Adaptive Margin RLHF via Preference over Preferences
by: Chittepu, Yaswanth, et al.
Published: (2025)
Pareto-Optimal Learning from Preferences with Hidden Context
by: Bahlous-Boldi, Ryan, et al.
Published: (2024)
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
by: Xu, Haoran, et al.
Published: (2025)
SMORE: Score Models for Offline Goal-Conditioned Reinforcement Learning
by: Sikchi, Harshit, et al.
Published: (2023)
A Descriptive and Normative Theory of Human Beliefs in RLHF
by: Dandekar, Sylee, et al.
Published: (2025)
Quantile Activation: Correcting a Failure Mode of ML Models
by: Challa, Aditya, et al.
Published: (2024)
Learning Action-based Representations Using Invariance
by: Rudolph, Max, et al.
Published: (2024)
Automated Discovery of Functional Actual Causes in Complex Environments
by: Chuck, Caleb, et al.
Published: (2024)
Impact of ML Optimization Tactics on Greener Pre-Trained ML Models
by: Álvarez, Alexandra González, et al.
Published: (2024)
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
by: Chuck, Caleb, et al.
Published: (2025)
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms
by: Rafailov, Rafael, et al.
Published: (2024)
Fast Adaptation with Behavioral Foundation Models
by: Sikchi, Harshit, et al.
Published: (2025)
Contrastive Preference Learning: Learning from Human Feedback without RL
by: Hejna, Joey, et al.
Published: (2023)
SpinML: Customized Synthetic Data Generation for Private Training of Specialized ML Models
by: Zhang, Jiang, et al.
Published: (2025)
SkiLD: Unsupervised Skill Discovery Guided by Factor Interactions
by: Wang, Zizhao, et al.
Published: (2024)
Are Deep Speech Denoising Models Robust to Adversarial Noise?
by: Schwarzer, Will, et al.
Published: (2025)
Chronic Diseases Prediction Using ML
by: Mulakala, Sri Varsha, et al.
Published: (2025)
ML Inference Scheduling with Predictable Latency
by: Zhao, Haidong, et al.
Published: (2025)
Optimizing ML Training with Metagradient Descent
by: Engstrom, Logan, et al.
Published: (2025)
Bayesian Joint Model of Multi-Sensor and Failure Event Data for Multi-Mode Failure Prediction
by: Fard, Sina Aghaee Dabaghan, et al.
Published: (2025)
Fault Tolerant ML: Efficient Meta-Aggregation and Synchronous Training
by: Dahan, Tehila, et al.
Published: (2024)
Defining error accumulation in ML atmospheric simulators
by: Parthipan, Raghul, et al.
Published: (2024)
Chunky Post-Training: Data Driven Failures of Generalization
by: Murray, Seoirse, et al.
Published: (2026)
SnatchML: Hijacking ML models without Training Access
by: Ghorbel, Mahmoud, et al.
Published: (2024)
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
by: Sikchi, Harshit, et al.
Published: (2024)
Development and Deployment of Hybrid ML Models for Critical Heat Flux Prediction in Annulus Geometries
by: Furlong, Aidan, et al.
Published: (2025)
CubicML: Automated ML for Large ML Systems Co-design with ML Prediction of Performance
by: Wen, Wei, et al.
Published: (2024)
Predicting Cascading Failures with a Hyperparametric Diffusion Model
by: Xiang, Bin, et al.
Published: (2024)
Reducing Hyperparameter Tuning Costs in ML, Vision and Language Model Training Pipelines via Memoization-Awareness
by: Essofi, Abdelmajid, et al.
Published: (2024)
A Frugal Model for Accurate Early Student Failure Prediction
by: Gagaoua, Ikram, et al.
Published: (2025)