:: Library Catalog

Bibliographic Details
Main Author:	Cerulli, Giovanni
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2403.20250

Similar Items

Optimal Policy Learning under Budget and Coverage Constraints
by: Cerulli, Giovanni
Published: (2026)

Optimal Policy Learning for Multi-Action Treatment with Risk Preference using Stata
by: Cerulli, Giovanni
Published: (2025)

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
by: Zhang, Tianle, et al.
Published: (2024)

PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario Matching
by: Du, Haotong, et al.
Published: (2025)

COPR: Continual Human Preference Learning via Optimal Policy Regularization
by: Zhang, Han, et al.
Published: (2024)

POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
by: Huang, Chang, et al.
Published: (2024)

Policy-labeled Preference Learning: Is Preference Enough for RLHF?
by: Cho, Taehyun, et al.
Published: (2025)

Optimal Policy Minimum Bayesian Risk
by: Astudillo, Ramón Fernandez, et al.
Published: (2025)

Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
by: Xu, Chen, et al.
Published: (2025)

Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)

Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
by: Meng, Wenjia, et al.
Published: (2024)

Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data
by: Leung, Cheuk Hang, et al.
Published: (2025)

Ranking Policy Learning via Marketplace Expected Value Estimation From Observational Data
by: Ebrahimzadeh, Ehsan, et al.
Published: (2024)

Preference Conditioned Multi-Objective Reinforcement Learning: Decomposed, Diversity-Driven Policy Optimization
by: Ambadkar, Tanmay, et al.
Published: (2026)

Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects
by: Wang, Yingrong, et al.
Published: (2024)

Pareto-Optimal Learning from Preferences with Hidden Context
by: Bahlous-Boldi, Ryan, et al.
Published: (2024)

Preference Elicitation for Multi-objective Combinatorial Optimization with Active Learning and Maximum Likelihood Estimation
by: Defresne, Marianne, et al.
Published: (2025)

Learning Action Embeddings for Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2023)

Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions
by: Xu, Yinglun, et al.
Published: (2023)

Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
by: Kang, Hyungkyu, et al.
Published: (2025)

Multi-modal Heart Failure Risk Estimation based on Short ECG and Sampled Long-Term HRV
by: González, Sergio, et al.
Published: (2024)

Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning
by: Ma, Hao, et al.
Published: (2025)

Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
by: Pan, Chaofan, et al.
Published: (2025)

Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
by: Xu, Wenzhe, et al.
Published: (2026)

Optimal Signal Decomposition-based Multi-Stage Learning for Battery Health Estimation
by: Pamshetti, Vijay Babu, et al.
Published: (2025)

Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
by: Macuglia, Maël, et al.
Published: (2025)

Learning Partial Action Replacement in Offline MARL
by: Jin, Yue, et al.
Published: (2026)

Preference Optimization by Estimating the Ratio of the Data Distribution
by: Kim, Yeongmin, et al.
Published: (2025)

Hindsight Preference Replay Improves Preference-Conditioned Multi-Objective Reinforcement Learning
by: Shianifar, Jonaid, et al.
Published: (2026)

Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
by: Choi, Heewoong, et al.
Published: (2024)

Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
by: Aouali, Imad, et al.
Published: (2024)

Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning
by: Zhang, Miao, et al.
Published: (2025)

Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning
by: Hazra, Somnath, et al.
Published: (2025)

Evaluation-Time Policy Switching for Offline Reinforcement Learning
by: Neggatu, Natinael Solomon, et al.
Published: (2025)

Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge
by: Wang, Yuhang, et al.
Published: (2025)

APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
by: Li, Zhuo, et al.
Published: (2025)

Learning Optimal and Sample-Efficient Decision Policies with Guarantees
by: Shao, Daqian
Published: (2026)

Mitigating Preference Hacking in Policy Optimization with Pessimism
by: Gupta, Dhawal, et al.
Published: (2025)

Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning
by: Alles, Marvin, et al.
Published: (2024)

Adaptive Action Chunking via Multi-Chunk Q Value Estimation
by: Shin, Yongjae, et al.
Published: (2026)