| Main Author: | Cerulli, Giovanni |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2403.20250 |
Similar Items
Optimal Policy Learning under Budget and Coverage Constraints
by: Cerulli, Giovanni
Published: (2026)
Optimal Policy Learning for Multi-Action Treatment with Risk Preference using Stata
by: Cerulli, Giovanni
Published: (2025)
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning
by: Zhang, Tianle, et al.
Published: (2024)
PERSCEN: Learning Personalized Interaction Pattern and Scenario Preference for Multi-Scenario Matching
by: Du, Haotong, et al.
Published: (2025)
COPR: Continual Human Preference Learning via Optimal Policy Regularization
by: Zhang, Han, et al.
Published: (2024)
POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning
by: Huang, Chang, et al.
Published: (2024)
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
by: Cho, Taehyun, et al.
Published: (2025)
Optimal Policy Minimum Bayesian Risk
by: Astudillo, Ramón Fernandez, et al.
Published: (2025)
Can We Detect Failures Without Failure Data? Uncertainty-Aware Runtime Failure Detection for Imitation Learning Policies
by: Xu, Chen, et al.
Published: (2025)
Action-Free Offline-to-Online RL via Discretised State Policies
by: Neggatu, Natinael Solomon, et al.
Published: (2026)
Off-OAB: Off-Policy Policy Gradient Method with Optimal Action-Dependent Baseline
by: Meng, Wenjia, et al.
Published: (2024)
Distributionally Robust Policy Evaluation and Learning for Continuous Treatment with Observational Data
by: Leung, Cheuk Hang, et al.
Published: (2025)
Ranking Policy Learning via Marketplace Expected Value Estimation From Observational Data
by: Ebrahimzadeh, Ehsan, et al.
Published: (2024)
Preference Conditioned Multi-Objective Reinforcement Learning: Decomposed, Diversity-Driven Policy Optimization
by: Ambadkar, Tanmay, et al.
Published: (2026)
Pareto-Optimal Estimation and Policy Learning on Short-term and Long-term Treatment Effects
by: Wang, Yingrong, et al.
Published: (2024)
Pareto-Optimal Learning from Preferences with Hidden Context
by: Bahlous-Boldi, Ryan, et al.
Published: (2024)
Preference Elicitation for Multi-objective Combinatorial Optimization with Active Learning and Maximum Likelihood Estimation
by: Defresne, Marianne, et al.
Published: (2025)
Learning Action Embeddings for Off-Policy Evaluation
by: Cief, Matej, et al.
Published: (2023)
Two-Step Offline Preference-Based Reinforcement Learning with Constrained Actions
by: Xu, Yinglun, et al.
Published: (2023)
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
by: Kang, Hyungkyu, et al.
Published: (2025)
Multi-modal Heart Failure Risk Estimation based on Short ECG and Sampled Long-Term HRV
by: González, Sergio, et al.
Published: (2024)
Vision-Based Generic Potential Function for Policy Alignment in Multi-Agent Reinforcement Learning
by: Ma, Hao, et al.
Published: (2025)
Action-Adaptive Continual Learning: Enabling Policy Generalization under Dynamic Action Spaces
by: Pan, Chaofan, et al.
Published: (2025)
Meta-Aligner: Bidirectional Preference-Policy Optimization for Multi-Objective LLMs Alignment
by: Xu, Wenzhe, et al.
Published: (2026)
Optimal Signal Decomposition-based Multi-Stage Learning for Battery Health Estimation
by: Pamshetti, Vijay Babu, et al.
Published: (2025)
Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
by: Macuglia, Maël, et al.
Published: (2025)
Learning Partial Action Replacement in Offline MARL
by: Jin, Yue, et al.
Published: (2026)
Preference Optimization by Estimating the Ratio of the Data Distribution
by: Kim, Yeongmin, et al.
Published: (2025)
Hindsight Preference Replay Improves Preference-Conditioned Multi-Objective Reinforcement Learning
by: Shianifar, Jonaid, et al.
Published: (2026)
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
by: Choi, Heewoong, et al.
Published: (2024)
Bayesian Off-Policy Evaluation and Learning for Large Action Spaces
by: Aouali, Imad, et al.
Published: (2024)
Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning
by: Zhang, Miao, et al.
Published: (2025)
Incentivizing Safer Actions in Policy Optimization for Constrained Reinforcement Learning
by: Hazra, Somnath, et al.
Published: (2025)
Evaluation-Time Policy Switching for Offline Reinforcement Learning
by: Neggatu, Natinael Solomon, et al.
Published: (2025)
Learning from Risk: LLM-Guided Generation of Safety-Critical Scenarios with Prior Knowledge
by: Wang, Yuhang, et al.
Published: (2025)
APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
by: Li, Zhuo, et al.
Published: (2025)
Learning Optimal and Sample-Efficient Decision Policies with Guarantees
by: Shao, Daqian
Published: (2026)
Mitigating Preference Hacking in Policy Optimization with Pessimism
by: Gupta, Dhawal, et al.
Published: (2025)
Constrained Latent Action Policies for Model-Based Offline Reinforcement Learning
by: Alles, Marvin, et al.
Published: (2024)
Adaptive Action Chunking via Multi-Chunk Q Value Estimation
by: Shin, Yongjae, et al.
Published: (2026)