| Main Authors: | Cheng, Yuwei; Zhao, Zifeng; Xu, Haifeng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2510.20055 |
Similar Items
Beyond Simple Sum of Delayed Rewards: Non-Markovian Reward Modeling for Reinforcement Learning
by: Tang, Yuting, et al.
Published: (2024)
Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling
by: Cheng, Yuwei, et al.
Published: (2024)
Transfer Learning for Nonparametric Contextual Dynamic Pricing
by: Wang, Fan, et al.
Published: (2025)
Alternating Reinforcement Learning with Contextual Rubric Rewards: Beyond the Scalarization Strategy
by: Lan, Guangchen, et al.
Published: (2026)
Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards
by: Lu, Xiaodong, et al.
Published: (2026)
2048: Reinforcement Learning in a Delayed Reward Environment
by: Saligram, Prady, et al.
Published: (2025)
Percentile-Based Deep Reinforcement Learning and Reward Based Personalization For Delay Aware RAN Slicing in O-RAN
by: Tehrani, Peyman, et al.
Published: (2025)
Escaping Model Collapse via Synthetic Data Verification: Near-term Improvements and Long-term Convergence
by: Yi, Bingji, et al.
Published: (2025)
Focal Reward: Balanced Reinforcement Learning under Rubric-Based Rewards
by: Huang, Yu, et al.
Published: (2026)
FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning
by: Fu, Yuwei, et al.
Published: (2024)
Fine-Tuning Improves Information Conveyance in Language Models
by: Cheng, Yuwei, et al.
Published: (2026)
PersRM-R1: Enhance Personalized Reward Modeling with Reinforcement Learning
by: Li, Mengdi, et al.
Published: (2025)
Contextual Pre-planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning
by: Azran, Guy, et al.
Published: (2023)
Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints
by: Zhao, Zifeng, et al.
Published: (2024)
Robot Policy Learning with Temporal Optimal Transport Reward
by: Fu, Yuwei, et al.
Published: (2024)
Which Rewards Matter? Reward Selection for Reinforcement Learning under Limited Feedback
by: Chaudhari, Shreyas, et al.
Published: (2025)
Delay-Empowered Causal Hierarchical Reinforcement Learning
by: Zhao, Chenran, et al.
Published: (2026)
Binary Reward Labeling: Bridging Offline Preference and Reward-Based Reinforcement Learning
by: Xu, Yinglun, et al.
Published: (2024)
Reward-Conditioned Reinforcement Learning
by: Nauman, Michal, et al.
Published: (2026)
Model-Based Reinforcement Learning under Random Observation Delays
by: Karamzade, Armin, et al.
Published: (2025)
Structure Detection for Contextual Reinforcement Learning
by: Zhou, Tianyue, et al.
Published: (2026)
Locally Private Nonparametric Contextual Multi-armed Bandits
by: Ma, Yuheng, et al.
Published: (2025)
Model-Based Transfer Learning for Contextual Reinforcement Learning
by: Cho, Jung-Hoon, et al.
Published: (2024)
Reinforcement Learning with Conditional Expectation Reward
by: Xiao, Changyi, et al.
Published: (2026)
Seldonian Reinforcement Learning for Ad Hoc Teamwork
by: Zorzi, Edoardo, et al.
Published: (2025)
TimeCAP: Learning to Contextualize, Augment, and Predict Time Series Events with Large Language Model Agents
by: Lee, Geon, et al.
Published: (2025)
Active Measuring in Reinforcement Learning With Delayed Negative Effects
by: Gao, Daiqi, et al.
Published: (2025)
DUET: Optimize Token-Budget Allocation for Reinforcement Learning with Verifiable Rewards
by: Hu, Haoyu, et al.
Published: (2026)
Personalizing Exposure Therapy via Reinforcement Learning
by: Mahmoudi-Nejad, Athar, et al.
Published: (2025)
Deep Reinforcement Learning for Ranking Utility Tuning in the Ad Recommender System at Pinterest
by: Yang, Xiao, et al.
Published: (2025)
Reward Design for Reinforcement Learning Agents
by: Devidze, Rati
Published: (2025)
Transferable Delay-Aware Reinforcement Learning via Implicit Causal Graph Modeling
by: Zhao, Chenran, et al.
Published: (2026)
Reinforcement Learning from Bagged Reward
by: Tang, Yuting, et al.
Published: (2024)
The Value of Reward Lookahead in Reinforcement Learning
by: Merlis, Nadav, et al.
Published: (2024)
Informativeness of Reward Functions in Reinforcement Learning
by: Devidze, Rati, et al.
Published: (2024)
To the Max: Reinventing Reward in Reinforcement Learning
by: Veviurko, Grigorii, et al.
Published: (2024)
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
by: Cai, Xin-Qiang, et al.
Published: (2025)
Contextual Intelligence The Next Leap for Reinforcement Learning
by: Biedenkapp, André
Published: (2026)
Learning Personalized Driving Styles via Reinforcement Learning from Human Feedback
by: Li, Derun, et al.
Published: (2025)
LinguaFluid: Language Guided Fluid Control via Semantic Rewards in Reinforcement Learning
by: Liang, Aoming, et al.
Published: (2025)