| Main Authors: | Ishida, Shu, Henriques, João F. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2407.18913 |
Similar Items
SOAP: Improving and Stabilizing Shampoo using Adam
by: Vyas, Nikhil, et al.
Published: (2024)
Spatial Reasoning and Planning for Deep Embodied Agents
by: Ishida, Shu
Published: (2024)
Scalable Option Learning in High-Throughput Environments
by: Henaff, Mikael, et al.
Published: (2025)
Skin-SOAP: A Weakly Supervised Framework for Generating Structured SOAP Notes
by: Kamal, Sadia, et al.
Published: (2025)
RL-GPT: Integrating Reinforcement Learning and Code-as-policy
by: Liu, Shaoteng, et al.
Published: (2024)
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
by: Brantley, Kianté, et al.
Published: (2025)
LangProp: A code optimization framework using Large Language Models applied to driving
by: Ishida, Shu, et al.
Published: (2024)
Selective Uncertainty Propagation in Offline RL
by: Krishnamurthy, Sanath Kumar, et al.
Published: (2023)
Expert-Guided POMDP Learning for Data-Efficient Modeling in Healthcare
by: Locatelli, Marco, et al.
Published: (2025)
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
by: Wu, Xiefeng, et al.
Published: (2025)
Rao-Blackwellized POMDP Planning
by: Lee, Jiho, et al.
Published: (2024)
Advantage-Guided Diffusion for Model-Based Reinforcement Learning
by: Foffano, Daniele, et al.
Published: (2026)
RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$
by: Bhatia, Abhinav, et al.
Published: (2023)
Deep Belief Markov Models for POMDP Inference
by: Arcieri, Giacomo, et al.
Published: (2025)
TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL
by: Cao, Lang, et al.
Published: (2026)
OptionZero: Planning with Learned Options
by: Huang, Po-Wei, et al.
Published: (2025)
Boosting deep Reinforcement Learning using pretraining with Logical Options
by: Ye, Zihan, et al.
Published: (2026)
Memory-Based Advantage Shaping for LLM-Guided Reinforcement Learning
by: Nourzad, Narjes, et al.
Published: (2026)
Learning Optimal Defender Strategies for CAGE-2 using a POMDP Model
by: Le, Duc Huy, et al.
Published: (2025)
AdvantageFlow: Advantage-Weighted Least Squares for RL in Flow Models
by: Kveton, Branislav, et al.
Published: (2026)
Quantum Advantage Actor-Critic for Reinforcement Learning
by: Kölle, Michael, et al.
Published: (2024)
Option Discovery Using LLM-guided Semantic Hierarchical Reinforcement Learning
by: Shek, Chak Lam, et al.
Published: (2025)
An Advantage-based Optimization Method for Reinforcement Learning in Large Action Space
by: Lin, Hai, et al.
Published: (2024)
ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning
by: Ren, Qingnan, et al.
Published: (2026)
Mitigating Reward Hacking in RLHF via Advantage Sign Robustness
by: Ono, Shinnosuke, et al.
Published: (2026)
Learning Explainable and Better Performing Representations of POMDP Strategies
by: Bork, Alexander, et al.
Published: (2024)
Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning
by: Liu, Tenglong, et al.
Published: (2024)
Blockwise Advantage Estimation for Multi-Objective RL with Verifiable Rewards
by: Pavlenko, Kirill, et al.
Published: (2026)
EchoRL: Reinforcement Learning via Rollout Echoing
by: Bi, Jinhe, et al.
Published: (2026)
Preference-based Reinforcement Learning beyond Pairwise Comparisons: Benefits of Multiple Options
by: Lee, Joongkyu, et al.
Published: (2025)
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
by: Ahn, Hongjoon, et al.
Published: (2025)
Anytime Incremental $ρ$POMDP Planning in Continuous Spaces
by: Benchetrit, Ron, et al.
Published: (2025)
When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?
by: Hatgis-Kessell, Stephane, et al.
Published: (2026)
The Context Gathering Decision Process: A POMDP Framework for Agentic Search
by: Kausik, Chinmaya, et al.
Published: (2026)
Off-Policy Corrected Reward Modeling for Reinforcement Learning from Human Feedback
by: Ackermann, Johannes, et al.
Published: (2025)
Momentum Boosted Episodic Memory for Improving Learning in Long-Tailed RL Environments
by: Fernandes, Dolton, et al.
Published: (2025)
A2PO: Towards Effective Offline Reinforcement Learning from an Advantage-aware Perspective
by: Qing, Yunpeng, et al.
Published: (2024)
RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations
by: Marchesini, Enrico, et al.
Published: (2025)
MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
by: Xu, Yifan, et al.
Published: (2025)
RL4CO: an Extensive Reinforcement Learning for Combinatorial Optimization Benchmark
by: Berto, Federico, et al.
Published: (2023)