| Main Author: | Kobayashi, Taisuke |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.17473 |
Similar Items
Improved Regret Bound for Safe Reinforcement Learning via Tighter Cost Pessimism and Reward Optimism
by: Yu, Kihyun, et al.
Published: (2024)
Best-of-Tails: Bridging Optimism and Pessimism in Inference-Time Alignment
by: Hsu, Hsiang, et al.
Published: (2026)
Consolidated Adaptive T-soft Update for Deep Reinforcement Learning
by: Kobayashi, Taisuke
Published: (2022)
The Virtues of Pessimism in Inverse Reinforcement Learning
by: Wu, David, et al.
Published: (2024)
Intentionally-underestimated Value Function at Terminal State for Temporal-difference Learning with Mis-designed Reward
by: Kobayashi, Taisuke
Published: (2023)
CubeDAgger: Interactive Imitation Learning for Dynamic Systems with Efficient yet Low-risk Interaction
by: Kobayashi, Taisuke
Published: (2025)
Revisiting Experience Replayable Conditions
by: Kobayashi, Taisuke
Published: (2024)
Flexible Empowerment at Reasoning with Extended Best-of-N Sampling
by: Kobayashi, Taisuke
Published: (2026)
Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error
by: Kobayashi, Taisuke
Published: (2026)
Improvements of Dark Experience Replay and Reservoir Sampling towards Better Balance between Consolidation and Plasticity
by: Kobayashi, Taisuke
Published: (2025)
Minimax Optimal Reinforcement Learning with Quasi-Optimism
by: Lee, Harin, et al.
Published: (2025)
Pessimism Meets Risk: Risk-Sensitive Offline Reinforcement Learning
by: Zhang, Dake, et al.
Published: (2024)
Pessimism-Free Offline Learning in General-Sum Games via KL Regularization
by: Chen, Claire, et al.
Published: (2026)
Unified PAC-Bayesian Study of Pessimism for Offline Policy Learning with Regularized Importance Sampling
by: Aouali, Imad, et al.
Published: (2024)
Design of Restricted Normalizing Flow towards Arbitrary Stochastic Policy with Computational Efficiency
by: Kobayashi, Taisuke, et al.
Published: (2024)
Variational Adaptive Noise and Dropout towards Stable Recurrent Neural Networks
by: Kobayashi, Taisuke, et al.
Published: (2025)
Towards Autonomous Driving of Personal Mobility with Small and Noisy Dataset using Tsallis-statistics-based Behavioral Cloning
by: Kobayashi, Taisuke, et al.
Published: (2021)
On the Dynamic Regret of Following the Regularized Leader: Optimism with History Pruning
by: Mhaisen, Naram, et al.
Published: (2025)
Pessimism Principle Can Be Effective: Towards a Framework for Zero-Shot Transfer Reinforcement Learning
by: Zhang, Chi, et al.
Published: (2025)
Optimism as Risk-Seeking in Multi-Agent Reinforcement Learning
by: Zhang, Runyu, et al.
Published: (2025)
Optimism Without Regularization: Constant Regret in Zero-Sum Games
by: Lazarsfeld, John, et al.
Published: (2025)
Beyond Pessimism: Offline Learning in KL-regularized Games
by: Zhang, Yuheng, et al.
Published: (2026)
P-DROP: Poisson-Based Dropout for Graph Neural Networks
by: Yun, Hyunsik
Published: (2025)
Mitigating Preference Hacking in Policy Optimization with Pessimism
by: Gupta, Dhawal, et al.
Published: (2025)
Distributional Reinforcement Learning with Regularized Wasserstein Loss
by: Sun, Ke, et al.
Published: (2022)
DROP: Poison Dilution via Knowledge Distillation for Federated Learning
by: Syros, Georgios, et al.
Published: (2025)
Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes
by: Lu, Miao, et al.
Published: (2022)
Convergence Theorems for Entropy-Regularized and Distributional Reinforcement Learning
by: Jhaveri, Yash, et al.
Published: (2025)
Quantile Geometry Regularization for Distributional Reinforcement Learning
by: Zhang, Zhaofan, et al.
Published: (2026)
A Tale of Two Cities: Pessimism and Opportunism in Offline Dynamic Pricing
by: Bian, Zeyu, et al.
Published: (2024)
From Curiosity to Caution: Mitigating Reward Hacking for Best-of-N with Pessimism
by: Yu, Zhuohao, et al.
Published: (2026)
From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism
by: Wanjari, Sarthak
Published: (2026)
Directional Optimism for Safe Linear Bandits
by: Hutchinson, Spencer, et al.
Published: (2023)
Weber-Fechner Law in Temporal Difference learning derived from Control as Inference
by: Takahashi, Keiichiro, et al.
Published: (2024)
Online (Non-)Convex Learning via Tempered Optimism
by: Haddouche, Maxime, et al.
Published: (2023)
Federated Distributional Reinforcement Learning with Distributional Critic Regularization
by: Millard, David, et al.
Published: (2026)
Escaping Offline Pessimism: Vector-Field Reward Shaping for Safe Frontier Exploration
by: Roknilamouki, Amirhossein, et al.
Published: (2026)
Beyond Optimism: Exploration With Partially Observable Rewards
by: Parisi, Simone, et al.
Published: (2024)
Revisiting Optimism and Model Complexity in the Wake of Overparameterized Machine Learning
by: Patil, Pratik, et al.
Published: (2024)
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
by: Jin, Ying, et al.
Published: (2022)