| Main Authors: | Chen, Ming-Hong; Pan, Kuan-Chen; Huang, You-De; Liu, Xi; Hsieh, Ping-Chun |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.12087 |
Similar Items
Semi-Supervised Cross-Domain Imitation Learning
by: Chu, Li-Min, et al.
Published: (2026)
Enhancing Offline Model-Based RL via Active Model Selection: A Bayesian Optimization Perspective
by: Yang, Yu-Wei, et al.
Published: (2025)
Diminishing Exploration: A Minimalist Approach to Piecewise Stationary Multi-Armed Bandits
by: Li, Kuan-Ta, et al.
Published: (2024)
Accelerated Policy Gradient: On the Convergence Rates of the Nesterov Momentum for Reinforcement Learning
by: Chen, Yen-Ju, et al.
Published: (2023)
Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots
by: Hung, Wei, et al.
Published: (2022)
A Modularized Framework for Piecewise-Stationary Restless Bandits
by: Li, Kuan-Ta, et al.
Published: (2026)
Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework
by: Nanda, Phalguni, et al.
Published: (2026)
Is Bellman Equation Enough for Learning Control?
by: You, Haoxiang, et al.
Published: (2025)
Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning
by: Liu, Wei, et al.
Published: (2026)
DDOT: A Derivative-directed Dual-decoder Ordinary Differential Equation Transformer for Dynamic System Modeling
by: Chang, Yang, et al.
Published: (2025)
Bellman Error Centering
by: Chen, Xingguo, et al.
Published: (2025)
COPO: Consistency-Aware Policy Optimization
by: Han, Jinghang, et al.
Published: (2025)
Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning
by: Chen, Sze-Ann, et al.
Published: (2026)
A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
by: Chen, Ying-Tu, et al.
Published: (2026)
Addressing Long-Tail Noisy Label Learning Problems: a Two-Stage Solution with Label Refurbishment Considering Label Rarity
by: Wu, Ying-Hsuan, et al.
Published: (2024)
PPO-Clip Attains Global Optimality: Towards Deeper Understandings of Clipping
by: Huang, Nai-Chieh, et al.
Published: (2023)
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
by: Chen, Yuhang, et al.
Published: (2024)
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
by: Luo, Qin-Wen, et al.
Published: (2025)
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
by: Hung, Yu-Heng, et al.
Published: (2025)
Cross-View Graph Consistency Learning for Invariant Graph Representations
by: Chen, Jie, et al.
Published: (2023)
DRPO: Efficient Reasoning via Decoupled Reward Policy Optimization
by: Li, Gang, et al.
Published: (2025)
Distributional Off-policy Evaluation with Bellman Residual Minimization
by: Hong, Sungee, et al.
Published: (2024)
Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee
by: Hung, Yu-Heng, et al.
Published: (2025)
Cross-Domain Policy Transfer by Representation Alignment via Multi-Domain Behavioral Cloning
by: Watahiki, Hayato, et al.
Published: (2024)
From Reward-Free Representations to Preferences: Rethinking Offline Preference-Based Reinforcement Learning
by: Yang, Jun-Jie, et al.
Published: (2026)
BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation
by: Yu, Dongjie, et al.
Published: (2024)
Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
by: Guo, Jian-Ting, et al.
Published: (2025)
Boosting Continuous Control with Consistency Policy
by: Chen, Yuhui, et al.
Published: (2023)
Efficient Action-Constrained Reinforcement Learning via Acceptance-Rejection Method and Augmented MDPs
by: Hung, Wei, et al.
Published: (2025)
Unleashing Flow Policies with Distributional Critics
by: Chen, Deshu, et al.
Published: (2025)
Large Margin Mechanism and Pseudo Query Set on Cross-Domain Few-Shot Learning
by: Yeh, Jia-Fong, et al.
Published: (2020)
Digital Twin-based Control Co-Design of Full Vehicle Active Suspensions via Deep Reinforcement Learning
by: Tsai, Ying-Kuan, et al.
Published: (2025)
Cross-Domain Offline Policy Adaptation via Selective Transition Correction
by: Yan, Mengbei, et al.
Published: (2026)
POME: Post Optimization Model Edit via Muon-style Projection
by: Liu, Yong, et al.
Published: (2025)
Markowitz Meets Bellman: Knowledge-distilled Reinforcement Learning for Portfolio Management
by: Hu, Gang, et al.
Published: (2024)
Deterministic Exploration via Stationary Bellman Error Maximization
by: Griesbach, Sebastian, et al.
Published: (2024)
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
by: Hong, Joey, et al.
Published: (2025)
Policy Adaptation via Language Optimization: Decomposing Tasks for Few-Shot Imitation
by: Myers, Vivek, et al.
Published: (2024)
CPT: Consistent Proxy Tuning for Black-box Optimization
by: He, Yuanyang, et al.
Published: (2024)
Learning an Efficient Optimizer via Hybrid-Policy Sub-Trajectory Balance
by: Guan, Yunchuan, et al.
Published: (2025)