:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Zequn; Marrero, Wesley J.
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; 68T05; 90C40; 93E35
Online Access:	https://arxiv.org/abs/2604.04334

Similar Items

KFCPO: Kronecker-Factored Approximated Constrained Policy Optimization
by: Lim, Joonyoung, et al.
Published: (2025)

Thompson Sampling for Infinite-Horizon Discounted Decision Processes
by: Adelman, Daniel, et al.
Published: (2024)

A Robust Model-Based Approach for Continuous-Time Policy Evaluation with Unknown Lévy Process Dynamics
by: Ye, Qihao, et al.
Published: (2025)

Communicating Plans, Not Percepts: Scalable Multi-Agent Coordination with Embodied World Models
by: Hill, Brennen A., et al.
Published: (2025)

Integrating Vision Foundation Models with Reinforcement Learning for Enhanced Object Interaction
by: Farooq, Ahmad, et al.
Published: (2025)

Monotone and Conservative Policy Iteration Beyond the Tabular Case
by: Eshwar, S. R., et al.
Published: (2025)

Adaptive Reward Design for Reinforcement Learning
by: Kwon, Minjae, et al.
Published: (2024)

Autonomous AI Agents for Real-Time Affordable Housing Site Selection: Multi-Objective Reinforcement Learning Under Regulatory Constraints
by: Imanov, Olaf Yunus Laitinen, et al.
Published: (2026)

Bayesian Conservative Policy Optimization (BCPO): A Novel Uncertainty-Calibrated Offline Reinforcement Learning with Credible Lower Bounds
by: Chatterjee, Debashis
Published: (2026)

Convex Regularization and Convergence of Policy Gradient Flows under Safety Constraints
by: Malo, Pekka, et al.
Published: (2024)

Deep neural networks can provably solve Bellman equations for Markov decision processes without the curse of dimensionality
by: Jentzen, Arnulf, et al.
Published: (2025)

Beyond Bellman: High-Order Generator Regression for Continuous-Time Policy Evaluation
by: Zheng, Yaowei, et al.
Published: (2026)

Reinforcement Learning Methods for the Stochastic Optimal Control of an Industrial Power-to-Heat System
by: Pilling, Eric, et al.
Published: (2024)

PAC-MCTS: Bias-Aware Pruning for Robust LLM-Guided Search and Planning
by: Qian, Tianhao
Published: (2026)

Machine Learning Algorithms for Improving Black Box Optimization Solvers
by: Kimiaei, Morteza, et al.
Published: (2025)

On Policy Evaluation Algorithms in Distributional Reinforcement Learning
by: Gerstenberg, Julian, et al.
Published: (2024)

Stability and Sensitivity Analysis of Relative Temporal-Difference Learning: Extended Version
by: Sakha, Masoud S., et al.
Published: (2026)

Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs
by: Gupta, Abhishek, et al.
Published: (2026)

Policy stability and ultimate stationarity in discounted risk-sensitive stochastic control
by: Bäuerle, Nicole, et al.
Published: (2026)

Blackwell optimality and policy stability for long-run risk sensitive stochastic control
by: Bäuerle, Nicole, et al.
Published: (2024)

Optimistic Training and Convergence of Q-Learning -- Extended Version
by: Mehta, Prashant, et al.
Published: (2026)

Mean-Variance Portfolio Selection by Continuous-Time Reinforcement Learning: Algorithms, Regret Analysis, and Empirical Study
by: Huang, Yilie, et al.
Published: (2024)

Neural Actor-Critic Methods for Hamilton-Jacobi-Bellman PDEs: Asymptotic Analysis and Numerical Studies
by: Cohen, Samuel N., et al.
Published: (2025)

Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty
by: Jia, Yanwei
Published: (2024)

Modeling Vehicle-Type-Specific Pedestrian Crash Avoidance Behavior in Safety-Critical Interactions Using Smooth-Mamba Deep Reinforcement Learning
by: Pu, Qingwen, et al.
Published: (2026)

Convergence proofs and strong error bounds for forward-backward stochastic differential equations using neural network simulations
by: Sheridan-Methven, Oliver
Published: (2024)

Asynchronous Stochastic Approximation with Applications to Average-Reward Reinforcement Learning
by: Yu, Huizhen, et al.
Published: (2024)

Generative World Models of Tasks: LLM-Driven Hierarchical Scaffolding for Embodied Agents
by: Hill, Brennen
Published: (2025)

A Fisher-Rao gradient flow for entropy-regularised Markov decision processes in Polish spaces
by: Kerimkulov, Bekzhan, et al.
Published: (2023)

An Optimal-Control Approach to Infinite-Horizon Restless Bandits: Achieving Asymptotic Optimality with Minimal Assumptions
by: YAN, Chen
Published: (2024)

Measurement-Driven Early Warning of Reliability Breakdown in 5G NSA Railway Networks
by: Chou, Po-Heng, et al.
Published: (2025)

SALE-Based Offline Reinforcement Learning with Ensemble Q-Networks
by: Chun, Zheng
Published: (2025)

Existence of bounded solutions to multiplicative Poisson equations under mixing property
by: Pitera, Marcin, et al.
Published: (2023)

A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays
by: Yu, Huizhen, et al.
Published: (2023)

Non-Expansive Mappings in Two-Time-Scale Stochastic Approximation: Finite-Time Analysis
by: Chandak, Siddharth
Published: (2025)

Securing Radiation Detection Systems with an Efficient TinyML-Based IDS for Edge Devices
by: Pizarro, Einstein Rivas, et al.
Published: (2025)

An Efficient Intrusion Detection System for Safeguarding Radiation Detection Systems
by: Coolidge, Nathanael, et al.
Published: (2025)

Probabilistic Approach to Black-Box Binary Optimization with Budget Constraints: Application to Sensor Placement
by: Attia, Ahmed
Published: (2024)

Two-Steps Diffusion Policy for Robotic Manipulation via Genetic Denoising
by: Clemente, Mateo, et al.
Published: (2025)

From Sparse to Dense: Toddler-inspired Reward Transition in Goal-Oriented Reinforcement Learning
by: Park, Junseok, et al.
Published: (2025)