| Main authors: | Maran, Davide; Metelli, Alberto Maria; Papini, Matteo; Restelli, Marcello |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online access: | https://arxiv.org/abs/2405.06363 |
Similar items
No-Regret Reinforcement Learning in Smooth MDPs
by: Maran, Davide, et al.
Published: (2024)
Local Linearity: the Key for No-regret Reinforcement Learning in Continuous MDPs
by: Maran, Davide, et al.
Published: (2024)
Statistical Analysis of Policy Space Compression Problem
by: Molaei, Majid, et al.
Published: (2024)
Finite Sample Bounds for Non-Parametric Regression: Optimal Sample Efficiency and Space Complexity
by: Maran, Davide, et al.
Published: (2024)
Inverse Reinforcement Learning with Sub-optimal Experts
by: Poiani, Riccardo, et al.
Published: (2024)
Policy Gradient with Active Importance Sampling
by: Papini, Matteo, et al.
Published: (2024)
How Log-Barrier Helps Exploration in Policy Optimization
by: Cesani, Leonardo, et al.
Published: (2026)
Actor-Critic with Active Importance Sampling
by: Molaei, Majid, et al.
Published: (2026)
Gym4ReaL: A Suite for Benchmarking Real-World Reinforcement Learning
by: Salaorni, Davide, et al.
Published: (2025)
Parameterized Projected Bellman Operator
by: Vincent, Théo, et al.
Published: (2023)
Online Market Making and the Value of Observing the Order Book
by: Maran, Davide, et al.
Published: (2026)
Learning in Markov Decision Processes with Exogenous Dynamics
by: Maran, Davide, et al.
Published: (2026)
From Parameters to Behaviors: Unsupervised Compression of the Policy Space
by: Tenedini, Davide, et al.
Published: (2025)
Towards Principled Unsupervised Multi-Agent Reinforcement Learning
by: Zamboni, Riccardo, et al.
Published: (2025)
Building surrogate models using trajectories of agents trained by Reinforcement Learning
by: Cestero, Julen, et al.
Published: (2025)
Generalizing Behavior via Inverse Reinforcement Learning with Closed-Form Reward Centroids
by: Lazzati, Filippo, et al.
Published: (2025)
Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting
by: Russo, Alessio, et al.
Published: (2024)
Autoregressive Bandits
by: Bacchiocchi, Francesco, et al.
Published: (2022)
Interpretable Target-Feature Aggregation for Multi-Task Learning based on Bias-Variance Analysis
by: Bonetti, Paolo, et al.
Published: (2024)
A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning
by: Drappo, Gianluca, et al.
Published: (2024)
Achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ Regret in Average-Reward POMDPs with Known Observation Models
by: Russo, Alessio, et al.
Published: (2025)
Pure Exploration under Mediators' Feedback
by: Poiani, Riccardo, et al.
Published: (2023)
Learning Optimal Deterministic Policies with Stochastic Policy Gradients
by: Montenegro, Alessandro, et al.
Published: (2024)
Unsupervised Behavioral Compression: Learning Low-Dimensional Policy Manifolds through State-Occupancy Matching
by: Fraschini, Andrea, et al.
Published: (2026)
Imitation Learning as Return Distribution Matching
by: Lazzati, Filippo, et al.
Published: (2025)
Power Grid Control with Graph-Based Distributed Reinforcement Learning
by: Fabrizio, Carlo, et al.
Published: (2025)
Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
by: Montenegro, Alessandro, et al.
Published: (2024)
Optimal Multi-Fidelity Best-Arm Identification
by: Poiani, Riccardo, et al.
Published: (2024)
Optimizing Energy Management of Smart Grid using Reinforcement Learning aided by Surrogate models built using Physics-informed Neural Networks
by: Cestero, Julen, et al.
Published: (2025)
State and Action Factorization in Power Grids
by: Losapio, Gianvito, et al.
Published: (2024)
Low-Rank MDPs with Continuous Action Spaces
by: Bennett, Andrew, et al.
Published: (2023)
Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach
by: Poiani, Riccardo, et al.
Published: (2024)
Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions
by: Mhammedi, Zakaria
Published: (2024)
How to Explore with Belief: State Entropy Maximization in POMDPs
by: Zamboni, Riccardo, et al.
Published: (2024)
A Reinforcement Learning Approach for Optimal Control in Microgrids
by: Salaorni, Davide, et al.
Published: (2025)
Do It for HER: First-Order Temporal Logic Reward Specification in Reinforcement Learning (Extended Version)
by: Olivieri, Pierriccardo, et al.
Published: (2026)
Information Capacity Regret Bounds for Bandits with Mediator Feedback
by: Eldowa, Khaled, et al.
Published: (2024)
Bridging Rested and Restless Bandits with Graph-Triggering: Rising and Rotting
by: Genalti, Gianmarco, et al.
Published: (2024)
Best Arm Identification for Stochastic Rising Bandits
by: Mussi, Marco, et al.
Published: (2023)
$(ε, u)$-Adaptive Regret Minimization in Heavy-Tailed Bandits
by: Genalti, Gianmarco, et al.
Published: (2023)