:: Library Catalog

Bibliographic Details
Main Authors:	Sinha, Amit; Geist, Matthieu; Mahajan, Aditya
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2407.06121

Similar Items

Convergence of regularized agent-state-based Q-learning in POMDPs
by: Sinha, Amit, et al.
Published: (2025)

Risk-seeking conservative policy iteration with agent-state based policies for Dec-POMDPs with guaranteed convergence
by: Sinha, Amit, et al.
Published: (2026)

Agent-state based policies in POMDPs: Beyond belief-state MDPs
by: Sinha, Amit, et al.
Published: (2024)

Multi-agent imitation learning with function approximation: Linear Markov games and beyond
by: Viano, Luca, et al.
Published: (2026)

Towards Minimax Optimality of Model-based Robust Reinforcement Learning
by: Clavier, Pierre, et al.
Published: (2023)

Dynamical-VAE-based Hindsight to Learn the Causal Dynamics of Factored-POMDPs
by: Han, Chao, et al.
Published: (2024)

Solving robust MDPs as a sequence of static RL problems
by: Zouitine, Adil, et al.
Published: (2024)

Rate optimal learning of equilibria from data
by: Freihaut, Till, et al.
Published: (2025)

Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View
by: Ghugare, Raj, et al.
Published: (2024)

Soft $Q(λ)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces
by: Mahajan, Pranav, et al.
Published: (2026)

RRLS : Robust Reinforcement Learning Suite
by: Zouitine, Adil, et al.
Published: (2024)

Time-Constrained Robust MDPs
by: Zouitine, Adil, et al.
Published: (2024)

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
by: Freihaut, Till, et al.
Published: (2025)

ShiQ: Bringing back Bellman to LLMs
by: Clavier, Pierre, et al.
Published: (2025)

Bootstrapping Expectiles in Reinforcement Learning
by: Clavier, Pierre, et al.
Published: (2024)

Leveraging Procedural Generation for Learning Autonomous Peg-in-Hole Assembly in Space
by: Orsula, Andrej, et al.
Published: (2024)

Space Robotics Bench: Robot Learning Beyond Earth
by: Orsula, Andrej, et al.
Published: (2025)

Learning Tool-Aware Adaptive Compliant Control for Autonomous Regolith Excavation
by: Orsula, Andrej, et al.
Published: (2025)

Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media
by: Orsula, Andrej, et al.
Published: (2025)

A Theoretical Justification for Asymmetric Actor-Critic Algorithms
by: Lambrechts, Gaspard, et al.
Published: (2025)

BanditQ: Fair Bandits with Guaranteed Rewards
by: Sinha, Abhishek
Published: (2023)

A Q-learning Approach for Adherence-Aware Recommendations
by: Faros, Ioannis, et al.
Published: (2023)

Memoryless Policy Iteration for Episodic POMDPs
by: van Zuijlen, Roy, et al.
Published: (2025)

Rethinking Transformers in Solving POMDPs
by: Lu, Chenhao, et al.
Published: (2024)

Self-Improving Robust Preference Optimization
by: Choi, Eugene, et al.
Published: (2024)

Understanding Likelihood Over-optimisation in Direct Alignment Algorithms
by: Shi, Zhengyan, et al.
Published: (2024)

Perception-Based Beliefs for POMDPs with Visual Observations
by: Schäfers, Miriam, et al.
Published: (2026)

Recurrent Natural Policy Gradient for POMDPs
by: Cayci, Semih, et al.
Published: (2024)

Online Planning in POMDPs with State-Requests
by: Avalos, Raphael, et al.
Published: (2024)

Deep Q-Network (DQN) multi-agent reinforcement learning (MARL) for Stock Trading
by: Tidwell, John Christopher, et al.
Published: (2025)

Population-aware Online Mirror Descent for Mean-Field Games with Common Noise by Deep Reinforcement Learning
by: Wu, Zida, et al.
Published: (2025)

Concentration of Cumulative Reward in Markov Decision Processes
by: Sayedana, Borna, et al.
Published: (2024)

Scaling Internal-State Policy-Gradient Methods for POMDPs
by: Aberdeen, Douglas, et al.
Published: (2025)

Bench-MFG: A Benchmark Suite for Learning in Stationary Mean Field Games
by: Magnino, Lorenzo, et al.
Published: (2026)

Periodic Regularized Q-Learning
by: Yang, Hyukjun, et al.
Published: (2026)

Pseudo-rigid body networks: learning interpretable deformable object dynamics from partial observations
by: Mamedov, Shamil, et al.
Published: (2023)

Posterior Sampling-based Online Learning for Episodic POMDPs
by: Tang, Dengwang, et al.
Published: (2023)

Pessimistic Iterative Planning with RNNs for Robust POMDPs
by: Galesloot, Maris F. L., et al.
Published: (2024)

Scalable Policy-Based RL Algorithms for POMDPs
by: Anjarlekar, Ameya, et al.
Published: (2025)

Approximate Control for Continuous-Time POMDPs
by: Eich, Yannick, et al.
Published: (2024)