:: Library Catalog

Bibliographic Details
Main Authors:	Malloy, Tailia; Sims, Chris R.; Klinger, Tim; Liu, Miao; Riemer, Matthew; Tesauro, Gerald
Format:	Preprint
Published:	2020
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2010.04646

Similar Items

Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games
by: Malloy, Tailia, et al.
Published: (2020)

Learning in Factored Domains with Information-Constrained Visual Representations
by: Malloy, Tailia, et al.
Published: (2023)

Learning to Defend by Attacking (and Vice-Versa): Transfer of Learning in Cybersecurity Games
by: Malloy, Tailia, et al.
Published: (2023)

Assessing Spear-Phishing Website Generation in Large Language Model Coding Agents
by: Malloy, Tailia, et al.
Published: (2026)

On-line Policy Improvement using Monte-Carlo Search
by: Tesauro, Gerald, et al.
Published: (2025)

Beyond Sliding Windows: Learning to Manage Memory in Non-Markovian Environments
by: Tasse, Geraud Nangue, et al.
Published: (2025)

The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models
by: Riemer, Matthew, et al.
Published: (2025)

Modeling Attention during Dimensional Shifts with Counterfactual and Delayed Feedback
by: Malloy, Tailia, et al.
Published: (2025)

Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
by: Riemer, Matthew, et al.
Published: (2024)

What makes Models Compositional? A Theoretical View: With Supplement
by: Ram, Parikshit, et al.
Published: (2024)

Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)

Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL
by: Liu, Xiangyu, et al.
Published: (2023)

Scalable Policy-Based RL Algorithms for POMDPs
by: Anjarlekar, Ameya, et al.
Published: (2025)

Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models
by: Deproost, Senne, et al.
Published: (2026)

SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
by: Samadi, Amir, et al.
Published: (2024)

Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)

The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
by: Bouneffouf, Djallel, et al.
Published: (2025)

ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
by: He, Zelin, et al.
Published: (2026)

Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)

Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control
by: Islam, SM Mazharul, et al.
Published: (2025)

A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)

Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
by: Chen, Xingyu, et al.
Published: (2025)

Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
by: Abbes, Istabrak, et al.
Published: (2025)

Can We Optimize Deep RL Policy Weights as Trajectory Modeling?
by: Tang, Hongyao
Published: (2025)

ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model
by: Zhu, Yuanshao, et al.
Published: (2024)

Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
by: Mondal, Washim Uddin, et al.
Published: (2024)

Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)

GRAM: Generalization in Deep RL with a Robust Adaptation Module
by: Queeney, James, et al.
Published: (2024)

Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)

Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
by: Lee, Haanvid, et al.
Published: (2024)

A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
by: Sancaktar, Cansu, et al.
Published: (2026)

Policy Learning for Off-Dynamics RL with Deficient Support
by: Van, Linh Le Pham, et al.
Published: (2024)

Handling Delay in Real-Time Reinforcement Learning
by: Anokhin, Ivan, et al.
Published: (2025)

Q-Guided Stein Variational Model Predictive Control via RL-informed Policy Prior
by: Cai, Shizhe, et al.
Published: (2025)

Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning
by: Deproost, Senne, et al.
Published: (2025)

Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
by: Zhan, Guojian, et al.
Published: (2025)

SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)

SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
by: Anisimov, Maksim, et al.
Published: (2026)

General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies
by: Wang, Jianxun, et al.
Published: (2026)