| Main Authors: | Malloy, Tailia, Sims, Chris R., Klinger, Tim, Liu, Miao, Riemer, Matthew, Tesauro, Gerald |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2010.04646 |
Similar Items
Consolidation via Policy Information Regularization in Deep RL for Multi-Agent Games
by: Malloy, Tailia, et al.
Published: (2020)
Learning in Factored Domains with Information-Constrained Visual Representations
by: Malloy, Tailia, et al.
Published: (2023)
Learning to Defend by Attacking (and Vice-Versa): Transfer of Learning in Cybersecurity Games
by: Malloy, Tailia, et al.
Published: (2023)
Assessing Spear-Phishing Website Generation in Large Language Model Coding Agents
by: Malloy, Tailia, et al.
Published: (2026)
On-line Policy Improvement using Monte-Carlo Search
by: Tesauro, Gerald, et al.
Published: (2025)
Beyond Sliding Windows: Learning to Manage Memory in Non-Markovian Environments
by: Tasse, Geraud Nangue, et al.
Published: (2025)
The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models
by: Riemer, Matthew, et al.
Published: (2025)
Modeling Attention during Dimensional Shifts with Counterfactual and Delayed Feedback
by: Malloy, Tailia, et al.
Published: (2025)
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
by: Riemer, Matthew, et al.
Published: (2024)
What makes Models Compositional? A Theoretical View: With Supplement
by: Ram, Parikshit, et al.
Published: (2024)
Stable Asynchrony: Variance-Controlled Off-Policy RL for LLMs
by: Huang, Luke J., et al.
Published: (2026)
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in RL
by: Liu, Xiangyu, et al.
Published: (2023)
Scalable Policy-Based RL Algorithms for POMDPs
by: Anjarlekar, Ameya, et al.
Published: (2025)
Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models
by: Deproost, Senne, et al.
Published: (2026)
SAFE-RL: Saliency-Aware Counterfactual Explainer for Deep Reinforcement Learning Policies
by: Samadi, Amir, et al.
Published: (2024)
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
by: Mark, Max Sobol, et al.
Published: (2024)
The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
by: Bouneffouf, Djallel, et al.
Published: (2025)
ReSkill: Reconciling Skill Creation with Policy Optimization in Agentic RL
by: He, Zelin, et al.
Published: (2026)
Partial Policy Gradients for RL in LLMs
by: Mathur, Puneet, et al.
Published: (2026)
Categorical Policies: Multimodal Policy Learning and Exploration in Continuous Control
by: Islam, SM Mazharul, et al.
Published: (2025)
A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control
by: Choi, Wonhyeok, et al.
Published: (2026)
Ctrl-DNA: Controllable Cell-Type-Specific Regulatory DNA Design via Constrained RL
by: Chen, Xingyu, et al.
Published: (2025)
Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models
by: Abbes, Istabrak, et al.
Published: (2025)
Can We Optimize Deep RL Policy Weights as Trajectory Modeling?
by: Tang, Hongyao
Published: (2025)
ControlTraj: Controllable Trajectory Generation with Topology-Constrained Diffusion Model
by: Zhu, Yuanshao, et al.
Published: (2024)
Last-Iterate Convergence of General Parameterized Policies in Constrained MDPs
by: Mondal, Washim Uddin, et al.
Published: (2024)
Soft Policy Optimization: Online Off-Policy RL for Sequence Models
by: Cohen, Taco, et al.
Published: (2025)
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL
by: Luo, Yu, et al.
Published: (2024)
GRAM: Generalization in Deep RL with a Robust Adaptation Module
by: Queeney, James, et al.
Published: (2024)
Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
by: Fakoor, Rasool, et al.
Published: (2026)
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies
by: Lee, Haanvid, et al.
Published: (2024)
A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
by: Sancaktar, Cansu, et al.
Published: (2026)
Policy Learning for Off-Dynamics RL with Deficient Support
by: Van, Linh Le Pham, et al.
Published: (2024)
Handling Delay in Real-Time Reinforcement Learning
by: Anokhin, Ivan, et al.
Published: (2025)
Q-Guided Stein Variational Model Predictive Control via RL-informed Policy Prior
by: Cai, Shizhe, et al.
Published: (2025)
Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning
by: Deproost, Senne, et al.
Published: (2025)
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
by: Zhan, Guojian, et al.
Published: (2025)
SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts
by: Liu, Bingshuai, et al.
Published: (2025)
SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
by: Anisimov, Maksim, et al.
Published: (2026)
General Flexible $f$-divergence for Challenging Offline RL Datasets with Low Stochasticity and Diverse Behavior Policies
by: Wang, Jianxun, et al.
Published: (2026)