| Main Authors: | Malloy, Tailia, Klinger, Tim, Liu, Miao, Riemer, Matthew, Tesauro, Gerald, Sims, Chris R. |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2011.11517 |
Similar Items
Deep RL With Information Constrained Policies: Generalization in Continuous Control
by: Malloy, Tailia, et al.
Published: (2020)
Learning in Factored Domains with Information-Constrained Visual Representations
by: Malloy, Tailia, et al.
Published: (2023)
Learning to Defend by Attacking (and Vice-Versa): Transfer of Learning in Cybersecurity Games
by: Malloy, Tailia, et al.
Published: (2023)
Assessing Spear-Phishing Website Generation in Large Language Model Coding Agents
by: Malloy, Tailia, et al.
Published: (2026)
On-line Policy Improvement using Monte-Carlo Search
by: Tesauro, Gerald, et al.
Published: (2025)
Beyond Sliding Windows: Learning to Manage Memory in Non-Markovian Environments
by: Tasse, Geraud Nangue, et al.
Published: (2025)
Modeling Attention during Dimensional Shifts with Counterfactual and Delayed Feedback
by: Malloy, Tailia, et al.
Published: (2025)
Zero Knowledge Games
by: Malloy, Ian
Published: (2020)
The Effectiveness of Approximate Regularized Replay for Efficient Supervised Fine-Tuning of Large Language Models
by: Riemer, Matthew, et al.
Published: (2025)
Training Users Against Human and GPT-4 Generated Social Engineering Attacks
by: Malloy, Tailia, et al.
Published: (2025)
Leveraging a Cognitive Model to Measure Subjective Similarity of Human and GPT-4 Written Content
by: Malloy, Tailia, et al.
Published: (2024)
The Ultimate Test of Superintelligent AI Agents: Can an AI Balance Care and Control in Asymmetric Relationships?
by: Bouneffouf, Djallel, et al.
Published: (2025)
Exploring Information Seeking Agent Consolidation
by: Yan, Guochen, et al.
Published: (2026)
Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration
by: Zhao, Yang, et al.
Published: (2026)
SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent
by: Cao, Shiyi, et al.
Published: (2025)
One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
by: Hong, Yoosung
Published: (2026)
Multi-Agent Path Finding via Offline RL and LLM Collaboration
by: Atasever, Merve, et al.
Published: (2025)
Safe Heterogeneous Multi-Agent RL with Communication Regularization for Coordinated Target Acquisition
by: Calzolari, Gabriele, et al.
Published: (2026)
O-Researcher: An Open Ended Deep Research Model via Multi-Agent Distillation and Agentic RL
by: Yao, Yi, et al.
Published: (2026)
The Role of Deep Learning Regularizations on Actors in Offline RL
by: Tarasov, Denis, et al.
Published: (2024)
Position: Theory of Mind Benchmarks are Broken for Large Language Models
by: Riemer, Matthew, et al.
Published: (2024)
ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
by: Zhang, Hao, et al.
Published: (2026)
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
by: Rutherford, Alexander, et al.
Published: (2023)
Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization
by: Li, Simin, et al.
Published: (2023)
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
by: Li, Weizhen, et al.
Published: (2025)
Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents
by: Haluska, Radovan, et al.
Published: (2024)
Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers
by: Xin, Ran, et al.
Published: (2025)
Generalizing Consistency Policy to Visual RL with Prioritized Proximal Experience Regularization
by: Li, Haoran, et al.
Published: (2024)
Enabling Realtime Reinforcement Learning at Scale with Staggered Asynchronous Inference
by: Riemer, Matthew, et al.
Published: (2024)
A Neuro-Symbolic Approach to Multi-Agent RL for Interpretability and Probabilistic Decision Making
by: Subramanian, Chitra, et al.
Published: (2024)
What makes Models Compositional? A Theoretical View: With Supplement
by: Ram, Parikshit, et al.
Published: (2024)
Cooperative Game-Theoretic Credit Assignment for Multi-Agent Policy Gradients via the Core
by: Ji, Mengda, et al.
Published: (2025)
Style-Preserving Policy Optimization for Game Agents
by: Li, Lingfeng, et al.
Published: (2025)
Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL
by: Huang, Jiawei, et al.
Published: (2024)
A Sentiment Consolidation Framework for Meta-Review Generation
by: Li, Miao, et al.
Published: (2024)
Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality
by: Ged, François, et al.
Published: (2023)
DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
by: Xie, Yaqi, et al.
Published: (2026)
When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
by: Zeng, Yifan, et al.
Published: (2026)
MCPO: Mastery-Consolidated Policy Optimization for Large Reasoning Models
by: Liao, Zhaokang, et al.
Published: (2026)
MarsRL: Advancing Multi-Agent Reasoning System via Reinforcement Learning with Agentic Pipeline Parallelism
by: Liu, Shulin, et al.
Published: (2025)