| Main Authors: | Toles, Matthew; Singh, Rattandeep; Song, Isaac; Yu, Zhou |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.14079 |
Similar Items
Program Synthesis Dialog Agents for Interactive Decision-Making
by: Toles, Matthew, et al.
Published: (2025)
CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
by: Wang, Zhun, et al.
Published: (2025)
PersonaGym: Evaluating Persona Agents and LLMs
by: Samuel, Vinay, et al.
Published: (2024)
Gym-Anything: Turn any Software into an Agent Environment
by: Aggarwal, Pranjal, et al.
Published: (2026)
The BrowserGym Ecosystem for Web Agent Research
by: De Chezelles, Thibault Le Sellier, et al.
Published: (2024)
EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents
by: Ma, Sai, et al.
Published: (2026)
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
by: Xi, Zhiheng, et al.
Published: (2024)
CybORG++: An Enhanced Gym for the Development of Autonomous Cyber Agents
by: Emerson, Harry, et al.
Published: (2024)
MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents
by: Rosen, Simon, et al.
Published: (2026)
ClawGym: A Scalable Framework for Building Effective Claw Agents
by: Bai, Fei, et al.
Published: (2026)
ResearchGym: Evaluating Language Model Agents on Real-World AI Research
by: Garikaparthi, Aniketh, et al.
Published: (2026)
AgGym: An agricultural biotic stress simulation environment for ultra-precision management planning
by: Khosravi, Mahsa, et al.
Published: (2024)
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
by: Wang, Bowen, et al.
Published: (2026)
NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment
by: Mangla, Shashank, et al.
Published: (2025)
TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
by: Weitekamp, Daniel, et al.
Published: (2025)
ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?
by: Wang, Zhun, et al.
Published: (2026)
UserBench: An Interactive Gym Environment for User-Centric Agents
by: Qian, Cheng, et al.
Published: (2025)
DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
by: Tang, Yuheng, et al.
Published: (2026)
ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
by: Savadikar, Chinmay, et al.
Published: (2026)
SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce
by: Castelo, Alberto, et al.
Published: (2026)
NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds
by: Goel, Shivam, et al.
Published: (2024)
GEM: A Gym for Agentic LLMs
by: Liu, Zichen, et al.
Published: (2025)
InnoGym: Benchmarking the Innovation Potential of AI Agents
by: Zhang, Jintian, et al.
Published: (2025)
pyRDDLGym: From RDDL to Gym Environments
by: Taitler, Ayal, et al.
Published: (2022)
SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation
by: Zhang, Xichen, et al.
Published: (2026)
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
by: Shao, Yijia, et al.
Published: (2024)
AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
by: Jiayang, Cheng, et al.
Published: (2026)
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
by: Khan, Zaid, et al.
Published: (2024)
Online Dynamic Goal Recognition in Gym Environments
by: Matan, Shamir, et al.
Published: (2025)
SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents
by: Li, Han, et al.
Published: (2026)
ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins
by: Zhou, Yichen, et al.
Published: (2026)
TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents
by: Cai, Yifu, et al.
Published: (2025)
Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym
by: Kaesberg, Lars Benedikt, et al.
Published: (2026)
EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies
by: Hu, Xavier, et al.
Published: (2026)
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)
QueryGym: Step-by-Step Interaction with Relational Databases
by: Ananthakrishnan, Haritha, et al.
Published: (2025)
WorldGym: World Model as An Environment for Policy Evaluation
by: Quevedo, Julian, et al.
Published: (2025)
OceanGym: A Benchmark Environment for Underwater Embodied Agents
by: Xue, Yida, et al.
Published: (2025)
MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
by: Xu, Ran, et al.
Published: (2025)
GymPN: A Library for Decision-Making in Process Management Systems
by: Bianco, Riccardo Lo, et al.
Published: (2025)