Library Catalog

Bibliographic Details
Main Authors:	Toles, Matthew; Singh, Rattandeep; Song, Isaac; Yu, Zhou
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence
Online Access:	https://arxiv.org/abs/2506.14079

Similar Items

Program Synthesis Dialog Agents for Interactive Decision-Making
by: Toles, Matthew, et al.
Published: (2025)

CyberGym: Evaluating AI Agents' Real-World Cybersecurity Capabilities at Scale
by: Wang, Zhun, et al.
Published: (2025)

PersonaGym: Evaluating Persona Agents and LLMs
by: Samuel, Vinay, et al.
Published: (2024)

Gym-Anything: Turn any Software into an Agent Environment
by: Aggarwal, Pranjal, et al.
Published: (2026)

The BrowserGym Ecosystem for Web Agent Research
by: De Chezelles, Thibault Le Sellier, et al.
Published: (2024)

EO-Gym: A Multimodal, Interactive Environment for Earth Observation Agents
by: Ma, Sai, et al.
Published: (2026)

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
by: Xi, Zhiheng, et al.
Published: (2024)

CybORG++: An Enhanced Gym for the Development of Autonomous Cyber Agents
by: Emerson, Harry, et al.
Published: (2024)

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents
by: Rosen, Simon, et al.
Published: (2026)

ClawGym: A Scalable Framework for Building Effective Claw Agents
by: Bai, Fei, et al.
Published: (2026)

ResearchGym: Evaluating Language Model Agents on Real-World AI Research
by: Garikaparthi, Aniketh, et al.
Published: (2026)

AgGym: An agricultural biotic stress simulation environment for ultra-precision management planning
by: Khosravi, Mahsa, et al.
Published: (2024)

CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents
by: Wang, Bowen, et al.
Published: (2026)

NegotiationGym: Self-Optimizing Agents in a Multi-Agent Social Simulation Environment
by: Mangla, Shashank, et al.
Published: (2025)

TutorGym: A Testbed for Evaluating AI Agents as Tutors and Students
by: Weitekamp, Daniel, et al.
Published: (2025)

ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?
by: Wang, Zhun, et al.
Published: (2026)

UserBench: An Interactive Gym Environment for User-Centric Agents
by: Qian, Cheng, et al.
Published: (2025)

DevOps-Gym: Benchmarking AI Agents in Software DevOps Cycle
by: Tang, Yuheng, et al.
Published: (2026)

ShopGym: An Integrated Framework for Realistic Simulation and Scalable Benchmarking of E-Commerce Web Agents
by: Savadikar, Chinmay, et al.
Published: (2026)

SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce
by: Castelo, Alberto, et al.
Published: (2026)

NovelGym: A Flexible Ecosystem for Hybrid Planning and Learning Agents Designed for Open Worlds
by: Goel, Shivam, et al.
Published: (2024)

GEM: A Gym for Agentic LLMs
by: Liu, Zichen, et al.
Published: (2025)

InnoGym: Benchmarking the Innovation Potential of AI Agents
by: Zhang, Jintian, et al.
Published: (2025)

pyRDDLGym: From RDDL to Gym Environments
by: Taitler, Ayal, et al.
Published: (2022)

SearchGym: Bootstrapping Real-World Search Agents via Cost-Effective and High-Fidelity Environment Simulation
by: Zhang, Xichen, et al.
Published: (2026)

Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration
by: Shao, Yijia, et al.
Published: (2024)

AMemGym: Interactive Memory Benchmarking for Assistants in Long-Horizon Conversations
by: Jiayang, Cheng, et al.
Published: (2026)

DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
by: Khan, Zaid, et al.
Published: (2024)

Online Dynamic Goal Recognition in Gym Environments
by: Matan, Shamir, et al.
Published: (2025)

SimGym: A Framework for A/B Test Simulation in E-Commerce with Traffic-Grounded VLM Agents
by: Li, Han, et al.
Published: (2026)

ViroGym: Realistic Large-Scale Benchmarks for Evaluating Viral Proteins
by: Zhou, Yichen, et al.
Published: (2026)

TimeSeriesGym: A Scalable Benchmark for (Time Series) Machine Learning Engineering Agents
by: Cai, Yifu, et al.
Published: (2025)

Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym
by: Kaesberg, Lars Benedikt, et al.
Published: (2026)

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies
by: Hu, Xavier, et al.
Published: (2026)

AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
by: Xi, Zhiheng, et al.
Published: (2025)

QueryGym: Step-by-Step Interaction with Relational Databases
by: Ananthakrishnan, Haritha, et al.
Published: (2025)

WorldGym: World Model as An Environment for Policy Evaluation
by: Quevedo, Julian, et al.
Published: (2025)

OceanGym: A Benchmark Environment for Underwater Embodied Agents
by: Xue, Yida, et al.
Published: (2025)

MedAgentGym: A Scalable Agentic Training Environment for Code-Centric Reasoning in Biomedical Data Science
by: Xu, Ran, et al.
Published: (2025)

GymPN: A Library for Decision-Making in Process Management Systems
by: Bianco, Riccardo Lo, et al.
Published: (2025)