Bibliographic Details
Main Authors: Li, Yuetai; Inan, Huseyin A; Yue, Xiang; Chen, Wei-Ning; Wutschitz, Lukas; Kulkarni, Janardhan; Poovendran, Radha; Sim, Robert; Rajmohan, Saravan
Format: Preprint
Published: 2025
Subjects: Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2511.01824
_version_ 1866912685811040256
author Li, Yuetai
Inan, Huseyin A
Yue, Xiang
Chen, Wei-Ning
Wutschitz, Lukas
Kulkarni, Janardhan
Poovendran, Radha
Sim, Robert
Rajmohan, Saravan
contents LLM agents excel in compact environments requiring deep reasoning but remain brittle when operating in broader, more complex contexts that demand robustness across diverse tools and schemas. Building bespoke environments for training is heavy, brittle, and limits progress. In this paper, we demonstrate that LLMs can simulate realistic environment feedback without access to actual testbed data or APIs. Inspired by this capability, we propose two frameworks: Simia-SFT, a pipeline that synthesizes SFT data by amplifying small seed sets into diverse trajectories in an environment-agnostic manner, and Simia-RL, a framework that enables RL training without real environment implementations through LLM-simulated feedback. Fine-tuning open models yields consistent improvements across multiple benchmarks, surpassing GPT-4o and approaching o4-mini on $τ^2$-Bench. Together, Simia-SFT and Simia-RL enable scalable agent training without environment engineering, replacing heavy and brittle implementations with flexible LLM-based simulation.
format Preprint
id arxiv_https___arxiv_org_abs_2511_01824
institution arXiv
publishDate 2025
record_format arxiv
title Simulating Environments with Reasoning Models for Agent Training
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2511.01824