Bibliographic Details
Main Authors: Wang, Haoyu; Ma, Guozheng; Cui, Shugang; Kong, Yilun; Luo, Haotian; Shen, Li; Gao, Mengya; Wu, Yichao; Wang, Xiaogang; Tao, Dacheng
Format: Preprint
Published: 2026
Subjects: Artificial Intelligence
Online Access: https://arxiv.org/abs/2601.21754
author Wang, Haoyu
Ma, Guozheng
Cui, Shugang
Kong, Yilun
Luo, Haotian
Shen, Li
Gao, Mengya
Wu, Yichao
Wang, Xiaogang
Tao, Dacheng
contents While Large Language Models (LLMs) excel in language-based agentic tasks, their applicability to unseen, nonlinguistic environments (e.g., symbolic or spatial tasks) remains limited. Previous work attributes this performance gap to the mismatch between the pretraining distribution and the testing distribution. In this work, we demonstrate that the primary bottleneck is the prohibitive cost of exploration: mastering these tasks requires extensive trial and error, which is computationally unsustainable for parameter-heavy LLMs operating in a high-dimensional semantic space. To address this, we propose SCOUT (Sub-Scale Collaboration On Unseen Tasks), a novel framework that decouples exploration from exploitation. We employ lightweight "scouts" (e.g., small MLPs) to probe environmental dynamics at a speed and scale far exceeding those of LLMs. The collected trajectories are used to bootstrap the LLM via Supervised Fine-Tuning (SFT), followed by multi-turn Reinforcement Learning (RL) to activate its latent world knowledge. Empirically, SCOUT enables a Qwen2.5-3B-Instruct model to achieve an average score of 0.86, significantly outperforming proprietary models, including Gemini-2.5-Pro (0.60), while reducing GPU-hour consumption by about 60%.
format Preprint
id arxiv_https___arxiv_org_abs_2601_21754
institution arXiv
publishDate 2026
record_format arxiv
title Language-based Trial and Error Falls Behind in the Era of Experience
topic Artificial Intelligence
url https://arxiv.org/abs/2601.21754