Staff View: Memory-Driven Self-Improvement for Decision Making with Large Language Models :: Library Catalog

Bibliographic Details
Main Authors:	Yan, Xue; Ou, Zijing; Yang, Mengyue; Song, Yan; Zhang, Haifeng; Li, Yingzhen; Wang, Jun
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2509.26340

_version_	1866909816756109312
author	Yan, Xue; Ou, Zijing; Yang, Mengyue; Song, Yan; Zhang, Haifeng; Li, Yingzhen; Wang, Jun
author_facet	Yan, Xue; Ou, Zijing; Yang, Mengyue; Song, Yan; Zhang, Haifeng; Li, Yingzhen; Wang, Jun
contents	Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad but general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to adapt LLMs efficiently to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines the general prior knowledge of LLMs with a compact memory of domain-specific experiences. The memory retains past interactions and their associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs refinement of the LLM prior. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich the memory, forming a natural self-improvement loop in which memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40% on in-distribution tasks and by over 75% when generalizing to unseen tasks in ALFWorld.
format	Preprint
id	arxiv_https___arxiv_org_abs_2509_26340
institution	arXiv
publishDate	2025
record_format	arxiv
title	Memory-Driven Self-Improvement for Decision Making with Large Language Models
topic	Machine Learning
url	https://arxiv.org/abs/2509.26340
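The abstract describes a loop in which a memory of past interactions and their Q-values improves value estimates and refines the LLM's action prior. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. The memory here is a hypothetical tabular store of discounted Monte Carlo returns keyed by exact (state, action) pairs, and "refinement" is modeled by reweighting a given prior distribution over actions by exp(Q / temperature), a common prior-times-exponentiated-value scheme chosen only for illustration. The names `EpisodicMemory` and `refined_policy` and the `gamma` and `temperature` parameters are hypothetical. A real system would obtain the prior from an LLM and would likely retrieve similar, rather than identical, states.

```python
import math
from collections import defaultdict


class EpisodicMemory:
    """Hypothetical tabular memory mapping (state, action) -> running Q estimate."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma              # discount factor (assumed value)
        self.q = {}                     # (state, action) -> mean observed return
        self.counts = defaultdict(int)  # (state, action) -> number of visits

    def add_trajectory(self, trajectory):
        """Store a trajectory given as a list of (state, action, reward) tuples.

        Discounted returns are computed backwards from the final step, and each
        (state, action) Q estimate is updated as an incremental mean of returns.
        """
        g = 0.0
        for state, action, reward in reversed(trajectory):
            g = reward + self.gamma * g
            key = (state, action)
            self.counts[key] += 1
            old = self.q.get(key, 0.0)
            self.q[key] = old + (g - old) / self.counts[key]

    def value(self, state, action, default=0.0):
        """Return the stored Q estimate, or `default` for unseen pairs."""
        return self.q.get((state, action), default)


def refined_policy(prior, memory, state, temperature=1.0):
    """Reweight a prior over actions by exp(Q / temperature) and renormalize.

    `prior` is a dict mapping action -> probability (standing in for an LLM prior).
    """
    weights = {
        action: p * math.exp(memory.value(state, action) / temperature)
        for action, p in prior.items()
    }
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}
```

For example, with a uniform prior over "open fridge" and "go to table" and a single stored trajectory in which "open fridge" in state "kitchen" earned reward 1.0, `refined_policy` shifts probability toward "open fridge" (about 0.73 versus 0.27 at temperature 1). Lower temperatures trust the memory more; higher temperatures stay closer to the prior.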