Bibliographic Details
Main Authors: Yan, Xue, Ou, Zijing, Yang, Mengyue, Song, Yan, Zhang, Haifeng, Li, Yingzhen, Wang, Jun
Format: Preprint
Published: 2025
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2509.26340
author Yan, Xue
Ou, Zijing
Yang, Mengyue
Song, Yan
Zhang, Haifeng
Li, Yingzhen
Wang, Jun
contents Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines LLM general prior knowledge with a compact memory of domain-specific experiences. Memory retains past interactions and associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs the LLM prior refinement. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich memory, forming a natural self-improvement framework where memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40% on in-distribution tasks and over 75% when generalized to unseen tasks in ALFWorld.
format Preprint
id arxiv_https___arxiv_org_abs_2509_26340
institution arXiv
publishDate 2025
record_format arxiv
title Memory-Driven Self-Improvement for Decision Making with Large Language Models
topic Machine Learning
url https://arxiv.org/abs/2509.26340