| Main Authors: | Yan, Xue; Ou, Zijing; Yang, Mengyue; Song, Yan; Zhang, Haifeng; Li, Yingzhen; Wang, Jun |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Machine Learning |
| Online Access: | https://arxiv.org/abs/2509.26340 |
| _version_ | 1866909816756109312 |
|---|---|
| author | Yan, Xue; Ou, Zijing; Yang, Mengyue; Song, Yan; Zhang, Haifeng; Li, Yingzhen; Wang, Jun |
| contents | Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines the general prior knowledge of LLMs with a compact memory of domain-specific experiences. The memory retains past interactions and their associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs refinement of the LLM prior. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich the memory, forming a natural self-improvement framework in which memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40% on in-distribution tasks and by over 75% when generalized to unseen tasks in ALFWorld. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2509_26340 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Memory-Driven Self-Improvement for Decision Making with Large Language Models |
| topic | Machine Learning |
| url | https://arxiv.org/abs/2509.26340 |
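The abstract describes a loop in which a memory stores past interactions together with their Q-values, and those values are then used to refine the LLM's action prior. The sketch below illustrates that general idea only; it is not the authors' implementation, and every name in it (`QMemory`, `refine_policy`, `gamma`, `temperature`) is hypothetical. It assumes discrete, hashable states and actions, estimates each Q-value as the running mean of Monte Carlo discounted returns for an exact (state, action) match, and reweights a given LLM action distribution as π(a|s) ∝ prior(a|s)·exp(Q(s,a)/τ).

```python
from collections import defaultdict
import math


class QMemory:
    """Hypothetical memory mapping (state, action) pairs to running-mean Q-value estimates."""

    def __init__(self, gamma=0.9):
        self.gamma = gamma               # discount factor (assumed value)
        self.q = {}                      # (state, action) -> estimated Q-value
        self.counts = defaultdict(int)   # (state, action) -> number of stored returns

    def add_trajectory(self, trajectory):
        """Store one episode given as a list of (state, action, reward) tuples."""
        g = 0.0
        # Walk backwards so g holds the discounted return from each step onward.
        for state, action, reward in reversed(trajectory):
            g = reward + self.gamma * g
            key = (state, action)
            self.counts[key] += 1
            old = self.q.get(key, 0.0)
            # Incremental mean of all returns observed for this pair.
            self.q[key] = old + (g - old) / self.counts[key]

    def value(self, state, action):
        """Return the stored Q-value, or None if the pair was never seen."""
        return self.q.get((state, action))


def refine_policy(prior, memory, state, temperature=1.0):
    """Reweight an LLM action distribution: pi(a|s) ∝ prior(a|s) * exp(Q(s,a) / temperature).

    Actions absent from memory are treated as Q = 0, so their unnormalized
    weight equals their prior probability.
    """
    weights = {}
    for action, p in prior.items():
        q = memory.value(state, action)
        weights[action] = p * math.exp((0.0 if q is None else q) / temperature)
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}


if __name__ == "__main__":
    mem = QMemory(gamma=0.9)
    mem.add_trajectory([
        ("kitchen", "open fridge", 0.0),
        ("fridge open", "take apple", 1.0),
    ])
    llm_prior = {"open fridge": 0.5, "go to bedroom": 0.5}  # stand-in for an LLM's action probabilities
    print(refine_policy(llm_prior, mem, "kitchen"))
```

In this toy run, the action that previously led to reward gains probability mass relative to its prior. The exact-match lookup is a simplifying assumption; the abstract does not say how Q-values are retrieved or generalized to unseen states, and a similarity-based retrieval over stored interactions would be one alternative.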