Staff View: From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge :: Library Catalog

Bibliographic Details
Main Author:	Wu, Xiefeng
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Machine Learning
Online Access:	https://arxiv.org/abs/2410.01458

_version_	1866909333029126144
author	Wu, Xiefeng
author_facet	Wu, Xiefeng
contents	Q-shaping is an extension of Q-value initialization and serves as an alternative to reward shaping for incorporating domain knowledge to accelerate agent training, thereby improving sample efficiency by directly shaping Q-values. This approach is both general and robust across diverse tasks, allowing for immediate impact assessment while guaranteeing optimality. We evaluated Q-shaping across 20 different environments using a large language model (LLM) as the heuristic provider. The results demonstrate that Q-shaping significantly enhances sample efficiency, achieving a 16.87% improvement over the best baseline in each environment and a 253.80% improvement compared to LLM-based reward shaping methods. These findings establish Q-shaping as a superior and unbiased alternative to conventional reward shaping in reinforcement learning.
format	Preprint
id	arxiv_https___arxiv_org_abs_2410_01458
institution	arXiv
publishDate	2024
record_format	arxiv
title	From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge
topic	Artificial Intelligence; Machine Learning
url	https://arxiv.org/abs/2410.01458

Similar Items