Bibliographic Details
Main Author: Wu, Xiefeng
Format: Preprint
Published: 2024
Subjects: Artificial Intelligence; Machine Learning
Online Access: https://arxiv.org/abs/2410.01458
_version_ 1866909333029126144
author Wu, Xiefeng
author_facet Wu, Xiefeng
contents Q-shaping is an extension of Q-value initialization and serves as an alternative to reward shaping for incorporating domain knowledge to accelerate agent training, thereby improving sample efficiency by directly shaping Q-values. This approach is both general and robust across diverse tasks, allowing for immediate impact assessment while guaranteeing optimality. We evaluated Q-shaping across 20 different environments using a large language model (LLM) as the heuristic provider. The results demonstrate that Q-shaping significantly enhances sample efficiency, achieving a 16.87% improvement over the best baseline in each environment and a 253.80% improvement compared to LLM-based reward shaping methods. These findings establish Q-shaping as a superior and unbiased alternative to conventional reward shaping in reinforcement learning.
format Preprint
id arxiv_https___arxiv_org_abs_2410_01458
institution arXiv
publishDate 2024
record_format arxiv
spellingShingle From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge
Wu, Xiefeng
Artificial Intelligence
Machine Learning
title From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge
topic Artificial Intelligence
Machine Learning
url https://arxiv.org/abs/2410.01458