| Main Author: | Wu, Xiefeng |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | Artificial Intelligence; Machine Learning |
| Online Access: | https://arxiv.org/abs/2410.01458 |
| _version_ | 1866909333029126144 |
|---|---|
| author | Wu, Xiefeng |
| author_facet | Wu, Xiefeng |
| contents | Q-shaping is an extension of Q-value initialization and serves as an alternative to reward shaping for incorporating domain knowledge to accelerate agent training, thereby improving sample efficiency by directly shaping Q-values. This approach is both general and robust across diverse tasks, allowing for immediate impact assessment while guaranteeing optimality. We evaluated Q-shaping across 20 different environments using a large language model (LLM) as the heuristic provider. The results demonstrate that Q-shaping significantly enhances sample efficiency, achieving a 16.87% improvement over the best baseline in each environment and a 253.80% improvement compared to LLM-based reward shaping methods. These findings establish Q-shaping as a superior and unbiased alternative to conventional reward shaping in reinforcement learning. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2410_01458 |
| institution | arXiv |
| publishDate | 2024 |
| record_format | arxiv |
| title | From Reward Shaping to Q-Shaping: Achieving Unbiased Learning with LLM-Guided Knowledge |
| topic | Artificial Intelligence; Machine Learning |
| url | https://arxiv.org/abs/2410.01458 |