Bibliographic Information
Main Authors: Revista, Zen, IA, 10
Format: Digital resource
Language:
Published: Zenodo 2025
Online Access: https://doi.org/10.5281/zenodo.17816138
Table of Contents:
  • Reinforcement learning (RL) agents often struggle with the exploration-exploitation dilemma, particularly in complex and dynamic environments. The SARSA algorithm, an on-policy temporal-difference control method, requires a carefully chosen exploration strategy to balance discovering new, potentially optimal actions against exploiting actions already known to be good. Fixed exploration strategies, such as a constant epsilon-greedy policy, are often suboptimal because the best exploration rate changes over time, across states, and between tasks. This paper introduces Meta-SARSA, a framework that uses meta-learning to learn adaptive, on-policy exploration strategies for SARSA agents. Meta-SARSA trains a meta-learner to predict or generate exploration parameters (e.g., epsilon for epsilon-greedy exploration or the temperature for Boltzmann exploration) from the agent's current state, experience, or task context. The SARSA agent can then adjust its exploration behavior dynamically, with the aim of improving sample efficiency, accelerating convergence, and raising overall performance across a distribution of related tasks. We detail the theoretical foundations of Meta-SARSA, propose a practical implementation that uses neural networks for the meta-learner, and discuss its potential benefits and applications in domains where adaptive decision-making is crucial. This work is a step towards more autonomous and robust RL systems capable of intelligently navigating unknown environments.
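The abstract describes SARSA with an exploration rate supplied by a separate component rather than fixed in advance. The minimal sketch below illustrates only that interface, under loudly stated assumptions: the record gives no architecture or training details, so the trained neural meta-learner is replaced here by a hand-written visit-count rule. The names `visit_based_epsilon`, `epsilon_greedy`, and `sarsa_chain`, and the toy chain environment, are hypothetical and are not taken from the paper.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a uniformly random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so an all-zero Q-table does not favour one action.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def visit_based_epsilon(state, visits):
    """Hypothetical stand-in for the meta-learner (NOT the paper's method):
    explore less in states that have been visited more often."""
    return 1.0 / (1.0 + visits[state]) ** 0.5


def sarsa_chain(n_states=5, episodes=200, alpha=0.5, gamma=0.9, seed=0,
                epsilon_fn=visit_based_epsilon):
    """On-policy SARSA on a toy chain (an assumed environment).

    Action 1 moves right, action 0 moves left; reaching the last state
    yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    visits = [0] * n_states
    for _ in range(episodes):
        s = 0
        visits[s] += 1
        # Epsilon is queried per state instead of being a global constant.
        a = epsilon_greedy(q[s], epsilon_fn(s, visits), rng)
        for _ in range(100):  # cap on episode length
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            if done:
                target = r
            else:
                visits[s2] += 1
                a2 = epsilon_greedy(q[s2], epsilon_fn(s2, visits), rng)
                # SARSA bootstraps from the action actually chosen next.
                target = r + gamma * q[s2][a2]
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s, a = s2, a2
    return q, visits


if __name__ == "__main__":
    q_table, visit_counts = sarsa_chain()
    for state, row in enumerate(q_table):
        print(state, [round(v, 3) for v in row], visit_counts[state])
```

Because `epsilon_fn` is an ordinary argument, a learned model that maps state or task features to an epsilon value could be passed in its place. How such a model would be trained across tasks is not specified in this record and is left out of the sketch.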