

Bibliographic Details
Main Authors:	Revista, Zen, IA, 10
Format:	Digital resource
Language:
Published:	Zenodo 2025
Online Access:	https://doi.org/10.5281/zenodo.17816138

Table of Contents:

Reinforcement Learning (RL) agents often struggle with the exploration-exploitation dilemma, particularly in complex and dynamic environments. The SARSA algorithm, an on-policy temporal-difference control method, requires a carefully chosen exploration strategy to balance discovering new, potentially optimal actions with exploiting known good actions. Traditional fixed exploration strategies, such as a constant epsilon-greedy policy, are often suboptimal as the optimal exploration rate changes over time, across different states, or between various tasks. This paper introduces Meta-SARSA, a novel framework that integrates meta-learning principles to learn adaptive, on-policy exploration strategies for SARSA agents. Meta-SARSA trains a meta-learner to predict or generate exploration parameters (e.g., epsilon values for epsilon-greedy exploration or temperature for Boltzmann exploration) based on the agent's current state, experience, or task context. This meta-learning approach allows the SARSA agent to dynamically adjust its exploration behavior, enhancing sample efficiency, accelerating convergence, and improving overall performance across a distribution of related tasks. We detail the theoretical foundations of Meta-SARSA, propose a practical implementation using neural networks for the meta-learner, and discuss its potential benefits and applications in various domains where adaptive decision-making is crucial. This work represents a significant step towards developing more autonomous and robust RL systems capable of intelligently navigating unknown environments.
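
The idea of state-dependent exploration described above can be illustrated with a minimal tabular sketch. It is not the paper's implementation: the abstract proposes a neural-network meta-learner, whereas the hypothetical `adaptive_epsilon` function below is a fixed, hand-written schedule that maps a state's visit count to an epsilon value. The toy chain environment, the function names, and all hyperparameters (`alpha`, `gamma`, `decay`, and so on) are assumptions chosen for illustration only.

```python
import math
import random


def adaptive_epsilon(visit_count, eps_min=0.05, eps_max=1.0, decay=0.1):
    """Hypothetical stand-in for the meta-learner (an assumption, not the
    paper's network): epsilon shrinks as a state is visited more often."""
    return eps_min + (eps_max - eps_min) * math.exp(-decay * visit_count)


def choose_action(q_row, epsilon, rng):
    """Epsilon-greedy selection with random tie-breaking among best actions."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([i for i, v in enumerate(q_row) if v == best])


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99,
                 terminal=False):
    """On-policy TD update: the target uses the action actually taken next."""
    target = r if terminal else r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])


def run_chain(episodes=200, n_states=5, max_steps=100, seed=0):
    """Toy chain (assumed environment): states 0..n_states-1, action 0 moves
    left, action 1 moves right, reward 1.0 on reaching the last state."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    visits = [0] * n_states
    for _ in range(episodes):
        s = 0
        visits[s] += 1
        a = choose_action(q[s], adaptive_epsilon(visits[s]), rng)
        for _ in range(max_steps):
            s_next = max(0, s - 1) if a == 0 else s + 1
            if s_next == n_states - 1:
                # Terminal transition: no bootstrap term in the target.
                sarsa_update(q, s, a, 1.0, s_next, 0, terminal=True)
                break
            visits[s_next] += 1
            # Exploration rate depends on the state about to be acted in.
            eps = adaptive_epsilon(visits[s_next])
            a_next = choose_action(q[s_next], eps, rng)
            sarsa_update(q, s, a, 0.0, s_next, a_next)
            s, a = s_next, a_next
    return q, visits
```

Calling `run_chain()` returns the learned Q-table and the per-state visit counts. In a setup closer to the one the abstract describes, `adaptive_epsilon` would presumably be replaced by a learned model trained across a distribution of tasks, while the SARSA update itself would stay unchanged.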
