Bibliographic Details
Main Authors: Yin, Shuyu, Wen, Fei, Liu, Peilin, Luo, Tao
Format: Preprint
Published: 2024
Subjects: Machine Learning, Artificial Intelligence
Online Access: https://arxiv.org/abs/2406.08148
_version_ 1866911914360045568
author Yin, Shuyu
Wen, Fei
Liu, Peilin
Luo, Tao
contents Semi-gradient Q-learning is applied in many fields, but because it lacks an explicit loss function, studying its dynamics and implicit bias in the parameter space is challenging. This paper introduces the Fokker--Planck equation and employs partial data obtained through sampling to construct and visualize the effective loss landscape within a two-dimensional parameter space. This visualization reveals how the global minima in the loss landscape can transform into saddle points in the effective loss landscape, as well as the implicit bias of the semi-gradient method. Additionally, we demonstrate that saddle points originating from the global minima in the loss landscape still exist in the effective loss landscape in high-dimensional parameter spaces and neural network settings. This paper develops a novel approach for probing implicit bias in semi-gradient Q-learning.
format Preprint
id arxiv_https___arxiv_org_abs_2406_08148
institution arXiv
publishDate 2024
record_format arxiv
title Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker--Planck Equation
topic Machine Learning
Artificial Intelligence
url https://arxiv.org/abs/2406.08148