Staff View: Representation Convergence: Mutual Distillation is Secretly a Form of Regularization :: Library Catalog

Bibliographic Details
Main Authors:	Xie, Zhengpeng; Cao, Jiahang; Wang, Changwei; Yang, Fan; Hutter, Marco; Zhang, Qiang; Zhang, Jianxiong; Xu, Renjing
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2501.02481

_version_	1866914053155192832
author	Xie, Zhengpeng; Cao, Jiahang; Wang, Changwei; Yang, Fan; Hutter, Marco; Zhang, Qiang; Zhang, Jianxiong; Xu, Renjing
author_facet	Xie, Zhengpeng; Cao, Jiahang; Wang, Changwei; Yang, Fan; Hutter, Marco; Zhang, Qiang; Zhang, Jianxiong; Xu, Renjing
contents	In this paper, we argue that mutual distillation between reinforcement learning policies serves as an implicit regularization, preventing them from overfitting to irrelevant features. We highlight two separate contributions: (i) Theoretically, for the first time, we prove that enhancing the policy robustness to irrelevant features leads to improved generalization performance. (ii) Empirically, we demonstrate that mutual distillation between policies contributes to such robustness, enabling the spontaneous emergence of invariant representations over pixel inputs. Ultimately, we do not claim to achieve state-of-the-art performance but rather focus on uncovering the underlying principles of generalization and deepening our understanding of its mechanisms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_02481
institution	arXiv
publishDate	2025
record_format	arxiv
title	Representation Convergence: Mutual Distillation is Secretly a Form of Regularization
topic	Machine Learning; Artificial Intelligence
url	https://arxiv.org/abs/2501.02481

Similar Items