Bibliographic Details
Main Authors: Zhang, Zao, Chen, Huaming, Ning, Pei, Yang, Nan, Yuan, Dong
Format: Preprint
Published: 2024
Subjects: Machine Learning
Online Access: https://arxiv.org/abs/2410.14741
author Zhang, Zao
Chen, Huaming
Ning, Pei
Yang, Nan
Yuan, Dong
contents In knowledge distillation, research has focused primarily on transforming and balancing multiple distillation components. In this work, we emphasize the importance of examining each distillation component individually, as we observe that not all of them are equally important. From this perspective, we decouple the Kullback-Leibler (KL) divergence into three distinct elements: Binary Classification Divergence (BCD), Strong Correlation Divergence (SCD), and Weak Correlation Divergence (WCD), each of which has a different degree of influence. Leveraging these insights, we present the Correlation-Aware Knowledge Distillation (CAKD) framework. CAKD prioritizes the distillation components that most strongly influence predictions, thereby optimizing knowledge transfer from teacher to student models. Our experiments show that adjusting the effect of each element improves the effectiveness of knowledge transfer, and that CAKD consistently outperforms the baseline across diverse models and datasets. Our work further highlights the importance and effectiveness of closely examining the impact of the different parts of the distillation process.
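The abstract names BCD, SCD and WCD but does not define them. As a rough illustration only, the Python sketch that follows shows one way a softmax KL divergence can be split exactly into three additive parts. Everything specific here is an assumption, not the authors' formulation: the binary term compares the target-class probability against the total non-target probability, and the remaining non-target term is divided between a "strong" group (hypothetically, the k non-target classes the teacher ranks highest) and a "weak" group (all other classes). The names `softmax`, `kl` and `decoupled_kl` and the parameters `k` and `temperature` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def decoupled_kl(teacher_logits, student_logits, target, k=2, temperature=1.0):
    """Split KL(teacher || student) into (binary, strong, weak) parts.

    Hypothetical illustration, not the CAKD definition: the 'strong' group is
    ASSUMED to be the k non-target classes with the highest teacher probability.
    """
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution

    others = [i for i in range(len(p)) if i != target]
    p_rest = sum(p[i] for i in others)  # teacher non-target mass
    q_rest = sum(q[i] for i in others)  # student non-target mass

    # Binary part: target class vs. all non-target classes combined.
    binary = kl([p[target], p_rest], [q[target], q_rest])

    # Non-target distributions, renormalized to sum to 1.
    p_hat = {i: p[i] / p_rest for i in others}
    q_hat = {i: q[i] / q_rest for i in others}

    # ASSUMPTION: rank non-target classes by teacher probability.
    ranked = sorted(others, key=lambda i: p[i], reverse=True)
    strong, weak = ranked[:k], ranked[k:]

    def partial(group):
        # Share of the non-target KL contributed by `group`, scaled by p_rest.
        return p_rest * sum(p_hat[i] * math.log(p_hat[i] / q_hat[i]) for i in group)

    return binary, partial(strong), partial(weak)


teacher = [4.0, 2.5, 2.0, 0.5, -1.0]
student = [3.0, 1.0, 2.2, 0.0, 0.5]
parts = decoupled_kl(teacher, student, target=0, k=2, temperature=2.0)
full = kl(softmax(teacher, 2.0), softmax(student, 2.0))
print(parts, sum(parts), full)  # sum(parts) matches full up to rounding
```

Because this split follows from the chain rule for KL divergence, the three parts always sum to the full KL. Unlike the binary term, the strong and weak partial sums are not KL divergences on their own and can individually be negative; reweighting them separately is what a correlation-aware scheme would tune.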
format Preprint
id arxiv_https___arxiv_org_abs_2410_14741
institution arXiv
publishDate 2024
record_format arxiv
title CAKD: A Correlation-Aware Knowledge Distillation Framework Based on Decoupling Kullback-Leibler Divergence
topic Machine Learning
url https://arxiv.org/abs/2410.14741