Staff View: Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion :: Library Catalog

Bibliographic Details
Main Authors:	Fan, Cunhang; Chen, Yujie; Xue, Jun; Kong, Yonghui; Tao, Jianhua; Lv, Zhao
Format:	Preprint
Published:	2024
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2401.12997

_version_	1866917689028509696
author	Fan, Cunhang; Chen, Yujie; Xue, Jun; Kong, Yonghui; Tao, Jianhua; Lv, Zhao
author_facet	Fan, Cunhang; Chen, Yujie; Xue, Jun; Kong, Yonghui; Tao, Jianhua; Lv, Zhao
contents	In recent years, knowledge graph completion (KGC) models based on pre-trained language models (PLMs) have shown promising results. However, the large number of parameters and high computational cost of PLMs pose challenges for their application in downstream tasks. This paper proposes a progressive distillation method based on masked generation features for the KGC task, aiming to significantly reduce the complexity of pre-trained models. Specifically, we perform pre-distillation on the PLM to obtain high-quality teacher models, and compress the PLM network to obtain multi-grade student models. However, traditional feature distillation is limited by the single representation of information in teacher models. To solve this problem, we propose masked generation of teacher-student features, which contain richer representation information. Furthermore, there is a significant gap in representation ability between teacher and student. Therefore, we design a progressive distillation method to distill student models at each grade level, enabling efficient knowledge transfer from teachers to students. The experimental results demonstrate that the model in the pre-distillation stage surpasses the existing state-of-the-art methods. Furthermore, in the progressive distillation stage, the model significantly reduces the number of parameters while maintaining a certain level of performance. Specifically, the parameters of the lower-grade student model are reduced by 56.7% compared to the baseline.
format	Preprint
id	arxiv_https___arxiv_org_abs_2401_12997
institution	arXiv
publishDate	2024
record_format	arxiv
title	Progressive Distillation Based on Masked Generation Feature Method for Knowledge Graph Completion
topic	Computation and Language
url	https://arxiv.org/abs/2401.12997

Similar Items