Staff View :: Library Catalog

Bibliographic Details
Main Authors:	Li, Yulong; Ren, Bolin; Hu, Ke; Liu, Changyuan; Jiang, Zhengyong; Dang, Kang; Su, Jionglong
Format:	Preprint
Published:	2025
Subjects:	Computers and Society
Online Access:	https://arxiv.org/abs/2501.02321

author	Li, Yulong; Ren, Bolin; Hu, Ke; Liu, Changyuan; Jiang, Zhengyong; Dang, Kang; Su, Jionglong
contents	Artificial intelligence has achieved notable results in sign language recognition and translation. However, relatively few efforts have been made to significantly improve the quality of life of the 72 million hearing-impaired people worldwide. Sign language translation models that rely on video inputs have large parameter counts, which makes them time-consuming and computationally intensive to deploy. This directly contributes to the scarcity of human-centered technology in this field. In addition, the lack of sign language translation datasets hampers research progress in this area. To address these issues, we first propose a cross-modal multi-knowledge distillation technique from 3D to 1D and a novel end-to-end pre-training text correction framework. Compared with other pre-trained models, our framework achieves significant improvements in correcting text output errors. Our model reduces the Word Error Rate (WER) by at least 1.4% on the PHOENIX14 and PHOENIX14T datasets compared with the state-of-the-art CorrNet. Additionally, the size of the quantized TensorFlow Lite (TFLite) model is reduced to 12.93 MB, making it the smallest, fastest, and most accurate model to date. We have also collected and released extensive Chinese sign language datasets and developed a specialized training vocabulary. To address the lack of research on data augmentation for landmark data, we designed comparative experiments on various augmentation methods. Moreover, we performed a simulated deployment and prediction of our model on Intel platform CPUs and assessed the feasibility of deploying the model on other platforms.
format	Preprint
id	arxiv_https___arxiv_org_abs_2501_02321
institution	arXiv
publishDate	2025
record_format	arxiv
title	KD-MSLRT: Lightweight Sign Language Recognition Model Based on Mediapipe and 3D to 1D Knowledge Distillation
topic	Computers and Society
url	https://arxiv.org/abs/2501.02321
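The abstract reports its headline result as a reduction in Word Error Rate (WER). As a hedged illustration that is not taken from the paper, the standard WER definition, (substitutions + deletions + insertions) divided by the number of reference words, can be computed with a word-level edit distance. The function name `word_error_rate` and the gloss strings in the usage example are hypothetical; this is not the authors' evaluation code or data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative sketch (hypothetical helper, not the paper's code).

    WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical gloss sequences: one substitution (REGEN -> SONNE)
    # and one insertion (WIND) against 3 reference words.
    print(word_error_rate("MORGEN REGEN NORD", "MORGEN SONNE NORD WIND"))
```

Because insertions are counted, WER is not bounded by 1.0 (100%); a short reference with a long, wrong hypothesis can score above it.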
