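Staff View: FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training :: Library Catalog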

Bibliographic Details
Main Authors:	Fu, Yonggan; You, Haoran; Zhao, Yang; Wang, Yue; Li, Chaojian; Gopalakrishnan, Kailash; Wang, Zhangyang; Lin, Yingyan Celine
Format:	Preprint
Published:	2020
Subjects:	Computer Vision and Pattern Recognition
Online Access:	https://arxiv.org/abs/2012.13113

_version_	1866910771564249088
author	Fu, Yonggan; You, Haoran; Zhao, Yang; Wang, Yue; Li, Chaojian; Gopalakrishnan, Kailash; Wang, Zhangyang; Lin, Yingyan Celine
author_facet	Fu, Yonggan; You, Haoran; Zhao, Yang; Wang, Yue; Li, Chaojian; Gopalakrishnan, Kailash; Wang, Zhangyang; Lin, Yingyan Celine
contents	Recent breakthroughs in deep neural networks (DNNs) have fueled a tremendous demand for intelligent edge devices featuring on-site learning, while the practical realization of such systems remains a challenge due to the limited resources available at the edge and the required massive training costs for state-of-the-art (SOTA) DNNs. As reducing precision is one of the most effective knobs for boosting training time/energy efficiency, there has been a growing interest in low-precision DNN training. In this paper, we explore from an orthogonal direction: how to fractionally squeeze out more training cost savings from the most redundant bit level, progressively along the training trajectory and dynamically per input. Specifically, we propose FracTrain that integrates (i) progressive fractional quantization which gradually increases the precision of activations, weights, and gradients that will not reach the precision of SOTA static quantized DNN training until the final training stage, and (ii) dynamic fractional quantization which assigns precisions to both the activations and gradients of each layer in an input-adaptive manner, for only "fractionally" updating layer parameters. Extensive simulations and ablation studies (six models, four datasets, and three training settings including standard, adaptation, and fine-tuning) validate the effectiveness of FracTrain in reducing computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12%~+1.87%) accuracy. For example, when training ResNet-74 on CIFAR-10, FracTrain achieves 77.6% and 53.5% computational cost and training latency savings, respectively, compared with the best SOTA baseline, while achieving a comparable (-0.07%) accuracy. Our codes are available at: https://github.com/RICE-EIC/FracTrain.
format	Preprint
id	arxiv_https___arxiv_org_abs_2012_13113
institution	arXiv
publishDate	2020
record_format	arxiv
title	FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training
topic	Computer Vision and Pattern Recognition
url	https://arxiv.org/abs/2012.13113

Similar Items