Staff View: :: Library Catalog

Bibliographic Details
Main Authors:	Yue, Yun; Liu, Yongchao; Tong, Suo; Li, Minghao; Zhang, Zhen; Wen, Chunyang; Bao, Huanjun; Gu, Lihong; Gu, Jinjie; Mu, Yixiang
Format:	Preprint
Published:	2021
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2107.14432

_version_	1866929615710191616
author	Yue, Yun; Liu, Yongchao; Tong, Suo; Li, Minghao; Zhang, Zhen; Wen, Chunyang; Bao, Huanjun; Gu, Lihong; Gu, Jinjie; Mu, Yixiang
author_facet	Yue, Yun; Liu, Yongchao; Tong, Suo; Li, Minghao; Zhang, Zhen; Wen, Chunyang; Bao, Huanjun; Gu, Lihong; Gu, Jinjie; Mu, Yixiang
contents	We develop a novel framework that adds the sparse group lasso regularizers to a family of adaptive optimizers used in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, yielding a new class of optimizers named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, Group AdaHessian, and so on. Based on primal-dual methods, we establish theoretical convergence guarantees in the stochastic convex setting. We evaluate the regularization effect of the new optimizers on three large-scale real-world ad click datasets using state-of-the-art deep learning models. The experimental results show that, compared with the original optimizers followed by magnitude-pruning post-processing, our optimizers significantly improve model performance at the same sparsity level. Furthermore, compared with the original optimizers without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/tfplus/tree/main/tfplus.
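The abstract above combines a sparse group lasso penalty with optimizer updates and compares the result against magnitude pruning applied as post-processing. The following is a minimal sketch of those two ingredients under simplifying assumptions. It uses a plain proximal-gradient (SGD) step with a scalar learning rate, not the paper's primal-dual adaptive updates (Group Adam and the others), and the penalty form `l1 * ||w||_1 + l21 * sum_g sqrt(d_g) * ||w_g||_2`, the function names, and the hyperparameters `l1`/`l21` are illustrative assumptions, not taken from this record or from tfplus.

```python
import math
from typing import List

Group = List[float]  # one parameter group, e.g. one embedding row (assumption)


def soft_threshold(x: float, tau: float) -> float:
    """Elementwise prox of tau * |x| (the L1 part of the penalty)."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_group_lasso_prox(groups: List[Group], lr: float,
                            l1: float, l21: float) -> List[Group]:
    """Prox of lr * (l1 * ||w||_1 + l21 * sum_g sqrt(d_g) * ||w_g||_2).

    Soft-thresholds each weight first, then shrinks each group as a whole;
    a group whose remaining norm is below the threshold becomes all zeros.
    """
    out: List[Group] = []
    for g in groups:
        u = [soft_threshold(x, lr * l1) for x in g]
        norm = math.sqrt(sum(x * x for x in u))
        thresh = lr * l21 * math.sqrt(len(g))
        if norm <= thresh:
            out.append([0.0] * len(g))  # whole group pruned
        else:
            scale = 1.0 - thresh / norm
            out.append([scale * x for x in u])
    return out


def proximal_sgd_step(groups: List[Group], grads: List[Group], lr: float,
                      l1: float, l21: float) -> List[Group]:
    """One proximal-gradient step: plain gradient step, then the prox."""
    moved = [[w - lr * gw for w, gw in zip(g, gg)]
             for g, gg in zip(groups, grads)]
    return sparse_group_lasso_prox(moved, lr, l1, l21)


def magnitude_prune(groups: List[Group], sparsity: float) -> List[Group]:
    """Baseline post-processing: zero the smallest-|w| fraction of weights.

    Weights tied with the cutoff magnitude are also zeroed, so ties can
    prune slightly more than the requested fraction.
    """
    flat = sorted(abs(x) for g in groups for x in g)
    k = int(sparsity * len(flat))
    if k == 0:
        return [list(g) for g in groups]
    cutoff = flat[k - 1]
    return [[0.0 if abs(x) <= cutoff else x for x in g] for g in groups]


# Usage: the second, small-norm group is driven exactly to zero.
weights = [[0.5, -0.2, 0.05], [0.01, -0.02, 0.03]]
grads = [[0.1, 0.0, 0.0], [0.0, 0.0, 0.0]]
updated = proximal_sgd_step(weights, grads, lr=0.1, l1=0.1, l21=0.2)
```

The design point the sketch tries to show is that the group term zeroes entire groups during training, whereas `magnitude_prune` only removes individual weights after training.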
format	Preprint
id	arxiv_https___arxiv_org_abs_2107_14432
institution	arXiv
publishDate	2021
record_format	arxiv
title	Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
topic	Machine Learning
url	https://arxiv.org/abs/2107.14432

Similar Items