Bibliographic Details
Main Authors: Yue, Yun, Liu, Yongchao, Tong, Suo, Li, Minghao, Zhang, Zhen, Wen, Chunyang, Bao, Huanjun, Gu, Lihong, Gu, Jinjie, Mu, Yixiang
Format: Preprint
Published: 2021
Subjects:
Online Access: https://arxiv.org/abs/2107.14432
author Yue, Yun
Liu, Yongchao
Tong, Suo
Li, Minghao
Zhang, Zhen
Wen, Chunyang
Bao, Huanjun
Gu, Lihong
Gu, Jinjie
Mu, Yixiang
contents We develop a framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, including Momentum, Adagrad, Adam, AMSGrad, and AdaHessian, creating a new class of optimizers named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad, and Group AdaHessian, respectively. We prove convergence guarantees in the stochastic convex setting using primal-dual methods. We evaluate the regularization effect of the new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experiments show that, at the same sparsity level, models trained with our optimizers perform significantly better than models trained with the original optimizers and then sparsified by magnitude pruning as a post-processing step. Furthermore, compared with the original optimizers without magnitude pruning, our methods achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/tfplus/tree/main/tfplus.
format Preprint
id arxiv_https___arxiv_org_abs_2107_14432
institution arXiv
publishDate 2021
record_format arxiv
title Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction
topic Machine Learning
url https://arxiv.org/abs/2107.14432
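
The abstract refers to sparse group lasso regularization, which combines an elementwise l1 penalty with a groupwise l2 penalty so that whole groups of weights (for example, one embedding vector) can be driven exactly to zero. The sketch below shows the standard proximal operator for that penalty on a single group, in plain Python. It is not the authors' optimizer and not the tfplus API: the function names, the parameters `lambda_1` and `lambda_2`, and the choice to omit any group-size weighting are assumptions made for illustration only.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal map of the l1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_group_lasso_prox(group, lambda_1, lambda_2):
    """Proximal map of lambda_1 * ||w||_1 + lambda_2 * ||w||_2 for one group.

    Hypothetical helper for illustration: `group` is a list of floats holding
    the weights of one group; `lambda_1` and `lambda_2` are non-negative.
    """
    # Step 1: elementwise soft-thresholding handles the l1 part.
    shrunk = [soft_threshold(w, lambda_1) for w in group]

    # Step 2: group-level shrinkage handles the l2 part. If the remaining
    # norm is no larger than lambda_2, the whole group becomes exactly zero.
    norm = math.sqrt(sum(w * w for w in shrunk))
    if norm <= lambda_2:
        return [0.0] * len(group)

    scale = 1.0 - lambda_2 / norm
    return [scale * w for w in shrunk]
```

For instance, a group whose weights are all small relative to `lambda_2` is zeroed as a unit, which is the group-level sparsity the record's abstract contrasts with post-hoc magnitude pruning. How such a step is coupled to each adaptive optimizer's update is described in the paper itself and is not reproduced here.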