Bibliographic Details
Main Authors: Oh, Jong Kwon; Lyu, Hanbaek; Son, Hwijae
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2509.19773
Summary:
Sobolev training, which integrates target derivatives into the loss function, has been shown to accelerate convergence and improve generalization compared to conventional $L^2$ training. However, the underlying mechanisms of this training method remain only partially understood. In this work, we present the first rigorous theoretical framework proving that Sobolev training accelerates the convergence of Rectified Linear Unit (ReLU) networks. Under a student-teacher framework with Gaussian inputs and shallow architectures, we derive exact formulas for population gradients and Hessians, and quantify the improvements in conditioning of the loss landscape and gradient-flow convergence rates. Extensive numerical experiments validate our theoretical findings and show that the benefits of Sobolev training extend to modern deep learning tasks.
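As an illustration of the difference between the two objectives named in the summary, the following is a minimal sketch and not the authors' implementation. It assumes a single-neuron ReLU student and teacher, standard Gaussian inputs, and an equally weighted sum of the value and derivative terms; the function names, the 1/2 scaling, and the weighting are all hypothetical choices made for this example.

```python
import numpy as np

def relu_neuron(w, x):
    """Hypothetical single-neuron model f(x) = max(w . x, 0).

    Returns the outputs, shape (n,), and the gradients with respect
    to the inputs, shape (n, d).
    """
    pre = x @ w
    value = np.maximum(pre, 0.0)
    grad_x = (pre > 0)[:, None] * w  # derivative of ReLU is the 0/1 indicator
    return value, grad_x

def l2_loss(w_student, w_teacher, x):
    """Conventional L^2 loss: matches function values only."""
    f, _ = relu_neuron(w_student, x)
    g, _ = relu_neuron(w_teacher, x)
    return 0.5 * np.mean((f - g) ** 2)

def sobolev_loss(w_student, w_teacher, x):
    """Sobolev-style loss: L^2 term plus a term matching input derivatives.

    The equal weighting of the two terms is an assumption of this sketch.
    """
    f, df = relu_neuron(w_student, x)
    g, dg = relu_neuron(w_teacher, x)
    value_term = 0.5 * np.mean((f - g) ** 2)
    grad_term = 0.5 * np.mean(np.sum((df - dg) ** 2, axis=1))
    return value_term + grad_term

# Empirical estimate of the population losses with Gaussian inputs.
rng = np.random.default_rng(0)
d = 5
x = rng.standard_normal((1000, d))
w_teacher = rng.standard_normal(d)
w_student = rng.standard_normal(d)

print("L2 loss:     ", l2_loss(w_student, w_teacher, x))
print("Sobolev loss:", sobolev_loss(w_student, w_teacher, x))
```

Because the derivative term is non-negative, the Sobolev-style loss in this sketch is never smaller than the $L^2$ loss, and both vanish when the student weights equal the teacher weights.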