| Main Authors: | Xu, Yewei; Chen, Shi; Li, Qin |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | Machine Learning; Numerical Analysis |
| Online Access: | https://arxiv.org/abs/2306.02192 |
| Field | Value |
|---|---|
| author | Xu, Yewei; Chen, Shi; Li, Qin |
| contents | Does the use of auto-differentiation yield reasonable updates for deep neural networks (DNNs)? Specifically, when DNNs are designed to adhere to neural ODE architectures, can we trust the gradients provided by auto-differentiation? Through mathematical analysis and numerical evidence, we demonstrate that when neural networks employ high-order methods, such as Linear Multistep Methods (LMM) or Explicit Runge-Kutta Methods (ERK), to approximate the underlying ODE flows, brute-force auto-differentiation often introduces artificial oscillations in the gradients that prevent convergence. For Leapfrog and 2-stage ERK, we propose simple post-processing techniques that effectively eliminate these oscillations, correct the gradient computation, and thus return accurate updates. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2306_02192 |
| institution | arXiv |
| publishDate | 2023 |
| record_format | arxiv |
| title | Correcting Auto-Differentiation in Neural-ODE Training |
| topic | Machine Learning; Numerical Analysis; MSC 65D25 (Primary), 65L06, 90C31 (Secondary) |
| url | https://arxiv.org/abs/2306.02192 |
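The oscillation described in the abstract can be reproduced on a toy problem. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes the scalar linear ODE x' = θ_n x with a separate, hypothetical parameter θ_n per layer (mimicking layer-wise weights in a neural ODE), one forward-Euler start-up step followed by Leapfrog steps, and the loss L = x_N. The reverse sweep is written by hand, but it performs the same chain-rule accumulation that a reverse-mode auto-differentiation tool would perform on the forward loop. As a reference it uses h·p(t_n)·x(t_n), the continuous-time gradient (adjoint p times ∂f/∂θ = x) scaled by the step size; for constant θ this equals h·x0·exp(θT) for every layer, and this normalization is our assumption. The helper `average_adjacent` is a hypothetical two-point smoothing added only to show that neighbouring layers carry complementary halves of the signal; it is not claimed to be the post-processing proposed in the paper.

```python
import math


def leapfrog_forward(x0, thetas, h):
    """Integrate x' = theta_n * x: one forward-Euler start step, then Leapfrog.

    thetas[n] is a hypothetical per-layer parameter used at step n.
    Returns the trajectory [x_0, x_1, ..., x_N] with N = len(thetas) >= 2.
    """
    N = len(thetas)
    xs = [0.0] * (N + 1)
    xs[0] = x0
    xs[1] = x0 + h * thetas[0] * x0  # forward-Euler start-up step
    for n in range(1, N):
        # Leapfrog (explicit midpoint two-step) update
        xs[n + 1] = xs[n - 1] + 2.0 * h * thetas[n] * xs[n]
    return xs


def leapfrog_backward(xs, thetas, h):
    """Hand-written reverse sweep for L = x_N; returns dL/dtheta_n per layer.

    Mirrors what reverse-mode AD computes on leapfrog_forward: adjoints
    xbar[k] = dL/dx_k are accumulated by undoing the steps in reverse order.
    """
    N = len(thetas)
    xbar = [0.0] * (N + 1)
    xbar[N] = 1.0  # dL/dx_N for the loss L = x_N
    grads = [0.0] * N
    for n in range(N - 1, 0, -1):  # undo Leapfrog steps, last one first
        grads[n] += 2.0 * h * xs[n] * xbar[n + 1]
        xbar[n] += 2.0 * h * thetas[n] * xbar[n + 1]
        xbar[n - 1] += xbar[n + 1]
    # undo the Euler start step x_1 = x_0 + h * theta_0 * x_0
    grads[0] += h * xs[0] * xbar[1]
    xbar[0] += (1.0 + h * thetas[0]) * xbar[1]
    return grads


def average_adjacent(grads):
    """Hypothetical smoothing for illustration: mean of neighbouring layers."""
    return [(grads[n] + grads[n + 1]) / 2.0 for n in range(len(grads) - 1)]


if __name__ == "__main__":
    theta, x0, T, N = 1.0, 1.0, 1.0, 100
    h = T / N
    thetas = [theta] * N
    xs = leapfrog_forward(x0, thetas, h)
    grads = leapfrog_backward(xs, thetas, h)
    # Reference h * p(t) * x(t) with p(t) = exp(theta*(T-t)), x(t) = x0*exp(theta*t)
    ref = h * x0 * math.exp(theta * T)
    # Near t = T the raw ratios alternate between roughly 2 and roughly 0.
    for n in range(N - 6, N):
        print(f"layer {n}: raw AD / reference = {grads[n] / ref:.3f}")
    smoothed = average_adjacent(grads)
    print(f"smoothed, last pair: {smoothed[-1] / ref:.3f}")
```

The two-step Leapfrog recurrence admits a spurious alternating-sign solution, and the hand-written reverse sweep shows that the backward pass excites it: the adjoints at every other layer are close to the true adjoint while the rest are close to zero (near t = T), so the raw per-layer gradients zig-zag even though their sum, dL/dθ for a shared θ, remains accurate.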