| Main Authors: | Yu, Jianneng, Morozov, Alexandre V. |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2602.21276 |
Similar Items
Stochastic Estimation of the Layer-wise Hessian Trace for Monitoring Neural-network Training
by: Bolshim, Maxim, et al.
Published: (2026)
On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime
by: Jiang, Shuai, et al.
Published: (2026)
Data-induced multiscale losses and efficient multirate gradient descent schemes
by: He, Juncai, et al.
Published: (2024)
Local properties of neural networks through the lens of layer-wise Hessians
by: Bolshim, Maxim, et al.
Published: (2025)
Inter-Layer Hessian Analysis of Neural Networks with DAG Architectures
by: Bolshim, Maxim, et al.
Published: (2026)
Gradient descent provably escapes saddle points in the training of shallow ReLU networks
by: Cheridito, Patrick, et al.
Published: (2022)
Kourkoutas-Beta: A Sunspike-Driven Adam Optimizer with Desert Flair
by: Kassinos, Stavros C.
Published: (2025)
Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks
by: Do, Thang, et al.
Published: (2025)
EB-gMCR: Energy-Based Generative Modeling for Signal Unmixing and Multivariate Curve Resolution
by: Chang, Yu-Tang, et al.
Published: (2025)
ZetA: A Riemann Zeta-Scaled Extension of Adam for Deep Learning
by: BC, Samiksha
Published: (2025)
GDNSQ: Gradual Differentiable Noise Scale Quantization for Low-bit Neural Networks
by: Salishev, Sergey, et al.
Published: (2025)
Diagnosing Failure Modes of Neural Operators Across Diverse PDE Families
by: Shikhman, Lennon
Published: (2026)
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum
by: Zhang, Minxin, et al.
Published: (2026)
Context-dependent manifold learning: A neuromodulated constrained autoencoder approach
by: Adriaens, Jérôme, et al.
Published: (2026)
Ghosts of Softmax: Complex Singularities That Limit Safe Step Sizes in Cross-Entropy
by: Sao, Piyush
Published: (2026)
Total Generalized Variation regularization closes the gap between neural-field and classical methods in seismic travel-time tomography
by: Kurosawa, Isao
Published: (2026)
Resolving gradient pathology in physics-informed epidemiological models
by: Golooba, Nickson, et al.
Published: (2026)
Physics Informed Differentiable Solvers for Learning Parametric Solution Manifolds in Heterogeneous Physical Systems
by: Panahi, Milad, et al.
Published: (2026)
Model-Free Local Recalibration of Neural Networks
by: Torres, R., et al.
Published: (2024)
Spiking Neural Networks for SAR Interferometric Phase Unwrapping: A Theoretical Framework for Energy-Efficient Processing
by: Bara, Marc
Published: (2025)
Differentiable Optimization Layers for Guaranteed Fairness in Deep Learning
by: Troxell, David, et al.
Published: (2026)
Deep Learning and Elicitability for McKean-Vlasov FBSDEs With Common Noise
by: Antunes, Felipe J. P., et al.
Published: (2025)
Neural Green's Operators for Parametric Partial Differential Equations
by: Melchers, Hugo, et al.
Published: (2024)
Manifold limit for the training of shallow graph convolutional neural networks
by: Tengler, Johanna, et al.
Published: (2026)
PyEPO: A PyTorch-based End-to-End Predict-then-Optimize Library for Linear and Integer Programming
by: Tang, Bo, et al.
Published: (2022)
Non-convergence to global minimizers in data driven supervised deep learning: Adam and stochastic gradient descent optimization provably fail to converge to global minimizers in the training of deep neural networks with ReLU activation
by: Do, Thang, et al.
Published: (2024)
CAO: Curvature-Adaptive Optimization via Periodic Low-Rank Hessian Sketching
by: Du, Wenzhang
Published: (2025)
Benchmarking Generative AI Against Bayesian Optimization for Constrained Multi-Objective Inverse Design
by: Awan, Muhammad Bilal, et al.
Published: (2025)
The Polar Express: Optimal Matrix Sign Methods and Their Application to the Muon Algorithm
by: Amsel, Noah, et al.
Published: (2025)
Online Federation For Mixtures of Proprietary Agents with Black-Box Encoders
by: Yang, Xuwei, et al.
Published: (2025)
Near-optimal estimates for the $\ell^p$-Lipschitz constants of deep random ReLU neural networks
by: Dirksen, Sjoerd, et al.
Published: (2025)
Refining Graphical Neural Network Predictions Using Flow Matching for Optimal Power Flow with Constraint-Satisfaction Guarantee
by: Khanal, Kshitiz
Published: (2025)
Active Learning for Conditional Generative Compressed Sensing
by: DeLise, Alexander, et al.
Published: (2026)
The Neural Differential Manifold: An Architecture with Explicit Geometric Structure
by: Zhang, Di
Published: (2025)
Mathematical Foundations of Neural Tangents and Infinite-Width Networks
by: Mysore, Rachana, et al.
Published: (2025)
Frequency Bias and OOD Generalization in Neural Operators under a Variable-Coefficient Wave Equation
by: Xie, Runlong, et al.
Published: (2026)
Primal-Dual Sample Complexity Bounds for Constrained Markov Decision Processes with Multiple Constraints
by: Buckley, Max, et al.
Published: (2025)
Physics-Informed Neural Networks for Optimal Vaccination Plan in SIR Epidemic Models
by: Kim, Minseok, et al.
Published: (2025)
SCAPE: Searching Conceptual Architecture Prompts using Evolution
by: Lim, Soo Ling, et al.
Published: (2024)
Towards Coordinate- and Dimension-Agnostic Machine Learning for Partial Differential Equations
by: Phan, Trung V., et al.
Published: (2025)