| Main Authors: | Zhang, Chenyang; Gao, Peifeng; Zou, Difan; Cao, Yuan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2504.08628 |
Similar Items
The Implicit Bias of Adam on Separable Data
by: Zhang, Chenyang, et al.
Published: (2024)
Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
by: Tang, Xuan, et al.
Published: (2025)
Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
by: Zhang, Chenyang, et al.
Published: (2026)
The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient
by: Li, Jichu, et al.
Published: (2026)
Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
by: Zhang, Chenyang, et al.
Published: (2026)
On the Robustness of Transformers against Context Hijacking for Linear Classification
by: Li, Tianle, et al.
Published: (2025)
A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models
by: Gao, Peifeng, et al.
Published: (2026)
Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks
by: Zhang, Han, et al.
Published: (2024)
On the Feature Learning in Diffusion Models
by: Han, Andi, et al.
Published: (2024)
Stochastic Gradient Descent for Two-layer Neural Networks
by: Cao, Dinghao, et al.
Published: (2024)
How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
by: Huang, Wei, et al.
Published: (2025)
Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training
by: Ma, Yuhan, et al.
Published: (2024)
PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks
by: Su, Junwei, et al.
Published: (2024)
Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
by: Han, Yujin, et al.
Published: (2024)
Towards Robust Graph Incremental Learning on Evolving Graphs
by: Su, Junwei, et al.
Published: (2024)
Approximation and Gradient Descent Training with Neural Networks
by: Welper, G.
Published: (2024)
On Convolutions, Intrinsic Dimension, and Diffusion Models
by: Leung, Kin Kwan, et al.
Published: (2025)
Step by Step: Adaptive Gradient Descent for Training L-Lipschitz Neural Networks
by: Sung, Kyle, et al.
Published: (2025)
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
by: Chen, Xingwu, et al.
Published: (2024)
On the Theory of Continual Learning with Gradient Descent for Neural Networks
by: Taheri, Hossein, et al.
Published: (2025)
Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers
by: Shang, Shuning, et al.
Published: (2024)
On the Intrinsic Dimensions of Data in Kernel Learning
by: Takhanov, Rustem
Published: (2026)
Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent
by: Hsiao, Yen-Che, et al.
Published: (2024)
Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
by: Frei, Spencer, et al.
Published: (2022)
Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks
by: Jnini, Anas, et al.
Published: (2025)
Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
by: Li, Binghui, et al.
Published: (2024)
Faster Sampling via Stochastic Gradient Proximal Sampler
by: Huang, Xunpeng, et al.
Published: (2024)
Variational Stochastic Gradient Descent for Deep Neural Networks
by: Chen, Haotian, et al.
Published: (2024)
Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow
by: Bai, Hanru, et al.
Published: (2026)
Dual Cone Gradient Descent for Training Physics-Informed Neural Networks
by: Hwang, Youngsik, et al.
Published: (2024)
Gradient Flow Matching for Learning Update Dynamics in Neural Network Training
by: Shou, Xiao, et al.
Published: (2025)
Optimizing Quantum Convolutional Neural Network Architectures for Arbitrary Data Dimension
by: Lee, Changwon, et al.
Published: (2024)
Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks
by: Xu, Xianliang, et al.
Published: (2024)
Learning under Quantization for High-Dimensional Linear Regression
by: Zhang, Dechen, et al.
Published: (2025)
Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
by: Chen, Xingwu, et al.
Published: (2025)
Robust Gradient Descent for Phase Retrieval
by: Buna, Alex, et al.
Published: (2024)
Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
by: Ma, Wenquan, et al.
Published: (2026)
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks
by: Wang, Puyu, et al.
Published: (2023)
Extracting Training Data from Unconditional Diffusion Models
by: Chen, Yunhao, et al.
Published: (2024)
On the Limitation and Experience Replay for GNNs in Continual Learning
by: Su, Junwei, et al.
Published: (2023)