Library Catalog

Bibliographic Details
Main Authors:	Zhang, Chenyang; Gao, Peifeng; Zou, Difan; Cao, Yuan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2504.08628

Similar Items

The Implicit Bias of Adam on Separable Data
by: Zhang, Chenyang, et al.
Published: (2024)

Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
by: Tang, Xuan, et al.
Published: (2025)

Transformers Trained via Gradient Descent Can Provably Learn a Class of Teacher Models
by: Zhang, Chenyang, et al.
Published: (2026)

The Implicit Bias of Steepest Descent with Mini-batch Stochastic Gradient
by: Li, Jichu, et al.
Published: (2026)

Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent
by: Zhang, Chenyang, et al.
Published: (2026)

On the Robustness of Transformers against Context Hijacking for Linear Classification
by: Li, Tianle, et al.
Published: (2025)

A Mechanism Study of Delayed Loss Spikes in Batch-Normalized Linear Models
by: Gao, Peifeng, et al.
Published: (2026)

Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks
by: Zhang, Han, et al.
Published: (2024)

On the Feature Learning in Diffusion Models
by: Han, Andi, et al.
Published: (2024)

Stochastic Gradient Descent for Two-layer Neural Networks
by: Cao, Dinghao, et al.
Published: (2024)

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
by: Huang, Wei, et al.
Published: (2025)

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training
by: Ma, Yuhan, et al.
Published: (2024)

PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks
by: Su, Junwei, et al.
Published: (2024)

Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
by: Han, Yujin, et al.
Published: (2024)

Towards Robust Graph Incremental Learning on Evolving Graphs
by: Su, Junwei, et al.
Published: (2024)

Approximation and Gradient Descent Training with Neural Networks
by: Welper, G.
Published: (2024)

On Convolutions, Intrinsic Dimension, and Diffusion Models
by: Leung, Kin Kwan, et al.
Published: (2025)

Step by Step: Adaptive Gradient Descent for Training L-Lipschitz Neural Networks
by: Sung, Kyle, et al.
Published: (2025)

What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
by: Chen, Xingwu, et al.
Published: (2024)

On the Theory of Continual Learning with Gradient Descent for Neural Networks
by: Taheri, Hossein, et al.
Published: (2025)

Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers
by: Shang, Shuning, et al.
Published: (2024)

On the Intrinsic Dimensions of Data in Kernel Learning
by: Takhanov, Rustem
Published: (2026)

Hybrid Coordinate Descent for Efficient Neural Network Learning Using Line Search and Gradient Descent
by: Hsiao, Yen-Che, et al.
Published: (2024)

Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data
by: Frei, Spencer, et al.
Published: (2022)

Dual Natural Gradient Descent for Scalable Training of Physics-Informed Neural Networks
by: Jnini, Anas, et al.
Published: (2025)

Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks
by: Li, Binghui, et al.
Published: (2024)

Faster Sampling via Stochastic Gradient Proximal Sampler
by: Huang, Xunpeng, et al.
Published: (2024)

Variational Stochastic Gradient Descent for Deep Neural Networks
by: Chen, Haotian, et al.
Published: (2024)

Physics-Informed Neural PDE Solvers via Spatio-Temporal MeanFlow
by: Bai, Hanru, et al.
Published: (2026)

Dual Cone Gradient Descent for Training Physics-Informed Neural Networks
by: Hwang, Youngsik, et al.
Published: (2024)

Gradient Flow Matching for Learning Update Dynamics in Neural Network Training
by: Shou, Xiao, et al.
Published: (2025)

Optimizing Quantum Convolutional Neural Network Architectures for Arbitrary Data Dimension
by: Lee, Changwon, et al.
Published: (2024)

Convergence of Implicit Gradient Descent for Training Two-Layer Physics-Informed Neural Networks
by: Xu, Xianliang, et al.
Published: (2024)

Learning under Quantization for High-Dimensional Linear Regression
by: Zhang, Dechen, et al.
Published: (2025)

Reshaping Reasoning in LLMs: A Theoretical Analysis of RL Training Dynamics through Pattern Selection
by: Chen, Xingwu, et al.
Published: (2025)

Robust Gradient Descent for Phase Retrieval
by: Buna, Alex, et al.
Published: (2024)

Generalization Bounds of Stochastic Gradient Descent in Homogeneous Neural Networks
by: Ma, Wenquan, et al.
Published: (2026)

Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks
by: Wang, Puyu, et al.
Published: (2023)

Extracting Training Data from Unconditional Diffusion Models
by: Chen, Yunhao, et al.
Published: (2024)

On the Limitation and Experience Replay for GNNs in Continual Learning
by: Su, Junwei, et al.
Published: (2023)