| Main Authors: | Nitanda, Atsushi, Kikuchi, Ryuhei, Maeda, Shugo, Wu, Denny |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Online Access: | https://arxiv.org/abs/2302.09376 |
Similar Items
Uniform convergence of the smooth calibration error and its relationship with functional gradient
by: Futami, Futoshi, et al.
Published: (2025)
Improved Particle Approximation Error for Mean Field Neural Networks
by: Nitanda, Atsushi
Published: (2024)
Statistical Analysis of the Sinkhorn Iterations for Two-Sample Schrödinger Bridge Estimation
by: Maeda, Ibuki, et al.
Published: (2025)
Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
by: Takagi, Hirohane, et al.
Published: (2026)
How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
by: Yoshida, Kotaro, et al.
Published: (2025)
Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes
by: Bossens, David M., et al.
Published: (2025)
Emergence and scaling laws in SGD learning of shallow neural networks
by: Ren, Yunwei, et al.
Published: (2025)
Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization
by: Chen, Zonghao, et al.
Published: (2025)
Direct Distributional Optimization for Provable Alignment of Diffusion Models
by: Kawata, Ryotaro, et al.
Published: (2025)
Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation
by: Nitanda, Atsushi, et al.
Published: (2026)
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
by: Lee, Jason D., et al.
Published: (2024)
Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds
by: Fu, Guoji, et al.
Published: (2026)
Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
by: Arous, Gérard Ben, et al.
Published: (2025)
Why pre-training is beneficial for downstream classification tasks?
by: Jiang, Xin, et al.
Published: (2024)
Koopman-based generalization bound: New aspect for full-rank weights
by: Hashimoto, Yuka, et al.
Published: (2023)
Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)
Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
by: Nitanda, Atsushi, et al.
Published: (2025)
SGD method for entropy error function with smoothing l0 regularization for neural networks
by: Nguyen, Trong-Tuan, et al.
Published: (2024)
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
by: Bu, Dake, et al.
Published: (2024)
Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training
by: Bu, Dake, et al.
Published: (2025)
DPRM: A Plug-in Doob h transform-induced Token-Ordering Module for Diffusion Language Models
by: Bu, Dake, et al.
Published: (2026)
Provable In-Context Vector Arithmetic via Retrieving Task Concepts
by: Bu, Dake, et al.
Published: (2025)
From Coupled Oscillators to Graph Neural Networks: Reducing Over-smoothing via a Kuramoto Model-based Approach
by: Nguyen, Tuan, et al.
Published: (2023)
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
by: Chen, Feng, et al.
Published: (2023)
When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
by: Tao, Hongyi, et al.
Published: (2026)
Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
by: Jin, Ruinan, et al.
Published: (2026)
Post-Training as Reweighting: A Stochastic View of Reasoning Trajectories in Language Models
by: Bu, Dake, et al.
Published: (2025)
High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance
by: Balasubramanian, Krishnakumar, et al.
Published: (2023)
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)
Accelerated zero-order SGD under high-order smoothness and overparameterized regime
by: Bychkov, Georgii, et al.
Published: (2024)
Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
by: Wang, Jiayi, et al.
Published: (2020)
Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time
by: Glasgow, Margalit, et al.
Published: (2025)
Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees
by: Jin, Richeng, et al.
Published: (2020)
Ordered Momentum for Asynchronous SGD
by: Shi, Chang-Wei, et al.
Published: (2024)
Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
by: Su, Junwei, et al.
Published: (2024)
Anon: Extrapolating Adaptivity Beyond SGD and Adam
by: Zhang, Yiheng, et al.
Published: (2026)
Generalization and Optimization of SGD with Lookahead
by: Li, Kangcheng, et al.
Published: (2025)
PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
by: Sha, Haichao, et al.
Published: (2023)
Smoothed SGD for quantiles: Bahadur representation and Gaussian approximation
by: Chen, Likai, et al.
Published: (2025)
Topology-aware Generalization of Decentralized SGD
by: Zhu, Tongtian, et al.
Published: (2022)