:: Library Catalog

Bibliographic Details
Main Authors:	Nitanda, Atsushi; Kikuchi, Ryuhei; Maeda, Shugo; Wu, Denny
Format:	Preprint
Published:	2023
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2302.09376

Similar Items

Uniform convergence of the smooth calibration error and its relationship with functional gradient
by: Futami, Futoshi, et al.
Published: (2025)

Improved Particle Approximation Error for Mean Field Neural Networks
by: Nitanda, Atsushi
Published: (2024)

Statistical Analysis of the Sinkhorn Iterations for Two-Sample Schrödinger Bridge Estimation
by: Maeda, Ibuki, et al.
Published: (2025)

Alternating Diffusion for Proximal Sampling with Zeroth Order Queries
by: Takagi, Hirohane, et al.
Published: (2026)

How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
by: Yoshida, Kotaro, et al.
Published: (2025)

Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes
by: Bossens, David M., et al.
Published: (2025)

Emergence and scaling laws in SGD learning of shallow neural networks
by: Ren, Yunwei, et al.
Published: (2025)

Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization
by: Chen, Zonghao, et al.
Published: (2025)

Direct Distributional Optimization for Provable Alignment of Diffusion Models
by: Kawata, Ryotaro, et al.
Published: (2025)

Slowly Annealed Langevin Dynamics: Theory and Applications to Training-Free Guided Generation
by: Nitanda, Atsushi, et al.
Published: (2026)

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
by: Lee, Jason D., et al.
Published: (2024)

Intrinsic Wasserstein Rates for Score-Based Generative Models on Smooth Manifolds
by: Fu, Guoji, et al.
Published: (2026)

Learning quadratic neural networks in high dimensions: SGD dynamics and scaling laws
by: Arous, Gérard Ben, et al.
Published: (2025)

Why pre-training is beneficial for downstream classification tasks?
by: Jiang, Xin, et al.
Published: (2024)

Koopman-based generalization bound: New aspect for full-rank weights
by: Hashimoto, Yuka, et al.
Published: (2023)

Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning
by: Kovačević, Filip, et al.
Published: (2026)

Propagation of Chaos for Mean-Field Langevin Dynamics and its Application to Model Ensemble
by: Nitanda, Atsushi, et al.
Published: (2025)

SGD method for entropy error function with smoothing l0 regularization for neural networks
by: Nguyen, Trong-Tuan, et al.
Published: (2024)

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
by: Bu, Dake, et al.
Published: (2024)

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training
by: Bu, Dake, et al.
Published: (2025)

DPRM: A Plug-in Doob h transform-induced Token-Ordering Module for Diffusion Language Models
by: Bu, Dake, et al.
Published: (2026)

Provable In-Context Vector Arithmetic via Retrieving Task Concepts
by: Bu, Dake, et al.
Published: (2025)

From Coupled Oscillators to Graph Neural Networks: Reducing Over-smoothing via a Kuramoto Model-based Approach
by: Nguyen, Tuan, et al.
Published: (2023)

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
by: Chen, Feng, et al.
Published: (2023)

When and Why SignSGD Outperforms SGD: A Theoretical Study Based on $\ell_1$-norm Lower Bounds
by: Tao, Hongyi, et al.
Published: (2026)

Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
by: Jin, Ruinan, et al.
Published: (2026)

Post-Training as Reweighting: A Stochastic View of Reasoning Trajectories in Language Models
by: Bu, Dake, et al.
Published: (2025)

High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance
by: Balasubramanian, Krishnakumar, et al.
Published: (2023)

Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful
by: Marek, Martin, et al.
Published: (2025)

Accelerated zero-order SGD under high-order smoothness and overparameterized regime
by: Bychkov, Georgii, et al.
Published: (2024)

Demystifying Why Local Aggregation Helps: Convergence Analysis of Hierarchical SGD
by: Wang, Jiayi, et al.
Published: (2020)

Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time
by: Glasgow, Margalit, et al.
Published: (2025)

Stochastic-Sign SGD for Federated Learning with Theoretical Guarantees
by: Jin, Richeng, et al.
Published: (2020)

Ordered Momentum for Asynchronous SGD
by: Shi, Chang-Wei, et al.
Published: (2024)

Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
by: Su, Junwei, et al.
Published: (2024)

Anon: Extrapolating Adaptivity Beyond SGD and Adam
by: Zhang, Yiheng, et al.
Published: (2026)

Generalization and Optimization of SGD with Lookahead
by: Li, Kangcheng, et al.
Published: (2025)

PCDP-SGD: Improving the Convergence of Differentially Private SGD via Projection in Advance
by: Sha, Haichao, et al.
Published: (2023)

Smoothed SGD for quantiles: Bahadur representation and Gaussian approximation
by: Chen, Likai, et al.
Published: (2025)

Topology-aware Generalization of Decentralized SGD
by: Zhu, Tongtian, et al.
Published: (2022)