| Main Authors: | Olsen, Brian Richard, Fatehmanesh, Sam, Xiao, Frank, Kumarappan, Adarsh, Gajula, Anirudh |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.12709 |
Similar Items
Towards Realistic Guarantees: A Probabilistic Certificate for SmoothLLM
by: Kumarappan, Adarsh, et al.
Published: (2025)
Automating Deception: Scalable Multi-Turn LLM Jailbreaks
by: Kumarappan, Adarsh, et al.
Published: (2025)
Not Just RLHF: Why Alignment Alone Won't Fix Multi-Agent Sycophancy
by: Kumarappan, Adarsh, et al.
Published: (2026)
SGD and Weight Decay Secretly Minimize the Rank of Your Neural Network
by: Galanti, Tomer, et al.
Published: (2022)
LeanAgent: Lifelong Learning for Formal Theorem Proving
by: Kumarappan, Adarsh, et al.
Published: (2024)
Sentiment-Aware Recommendation Systems in E-Commerce: A Review from a Natural Language Processing Perspective
by: Gajula, Yogesh
Published: (2025)
Optimal Condition for Initialization Variance in Deep Neural Networks: An SGD Dynamics Perspective
by: Horii, Hiroshi, et al.
Published: (2025)
DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models
by: Kumarappan, Adarsh, et al.
Published: (2026)
Memorization in Graph Neural Networks
by: Jamadandi, Adarsh, et al.
Published: (2025)
SGD with Partial Hessian for Deep Neural Networks Optimization
by: Sun, Ying, et al.
Published: (2024)
SGD as Free Energy Minimization: A Thermodynamic View on Neural Network Training
by: Sadrtdinov, Ildus, et al.
Published: (2025)
To Clip or not to Clip: the Dynamics of SGD with Gradient Clipping in High-Dimensions
by: Marshall, Noah, et al.
Published: (2024)
A Simplified Analysis of SGD for Linear Regression with Weight Averaging
by: Meterez, Alexandru, et al.
Published: (2025)
DP-SGD Without Clipping: The Lipschitz Neural Network Way
by: Bethune, Louis, et al.
Published: (2023)
Implicit Compressibility of Overparametrized Neural Networks Trained with Heavy-Tailed SGD
by: Wan, Yijun, et al.
Published: (2023)
Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses
by: Tanguy, Eloi
Published: (2023)
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
by: Beneventano, Pierfrancesco, et al.
Published: (2024)
Style-based Clustering of Visual Artworks and the Play of Neural Style-Representations
by: Dangeti, Abhishek, et al.
Published: (2024)
Numerical simulation of transient heat conduction with moving heat source using Physics Informed Neural Networks
by: Kalyan, Anirudh, et al.
Published: (2025)
On the Learning Dynamics of Two-layer Linear Networks with Label Noise SGD
by: Zhang, Tongcheng, et al.
Published: (2026)
SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines
by: Morad, Itai, et al.
Published: (2026)
Weight Spectra Induced Efficient Model Adaptation
by: Si, Chongjie, et al.
Published: (2025)
Diffusion-Based Neural Network Weights Generation
by: Soro, Bedionita, et al.
Published: (2024)
A Generalized Singular Value Theory for Neural Networks
by: Brown, Brian Charles, et al.
Published: (2026)
From PowerSGD to PowerSGD+: Low-Rank Gradient Compression for Distributed Optimization with Convergence Guarantees
by: Xie, Shengping, et al.
Published: (2025)
Enhancing DP-SGD through Non-monotonous Adaptive Scaling Gradient Weight
by: Huang, Tao, et al.
Published: (2024)
Cooperative SGD with Dynamic Mixing Matrices
by: Sarkar, Soumya, et al.
Published: (2025)
Global Convergence of SGD On Two Layer Neural Nets
by: Gopalani, Pulkit, et al.
Published: (2022)
From Weight Perturbation to Feature Attribution for Explaining Fully Connected Neural Networks
by: Lymperopoulos, Thodoris, et al.
Published: (2026)
Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
by: Han, Yankun
Published: (2025)
Accurate and Scalable Estimation of Epistemic Uncertainty for Graph Neural Networks
by: Trivedi, Puja, et al.
Published: (2024)
Balancing Utility and Privacy: Dynamically Private SGD with Random Projection
by: Jiang, Zhanhong, et al.
Published: (2025)
Less is More: Efficient Weight Farcasting with 1-Layer Neural Network
by: Shou, Xiao, et al.
Published: (2025)
On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials
by: Mulayoff, Rotem, et al.
Published: (2026)
Comparing Spectral Bias and Robustness For Two-Layer Neural Networks: SGD vs Adaptive Random Fourier Features
by: Kammonen, Aku, et al.
Published: (2024)
DC-SGD: Differentially Private SGD with Dynamic Clipping through Gradient Norm Distribution Estimation
by: Wei, Chengkun, et al.
Published: (2025)
Single-Head Attention in High Dimensions: A Theory of Generalization, Weights Spectra, and Scaling Laws
by: Boncoraglio, Fabrizio, et al.
Published: (2025)
From Gradient Clipping to Normalization for Heavy Tailed SGD
by: Hübler, Florian, et al.
Published: (2024)
Signal Processing Meets SGD: From Momentum to Filter
by: Yao, Zhipeng, et al.
Published: (2023)
Deep Neural Network for Phonon-Assisted Optical Spectra in Semiconductors
by: Gu, Qiangqiang, et al.
Published: (2025)