| Main Authors: | Zakarin, Daniyar, Singh, Sidak Pal |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2505.11972 |
Similar Items
Some Fundamental Aspects about Lipschitz Continuity of Neural Networks
by: Khromov, Grigory, et al.
Published: (2023)
Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy
by: Singh, Sidak Pal, et al.
Published: (2024)
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
by: Zhao, Jim, et al.
Published: (2024)
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
by: Ormaniec, Weronika, et al.
Published: (2024)
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
by: Bozic, Vukasin, et al.
Published: (2023)
Avoiding spurious sharpness minimization broadens applicability of SAM
by: Singh, Sidak Pal, et al.
Published: (2025)
Local vs Global continual learning
by: Lanzillotta, Giulia, et al.
Published: (2024)
Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement
by: Zhu, Shuchen, et al.
Published: (2026)
Generalized Linear Mode Connectivity for Transformers
by: Theus, Alexander, et al.
Published: (2025)
Landscaping Linear Mode Connectivity
by: Singh, Sidak Pal, et al.
Published: (2024)
Transformer Fusion with Optimal Transport
by: Imfeld, Moritz, et al.
Published: (2023)
Reflection Removal through Efficient Adaptation of Diffusion Transformers
by: Zakarin, Daniyar, et al.
Published: (2025)
Towards Meta-Pruning via Optimal Transport
by: Theus, Alexander, et al.
Published: (2024)
Accelerated Training through Iterative Gradient Propagation Along the Residual Path
by: Fagnou, Erwan, et al.
Published: (2025)
Sharpness Aware Surrogate Training for Spiking Neural Networks
by: Nicholson, Maximilian
Published: (2026)
Model Fusion via Retrofitting
by: Luenam, Phoomraphee, et al.
Published: (2025)
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
by: Kalra, Dayal Singh, et al.
Published: (2023)
Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
by: Zou, Jinping, et al.
Published: (2024)
FedVSSAM: Mitigating Flatness Incompatibility in Sharpness-Aware Federated Learning
by: Xiao, Bingnan, et al.
Published: (2026)
Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning
by: Chen, Yanan, et al.
Published: (2026)
Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks
by: Nicholson, Maximilian
Published: (2026)
BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training
by: Zhou, Wenjie, et al.
Published: (2025)
Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
by: Kasimbeg, Priya, et al.
Published: (2025)
Neural Network Plasticity and Loss Sharpness
by: Koster, Max, et al.
Published: (2024)
Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model
by: Xu, Yizhou, et al.
Published: (2026)
Perfecting Imperfect Physical Neural Networks with Transferable Robustness using Sharpness-Aware Training
by: Xu, Tengji, et al.
Published: (2024)
Accelerating Storage-Based Training for Graph Neural Networks
by: Jang, Myung-Hwan, et al.
Published: (2026)
Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training
by: Malhotra, Akul, et al.
Published: (2024)
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)
Switch EMA: A Free Lunch for Better Flatness and Sharpness
by: Li, Siyuan, et al.
Published: (2024)
A Function-Centric Perspective on Flat and Sharp Minima
by: Mason-Williams, Israel, et al.
Published: (2025)
Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
by: Clara, Gabriel, et al.
Published: (2025)
DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs
by: Luo, Yuebo, et al.
Published: (2025)
Approximate learning of parsimonious Bayesian context trees
by: Ghani, Daniyar, et al.
Published: (2024)
From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity
by: Liew, Yee Zhing, et al.
Published: (2026)
AA-DLADMM: An Accelerated ADMM-based Framework for Training Deep Neural Networks
by: Ebrahimi, Zeinab, et al.
Published: (2024)
Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers
by: Chen, Chuqi, et al.
Published: (2024)
Optical Computing for Deep Neural Network Acceleration: Foundations, Recent Developments, and Emerging Directions
by: Pasricha, Sudeep
Published: (2024)
Approximating Families of Sharp Solutions to Fisher's Equation with Physics-Informed Neural Networks
by: Rohrhofer, Franz M., et al.
Published: (2024)
Iterative Misclassification Error Training (IMET): An Optimized Neural Network Training Technique for Image Classification
by: Singh, Ruhaan, et al.
Published: (2025)