:: Library Catalog

Bibliographic Details
Main Authors:	Zakarin, Daniyar; Singh, Sidak Pal
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2505.11972

Similar Items

Some Fundamental Aspects about Lipschitz Continuity of Neural Networks
by: Khromov, Grigory, et al.
Published: (2023)

Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy
by: Singh, Sidak Pal, et al.
Published: (2024)

Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
by: Zhao, Jim, et al.
Published: (2024)

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
by: Ormaniec, Weronika, et al.
Published: (2024)

Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers
by: Bozic, Vukasin, et al.
Published: (2023)

Avoiding spurious sharpness minimization broadens applicability of SAM
by: Singh, Sidak Pal, et al.
Published: (2025)

Local vs Global continual learning
by: Lanzillotta, Giulia, et al.
Published: (2024)

Accelerating LLM Pre-Training through Flat-Direction Dynamics Enhancement
by: Zhu, Shuchen, et al.
Published: (2026)

Generalized Linear Mode Connectivity for Transformers
by: Theus, Alexander, et al.
Published: (2025)

Landscaping Linear Mode Connectivity
by: Singh, Sidak Pal, et al.
Published: (2024)

Transformer Fusion with Optimal Transport
by: Imfeld, Moritz, et al.
Published: (2023)

Reflection Removal through Efficient Adaptation of Diffusion Transformers
by: Zakarin, Daniyar, et al.
Published: (2025)

Towards Meta-Pruning via Optimal Transport
by: Theus, Alexander, et al.
Published: (2024)

Accelerated Training through Iterative Gradient Propagation Along the Residual Path
by: Fagnou, Erwan, et al.
Published: (2025)

Sharpness Aware Surrogate Training for Spiking Neural Networks
by: Nicholson, Maximilian
Published: (2026)

Model Fusion via Retrofitting
by: Luenam, Phoomraphee, et al.
Published: (2025)

Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
by: Kalra, Dayal Singh, et al.
Published: (2023)

Sharpness-Aware Minimization with Adaptive Regularization for Training Deep Neural Networks
by: Zou, Jinping, et al.
Published: (2024)

FedVSSAM: Mitigating Flatness Incompatibility in Sharpness-Aware Federated Learning
by: Xiao, Bingnan, et al.
Published: (2026)

Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning
by: Chen, Yanan, et al.
Published: (2026)

Sharpness-Aware Surrogate Training for On-Sensor Spiking Neural Networks
by: Nicholson, Maximilian
Published: (2026)

BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training
by: Zhou, Wenjie, et al.
Published: (2025)

Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
by: Kasimbeg, Priya, et al.
Published: (2025)

Neural Network Plasticity and Loss Sharpness
by: Koster, Max, et al.
Published: (2024)

Does SGD Seek Flatness or Sharpness? An Exactly Solvable Model
by: Xu, Yizhou, et al.
Published: (2026)

Perfecting Imperfect Physical Neural Networks with Transferable Robustness using Sharpness-Aware Training
by: Xu, Tengji, et al.
Published: (2024)

Accelerating Storage-Based Training for Graph Neural Networks
by: Jang, Myung-Hwan, et al.
Published: (2026)

Memory Faults in Activation-sparse Quantized Deep Neural Networks: Analysis and Mitigation using Sharpness-aware Training
by: Malhotra, Akul, et al.
Published: (2024)

The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)

Switch EMA: A Free Lunch for Better Flatness and Sharpness
by: Li, Siyuan, et al.
Published: (2024)

A Function-Centric Perspective on Flat and Sharp Minima
by: Mason-Williams, Israel, et al.
Published: (2025)

Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
by: Clara, Gabriel, et al.
Published: (2025)

DR-CircuitGNN: Training Acceleration of Heterogeneous Circuit Graph Neural Network on GPUs
by: Luo, Yuebo, et al.
Published: (2025)

Approximate learning of parsimonious Bayesian context trees
by: Ghani, Daniyar, et al.
Published: (2024)

From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity
by: Liew, Yee Zhing, et al.
Published: (2026)

AA-DLADMM: An Accelerated ADMM-based Framework for Training Deep Neural Networks
by: Ebrahimi, Zeinab, et al.
Published: (2024)

Quantifying Training Difficulty and Accelerating Convergence in Neural Network-Based PDE Solvers
by: Chen, Chuqi, et al.
Published: (2024)

Optical Computing for Deep Neural Network Acceleration: Foundations, Recent Developments, and Emerging Directions
by: Pasricha, Sudeep
Published: (2024)

Approximating Families of Sharp Solutions to Fisher's Equation with Physics-Informed Neural Networks
by: Rohrhofer, Franz M., et al.
Published: (2024)

Iterative Misclassification Error Training (IMET): An Optimized Neural Network Training Technique for Image Classification
by: Singh, Ruhaan, et al.
Published: (2025)