:: Library Catalog

Bibliographic Details
Main Authors:	Meng, Weikang, Luo, Yadan, Huo, Liangyu, Li, Yingjian, Wang, Yaowei, Li, Xin, Zhang, Zheng
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2506.21137

Similar Items

MirrorLA: Reflecting Feature Map for Vision Linear Attention
by: Meng, Weikang, et al.
Published: (2026)

STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs
by: Meng, Weikang, et al.
Published: (2026)

ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
by: Peng, Bo, et al.
Published: (2024)

PolaFormer: Polarity-aware Linear Attention for Vision Transformers
by: Meng, Weikang, et al.
Published: (2025)

GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization
by: Zheng, Chuanyang, et al.
Published: (2026)

On the Role of Attention Masks and LayerNorm in Transformers
by: Wu, Xinyi, et al.
Published: (2024)

Post-Norm can Resharpen Attention
by: Zsámboki, Pál, et al.
Published: (2025)

GradientStabilizer: Fix the Norm, Not the Gradient
by: Huang, Tianjin, et al.
Published: (2025)

Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits
by: Suder, Piotr M., et al.
Published: (2025)

VL Norm: Rethink Loss Aggregation in RLVR
by: He, Zhiyuan, et al.
Published: (2025)

Robust Capped lp-Norm Support Vector Ordinal Regression
by: Xiang, Haorui, et al.
Published: (2024)

Muown: Row-Norm Control for Muon Optimization
by: Lion, Kai, et al.
Published: (2026)

Norm-Bounded Low-Rank Adaptation
by: Wang, Ruigang, et al.
Published: (2025)

Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues
by: Qu, Shilin, et al.
Published: (2024)

SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
by: Wang, Chao, et al.
Published: (2026)

Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)

Geometry and Dynamics of LayerNorm
by: Riechers, Paul M.
Published: (2024)

Scalable Optimization in the Modular Norm
by: Large, Tim, et al.
Published: (2024)

The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization
by: Kravatskiy, Alexey, et al.
Published: (2025)

Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
by: Chezhegov, Savelii, et al.
Published: (2024)

Optimal Scaling Needs Optimal Norm
by: Filatov, Oleg, et al.
Published: (2025)

FlashNorm: Fast Normalization for Transformers
by: Graef, Nils, et al.
Published: (2024)

Nuclear Norm Regularization for Deep Learning
by: Scarvelis, Christopher, et al.
Published: (2024)

Parametric $ρ$-Norm Scaling Calibration
by: Zhang, Siyuan, et al.
Published: (2024)

UnitNorm: Rethinking Normalization for Transformers in Time Series
by: Huang, Nan, et al.
Published: (2024)

Measuring Social Norms of Large Language Models
by: Yuan, Ye, et al.
Published: (2024)

Norm Augmented Graph AutoEncoders for Link Prediction
by: Liu, Yunhui, et al.
Published: (2025)

Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning
by: Guo, Yuxi, et al.
Published: (2026)

On the Importance of Embedding Norms in Self-Supervised Learning
by: Draganov, Andrew, et al.
Published: (2025)

Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
by: Xie, Shuo, et al.
Published: (2024)

Minimum-Norm Interpolation Under Covariate Shift
by: Mallinar, Neil, et al.
Published: (2024)

Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression
by: Hanchi, Ayoub El, et al.
Published: (2023)

Old Optimizer, New Norm: An Anthology
by: Bernstein, Jeremy, et al.
Published: (2024)

Distribution Estimation under the Infinity Norm
by: Kontorovich, Aryeh, et al.
Published: (2024)

Neural Weight Norm = Kolmogorov Complexity
by: Musat, Tiberiu
Published: (2026)

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
by: Ioushua, Shahar Stein, et al.
Published: (2023)

Discriminability-Driven Spatial-Channel Selection with Gradient Norm for Drone Signal OOD Detection
by: Feng, Chuhan, et al.
Published: (2026)

Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms
by: Li, Yi, et al.
Published: (2024)

Adaptive Norm-Based Regularization for Neural Networks
by: Qasim, Muhammad, et al.
Published: (2026)

Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability
by: Baroni, Luca, et al.
Published: (2025)