| Main Authors: | Meng, Weikang, Luo, Yadan, Huo, Liangyu, Li, Yingjian, Wang, Yaowei, Li, Xin, Zhang, Zheng |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.21137 |
Similar Items
MirrorLA: Reflecting Feature Map for Vision Linear Attention
by: Meng, Weikang, et al.
Published: (2026)
STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs
by: Meng, Weikang, et al.
Published: (2026)
ConjNorm: Tractable Density Estimation for Out-of-Distribution Detection
by: Peng, Bo, et al.
Published: (2024)
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
by: Meng, Weikang, et al.
Published: (2025)
GeoNorm: Unify Pre-Norm and Post-Norm with Geodesic Optimization
by: Zheng, Chuanyang, et al.
Published: (2026)
On the Role of Attention Masks and LayerNorm in Transformers
by: Wu, Xinyi, et al.
Published: (2024)
Post-Norm can Resharpen Attention
by: Zsámboki, Pál, et al.
Published: (2025)
GradientStabilizer: Fix the Norm, Not the Gradient
by: Huang, Tianjin, et al.
Published: (2025)
Empirical Bound Information-Directed Sampling for Norm-Agnostic Bandits
by: Suder, Piotr M., et al.
Published: (2025)
VL Norm: Rethink Loss Aggregation in RLVR
by: He, Zhiyuan, et al.
Published: (2025)
Robust Capped lp-Norm Support Vector Ordinal Regression
by: Xiang, Haorui, et al.
Published: (2024)
Muown: Row-Norm Control for Muon Optimization
by: Lion, Kai, et al.
Published: (2026)
Norm-Bounded Low-Rank Adaptation
by: Wang, Ruigang, et al.
Published: (2025)
Scalable Frame-based Construction of Sociocultural NormBases for Socially-Aware Dialogues
by: Qu, Shilin, et al.
Published: (2024)
SpanNorm: Reconciling Training Stability and Performance in Deep Transformers
by: Wang, Chao, et al.
Published: (2026)
Muon Optimizes Under Spectral Norm Constraints
by: Chen, Lizhang, et al.
Published: (2025)
Geometry and Dynamics of LayerNorm
by: Riechers, Paul M.
Published: (2024)
Scalable Optimization in the Modular Norm
by: Large, Tim, et al.
Published: (2024)
The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization
by: Kravatskiy, Alexey, et al.
Published: (2025)
Clipping Improves Adam-Norm and AdaGrad-Norm when the Noise Is Heavy-Tailed
by: Chezhegov, Savelii, et al.
Published: (2024)
Optimal Scaling Needs Optimal Norm
by: Filatov, Oleg, et al.
Published: (2025)
FlashNorm: Fast Normalization for Transformers
by: Graef, Nils, et al.
Published: (2024)
Nuclear Norm Regularization for Deep Learning
by: Scarvelis, Christopher, et al.
Published: (2024)
Parametric $ρ$-Norm Scaling Calibration
by: Zhang, Siyuan, et al.
Published: (2024)
UnitNorm: Rethinking Normalization for Transformers in Time Series
by: Huang, Nan, et al.
Published: (2024)
Measuring Social Norms of Large Language Models
by: Yuan, Ye, et al.
Published: (2024)
Norm Augmented Graph AutoEncoders for Link Prediction
by: Liu, Yunhui, et al.
Published: (2025)
Greedy-Gnorm: A Gradient Matrix Norm-Based Alternative to Attention Entropy for Head Pruning
by: Guo, Yuxi, et al.
Published: (2026)
On the Importance of Embedding Norms in Self-Supervised Learning
by: Draganov, Andrew, et al.
Published: (2025)
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
by: Xie, Shuo, et al.
Published: (2024)
Minimum-Norm Interpolation Under Covariate Shift
by: Mallinar, Neil, et al.
Published: (2024)
Optimal Excess Risk Bounds for Empirical Risk Minimization on $p$-Norm Linear Regression
by: Hanchi, Ayoub El, et al.
Published: (2023)
Old Optimizer, New Norm: An Anthology
by: Bernstein, Jeremy, et al.
Published: (2024)
Distribution Estimation under the Infinity Norm
by: Kontorovich, Aryeh, et al.
Published: (2024)
Neural Weight Norm = Kolmogorov Complexity
by: Musat, Tiberiu
Published: (2026)
Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression
by: Ioushua, Shahar Stein, et al.
Published: (2023)
Discriminability-Driven Spatial-Channel Selection with Gradient Norm for Drone Signal OOD Detection
by: Feng, Chuhan, et al.
Published: (2026)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms
by: Li, Yi, et al.
Published: (2024)
Adaptive Norm-Based Regularization for Neural Networks
by: Qasim, Muhammad, et al.
Published: (2026)
Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability
by: Baroni, Luca, et al.
Published: (2025)