| Main Author: | Alekseev, Sergey |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.11890 |
Similar Items
Exploring and Improving Initialization for Deep Graph Neural Networks: A Signal Propagation Perspective
by: Wang, Senmiao, et al.
Published: (2025)
Geometric Dynamics of Signal Propagation Predict Trainability of Transformers
by: Cowsik, Aditya, et al.
Published: (2024)
Stronger Normalization-Free Transformers
by: Chen, Mingzhi, et al.
Published: (2025)
DyTTP: Trajectory Prediction with Normalization-Free Transformers
by: Zhu, JianLin, et al.
Published: (2025)
Beyond Gaussian Initializations: Signal Preserving Weight Initialization for Odd-Sigmoid Activations
by: Lee, Hyunwoo, et al.
Published: (2025)
Beyond Oversquashing: Understanding Signal Propagation in GNNs Via Observables
by: Nagar, Eden, et al.
Published: (2026)
Convolutional Signal Propagation: A Simple Scalable Algorithm for Hypergraphs
by: Procházka, Pavel, et al.
Published: (2024)
Normalize Then Propagate: Efficient Homophilous Regularization for Few-shot Semi-Supervised Node Classification
by: Zhang, Baoming, et al.
Published: (2025)
FlashNorm: Fast Normalization for Transformers
by: Graef, Nils, et al.
Published: (2024)
No Free Prune: Information-Theoretic Barriers to Pruning at Initialization
by: Kumar, Tanishq, et al.
Published: (2024)
Uncertainty Propagation in the Fast Fourier Transform
by: Schmid, Luca, et al.
Published: (2025)
Conditional Pseudo-Reversible Normalizing Flow for Surrogate Modeling in Quantifying Uncertainty Propagation
by: Yang, Minglei, et al.
Published: (2024)
ART: Artifact Removal Transformer for Reconstructing Noise-Free Multichannel Electroencephalographic Signals
by: Chuang, Chun-Hsiang, et al.
Published: (2024)
Normalized Matching Transformer
by: Pourhadi, Abtin, et al.
Published: (2025)
Nash Initialization for Recurrent Depth Transformers: Stable Signal Propagation at Initialization Without Layer Normalization
by: Bigeard, Nicolas
Published: (2026)
Learning in Compact Spaces with Approximately Normalized Transformer
by: Franke, Jörg K. H., et al.
Published: (2025)
Real-time Prediction of Urban Sound Propagation with Conditioned Normalizing Flows
by: Eckerle, Achim, et al.
Published: (2025)
Universal Learning of Stochastic Dynamics for Exact Belief Propagation using Bernstein Normalizing Flows
by: Amorese, Peter, et al.
Published: (2025)
Observable Propagation: Uncovering Feature Vectors in Transformers
by: Dunefsky, Jacob, et al.
Published: (2023)
Graph Propagation Transformer for Graph Representation Learning
by: Chen, Zhe, et al.
Published: (2023)
Differentially Private Clustered Federated Learning with Privacy-Preserving Initialization and Normality-Driven Aggregation
by: Xu, Jie, et al.
Published: (2026)
The Free Transformer
by: Fleuret, François
Published: (2025)
Learning Rate Transfer in Normalized Transformers
by: Shigida, Boris, et al.
Published: (2026)
Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers
by: Saada, Thiziri Nait, et al.
Published: (2024)
UnitNorm: Rethinking Normalization for Transformers in Time Series
by: Huang, Nan, et al.
Published: (2024)
Neural Click Models for Recommender Systems
by: Shirokikh, Mikhail, et al.
Published: (2024)
Scalable Back-Propagation-Free Training of Optical Physics-Informed Neural Networks
by: Zhao, Yequan, et al.
Published: (2025)
How Long Does Infinite Width Last? Signal Propagation in Long-Range Linear Recurrences
by: Seleznova, Mariia
Published: (2026)
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
by: Kinderman, Edan, et al.
Published: (2024)
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
by: Zhang, Zhongwang, et al.
Published: (2024)
Local to Global: Learning Dynamics and Effect of Initialization for Transformers
by: Makkuva, Ashok Vardhan, et al.
Published: (2024)
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
by: Fang, Zhenhan, et al.
Published: (2026)
Looped Transformers with Layer Normalization Provably Learn the Power Method
by: Wu, Lyumin, et al.
Published: (2026)
Free-form Flows: Make Any Architecture a Normalizing Flow
by: Draxler, Felix, et al.
Published: (2023)
Scalable Equilibrium Propagation via Intermediate Error Signals for Deep Convolutional CRNNs
by: Lin, Jiaqi, et al.
Published: (2025)
The Radio-Frequency Transformer for Signal Separation
by: Lifar, Egor, et al.
Published: (2026)
Stability of Transformers under Layer Normalization
by: Kan, Kelvin, et al.
Published: (2025)
Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences
by: Li, Siquan, et al.
Published: (2026)
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models
by: Kedia, Akhil, et al.
Published: (2024)
Dataset-Free Weight-Initialization on Restricted Boltzmann Machine
by: Yasuda, Muneki, et al.
Published: (2024)