| Main Authors: | Wang, Mingze; E, Weinan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2402.00522 |
Similar Items
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks
by: Wang, Mingze, et al.
Published: (2025)
How Transformers Get Rich: Approximation and Dynamics Analysis
by: Wang, Mingze, et al.
Published: (2024)
GradPower: Powering Gradients for Faster Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
by: Wang, Jinbo, et al.
Published: (2025)
On the Expressive Power of Floating-Point Transformers
by: Park, Sejun, et al.
Published: (2026)
On the Expressive Power of Contextual Relations in Transformers
by: Fraiman, Demián
Published: (2026)
Expressivity-Efficiency Tradeoffs for Hybrid Sequence Models
by: Cooper, John, et al.
Published: (2026)
More Expressive Feedforward Layers: Part I. Token-Adaptive Mixing of Activations
by: Wang, Mingze, et al.
Published: (2026)
Transformers are Expressive, But Are They Expressive Enough for Regression?
by: Nath, Swaroop, et al.
Published: (2024)
Exact Expressive Power of Transformers with Padding
by: Merrill, William, et al.
Published: (2025)
The Expressive Power of Transformers with Chain of Thought
by: Merrill, William, et al.
Published: (2023)
Understanding and Enhancing Mask-Based Pretraining towards Universal Representations
by: Dong, Mingze, et al.
Published: (2025)
Towards Understanding the Expressive Power of GNNs with Global Readout
by: Funk, Maurice, et al.
Published: (2026)
Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models
by: DiGiugno, Andrew, et al.
Published: (2025)
On the Theoretical Expressive Power and the Design Space of Higher-Order Graph Transformers
by: Zhou, Cai, et al.
Published: (2024)
On The Expressive Power of GNN Derivatives
by: Eitan, Yam, et al.
Published: (2025)
Lyra: An Efficient and Expressive Subquadratic Architecture for Modeling Biological Sequences
by: Ramesh, Krithik, et al.
Published: (2025)
On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
by: Xu, Kevin, et al.
Published: (2024)
Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models
by: Walker, Benjamin, et al.
Published: (2025)
On the Expressive Power of Transformers for Maxout Networks and Continuous Piecewise Linear Functions
by: Gu, Linyan, et al.
Published: (2026)
Expressive Power of Temporal Message Passing
by: Wałęga, Przemysław Andrzej, et al.
Published: (2024)
Rethinking the Expressive Power of GNNs via Graph Biconnectivity
by: Zhang, Bohang, et al.
Published: (2023)
Expanding Expressivity in Transformer Models with MöbiusAttention
by: Halacheva, Anna-Maria, et al.
Published: (2024)
The Expressive Power of Low Precision Softmax Transformers with (Summarized) Chain-of-Thought
by: Brösamle, Moritz, et al.
Published: (2026)
GNNs Meet Sequence Models Along the Shortest-Path: an Expressive Method for Link Prediction
by: Ferrini, Francesco, et al.
Published: (2025)
On the Expressive Power of GNNs to Solve Linear SDPs
by: Qian, Chendi, et al.
Published: (2026)
Understanding Expressivity of GNN in Rule Learning
by: Qiu, Haiquan, et al.
Published: (2023)
k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
by: De Schouwer, Jonas, et al.
Published: (2026)
On the Expressive Power of Graph Neural Networks
by: Nalwade, Ashwin, et al.
Published: (2024)
On the Expressive Power of Sparse Geometric MPNNs
by: Sverdlov, Yonatan, et al.
Published: (2024)
On the Expressive Power of GNNs for Boolean Satisfiability
by: Peltonen, Saku, et al.
Published: (2026)
Improving Generalization and Convergence by Enhancing Implicit Regularization
by: Wang, Mingze, et al.
Published: (2024)
On the Expressive Power of Subgraph Graph Neural Networks for Graphs with Bounded Cycles
by: Chen, Ziang, et al.
Published: (2025)
Expressivity of Transformers: A Tropical Geometry Perspective
by: Su, Ye, et al.
Published: (2026)
On the Expressive Power and Limitations of Multi-Layer SSMs
by: Zubić, Nikola, et al.
Published: (2026)
On the Expressive Power of Permutation-Equivariant Weight-Space Networks
by: Dayan, Adir, et al.
Published: (2026)
Maximising Quantum-Computing Expressive Power through Randomised Circuits
by: Yang, Yingli, et al.
Published: (2023)
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers
by: Merrill, William, et al.
Published: (2025)
A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent
by: Wang, Mingze, et al.
Published: (2023)
On the Expressive Power of Tree-Structured Probabilistic Circuits
by: Yin, Lang, et al.
Published: (2024)