| Main Authors: | François, Alexandre; Orvieto, Antonio; Bach, Francis |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.09287 |
Similar Items
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
by: Sieber, Jerome, et al.
Published: (2024)
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)
Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by: Noci, Lorenzo, et al.
Published: (2024)
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
by: Köprücü, Nursena, et al.
Published: (2024)
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)
Recurrent Distance Filtering for Graph Representation Learning
by: Ding, Yuhui, et al.
Published: (2023)
Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)
Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods
by: Follain, Bertille, et al.
Published: (2024)
Adam Simplified: Bias Correction Debunked
by: Laing, Sam, et al.
Published: (2025)
Revisiting associative recall in modern recurrent models
by: Okpekpe, Destiny, et al.
Published: (2025)
Design Principles for Sequence Models via Coefficient Dynamics
by: Sieber, Jerome, et al.
Published: (2025)
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
by: Orvieto, Antonio, et al.
Published: (2024)
Weight-Space Linear Recurrent Neural Networks
by: Nzoyem, Roussel Desmond, et al.
Published: (2025)
In Search of Adam's Secret Sauce
by: Orvieto, Antonio, et al.
Published: (2025)
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
by: Movahedi, Sajad, et al.
Published: (2024)
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf's Law
by: Kunstner, Frederik, et al.
Published: (2025)
Generalized Linear Mode Connectivity for Transformers
by: Theus, Alexander, et al.
Published: (2025)
Continuous-Time Piecewise-Linear Recurrent Neural Networks
by: Brändle, Alena, et al.
Published: (2026)
When, Where and Why to Average Weights?
by: Ajroldi, Niccolò, et al.
Published: (2025)
Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)
Improved state mixing in higher-order and block diagonal linear recurrent networks
by: Dubinin, Igor, et al.
Published: (2026)
Dynamics and Representation Structure of Local Approximations to Gradient-Based Learning in Linear Recurrent Neural Networks
by: Williams, Ezekiel, et al.
Published: (2026)
Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting
by: Amor, Souhir Ben, et al.
Published: (2025)
Efficient Optimization Algorithms for Linear Adversarial Training
by: Ribeiro, Antônio H., et al.
Published: (2024)
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
by: Srećković, Teodora, et al.
Published: (2025)
High-Dimensional Analysis of Gradient Flow for Extensive-Width Quadratic Neural Networks
by: Martin, Simon, et al.
Published: (2026)
Fast Training of Recurrent Neural Networks with Stationary State Feedbacks
by: Caillon, Paul, et al.
Published: (2025)
Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization
by: Chaubard, Francois, et al.
Published: (2025)
Revisiting Bi-Linear State Transitions in Recurrent Neural Networks
by: Ebrahimi, M. Reza, et al.
Published: (2025)
Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
by: Fan, Ting-Han, et al.
Published: (2023)
On the Effectiveness of the z-Transform Method in Quadratic Optimization
by: Bach, Francis
Published: (2025)
A Convex Loss Function for Set Prediction with Optimal Trade-offs Between Size and Conditional Coverage
by: Bach, Francis
Published: (2025)
On the low-shot transferability of [V]-Mamba
by: Misra, Diganta, et al.
Published: (2024)
Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
by: Pierro, Alessandro, et al.
Published: (2025)
Towards Understanding Self-Pretraining for Sequence Classification
by: Coser, Omar, et al.
Published: (2026)
A Spectral Framework for Closed-Form Relative Density Estimation
by: Bach, Francis
Published: (2026)
Quantized Approximately Orthogonal Recurrent Neural Networks
by: Foucault, Armand, et al.
Published: (2024)
Fixed-Point RNNs: Interpolating from Diagonal to Dense
by: Movahedi, Sajad, et al.
Published: (2025)
Sampling Binary Data by Denoising through Score Functions
by: Bach, Francis, et al.
Published: (2025)