:: Library Catalog

Bibliographic Details
Main Authors:	François, Alexandre; Orvieto, Antonio; Bach, Francis
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2502.09287

Similar Items

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
by: Sieber, Jerome, et al.
Published: (2024)

Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)

Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by: Noci, Lorenzo, et al.
Published: (2024)

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
by: Köprücü, Nursena, et al.
Published: (2024)

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)

Recurrent Distance Filtering for Graph Representation Learning
by: Ding, Yuhui, et al.
Published: (2023)

Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)

Enhanced Feature Learning via Regularisation: Integrating Neural Networks and Kernel Methods
by: Follain, Bertille, et al.
Published: (2024)

Adam Simplified: Bias Correction Debunked
by: Laing, Sam, et al.
Published: (2025)

Revisiting associative recall in modern recurrent models
by: Okpekpe, Destiny, et al.
Published: (2025)

Design Principles for Sequence Models via Coefficient Dynamics
by: Sieber, Jerome, et al.
Published: (2025)

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
by: Orvieto, Antonio, et al.
Published: (2024)

Weight-Space Linear Recurrent Neural Networks
by: Nzoyem, Roussel Desmond, et al.
Published: (2025)

In Search of Adam's Secret Sauce
by: Orvieto, Antonio, et al.
Published: (2025)

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
by: Movahedi, Sajad, et al.
Published: (2024)

Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf's Law
by: Kunstner, Frederik, et al.
Published: (2025)

Generalized Linear Mode Connectivity for Transformers
by: Theus, Alexander, et al.
Published: (2025)

Continuous-Time Piecewise-Linear Recurrent Neural Networks
by: Brändle, Alena, et al.
Published: (2026)

When, Where and Why to Average Weights?
by: Ajroldi, Niccolò, et al.
Published: (2025)

Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)

Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)

Improved state mixing in higher-order and block diagonal linear recurrent networks
by: Dubinin, Igor, et al.
Published: (2026)

Dynamics and Representation Structure of Local Approximations to Gradient-Based Learning in Linear Recurrent Neural Networks
by: Williams, Ezekiel, et al.
Published: (2026)

Recurrent Neural Networks with Linear Structures for Electricity Price Forecasting
by: Amor, Souhir Ben, et al.
Published: (2025)

Efficient Optimization Algorithms for Linear Adversarial Training
by: Ribeiro, Antônio H., et al.
Published: (2024)

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
by: Srećković, Teodora, et al.
Published: (2025)

High-Dimensional Analysis of Gradient Flow for Extensive-Width Quadratic Neural Networks
by: Martin, Simon, et al.
Published: (2026)

Fast Training of Recurrent Neural Networks with Stationary State Feedbacks
by: Caillon, Paul, et al.
Published: (2025)

Scaling Recurrent Neural Networks to a Billion Parameters with Zero-Order Optimization
by: Chaubard, Francois, et al.
Published: (2025)

Revisiting Bi-Linear State Transitions in Recurrent Neural Networks
by: Ebrahimi, M. Reza, et al.
Published: (2025)

Advancing Regular Language Reasoning in Linear Recurrent Neural Networks
by: Fan, Ting-Han, et al.
Published: (2023)

On the Effectiveness of the z-Transform Method in Quadratic Optimization
by: Bach, Francis
Published: (2025)

A Convex Loss Function for Set Prediction with Optimal Trade-offs Between Size and Conditional Coverage
by: Bach, Francis
Published: (2025)

On the low-shot transferability of [V]-Mamba
by: Misra, Diganta, et al.
Published: (2024)

Accelerating Linear Recurrent Neural Networks for the Edge with Unstructured Sparsity
by: Pierro, Alessandro, et al.
Published: (2025)

Towards Understanding Self-Pretraining for Sequence Classification
by: Coser, Omar, et al.
Published: (2026)

A Spectral Framework for Closed-Form Relative Density Estimation
by: Bach, Francis
Published: (2026)

Quantized Approximately Orthogonal Recurrent Neural Networks
by: Foucault, Armand, et al.
Published: (2024)

Fixed-Point RNNs: Interpolating from Diagonal to Dense
by: Movahedi, Sajad, et al.
Published: (2025)

Sampling Binary Data by Denoising through Score Functions
by: Bach, Francis, et al.
Published: (2025)