Library Catalog

Bibliographic Details
Main Authors:	Dubinin, Igor; Orvieto, Antonio; Effenberger, Felix
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2602.12021

Similar Items

Fading memory as inductive bias in residual recurrent networks
by: Dubinin, Igor, et al.
Published: (2023)

Revisiting associative recall in modern recurrent models
by: Okpekpe, Destiny, et al.
Published: (2025)

Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)

Adam Simplified: Bias Correction Debunked
by: Laing, Sam, et al.
Published: (2025)

An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
by: Orvieto, Antonio, et al.
Published: (2024)

Fixed-Point RNNs: Interpolating from Diagonal to Dense
by: Movahedi, Sajad, et al.
Published: (2025)

In Search of Adam's Secret Sauce
by: Orvieto, Antonio, et al.
Published: (2025)

Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks
by: Kaushik, Chiraag, et al.
Published: (2024)

Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)

Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)

An Uncertainty Principle for Linear Recurrent Neural Networks
by: François, Alexandre, et al.
Published: (2025)

When, Where and Why to Average Weights?
by: Ajroldi, Niccolò, et al.
Published: (2025)

Bridging the Gap Between Climate Science and Machine Learning in Climate Model Emulation
by: Schmidt, Luca, et al.
Published: (2026)

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)

Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
by: Srećković, Teodora, et al.
Published: (2025)

Turbine location-aware multi-decadal wind power predictions for Germany using CMIP6
by: Effenberger, Nina, et al.
Published: (2024)

NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
by: Köprücü, Nursena, et al.
Published: (2024)

On the low-shot transferability of [V]-Mamba
by: Misra, Diganta, et al.
Published: (2024)

Towards Understanding Self-Pretraining for Sequence Classification
by: Coser, Omar, et al.
Published: (2026)

Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
by: Movahedi, Sajad, et al.
Published: (2024)

Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by: Noci, Lorenzo, et al.
Published: (2024)

Gradient-free training of recurrent neural networks
by: Bolager, Erik Lien, et al.
Published: (2024)

Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models
by: Elhassan, Fay, et al.
Published: (2025)

Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)

Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)

Dance recalibration for dance coherency with recurrent convolution block
by: Eum, Seungho, et al.
Published: (2025)

Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
by: Zhang, Shuofeng, et al.
Published: (2025)

Recurrent Distance Filtering for Graph Representation Learning
by: Ding, Yuhui, et al.
Published: (2023)

Muown: Row-Norm Control for Muon Optimization
by: Lion, Kai, et al.
Published: (2026)

How noise affects memory in linear recurrent networks
by: Guan, JingChuan, et al.
Published: (2024)

Exploring higher-order neural network node interactions with total correlation
by: Kerby, Thomas, et al.
Published: (2024)

An analog-electronic implementation of a harmonic oscillator recurrent neural network
by: Carvalho, Pedro, et al.
Published: (2025)

Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
by: Meng, Si Yi, et al.
Published: (2025)

Design Principles for Sequence Models via Coefficient Dynamics
by: Sieber, Jerome, et al.
Published: (2025)

(Almost) Free Modality Stitching of Foundation Models
by: Singh, Jaisidh, et al.
Published: (2025)

Geometric sparsification in recurrent neural networks
by: Mackey, Wyatt, et al.
Published: (2024)

GASP: Guided Asymmetric Self-Play For Coding LLMs
by: Jana, Swadesh, et al.
Published: (2026)

Inferring stochastic low-rank recurrent neural networks from neural data
by: Pals, Matthijs, et al.
Published: (2024)

Downscaling land surface temperature data using edge detection and block-diagonal Gaussian process regression
by: Dandapanthula, Sanjit, et al.
Published: (2026)

Quaternion recurrent neural network with real-time recurrent learning and maximum correntropy criterion
by: Bourigault, Pauline, et al.
Published: (2024)