| Main Authors: | Dubinin, Igor, Orvieto, Antonio, Effenberger, Felix |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.12021 |
Similar Items
Fading memory as inductive bias in residual recurrent networks
by: Dubinin, Igor, et al.
Published: (2023)
Revisiting associative recall in modern recurrent models
by: Okpekpe, Destiny, et al.
Published: (2025)
Recurrent neural networks: vanishing and exploding gradients are not the end of the story
by: Zucchet, Nicolas, et al.
Published: (2024)
Adam Simplified: Bias Correction Debunked
by: Laing, Sam, et al.
Published: (2025)
An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes
by: Orvieto, Antonio, et al.
Published: (2024)
Fixed-Point RNNs: Interpolating from Diagonal to Dense
by: Movahedi, Sajad, et al.
Published: (2025)
In Search of Adam's Secret Sauce
by: Orvieto, Antonio, et al.
Published: (2025)
Precise asymptotics of reweighted least-squares algorithms for linear diagonal networks
by: Kaushik, Chiraag, et al.
Published: (2024)
Explaining Grokking in Transformers through the Lens of Inductive Bias
by: Singh, Jaisidh, et al.
Published: (2026)
Universal Dynamics of Warmup Stable Decay: understanding WSD beyond Transformers
by: Belloni, Annalisa, et al.
Published: (2026)
An Uncertainty Principle for Linear Recurrent Neural Networks
by: François, Alexandre, et al.
Published: (2025)
When, Where and Why to Average Weights?
by: Ajroldi, Niccolò, et al.
Published: (2025)
Bridging the Gap Between Climate Science and Machine Learning in Climate Model Emulation
by: Schmidt, Luca, et al.
Published: (2026)
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
by: Orvieto, Antonio, et al.
Published: (2023)
Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
by: Srećković, Teodora, et al.
Published: (2025)
Turbine location-aware multi-decadal wind power predictions for Germany using CMIP6
by: Effenberger, Nina, et al.
Published: (2024)
NIMBA: Towards Robust and Principled Processing of Point Clouds With SSMs
by: Köprücü, Nursena, et al.
Published: (2024)
On the low-shot transferability of [V]-Mamba
by: Misra, Diganta, et al.
Published: (2024)
Towards Understanding Self-Pretraining for Sequence Classification
by: Coser, Omar, et al.
Published: (2026)
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
by: Movahedi, Sajad, et al.
Published: (2024)
Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by: Noci, Lorenzo, et al.
Published: (2024)
Gradient-free training of recurrent neural networks
by: Bolager, Erik Lien, et al.
Published: (2024)
Can you Finetune your Binoculars? Embedding Text Watermarks into the Weights of Large Language Models
by: Elhassan, Fay, et al.
Published: (2025)
Loss Landscape Characterization of Neural Networks without Over-Parametrization
by: Islamov, Rustem, et al.
Published: (2024)
Enhancing Optimizer Stability: Momentum Adaptation of The NGN Step-size
by: Islamov, Rustem, et al.
Published: (2025)
Dance recalibration for dance coherency with recurrent convolution block
by: Eum, Seungho, et al.
Published: (2025)
Closed-form $\ell_r$ norm scaling with data for overparameterized linear regression and diagonal linear networks under $\ell_p$ bias
by: Zhang, Shuofeng, et al.
Published: (2025)
Recurrent Distance Filtering for Graph Representation Learning
by: Ding, Yuhui, et al.
Published: (2023)
Muown: Row-Norm Control for Muon Optimization
by: Lion, Kai, et al.
Published: (2026)
How noise affects memory in linear recurrent networks
by: Guan, JingChuan, et al.
Published: (2024)
Exploring higher-order neural network node interactions with total correlation
by: Kerby, Thomas, et al.
Published: (2024)
An analog-electronic implementation of a harmonic oscillator recurrent neural network
by: Carvalho, Pedro, et al.
Published: (2025)
Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?
by: Meng, Si Yi, et al.
Published: (2025)
Design Principles for Sequence Models via Coefficient Dynamics
by: Sieber, Jerome, et al.
Published: (2025)
(Almost) Free Modality Stitching of Foundation Models
by: Singh, Jaisidh, et al.
Published: (2025)
Geometric sparsification in recurrent neural networks
by: Mackey, Wyatt, et al.
Published: (2024)
GASP: Guided Asymmetric Self-Play For Coding LLMs
by: Jana, Swadesh, et al.
Published: (2026)
Inferring stochastic low-rank recurrent neural networks from neural data
by: Pals, Matthijs, et al.
Published: (2024)
Downscaling land surface temperature data using edge detection and block-diagonal Gaussian process regression
by: Dandapanthula, Sanjit, et al.
Published: (2026)
Quaternion recurrent neural network with real-time recurrent learning and maximum correntropy criterion
by: Bourigault, Pauline, et al.
Published: (2024)