| Main Authors: | Watanabe, Taishi, Karakida, Ryo, Teramae, Jun-nosuke |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.06961 |
Similar Items
Dynamical mean-field theory for a highly heterogeneous neural population
by: Tomita, Futa, et al.
Published: (2024)
Spectral density of correlated random matrices and nonmonotonic stability in hetero-associative memory networks
by: Tomoto, Arata, et al.
Published: (2025)
Energy-information trade-off makes the cortical critical power law the optimal coding
by: Tatsukawa, Tsuyoshi, et al.
Published: (2024)
Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
by: Tomihari, Akiyoshi, et al.
Published: (2025)
Understanding MLP-Mixer as a Wide and Sparse MLP
by: Hayase, Tomohiro, et al.
Published: (2023)
On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
by: Ishikawa, Satoki, et al.
Published: (2023)
Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models
by: Takeda, Ken, et al.
Published: (2026)
Self-attention Networks Localize When QK-eigenspectrum Concentrates
by: Bao, Han, et al.
Published: (2024)
Optimal Layer Selection for Latent Data Augmentation
by: Takase, Tomoumi, et al.
Published: (2024)
Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation
by: Ishikawa, Satoki, et al.
Published: (2024)
Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs
by: Sakai, Mana, et al.
Published: (2025)
Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix
by: Hayase, Tomohiro, et al.
Published: (2025)
A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention
by: Hayase, Tomohiro, et al.
Published: (2026)
Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
by: Karakida, Ryo, et al.
Published: (2024)
Induced Covariance for Causal Discovery in Linear Sparse Structures
by: Mohseni-Sehdeh, Saeed, et al.
Published: (2024)
Personalized Binomial DAGs Learning with Network Structured Covariates
by: Zhao, Boxin, et al.
Published: (2024)
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs
by: Fujii, Kazuki, et al.
Published: (2024)
The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits
by: Nachum, Ido, et al.
Published: (2025)
On the Optimal Reasoning Length for RL-Trained Language Models
by: Nohara, Daisuke, et al.
Published: (2026)
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
by: Nakamura, Taishi, et al.
Published: (2025)
Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks
by: Liu, Meitong, et al.
Published: (2026)
In-context Learning for Mixture of Linear Regressions: Existence, Generalization and Training Dynamics
by: Jin, Yanhao, et al.
Published: (2024)
The Price of Linear Time: Error Analysis of Structured Kernel Interpolation
by: Moreno, Alexander, et al.
Published: (2025)
Learning Joint and Individual Structure in Network Data with Covariates
by: James, Carson, et al.
Published: (2024)
Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting
by: Zhang, Yingying, et al.
Published: (2025)
Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation
by: Chen, Louis L., et al.
Published: (2024)
AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network
by: Watanabe, Chihiro, et al.
Published: (2021)
Double Debiased Covariate Shift Adaptation Robust to Density-Ratio Estimation
by: Kato, Masahiro, et al.
Published: (2023)
Stabilizing Private LASSO under Heterogeneous Covariates via Anisotropic Objective Perturbation
by: Tanzawa, Haruka, et al.
Published: (2026)
Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)
Z-Error Loss for Training Neural Networks
by: Godin, Guillaume
Published: (2025)
Minimax-Optimal Spectral Clustering with Covariance Projection for High-Dimensional Anisotropic Mixtures
by: Huang, Chengzhu, et al.
Published: (2025)
Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices
by: Kato, Masahiro, et al.
Published: (2024)
Training Dynamics Impact Post-Training Quantization Robustness
by: Catalan-Tatjer, Albert, et al.
Published: (2025)
Unveiling the Training Dynamics of ReLU Networks through a Linear Lens
by: Ye, Longqing
Published: (2025)
Handling Covariate Mismatch in Federated Linear Prediction
by: Ayme, Alexis, et al.
Published: (2026)
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
by: Hu, Rui, et al.
Published: (2024)
Covariance Density Neural Networks
by: Roy, Om, et al.
Published: (2025)
Spatiotemporal Covariance Neural Networks
by: Cavallo, Andrea, et al.
Published: (2024)
Sparse Covariance Neural Networks
by: Cavallo, Andrea, et al.
Published: (2024)