:: Library Catalog

Bibliographic Details
Main Authors:	Watanabe, Taishi; Karakida, Ryo; Teramae, Jun-nosuke
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.06961

Similar Items

Dynamical mean-field theory for a highly heterogeneous neural population
by: Tomita, Futa, et al.
Published: (2024)

Spectral density of correlated random matrices and nonmonotonic stability in hetero-associative memory networks
by: Tomoto, Arata, et al.
Published: (2025)

Energy-information trade-off makes the cortical critical power law the optimal coding
by: Tatsukawa, Tsuyoshi, et al.
Published: (2024)

Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
by: Tomihari, Akiyoshi, et al.
Published: (2025)

Understanding MLP-Mixer as a Wide and Sparse MLP
by: Hayase, Tomohiro, et al.
Published: (2023)

On the Parameterization of Second-Order Optimization Effective Towards the Infinite Width
by: Ishikawa, Satoki, et al.
Published: (2023)

Continual Learning in Modern Hopfield Networks with an Application to Diffusion Models
by: Takeda, Ken, et al.
Published: (2026)

Self-attention Networks Localize When QK-eigenspectrum Concentrates
by: Bao, Han, et al.
Published: (2024)

Optimal Layer Selection for Latent Data Augmentation
by: Takase, Tomoumi, et al.
Published: (2024)

Local Loss Optimization in the Infinite Width: Stable Parameterization of Predictive Coding Networks and Target Propagation
by: Ishikawa, Satoki, et al.
Published: (2024)

Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs
by: Sakai, Mana, et al.
Published: (2025)

Gaussian Equivalence for Self-Attention: Asymptotic Spectral Analysis of Attention Matrix
by: Hayase, Tomohiro, et al.
Published: (2025)

A Unified Framework for Critical Scaling of Inverse Temperature in Self-Attention
by: Hayase, Tomohiro, et al.
Published: (2026)

Hierarchical Associative Memory, Parallelized MLP-Mixer, and Symmetry Breaking
by: Karakida, Ryo, et al.
Published: (2024)

Induced Covariance for Causal Discovery in Linear Sparse Structures
by: Mohseni-Sehdeh, Saeed, et al.
Published: (2024)

Personalized Binomial DAGs Learning with Network Structured Covariates
by: Zhao, Boxin, et al.
Published: (2024)

Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs
by: Fujii, Kazuki, et al.
Published: (2024)

The Structure of Cross-Validation Error: Stability, Covariance, and Minimax Limits
by: Nachum, Ido, et al.
Published: (2025)

On the Optimal Reasoning Length for RL-Trained Language Models
by: Nohara, Daisuke, et al.
Published: (2026)

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
by: Nakamura, Taishi, et al.
Published: (2025)

Expectation Error Bounds for Transfer Learning in Linear Regression and Linear Neural Networks
by: Liu, Meitong, et al.
Published: (2026)

In-context Learning for Mixture of Linear Regressions: Existence, Generalization and Training Dynamics
by: Jin, Yanhao, et al.
Published: (2024)

The Price of Linear Time: Error Analysis of Structured Kernel Interpolation
by: Moreno, Alexander, et al.
Published: (2025)

Learning Joint and Individual Structure in Network Data with Covariates
by: James, Carson, et al.
Published: (2024)

Understanding Generalization in Transformers: Error Bounds and Training Dynamics Under Benign and Harmful Overfitting
by: Zhang, Yingying, et al.
Published: (2025)

Mitigating the Impact of Labeling Errors on Training via Rockafellian Relaxation
by: Chen, Louis L., et al.
Published: (2024)

AutoLL: Automatic Linear Layout of Graphs based on Deep Neural Network
by: Watanabe, Chihiro, et al.
Published: (2021)

Double Debiased Covariate Shift Adaptation Robust to Density-Ratio Estimation
by: Kato, Masahiro, et al.
Published: (2023)

Stabilizing Private LASSO under Heterogeneous Covariates via Anisotropic Objective Perturbation
by: Tanzawa, Haruka, et al.
Published: (2026)

Training Dynamics of In-Context Learning in Linear Attention
by: Zhang, Yedi, et al.
Published: (2025)

Z-Error Loss for Training Neural Networks
by: Godin, Guillaume
Published: (2025)

Minimax-Optimal Spectral Clustering with Covariance Projection for High-Dimensional Anisotropic Mixtures
by: Huang, Chengzhu, et al.
Published: (2025)

Active Adaptive Experimental Design for Treatment Effect Estimation with Covariate Choices
by: Kato, Masahiro, et al.
Published: (2024)

Training Dynamics Impact Post-Training Quantization Robustness
by: Catalan-Tatjer, Albert, et al.
Published: (2025)

Unveiling the Training Dynamics of ReLU Networks through a Linear Lens
by: Ye, Longqing
Published: (2025)

Handling Covariate Mismatch in Federated Linear Prediction
by: Ayme, Alexis, et al.
Published: (2026)

Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
by: Hu, Rui, et al.
Published: (2024)

Covariance Density Neural Networks
by: Roy, Om, et al.
Published: (2025)

Spatiotemporal Covariance Neural Networks
by: Cavallo, Andrea, et al.
Published: (2024)

Sparse Covariance Neural Networks
by: Cavallo, Andrea, et al.
Published: (2024)