:: Library Catalog

Bibliographic Details
Main Author:	Lingle, Lucas
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2404.05728

Similar Items

Transformer-VQ: Linear-Time Transformers via Vector Quantization
by: Lingle, Lucas D.
Published: (2023)

A Proof of Learning Rate Transfer under $μ$P
by: Hayou, Soufiane
Published: (2025)

Arithmetic-Mean $μ$P for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
by: Zhang, Haosong, et al.
Published: (2025)

An Empirical Study of Scaling Laws for Transfer
by: Barnett, Matthew
Published: (2024)

An Empirical Study on Ensemble-Based Transfer Learning Bayesian Optimisation with Mixed Variable Types
by: Trinkle, Natasha, et al.
Published: (2026)

Weight Decay may matter more than muP for Learning Rate Transfer in Practice
by: Kosson, Atli, et al.
Published: (2025)

Learning Rate Transfer in Normalized Transformers
by: Shigida, Boris, et al.
Published: (2026)

Universal Rates of Empirical Risk Minimization
by: Hanneke, Steve, et al.
Published: (2024)

Multi-Task Learning for Metal Alloy Property Prediction: An Empirical Study of Negative Transfer and Mitigation Strategies
by: Kang, Sungwoo
Published: (2025)

Black-box Adversarial Transferability: An Empirical Study in Cybersecurity Perspective
by: Roshan, Khushnaseeb, et al.
Published: (2024)

u-$μ$P: The Unit-Scaled Maximal Update Parametrization
by: Blake, Charlie, et al.
Published: (2024)

Spectral Condition for $μ$P under Width-Depth Scaling
by: Zheng, Chenyu, et al.
Published: (2026)

Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by: Noci, Lorenzo, et al.
Published: (2024)

Optimizers Performance is Task-Dependent: An Empirical Study of Learning Rate Sensitivity in Classification and Regression Tasks
by: Chibuike, Chisom Ruth, et al.
Published: (2026)

Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
by: Bai, Yuxuan, et al.
Published: (2025)

The lazy (NTK) and rich ($μ$P) regimes: a gentle tutorial
by: Karkada, Dhruva
Published: (2024)

Extending $μ$P: Spectral Conditions for Feature Learning Across Optimizers
by: Gupta, Akshita, et al.
Published: (2026)

How Reasoning Evolves from Post-Training Data: An Empirical Study Using Chess
by: Dionisopoulos, Lucas, et al.
Published: (2026)

Sensitivity of Stability: Theoretical & Empirical Analysis of Replicability for Adaptive Data Selection in Transfer Learning
by: Singh, Prabhav, et al.
Published: (2025)

Understanding the Generalization of In-Context Learning in Transformers: An Empirical Study
by: Zhang, Xingxuan, et al.
Published: (2025)

An Empirical Study of Self-supervised Learning with Wasserstein Distance
by: Yamada, Makoto, et al.
Published: (2023)

μP$^2$: Effective Sharpness Aware Minimization Requires Layerwise Perturbation Scaling
by: Haas, Moritz, et al.
Published: (2024)

GQA-μP: The maximal parameterization update for grouped query attention
by: Chickering, Kyle R., et al.
Published: (2026)

An Empirical Study of Aegis
by: Saragih, Daniel, et al.
Published: (2024)

$μ$LO: Compute-Efficient Meta-Generalization of Learned Optimizers
by: Thérien, Benjamin, et al.
Published: (2024)

An Empirical Study of Federated Prompt Learning for Vision Language Model
by: Wang, Zhihao, et al.
Published: (2025)

Matched-Learning-Rate Analysis of Attention Drift and Transfer Retention in Fine-Tuned CLIP
by: Xia, Ruize
Published: (2026)

Improving Knowledge Distillation in Transfer Learning with Layer-wise Learning Rates
by: Kokane, Shirley, et al.
Published: (2024)

Scaling Diffusion Transformers Efficiently via $μ$P
by: Zheng, Chenyu, et al.
Published: (2025)

$μ$-Parametrization for Mixture of Experts
by: Małaśnicki, Jan, et al.
Published: (2025)

Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
by: Filatov, Oleg, et al.
Published: (2024)

When Active Learning Falls Short: An Empirical Study on Chemical Reaction Extraction
by: Yu, Simin, et al.
Published: (2026)

An Empirical Study of Qwen3 Quantization
by: Zheng, Xingyu, et al.
Published: (2025)

Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study
by: Shah, Mehil B., et al.
Published: (2024)

Enhancing Two-Player Performance Through Single-Player Knowledge Transfer: An Empirical Study on Atari 2600 Games
by: Saadat, Kimiya, et al.
Published: (2024)

Quantifying Hyperparameter Transfer and the Importance of Embedding Layer Learning Rate
by: Kalra, Dayal Singh, et al.
Published: (2026)

Lag Selection for Univariate Time Series Forecasting using Deep Learning: An Empirical Study
by: Leites, José, et al.
Published: (2024)

An Empirical Study of Fault Localisation Techniques for Deep Learning
by: Humbatova, Nargiz, et al.
Published: (2024)

Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation
by: Lu, Huimin, et al.
Published: (2024)

Global Convergence and Rich Feature Learning in $L$-Layer Infinite-Width Neural Networks under $μ$P Parametrization
by: Chen, Zixiang, et al.
Published: (2025)