:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Yize; Thrampoulidis, Christos
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2601.12011

Similar Items

Supervised Contrastive Representation Learning: Landscape Analysis with Unconstrained Features
by: Behnia, Tina, et al.
Published: (2024)

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)

How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
by: Vasudeva, Bhavya, et al.
Published: (2025)

Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
by: Stromberg, Nathan, et al.
Published: (2025)

Implicit Optimization Bias of Next-Token Prediction in Linear Models
by: Thrampoulidis, Christos
Published: (2024)

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
by: Deng, Wenlong, et al.
Published: (2024)

Geometry of Semantics in Next-Token Prediction: How Optimization Implicitly Organizes Linguistic Representations
by: Zhao, Yize, et al.
Published: (2025)

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
by: Garrod, Connall, et al.
Published: (2025)

Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)

Memory capacity of two layer neural networks with smooth activations
by: Madden, Liam, et al.
Published: (2023)

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
by: Thrampoulidis, Christos, et al.
Published: (2025)

Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
by: Behnia, Tina, et al.
Published: (2025)

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
by: Taheri, Hossein, et al.
Published: (2024)

Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
by: Fan, Chen, et al.
Published: (2025)

Implicit Bias and Fast Convergence Rates for Self-attention
by: Vasudeva, Bhavya, et al.
Published: (2024)

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
by: Deng, Wenlong, et al.
Published: (2023)

You Only Train Once
by: Sakaridis, Christos
Published: (2025)

Next-token prediction capacity: general upper bounds and a lower bound for transformers
by: Madden, Liam, et al.
Published: (2024)

On the Optimization and Generalization of Multi-head Attention
by: Deora, Puneesh, et al.
Published: (2023)

Leveraging Online Olympiad-Level Math Problems for LLMs Training and Contamination-Resistant Evaluation
by: Mahdavi, Sadegh, et al.
Published: (2025)

In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
by: Deora, Puneesh, et al.
Published: (2025)

Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model
by: Dang, Hien, et al.
Published: (2024)

Infinite Width Models That Work: Why Feature Learning Doesn't Matter as Much as You Think
by: Sernau, Luke
Published: (2024)

Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)

Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)

Geometric Analysis of Unconstrained Feature Models with $d=K$
by: Shen, Yi, et al.
Published: (2024)

Instance-dependent Early Stopping
by: Yuan, Suqin, et al.
Published: (2025)

LLM-Assisted Content Conditional Debiasing for Fair Text Embedding
by: Deng, Wenlong, et al.
Published: (2024)

Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
by: Wu, Diyuan, et al.
Published: (2025)

ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation
by: Gandhi, Swapnil, et al.
Published: (2024)

Early Stopping Tabular In-Context Learning
by: Küken, Jaris, et al.
Published: (2025)

Noisy Early Stopping for Noisy Labels
by: Toner, William, et al.
Published: (2024)

FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness
by: Amer, Hossam, et al.
Published: (2026)

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
by: Deng, Wenlong, et al.
Published: (2025)

Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model
by: Andriopoulos, George, et al.
Published: (2025)

Parameter-Free Dynamic Regret for Unconstrained Linear Bandits
by: Rumi, Alberto, et al.
Published: (2026)

Early Stopping Based on Repeated Significance
by: Bax, Eric, et al.
Published: (2024)

Early Stopping for Large Reasoning Models via Confidence Dynamics
by: Hosseini, Parsa, et al.
Published: (2026)

Gradient-Variation Regret Bounds for Unconstrained Online Learning
by: Zhao, Yuheng, et al.
Published: (2026)