:: Library Catalog

Bibliographic Details
Main Authors:	Behnia, Tina, Thrampoulidis, Christos
Format:	Preprint
Published:	2024
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2402.18884

Similar Items

Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
by: Behnia, Tina, et al.
Published: (2025)

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)

Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
by: Zhao, Yize, et al.
Published: (2026)

In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
by: Deora, Puneesh, et al.
Published: (2025)

Implicit Optimization Bias of Next-Token Prediction in Linear Models
by: Thrampoulidis, Christos
Published: (2024)

Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
by: Taheri, Hossein, et al.
Published: (2024)

Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
by: Deng, Wenlong, et al.
Published: (2023)

Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)

Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
by: Stromberg, Nathan, et al.
Published: (2025)

Memory capacity of two layer neural networks with smooth activations
by: Madden, Liam, et al.
Published: (2023)

Implicit Bias and Fast Convergence Rates for Self-attention
by: Vasudeva, Bhavya, et al.
Published: (2024)

Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
by: Thrampoulidis, Christos, et al.
Published: (2025)

Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
by: Fan, Chen, et al.
Published: (2025)

Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
by: Garrod, Connall, et al.
Published: (2025)

Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
by: Wu, Diyuan, et al.
Published: (2025)

Next-token prediction capacity: general upper bounds and a lower bound for transformers
by: Madden, Liam, et al.
Published: (2024)

Geometric Analysis of Unconstrained Feature Models with $d=K$
by: Shen, Yi, et al.
Published: (2024)

On the Optimization and Generalization of Multi-head Attention
by: Deora, Puneesh, et al.
Published: (2023)

How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
by: Vasudeva, Bhavya, et al.
Published: (2025)

Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings
by: Hieu, Nong Minh, et al.
Published: (2025)

Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
by: Chapman, James, et al.
Published: (2023)

On the Properties of Feature Attribution for Supervised Contrastive Learning
by: Arrighi, Leonardo, et al.
Published: (2026)

Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)

Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)

A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning
by: Hieu, Nong Minh, et al.
Published: (2026)

Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)

Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
by: Luthra, Achleshwar, et al.
Published: (2025)

Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)

Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
by: Ma, Chuang, et al.
Published: (2025)

Time Series Representation Learning with Supervised Contrastive Temporal Transformer
by: Liu, Yuansan, et al.
Published: (2024)

Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning
by: Celemin, Carlos, et al.
Published: (2025)

Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
by: Sgouropoulos, Christos, et al.
Published: (2025)

Supervised Contrastive Frame Aggregation for Video Representation Learning
by: Chowdhury, Shaif, et al.
Published: (2025)

Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
by: Deng, Wenlong, et al.
Published: (2025)

Generalization Analysis for Deep Contrastive Representation Learning
by: Hieu, Nong Minh, et al.
Published: (2024)

Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning
by: Xie, Shifeng, et al.
Published: (2025)

Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model
by: Dang, Hien, et al.
Published: (2024)

Fully Unconstrained Online Learning
by: Cutkosky, Ashok, et al.
Published: (2024)

On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
by: Deng, Wenlong, et al.
Published: (2025)

Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model
by: Andriopoulos, George, et al.
Published: (2025)