| Main Authors: | Behnia, Tina; Thrampoulidis, Christos |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.18884 |
Similar Items
Facts in Stats: Impacts of Pretraining Diversity on Language Model Generalization
by: Behnia, Tina, et al.
Published: (2025)
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
by: Zhao, Yize, et al.
Published: (2024)
Why Loss Re-weighting Works If You Stop Early: Training Dynamics of Unconstrained Features
by: Zhao, Yize, et al.
Published: (2026)
In-Context Occam's Razor: How Transformers Prefer Simpler Hypotheses on the Fly
by: Deora, Puneesh, et al.
Published: (2025)
Implicit Optimization Bias of Next-Token Prediction in Linear Models
by: Thrampoulidis, Christos
Published: (2024)
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
by: Taheri, Hossein, et al.
Published: (2024)
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
by: Deng, Wenlong, et al.
Published: (2023)
Memorization Capacity of Multi-Head Attention in Transformers
by: Mahdavi, Sadegh, et al.
Published: (2023)
Thumb on the Scale: Optimal Loss Weighting in Last Layer Retraining
by: Stromberg, Nathan, et al.
Published: (2025)
Memory capacity of two layer neural networks with smooth activations
by: Madden, Liam, et al.
Published: (2023)
Implicit Bias and Fast Convergence Rates for Self-attention
by: Vasudeva, Bhavya, et al.
Published: (2024)
Advantage Shaping as Surrogate Reward Maximization: Unifying Pass@K Policy Gradients
by: Thrampoulidis, Christos, et al.
Published: (2025)
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
by: Fan, Chen, et al.
Published: (2025)
Diagonalizing the Softmax: Hadamard Initialization for Tractable Cross-Entropy Dynamics
by: Garrod, Connall, et al.
Published: (2025)
Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
by: Wu, Diyuan, et al.
Published: (2025)
Next-token prediction capacity: general upper bounds and a lower bound for transformers
by: Madden, Liam, et al.
Published: (2024)
Geometric Analysis of Unconstrained Feature Models with $d=K$
by: Shen, Yi, et al.
Published: (2024)
On the Optimization and Generalization of Multi-head Attention
by: Deora, Puneesh, et al.
Published: (2023)
How Muon's Spectral Design Benefits Generalization: A Study on Imbalanced Data
by: Vasudeva, Bhavya, et al.
Published: (2025)
Generalization Analysis for Supervised Contrastive Representation Learning under Non-IID Settings
by: Hieu, Nong Minh, et al.
Published: (2025)
Unconstrained Stochastic CCA: Unifying Multiview and Self-Supervised Learning
by: Chapman, James, et al.
Published: (2023)
On the Properties of Feature Attribution for Supervised Contrastive Learning
by: Arrighi, Leonardo, et al.
Published: (2026)
Class-attribute Priors: Adapting Optimization to Heterogeneity and Fairness Objective
by: Zhang, Xuechen, et al.
Published: (2024)
Understanding Contextual Recall in Transformers: How Finetuning Enables In-Context Reasoning over Pretraining Knowledge
by: Vasudeva, Bhavya, et al.
Published: (2026)
A Refined Generalization Analysis for Extreme Multi-class Supervised Contrastive Representation Learning
by: Hieu, Nong Minh, et al.
Published: (2026)
Transformers as Support Vector Machines
by: Tarzanagh, Davoud Ataee, et al.
Published: (2023)
Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
by: Luthra, Achleshwar, et al.
Published: (2025)
Directional Alignment Mitigates Reward Hacking in Reinforcement Learning for Language Models
by: Deng, Wenlong, et al.
Published: (2026)
Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
by: Ma, Chuang, et al.
Published: (2025)
Time Series Representation Learning with Supervised Contrastive Temporal Transformer
by: Liu, Yuansan, et al.
Published: (2024)
Learning Representations in Video Game Agents with Supervised Contrastive Imitation Learning
by: Celemin, Carlos, et al.
Published: (2025)
Prototypical Contrastive Learning For Improved Few-Shot Audio Classification
by: Sgouropoulos, Christos, et al.
Published: (2025)
Supervised Contrastive Frame Aggregation for Video Representation Learning
by: Chowdhury, Shaif, et al.
Published: (2025)
Token Hidden Reward: Steering Exploration-Exploitation in Group Relative Deep Reinforcement Learning
by: Deng, Wenlong, et al.
Published: (2025)
Generalization Analysis for Deep Contrastive Representation Learning
by: Hieu, Nong Minh, et al.
Published: (2024)
Subgraph Gaussian Embedding Contrast for Self-Supervised Graph Representation Learning
by: Xie, Shifeng, et al.
Published: (2025)
Neural Collapse for Cross-entropy Class-Imbalanced Learning with Unconstrained ReLU Feature Model
by: Dang, Hien, et al.
Published: (2024)
Fully Unconstrained Online Learning
by: Cutkosky, Ashok, et al.
Published: (2024)
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
by: Deng, Wenlong, et al.
Published: (2025)
Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model
by: Andriopoulos, George, et al.
Published: (2025)