| Main Authors: | Duzgun, Ahmet Cagri; Jelassi, Samy; Li, Yuanzhi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.00968 |
Similar Items
How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?
by: Alon, Gal, et al.
Published: (2025)
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
by: Prabhakar, Akshara, et al.
Published: (2024)
Matching Features, Not Tokens: Energy-Based Fine-Tuning of Language Models
by: Jelassi, Samy, et al.
Published: (2026)
Collective Model Intelligence Requires Compatible Specialization
by: Pari, Jyothish, et al.
Published: (2024)
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
by: Qin, Tian, et al.
Published: (2025)
How Does Quantization Affect Multilingual LLMs?
by: Marchisio, Kelly, et al.
Published: (2024)
Universal Length Generalization with Turing Programs
by: Hou, Kaiying, et al.
Published: (2024)
Repeat After Me: Transformers are Better than State Space Models at Copying
by: Jelassi, Samy, et al.
Published: (2024)
Q-Probe: A Lightweight Approach to Reward Maximization for Language Models
by: Li, Kenneth, et al.
Published: (2024)
Mixture of Parrots: Experts improve memorization more than reasoning
by: Jelassi, Samy, et al.
Published: (2024)
The Recurrent Transformer: Greater Effective Depth and Efficient Decoding
by: Oncescu, Costin-Andrei, et al.
Published: (2026)
Let Me Think! A Long Chain-of-Thought Can Be Worth Exponentially Many Short Ones
by: Mirtaheri, Parsa, et al.
Published: (2025)
The Role of Sparsity for Length Generalization in Transformers
by: Golowich, Noah, et al.
Published: (2025)
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining
by: Zhao, Rosie, et al.
Published: (2025)
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
by: Li, Binghui, et al.
Published: (2024)
Theoretical Limitations of Ensembles in the Age of Overparameterization
by: Dern, Niclas, et al.
Published: (2024)
The Interpolating Information Criterion for Overparameterized Models
by: Hodgkinson, Liam, et al.
Published: (2023)
Privacy for Free in the Overparameterized Regime
by: Bombari, Simone, et al.
Published: (2024)
Machine Unlearning under Overparameterization
by: Block, Jacob L., et al.
Published: (2025)
How Does Code Pretraining Affect Language Model Task Performance?
by: Petty, Jackson, et al.
Published: (2024)
How Much Training Data is Memorized in Overparameterized Autoencoders? An Inverse Problem Perspective on Memorization Evaluation
by: Abitbul, Koren, et al.
Published: (2023)
How Does Response Length Affect Long-Form Factuality
by: Zhao, James Xu, et al.
Published: (2025)
The Role of Symmetry in Optimizing Overparameterized Networks
by: Sareen, Kusha, et al.
Published: (2026)
Provable Generalization in Overparameterized Neural Nets
by: Dhingra, Aviral
Published: (2025)
A Note on Generalization in Variational Autoencoders: How Effective Is Synthetic Data & Overparameterization?
by: Xiao, Tim Z., et al.
Published: (2023)
On the Clean Generalization and Robust Overfitting in Adversarial Training from Two Theoretical Views: Representation Complexity and Training Dynamics
by: Li, Binghui, et al.
Published: (2023)
On the Benefits of Weight Normalization for Overparameterized Matrix Sensing
by: Wei, Yudong, et al.
Published: (2025)
Overparameterized Multiple Linear Regression as Hyper-Curve Fitting
by: Atza, E., et al.
Published: (2024)
Dual Space Preconditioning for Gradient Descent in the Overparameterized Regime
by: Ghane, Reza, et al.
Published: (2026)
An Analytical Model for Overparameterized Learning Under Class Imbalance
by: Mor, Eliav, et al.
Published: (2025)
Feature Impact Analysis on Top Long-Jump Performances with Quantile Random Forest and Explainable AI Techniques
by: Gan, Qi, et al.
Published: (2025)
Critical Influence of Overparameterization on Sharpness-aware Minimization
by: Shin, Sungbin, et al.
Published: (2023)
Bayesian Inference for Consistent Predictions in Overparameterized Nonlinear Regression
by: Wakayama, Tomoya
Published: (2024)
How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
by: Yoshida, Kotaro, et al.
Published: (2025)
Local Linear Recovery Guarantee of Deep Neural Networks at Overparameterization
by: Zhang, Yaoyu, et al.
Published: (2024)
Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression
by: Wu, Jingfeng, et al.
Published: (2025)
Task Shift: From Classification to Regression in Overparameterized Linear Models
by: LaBonte, Tyler, et al.
Published: (2025)
Estimation of Toeplitz Covariance Matrices using Overparameterized Gradient Descent
by: Busbib, Daniel, et al.
Published: (2025)
Precise Asymptotic Generalization for Multiclass Classification with Overparameterized Linear Models
by: Wu, David X., et al.
Published: (2023)
Implicit Regularization and Generalization in Overparameterized Neural Networks
by: Johannsen, Zeran
Published: (2026)