:: Library Catalog

Bibliographic Details
Main Authors:	Nguyen, Dang; Li, Zeman; Bateni, Mohammadhossein; Mirrokni, Vahab; Razaviyayn, Meisam; Mirzasoleiman, Baharan
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Computation and Language
Online Access:	https://arxiv.org/abs/2502.17607

Similar Items

Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
by: Naharas, Nilay, et al.
Published: (2025)

Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
by: Li, Zeman, et al.
Published: (2024)

PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
by: Li, Zeman, et al.
Published: (2025)

Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models
by: Javanmard, Adel, et al.
Published: (2026)

Understanding the Role of Training Data in Test-Time Scaling
by: Javanmard, Adel, et al.
Published: (2025)

Sampling and Loss Weights in Multi-Domain Training
by: Salmani, Mahdi, et al.
Published: (2025)

Memory Caching: RNNs with Growing Memory
by: Behrouz, Ali, et al.
Published: (2026)

Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity
by: Nguyen, Dang, et al.
Published: (2025)

Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
by: Nguyen, Dang, et al.
Published: (2024)

Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon
by: Das, Rudrajit, et al.
Published: (2026)

It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
by: Behrouz, Ali, et al.
Published: (2025)

Nested Learning: The Illusion of Deep Learning Architectures
by: Behrouz, Ali, et al.
Published: (2025)

TNT: Improving Chunkwise Training for Test-Time Memorization
by: Li, Zeman, et al.
Published: (2025)

Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
by: Nguyen, Dang, et al.
Published: (2025)

SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
by: Yang, Yu, et al.
Published: (2024)

Differentially Private Next-Token Prediction of Large Language Models
by: Flemings, James, et al.
Published: (2024)

Optimal Differentially Private Model Training with Public Data
by: Lowy, Andrew, et al.
Published: (2023)

Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity
by: Huang, Jianhao, et al.
Published: (2026)

Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
by: Nguyen, Dang, et al.
Published: (2024)

DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction
by: Zhang, Xinwei, et al.
Published: (2024)

ATLAS: Learning to Optimally Memorize the Context at Test Time
by: Behrouz, Ali, et al.
Published: (2025)

Early Stopping for Large Reasoning Models via Confidence Dynamics
by: Hosseini, Parsa, et al.
Published: (2026)

Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
by: Xue, Yihao, et al.
Published: (2023)

Graph Contrastive Learning under Heterophily via Graph Filters
by: Yang, Wenhan, et al.
Published: (2023)

Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
by: Joshi, Siddharth, et al.
Published: (2024)

ECO: Quantized Training without Full-Precision Master Weights
by: Nikdan, Mahdi, et al.
Published: (2026)

Trellis: Learning to Compress Key-Value Memory in Attention Models
by: Karami, Mahdi, et al.
Published: (2025)

Adaptively Private Next-Token Prediction of Large Language Models
by: Flemings, James, et al.
Published: (2024)

Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise
by: Xue, Yihao, et al.
Published: (2022)

Titans: Learning to Memorize at Test Time
by: Behrouz, Ali, et al.
Published: (2024)

Efficient Data Selection at Scale via Influence Distillation
by: Nikdan, Mahdi, et al.
Published: (2025)

Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
by: Xue, Yihao, et al.
Published: (2023)

Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
by: Joshi, Siddharth, et al.
Published: (2023)

Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions
by: Xue, Yihao, et al.
Published: (2025)

Output Perturbation for Differentially Private Convex Optimization: Faster and More General
by: Lowy, Andrew, et al.
Published: (2021)

Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter
by: Lowy, Andrew, et al.
Published: (2022)

On the Inherent Privacy of Zeroth Order Projected Gradient Descent
by: Gupta, Devansh, et al.
Published: (2025)

SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
by: Yasuda, Taisuke, et al.
Published: (2024)

Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization
by: Han, Yinbin, et al.
Published: (2024)

Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
by: Han, Yinbin, et al.
Published: (2023)