| Main Authors: | Nguyen, Dang; Li, Zeman; Bateni, Mohammadhossein; Mirrokni, Vahab; Razaviyayn, Meisam; Mirzasoleiman, Baharan |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.17607 |
Similar Items
Data Selection for Fine-tuning Vision Language Models via Cross Modal Alignment Trajectories
by: Naharas, Nilay, et al.
Published: (2025)
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
by: Li, Zeman, et al.
Published: (2024)
PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
by: Li, Zeman, et al.
Published: (2025)
Theoretical Perspectives on Data Quality and Synergistic Effects in Pre- and Post-Training Reasoning Models
by: Javanmard, Adel, et al.
Published: (2026)
Understanding the Role of Training Data in Test-Time Scaling
by: Javanmard, Adel, et al.
Published: (2025)
Sampling and Loss Weights in Multi-Domain Training
by: Salmani, Mahdi, et al.
Published: (2025)
Memory Caching: RNNs with Growing Memory
by: Behrouz, Ali, et al.
Published: (2026)
Beyond Semantic Entropy: Boosting LLM Uncertainty Quantification with Pairwise Semantic Similarity
by: Nguyen, Dang, et al.
Published: (2025)
Mini-batch Coresets for Memory-efficient Language Model Training on Data Mixtures
by: Nguyen, Dang, et al.
Published: (2024)
Less is More: Convergence Benefits of Fewer Data Weight Updates over Longer Horizon
by: Das, Rudrajit, et al.
Published: (2026)
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
by: Behrouz, Ali, et al.
Published: (2025)
Nested Learning: The Illusion of Deep Learning Architectures
by: Behrouz, Ali, et al.
Published: (2025)
TNT: Improving Chunkwise Training for Test-Time Memorization
by: Li, Zeman, et al.
Published: (2025)
Do We Need All the Synthetic Data? Targeted Image Augmentation via Diffusion Models
by: Nguyen, Dang, et al.
Published: (2025)
SmallToLarge (S2L): Scalable Data Selection for Fine-tuning Large Language Models by Summarizing Training Trajectories of Small Models
by: Yang, Yu, et al.
Published: (2024)
Differentially Private Next-Token Prediction of Large Language Models
by: Flemings, James, et al.
Published: (2024)
Optimal Differentially Private Model Training with Public Data
by: Lowy, Andrew, et al.
Published: (2023)
Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity
by: Huang, Jianhao, et al.
Published: (2026)
Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
by: Nguyen, Dang, et al.
Published: (2024)
DiSK: Differentially Private Optimizer with Simplified Kalman Filter for Noise Reduction
by: Zhang, Xinwei, et al.
Published: (2024)
ATLAS: Learning to Optimally Memorize the Context at Test Time
by: Behrouz, Ali, et al.
Published: (2025)
Early Stopping for Large Reasoning Models via Confidence Dynamics
by: Hosseini, Parsa, et al.
Published: (2026)
Understanding the Robustness of Multi-modal Contrastive Learning to Distribution Shift
by: Xue, Yihao, et al.
Published: (2023)
Graph Contrastive Learning under Heterophily via Graph Filters
by: Yang, Wenhan, et al.
Published: (2023)
Dataset Distillation via Knowledge Distillation: Towards Efficient Self-Supervised Pre-Training of Deep Networks
by: Joshi, Siddharth, et al.
Published: (2024)
ECO: Quantized Training without Full-Precision Master Weights
by: Nikdan, Mahdi, et al.
Published: (2026)
Trellis: Learning to Compress Key-Value Memory in Attention Models
by: Karami, Mahdi, et al.
Published: (2025)
Adaptively Private Next-Token Prediction of Large Language Models
by: Flemings, James, et al.
Published: (2024)
Investigating the Impact of Model Width and Density on Generalization in Presence of Label Noise
by: Xue, Yihao, et al.
Published: (2022)
Titans: Learning to Memorize at Test Time
by: Behrouz, Ali, et al.
Published: (2024)
Efficient Data Selection at Scale via Influence Distillation
by: Nikdan, Mahdi, et al.
Published: (2025)
Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings
by: Xue, Yihao, et al.
Published: (2023)
Data-Efficient Contrastive Self-supervised Learning: Most Beneficial Examples for Supervised Learning Contribute the Least
by: Joshi, Siddharth, et al.
Published: (2023)
Representations Shape Weak-to-Strong Generalization: Theoretical Insights and Empirical Predictions
by: Xue, Yihao, et al.
Published: (2025)
Output Perturbation for Differentially Private Convex Optimization: Faster and More General
by: Lowy, Andrew, et al.
Published: (2021)
Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter
by: Lowy, Andrew, et al.
Published: (2022)
On the Inherent Privacy of Zeroth Order Projected Gradient Descent
by: Gupta, Devansh, et al.
Published: (2025)
SequentialAttention++ for Block Sparsification: Differentiable Pruning Meets Combinatorial Optimization
by: Yasuda, Taisuke, et al.
Published: (2024)
Neural Network-Based Score Estimation in Diffusion Models: Optimization and Generalization
by: Han, Yinbin, et al.
Published: (2024)
Policy Gradient Converges to the Globally Optimal Policy for Nearly Linear-Quadratic Regulators
by: Han, Yinbin, et al.
Published: (2023)