| Main Authors: | Li, Jingwei, Gu, Xinran, Zhang, Jingzhao |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.08022 |
Similar Items
Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
by: Gu, Xinran, et al.
Published: (2025)
A Quadratic Synchronization Rule for Distributed Deep Learning
by: Gu, Xinran, et al.
Published: (2023)
Research and Implementation of Data Enhancement Techniques for Graph Neural Networks
by: Gu, Jingzhao, et al.
Published: (2024)
Understanding Nonlinear Implicit Bias via Region Counts in Input Space
by: Li, Jingwei, et al.
Published: (2025)
Towards Black-Box Membership Inference Attack for Diffusion Models
by: Li, Jingwei, et al.
Published: (2024)
On the Condition Number Dependency in Bilevel Optimization
by: Chen, Lesi, et al.
Published: (2025)
Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning
by: Liu, Siyuan, et al.
Published: (2026)
Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
by: Xu, Jing, et al.
Published: (2024)
Efficient Sampling on Riemannian Manifolds via Langevin MCMC
by: Cheng, Xiang, et al.
Published: (2024)
Scalable Model Merging with Progressive Layer-wise Distillation
by: Xu, Jing, et al.
Published: (2025)
On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
by: Chen, Lesi, et al.
Published: (2023)
Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization
by: Chen, Lesi, et al.
Published: (2025)
Fast and Multiphase Rates for Nearest Neighbor Classifiers
by: Yang, Pengkun, et al.
Published: (2023)
Second-Order Min-Max Optimization with Lazy Hessians
by: Chen, Lesi, et al.
Published: (2024)
Scaling Laws for Optimal Data Mixtures
by: Shukor, Mustafa, et al.
Published: (2025)
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
by: Du, Zhixu, et al.
Published: (2023)
Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization
by: Wan, Weilin, et al.
Published: (2026)
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
by: He, Shwai, et al.
Published: (2025)
Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning
by: Mclaughlin, Connor, et al.
Published: (2026)
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
by: Ye, Jiasheng, et al.
Published: (2024)
Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
by: Jha, Nandan Kumar, et al.
Published: (2026)
MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis
by: Sun, Wei, et al.
Published: (2026)
Scaling Laws for Mixture Pretraining Under Data Constraints
by: Sedova, Anastasiia, et al.
Published: (2026)
Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems
by: Zhang, Huaqing, et al.
Published: (2024)
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
by: Wen, Kaiyue, et al.
Published: (2024)
Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions
by: Cheng, Xiang, et al.
Published: (2023)
MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
by: Gu, Yupu, et al.
Published: (2026)
Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
by: Belenki, Lior, et al.
Published: (2025)
Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(ε^{-4/7})$ Second-Order Oracle Complexity
by: Chen, Lesi, et al.
Published: (2025)
Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
by: Wu, Xiaojun, et al.
Published: (2025)
U-CAN: Utility-Aware Contrastive Attenuation for Efficient Unlearning in Generative Recommendation
by: Wu, Zezheng, et al.
Published: (2026)
Towards the Law of Capacity Gap in Distilling Language Models
by: Zhang, Chen, et al.
Published: (2023)
FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
by: Ding, Yuyang, et al.
Published: (2025)
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
by: Li, Bo, et al.
Published: (2026)
Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs
by: Benazir, Afsara, et al.
Published: (2026)
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
by: Wen, Bingbing, et al.
Published: (2026)
CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
by: Gu, Jiawei, et al.
Published: (2024)
Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training
by: Liu, Lei, et al.
Published: (2025)
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
by: Zhang, Zhenyu, et al.
Published: (2025)
D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
by: Zhang, Jia, et al.
Published: (2025)