Library Catalog

Bibliographic Details
Main Authors:	Li, Jingwei; Gu, Xinran; Zhang, Jingzhao
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.08022

Similar Items

Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
by: Gu, Xinran, et al.
Published: (2025)

A Quadratic Synchronization Rule for Distributed Deep Learning
by: Gu, Xinran, et al.
Published: (2023)

Research and Implementation of Data Enhancement Techniques for Graph Neural Networks
by: Gu, Jingzhao, et al.
Published: (2024)

Understanding Nonlinear Implicit Bias via Region Counts in Input Space
by: Li, Jingwei, et al.
Published: (2025)

Towards Black-Box Membership Inference Attack for Diffusion Models
by: Li, Jingwei, et al.
Published: (2024)

On the Condition Number Dependency in Bilevel Optimization
by: Chen, Lesi, et al.
Published: (2025)

Data Difficulty and the Generalization--Extrapolation Tradeoff in LLM Fine-Tuning
by: Liu, Siyuan, et al.
Published: (2026)

Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
by: Xu, Jing, et al.
Published: (2024)

Efficient Sampling on Riemannian Manifolds via Langevin MCMC
by: Cheng, Xiang, et al.
Published: (2024)

Scalable Model Merging with Progressive Layer-wise Distillation
by: Xu, Jing, et al.
Published: (2025)

On Finding Small Hyper-Gradients in Bilevel Optimization: Hardness Results and Improved Analysis
by: Chen, Lesi, et al.
Published: (2023)

Faster Gradient Methods for Highly-Smooth Stochastic Bilevel Optimization
by: Chen, Lesi, et al.
Published: (2025)

Fast and Multiphase Rates for Nearest Neighbor Classifiers
by: Yang, Pengkun, et al.
Published: (2023)

Second-Order Min-Max Optimization with Lazy Hessians
by: Chen, Lesi, et al.
Published: (2024)

Scaling Laws for Optimal Data Mixtures
by: Shukor, Mustafa, et al.
Published: (2025)

SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models
by: Du, Zhixu, et al.
Published: (2023)

Holistic Scaling Laws for Optimal Mixture-of-Experts Architecture Optimization
by: Wan, Weilin, et al.
Published: (2026)

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
by: He, Shwai, et al.
Published: (2025)

Similarity-Aware Mixture-of-Experts for Data-Efficient Continual Learning
by: Mclaughlin, Connor, et al.
Published: (2026)

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
by: Ye, Jiasheng, et al.
Published: (2024)

Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
by: Jha, Nandan Kumar, et al.
Published: (2026)

MetaKube: An Experience-Aware LLM Framework for Kubernetes Failure Diagnosis
by: Sun, Wei, et al.
Published: (2026)

Scaling Laws for Mixture Pretraining Under Data Constraints
by: Sedova, Anastasiia, et al.
Published: (2026)

Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems
by: Zhang, Huaqing, et al.
Published: (2024)

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
by: Wen, Kaiyue, et al.
Published: (2024)

Fast Conditional Mixing of MCMC Algorithms for Non-log-concave Distributions
by: Cheng, Xiang, et al.
Published: (2023)

MoEEdit: Efficient and Routing-Stable Knowledge Editing for Mixture-of-Experts LLMs
by: Gu, Yupu, et al.
Published: (2026)

Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
by: Belenki, Lior, et al.
Published: (2025)

Solving Convex-Concave Problems with $\tilde{\mathcal{O}}(\epsilon^{-4/7})$ Second-Order Oracle Complexity
by: Chen, Lesi, et al.
Published: (2025)

Beyond Scaling Law: A Data-Efficient Distillation Framework for Reasoning
by: Wu, Xiaojun, et al.
Published: (2025)

U-CAN: Utility-Aware Contrastive Attenuation for Efficient Unlearning in Generative Recommendation
by: Wu, Zezheng, et al.
Published: (2026)

Towards the Law of Capacity Gap in Distilling Language Models
by: Zhang, Chen, et al.
Published: (2023)

FAPO: Flawed-Aware Policy Optimization for Efficient and Reliable Reasoning
by: Ding, Yuyang, et al.
Published: (2025)

MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
by: Li, Bo, et al.
Published: (2026)

Efficient Mixture-of-Experts LLM Inference with Apple Silicon NPUs
by: Benazir, Afsara, et al.
Published: (2026)

MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
by: Wen, Bingbing, et al.
Published: (2026)

CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
by: Gu, Jiawei, et al.
Published: (2024)

Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training
by: Liu, Lei, et al.
Published: (2025)

R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
by: Zhang, Zhenyu, et al.
Published: (2025)

D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning
by: Zhang, Jia, et al.
Published: (2025)