:: Library Catalog

Bibliographic Details
Main Authors:	Wang, Siqi; Chen, Zhengyu; Li, Bei; He, Keqing; Zhang, Min; Wang, Jingang
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2410.05661

Similar Items

Scaling and Transferability of Annealing Strategies in Large Language Model Training
by: Wang, Siqi, et al.
Published: (2025)

Sub-Scaling Laws: On the Role of Data Density and Training Strategies in LLMs
by: Chen, Zhengyu, et al.
Published: (2025)

Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models
by: Wang, Bo, et al.
Published: (2026)

Collaborative Compression for Large-Scale MoE Deployment on Edge
by: Chen, Yixiao, et al.
Published: (2025)

KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)

LocMoE: A Low-Overhead MoE for Large Language Model Training
by: Li, Jing, et al.
Published: (2024)

Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
by: Zhang, Wuyue, et al.
Published: (2026)

EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
by: Chen, Yuanteng, et al.
Published: (2025)

SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training
by: Tang, Shengkun, et al.
Published: (2026)

Expert Divergence Learning for MoE-based Language Models
by: Li, Jiaang, et al.
Published: (2026)

Generalizing Scaling Laws for Dense and Sparse Large Language Models
by: Hossain, Md Arafat, et al.
Published: (2025)

MoEless: Efficient MoE LLM Serving via Serverless Computing
by: Yu, Hanfei, et al.
Published: (2026)

FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
by: Hu, Gang, et al.
Published: (2025)

MoE$^2$: Optimizing Collaborative Inference for Edge Large Language Models
by: Jin, Lyudong, et al.
Published: (2025)

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
by: Shi, Xiaoming, et al.
Published: (2024)

Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
by: Du, Xianzhi, et al.
Published: (2024)

Piper: Efficient Large-Scale MoE Training via Resource Modeling and Pipelined Hybrid Parallelism
by: Dash, Sajal, et al.
Published: (2026)

Hierarchical LoRA MoE for Efficient CTR Model Scaling
by: Zeng, Zhichen, et al.
Published: (2025)

Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
by: Pei, Zehua, et al.
Published: (2025)

Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
by: Ludziejewski, Jan, et al.
Published: (2025)

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)

Knowledge Editing on Black-box Large Language Models
by: Song, Xiaoshuai, et al.
Published: (2024)

Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding
by: Jeon, MinCheol
Published: (2025)

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)

LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)

Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
by: Li, Yunxin, et al.
Published: (2025)

Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
by: Sane, Soham
Published: (2025)

MEDNA-DFM: A Dual-View FiLM-MoE Model for Explainable DNA Methylation Prediction
by: He, Yi, et al.
Published: (2026)

Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts
by: Wang, Qi, et al.
Published: (2025)

MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren A., et al.
Published: (2025)

On Token's Dilemma: Dynamic MoE with Drift-Aware Token Assignment for Continual Learning of Large Vision Language Models
by: Zhao, Chongyang, et al.
Published: (2026)

MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs
by: Zhang, Jiyuan, et al.
Published: (2026)

Accelerating MoE Model Inference with Expert Sharding
by: Balmau, Oana, et al.
Published: (2025)

Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts
by: Yun, Sukwon, et al.
Published: (2024)

AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies
by: Zhang, Bo-Wen, et al.
Published: (2024)

DBES: A Systematic Benchmark and Metric Suite for Evaluating Expert Specialization in Large-Scale MoEs
by: Wang, Jing, et al.
Published: (2026)

GRIN: GRadient-INformed MoE
by: Liu, Liyuan, et al.
Published: (2024)

Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures
by: Delibasoglu, Ibrahim
Published: (2026)

Towards Causal Relationship in Indefinite Data: Baseline Model and New Datasets
by: Chen, Hang, et al.
Published: (2024)

Semantic Parallelism: Redefining Efficient MoE Inference via Model-Data Co-Scheduling
by: Li, Yan, et al.
Published: (2025)