:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Wei, Linye, Luo, Zixiang, Tang, Pingzhi, Li, Meng
Format:	Preprint
Published:	2026
Subjects:	Computation and Language
Online Access:	https://arxiv.org/abs/2602.08404
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
by: Wei, Linye, et al.
Published: (2025)

What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models
by: Hu, Guimin, et al.
Published: (2026)

LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)

Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
by: Li, Pingzhi, et al.
Published: (2025)

Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)

MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)

Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
by: Zhang, Mohan, et al.
Published: (2025)

Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
by: Tang, Yehui, et al.
Published: (2025)

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)

MH-MoE: Multi-Head Mixture-of-Experts
by: Huang, Shaohan, et al.
Published: (2024)

CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
by: Liu, Yang, et al.
Published: (2026)

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)

OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
by: Shi, Jingze, et al.
Published: (2026)

Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
by: Jiang, Fan, et al.
Published: (2026)

Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
by: Kang, Junmo, et al.
Published: (2024)

MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing
by: Zhou, Hao, et al.
Published: (2024)

EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
by: Fu, Zhongqian, et al.
Published: (2025)

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)

$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
by: Takashiro, Shota, et al.
Published: (2026)

Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
by: Wei, Tianwen, et al.
Published: (2024)

Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
by: Luo, Shuqing, et al.
Published: (2025)

Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
by: Jiang, Songtao, et al.
Published: (2024)

MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts
by: Christoforos, Alexandros, et al.
Published: (2025)

Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
by: Li, Yunxin, et al.
Published: (2024)

Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
by: Shi, Chufan, et al.
Published: (2024)

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)

MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)

Advancing Expert Specialization for Better MoE
by: Guo, Hongcan, et al.
Published: (2025)

Expert Selections In MoE Models Reveal (Almost) As Much As Text
by: Nuriyev, Amir, et al.
Published: (2026)

GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors
by: Zhang, Hengyuan, et al.
Published: (2025)

DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
by: Feng, Yuchen, et al.
Published: (2025)

QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
by: Li, Pingzhi, et al.
Published: (2024)

MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
by: Li, Shuhuai, et al.
Published: (2026)

$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
by: Chen, Guanjie, et al.
Published: (2024)

Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
by: Li, Bo, et al.
Published: (2026)

SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts
by: Muzio, Alexandre, et al.
Published: (2024)

Do Domain-specific Experts exist in MoE-based LLMs?
by: Do, Giang, et al.
Published: (2026)