Saved in:
| Main Authors: | Wei, Linye, Luo, Zixiang, Tang, Pingzhi, Li, Meng |
|---|---|
| Format: | Preprint |
| Published: |
2026
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.08404 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
by: Wei, Linye, et al.
Published: (2025)
by: Wei, Linye, et al.
Published: (2025)
What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models
by: Hu, Guimin, et al.
Published: (2026)
by: Hu, Guimin, et al.
Published: (2026)
LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)
by: Zhu, Fengqi, et al.
Published: (2025)
Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
by: Li, Pingzhi, et al.
Published: (2025)
by: Li, Pingzhi, et al.
Published: (2025)
Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)
by: Fayyaz, Mohsen, et al.
Published: (2025)
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)
by: Xia, Xinfeng, et al.
Published: (2025)
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)
by: Liu, Baihui, et al.
Published: (2026)
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)
by: Wu, Haoyuan, et al.
Published: (2025)
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
by: Zhang, Mohan, et al.
Published: (2025)
by: Zhang, Mohan, et al.
Published: (2025)
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
by: Tang, Yehui, et al.
Published: (2025)
by: Tang, Yehui, et al.
Published: (2025)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)
by: Qian, Yulei, et al.
Published: (2024)
MH-MoE: Multi-Head Mixture-of-Experts
by: Huang, Shaohan, et al.
Published: (2024)
by: Huang, Shaohan, et al.
Published: (2024)
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
by: Liu, Yang, et al.
Published: (2026)
by: Liu, Yang, et al.
Published: (2026)
ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)
by: Zhao, Ziyu, et al.
Published: (2026)
OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
by: Shi, Jingze, et al.
Published: (2026)
by: Shi, Jingze, et al.
Published: (2026)
Marco-MoE: Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling
by: Jiang, Fan, et al.
Published: (2026)
by: Jiang, Fan, et al.
Published: (2026)
Hexa-MoE: Efficient and Heterogeneous-aware Training for Mixture-of-Experts
by: Luo, Shuqing, et al.
Published: (2024)
by: Luo, Shuqing, et al.
Published: (2024)
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
by: Kang, Junmo, et al.
Published: (2024)
by: Kang, Junmo, et al.
Published: (2024)
MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing
by: Zhou, Hao, et al.
Published: (2024)
by: Zhou, Hao, et al.
Published: (2024)
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization
by: Fu, Zhongqian, et al.
Published: (2025)
by: Fu, Zhongqian, et al.
Published: (2025)
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)
by: Tang, Yehui, et al.
Published: (2025)
$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
by: Takashiro, Shota, et al.
Published: (2026)
by: Takashiro, Shota, et al.
Published: (2026)
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
by: Wei, Tianwen, et al.
Published: (2024)
by: Wei, Tianwen, et al.
Published: (2024)
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference
by: Luo, Shuqing, et al.
Published: (2025)
by: Luo, Shuqing, et al.
Published: (2025)
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
by: Jiang, Songtao, et al.
Published: (2024)
by: Jiang, Songtao, et al.
Published: (2024)
MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts
by: Christoforos, Alexandros, et al.
Published: (2025)
by: Christoforos, Alexandros, et al.
Published: (2025)
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
by: Li, Yunxin, et al.
Published: (2024)
by: Li, Yunxin, et al.
Published: (2024)
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
by: Shi, Chufan, et al.
Published: (2024)
by: Shi, Chufan, et al.
Published: (2024)
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)
by: Li, Zichong, et al.
Published: (2025)
MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)
by: Manzoni, Andrea
Published: (2026)
Advancing Expert Specialization for Better MoE
by: Guo, Hongcan, et al.
Published: (2025)
by: Guo, Hongcan, et al.
Published: (2025)
Expert Selections In MoE Models Reveal (Almost) As Much As Text
by: Nuriyev, Amir, et al.
Published: (2026)
by: Nuriyev, Amir, et al.
Published: (2026)
GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors
by: Zhang, Hengyuan, et al.
Published: (2025)
by: Zhang, Hengyuan, et al.
Published: (2025)
DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts
by: Feng, Yuchen, et al.
Published: (2025)
by: Feng, Yuchen, et al.
Published: (2025)
QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
by: Li, Pingzhi, et al.
Published: (2024)
by: Li, Pingzhi, et al.
Published: (2024)
MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
by: Li, Shuhuai, et al.
Published: (2026)
by: Li, Shuhuai, et al.
Published: (2026)
$\texttt{MoE-RBench}$: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
by: Chen, Guanjie, et al.
Published: (2024)
by: Chen, Guanjie, et al.
Published: (2024)
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
by: Li, Bo, et al.
Published: (2026)
by: Li, Bo, et al.
Published: (2026)
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts
by: Muzio, Alexandre, et al.
Published: (2024)
by: Muzio, Alexandre, et al.
Published: (2024)
Do Domain-specific Experts exist in MoE-based LLMs?
by: Do, Giang, et al.
Published: (2026)
by: Do, Giang, et al.
Published: (2026)
Similar Items
-
Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models
by: Wei, Linye, et al.
Published: (2025) -
What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models
by: Hu, Guimin, et al.
Published: (2026) -
LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025) -
Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
by: Li, Pingzhi, et al.
Published: (2025) -
Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)