| Main Authors: | Do, Giang; Le, Hung; Tran, Truyen |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.05267 |
Similar Items
S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning
by: Do, Giang, et al.
Published: (2025)
Rethinking Sparse Mixture of Experts from a Unified Perspective
by: Do, Giang, et al.
Published: (2025)
SimSMoE: Solving Representational Collapse via Similarity Measure
by: Do, Giang, et al.
Published: (2024)
Eigenvectors of Experts are Training-free Non-collapsing Routers
by: Do, Giang, et al.
Published: (2026)
On the Role of Discrete Representation in Sparse Mixture of Experts
by: Do, Giang, et al.
Published: (2024)
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)
Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory
by: Le, Hung, et al.
Published: (2024)
Steering MoE LLMs via Expert (De)Activation
by: Fayyaz, Mohsen, et al.
Published: (2025)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)
What Gets Activated: Uncovering Domain and Driver Experts in MoE Language Models
by: Hu, Guimin, et al.
Published: (2026)
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
by: Li, Yunxin, et al.
Published: (2024)
GEMQ: Global Expert-Level Mixed-Precision Quantization for MoE LLMs
by: Deng, Jianing, et al.
Published: (2026)
MH-MoE: Multi-Head Mixture-of-Experts
by: Huang, Shaohan, et al.
Published: (2024)
$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
by: Takashiro, Shota, et al.
Published: (2026)
ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)
OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale
by: Shi, Jingze, et al.
Published: (2026)
Advancing Expert Specialization for Better MoE
by: Guo, Hongcan, et al.
Published: (2025)
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models
by: Jiang, Songtao, et al.
Published: (2024)
Ada-K Routing: Boosting the Efficiency of MoE-based LLMs
by: Yue, Tongtian, et al.
Published: (2024)
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts
by: Muzio, Alexandre, et al.
Published: (2024)
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity
by: Tang, Yehui, et al.
Published: (2025)
Dynamic Expert Specialization: Towards Catastrophic Forgetting-Free Multi-Domain MoE Adaptation
by: Li, Junzhuo, et al.
Published: (2025)
Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models
by: Le, Quang-Hung, et al.
Published: (2024)
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
by: Zhou, Wenyong, et al.
Published: (2026)
Leave It to the Experts: Detecting Knowledge Distillation via MoE Expert Signatures
by: Li, Pingzhi, et al.
Published: (2025)
Expert Selections In MoE Models Reveal (Almost) As Much As Text
by: Nuriyev, Amir, et al.
Published: (2026)
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR
by: Ma, Guodong, et al.
Published: (2025)
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)
MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
by: Yang, Shu, et al.
Published: (2024)
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning
by: Li, Zongqian, et al.
Published: (2025)
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
by: Gu, Naibin, et al.
Published: (2025)
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
by: Huang, Quzhe, et al.
Published: (2024)
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks
by: Chernov, Andrei
Published: (2025)
GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
by: Bai, Ting, et al.
Published: (2024)
LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)
CP-MoE: Consistency-Preserving Mixture-of-Experts for Continual Learning
by: Liu, Yang, et al.
Published: (2026)
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts
by: Sun, Weigao, et al.
Published: (2025)
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)