| Main Authors: | Shen, Zeyu, Henderson, Peter |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2604.20156 |
Similar Items
A Closer Look into Mixture-of-Experts in Large Language Models
by: Lo, Ka Man, et al.
Published: (2024)
Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration
by: Bercovich, Akhiad, et al.
Published: (2026)
Scattered Mixture-of-Experts Implementation
by: Tan, Shawn, et al.
Published: (2024)
MoQE: Improve Quantization Model performance via Mixture of Quantization Experts
by: Zhang, Jinhao, et al.
Published: (2025)
TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts
by: Lee, Hyunwook, et al.
Published: (2024)
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)
Mixture of Heterogeneous Grouped Experts for Language Modeling
by: Ma, Zhicheng, et al.
Published: (2026)
MC#: Mixture Compressor for Mixture-of-Experts Large Models
by: Huang, Wei, et al.
Published: (2025)
Efficiently Editing Mixture-of-Experts Models with Compressed Experts
by: He, Yifei, et al.
Published: (2025)
Varying-Coefficient Mixture of Experts Model
by: Zhao, Qicheng, et al.
Published: (2026)
Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction
by: Wu, Ziyang, et al.
Published: (2024)
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models
by: Qiu, Zihan, et al.
Published: (2025)
LTLDoG: Satisfying Temporally-Extended Symbolic Constraints for Safe Diffusion-based Planning
by: Feng, Zeyu, et al.
Published: (2024)
Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
by: Park, Sejik
Published: (2024)
Mixture of Experts in Large Language Models
by: Zhang, Danyang, et al.
Published: (2025)
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models
by: Wang, Zihan, et al.
Published: (2025)
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
by: Gao, Yuting, et al.
Published: (2025)
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
by: Aghdam, Maryam Akhavan, et al.
Published: (2024)
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)
Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection
by: Zhan, Zheng, et al.
Published: (2025)
Path-Constrained Mixture-of-Experts
by: Gu, Zijin, et al.
Published: (2026)
$μ$-Parametrization for Mixture of Experts
by: Małaśnicki, Jan, et al.
Published: (2025)
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
by: Nguyen, Dung V., et al.
Published: (2025)
The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models
by: Wang, Yan, et al.
Published: (2026)
Optimizing Pre-Training Data Mixtures with Mixtures of Data Expert Models
by: Belenki, Lior, et al.
Published: (2025)
Mixture of Raytraced Experts
by: Perin, Andrea, et al.
Published: (2025)
Mixture of Lookup Experts
by: Jie, Shibo, et al.
Published: (2025)
Structured Diffusion Models with Mixture of Gaussians as Prior Distribution
by: Jia, Nanshan, et al.
Published: (2024)
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
by: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)
Diffusion Meets Options: Hierarchical Generative Skill Composition for Temporally-Extended Tasks
by: Feng, Zeyu, et al.
Published: (2024)
Mixture of Experts in a Mixture of RL settings
by: Willi, Timon, et al.
Published: (2024)
Bayesian Mixture of Experts For Large Language Models
by: Dialameh, Maryam, et al.
Published: (2025)
Differentially Private Training of Mixture of Experts Models
by: Tholoniat, Pierre, et al.
Published: (2024)
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts
by: Tang, Anke, et al.
Published: (2024)
Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion
by: Tang, Anke, et al.
Published: (2024)
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models
by: Liang, Jingcong, et al.
Published: (2025)
Robustness of Mixtures of Experts to Feature Noise
by: Sun, Dong, et al.
Published: (2026)
Generalizing GNNs with Tokenized Mixture of Experts
by: Guo, Xiaoguang, et al.
Published: (2026)
Hyperparameter Transfer with Mixture-of-Expert Layers
by: Jiang, Tianze, et al.
Published: (2026)