| Main Authors: | Zhang, Geng, Han, Yuxuan, Lou, Yuxuan, Zhang, Yiqi, Zhao, Wangbo, You, Yang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.00390 |
Similar Items
MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts
by: Lou, Yuxuan, et al.
Published: (2026)
MoE Pathfinder: Trajectory-driven Expert Pruning
by: Yang, Xican, et al.
Published: (2025)
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
by: Lee, Jaeseong, et al.
Published: (2024)
EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
by: Liu, Zongfang, et al.
Published: (2026)
MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)
CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
by: Xu, Yuzhuang, et al.
Published: (2025)
Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing
by: Cong, Peizhuang, et al.
Published: (2024)
MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
by: Chen, Xiaodong, et al.
Published: (2025)
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
by: Liu, Xu, et al.
Published: (2024)
MergeMoE: Efficient Compression of MoE Models via Expert Output Merging
by: Miao, Ruijie, et al.
Published: (2025)
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
by: Ma, Songkai, et al.
Published: (2025)
FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
by: Liu, Qingxiu, et al.
Published: (2026)
$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
by: Koike-Akino, Toshiaki, et al.
Published: (2025)
Horseshoe Mixtures-of-Experts (HS-MoE)
by: Polson, Nick, et al.
Published: (2026)
REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
by: Lasby, Mike, et al.
Published: (2025)
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)
SD-MoE: Spectral Decomposition for Effective Expert Specialization
by: Huang, Ruijun, et al.
Published: (2026)
MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
by: Ma, Haiyue, et al.
Published: (2025)
FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment for Edge Computing
by: Zhang, Boyang, et al.
Published: (2025)
AIMER: Calibration-Free Task-Agnostic MoE Pruning
by: Liu, Zongfang, et al.
Published: (2026)
Expert Divergence Learning for MoE-based Language Models
by: Li, Jiaang, et al.
Published: (2026)
MoE Lens -- An Expert Is All You Need
by: Chaudhari, Marmik, et al.
Published: (2026)
Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning
by: Cao, Mingyu, et al.
Published: (2024)
MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
by: Yang, Cheng, et al.
Published: (2024)
$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
by: Takashiro, Shota, et al.
Published: (2026)
HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
by: Lin, Wenxiang, et al.
Published: (2025)
Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
by: Zhu, Peijun, et al.
Published: (2025)
ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
by: Zhou, Wenyong, et al.
Published: (2026)
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
by: Chen, Hao, et al.
Published: (2024)
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
by: Neogi, Pinaki Prasad Guha, et al.
Published: (2025)
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)
ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)
MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
by: McDanel, Bradley, et al.
Published: (2026)
MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)
Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models
by: Wang, Bo, et al.
Published: (2026)
Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
by: Li, Weihang, et al.
Published: (2026)
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
by: Li, Lujun, et al.
Published: (2025)