:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Geng; Han, Yuxuan; Lou, Yuxuan; Zhang, Yiqi; Zhao, Wangbo; You, Yang
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2507.00390

Similar Items

MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts
by: Lou, Yuxuan, et al.
Published: (2026)

MoE Pathfinder: Trajectory-driven Expert Pruning
by: Yang, Xican, et al.
Published: (2025)

LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)

STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning
by: Lee, Jaeseong, et al.
Published: (2024)

EvoESAP: Non-Uniform Expert Pruning for Sparse MoE
by: Liu, Zongfang, et al.
Published: (2026)

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)

CAMERA: Multi-Matrix Joint Compression for MoE Models via Micro-Expert Redundancy Analysis
by: Xu, Yuzhuang, et al.
Published: (2025)

Prediction Is All MoE Needs: Expert Load Distribution Goes from Fluctuating to Stabilizing
by: Cong, Peizhuang, et al.
Published: (2024)

MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs
by: Chen, Xiaodong, et al.
Published: (2025)

SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
by: Li, Zichong, et al.
Published: (2025)

Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts
by: Liu, Xu, et al.
Published: (2024)

MergeMoE: Efficient Compression of MoE Models via Expert Output Merging
by: Miao, Ruijie, et al.
Published: (2025)

MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
by: Ma, Songkai, et al.
Published: (2025)

FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
by: Liu, Qingxiu, et al.
Published: (2026)

$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
by: Koike-Akino, Toshiaki, et al.
Published: (2025)

Horseshoe Mixtures-of-Experts (HS-MoE)
by: Polson, Nick, et al.
Published: (2026)

REAP the Experts: Why Pruning Prevails for One-Shot MoE compression
by: Lasby, Mike, et al.
Published: (2025)

MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)

MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)

SD-MoE: Spectral Decomposition for Effective Expert Specialization
by: Huang, Ruijun, et al.
Published: (2026)

MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
by: Ma, Haiyue, et al.
Published: (2025)

FLEX-MoE: Federated Mixture-of-Experts with Load-balanced Expert Assignment for Edge Computing
by: Zhang, Boyang, et al.
Published: (2025)

AIMER: Calibration-Free Task-Agnostic MoE Pruning
by: Liu, Zongfang, et al.
Published: (2026)

Expert Divergence Learning for MoE-based Language Models
by: Li, Jiaang, et al.
Published: (2026)

MoE Lens -- An Expert Is All You Need
by: Chaudhari, Marmik, et al.
Published: (2026)

Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning
by: Cao, Mingyu, et al.
Published: (2024)

MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
by: Yang, Cheng, et al.
Published: (2024)

$\infty$-MoE: Generalizing Mixture of Experts to Infinite Experts
by: Takashiro, Shota, et al.
Published: (2026)

HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
by: Lin, Wenxiang, et al.
Published: (2025)

Breaking the MoE LLM Trilemma: Dynamic Expert Clustering with Structured Compression
by: Zhu, Peijun, et al.
Published: (2025)

ROMER: Expert Replacement and Router Calibration for Robust MoE LLMs on Analog Compute-in-Memory Systems
by: Zhou, Wenyong, et al.
Published: (2026)

VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
by: Chen, Hao, et al.
Published: (2024)

Exploiting the Experts: Unauthorized Compression in MoE-LLMs
by: Neogi, Pinaki Prasad Guha, et al.
Published: (2025)

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)

ExpertWeaver: Unlocking the Inherent MoE in Dense LLMs with GLU Activation Patterns
by: Zhao, Ziyu, et al.
Published: (2026)

MoE-Spec: Expert Budgeting for Efficient Speculative Decoding
by: McDanel, Bradley, et al.
Published: (2026)

MoE-Pruner: Pruning Mixture-of-Experts Large Language Model using the Hints from Its Router
by: Xie, Yanyue, et al.
Published: (2024)

Deconstructing Pre-training: Knowledge Attribution Analysis in MoE and Dense Models
by: Wang, Bo, et al.
Published: (2026)

Adaptive and Fine-grained Module-wise Expert Pruning for Efficient LoRA-MoE Fine-Tuning
by: Li, Weihang, et al.
Published: (2026)

Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
by: Li, Lujun, et al.
Published: (2025)