| Main Authors: | Tang, Yuhan, Cui, Kangxin, Park, Jung Ho, Zhao, Yibo, Jiang, Xuan, He, Haoze, Yu, Jiangbo, Koutsopoulos, Haris, Zhao, Jinhua |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.13727 |
Similar Items
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)
GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer
by: Zhao, Xinyuan, et al.
Published: (2026)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
by: Wang, Yiming, et al.
Published: (2025)
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
by: Nie, Xiaonan, et al.
Published: (2024)
LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)
GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)
MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren A., et al.
Published: (2025)
Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
CAT-MoEformer: Context-Aware Temporal MoE Transformer for Beam Prediction
by: Zhou, Changkai, et al.
Published: (2026)
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)
MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)
SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)
OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)
LocMoE: A Low-Overhead MoE for Large Language Model Training
by: Li, Jing, et al.
Published: (2024)
Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
by: Ma, Wenhan, et al.
Published: (2025)
MoE-Loco: Mixture of Experts for Multitask Locomotion
by: Huang, Runhan, et al.
Published: (2025)
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
by: Zheng, Youwei, et al.
Published: (2025)
BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
by: Ma, Yingjie, et al.
Published: (2024)
MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
by: Ma, Haiyue, et al.
Published: (2025)
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
by: Ma, Songkai, et al.
Published: (2025)
MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration
by: Kong, Lingshun, et al.
Published: (2026)
Advancing Expert Specialization for Better MoE
by: Guo, Hongcan, et al.
Published: (2025)
MD-Face: MoE-Enhanced Label-Free Disentangled Representation for Interactive Facial Attribute Editing
by: Cui, Xuan, et al.
Published: (2026)
GRIN: GRadient-INformed MoE
by: Liu, Liyuan, et al.
Published: (2024)
Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens
by: Yu, Yanpeng, et al.
Published: (2025)
FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
by: Liu, Qingxiu, et al.
Published: (2026)
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)
HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
by: Lin, Wenxiang, et al.
Published: (2025)
DynaMo: Runtime Switchable Quantization for MoE with Cross-Dataset Adaptation
by: Zheng, Zihao, et al.
Published: (2025)
Fine-grained MoE Load Balancing with Linear Programming
by: Zhao, Chenqi, et al.
Published: (2025)
MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
by: Li, Shuhuai, et al.
Published: (2026)
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
by: Jin, Can, et al.
Published: (2025)
D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
by: Wang, Haodong, et al.
Published: (2025)
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
by: Yuan, Yichao, et al.
Published: (2025)