:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Tang, Yuhan, Cui, Kangxin, Park, Jung Ho, Zhao, Yibo, Jiang, Xuan, He, Haoze, Yu, Jiangbo, Koutsopoulos, Haris, Zhao, Jinhua
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2512.13727
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)

GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer
by: Zhao, Xinyuan, et al.
Published: (2026)

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)

STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
by: Wang, Yiming, et al.
Published: (2025)

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)

LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
by: Nie, Xiaonan, et al.
Published: (2024)

LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)

GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)

MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)

MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren. A, et al.
Published: (2025)

Accelerating Distributed MoE Training and Inference with Lina
by: Li, Jiamin, et al.
Published: (2022)

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)

KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)

CAT-MoEformer: Context-Aware Temporal MoE Transformer for Beam Prediction
by: Zhou, Changkai, et al.
Published: (2026)

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)

MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)

SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)

OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)

LocMoE: A Low-Overhead MoE for Large Language Model Training
by: Li, Jing, et al.
Published: (2024)

Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
by: Ma, Wenhan, et al.
Published: (2025)

MoE-Loco: Mixture of Experts for Multitask Locomotion
by: Huang, Runhan, et al.
Published: (2025)

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
by: Zheng, Youwei, et al.
Published: (2025)

BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
by: Ma, Yingjie, et al.
Published: (2024)

MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
by: Ma, Haiyue, et al.
Published: (2025)

MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
by: Ma, Songkai, et al.
Published: (2025)

MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration
by: Kong, Lingshun, et al.
Published: (2026)

Advancing Expert Specialization for Better MoE
by: Guo, Hongcan, et al.
Published: (2025)

MD-Face: MoE-Enhanced Label-Free Disentangled Representation for Interactive Facial Attribute Editing
by: Cui, Xuan, et al.
Published: (2026)

GRIN: GRadient-INformed MoE
by: Liu, Liyuan, et al.
Published: (2024)

Efficient MoE Serving in the Memory-Bound Regime: Balance Activated Experts, Not Tokens
by: Yu, Yanpeng, et al.
Published: (2025)

FluxMoE: Decoupling Expert Residency for High-Performance MoE Serving
by: Liu, Qingxiu, et al.
Published: (2026)

MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
by: Xu, Tairan, et al.
Published: (2025)

HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
by: Lin, Wenxiang, et al.
Published: (2025)

DynaMo: Runtime Switchable Quantization for MoE with Cross-Dataset Adaptation
by: Zheng, Zihao, et al.
Published: (2025)

Fine-grained MoE Load Balancing with Linear Programming
by: Zhao, Chenqi, et al.
Published: (2025)

MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
by: Li, Shuhuai, et al.
Published: (2026)

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
by: Jin, Can, et al.
Published: (2025)

D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
by: Wang, Haodong, et al.
Published: (2025)

MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
by: Yuan, Yichao, et al.
Published: (2025)