Library Catalog

Bibliographic Details
Main Authors:	He, Xin; Zhang, Shunkang; Tang, Kaijie; Shi, Shaohuai; Wang, Yuxin; Zeng, Zihao; Tang, Zhenheng; Chu, Xiaowen; Yin, Haiyan; Tsang, Ivor W.; Ong, Yew Soon
Format:	Preprint
Published:	2024
Subjects:	Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2410.17954

Similar Items

Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
by: Pan, Xinglin, et al.
Published: (2025)

ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference
by: Shen, Zixu, et al.
Published: (2025)

FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models
by: Pan, Xinglin, et al.
Published: (2025)

RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging
by: He, Xin, et al.
Published: (2025)

Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization
by: Liu, Shengcai, et al.
Published: (2024)

Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field
by: Tan, Kim Yong, et al.
Published: (2026)

HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
by: Lin, Wenxiang, et al.
Published: (2025)

Mastering Continual Reinforcement Learning through Fine-Grained Sparse Network Allocation and Dormant Neuron Exploration
by: Zheng, Chengqi, et al.
Published: (2025)

Distributional Multi-objective Black-box Optimization for Diffusion-model Inference-time Multi-Target Generation
by: Tan, Kim Yong, et al.
Published: (2025)

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
by: Zeng, Zihao, et al.
Published: (2024)

Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)

SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference
by: Chen, Qian, et al.
Published: (2025)

Reasoning Language Model Inference Serving Unveiled: An Empirical Study
by: Li, Qi, et al.
Published: (2025)

Routing Distilled Knowledge via Mixture of LoRA Experts for Large Language Model based Bundle Generation
by: Feng, Kaidong, et al.
Published: (2025)

DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
by: Tang, Zhenheng, et al.
Published: (2025)

Towards Harmless Rawlsian Fairness Regardless of Demographic Prior
by: Wang, Xuanqian, et al.
Published: (2024)

Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
by: Tan, Kim Yong, et al.
Published: (2025)

MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
by: Zheng, Chengqi, et al.
Published: (2025)

Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning
by: Tang, Zichen, et al.
Published: (2024)

Mixture of Lookup Experts
by: Jie, Shibo, et al.
Published: (2025)

Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation
by: Lyu, Yueming, et al.
Published: (2024)

Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)

Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns
by: Bambhaniya, Abhimanyu, et al.
Published: (2026)

Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
by: Skliar, Andrii, et al.
Published: (2024)

A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction
by: Wang, Guangyu, et al.
Published: (2024)

FedImpro: Measuring and Improving Client Update in Federated Learning
by: Tang, Zhenheng, et al.
Published: (2024)

Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems
by: Lin, Wenxiang, et al.
Published: (2024)

Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
by: Yan, Jiaming, et al.
Published: (2025)

Generalizing GNNs with Tokenized Mixture of Experts
by: Guo, Xiaoguang, et al.
Published: (2026)

MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts
by: Zhao, Yushu, et al.
Published: (2025)

Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework
by: He, Xin, et al.
Published: (2025)

Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
by: Wang, Yaoxiang, et al.
Published: (2025)

Cache Management for Mixture-of-Experts LLMs -- extended version
by: Angelopoulos, Spyros, et al.
Published: (2025)

Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
by: Kamahori, Keisuke, et al.
Published: (2024)

A Survey on Inference Optimization Techniques for Mixture of Experts Models
by: Liu, Jiacheng, et al.
Published: (2024)

Prediction-powered Inference by Mixture of Experts
by: Gu, Yanwu, et al.
Published: (2026)

FLEx: Personalized Federated Learning for Mixture-of-Experts LLMs via Expert Grafting
by: Liu, Fan, et al.
Published: (2025)

Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
by: Dwivedi, Chaitanya, et al.
Published: (2026)

Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
by: Park, Sejik
Published: (2024)

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
by: Go, Seokjin, et al.
Published: (2025)