| Main Authors: | He, Xin, Zhang, Shunkang, Tang, Kaijie, Shi, Shaohuai, Wang, Yuxin, Zeng, Zihao, Tang, Zhenheng, Chu, Xiaowen, Yin, Haiyan, Tsang, Ivor W., Ong, Yew Soon |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2410.17954 |
Similar Items
Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
by: Pan, Xinglin, et al.
Published: (2025)
ExpertFlow: Adaptive Expert Scheduling and Memory Coordination for Efficient MoE Inference
by: Shen, Zixu, et al.
Published: (2025)
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models
by: Pan, Xinglin, et al.
Published: (2025)
RouteMark: A Fingerprint for Intellectual Property Attribution in Routing-based Model Merging
by: He, Xin, et al.
Published: (2025)
Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization
by: Liu, Shengcai, et al.
Published: (2024)
Flow-Direct: Feedback-Efficient and Reusable Guidance for Flow Models via Non-Parametric Guidance Field
by: Tan, Kim Yong, et al.
Published: (2026)
HierMoE: Accelerating MoE Training with Hierarchical Token Deduplication and Expert Swap
by: Lin, Wenxiang, et al.
Published: (2025)
Mastering Continual Reinforcement Learning through Fine-Grained Sparse Network Allocation and Dormant Neuron Exploration
by: Zheng, Chengqi, et al.
Published: (2025)
Distributional Multi-objective Black-box Optimization for Diffusion-model Inference-time Multi-Target Generation
by: Tan, Kim Yong, et al.
Published: (2025)
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
by: Zeng, Zihao, et al.
Published: (2024)
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)
SlimCaching: Edge Caching of Mixture-of-Experts for Distributed Inference
by: Chen, Qian, et al.
Published: (2025)
Reasoning Language Model Inference Serving Unveiled: An Empirical Study
by: Li, Qi, et al.
Published: (2025)
Routing Distilled Knowledge via Mixture of LoRA Experts for Large Language Model based Bundle Generation
by: Feng, Kaidong, et al.
Published: (2025)
DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
by: Tang, Zhenheng, et al.
Published: (2025)
Towards Harmless Rawlsian Fairness Regardless of Demographic Prior
by: Wang, Xuanqian, et al.
Published: (2024)
Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
by: Tan, Kim Yong, et al.
Published: (2025)
MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming
by: Zheng, Chengqi, et al.
Published: (2025)
Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning
by: Tang, Zichen, et al.
Published: (2024)
Mixture of Lookup Experts
by: Jie, Shibo, et al.
Published: (2025)
Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation
by: Lyu, Yueming, et al.
Published: (2024)
Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)
Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns
by: Bambhaniya, Abhimanyu, et al.
Published: (2026)
Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference
by: Skliar, Andrii, et al.
Published: (2024)
A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction
by: Wang, Guangyu, et al.
Published: (2024)
FedImpro: Measuring and Improving Client Update in Federated Learning
by: Tang, Zhenheng, et al.
Published: (2024)
Task Scheduling for Efficient Inference of Large Language Models on Single Moderate GPU Systems
by: Lin, Wenxiang, et al.
Published: (2024)
Accelerating Mixture-of-Expert Inference with Adaptive Expert Split Mechanism
by: Yan, Jiaming, et al.
Published: (2025)
Generalizing GNNs with Tokenized Mixture of Experts
by: Guo, Xiaoguang, et al.
Published: (2026)
MoBiLE: Efficient Mixture-of-Experts Inference on Consumer GPU with Mixture of Big Little Experts
by: Zhao, Yushu, et al.
Published: (2025)
Lang-PINN: From Language to Physics-Informed Neural Networks via a Multi-Agent Framework
by: He, Xin, et al.
Published: (2025)
Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
by: Wang, Yaoxiang, et al.
Published: (2025)
Cache Management for Mixture-of-Experts LLMs -- extended version
by: Angelopoulos, Spyros, et al.
Published: (2025)
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
by: Kamahori, Keisuke, et al.
Published: (2024)
A Survey on Inference Optimization Techniques for Mixture of Experts Models
by: Liu, Jiacheng, et al.
Published: (2024)
Prediction-powered Inference by Mixture of Experts
by: Gu, Yanwu, et al.
Published: (2026)
FLEx: Personalized Federated Learning for Mixture-of-Experts LLMs via Expert Grafting
by: Liu, Fan, et al.
Published: (2025)
Expert Upcycling: Shifting the Compute-Efficient Frontier of Mixture-of-Experts
by: Dwivedi, Chaitanya, et al.
Published: (2026)
Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
by: Park, Sejik
Published: (2024)
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
by: Go, Seokjin, et al.
Published: (2025)