| Main Author: | Jeon, MinCheol |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.06929 |
Similar Items
SD-MoE: Spectral Decomposition for Effective Expert Specialization
by: Huang, Ruijun, et al.
Published: (2026)
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren. A, et al.
Published: (2025)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
by: Pióro, Maciej, et al.
Published: (2024)
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)
Collaborative Compression for Large-Scale MoE Deployment on Edge
by: Chen, Yixiao, et al.
Published: (2025)
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
by: Li, Bo, et al.
Published: (2026)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
DOT-MoE: Differentiable Optimal Transport for MoEfication
by: Bamba, Udbhav, et al.
Published: (2026)
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
by: Wang, Siqi, et al.
Published: (2024)
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
by: Shi, Xiaoming, et al.
Published: (2024)
Practical FP4 Training for Large-Scale MoE Models on Hopper GPUs
by: Zhang, Wuyue, et al.
Published: (2026)
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
by: Guo, Wentao, et al.
Published: (2025)
Hierarchical LoRA MoE for Efficient CTR Model Scaling
by: Zeng, Zhichen, et al.
Published: (2025)
GRIN: GRadient-INformed MoE
by: Liu, Liyuan, et al.
Published: (2024)
Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE
by: Zeng, Anxiang, et al.
Published: (2025)
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
by: Neogi, Pinaki Prasad Guha, et al.
Published: (2025)
(GG) MoE vs. MLP on Tabular Data
by: Chernov, Andrei
Published: (2025)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)
FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
by: Hu, Gang, et al.
Published: (2025)
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
by: Huang, Zongle, et al.
Published: (2025)
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
by: Ludziejewski, Jan, et al.
Published: (2025)
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
by: Duanmu, Haojie, et al.
Published: (2025)
MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
by: Li, Shuhuai, et al.
Published: (2026)
Optimal Scaling Laws for Efficiency Gains in a Theoretical Transformer-Augmented Sectional MoE Framework
by: Sane, Soham
Published: (2025)
Mixture of Experts (MoE): A Big Data Perspective
by: Gan, Wensheng, et al.
Published: (2025)
SDG-MoE: Signed Debate Graph Mixture-of-Experts
by: Kulibaba, Stepan, et al.
Published: (2026)
Expert Divergence Learning for MoE-based Language Models
by: Li, Jiaang, et al.
Published: (2026)
BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization
by: Zhao, Jiayu, et al.
Published: (2026)
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
by: Stitsyuk, Artyom, et al.
Published: (2024)
MoEITS: A Green AI approach for simplifying MoE-LLMs
by: Balderas, Luis, et al.
Published: (2026)
MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
by: Yang, Cheng, et al.
Published: (2024)
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE
by: Huang, Haiduo, et al.
Published: (2025)
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts
by: Su, Yang, et al.
Published: (2025)
Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling
by: Liao, Ning, et al.
Published: (2025)
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
by: Wang, Yiming, et al.
Published: (2025)
The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE
by: Chernov, Andrei, et al.
Published: (2025)
Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
by: Pei, Zehua, et al.
Published: (2025)
Improving MoE Compute Efficiency by Composing Weight and Data Sparsity
by: Kilian, Maciej, et al.
Published: (2026)
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
by: Jin, Peng, et al.
Published: (2024)
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
by: Du, Xianzhi, et al.
Published: (2024)