| Main Author: | Chernov, Andrei |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2502.03608 |
Similar Items
The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE
by: Chernov, Andrei, et al.
Published: (2025)
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks
by: Chernov, Andrei
Published: (2025)
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren. A, et al.
Published: (2025)
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)
Mixture of Experts (MoE): A Big Data Perspective
by: Gan, Wensheng, et al.
Published: (2025)
Fine-Tuning a Time Series Foundation Model with Wasserstein Loss
by: Chernov, Andrei
Published: (2024)
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
by: Wang, Yiming, et al.
Published: (2025)
Improving MoE Compute Efficiency by Composing Weight and Data Sparsity
by: Kilian, Maciej, et al.
Published: (2026)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
DOT-MoE: Differentiable Optimal Transport for MoEfication
by: Bamba, Udbhav, et al.
Published: (2026)
KAN v.s. MLP for Offline Reinforcement Learning
by: Guo, Haihong, et al.
Published: (2024)
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations
by: Guo, Wentao, et al.
Published: (2025)
GRIN: GRadient-INformed MoE
by: Liu, Liyuan, et al.
Published: (2024)
Exploiting the Experts: Unauthorized Compression in MoE-LLMs
by: Neogi, Pinaki Prasad Guha, et al.
Published: (2025)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)
FFT-MoE: Efficient Federated Fine-Tuning for Foundation Models via Large-scale Sparse MoE under Heterogeneous Edge
by: Hu, Gang, et al.
Published: (2025)
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
by: Huang, Zongle, et al.
Published: (2025)
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
by: Duanmu, Haojie, et al.
Published: (2025)
MoE-SpAc: Efficient MoE Inference Based on Speculative Activation Utility in Heterogeneous Edge Scenarios
by: Li, Shuhuai, et al.
Published: (2026)
Collaborative Compression for Large-Scale MoE Deployment on Edge
by: Chen, Yixiao, et al.
Published: (2025)
SD-MoE: Spectral Decomposition for Effective Expert Specialization
by: Huang, Ruijun, et al.
Published: (2026)
SDG-MoE: Signed Debate Graph Mixture-of-Experts
by: Kulibaba, Stepan, et al.
Published: (2026)
Expert Divergence Learning for MoE-based Language Models
by: Li, Jiaang, et al.
Published: (2026)
BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization
by: Zhao, Jiayu, et al.
Published: (2026)
MoEITS: A Green AI approach for simplifying MoE-LLMs
by: Balderas, Luis, et al.
Published: (2026)
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts
by: Su, Yang, et al.
Published: (2025)
Innovator: Scientific Continued Pretraining with Fine-grained MoE Upcycling
by: Liao, Ning, et al.
Published: (2025)
Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
by: Pei, Zehua, et al.
Published: (2025)
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
by: Jin, Peng, et al.
Published: (2024)
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training
by: Du, Xianzhi, et al.
Published: (2024)
Grouter: Decoupling Routing from Representation for Accelerated MoE Training
by: Xu, Yuqi, et al.
Published: (2026)
Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations
by: Hu, Wentao, et al.
Published: (2026)
Towards an empirical understanding of MoE design choices
by: Fan, Dongyang, et al.
Published: (2024)
BinConv: A Neural Architecture for Ordinal Encoding in Time-Series Forecasting
by: Chernov, Andrei, et al.
Published: (2025)
LLM Embeddings for Deep Learning on Tabular Data
by: Koloski, Boshko, et al.
Published: (2025)
Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures
by: Delibasoglu, Ibrahim
Published: (2026)
MP-MoE: Matrix Profile-Guided Mixture of Experts for Precipitation Forecasting
by: Tran, Huyen Ngoc, et al.
Published: (2026)
MACS: Modality-Aware Capacity Scaling for Efficient Multimodal MoE Inference
by: Li, Bo, et al.
Published: (2026)
Is Retraining-Free Enough? The Necessity of Router Calibration for Efficient MoE Compression
by: Hyeon, Sieun, et al.
Published: (2026)
LocMoE: A Low-Overhead MoE for Large Language Model Training
by: Li, Jing, et al.
Published: (2024)