| Main Author: | Ichikawa, Yuki |
|---|---|
| Format: | Digital resource |
| Language: | |
| Published: | Zenodo, 2026 |
| Online Access: | https://doi.org/10.5281/zenodo.18667548 |
Similar Items
Mixture of Attention Schemes (MoAS): Learning to Route Between MHA, GQA, and MQA
by: Gumaan, Esmail
Published: (2025)
Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA
by: Jin, Qingyun, et al.
Published: (2024)
Sensitivity-Positional Co-Localization in GQA Transformers
by: Rao, Manoj Chandrashekar
Published: (2026)
Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers
by: Chen, Anrui, et al.
Published: (2026)
GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer
by: Zhao, Xinyuan, et al.
Published: (2026)
MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration
by: Kong, Lingshun, et al.
Published: (2026)
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
by: Zheng, Youwei, et al.
Published: (2025)
DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
by: Jin, Can, et al.
Published: (2025)
LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)
MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
by: Ma, Haiyue, et al.
Published: (2025)
SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)
Muon: Training and Trade-offs with Latent Attention and MoE
by: Mehta, Sushant, et al.
Published: (2025)
Janus: Disaggregating Attention and Experts for Scalable MoE Inference
by: Zhang, Zhexiang, et al.
Published: (2025)
Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models
by: Harvey, Daniel Fidel, et al.
Published: (2025)
MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)
MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren A., et al.
Published: (2025)
MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
by: Ma, Songkai, et al.
Published: (2025)
D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
by: Wang, Haodong, et al.
Published: (2025)
The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities
by: Li, Ning, et al.
Published: (2025)
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)
Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)
EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
by: Wang, Yiming, et al.
Published: (2025)
Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems
by: Liu, Guowei, et al.
Published: (2026)
MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
by: Pang, Yuqi, et al.
Published: (2025)
MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)
LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
by: Nie, Xiaonan, et al.
Published: (2024)
OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)
CAT-MoEformer: Context-Aware Temporal MoE Transformer for Beam Prediction
by: Zhou, Changkai, et al.
Published: (2026)
KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)
MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)
Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures
by: Delibasoglu, Ibrahim
Published: (2026)
Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
by: Zhang, Qijun, et al.
Published: (2026)
MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
by: Jin, In-Hwan, et al.
Published: (2025)
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
by: Aghdam, Maryam Akhavan, et al.
Published: (2024)
Harder Tasks Need More Experts: Dynamic Routing in MoE Models
by: Huang, Quzhe, et al.
Published: (2024)
Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
by: Li, Bo, et al.
Published: (2026)
BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
by: Ma, Yingjie, et al.
Published: (2024)