:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Author:	Ichikawa, Yuki
Format:	Recurso digital
Language:
Published:	Zenodo 2026
Online Access:	https://doi.org/10.5281/zenodo.18667548
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Mixture of Attention Schemes (MoAS): Learning to Route Between MHA, GQA, and MQA
by: Gumaan, Esmail
Published: (2025)

Align Attention Heads Before Merging Them: An Effective Way for Converting MHA to GQA
by: Jin, Qingyun, et al.
Published: (2024)

Sensitivity-Positional Co-Localization in GQA Transformers
by: Rao, Manoj Chandrashekar
Published: (2026)

Multi-Head Attention as a Source of Catastrophic Forgetting in MoE Transformers
by: Chen, Anrui, et al.
Published: (2026)

GazeFormer-MoE: Context-Aware Gaze Estimation via CLIP and MoE Transformer
by: Zhao, Xinyuan, et al.
Published: (2026)

MiM-DiT: MoE in MoE with Diffusion Transformers for All-in-One Image Restoration
by: Kong, Lingshun, et al.
Published: (2026)

Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
by: Zheng, Youwei, et al.
Published: (2025)

DTop-p MoE: Sparsity-Controlled Dynamic Top-p MoE for Foundation Model Pre-training
by: Jin, Can, et al.
Published: (2025)

LLaDA-MoE: A Sparse MoE Diffusion Language Model
by: Zhu, Fengqi, et al.
Published: (2025)

MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing
by: Ma, Haiyue, et al.
Published: (2025)

SP-MoE: Speculative Decoding and Prefetching for Accelerating MoE-based Model Inference
by: Chen, Liangkun, et al.
Published: (2025)

Muon: Training and Trade-offs with Latent Attention and MoE
by: Mehta, Sushant, et al.
Published: (2025)

Janus: Disaggregating Attention and Experts for Scalable MoE Inference
by: Zhang, Zhexiang, et al.
Published: (2025)

Optimizing MoE Routers: Design, Implementation, and Evaluation in Transformer Models
by: Harvey, Daniel Fidel, et al.
Published: (2025)

MoE-Prefill: Zero Redundancy Overheads in MoE Prefill Serving
by: Su, Zhaoyuan, et al.
Published: (2026)

MoE-PHDS: One MoE checkpoint for flexible runtime sparsity
by: Hannah, Lauren. A, et al.
Published: (2025)

MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?
by: Ma, Songkai, et al.
Published: (2025)

D$^{2}$MoE: Dual Routing and Dynamic Scheduling for Efficient On-Device MoE-based LLM Serving
by: Wang, Haodong, et al.
Published: (2025)

The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities
by: Li, Ning, et al.
Published: (2025)

MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs
by: Cao, Shiyi, et al.
Published: (2024)

Grove MoE: Towards Efficient and Superior MoE LLMs with Adjugate Experts
by: Wu, Haoyuan, et al.
Published: (2025)

EPS-MoE: Expert Pipeline Scheduler for Cost-Efficient MoE Inference
by: Qian, Yulei, et al.
Published: (2024)

GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
by: Wu, Haoze, et al.
Published: (2024)

STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation
by: Wang, Yiming, et al.
Published: (2025)

Revealing the Challenges of Attention-FFN Disaggregation for Modern MoE Models and Hardware Systems
by: Liu, Guowei, et al.
Published: (2026)

MoCHA: Advanced Vision-Language Reasoning with MoE Connector and Hierarchical Group Attention
by: Pang, Yuqi, et al.
Published: (2025)

MoE-Sieve: Routing-Guided LoRA for Efficient MoE Fine-Tuning
by: Manzoni, Andrea
Published: (2026)

LSH-MoE: Communication-efficient MoE Training via Locality-Sensitive Hashing
by: Nie, Xiaonan, et al.
Published: (2024)

OD-MoE: On-Demand Expert Loading for Cacheless Edge-Distributed MoE Inference
by: Wang, Liujianfu, et al.
Published: (2025)

Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs
by: Tang, Yehui, et al.
Published: (2025)

CAT-MoEformer: Context-Aware Temporal MoE Transformer for Beam Prediction
by: Zhou, Changkai, et al.
Published: (2026)

KBVQ-MoE: KLT-guided SVD with Bias-Corrected Vector Quantization for MoE Large Language Models
by: Xu, Zukang, et al.
Published: (2026)

MoE-Prism: Disentangling Monolithic Experts for Elastic MoE Services via Model-System Co-Designs
by: Xia, Xinfeng, et al.
Published: (2025)

Spectral Manifold Regularization for Stable and Modular Routing in Deep MoE Architectures
by: Delibasoglu, Ibrahim
Published: (2026)

Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs
by: Zhang, Qijun, et al.
Published: (2026)

MoE-GS: Mixture of Experts for Dynamic Gaussian Splatting
by: Jin, In-Hwan, et al.
Published: (2025)

DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models
by: Aghdam, Maryam Akhavan, et al.
Published: (2024)

Harder Tasks Need More Experts: Dynamic Routing in MoE Models
by: Huang, Quzhe, et al.
Published: (2024)

Mix-MoE: Improving Multilingual Machine Translation of Large Language Models through Mixed MoEs
by: Li, Bo, et al.
Published: (2026)

BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing
by: Ma, Yingjie, et al.
Published: (2024)