:: Library Catalog

Bibliographic Details
Main Authors:	Hallee, Logan; Kapur, Rohan; Patel, Arjun; Gleghorn, Jason P.; Khomtchouk, Bohdan
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Artificial Intelligence; Computation and Language
Online Access:	https://arxiv.org/abs/2401.15713

Similar Items

Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings
by: Hallee, Logan, et al.
Published: (2026)

Mixtures of SubExperts for Large Language Continual Learning
by: Kang, Haeyong
Published: (2025)

Routing-Free Mixture-of-Experts
by: Liu, Yilun, et al.
Published: (2026)

Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)

Multilingual Routing in Mixture-of-Experts
by: Bandarkar, Lucas, et al.
Published: (2025)

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)

On the Spatial Structure of Mixture-of-Experts in Transformers
by: Bershatsky, Daniel, et al.
Published: (2025)

Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
by: Lu, Xudong, et al.
Published: (2024)

Mixture of Heterogeneous Grouped Experts for Language Modeling
by: Ma, Zhicheng, et al.
Published: (2026)

Scaling Laws for Fine-Grained Mixture of Experts
by: Krajewski, Jakub, et al.
Published: (2024)

MoIN: Mixture of Introvert Experts to Upcycle an LLM
by: Tejankar, Ajinkya, et al.
Published: (2024)

Upcycling Large Language Models into Mixture of Experts
by: He, Ethan, et al.
Published: (2024)

Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
by: Han, Yixuan, et al.
Published: (2025)

Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation Modules
by: Liu, Yilun, et al.
Published: (2025)

Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
by: Kim, Gyeongman, et al.
Published: (2025)

RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
by: Asl, Javad Rafiei, et al.
Published: (2024)

MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)

Pruning and Distilling Mixture-of-Experts into Dense Language Models
by: Kim, Junhyuck, et al.
Published: (2026)

Towards a Comprehensive Scaling Law of Mixture-of-Experts
by: Zhao, Guoliang, et al.
Published: (2025)

Probing Semantic Routing in Large Mixture-of-Expert Models
by: Olson, Matthew Lyle, et al.
Published: (2025)

OLMoE: Open Mixture-of-Experts Language Models
by: Muennighoff, Niklas, et al.
Published: (2024)

Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)

Using GPT Models for Qualitative and Quantitative News Analytics in the 2024 US Presidental Election Process
by: Pavlyshenko, Bohdan M.
Published: (2024)

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
by: Gritsch, Nikolas, et al.
Published: (2024)

MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper
by: Zeng, Runjia, et al.
Published: (2025)

Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
by: He, Shwai, et al.
Published: (2025)

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
by: Nakamura, Taishi, et al.
Published: (2025)

MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias
by: Wang, Guorun, et al.
Published: (2024)

MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts
by: Harshit
Published: (2025)

Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
by: Nakamura, Taishi, et al.
Published: (2025)

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
by: Gu, Naibin, et al.
Published: (2025)

Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency
by: Bandarkar, Lucas, et al.
Published: (2026)

Scaling Embeddings Outperforms Scaling Experts in Language Models
by: Liu, Hong, et al.
Published: (2026)

FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
by: Zhao, Zhongyu, et al.
Published: (2024)

Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)

Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
by: Ludziejewski, Jan, et al.
Published: (2025)

MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
by: Pióro, Maciej, et al.
Published: (2024)

QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
by: Li, Pingzhi, et al.
Published: (2024)

$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
by: Koike-Akino, Toshiaki, et al.
Published: (2025)

Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging
by: Hui, Tingfeng, et al.
Published: (2024)