| Main Authors: | Hallee, Logan, Kapur, Rohan, Patel, Arjun, Gleghorn, Jason P., Khomtchouk, Bohdan |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.15713 |
Similar Items
Dual Triangle Attention: Effective Bidirectional Attention Without Positional Embeddings
by: Hallee, Logan, et al.
Published: (2026)
Mixtures of SubExperts for Large Language Continual Learning
by: Kang, Haeyong
Published: (2025)
Routing-Free Mixture-of-Experts
by: Liu, Yilun, et al.
Published: (2026)
Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)
Multilingual Routing in Mixture-of-Experts
by: Bandarkar, Lucas, et al.
Published: (2025)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
On the Spatial Structure of Mixture-of-Experts in Transformers
by: Bershatsky, Daniel, et al.
Published: (2025)
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
by: Lu, Xudong, et al.
Published: (2024)
Mixture of Heterogeneous Grouped Experts for Language Modeling
by: Ma, Zhicheng, et al.
Published: (2026)
Scaling Laws for Fine-Grained Mixture of Experts
by: Krajewski, Jakub, et al.
Published: (2024)
MoIN: Mixture of Introvert Experts to Upcycle an LLM
by: Tejankar, Ajinkya, et al.
Published: (2024)
Upcycling Large Language Models into Mixture of Experts
by: He, Ethan, et al.
Published: (2024)
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
by: Han, Yixuan, et al.
Published: (2025)
Parameter-Efficient Routed Fine-Tuning: Mixture-of-Experts Demands Mixture of Adaptation Modules
by: Liu, Yilun, et al.
Published: (2025)
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models
by: Kim, Gyeongman, et al.
Published: (2025)
RobustSentEmbed: Robust Sentence Embeddings Using Adversarial Self-Supervised Contrastive Learning
by: Asl, Javad Rafiei, et al.
Published: (2024)
MobileMoE: Scaling On-Device Mixture of Experts
by: Chen, Yanbei, et al.
Published: (2026)
Pruning and Distilling Mixture-of-Experts into Dense Language Models
by: Kim, Junhyuck, et al.
Published: (2026)
Towards a Comprehensive Scaling Law of Mixture-of-Experts
by: Zhao, Guoliang, et al.
Published: (2025)
Probing Semantic Routing in Large Mixture-of-Expert Models
by: Olson, Matthew Lyle, et al.
Published: (2025)
OLMoE: Open Mixture-of-Experts Language Models
by: Muennighoff, Niklas, et al.
Published: (2024)
Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
by: Liu, Baihui, et al.
Published: (2026)
Using GPT Models for Qualitative and Quantitative News Analytics in the 2024 US Presidental Election Process
by: Pavlyshenko, Bohdan M.
Published: (2024)
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts
by: Gritsch, Nikolas, et al.
Published: (2024)
MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper
by: Zeng, Runjia, et al.
Published: (2025)
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts
by: He, Shwai, et al.
Published: (2025)
Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks
by: Nakamura, Taishi, et al.
Published: (2025)
MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias
by: Wang, Guorun, et al.
Published: (2024)
MoECollab: Democratizing LLM Development Through Collaborative Mixture of Experts
by: Harshit
Published: (2025)
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization
by: Nakamura, Taishi, et al.
Published: (2025)
Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts
by: Gu, Naibin, et al.
Published: (2025)
Knowledge Localization in Mixture-of-Experts LLMs Using Cross-Lingual Inconsistency
by: Bandarkar, Lucas, et al.
Published: (2026)
Scaling Embeddings Outperforms Scaling Experts in Language Models
by: Liu, Hong, et al.
Published: (2026)
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models
by: Zhao, Zhongyu, et al.
Published: (2024)
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models
by: Pan, Bowen, et al.
Published: (2024)
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
by: Ludziejewski, Jan, et al.
Published: (2025)
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
by: Pióro, Maciej, et al.
Published: (2024)
QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
by: Li, Pingzhi, et al.
Published: (2024)
$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
by: Koike-Akino, Toshiaki, et al.
Published: (2025)
Upcycling Instruction Tuning from Dense to Mixture-of-Experts via Parameter Merging
by: Hui, Tingfeng, et al.
Published: (2024)