Library Catalog

Bibliographic Details
Main Authors:	Morrill, Todd; Puli, Aahlad; Megjhani, Murad; Park, Soojin; Zemel, Richard
Format:	Preprint
Published:	2025
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2511.09567

Similar Items

Attention and Compression is all you need for Controllably Efficient Language Models
by: Prakash, Jatin, et al.
Published: (2025)

Explanations that reveal all through the definition of encoding
by: Puli, Aahlad, et al.
Published: (2024)

Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation
by: Puli, Aahlad, et al.
Published: (2022)

Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
by: Saporta, Adriel, et al.
Published: (2024)

Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
by: Park, Sejik
Published: (2024)

Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)

Continual Traffic Forecasting via Mixture of Experts
by: Lee, Sanghyun, et al.
Published: (2024)

Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
by: Zollo, Thomas P., et al.
Published: (2023)

Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
by: Chen, Shengzhuang, et al.
Published: (2025)

Improving Routing in Sparse Mixture of Experts with Graph of Tokens
by: Nguyen, Tam, et al.
Published: (2025)

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)

How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts
by: Park, Sumin, et al.
Published: (2025)

Confidence Calibration in Vision-Language-Action Models
by: Zollo, Thomas P., et al.
Published: (2025)

Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning
by: Eo, Sugyeong, et al.
Published: (2025)

Black Box Causal Inference: Effect Estimation via Meta Prediction
by: Bynum, Lucius E. J., et al.
Published: (2025)

Prediction-powered Inference by Mixture of Experts
by: Gu, Yanwu, et al.
Published: (2026)

Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity
by: Yan, Fanqi, et al.
Published: (2026)

Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis
by: Lee, Hyeonjun, et al.
Published: (2025)

Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation
by: Zollo, Thomas, et al.
Published: (2026)

Clustering Survival Data using a Mixture of Non-parametric Experts
by: Buginga, Gabriel, et al.
Published: (2024)

SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
by: Csordás, Róbert, et al.
Published: (2023)

Self-Augmented Mixture-of-Experts for QoS Prediction
by: Cai, Kecheng, et al.
Published: (2026)

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
by: Lv, Ang, et al.
Published: (2025)

New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography
by: Zhang, Hao, et al.
Published: (2022)

Selective Sinkhorn Routing for Improved Sparse Mixture of Experts
by: Nguyen, Duc Anh, et al.
Published: (2025)

Expert Merging in Sparse Mixture of Experts with Nash Bargaining
by: Nguyen, Dung V., et al.
Published: (2025)

Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation
by: Liu, Mingrui, et al.
Published: (2024)

Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
by: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)

Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)

Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
by: Panda, Ashwinee, et al.
Published: (2025)

Neural Inhibition Improves Dynamic Routing and Mixture of Experts
by: Zou, Will Y., et al.
Published: (2025)

Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
by: Li, Zhongyang, et al.
Published: (2025)

Mixture of Raytraced Experts
by: Perin, Andrea, et al.
Published: (2025)

Mixture of Lookup Experts
by: Jie, Shibo, et al.
Published: (2025)

Multi-Modal Time Series Prediction via Mixture of Modulated Experts
by: Zhang, Lige, et al.
Published: (2026)

AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
by: Gao, Yuting, et al.
Published: (2025)

Efficiently Editing Mixture-of-Experts Models with Compressed Experts
by: He, Yifei, et al.
Published: (2025)

μ-Parametrization for Mixture of Experts
by: Małaśnicki, Jan, et al.
Published: (2025)

Path-Constrained Mixture-of-Experts
by: Gu, Zijin, et al.
Published: (2026)

Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
by: Mahtout, Btissame El, et al.
Published: (2026)