| Main Authors: | Morrill, Todd, Puli, Aahlad, Megjhani, Murad, Park, Soojin, Zemel, Richard |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2511.09567 |
Similar Items
Attention and Compression is all you need for Controllably Efficient Language Models
by: Prakash, Jatin, et al.
Published: (2025)
Explanations that reveal all through the definition of encoding
by: Puli, Aahlad, et al.
Published: (2024)
Nuisances via Negativa: Adjusting for Spurious Correlations via Data Augmentation
by: Puli, Aahlad, et al.
Published: (2022)
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
by: Saporta, Adriel, et al.
Published: (2024)
Learning More Generalized Experts by Merging Experts in Mixture-of-Experts
by: Park, Sejik
Published: (2024)
Multi-Head Mixture-of-Experts
by: Wu, Xun, et al.
Published: (2024)
Continual Traffic Forecasting via Mixture of Experts
by: Lee, Sanghyun, et al.
Published: (2024)
Prompt Risk Control: A Rigorous Framework for Responsible Deployment of Large Language Models
by: Zollo, Thomas P., et al.
Published: (2023)
Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts
by: Chen, Shengzhuang, et al.
Published: (2025)
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
by: Nguyen, Tam, et al.
Published: (2025)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts
by: Park, Sumin, et al.
Published: (2025)
Confidence Calibration in Vision-Language-Action Models
by: Zollo, Thomas P, et al.
Published: (2025)
Mixture-of-Clustered-Experts: Advancing Expert Specialization and Generalization in Instruction Tuning
by: Eo, Sugyeong, et al.
Published: (2025)
Black Box Causal Inference: Effect Estimation via Meta Prediction
by: Bynum, Lucius E. J., et al.
Published: (2025)
Prediction-powered Inference by Mixture of Experts
by: Gu, Yanwu, et al.
Published: (2026)
Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity
by: Yan, Fanqi, et al.
Published: (2026)
Dual Mixture-of-Experts Framework for Discrete-Time Survival Analysis
by: Lee, Hyeonjun, et al.
Published: (2025)
Unsupervised Confidence Calibration for Reasoning LLMs from a Single Generation
by: Zollo, Thomas, et al.
Published: (2026)
Clustering Survival Data using a Mixture of Non-parametric Experts
by: Buginga, Gabriel, et al.
Published: (2024)
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
by: Csordás, Róbert, et al.
Published: (2023)
Self-Augmented Mixture-of-Experts for QoS Prediction
by: Cai, Kecheng, et al.
Published: (2026)
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss
by: Lv, Ang, et al.
Published: (2025)
New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography
by: Zhang, Hao, et al.
Published: (2022)
Selective Sinkhorn Routing for Improved Sparse Mixture of Experts
by: Nguyen, Duc Anh, et al.
Published: (2025)
Expert Merging in Sparse Mixture of Experts with Nash Bargaining
by: Nguyen, Dung V., et al.
Published: (2025)
Facet-Aware Multi-Head Mixture-of-Experts Model for Sequential Recommendation
by: Liu, Mingrui, et al.
Published: (2024)
Modeling Expert Interactions in Sparse Mixture of Experts via Graph Structures
by: Nguyen-Nhat, Minh-Khoi, et al.
Published: (2025)
Speculating Experts Accelerates Inference for Mixture-of-Experts
by: Madan, Vivan, et al.
Published: (2026)
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
by: Panda, Ashwinee, et al.
Published: (2025)
Neural Inhibition Improves Dynamic Routing and Mixture of Experts
by: Zou, Will Y., et al.
Published: (2025)
Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
by: Li, Zhongyang, et al.
Published: (2025)
Mixture of Raytraced Experts
by: Perin, Andrea, et al.
Published: (2025)
Mixture of Lookup Experts
by: Jie, Shibo, et al.
Published: (2025)
Multi-Modal Time Series Prediction via Mixture of Modulated Experts
by: Zhang, Lige, et al.
Published: (2026)
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
by: Gao, Yuting, et al.
Published: (2025)
Efficiently Editing Mixture-of-Experts Models with Compressed Experts
by: He, Yifei, et al.
Published: (2025)
$μ$-Parametrization for Mixture of Experts
by: Małaśnicki, Jan, et al.
Published: (2025)
Path-Constrained Mixture-of-Experts
by: Gu, Zijin, et al.
Published: (2026)
Fast Training of Mixture-of-Experts for Time Series Forecasting via Expert Loss Integration
by: Mahtout, Btissame El, et al.
Published: (2026)