| Main Authors: | Su, Bor-Yiing, Dykas, Peter, Chrzanowski, Mike, Chhugani, Jatin |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2512.22804 |
Similar Items
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning
by: Tang, Chuanyu, et al.
Published: (2024)
Unveiling the Potential of Quantization with MXFP4: Strategies for Quantization Error Reduction
by: Chhugani, Jatin, et al.
Published: (2026)
Methods of improving LLM training stability
by: Rybakov, Oleg, et al.
Published: (2024)
MixGCN: Scalable GCN Training by Mixture of Parallelism and Mixture of Accelerators
by: Wan, Cheng, et al.
Published: (2025)
MoR: Better Handling Diverse Queries with a Mixture of Sparse, Dense, and Human Retrievers
by: Kalra, Jushaan Singh, et al.
Published: (2025)
A Metric Driven Approach to Mixed Precision Training
by: Rasquinha, Mitchelle, et al.
Published: (2024)
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts
by: Su, Yang, et al.
Published: (2025)
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
by: Lin, Xi Victoria, et al.
Published: (2024)
MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning
by: Zhang, Yutong, et al.
Published: (2026)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models
by: Guan, Ziyi, et al.
Published: (2024)
MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization
by: Guo, Jingming, et al.
Published: (2024)
MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models
by: Ye, Xin, et al.
Published: (2026)
MoBiQuant: Mixture-of-Bits Quantization for Token-Adaptive Any-Precision LLM
by: Wang, Dongwei, et al.
Published: (2026)
MoST: Mixing Speech and Text with Modality-Aware Mixture of Experts
by: Lou, Yuxuan, et al.
Published: (2026)
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
by: Zhao, Zhixiong, et al.
Published: (2026)
MergeMix: Optimizing Mid-Training Data Mixtures via Learnable Model Merging
by: Wang, Jiapeng, et al.
Published: (2026)
Grouter: Decoupling Routing from Representation for Accelerated MoE Training
by: Xu, Yuqi, et al.
Published: (2026)
What Can You Do When You Have Zero Rewards During RL?
by: Prakash, Jatin, et al.
Published: (2025)
Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training
by: Chaudhary, Jatin, et al.
Published: (2024)
L-MoE: End-to-End Training of a Lightweight Mixture of Low-Rank Adaptation Experts
by: Ji, Shihao, et al.
Published: (2025)
Scale When Needed: Adaptive Neuron-level Mixed Precision Quantization Aware Training
by: Varshney, Ayush K., et al.
Published: (2026)
QuantMoE-Bench: Examining Post-Training Quantization for Mixture-of-Experts
by: Li, Pingzhi, et al.
Published: (2024)
Training Time Prediction for Mixed Precision-based Distributed Training
by: Kang, Minchul, et al.
Published: (2026)
Pathway-based Progressive Inference (PaPI) for Energy-Efficient Continual Learning
by: Gaurav, Suyash, et al.
Published: (2025)
MoKA: Mixture of Kronecker Adapters
by: Sadeghi, Mohammadreza, et al.
Published: (2025)
Mixture of Experts (MoE): A Big Data Perspective
by: Gan, Wensheng, et al.
Published: (2025)
SDG-MoE: Signed Debate Graph Mixture-of-Experts
by: Kulibaba, Stepan, et al.
Published: (2026)
ProbMoE: Differentiable Probabilistic Routing for Mixture-of-Experts
by: Zhao, Heng, et al.
Published: (2026)
STaMP: Sequence Transformation and Mixed Precision for Low-Precision Activation Quantization
by: Federici, Marco, et al.
Published: (2025)
Mixed-Precision Federated Learning via Multi-Precision Over-The-Air Aggregation
by: Yuan, Jinsheng, et al.
Published: (2024)
MoWE: A Mixture of Weather Experts
by: Chakraborty, Dibyajyoti, et al.
Published: (2025)
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
by: Jin, Peng, et al.
Published: (2024)
From Molecules to Mixtures: Learning Representations of Olfactory Mixture Similarity using Inductive Biases
by: Tom, Gary, et al.
Published: (2025)
Score-of-Mixture Training: Training One-Step Generative Models Made Simple via Score Estimation of Mixture Distributions
by: Jayashankar, Tejas, et al.
Published: (2025)
MoBA: Mixture of Block Attention for Long-Context LLMs
by: Lu, Enzhe, et al.
Published: (2025)
MoIN: Mixture of Introvert Experts to Upcycle an LLM
by: Tejankar, Ajinkya, et al.
Published: (2024)
MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models
by: Liu, Wenyuan, et al.
Published: (2025)
Mixed-Precision Quantization for Language Models: Techniques and Prospects
by: Rakka, Mariam, et al.
Published: (2025)
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
by: Duanmu, Haojie, et al.
Published: (2025)
MixtureKit: A General Framework for Composing, Training, and Visualizing Mixture-of-Experts Models
by: Chamma, Ahmad, et al.
Published: (2025)