| Main Authors: | Chen, Hao Mark, Mo, Zhiwen, Lee, Royson, Wang, Qianzhou, Li, Da, Hu, Shell Xu, Luk, Wayne, Hospedales, Timothy, Fan, Hongxiang |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.00879 |
Similar Items
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
by: Chen, Hao Mark, et al.
Published: (2025)
Model Diffusion for Certifiable Few-shot Transfer Learning
by: Rezk, Fady, et al.
Published: (2025)
Feed-Forward Latent Domain Adaptation
by: Bohdal, Ondrej, et al.
Published: (2022)
MobileQuant: Mobile-friendly Quantization for On-device Language Models
by: Tan, Fuwen, et al.
Published: (2024)
FastTTS: Accelerating Test-Time Scaling for Edge LLM Reasoning
by: Chen, Hao Mark, et al.
Published: (2025)
Recurrent Early Exits for Federated Learning with Heterogeneous Clients
by: Lee, Royson, et al.
Published: (2024)
A Bayesian Approach to Data Point Selection
by: Xu, Xinnuo, et al.
Published: (2024)
FedP$^2$EFT: Federated Learning to Personalize PEFT for Multilingual LLMs
by: Lee, Royson, et al.
Published: (2025)
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)
Enhancing LLM-based Quantum Code Generation with Multi-Agent Optimization and Quantum Error Correction
by: Campbell, Charlie, et al.
Published: (2025)
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators
by: Mo, Zhiwen, et al.
Published: (2026)
Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges
by: Lu, Guanxi, et al.
Published: (2025)
Hardware-Aware Neural Dropout Search for Reliable Uncertainty Prediction on FPGA
by: Zhang, Zehuan, et al.
Published: (2024)
Dynamic Experts Search: Enhancing Reasoning in Mixture-of-Experts LLMs at Test Time
by: Han, Yixuan, et al.
Published: (2025)
Progressive Mixed-Precision Decoding for Efficient LLM Inference
by: Chen, Hao Mark, et al.
Published: (2024)
HD-MoE: Hybrid and Dynamic Parallelism for Mixture-of-Expert LLMs with 3D Near-Memory Processing
by: Huang, Haochen, et al.
Published: (2025)
CLUES: Collaborative High-Quality Data Selection for LLMs via Training Dynamics
by: Zhao, Wanru, et al.
Published: (2025)
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
by: Cai, Weilin, et al.
Published: (2024)
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
by: Chavhan, Ruchika, et al.
Published: (2024)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
Accelerating 3D Gaussian Splatting with Neural Sorting and Axis-Oriented Rasterization
by: Wang, Zhican, et al.
Published: (2025)
Towards Building Private LLMs: Exploring Multi-Node Expert Parallelism on Apple Silicon for Mixture-of-Experts Large Language Model
by: Chen, Mu-Chi, et al.
Published: (2025)
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts
by: Nguyen, Xuan-Phi, et al.
Published: (2026)
TradExpert: Revolutionizing Trading with Mixture of Expert LLMs
by: Ding, Qianggang, et al.
Published: (2024)
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
by: Huang, Minbin, et al.
Published: (2026)
FLEx: Personalized Federated Learning for Mixture-of-Experts LLMs via Expert Grafting
by: Liu, Fan, et al.
Published: (2025)
Accelerating MRI Uncertainty Estimation with Mask-based Bayesian Neural Network
by: Zhang, Zehuan, et al.
Published: (2024)
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism
by: Zhu, Ruidong, et al.
Published: (2025)
Enhancing Dropout-based Bayesian Neural Networks with Multi-Exit on FPGA
by: Chen, Hao Mark, et al.
Published: (2024)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
by: Cai, Ruisi, et al.
Published: (2024)
SYMI: Efficient Mixture-of-Experts Training via Model and Optimizer State Decoupling
by: Skiadopoulos, Athinagoras, et al.
Published: (2025)
HAP: Hybrid Adaptive Parallelism for Efficient Mixture-of-Experts Inference
by: Lin, Haoran, et al.
Published: (2025)
Dropping Experts, Recombining Neurons: Retraining-Free Pruning for Sparse Mixture-of-Experts LLMs
by: Zhou, Yixiao, et al.
Published: (2025)
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)
Dynamic Expert Quantization for Scalable Mixture-of-Experts Inference
by: Chu, Kexin, et al.
Published: (2025)
From Misclassifications to Outliers: Joint Reliability Assessment in Classification
by: Li, Yang, et al.
Published: (2026)
Mixture of Experts for Low-Resource LLMs
by: Joseph, Ori Bar, et al.
Published: (2026)
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
by: Chen, Hao Mark, et al.
Published: (2025)
Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs
by: Bai, Jun, et al.
Published: (2025)