| Main Authors: | Yang, Yuanhang, Wang, Chaozheng, Li, Jing |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.07260 |
Similar Items
RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks
by: Liu, Ningyuan, et al.
Published: (2025)
Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads
by: Song, Chendong, et al.
Published: (2026)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
by: Pei, Zehua, et al.
Published: (2025)
Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers
by: Smithline, Gabriel, et al.
Published: (2026)
Translating Expert Intuition into Quantifiable Features: Encode Investigator Domain Knowledge via LLM for Enhanced Predictive Analytics
by: Jing, Phoebe, et al.
Published: (2024)
Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity
by: Gautam, Aayush, et al.
Published: (2026)
How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving
by: Wu, Hanjiang, et al.
Published: (2026)
Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
by: Zhao, Yanjun, et al.
Published: (2024)
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations
by: Jaiswal, Ajay, et al.
Published: (2025)
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
by: Yang, Yuanhang, et al.
Published: (2024)
UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
by: Huang, Minbin, et al.
Published: (2026)
BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference
by: Wang, Yun, et al.
Published: (2025)
A Shared Low-Rank Adaptation Approach to Personalized RLHF
by: Liu, Renpu, et al.
Published: (2025)
Quaternion Self-Attention with Shared Scores
by: Yamauchi, Shogo, et al.
Published: (2026)
Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design
by: Li, Junzhuo, et al.
Published: (2026)
IDInit: A Universal and Stable Initialization Method for Neural Network Training
by: Pan, Yu, et al.
Published: (2025)
Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
by: Yang, Minghao, et al.
Published: (2025)
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
by: Li, Wenbing, et al.
Published: (2025)
XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference
by: Vankov, Daniil, et al.
Published: (2026)
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
by: Jin, Peng, et al.
Published: (2024)
Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)
TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
by: Kunwar, Pradip, et al.
Published: (2025)
UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts
by: Wang, Fu-Yun, et al.
Published: (2025)
Collaborative Multi-LoRA Experts with Achievement-based Multi-Tasks Loss for Unified Multimodal Information Extraction
by: Yuan, Li, et al.
Published: (2025)
MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction
by: Wang, Xiaoyang, et al.
Published: (2025)
Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
by: Athiwaratkun, Ben, et al.
Published: (2024)
MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
by: Yang, Cheng, et al.
Published: (2024)
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
by: Chou, Yuhong, et al.
Published: (2024)
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)
$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
by: Koike-Akino, Toshiaki, et al.
Published: (2025)
HiF-DTA: Hierarchical Feature Learning Network for Drug-Target Affinity Prediction
by: Li, Minghui, et al.
Published: (2025)
Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
by: Zhao, Ziyu, et al.
Published: (2025)
SD-MoE: Spectral Decomposition for Effective Expert Specialization
by: Huang, Ruijun, et al.
Published: (2026)
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
by: Zhao, Hao, et al.
Published: (2024)
EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
by: Chen, Yuanteng, et al.
Published: (2025)
Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization
by: Singampalli, Akhil, et al.
Published: (2025)
CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)
KUET at StanceNakba Shared Task: StanceMoE: Mixture-of-Experts Architecture for Stance Detection
by: Shafi, Abdullah Al, et al.
Published: (2026)
FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression
by: Gao, Yifei, et al.
Published: (2025)