Bibliographic Details
Main Authors:	Yang, Yuanhang; Wang, Chaozheng; Li, Jing
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2505.07260

Similar Items

RevFFN: Memory-Efficient Full-Parameter Fine-Tuning of Mixture-of-Experts LLMs with Reversible Blocks
by: Liu, Ningyuan, et al.
Published: (2025)

Analytical Provisioning for Attention-FFN Disaggregated LLM Serving under Stochastic Workloads
by: Song, Chendong, et al.
Published: (2026)

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)

Analytical FFN-to-MoE Restructuring via Activation Pattern Analysis
by: Pei, Zehua, et al.
Published: (2025)

Sparsity Moves Computation: How FFN Architecture Reshapes Attention in Small Transformers
by: Smithline, Gabriel, et al.
Published: (2026)

Translating Expert Intuition into Quantifiable Features: Encode Investigator Domain Knowledge via LLM for Enhanced Predictive Analytics
by: Jing, Phoebe, et al.
Published: (2024)

Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity
by: Gautam, Aayush, et al.
Published: (2026)

How Far Can Disaggregation Go? A Design-Space Exploration of Attention-FFN Disaggregation for Efficient MoE LLM Serving
by: Wu, Hanjiang, et al.
Published: (2026)

Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
by: Zhao, Yanjun, et al.
Published: (2024)

Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations
by: Jaiswal, Ajay, et al.
Published: (2025)

XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection
by: Yang, Yuanhang, et al.
Published: (2024)

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts
by: Huang, Minbin, et al.
Published: (2026)

BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference
by: Wang, Yun, et al.
Published: (2025)

A Shared Low-Rank Adaptation Approach to Personalized RLHF
by: Liu, Renpu, et al.
Published: (2025)

Quaternion Self-Attention with Shared Scores
by: Yamauchi, Shogo, et al.
Published: (2026)

Optimal Expert-Attention Allocation in Mixture-of-Experts: A Scalable Law for Dynamic Model Design
by: Li, Junzhuo, et al.
Published: (2026)

IDInit: A Universal and Stable Initialization Method for Neural Network Training
by: Pan, Yu, et al.
Published: (2025)

Adaptive Shared Experts with LoRA-Based Mixture of Experts for Multi-Task Learning
by: Yang, Minghao, et al.
Published: (2025)

LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing
by: Li, Wenbing, et al.
Published: (2025)

XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference
by: Vankov, Daniil, et al.
Published: (2026)

MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
by: Jin, Peng, et al.
Published: (2024)

Attention Needs to Focus: A Unified Perspective on Attention Allocation
by: Fu, Zichuan, et al.
Published: (2026)

TT-LoRA MoE: Unifying Parameter-Efficient Fine-Tuning and Sparse Mixture-of-Experts
by: Kunwar, Pradip, et al.
Published: (2025)

UniRL-Zero: Reinforcement Learning on Unified Models with Joint Language Model and Diffusion Model Experts
by: Wang, Fu-Yun, et al.
Published: (2025)

Collaborative Multi-LoRA Experts with Achievement-based Multi-Tasks Loss for Unified Multimodal Information Extraction
by: Yuan, Li, et al.
Published: (2025)

MoE-Health: A Mixture of Experts Framework for Robust Multimodal Healthcare Prediction
by: Wang, Xiaoyang, et al.
Published: (2025)

Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
by: Athiwaratkun, Ben, et al.
Published: (2024)

MoE-I$^2$: Compressing Mixture of Experts Models through Inter-Expert Pruning and Intra-Expert Low-Rank Decomposition
by: Yang, Cheng, et al.
Published: (2024)

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
by: Chou, Yuhong, et al.
Published: (2024)

LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)

$μ$-MoE: Test-Time Pruning as Micro-Grained Mixture-of-Experts
by: Koike-Akino, Toshiaki, et al.
Published: (2025)

HiF-DTA: Hierarchical Feature Learning Network for Drug-Target Affinity Prediction
by: Li, Minghui, et al.
Published: (2025)

Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning
by: Zhao, Ziyu, et al.
Published: (2025)

SD-MoE: Spectral Decomposition for Effective Expert Specialization
by: Huang, Ruijun, et al.
Published: (2026)

HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts
by: Zhao, Hao, et al.
Published: (2024)

EAC-MoE: Expert-Selection Aware Compressor for Mixture-of-Experts Large Language Models
by: Chen, Yuanteng, et al.
Published: (2025)

Unified Class and Domain Incremental Learning with Mixture of Experts for Indoor Localization
by: Singampalli, Akhil, et al.
Published: (2025)

CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting
by: Pati, Viresh, et al.
Published: (2026)

KUET at StanceNakba Shared Task: StanceMoE: Mixture-of-Experts Architecture for Stance Detection
by: Shafi, Abdullah Al, et al.
Published: (2026)

FAME: Adaptive Functional Attention with Expert Routing for Function-on-Function Regression
by: Gao, Yifei, et al.
Published: (2025)