| Main Authors: | Song, Guanghui, Liao, Dongping, Zhao, Yiren, Ye, Kejiang, Xu, Cheng-zhong, Gao, Xitong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.13541 |
Similar Items
BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection
by: Xiang, An, et al.
Published: (2025)
Optimised Grouped-Query Attention Mechanism for Transformers
by: Chen, Yuang, et al.
Published: (2024)
FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning
by: Liao, Dongping, et al.
Published: (2025)
Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)
HMoE: Heterogeneous Mixture of Experts for Language Modeling
by: Wang, An, et al.
Published: (2024)
Offline Map Matching Based on Localization Error Distribution Modeling
by: Xu, Ruilin, et al.
Published: (2025)
Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)
Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models
by: Liu, Xinxin, et al.
Published: (2024)
Mixture of Heterogeneous Grouped Experts for Language Modeling
by: Ma, Zhicheng, et al.
Published: (2026)
PiKV: KV Cache Management System for Mixture of Experts
by: Liu, Dong, et al.
Published: (2025)
Towards a Comprehensive Scaling Law of Mixture-of-Experts
by: Zhao, Guoliang, et al.
Published: (2025)
BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving
by: Zheng, Wanyi, et al.
Published: (2025)
TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks
by: Shen, Hanzhang, et al.
Published: (2026)
Unlocking the Global Synergies in Low-Rank Adapters
by: Zhang, Zixi, et al.
Published: (2024)
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)
TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance
by: Yang, Pei, et al.
Published: (2025)
LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)
HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)
Topology Controls the Phase Separation Dynamics of Multicomponent Fluid Mixtures
by: Rennick, Michael, et al.
Published: (2025)
Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)
Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts
by: Liao, Fangshuo, et al.
Published: (2025)
A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction
by: Wang, Guangyu, et al.
Published: (2024)
BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2025)
BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: Yiyuan He, et al.
Published: (2026)
GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression
by: Li, Daxin, et al.
Published: (2024)
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
by: Go, Seokjin, et al.
Published: (2025)
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
by: Zhao, Xin, et al.
Published: (2025)
Generalizing GNNs with Tokenized Mixture of Experts
by: Guo, Xiaoguang, et al.
Published: (2026)
Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference
by: Zhao, Yiren, et al.
Published: (2026)
SealOS+: A Sealos-based Approach for Adaptive Resource Optimization Under Dynamic Workloads for Securities Trading System
by: Jia, Haojie, et al.
Published: (2025)
Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
by: Zeng, Zihao, et al.
Published: (2024)
Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model
by: Li, Chong, et al.
Published: (2025)
Optimizing Mixture of Block Attention
by: Xiao, Guangxuan, et al.
Published: (2025)
DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving
by: Liao, Junhan, et al.
Published: (2025)
OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
by: Gao, Yuting, et al.
Published: (2025)
Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
by: Zhang, Xue, et al.
Published: (2025)
KV Shifting Attention Enhances Language Modeling
by: Xu, Mingyu, et al.
Published: (2024)
GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
by: Wang, Jiangshan, et al.
Published: (2024)
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)