:: Library Catalog

Bibliographic Details
Main Authors:	Song, Guanghui, Liao, Dongping, Zhao, Yiren, Ye, Kejiang, Xu, Cheng-zhong, Gao, Xitong
Format:	Preprint
Published:	2025
Subjects:	Computation and Language, Machine Learning
Online Access:	https://arxiv.org/abs/2506.13541

Similar Items

BridgeNet: A Unified Multimodal Framework for Bridging 2D and 3D Industrial Anomaly Detection
by: Xiang, An, et al.
Published: (2025)

Optimised Grouped-Query Attention Mechanism for Transformers
by: Chen, Yuang, et al.
Published: (2024)

FLIP: Towards Comprehensive and Reliable Evaluation of Federated Prompt Learning
by: Liao, Dongping, et al.
Published: (2025)

Unlock the Potential of Fine-grained LLM Serving via Dynamic Module Scaling
by: Wu, Jingfeng, et al.
Published: (2025)

HMoE: Heterogeneous Mixture of Experts for Language Modeling
by: Wang, An, et al.
Published: (2024)

Offline Map Matching Based on Localization Error Distribution Modeling
by: Xu, Ruilin, et al.
Published: (2025)

Dynamic Adaptive Shared Experts with Grouped Multi-Head Attention Mixture of Experts
by: Li, Cheng, et al.
Published: (2025)

Refining Salience-Aware Sparse Fine-Tuning Strategies for Language Models
by: Liu, Xinxin, et al.
Published: (2024)

Mixture of Heterogeneous Grouped Experts for Language Modeling
by: Ma, Zhicheng, et al.
Published: (2026)

PiKV: KV Cache Management System for Mixture of Experts
by: Liu, Dong, et al.
Published: (2025)

Towards a Comprehensive Scaling Law of Mixture-of-Experts
by: Zhao, Guoliang, et al.
Published: (2025)

BucketServe: Bucket-Based Dynamic Batching for Smart and Efficient LLM Inference Serving
by: Zheng, Wanyi, et al.
Published: (2025)

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks
by: Shen, Hanzhang, et al.
Published: (2026)

Unlocking the Global Synergies in Low-Rank Adapters
by: Zhang, Zixi, et al.
Published: (2024)

LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management
by: Xiong, Yi, et al.
Published: (2024)

TokenPure: Watermark Removal through Tokenized Appearance and Structural Guidance
by: Yang, Pei, et al.
Published: (2025)

LAVa: Layer-wise KV Cache Eviction with Dynamic Budget Allocation
by: Shen, Yiqun, et al.
Published: (2025)

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing
by: Gao, Yizhao, et al.
Published: (2026)

Topology Controls the Phase Separation Dynamics of Multicomponent Fluid Mixtures
by: Rennick, Michael, et al.
Published: (2025)

Scaling Laws For Mixed Quantization
by: Cao, Zeyu, et al.
Published: (2024)

Guided by the Experts: Provable Feature Learning Dynamic of Soft-Routed Mixture-of-Experts
by: Liao, Fangshuo, et al.
Published: (2025)

A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction
by: Wang, Guangyu, et al.
Published: (2024)

BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2025)

BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure
by: He, Yiyuan, et al.
Published: (2026)

GroupedMixer: An Entropy Model with Group-wise Token-Mixers for Learned Image Compression
by: Li, Daxin, et al.
Published: (2024)

MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing
by: Go, Seokjin, et al.
Published: (2025)

Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
by: Zhao, Xin, et al.
Published: (2025)

Generalizing GNNs with Tokenized Mixture of Experts
by: Guo, Xiaoguang, et al.
Published: (2026)

Heterogeneous Computing: The Key to Powering the Future of AI Agent Inference
by: Zhao, Yiren, et al.
Published: (2026)

SealOS+: A Sealos-based Approach for Adaptive Resource Optimization Under Dynamic Workloads for Securities Trading System
by: Jia, Haojie, et al.
Published: (2025)

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts
by: Zhang, Zeliang, et al.
Published: (2024)

AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
by: Zeng, Zihao, et al.
Published: (2024)

Group then Scale: Dynamic Mixture-of-Experts Multilingual Language Model
by: Li, Chong, et al.
Published: (2025)

Optimizing Mixture of Block Attention
by: Xiao, Guangxuan, et al.
Published: (2025)

DOPD: A Dynamic PD-Disaggregation Architecture for Maximizing Goodput in LLM Inference Serving
by: Liao, Junhan, et al.
Published: (2025)

OrdMoE: Preference Alignment via Hierarchical Expert Group Ranking in Multimodal Mixture-of-Experts LLMs
by: Gao, Yuting, et al.
Published: (2025)

Less, but Better: Efficient Multilingual Expansion for LLMs via Layer-wise Mixture-of-Experts
by: Zhang, Xue, et al.
Published: (2025)

KV Shifting Attention Enhances Language Modeling
by: Xu, Mingyu, et al.
Published: (2024)

GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
by: Wang, Jiangshan, et al.
Published: (2024)

Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
by: Pu, Yifan, et al.
Published: (2024)