:: Library Catalog

Bibliographic Details
Main Authors:	An, Hongjun; Hu, Wenhan; Huang, Sida; Huang, Siqi; Li, Ruanjun; Liang, Yuanzhi; Shao, Jiawei; Song, Yiliang; Wang, Zihan; Yuan, Cheng; Zhang, Chi; Zhang, Hongyuan; Zhuang, Wenhao; Li, Xuelong
Format:	Preprint
Published:	2025
Subjects:	Artificial Intelligence; Computation and Language; Computer Vision and Pattern Recognition; Distributed, Parallel, and Cluster Computing; Signal Processing
Online Access:	https://arxiv.org/abs/2506.12479

Similar Items

Reaching Agreement Among Reasoning LLM Agents
by: Ruan, Chaoyi, et al.
Published: (2025)

Computation-Bandwidth-Memory Trade-offs: A Unified Paradigm for AI Infrastructure
by: Fan, Yuankai, et al.
Published: (2025)

Building State Machine Replication Using Practical Network Synchrony
by: Wan, Yiliang, et al.
Published: (2025)

A Survey of Computation Offloading with Task Types
by: Zhang, Siqi, et al.
Published: (2023)

DHLink: A Microservice Platform supporting Rapid Application Development and Secure Real-time Data Sharing in Digital Health
by: Li, Wenhao, et al.
Published: (2021)

Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis
by: Li, Yuetai, et al.
Published: (2025)

ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs
by: Zhang, Lixing, et al.
Published: (2026)

GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)

Federated Neural Radiance Field for Distributed Intelligence
by: Zhang, Yintian, et al.
Published: (2024)

Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
by: Li, Rongzhi, et al.
Published: (2025)

MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm
by: Zhou, Bowen, et al.
Published: (2026)

MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)

Efficient Long Context Fine-tuning with Chunk Flow
by: Yuan, Xiulong, et al.
Published: (2025)

EACO-RAG: Towards Distributed Tiered LLM Deployment using Edge-Assisted and Collaborative RAG with Adaptive Knowledge Update
by: Li, Jiaxing, et al.
Published: (2024)

WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)

SLO-Aware Scheduling for Large Language Model Inferences
by: Huang, Jinqi, et al.
Published: (2025)

NineToothed: A Triton-Based High-Level Domain-Specific Language for Machine Learning
by: Huang, Jiacheng, et al.
Published: (2025)

FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios
by: Chen, Zekai, et al.
Published: (2024)

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
by: Huang, Ziyu, et al.
Published: (2025)

The Power of Abstract MAC Layer: A Fault-tolerance Perspective
by: Zhang, Qinzi, et al.
Published: (2024)

WindGP: Efficient Graph Partitioning on Heterogenous Machines
by: Zeng, Li, et al.
Published: (2024)

On Fault Tolerance of Data Storage Systems: A Holistic Perspective
by: Zheng, Mai, et al.
Published: (2025)

exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)

Split Fine-Tuning for Large Language Models in Wireless Networks
by: Zhang, Songge, et al.
Published: (2025)

A New Perspective of Graph Data and A Generic and Efficient Method for Large Scale Graph Data Traversal
by: Zhang, Chenglong
Published: (2020)

Federated Inference for Heterogeneous LLM Communication and Collaboration
by: Chen, Zihan, et al.
Published: (2026)

Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)

LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)

IPComp: Interpolation Based Progressive Lossy Compression for Scientific Applications
by: Yang, Zhuoxun, et al.
Published: (2025)

GPZ: GPU-Accelerated Lossy Compressor for Particle Data
by: Li, Ruoyu, et al.
Published: (2025)

BFLN: A Blockchain-based Federated Learning Model for Non-IID Data
by: Li, Yang, et al.
Published: (2024)

Squeezing Edge Performance: A Sensitivity-Aware Container Management for Heterogeneous Tasks
by: Zhang, Yongmin, et al.
Published: (2025)

FedQuad: Adaptive Layer-wise LoRA Deployment and Activation Quantization for Federated Fine-Tuning
by: Li, Rukuo, et al.
Published: (2025)

DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
by: Tang, Zhenheng, et al.
Published: (2025)

Seer: Proactive Revenue-Aware Scheduling for Live Streaming Services in Crowdsourced Cloud-Edge Platforms
by: Huang, Shaoyuan, et al.
Published: (2024)

Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
by: Huang, En-Ming, et al.
Published: (2025)

FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
by: Huang, Zirui, et al.
Published: (2026)

AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
by: Lin, Wenxiang, et al.
Published: (2026)

A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR
by: Zhang, Yichao, et al.
Published: (2024)

Integrated Sensing, Communication, and Computing: An Information-oriented Resource Transaction Mechanism
by: Chen, Ning, et al.
Published: (2024)