| Main Authors: | An, Hongjun, Hu, Wenhan, Huang, Sida, Huang, Siqi, Li, Ruanjun, Liang, Yuanzhi, Shao, Jiawei, Song, Yiliang, Wang, Zihan, Yuan, Cheng, Zhang, Chi, Zhang, Hongyuan, Zhuang, Wenhao, Li, Xuelong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.12479 |
Similar Items
Reaching Agreement Among Reasoning LLM Agents
by: Ruan, Chaoyi, et al.
Published: (2025)
Computation-Bandwidth-Memory Trade-offs: A Unified Paradigm for AI Infrastructure
by: Fan, Yuankai, et al.
Published: (2025)
Building State Machine Replication Using Practical Network Synchrony
by: Wan, Yiliang, et al.
Published: (2025)
A Survey of Computation Offloading with Task Types
by: Zhang, Siqi, et al.
Published: (2023)
DHLink: A Microservice Platform supporting Rapid Application Development and Secure Real-time Data Sharing in Digital Health
by: Li, Wenhao, et al.
Published: (2021)
Distributed Consensus Network: A Modularized Communication Framework and Reliability Probabilistic Analysis
by: Li, Yuetai, et al.
Published: (2025)
ParamSpMM: Adaptive and Efficient Sparse Matrix-Matrix Multiplication on GPUs for GNNs
by: Zhang, Lixing, et al.
Published: (2026)
GMLake: Efficient and Transparent GPU Memory Defragmentation for Large-scale DNN Training with Virtual Memory Stitching
by: Guo, Cong, et al.
Published: (2024)
Federated Neural Radiance Field for Distributed Intelligence
by: Zhang, Yintian, et al.
Published: (2024)
Taming the Chaos: Coordinated Autoscaling for Heterogeneous and Disaggregated LLM Inference
by: Li, Rongzhi, et al.
Published: (2025)
MixServe: An Automatic Distributed Serving System for MoE Models with Hybrid Parallelism Based on Fused Communication Algorithm
by: Zhou, Bowen, et al.
Published: (2026)
MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)
Efficient Long Context Fine-tuning with Chunk Flow
by: Yuan, Xiulong, et al.
Published: (2025)
EACO-RAG: Towards Distributed Tiered LLM Deployment using Edge-Assisted and Collaborative RAG with Adaptive Knowledge Update
by: Li, Jiaxing, et al.
Published: (2024)
WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)
SLO-Aware Scheduling for Large Language Model Inferences
by: Huang, Jinqi, et al.
Published: (2025)
NineToothed: A Triton-Based High-Level Domain-Specific Language for Machine Learning
by: Huang, Jiacheng, et al.
Published: (2025)
FlexFL: Heterogeneous Federated Learning via APoZ-Guided Flexible Pruning in Uncertain Scenarios
by: Chen, Zekai, et al.
Published: (2024)
FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection
by: Huang, Ziyu, et al.
Published: (2025)
The Power of Abstract MAC Layer: A Fault-tolerance Perspective
by: Zhang, Qinzi, et al.
Published: (2024)
WindGP: Efficient Graph Partitioning on Heterogenous Machines
by: Zeng, Li, et al.
Published: (2024)
On Fault Tolerance of Data Storage Systems: A Holistic Perspective
by: Zheng, Mai, et al.
Published: (2025)
exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)
Split Fine-Tuning for Large Language Models in Wireless Networks
by: Zhang, Songge, et al.
Published: (2025)
A New Perspective of Graph Data and A Generic and Efficient Method for Large Scale Graph Data Traversal
by: Zhang, Chenglong
Published: (2020)
Federated Inference for Heterogeneous LLM Communication and Collaboration
by: Chen, Zihan, et al.
Published: (2026)
Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)
LuWu: An End-to-End In-Network Out-of-Core Optimizer for 100B-Scale Model-in-Network Data-Parallel Training on Distributed GPUs
by: Sun, Mo, et al.
Published: (2024)
IPComp: Interpolation Based Progressive Lossy Compression for Scientific Applications
by: Yang, Zhuoxun, et al.
Published: (2025)
GPZ: GPU-Accelerated Lossy Compressor for Particle Data
by: Li, Ruoyu, et al.
Published: (2025)
BFLN: A Blockchain-based Federated Learning Model for Non-IID Data
by: Li, Yang, et al.
Published: (2024)
Squeezing Edge Performance: A Sensitivity-Aware Container Management for Heterogeneous Tasks
by: Zhang, Yongmin, et al.
Published: (2025)
FedQuad: Adaptive Layer-wise LoRA Deployment and Activation Quantization for Federated Fine-Tuning
by: Li, Rukuo, et al.
Published: (2025)
DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization
by: Tang, Zhenheng, et al.
Published: (2025)
Seer: Proactive Revenue-Aware Scheduling for Live Streaming Services in Crowdsourced Cloud-Edge Platforms
by: Huang, Shaoyuan, et al.
Published: (2024)
Efficient CPU-GPU Collaborative Inference for MoE-based LLMs on Memory-Limited Systems
by: Huang, En-Ming, et al.
Published: (2025)
FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
by: Huang, Zirui, et al.
Published: (2026)
AGoQ: Activation and Gradient Quantization for Memory-Efficient Distributed Training of LLMs
by: Lin, Wenxiang, et al.
Published: (2026)
A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR
by: Zhang, Yichao, et al.
Published: (2024)
Integrated Sensing, Communication, and Computing: An Information-oriented Resource Transaction Mechanism
by: Chen, Ning, et al.
Published: (2024)