| Main Authors: | Zhu, Jun, Xu, Yin, He, Dazhi, Li, Haoyang, Guan, Yunfeng, Zhang, Wenjun, Ma, Tianyao, Yuan, Haozhi |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2503.10525 |
Similar Items
Efficient Parallel Implementation of the Pilot Assignment Problem in Massive MIMO Systems
by: Alqudah, Eman, et al.
Published: (2025)
HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration
by: Chen, Weijian, et al.
Published: (2024)
Distributed system perspective on Backscatter systems
by: Guan, Jincheng, et al.
Published: (2025)
Efficient Architecture for RISC-V Vector Memory Access
by: Guan, Hongyi, et al.
Published: (2025)
Unfolding an Atomistic World: Atomistic Simulation of Reactor Pressure Vessel Steel Across Year-and-Meter Scales
by: Han, Haozhi, et al.
Published: (2026)
MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)
FedCod: An Efficient Communication Protocol for Cross-Silo Federated Learning with Coding
by: Yan, Peishen, et al.
Published: (2024)
GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions
by: Shi, Tianyao, et al.
Published: (2024)
AME: An Efficient Heterogeneous Agentic Memory Engine for Smartphones
by: Zhao, Xinkui, et al.
Published: (2025)
Heimdall++: Optimizing GPU Utilization and Pipeline Parallelism for Efficient Single-Pulse Detection
by: Xia, Bingzheng, et al.
Published: (2025)
Cascadia: An Efficient Cascade Serving System for Large Language Models
by: Jiang, Youhe, et al.
Published: (2025)
XMiner: Efficient Directed Subgraph Matching with Pattern Reduction
by: Yuan, Pingpeng, et al.
Published: (2024)
AgentServe: Algorithm-System Co-Design for Efficient Agentic AI Serving on a Consumer-Grade GPU
by: Zhang, Yuning, et al.
Published: (2026)
Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination
by: Li, Haoyang, et al.
Published: (2026)
LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference
by: Du, Yin, et al.
Published: (2026)
Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
by: He, Guoliang, et al.
Published: (2025)
Hetu v2: A General and Scalable Deep Learning System with Hierarchical and Heterogeneous Single Program Multiple Data Annotations
by: Li, Haoyang, et al.
Published: (2025)
6G Twin: Hybrid Gaussian Radio Fields for Channel Estimation and Non-Linear Precoder Design for Radio Access Networks
by: Mohsin, Muhammad Ahmed, et al.
Published: (2025)
An Efficient, Reliable and Observable Collective Communication Library in Large-scale GPU Training Clusters
by: Zhang, Mingjun, et al.
Published: (2025)
HexAGenT: Efficient Agentic LLM Serving via Workflow- and Heterogeneity-Aware Scheduling
by: Peng, You, et al.
Published: (2026)
UELLM: A Unified and Efficient Approach for LLM Inference Serving
by: He, Yiyuan, et al.
Published: (2024)
MegatronApp: Efficient and Comprehensive Management on Distributed LLM Training
by: Zhao, Bohan, et al.
Published: (2025)
Hecate: Unlocking Efficient Sparse Model Training via Fully Sharded Sparse Data Parallelism
by: Qing, Yuhao, et al.
Published: (2025)
CALVO: Improve Serving Efficiency for LLM Inferences with Intense Network Demands
by: Wang, Weiye, et al.
Published: (2026)
Efficient Long Context Fine-tuning with Chunk Flow
by: Yuan, Xiulong, et al.
Published: (2025)
D-CAST: Distributed Consensus Switch in Wireless Trustworthy Autonomous System
by: Yu, Dachao, et al.
Published: (2024)
Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda
by: Xu, Minxian, et al.
Published: (2026)
Cloud Native System for LLM Inference Serving
by: Xu, Minxian, et al.
Published: (2025)
Secure Communication in the Presence of an RIS-Enhanced Eavesdropper in MIMO Networks
by: Zhang, Gaoyuan, et al.
Published: (2025)
Multi-Path Bound for DAG Tasks
by: He, Qingqiang, et al.
Published: (2023)
Multi-Modal Style Transfer-based Prompt Tuning for Efficient Federated Domain Generalization
by: Chen, Yuliang, et al.
Published: (2026)
Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling
by: Ahmad, Sohaib, et al.
Published: (2024)
DistFlow: A Fully Distributed RL Framework for Scalable and Efficient LLM Post-Training
by: Wang, Zhixin, et al.
Published: (2025)
Collaborative Inference in DNN-based Satellite Systems with Dynamic Task Streams
by: Guan, Jinglong, et al.
Published: (2023)
GRNND: A GPU-Parallel Relative NN-Descent Algorithm for Efficient Approximate Nearest Neighbor Graph Construction
by: Li, Xiang, et al.
Published: (2025)
Hyperion: Low-Latency Ultra-HD Video Analytics via Collaborative Vision Transformer Inference
by: Jiang, Linyi, et al.
Published: (2025)
An All-Reduce Compatible Top-K Compressor for Communication-Efficient Distributed Learning
by: Chen, Chuyan, et al.
Published: (2025)
BLOCKS: Blockchain-supported Cross-Silo Knowledge Sharing for Efficient LLM Services
by: Zhou, Zhaojiacheng, et al.
Published: (2025)
PUSHtap: PIM-based In-Memory HTAP with Unified Data Storage Format
by: Zhao, Yilong, et al.
Published: (2025)
FourierCompress: Layer-Aware Spectral Activation Compression for Efficient and Accurate Collaborative LLM Inference
by: Ma, Jian, et al.
Published: (2025)