| Main Authors: | Fu, Rong, Su, Zhongling, Zhong, Han-Sen, Zhao, Xiti, Zhang, Jianyang, Pan, Feng, Zhang, Pan, Zhao, Xianhe, Chen, Ming-Cheng, Lu, Chao-Yang, Pan, Jian-Wei, Pei, Zhiling, Zhang, Xingcheng, Ouyang, Wanli |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.00769 |
Similar Items
Efficient Quantum Circuit Simulation by Tensor Network Methods on Modern GPUs
by: Pan, Feng, et al.
Published: (2023)
More for Less: Integrating Capability-Predominant and Capacity-Predominant Computing
by: Zheng, Zhong, et al.
Published: (2025)
Towards Energy Efficient Co-Scheduling in HPC
by: Zheng, Zhong, et al.
Published: (2026)
EcoShift: Performance-Aware Power Management for Power-Constrained Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2026)
NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding
by: Chen, Jiefei, et al.
Published: (2026)
H2: Towards Efficient Large-Scale LLM Training on Hyper-Heterogeneous Cluster over 1,000 Chips
by: Tang, Ding, et al.
Published: (2025)
ZeroPP: Unleashing Exceptional Parallelism Efficiency through Tensor-Parallelism-Free Methodology
by: Tang, Ding, et al.
Published: (2024)
Exploring Uncore Frequency Scaling for Heterogeneous Computing
by: Zheng, Zhong, et al.
Published: (2025)
Union: An Automatic Workload Manager for Accelerating Network Simulation
by: Wang, Xin, et al.
Published: (2024)
High-performance Vector-length Agnostic Quantum Circuit Simulations on ARM Processors
by: Shi, Ruimin, et al.
Published: (2026)
A Real-Time Digital Twin for Adaptive Scheduling
by: Zhang, Yihe, et al.
Published: (2025)
FFTrainer: Fast Failover in Large-Language Model Training with Almost-Free State Management
by: Zhao, Bohan, et al.
Published: (2025)
GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference
by: Han, Yu, et al.
Published: (2025)
Overcoming Memory Constraints in Quantum Circuit Simulation with a High-Fidelity Compression Framework
by: Zhang, Boyuan, et al.
Published: (2024)
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
by: Duan, Jiangfei, et al.
Published: (2024)
Coordinated Power Management on Heterogeneous Systems
by: Zheng, Zhong, et al.
Published: (2025)
FLAMMABLE: A Multi-Model Federated Learning Framework with Multi-Model Engagement and Adaptive Batch Sizes
by: Lin, Shouxu, et al.
Published: (2025)
MuxTune: Efficient Multi-Task LLM Fine-Tuning in Multi-Tenant Datacenters via Spatial-Temporal Backbone Multiplexing
by: Xue, Chunyu, et al.
Published: (2026)
MOSS: A Large-scale Open Microscopic Traffic Simulation System
by: Zhang, Jun, et al.
Published: (2024)
TierCheck: Tiered Checkpointing for Fault Tolerance in Large Language Model Training
by: Han, Shujie, et al.
Published: (2026)
UNIQ: Communication-Efficient Distributed Quantum Computing via Unified Nonlinear Integer Programming
by: Zhong, Hui, et al.
Published: (2025)
Maple: A Multi-agent System for Portable Deep Learning across Clusters
by: Wu, Molang, et al.
Published: (2025)
Pier: Efficient Large Language Model pretraining with Relaxed Global Communication
by: Fan, Shuyuan, et al.
Published: (2025)
SimDC: A High-Fidelity Device Simulation Platform for Device-Cloud Collaborative Computing
by: Pei, Ruiguang, et al.
Published: (2025)
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training
by: Sun, Ao, et al.
Published: (2024)
Lion: Minimizing Distributed Transactions through Adaptive Replica Provision (Extended Version)
by: Zheng, Qiushi, et al.
Published: (2024)
FedHC: A Hierarchical Clustered Federated Learning Framework for Satellite Networks
by: Liu, Zhuocheng, et al.
Published: (2025)
Energy-aware Incremental OTA Update for Flash-based Batteryless IoT Devices
by: Wei, Wei, et al.
Published: (2024)
A Survey on Model-heterogeneous Federated Learning: Problems, Methods, and Prospects
by: Fan, Boyu, et al.
Published: (2023)
Towards Lock Modularization for Heterogeneous Environments
by: Zhang, Hanze, et al.
Published: (2025)
FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)
SMART: When is it Actually Worth Expanding a Speculative Tree?
by: Wang, Lifu, et al.
Published: (2026)
Differential Privacy Preserving Distributed Quantum Computing
by: Zhong, Hui, et al.
Published: (2024)
BlockAMC: Scalable In-Memory Analog Matrix Computing for Solving Linear Systems
by: Pan, Lunshuai, et al.
Published: (2024)
CoEdge-RAG: Optimizing Hierarchical Scheduling for Retrieval-Augmented LLMs in Collaborative Edge Computing
by: Hong, Guihang, et al.
Published: (2025)
DynaFlow: Transparent and Flexible Intra-Device Parallelism via Programmable Operator Scheduling
by: Pan, Yi, et al.
Published: (2026)
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
by: Duan, Jiangfei, et al.
Published: (2024)
Efficient Graph-Based Approximate Nearest Neighbor Search Achieving: Low Latency Without Throughput Loss
by: Luo, Jingjia, et al.
Published: (2025)
Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration
by: Wu, Shixun, et al.
Published: (2025)
Frenzy: A Memory-Aware Serverless LLM Training System for Heterogeneous GPU Clusters
by: Chang, Zihan, et al.
Published: (2024)