| Main Authors: | Zhao, Yong; Zhu, Zhengqiu; Chen, Bin; Qiu, Sihang; Huang, Jincai; Lu, Xin; Yang, Weiyi; Ai, Chuan; Huang, Kuihua; He, Cheng; Jin, Yucheng; Liu, Zhong; Wang, Fei-Yue |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2311.12838 |
Similar Items
A dynamic parallel method for performance optimization on hybrid CPUs
by: Yu, Luo, et al.
Published: (2024)
Towards Sustainable Large Language Model Serving
by: Nguyen, Sophia, et al.
Published: (2024)
PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)
A parallel parser for regular expressions
by: Borsotti, Angelo, et al.
Published: (2025)
Heta: Distributed Training of Heterogeneous Graph Neural Networks
by: Zhong, Yuchen, et al.
Published: (2024)
CapsuleFS: A Multi-credential DataCapsule Filesystem
by: Hu, Qingyang, et al.
Published: (2025)
Twinning for Space-Air-Ground-Sea Integrated Networks: Beyond Conventional Digital Twin Towards Goal-Oriented Semantic Twin
by: Qiu, Yifei, et al.
Published: (2025)
Massively parallel CMA-ES with increasing population
by: Redon, David, et al.
Published: (2024)
A common parallel framework for LLP combinatorial problems
by: Alves, David Ribeiro, et al.
Published: (2026)
SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)
Matrix representation and GPU-optimized parallel B-spline computing
by: Wu, Jiayu, et al.
Published: (2025)
Minimizing speculation overhead in a parallel recognizer for regular texts
by: Borsotti, Angelo, et al.
Published: (2024)
Energy efficiency optimization of task-parallel codes on asymmetric architectures
by: Costero, Luis, et al.
Published: (2024)
Uncertainty-Aware Decarbonization for Datacenters
by: Li, Amy, et al.
Published: (2024)
Static task mapping for heterogeneous systems based on series-parallel decompositions
by: Wilhelm, Martin, et al.
Published: (2025)
Regent based parallel meshfree LSKUM solver for heterogenous HPC platforms
by: Salil, Sanath, et al.
Published: (2024)
AcOrch: Accelerating Sampling-based GNN Training under CPU-NPU Heterogeneous Environments
by: Chen, Kefu, et al.
Published: (2026)
Towards Energy Efficient Co-Scheduling in HPC
by: Zheng, Zhong, et al.
Published: (2026)
CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
by: Chen, Tiancheng, et al.
Published: (2025)
MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)
FLAME: A Serving System Optimized for Large-Scale Generative Recommendation with Efficiency
by: Guo, Xianwen, et al.
Published: (2025)
An inherently parallel H2-ULV factorization for solving dense linear systems on GPUs
by: Ma, Qianxiang, et al.
Published: (2025)
Pipit: Scripting the analysis of parallel execution traces
by: Bhatele, Abhinav, et al.
Published: (2023)
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
by: Wu, Yongtong, et al.
Published: (2026)
WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)
SLO-Aware Scheduling for Large Language Model Inferences
by: Huang, Jinqi, et al.
Published: (2025)
Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning
by: Qiu, Houming, et al.
Published: (2024)
Towards Cloud Efficiency with Large-scale Workload Characterization
by: Parayil, Anjaly, et al.
Published: (2024)
AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
by: Guo, Yucheng, et al.
Published: (2026)
GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions
by: Shi, Tianyao, et al.
Published: (2024)
Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)
Towards Communication-Efficient Decentralized Federated Graph Learning over Non-IID Data
by: Wang, Shilong, et al.
Published: (2025)
Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration
by: Wu, Shixun, et al.
Published: (2025)
A large-scale distributed parallel discrete event simulation engines based on Warped2 for Wargaming simulation
by: Jia, Xiaoning, et al.
Published: (2025)
exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)
MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline
by: Sheng, Guangming, et al.
Published: (2024)
UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM
by: Huang, Hai
Published: (2025)
sVIRGO: A Scalable Virtual Tree Hierarchical Framework for Distributed Systems
by: Huang, Lican
Published: (2026)
HoSZp: An Efficient Homomorphic Error-bounded Lossy Compressor for Scientific Data
by: Agarwal, Tripti, et al.
Published: (2024)
FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)