:: Library Catalog

Bibliographic Details
Main Authors:	Zhao, Yong; Zhu, Zhengqiu; Chen, Bin; Qiu, Sihang; Huang, Jincai; Lu, Xin; Yang, Weiyi; Ai, Chuan; Huang, Kuihua; He, Cheng; Jin, Yucheng; Liu, Zhong; Wang, Fei-Yue
Format:	Preprint
Published:	2023
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2311.12838

Similar Items

A dynamic parallel method for performance optimization on hybrid CPUs
by: Yu, Luo, et al.
Published: (2024)

Towards Sustainable Large Language Model Serving
by: Nguyen, Sophia, et al.
Published: (2024)

PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
by: Wu, Xin, et al.
Published: (2026)

A parallel parser for regular expressions
by: Borsotti, Angelo, et al.
Published: (2025)

Heta: Distributed Training of Heterogeneous Graph Neural Networks
by: Zhong, Yuchen, et al.
Published: (2024)

CapsuleFS: A Multi-credential DataCapsule Filesystem
by: Hu, Qingyang, et al.
Published: (2025)

Twinning for Space-Air-Ground-Sea Integrated Networks: Beyond Conventional Digital Twin Towards Goal-Oriented Semantic Twin
by: Qiu, Yifei, et al.
Published: (2025)

Massively parallel CMA-ES with increasing population
by: Redon, David, et al.
Published: (2024)

A common parallel framework for LLP combinatorial problems
by: Alves, David Ribeiro, et al.
Published: (2026)

SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)

Matrix representation and GPU-optimized parallel B-spline computing
by: Wu, Jiayu, et al.
Published: (2025)

Minimizing speculation overhead in a parallel recognizer for regular texts
by: Borsotti, Angelo, et al.
Published: (2024)

Energy efficiency optimization of task-parallel codes on asymmetric architectures
by: Costero, Luis, et al.
Published: (2024)

Uncertainty-Aware Decarbonization for Datacenters
by: Li, Amy, et al.
Published: (2024)

Static task mapping for heterogeneous systems based on series-parallel decompositions
by: Wilhelm, Martin, et al.
Published: (2025)

Regent based parallel meshfree LSKUM solver for heterogenous HPC platforms
by: Salil, Sanath, et al.
Published: (2024)

AcOrch: Accelerating Sampling-based GNN Training under CPU-NPU Heterogeneous Environments
by: Chen, Kefu, et al.
Published: (2026)

Towards Energy Efficient Co-Scheduling in HPC
by: Zheng, Zhong, et al.
Published: (2026)

CrossPipe: Towards Optimal Pipeline Schedules for Cross-Datacenter Training
by: Chen, Tiancheng, et al.
Published: (2025)

MTGenRec: An Efficient Distributed Training System for Generative Recommendation Models in Meituan
by: Wang, Yuxiang, et al.
Published: (2025)

FLAME: A Serving System Optimized for Large-Scale Generative Recommendation with Efficiency
by: Guo, Xianwen, et al.
Published: (2025)

An inherently parallel H2-ULV factorization for solving dense linear systems on GPUs
by: Ma, Qianxiang, et al.
Published: (2025)

Pipit: Scripting the analysis of parallel execution traces
by: Bhatele, Abhinav, et al.
Published: (2023)

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
by: Wu, Yongtong, et al.
Published: (2026)

WindVE: Collaborative CPU-NPU Vector Embedding
by: Huang, Jinqi, et al.
Published: (2025)

SLO-Aware Scheduling for Large Language Model Inferences
by: Huang, Jinqi, et al.
Published: (2025)

Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning
by: Qiu, Houming, et al.
Published: (2024)

Towards Cloud Efficiency with Large-scale Workload Characterization
by: Parayil, Anjaly, et al.
Published: (2024)

AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
by: Guo, Yucheng, et al.
Published: (2026)

GreenLLM: Disaggregating Large Language Model Serving on Heterogeneous GPUs for Lower Carbon Emissions
by: Shi, Tianyao, et al.
Published: (2024)

Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
by: Tian, Yuyang, et al.
Published: (2025)

Towards Communication-Efficient Decentralized Federated Graph Learning over Non-IID Data
by: Wang, Shilong, et al.
Published: (2025)

Boosting Scientific Error-Bounded Lossy Compression through Optimized Synergistic Lossy-Lossless Orchestration
by: Wu, Shixun, et al.
Published: (2025)

A large-scale distributed parallel discrete event simulation engines based on Warped2 for Wargaming simulation
by: Jia, Xiaoning, et al.
Published: (2025)

exa-AMD: A Scalable Workflow for Accelerating AI-Assisted Materials Discovery and Design
by: Moraru, Maxim, et al.
Published: (2025)

MSPipe: Efficient Temporal GNN Training via Staleness-Aware Pipeline
by: Sheng, Guangming, et al.
Published: (2024)

UMDAM: A Unified Data Layout and DRAM Address Mapping for Heterogenous NPU-PIM
by: Huang, Hai
Published: (2025)

sVIRGO: A Scalable Virtual Tree Hierarchical Framework for Distributed Systems
by: Huang, Lican
Published: (2026)

HoSZp: An Efficient Homomorphic Error-bounded Lossy Compressor for Scientific Data
by: Agarwal, Tripti, et al.
Published: (2024)

FreeRide: Harvesting Bubbles in Pipeline Parallelism
by: Zhang, Jiashu, et al.
Published: (2024)