Library Catalog

Bibliographic Details
Main Authors:	Liu, Rongrong, Guo, Zhuoqiang, Sha, Qiuchen, Zhao, Tong, Li, Haibo, Hu, Wei, Liu, Lijun, Tan, Guangming, Jia, Weile
Format:	Preprint
Published:	2025
Subjects:	Materials Science; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2501.03061

Similar Items

Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale
by: Guo, Zhuoqiang, et al.
Published: (2025)

JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials
by: Wang, Hongyu, et al.
Published: (2026)

Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials
by: Zhou, Yuanchang, et al.
Published: (2026)

Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day
by: Li, Jianxiong, et al.
Published: (2024)

FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
by: Zhou, Yuanchang, et al.
Published: (2024)

Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE
by: Miksits, Samuel, et al.
Published: (2024)

GPU-Accelerated Modified Bessel Function of the Second Kind for Gaussian Processes
by: Geng, Zipei, et al.
Published: (2025)

Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)

GCAPS: GPU Context-Aware Preemptive Priority-based Scheduling for Real-Time Tasks
by: Wang, Yidi, et al.
Published: (2024)

Large-scale Neural Network Quantum States for ab initio Quantum Chemistry Simulations on Fugaku
by: Xu, Hongtao, et al.
Published: (2025)

Exploring the Viability of Unikernels for ARM-powered Edge Computing
by: Kaiser, Shahidullah, et al.
Published: (2024)

Demystifying ARM SME to Optimize General Matrix Multiplications
by: Deng, Chencheng, et al.
Published: (2025)

Dependency-aware Resource Allocation for Serverless Functions at the Edge
by: Baresi, Luciano, et al.
Published: (2023)

A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
by: Jangda, Abhinav, et al.
Published: (2023)

Orchestrating the Execution of Serverless Functions in Hybrid Clouds
by: Peri, Aristotelis, et al.
Published: (2024)

Computational Performance and Energy Efficiency of ARM based HPC servers
by: Schirmer, Oskar
Published: (2024)

A Hybrid Vectorized Merge Sort on ARM NEON
by: Zhou, Jincheng, et al.
Published: (2024)

HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences
by: Gu, Jianfeng, et al.
Published: (2025)

Unleashing the Power of Preemptive Priority-based Scheduling for Real-Time GPU Tasks
by: Wang, Yidi, et al.
Published: (2024)

MQFQ-Sticky: Fair Queueing For Serverless GPU Functions
by: Fuerst, Alexander, et al.
Published: (2025)

Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)

Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)

ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace
by: Shi, Ruimin, et al.
Published: (2025)

High-performance Vector-length Agnostic Quantum Circuit Simulations on ARM Processors
by: Shi, Ruimin, et al.
Published: (2026)

Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)

Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)

Characterization-Guided GPU Fault Resilience in NVIDIA MPS
by: Liu, Rixin, et al.
Published: (2026)

Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing
by: Yang, Zhenyuan, et al.
Published: (2026)

Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication
by: McFarland, Thomas, et al.
Published: (2025)

Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)

Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)

ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity
by: Chen, Long, et al.
Published: (2025)

A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations
by: Liu, Hang, et al.
Published: (2026)

SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)

GPU-Accelerated Batch-Dynamic Subgraph Matching
by: Qiu, Linshan, et al.
Published: (2024)

SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods
by: Yang, Shaofeng, et al.
Published: (2026)

AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
by: Liu, Jie, et al.
Published: (2026)

A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)

HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
by: Li, Zhonggen, et al.
Published: (2024)

ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
by: Liu, Zedong, et al.
Published: (2025)