| Main Authors: | Liu, Rongrong, Guo, Zhuoqiang, Sha, Qiuchen, Zhao, Tong, Li, Haibo, Hu, Wei, Liu, Lijun, Tan, Guangming, Jia, Weile |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2501.03061 |
Similar Items
Deep Learning-Enabled Supercritical Flame Simulation at Detailed Chemistry and Real-Fluid Accuracy Towards Trillion-Cell Scale
by: Guo, Zhuoqiang, et al.
Published: (2025)
JanusPipe: Efficient Pipeline Parallel Training for Machine Learning Interatomic Potentials
by: Wang, Hongyu, et al.
Published: (2026)
Breaking the Training Barrier of Billion-Parameter Universal Machine Learning Interatomic Potentials
by: Zhou, Yuanchang, et al.
Published: (2026)
Scaling Molecular Dynamics with ab initio Accuracy to 149 Nanoseconds per Day
by: Li, Jianxiong, et al.
Published: (2024)
FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs
by: Zhou, Yuanchang, et al.
Published: (2024)
Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE
by: Miksits, Samuel, et al.
Published: (2024)
GPU-Accelerated Modified Bessel Function of the Second Kind for Gaussian Processes
by: Geng, Zipei, et al.
Published: (2025)
Pipelined Dense Symmetric Eigenvalue Decomposition on Multi-GPU Architectures
by: Wang, Hansheng, et al.
Published: (2025)
GCAPS: GPU Context-Aware Preemptive Priority-based Scheduling for Real-Time Tasks
by: Wang, Yidi, et al.
Published: (2024)
Large-scale Neural Network Quantum States for ab initio Quantum Chemistry Simulations on Fugaku
by: Xu, Hongtao, et al.
Published: (2025)
Exploring the Viability of Unikernels for ARM-powered Edge Computing
by: Kaiser, Shahidullah, et al.
Published: (2024)
Demystifying ARM SME to Optimize General Matrix Multiplications
by: Deng, Chencheng, et al.
Published: (2025)
Dependency-aware Resource Allocation for Serverless Functions at the Edge
by: Baresi, Luciano, et al.
Published: (2023)
A Framework for Fine-Grained Synchronization of Dependent GPU Kernels
by: Jangda, Abhinav, et al.
Published: (2023)
Orchestrating the Execution of Serverless Functions in Hybrid Clouds
by: Peri, Aristotelis, et al.
Published: (2024)
Computational Performance and Energy Efficiency of ARM based HPC servers
by: Schirmer, Oskar
Published: (2024)
A Hybrid Vectorized Merge Sort on ARM NEON
by: Zhou, Jincheng, et al.
Published: (2024)
HAS-GPU: Efficient Hybrid Auto-scaling with Fine-grained GPU Allocation for SLO-aware Serverless Inferences
by: Gu, Jianfeng, et al.
Published: (2025)
Unleashing the Power of Preemptive Priority-based Scheduling for Real-Time GPU Tasks
by: Wang, Yidi, et al.
Published: (2024)
MQFQ-Sticky: Fair Queueing For Serverless GPU Functions
by: Fuerst, Alexander, et al.
Published: (2025)
Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)
Combining GPU and CPU for accelerating evolutionary computing workloads
by: Eynaliyev, Rustam, et al.
Published: (2025)
ARM SVE Unleashed: Performance and Insights Across HPC Applications on Nvidia Grace
by: Shi, Ruimin, et al.
Published: (2025)
High-performance Vector-length Agnostic Quantum Circuit Simulations on ARM Processors
by: Shi, Ruimin, et al.
Published: (2026)
Breaking the Memory Wall: A Study of I/O Patterns and GPU Memory Utilization for Hybrid CPU-GPU Offloaded Optimizers
by: Maurya, Avinash, et al.
Published: (2024)
Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)
Characterization-Guided GPU Fault Resilience in NVIDIA MPS
by: Liu, Rixin, et al.
Published: (2026)
Performance Isolation and Semantic Determinism in Efficient GPU Spatial Sharing
by: Yang, Zhenyuan, et al.
Published: (2026)
Parallel GPU-Enabled Algorithms for SpGEMM on Arbitrary Semirings with Hybrid Communication
by: McFarland, Thomas, et al.
Published: (2025)
Fantasy: Efficient Large-scale Vector Search on GPU Clusters with GPUDirect Async
by: Liu, Yi, et al.
Published: (2025)
Serving Hybrid LLM Loads with SLO Guarantees Using CPU-GPU Attention Piggybacking
by: Mo, Zizhao, et al.
Published: (2026)
ICPS: Real-Time Resource Configuration for Cloud Serverless Functions Considering Affinity
by: Chen, Long, et al.
Published: (2025)
A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations
by: Liu, Hang, et al.
Published: (2026)
SWIFT: Expedited Failure Recovery for Large-scale DNN Training
by: Zhong, Yuchen, et al.
Published: (2023)
GPU-Accelerated Batch-Dynamic Subgraph Matching
by: Qiu, Linshan, et al.
Published: (2024)
SDSL-Solver: Scalable Distributed Sparse Linear Solvers for Large-Scale Interior Point Methods
by: Yang, Shaofeng, et al.
Published: (2026)
AsyncSparse: Accelerating Sparse Matrix-Matrix Multiplication on Asynchronous GPU Architectures
by: Liu, Jie, et al.
Published: (2026)
A Preliminary Study on Accelerating Simulation Optimization with GPU Implementation
by: He, Jinghai, et al.
Published: (2024)
HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
by: Li, Zhonggen, et al.
Published: (2024)
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
by: Liu, Zedong, et al.
Published: (2025)