:: Library Catalog

Bibliographic Details
Main Authors:	Sheng, Zhang, Duan, Lishu, Jiang, Hanbo
Format:	Preprint
Published:	2025
Subjects:	Performance; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2501.13382

Similar Items

A dynamic parallel method for performance optimization on hybrid CPUs
by: Yu, Luo, et al.
Published: (2024)

Pipit: Scripting the analysis of parallel execution traces
by: Bhatele, Abhinav, et al.
Published: (2023)

LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)

Orthrus: Accelerating Multi-BFT Consensus through Concurrent Partial Ordering of Transactions (Extended Version)
by: Lyu, Hanzheng, et al.
Published: (2024)

PASTA: A Modular Program Analysis Tool Framework for Accelerators
by: Lin, Mao, et al.
Published: (2026)

Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)

Can Tensor Cores Benefit Memory-Bound Kernels? (No!)
by: Zhang, Lingqi, et al.
Published: (2025)

Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)

AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase
by: Nicusan, Andrei-Leonard, et al.
Published: (2025)

CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)

DIAL: Decentralized I/O AutoTuning via Learned Client-side Local Metrics for Parallel File System
by: Rashid, Md Hasanur, et al.
Published: (2026)

On Orchestrating Parallel Broadcasts for Distributed Ledgers
by: Sheng, Peiyao, et al.
Published: (2024)

Minos: Systematically Classifying Performance and Power Characteristics of GPU Workloads on HPC Clusters
by: Jain, Rutwik, et al.
Published: (2026)

Shifting the Sweet Spot: High-Performance Matrix-Free Method for High-Order Elasticity
by: Chang, Dali, et al.
Published: (2026)

Opt4GPTQ: Co-Optimizing Memory and Computation for 4-bit GPTQ Quantized LLM Inference on Heterogeneous Platforms
by: Zhang, Yaozheng, et al.
Published: (2025)

SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication
by: Zhuang, Chen, et al.
Published: (2025)

Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation
by: Wang, Tuowei, et al.
Published: (2024)

Scaling Large-scale GNN Training to Thousands of Processors on CPU-based Supercomputers
by: Zhuang, Chen, et al.
Published: (2024)

FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline
by: Xu, Jingwei, et al.
Published: (2025)

A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations
by: Liu, Hang, et al.
Published: (2026)

Accelerating Particle-in-Cell Monte Carlo Simulations with MPI, OpenMP/OpenACC and Asynchronous Multi-GPU Programming
by: Williams, Jeremy J., et al.
Published: (2024)

Extracting Practical, Actionable Energy Insights from Supercomputer Telemetry and Logs
by: Cornelius, Melanie, et al.
Published: (2025)

Profiling and optimization of multi-card GPU machine learning jobs
by: Lawenda, Marcin, et al.
Published: (2025)

Optimal Parallel Scheduling under Concave Speedup Functions
by: Li, Chengzhang, et al.
Published: (2025)

WebAssembly and Unikernels: A Comparative Study for Serverless at the Edge
by: Besozzi, Valerio, et al.
Published: (2025)

Bridding OT and PaaS in Edge-to-Cloud Continuum
by: Barrios, Carlos J, et al.
Published: (2025)

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)

Optimal Configuration of API Resources in Cloud Native Computing
by: Truyen, Eddy, et al.
Published: (2025)

Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P
by: Dutt, Anurag, et al.
Published: (2025)

Introducing MareNostrum5: A European pre-exascale energy-efficient system designed to serve a broad spectrum of scientific workloads
by: Banchelli, Fabio, et al.
Published: (2025)

An Empirical Characterization of Outages and Incidents in Public Services for Large Language Models
by: Chu, Xiaoyu, et al.
Published: (2025)

Extrae.jl: Julia bindings for the Extrae HPC Profiler
by: Sanchez-Ramirez, Sergio, et al.
Published: (2025)

Is Sparse Matrix Reordering Effective for Sparse Matrix-Vector Multiplication?
by: Asudeh, Omid, et al.
Published: (2025)

mLR: Scalable Laminography Reconstruction based on Memoization
by: Ma, Bin, et al.
Published: (2025)

Inductive Loop Analysis for Practical HPC Application Optimization
by: Schaad, Philipp, et al.
Published: (2025)

SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure
by: Kulkarni, Apurv Deepak, et al.
Published: (2025)

CGSim: A Simulation Framework for Large Scale Distributed Computing Environment
by: Vatsavai, Sairam Sri, et al.
Published: (2025)

Dissecting CPU-GPU Unified Physical Memory on AMD MI300A APUs
by: Wahlgren, Jacob, et al.
Published: (2025)

Characterizing Adaptive Mesh Refinement on Heterogeneous Platforms with Parthenon-VIBE
by: Poptani, Akash, et al.
Published: (2025)

Disaggregated Design for GPU-Based Volumetric Data Structures
by: Meneghin, Massimiliano, et al.
Published: (2025)