:: Library Catalog

Bibliographic Details
Main Authors:	La, Hoa; Gupta, Ahan; Morehead, Alex; Cheng, Jianlin; Zhang, Minjia
Format:	Preprint
Published:	2025
Subjects:	Biomolecules; Distributed, Parallel, and Cluster Computing; Machine Learning; Performance
Online Access:	https://arxiv.org/abs/2506.20686

Similar Items

AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
by: Gupta, Ahan, et al.
Published: (2026)

Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
by: Lin, Dejun, et al.
Published: (2026)

APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics
by: Park, Hyun, et al.
Published: (2023)

Optimizations on Graph-Level for Domain Specific Computations in Julia and Application to QED
by: Reinhard, Anton, et al.
Published: (2025)

Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)

Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation
by: Wang, Tuowei, et al.
Published: (2024)

BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems
by: Wang, Yuxin, et al.
Published: (2024)

Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams
by: Chen, David, et al.
Published: (2026)

PASTA: A Modular Program Analysis Tool Framework for Accelerators
by: Lin, Mao, et al.
Published: (2026)

LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)

Taming Cold Starts: Proactive Serverless Scheduling with Model Predictive Control
by: Nguyen, Chanh, et al.
Published: (2025)

Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey
by: Ather, Hammad, et al.
Published: (2024)

Accelerating Gaussian beam tracing method with dynamic parallelism on graphics processing units
by: Sheng, Zhang, et al.
Published: (2025)

Ridgeline: A 2D Roofline Model for Distributed Systems
by: Checconi, Fabio, et al.
Published: (2022)

AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase
by: Nicusan, Andrei-Leonard, et al.
Published: (2025)

Orthrus: Accelerating Multi-BFT Consensus through Concurrent Partial Ordering of Transactions (Extended Version)
by: Lyu, Hanzheng, et al.
Published: (2024)

HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)

Cloud Resource Allocation with Convex Optimization
by: Boghani, Shayan, et al.
Published: (2025)

A Multi-Port Concurrent Communication Model for handling Compute Intensive Tasks on Distributed Satellite System Constellations
by: Veeravalli, Bharadwaj
Published: (2026)

Inductive Loop Analysis for Practical HPC Application Optimization
by: Schaad, Philipp, et al.
Published: (2025)

Staging Blocked Evaluation over Structured Sparse Matrices
by: Das, Pratyush, et al.
Published: (2024)

Disaggregated Design for GPU-Based Volumetric Data Structures
by: Meneghin, Massimiliano, et al.
Published: (2025)

Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P
by: Dutt, Anurag, et al.
Published: (2025)

EfiMon: A Process Analyser for Granular Power Consumption Prediction
by: León-Vega, Luis G., et al.
Published: (2024)

RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)

Data-Driven Analysis to Understand GPU Hardware Resource Usage of Optimizations
by: Islam, Tanzima Z., et al.
Published: (2024)

Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)

An Online Probabilistic Distributed Tracing System
by: Toslali, M., et al.
Published: (2024)

Efficient Serverless Cold Start: Reducing Library Loading Overhead by Profile-guided Optimization
by: Tariq, Syed Salauddin Mohammad, et al.
Published: (2025)

Vectorization of Gradient Boosting of Decision Trees Prediction in the CatBoost Library for RISC-V Processors
by: Kozinov, Evgeny, et al.
Published: (2024)

Understanding Power Consumption Metric on Heterogeneous Memory Systems
by: Proaño, Andrès Rubio, et al.
Published: (2024)

Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications
by: Scheinert, Dominik, et al.
Published: (2023)

A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations
by: Liu, Hang, et al.
Published: (2026)

Opt4GPTQ: Co-Optimizing Memory and Computation for 4-bit GPTQ Quantized LLM Inference on Heterogeneous Platforms
by: Zhang, Yaozheng, et al.
Published: (2025)

Scalable Systems and Software Architectures for High-Performance Computing on cloud platforms
by: Ramesh, Risshab Srinivas
Published: (2024)

Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems
by: MacLachlan, Glen, et al.
Published: (2026)

A Comprehensive Analysis of Process Energy Consumption on Multi-Socket Systems with GPUs
by: León-Vega, Luis G., et al.
Published: (2024)

FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline
by: Xu, Jingwei, et al.
Published: (2025)

CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)

Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
by: Mao, Ying, et al.
Published: (2020)