| Main Authors: | La, Hoa, Gupta, Ahan, Morehead, Alex, Cheng, Jianlin, Zhang, Minjia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.20686 |
Similar Items
AutoSP: Unlocking Long-Context LLM Training Via Compiler-Based Sequence Parallelism
by: Gupta, Ahan, et al.
Published: (2026)
Fold-CP: A Context Parallelism Framework for Biomolecular Modeling
by: Lin, Dejun, et al.
Published: (2026)
APACE: AlphaFold2 and advanced computing as a service for accelerated discovery in biophysics
by: Park, Hyun, et al.
Published: (2023)
Optimizations on Graph-Level for Domain Specific Computations in Julia and Application to QED
by: Reinhard, Anton, et al.
Published: (2025)
Hardware-Agnostic and Insightful Efficiency Metrics for Accelerated Systems: Definition and Implementation within TALP
by: Rahimi, Ghazal, et al.
Published: (2026)
Matryoshka: Optimization of Dynamic Diverse Quantum Chemistry Systems via Elastic Parallelism Transformation
by: Wang, Tuowei, et al.
Published: (2024)
BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems
by: Wang, Yuxin, et al.
Published: (2024)
Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams
by: Chen, David, et al.
Published: (2026)
PASTA: A Modular Program Analysis Tool Framework for Accelerators
by: Lin, Mao, et al.
Published: (2026)
LMDeploy Accelerates Mixed-Precision LLM Inference with TurboMind
by: Zhang, Li, et al.
Published: (2025)
Taming Cold Starts: Proactive Serverless Scheduling with Model Predictive Control
by: Nguyen, Chanh, et al.
Published: (2025)
Parallel I/O Characterization and Optimization on Large-Scale HPC Systems: A 360-Degree Survey
by: Ather, Hammad, et al.
Published: (2024)
Accelerating Gaussian beam tracing method with dynamic parallelism on graphics processing units
by: Sheng, Zhang, et al.
Published: (2025)
Ridgeline: A 2D Roofline Model for Distributed Systems
by: Checconi, Fabio, et al.
Published: (2022)
AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase
by: Nicusan, Andrei-Leonard, et al.
Published: (2025)
Orthrus: Accelerating Multi-BFT Consensus through Concurrent Partial Ordering of Transactions (Extended Version)
by: Lyu, Hanzheng, et al.
Published: (2024)
HeteGen: Heterogeneous Parallel Inference for Large Language Models on Resource-Constrained Devices
by: Zhao, Xuanlei, et al.
Published: (2024)
Cloud Resource Allocation with Convex Optimization
by: Boghani, Shayan, et al.
Published: (2025)
A Multi-Port Concurrent Communication Model for handling Compute Intensive Tasks on Distributed Satellite System Constellations
by: Veeravalli, Bharadwaj
Published: (2026)
Inductive Loop Analysis for Practical HPC Application Optimization
by: Schaad, Philipp, et al.
Published: (2025)
Staging Blocked Evaluation over Structured Sparse Matrices
by: Das, Pratyush, et al.
Published: (2024)
Disaggregated Design for GPU-Based Volumetric Data Structures
by: Meneghin, Massimiliano, et al.
Published: (2025)
Fine-Grained Energy Prediction For Parallellized LLM Inference With PIE-P
by: Dutt, Anurag, et al.
Published: (2025)
EfiMon: A Process Analyser for Granular Power Consumption Prediction
by: León-Vega, Luis G., et al.
Published: (2024)
RAPID-LLM: Resilience-Aware Performance analysis of Infrastructure for Distributed LLM Training and Inference
by: Karfakis, George, et al.
Published: (2025)
Data-Driven Analysis to Understand GPU Hardware Resource Usage of Optimizations
by: Islam, Tanzima Z., et al.
Published: (2024)
Collaborative Processing for Multi-Tenant Inference on Memory-Constrained Edge TPUs
by: Ng, Nathan, et al.
Published: (2026)
An Online Probabilistic Distributed Tracing System
by: Toslali, M., et al.
Published: (2024)
Efficient Serverless Cold Start: Reducing Library Loading Overhead by Profile-guided Optimization
by: Tariq, Syed Salauddin Mohammad, et al.
Published: (2025)
Vectorization of Gradient Boosting of Decision Trees Prediction in the CatBoost Library for RISC-V Processors
by: Kozinov, Evgeny, et al.
Published: (2024)
Understanding Power Consumption Metric on Heterogeneous Memory Systems
by: Proaño, Andrès Rubio, et al.
Published: (2024)
Towards a Peer-to-Peer Data Distribution Layer for Efficient and Collaborative Resource Optimization of Distributed Dataflow Applications
by: Scheinert, Dominik, et al.
Published: (2023)
A Precision Emulation Approach to the GPU Acceleration of Ab Initio Electronic Structure Calculations
by: Liu, Hang, et al.
Published: (2026)
Opt4GPTQ: Co-Optimizing Memory and Computation for 4-bit GPTQ Quantized LLM Inference on Heterogeneous Platforms
by: Zhang, Yaozheng, et al.
Published: (2025)
Scalable Systems and Software Architectures for High-Performance Computing on cloud platforms
by: Ramesh, Risshab Srinivas
Published: (2024)
Operational Strategies for Non-Disruptive Scheduling Transitions in Production HPC Systems
by: MacLachlan, Glen, et al.
Published: (2026)
A Comprehensive Analysis of Process Energy Consumption on Multi-Socket Systems with GPUs
by: León-Vega, Luis G., et al.
Published: (2024)
FalconFS: Distributed File System for Large-Scale Deep Learning Pipeline
by: Xu, Jingwei, et al.
Published: (2025)
CARAT: Client-Side Adaptive RPC and Cache Co-Tuning for Parallel File Systems
by: Rashid, Md Hasanur, et al.
Published: (2026)
Resource Management Schemes for Cloud-Native Platforms with Computing Containers of Docker and Kubernetes
by: Mao, Ying, et al.
Published: (2020)