| Main Authors: | Jacobson, John, Burtscher, Martin, Gopalakrishnan, Ganesh |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.04701 |
Similar Items
SIMT/GPU Data Race Verification using ISCC and Intermediary Code Representations: A Case Study
by: Osterhout, Andrew, et al.
Published: (2025)
Fast Topology-Aware Lossy Data Compression with Full Preservation of Critical Points and Local Order
by: Fallin, Alex, et al.
Published: (2026)
Lessons Learned on the Path to Guaranteeing the Error Bound in Lossy Quantizers
by: Fallin, Alex, et al.
Published: (2024)
A GPU accelerated mixed-precision Smoothed Particle Hydrodynamics framework with cell-based relative coordinates
by: Mao, Zirui, et al.
Published: (2023)
HiCR, an Abstract Model for Distributed Heterogeneous Programming
by: Martin, Sergio Miguel, et al.
Published: (2025)
Concurrent Scheduling of High-Level Parallel Programs on Multi-GPU Systems
by: Knorr, Fabian, et al.
Published: (2025)
HMTRace: Hardware-Assisted Memory-Tagging based Dynamic Data Race Detection
by: Shastri, Jaidev, et al.
Published: (2024)
Data Race Satisfiability on Array Elements
by: Shim, Junhyung, et al.
Published: (2025)
FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing
by: Agarwal, Aarush, et al.
Published: (2025)
Understanding GPU Resource Interference One Level Deeper
by: Elvinger, Paul, et al.
Published: (2025)
Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)
VDCores: Resource Decoupled Programming and Execution for Asynchronous GPU
by: He, Zijian, et al.
Published: (2026)
TopoSZp: Lightweight Topology-Aware Error-controlled Compression for Scientific Data
by: Agarwal, Tripti, et al.
Published: (2026)
Evaluation of Programming Models and Performance for Stencil Computation on Current GPU Architectures
by: Shan, Baodi, et al.
Published: (2024)
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
by: Li, Zixuan, et al.
Published: (2024)
TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
by: Wu, Shixun, et al.
Published: (2024)
FastTrack: GPU-Accelerated Tracking for Visual SLAM
by: Khabiri, Kimia, et al.
Published: (2025)
HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
by: Zhang, Shiwei, et al.
Published: (2024)
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
by: Yi, Xinyao
Published: (2024)
GPU Programming for AI Workflow Development on AWS SageMaker: An Instructional Approach
by: Srinivasan, Sriram, et al.
Published: (2025)
AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping
by: Park, Seongyeon, et al.
Published: (2024)
Taking GPU Programming Models to Task for Performance Portability
by: Davis, Joshua H., et al.
Published: (2024)
HoSZp: An Efficient Homomorphic Error-bounded Lossy Compressor for Scientific Data
by: Agarwal, Tripti, et al.
Published: (2024)
HiCCL: A Hierarchical Collective Communication Library
by: Hidayetoglu, Mert, et al.
Published: (2024)
Improving GPU Multi-Tenancy Through Dynamic Multi-Instance GPU Reconfiguration
by: Wang, Tianyu, et al.
Published: (2024)
Cppless: Single-Source and High-Performance Serverless Programming in C++
by: Copik, Marcin, et al.
Published: (2024)
HiCoCS: High Concurrency Cross-Sharding on Permissioned Blockchains
by: Yang, Lingxiao, et al.
Published: (2025)
What Operations can be Performed Directly on Compressed Arrays, and with What Error?
by: Agarwal, Tripti, et al.
Published: (2024)
Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)
Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis
by: Shi, Jiabo, et al.
Published: (2025)
Accelerating Biclique Counting on GPU
by: Qiu, Linshan, et al.
Published: (2024)
GPU Sharing with Triples Mode
by: Byun, Chansup, et al.
Published: (2024)
ParvaGPU: Efficient Spatial GPU Sharing for Large-Scale DNN Inference in Cloud Environments
by: Lee, Munkyu, et al.
Published: (2024)
Accelerating Intra-Node GPU-to-GPU Communication Through Multi-Path Transfers with CUDA Graphs
by: Sojoodi, Amirhossein, et al.
Published: (2026)
GPU Accelerated Sparse Cholesky Factorization
by: Karsavuran, M. Ozan, et al.
Published: (2024)
Heat: Satellite's meat is GPU's poison
by: Yuan, Zhehu, et al.
Published: (2024)
DuaLip-GPU Technical Report
by: Dexter, Gregory, et al.
Published: (2026)
Incidence Constraints in Hypergraph Partitioning on GPU
by: Ronzani, Marco, et al.
Published: (2026)
Predictable LLM Serving on GPU Clusters
by: Darzi, Erfan, et al.
Published: (2025)
CheckMate: LLM-Powered Approximate Intermittent Computing
by: Sayyid-Ali, Abdur-Rahman Ibrahim, et al.
Published: (2024)