:: Library Catalog

Bibliographic Details
Main Authors:	Kulkarni, Sudhanshu; Loring, Burlen; Bethel, E. Wes
Format:	Preprint
Published:	2024
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2402.01843

Similar Items

Exploring Fast Fourier Transforms on the Tenstorrent Wormhole
by: Brown, Nick, et al.
Published: (2025)

TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
by: Wu, Shixun, et al.
Published: (2024)

Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training
by: Lu, Yishun, et al.
Published: (2026)

TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs
by: Wu, Shixun, et al.
Published: (2024)

Towards a Testbed for Scalable FaaS Platforms
by: Schirmer, Trever, et al.
Published: (2025)

Transforming Lock-free Linked Lists into Distributed Lock-free Linked Lists
by: Ravishankar, Raaghav, et al.
Published: (2025)

AI-coupled HPC Workflow Applications, Middleware and Performance
by: Brewer, Wes, et al.
Published: (2024)

Towards Efficient and Scalable Distributed Vector Search with RDMA
by: Zhi, Xiangyu, et al.
Published: (2025)

Hello SME! Generating Fast Matrix Multiplication Kernels Using the Scalable Matrix Extension
by: Remke, Stefan, et al.
Published: (2024)

Towards Fine-Grained Scalability for Stateful Stream Processing Systems
by: Qing, Yunfan, et al.
Published: (2025)

emucxl: an emulation framework for CXL-based disaggregated memory applications
by: Gond, Raja, et al.
Published: (2024)

Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)

Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
by: Liu, Ziming, et al.
Published: (2025)

Tolerance to Asynchrony of an Algorithm for Gathering Myopic Robots on an Infinite Triangular Grid
by: Gupta, Arya Tanmay, et al.
Published: (2023)

Fully Lattice-Linear Algorithms
by: Gupta, Arya Tanmay, et al.
Published: (2022)

Tolerance to Asynchrony in Algorithms for Multiplication and Modulo
by: Gupta, Arya Tanmay, et al.
Published: (2023)

Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning
by: Qiu, Houming, et al.
Published: (2024)

FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression
by: Mechels, Ben, et al.
Published: (2026)

FlashMP: Fast Discrete Transform-Based Solver for Preconditioning Maxwell's Equations on GPUs
by: Zhang, Haoyuan, et al.
Published: (2025)

CloudFix: Automated Policy Repair for Cloud Access Control Policies Using Large Language Models
by: Hall, Bethel, et al.
Published: (2025)

Towards a Scalable and Efficient PGAS-based Distributed OpenMP
by: Shan, Baodi, et al.
Published: (2024)

Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library
by: Li, Youjia, et al.
Published: (2025)

Asynchronous Checkpoint for Eventually Consistent Databases
by: Ravishankar, Raaghav, et al.
Published: (2025)

Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)

Distributing Context-Aware Shared Memory Data Structures: A Case Study on Singly-Linked Lists
by: Ravishankar, Raaghav, et al.
Published: (2024)

Characterizing Production GPU Workloads using System-wide Telemetry Data
by: Cankur, Onur, et al.
Published: (2025)

Toward Scalable Docker-Based Emulations of Blockchain Networks for Research and Development
by: Pennino, Diego, et al.
Published: (2024)

Scalable and Performant Data Loading
by: Hira, Moto, et al.
Published: (2025)

Fast-HotStuff: A Fast and Resilient HotStuff Protocol
by: Jalalzai, Mohammad M., et al.
Published: (2020)

SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure
by: Kulkarni, Apurv Deepak, et al.
Published: (2025)

DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers
by: Maurya, Avinash, et al.
Published: (2026)

Pilotfish: Distributed Execution for Scalable Blockchains
by: Kniep, Quentin, et al.
Published: (2024)

Scalable Maxflow Processing for Dynamic Graphs
by: Kannappan, Shruthi, et al.
Published: (2025)

Robust and Scalable Renaming with Subquadratic Bits
by: Bai, Sirui, et al.
Published: (2025)

Fault-Tolerant Decentralized Distributed Asynchronous Federated Learning with Adaptive Termination Detection
by: Akkinepally, Phani Sahasra, et al.
Published: (2025)

A Fast Confirmation Rule (aka Fast Synchronous Finality) for the Ethereum Consensus Protocol
by: Asgaonkar, Aditya, et al.
Published: (2024)

FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing
by: Agarwal, Aarush, et al.
Published: (2025)

TD-Orch: Scalable Load-Balancing for Distributed Systems with Applications to Graph Processing
by: Zhao, Yiwei, et al.
Published: (2025)

FLeeC: a Fast Lock-Free Application Cache
by: Costa, André J., et al.
Published: (2024)

Wilkins: HPC In Situ Workflows Made Easy
by: Yildiz, Orcun, et al.
Published: (2024)