| Main Authors: | Kulkarni, Sudhanshu; Loring, Burlen; Bethel, E. Wes |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.01843 |
Similar Items
Exploring Fast Fourier Transforms on the Tenstorrent Wormhole
by: Brown, Nick, et al.
Published: (2025)
TurboFFT: A High-Performance Fast Fourier Transform with Fault Tolerance on GPU
by: Wu, Shixun, et al.
Published: (2024)
Runtime-Orchestrated Second-Order Optimization for Scalable LLM Training
by: Lu, Yishun, et al.
Published: (2026)
TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs
by: Wu, Shixun, et al.
Published: (2024)
Towards a Testbed for Scalable FaaS Platforms
by: Schirmer, Trever, et al.
Published: (2025)
Transforming Lock-free Linked Lists into Distributed Lock-free Linked Lists
by: Ravishankar, Raaghav, et al.
Published: (2025)
AI-coupled HPC Workflow Applications, Middleware and Performance
by: Brewer, Wes, et al.
Published: (2024)
Towards Efficient and Scalable Distributed Vector Search with RDMA
by: Zhi, Xiangyu, et al.
Published: (2025)
Hello SME! Generating Fast Matrix Multiplication Kernels Using the Scalable Matrix Extension
by: Remke, Stefan, et al.
Published: (2024)
Towards Fine-Grained Scalability for Stateful Stream Processing Systems
by: Qing, Yunfan, et al.
Published: (2025)
emucxl: an emulation framework for CXL-based disaggregated memory applications
by: Gond, Raja, et al.
Published: (2024)
Towards Fast Setup and High Throughput of GPU Serverless Computing
by: Zhao, Han, et al.
Published: (2024)
Expert-as-a-Service: Towards Efficient, Scalable, and Robust Large-scale MoE Serving
by: Liu, Ziming, et al.
Published: (2025)
Tolerance to Asynchrony of an Algorithm for Gathering Myopic Robots on an Infinite Triangular Grid
by: Gupta, Arya Tanmay, et al.
Published: (2023)
Fully Lattice-Linear Algorithms
by: Gupta, Arya Tanmay, et al.
Published: (2022)
Tolerance to Asynchrony in Algorithms for Multiplication and Modulo
by: Gupta, Arya Tanmay, et al.
Published: (2023)
Approximated Coded Computing: Towards Fast, Private and Secure Distributed Machine Learning
by: Qiu, Houming, et al.
Published: (2024)
FPTC: A Fast Parallel Transform-based Codec for Efficient Asymmetric Signal Compression
by: Mechels, Ben, et al.
Published: (2026)
FlashMP: Fast Discrete Transform-Based Solver for Preconditioning Maxwell's Equations on GPUs
by: Zhang, Haoyuan, et al.
Published: (2025)
CloudFix: Automated Policy Repair for Cloud Access Control Policies Using Large Language Models
by: Hall, Bethel, et al.
Published: (2025)
Towards a Scalable and Efficient PGAS-based Distributed OpenMP
by: Shan, Baodi, et al.
Published: (2024)
Parallel Data Object Creation: Towards Scalable Metadata Management in High-Performance I/O Library
by: Li, Youjia, et al.
Published: (2025)
Asynchronous Checkpoint for Eventually Consistent Databases
by: Ravishankar, Raaghav, et al.
Published: (2025)
Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)
Distributing Context-Aware Shared Memory Data Structures: A Case Study on Singly-Linked Lists
by: Ravishankar, Raaghav, et al.
Published: (2024)
Characterizing Production GPU Workloads using System-wide Telemetry Data
by: Cankur, Onur, et al.
Published: (2025)
Toward Scalable Docker-Based Emulations of Blockchain Networks for Research and Development
by: Pennino, Diego, et al.
Published: (2024)
Scalable and Performant Data Loading
by: Hira, Moto, et al.
Published: (2025)
Fast-HotStuff: A Fast and Resilient HotStuff Protocol
by: Jalalzai, Mohammad M., et al.
Published: (2020)
SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure
by: Kulkarni, Apurv Deepak, et al.
Published: (2025)
DataStates-LLM: Scalable Checkpointing for Transformer Models Using Composable State Providers
by: Maurya, Avinash, et al.
Published: (2026)
Pilotfish: Distributed Execution for Scalable Blockchains
by: Kniep, Quentin, et al.
Published: (2024)
Scalable Maxflow Processing for Dynamic Graphs
by: Kannappan, Shruthi, et al.
Published: (2025)
Robust and Scalable Renaming with Subquadratic Bits
by: Bai, Sirui, et al.
Published: (2025)
Fault-Tolerant Decentralized Distributed Asynchronous Federated Learning with Adaptive Termination Detection
by: Akkinepally, Phani Sahasra, et al.
Published: (2025)
A Fast Confirmation Rule (aka Fast Synchronous Finality) for the Ethereum Consensus Protocol
by: Asgaonkar, Aditya, et al.
Published: (2024)
FastGraph: Optimized GPU-Enabled Algorithms for Fast Graph Building and Message Passing
by: Agarwal, Aarush, et al.
Published: (2025)
TD-Orch: Scalable Load-Balancing for Distributed Systems with Applications to Graph Processing
by: Zhao, Yiwei, et al.
Published: (2025)
FLeeC: a Fast Lock-Free Application Cache
by: Costa, André J., et al.
Published: (2024)
Wilkins: HPC In Situ Workflows Made Easy
by: Yildiz, Orcun, et al.
Published: (2024)