:: Library Catalog

Bibliographic Details
Main Authors:	Boerkamp, Christiaan, Thomas, Akhil John
Format:	Preprint
Published:	2026
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2602.06216

Similar Items

TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators
by: Boerkamp, Christiaan, et al.
Published: (2024)

Enhanced DeepLab Based Nerve Segmentation with Optimized Tuning
by: Thomas, Akhil John, et al.
Published: (2025)

QuantU-Net: Efficient Wearable Medical Imaging Using Bitwidth as a Trainable Parameter
by: Boerkamp, Christiaan, et al.
Published: (2025)

RAGPerf: An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems
by: Li, Shaobo, et al.
Published: (2026)

PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs
by: Saha, Rappy, et al.
Published: (2026)

Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines: A Comparative Analysis
by: Kumar, Punit, et al.
Published: (2025)

PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU
by: McInroe, Trevor, et al.
Published: (2025)

Introducing the Arm-membench Throughput Benchmark
by: Burth, Cyrill, et al.
Published: (2025)

Columbo: Low Level End-to-End System Traces through Modular Full-System Simulation
by: Görgen, Jakob, et al.
Published: (2024)

Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology
by: Mathews, Dhanya R, et al.
Published: (2025)

NSFlow: An End-to-End FPGA Framework with Scalable Dataflow Architecture for Neuro-Symbolic AI
by: Yang, Hanchen, et al.
Published: (2025)

CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
by: Sun, Wenbo, et al.
Published: (2024)

FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
by: He, Jiaao, et al.
Published: (2024)

Canvas: End-to-End Kernel Architecture Search in Neural Networks
by: Zhao, Chenggang, et al.
Published: (2023)

PPU: Design and Implementation of a Pipelined Full Posit Processing Unit
by: Rossi, Federico, et al.
Published: (2023)

Can Increasing the Hit Ratio Hurt Cache Throughput? (Long Version)
by: Qiu, Ziyue, et al.
Published: (2024)

An Upper Bound on the M/M/k Queue With Deterministic Setup Times
by: Williams, Jalani, et al.
Published: (2025)

Taking GPU Programming Models to Task for Performance Portability
by: Davis, Joshua H., et al.
Published: (2024)

Proactive Service Assurance in 5G and B5G Networks: A Closed-Loop Algorithm for End-to-End Network Slicing
by: Tran, Nguyen Phuc, et al.
Published: (2024)

Rethinking Temporal Models for TinyML: LSTM versus 1D-CNN in Resource-Constrained Devices
by: Saha, Bidyut, et al.
Published: (2026)

High-Performance Portable GPU Primitives for Arbitrary Types and Operators in Julia
by: Pilliat, Emmanuel
Published: (2026)

Towards High-Performance and Portable Molecular Docking on CPUs through Vectorization
by: Accordi, Gianmarco, et al.
Published: (2025)

Reexamining Paradigms of End-to-End Data Movement
by: Fang, Chin, et al.
Published: (2025)

Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement
by: Liu, Songze, et al.
Published: (2025)

An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators
by: Qararyah, Fareed, et al.
Published: (2025)

Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
by: Andersson, Måns I., et al.
Published: (2025)

From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines
by: Bergach, Mohamed Amine
Published: (2026)

msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
by: Huang, Zhaolan, et al.
Published: (2025)

DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)

SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure
by: Kulkarni, Apurv Deepak, et al.
Published: (2025)

Towards Portability at Scale: A Cross-Architecture Performance Evaluation of a GPU-enabled Shallow Water Solver
by: Villalobos, Johansell, et al.
Published: (2025)

GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability
by: Alekseenko, Andrey, et al.
Published: (2024)

HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation
by: Chen, Yao, et al.
Published: (2022)

Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL
by: de Castro, Manuel, et al.
Published: (2024)

Wasure: A Modular Toolkit for Comprehensive WebAssembly Benchmarking
by: Carissimi, Riccardo, et al.
Published: (2026)

An Analysis of Performance Bottlenecks in MRI Pre-Processing
by: Dugré, Mathieu, et al.
Published: (2024)

Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance
by: Xue, Weicheng, et al.
Published: (2023)

A Continuous Benchmarking Infrastructure for High-Performance Computing Applications
by: Alt, Christoph, et al.
Published: (2024)

On Resolving Non-Preemptivity in Multitask Scheduling: An Optimal Algorithm in Deterministic and Stochastic Worlds
by: Li, Wenxin
Published: (2024)

Portability of Fortran's `do concurrent' on GPUs
by: Caplan, Ronald M., et al.
Published: (2024)