| Main Authors: | Boerkamp, Christiaan, Thomas, Akhil John |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.06216 |
Similar Items
TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators
by: Boerkamp, Christiaan, et al.
Published: (2024)
Enhanced DeepLab Based Nerve Segmentation with Optimized Tuning
by: Thomas, Akhil John, et al.
Published: (2025)
QuantU-Net: Efficient Wearable Medical Imaging Using Bitwidth as a Trainable Parameter
by: Boerkamp, Christiaan, et al.
Published: (2025)
RAGPerf: An End-to-End Benchmarking Framework for Retrieval-Augmented Generation Systems
by: Li, Shaobo, et al.
Published: (2026)
PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs
by: Saha, Rappy, et al.
Published: (2026)
Energy Consumption of Dataframe Libraries for End-to-End Deep Learning Pipelines: A Comparative Analysis
by: Kumar, Punit, et al.
Published: (2025)
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU
by: McInroe, Trevor, et al.
Published: (2025)
Introducing the Arm-membench Throughput Benchmark
by: Burth, Cyrill, et al.
Published: (2025)
Columbo: Low Level End-to-End System Traces through Modular Full-System Simulation
by: Görgen, Jakob, et al.
Published: (2024)
Efficient Fault Localization in a Cloud Stack Using End-to-End Application Service Topology
by: Mathews, Dhanya R, et al.
Published: (2025)
NSFlow: An End-to-End FPGA Framework with Scalable Dataflow Architecture for Neuro-Symbolic AI
by: Yang, Hanchen, et al.
Published: (2025)
CEBench: A Benchmarking Toolkit for the Cost-Effectiveness of LLM Pipelines
by: Sun, Wenbo, et al.
Published: (2024)
FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
by: He, Jiaao, et al.
Published: (2024)
Canvas: End-to-End Kernel Architecture Search in Neural Networks
by: Zhao, Chenggang, et al.
Published: (2023)
PPU: Design and Implementation of a Pipelined Full Posit Processing Unit
by: Rossi, Federico, et al.
Published: (2023)
Can Increasing the Hit Ratio Hurt Cache Throughput? (Long Version)
by: Qiu, Ziyue, et al.
Published: (2024)
An Upper Bound on the M/M/k Queue With Deterministic Setup Times
by: Williams, Jalani, et al.
Published: (2025)
Taking GPU Programming Models to Task for Performance Portability
by: Davis, Joshua H., et al.
Published: (2024)
Proactive Service Assurance in 5G and B5G Networks: A Closed-Loop Algorithm for End-to-End Network Slicing
by: Tran, Nguyen Phuc, et al.
Published: (2024)
Rethinking Temporal Models for TinyML: LSTM versus 1D-CNN in Resource-Constrained Devices
by: Saha, Bidyut, et al.
Published: (2026)
High-Performance Portable GPU Primitives for Arbitrary Types and Operators in Julia
by: Pilliat, Emmanuel
Published: (2026)
Towards High-Performance and Portable Molecular Docking on CPUs through Vectorization
by: Accordi, Gianmarco, et al.
Published: (2025)
Reexamining Paradigms of End-to-End Data Movement
by: Fang, Chin, et al.
Published: (2025)
Adaptive Cache Pollution Control for Large Language Model Inference Workloads Using Temporal CNN-Based Prediction and Priority-Aware Replacement
by: Liu, Songze, et al.
Published: (2025)
An Analytical Cost Model for Fast Evaluation of Multiple Compute-Engine CNN Accelerators
by: Qararyah, Fareed, et al.
Published: (2025)
Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
by: Andersson, Måns I., et al.
Published: (2025)
From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines
by: Bergach, Mohamed Amine
Published: (2026)
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
by: Huang, Zhaolan, et al.
Published: (2025)
DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)
SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure
by: Kulkarni, Apurv Deepak, et al.
Published: (2025)
Towards Portability at Scale: A Cross-Architecture Performance Evaluation of a GPU-enabled Shallow Water Solver
by: Villalobos, Johansell, et al.
Published: (2025)
GROMACS on AMD GPU-Based HPC Platforms: Using SYCL for Performance and Portability
by: Alekseenko, Andrey, et al.
Published: (2024)
HiKonv: Maximizing the Throughput of Quantized Convolution With Novel Bit-wise Management and Computation
by: Chen, Yao, et al.
Published: (2022)
Challenging Portability Paradigms: FPGA Acceleration Using SYCL and OpenCL
by: de Castro, Manuel, et al.
Published: (2024)
Wasure: A Modular Toolkit for Comprehensive WebAssembly Benchmarking
by: Carissimi, Riccardo, et al.
Published: (2026)
An Analysis of Performance Bottlenecks in MRI Pre-Processing
by: Dugré, Mathieu, et al.
Published: (2024)
Machine Learning-driven Autotuning of Graphics Processing Unit Accelerated Computational Fluid Dynamics for Enhanced Performance
by: Xue, Weicheng, et al.
Published: (2023)
A Continuous Benchmarking Infrastructure for High-Performance Computing Applications
by: Alt, Christoph, et al.
Published: (2024)
On Resolving Non-Preemptivity in Multitask Scheduling: An Optimal Algorithm in Deterministic and Stochastic Worlds
by: Li, Wenxin
Published: (2024)
Portability of Fortran's `do concurrent' on GPUs
by: Caplan, Ronald M., et al.
Published: (2024)