| Main Authors: | Boerkamp, Christiaan; Thomas, Akhil John |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | Performance |
| Online Access: | https://arxiv.org/abs/2602.06216 |
| _version_ | 1866910013436461056 |
|---|---|
| author | Boerkamp, Christiaan; Thomas, Akhil John |
| author_facet | Boerkamp, Christiaan; Thomas, Akhil John |
| contents | This paper presents a benchmarking methodology for evaluating end-to-end performance of deterministic signal-processing pipelines expressed using CNN-compatible primitives. The benchmark targets phased-array workloads such as ultrasound imaging and evaluates complete RF-to-image pipelines under realistic execution conditions. Performance is reported using sustained input throughput (MB/s), effective frame rate (FPS), and, where available, incremental energy per run and peak memory usage. Using this methodology, we benchmark a single deterministic, training-free CNN-based signal-processing pipeline executed unmodified across heterogeneous accelerator platforms, including an NVIDIA RTX 5090 GPU and a Google TPU v5e-1. The results demonstrate how different operator formulations (dynamic indexing, fully CNN-expressed, and sparse-matrix-based) impact performance and portability across architectures. This work is motivated by the need for portable, certifiable signal-processing implementations that avoid hardware-specific refactoring while retaining high performance on modern AI accelerators. |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2602_06216 |
| institution | arXiv |
| publishDate | 2026 |
| record_format | arxiv |
| title | End-to-End Throughput Benchmarking of Portable Deterministic CNN-Based Signal Processing Pipelines |
| topic | Performance |
| url | https://arxiv.org/abs/2602.06216 |