:: Library Catalog

Bibliographic Details
Main Authors:	Taylor, Maya; Pearson, Carl; Berger-Vergiat, Luc; Long, Giovanni; Ciesko, Jan
Format:	Preprint
Published:	2026
Subjects:	Performance
Online Access:	https://arxiv.org/abs/2603.23343

Similar Items

Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities
by: Cavagna, Hiari Pizzini, et al.
Published: (2025)

Attention in SRAM on Tenstorrent Grayskull
by: Thüning, Moritz
Published: (2024)

Stencil Computations on Tenstorrent Wormhole
by: Piarulli, Lorenzo, et al.
Published: (2026)

Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects
by: Taylor, Maya, et al.
Published: (2026)

AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase
by: Nicusan, Andrei-Leonard, et al.
Published: (2025)

KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
by: Liao, Gang, et al.
Published: (2025)

KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)

Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels
by: Alo, Oluwaseun Adewunmi, et al.
Published: (2024)

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)

Long-term Monitoring of Kernel and Hardware Events to Understand Latency Variance
by: Zhou, Fang, et al.
Published: (2026)

TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators
by: Boerkamp, Christiaan, et al.
Published: (2024)

Do Drag ao Pós-Drag: a performance travesti frente à etnicidade e à classe
by: Luc Schicharin
Published: (2017)

Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++
by: Malenza, Giulio, et al.
Published: (2025)

Exploring Fast Fourier Transforms on the Tenstorrent Wormhole
by: Brown, Nick, et al.
Published: (2025)

Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200
by: Ujeniya, Aditya, et al.
Published: (2026)

FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels
by: Rajput, Saurabhsingh, et al.
Published: (2026)

A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series
by: Beseda, Martin, et al.
Published: (2025)

Reducing Compute Waste in LLMs through Kernel-Level DVFS
by: Spaan, Jeffrey, et al.
Published: (2026)

Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
by: Won, Jaeyeon, et al.
Published: (2025)

From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines
by: Bergach, Mohamed Amine
Published: (2026)

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)

MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning
by: Jam, Mathys, et al.
Published: (2025)

AFarePart: Accuracy-aware Fault-resilient Partitioner for DNN Edge Accelerators
by: Debnath, Mukta, et al.
Published: (2025)

A Review on Proprietary Accelerators for Large Language Models
by: Park, Sihyeong, et al.
Published: (2025)

Caspar: CUDA Accelerator for Symbolic Programming with Adaptive Reordering
by: Martens, Emil, et al.
Published: (2026)

Accelerating Vertical Federated Learning
by: Cai, Dongqi, et al.
Published: (2022)

KernelBench: Can LLMs Write Efficient GPU Kernels?
by: Ouyang, Anne, et al.
Published: (2025)

Dynamic Precision Math Engine for Linear Algebra and Trigonometry Acceleration on Xtensa LX6 Microcontrollers
by: Preciado, Elian Alfonso Lopez
Published: (2026)

WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
by: Zhang, Kaixuan, et al.
Published: (2026)

FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow
by: Heidari, Sina, et al.
Published: (2026)

Accelerating Machine Learning Queries with Linear Algebra Query Processing
by: Sun, Wenbo, et al.
Published: (2023)

GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
by: Andrews, Martin, et al.
Published: (2025)

cuTeSpMM: Accelerating Sparse-Dense Matrix Multiplication using GPU Tensor Cores
by: Xiang, Lizhi, et al.
Published: (2025)

Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)

GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs
by: Liu, Juelin, et al.
Published: (2024)

Accelerating Gravitational $N$-Body Simulations Using the RISC-V-Based Tenstorrent Wormhole
by: Almerol, Jenny Lynn, et al.
Published: (2025)

Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures
by: Corpas, Alberto, et al.
Published: (2024)

Accelerating the Tesseract Decoder for Quantum Error Correction
by: Grbic, Dragana, et al.
Published: (2026)

A relação entre a «performance» social e a «performance» económico-financeira
by: Daniel Taborda
Published: (2007)

SENSEi: Input-Sensitive Compilation for Accelerating GNNs
by: Lenadora, Damitha, et al.
Published: (2023)