| Main Authors: | Taylor, Maya; Pearson, Carl; Berger-Vergiat, Luc; Long, Giovanni; Ciesko, Jan |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2603.23343 |
Similar Items
Assessing Tenstorrent's RISC-V MatMul Acceleration Capabilities
by: Cavagna, Hiari Pizzini, et al.
Published: (2025)
Attention in SRAM on Tenstorrent Grayskull
by: Thüning, Moritz
Published: (2024)
Stencil Computations on Tenstorrent Wormhole
by: Piarulli, Lorenzo, et al.
Published: (2026)
Communication-Aware Diffusion Load Balancing for Persistently Interacting Objects
by: Taylor, Maya, et al.
Published: (2026)
AcceleratedKernels.jl: Cross-Architecture Parallel Algorithms from a Unified, Transpiled Codebase
by: Nicusan, Andrei-Leonard, et al.
Published: (2025)
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
by: Liao, Gang, et al.
Published: (2025)
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)
Scaling Analog Photonic Accelerators for Byte-Size, Integer General Matrix Multiply (GEMM) Kernels
by: Alo, Oluwaseun Adewunmi, et al.
Published: (2024)
AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)
Long-term Monitoring of Kernel and Hardware Events to Understand Latency Variance
by: Zhou, Fang, et al.
Published: (2026)
TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators
by: Boerkamp, Christiaan, et al.
Published: (2024)
Do Drag ao Pós-Drag: a performance travesti frente à etnicidade e à classe
by: Luc Schicharin
Published: (2017)
Building an Accelerated OpenFOAM Proof-of-Concept Application using Modern C++
by: Malenza, Giulio, et al.
Published: (2025)
Exploring Fast Fourier Transforms on the Tenstorrent Wormhole
by: Brown, Nick, et al.
Published: (2025)
Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200
by: Ujeniya, Aditya, et al.
Published: (2026)
FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels
by: Rajput, Saurabhsingh, et al.
Published: (2026)
A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series
by: Beseda, Martin, et al.
Published: (2025)
Reducing Compute Waste in LLMs through Kernel-Level DVFS
by: Spaan, Jeffrey, et al.
Published: (2026)
Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
by: Won, Jaeyeon, et al.
Published: (2025)
From 8 Seconds to 370ms: Kernel-Fused SAR Imaging on Apple Silicon via Single-Dispatch FFT Pipelines
by: Bergach, Mohamed Amine
Published: (2026)
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)
MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning
by: Jam, Mathys, et al.
Published: (2025)
AFarePart: Accuracy-aware Fault-resilient Partitioner for DNN Edge Accelerators
by: Debnath, Mukta, et al.
Published: (2025)
A Review on Proprietary Accelerators for Large Language Models
by: Park, Sihyeong, et al.
Published: (2025)
Caspar: CUDA Accelerator for Symbolic Programming with Adaptive Reordering
by: Martens, Emil, et al.
Published: (2026)
Accelerating Vertical Federated Learning
by: Cai, Dongqi, et al.
Published: (2022)
KernelBench: Can LLMs Write Efficient GPU Kernels?
by: Ouyang, Anne, et al.
Published: (2025)
Dynamic Precision Math Engine for Linear Algebra and Trigonometry Acceleration on Xtensa LX6 Microcontrollers
by: Preciado, Elian Alfonso Lopez
Published: (2026)
WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
by: Zhang, Kaixuan, et al.
Published: (2026)
FACT: Compositional Kernel Synthesis with a Three-Stage Agentic Workflow
by: Heidari, Sina, et al.
Published: (2026)
Accelerating Machine Learning Queries with Linear Algebra Query Processing
by: Sun, Wenbo, et al.
Published: (2023)
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
by: Andrews, Martin, et al.
Published: (2025)
cuTeSpMM: Accelerating Sparse-Dense Matrix Multiplication using GPU Tensor Cores
by: Xiang, Lizhi, et al.
Published: (2025)
Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)
GraphMini: Accelerating Graph Pattern Matching Using Auxiliary Graphs
by: Liu, Juelin, et al.
Published: (2024)
Accelerating Gravitational $N$-Body Simulations Using the RISC-V-Based Tenstorrent Wormhole
by: Almerol, Jenny Lynn, et al.
Published: (2025)
Acceleration and energy consumption optimization in cascading classifiers for face detection on low-cost ARM big.LITTLE asymmetric architectures
by: Corpas, Alberto, et al.
Published: (2024)
Accelerating the Tesseract Decoder for Quantum Error Correction
by: Grbic, Dragana, et al.
Published: (2026)
A relação entre a «performance» social e a «performance» económico-financeira
by: Daniel Taborda
Published: (2007)
SENSEi: Input-Sensitive Compilation for Accelerating GNNs
by: Lenadora, Damitha, et al.
Published: (2023)