Bibliographic Details
Main Authors: Bondhugula, Uday; Baviskar, Akshay; Katel, Navdeep; Patel, Vimal; JS, Anoop; Dutta, Arnab
Format: Preprint
Published: 2026
Subjects: Programming Languages; Machine Learning
Online Access: https://arxiv.org/abs/2603.06731
author Bondhugula, Uday
Baviskar, Akshay
Katel, Navdeep
Patel, Vimal
JS, Anoop
Dutta, Arnab
contents We present the design and implementation of PolyBlocks, a modular and reusable MLIR-based compiler infrastructure for AI programming frameworks and AI chips. PolyBlocks is based on pass pipelines that compose transformations on loop nests and SSA, primarily relying on lightweight affine access analysis; the transformations are stitched together in specialized ways to realize high-performance code automatically by the use of analytical cost models and heuristics. The optimizations in these passes include multi-level tiling, fusion, on-chip scratchpad usage, mapping matmuls and convolutions to matrix units, fusing the attention layer, and several other transformations for parallelism and locality. They have been developed in a way that makes it easy to build PolyBlocks-based compilers to target new chips, reusing much of the infrastructure. PolyBlocks' design and architecture enable fully automatic code generation from high-level frameworks to low-level target-specific intrinsics. Experimental results from evaluating PolyBlocks-powered just-in-time compilation for PyTorch and JAX targeting NVIDIA GPUs show that it is able to match or outperform Torch Inductor and XLA in several cases, although the latter rely on a combination of vendor libraries and code generation. For individual operators like matmuls and convolutions, PolyBlocks-generated code is competitive with the best vendor-tuned libraries or hand-written kernels.
format Preprint
id arxiv_https___arxiv_org_abs_2603_06731
institution arXiv
publishDate 2026
record_format arxiv
title PolyBlocks: A Compiler Infrastructure for AI Chips and Programming Frameworks
topic Programming Languages
Machine Learning
url https://arxiv.org/abs/2603.06731