Staff View: KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization :: Library Catalog

Bibliographic Details
Main Authors:	Sun, Qitong; Han, Jun; Li, Tianlin; Tang, Zhe; Chen, Sheng; Yang, Fei; Liu, Aishan; Liu, Xianglong; Liu, Yang
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Multiagent Systems
Online Access:	https://arxiv.org/abs/2603.10085

_version_	1866912959990595584
author	Sun, Qitong; Han, Jun; Li, Tianlin; Tang, Zhe; Chen, Sheng; Yang, Fei; Liu, Aishan; Liu, Xianglong; Liu, Yang
author_facet	Sun, Qitong; Han, Jun; Li, Tianlin; Tang, Zhe; Chen, Sheng; Yang, Fei; Liu, Aishan; Liu, Xianglong; Liu, Yang
contents	Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines. Code is available at https://github.com/0satan0/KernelMem/.
format	Preprint
id	arxiv_https___arxiv_org_abs_2603_10085
institution	arXiv
publishDate	2026
record_format	arxiv
title	KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
topic	Machine Learning; Artificial Intelligence; Multiagent Systems
url	https://arxiv.org/abs/2603.10085
