Bibliographic Details
Main Authors:	Sun, Qitong; Han, Jun; Li, Tianlin; Tang, Zhe; Chen, Sheng; Yang, Fei; Liu, Aishan; Liu, Xianglong; Liu, Yang
Format:	Preprint
Published:	2026
Subjects:	Machine Learning; Artificial Intelligence; Multiagent Systems
Online Access:	https://arxiv.org/abs/2603.10085

Table of Contents:

Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines. Code is available at https://github.com/0satan0/KernelMem/.
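The dual-level memory described above can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the skill names, and the bottleneck labels are all hypothetical, and the selection rule (first matching skill not yet tried) is an assumption chosen only to show how a long-term skill library and a short-term record of attempts might interact.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    """A reusable optimization skill (hypothetical schema, not from the paper)."""
    name: str
    applies_to: frozenset  # bottleneck labels this skill is meant to address


@dataclass
class DualLevelMemory:
    """Toy dual-level memory: a shared skill library plus a per-task attempt record."""
    long_term: list                               # skills reused across tasks
    short_term: set = field(default_factory=set)  # skill names already tried on this task

    def next_skill(self, bottleneck):
        """Return the first matching skill not yet tried on this task, or None."""
        for skill in self.long_term:
            if bottleneck in skill.applies_to and skill.name not in self.short_term:
                self.short_term.add(skill.name)  # remember the attempt to avoid repeats
                return skill
        return None

    def reset_task(self):
        """Clear short-term memory when a new kernel task starts."""
        self.short_term.clear()


# Hypothetical skill library; the names are illustrative assumptions.
library = [
    Skill("tile_shared_memory", frozenset({"memory_bound"})),
    Skill("vectorized_loads", frozenset({"memory_bound"})),
    Skill("fuse_elementwise", frozenset({"launch_overhead"})),
]
memory = DualLevelMemory(long_term=library)
first = memory.next_skill("memory_bound")
second = memory.next_skill("memory_bound")
third = memory.next_skill("memory_bound")  # both matching skills were already tried
```

In this sketch the short-term set is what keeps the same skill from being proposed twice for one kernel, while the long-term list survives `reset_task()` and is reused on the next task.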
