Library Catalog

Bibliographic Details
Main Authors:	Han, Yuxuan; Guo, Meng-Hao; Liu, Zhengning; Chen, Wenguang; Hu, Shi-Min
Format:	Preprint
Published:	2026
Subjects:	Machine Learning
Online Access:	https://arxiv.org/abs/2603.07169

Similar Items

CUDA-LLM: LLMs Can Write Efficient CUDA Kernels
by: Chen, Wentao, et al.
Published: (2025)

CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs
by: Li, Shiyang, et al.
Published: (2026)

CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
by: Dai, Weinan, et al.
Published: (2026)

Kevin: Multi-Turn RL for Generating CUDA Kernels
by: Baronio, Carlo, et al.
Published: (2025)

Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
by: Lange, Robert Tjarko, et al.
Published: (2025)

OptiMind: Teaching LLMs to Think Like Optimization Experts
by: Zhang, Xinzhi, et al.
Published: (2025)

EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models
by: Guo, Ping, et al.
Published: (2025)

Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction
by: Wu, Ziyang, et al.
Published: (2024)

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
by: Bai, Haolei, et al.
Published: (2026)

CUDABench: Benchmarking LLMs for Text-to-CUDA Generation
by: Zhu, Jiace, et al.
Published: (2026)

CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
by: Zhang, Zijian, et al.
Published: (2025)

KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning
by: Dong, Kris Shengjun, et al.
Published: (2026)

TiledAttention: a CUDA Tile SDPA Kernel for PyTorch
by: Khan, Taimur
Published: (2026)

OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
by: Bhattacharjee, Arijit, et al.
Published: (2026)

Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
by: Li, Lujun, et al.
Published: (2025)

KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
by: Sun, Qitong, et al.
Published: (2026)

KernelBand: Steering LLM-based Kernel Optimization via Hardware-Aware Multi-Armed Bandits
by: Ran, Dezhi, et al.
Published: (2025)

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
by: Li, Xiaoya, et al.
Published: (2025)

Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)

Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts
by: Huang, Zixuan, et al.
Published: (2026)

LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)

From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
by: Gong, Junfeng, et al.
Published: (2025)

Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs
by: Chen, Hao Mark, et al.
Published: (2026)

PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making
by: Light, Jonathan, et al.
Published: (2024)

Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization
by: Yu, Jiajun, et al.
Published: (2025)

MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation
by: Jiang, Wenzhao, et al.
Published: (2026)

KernelBench: Can LLMs Write Efficient GPU Kernels?
by: Ouyang, Anne, et al.
Published: (2025)

MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffic-Aware Parallel Optimization
by: Guo, Jingming, et al.
Published: (2024)

MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
by: Zhang, Geng, et al.
Published: (2025)

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)

Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach
by: Xu, Weichao, et al.
Published: (2024)

Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
by: Du, He, et al.
Published: (2026)

Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know
by: Li, Albus Yizhuo
Published: (2025)

MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
by: Zou, Xingze, et al.
Published: (2026)

AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks
by: Zhang, Zhongyue, et al.
Published: (2026)

Exploring the Noise Robustness of Online Conformal Prediction
by: Xi, Huajun, et al.
Published: (2025)

Lightweight Gaussian Process Inference in C++ on Metal and CUDA
by: Fang, Yu-Hsueh
Published: (2026)

Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
by: Yang, Hongming, et al.
Published: (2025)

Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)

Towards Automated Kernel Generation in the Era of LLMs
by: Yu, Yang, et al.
Published: (2026)