| Main Authors: | Han, Yuxuan, Guo, Meng-Hao, Liu, Zhengning, Chen, Wenguang, Hu, Shi-Min |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2603.07169 |
Similar Items
CUDA-LLM: LLMs Can Write Efficient CUDA Kernels
by: Chen, Wentao, et al.
Published: (2025)
CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs
by: Li, Shiyang, et al.
Published: (2026)
CUDA Agent: Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
by: Dai, Weinan, et al.
Published: (2026)
Kevin: Multi-Turn RL for Generating CUDA Kernels
by: Baronio, Carlo, et al.
Published: (2025)
Towards Robust Agentic CUDA Kernel Benchmarking, Verification, and Optimization
by: Lange, Robert Tjarko, et al.
Published: (2025)
OptiMind: Teaching LLMs to Think Like Optimization Experts
by: Zhang, Xinzhi, et al.
Published: (2025)
EvoEngineer: Mastering Automated CUDA Kernel Code Evolution with Large Language Models
by: Guo, Ping, et al.
Published: (2025)
Spatial-Temporal Mixture-of-Graph-Experts for Multi-Type Crime Prediction
by: Wu, Ziyang, et al.
Published: (2024)
DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
by: Bai, Haolei, et al.
Published: (2026)
CUDABench: Benchmarking LLMs for Text-to-CUDA Generation
by: Zhu, Jiace, et al.
Published: (2026)
CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
by: Zhang, Zijian, et al.
Published: (2025)
KernelBlaster: Continual Cross-Task CUDA Optimization via Memory-Augmented In-Context Reinforcement Learning
by: Dong, Kris Shengjun, et al.
Published: (2026)
TiledAttention: a CUDA Tile SDPA Kernel for PyTorch
by: Khan, Taimur
Published: (2026)
OptiML: An End-to-End Framework for Program Synthesis and CUDA Kernel Optimization
by: Bhattacharjee, Arijit, et al.
Published: (2026)
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging
by: Li, Lujun, et al.
Published: (2025)
KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization
by: Sun, Qitong, et al.
Published: (2026)
KernelBand: Steering LLM-based Kernel Optimization via Hardware-Aware Multi-Armed Bandits
by: Ran, Dezhi, et al.
Published: (2025)
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
by: Li, Xiaoya, et al.
Published: (2025)
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
by: Guo, Zhenyu, et al.
Published: (2025)
Kernel Foundry: A Diagnosis-driven Evolutionary Kernel Optimizer with Multi-Experts
by: Huang, Zixuan, et al.
Published: (2026)
LightMoE: Reducing Mixture-of-Experts Redundancy through Expert Replacing
by: Hao, Jiawei, et al.
Published: (2026)
From Large to Small: Transferring CUDA Optimization Expertise via Reasoning Graph
by: Gong, Junfeng, et al.
Published: (2025)
Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs
by: Chen, Hao Mark, et al.
Published: (2026)
PIANIST: Learning Partially Observable World Models with LLMs for Multi-Agent Decision Making
by: Light, Jonathan, et al.
Published: (2024)
Collaborative Expert LLMs Guided Multi-Objective Molecular Optimization
by: Yu, Jiajun, et al.
Published: (2025)
MixTTE: Multi-Level Mixture-of-Experts for Scalable and Adaptive Travel Time Estimation
by: Jiang, Wenzhao, et al.
Published: (2026)
KernelBench: Can LLMs Write Efficient GPU Kernels?
by: Ouyang, Anne, et al.
Published: (2025)
MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization
by: Guo, Jingming, et al.
Published: (2024)
MoNE: Replacing Redundant Experts with Lightweight Novices for Structured Pruning of MoE
by: Zhang, Geng, et al.
Published: (2025)
SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs?
by: Zhuang, Haomin, et al.
Published: (2024)
Exploring Critical Testing Scenarios for Decision-Making Policies: An LLM Approach
by: Xu, Weichao, et al.
Published: (2024)
Kernel-Smith: A Unified Recipe for Evolutionary Kernel Optimization
by: Du, He, et al.
Published: (2026)
Bayesian Mixture-of-Experts: Towards Making LLMs Know What They Don't Know
by: Li, Albus Yizhuo
Published: (2025)
MobileKernelBench: Can LLMs Write Efficient Kernels for Mobile Devices?
by: Zou, Xingze, et al.
Published: (2026)
AdaKernel: Learning Adaptive Kernel Parameters for Spatiotemporal Graph Neural Networks
by: Zhang, Zhongyue, et al.
Published: (2026)
Exploring the Noise Robustness of Online Conformal Prediction
by: Xi, Huajun, et al.
Published: (2025)
Lightweight Gaussian Process Inference in C++ on Metal and CUDA
by: Fang, Yu-Hsueh
Published: (2026)
Direct Behavior Optimization: Unlocking the Potential of Lightweight LLMs
by: Yang, Hongming, et al.
Published: (2025)
Improving Variable-Length Generation in Diffusion Language Models via Length Regularization
by: Cheng, Zicong, et al.
Published: (2026)
Towards Automated Kernel Generation in the Era of LLMs
by: Yu, Yang, et al.
Published: (2026)