| Main Authors: | Zhang, Yijia, Gou, Zhihong, Cao, Shijie, Feng, Weigang, Zhang, Sicheng, Dai, Guohao, Xu, Ningyi |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2411.18873 |
Similar Items
DSO: A GPU Energy Efficiency Optimizer by Fusing Dynamic and Static Information
by: Wang, Qiang, et al.
Published: (2024)
WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
by: Zhang, Kaixuan, et al.
Published: (2026)
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)
AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)
KernelBench: Can LLMs Write Efficient GPU Kernels?
by: Ouyang, Anne, et al.
Published: (2025)
FlipFlop: A Static Analysis-based Energy Optimization Framework for GPU Kernels
by: Rajput, Saurabhsingh, et al.
Published: (2026)
oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation
by: Li, Jianhui, et al.
Published: (2023)
Insum: Sparse GPU Kernels Simplified and Optimized with Indirect Einsums
by: Won, Jaeyeon, et al.
Published: (2025)
Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search
by: Nichols, Daniel, et al.
Published: (2026)
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
by: Xu, Mingbin, et al.
Published: (2023)
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
by: Andrews, Martin, et al.
Published: (2025)
KEET: Explaining Performance of GPU Kernels Using LLM Agents
by: Davis, Joshua H., et al.
Published: (2026)
PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction
by: Zhang, Kaixuan, et al.
Published: (2026)
Integrating Performance Tools in Model Reasoning for GPU Kernel Optimization
by: Nichols, Daniel, et al.
Published: (2025)
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
by: Huang, Zixiao, et al.
Published: (2025)
FastDecode: High-Throughput GPU-Efficient LLM Serving using Heterogeneous Pipelines
by: He, Jiaao, et al.
Published: (2024)
Efficient GPU implementation of randomized SVD and its applications
by: Struski, Łukasz, et al.
Published: (2021)
The Energy Cost of Execution-Idle in GPU Clusters
by: Lei, Yiran, et al.
Published: (2026)
A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series
by: Beseda, Martin, et al.
Published: (2025)
Efficient GPU-Centered Singular Value Decomposition Using the Divide-and-Conquer Method
by: Liu, Shifang, et al.
Published: (2025)
GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning
by: Wang, Jiaqi, et al.
Published: (2026)
An Empirical Study on the Performance and Energy Usage of Compiled Python Code
by: Stoico, Vincenzo, et al.
Published: (2025)
On Combining Two Server Control Policies for Energy Efficiency
by: Dai, Jingze, et al.
Published: (2025)
HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing
by: Lin, Mao, et al.
Published: (2026)
MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation
by: Wen, Zhongzhen, et al.
Published: (2025)
Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)
CAPSim: A Fast CPU Performance Simulator Using Attention-based Predictor
by: Xu, Buqing, et al.
Published: (2025)
FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
by: Gupta, Ahan, et al.
Published: (2023)
DRIM-ANN: An Approximate Nearest Neighbor Search Engine based on Commercial DRAM-PIMs
by: Chen, Mingkai, et al.
Published: (2024)
Fast and Scalable Mixed Precision Euclidean Distance Calculations Using GPU Tensor Cores
by: Curless, Brian, et al.
Published: (2025)
Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)
DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures
by: Yang, Peiming, et al.
Published: (2025)
AMD MI300X GPU Performance Analysis
by: Ambati, Chandrish, et al.
Published: (2025)
CuTeGen: An LLM-Based Agentic Framework for Generation and Optimization of High-Performance GPU Kernels using CuTe
by: Saba, Tara, et al.
Published: (2026)
Prism: Unleashing GPU Sharing for Cost-Efficient Multi-LLM Serving
by: Yu, Shan, et al.
Published: (2025)
gDist: Efficient Distance Computation between 3D Meshes on GPU
by: Fang, Peng, et al.
Published: (2024)
Canvas: End-to-End Kernel Architecture Search in Neural Networks
by: Zhao, Chenggang, et al.
Published: (2023)
Disaggregated Design for GPU-Based Volumetric Data Structures
by: Meneghin, Massimiliano, et al.
Published: (2025)
Efficient allocation of image recognition and LLM tasks on multi-GPU system
by: Lawenda, Marcin, et al.
Published: (2025)
CCSS: Hardware-Accelerated RTL Simulation with Fast Combinational Logic Computing and Sequential Logic Synchronization
by: Feng, Weigang, et al.
Published: (2025)