| Main Authors: | Spaan, Jeffrey, Chen, Kuan-Hsun, Varbanescu, Ana-Lucia |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Online Access: | https://arxiv.org/abs/2601.08539 |
Similar Items
WCDT: Systematic WCET Optimization for Decision Tree Implementations
by: Hölscher, Nils, et al.
Published: (2025)
KernelBenchX: A Comprehensive Benchmark for Evaluating LLM-Generated GPU Kernels
by: Wang, Han, et al.
Published: (2026)
KernelBench: Can LLMs Write Efficient GPU Kernels?
by: Ouyang, Anne, et al.
Published: (2025)
AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)
InTreeger: An End-to-End Framework for Integer-Only Decision Tree Inference
by: Bart, Duncan, et al.
Published: (2025)
Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)
DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)
A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series
by: Beseda, Martin, et al.
Published: (2025)
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)
SAfEPaTh: A System-Level Approach for Efficient Power and Thermal Estimation of Convolutional Neural Network Accelerator
by: Chen, Yukai, et al.
Published: (2024)
Ragged Paged Attention: A High-Performance and Flexible LLM Inference Kernel for TPU
by: Jiang, Jevin, et al.
Published: (2026)
GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
by: Andrews, Martin, et al.
Published: (2025)
FLuRKA: Fast and accurate unified Low-Rank & Kernel Attention
by: Gupta, Ahan, et al.
Published: (2023)
Flex Attention: A Programming Model for Generating Optimized Attention Kernels
by: Dong, Juechu, et al.
Published: (2024)
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
by: Xu, Mingbin, et al.
Published: (2023)
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
by: Duan, Shukai, et al.
Published: (2024)
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference
by: Holmes, Connor, et al.
Published: (2024)
GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning
by: Wang, Jiaqi, et al.
Published: (2026)
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis
by: Werner, Elias, et al.
Published: (2023)
Accuracy and Consumption analysis from a compressed model by CompactifAI from Multiverse Computing
by: Fovet, Damien, et al.
Published: (2025)
Revisiting Forest Proximities via Sparse Leaf-Incidence Kernels
by: Aumon, Adrien, et al.
Published: (2026)
Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays
by: Swatman, Stephen Nicholas, et al.
Published: (2023)
Kevin: Multi-Turn RL for Generating CUDA Kernels
by: Baronio, Carlo, et al.
Published: (2025)
EXAQ: Exponent Aware Quantization For LLMs Acceleration
by: Shkolnik, Moran, et al.
Published: (2024)
LLMs for Analog Circuit Design Continuum (ACDC)
by: Esfandiari, Yasaman, et al.
Published: (2025)
Reducing Latency of LLM Search Agent via Speculation-based Algorithm-System Co-Design
by: Huang, Zixiao, et al.
Published: (2025)
PATCH: Learnable Tile-level Hybrid Sparsity for LLMs
by: Hourri, Younes, et al.
Published: (2025)
MultiKernelBench: A Multi-Platform Benchmark for Kernel Generation
by: Wen, Zhongzhen, et al.
Published: (2025)
A Theory of Inference Compute Scaling: Reasoning through Directed Stochastic Skill Search
by: Ellis-Mohr, Austin R., et al.
Published: (2025)
KernelEvolve: Scaling Agentic Kernel Coding for Heterogeneous AI Accelerators at Meta
by: Liao, Gang, et al.
Published: (2025)
Ghosted Layers: Unconstrained Activation Alignment for Recovering Layer-Pruned LLMs
by: Yun, Vincent-Daniel, et al.
Published: (2026)
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
by: Panigrahy, Deepak, et al.
Published: (2026)
MoEITS: A Green AI approach for simplifying MoE-LLMs
by: Balderas, Luis, et al.
Published: (2026)
OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction
by: Mozaffari, Mohammad, et al.
Published: (2025)
DF-GNN: Dynamic Fusion Framework for Attention Graph Neural Networks on GPUs
by: Liu, Jiahui, et al.
Published: (2024)
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
by: Wang, Haoxin, et al.
Published: (2025)
Hyperdimensional Computing for Sustainable Manufacturing: An Initial Assessment
by: Hoang, Danny, et al.
Published: (2025)
Anatomizing Deep Learning Inference in Web Browsers
by: Wang, Qipeng, et al.
Published: (2024)
DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing
by: Boudaoud, Afif, et al.
Published: (2025)
Application Research On Real-Time Perception Of Device Performance Status
by: Wang, Zhe, et al.
Published: (2024)