:: Library Catalog

Bibliographic Details
Main Authors:	Iqbal, Zain; Valerio, Lorenzo
Format:	Preprint
Published:	2026
Subjects:	Machine Learning Performance
Online Access:	https://arxiv.org/abs/2601.05205

Similar Items

GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
by: Ziller, Thomas, et al.
Published: (2026)

MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)

Energy-Aware LLMs: A step towards sustainable AI for downstream applications
by: Tran, Nguyen Phuc, et al.
Published: (2025)

Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)

Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
by: Panigrahy, Deepak, et al.
Published: (2026)

KForge: Program Synthesis for Diverse AI Hardware Accelerators
by: Sereda, Taras, et al.
Published: (2025)

Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
by: Kermani, Arshia, et al.
Published: (2025)

ALERT: Accurate Learning for Energy and Timeliness
by: Wan, Chengcheng, et al.
Published: (2019)

Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement
by: Rajput, Saurabhsingh, et al.
Published: (2023)

A Structure-Aware Framework for Learning Device Placements on Computation Graphs
by: Duan, Shukai, et al.
Published: (2024)

Hardware optimization on Android for inference of AI models
by: Gherasim, Iulius, et al.
Published: (2025)

PrETi: Predicting Execution Time in Early Stage with LLVM and Machine Learning
by: Xu, Risheng, et al.
Published: (2025)

One Size Does Not Fit All: Architecture-Aware Adaptive Batch Scheduling with DEBA
by: Belias, François, et al.
Published: (2025)

Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)

Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks
by: Gardner, Jason, et al.
Published: (2025)

WCDT: Systematic WCET Optimization for Decision Tree Implementations
by: Hölscher, Nils, et al.
Published: (2025)

Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)

Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)

LiveTune: Dynamic Parameter Tuning for Feedback-Driven Optimization
by: Shabgahi, Soheil Zibakhsh, et al.
Published: (2023)

AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention
by: Stankovic, Aleksandar
Published: (2025)

A Scalable k-Medoids Clustering via Whale Optimization Algorithm
by: Chenan, Huang, et al.
Published: (2024)

Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications
by: Leslin, Jelin, et al.
Published: (2025)

A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series
by: Beseda, Martin, et al.
Published: (2025)

Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing
by: Van Thanh, Nguyen
Published: (2025)

AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)

Accuracy and Consumption analysis from a compressed model by CompactifAI from Multiverse Computing
by: Fovet, Damien, et al.
Published: (2025)

cedar: Optimized and Unified Machine Learning Input Data Pipelines
by: Zhao, Mark, et al.
Published: (2024)

Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)

An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors
by: Wu, Xingfu, et al.
Published: (2024)

Machine Learning Models for Reinforced Concrete Pipes Condition Prediction: The State-of-the-Art Using Artificial Neural Networks and Multiple Linear Regression in a Wisconsin Case Study
by: Mohammadagha, Mohsen, et al.
Published: (2025)

Who Wins the Race? (R Vs Python) - An Exploratory Study on Energy Consumption of Machine Learning Algorithms
by: Chattaraj, Rajrupa, et al.
Published: (2025)

EXAQ: Exponent Aware Quantization For LLMs Acceleration
by: Shkolnik, Moran, et al.
Published: (2024)

Risk-Aware Batch Testing for Performance Regression Detection
by: Sayedsalehi, Ali, et al.
Published: (2026)

On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)

A2Q+: Improving Accumulator-Aware Weight Quantization
by: Colbert, Ian, et al.
Published: (2024)

ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors
by: Wu, Haoran, et al.
Published: (2025)

Machine Learning Methods for Evaluating Public Crisis: Meta-Analysis
by: Okpala, Izunna, et al.
Published: (2023)

Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)

ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)

Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency
by: Raj, Akhilesh, et al.
Published: (2026)