| Main Authors: | Iqbal, Zain, Valerio, Lorenzo |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.05205 |
Similar Items
GreenServ: Energy-Efficient Context-Aware Dynamic Routing for Multi-Model LLM Inference
by: Ziller, Thomas, et al.
Published: (2026)
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache
by: Xue, Leyang, et al.
Published: (2024)
Energy-Aware LLMs: A step towards sustainable AI for downstream applications
by: Tran, Nguyen Phuc, et al.
Published: (2025)
Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
by: Panigrahy, Deepak, et al.
Published: (2026)
KForge: Program Synthesis for Diverse AI Hardware Accelerators
by: Sereda, Taras, et al.
Published: (2025)
Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification
by: Kermani, Arshia, et al.
Published: (2025)
ALERT: Accurate Learning for Energy and Timeliness
by: Wan, Chengcheng, et al.
Published: (2019)
Enhancing Energy-Awareness in Deep Learning through Fine-Grained Energy Measurement
by: Rajput, Saurabhsingh, et al.
Published: (2023)
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
by: Duan, Shukai, et al.
Published: (2024)
Hardware optimization on Android for inference of AI models
by: Gherasim, Iulius, et al.
Published: (2025)
PrETi: Predicting Execution Time in Early Stage with LLVM and Machine Learning
by: Xu, Risheng, et al.
Published: (2025)
One Size Does Not Fit All: Architecture-Aware Adaptive Batch Scheduling with DEBA
by: Belias, François, et al.
Published: (2025)
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)
Greener Deep Reinforcement Learning: Analysis of Energy and Carbon Efficiency Across Atari Benchmarks
by: Gardner, Jason, et al.
Published: (2025)
WCDT: Systematic WCET Optimization for Decision Tree Implementations
by: Hölscher, Nils, et al.
Published: (2025)
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
by: Zhang, Yijia, et al.
Published: (2024)
LiveTune: Dynamic Parameter Tuning for Feedback-Driven Optimization
by: Shabgahi, Soheil Zibakhsh, et al.
Published: (2023)
AutoSAGE: Input-Aware CUDA Scheduling for Sparse GNN Aggregation (SpMM/SDDMM) and CSR Attention
by: Stankovic, Aleksandar
Published: (2025)
A Scalable k-Medoids Clustering via Whale Optimization Algorithm
by: Chenan, Huang, et al.
Published: (2024)
Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications
by: Leslin, Jelin, et al.
Published: (2025)
A Kernel-Based Approach for Accurate Steady-State Detection in Performance Time Series
by: Beseda, Martin, et al.
Published: (2025)
Feature Optimization for Time Series Forecasting via Novel Randomized Uphill Climbing
by: Van Thanh, Nguyen
Published: (2025)
AutoKernel: Autonomous GPU Kernel Optimization via Iterative Agent-Driven Search
by: Jaber, Jaber, et al.
Published: (2026)
Accuracy and Consumption analysis from a compressed model by CompactifAI from Multiverse Computing
by: Fovet, Damien, et al.
Published: (2025)
cedar: Optimized and Unified Machine Learning Input Data Pipelines
by: Zhao, Mark, et al.
Published: (2024)
Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)
An Autotuning-based Optimization Framework for Mixed-kernel SVM Classifications in Smart Pixel Datasets and Heterojunction Transistors
by: Wu, Xingfu, et al.
Published: (2024)
Machine Learning Models for Reinforced Concrete Pipes Condition Prediction: The State-of-the-Art Using Artificial Neural Networks and Multiple Linear Regression in a Wisconsin Case Study
by: Mohammadagha, Mohsen, et al.
Published: (2025)
Who Wins the Race? (R Vs Python) - An Exploratory Study on Energy Consumption of Machine Learning Algorithms
by: Chattaraj, Rajrupa, et al.
Published: (2025)
EXAQ: Exponent Aware Quantization For LLMs Acceleration
by: Shkolnik, Moran, et al.
Published: (2024)
Risk-Aware Batch Testing for Performance Regression Detection
by: Sayedsalehi, Ali, et al.
Published: (2026)
On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)
A2Q+: Improving Accumulator-Aware Weight Quantization
by: Colbert, Ian, et al.
Published: (2024)
ASPO: Constraint-Aware Bayesian Optimization for FPGA-based Soft Processors
by: Wu, Haoran, et al.
Published: (2025)
Machine Learning Methods for Evaluating Public Crisis: Meta-Analysis
by: Okpala, Izunna, et al.
Published: (2023)
Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)
ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching
by: Zhao, Youpeng, et al.
Published: (2024)
Offline Reinforcement-Learning-Based Power Control for Application-Agnostic Energy Efficiency
by: Raj, Akhilesh, et al.
Published: (2026)