| Main Authors: | Fovet, Damien, Chamoli, Shashank, Oury, Sarah, Singhal, Srishti |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2507.08836 |
Similar Items
Cloud Computing Energy Consumption Prediction Based on Kernel Extreme Learning Machine Algorithm Improved by Vector Weighted Average Algorithm
by: Wang, Yuqing, et al.
Published: (2025)
CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
by: Tomut, Andrei, et al.
Published: (2024)
Hardware optimization on Android for inference of AI models
by: Gherasim, Iulius, et al.
Published: (2025)
Reducing Compute Waste in LLMs through Kernel-Level DVFS
by: Spaan, Jeffrey, et al.
Published: (2026)
Conformer-Based Speech Recognition On Extreme Edge-Computing Devices
by: Xu, Mingbin, et al.
Published: (2023)
A Structure-Aware Framework for Learning Device Placements on Computation Graphs
by: Duan, Shukai, et al.
Published: (2024)
Towards Computational Performance Engineering for Unsupervised Concept Drift Detection -- Complexities, Benchmarking, Performance Analysis
by: Werner, Elias, et al.
Published: (2023)
EARL: Energy-Aware Optimization of Liquid State Machines for Pervasive AI
by: Iqbal, Zain, et al.
Published: (2026)
Hardware-efficient tractable probabilistic inference for TinyML Neurosymbolic AI applications
by: Leslin, Jelin, et al.
Published: (2025)
DistZO2: High-Throughput and Memory-Efficient Zeroth-Order Fine-tuning LLMs with Distributed Parallel Computing
by: Wang, Liangyu, et al.
Published: (2025)
Ensuring Reliability of Curated EHR-Derived Data: The Validation of Accuracy for LLM/ML-Extracted Information and Data (VALID) Framework
by: Estevez, Melissa, et al.
Published: (2025)
Throughput Optimization as a Strategic Lever in Large-Scale AI Systems: Evidence from Dataloader and Memory Profiling Innovations
by: Jha, Mayank
Published: (2026)
MicroHD: An Accuracy-Driven Optimization of Hyperdimensional Computing Algorithms for TinyML systems
by: Ponzina, Flavio, et al.
Published: (2024)
Who Wins the Race? (R Vs Python) - An Exploratory Study on Energy Consumption of Machine Learning Algorithms
by: Chattaraj, Rajrupa, et al.
Published: (2025)
IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency
by: Ghafouri, Saeid, et al.
Published: (2023)
Towards A Flexible Accuracy-Oriented Deep Learning Module Inference Latency Prediction Framework for Adaptive Optimization Algorithms
by: Shen, Jingran, et al.
Published: (2023)
On the Sustainability of AI Inferences in the Edge
by: Sobhani, Ghazal, et al.
Published: (2025)
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU
by: McInroe, Trevor, et al.
Published: (2025)
Efficient Graph Knowledge Distillation from GNNs to Kolmogorov--Arnold Networks via Self-Attention Dynamic Sampling
by: Cui, Can, et al.
Published: (2025)
Optimizing Methane Detection On Board Satellites: Speed, Accuracy, and Low-Power Solutions for Resource-Constrained Hardware
by: Herec, Jonáš, et al.
Published: (2025)
The Race to Efficiency: A New Perspective on AI Scaling Laws
by: Lu, Chien-Ping
Published: (2025)
Pushing the Envelope of LLM Inference on AI-PC and Intel GPUs
by: Georganas, Evangelos, et al.
Published: (2025)
DaCe AD: Unifying High-Performance Automatic Differentiation for Machine Learning and Scientific Computing
by: Boudaoud, Afif, et al.
Published: (2025)
Parallel Implementations Assessment of a Spatial-Spectral Classifier for Hyperspectral Clinical Applications
by: Lazcano, Raquel, et al.
Published: (2024)
A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
by: Flavin, Timothy, et al.
Published: (2026)
MoEITS: A Green AI approach for simplifying MoE-LLMs
by: Balderas, Luis, et al.
Published: (2026)
Knowledge Grafting: A Mechanism for Optimizing AI Model Deployment in Resource-Constrained Environments
by: Almurshed, Osama, et al.
Published: (2025)
Energy per Successful Goal: Goal-Level Energy Accounting for Agentic AI Systems
by: Panigrahy, Deepak, et al.
Published: (2026)
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
by: Barad, Haim, et al.
Published: (2023)
Design Space Exploration of Approximate Computing Techniques with a Reinforcement Learning Approach
by: Saeedi, Sepide, et al.
Published: (2023)
CPINN-ABPI: Physics-Informed Neural Networks for Accurate Power Estimation in MPSoCs
by: Elshamy, Mohamed R., et al.
Published: (2025)
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
by: An, Zihao, et al.
Published: (2025)
Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants
by: You, Bozhi, et al.
Published: (2025)
Enhancing Tropical Cyclone Path Forecasting with an Improved Transformer Network
by: Van Thanh, Nguyen, et al.
Published: (2025)
lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models
by: Wang, Haoxin, et al.
Published: (2025)
WCDT: Systematic WCET Optimization for Decision Tree Implementations
by: Hölscher, Nils, et al.
Published: (2025)
MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
by: Chitty-Venkata, Krishna Teja, et al.
Published: (2025)
MLPerf Automotive
by: Shojaei, Radoyeh, et al.
Published: (2025)
Light Differentiable Logic Gate Networks
by: Rüttgers, Lukas, et al.
Published: (2025)
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms
by: Rodrigo, Javier J. Poveda, et al.
Published: (2025)