| Main Authors: | Coleman, Tainã; Ahmed, Hena; Shende, Ravi; Perez, Ismael; Altintaş, Ïlkay |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2506.13730 |
Similar Items
Ksurf-Drone: Attention Kalman Filter for Contextual Bandit Optimization in Cloud Resource Allocation
by: Dang'ana, Michael, et al.
Published: (2025)
Hardware-Aware Reformulation of Convolutions for Efficient Execution on Specialized AI Hardware: A Case Study on NVIDIA Tensor Cores
by: Bikshandi, Ganesh
Published: (2026)
Online GPU Energy Optimization with Switching-Aware Bandits
by: Xu, Xiongxiao, et al.
Published: (2024)
The Case for Co-Designing Model Architectures with Hardware
by: Anthony, Quentin, et al.
Published: (2024)
SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization
by: Tschand, Arya, et al.
Published: (2025)
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
by: An, Wei, et al.
Published: (2024)
LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure
by: Cho, Jaehong, et al.
Published: (2025)
Hardware Utilization and Inference Performance of Edge Object Detection Under Fault Injection
by: Pasandideh, Faezeh, et al.
Published: (2026)
Viability and Performance of a Private LLM Server for SMBs: A Benchmark Analysis of Qwen3-30B on Consumer-Grade Hardware
by: Khalil, Alex, et al.
Published: (2025)
GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data
by: Jia, Wenqi, et al.
Published: (2024)
MoE-Lens: Towards the Hardware Limit of High-Throughput MoE LLM Serving Under Resource Constraints
by: Yuan, Yichao, et al.
Published: (2025)
EdgeProfiler: A Fast Profiling Framework for Lightweight LLMs on Edge Using Analytical Model
by: Pinnock, Alyssa, et al.
Published: (2025)
Cascading Bandits With Feedback
by: Prakash, R Sri, et al.
Published: (2025)
Deploying Atmospheric and Oceanic AI Models on Chinese Hardware and Framework: Migration Strategies, Performance Optimization and Analysis
by: Sun, Yuze, et al.
Published: (2025)
Resilient Byzantine Agreement with Predictions
by: Dallot, Julien, et al.
Published: (2026)
On the Impact of White-box Deployment Strategies for Edge AI on Latency and Model Performance
by: Singh, Jaskirat, et al.
Published: (2024)
PRAGMA: A Profiling-Reasoned Multi-Agent Framework for Automatic Kernel Optimization
by: Lei, Kelun, et al.
Published: (2025)
ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks
by: Shi, Ziji, et al.
Published: (2024)
Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training
by: Tan, Wenting, et al.
Published: (2023)
FedDCT: A Dynamic Cross-Tier Federated Learning Framework in Wireless Networks
by: Xian, Youquan, et al.
Published: (2023)
AI Inference as Relocatable Electricity Demand: A Latency-Constrained Energy-Geography Framework
by: Luo, Xubin, et al.
Published: (2026)
Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies
by: Mehta, Deep Pankajbhai
Published: (2026)
A Scheduling Framework for Efficient MoE Inference on Edge GPU-NDP Systems
by: Wu, Qi, et al.
Published: (2026)
KVComp: A High-Performance, LLM-Aware, Lossy Compression Framework for KV Cache
by: Jiang, Bo, et al.
Published: (2025)
Binary Bleed: Fast Distributed and Parallel Method for Automatic Model Selection
by: Barron, Ryan, et al.
Published: (2024)
A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search
by: Futuhi, Ehsan, et al.
Published: (2025)
GPU-Virt-Bench: A Comprehensive Benchmarking Framework for Software-Based GPU Virtualization Systems
by: VG, Jithin, et al.
Published: (2025)
AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators
by: Jiang, Hua, et al.
Published: (2026)
Artificial Intelligence for Cost-Aware Resource Prediction in Big Data Pipelines
by: Goyal, Harshit
Published: (2025)
Block: Balancing Load in LLM Serving with Context, Knowledge and Predictive Scheduling
by: Da, Wei, et al.
Published: (2025)
KAITIAN: A Unified Communication Framework for Enabling Efficient Collaboration Across Heterogeneous Accelerators in Embodied AI Systems
by: Lin, Jieke, et al.
Published: (2025)
A Nonlinear Hash-based Optimization Method for SpMV on GPUs
by: Yan, Chen, et al.
Published: (2025)
A Blockchain and Artificial Intelligence based System for Halal Food Traceability
by: Alourani, Abdulla, et al.
Published: (2024)
Towards Verifiable Federated Unlearning: Framework, Challenges, and The Road Ahead
by: Nguyen, Thanh Linh, et al.
Published: (2025)
A Multi-Armed Bandit-Based Participant Selection Method for Federated Recommendation Systems
by: Liu, Jintao, et al.
Published: (2025)
Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis
by: Shi, Jiabo, et al.
Published: (2025)
Using Sequential Runtime Distributions for the Parallel Speedup Prediction of SAT Local Search
by: Arbelaez, Alejandro, et al.
Published: (2024)
A Survey on Large Language Model Acceleration based on KV Cache Management
by: Li, Haoyang, et al.
Published: (2024)
Cooperative Cognitive Dynamic System in UAV Swarms: Reconfigurable Mechanism and Framework
by: Jia, Ziye, et al.
Published: (2024)
Towards Carbon-Aware Container Orchestration: Predicting Workload Energy Consumption with Federated Learning
by: Saad, Zainab, et al.
Published: (2025)