:: Library Catalog

Bibliographic Details
Main Authors:	Panopoulos, Ioannis; Venieris, Stylianos I.; Venieris, Iakovos S.
Format:	Preprint
Published:	2024
Subjects:	Machine Learning; Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2409.01089

Similar Items

MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
by: Nikolaidis, Sokratis, et al.
Published: (2024)

RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices
by: Karatzas, Andreas, et al.
Published: (2024)

FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)

Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
by: Siavashi, Mohammad, et al.
Published: (2025)

Practical Performance Guarantees for Pipelined DNN Inference
by: Archer, Aaron, et al.
Published: (2023)

DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs
by: Zhu, Zeyu, et al.
Published: (2026)

Efficient Unified Caching for Accelerating Heterogeneous AI Workloads
by: Wang, Tianze, et al.
Published: (2025)

A Survey on Collaborative DNN Inference for Edge Intelligence
by: Ren, Weiqing, et al.
Published: (2022)

Multi-DNN Inference of Sparse Models on Edge SoCs
by: Luo, Jiawei, et al.
Published: (2026)

Learning the Optimal Path and DNN Partition for Collaborative Edge Inference
by: Huang, Yin, et al.
Published: (2024)

HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms
by: Taufique, Zain, et al.
Published: (2024)

ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
by: Oh, Hyungjun, et al.
Published: (2024)

Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC
by: Wei, Xinming, et al.
Published: (2025)

PracMHBench: Re-evaluating Model-Heterogeneous Federated Learning Based on Practical Edge Device Constraints
by: Guo, Yuanchun, et al.
Published: (2025)

SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget
by: Wang, Kun, et al.
Published: (2024)

Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
by: Prabhakar, Rohan Baskar, et al.
Published: (2024)

CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
by: Xu, Guanyu, et al.
Published: (2025)

Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)

AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices
by: Lin, Zheng, et al.
Published: (2024)

Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device
by: Tayal, Mumuksh, et al.
Published: (2025)

Constraint-Aware Execution Planning for Hybrid Space-Ground Compute Workloads
by: Mitra, Subhadip
Published: (2026)

FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler
by: Li, Zilinghan, et al.
Published: (2023)

The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)

NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN
by: Xie, Jianhang, et al.
Published: (2025)

MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing
by: Su, Zhaoyuan, et al.
Published: (2025)

Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models
by: Cho, Yae Jee, et al.
Published: (2024)

Metadata-Guided Adaptable Frequency Scaling across Heterogeneous Applications and Devices
by: Yan, Jinqi, et al.
Published: (2025)

Lumos: Heterogeneity-aware Federated Graph Learning over Decentralized Devices
by: Pan, Qiying, et al.
Published: (2023)

Tackling Intertwined Data and Device Heterogeneities in Federated Learning with Unlimited Staleness
by: Wang, Haoming, et al.
Published: (2023)

PaSE: Parallelization Strategies for Efficient DNN Training
by: Elango, Venmugil
Published: (2024)

Adaptive Stream Processing on Edge Devices through Active Inference
by: Sedlak, Boris, et al.
Published: (2024)

LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models
by: Won, William, et al.
Published: (2021)

Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time
by: Nan, Zhaojun, et al.
Published: (2025)

Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
by: Zhang, Haolin, et al.
Published: (2025)

CAFL-L: Constraint-Aware Federated Learning with Lagrangian Dual Optimization for On-Device Language Models
by: Zheng, Dongqi, et al.
Published: (2025)

Ecomap: Sustainability-Driven Optimization of Multi-Tenant DNN Execution on Edge Servers
by: Paramanayakam, Varatheepan, et al.
Published: (2025)

Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)

Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical
by: Behera, Adarsh Prasad, et al.
Published: (2024)

Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning
by: Balivada, Nihal, et al.
Published: (2026)

Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning
by: Ryum, Keumseo, et al.
Published: (2025)