| Main Authors: | Panopoulos, Ioannis; Venieris, Stylianos I.; Venieris, Iakovos S. |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2409.01089 |
Similar Items
MultiTASC++: A Continuously Adaptive Scheduler for Edge-Based Multi-Device Cascade Inference
by: Nikolaidis, Sokratis, et al.
Published: (2024)
RankMap: Priority-Aware Multi-DNN Manager for Heterogeneous Embedded Devices
by: Karatzas, Andreas, et al.
Published: (2024)
FlashMem: Supporting Modern DNN Workloads on Mobile with GPU Memory Hierarchy Optimizations
by: Shu, Zhihao, et al.
Published: (2026)
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference
by: Siavashi, Mohammad, et al.
Published: (2025)
Practical Performance Guarantees for Pipelined DNN Inference
by: Archer, Aaron, et al.
Published: (2023)
DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs
by: Zhu, Zeyu, et al.
Published: (2026)
Efficient Unified Caching for Accelerating Heterogeneous AI Workloads
by: Wang, Tianze, et al.
Published: (2025)
A Survey on Collaborative DNN Inference for Edge Intelligence
by: Ren, Weiqing, et al.
Published: (2022)
Multi-DNN Inference of Sparse Models on Edge SoCs
by: Luo, Jiawei, et al.
Published: (2026)
Learning the Optimal Path and DNN Partition for Collaborative Edge Inference
by: Huang, Yin, et al.
Published: (2024)
HiDP: Hierarchical DNN Partitioning for Distributed Inference on Heterogeneous Edge Platforms
by: Taufique, Zain, et al.
Published: (2024)
ExeGPT: Constraint-Aware Resource Scheduling for LLM Inference
by: Oh, Hyungjun, et al.
Published: (2024)
Agent.xpu: Efficient Scheduling of Agentic LLM Workloads on Heterogeneous SoC
by: Wei, Xinming, et al.
Published: (2025)
PracMHBench: Re-evaluating Model-Heterogeneous Federated Learning Based on Practical Edge Device Constraints
by: Guo, Yuanchun, et al.
Published: (2025)
SwapNet: Efficient Swapping for DNN Inference on Edge AI Devices Beyond the Memory Budget
by: Wang, Kun, et al.
Published: (2024)
Kraken: Inherently Parallel Transformers For Efficient Multi-Device Inference
by: Prabhakar, Rohan Baskar, et al.
Published: (2024)
CoFormer: Collaborating with Heterogeneous Edge Devices for Scalable Transformer Inference
by: Xu, Guanyu, et al.
Published: (2025)
Adaptive Workload Distribution for Accuracy-aware DNN Inference on Collaborative Edge Platforms
by: Taufique, Zain, et al.
Published: (2023)
AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices
by: Lin, Zheng, et al.
Published: (2024)
Evaluating Multi-Instance DNN Inferencing on Multiple Accelerators of an Edge Device
by: Tayal, Mumuksh, et al.
Published: (2025)
Constraint-Aware Execution Planning for Hybrid Space-Ground Compute Workloads
by: Mitra, Subhadip
Published: (2026)
FedCompass: Efficient Cross-Silo Federated Learning on Heterogeneous Client Devices using a Computing Power Aware Scheduler
by: Li, Zilinghan, et al.
Published: (2023)
The Workload-Router-Pool Architecture for LLM Inference Optimization: A Vision Paper from the vLLM Semantic Router Project
by: Chen, Huamin, et al.
Published: (2026)
NestQuant: Post-Training Integer-Nesting Quantization for On-Device DNN
by: Xie, Jianhang, et al.
Published: (2025)
MorphServe: Efficient and Workload-Aware LLM Serving via Runtime Quantized Layer Swapping and KV Cache Resizing
by: Su, Zhaoyuan, et al.
Published: (2025)
Heterogeneous LoRA for Federated Fine-tuning of On-Device Foundation Models
by: Cho, Yae Jee, et al.
Published: (2024)
Metadata-Guided Adaptable Frequency Scaling across Heterogeneous Applications and Devices
by: Yan, Jinqi, et al.
Published: (2025)
Lumos: Heterogeneity-aware Federated Graph Learning over Decentralized Devices
by: Pan, Qiying, et al.
Published: (2023)
Tackling Intertwined Data and Device Heterogeneities in Federated Learning with Unlimited Staleness
by: Wang, Haoming, et al.
Published: (2023)
PaSE: Parallelization Strategies for Efficient DNN Training
by: Elango, Venmugil
Published: (2024)
Adaptive Stream Processing on Edge Devices through Active Inference
by: Sedlak, Boris, et al.
Published: (2024)
LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models
by: Won, William, et al.
Published: (2021)
Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time
by: Nan, Zhaojun, et al.
Published: (2025)
Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
by: Zhang, Haolin, et al.
Published: (2025)
CAFL-L: Constraint-Aware Federated Learning with Lagrangian Dual Optimization for On-Device Language Models
by: Zheng, Dongqi, et al.
Published: (2025)
Ecomap: Sustainability-Driven Optimization of Multi-Tenant DNN Execution on Edge Servers
by: Paramanayakam, Varatheepan, et al.
Published: (2025)
Symphony: Optimized DNN Model Serving using Deferred Batch Scheduling
by: Chen, Lequn, et al.
Published: (2023)
Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical
by: Behera, Adarsh Prasad, et al.
Published: (2024)
Heterogeneity-Aware Client Selection Methodology For Efficient Federated Learning
by: Balivada, Nihal, et al.
Published: (2026)
Resource-Aware Aggregation and Sparsification in Heterogeneous Ensemble Federated Learning
by: Ryum, Keumseo, et al.
Published: (2025)