Staff View: Environment-Aware Dynamic Pruning for Pipelined Edge Inference :: Library Catalog

Bibliographic Details
Main Authors:	O'Quinn, Austin; Snedeker, Conor; Zhang, Siyuan; Kline, Jenna
Format:	Preprint
Published:	2025
Subjects:	Distributed, Parallel, and Cluster Computing
Online Access:	https://arxiv.org/abs/2503.03070

_version_	1866917946303971328
author	O'Quinn, Austin; Snedeker, Conor; Zhang, Siyuan; Kline, Jenna
author_facet	O'Quinn, Austin; Snedeker, Conor; Zhang, Siyuan; Kline, Jenna
contents	IoT and edge-based inference systems require unique solutions to overcome resource limitations and unpredictable environments. In this paper, we propose an environment-aware dynamic pruning system that handles the unpredictability of edge inference pipelines. While traditional pruning approaches can reduce model footprint and compute requirements, they are often performed only once, offline, and are not designed to react to transient or post-deployment device conditions. Similarly, existing pipeline placement strategies may incur high overhead if reconfigured at runtime, limiting their responsiveness. Our approach allows slices of a model, already placed on a distributed pipeline, to be ad-hoc pruned as a means of load-balancing. To support this capability, we introduce two key components: (1) novel training strategies that endow models with robustness to post-deployment pruning, and (2) an adaptive algorithm that determines the optimal pruning level for each node based on monitored bottlenecks. In real-world experiments on a Raspberry Pi 4B cluster running camera-trap workloads, our method achieves a 1.5x speedup and a 3x improvement in service-level objective (SLO) attainment, all while maintaining high accuracy.
format	Preprint
id	arxiv_https___arxiv_org_abs_2503_03070
institution	arXiv
publishDate	2025
record_format	arxiv
title	Environment-Aware Dynamic Pruning for Pipelined Edge Inference
topic	Distributed, Parallel, and Cluster Computing
url	https://arxiv.org/abs/2503.03070

Similar Items