Bibliographic Details
Main Authors: Ghafouri, Saeid; Fayyaz, Mohsen; Li, Xiangchen; John, Deepu; Ji, Bo; Nikolopoulos, Dimitrios; Vandierendonck, Hans
Format: Preprint
Published: 2025
Online Access: https://arxiv.org/abs/2507.14959
Abstract:
  • Real-time multi-label video classification on embedded devices is constrained by limited compute and energy budgets. Yet video streams exhibit structural properties, such as label sparsity, temporal continuity, and label co-occurrence, that can be leveraged for more efficient inference. We introduce Polymorph, a context-aware framework that activates a minimal set of lightweight Low-Rank Adapters (LoRA) per frame. Each adapter specializes in a subset of classes derived from co-occurrence patterns and is implemented as a set of LoRA weights over a shared backbone. At runtime, Polymorph dynamically selects and composes only the adapters needed to cover the active labels, avoiding full-model switching and weight merging. This modular strategy improves scalability while reducing latency and energy overhead. Polymorph achieves 40% lower energy consumption and improves mAP by 9 points over strong baselines on the TAO dataset. Polymorph is open source at https://github.com/inference-serving/polymorph/.
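The runtime step described in the abstract, choosing only the adapters needed to cover the currently active labels, can be illustrated as a set-cover problem. The sketch below is a minimal illustration using a greedy heuristic. It is not taken from the Polymorph code base, and the record does not say which selection algorithm Polymorph uses. The function name `select_adapters`, the adapter names, and the label groups are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def select_adapters(active_labels: Set[str],
                    adapters: Dict[str, FrozenSet[str]]) -> List[str]:
    """Greedy set cover: pick adapters until every active label is covered.

    Illustrative heuristic only; the selection rule used by Polymorph
    itself is not described in this record.
    """
    uncovered = set(active_labels)
    chosen: List[str] = []
    while uncovered:
        # Pick the adapter covering the most still-uncovered labels.
        # Iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(adapters),
                   key=lambda name: len(adapters[name] & uncovered),
                   default=None)
        gain = adapters[best] & uncovered if best is not None else frozenset()
        if not gain:
            raise ValueError(f"no adapter covers labels: {sorted(uncovered)}")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical adapters, each specialized in a group of co-occurring classes.
ADAPTERS: Dict[str, FrozenSet[str]] = {
    "street": frozenset({"car", "person", "bicycle"}),
    "indoor": frozenset({"person", "chair", "tv"}),
    "animals": frozenset({"dog", "cat", "horse"}),
}

if __name__ == "__main__":
    print(select_adapters({"person", "car"}, ADAPTERS))
    print(select_adapters({"person", "dog"}, ADAPTERS))
```

Because labels in a video stream tend to be sparse and to persist across frames, the active label set is usually small and changes slowly, so a selection like this would typically return one or two adapters and rarely need to change between consecutive frames.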