Staff View: Ensemble-Guided Distillation for Compact and Robust Acoustic Scene Classification on Edge Devices :: Library Catalog

Bibliographic Details
Main Authors:	Sharify, Hossein; Raoufi, Behnam; Ramezani, Mahdy; Hajsadeghi, Khosrow; Shouraki, Saeed Bagheri
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2512.13905

_version_	1866914202963148800
author	Sharify, Hossein; Raoufi, Behnam; Ramezani, Mahdy; Hajsadeghi, Khosrow; Shouraki, Saeed Bagheri
author_facet	Sharify, Hossein; Raoufi, Behnam; Ramezani, Mahdy; Hajsadeghi, Khosrow; Shouraki, Saeed Bagheri
contents	We present a compact, quantization-ready acoustic scene classification (ASC) framework that couples an efficient student network with a learned teacher ensemble and knowledge distillation. The student backbone uses stacked depthwise-separable "expand-depthwise-project" blocks with global response normalization to stabilize training and improve robustness to device and noise variability, while a global pooling head yields class logits for efficient edge inference. To inject richer inductive bias, we assemble a diverse set of teacher models and learn two complementary fusion heads: z1, which predicts per-teacher mixture weights using a student-style backbone, and z2, a lightweight MLP that performs per-class logit fusion. The student is distilled from the ensemble via temperature-scaled soft targets combined with hard labels, enabling it to approximate the ensemble's decision geometry with a single compact model. Evaluated on the TAU Urban Acoustic Scenes 2022 Mobile benchmark, our approach achieves state-of-the-art (SOTA) results under matched edge-deployment constraints, demonstrating strong performance and practicality for mobile ASC.
format	Preprint
id	arxiv_https___arxiv_org_abs_2512_13905
institution	arXiv
publishDate	2025
record_format	arxiv
title	Ensemble-Guided Distillation for Compact and Robust Acoustic Scene Classification on Edge Devices
topic	Sound
url	https://arxiv.org/abs/2512.13905
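
Illustrative note: the abstract describes distilling a student from a fused teacher ensemble using temperature-scaled soft targets combined with hard labels. The sketch below shows one common way such a loss is formed. It is not taken from the preprint: the function names, the fixed mixture weights (standing in for what the z1 head would predict), the temperature, and the weighting factor alpha are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_teachers(teacher_logits, weights):
    """Weighted sum of per-teacher logits (hypothetical stand-in for z1 fusion)."""
    n_classes = len(teacher_logits[0])
    return [
        sum(w * t[c] for w, t in zip(weights, teacher_logits))
        for c in range(n_classes)
    ]


def distillation_loss(student_logits, fused_logits, label,
                      temperature=2.0, alpha=0.5):
    """alpha * T^2 * KL(teacher || student) + (1 - alpha) * cross-entropy.

    temperature and alpha are assumed values, not the paper's settings.
    """
    p_teacher = softmax(fused_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: KL divergence between temperature-softened distributions.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Two hypothetical teachers, three classes, equal mixture weights.
fused = fuse_teachers([[2.0, 0.5, -1.0], [1.0, 1.5, -0.5]], [0.5, 0.5])
loss = distillation_loss([1.2, 0.8, -0.7], fused, label=0)
```

The T^2 factor is the usual scaling that keeps the soft-target gradient magnitude comparable across temperatures; whether the preprint uses it is not stated in this record.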

Similar Items