Bibliographic Details
Main Authors: Giacomelli, Stefano; Giordano, Marco; Rinaldi, Claudia; Graziosi, Fabio
Format: Preprint
Published: 2025
Subjects:
Online Access: https://arxiv.org/abs/2506.23437
author Giacomelli, Stefano
Giordano, Marco
Rinaldi, Claudia
Graziosi, Fabio
contents Accurate recognition of Emergency Vehicle (EV) sirens is critical for the integration of intelligent transportation systems, smart city monitoring systems, and autonomous driving technologies. Modern automatic solutions are limited by the lack of large-scale, curated datasets and by the computational demands of state-of-the-art sound event detection models. This work introduces E2PANNs (Efficient Emergency Pre-trained Audio Neural Networks), a lightweight Convolutional Neural Network architecture derived from the PANNs framework and specifically optimized for binary EV siren detection. Leveraging our dedicated subset of AudioSet (AudioSet EV), we fine-tune and evaluate E2PANNs across multiple reference datasets and test its viability on embedded hardware. The experimental campaign includes ablation studies, cross-domain benchmarking, and real-time inference deployment on an edge device. Interpretability analyses exploiting the Guided Backpropagation and ScoreCAM algorithms provide insights into the model's internal representations and validate its ability to capture distinct spectrotemporal patterns associated with different types of EV sirens. Real-time performance is assessed through frame-wise and event-based detection metrics, as well as a detailed analysis of false positive activations. Results demonstrate that E2PANNs establish a new state of the art in this research domain, with high computational efficiency and suitability for edge-based audio monitoring and safety-critical applications.
format Preprint
id arxiv_https___arxiv_org_abs_2506_23437
institution arXiv
publishDate 2025
record_format arxiv
title From Large-scale Audio Tagging to Real-Time Explainable Emergency Vehicle Sirens Detection
topic Sound
Artificial Intelligence
Audio and Speech Processing
68T07
E.1; H.1; I.2; I.5; J.2; K.4; C.4
url https://arxiv.org/abs/2506.23437