MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Nihal, Ragib Amin; Yen, Benjamin; Shi, Runwu; Ashizawa, Takeshi; Nakadai, Kazuhiro
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2502.20838

_version_	1866911725559742464
author	Nihal, Ragib Amin; Yen, Benjamin; Shi, Runwu; Ashizawa, Takeshi; Nakadai, Kazuhiro
author_facet	Nihal, Ragib Amin; Yen, Benjamin; Shi, Runwu; Ashizawa, Takeshi; Nakadai, Kazuhiro
contents	Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls requires two separate annotation efforts: binary presence labels for classification and precise temporal boundaries for localization. A binary label for a multi-minute recording can be assigned in seconds, but timestamping every call within it requires hours of expert effort. Providing both is infeasible at operational scale. We present DSMIL-LocNet, a weakly supervised multiple instance learning (MIL) framework that performs both classification and temporal localization using only recording-level presence/absence labels. Our dual-stream architecture integrates spectral and temporal features to process recordings of 2–30 minutes without the temporal compression that degrades existing CNN methods on long inputs. On the AcousticTrends BlueFinLibrary, DSMIL-LocNet achieves F1 scores of 0.88–0.91 on recordings of 300–1800 s, where fully supervised CNN baselines degrade to 0.19–0.64. It also provides temporal localization that these baselines cannot produce without frame-level annotation. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
format	Preprint
id	arxiv_https___arxiv_org_abs_2502_20838
institution	arXiv
publishDate	2025
record_format	arxiv
title	Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data
topic	Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing
url	https://arxiv.org/abs/2502.20838
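
As a rough illustration of the weakly supervised MIL idea summarized in the contents field, the Python sketch below shows how per-window scores might be pooled into one recording-level prediction while the same scores are reused for temporal localization. It is not the DSMIL-LocNet architecture. The attention-style pooling rule, the function name mil_predict, the 15-second window length, and every numeric input are assumptions chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mil_predict(instance_logits, attention_scores, window_s, threshold=0.5):
    """Pool window-level logits into a recording-level probability.

    instance_logits:  one call/no-call logit per fixed-length window (hypothetical).
    attention_scores: one unnormalized attention score per window (hypothetical).
    window_s:         window length in seconds (assumed non-overlapping windows).
    Returns (bag_prob, weights, segments), where segments lists
    (start_s, end_s, prob) for windows whose probability reaches the threshold.
    """
    if not instance_logits or len(instance_logits) != len(attention_scores):
        raise ValueError("need one attention score per instance logit")

    # Attention weights sum to 1 and decide how much each window contributes.
    weights = softmax(attention_scores)

    # Recording-level (bag) prediction: only this value needs a training label.
    bag_logit = sum(w * z for w, z in zip(weights, instance_logits))
    bag_prob = sigmoid(bag_logit)

    # Localization: reuse the window-level scores, no timestamp labels involved.
    segments = []
    for i, z in enumerate(instance_logits):
        p = sigmoid(z)
        if p >= threshold:
            segments.append((i * window_s, (i + 1) * window_s, p))
    return bag_prob, weights, segments


# Toy 60-second recording split into four 15-second windows (made-up values).
bag_prob, weights, segments = mil_predict(
    instance_logits=[-4.0, -4.0, 5.0, -4.0],
    attention_scores=[0.0, 0.0, 3.0, 0.0],
    window_s=15,
)
print(bag_prob, segments)
```

In this toy input only the third window carries a high score, so it dominates the pooled prediction and is the only window reported as a candidate call interval.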
