| Main authors: | Nihal, Ragib Amin; Yen, Benjamin; Shi, Runwu; Ashizawa, Takeshi; Nakadai, Kazuhiro |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing |
| Online access: | https://arxiv.org/abs/2502.20838 |
| _version_ | 1866911725559742464 |
|---|---|
| author | Nihal, Ragib Amin; Yen, Benjamin; Shi, Runwu; Ashizawa, Takeshi; Nakadai, Kazuhiro |
| author_facet | Nihal, Ragib Amin; Yen, Benjamin; Shi, Runwu; Ashizawa, Takeshi; Nakadai, Kazuhiro |
| contents | Passive acoustic monitoring (PAM) systems generate continuous recordings spanning months, yet automated bioacoustic analysis of whale calls requires two separate annotation efforts: binary presence labels for classification and precise temporal boundaries for localization. A binary label for a multi-minute recording can be assigned in seconds, but timestamping every call within it requires hours of expert effort. Providing both is infeasible at operational scale. We present DSMIL-LocNet, a weakly supervised multiple instance learning (MIL) framework that performs both classification and temporal localization using only recording-level presence/absence labels. Our dual-stream architecture integrates spectral and temporal features to process recordings of 2–30 minutes without the temporal compression that degrades existing CNN methods on long inputs. On the AcousticTrends BlueFinLibrary, DSMIL-LocNet achieves F1 scores of 0.88–0.91 on recordings of 300–1800 s, where fully supervised CNN baselines degrade to 0.19–0.64. It also provides temporal localization that these baselines cannot produce without frame-level annotation. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc |
| format | Preprint |
| id | arxiv_https___arxiv_org_abs_2502_20838 |
| institution | arXiv |
| publishDate | 2025 |
| record_format | arxiv |
| title | Weakly Supervised Detection and Temporal Localization of Whale Calls in Long-Duration Bioacoustic Data |
| topic | Sound; Artificial Intelligence; Machine Learning; Audio and Speech Processing |
| url | https://arxiv.org/abs/2502.20838 |