Staff View: Knowing When to Quit: Probabilistic Early Exits for Speech Separation :: Library Catalog

Bibliographic Details
Main Authors:	Olsen, Kenny Falkær; Østergaard, Mads; Ulbæk, Karl; Nielsen, Søren Føns; Lindrup, Rasmus Malik Høegh; Jensen, Bjørn Sand; Mørup, Morten
Format:	Preprint
Published:	2025
Subjects:	Machine Learning; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2507.09768

_version_	1866914368358187008
author	Olsen, Kenny Falkær; Østergaard, Mads; Ulbæk, Karl; Nielsen, Søren Føns; Lindrup, Rasmus Malik Høegh; Jensen, Bjørn Sand; Mørup, Morten
author_facet	Olsen, Kenny Falkær; Østergaard, Mads; Ulbæk, Karl; Nielsen, Søren Føns; Lindrup, Rasmus Malik Høegh; Jensen, Bjørn Sand; Mørup, Morten
contents	In recent years, deep learning-based single-channel speech separation has improved considerably, in large part driven by increasingly compute- and parameter-efficient neural network architectures. Most such architectures are, however, designed with a fixed compute and parameter budget and consequently cannot scale to varying compute demands or resources, which limits their use in embedded and heterogeneous devices such as mobile phones and hearables. To enable such use-cases we design a neural network architecture for speech separation and enhancement capable of early-exit, and we propose an uncertainty-aware probabilistic framework to jointly model the clean speech signal and error variance which we use to derive probabilistic early-exit conditions in terms of desired signal-to-noise ratios. We evaluate our methods on both speech separation and enhancement tasks where we demonstrate that early-exit capabilities can be introduced without compromising reconstruction, and that when trained on variable-length audio our early-exit conditions are well-calibrated and lead to considerable compute savings when used to dynamically scale compute at test time while remaining directly interpretable.
format	Preprint
id	arxiv_https___arxiv_org_abs_2507_09768
institution	arXiv
publishDate	2025
record_format	arxiv
title	Knowing When to Quit: Probabilistic Early Exits for Speech Separation
topic	Machine Learning; Sound; Audio and Speech Processing
url	https://arxiv.org/abs/2507.09768

Similar Items