MARC View :: Library Catalog

Bibliographic Details
Main Authors:	Negroni, Viola; Cuccovillo, Luca; Bestagini, Paolo; Aichroth, Patrick; Tubaro, Stefano
Format:	Preprint
Published:	2026
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2601.14850

_version_	1866915746101067776
author	Negroni, Viola; Cuccovillo, Luca; Bestagini, Paolo; Aichroth, Patrick; Tubaro, Stefano
author_facet	Negroni, Viola; Cuccovillo, Luca; Bestagini, Paolo; Aichroth, Patrick; Tubaro, Stefano
contents	In this work, we introduce a multi-task transformer for speech deepfake detection, capable of predicting formant trajectories and voicing patterns over time, ultimately classifying speech as real or fake, and highlighting whether its decisions rely more on voiced or unvoiced regions. Building on a prior speaker-formant transformer architecture, we streamline the model with an improved input segmentation strategy, redesign the decoding process, and integrate built-in explainability. Compared to the baseline, our model requires fewer parameters, trains faster, and provides better interpretability, without sacrificing prediction performance.
format	Preprint
id	arxiv_https___arxiv_org_abs_2601_14850
institution	arXiv
publishDate	2026
record_format	arxiv
title	Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling
topic	Sound
url	https://arxiv.org/abs/2601.14850

Similar Items