Bibliographic Details
Main Authors: Negroni, Viola, Cuccovillo, Luca, Bestagini, Paolo, Aichroth, Patrick, Tubaro, Stefano
Format: Preprint
Published: 2026
Subjects:
Online Access: https://arxiv.org/abs/2601.14850
author Negroni, Viola
Cuccovillo, Luca
Bestagini, Paolo
Aichroth, Patrick
Tubaro, Stefano
contents In this work, we introduce a multi-task transformer for speech deepfake detection, capable of predicting formant trajectories and voicing patterns over time, ultimately classifying speech as real or fake, and highlighting whether its decisions rely more on voiced or unvoiced regions. Building on a prior speaker-formant transformer architecture, we streamline the model with an improved input segmentation strategy, redesign the decoding process, and integrate built-in explainability. Compared to the baseline, our model requires fewer parameters, trains faster, and provides better interpretability, without sacrificing prediction performance.
format Preprint
id arxiv_https___arxiv_org_abs_2601_14850
institution arXiv
publishDate 2026
record_format arxiv
title Multi-Task Transformer for Explainable Speech Deepfake Detection via Formant Modeling
topic Sound
url https://arxiv.org/abs/2601.14850