:: Library Catalog

Bibliographic Details
Main Authors:	Cuccovillo, Luca; Wang, Xin; Gerhardt, Milica; Aichroth, Patrick
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2604.16700

Similar Items

Byte Pair Encoding Is All You Need For Automatic Bengali Speech Recognition
by: Samin, Ahnaf Mozib
Published: (2024)

Qieemo: Speech Is All You Need in the Emotion Recognition in Conversations
by: Chen, Jinming, et al.
Published: (2025)

Are Multimodal Foundation Models All That Is Needed for Emofake Detection?
by: Akhtar, Mohd Mujtaba, et al.
Published: (2025)

Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining
by: Cheng, Ruoxi, et al.
Published: (2024)

Post-training for Deepfake Speech Detection
by: Ge, Wanying, et al.
Published: (2025)

Towards Data Drift Monitoring for Speech Deepfake Detection in the context of MLOps
by: Wang, Xin, et al.
Published: (2025)

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification
by: Liu, Tianchi, et al.
Published: (2023)

SynHate: Detecting Hate Speech in Synthetic Deepfake Audio
by: Ranjan, Rishabh, et al.
Published: (2025)

Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol
by: Apostolidis, Konstantinos, et al.
Published: (2024)

Does Fine-tuning by Reinforcement Learning Improve Generalization in Binary Speech Deepfake Detection?
by: Wang, Xin, et al.
Published: (2026)

Dual-Branch Knowledge Distillation for Noise-Robust Synthetic Speech Detection
by: Fan, Cunhang, et al.
Published: (2023)

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021
by: Chen, Xinhui, et al.
Published: (2021)

Bridging Attribution and Open-Set Detection using Graph-Augmented Instance Learning in Synthetic Speech
by: Akhtar, Mohd Mujtaba, et al.
Published: (2026)

Automatic Detection of Depression in Speech Using Ensemble Convolutional Neural Networks
by: Vázquez-Romero, Adrián, et al.
Published: (2024)

From Sharpness to Better Generalization for Speech Deepfake Detection
by: Huang, Wen, et al.
Published: (2025)

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
by: Yang, Peiji, et al.
Published: (2024)

Learnable Pulse Accumulation for On-Device Speech Recognition: How Much Attention Do You Need?
by: Shkolnikov, Yakov Pyotr
Published: (2026)

Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content
by: Salvi, Davide, et al.
Published: (2024)

Every Breath You Don't Take: Deepfake Speech Detection Using Breath
by: Layton, Seth, et al.
Published: (2024)

All Neural Low-latency Directional Speech Extraction
by: Pandey, Ashutosh, et al.
Published: (2024)

GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection
by: Lei, Zhenchun, et al.
Published: (2024)

Freeze and Learn: Continual Learning with Selective Freezing for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
by: Zhang, Lin, et al.
Published: (2024)

Beyond Manual Transcripts: The Potential of Automated Speech Recognition Errors in Improving Alzheimer's Disease Detection
by: Liu, Yin-Long, et al.
Published: (2025)

Speech Quality Embeddings for Improved Detection and Classification of Degradations in Speech Signals
by: Kuhlmann, Michael, et al.
Published: (2026)

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)

FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
by: Comanducci, Luca, et al.
Published: (2024)

Leveraging Cascaded Binary Classification and Multimodal Fusion for Dementia Detection through Spontaneous Speech
by: Liu, Yin-Long, et al.
Published: (2025)

Robust Speech Activity Detection in the Presence of Singing Voice
by: Grundhuber, Philipp, et al.
Published: (2025)

XLSR-Kanformer: A KAN-Intergrated model for Synthetic Speech Detection
by: Dat, Phuong Tuan, et al.
Published: (2025)

Indic-CodecFake meets SATYAM: Towards Detecting Neural Audio Codec Synthesized Speech Deepfakes in Indic Languages
by: Girish, et al.
Published: (2026)

Towards Frame-level Quality Predictions of Synthetic Speech
by: Kuhlmann, Michael, et al.
Published: (2025)

PSELDNets: Pre-trained Neural Networks on a Large-scale Synthetic Dataset for Sound Event Localization and Detection
by: Hu, Jinbo, et al.
Published: (2024)

Leveraging Prompt Learning and Pause Encoding for Alzheimer's Disease Detection
by: Liu, Yin-Long, et al.
Published: (2024)

Speech as a Biomarker for Disease Detection
by: Botelho, Catarina, et al.
Published: (2024)

A Pre-training Framework that Encodes Noise Information for Speech Quality Assessment
by: Sultana, Subrina, et al.
Published: (2024)

Variational Auto-Encoder Based Variability Encoding for Dysarthric Speech Recognition
by: Xie, Xurong, et al.
Published: (2022)

Towards Explainable Spoofed Speech Attribution and Detection: a Probabilistic Approach for Characterizing Speech Synthesizer Components
by: Mishra, Jagabandhu, et al.
Published: (2025)

SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information
by: Zhang, Xiangyu, et al.
Published: (2025)

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection
by: Truong, Duc-Tuan, et al.
Published: (2024)