:: Library Catalog

Bibliographic Details
Main Authors:	Ren, Zhao; Scheck, Kevin; Hou, Qinhan; van Gogh, Stefano; Wand, Michael; Schultz, Tanja
Format:	Preprint
Published:	2024
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2405.08021

Similar Items

Machine Unlearning in Speech Emotion Recognition via Forget Set Alone
by: Ren, Zhao, et al.
Published: (2025)

End-to-end Acoustic-linguistic Emotion and Intent Recognition Enhanced by Semi-supervised Learning
by: Ren, Zhao, et al.
Published: (2025)

Deep Speech Synthesis from Multimodal Articulatory Representations
by: Wu, Peter, et al.
Published: (2024)

Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition
by: Tan, Chao, et al.
Published: (2024)

Speech as a Biomarker for Disease Detection
by: Botelho, Catarina, et al.
Published: (2024)

Multi-modal Speech Enhancement with Limited Electromyography Channels
by: Feng, Fuyuan, et al.
Published: (2025)

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models
by: Wu, Weihao, et al.
Published: (2025)

DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model
by: Chen, Xueyuan, et al.
Published: (2025)

Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
by: Chang, Yi, et al.
Published: (2024)

CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
by: Li, Yuke, et al.
Published: (2024)

DiffAR: Denoising Diffusion Autoregressive Model for Raw Speech Waveform Generation
by: Benita, Roi, et al.
Published: (2023)

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis
by: Hu, Yifan, et al.
Published: (2025)

Affect Decoding in Phonated and Silent Speech Production from Surface EMG
by: Pistrosch, Simon, et al.
Published: (2026)

Voice-ENHANCE: Speech Restoration using a Diffusion-based Voice Conversion Framework
by: Byun, Kyungguen, et al.
Published: (2025)

SOVA-Bench: Benchmarking the Speech Conversation Ability for LLM-based Voice Assistant
by: Hou, Yixuan, et al.
Published: (2025)

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning
by: Zhao, Shengkui, et al.
Published: (2025)

Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion
by: Li, Ruiqi, et al.
Published: (2024)

An Explainable Probabilistic Attribute Embedding Approach for Spoofed Speech Characterization
by: Chhibber, Manasi, et al.
Published: (2024)

Freeze and Learn: Continual Learning with Selective Freezing for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)

DiffAU: Diffusion-Based Ambisonics Upscaling
by: Milstein, Amit, et al.
Published: (2025)

Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification
by: Wang, Qing, et al.
Published: (2025)

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
by: Kushwaha, Saksham Singh, et al.
Published: (2024)

Generative Expressive Conversational Speech Synthesis
by: Liu, Rui, et al.
Published: (2024)

Variational Autoencoder for Personalized Pathological Speech Enhancement
by: Hou, Mingchi, et al.
Published: (2025)

EMOCONV-DIFF: Diffusion-based Speech Emotion Conversion for Non-parallel and In-the-wild Data
by: Prabhu, Navin Raj, et al.
Published: (2023)

REWIND: Speech Time Reversal for Enhancing Speaker Representations in Diffusion-based Voice Conversion
by: Biyani, Ishan D., et al.
Published: (2025)

RAVE for Speech: Efficient Voice Conversion at High Sampling Rates
by: Bargum, Anders R., et al.
Published: (2024)

Source Verification for Speech Deepfakes
by: Negroni, Viola, et al.
Published: (2025)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

The TEA-ASLP System for Multilingual Conversational Speech Recognition and Speech Diarization in MLC-SLM 2025 Challenge
by: Xue, Hongfei, et al.
Published: (2025)

Objective and Subjective Evaluation of Diffusion-Based Speech Enhancement for Dysarthric Speech
by: de Groot, Dimme, et al.
Published: (2025)

Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)

LipDiffuser: Lip-to-Speech Generation with Conditional Diffusion Models
by: Richter, Julius, et al.
Published: (2025)

VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
by: Byun, Kyungguen, et al.
Published: (2024)

Conformer-based Ultrasound-to-Speech Conversion
by: Ibrahimov, Ibrahim, et al.
Published: (2025)

Dual-View Predictive Diffusion: Lightweight Speech Enhancement via Spectrogram-Image Synergy
by: Xue, Ke, et al.
Published: (2026)

EAD-VC: Enhancing Speech Auto-Disentanglement for Voice Conversion with IFUB Estimator and Joint Text-Guided Consistent Learning
by: Liang, Ziqi, et al.
Published: (2024)

Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition
by: Pritzen, Julia, et al.
Published: (2021)

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)