:: Library Catalog

Bibliographic Details
Main Authors:	Labrador, Beltrán; Otero-Gonzalez, Manuel; Lozano-Diez, Alicia; Ramos, Daniel; Toledano, Doroteo T.; Gonzalez-Rodriguez, Joaquin
Format:	Preprint
Published:	2023
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2401.09441

Similar Items

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

Towards detecting the pathological subharmonic voicing with fully convolutional neural networks
by: Ikuma, Takeshi, et al.
Published: (2025)

Introducing voice timbre attribute detection
by: He, Jinghao, et al.
Published: (2025)

Gender-ambiguous voice generation through feminine speaking style transfer in male voices
by: Koutsogiannaki, Maria, et al.
Published: (2024)

Easy, Interpretable, Effective: openSMILE for voice deepfake detection
by: Pascu, Octavian, et al.
Published: (2024)

Comparison of fundamental frequency estimators with subharmonic voice signals
by: Ikuma, Takeshi, et al.
Published: (2025)

Subjective quality evaluation of personalized own voice reconstruction systems
by: Ohlenbusch, Mattes, et al.
Published: (2025)

DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)

Using voice analysis as an early indicator of risk for depression in young adults
by: Scherer, Klaus R., et al.
Published: (2024)

Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)

Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)

Audiovisual angle and voice incongruence do not affect audiovisual verbal short-term memory in virtual reality
by: Ermert, Cosima A., et al.
Published: (2024)

Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)

A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)

Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)

Accurate analysis of the pitch pulse-based magnitude/phase structure of natural vowels and assessment of three lightweight time/frequency voicing restoration methods
by: Ferreira, Aníbal J. S., et al.
Published: (2025)

DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
by: Landini, Federico, et al.
Published: (2023)

Spoken language change detection inspired by speaker change detection
by: Mishra, Jagabandhu, et al.
Published: (2023)

Sound event localization and detection based on crnn using rectangular filters and channel rotation data augmentation
by: Ronchini, Francesca, et al.
Published: (2020)

Non-autoregressive real-time Accent Conversion model with voice cloning
by: Nechaev, Vladimir, et al.
Published: (2024)

Controllable joint noise reduction and hearing loss compensation using a differentiable auditory model
by: Gonzalez, Philippe, et al.
Published: (2025)

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
by: Zhang, Lin, et al.
Published: (2024)

Leveraging Self-Supervised Learning for Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2024)

Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)

Frequency-aware convolution for sound event detection
by: Song, Tao, et al.
Published: (2024)

Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization
by: Pálka, Petr, et al.
Published: (2024)

Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)

End-to-End Multi-Task Learning for Adjustable Joint Noise Reduction and Hearing Loss Compensation
by: Gonzalez, Philippe, et al.
Published: (2026)

Onset and offset weighted loss function for sound event detection
by: Song, Tao
Published: (2024)

Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)

Representational learning for an anomalous sound detection system with source separation model
by: Shin, Seunghyeon, et al.
Published: (2024)

Towards generalisable and calibrated synthetic speech detection with self-supervised representations
by: Pascu, Octavian, et al.
Published: (2023)

A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)

A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge
by: Wang, Xiaopeng, et al.
Published: (2024)

MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)

Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
by: Yue, Haobo, et al.
Published: (2024)

Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling
by: Gao, Wenmiao, et al.
Published: (2025)

Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems
by: Ronchini, Francesca, et al.
Published: (2023)

Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge
by: Stowell, Dan, et al.
Published: (2018)

PAGURI: a user experience study of creative interaction with text-to-music models
by: Ronchini, Francesca, et al.
Published: (2024)