| Main Authors: | Labrador, Beltrán, Otero-Gonzalez, Manuel, Lozano-Diez, Alicia, Ramos, Daniel, Toledano, Doroteo T., Gonzalez-Rodriguez, Joaquin |
|---|---|
| Format: | Preprint |
| Published: | 2023 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2401.09441 |
Similar Items
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)
Towards detecting the pathological subharmonic voicing with fully convolutional neural networks
by: Ikuma, Takeshi, et al.
Published: (2025)
Introducing voice timbre attribute detection
by: He, Jinghao, et al.
Published: (2025)
Gender-ambiguous voice generation through feminine speaking style transfer in male voices
by: Koutsogiannaki, Maria, et al.
Published: (2024)
Easy, Interpretable, Effective: openSMILE for voice deepfake detection
by: Pascu, Octavian, et al.
Published: (2024)
Comparison of fundamental frequency estimators with subharmonic voice signals
by: Ikuma, Takeshi, et al.
Published: (2025)
Subjective quality evaluation of personalized own voice reconstruction systems
by: Ohlenbusch, Mattes, et al.
Published: (2025)
DNN-based ensemble singing voice synthesis with interactions between singers
by: Hyodo, Hiroaki, et al.
Published: (2024)
Using voice analysis as an early indicator of risk for depression in young adults
by: Scherer, Klaus R., et al.
Published: (2024)
Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)
Building speech corpus with diverse voice characteristics for its prompt-based representation
by: Watanabe, Aya, et al.
Published: (2024)
Audiovisual angle and voice incongruence do not affect audiovisual verbal short-term memory in virtual reality
by: Ermert, Cosima A., et al.
Published: (2024)
Absorbing Discrete Diffusion for Speech Enhancement
by: Gonzalez, Philippe
Published: (2026)
A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)
Resource-constrained stereo singing voice cancellation
by: Borrelli, Clara, et al.
Published: (2024)
Accurate analysis of the pitch pulse-based magnitude/phase structure of natural vowels and assessment of three lightweight time/frequency voicing restoration methods
by: Ferreira, Aníbal J. S., et al.
Published: (2025)
DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors
by: Landini, Federico, et al.
Published: (2023)
Spoken language change detection inspired by speaker change detection
by: Mishra, Jagabandhu, et al.
Published: (2023)
Sound event localization and detection based on crnn using rectangular filters and channel rotation data augmentation
by: Ronchini, Francesca, et al.
Published: (2020)
Non-autoregressive real-time Accent Conversion model with voice cloning
by: Nechaev, Vladimir, et al.
Published: (2024)
Controllable joint noise reduction and hearing loss compensation using a differentiable auditory model
by: Gonzalez, Philippe, et al.
Published: (2025)
Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?
by: Zhang, Lin, et al.
Published: (2024)
Leveraging Self-Supervised Learning for Speaker Diarization
by: Han, Jiangyu, et al.
Published: (2024)
Are audio DeepFake detection models polyglots?
by: Marek, Bartłomiej, et al.
Published: (2024)
Frequency-aware convolution for sound event detection
by: Song, Tao, et al.
Published: (2024)
Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization
by: Pálka, Petr, et al.
Published: (2024)
Can we reconstruct a dysarthric voice with the large speech model Parler TTS?
by: Sanchez, Ariadna, et al.
Published: (2025)
End-to-End Multi-Task Learning for Adjustable Joint Noise Reduction and Hearing Loss Compensation
by: Gonzalez, Philippe, et al.
Published: (2026)
Onset and offset weighted loss function for sound event detection
by: Song, Tao
Published: (2024)
Fine-tune the pretrained ATST model for sound event detection
by: Shao, Nian, et al.
Published: (2023)
Representational learning for an anomalous sound detection system with source separation model
by: Shin, Seunghyeon, et al.
Published: (2024)
Towards generalisable and calibrated synthetic speech detection with self-supervised representations
by: Pascu, Octavian, et al.
Published: (2023)
A robust audio deepfake detection system via multi-view feature
by: Yang, Yujie, et al.
Published: (2024)
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge
by: Wang, Xiaopeng, et al.
Published: (2024)
MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence
by: Kumar, Sonal, et al.
Published: (2025)
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
by: Yue, Haobo, et al.
Published: (2024)
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling
by: Gao, Wenmiao, et al.
Published: (2025)
Performance and energy balance: a comprehensive study of state-of-the-art sound event detection systems
by: Ronchini, Francesca, et al.
Published: (2023)
Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge
by: Stowell, Dan, et al.
Published: (2018)
PAGURI: a user experience study of creative interaction with text-to-music models
by: Ronchini, Francesca, et al.
Published: (2024)