:: Library Catalog

Bibliographic Details
Main Authors:	Andrade-Miranda, G., Chatzipapas, K., Arias-Londoño, J. D., Godino-Llorente, J. I.
Format:	Preprint
Published:	2024
Subjects:	Computer Vision and Pattern Recognition; Artificial Intelligence; Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2412.15054

Similar Items

Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables
by: Yu, Chin-Yun, et al.
Published: (2023)

NeuroVoz: a Castillian Spanish corpus of parkinsonian speech
by: Mendes-Laureano, Janaína, et al.
Published: (2024)

SNC: A Stem-Native Codec for Efficient Lossless Audio Storage with Adaptive Playback Capabilities
by: Sufi, Shaad
Published: (2026)

Modeling and Estimation of Vocal Tract and Glottal Source Parameters Using ARMAX-LF Model
by: Lia, Kai, et al.
Published: (2024)

Del Visual al Auditivo: Sonorización de Escenas Guiada por Imagen [From Visual to Auditory: Image-Guided Scene Sonorization]
by: Sánchez, María, et al.
Published: (2024)

Construction and Analysis of Impression Caption Dataset for Environmental Sounds
by: Okamoto, Yuki, et al.
Published: (2024)

Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes
by: Nguyen, Binh Thien, et al.
Published: (2025)

Can Audio Reveal Music Performance Difficulty? Insights from the Piano Syllabus Dataset
by: Ramoneda, Pedro, et al.
Published: (2024)

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset
by: Du, Jiawei, et al.
Published: (2024)

Unseen but not Unknown: Using Dataset Concealment to Robustly Evaluate Speech Quality Estimation Models
by: Pieper, Jaden, et al.
Published: (2026)

MusicEval: A Generative Music Dataset with Expert Ratings for Automatic Text-to-Music Evaluation
by: Liu, Cheng, et al.
Published: (2025)

SimuSOE: A Simulated Snoring Dataset for Obstructive Sleep Apnea-Hypopnea Syndrome Evaluation during Wakefulness
by: Lin, Jie, et al.
Published: (2024)

CodecFake+: A Large-Scale Neural Audio Codec-Based Deepfake Speech Dataset
by: Chen, Xuanjun, et al.
Published: (2025)

Toward Multimodal Industrial Fault Analysis: A Single-Speed Chain Conveyor Dataset with Audio and Vibration Signals
by: Chen, Zhang, et al.
Published: (2026)

Who Said What (WSW 2.0)? Enhanced Automated Analysis of Preschool Classroom Speech
by: Sun, Anchen, et al.
Published: (2025)

An Extensive Analysis of the Singing Voice Conversion Challenge 2025 Evaluation Results
by: Violeta, Lester Phillip, et al.
Published: (2025)

Comparative Evaluation of Acoustic Feature Extraction Tools for Clinical Speech Analysis
by: Choi, Anna Seo Gyeong, et al.
Published: (2025)

Facilitating deep acoustic phenotyping: A basic coding scheme of infant vocalisations preluding computational analysis, machine learning and clinical reasoning
by: Kulvicius, Tomas, et al.
Published: (2023)

Evaluating Parkinson's Disease Detection in Anonymized Speech: A Performance and Acoustic Analysis
by: Franzreb, Carlos, et al.
Published: (2026)

ASPED: An Audio Dataset for Detecting Pedestrians
by: Seshadri, Pavan, et al.
Published: (2023)

STraDa: A Singer Traits Dataset
by: Kong, Yuexuan, et al.
Published: (2024)

Mamba-based Segmentation Model for Speaker Diarization
by: Plaquet, Alexis, et al.
Published: (2024)

A Dataset for Automatic Assessment of TTS Quality in Spanish
by: Welford, Alejandro Sosa, et al.
Published: (2025)

Audio-Language Datasets of Scenes and Events: A Survey
by: Wijngaard, Gijs, et al.
Published: (2024)

CUEMPATHY: A Counseling Speech Dataset for Psychotherapy Research
by: Tao, Dehua, et al.
Published: (2024)

Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)

MLAAD: The Multi-Language Audio Anti-Spoofing Dataset
by: Müller, Nicolas M., et al.
Published: (2024)

Vision Transformer Segmentation for Visual Bird Sound Denoising
by: Kumar, Sahil, et al.
Published: (2024)

Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models
by: Wang, Bin, et al.
Published: (2025)

Advancing Topic Segmentation of Broadcasted Speech with Multilingual Semantic Embeddings
by: Shukla, Sakshi Deo, et al.
Published: (2024)

Comparative Analysis of ASR Methods for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)

SoundCollage: Automated Discovery of New Classes in Audio Datasets
by: Choi, Ryuhaerang, et al.
Published: (2024)

UrBAN: Urban Beehive Acoustics and PheNotyping Dataset
by: Abdollahi, Mahsa, et al.
Published: (2024)

EmoFake: An Initial Dataset for Emotion Fake Audio Detection
by: Zhao, Yan, et al.
Published: (2022)

The Florence Price Art Song Dataset and Piano Accompaniment Generator
by: He, Tao-Tao, et al.
Published: (2025)

Binamix -- A Python Library for Generating Binaural Audio Datasets
by: Barry, Dan, et al.
Published: (2025)

ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
by: Liu, Qingyu, et al.
Published: (2024)

The Extended SONICOM HRTF Dataset and Spatial Audio Metrics Toolbox
by: Poole, Katarina C., et al.
Published: (2025)

Advances in Speech Separation: Techniques, Challenges, and Future Trends
by: Li, Kai, et al.
Published: (2025)

ASAudio: A Survey of Advanced Spatial Audio Research
by: Zhu, Zhiyuan, et al.
Published: (2025)