Library Catalog

Bibliographic Details
Main Authors:	Baunsgaard, Sebastian, Wrede, Sebastian B., Tozun, Pınar
Format:	Preprint
Published:	2020
Subjects:	Audio and Speech Processing; Machine Learning; Sound; I.2; C.1; H.2
Online Access:	https://arxiv.org/abs/2003.12366

Similar Items

Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
by: Soni, Aniket Abhishek
Published: (2025)

Deep Feed-Forward Neural Network for Bangla Isolated Speech Recognition
by: Bhadra, Dipayan, et al.
Published: (2025)

Large Vocabulary Spontaneous Speech Recognition for Tigrigna
by: Kahsu, Ataklti, et al.
Published: (2023)

DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)

Boundary Regression for Leitmotif Detection in Music Audio
by: Lee, Sihun, et al.
Published: (2025)

Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)

Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)

Iterative Feature Boosting for Explainable Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)

Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset
by: Marie, Ambre, et al.
Published: (2025)

Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)

Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)

MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)

BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)

Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)

KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
by: Nzeyimana, Antoine
Published: (2023)

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
by: Rijal, S., et al.
Published: (2023)

Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)

Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization
by: Wang, Hsuan-Yu, et al.
Published: (2025)

acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices
by: Vuilliomenet, Aude, et al.
Published: (2025)

Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)

SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)

Measuring the Accuracy of Automatic Speech Recognition Solutions
by: Kuhn, Korbinian, et al.
Published: (2024)

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)

Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)

Projected Belief Networks With Discriminative Alignment for Acoustic Event Classification: Rivaling State of the Art CNNs
by: Baggenstoss, Paul M., et al.
Published: (2024)

SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
by: Dua, Karan, et al.
Published: (2025)

Towards Training Music Taggers on Synthetic Data
by: Kroher, Nadine, et al.
Published: (2024)

Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)

OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)

SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
by: Donepudi, Dharma Teja
Published: (2025)

A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)

SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)

Decoding Phone Pairs from MEG Signals Across Speech Modalities
by: de Zuazo, Xabier, et al.
Published: (2025)

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)

Adaptive Background Music for a Fighting Game: A Multi-Instrument Volume Modulation Approach
by: Khan, Ibrahim, et al.
Published: (2023)

Fighting Game Adaptive Background Music for Improved Gameplay
by: Khan, Ibrahim, et al.
Published: (2024)

How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios
by: Venkatesh, Satvik, et al.
Published: (2025)

BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation
by: Park, Taesoo, et al.
Published: (2025)

Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
by: Robertson, Sean, et al.
Published: (2023)