| Main Authors: | Baunsgaard, Sebastian; Wrede, Sebastian B.; Tozun, Pınar |
|---|---|
| Format: | Preprint |
| Published: | 2020 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2003.12366 |
Similar Items
Improving Speech Recognition Accuracy Using Custom Language Models with the Vosk Toolkit
by: Soni, Aniket Abhishek
Published: (2025)
Deep Feed-Forward Neural Network for Bangla Isolated Speech Recognition
by: Bhadra, Dipayan, et al.
Published: (2025)
Large Vocabulary Spontaneous Speech Recognition for Tigrigna
by: Kahsu, Ataklti, et al.
Published: (2023)
DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
by: Tsangko, Iosif, et al.
Published: (2025)
Boundary Regression for Leitmotif Detection in Music Audio
by: Lee, Sihun, et al.
Published: (2025)
Taming Audio VAEs via Target-KL Regularization
by: Seetharaman, Prem, et al.
Published: (2026)
Adaptable Symbolic Music Infilling with MIDI-RWKV
by: Zhou-Zheng, Christian, et al.
Published: (2025)
Iterative Feature Boosting for Explainable Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)
Suicide Risk Assessment Using Multimodal Speech Features: A Study on the SW1 Challenge Dataset
by: Marie, Ambre, et al.
Published: (2025)
Unveiling Hidden Factors: Explainable AI for Feature Boosting in Speech Emotion Recognition
by: Nfissi, Alaa, et al.
Published: (2024)
Self-Improvement for Audio Large Language Model using Unlabeled Speech
by: Wang, Shaowen, et al.
Published: (2025)
MAIN-VC: Lightweight Speech Representation Disentanglement for One-shot Voice Conversion
by: Li, Pengcheng, et al.
Published: (2024)
BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization
by: Kuang, Sheng, et al.
Published: (2022)
Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)
KinSPEAK: Improving speech recognition for Kinyarwanda via semi-supervised learning methods
by: Nzeyimana, Antoine
Published: (2023)
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model
by: Rijal, S., et al.
Published: (2023)
Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)
Enhancing Speech Emotion Recognition Leveraging Aligning Timestamps of ASR Transcripts and Speaker Diarization
by: Wang, Hsuan-Yu, et al.
Published: (2025)
acoupi: An Open-Source Python Framework for Deploying Bioacoustic AI Models on Edge Devices
by: Vuilliomenet, Aude, et al.
Published: (2025)
Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models
by: Phukan, Orchid Chetia, et al.
Published: (2024)
SpeechAccentLLM: A Unified Framework for Foreign Accent Conversion and Text to Speech
by: Cheng, Zhuangfei, et al.
Published: (2025)
Measuring the Accuracy of Automatic Speech Recognition Solutions
by: Kuhn, Korbinian, et al.
Published: (2024)
ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)
Automatic Speech Recognition (ASR) for the Diagnosis of pronunciation of Speech Sound Disorders in Korean children
by: Ahn, Taekyung, et al.
Published: (2024)
Projected Belief Networks With Discriminative Alignment for Acoustic Event Classification: Rivaling State of the Art CNNs
by: Baggenstoss, Paul M., et al.
Published: (2024)
SpeechWeave: Diverse Multilingual Synthetic Text & Audio Data Generation Pipeline for Training Text to Speech Models
by: Dua, Karan, et al.
Published: (2025)
Towards Training Music Taggers on Synthetic Data
by: Kroher, Nadine, et al.
Published: (2024)
Generation of Musical Timbres using a Text-Guided Diffusion Model
by: Yuan, Weixuan, et al.
Published: (2025)
OBHS: An Optimized Block Huffman Scheme for Real-Time Audio Compression
by: Mahfi, Muntahi Safwan, et al.
Published: (2025)
SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution
by: Donepudi, Dharma Teja
Published: (2025)
A Unified Model For Voice and Accent Conversion In Speech and Singing using Self-Supervised Learning and Feature Extraction
by: Cheripally, Sowmya
Published: (2024)
SeamlessEdit: Background Noise Aware Zero-Shot Speech Editing with in-Context Enhancement
by: Chen, Kuan-Yu, et al.
Published: (2025)
Decoding Phone Pairs from MEG Signals Across Speech Modalities
by: de Zuazo, Xabier, et al.
Published: (2025)
Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)
Adaptive Background Music for a Fighting Game: A Multi-Instrument Volume Modulation Approach
by: Khan, Ibrahim, et al.
Published: (2023)
Fighting Game Adaptive Background Music for Improved Gameplay
by: Khan, Ibrahim, et al.
Published: (2024)
How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios
by: Venkatesh, Satvik, et al.
Published: (2025)
BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation
by: Park, Taesoo, et al.
Published: (2025)
Bigger is not Always Better: The Effect of Context Size on Speech Pre-Training
by: Robertson, Sean, et al.
Published: (2023)