:: Library Catalog

Bibliographic Details
Main Authors:	Mullins, Sarabeth S.; Götz, Georg; Bezzam, Eric; Zheng, Steven; Nielsen, Daniel Gert
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Machine Learning
Online Access:	https://arxiv.org/abs/2510.23141

Similar Items

Room-acoustic simulations as an alternative to measurements for audio-algorithm evaluation
by: Götz, Georg, et al.
Published: (2025)

Speech dereverberation constrained on room impulse response characteristics
by: Bahrman, Louis, et al.
Published: (2024)

Low algorithmic delay implementation of convolutional beamformer for online joint source separation and dereverberation
by: Mo, Kaien, et al.
Published: (2024)

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
by: Zhao, Dongdi, et al.
Published: (2024)

AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)

Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)

Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)

Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement
by: Lee, Dongheon, et al.
Published: (2026)

Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)

Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)

Joint decoding method for controllable contextual speech recognition based on Speech LLM
by: Fang, Yangui, et al.
Published: (2025)

Real-time speech enhancement in noise for throat microphone using neural audio codec as foundation model
by: Hauret, Julien, et al.
Published: (2025)

Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
by: Huang, Ziling, et al.
Published: (2025)

Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)

BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)

Deep Room Impulse Response Completion
by: Lin, Jackie, et al.
Published: (2024)

AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators
by: Pieper, Jaden, et al.
Published: (2024)

An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)

Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)

Predicting speech intelligibility in older adults for speech enhancement using the Gammachirp Envelope Similarity Index, GESI
by: Yamamoto, Ayako, et al.
Published: (2025)

DBMIF: a deep balanced multimodal iterative fusion framework for air- and bone-conduction speech enhancement
by: Wu, Yilei, et al.
Published: (2026)

Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
by: Hauret, Julien, et al.
Published: (2023)

Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)

Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)

End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)

Using RLHF to align speech enhancement approaches to mean-opinion quality scores
by: Kumar, Anurag, et al.
Published: (2024)

Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)

Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)

Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)

CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)

Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets
by: Wihlborg, Lovisa, et al.
Published: (2025)

Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)

Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)

Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
by: Chen, Jinming, et al.
Published: (2025)

Assessing speech quality metrics for evaluation of neural audio codecs under clean speech conditions
by: Mack, Wolfgang, et al.
Published: (2025)

Towards measuring fairness in speech recognition: Fair-Speech dataset
by: Veliche, Irina-Elena, et al.
Published: (2024)

Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)

A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2
by: Serre, Thomas, et al.
Published: (2024)

Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)

Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)