| Main Authors: | Mullins, Sarabeth S., Götz, Georg, Bezzam, Eric, Zheng, Steven, Nielsen, Daniel Gert |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2510.23141 |
Similar Items
Room-acoustic simulations as an alternative to measurements for audio-algorithm evaluation
by: Götz, Georg, et al.
Published: (2025)
Speech dereverberation constrained on room impulse response characteristics
by: Bahrman, Louis, et al.
Published: (2024)
Low algorithmic delay implementation of convolutional beamformer for online joint source separation and dereverberation
by: Mo, Kaien, et al.
Published: (2024)
A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model
by: Zhao, Dongdi, et al.
Published: (2024)
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
by: Gong, Rong, et al.
Published: (2024)
Prominence-aware automatic speech recognition for conversational speech
by: Linke, Julian, et al.
Published: (2025)
Throat and acoustic paired speech dataset for deep learning-based speech enhancement
by: Kim, Yunsik, et al.
Published: (2025)
Spatial-Magnifier: Spatial upsampling for multichannel speech enhancement
by: Lee, Dongheon, et al.
Published: (2026)
Robust fine-tuning of speech recognition models via model merging: application to disordered speech
by: Ducorroy, Alexandre, et al.
Published: (2025)
Teaching the Teachers: Boosting unsupervised domain adaptation in speech recognition by ensemble update
by: Ahmad, Rehan, et al.
Published: (2026)
Joint decoding method for controllable contextual speech recognition based on Speech LLM
by: Fang, Yangui, et al.
Published: (2025)
Real-time speech enhancement in noise for throat microphone using neural audio codec as foundation model
by: Hauret, Julien, et al.
Published: (2025)
Lightweight speech enhancement guided target speech extraction in noisy multi-speaker scenarios
by: Huang, Ziling, et al.
Published: (2025)
Improving child speech recognition with augmented child-like speech
by: Zhang, Yuanyuan, et al.
Published: (2024)
BabAR: from phoneme recognition to developmental measures of young children's speech production
by: Lavechin, Marvin, et al.
Published: (2026)
Deep Room Impulse Response Completion
by: Lin, Jackie, et al.
Published: (2024)
AlignNet: Learning dataset score alignment functions to enable better training of speech quality estimators
by: Pieper, Jaden, et al.
Published: (2024)
An efficient text augmentation approach for contextualized Mandarin speech recognition
by: Zheng, Naijun, et al.
Published: (2024)
Graph-based multi-Feature fusion method for speech emotion recognition
by: Liu, Xueyu, et al.
Published: (2024)
Predicting speech intelligibility in older adults for speech enhancement using the Gammachirp Envelope Similarity Index, GESI
by: Yamamoto, Ayako, et al.
Published: (2025)
DBMIF: a deep balanced multimodal iterative fusion framework for air- and bone-conduction speech enhancement
by: Wu, Yilei, et al.
Published: (2026)
Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture
by: Hauret, Julien, et al.
Published: (2023)
Introduction to speech recognition
by: Dauphin, Gabriel
Published: (2024)
Transcribe, Align and Segment: Creating speech datasets for low-resource languages
by: Sereda, Taras
Published: (2024)
End-to-end transfer learning for speaker-independent cross-language and cross-corpus speech emotion recognition
by: Tang, Duowei, et al.
Published: (2023)
Using RLHF to align speech enhancement approaches to mean-opinion quality scores
by: Kumar, Anurag, et al.
Published: (2024)
Language model integration based on memory control for sequence to sequence speech recognition
by: Cho, Jaejin, et al.
Published: (2018)
Phoneme-based speech recognition driven by large language models and sampling marginalization
by: Ma, Te, et al.
Published: (2025)
Zipformer: A faster and better encoder for automatic speech recognition
by: Yao, Zengwei, et al.
Published: (2023)
CR-CTC: Consistency regularization on CTC for improved speech recognition
by: Yao, Zengwei, et al.
Published: (2024)
Evaluating pretrained speech embedding systems for dysarthria detection across heterogenous datasets
by: Wihlborg, Lovisa, et al.
Published: (2025)
Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)
Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
by: An, Keyu, et al.
Published: (2024)
Index-MSR: A high-efficiency multimodal fusion framework for speech recognition
by: Chen, Jinming, et al.
Published: (2025)
Assessing speech quality metrics for evaluation of neural audio codecs under clean speech conditions
by: Mack, Wolfgang, et al.
Published: (2025)
Towards measuring fairness in speech recognition: Fair-Speech dataset
by: Veliche, Irina-Elena, et al.
Published: (2024)
Heterogeneous bimodal attention fusion for speech emotion recognition
by: Luo, Jiachen, et al.
Published: (2025)
A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2
by: Serre, Thomas, et al.
Published: (2024)
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
by: Zhang, Yiru, et al.
Published: (2025)
Charting 15 years of progress in deep learning for speech emotion recognition: A replication study
by: Triantafyllopoulos, Andreas, et al.
Published: (2025)