| Main Authors: | Cui, Can; Magron, Paul; Sadeghi, Mostafa; Vincent, Emmanuel |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.10234 |
Similar Items
End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data
by: Cui, Can, et al.
Published: (2023)
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)
Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)
Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription
by: Cui, Can, et al.
Published: (2024)
End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)
Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)
Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
by: Mishra, Mayank, et al.
Published: (2025)
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
by: Cui, Can, et al.
Published: (2024)
The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
by: Magron, Paul, et al.
Published: (2026)
Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)
SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays
by: Shao, Yiwen, et al.
Published: (2026)
Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)
A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)
Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
by: Huang, Wen-Chin, et al.
Published: (2024)
You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
by: Cumbal, Ronald, et al.
Published: (2024)
Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
by: Raj, Desh
Published: (2024)
Multi-channel multi-speaker transformer for speech recognition
by: Yifan, Guo, et al.
Published: (2026)
Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)
Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)
Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning
by: Zhao, Siyi, et al.
Published: (2025)
Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)
Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)
A Benchmark for Multi-speaker Anonymization
by: Miao, Xiaoxiao, et al.
Published: (2024)
Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)
Diffusion-based Frameworks for Unsupervised Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2026)
Joint Minimum Processing Beamforming and Near-end Listening Enhancement
by: Fuglsig, Andreas J., et al.
Published: (2023)
On the influence of language similarity in non-target speaker verification trials
by: Reuter, Paul M., et al.
Published: (2025)
Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only
by: Lee, Jaejun, et al.
Published: (2026)
LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
by: Zhu, Qiushi, et al.
Published: (2024)
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR
by: Mei, Yuxiang, et al.
Published: (2026)
WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion
by: Baoueb, Teysir, et al.
Published: (2024)
Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)
TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
by: Bataev, Vladimir, et al.
Published: (2023)
SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)