:: Library Catalog

Bibliographic Details
Main Authors:	Cui, Can; Magron, Paul; Sadeghi, Mostafa; Vincent, Emmanuel
Format:	Preprint
Published:	2025
Subjects:	Sound
Online Access:	https://arxiv.org/abs/2509.10234

Similar Items

End-to-end Joint Punctuated and Normalized ASR with a Limited Amount of Punctuated Training Data
by: Cui, Can, et al.
Published: (2023)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)

Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)

Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription
by: Cui, Can, et al.
Published: (2024)

End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)

Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)

Metric Analysis for Spatial Semantic Segmentation of Sound Scenes
by: Mishra, Mayank, et al.
Published: (2025)

Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
by: Cui, Can, et al.
Published: (2024)

The Costs of Reproducibility in Music Separation Research: a Replication of Band-Split RNN
by: Magron, Paul, et al.
Published: (2026)

Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)

SpatialEmb: Extract and Encode Spatial Information for 1-Stage Multi-channel Multi-speaker ASR on Arbitrary Microphone Arrays
by: Shao, Yiwen, et al.
Published: (2026)

Hierarchical speaker representation for target speaker extraction
by: He, Shulin, et al.
Published: (2022)

A Calculus-Based Framework for Determining Vocabulary Size in End-to-End ASR
by: Kopparapu, Sunil Kumar
Published: (2026)

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)

Multi-speaker Text-to-speech Training with Speaker Anonymized Data
by: Huang, Wen-Chin, et al.
Published: (2024)

You don't understand me!: Comparing ASR results for L1 and L2 speakers of Swedish
by: Cumbal, Ronald, et al.
Published: (2024)

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives
by: Raj, Desh
Published: (2024)

Multi-channel multi-speaker transformer for speech recognition
by: Yifan, Guo, et al.
Published: (2026)

Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement
by: Monir, Nasser-Eddine, et al.
Published: (2025)

Right Label Context in End-to-End Training of Time-Synchronous ASR Models
by: Raissi, Tina, et al.
Published: (2025)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)

Lightweight Front-end Enhancement for Robust ASR via Frame Resampling and Sub-Band Pruning
by: Zhao, Siyi, et al.
Published: (2025)

Text adaptation for speaker verification with speaker-text factorized embeddings
by: Yang, Yexin, et al.
Published: (2025)

Improving curriculum learning for target speaker extraction with synthetic speakers
by: Liu, Yun, et al.
Published: (2024)

A Benchmark for Multi-speaker Anonymization
by: Miao, Xiaoxiao, et al.
Published: (2024)

Improving endpoint detection in end-to-end streaming ASR for conversational speech
by: C, Anandh, et al.
Published: (2025)

Diffusion-based Frameworks for Unsupervised Speech Enhancement
by: Ayilo, Jean-Eudes, et al.
Published: (2026)

Joint Minimum Processing Beamforming and Near-end Listening Enhancement
by: Fuglsig, Andreas J., et al.
Published: (2023)

On the influence of language similarity in non-target speaker verification trials
by: Reuter, Paul M., et al.
Published: (2025)

Speaking Without Sound: Multi-speaker Silent Speech Voicing with Facial Inputs Only
by: Lee, Jaejun, et al.
Published: (2026)

LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)

Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation
by: Zhu, Qiushi, et al.
Published: (2024)

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)

Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR
by: Mei, Yuxiang, et al.
Published: (2026)

WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion
by: Baoueb, Teysir, et al.
Published: (2024)

Streaming Bilingual End-to-End ASR model using Attention over Multiple Softmax
by: Patil, Aditya, et al.
Published: (2024)

TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in End-to-End ASR
by: Ravi, Nagarathna, et al.
Published: (2024)

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
by: Bataev, Vladimir, et al.
Published: (2023)

SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
by: Grossman, Raymond, et al.
Published: (2025)