Library Catalog

Bibliographic Details
Main Authors:	Zheng, Xianrui, Sun, Guangzhi, Zhang, Chao, Woodland, Philip C.
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.02007

Similar Items

DNCASR: End-to-End Training for Speaker-Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2025)

Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
by: Sun, Guangzhi, et al.
Published: (2022)

Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)

Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
by: Wang, Mengqi, et al.
Published: (2025)

SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)

Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)

SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
by: Fan, Zhiyun, et al.
Published: (2024)

Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression
by: Wu, Wen, et al.
Published: (2023)

Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)

Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
by: Deng, Keqi, et al.
Published: (2023)

MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
by: Yang, Xiaoyu, et al.
Published: (2024)

A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR
by: Morrone, Giovanni, et al.
Published: (2024)

Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
by: Lin, Zhennan, et al.
Published: (2026)

Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
by: Deng, Keqi, et al.
Published: (2024)

MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
by: Nguyen, Thai-Binh, et al.
Published: (2024)

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)

OCR-Enhanced Multimodal ASR Can Read While Listening
by: Chen, Junli, et al.
Published: (2026)

Multiplexing Neural Audio Watermarks
by: Yuan, Zheqi, et al.
Published: (2025)

Distribution-based Emotion Recognition in Conversation
by: Wu, Wen, et al.
Published: (2022)

Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)

Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)

On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)

Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)

Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering
by: Lin, Chaohao, et al.
Published: (2025)

SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)

Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge
by: Huang, Shangkun, et al.
Published: (2025)

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)

Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)

Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)

NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge
by: Kamo, Naoyuki, et al.
Published: (2024)

Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation
by: Cai, Danwei, et al.
Published: (2023)

Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization
by: Polok, Alexander, et al.
Published: (2026)

Neural Forward Filtering for Speaker-Image Separation
by: Sun, Jingqi, et al.
Published: (2025)

Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)

Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)

Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
by: Yang, Yufeng, et al.
Published: (2025)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)