Saved in:
| Main Authors: | Zheng, Xianrui, Sun, Guangzhi, Zhang, Chao, Woodland, Philip C. |
|---|---|
| Format: | Preprint |
| Published: |
2024
|
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.02007 |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Similar Items
DNCASR: End-to-End Training for Speaker-Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2025)
by: Zheng, Xianrui, et al.
Published: (2025)
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
by: Sun, Guangzhi, et al.
Published: (2022)
by: Sun, Guangzhi, et al.
Published: (2022)
Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024)
by: Zhao, Qiuming, et al.
Published: (2024)
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
by: Wang, Mengqi, et al.
Published: (2025)
by: Wang, Mengqi, et al.
Published: (2025)
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)
by: Zhao, Qiuming, et al.
Published: (2024)
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
by: Deng, Keqi, et al.
Published: (2024)
by: Deng, Keqi, et al.
Published: (2024)
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
by: Fan, Zhiyun, et al.
Published: (2024)
by: Fan, Zhiyun, et al.
Published: (2024)
Estimating the Uncertainty in Emotion Attributes using Deep Evidential Regression
by: Wu, Wen, et al.
Published: (2023)
by: Wu, Wen, et al.
Published: (2023)
Parameter Efficient Finetuning for Speech Emotion Recognition and Domain Adaptation
by: Lashkarashvili, Nineli, et al.
Published: (2024)
by: Lashkarashvili, Nineli, et al.
Published: (2024)
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
by: Deng, Keqi, et al.
Published: (2023)
by: Deng, Keqi, et al.
Published: (2023)
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
by: Yang, Xiaoyu, et al.
Published: (2024)
by: Yang, Xiaoyu, et al.
Published: (2024)
A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR
by: Morrone, Giovanni, et al.
Published: (2024)
by: Morrone, Giovanni, et al.
Published: (2024)
Speaker-Reasoner: Scaling Interaction Turns and Reasoning Patterns for Timestamped Speaker-Attributed ASR
by: Lin, Zhennan, et al.
Published: (2026)
by: Lin, Zhennan, et al.
Published: (2026)
Label-Synchronous Neural Transducer for E2E Simultaneous Speech Translation
by: Deng, Keqi, et al.
Published: (2024)
by: Deng, Keqi, et al.
Published: (2024)
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
by: Nguyen, Thai-Binh, et al.
Published: (2024)
by: Nguyen, Thai-Binh, et al.
Published: (2024)
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)
by: Zhao, Yiyang, et al.
Published: (2024)
OCR-Enhanced Multimodal ASR Can Read While Listening
by: Chen, Junli, et al.
Published: (2026)
by: Chen, Junli, et al.
Published: (2026)
Multiplexing Neural Audio Watermarks
by: Yuan, Zheqi, et al.
Published: (2025)
by: Yuan, Zheqi, et al.
Published: (2025)
Distribution-based Emotion Recognition in Conversation
by: Wu, Wen, et al.
Published: (2022)
by: Wu, Wen, et al.
Published: (2022)
Multi-Channel Multi-Speaker ASR Using Target Speaker's Solo Segment
by: Shao, Yiwen, et al.
Published: (2024)
by: Shao, Yiwen, et al.
Published: (2024)
Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)
by: Polok, Alexander, et al.
Published: (2024)
Speaker-Smoothed kNN Speaker Adaptation for End-to-End ASR
by: Li, Shaojun, et al.
Published: (2024)
by: Li, Shaojun, et al.
Published: (2024)
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
by: Wang, Weiqing, et al.
Published: (2025)
by: Wang, Weiqing, et al.
Published: (2025)
On Speaker Attribution with SURT
by: Raj, Desh, et al.
Published: (2024)
by: Raj, Desh, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
by: Guo, Pengcheng, et al.
Published: (2024)
Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)
by: Horiguchi, Shota, et al.
Published: (2025)
Emotional Styles Hide in Deep Speaker Embeddings: Disentangle Deep Speaker Embeddings for Speaker Clustering
by: Lin, Chaohao, et al.
Published: (2025)
by: Lin, Chaohao, et al.
Published: (2025)
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
by: Hirano, Yuta, et al.
Published: (2025)
by: Hirano, Yuta, et al.
Published: (2025)
Overlap-Adaptive Hybrid Speaker Diarization and ASR-Aware Observation Addition for MISP 2025 Challenge
by: Huang, Shangkun, et al.
Published: (2025)
by: Huang, Shangkun, et al.
Published: (2025)
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
by: Wang, Weiqing, et al.
Published: (2024)
by: Wang, Weiqing, et al.
Published: (2024)
Joint ASR and Speaker Role Tagging with Serialized Output Training
by: Xu, Anfeng, et al.
Published: (2025)
by: Xu, Anfeng, et al.
Published: (2025)
Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
by: He, Xiluo, et al.
Published: (2025)
by: He, Xiluo, et al.
Published: (2025)
NTT Multi-Speaker ASR System for the DASR Task of CHiME-8 Challenge
by: Kamo, Naoyuki, et al.
Published: (2024)
by: Kamo, Naoyuki, et al.
Published: (2024)
Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation
by: Cai, Danwei, et al.
Published: (2023)
by: Cai, Danwei, et al.
Published: (2023)
Mind the Gap: Impact of Synthetic Conversational Data on Multi-Talker ASR and Speaker Diarization
by: Polok, Alexander, et al.
Published: (2026)
by: Polok, Alexander, et al.
Published: (2026)
Neural Forward Filtering for Speaker-Image Separation
by: Sun, Jingqi, et al.
Published: (2025)
by: Sun, Jingqi, et al.
Published: (2025)
Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)
by: Pražák, Aleš, et al.
Published: (2025)
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS
by: Aronowitz, Hagai, et al.
Published: (2026)
by: Aronowitz, Hagai, et al.
Published: (2026)
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
by: Yang, Yufeng, et al.
Published: (2025)
by: Yang, Yufeng, et al.
Published: (2025)
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)
by: Xu, Anfeng, et al.
Published: (2026)
Similar Items
-
DNCASR: End-to-End Training for Speaker-Attributed ASR
by: Zheng, Xianrui, et al.
Published: (2025) -
Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator
by: Sun, Guangzhi, et al.
Published: (2022) -
Speaker Adaptation for Quantised End-to-End ASR Models
by: Zhao, Qiuming, et al.
Published: (2024) -
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
by: Wang, Mengqi, et al.
Published: (2025) -
SAML: Speaker Adaptive Mixture of LoRA Experts for End-to-End ASR
by: Zhao, Qiuming, et al.
Published: (2024)