| Main Authors: | Ishmam, Zarif; Mahir, Zarif; Wasif, Shafnan; Moin, Md. Ishtiak |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2602.22935 |
Similar Items
Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)
Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling
by: Jalal, Md Asif, et al.
Published: (2025)
Bengali-Loop: Community Benchmarks for Long-Form Bangla ASR and Speaker Diarization
by: Tabib, H. M. Shadman, et al.
Published: (2026)
Exploring Speaker Diarization with Mixture of Experts
by: Yang, Gaobin, et al.
Published: (2025)
An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization
by: Jahan, Epshita, et al.
Published: (2026)
Probabilistic Fusion and Calibration of Neural Speaker Diarization Models
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2025)
Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment
by: Hasan, Sanjid, et al.
Published: (2026)
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)
From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)
SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)
Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)
Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)
Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)
End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
by: Hossain, Md Zarif, et al.
Published: (2024)
Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)
Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and Semantically-Rich Vision-Language Models
by: Hossain, Md Zarif, et al.
Published: (2024)
Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
by: Huang, Jiawen, et al.
Published: (2026)
Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization
by: Marie, Ambre, et al.
Published: (2026)
CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment
by: Liu, Hanwen, et al.
Published: (2026)
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)
Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization
by: Chowdhury, MD. Sagor, et al.
Published: (2026)
Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
by: Shakeel, Muhammad, et al.
Published: (2025)
A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech
by: Huang, Jia-Hong, et al.
Published: (2026)
TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)
When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper
by: Islam, Akif, et al.
Published: (2026)
MOSS Transcribe Diarize Technical Report
by: AI, MOSI., et al.
Published: (2026)
Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)
Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)
FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
by: Grigoryan, Lilit, et al.
Published: (2025)
Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation
by: Abdullah, Sharif Mohammad, et al.
Published: (2025)
End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)
ASR-Synchronized Speaker-Role Diarization
by: Ghosh, Arindam, et al.
Published: (2025)
CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization
by: Huang, Liangbin, et al.
Published: (2026)
Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
by: Cunningham, Jay L., et al.
Published: (2025)
CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
by: Jin, Sichen, et al.
Published: (2024)