:: Library Catalog

Bibliographic Details
Main Authors:	Ishmam, Zarif; Mahir, Zarif; Wasif, Shafnan; Moin, Md. Ishtiak
Format:	Preprint
Published:	2026
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2602.22935

Similar Items

Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)

Robust Target Speaker Diarization and Separation via Augmented Speaker Embedding Sampling
by: Jalal, Md Asif, et al.
Published: (2025)

Bengali-Loop: Community Benchmarks for Long-Form Bangla ASR and Speaker Diarization
by: Tabib, H. M. Shadman, et al.
Published: (2026)

Exploring Speaker Diarization with Mixture of Experts
by: Yang, Gaobin, et al.
Published: (2025)

An Investigation Into Various Approaches For Bengali Long-Form Speech Transcription and Bengali Speaker Diarization
by: Jahan, Epshita, et al.
Published: (2026)

Probabilistic Fusion and Calibration of Neural Speaker Diarization Models
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2025)

Make It Hard to Hear, Easy to Learn: Long-Form Bengali ASR and Speaker Diarization via Extreme Augmentation and Perfect Alignment
by: Hasan, Sanjid, et al.
Published: (2026)

Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
by: Kang, Jiawen, et al.
Published: (2024)

SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models
by: Yin, Han, et al.
Published: (2025)

From Modular to End-to-End Speaker Diarization
by: Landini, Federico
Published: (2024)

SDBench: A Comprehensive Benchmark Suite for Speaker Diarization
by: Pacheco, Eduardo, et al.
Published: (2025)

Leveraging Speaker Embeddings in End-to-End Neural Diarization for Two-Speaker Scenarios
by: Alvarez-Trejos, Juan Ignacio, et al.
Published: (2024)

Adaptability of ASR Models on Low-Resource Language: A Comparative Study of Whisper and Wav2Vec-BERT on Bangla
by: Ridoy, Md Sazzadul Islam, et al.
Published: (2025)

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter
by: Andrusenko, Andrei, et al.
Published: (2024)

End-to-End Supervised Hierarchical Graph Clustering for Speaker Diarization
by: Singh, Prachi, et al.
Published: (2024)

ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
by: Mariotte, Theo, et al.
Published: (2024)

Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
by: Hossain, Md Zarif, et al.
Published: (2024)

Improving Neural Diarization through Speaker Attribute Attractors and Local Dependency Modeling
by: Palzer, David, et al.
Published: (2025)

Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm
by: Li, Zhaoyang, et al.
Published: (2025)

Sim-CLIP: Unsupervised Siamese Adversarial Fine-Tuning for Robust and Semantically-Rich Vision-Language Models
by: Hossain, Md Zarif, et al.
Published: (2024)

Multimodal Emotion Regression with Multi-Objective Optimization and VAD-Aware Audio Modeling for the 10th ABAW EMI Track
by: Huang, Jiawen, et al.
Published: (2026)

Iterative LLM-based improvement for French Clinical Interview Transcription and Speaker Diarization
by: Marie, Ambre, et al.
Published: (2026)

CTC-TTS: LLM-based dual-streaming text-to-speech with CTC alignment
by: Liu, Hanwen, et al.
Published: (2026)

Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings
by: Emon, Jakaria Islam, et al.
Published: (2025)

Robust Long-Form Bangla Speech Processing: Automatic Speech Recognition and Speaker Diarization
by: Chowdhury, MD. Sagor, et al.
Published: (2026)

Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
by: Horiguchi, Shota, et al.
Published: (2025)

Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
by: Shakeel, Muhammad, et al.
Published: (2025)

A Novel Automatic Framework for Speaker Drift Detection in Synthesized Speech
by: Huang, Jia-Hong, et al.
Published: (2026)

TinyML for Speech Recognition
by: Barovic, Andrew, et al.
Published: (2025)

When Denoising Hinders: Revisiting Zero-Shot ASR with SAM-Audio and Whisper
by: Islam, Akif, et al.
Published: (2026)

MOSS Transcribe Diarize Technical Report
by: AI, MOSI., et al.
Published: (2026)

Enhancing CTC-based speech recognition with diverse modeling units
by: Han, Shiyi, et al.
Published: (2024)

Assessing the Robustness of Spectral Clustering for Deep Speaker Diarization
by: Raghav, Nikhil, et al.
Published: (2024)

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities
by: Grigoryan, Lilit, et al.
Published: (2025)

Breaking the Silence: A Dataset and Benchmark for Bangla Text-to-Gloss Translation
by: Abdullah, Sharif Mohammad, et al.
Published: (2025)

End-to-End Joint ASR and Speaker Role Diarization with Child-Adult Interactions
by: Xu, Anfeng, et al.
Published: (2026)

ASR-Synchronized Speaker-Role Diarization
by: Ghosh, Arindam, et al.
Published: (2025)

CineSRD: Leveraging Visual, Acoustic, and Linguistic Cues for Open-World Visual Media Speaker Diarization
by: Huang, Liangbin, et al.
Published: (2026)

Toward Responsible ASR for African American English Speakers: A Scoping Review of Bias and Equity in Speech Technology
by: Cunningham, Jay L., et al.
Published: (2025)

CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting
by: Jin, Sichen, et al.
Published: (2024)