:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Yifan; Yang, Peiji; Wang, Zhisheng; Zhong, Yicheng; Liu, Rui
Format:	Preprint
Published:	2026
Subjects:	Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2601.03712

Similar Items

Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
by: Yang, Peiji, et al.
Published: (2024)

Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
by: Wang, Haoyu, et al.
Published: (2024)

WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Normal Conversion
by: Liu, Dong, et al.
Published: (2025)

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)

Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)

M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
by: Zhou, Jiaming, et al.
Published: (2024)

Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)

WhispEar: A Bi-directional Framework for Scaling Whispered Speech Conversion via Pseudo-Parallel Whisper Generation
by: Fang, Zihao, et al.
Published: (2026)

MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses
by: Liu, Yang, et al.
Published: (2025)

Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation
by: Hu, Rui, et al.
Published: (2025)

Adopting Whisper for Confidence Estimation
by: Aggarwal, Vaibhav, et al.
Published: (2025)

WhisperFlow: speech foundation models in real time
by: Wang, Rongxiang, et al.
Published: (2024)

Prompting Whisper for Joint Speech Transcription and Diarization
by: Zamyrova, Mariia, et al.
Published: (2026)

Probing Whisper for Dysarthric Speech in Detection and Assessment
by: Yue, Zhengjun, et al.
Published: (2025)

A Study on Incorporating Whisper for Robust Speech Assessment
by: Zezario, Ryandhimas E., et al.
Published: (2023)

One Whisper to Grade Them All
by: Phan, Nhan, et al.
Published: (2025)

Whisper Has an Internal Word Aligner
by: Yeh, Sung-Lin, et al.
Published: (2025)

Sparsely Shared LoRA on Whisper for Child Speech Recognition
by: Liu, Wei, et al.
Published: (2023)

WhisperRT -- Turning Whisper into a Causal Streaming Model
by: Krichli, Tomer, et al.
Published: (2025)

BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
by: Sy, Yaya, et al.
Published: (2025)

LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
by: Song, Zheshu, et al.
Published: (2024)

Can Whisper perform speech-based in-context learning?
by: Wang, Siyin, et al.
Published: (2023)

PhoWhisper: Automatic Speech Recognition for Vietnamese
by: Le, Thanh-Thien, et al.
Published: (2024)

WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper
by: Akinrintoyo, Emmanuel, et al.
Published: (2025)

Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
by: Li, Weiqin, et al.
Published: (2024)

Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
by: Zezario, Ryandhimas E., et al.
Published: (2025)

Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition
by: Farhadipour, Aref, et al.
Published: (2024)

Improving Rare-Word Recognition of Whisper in Zero-Shot Settings
by: Jogi, Yash, et al.
Published: (2025)

Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
by: Syed, Jaza, et al.
Published: (2025)

A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition
by: Wang, Shiyao, et al.
Published: (2025)

Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)

Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding
by: Li, Mohan, et al.
Published: (2024)

Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)

State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition
by: Farhadipour, Aref, et al.
Published: (2025)

Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper
by: Tran, Hoan My, et al.
Published: (2026)

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
by: Rouditchenko, Andrew, et al.
Published: (2024)

Deepfake Detection of Singing Voices With Whisper Encodings
by: Sharma, Falguni, et al.
Published: (2025)

DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)