| Main Authors: | Hu, Yifan, Yang, Peiji, Wang, Zhisheng, Zhong, Yicheng, Liu, Rui |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.03712 |
Similar Items
Optimizing Neural Speech Codec for Low-Bitrate Compression via Multi-Scale Encoding
by: Yang, Peiji, et al.
Published: (2024)
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
by: Wang, Haoyu, et al.
Published: (2024)
WhisperVC: Decoupled Cross-Domain Alignment and Speech Generation for Low-Resource Whisper-to-Normal Conversion
by: Liu, Dong, et al.
Published: (2025)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
Whisper-PMFA: Partial Multi-Scale Feature Aggregation for Speaker Verification using Whisper Models
by: Zhao, Yiyang, et al.
Published: (2024)
M2R-Whisper: Multi-stage and Multi-scale Retrieval Augmentation for Enhancing Whisper
by: Zhou, Jiaming, et al.
Published: (2024)
Target Speaker ASR with Whisper
by: Polok, Alexander, et al.
Published: (2024)
WhispEar: A Bi-directional Framework for Scaling Whispered Speech Conversion via Pseudo-Parallel Whisper Generation
by: Fang, Zihao, et al.
Published: (2026)
MMW: Side Talk Rejection Multi-Microphone Whisper on Smart Glasses
by: Liu, Yang, et al.
Published: (2025)
Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation
by: Hu, Rui, et al.
Published: (2025)
Adopting Whisper for Confidence Estimation
by: Aggarwal, Vaibhav, et al.
Published: (2025)
WhisperFlow: speech foundation models in real time
by: Wang, Rongxiang, et al.
Published: (2024)
Prompting Whisper for Joint Speech Transcription and Diarization
by: Zamyrova, Mariia, et al.
Published: (2026)
Probing Whisper for Dysarthric Speech in Detection and Assessment
by: Yue, Zhengjun, et al.
Published: (2025)
A Study on Incorporating Whisper for Robust Speech Assessment
by: Zezario, Ryandhimas E., et al.
Published: (2023)
One Whisper to Grade Them All
by: Phan, Nhan, et al.
Published: (2025)
Whisper Has an Internal Word Aligner
by: Yeh, Sung-Lin, et al.
Published: (2025)
Sparsely Shared LoRA on Whisper for Child Speech Recognition
by: Liu, Wei, et al.
Published: (2023)
WhisperRT -- Turning Whisper into a Causal Streaming Model
by: Krichli, Tomer, et al.
Published: (2025)
BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
by: Sy, Yaya, et al.
Published: (2025)
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
by: Song, Zheshu, et al.
Published: (2024)
Can Whisper perform speech-based in-context learning?
by: Wang, Siyin, et al.
Published: (2023)
PhoWhisper: Automatic Speech Recognition for Vietnamese
by: Le, Thanh-Thien, et al.
Published: (2024)
WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper
by: Akinrintoyo, Emmanuel, et al.
Published: (2025)
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
by: Li, Weiqin, et al.
Published: (2024)
Speech Intelligibility Assessment with Uncertainty-Aware Whisper Embeddings and sLSTM
by: Zezario, Ryandhimas E., et al.
Published: (2025)
Leveraging Self-Supervised Models for Automatic Whispered Speech Recognition
by: Farhadipour, Aref, et al.
Published: (2024)
Improving Rare-Word Recognition of Whisper in Zero-Shot Settings
by: Jogi, Yash, et al.
Published: (2025)
Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
by: Syed, Jaza, et al.
Published: (2025)
A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition
by: Wang, Shiyao, et al.
Published: (2025)
Bangla-WhisperDiar: Fine-Tuning Whisper and PyAnnote for Bangla Long-Form Speech Recognition and Speaker Diarization
by: Bhuiyan, Mohammed Aman, et al.
Published: (2026)
Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding
by: Li, Mohan, et al.
Published: (2024)
Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)
Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)
State-Space Models in Efficient Whispered and Multi-dialect Speech Recognition
by: Farhadipour, Aref, et al.
Published: (2025)
Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper
by: Tran, Hoan My, et al.
Published: (2026)
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
by: Rouditchenko, Andrew, et al.
Published: (2024)
Deepfake Detection of Singing Voices With Whisper Encodings
by: Sharma, Falguni, et al.
Published: (2025)
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)