| Main Authors: | Yeh, Sung-Lin, Meng, Yen, Tang, Hao |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.09987 |
Similar Items
Estimating the Completeness of Discrete Speech Units
by: Yeh, Sung-Lin, et al.
Published: (2024)
Learning Speech Representations with Variational Predictive Coding
by: Yeh, Sung-Lin, et al.
Published: (2025)
Deepfake Word Detection by Next-token Prediction using Fine-tuned Whisper
by: Tran, Hoan My, et al.
Published: (2026)
MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables
by: Yeh, Sung-Lin, et al.
Published: (2026)
Effective Context in Neural Speech Models
by: Meng, Yen, et al.
Published: (2025)
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
by: Wang, Haoyu, et al.
Published: (2024)
Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective
by: Liu, Alexander H., et al.
Published: (2024)
One Whisper to Grade Them All
by: Phan, Nhan, et al.
Published: (2025)
Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)
PhoWhisper: Automatic Speech Recognition for Vietnamese
by: Le, Thanh-Thien, et al.
Published: (2024)
Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System
by: Meng, Lingwei, et al.
Published: (2024)
Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation
by: Hu, Rui, et al.
Published: (2025)
POWSM: A Phonetic Open Whisper-Style Speech Foundation Model
by: Li, Chin-Jou, et al.
Published: (2025)
Fine-tuning Whisper on Low-Resource Languages for Real-World Applications
by: Timmel, Vincenzo, et al.
Published: (2024)
Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
by: Zhao, Jiahui, et al.
Published: (2024)
Enhancing Whisper's Accuracy and Speed for Indian Languages through Prompt-Tuning and Tokenization
by: Tripathi, Kumud, et al.
Published: (2024)
Transfer Learning from Whisper for Microscopic Intelligibility Prediction
by: Best, Paul, et al.
Published: (2024)
Can Whisper perform speech-based in-context learning?
by: Wang, Siyin, et al.
Published: (2023)
FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data
by: Liu, Dancheng, et al.
Published: (2024)
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
by: Zhuo, Le, et al.
Published: (2023)
kNN For Whisper And Its Effect On Bias And Speaker Adaptation
by: Nachesa, Maya K., et al.
Published: (2024)
WhisperRT -- Turning Whisper into a Causal Streaming Model
by: Krichli, Tomer, et al.
Published: (2025)
BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
by: Sy, Yaya, et al.
Published: (2025)
Anatomy of the Modality Gap: Dissecting the Internal States of End-to-End Speech LLMs
by: Hsu, Ming-Hao, et al.
Published: (2026)
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
by: Peng, Yifan, et al.
Published: (2025)
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer
by: Peng, Yifan, et al.
Published: (2024)
Quantizing Whisper-small: How design choices affect ASR performance
by: Söhler, Arthur, et al.
Published: (2025)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
by: Orhon, Atila, et al.
Published: (2025)
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)
DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)
Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)
Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition
by: Vesterbacka, Leonora, et al.
Published: (2025)
Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
by: Kocour, Martin, et al.
Published: (2025)
Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
by: Raina, Vyas, et al.
Published: (2024)
Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models
by: Raina, Vyas, et al.
Published: (2024)
Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus
by: Shekoufandeh, Golshid, et al.
Published: (2025)
Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
by: Özyilmaz, Ömer Tarik, et al.
Published: (2025)
Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
by: Cui, Ziyun, et al.
Published: (2024)
PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
by: Nassereldine, Amir, et al.
Published: (2024)
Investigating the Impact of Word Informativeness on Speech Emotion Recognition
by: Kakouros, Sofoklis
Published: (2025)