:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Best, Paul, Cuervo, Santiago, Marxer, Ricard
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing Computation and Language Sound
Online Access:	https://arxiv.org/abs/2404.01737
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

Speech foundation models on intelligibility prediction for hearing-impaired listeners
by: Cuervo, Santiago, et al.
Published: (2024)

Scaling Properties of Speech Language Models
by: Cuervo, Santiago, et al.
Published: (2024)

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection
by: Wang, Haoyu, et al.
Published: (2024)

Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
by: Cuervo, Santiago, et al.
Published: (2021)

Improving the Inclusivity of Dutch Speech Recognition by Fine-tuning Whisper on the JASMIN-CGN Corpus
by: Shekoufandeh, Golshid, et al.
Published: (2025)

Can Whisper perform speech-based in-context learning?
by: Wang, Siyin, et al.
Published: (2023)

Extending Whisper with prompt tuning to target-speaker ASR
by: Ma, Hao, et al.
Published: (2023)

kNN For Whisper And Its Effect On Bias And Speaker Adaptation
by: Nachesa, Maya K., et al.
Published: (2024)

Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper
by: Thorbecke, Iuliia, et al.
Published: (2024)

Do Prompts Really Prompt? Exploring the Prompt Understanding Capability of Whisper
by: Yang, Chih-Kai, et al.
Published: (2024)

Transcript-Prompted Whisper with Dictionary-Enhanced Decoding for Japanese Speech Annotation
by: Hu, Rui, et al.
Published: (2025)

DQ-Whisper: Joint Distillation and Quantization for Efficient Multilingual Speech Recognition
by: Shao, Hang, et al.
Published: (2023)

Quantizing Whisper-small: How design choices affect ASR performance
by: Söhler, Arthur, et al.
Published: (2025)

WhisperKit: On-device Real-time ASR with Billion-Scale Transformers
by: Orhon, Atila, et al.
Published: (2025)

Factorized RVQ-GAN For Disentangled Speech Tokenization
by: Khurana, Sameer, et al.
Published: (2025)

Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
by: Raina, Vyas, et al.
Published: (2024)

Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding
by: Zhao, Jiahui, et al.
Published: (2024)

Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models
by: Raina, Vyas, et al.
Published: (2024)

Swedish Whispers; Leveraging a Massive Speech Corpus for Swedish Speech Recognition
by: Vesterbacka, Leonora, et al.
Published: (2025)

Adapting Diarization-Conditioned Whisper for End-to-End Multi-Talker Speech Recognition
by: Kocour, Martin, et al.
Published: (2025)

WhisperRT -- Turning Whisper into a Causal Streaming Model
by: Krichli, Tomer, et al.
Published: (2025)

BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
by: Sy, Yaya, et al.
Published: (2025)

Spontaneous Speech-Based Suicide Risk Detection Using Whisper and Large Language Models
by: Cui, Ziyun, et al.
Published: (2024)

Empowering Whisper as a Joint Multi-Talker and Target-Talker Speech Recognition System
by: Meng, Lingwei, et al.
Published: (2024)

PI-Whisper: Designing an Adaptive and Incremental Automatic Speech Recognition System for Edge Devices
by: Nassereldine, Amir, et al.
Published: (2024)

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
by: Zhuo, Le, et al.
Published: (2023)

Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning
by: Özyilmaz, Ömer Tarik, et al.
Published: (2025)

Towards Rehearsal-Free Multilingual ASR: A LoRA-based Case Study on Whisper
by: Xu, Tianyi, et al.
Published: (2024)

Improving Whisper's Recognition Performance for Under-Represented Language Kazakh Leveraging Unpaired Speech and Text
by: Li, Jinpeng, et al.
Published: (2024)

uDistil-Whisper: Label-Free Data Filtering for Knowledge Distillation in Low-Data Regimes
by: Waheed, Abdul, et al.
Published: (2024)

Qwen vs. Gemma Integration with Whisper: A Comparative Study in Multilingual SpeechLLM Systems
by: Nguyen, Tuan, et al.
Published: (2025)

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
by: Attia, Ahmed Adel, et al.
Published: (2023)

Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts
by: Ferraz, Thomas Palmeira, et al.
Published: (2023)

OWSM-Biasing: Contextualizing Open Whisper-Style Speech Models for Automatic Speech Recognition with Dynamic Vocabulary
by: Sudo, Yui, et al.
Published: (2025)

OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning
by: Peng, Yifan, et al.
Published: (2025)

Late Fusion and Multi-Level Fission Amplify Cross-Modal Transfer in Text-Speech LMs
by: Cuervo, Santiago, et al.
Published: (2025)

Whisper Turns Stronger: Augmenting Wav2Vec 2.0 for Superior ASR in Low-Resource Languages
by: Anidjar, Or Haim, et al.
Published: (2024)

Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning
by: Goel, Arnav, et al.
Published: (2024)

Audio-to-Score Conversion Model Based on Whisper methodology
by: Zhang, Hongyao, et al.
Published: (2024)

WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning
by: Rao, Rajath, et al.
Published: (2025)