| Main Authors: | Bevilacqua, Antonio; Saviano, Paolo; Amirante, Alessandro; Romano, Simon Pietro |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2405.03484 |
Similar Items
WhisperRT -- Turning Whisper into a Causal Streaming Model
by: Krichli, Tomer, et al.
Published: (2025)
Leveraging Whisper Embeddings for Audio-based Lyrics Matching
by: Mancini, Eleonora, et al.
Published: (2025)
Fine-Tuning Whisper for Inclusive Prosodic Stress Analysis
by: Sohn, Samuel S., et al.
Published: (2025)
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)
Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2022)
Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
by: Zezario, Ryandhimas E., et al.
Published: (2023)
Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
by: Ma, Yujian, et al.
Published: (2025)
WhiSQA: Non-Intrusive Speech Quality Prediction Using Whisper Encoder Features
by: Close, George, et al.
Published: (2025)
WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper
by: Akinrintoyo, Emmanuel, et al.
Published: (2025)
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments
by: Khondkar, Md Jahangir Alam, et al.
Published: (2025)
Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)
Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment
by: Ameer, Huma, et al.
Published: (2024)
Audio-to-Score Conversion Model Based on Whisper methodology
by: Zhang, Hongyao, et al.
Published: (2024)
Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables
by: Dementyev, Artem, et al.
Published: (2024)
Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)
Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
by: Ravenscroft, William, et al.
Published: (2025)
Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)
Temporal Convolution-based Hybrid Model Approach with Representation Learning for Real-Time Acoustic Anomaly Detection
by: Dissanayaka, Sahan, et al.
Published: (2024)
Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)
Improving Real-Time Music Accompaniment Separation with MMDenseNet
by: Wang, Chun-Hsiang, et al.
Published: (2024)
StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)
TF-MLPNet: Tiny Real-Time Neural Speech Separation
by: Itani, Malek, et al.
Published: (2025)
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)
Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications
by: Wee, Marcus Yu Zhe, et al.
Published: (2025)
Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription
by: Hu, Patricia, et al.
Published: (2025)
Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
by: Wong, Zheng Jie, et al.
Published: (2025)
ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement
by: Li, Chaojian, et al.
Published: (2023)
Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)
Edge Intelligence for Wildlife Conservation: Real-Time Hornbill Call Classification Using TinyML
by: Hing, Kong Ka, et al.
Published: (2025)
A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance
by: Park, Jiyun, et al.
Published: (2024)
Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
by: Segal-Feldman, Yael, et al.
Published: (2024)
Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
by: Mezza, Alessandro Ilic, et al.
Published: (2024)
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)
LLark: A Multimodal Instruction-Following Language Model for Music
by: Gardner, Josh, et al.
Published: (2023)
Distribution Preserving Source Separation With Time Frequency Predictive Models
by: T., Pedro J. Villasana, et al.
Published: (2023)
Diffusion Models for Audio Restoration
by: Lemercier, Jean-Marie, et al.
Published: (2024)
Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment
by: Cohen, Ohad, et al.
Published: (2024)
Sound event localization and classification using WASN in Outdoor Environment
by: Zhang, Dongzhe, et al.
Published: (2024)
StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)