:: Library Catalog

Bibliographic Details
Main Authors:	Bevilacqua, Antonio; Saviano, Paolo; Amirante, Alessandro; Romano, Simon Pietro
Format:	Preprint
Published:	2024
Subjects:	Sound; Machine Learning; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2405.03484

Similar Items

WhisperRT -- Turning Whisper into a Causal Streaming Model
by: Krichli, Tomer, et al.
Published: (2025)

Leveraging Whisper Embeddings for Audio-based Lyrics Matching
by: Mancini, Eleonora, et al.
Published: (2025)

Fine-Tuning Whisper for Inclusive Prosodic Stress Analysis
by: Sohn, Samuel S., et al.
Published: (2025)

Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
by: Zhang, Li, et al.
Published: (2024)

Domain Adapting Deep Reinforcement Learning for Real-world Speech Emotion Recognition
by: Rajapakshe, Thejan, et al.
Published: (2022)

Non-Intrusive Speech Intelligibility Prediction for Hearing Aids using Whisper and Metadata
by: Zezario, Ryandhimas E., et al.
Published: (2023)

Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
by: Ma, Yujian, et al.
Published: (2025)

WhiSQA: Non-Intrusive Speech Quality Prediction Using Whisper Encoder Features
by: Close, George, et al.
Published: (2025)

WhisperD: Dementia Speech Recognition and Filler Word Detection with Whisper
by: Akinrintoyo, Emmanuel, et al.
Published: (2025)

A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments
by: Khondkar, Md Jahangir Alam, et al.
Published: (2025)

Real-Time Streaming Mel Vocoding with Generative Flow Matching
by: Welker, Simon, et al.
Published: (2025)

Optimizing Multi-Stuttered Speech Classification: Leveraging Whisper's Encoder for Efficient Parameter Reduction in Automated Assessment
by: Ameer, Huma, et al.
Published: (2024)

Audio-to-Score Conversion Model Based on Whisper methodology
by: Zhang, Hongyao, et al.
Published: (2024)

Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables
by: Dementyev, Artem, et al.
Published: (2024)

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models
by: Kwon, Taegyun, et al.
Published: (2024)

Whilter: A Whisper-based Data Filter for "In-the-Wild" Speech Corpora Using Utterance-level Multi-Task Classification
by: Ravenscroft, William, et al.
Published: (2025)

Adapting WavLM for Speech Emotion Recognition
by: Diatlova, Daria, et al.
Published: (2024)

Temporal Convolution-based Hybrid Model Approach with Representation Learning for Real-Time Acoustic Anomaly Detection
by: Dissanayaka, Sahan, et al.
Published: (2024)

Adapting Whisper for Streaming Speech Recognition via Two-Pass Decoding
by: Zhou, Haoran, et al.
Published: (2025)

Improving Real-Time Music Accompaniment Separation with MMDenseNet
by: Wang, Chun-Hsiang, et al.
Published: (2024)

StreamVC: Real-Time Low-Latency Voice Conversion
by: Yang, Yang, et al.
Published: (2024)

TF-MLPNet: Tiny Real-Time Neural Speech Separation
by: Itani, Malek, et al.
Published: (2025)

UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
by: Glazer, Neta, et al.
Published: (2025)

Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications
by: Wee, Marcus Yu Zhe, et al.
Published: (2025)

Exploring System Adaptations For Minimum Latency Real-Time Piano Transcription
by: Hu, Patricia, et al.
Published: (2025)

Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
by: Wong, Zheng Jie, et al.
Published: (2025)

ERSAM: Neural Architecture Search For Energy-Efficient and Real-Time Social Ambiance Measurement
by: Li, Chaojian, et al.
Published: (2023)

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering
by: Wang, Quan, et al.
Published: (2022)

Edge Intelligence for Wildlife Conservation: Real-Time Hornbill Call Classification Using TinyML
by: Hing, Kong Ka, et al.
Published: (2025)

A Real-Time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance
by: Park, Jiyun, et al.
Published: (2024)

Towards Lightweight Adaptation of Speech Enhancement Models in Real-World Environments
by: Cheng, Longbiao, et al.
Published: (2026)

Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
by: Segal-Feldman, Yael, et al.
Published: (2024)

Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
by: Mezza, Alessandro Ilic, et al.
Published: (2024)

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
by: Guo, Pengcheng, et al.
Published: (2024)

LLark: A Multimodal Instruction-Following Language Model for Music
by: Gardner, Josh, et al.
Published: (2023)

Distribution Preserving Source Separation With Time Frequency Predictive Models
by: Villasana T., Pedro J., et al.
Published: (2023)

Diffusion Models for Audio Restoration
by: Lemercier, Jean-Marie, et al.
Published: (2024)

Multi-Microphone and Multi-Modal Emotion Recognition in Reverberant Environment
by: Cohen, Ohad, et al.
Published: (2024)

Sound event localization and classification using WASN in Outdoor Environment
by: Zhang, Dongzhe, et al.
Published: (2024)

StoRM: A Diffusion-based Stochastic Regeneration Model for Speech Enhancement and Dereverberation
by: Lemercier, Jean-Marie, et al.
Published: (2022)