| Main Authors: | Attia, Ahmed Adel, Demszky, Dorottya, Liu, Jing, Espy-Wilson, Carol |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2505.17088 |
Similar Items
CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
by: Attia, Ahmed Adel, et al.
Published: (2024)
Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
by: Attia, Ahmed Adel, et al.
Published: (2023)
Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)
RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines
by: Attia, Ahmed Adel, et al.
Published: (2025)
Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings
by: Attia, Ahmed Adel, et al.
Published: (2024)
Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)
SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research
by: Attia, Ahmed Adel, et al.
Published: (2025)
Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
by: Attia, Ahmed Adel, et al.
Published: (2023)
Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)
Reverse Attention for Lightweight Speech Enhancement on Edge Devices
by: Ojha, Shuubham, et al.
Published: (2025)
Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms
by: Premananth, Gowtham, et al.
Published: (2024)
FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments
by: Tabatabaee, Saba, et al.
Published: (2025)
Robust Training for Speaker Verification against Noisy Labels
by: Fang, Zhihua, et al.
Published: (2022)
CEC: A Noisy Label Detection Method for Speaker Recognition
by: Shen, Yao, et al.
Published: (2024)
A multi-modal approach for identifying schizophrenia using cross-modal attention
by: Premananth, Gowtham, et al.
Published: (2023)
Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
by: He, Ke-Xin, et al.
Published: (2019)
From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning
by: Martinsson, John, et al.
Published: (2024)
Is Transfer Learning Necessary for Violin Transcription?
by: Peng, Yueh-Po, et al.
Published: (2025)
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)
Prompting Whisper for Joint Speech Transcription and Diarization
by: Zamyrova, Mariia, et al.
Published: (2026)
Robust Singing Voice Transcription Serves Synthesis
by: Li, Ruiqi, et al.
Published: (2024)
Speech-Based Estimation of Schizophrenia Severity Using Feature Fusion
by: Premananth, Gowtham, et al.
Published: (2024)
Multichannel Keyword Spotting for Noisy Conditions
by: Saladukha, Dzmitry, et al.
Published: (2025)
Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
by: Huang, Jiawen, et al.
Published: (2025)
Towards Musically Informed Evaluation of Piano Transcription Models
by: Hu, Patricia, et al.
Published: (2024)
Mel-RoFormer for Vocal Separation and Vocal Melody Transcription
by: Wang, Ju-Chiang, et al.
Published: (2024)
Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
by: Syed, Jaza, et al.
Published: (2025)
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
by: von Neumann, Thilo, et al.
Published: (2023)
Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)
ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
by: Le, Khanh, et al.
Published: (2025)
Enhanced Automatic Drum Transcription via Drum Stem Source Separation
by: Riley, Xavier, et al.
Published: (2025)
Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
by: Poncelet, Jakob, et al.
Published: (2025)
Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription
by: He, Zhanhong, et al.
Published: (2025)
STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
by: Guo, Wenxiang, et al.
Published: (2025)
GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model
by: Riley, Xavier, et al.
Published: (2024)
Harmonic Summation-Based Robust Pitch Estimation in Noisy and Reverberant Environments
by: Singh, Anup, et al.
Published: (2025)
Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
by: Aldeneh, Zakaria, et al.
Published: (2024)
Towards Generalizability to Tone and Content Variations in the Transcription of Amplifier Rendered Electric Guitar Audio
by: Chen, Yu-Hua, et al.
Published: (2025)
audio2chart: End to End Audio Transcription into playable Guitar Hero charts
by: Tripodi, Riccardo
Published: (2025)