:: Library Catalog

Bibliographic Details
Main Authors:	Attia, Ahmed Adel; Demszky, Dorottya; Liu, Jing; Espy-Wilson, Carol
Format:	Preprint
Published:	2025
Subjects:	Audio and Speech Processing; Computation and Language; Sound
Online Access:	https://arxiv.org/abs/2505.17088

Similar Items

CPT-Boosted Wav2vec2.0: Towards Noise Robust Speech Recognition for Classroom Environments
by: Attia, Ahmed Adel, et al.
Published: (2024)

Kid-Whisper: Towards Bridging the Performance Gap in Automatic Speech Recognition for Children VS. Adults
by: Attia, Ahmed Adel, et al.
Published: (2023)

Multi-Stage Speaker Diarization for Noisy Classrooms
by: Khan, Ali Sartaz, et al.
Published: (2025)

RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines
by: Attia, Ahmed Adel, et al.
Published: (2025)

Continued Pretraining for Domain Adaptation of Wav2vec2.0 in Automatic Speech Recognition for Elementary Math Classroom Settings
by: Attia, Ahmed Adel, et al.
Published: (2024)

Articulation-Informed ASR: Integrating Articulatory Features into ASR via Auxiliary Speech Inversion and Cross-Attention Fusion
by: Attia, Ahmed Adel, et al.
Published: (2025)

SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research
by: Attia, Ahmed Adel, et al.
Published: (2025)

Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
by: Attia, Ahmed Adel, et al.
Published: (2023)

Towards noise-robust speech inversion through multi-task learning with speech enhancement
by: Tabatabaee, Saba, et al.
Published: (2026)

Reverse Attention for Lightweight Speech Enhancement on Edge Devices
by: Ojha, Shuubham, et al.
Published: (2025)

Self-supervised Multimodal Speech Representations for the Assessment of Schizophrenia Symptoms
by: Premananth, Gowtham, et al.
Published: (2024)

FT-Boosted SV: Towards Noise Robust Speaker Verification for English Speaking Classroom Environments
by: Tabatabaee, Saba, et al.
Published: (2025)

Robust Training for Speaker Verification against Noisy Labels
by: Fang, Zhihua, et al.
Published: (2022)

CEC: A Noisy Label Detection Method for Speaker Recognition
by: Shen, Yao, et al.
Published: (2024)

A multi-modal approach for identifying schizophrenia using cross-modal attention
by: Premananth, Gowtham, et al.
Published: (2023)

Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
by: He, Ke-Xin, et al.
Published: (2019)

From Weak to Strong Sound Event Labels using Adaptive Change-Point Detection and Active Learning
by: Martinsson, John, et al.
Published: (2024)

Is Transfer Learning Necessary for Violin Transcription?
by: Peng, Yueh-Po, et al.
Published: (2025)

Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
by: Ravenscroft, William, et al.
Published: (2024)

Prompting Whisper for Joint Speech Transcription and Diarization
by: Zamyrova, Mariia, et al.
Published: (2026)

Robust Singing Voice Transcription Serves Synthesis
by: Li, Ruiqi, et al.
Published: (2024)

Speech-Based Estimation of Schizophrenia Severity Using Feature Fusion
by: Premananth, Gowtham, et al.
Published: (2024)

Multichannel Keyword Spotting for Noisy Conditions
by: Saladukha, Dzmitry, et al.
Published: (2025)

Enhancing Lyrics Transcription on Music Mixtures with Consistency Loss
by: Huang, Jiawen, et al.
Published: (2025)

Towards Musically Informed Evaluation of Piano Transcription Models
by: Hu, Patricia, et al.
Published: (2024)

Mel-RoFormer for Vocal Separation and Vocal Melody Transcription
by: Wang, Ju-Chiang, et al.
Published: (2024)

Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper
by: Syed, Jaza, et al.
Published: (2025)

Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
by: von Neumann, Thilo, et al.
Published: (2023)

Lightweight Target-Speaker-Based Overlap Transcription for Practical Streaming ASR
by: Pražák, Aleš, et al.
Published: (2025)

ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription
by: Le, Khanh, et al.
Published: (2025)

Enhanced Automatic Drum Transcription via Drum Stem Source Separation
by: Riley, Xavier, et al.
Published: (2025)

Leveraging Broadcast Media Subtitle Transcripts for Automatic Speech Recognition and Subtitling
by: Poncelet, Jakob, et al.
Published: (2025)

Score-Informed Transformer for Refining MIDI Velocity in Automatic Music Transcription
by: He, Zhanhong, et al.
Published: (2025)

STARS: A Unified Framework for Singing Transcription, Alignment, and Refined Style Annotation
by: Guo, Wenxiang, et al.
Published: (2025)

GAPS: A Large and Diverse Classical Guitar Dataset and Benchmark Transcription Model
by: Riley, Xavier, et al.
Published: (2024)

Harmonic Summation-Based Robust Pitch Estimation in Noisy and Reverberant Environments
by: Singh, Anup, et al.
Published: (2025)

Noisy Disentanglement with Tri-stage Training for Noise-Robust Speech Recognition
by: Chen, Shuangyuan, et al.
Published: (2025)

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels
by: Aldeneh, Zakaria, et al.
Published: (2024)

Towards Generalizability to Tone and Content Variations in the Transcription of Amplifier Rendered Electric Guitar Audio
by: Chen, Yu-Hua, et al.
Published: (2025)

audio2chart: End to End Audio Transcription into playable Guitar Hero charts
by: Tripodi, Riccardo
Published: (2025)