Library Catalog

Bibliographic Details
Main Authors:	Lee, Dong Yoon; Weakley, Alyssa; Wei, Hui; Brown, Blake; Carrion, Keyana; Pan, Shijia
Format:	Preprint
Published:	2025
Subjects:	Sound; Machine Learning; I.5.4
Online Access:	https://arxiv.org/abs/2508.21167

Similar Items

Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift
by: Chien, Sheng-You, et al.
Published: (2026)

EMOVOME: A Dataset for Emotion Recognition in Spontaneous Real-Life Speech
by: Gómez-Zaragozá, Lucía, et al.
Published: (2024)

Hidden Echoes Survive Training in Audio To Audio Generative Instrument Models
by: Tralie, Christopher J., et al.
Published: (2024)

VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio
by: Basha, Maris, et al.
Published: (2025)

Enhancing Speaker Verification with Whispered Speech via Post-Processing
by: Gołębiowska, Magdalena, et al.
Published: (2026)

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)

Distilled HuBERT for Mobile Speech Emotion Recognition: A Cross-Corpus Validation Study
by: Ismail, Saifelden M.
Published: (2025)

How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios
by: Venkatesh, Satvik, et al.
Published: (2025)

Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)

Vibration Sensing ‐ A Novel Approach to Detecting Activities of Daily Living
by: Weakley, Alyssa, et al.
Published: (2025)

Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet
by: Venkatesh, Satvik, et al.
Published: (2024)

Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)

Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)

ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)

Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)

Home Health System Deployment Experience for Geriatric Care Remote Monitoring
by: Lee, Dong Yoon, et al.
Published: (2026)

Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
by: Alamr, Meshal, et al.
Published: (2026)

SoundPlot: An Open-Source Framework for Birdsong Acoustic Analysis and Neural Synthesis with Interactive 3D Visualization
by: Mehdi, Naqcho Ali, et al.
Published: (2026)

Connected Speech-Based Cognitive Assessment in Chinese and English
by: Luz, Saturnino, et al.
Published: (2024)

Human Activity Recognition in an Open World
by: Prijatelj, Derek S., et al.
Published: (2022)

Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)

Cepstral Smoothing of Binary Masks for Convolutive Blind Separation of Speech Mixtures
by: Missaoui, Ibrahim, et al.
Published: (2026)

Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)

AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching
by: Zhang, Pengfei, et al.
Published: (2026)

An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)

Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
by: Sun, Ling, et al.
Published: (2025)

HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)

The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing
by: Tralie, Christopher, et al.
Published: (2024)

Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
by: Zaragozá, Lucía Gómez, et al.
Published: (2024)

Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
by: Cui, Hanfang, et al.
Published: (2025)

Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)

Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)

Understanding the Algorithm Behind Audio Key Detection
by: Silva, Henrique Perez G.
Published: (2025)

Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)

Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)

PI-TTA: Physics-Informed Source-Free Test-Time Adaptation for Robust Human Activity Recognition on Mobile Devices
by: Li, Changyu, et al.
Published: (2026)

Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
by: Dar, Daniyal Kabir, et al.
Published: (2025)

Leveraging large multimodal models for audio-video deepfake detection: a pilot study
by: Cao, Songjun, et al.
Published: (2026)

Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
by: Du, Wenzhang
Published: (2025)

STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
by: Opria, Joshua
Published: (2026)