| Main Authors: | Lee, Dong Yoon, Weakley, Alyssa, Wei, Hui, Brown, Blake, Carrion, Keyana, Pan, Shijia |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2508.21167 |
Similar Items
Measuring Robustness of Speech Recognition from MEG Signals Under Distribution Shift
by: Chien, Sheng-You, et al.
Published: (2026)
EMOVOME: A Dataset for Emotion Recognition in Spontaneous Real-Life Speech
by: Gómez-Zaragozá, Lucía, et al.
Published: (2024)
Hidden Echoes Survive Training in Audio To Audio Generative Instrument Models
by: Tralie, Christopher J., et al.
Published: (2024)
VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio
by: Basha, Maris, et al.
Published: (2025)
Enhancing Speaker Verification with Whispered Speech via Post-Processing
by: Gołębiowska, Magdalena, et al.
Published: (2026)
Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device
by: Kozak, Nazar
Published: (2026)
Distilled HuBERT for Mobile Speech Emotion Recognition: A Cross-Corpus Validation Study
by: Ismail, Saifelden M.
Published: (2025)
How much to Dereverberate? Low-Latency Single-Channel Speech Enhancement in Distant Microphone Scenarios
by: Venkatesh, Satvik, et al.
Published: (2025)
Audio-based Kinship Verification Using Age Domain Conversion
by: Sun, Qiyang, et al.
Published: (2024)
Vibration Sensing ‐ A Novel Approach to Detecting Activities of Daily Living
by: Weakley, Alyssa, et al.
Published: (2025)
Real-time Low-latency Music Source Separation using Hybrid Spectrogram-TasNet
by: Venkatesh, Satvik, et al.
Published: (2024)
Fine-tuning Pre-trained Audio Models for COVID-19 Detection: A Technical Report
by: de Brito, Daniel Oliveira, et al.
Published: (2025)
Delayed Fusion: Integrating Large Language Models into First-Pass Decoding in End-to-end Speech Recognition
by: Hori, Takaaki, et al.
Published: (2025)
ParaNoise-SV: Integrated Approach for Noise-Robust Speaker Verification with Parallel Joint Learning of Speech Enhancement and Noise Extraction
by: Kim, Minu, et al.
Published: (2025)
Machine Learning Framework for Audio-Based Content Evaluation using MFCC, Chroma, Spectral Contrast, and Temporal Feature Engineering
by: Aristorenas, Aris J.
Published: (2024)
Home Health System Deployment Experience for Geriatric Care Remote Monitoring
by: Lee, Dong Yoon, et al.
Published: (2026)
Thaka at KSAA-2026 Task 2: Regularized Fine-Tuning for Arabic Speech Diacritization
by: Alamr, Meshal, et al.
Published: (2026)
SoundPlot: An Open-Source Framework for Birdsong Acoustic Analysis and Neural Synthesis with Interactive 3D Visualization
by: Mehdi, Naqcho Ali, et al.
Published: (2026)
Connected Speech-Based Cognitive Assessment in Chinese and English
by: Luz, Saturnino, et al.
Published: (2024)
Human Activity Recognition in an Open World
by: Prijatelj, Derek S., et al.
Published: (2022)
Quantum-Enhanced Analysis and Grading of Vocal Performance
by: Agarwal, Rohan
Published: (2025)
Cepstral Smoothing of Binary Masks for Convolutive Blind Separation of Speech Mixtures
by: Missaoui, Ibrahim, et al.
Published: (2026)
Modeling L1 Influence on L2 Pronunciation: An MFCC-Based Framework for Explainable Machine Learning and Pedagogical Feedback
by: Jahanbin, Peyman
Published: (2025)
AG-REPA: Causal Layer Selection for Representation Alignment in Audio Flow Matching
by: Zhang, Pengfei, et al.
Published: (2026)
An End-to-End Approach for Korean Wakeword Systems with Speaker Authentication
by: Seo, Geonwoo
Published: (2025)
Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
by: Sun, Ling, et al.
Published: (2025)
HELIX: Scaling Raw Audio Understanding with Hybrid Mamba-Attention Beyond the Quadratic Limit
by: Khushiyant, et al.
Published: (2026)
The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing
by: Tralie, Christopher, et al.
Published: (2024)
Emotional Voice Messages (EMOVOME) database: emotion recognition in spontaneous voice messages
by: Zaragozá, Lucía Gómez, et al.
Published: (2024)
Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
by: Cui, Hanfang, et al.
Published: (2025)
Joint Estimation of Piano Dynamics and Metrical Structure with a Multi-task Multi-Scale Network
by: He, Zhanhong, et al.
Published: (2025)
Splitformer: An improved early-exit architecture for automatic speech recognition on edge devices
by: Lasbordes, Maxence, et al.
Published: (2025)
Understanding the Algorithm Behind Audio Key Detection
by: Silva, Henrique Perez G.
Published: (2025)
Passive Underwater Acoustic Signal Separation based on Feature Decoupling Dual-path Network
by: Liu, Yucheng, et al.
Published: (2025)
Improving Cross-Lingual Phonetic Representation of Low-Resource Languages Through Language Similarity Analysis
by: Kim, Minu, et al.
Published: (2025)
PI-TTA: Physics-Informed Source-Free Test-Time Adaptation for Robust Human Activity Recognition on Mobile Devices
by: Li, Changyu, et al.
Published: (2026)
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
by: Dar, Daniyal Kabir, et al.
Published: (2025)
Leveraging large multimodal models for audio-video deepfake detection: a pilot study
by: Cao, Songjun, et al.
Published: (2026)
Contract-Driven QoE Auditing for Speech and Singing Services: From MOS Regression to Service Graphs
by: Du, Wenzhang
Published: (2025)
STRUM: A Spectral Transcription and Rhythm Understanding Model for End-to-End Generation of Playable Rhythm-Game Charts
by: Opria, Joshua
Published: (2026)