:: Library Catalog

Bibliographic Details
Main Authors:	Teleki, Maria; Janjur, Sai; Liu, Haoran; Grabner, Oliver; Verma, Ketan; Docog, Thomas; Dong, Xiangjue; Shi, Lingfeng; Wang, Cong; Birkelbach, Stephanie; Kim, Jason; Zhang, Yin; Caverlee, James
Format:	Preprint
Published:	2025
Subjects:	Computation and Language; Artificial Intelligence; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.20319

Similar Items

Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
by: Teleki, Maria, et al.
Published: (2025)

Typical vs. Atypical Disfluency Classification: Introducing the IIITH-TISA Corpus and Temporal Context-Based Feature Representations
by: Kommagouni, Priyanka, et al.
Published: (2024)

The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)

DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)

Missingness-resilient Video-enhanced Multimodal Disfluency Detection
by: Mohapatra, Payal, et al.
Published: (2024)

Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)

Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
by: Altinok, Duygu
Published: (2025)

Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
by: Lin, Guan-Ting, et al.
Published: (2026)

Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
by: Constas, Pavlos, et al.
Published: (2023)

AURA Score: A Metric For Holistic Audio Question Answering Evaluation
by: Dixit, Satvik, et al.
Published: (2025)

SocialPulse: An Open-Source Subreddit Sensemaking Toolkit
by: Birkelbach, Stephanie, et al.
Published: (2026)

A Study of the Removability of Speaker-Adversarial Perturbations
by: Chen, Liping, et al.
Published: (2025)

Similarity Metrics For Late Reverberation
by: Santo, Gloria Dal, et al.
Published: (2024)

Layer-Aware Early Fusion of Acoustic and Linguistic Embeddings for Cognitive Status Classification
by: Novotny, Krystof, et al.
Published: (2026)

Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations
by: Wei, Sheng-Lun, et al.
Published: (2026)

The Extended SONICOM HRTF Dataset and Spatial Audio Metrics Toolbox
by: Poole, Katarina C., et al.
Published: (2025)

Toward Objective and Interpretable Prosody Evaluation in Text-to-Speech: A Linguistically Motivated Approach
by: Chan, Cedric, et al.
Published: (2025)

ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter
by: Wu, Yi-Chiao, et al.
Published: (2024)

Zimtohrli: An Efficient Psychoacoustic Audio Similarity Metric
by: Alakuijala, Jyrki, et al.
Published: (2025)

Trainable Adaptive Score Normalization for Automatic Speaker Verification
by: Choi, Jeong-Hwan, et al.
Published: (2025)

Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation
by: Tao, Fuxiang, et al.
Published: (2026)

Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak
by: Kakouros, Sofoklis, et al.
Published: (2026)

Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation
by: Wang, Zheng, et al.
Published: (2026)

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
by: Violeta, Lester Phillip, et al.
Published: (2023)

ALDAS: Audio-Linguistic Data Augmentation for Spoofed Audio Detection
by: Khanjani, Zahra, et al.
Published: (2024)

Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring
by: Webber, Jacob J, et al.
Published: (2025)

MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing
by: Wu, Shangda, et al.
Published: (2024)

Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks
by: Kankanala, Sai Samrat, et al.
Published: (2025)

Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training
by: Zhao, Haixin, et al.
Published: (2025)

Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)

Investigating the Potential of Multi-Stage Score Fusion in Spoofing-Aware Speaker Verification
by: Kurnaz, Oguzhan, et al.
Published: (2025)

Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
by: Bereuter, Paul A., et al.
Published: (2025)

DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
by: Li, Yinghao Aaron, et al.
Published: (2025)

Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception
by: Jaffe, Noah, et al.
Published: (2025)

Transient Noise Removal via Diffusion-based Speech Inpainting
by: Moradi, Mordehay, et al.
Published: (2025)

Assessing the Impact of Noise and Speech Enhancement on the Intelligibility of Speech Codecs
by: Behringer, Lyonel, et al.
Published: (2026)

Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems
by: Estevez, Mariel, et al.
Published: (2025)

Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)

Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis
by: Geng, Haopeng, et al.
Published: (2026)

Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)