| Main Authors: | Teleki, Maria, Janjur, Sai, Liu, Haoran, Grabner, Oliver, Verma, Ketan, Docog, Thomas, Dong, Xiangjue, Shi, Lingfeng, Wang, Cong, Birkelbach, Stephanie, Kim, Jason, Zhang, Yin, Caverlee, James |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.20319 |
Similar Items
Conversational Speech Reveals Structural Robustness Failures in SpeechLLM Backbones
by: Teleki, Maria, et al.
Published: (2025)
Typical vs. Atypical Disfluency Classification: Introducing the IIITH-TISA Corpus and Temporal Context-Based Feature Representations
by: Kommagouni, Priyanka, et al.
Published: (2024)
The Voice Behind the Words: Quantifying Intersectional Bias in SpeechLLMs
by: Satish, Shree Harsha Bokkahalli, et al.
Published: (2026)
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage
by: Wang, Kyra, et al.
Published: (2024)
Missingness-resilient Video-enhanced Multimodal Disfluency Detection
by: Mohapatra, Payal, et al.
Published: (2024)
Automatic Speech Recognition for Non-Native English: Accuracy and Disfluency Handling
by: McGuire, Michael
Published: (2025)
Smooth Operators: LLMs Translating Imperfect Hints into Disfluency-Rich Transcripts
by: Altinok, Duygu
Published: (2025)
Full-Duplex-Bench-v3: Benchmarking Tool Use for Full-Duplex Voice Agents Under Real-World Disfluency
by: Lin, Guan-Ting, et al.
Published: (2026)
Toward a Reinforcement-Learning-Based System for Adjusting Medication to Minimize Speech Disfluency
by: Constas, Pavlos, et al.
Published: (2023)
AURA Score: A Metric For Holistic Audio Question Answering Evaluation
by: Dixit, Satvik, et al.
Published: (2025)
SocialPulse: An Open-Source Subreddit Sensemaking Toolkit
by: Birkelbach, Stephanie, et al.
Published: (2026)
A Study of the Removability of Speaker-Adversarial Perturbations
by: Chen, Liping, et al.
Published: (2025)
Similarity Metrics For Late Reverberation
by: Santo, Gloria Dal, et al.
Published: (2024)
Layer-Aware Early Fusion of Acoustic and Linguistic Embeddings for Cognitive Status Classification
by: Novotny, Krystof, et al.
Published: (2026)
Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations
by: Wei, Sheng-Lun, et al.
Published: (2026)
The Extended SONICOM HRTF Dataset and Spatial Audio Metrics Toolbox
by: Poole, Katarina C., et al.
Published: (2025)
Toward Objective and Interpretable Prosody Evaluation in Text-to-Speech: A Linguistically Motivated Approach
by: Chan, Cedric, et al.
Published: (2025)
ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter
by: Wu, Yi-Chiao, et al.
Published: (2024)
Zimtohrli: An Efficient Psychoacoustic Audio Similarity Metric
by: Alakuijala, Jyrki, et al.
Published: (2025)
Trainable Adaptive Score Normalization for Automatic Speaker Verification
by: Choi, Jeong-Hwan, et al.
Published: (2025)
Validating Computational Markers of Depressive Behavior: Cross-Linguistic Speech-Based Depression Detection with Neurophysiological Validation
by: Tao, Fuxiang, et al.
Published: (2026)
Classification of Autistic and Non-Autistic Children's Speech: A Cross-Linguistic Study in Finnish, French, and Slovak
by: Kakouros, Sofoklis, et al.
Published: (2026)
Reducing Linguistic Hallucination in LM-Based Speech Enhancement via Noise-Invariant Acoustic-Semantic Distillation
by: Wang, Zheng, et al.
Published: (2026)
Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
by: Violeta, Lester Phillip, et al.
Published: (2023)
ALDAS: Audio-Linguistic Data Augmentation for Spoofed Audio Detection
by: Khanjani, Zahra, et al.
Published: (2024)
Comparator Loss: An Ordinal Contrastive Loss to Derive a Severity Score for Speech-based Health Monitoring
by: Webber, Jacob J, et al.
Published: (2025)
MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing
by: Wu, Shangda, et al.
Published: (2024)
Benchmarking Humans and Machines on Complex Multilingual Speech Understanding Tasks
by: Kankanala, Sai Samrat, et al.
Published: (2025)
Dynamically Slimmable Speech Enhancement Network with Metric-Guided Training
by: Zhao, Haixin, et al.
Published: (2025)
Linguistic Knowledge Transfer Learning for Speech Enhancement
by: Hung, Kuo-Hsuan, et al.
Published: (2025)
Investigating the Potential of Multi-Stage Score Fusion in Spoofing-Aware Speaker Verification
by: Kurnaz, Oguzhan, et al.
Published: (2025)
Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models
by: Bereuter, Paul A., et al.
Published: (2025)
DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis
by: Li, Yinghao Aaron, et al.
Published: (2025)
Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception
by: Jaffe, Noah, et al.
Published: (2025)
Transient Noise Removal via Diffusion-based Speech Inpainting
by: Moradi, Mordehay, et al.
Published: (2025)
Assessing the Impact of Noise and Speech Enhancement on the Intelligibility of Speech Codecs
by: Behringer, Lyonel, et al.
Published: (2026)
Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems
by: Estevez, Mariel, et al.
Published: (2025)
Measuring Prosody Diversity in Zero-Shot TTS: A New Metric, Benchmark, and Exploration
by: Yang, Yifan, et al.
Published: (2025)
Beyond Acoustic Sparsity and Linguistic Bias: A Prompt-Free Paradigm for Mispronunciation Detection and Diagnosis
by: Geng, Haopeng, et al.
Published: (2026)
Audio-Based Linguistic Feature Extraction for Enhancing Multi-lingual and Low-Resource Text-to-Speech
by: Kim, Youngjae, et al.
Published: (2024)