| Main Authors: | Zhang, Eric; Wei, Li; Chen, Sarah; Wang, Michael |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.14304 |
Similar Items
Self-supervised Speech Models for Word-Level Stuttered Speech Detection
by: Shih, Yi-Jen, et al.
Published: (2024)
StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation
by: Ghosh, Suhita, et al.
Published: (2025)
Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection
by: Zhou, Xuanru, et al.
Published: (2024)
A Two-Stage Hierarchical Deep Filtering Framework for Real-Time Speech Enhancement
by: Lu, Shenghui, et al.
Published: (2025)
A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)
VoxPrivacy: A Benchmark for Evaluating Interactional Privacy of Speech Language Models
by: Wang, Yuxiang, et al.
Published: (2026)
RealClass: A Framework for Classroom Speech Simulation with Public Datasets and Game Engines
by: Attia, Ahmed Adel, et al.
Published: (2025)
Speech-based Clinical Depression Screening: An Empirical Study
by: Chen, Yangbin, et al.
Published: (2024)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)
Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
by: Anastassiou, Philip, et al.
Published: (2024)
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
by: Wang, Helin, et al.
Published: (2025)
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)
MRI2Speech: Speech Synthesis from Articulatory Movements Recorded by Real-time MRI
by: Shah, Neil, et al.
Published: (2024)
Multi-Loss Learning for Speech Emotion Recognition with Energy-Adaptive Mixup and Frame-Level Attention
by: Wang, Cong, et al.
Published: (2025)
Speech-DRAME: A Framework for Human-Aligned Benchmarks in Speech Role-Play
by: Shi, Jiatong, et al.
Published: (2025)
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
by: Wang, He, et al.
Published: (2024)
Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens
by: Wang, Xinsheng, et al.
Published: (2025)
Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge
by: Xue, Hongfei, et al.
Published: (2024)
The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023
by: Wang, He, et al.
Published: (2024)
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)
MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
by: Songyi, Li, et al.
Published: (2026)
PodEval: A Multimodal Evaluation Framework for Podcast Audio Generation
by: Xiao, Yujia, et al.
Published: (2025)
LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec
by: Guo, Yiwei, et al.
Published: (2024)
Construction and Evaluation of Mandarin Multimodal Emotional Speech Database
by: Ting, Zhu, et al.
Published: (2024)
Are you sure? Analysing Uncertainty Quantification Approaches for Real-world Speech Emotion Recognition
by: Schrüfer, Oliver, et al.
Published: (2024)
Defense Against Synthetic Speech: Real-Time Detection of RVC Voice Conversion Attacks
by: Chinchmalatpure, Prajwal, et al.
Published: (2025)
Active Learning with Task Adaptation Pre-training for Speech Emotion Recognition
by: Li, Dongyuan, et al.
Published: (2024)
A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation
by: Ng, Si-Ioi, et al.
Published: (2024)
Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
by: Nayeem, Md., et al.
Published: (2025)
Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate
by: Zhang, Hanglei, et al.
Published: (2025)
Harf-Speech: A Clinically Aligned Framework for Arabic Phoneme-Level Speech Assessment
by: Azad, Asif, et al.
Published: (2026)
JASTIN: Aligning LLMs for Zero-Shot Audio and Speech Evaluation via Natural Language Instructions
by: Zhang, Leying, et al.
Published: (2026)
Perceiver-Prompt: Flexible Speaker Adaptation in Whisper for Chinese Disordered Speech Recognition
by: Jiang, Yicong, et al.
Published: (2024)
Unveiling the Best Practices for Applying Speech Foundation Models to Speech Intelligibility Prediction for Hearing-Impaired People
by: Zhou, Haoshuai, et al.
Published: (2025)
IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
by: Deng, Wei, et al.
Published: (2025)
Effective and Efficient Mixed Precision Quantization of Speech Foundation Models
by: Xu, Haoning, et al.
Published: (2025)
Self-Supervised Models for Phoneme Recognition: Applications in Children's Speech for Reading Learning
by: Medin, Lucas Block, et al.
Published: (2025)
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training
by: Du, Zhihao, et al.
Published: (2025)