| Main Author: | Rascon, Caleb |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2408.07234 |
Similar Items
Multi-agent Auditory Scene Analysis
by: Rascon, Caleb, et al.
Published: (2025)
Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
by: Roman, Adrian S., et al.
Published: (2025)
Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)
Enhancing Speech Quality through the Integration of BGRU and Transformer Architectures
by: Alghnam, Souliman, et al.
Published: (2025)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)
PlumberNet: Fixing interference leakage after GEV beamforming
by: Grondin, François, et al.
Published: (2023)
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
by: Wu, Yihan, et al.
Published: (2024)
Aligning Generative Speech Enhancement with Perceptual Feedback
by: Li, Haoyang, et al.
Published: (2025)
Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)
Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)
Speak the Art: A Direct Speech to Image Generation Framework
by: Saeed, Mariam, et al.
Published: (2025)
Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)
Incremental FastPitch: Chunk-based High Quality Text to Speech
by: Du, Muyang, et al.
Published: (2024)
Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
by: Ghosh, Suhita, et al.
Published: (2024)
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
by: Chen, Sijing, et al.
Published: (2024)
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)
Active Speech Enhancement: Active Speech Denoising Decliping and Deveraberation
by: Yaish, Ofir, et al.
Published: (2025)
Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment
by: Cai, Huanchen, et al.
Published: (2026)
EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events
by: Ren, Ying, et al.
Published: (2023)
DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
by: Li, Yinghao Aaron, et al.
Published: (2024)
Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)
MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
by: Bae, Jae-Sung, et al.
Published: (2025)
Text-To-Speech Synthesis In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
by: Guo, Yiwei, et al.
Published: (2024)
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)
Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
by: Anuprabha, M, et al.
Published: (2025)
BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
by: Liang, Susan, et al.
Published: (2025)
Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
by: Lo, Tien-Hong, et al.
Published: (2024)
Intelligibility of Text-to-Speech Systems for Mathematical Expressions
by: Roychowdhury, Sujoy, et al.
Published: (2025)
Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu, et al.
Published: (2025)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis
by: Hu, Xintong, et al.
Published: (2025)
Enhancing Speech Emotion Recognition through Segmental Average Pooling of Self-Supervised Learning Features
by: Hyeon, Jonghwan, et al.
Published: (2024)
Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis
by: Mayer, Paul, et al.
Published: (2025)
OpenSTBench: Beyond Semantic Evaluation for Speech Translation
by: An, Yanjie, et al.
Published: (2026)
A Survey of Deep Learning for Complex Speech Spectrograms
by: Xie, Yuying, et al.
Published: (2025)
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)
Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM
by: Prakash, Jeena, et al.
Published: (2025)