:: Library Catalog

Bibliographic Details
Main Author:	Rascon, Caleb
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2408.07234

Similar Items

Multi-agent Auditory Scene Analysis
by: Rascon, Caleb, et al.
Published: (2025)

Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach
by: Roman, Adrian S., et al.
Published: (2025)

Thinking in Directivity: Speech Large Language Model for Multi-Talker Directional Speech Recognition
by: Xie, Jiamin, et al.
Published: (2025)

Enhancing Speech Quality through the Integration of BGRU and Transformer Architectures
by: Alghnam, Souliman, et al.
Published: (2025)

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)

PlumberNet: Fixing interference leakage after GEV beamforming
by: Grondin, François, et al.
Published: (2023)

Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
by: Wu, Yihan, et al.
Published: (2024)

Aligning Generative Speech Enhancement with Perceptual Feedback
by: Li, Haoyang, et al.
Published: (2025)

Adaptive Knowledge Distillation for Device-Directed Speech Detection
by: Chi, Hyung Gun, et al.
Published: (2025)

Fine-Tuning Text-to-Speech Diffusion Models Using Reinforcement Learning with Human Feedback
by: Chen, Jingyi, et al.
Published: (2025)

Speak the Art: A Direct Speech to Image Generation Framework
by: Saeed, Mariam, et al.
Published: (2025)

Speech Recognition on TV Series with Video-guided Post-ASR Correction
by: Yang, Haoyuan, et al.
Published: (2025)

Incremental FastPitch: Chunk-based High Quality Text to Speech
by: Du, Muyang, et al.
Published: (2024)

Improving Voice Quality in Speech Anonymization With Just Perception-Informed Losses
by: Ghosh, Suhita, et al.
Published: (2024)

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
by: Chen, Sijing, et al.
Published: (2024)

CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR
by: Shao, Nian, et al.
Published: (2025)

Active Speech Enhancement: Active Speech Denoising Decliping and Deveraberation
by: Yaish, Ofir, et al.
Published: (2025)

Voice Mapping of Text-to-Speech Systems: A Metric-Based Approach for Voice Quality Assessment
by: Cai, Huanchen, et al.
Published: (2026)

EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events
by: Ren, Ying, et al.
Published: (2023)

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
by: Li, Yingahao Aaron, et al.
Published: (2024)

Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)

MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)

Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement
by: Bae, Jae-Sung, et al.
Published: (2025)

Text-To-Speech Synthesis In The Wild
by: Jung, Jee-weon, et al.
Published: (2024)

The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
by: Guo, Yiwei, et al.
Published: (2024)

GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)

Fairness in Dysarthric Speech Synthesis: Understanding Intrinsic Bias in Dysarthric Speech Cloning using F5-TTS
by: Anuprabha, M, et al.
Published: (2025)

BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models
by: Liang, Susan, et al.
Published: (2025)

Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment
by: Lo, Tien-Hong, et al.
Published: (2024)

Intelligibility of Text-to-Speech Systems for Mathematical Expressions
by: Roychowdhury, Sujoy, et al.
Published: (2025)

Spotlight-TTS: Spotlighting the Style via Voiced-Aware Style Extraction and Style Direction Adjustment for Expressive Text-to-Speech
by: Kim, Nam-Gyu, et al.
Published: (2025)

Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis
by: Hu, Xintong, et al.
Published: (2025)

Enhancing Speech Emotion Recognition through Segmental Average Pooling of Self-Supervised Learning Features
by: Hyeon, Jonghwan, et al.
Published: (2024)

Investigating Stochastic Methods for Prosody Modeling in Speech Synthesis
by: Mayer, Paul, et al.
Published: (2025)

OpenSTBench: Beyond Semantic Evaluation for Speech Translation
by: An, Yanjie, et al.
Published: (2026)

A Survey of Deep Learning for Complex Speech Spectrograms
by: Xie, Yuying, et al.
Published: (2025)

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
by: Liu, Rui, et al.
Published: (2025)

Better Pseudo-labeling with Multi-ASR Fusion and Error Correction by SpeechLLM
by: Prakash, Jeena, et al.
Published: (2025)