:: Library Catalog

Bibliographic Details
Main Authors:	Hu, Chuanbo; Thrasher, Jacob; Li, Wenqi; Ruan, Mindi; Yu, Xiangxu; Paul, Lynn K; Wang, Shuo; Li, Xin
Format:	Preprint
Published:	2024
Subjects:	Sound; Artificial Intelligence; Computation and Language; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2405.05126

Similar Items

Rate-Aware Learned Speech Compression
by: Xu, Jun, et al.
Published: (2025)

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
by: Qi, Xin, et al.
Published: (2024)

Head Orientation Estimation with Distributed Microphones Using Speech Radiation Patterns
by: Müller, Kaspar, et al.
Published: (2023)

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
by: Li, Hanzhao, et al.
Published: (2025)

Parameter Selection for Analyzing Conversations with Autism Spectrum Disorder
by: Chowdhury, Tahiya, et al.
Published: (2024)

EME-TTS: Unlocking the Emphasis and Emotion Link in Speech Synthesis
by: Li, Haoxun, et al.
Published: (2025)

Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection
by: Amiri, Mahdi, et al.
Published: (2025)

Enhancing Child Vocalization Classification with Phonetically-Tuned Embeddings for Assisting Autism Diagnosis
by: Li, Jialu, et al.
Published: (2023)

Exploring the Capability of Mamba in Speech Applications
by: Miyazaki, Koichi, et al.
Published: (2024)

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models
by: Chen, Li-Wei, et al.
Published: (2024)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning
by: Ma, Ding, et al.
Published: (2026)

SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
by: Luo, Longjie, et al.
Published: (2025)

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
by: Hu, Jiliang, et al.
Published: (2025)

Unifying Listener Scoring Scales: Comparison Learning Framework for Speech Quality Assessment and Continuous Speech Emotion Recognition
by: Hu, Cheng-Hung, et al.
Published: (2025)

Towards Machine Unlearning for Paralinguistic Speech Processing
by: Phukan, Orchid Chetia, et al.
Published: (2025)

Multimodal Assessment of Speech Impairment in ALS Using Audio-Visual and Machine Learning Approaches
by: Pierotti, Francesco, et al.
Published: (2025)

A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data
by: Tran, Minh, et al.
Published: (2025)

What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution
by: Ho, Kuan-Hsun, et al.
Published: (2024)

Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)

Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation
by: Wang, Huimeng, et al.
Published: (2024)

Long-Context Speech Synthesis with Context-Aware Memory
by: Li, Zhipeng, et al.
Published: (2025)

Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

A Fast and Lightweight Model for Causal Audio-Visual Speech Separation
by: Sang, Wendi, et al.
Published: (2025)

Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis
by: Hu, Yifan, et al.
Published: (2025)

A Scalable Pipeline for Enabling Non-Verbal Speech Generation and Understanding
by: Ye, Runchuan, et al.
Published: (2025)

Speech Emotion Recognition with ASR Integration
by: Li, Yuanchao
Published: (2026)

Exploring Efficient Directional and Distance Cues for Regional Speech Separation
by: Jiang, Yiheng, et al.
Published: (2025)

Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)

Refining Self-Supervised Learnt Speech Representation using Brain Activations
by: Li, Hengyu, et al.
Published: (2024)

Bayesian Speech Synthesizers Can Learn from Multiple Teachers
by: Zhang, Ziyang, et al.
Published: (2025)

Contrastive Learning With Audio Discrimination For Customizable Keyword Spotting In Continuous Speech
by: Xi, Yu, et al.
Published: (2024)

Advances in Speech Separation: Techniques, Challenges, and Future Trends
by: Li, Kai, et al.
Published: (2025)

Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a $50K Budget
by: Li, Xin, et al.
Published: (2025)

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
by: Hsieh, Tsun-An, et al.
Published: (2024)

MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis
by: Yang, Qian, et al.
Published: (2024)

Analysis of Self-Supervised Speech Models on Children's Speech and Infant Vocalizations
by: Li, Jialu, et al.
Published: (2024)

DrawSpeech: Expressive Speech Synthesis Using Prosodic Sketches as Control Conditions
by: Chen, Weidong, et al.
Published: (2025)

GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)

EmoFormer: A Text-Independent Speech Emotion Recognition using a Hybrid Transformer-CNN model
by: Hasan, Rashedul, et al.
Published: (2025)