| Main Authors: | Salhab, Mahmoud; Harmanani, Haidar |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2407.18571 |
Similar Items
CIS-BWE: Chaos-Informed Speech Bandwidth Extension
by: Tamiti, Tarikul Islam, et al.
Published: (2025)
EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
by: Wen, Bin, et al.
Published: (2025)
A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss
by: Tamiti, Tarikul Islam, et al.
Published: (2025)
ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech
by: Pan, Yu, et al.
Published: (2025)
Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)
MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)
ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
by: Wang, Xin, et al.
Published: (2024)
Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
by: Alavilli, Sagarika, et al.
Published: (2024)
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
by: Lee, Sang-Hoon, et al.
Published: (2024)
Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
by: Wang, Chien-Chun, et al.
Published: (2024)
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
by: Dhar, Sandipan, et al.
Published: (2025)
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion
by: Dhar, Sandipan, et al.
Published: (2025)
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
by: Zhao, Shengkui, et al.
Published: (2025)
Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
by: Wang, Tongxi, et al.
Published: (2025)
Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
by: Zhang, Kang, et al.
Published: (2025)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)
AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
by: Cao, Yubing, et al.
Published: (2025)
Diffusion Timbre Transfer Via Mutual Information Guided Inpainting
by: Lee, Ching Ho, et al.
Published: (2026)
Incremental FastPitch: Chunk-based High Quality Text to Speech
by: Du, Muyang, et al.
Published: (2024)
Collaborative Watermarking for Adversarial Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2023)
Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)
Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
by: Lam, Max W. Y., et al.
Published: (2025)
Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
by: Zhang, Qi, et al.
Published: (2024)
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
by: Chen, Sijing, et al.
Published: (2024)
VoiceBridge: General Speech Restoration with One-step Latent Bridge Models
by: Zhang, Chi, et al.
Published: (2025)
NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation
by: Ni, Qinke, et al.
Published: (2026)
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
by: Sadok, Samir, et al.
Published: (2025)
Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)
GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)
Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)
MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)
Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders
by: Ioannides, Georgios, et al.
Published: (2024)
LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)
Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
by: Aboeitta, Ahmed, et al.
Published: (2025)
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)
Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
by: Wang, Helin, et al.
Published: (2025)