:: Library Catalog

Cover Image

Saved in:

Bibliographic Details
Main Authors:	Salhab, Mahmoud, Harmanani, Haidar
Format:	Preprint
Published:	2024
Subjects:	Sound Artificial Intelligence Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2407.18571
Tags:	Add Tag No Tags, Be the first to tag this record!

Similar Items

CIS-BWE: Chaos-Informed Speech Bandwidth Extension
by: Tamiti, Tarikul Islam, et al.
Published: (2025)

EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
by: Wen, Bin, et al.
Published: (2025)

A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss
by: Tamiti, Tarikul Islam, et al.
Published: (2025)

ClapFM-EVC: High-Fidelity and Flexible Emotional Voice Conversion with Dual Control from Natural Language and Speech
by: Pan, Yu, et al.
Published: (2025)

Neural Speech Embeddings for Speech Synthesis Based on Deep Generative Networks
by: Lee, Seo-Hyun, et al.
Published: (2023)

MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention
by: Jiao, Xinxin, et al.
Published: (2024)

ASVspoof 5: Crowdsourced Speech Data, Deepfakes, and Adversarial Attacks at Scale
by: Wang, Xin, et al.
Published: (2024)

Incorporating Talker Identity Aids With Improving Speech Recognition in Adversarial Environments
by: Alavilli, Sagarika, et al.
Published: (2024)

Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
by: Lee, Sang-Hoon, et al.
Published: (2024)

Channel-Aware Domain-Adaptive Generative Adversarial Network for Robust Speech Recognition
by: Wang, Chien-Chun, et al.
Published: (2024)

Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
by: Dhar, Sandipan, et al.
Published: (2025)

Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion
by: Dhar, Sandipan, et al.
Published: (2025)

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
by: Zhao, Shengkui, et al.
Published: (2025)

Via Score to Performance: Efficient Human-Controllable Long Song Generation with Bar-Level Symbolic Notation
by: Wang, Tongxi, et al.
Published: (2025)

Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation
by: Zhang, Kang, et al.
Published: (2025)

SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline
by: Wang, Helin, et al.
Published: (2025)

AMNet: An Acoustic Model Network for Enhanced Mandarin Speech Synthesis
by: Cao, Yubing, et al.
Published: (2025)

Diffusion Timbre Transfer Via Mutual Information Guided Inpainting
by: Lee, Ching Ho, et al.
Published: (2026)

Incremental FastPitch: Chunk-based High Quality Text to Speech
by: Du, Muyang, et al.
Published: (2024)

Collaborative Watermarking for Adversarial Speech Synthesis
by: Juvela, Lauri, et al.
Published: (2023)

Analysis and Evaluation of Synthetic Data Generation in Speech Dysfluency Detection
by: Zhang, Jinming, et al.
Published: (2025)

Analyzable Chain-of-Musical-Thought Prompting for High-Fidelity Music Generation
by: Lam, Max W. Y., et al.
Published: (2025)

Temporal Information Reconstruction and Non-Aligned Residual in Spiking Neural Networks for Speech Classification
by: Zhang, Qi, et al.
Published: (2024)

Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
by: Chen, Sijing, et al.
Published: (2024)

VoiceBridge: General Speech Restoration with One-step Latent Bridge Models
by: Zhang, Chi, et al.
Published: (2025)

NV-Bench: Benchmark of Nonverbal Vocalization Synthesis for Expressive Text-to-Speech Generation
by: Ni, Qinke, et al.
Published: (2026)

AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
by: Sadok, Samir, et al.
Published: (2025)

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition
by: Kim, Jaeyoung, et al.
Published: (2024)

Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
by: Lin, Zijian, et al.
Published: (2025)

Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
by: Wang, Yongqi, et al.
Published: (2023)

GEC-RAG: Improving Generative Error Correction via Retrieval-Augmented Generation for Automatic Speech Recognition Systems
by: Robatian, Amin, et al.
Published: (2025)

Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles
by: Zhang, Tian-Hao, et al.
Published: (2025)

MNV-17: A High-Quality Performative Mandarin Dataset for Nonverbal Vocalization Recognition in Speech
by: Mai, Jialong, et al.
Published: (2025)

Density Adaptive Attention-based Speech Network: Enhancing Feature Understanding for Mental Health Disorders
by: Ioannides, Georgios, et al.
Published: (2024)

LLMs-Integrated Automatic Hate Speech Recognition Using Controllable Text Generation Models
by: Oshima, Ryutaro, et al.
Published: (2026)

Bridging ASR and LLMs for Dysarthric Speech Recognition: Benchmarking Self-Supervised and Generative Approaches
by: Aboeitta, Ahmed, et al.
Published: (2025)

Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)

Interpreting Pretrained Speech Models for Automatic Speech Assessment of Voice Disorders
by: Lau, Hok-Shing, et al.
Published: (2024)

CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech
by: Wang, Helin, et al.
Published: (2025)