:: Library Catalog

Bibliographic Details
Main Authors:	Yuan, Jiajun; Wang, Xiaochen; Xiao, Yuhang; Wu, Yulin; Hu, Chenhao; Lv, Xueyang
Format:	Preprint
Published:	2025
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2509.03913

Similar Items

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
by: Zhao, Shengkui, et al.
Published: (2025)

A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss
by: Tamiti, Tarikul Islam, et al.
Published: (2025)

DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
by: Guimarães, Heitor R., et al.
Published: (2025)

Transient Noise Removal via Diffusion-based Speech Inpainting
by: Moradi, Mordehay, et al.
Published: (2025)

LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space
by: Xin, Detai, et al.
Published: (2026)

ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps
by: Song, Yulin, et al.
Published: (2024)

Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
by: Yu, Chin-Yun, et al.
Published: (2022)

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
by: Ji, Shengpeng, et al.
Published: (2024)

Long-Context Speech Synthesis with Context-Aware Memory
by: Li, Zhipeng, et al.
Published: (2025)

Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)

High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)

VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)

Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)

SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)

High-Fidelity Speech Enhancement via Discrete Audio Tokens
by: Lanzendörfer, Luca A., et al.
Published: (2025)

Spectral Masking with Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement
by: Fiorio, Luan Vinícius, et al.
Published: (2024)

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)

CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching
by: Yuan, Jiajun, et al.
Published: (2025)

Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)

High-Fidelity Generative Audio Compression at 0.275kbps
by: Ma, Hao, et al.
Published: (2026)

Combined Generative and Predictive Modeling for Speech Super-resolution
by: Wang, Heming, et al.
Published: (2024)

High-Fidelity Neural Phonetic Posteriorgrams
by: Churchwell, Cameron, et al.
Published: (2024)

FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)

Efficient Long-Form Speech Recognition for General Speech In-Context Learning
by: Yen, Hao, et al.
Published: (2024)

Towards High-Fidelity and Controllable Bioacoustic Generation via Enhanced Diffusion Learning
by: Song, Tianyu, et al.
Published: (2025)

InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
by: Zeng, Chang, et al.
Published: (2024)

Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025)

Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning
by: Ma, Ding, et al.
Published: (2026)

Ambisonics Super-Resolution Using A Waveform-Domain Neural Network
by: Nawfal, Ismael, et al.
Published: (2025)

Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
by: Gao, Xiaoxue, et al.
Published: (2024)

High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Le Lan, Gael, et al.
Published: (2024)

CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)

DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)

Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)

Phase Repair for Time-Domain Convolutional Neural Networks in Music Super-Resolution
by: Zhang, Yenan, et al.
Published: (2023)

DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
by: Yang, Jinhyeok, et al.
Published: (2024)

DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
by: Wang, Yuanyuan, et al.
Published: (2025)

HILCodec: High-Fidelity and Lightweight Neural Audio Codec
by: Ahn, Sunghwan, et al.
Published: (2024)

Vision-Integrated High-Quality Neural Speech Coding
by: Guo, Yao, et al.
Published: (2025)

Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
by: Salhab, Mahmoud, et al.
Published: (2024)