| Main Authors: | Yuan, Jiajun, Wang, Xiaochen, Xiao, Yuhang, Wu, Yulin, Hu, Chenhao, Lv, Xueyang |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2509.03913 |
Similar Items
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution
by: Zhao, Shengkui, et al.
Published: (2025)
A High-Fidelity Speech Super Resolution Network using a Complex Global Attention Module with Spectro-Temporal Loss
by: Tamiti, Tarikul Islam, et al.
Published: (2025)
DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
by: Guimarães, Heitor R., et al.
Published: (2025)
Transient Noise Removal via Diffusion-based Speech Inpainting
by: Moradi, Mordehay, et al.
Published: (2025)
LongCat-AudioDiT: High-Fidelity Diffusion Text-to-Speech in the Waveform Latent Space
by: Xin, Detai, et al.
Published: (2026)
ConSinger: Efficient High-Fidelity Singing Voice Generation with Minimal Steps
by: Song, Yulin, et al.
Published: (2024)
Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution
by: Yu, Chin-Yun, et al.
Published: (2022)
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
by: Ji, Shengpeng, et al.
Published: (2024)
Long-Context Speech Synthesis with Context-Aware Memory
by: Li, Zhipeng, et al.
Published: (2025)
Seeing the Context: Rich Visual Context-Aware Speech Recognition via Multimodal Reasoning
by: Tian, Wenjie, et al.
Published: (2026)
High-Fidelity Simultaneous Speech-To-Speech Translation
by: Labiausse, Tom, et al.
Published: (2025)
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature
by: Du, Chenpeng, et al.
Published: (2022)
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
by: Bai, Ye, et al.
Published: (2024)
SuperCodec: A Neural Speech Codec with Selective Back-Projection Network
by: Zheng, Youqiang, et al.
Published: (2024)
High-Fidelity Speech Enhancement via Discrete Audio Tokens
by: Lanzendörfer, Luca A., et al.
Published: (2025)
Spectral Masking with Explicit Time-Context Windowing for Neural Network-Based Monaural Speech Enhancement
by: Fiorio, Luan Vinícius, et al.
Published: (2024)
Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)
CogSR: Semantic-Aware Speech Super-Resolution via Chain-of-Thought Guided Flow Matching
by: Yuan, Jiajun, et al.
Published: (2025)
Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution
by: Lee, Yongjoon, et al.
Published: (2024)
High-Fidelity Generative Audio Compression at 0.275kbps
by: Ma, Hao, et al.
Published: (2026)
Combined Generative and Predictive Modeling for Speech Super-resolution
by: Wang, Heming, et al.
Published: (2024)
High-Fidelity Neural Phonetic Posteriorgrams
by: Churchwell, Cameron, et al.
Published: (2024)
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis
by: Guo, Yinlin, et al.
Published: (2024)
Efficient Long-Form Speech Recognition for General Speech In-Context Learning
by: Yen, Hao, et al.
Published: (2024)
Towards High-Fidelity and Controllable Bioacoustic Generation via Enhanced Diffusion Learning
by: Song, Tianyu, et al.
Published: (2025)
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
by: Zeng, Chang, et al.
Published: (2024)
Phone-purity Guided Discrete Tokens for Dysarthric Speech Recognition
by: Wang, Huimeng, et al.
Published: (2025)
Advancing Electrolaryngeal Speech Enhancement Through Speech-Text Representation Learning
by: Ma, Ding, et al.
Published: (2026)
Ambisonics Super-Resolution Using A Waveform-Domain Neural Network
by: Nawfal, Ismael, et al.
Published: (2025)
Speech-Mamba: Long-Context Speech Recognition with Selective State Spaces Models
by: Gao, Xiaoxue, et al.
Published: (2024)
High Fidelity Text-Guided Music Editing via Single-Stage Flow Matching
by: Lan, Gael Le, et al.
Published: (2024)
CosyEdit: Unlocking End-to-End Speech Editing Capability from Zero-Shot Text-to-Speech Models
by: Chen, Junyang, et al.
Published: (2026)
DENSE: Dynamic Embedding Causal Target Speech Extraction
by: Wang, Yiwen, et al.
Published: (2024)
Speech-Forensics: Towards Comprehensive Synthetic Speech Dataset Establishment and Analysis
by: Ji, Zhoulin, et al.
Published: (2024)
Phase Repair for Time-Domain Convolutional Neural Networks in Music Super-Resolution
by: Zhang, Yenan, et al.
Published: (2023)
DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance
by: Yang, Jinhyeok, et al.
Published: (2024)
DualSpeechLM: Towards Unified Speech Understanding and Generation via Dual Speech Token Modeling with Large Language Models
by: Wang, Yuanyuan, et al.
Published: (2025)
HILCodec: High-Fidelity and Lightweight Neural Audio Codec
by: Ahn, Sunghwan, et al.
Published: (2024)
Vision-Integrated High-Quality Neural Speech Coding
by: Guo, Yao, et al.
Published: (2025)
Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks
by: Salhab, Mahmoud, et al.
Published: (2024)