| Main Authors: | Dai, Lingling, Li, Andong, Chi, Cheng, Liang, Yifan, Li, Xiaodong, Zheng, Chengshi |
|---|---|
| Format: | Preprint |
| Published: | 2026 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2601.13758 |
Similar Items
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing
by: Liang, Yifan, et al.
Published: (2025)
Neural Vocoders as Speech Enhancers
by: Li, Andong, et al.
Published: (2025)
Scalable Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2026)
Learning Neural Vocoder from Range-Null Space Decomposition
by: Li, Andong, et al.
Published: (2025)
BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective
by: Li, Andong, et al.
Published: (2025)
End-to-end multi-channel speaker extraction and binaural speech synthesis
by: Chi, Cheng, et al.
Published: (2024)
BSDB-Net: Band-Split Dual-Branch Network with Selective State Spaces Mechanism for Monaural Speech Enhancement
by: Fan, Cunhang, et al.
Published: (2024)
LTA-L2S: Lexical Tone-Aware Lip-to-Speech Synthesis for Mandarin with Cross-Lingual Transfer Learning
by: Yang, Kang, et al.
Published: (2025)
Rethinking the joint estimation of magnitude and phase for time-frequency domain neural vocoders
by: Dai, Lingling, et al.
Published: (2025)
SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models
by: Zhang, Yuanhe, et al.
Published: (2026)
SLD-L2S: Hierarchical Subspace Latent Diffusion for High-Fidelity Lip to Speech Synthesis
by: Liang, Yifan, et al.
Published: (2026)
Gradient weighting for speaker verification in extremely low Signal-to-Noise Ratio
by: Ma, Yi, et al.
Published: (2024)
Improved Remixing Process for Domain Adaptation-Based Speech Enhancement by Mitigating Data Imbalance in Signal-to-Noise Ratio
by: Li, Li, et al.
Published: (2024)
Deep Learning for Personalized Binaural Audio Reproduction
by: Lu, Xikun, et al.
Published: (2025)
Noise-to-mask Ratio Loss for Deep Neural Network based Audio Watermarking
by: Moritz, Martin, et al.
Published: (2024)
FoleySpace: Vision-Aligned Binaural Spatial Audio Generation
by: Zhao, Lei, et al.
Published: (2025)
Global Rotation Equivariant Phase Modeling for Speech Enhancement with Deep Magnitude-Phase Interaction
by: Wang, Chengzhong, et al.
Published: (2026)
SemanticAudio: Audio Generation and Editing in Semantic Space
by: Dai, Zheqi, et al.
Published: (2026)
Dasheng AudioGen: A Unified Model for Generating Coherent Audio Scenes from Text
by: Mei, Jiahao, et al.
Published: (2026)
Synaspot: A Lightweight, Streaming Multi-modal Framework for Keyword Spotting with Audio-Text Synergy
by: Li, Kewei, et al.
Published: (2025)
UniTok-Audio: A Unified Audio Generation Framework via Generative Modeling on Discrete Codec Tokens
by: Liu, Chengwei, et al.
Published: (2025)
High-Fidelity Generative Audio Compression at 0.275kbps
by: Ma, Hao, et al.
Published: (2026)
SMRU: Split-and-Merge Recurrent-based UNet for Acoustic Echo Cancellation and Noise Suppression
by: Sun, Zhihang, et al.
Published: (2024)
MeanAudio: Fast and Faithful Text-to-Audio Generation with Mean Flows
by: Li, Xiquan, et al.
Published: (2025)
AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
by: Wang, Junyou, et al.
Published: (2025)
AudioGS: Spectrogram-Based Audio Gaussian Splatting for Sound Field Reconstruction
by: Bi, Chunhao, et al.
Published: (2026)
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
by: Liu, Huadai, et al.
Published: (2024)
AudioRAG+: Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
by: Zhao, Junqi, et al.
Published: (2025)
Evaluating Objective Speech Quality Metrics for Neural Audio Codecs
by: Lanzendörfer, Luca A., et al.
Published: (2025)
A Survey of Large Audio Language Models: Generalization, Trustworthiness, and Outlook
by: Luo, Kaiwen, et al.
Published: (2026)
When Audio Generators Become Good Listeners: Generative Features for Understanding Tasks
by: Xie, Zeyu, et al.
Published: (2025)
Focus Then Listen: Exploring Plug-and-Play Audio Enhancer for Noise-Robust Large Audio Language Models
by: Yin, Han, et al.
Published: (2026)
Noisereduce: Domain General Noise Reduction for Time Series Signals
by: Sainburg, Tim, et al.
Published: (2024)
AudioFab: Building A General and Intelligent Audio Factory through Tool Learning
by: Zhu, Cheng, et al.
Published: (2025)
The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
by: Collins, Nick
Published: (2024)
FakeSound: Deepfake General Audio Detection
by: Xie, Zeyu, et al.
Published: (2024)
Pengi: An Audio Language Model for Audio Tasks
by: Deshmukh, Soham, et al.
Published: (2023)
ChronosAudio: A Comprehensive Long-Audio Benchmark for Evaluating Audio-Large Language Models
by: Luo, Kaiwen, et al.
Published: (2026)
Fish Audio S2 Technical Report
by: Liao, Shijia, et al.
Published: (2026)
SRC-gAudio: Sampling-Rate-Controlled Audio Generation
by: Li, Chenxing, et al.
Published: (2024)