| Main Authors: | Li, Haoyang, Hou, Nana, Hu, Yuchen, Yao, Jixun, Siniscalchi, Sabato Marco, Zhuang, Xuyi, Ye, Deheng, Yang, Wei, Chng, Eng Siong |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Online Access: | https://arxiv.org/abs/2507.09929 |
Similar Items
From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology
by: Li, Haoyang, et al.
Published: (2024)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)
GenTSE: Enhancing Target Speaker Extraction via a Coarse-to-Fine Generative Language Model
by: Li, Haoyang, et al.
Published: (2025)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)
Exploiting Consistency-Preserving Loss and Perceptual Contrast Stretching to Boost SSL-based Speech Enhancement
by: Khan, Muhammad Salman, et al.
Published: (2024)
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
by: Luong, Hieu-Thi, et al.
Published: (2024)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
by: Peng, Yizhou, et al.
Published: (2025)
Code-switching Speech Recognition Under the Lens: Model- and Data-Centric Perspectives
by: Liu, Hexin, et al.
Published: (2025)
Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)
NTU-NPU System for Voice Privacy 2024 Challenge
by: Kuzmin, Nikita, et al.
Published: (2024)
Speech Separation using Neural Audio Codecs with Embedding Loss
by: Yip, Jia Qi, et al.
Published: (2024)
An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement
by: Ku, Pin-Jui, et al.
Published: (2024)
An Investigation on Combining Geometry and Consistency Constraints into Phase Estimation for Speech Enhancement
by: Ho, Chun-Wei, et al.
Published: (2025)
Bayesian adaptive learning to latent variables via Variational Bayes and Maximum a Posteriori
by: Hu, Hu, et al.
Published: (2024)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)
Exploiting Foundation Models and Speech Enhancement for Parkinson's Disease Detection from Speech in Real-World Operative Conditions
by: La Quatra, Moreno, et al.
Published: (2024)
Exploring Generative Error Correction for Dysarthric Speech Recognition
by: La Quatra, Moreno, et al.
Published: (2025)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
by: Chen, Weiguang, et al.
Published: (2025)
NPU-NTU System for Voice Privacy 2024 Challenge
by: Yao, Jixun, et al.
Published: (2024)
A Bottom-up Framework with Language-universal Speech Attribute Modeling for Syllable-based ASR
by: Yen, Hao, et al.
Published: (2025)
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
by: Chen, Chen, et al.
Published: (2024)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
A Knowledge-Driven Approach to Target Speech Extraction in the Presence of Background Sound Effects for Cinematic Audio Source Separation (CASS)
by: Ho, Chun-wei, et al.
Published: (2026)
Bilingual Dual-Head Deep Model for Parkinson's Disease Detection from Speech
by: La Quatra, Moreno, et al.
Published: (2025)
Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
by: Luong, Hieu-Thi, et al.
Published: (2025)
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
by: Thakur, Nirmalya Mallick, et al.
Published: (2025)
Towards Audio Codec-based Speech Separation
by: Yip, Jia Qi, et al.
Published: (2024)
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
by: Hu, Yuchen, et al.
Published: (2024)
An Investigation of Incorporating Mamba for Speech Enhancement
by: Chao, Rong, et al.
Published: (2024)
Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
by: Li, Haoyang, et al.
Published: (2026)
Audio Large Language Models Can Be Descriptive Speech Quality Evaluators
by: Chen, Chen, et al.
Published: (2025)
Hallucination Benchmark for Speech Foundation Models
by: Koudounas, Alkis, et al.
Published: (2025)