| Main Authors: | Liu, Hexin, Zhang, Haoyang, Zhang, Qiquan, Zhang, Xiangyu, Shi, Dongyuan, Chng, Eng Siong, Li, Haizhou |
|---|---|
| Format: | Preprint |
| Published: | 2025 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2509.24310 |
Similar Items
Aligning Speech to Languages to Enhance Code-switching Speech Recognition
by: Liu, Hexin, et al.
Published: (2024)
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model
by: Zhang, Xiangyu, et al.
Published: (2024)
Bi-directional Context-Enhanced Speech Large Language Models for Multilingual Conversational ASR
by: Peng, Yizhou, et al.
Published: (2025)
Improving Code-Switching Speech Recognition with TTS Data Augmentation
by: Yeo, Yue Heng, et al.
Published: (2026)
Long-Context Modeling Networks for Monaural Speech Enhancement: A Comparative Study
by: Zhang, Qiquan, et al.
Published: (2025)
SpeechT-RAG: Reliable Depression Detection in LLMs with Retrieval-Augmented Generation Using Speech Timing Information
by: Zhang, Xiangyu, et al.
Published: (2025)
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
by: Yao, Jixun, et al.
Published: (2025)
EASY: Emotion-aware Speaker Anonymization via Factorized Distillation
by: Yao, Jixun, et al.
Published: (2025)
Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English
by: Zhang, Haoyang, et al.
Published: (2025)
Selective State Space Model for Monaural Speech Enhancement
by: Chen, Moran, et al.
Published: (2024)
Mamba in Speech: Towards an Alternative to Self-Attention
by: Zhang, Xiangyu, et al.
Published: (2024)
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation
by: Luong, Hieu-Thi, et al.
Published: (2024)
DuplexSLA: A Full-Duplex Spoken Language Model with Synchronized Speech, Language, and Action
by: Zhang, Haoyang, et al.
Published: (2026)
UniArray: Unified Spectral-Spatial Modeling for Array-Geometry-Agnostic Speech Separation
by: Chen, Weiguang, et al.
Published: (2025)
NTU Speechlab LLM-Based Multilingual ASR System for Interspeech MLC-SLM Challenge 2025
by: Peng, Yizhou, et al.
Published: (2025)
Speech Enhancement Using Continuous Embeddings of Neural Audio Codec
by: Li, Haoyang, et al.
Published: (2025)
When LLMs Meets Acoustic Landmarks: An Efficient Approach to Integrate Speech into Large Language Models for Depression Detection
by: Zhang, Xiangyu, et al.
Published: (2024)
Speech Separation using Neural Audio Codecs with Embedding Loss
by: Yip, Jia Qi, et al.
Published: (2024)
Noise-Aware Speech Separation with Contrastive Learning
by: Zhang, Zizheng, et al.
Published: (2023)
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization
by: Tao, Ruijie, et al.
Published: (2024)
Exploring Length Generalization For Transformer-based Speech Enhancement
by: Zhang, Qiquan, et al.
Published: (2025)
An Exploration of Length Generalization in Transformer-Based Speech Enhancement
by: Zhang, Qiquan, et al.
Published: (2024)
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
by: Hu, Yuchen, et al.
Published: (2024)
Zero-shot Context Biasing with Trie-based Decoding using Synthetic Multi-Pronunciation
by: Liu, Changsong, et al.
Published: (2025)
Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)
Dataset-Distillation Generative Model for Speech Emotion Recognition
by: Ritter-Gutierrez, Fabian, et al.
Published: (2024)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech
by: Li, Yuxin, et al.
Published: (2025)
Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
by: Li, Haoyang, et al.
Published: (2026)
From KAN to GR-KAN: Advancing Speech Enhancement with KAN-Based Methodology
by: Li, Haoyang, et al.
Published: (2024)
Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs
by: Yuhang, Yang, et al.
Published: (2024)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)
Step-Audio-R1.5 Technical Report
by: Zhang, Yuxin, et al.
Published: (2026)
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
by: Chen, Chen, et al.
Published: (2024)
Robust Localization of Partially Fake Speech: Metrics and Out-of-Domain Evaluation
by: Luong, Hieu-Thi, et al.
Published: (2025)
Analysis of Speaker Verification Performance Trade-offs with Neural Audio Codec Transmission
by: Thakur, Nirmalya Mallick, et al.
Published: (2025)
Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models
by: Kuzmin, Nikita, et al.
Published: (2026)
SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
by: Qian, Xinyuan, et al.
Published: (2024)
Multi-band Frequency Reconstruction for Neural Psychoacoustic Coding
by: Ng, Dianwen, et al.
Published: (2025)
StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation
by: Kuzmin, Nikita, et al.
Published: (2026)