:: Library Catalog

Bibliographic Details
Main Authors:	Chen, Jun; Hu, Shichao; Lin, Jiuxin; Li, Wenjie; Zhang, Zihan; Li, Xingchen; Liu, JinJiang; Xiao, Longshuai; Weng, Chao; Xie, Lei; Wu, Zhiyong
Format:	Preprint
Published:	2025
Subjects:	Sound; Artificial Intelligence
Online Access:	https://arxiv.org/abs/2510.10687

Similar Items

MeanFlowSE: One-Step Generative Speech Enhancement via MeanFlow
by: Zhu, Yike, et al.
Published: (2025)

DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation
by: Wang, Ziqian, et al.
Published: (2024)

LLaSE-G1: Incentivizing Generalization Capability for LLaMA-based Speech Enhancement
by: Kang, Boyi, et al.
Published: (2025)

CabinSep: IR-Augmented Mask-Based MVDR for Real-Time In-Car Speech Separation with Distributed Heterogeneous Arrays
by: Han, Runduo, et al.
Published: (2025)

AISHELL-5: The First Open-Source In-Car Multi-Channel Multi-Speaker Speech Dataset for Automatic Speech Diarization and Recognition
by: Dai, Yuhang, et al.
Published: (2025)

A Fast and Lightweight Model for Causal Audio-Visual Speech Separation
by: Sang, Wendi, et al.
Published: (2025)

Summary on The Multilingual Conversational Speech Language Model Challenge: Datasets, Tasks, Baselines, and Methods
by: Mu, Bingshen, et al.
Published: (2025)

Advances in Speech Separation: Techniques, Challenges, and Future Trends
by: Li, Kai, et al.
Published: (2025)

A Lightweight and Real-Time Binaural Speech Enhancement Model with Spatial Cues Preservation
by: Wang, Jingyuan, et al.
Published: (2024)

Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
by: Zhao, Haixin, et al.
Published: (2025)

Llasa+: Free Lunch for Accelerated and Streaming Llama-Based Speech Synthesis
by: Tian, Wenjie, et al.
Published: (2025)

KALL-E: Autoregressive Speech Synthesis with Next-Distribution Prediction
by: Xia, Kangxiang, et al.
Published: (2024)

SLM-SS: Speech Language Model for Generative Speech Separation
by: Li, Tianhua, et al.
Published: (2026)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
by: Xiao, Yang, et al.
Published: (2025)

Fake Speech Wild: Detecting Deepfake Speech on Social Media Platform
by: Xie, Yuankun, et al.
Published: (2025)

Optimizing Neural Architectures for Hindi Speech Separation and Enhancement in Noisy Environments
by: Ramamoorthy, Arnav
Published: (2025)

SenSE: Semantic-Aware High-Fidelity Universal Speech Enhancement
by: Li, Xingchen, et al.
Published: (2025)

SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
by: Wang, Kaidi, et al.
Published: (2025)

CapTalk: Unified Voice Design for Single-Utterance and Dialogue Speech Generation
by: Su, Xiaosu, et al.
Published: (2026)

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
by: Li, Hanzhao, et al.
Published: (2025)

DialoSpeech: Dual-Speaker Dialogue Generation with LLM and Flow Matching
by: Xie, Hanke, et al.
Published: (2025)

From Coarse to Fine: Recursive Audio-Visual Semantic Enhancement for Speech Separation
by: Xue, Ke, et al.
Published: (2025)

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement
by: Han, Runduo, et al.
Published: (2024)

DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
by: Gu, Yu, et al.
Published: (2024)

TF-CorrNet: Leveraging Spatial Correlation for Continuous Speech Separation
by: Shin, Ui-Hyeop, et al.
Published: (2025)

Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation
by: Chen, Guo, et al.
Published: (2025)

Neural personal sound zones with flexible bright zone control
by: Zhu, Wenye, et al.
Published: (2025)

AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis
by: Luo, Dan, et al.
Published: (2025)

dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition
by: Tian, Wenjie, et al.
Published: (2026)

An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits
by: Li, Kai, et al.
Published: (2022)

A Lightweight Fourier-based Network for Binaural Speech Enhancement with Spatial Cue Preservation
by: Lu, Xikun, et al.
Published: (2025)

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
by: Liu, Tianchi, et al.
Published: (2025)

How Well Do Current Speech Deepfake Detection Methods Generalize to the Real World?
by: Li, Daixian, et al.
Published: (2026)

Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification
by: Zhang, Li, et al.
Published: (2025)

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)

Leveraging Spatial Cues from Cochlear Implant Microphones to Efficiently Enhance Speech Separation in Real-World Listening Scenes
by: Olalere, Feyisayo, et al.
Published: (2025)

Enhancing Generalization of Speech Large Language Models with Multi-Task Behavior Imitation and Speech-Text Interleaving
by: Xie, Jingran, et al.
Published: (2025)

In This Environment, As That Speaker: A Text-Driven Framework for Multi-Attribute Speech Conversion
by: Jin, Jiawei, et al.
Published: (2025)

TF-MLPNet: Tiny Real-Time Neural Speech Separation
by: Itani, Malek, et al.
Published: (2025)