:: Library Catalog

Bibliographic Details
Main Authors:	Zhu, Qiushi; Zhang, Jie; Gu, Yu; Hu, Yuchen; Dai, Lirong
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2401.03468

Similar Items

Automatic classification of stop realisation with wav2vec2.0
by: Tanner, James, et al.
Published: (2025)

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
by: Wang, Zhiyong, et al.
Published: (2024)

Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)

Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
by: Bayerl, Sebastian P., et al.
Published: (2022)

Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)

vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)

A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)

Decoupled Spatial and Temporal Processing for Resource Efficient Multichannel Speech Enhancement
by: Pandey, Ashutosh, et al.
Published: (2024)

On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
by: Hsieh, Tsun-An, et al.
Published: (2024)

Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)

LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)

Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)

Multichannel Keyword Spotting for Noisy Conditions
by: Saladukha, Dzmitry, et al.
Published: (2025)

Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)

A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control
by: Wu, Yuan-Kuei, et al.
Published: (2025)

Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
by: Yang, Yujie, et al.
Published: (2025)

Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues
by: Choi, Dayun, et al.
Published: (2024)

Iterative refinement, not training objective, makes HuBERT behave differently from wav2vec 2.0
by: Huo, Robin, et al.
Published: (2025)

DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
by: Gu, Yu, et al.
Published: (2024)

WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
by: Li, Feng, et al.
Published: (2024)

Determined Multichannel Blind Source Separation with Clustered Source Model
by: Wang, Jianyu, et al.
Published: (2024)

Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)

RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement
by: Aldarmaki, Ibrahim, et al.
Published: (2024)

Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering
by: Xie, Xuemai, et al.
Published: (2025)

DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
by: Lee, Dongheon, et al.
Published: (2024)

Multichannel blind speech source separation with a disjoint constraint source model
by: Wang, Jianyu, et al.
Published: (2024)

LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation
by: Chen, Shihao, et al.
Published: (2024)

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
by: Chen, Xueyuan, et al.
Published: (2024)

Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)

Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)

Event Classification by Physics-informed Inpainting for Distributed Multichannel Acoustic Sensor with Partially Degraded Channels
by: Tonami, Noriyuki, et al.
Published: (2026)

Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)

Constraint Optimized Multichannel Mixer-limiter Design
by: Luo, Yuancheng, et al.
Published: (2025)

LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)

Compression of Higher Order Ambisonics with Multichannel RVQGAN
by: Hirvonen, Toni, et al.
Published: (2024)

WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
by: Lin, Zhaojiang, et al.
Published: (2025)

3D Room Geometry Inference from Multichannel Room Impulse Response using Deep Neural Network
by: Yeon, Inmo, et al.
Published: (2024)

DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition
by: Shao, Qijie, et al.
Published: (2025)

voc2vec: A Foundation Model for Non-Verbal Vocalization
by: Koudounas, Alkis, et al.
Published: (2025)

A Comparative Analysis of Generalised Echo and Interference Cancelling and Extended Multichannel Wiener Filtering for Combined Noise Reduction and Acoustic Echo Cancellation
by: Roebben, Arnout, et al.
Published: (2025)