| Main Authors: | Zhu, Qiushi, Zhang, Jie, Gu, Yu, Hu, Yuchen, Dai, Lirong |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Online Access: | https://arxiv.org/abs/2401.03468 |
Similar Items
Automatic classification of stop realisation with wav2vec2.0
by: Tanner, James, et al.
Published: (2025)
Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
by: Wang, Zhiyong, et al.
Published: (2024)
Leveraging Joint Spectral and Spatial Learning with MAMBA for Multichannel Speech Enhancement
by: Ren, Wenze, et al.
Published: (2024)
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0
by: Bayerl, Sebastian P., et al.
Published: (2022)
Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition
by: Li, Guinan, et al.
Published: (2024)
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders
by: Guo, Yiwei, et al.
Published: (2024)
A Phoneme-Scale Assessment of Multichannel Speech Enhancement Algorithms
by: Monir, Nasser-Eddine, et al.
Published: (2024)
Decoupled Spatial and Temporal Processing for Resource Efficient Multichannel Speech Enhancement
by: Pandey, Ashutosh, et al.
Published: (2024)
On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement
by: Hsieh, Tsun-An, et al.
Published: (2024)
Evaluating Multichannel Speech Enhancement Algorithms at the Phoneme Scale Across Genders
by: Monir, Nasser-Eddine, et al.
Published: (2025)
LABNet: A Lightweight Attentive Beamforming Network for Ad-hoc Multichannel Microphone Invariant Real-Time Speech Enhancement
by: Yan, Haoyin, et al.
Published: (2025)
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR
by: Hu, Yuchen, et al.
Published: (2023)
Multichannel Keyword Spotting for Noisy Conditions
by: Saladukha, Dzmitry, et al.
Published: (2025)
Multichannel Long-Term Streaming Neural Speech Enhancement for Static and Moving Speakers
by: Quan, Changsheng, et al.
Published: (2024)
A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control
by: Wu, Yuan-Kuei, et al.
Published: (2025)
Mel-McNet: A Mel-Scale Framework for Online Multichannel Speech Enhancement
by: Yang, Yujie, et al.
Published: (2025)
Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues
by: Choi, Dayun, et al.
Published: (2024)
Iterative refinement, not training objective, makes HuBERT behave differently from wav2vec 2.0
by: Huo, Robin, et al.
Published: (2025)
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis
by: Gu, Yu, et al.
Published: (2024)
WavFusion: Towards wav2vec 2.0 Multimodal Speech Emotion Recognition
by: Li, Feng, et al.
Published: (2024)
Determined Multichannel Blind Source Separation with Clustered Source Model
by: Wang, Jianyu, et al.
Published: (2024)
Multichannel Voice Trigger Detection Based on Transform-average-concatenate
by: Higuchi, Takuya, et al.
Published: (2023)
RelUNet: Relative Channel Fusion U-Net for Multichannel Speech Enhancement
by: Aldarmaki, Ibrahim, et al.
Published: (2024)
Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering
by: Xie, Xuemai, et al.
Published: (2025)
DeFT-Mamba: Universal Multichannel Sound Separation and Polyphonic Audio Classification
by: Lee, Dongheon, et al.
Published: (2024)
Multichannel blind speech source separation with a disjoint constraint source model
by: Wang, Jianyu, et al.
Published: (2024)
LCM-SVC: Latent Diffusion Model Based Singing Voice Conversion with Inference Acceleration via Latent Consistency Distillation
by: Chen, Shihao, et al.
Published: (2024)
Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
by: Chen, Xueyuan, et al.
Published: (2024)
Experimental Study: Enhancing Voice Spoofing Detection Models with wav2vec 2.0
by: Kang, Taein, et al.
Published: (2024)
Adversarial speech for voice privacy protection from Personalized Speech generation
by: Chen, Shihao, et al.
Published: (2024)
Event Classification by Physics-informed Inpainting for Distributed Multichannel Acoustic Sensor with Partially Degraded Channels
by: Tonami, Noriyuki, et al.
Published: (2026)
Noise-aware Speech Enhancement using Diffusion Probabilistic Model
by: Hu, Yuchen, et al.
Published: (2023)
Constraint Optimized Multichannel Mixer-limiter Design
by: Luo, Yuancheng, et al.
Published: (2025)
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
by: Chen, Shihao, et al.
Published: (2024)
Compression of Higher Order Ambisonics with Multichannel RVQGAN
by: Hirvonen, Toni, et al.
Published: (2024)
WearVox: An Egocentric Multichannel Voice Assistant Benchmark for Wearables
by: Lin, Zhaojiang, et al.
Published: (2025)
3D Room Geometry Inference from Multichannel Room Impulse Response using Deep Neural Network
by: Yeon, Inmo, et al.
Published: (2024)
DQ-Data2vec: Decoupling Quantization for Multilingual Speech Recognition
by: Shao, Qijie, et al.
Published: (2025)
voc2vec: A Foundation Model for Non-Verbal Vocalization
by: Koudounas, Alkis, et al.
Published: (2025)
A Comparative Analysis of Generalised Echo and Interference Cancelling and Extended Multichannel Wiener Filtering for Combined Noise Reduction and Acoustic Echo Cancellation
by: Roebben, Arnout, et al.
Published: (2025)