| Main Authors: | Zhang, Shiqi, Qiu, Zheng, Takeuchi, Daiki, Harada, Noboru, Makino, Shoji |
|---|---|
| Format: | Preprint |
| Published: | 2024 |
| Subjects: | |
| Online Access: | https://arxiv.org/abs/2402.08252 |
Similar Items
Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes
by: Nguyen, Binh Thien, et al.
Published: (2025)
CMGAN: Conformer-based Metric GAN for Speech Enhancement
by: Cao, Ruizhe, et al.
Published: (2022)
Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
by: Tsubaki, Shunsuke, et al.
Published: (2024)
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
by: Niizumi, Daisuke, et al.
Published: (2024)
Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection
by: Niizumi, Daisuke, et al.
Published: (2024)
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
by: Abdulatif, Sherif, et al.
Published: (2022)
CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)
Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
by: Nishida, Tomoya, et al.
Published: (2026)
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
by: Niizumi, Daisuke, et al.
Published: (2024)
Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
by: Niizumi, Daisuke, et al.
Published: (2025)
Towards Pre-training an Effective Respiratory Audio Foundation Model
by: Niizumi, Daisuke, et al.
Published: (2025)
Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)
Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study
by: Brueggeman, Avamarie, et al.
Published: (2023)
FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge
by: Goswami, Nabarun, et al.
Published: (2025)
Hallucination in Perceptual Metric-Driven Speech Enhancement Networks
by: Close, George, et al.
Published: (2024)
6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human
by: Yasuda, Masahiro, et al.
Published: (2024)
Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes
by: Yasuda, Masahiro, et al.
Published: (2025)
Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)
Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering
by: Xie, Xuemai, et al.
Published: (2025)
Unsupervised Multi-channel Speech Dereverberation via Diffusion
by: Wu, Yulun, et al.
Published: (2025)
Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)
WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Preservation
by: Han, Lu, et al.
Published: (2025)
Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
by: Zhao, Haixin, et al.
Published: (2025)
SaD: A Scenario-Aware Discriminator for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)
Acousto-optic reconstruction of exterior sound field based on concentric circle sampling with circular harmonic expansion
by: Nguyen, Phuc Duc, et al.
Published: (2023)
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
by: Saeki, Takaaki, et al.
Published: (2024)
ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
by: Wang, Zhong-Qiu
Published: (2024)
An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement
by: Ku, Pin-Jui, et al.
Published: (2024)
GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning
by: Shetu, Shrishti Saha, et al.
Published: (2024)
EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
by: Wen, Bin, et al.
Published: (2025)
DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)
Conformer-based Ultrasound-to-Speech Conversion
by: Ibrahimov, Ibrahim, et al.
Published: (2025)
Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
by: Lu, Ye-Xin, et al.
Published: (2023)
Adaptive Convolution for CNN-based Speech Enhancement Models
by: Wang, Dahan, et al.
Published: (2025)
Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)
Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer
by: Li, Jizhen, et al.
Published: (2024)
Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
by: Cheng, Jiaming, et al.
Published: (2025)
Universal Score-based Speech Enhancement with High Content Preservation
by: Scheibler, Robin, et al.
Published: (2024)
xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)
Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens
by: Yamauchi, Kazuki, et al.
Published: (2026)