:: Library Catalog

Bibliographic Details
Main Authors:	Zhang, Shiqi; Qiu, Zheng; Takeuchi, Daiki; Harada, Noboru; Makino, Shoji
Format:	Preprint
Published:	2024
Subjects:	Audio and Speech Processing; Sound
Online Access:	https://arxiv.org/abs/2402.08252

Similar Items

Baseline Systems and Evaluation Metrics for Spatial Semantic Segmentation of Sound Scenes
by: Nguyen, Binh Thien, et al.
Published: (2025)

CMGAN: Conformer-based Metric GAN for Speech Enhancement
by: Cao, Ruizhe, et al.
Published: (2022)

Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
by: Tsubaki, Shunsuke, et al.
Published: (2024)

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
by: Niizumi, Daisuke, et al.
Published: (2024)

Exploring Pre-trained General-purpose Audio Representations for Heart Murmur Detection
by: Niizumi, Daisuke, et al.
Published: (2024)

CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
by: Abdulatif, Sherif, et al.
Published: (2022)

CLAP-ART: Automated Audio Captioning with Semantic-rich Audio Representation Tokenizer
by: Takeuchi, Daiki, et al.
Published: (2025)

Description and Discussion on DCASE 2026 Challenge Task 2: Noise-aware Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
by: Nishida, Tomoya, et al.
Published: (2026)

M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
by: Niizumi, Daisuke, et al.
Published: (2024)

Assessing the Utility of Audio Foundation Models for Heart and Respiratory Sound Analysis
by: Niizumi, Daisuke, et al.
Published: (2025)

Towards Pre-training an Effective Respiratory Audio Foundation Model
by: Niizumi, Daisuke, et al.
Published: (2025)

Rethinking Masking Strategies for Masked Prediction-based Audio Self-supervised Learning
by: Niizumi, Daisuke, et al.
Published: (2026)

Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study
by: Brueggeman, Avamarie, et al.
Published: (2023)

FUSE: Universal Speech Enhancement using Multi-Stage Fusion of Sparse Compression and Token Generation Models for the URGENT 2025 Challenge
by: Goswami, Nabarun, et al.
Published: (2025)

Hallucination in Perceptual Metric-Driven Speech Enhancement Networks
by: Close, George, et al.
Published: (2024)

6DoF SELD: Sound Event Localization and Detection Using Microphones and Motion Tracking Sensors on self-motioning human
by: Yasuda, Masahiro, et al.
Published: (2024)

Description and Discussion on DCASE 2025 Challenge Task 4: Spatial Semantic Segmentation of Sound Scenes
by: Yasuda, Masahiro, et al.
Published: (2025)

Entropy-Guided GRVQ for Ultra-Low Bitrate Neural Speech Codec
by: Ren, Yanzhou, et al.
Published: (2026)

Accelerated Convolutive Transfer Function-Based Multichannel NMF Using Iterative Source Steering
by: Xie, Xuemai, et al.
Published: (2025)

Unsupervised Multi-channel Speech Dereverberation via Diffusion
by: Wu, Yulun, et al.
Published: (2025)

Improving Speech Enhancement with Multi-Metric Supervision from Learned Quality Assessment
by: Wang, Wei, et al.
Published: (2025)

WTFormer: A Wavelet Conformer Network for MIMO Speech Enhancement with Spatial Cues Peservation
by: Han, Lu, et al.
Published: (2025)

Study of Lightweight Transformer Architectures for Single-Channel Speech Enhancement
by: Zhao, Haixin, et al.
Published: (2025)

SaD: A Scenario-Aware Discriminator for Speech Enhancement
by: Yuan, Xihao, et al.
Published: (2025)

Acousto-optic reconstruction of exterior sound field based on concentric circle sampling with circular harmonic expansion
by: Nguyen, Phuc Duc, et al.
Published: (2023)

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics
by: Saeki, Takaaki, et al.
Published: (2024)

ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
by: Wang, Zhong-Qiu
Published: (2024)

An Explicit Consistency-Preserving Loss Function for Phase Reconstruction and Speech Enhancement
by: Ku, Pin-Jui, et al.
Published: (2024)

GAN-Based Speech Enhancement for Low SNR Using Latent Feature Conditioning
by: Shetu, Shrishti Saha, et al.
Published: (2024)

EffiFusion-GAN: Efficient Fusion Generative Adversarial Network for Speech Enhancement
by: Wen, Bin, et al.
Published: (2025)

DTT-BSR: GAN-based DTTNet with RoPE Transformer Enhancement for Music Source Restoration
by: Tan, Shihong, et al.
Published: (2026)

Conformer-based Ultrasound-to-Speech Conversion
by: Ibrahimov, Ibrahim, et al.
Published: (2025)

Explicit Estimation of Magnitude and Phase Spectra in Parallel for High-Quality Speech Enhancement
by: Lu, Ye-Xin, et al.
Published: (2023)

Adaptive Convolution for CNN-based Speech Enhancement Models
by: Wang, Dahan, et al.
Published: (2025)

Diffusion-based Signal Refiner for Speech Enhancement and Separation
by: Hirano, Masato, et al.
Published: (2023)

Improving Speech Enhancement by Integrating Inter-Channel and Band Features with Dual-branch Conformer
by: Li, Jizhen, et al.
Published: (2024)

Leveraging Local and Global Knowledge Integration with Time-Frequency Calibrated Distillation for Speech Enhancement
by: Cheng, Jiaming, et al.
Published: (2025)

Universal Score-based Speech Enhancement with High Content Preservation
by: Scheibler, Robin, et al.
Published: (2024)

xLSTM-SENet: xLSTM for Single-Channel Speech Enhancement
by: Kühne, Nikolai Lund, et al.
Published: (2025)

Confidence-based Filtering for Speech Dataset Curation with Generative Speech Enhancement Using Discrete Tokens
by: Yamauchi, Kazuki, et al.
Published: (2026)