:: Library Catalog

Bibliographic Details
Main Authors:	He, Mingrui; Xu, Longting; Wang, Han; Zhang, Mingjun; Das, Rohan Kumar
Format:	Preprint
Published:	2024
Subjects:	Sound; Audio and Speech Processing
Online Access:	https://arxiv.org/abs/2404.17280

Similar Items

Multi-modal Speech Enhancement with Limited Electromyography Channels
by: Feng, Fuyuan, et al.
Published: (2025)

XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection
by: Xiao, Yang, et al.
Published: (2024)

Listen, Analyze, and Adapt to Learn New Attacks: An Exemplar-Free Class Incremental Learning Method for Audio Deepfake Source Tracing
by: Xiao, Yang, et al.
Published: (2025)

RawTFNet: A Lightweight CNN Architecture for Speech Anti-spoofing
by: Xiao, Yang, et al.
Published: (2025)

UCIL: An Unsupervised Class Incremental Learning Approach for Sound Event Detection
by: Xiao, Yang, et al.
Published: (2024)

WildDESED: An LLM-Powered Dataset for Wild Domestic Environment Sound Event Detection System
by: Xiao, Yang, et al.
Published: (2024)

Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection
by: Yin, Han, et al.
Published: (2024)

FMSG-JLESS Submission for DCASE 2024 Task4 on Sound Event Detection with Heterogeneous Training Dataset and Potentially Missing Labels
by: Xiao, Yang, et al.
Published: (2024)

Where's That Voice Coming? Continual Learning for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

TF-Mamba: A Time-Frequency Network for Sound Source Localization
by: Xiao, Yang, et al.
Published: (2024)

Replay Attacks Against Audio Deepfake Detection
by: Müller, Nicolas, et al.
Published: (2025)

Re-Parameterization of Lightweight Transformer for On-Device Speech Emotion Recognition
by: Zhang, Zixing, et al.
Published: (2024)

Self-Attention and Hybrid Features for Replay and Deep-Fake Audio Detection
by: Huang, Lian, et al.
Published: (2024)

Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement
by: Han, Runduo, et al.
Published: (2024)

Multilingual Source Tracing of Speech Deepfakes: A First Benchmark
by: Xuan, Xi, et al.
Published: (2025)

Multi-Channel Replay Speech Detection using Acoustic Maps
by: Neri, Michael, et al.
Published: (2026)

Examining the Interplay Between Privacy and Fairness for Speech Processing: A Review and Perspective
by: Leschanowsky, Anna, et al.
Published: (2024)

Exploring Text-Queried Sound Event Detection with Audio Source Separation
by: Yin, Han, et al.
Published: (2024)

Quantum Fourier Transform Based Denoising: Unitary Filtering for Enhanced Speech Clarity
by: Tripathi, Rajeshwar, et al.
Published: (2025)

AdaKWS: Towards Robust Keyword Spotting with Test-Time Adaptation
by: Xiao, Yang, et al.
Published: (2025)

A Lightweight Fourier-based Network for Binaural Speech Enhancement with Spatial Cue Preservation
by: Lu, Xikun, et al.
Published: (2025)

EchoFake: A Replay-Aware Dataset for Practical Speech Deepfake Detection
by: Zhang, Tong, et al.
Published: (2025)

Acoustic Simulation Framework for Multi-channel Replay Speech Detection
by: Neri, Michael, et al.
Published: (2025)

Abusive Speech Detection in Indic Languages Using Acoustic Features
by: Spiesberger, Anika A., et al.
Published: (2024)

ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer
by: Liu, Huadai, et al.
Published: (2023)

Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection
by: Rimon, Inbal, et al.
Published: (2025)

Speech-Declipping Transformer with Complex Spectrogram and Learnable Temporal Features
by: Kwon, Younghoo, et al.
Published: (2024)

Towards Scalable AASIST: Refining Graph Attention for Speech Deepfake Detection
by: Viakhirev, Ivan, et al.
Published: (2025)

Nes2Net: A Lightweight Nested Architecture for Foundation Model Driven Speech Anti-spoofing
by: Liu, Tianchi, et al.
Published: (2025)

Decoupled Spatial and Temporal Processing for Resource Efficient Multichannel Speech Enhancement
by: Pandey, Ashutosh, et al.
Published: (2024)

Speaker Anonymisation for Speech-based Suicide Risk Detection
by: Cui, Ziyun, et al.
Published: (2025)

Reverse Attention for Lightweight Speech Enhancement on Edge Devices
by: Ojha, Shuubham, et al.
Published: (2025)

EnvSDD: Benchmarking Environmental Sound Deepfake Detection
by: Yin, Han, et al.
Published: (2025)

Causal Speech Enhancement with Predicting Semantics based on Quantized Self-supervised Learning Features
by: Tsunoo, Emiru, et al.
Published: (2024)

Comparative Analysis of ASR Methods for Speech Deepfake Detection
by: Salvi, Davide, et al.
Published: (2024)

Contrastive Loss Based Frame-wise Feature disentanglement for Polyphonic Sound Event Detection
by: Guan, Yadong, et al.
Published: (2024)

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement
by: Zhang, Qiquan, et al.
Published: (2024)

Transformers in Speech Processing: A Survey
by: Latif, Siddique, et al.
Published: (2023)

Naturalness-Aware Curriculum Learning with Dynamic Temperature for Speech Deepfake Detection
by: Kim, Taewoo, et al.
Published: (2025)

DiTSE: High-Fidelity Generative Speech Enhancement via Latent Diffusion Transformers
by: Guimarães, Heitor R., et al.
Published: (2025)